2018-06-28 21:23:57 +02:00
|
|
|
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
|
|
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
|
|
// (found in the LICENSE.Apache file in the root directory).
|
|
|
|
//
|
|
|
|
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
|
|
|
|
// Use of this source code is governed by a BSD-style license that can be
|
|
|
|
// found in the LICENSE file. See the AUTHORS file for names of contributors.
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
#ifndef ROCKSDB_LITE
|
|
|
|
|
2018-06-28 21:23:57 +02:00
|
|
|
#include "db/db_test_util.h"
|
|
|
|
#include "port/stack_trace.h"
|
|
|
|
#include "rocksdb/perf_context.h"
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
#include "rocksdb/sst_file_manager.h"
|
2019-05-30 20:21:38 +02:00
|
|
|
#include "test_util/fault_injection_test_env.h"
|
2018-06-28 21:23:57 +02:00
|
|
|
#if !defined(ROCKSDB_LITE)
|
2019-05-30 20:21:38 +02:00
|
|
|
#include "test_util/sync_point.h"
|
2018-06-28 21:23:57 +02:00
|
|
|
#endif
|
|
|
|
|
|
|
|
namespace rocksdb {
|
|
|
|
|
|
|
|
class DBErrorHandlingTest : public DBTestBase {
|
|
|
|
public:
|
|
|
|
DBErrorHandlingTest() : DBTestBase("/db_error_handling_test") {}
|
2020-01-30 19:53:46 +01:00
|
|
|
|
|
|
|
std::string GetManifestNameFromLiveFiles() {
|
|
|
|
std::vector<std::string> live_files;
|
|
|
|
uint64_t manifest_size;
|
|
|
|
|
|
|
|
dbfull()->GetLiveFiles(live_files, &manifest_size, false);
|
|
|
|
for (auto& file : live_files) {
|
|
|
|
uint64_t num = 0;
|
|
|
|
FileType type;
|
|
|
|
if (ParseFileName(file, &num, &type) && type == kDescriptorFile) {
|
|
|
|
return file;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
return "";
|
|
|
|
}
|
2018-06-28 21:23:57 +02:00
|
|
|
};
|
|
|
|
|
|
|
|
class DBErrorHandlingEnv : public EnvWrapper {
|
|
|
|
public:
|
|
|
|
DBErrorHandlingEnv() : EnvWrapper(Env::Default()),
|
|
|
|
trig_no_space(false), trig_io_error(false) {}
|
|
|
|
|
|
|
|
void SetTrigNoSpace() {trig_no_space = true;}
|
|
|
|
void SetTrigIoError() {trig_io_error = true;}
|
|
|
|
private:
|
|
|
|
bool trig_no_space;
|
|
|
|
bool trig_io_error;
|
|
|
|
};
|
|
|
|
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
class ErrorHandlerListener : public EventListener {
|
|
|
|
public:
|
|
|
|
ErrorHandlerListener()
|
|
|
|
: mutex_(),
|
|
|
|
cv_(&mutex_),
|
|
|
|
no_auto_recovery_(false),
|
|
|
|
recovery_complete_(false),
|
|
|
|
file_creation_started_(false),
|
|
|
|
override_bg_error_(false),
|
|
|
|
file_count_(0),
|
|
|
|
fault_env_(nullptr) {}
|
|
|
|
|
2018-09-17 22:08:13 +02:00
|
|
|
void OnTableFileCreationStarted(
|
|
|
|
const TableFileCreationBriefInfo& /*ti*/) override {
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
InstrumentedMutexLock l(&mutex_);
|
|
|
|
file_creation_started_ = true;
|
|
|
|
if (file_count_ > 0) {
|
|
|
|
if (--file_count_ == 0) {
|
|
|
|
fault_env_->SetFilesystemActive(false, file_creation_error_);
|
|
|
|
file_creation_error_ = Status::OK();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
cv_.SignalAll();
|
|
|
|
}
|
|
|
|
|
|
|
|
void OnErrorRecoveryBegin(BackgroundErrorReason /*reason*/,
|
2018-09-17 22:08:13 +02:00
|
|
|
Status /*bg_error*/,
|
|
|
|
bool* auto_recovery) override {
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
if (*auto_recovery && no_auto_recovery_) {
|
|
|
|
*auto_recovery = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-09-17 22:08:13 +02:00
|
|
|
void OnErrorRecoveryCompleted(Status /*old_bg_error*/) override {
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
InstrumentedMutexLock l(&mutex_);
|
|
|
|
recovery_complete_ = true;
|
|
|
|
cv_.SignalAll();
|
|
|
|
}
|
|
|
|
|
|
|
|
bool WaitForRecovery(uint64_t /*abs_time_us*/) {
|
|
|
|
InstrumentedMutexLock l(&mutex_);
|
|
|
|
while (!recovery_complete_) {
|
|
|
|
cv_.Wait(/*abs_time_us*/);
|
|
|
|
}
|
|
|
|
if (recovery_complete_) {
|
|
|
|
recovery_complete_ = false;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
void WaitForTableFileCreationStarted(uint64_t /*abs_time_us*/) {
|
|
|
|
InstrumentedMutexLock l(&mutex_);
|
|
|
|
while (!file_creation_started_) {
|
|
|
|
cv_.Wait(/*abs_time_us*/);
|
|
|
|
}
|
|
|
|
file_creation_started_ = false;
|
|
|
|
}
|
|
|
|
|
|
|
|
void OnBackgroundError(BackgroundErrorReason /*reason*/,
|
|
|
|
Status* bg_error) override {
|
|
|
|
if (override_bg_error_) {
|
|
|
|
*bg_error = bg_error_;
|
|
|
|
override_bg_error_ = false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
void EnableAutoRecovery(bool enable = true) { no_auto_recovery_ = !enable; }
|
|
|
|
|
|
|
|
void OverrideBGError(Status bg_err) {
|
|
|
|
bg_error_ = bg_err;
|
|
|
|
override_bg_error_ = true;
|
|
|
|
}
|
|
|
|
|
|
|
|
void InjectFileCreationError(FaultInjectionTestEnv* env, int file_count,
|
|
|
|
Status s) {
|
|
|
|
fault_env_ = env;
|
|
|
|
file_count_ = file_count;
|
|
|
|
file_creation_error_ = s;
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
InstrumentedMutex mutex_;
|
|
|
|
InstrumentedCondVar cv_;
|
|
|
|
bool no_auto_recovery_;
|
|
|
|
bool recovery_complete_;
|
|
|
|
bool file_creation_started_;
|
|
|
|
bool override_bg_error_;
|
|
|
|
int file_count_;
|
|
|
|
Status file_creation_error_;
|
|
|
|
Status bg_error_;
|
|
|
|
FaultInjectionTestEnv* fault_env_;
|
|
|
|
};
|
|
|
|
|
2018-06-28 21:23:57 +02:00
|
|
|
TEST_F(DBErrorHandlingTest, FLushWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
2018-06-28 21:23:57 +02:00
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.env = fault_env.get();
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
options.listeners.emplace_back(listener);
|
2018-06-28 21:23:57 +02:00
|
|
|
Status s;
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
2018-06-28 21:23:57 +02:00
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
Put(Key(0), "val");
|
2018-06-28 21:23:57 +02:00
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"FlushJob::Start", [&](void *) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
2018-06-28 21:23:57 +02:00
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
2018-06-28 21:23:57 +02:00
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
2020-01-30 19:53:46 +01:00
|
|
|
TEST_F(DBErrorHandlingTest, ManifestWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Flush();
|
|
|
|
Put(Key(1), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, DoubleManifestWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Flush();
|
|
|
|
Put(Key(1), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
|
|
|
|
// This Resume() will attempt to create a new manifest file and fail again
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
|
|
|
|
// A successful Resume() will create a new manifest file
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, CompactionManifestWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
options.env = fault_env.get();
|
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
std::atomic<bool> fail_manifest(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Put(Key(2), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
rocksdb::SyncPoint::GetInstance()->LoadDependency(
|
|
|
|
// Wait for flush of 2nd L0 file before starting compaction
|
|
|
|
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
|
|
|
"BackgroundCallCompaction:0"},
|
|
|
|
// Wait for compaction to detect manifest write error
|
|
|
|
{"BackgroundCallCompaction:1",
|
|
|
|
"CompactionManifestWriteError:0"},
|
|
|
|
// Make compaction thread wait for error to be cleared
|
|
|
|
{"CompactionManifestWriteError:1",
|
|
|
|
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
|
|
|
|
// Wait for DB instance to clear bg_error before calling
|
|
|
|
// TEST_WaitForCompact
|
|
|
|
{"SstFileManagerImpl::ClearError",
|
|
|
|
"CompactionManifestWriteError:2"}});
|
|
|
|
// trigger manifest write failure in compaction thread
|
|
|
|
rocksdb::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BackgroundCallCompaction:0", [&](void *) {
|
|
|
|
fail_manifest.store(true);
|
|
|
|
});
|
|
|
|
rocksdb::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
|
|
|
|
if (fail_manifest.load()) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
}
|
|
|
|
});
|
|
|
|
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
|
|
|
|
Put(Key(1), "val");
|
|
|
|
// This Flush will trigger a compaction, which will fail when appending to
|
|
|
|
// the manifest
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:0");
|
|
|
|
// Clear all errors so when the compaction is retried, it will succeed
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:1");
|
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:2");
|
|
|
|
|
|
|
|
s = dbfull()->TEST_WaitForCompact();
|
|
|
|
rocksdb::SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
ASSERT_EQ("val", Get(Key(2)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
2018-06-28 21:23:57 +02:00
|
|
|
TEST_F(DBErrorHandlingTest, CompactionWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
2018-06-28 21:23:57 +02:00
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
options.listeners.emplace_back(listener);
|
2018-06-28 21:23:57 +02:00
|
|
|
options.env = fault_env.get();
|
|
|
|
Status s;
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
Put(Key(0), "va;");
|
|
|
|
Put(Key(2), "va;");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
listener->OverrideBGError(
|
|
|
|
Status(Status::NoSpace(), Status::Severity::kHardError)
|
|
|
|
);
|
|
|
|
listener->EnableAutoRecovery(false);
|
2018-06-28 21:23:57 +02:00
|
|
|
rocksdb::SyncPoint::GetInstance()->LoadDependency(
|
2019-12-12 23:05:48 +01:00
|
|
|
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
|
|
|
"BackgroundCallCompaction:0"}});
|
2018-06-28 21:23:57 +02:00
|
|
|
rocksdb::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BackgroundCallCompaction:0", [&](void *) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
});
|
|
|
|
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
|
|
|
|
Put(Key(1), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
s = dbfull()->TEST_WaitForCompact();
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
2018-06-28 21:23:57 +02:00
|
|
|
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, CorruptionError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
Status s;
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
Put(Key(0), "va;");
|
|
|
|
Put(Key(2), "va;");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
rocksdb::SyncPoint::GetInstance()->LoadDependency(
|
2019-12-12 23:05:48 +01:00
|
|
|
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
|
|
|
"BackgroundCallCompaction:0"}});
|
2018-06-28 21:23:57 +02:00
|
|
|
rocksdb::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BackgroundCallCompaction:0", [&](void *) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::Corruption("Corruption"));
|
|
|
|
});
|
|
|
|
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
|
|
|
|
Put(Key(1), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
s = dbfull()->TEST_WaitForCompact();
|
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kUnrecoverableError);
|
|
|
|
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_NE(s, Status::OK());
|
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
TEST_F(DBErrorHandlingTest, AutoRecoverFlushError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
|
|
|
|
|
|
|
|
s = Put(Key(1), "val");
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, FailRecoverFlushError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
|
|
|
|
// We should be able to shutdown the database while auto recovery is going
|
|
|
|
// on in the background
|
|
|
|
Close();
|
|
|
|
DestroyDB(dbname_, options);
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, WALWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
2018-09-17 22:08:13 +02:00
|
|
|
Random rnd(301);
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 0; i<100; ++i) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(Key(i), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
};
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
|
|
|
for (auto i = 100; i<199; ++i) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(Key(i), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
SyncPoint::GetInstance()->SetCallBack("WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
|
|
|
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
}
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_EQ(s, s.NoSpace());
|
|
|
|
}
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
|
|
|
|
for (auto i=0; i<199; ++i) {
|
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Reopen(options);
|
|
|
|
for (auto i=0; i<199; ++i) {
|
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, MultiCFWALWriteError) {
|
|
|
|
std::unique_ptr<FaultInjectionTestEnv> fault_env(
|
|
|
|
new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.env = fault_env.get();
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
2018-09-17 22:08:13 +02:00
|
|
|
Random rnd(301);
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
CreateAndReopenWithCF({"one", "two", "three"}, options);
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 1; i < 4; ++i) {
|
|
|
|
for (auto j = 0; j < 100; ++j) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(handles_[i], Key(j), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
};
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
|
|
|
// Write to one CF
|
|
|
|
for (auto i = 100; i < 199; ++i) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(handles_[2], Key(i), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
|
|
|
fault_env->SetFilesystemActive(false,
|
|
|
|
Status::NoSpace("Out of space"));
|
|
|
|
}
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_EQ(s, s.NoSpace());
|
|
|
|
}
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_env->SetFilesystemActive(true);
|
|
|
|
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
|
|
|
|
|
|
|
|
for (auto i = 1; i < 4; ++i) {
|
|
|
|
// Every CF should have been flushed
|
|
|
|
ASSERT_EQ(NumTableFilesAtLevel(0, i), 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 1; i < 4; ++i) {
|
|
|
|
for (auto j = 0; j < 199; ++j) {
|
|
|
|
if (j < 100) {
|
|
|
|
ASSERT_NE(Get(i, Key(j)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(i, Key(j)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
ReopenWithColumnFamilies({"default", "one", "two", "three"}, options);
|
|
|
|
for (auto i = 1; i < 4; ++i) {
|
|
|
|
for (auto j = 0; j < 199; ++j) {
|
|
|
|
if (j < 100) {
|
|
|
|
ASSERT_NE(Get(i, Key(j)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(i, Key(j)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, MultiDBCompactionError) {
|
|
|
|
FaultInjectionTestEnv* def_env = new FaultInjectionTestEnv(Env::Default());
|
|
|
|
std::vector<std::unique_ptr<FaultInjectionTestEnv>> fault_env;
|
|
|
|
std::vector<Options> options;
|
|
|
|
std::vector<std::shared_ptr<ErrorHandlerListener>> listener;
|
|
|
|
std::vector<DB*> db;
|
|
|
|
std::shared_ptr<SstFileManager> sfm(NewSstFileManager(def_env));
|
|
|
|
int kNumDbInstances = 3;
|
2018-09-17 22:08:13 +02:00
|
|
|
Random rnd(301);
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
listener.emplace_back(new ErrorHandlerListener());
|
|
|
|
options.emplace_back(GetDefaultOptions());
|
|
|
|
fault_env.emplace_back(new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
options[i].create_if_missing = true;
|
|
|
|
options[i].level0_file_num_compaction_trigger = 2;
|
|
|
|
options[i].writable_file_max_buffer_size = 32768;
|
|
|
|
options[i].env = fault_env[i].get();
|
|
|
|
options[i].listeners.emplace_back(listener[i]);
|
|
|
|
options[i].sst_file_manager = sfm;
|
|
|
|
DB* dbptr;
|
|
|
|
char buf[16];
|
|
|
|
|
|
|
|
listener[i]->EnableAutoRecovery();
|
|
|
|
// Setup for returning error for the 3rd SST, which would be level 1
|
|
|
|
listener[i]->InjectFileCreationError(fault_env[i].get(), 3,
|
|
|
|
Status::NoSpace("Out of space"));
|
|
|
|
snprintf(buf, sizeof(buf), "_%d", i);
|
|
|
|
DestroyDB(dbname_ + std::string(buf), options[i]);
|
|
|
|
ASSERT_EQ(DB::Open(options[i], dbname_ + std::string(buf), &dbptr),
|
|
|
|
Status::OK());
|
|
|
|
db.emplace_back(dbptr);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto j = 0; j <= 100; ++j) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(Key(j), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
def_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
// Write to one CF
|
|
|
|
for (auto j = 100; j < 199; ++j) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(Key(j), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
Status s = static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true);
|
|
|
|
ASSERT_EQ(s.severity(), Status::Severity::kSoftError);
|
|
|
|
fault_env[i]->SetFilesystemActive(true);
|
|
|
|
}
|
|
|
|
|
|
|
|
def_env->SetFilesystemActive(true);
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
std::string prop;
|
|
|
|
ASSERT_EQ(listener[i]->WaitForRecovery(5000000), true);
|
2020-02-03 22:30:13 +01:00
|
|
|
ASSERT_EQ(static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true),
|
|
|
|
Status::OK());
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
EXPECT_TRUE(db[i]->GetProperty(
|
|
|
|
"rocksdb.num-files-at-level" + NumberToString(0), &prop));
|
|
|
|
EXPECT_EQ(atoi(prop.c_str()), 0);
|
|
|
|
EXPECT_TRUE(db[i]->GetProperty(
|
|
|
|
"rocksdb.num-files-at-level" + NumberToString(1), &prop));
|
|
|
|
EXPECT_EQ(atoi(prop.c_str()), 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
char buf[16];
|
|
|
|
snprintf(buf, sizeof(buf), "_%d", i);
|
|
|
|
delete db[i];
|
|
|
|
fault_env[i]->SetFilesystemActive(true);
|
|
|
|
if (getenv("KEEP_DB")) {
|
|
|
|
printf("DB is still at %s%s\n", dbname_.c_str(), buf);
|
|
|
|
} else {
|
|
|
|
Status s = DestroyDB(dbname_ + std::string(buf), options[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
options.clear();
|
|
|
|
sfm.reset();
|
|
|
|
delete def_env;
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingTest, MultiDBVariousErrors) {
|
|
|
|
FaultInjectionTestEnv* def_env = new FaultInjectionTestEnv(Env::Default());
|
|
|
|
std::vector<std::unique_ptr<FaultInjectionTestEnv>> fault_env;
|
|
|
|
std::vector<Options> options;
|
|
|
|
std::vector<std::shared_ptr<ErrorHandlerListener>> listener;
|
|
|
|
std::vector<DB*> db;
|
|
|
|
std::shared_ptr<SstFileManager> sfm(NewSstFileManager(def_env));
|
|
|
|
int kNumDbInstances = 3;
|
2018-09-17 22:08:13 +02:00
|
|
|
Random rnd(301);
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
listener.emplace_back(new ErrorHandlerListener());
|
|
|
|
options.emplace_back(GetDefaultOptions());
|
|
|
|
fault_env.emplace_back(new FaultInjectionTestEnv(Env::Default()));
|
|
|
|
options[i].create_if_missing = true;
|
|
|
|
options[i].level0_file_num_compaction_trigger = 2;
|
|
|
|
options[i].writable_file_max_buffer_size = 32768;
|
|
|
|
options[i].env = fault_env[i].get();
|
|
|
|
options[i].listeners.emplace_back(listener[i]);
|
|
|
|
options[i].sst_file_manager = sfm;
|
|
|
|
DB* dbptr;
|
|
|
|
char buf[16];
|
|
|
|
|
|
|
|
listener[i]->EnableAutoRecovery();
|
|
|
|
switch (i) {
|
|
|
|
case 0:
|
|
|
|
// Setup for returning error for the 3rd SST, which would be level 1
|
|
|
|
listener[i]->InjectFileCreationError(fault_env[i].get(), 3,
|
|
|
|
Status::NoSpace("Out of space"));
|
|
|
|
break;
|
|
|
|
case 1:
|
|
|
|
// Setup for returning error after the 1st SST, which would result
|
|
|
|
// in a hard error
|
|
|
|
listener[i]->InjectFileCreationError(fault_env[i].get(), 2,
|
|
|
|
Status::NoSpace("Out of space"));
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
snprintf(buf, sizeof(buf), "_%d", i);
|
|
|
|
DestroyDB(dbname_ + std::string(buf), options[i]);
|
|
|
|
ASSERT_EQ(DB::Open(options[i], dbname_ + std::string(buf), &dbptr),
|
|
|
|
Status::OK());
|
|
|
|
db.emplace_back(dbptr);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto j = 0; j <= 100; ++j) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(Key(j), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
def_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
// Write to one CF
|
|
|
|
for (auto j = 100; j < 199; ++j) {
|
2018-09-17 22:08:13 +02:00
|
|
|
batch.Put(Key(j), RandomString(&rnd, 1024));
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
|
|
|
|
if (i != 1) {
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::NoSpace());
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
Status s = static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true);
|
|
|
|
switch (i) {
|
|
|
|
case 0:
|
|
|
|
ASSERT_EQ(s.severity(), Status::Severity::kSoftError);
|
|
|
|
break;
|
|
|
|
case 1:
|
|
|
|
ASSERT_EQ(s.severity(), Status::Severity::kHardError);
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
fault_env[i]->SetFilesystemActive(true);
|
|
|
|
}
|
|
|
|
|
|
|
|
def_env->SetFilesystemActive(true);
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
std::string prop;
|
|
|
|
if (i < 2) {
|
|
|
|
ASSERT_EQ(listener[i]->WaitForRecovery(5000000), true);
|
|
|
|
}
|
|
|
|
if (i == 1) {
|
|
|
|
ASSERT_EQ(static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true),
|
|
|
|
Status::OK());
|
|
|
|
}
|
|
|
|
EXPECT_TRUE(db[i]->GetProperty(
|
|
|
|
"rocksdb.num-files-at-level" + NumberToString(0), &prop));
|
|
|
|
EXPECT_EQ(atoi(prop.c_str()), 0);
|
|
|
|
EXPECT_TRUE(db[i]->GetProperty(
|
|
|
|
"rocksdb.num-files-at-level" + NumberToString(1), &prop));
|
|
|
|
EXPECT_EQ(atoi(prop.c_str()), 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
char buf[16];
|
|
|
|
snprintf(buf, sizeof(buf), "_%d", i);
|
|
|
|
fault_env[i]->SetFilesystemActive(true);
|
|
|
|
delete db[i];
|
|
|
|
if (getenv("KEEP_DB")) {
|
|
|
|
printf("DB is still at %s%s\n", dbname_.c_str(), buf);
|
|
|
|
} else {
|
|
|
|
DestroyDB(dbname_ + std::string(buf), options[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
options.clear();
|
|
|
|
delete def_env;
|
|
|
|
}
|
|
|
|
|
2018-06-28 21:23:57 +02:00
|
|
|
} // namespace rocksdb
|
|
|
|
|
|
|
|
int main(int argc, char** argv) {
|
|
|
|
rocksdb::port::InstallStackTraceHandler();
|
|
|
|
::testing::InitGoogleTest(&argc, argv);
|
|
|
|
return RUN_ALL_TESTS();
|
|
|
|
}
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 22:36:19 +02:00
|
|
|
|
|
|
|
#else
|
|
|
|
#include <stdio.h>
|
|
|
|
|
|
|
|
int main(int /*argc*/, char** /*argv*/) {
|
|
|
|
fprintf(stderr, "SKIPPED as Cuckoo table is not supported in ROCKSDB_LITE\n");
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif // ROCKSDB_LITE
|