2018-06-28 12:23:57 -07:00
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accommodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to a persistent low disk space
condition.
2. Errors during write callback and flush are treated as hard errors,
i.e., the database is put in read-only mode and goes back to read-write
only after certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl (SFM) to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurrence of an out of space error during compaction,
subsequent compactions will be delayed until the disk free space check
indicates enough available space. The required space is computed as the
sum of input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in-progress
compactions when the first error occurred.
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it back in read-write mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM.
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recovery in the start
callback, and later initiate it manually by calling DB::Resume().
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
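The space checks in item 3 of the commit message can be sketched as a small standalone model. This is only an illustration under stated assumptions: `SpaceCheckModel`, `HasRecoveryHeadroom`, and `Sum` are hypothetical names that do not exist in RocksDB, the free-space figure is supplied by the caller rather than read from a filesystem, and the real SstFileManagerImpl logic may differ in detail.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical model of the free-space checks described in the commit
// message. None of these names are part of RocksDB.
inline uint64_t Sum(const std::vector<uint64_t>& sizes) {
  return std::accumulate(sizes.begin(), sizes.end(), uint64_t{0});
}

class SpaceCheckModel {
 public:
  // Free space as it would be reported by the storage container (assumed
  // to be provided by the caller in this sketch).
  void SetFreeBytes(uint64_t free_bytes) { free_bytes_ = free_bytes; }

  // (a) The first out-of-space error during compaction enables the check
  // and records the space reserved by in-progress compactions at that time.
  void OnCompactionNoSpace(uint64_t reserved_by_in_progress) {
    if (!check_enabled_) {
      check_enabled_ = true;
      reserved_at_first_error_ = reserved_by_in_progress;
    }
  }

  // Returns true if a compaction with the given input file sizes may start.
  bool CanStartCompaction(const std::vector<uint64_t>& input_sizes) {
    if (!check_enabled_) return true;
    // (b) Lift the check once free space exceeds the recorded reservation.
    if (free_bytes_ > reserved_at_first_error_) {
      check_enabled_ = false;
      return true;
    }
    // (a) Otherwise require room for the estimated output size, taken here
    // as the sum of the input sizes.
    return free_bytes_ >= Sum(input_sizes);
  }

  bool check_enabled() const { return check_enabled_; }

 private:
  uint64_t free_bytes_ = 0;
  uint64_t reserved_at_first_error_ = 0;
  bool check_enabled_ = false;
};

// (c) After a hard error, recovery is attempted only when free space covers
// the sum of write_buffer_size over all DB instances sharing the SFM.
inline bool HasRecoveryHeadroom(
    uint64_t free_bytes, const std::vector<uint64_t>& write_buffer_sizes) {
  return free_bytes >= Sum(write_buffer_sizes);
}
```

In this model a caller would invoke `OnCompactionNoSpace` when a compaction fails with an out-of-space error, consult `CanStartCompaction` before scheduling the next one, and poll `HasRecoveryHeadroom` from a background thread after a hard error.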
2018-09-15 13:36:19 -07:00
|
|
|
#ifndef ROCKSDB_LITE
|
|
|
|
|
2018-06-28 12:23:57 -07:00
#include "db/db_test_util.h"
#include "port/stack_trace.h"
2020-03-04 12:30:34 -08:00
|
|
|
#include "rocksdb/io_status.h"
|
2018-06-28 12:23:57 -07:00
|
|
|
#include "rocksdb/perf_context.h"
|
2018-09-15 13:36:19 -07:00
|
|
|
#include "rocksdb/sst_file_manager.h"
|
2018-06-28 12:23:57 -07:00
|
|
|
#if !defined(ROCKSDB_LITE)
|
2019-05-30 11:21:38 -07:00
|
|
|
#include "test_util/sync_point.h"
|
2018-06-28 12:23:57 -07:00
|
|
|
#endif
|
2020-07-09 14:33:42 -07:00
#include "util/random.h"
#include "utilities/fault_injection_env.h"
#include "utilities/fault_injection_fs.h"
2018-06-28 12:23:57 -07:00
|
|
|
|
2020-02-20 12:07:53 -08:00
|
|
|
namespace ROCKSDB_NAMESPACE {
|
2018-06-28 12:23:57 -07:00
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
class DBErrorHandlingFSTest : public DBTestBase {
|
2018-06-28 12:23:57 -07:00
|
|
|
public:
|
2020-08-17 18:41:20 -07:00
  DBErrorHandlingFSTest()
      : DBTestBase("/db_error_handling_fs_test", /*env_do_fsync=*/true) {}
2020-01-30 10:53:46 -08:00

  // Returns the name of the MANIFEST (descriptor) file among the live files,
  // or an empty string if none is found.
  std::string GetManifestNameFromLiveFiles() {
    std::vector<std::string> live_files;
    uint64_t manifest_size;

    dbfull()->GetLiveFiles(live_files, &manifest_size, false);
    for (auto& file : live_files) {
      uint64_t num = 0;
      FileType type;
      if (ParseFileName(file, &num, &type) && type == kDescriptorFile) {
        return file;
      }
    }
    return "";
  }
2018-06-28 12:23:57 -07:00
|
|
|
};
|
|
|
|
|
2020-03-04 12:30:34 -08:00
class DBErrorHandlingFS : public FileSystemWrapper {
 public:
  DBErrorHandlingFS()
2020-03-23 21:50:42 -07:00
|
|
|
: FileSystemWrapper(FileSystem::Default()),
|
2020-03-04 12:30:34 -08:00
        trig_no_space(false),
        trig_io_error(false) {}

  void SetTrigNoSpace() { trig_no_space = true; }
  void SetTrigIoError() { trig_io_error = true; }
2018-06-28 12:23:57 -07:00
|
|
|
|
2020-03-04 12:30:34 -08:00
 private:
  bool trig_no_space;
  bool trig_io_error;
2018-06-28 12:23:57 -07:00
|
|
|
};
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
class ErrorHandlerFSListener : public EventListener {
|
2018-09-15 13:36:19 -07:00
|
|
|
public:
|
2020-03-04 12:30:34 -08:00
|
|
|
ErrorHandlerFSListener()
|
2018-09-15 13:36:19 -07:00
      : mutex_(),
        cv_(&mutex_),
        no_auto_recovery_(false),
        recovery_complete_(false),
        file_creation_started_(false),
        override_bg_error_(false),
        file_count_(0),
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs_(nullptr) {}
|
2018-09-15 13:36:19 -07:00
|
|
|
|
2018-09-17 13:08:13 -07:00
  void OnTableFileCreationStarted(
      const TableFileCreationBriefInfo& /*ti*/) override {
2018-09-15 13:36:19 -07:00
    InstrumentedMutexLock l(&mutex_);
    file_creation_started_ = true;
    if (file_count_ > 0) {
      if (--file_count_ == 0) {
2020-03-04 12:30:34 -08:00
        fault_fs_->SetFilesystemActive(false, file_creation_error_);
        file_creation_error_ = IOStatus::OK();
2018-09-15 13:36:19 -07:00
      }
    }
    cv_.SignalAll();
  }

  void OnErrorRecoveryBegin(BackgroundErrorReason /*reason*/,
2020-03-04 12:30:34 -08:00
|
|
|
Status /*bg_error*/, bool* auto_recovery) override {
|
2018-09-15 13:36:19 -07:00
    if (*auto_recovery && no_auto_recovery_) {
      *auto_recovery = false;
    }
  }
2018-09-17 13:08:13 -07:00
|
|
|
void OnErrorRecoveryCompleted(Status /*old_bg_error*/) override {
|
2018-09-15 13:36:19 -07:00
    InstrumentedMutexLock l(&mutex_);
    recovery_complete_ = true;
    cv_.SignalAll();
  }

  bool WaitForRecovery(uint64_t /*abs_time_us*/) {
    InstrumentedMutexLock l(&mutex_);
    while (!recovery_complete_) {
      cv_.Wait(/*abs_time_us*/);
    }
    if (recovery_complete_) {
      recovery_complete_ = false;
      return true;
    }
    return false;
  }

  void WaitForTableFileCreationStarted(uint64_t /*abs_time_us*/) {
    InstrumentedMutexLock l(&mutex_);
    while (!file_creation_started_) {
      cv_.Wait(/*abs_time_us*/);
    }
    file_creation_started_ = false;
  }

  void OnBackgroundError(BackgroundErrorReason /*reason*/,
                         Status* bg_error) override {
    if (override_bg_error_) {
      *bg_error = bg_error_;
      override_bg_error_ = false;
    }
  }

  void EnableAutoRecovery(bool enable = true) { no_auto_recovery_ = !enable; }

  void OverrideBGError(Status bg_err) {
    bg_error_ = bg_err;
    override_bg_error_ = true;
  }
2020-03-04 12:30:34 -08:00
  void InjectFileCreationError(FaultInjectionTestFS* fs, int file_count,
                               IOStatus io_s) {
    fault_fs_ = fs;
2018-09-15 13:36:19 -07:00
|
|
|
file_count_ = file_count;
|
2020-03-04 12:30:34 -08:00
|
|
|
file_creation_error_ = io_s;
|
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
InstrumentedMutex mutex_;
|
|
|
|
InstrumentedCondVar cv_;
|
|
|
|
bool no_auto_recovery_;
|
|
|
|
bool recovery_complete_;
|
|
|
|
bool file_creation_started_;
|
|
|
|
bool override_bg_error_;
|
|
|
|
int file_count_;
|
2020-03-04 12:30:34 -08:00
|
|
|
IOStatus file_creation_error_;
|
2018-09-15 13:36:19 -07:00
|
|
|
Status bg_error_;
|
2020-03-04 12:30:34 -08:00
|
|
|
FaultInjectionTestFS* fault_fs_;
|
2018-09-15 13:36:19 -07:00
|
|
|
};
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, FLushWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2018-06-28 12:23:57 -07:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2018-06-28 12:23:57 -07:00
|
|
|
options.create_if_missing = true;
|
2018-09-15 13:36:19 -07:00
|
|
|
options.listeners.emplace_back(listener);
|
2018-06-28 12:23:57 -07:00
|
|
|
Status s;
|
2018-09-15 13:36:19 -07:00
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
2018-06-28 12:23:57 -07:00
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
2018-09-15 13:36:19 -07:00
|
|
|
Put(Key(0), "val");
|
2020-03-04 12:30:34 -08:00
|
|
|
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
|
|
|
|
fault_fs->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
|
2018-06-28 12:23:57 -07:00
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
2020-02-20 12:07:53 -08:00
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
2018-09-15 13:36:19 -07:00
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2018-06-28 12:23:57 -07:00
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
2018-09-15 13:36:19 -07:00
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
2018-06-28 12:23:57 -07:00
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487)
Summary:
In the current code base, we use Status to get and store the returned status from the call. Specifically, for IO-related functions, the current Status cannot reflect IO error details such as the error scope, the retryable attribute, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is purged at the lower level of the write path and converted to Status.
The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and compaction). The second job is to identify a retryable IO error as a HardError and set bg_error_ to HardError. In this case, the DB instance becomes read-only. The user is informed of the Status and needs to take action to deal with it (e.g., call db->Resume()).
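The severity rule described above can be sketched in isolation. This is an illustrative model only: IoError, BgReason, ClassifyBackgroundError and IsReadOnly are hypothetical names, not the rocksdb::IOStatus or ErrorHandler API, and the fallback branch is a placeholder assumption rather than documented behavior.

```cpp
#include <string>

// Hypothetical severity levels, ordered from least to most severe.
enum class Severity { kNoError, kSoftError, kHardError, kFatalError };

// Hypothetical background job kinds.
enum class BgReason { kFlush, kCompaction, kManifestWrite };

// Hypothetical stand-in for an IO status carrying a retryable attribute.
struct IoError {
  std::string message;
  bool retryable = false;
  bool no_space = false;
};

// Maps a background IO error to a severity:
//  - a retryable IO error is a hard error (DB becomes read-only);
//  - an out-of-space error is soft for compaction and hard otherwise;
//  - anything else falls through to kFatalError, which is an arbitrary
//    placeholder chosen for this sketch.
Severity ClassifyBackgroundError(const IoError& err, BgReason reason) {
  if (err.retryable) return Severity::kHardError;
  if (err.no_space) {
    return reason == BgReason::kCompaction ? Severity::kSoftError
                                           : Severity::kHardError;
  }
  return Severity::kFatalError;
}

// In this sketch the DB stops accepting writes at kHardError or above.
bool IsReadOnly(Severity s) { return s >= Severity::kHardError; }
```

A caller would treat a hard-error result as the point where the user has to step in, for example by calling db->Resume() once the underlying fault is cleared.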
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487
Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check
Reviewed By: anand1976
Differential Revision: D20685017
Pulled By: zhichao-cao
fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
2020-03-27 16:03:05 -07:00
|
|
|
TEST_F(DBErrorHandlingFSTest, FLushWritRetryableeError) {
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
2020-03-28 19:05:54 -07:00
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
2020-03-27 16:03:05 -07:00
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.listeners.emplace_back(listener);
|
2020-07-15 11:02:44 -07:00
|
|
|
options.max_bgerror_resume_count = 0;
|
2020-03-27 16:03:05 -07:00
|
|
|
Status s;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
Put(Key(1), "val1");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BuildTable:BeforeFinishBuildTable",
|
|
|
|
[&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val1", Get(Key(1)));
|
|
|
|
|
|
|
|
Put(Key(2), "val2");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BuildTable:BeforeSyncTable",
|
|
|
|
[&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val2", Get(Key(2)));
|
|
|
|
|
|
|
|
Put(Key(3), "val3");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BuildTable:BeforeCloseTableFile",
|
|
|
|
[&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val3", Get(Key(3)));
|
|
|
|
|
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, ManifestWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2020-01-30 10:53:46 -08:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2020-01-30 10:53:46 -08:00
|
|
|
options.create_if_missing = true;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Flush();
|
|
|
|
Put(Key(1), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
2020-03-04 12:30:34 -08:00
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
|
|
|
|
fault_fs->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
|
|
|
|
});
|
2020-01-30 10:53:46 -08:00
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
2020-02-20 12:07:53 -08:00
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
2020-01-30 10:53:46 -08:00
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2020-01-30 10:53:46 -08:00
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
2020-03-27 16:03:05 -07:00
|
|
|
TEST_F(DBErrorHandlingFSTest, ManifestWriteRetryableError) {
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
2020-03-28 19:05:54 -07:00
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
2020-03-27 16:03:05 -07:00
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.listeners.emplace_back(listener);
|
2020-07-15 11:02:44 -07:00
|
|
|
options.max_bgerror_resume_count = 0;
|
2020-03-27 16:03:05 -07:00
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Flush();
|
|
|
|
Put(Key(1), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"VersionSet::LogAndApply:WriteManifest",
|
|
|
|
[&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, DoubleManifestWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2020-01-30 10:53:46 -08:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2020-01-30 10:53:46 -08:00
|
|
|
options.create_if_missing = true;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Flush();
|
|
|
|
Put(Key(1), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
2020-03-04 12:30:34 -08:00
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
|
|
|
|
fault_fs->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
|
|
|
|
});
|
2020-01-30 10:53:46 -08:00
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
2020-02-20 12:07:53 -08:00
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2020-01-30 10:53:46 -08:00
|
|
|
|
|
|
|
// This Resume() will attempt to create a new manifest file and fail again
|
|
|
|
s = dbfull()->Resume();
|
2020-02-20 12:07:53 -08:00
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2020-01-30 10:53:46 -08:00
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
|
|
|
|
// A successful Resume() will create a new manifest file
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, CompactionManifestWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2020-01-30 10:53:46 -08:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2020-01-30 10:53:46 -08:00
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
|
|
|
std::atomic<bool> fail_manifest(false);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Put(Key(2), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
2020-02-20 12:07:53 -08:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
|
2020-01-30 10:53:46 -08:00
|
|
|
// Wait for flush of 2nd L0 file before starting compaction
|
|
|
|
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
|
|
|
"BackgroundCallCompaction:0"},
|
2020-02-20 12:07:53 -08:00
|
|
|
// Wait for compaction to detect manifest write error
|
|
|
|
{"BackgroundCallCompaction:1", "CompactionManifestWriteError:0"},
|
|
|
|
// Make compaction thread wait for error to be cleared
|
2020-01-30 10:53:46 -08:00
|
|
|
{"CompactionManifestWriteError:1",
|
|
|
|
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
|
2020-02-20 12:07:53 -08:00
|
|
|
// Wait for DB instance to clear bg_error before calling
|
|
|
|
// TEST_WaitForCompact
|
|
|
|
{"SstFileManagerImpl::ErrorCleared", "CompactionManifestWriteError:2"}});
|
2020-01-30 10:53:46 -08:00
|
|
|
// Trigger a manifest write failure in the compaction thread
|
2020-02-20 12:07:53 -08:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BackgroundCallCompaction:0", [&](void*) { fail_manifest.store(true); });
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
|
|
|
|
if (fail_manifest.load()) {
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(false,
|
|
|
|
IOStatus::NoSpace("Out of space"));
|
2020-02-20 12:07:53 -08:00
|
|
|
}
|
2020-01-30 10:53:46 -08:00
|
|
|
});
|
2020-02-20 12:07:53 -08:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
|
2020-01-30 10:53:46 -08:00
|
|
|
|
|
|
|
Put(Key(1), "val");
|
|
|
|
// This Flush will trigger a compaction, which will fail when appending to
|
|
|
|
// the manifest
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:0");
|
|
|
|
// Clear all errors so when the compaction is retried, it will succeed
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2020-02-20 12:07:53 -08:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
|
2020-01-30 10:53:46 -08:00
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:1");
|
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:2");
|
|
|
|
|
|
|
|
s = dbfull()->TEST_WaitForCompact();
|
2020-02-20 12:07:53 -08:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
|
2020-01-30 10:53:46 -08:00
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
ASSERT_EQ("val", Get(Key(2)));
|
|
|
|
Close();
|
|
|
|
}
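The LoadDependency call in the test above takes pairs of sync point names. As the test uses it, the second point of each pair waits until the first has been reached. The sketch below illustrates only that ordering idea. ToySyncPoint, RunOrderingDemo and the point names "Flush:Finished" and "Compaction:Start" are hypothetical and are not RocksDB's SyncPoint implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical, minimal ordering helper: for a pair {A, B}, Process("B")
// blocks until Process("A") has been called.
class ToySyncPoint {
 public:
  using Dependency = std::pair<std::string, std::string>;

  void LoadDependency(const std::vector<Dependency>& deps) {
    std::lock_guard<std::mutex> lock(mu_);
    predecessors_.clear();
    reached_.clear();
    for (const auto& d : deps) predecessors_[d.second].push_back(d.first);
  }

  // Wait for all predecessors of `point`, then mark `point` as reached.
  void Process(const std::string& point) {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [&] { return PredecessorsReached(point); });
    reached_.insert(point);
    cv_.notify_all();
  }

 private:
  bool PredecessorsReached(const std::string& point) const {
    auto it = predecessors_.find(point);
    if (it == predecessors_.end()) return true;
    for (const auto& p : it->second) {
      if (reached_.count(p) == 0) return false;
    }
    return true;
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, std::vector<std::string>> predecessors_;
  std::set<std::string> reached_;
};

// A "compaction" thread is held back until the "flush" point has been hit.
inline std::vector<std::string> RunOrderingDemo() {
  ToySyncPoint sp;
  sp.LoadDependency({{"Flush:Finished", "Compaction:Start"}});

  std::vector<std::string> order;
  std::mutex order_mu;
  std::thread compaction([&] {
    sp.Process("Compaction:Start");  // blocks until the flush point is hit
    std::lock_guard<std::mutex> lock(order_mu);
    order.push_back("compaction");
  });

  {
    std::lock_guard<std::mutex> lock(order_mu);
    order.push_back("flush");
  }
  sp.Process("Flush:Finished");  // releases the waiting thread
  compaction.join();
  assert(order.size() == 2);
  return order;
}
```

In this sketch "flush" is always recorded before "compaction", because the worker cannot pass its point until the main thread has processed the flush point. The test relies on the same kind of ordering to hold the compaction thread until the injected error has been observed and cleared.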
|
|
|
|
|
2020-03-27 16:03:05 -07:00
|
|
|
TEST_F(DBErrorHandlingFSTest, CompactionManifestWriteRetryableError) {
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
2020-03-28 19:05:54 -07:00
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
2020-03-27 16:03:05 -07:00
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
|
|
|
options.listeners.emplace_back(listener);
|
2020-07-15 11:02:44 -07:00
|
|
|
options.max_bgerror_resume_count = 0;
|
2020-03-27 16:03:05 -07:00
|
|
|
Status s;
|
|
|
|
std::string old_manifest;
|
|
|
|
std::string new_manifest;
|
2020-04-02 18:06:26 -07:00
|
|
|
std::atomic<bool> fail_manifest(false);
|
2020-03-27 16:03:05 -07:00
|
|
|
DestroyAndReopen(options);
|
|
|
|
old_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Put(Key(2), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
|
|
|
|
listener->EnableAutoRecovery(false);
|
2020-04-02 18:06:26 -07:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
|
|
|
|
// Wait for flush of 2nd L0 file before starting compaction
|
|
|
|
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
|
|
|
"BackgroundCallCompaction:0"},
|
|
|
|
// Wait for compaction to detect manifest write error
|
|
|
|
{"BackgroundCallCompaction:1", "CompactionManifestWriteError:0"},
|
|
|
|
// Make compaction thread wait for error to be cleared
|
|
|
|
{"CompactionManifestWriteError:1",
|
|
|
|
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"}});
|
|
|
|
// Trigger a manifest write failure in the compaction thread
|
2020-03-27 16:03:05 -07:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
2020-04-02 18:06:26 -07:00
|
|
|
"BackgroundCallCompaction:0", [&](void*) { fail_manifest.store(true); });
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
|
|
|
|
if (fail_manifest.load()) {
|
Revamp cache_bench to resemble a real workload (#6629)
Summary:
I suspect LRUCache could use some optimization, and to support
such an effort, a good benchmarking tool is needed. The existing
cache_bench was heavily skewed toward insertion and lookup misses, and
did not saturate memory with other work. This change should improve
those things to better resemble a real workload.
(All below using clang compiler, for some consistency, but not
necessarily same version and settings.)
The real workload is from production MySQL on RocksDB, filtering stacks
containing "LRU", "ShardedCache" or "CacheShard."
Lookup inclusive: 66%
Insert inclusive: 17%
Release inclusive: 15%
An alternate simulated workload is MySQL running a LinkBench read test:
Lookup inclusive: 54%
Insert inclusive: 24%
Release inclusive: 21%
cache_bench default settings, prior to this change:
Lookup inclusive: 35.8%
Insert inclusive: 63.6%
Release inclusive: 0%
cache_bench after this change (intended as somewhat "tighter" workload
than average production, more like LinkBench):
Lookup inclusive: 52%
Insert inclusive: 20%
Release inclusive: 26%
And top exclusive stacks (portion of stack samples as filtered above):
Production MySQL:
LRUHandleTable::FindPointer: 25.3%
rocksdb::operator==: 15.1% <-- Slice ==
LRUCacheShard::LRU_Remove: 13.8%
ShardedCache::Lookup: 8.9%
__pthread_mutex_lock: 7.1%
LRUCacheShard::LRU_Insert: 6.3%
MurmurHash64A: 4.8% <-- Since upgraded to XXH3p
...
Old cache_bench:
LRUHandleTable::FindPointer: 23.6%
__pthread_mutex_lock: 15.0%
__pthread_mutex_unlock_usercnt: 11.7%
__lll_lock_wait: 8.6%
__lll_unlock_wake: 6.8%
LRUCacheShard::LRU_Insert: 6.0%
ShardedCache::Lookup: 4.4%
LRUCacheShard::LRU_Remove: 2.8%
...
rocksdb::operator==: 0.2% <-- Slice ==
...
New cache_bench:
LRUHandleTable::FindPointer: 22.8%
__pthread_mutex_unlock_usercnt: 14.3%
rocksdb::operator==: 10.5% <-- Slice ==
LRUCacheShard::LRU_Insert: 9.0%
__pthread_mutex_lock: 5.9%
LRUCacheShard::LRU_Remove: 5.0%
...
ShardedCache::Lookup: 2.9%
...
So there's a bit more lock contention in the benchmark than in
production, but otherwise looks similar enough to me. At least it's a
big improvement over the existing code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6629
Test Plan: No production code changes, ran cache_bench with ASAN
Reviewed By: ltamasi
Differential Revision: D20824318
Pulled By: pdillinger
fbshipit-source-id: 6f8dc5891ead0f87edbed3a615ecd5289d9abe12
2020-04-03 10:24:09 -07:00
|
|
|
fault_fs->SetFilesystemActive(false, error_msg);
|
|
|
|
}
|
|
|
|
});
|
2020-03-27 16:03:05 -07:00
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
|
|
|
|
Put(Key(1), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
2020-04-02 18:06:26 -07:00
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:0");
|
|
|
|
TEST_SYNC_POINT("CompactionManifestWriteError:1");
|
|
|
|
|
2020-03-27 16:03:05 -07:00
|
|
|
s = dbfull()->TEST_WaitForCompact();
|
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
|
|
|
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
new_manifest = GetManifestNameFromLiveFiles();
|
|
|
|
ASSERT_NE(new_manifest, old_manifest);
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
ASSERT_EQ("val", Get(Key(0)));
|
|
|
|
ASSERT_EQ("val", Get(Key(1)));
|
|
|
|
ASSERT_EQ("val", Get(Key(2)));
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, CompactionWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2018-06-28 12:23:57 -07:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2018-06-28 12:23:57 -07:00
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
2018-09-15 13:36:19 -07:00
|
|
|
options.listeners.emplace_back(listener);
|
2018-06-28 12:23:57 -07:00
|
|
|
Status s;
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
Put(Key(0), "va;");
|
|
|
|
Put(Key(2), "va;");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
2018-09-15 13:36:19 -07:00
|
|
|
listener->OverrideBGError(
|
2020-03-04 12:30:34 -08:00
|
|
|
Status(Status::NoSpace(), Status::Severity::kHardError));
|
2018-09-15 13:36:19 -07:00
  listener->EnableAutoRecovery(false);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"DBImpl::FlushMemTable:FlushMemTableFinished",
        "BackgroundCallCompaction:0"}});
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
      "BackgroundCallCompaction:0", [&](void*) {
        fault_fs->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
      });
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();

  Put(Key(1), "val");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  s = dbfull()->TEST_WaitForCompact();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);

  fault_fs->SetFilesystemActive(true);
  s = dbfull()->Resume();
  ASSERT_EQ(s, Status::OK());
  Destroy(options);
}

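The test above depends on the fault-injecting file system returning a chosen error once it has been deactivated. A toy model of that idea is sketched below. `ToyStatus` and `ToyFaultFS` are hypothetical types that only mimic the `SetFilesystemActive` call seen in the test; they are not the real FaultInjectionTestFS.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for an IO status: a success flag plus a message.
struct ToyStatus {
  bool ok;
  std::string message;
  static ToyStatus OK() { return {true, ""}; }
  static ToyStatus NoSpace(std::string msg) { return {false, std::move(msg)}; }
};

// Toy fault-injecting wrapper: while inactive, every write returns the
// configured error and nothing is recorded as written.
class ToyFaultFS {
 public:
  void SetFilesystemActive(
      bool active, ToyStatus error = ToyStatus{false, "filesystem inactive"}) {
    active_ = active;
    error_ = std::move(error);
  }

  ToyStatus Write(const std::string& data) {
    if (!active_) return error_;  // Injected failure.
    bytes_written_ += data.size();
    return ToyStatus::OK();
  }

  std::size_t bytes_written() const { return bytes_written_; }

 private:
  bool active_ = true;
  ToyStatus error_ = ToyStatus::OK();
  std::size_t bytes_written_ = 0;
};
```

Reactivating the wrapper with `SetFilesystemActive(true)` lets writes succeed again, which is the state the test restores before calling `Resume()`.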
Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487)
Summary:
In the current code base, we use Status to get and store the status returned from a call. For IO-related functions, however, Status cannot reflect IO error details such as the error scope and the retryable attribute. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have a new wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is discarded at the lower level of the write path and converted to Status.
The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and compaction). The second job is to identify a retryable IO error as a hard error and set bg_error_ to HardError. In this case, the DB instance becomes read-only. The user is informed of the Status and needs to take action to deal with it (e.g., call db->Resume()).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487
Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check
Reviewed By: anand1976
Differential Revision: D20685017
Pulled By: zhichao-cao
fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
2020-03-27 16:03:05 -07:00
TEST_F(DBErrorHandlingFSTest, CompactionWriteRetryableError) {
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.level0_file_num_compaction_trigger = 2;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 0;
  Status s;
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);

  Put(Key(0), "va;");
  Put(Key(2), "va;");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
  listener->EnableAutoRecovery(false);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"DBImpl::FlushMemTable:FlushMemTableFinished",
        "BackgroundCallCompaction:0"}});
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
      "CompactionJob::OpenCompactionOutputFile",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();

  Put(Key(1), "val");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  s = dbfull()->TEST_WaitForCompact();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);

  fault_fs->SetFilesystemActive(true);
  SyncPoint::GetInstance()->ClearAllCallBacks();
  SyncPoint::GetInstance()->DisableProcessing();
  s = dbfull()->Resume();
  ASSERT_EQ(s, Status::OK());
  Destroy(options);
}

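The tests use named sync points to run a callback at a specific place in the background work. The callback part of that mechanism can be modelled with the sketch below. `ToySyncPoint` is a hypothetical, single-threaded stand-in that borrows the method names seen in the tests; it does not implement the ordering set up by `LoadDependency`, and it is not the RocksDB SyncPoint class.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Toy registry of callbacks keyed by sync point name.
class ToySyncPoint {
 public:
  void SetCallBack(const std::string& point, std::function<void(void*)> cb) {
    callbacks_[point] = std::move(cb);
  }
  void ClearAllCallBacks() { callbacks_.clear(); }
  void EnableProcessing() { enabled_ = true; }
  void DisableProcessing() { enabled_ = false; }

  // Called by instrumented code when execution reaches `point`.
  // Does nothing unless processing is enabled and a callback is registered.
  void Process(const std::string& point, void* arg = nullptr) {
    if (!enabled_) return;
    auto it = callbacks_.find(point);
    if (it != callbacks_.end()) it->second(arg);
  }

 private:
  bool enabled_ = false;
  std::map<std::string, std::function<void(void*)>> callbacks_;
};
```

In this model a callback registered before `EnableProcessing()` stays dormant until processing is switched on, which matches the order the tests use: register the callback, enable processing, then trigger the flush.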
TEST_F(DBErrorHandlingFSTest, CorruptionError) {
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.level0_file_num_compaction_trigger = 2;
  Status s;
  DestroyAndReopen(options);

  Put(Key(0), "va;");
  Put(Key(2), "va;");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"DBImpl::FlushMemTable:FlushMemTableFinished",
        "BackgroundCallCompaction:0"}});
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
      "BackgroundCallCompaction:0", [&](void*) {
        fault_fs->SetFilesystemActive(false,
                                      IOStatus::Corruption("Corruption"));
      });
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();

  Put(Key(1), "val");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  s = dbfull()->TEST_WaitForCompact();
  ASSERT_EQ(s.severity(),
            ROCKSDB_NAMESPACE::Status::Severity::kUnrecoverableError);

  fault_fs->SetFilesystemActive(true);
  s = dbfull()->Resume();
  ASSERT_NE(s, Status::OK());
  Destroy(options);
}

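The tests above assert three outcomes: a hard error can be cleared with `Resume()`, an unrecoverable error cannot, and soft errors leave the database writable. A toy state machine capturing just those outcomes is sketched below. `ToySeverity` and `ToyDB` are hypothetical, the enum lists only the severities that appear in these tests, and nothing here is the actual ErrorHandler logic.

```cpp
#include <cassert>

// Hypothetical severity levels, ordered from least to most severe.
enum class ToySeverity {
  kNoError,
  kSoftError,
  kHardError,
  kUnrecoverableError
};

// Toy database that tracks only its background error severity.
class ToyDB {
 public:
  // Record a background error; keep the most severe one seen so far.
  void SetBackgroundError(ToySeverity s) {
    if (s > bg_error_) bg_error_ = s;
  }

  // Writes are accepted with no error or a soft error, and rejected once
  // the error is hard or worse (the "read-only mode" of the summary).
  bool Put() const { return bg_error_ < ToySeverity::kHardError; }

  // Manual recovery: clears soft and hard errors, fails for unrecoverable
  // ones and leaves the error in place.
  bool Resume() {
    if (bg_error_ == ToySeverity::kUnrecoverableError) return false;
    bg_error_ = ToySeverity::kNoError;
    return true;
  }

  ToySeverity severity() const { return bg_error_; }

 private:
  ToySeverity bg_error_ = ToySeverity::kNoError;
};
```

Keeping the severities in one ordered enum makes the read-only check a single comparison, at the cost of hiding the reason for the error, which the real tests inspect separately through the listener.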
TEST_F(DBErrorHandlingFSTest, AutoRecoverFlushError) {
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  Status s;

  listener->EnableAutoRecovery();
  DestroyAndReopen(options);

  Put(Key(0), "val");
  SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
    fault_fs->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
  });
  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  SyncPoint::GetInstance()->DisableProcessing();
  fault_fs->SetFilesystemActive(true);
  ASSERT_EQ(listener->WaitForRecovery(5000000), true);

  s = Put(Key(1), "val");
  ASSERT_EQ(s, Status::OK());

  Reopen(options);
  ASSERT_EQ("val", Get(Key(0)));
  ASSERT_EQ("val", Get(Key(1)));
  Destroy(options);
}

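The auto-recovery test blocks on `listener->WaitForRecovery(5000000)` until the listener is told that recovery has finished. One plausible way to build such a wait is a flag guarded by a mutex and a condition variable, sketched below. `ToyRecoveryListener` is hypothetical, and reading the argument as a timeout in microseconds is an assumption; the actual ErrorHandlerFSListener is not shown here.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Toy listener that lets a test wait for a "recovery completed" event.
class ToyRecoveryListener {
 public:
  // Would be invoked from the recovery thread once recovery has finished.
  void OnRecoveryCompleted() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      recovered_ = true;
    }
    cv_.notify_all();
  }

  // Returns true if recovery completed before the timeout expired.
  // Assumption: the timeout is expressed in microseconds.
  bool WaitForRecovery(uint64_t timeout_micros) {
    std::unique_lock<std::mutex> lock(mu_);
    return cv_.wait_for(lock, std::chrono::microseconds(timeout_micros),
                        [this] { return recovered_; });
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool recovered_ = false;
};
```

Using the predicate form of `wait_for` means a completion signalled before the wait starts is still observed, so the test does not depend on the order in which the two threads run.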
TEST_F(DBErrorHandlingFSTest, FailRecoverFlushError) {
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
|
|
|
options.create_if_missing = true;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
|
2018-09-15 13:36:19 -07:00
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
s = Flush();
|
2020-02-20 12:07:53 -08:00
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
2018-09-15 13:36:19 -07:00
|
|
|
// We should be able to shut down the database while auto recovery is going
|
|
|
|
// on in the background
|
|
|
|
Close();
|
|
|
|
DestroyDB(dbname_, options);
|
|
|
|
}
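The test above enables auto recovery on its listener and then closes the database while recovery may still be running. The opposite choice, where the start callback declines auto recovery and the user resumes manually, can be sketched with a standalone model. RecoveryListener and ModelDb are hypothetical stand-ins for illustration, not the rocksdb::EventListener or rocksdb::DB interfaces, and the real callback signatures differ.

```cpp
// Hypothetical stand-ins; not the rocksdb::EventListener or rocksdb::DB API.
struct RecoveryListener {
  bool allow_auto_recovery = true;
  int begin_calls = 0;
  int completed_calls = 0;

  // Start callback: the listener may veto automatic recovery.
  void OnRecoveryBegin(bool* auto_recovery) {
    ++begin_calls;
    *auto_recovery = allow_auto_recovery;
  }
  // Completion callback.
  void OnRecoveryCompleted() { ++completed_calls; }
};

class ModelDb {
 public:
  explicit ModelDb(RecoveryListener* listener) : listener_(listener) {}

  // A hard error puts the modelled database in read-only mode.
  void OnHardError() { read_only_ = true; }

  // Modelled background recovery attempt. Returns false when the listener
  // has disabled auto recovery, leaving the database read-only.
  bool TryAutoRecover() {
    if (!read_only_) return true;
    bool auto_recovery = true;
    listener_->OnRecoveryBegin(&auto_recovery);
    if (!auto_recovery) return false;
    FinishRecovery();
    return true;
  }

  // Manual recovery, analogous in spirit to calling DB::Resume().
  void Resume() {
    if (read_only_) FinishRecovery();
  }

  bool read_only() const { return read_only_; }

 private:
  void FinishRecovery() {
    read_only_ = false;
    listener_->OnRecoveryCompleted();
  }

  RecoveryListener* listener_;
  bool read_only_ = false;
};
```

In this model a listener with allow_auto_recovery set to false keeps the database read-only until Resume() is called, which matches the manual path described in the commit summary.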
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, WALWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2018-09-15 13:36:19 -07:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2018-09-15 13:36:19 -07:00
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
2018-09-17 13:08:13 -07:00
|
|
|
Random rnd(301);
|
2018-09-15 13:36:19 -07:00
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
for (auto i = 0; i < 100; ++i) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
for (auto i = 100; i < 199; ++i) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
|
|
|
fault_fs->SetFilesystemActive(false,
|
|
|
|
IOStatus::NoSpace("Out of space"));
|
|
|
|
}
|
|
|
|
});
|
2018-09-15 13:36:19 -07:00
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_EQ(s, Status::NoSpace());
|
|
|
|
}
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2018-09-15 13:36:19 -07:00
|
|
|
ASSERT_TRUE(listener->WaitForRecovery(5000000));
|
2020-03-04 12:30:34 -08:00
|
|
|
for (auto i = 0; i < 199; ++i) {
|
2018-09-15 13:36:19 -07:00
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Reopen(options);
|
2020-03-04 12:30:34 -08:00
|
|
|
for (auto i = 0; i < 199; ++i) {
|
2018-09-15 13:36:19 -07:00
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487)
Summary:
In the current code base, Status is used to capture and store the result of each call. For IO-related functions, however, Status cannot carry IO error details such as the error scope or whether the error is retryable. With the implementation of https://github.com/facebook/rocksdb/issues/5761, there is a new wrapper for IO that returns IOStatus instead of Status. However, the IOStatus is discarded at the lower levels of the write path and converted to Status.
The first job of this PR is to pass the IOStatus up through the write path (flush, WAL write, and compaction). The second job is to classify a retryable IO error as a HardError and set bg_error_ to HardError. In this case, the DB instance becomes read-only. The user is informed of the Status and needs to take action to deal with it (e.g., call db->Resume()).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487
Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check
Reviewed By: anand1976
Differential Revision: D20685017
Pulled By: zhichao-cao
fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
2020-03-27 16:03:05 -07:00
|
|
|
TEST_F(DBErrorHandlingFSTest, WALWriteRetryableError) {
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
2020-03-28 19:05:54 -07:00
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
2020-03-27 16:03:05 -07:00
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
options.paranoid_checks = true;
|
2020-07-15 11:02:44 -07:00
|
|
|
options.max_bgerror_resume_count = 0;
|
2020-03-27 16:03:05 -07:00
|
|
|
Status s;
|
|
|
|
Random rnd(301);
|
|
|
|
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
// The first batch is written successfully with sync enabled.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 0; i < 100; ++i) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
2020-03-27 16:03:05 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
};
|
|
|
|
|
|
|
|
// For the second batch, the first two file Appends succeed, then the
|
|
|
|
// following Append fails due to file system retryable IOError.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
|
|
|
for (auto i = 100; i < 200; ++i) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
2020-03-27 16:03:05 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
|
|
|
fault_fs->SetFilesystemActive(false, error_msg);
|
|
|
|
}
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_TRUE(s.IsIOError());
|
|
|
|
}
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
|
|
|
|
// Data in corrupted WAL are not stored
|
|
|
|
for (auto i = 0; i < 199; ++i) {
|
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// Resume and write a new batch; it should be recorded in the WAL
|
|
|
|
s = dbfull()->Resume();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 200; i < 300; ++i) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
2020-03-27 16:03:05 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
};
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
for (auto i = 0; i < 300; ++i) {
|
|
|
|
if (i < 100 || i >= 200) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Close();
|
|
|
|
}
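The test above injects a fault by counting Append calls in a sync-point callback and deactivating the file system once the count exceeds two. A minimal sketch of that counting pattern, without any RocksDB dependency, might look like the following. `FakeFs` and `FailAfter` are hypothetical names introduced only for illustration; they are not part of FaultInjectionTestFS or SyncPoint.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a fault-injecting file system.
class FakeFs {
 public:
  void SetActive(bool active) { active_ = active; }

  // Installs (or clears, when given nullptr) a hook that runs before each Append.
  void SetBeforeAppendHook(std::function<void()> hook) {
    before_append_ = std::move(hook);
  }

  // Runs the hook, then fails if the file system has been deactivated.
  bool Append(const std::string& data) {
    if (before_append_) before_append_();
    if (!active_) return false;
    bytes_ += data.size();
    return true;
  }

  std::size_t bytes() const { return bytes_; }

 private:
  bool active_ = true;
  std::size_t bytes_ = 0;
  std::function<void()> before_append_;
};

// Arms the hook so the first `allowed` appends succeed and later ones fail.
// `counter` must outlive the hook because it is captured by reference.
inline void FailAfter(FakeFs& fs, int allowed, int& counter) {
  fs.SetBeforeAppendHook([&fs, &counter, allowed] {
    if (++counter > allowed) fs.SetActive(false);
  });
}
```

As in the test, the hook has to be cleared and the file system reactivated before later writes can succeed again.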
|
|
|
|
|
2020-03-04 12:30:34 -08:00
|
|
|
TEST_F(DBErrorHandlingFSTest, MultiCFWALWriteError) {
|
2020-03-23 21:50:42 -07:00
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
2020-03-04 12:30:34 -08:00
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
2018-09-15 13:36:19 -07:00
|
|
|
Options options = GetDefaultOptions();
|
2020-03-23 21:50:42 -07:00
|
|
|
options.env = fault_fs_env.get();
|
2018-09-15 13:36:19 -07:00
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
2018-09-17 13:08:13 -07:00
|
|
|
Random rnd(301);
|
2018-09-15 13:36:19 -07:00
|
|
|
|
|
|
|
listener->EnableAutoRecovery();
|
|
|
|
CreateAndReopenWithCF({"one", "two", "three"}, options);
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 1; i < 4; ++i) {
|
|
|
|
for (auto j = 0; j < 100; ++j) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(handles_[i], Key(j), rnd.RandomString(1024));
|
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
};
|
|
|
|
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
|
|
|
// Write to one CF
|
|
|
|
for (auto i = 100; i < 199; ++i) {
|
2020-07-09 14:33:42 -07:00
|
|
|
batch.Put(handles_[2], Key(i), rnd.RandomString(1024));
|
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(false,
|
|
|
|
IOStatus::NoSpace("Out of space"));
|
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_EQ(s, Status::NoSpace());
|
|
|
|
}
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs->SetFilesystemActive(true);
|
2018-09-15 13:36:19 -07:00
|
|
|
  ASSERT_EQ(listener->WaitForRecovery(5000000), true);

  for (auto i = 1; i < 4; ++i) {
    // Every CF should have been flushed
    ASSERT_EQ(NumTableFilesAtLevel(0, i), 1);
  }

  for (auto i = 1; i < 4; ++i) {
    for (auto j = 0; j < 199; ++j) {
      if (j < 100) {
        ASSERT_NE(Get(i, Key(j)), "NOT_FOUND");
      } else {
        ASSERT_EQ(Get(i, Key(j)), "NOT_FOUND");
      }
    }
  }
  ReopenWithColumnFamilies({"default", "one", "two", "three"}, options);
  for (auto i = 1; i < 4; ++i) {
    for (auto j = 0; j < 199; ++j) {
      if (j < 100) {
        ASSERT_NE(Get(i, Key(j)), "NOT_FOUND");
      } else {
        ASSERT_EQ(Get(i, Key(j)), "NOT_FOUND");
      }
    }
  }
  Close();
}

TEST_F(DBErrorHandlingFSTest, MultiDBCompactionError) {
  FaultInjectionTestEnv* def_env = new FaultInjectionTestEnv(Env::Default());
  std::vector<std::unique_ptr<Env>> fault_envs;
  std::vector<FaultInjectionTestFS*> fault_fs;
  std::vector<Options> options;
  std::vector<std::shared_ptr<ErrorHandlerFSListener>> listener;
  std::vector<DB*> db;
  std::shared_ptr<SstFileManager> sfm(NewSstFileManager(def_env));
  int kNumDbInstances = 3;
  Random rnd(301);

  for (auto i = 0; i < kNumDbInstances; ++i) {
    listener.emplace_back(new ErrorHandlerFSListener());
    options.emplace_back(GetDefaultOptions());
    fault_fs.emplace_back(new FaultInjectionTestFS(FileSystem::Default()));
    std::shared_ptr<FileSystem> fs(fault_fs.back());
    fault_envs.emplace_back(new CompositeEnvWrapper(def_env, fs));
    options[i].env = fault_envs.back().get();
    options[i].create_if_missing = true;
    options[i].level0_file_num_compaction_trigger = 2;
    options[i].writable_file_max_buffer_size = 32768;
    options[i].listeners.emplace_back(listener[i]);
    options[i].sst_file_manager = sfm;
    DB* dbptr;
    char buf[16];

    listener[i]->EnableAutoRecovery();
    // Setup for returning error for the 3rd SST, which would be level 1
    listener[i]->InjectFileCreationError(fault_fs[i], 3,
                                         IOStatus::NoSpace("Out of space"));
    snprintf(buf, sizeof(buf), "_%d", i);
    DestroyDB(dbname_ + std::string(buf), options[i]);
    ASSERT_EQ(DB::Open(options[i], dbname_ + std::string(buf), &dbptr),
              Status::OK());
    db.emplace_back(dbptr);
  }

  for (auto i = 0; i < kNumDbInstances; ++i) {
    WriteBatch batch;

    for (auto j = 0; j <= 100; ++j) {
      batch.Put(Key(j), rnd.RandomString(1024));
    }

    WriteOptions wopts;
    wopts.sync = true;
    ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
    ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
  }

  def_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
  for (auto i = 0; i < kNumDbInstances; ++i) {
    WriteBatch batch;

    // Write to one CF
    for (auto j = 100; j < 199; ++j) {
      batch.Put(Key(j), rnd.RandomString(1024));
    }

    WriteOptions wopts;
    wopts.sync = true;
    ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
    ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
  }

  for (auto i = 0; i < kNumDbInstances; ++i) {
    Status s = static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true);
    ASSERT_EQ(s.severity(), Status::Severity::kSoftError);
    fault_fs[i]->SetFilesystemActive(true);
  }

  def_env->SetFilesystemActive(true);
  for (auto i = 0; i < kNumDbInstances; ++i) {
    std::string prop;
    ASSERT_EQ(listener[i]->WaitForRecovery(5000000), true);
    ASSERT_EQ(static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true),
              Status::OK());
2018-09-15 13:36:19 -07:00
    EXPECT_TRUE(db[i]->GetProperty(
        "rocksdb.num-files-at-level" + NumberToString(0), &prop));
    EXPECT_EQ(atoi(prop.c_str()), 0);
    EXPECT_TRUE(db[i]->GetProperty(
        "rocksdb.num-files-at-level" + NumberToString(1), &prop));
    EXPECT_EQ(atoi(prop.c_str()), 1);
  }

  for (auto i = 0; i < kNumDbInstances; ++i) {
    char buf[16];
    snprintf(buf, sizeof(buf), "_%d", i);
    delete db[i];
2020-03-04 12:30:34 -08:00
    fault_fs[i]->SetFilesystemActive(true);
2018-09-15 13:36:19 -07:00
    if (getenv("KEEP_DB")) {
      printf("DB is still at %s%s\n", dbname_.c_str(), buf);
    } else {
      Status s = DestroyDB(dbname_ + std::string(buf), options[i]);
    }
  }
  options.clear();
  sfm.reset();
  delete def_env;
}

2020-03-04 12:30:34 -08:00
TEST_F(DBErrorHandlingFSTest, MultiDBVariousErrors) {
2018-09-15 13:36:19 -07:00
  FaultInjectionTestEnv* def_env = new FaultInjectionTestEnv(Env::Default());
2020-03-23 21:50:42 -07:00
  std::vector<std::unique_ptr<Env>> fault_envs;
2020-03-04 12:30:34 -08:00
  std::vector<FaultInjectionTestFS*> fault_fs;
2018-09-15 13:36:19 -07:00
  std::vector<Options> options;
2020-03-04 12:30:34 -08:00
  std::vector<std::shared_ptr<ErrorHandlerFSListener>> listener;
2018-09-15 13:36:19 -07:00
  std::vector<DB*> db;
  std::shared_ptr<SstFileManager> sfm(NewSstFileManager(def_env));
  int kNumDbInstances = 3;
2018-09-17 13:08:13 -07:00
  Random rnd(301);
2018-09-15 13:36:19 -07:00

  for (auto i = 0; i < kNumDbInstances; ++i) {
2020-03-04 12:30:34 -08:00
    listener.emplace_back(new ErrorHandlerFSListener());
2018-09-15 13:36:19 -07:00
    options.emplace_back(GetDefaultOptions());
2020-03-23 21:50:42 -07:00
    fault_fs.emplace_back(new FaultInjectionTestFS(FileSystem::Default()));
    std::shared_ptr<FileSystem> fs(fault_fs.back());
    fault_envs.emplace_back(new CompositeEnvWrapper(def_env, fs));
    options[i].env = fault_envs.back().get();
2018-09-15 13:36:19 -07:00
    options[i].create_if_missing = true;
    options[i].level0_file_num_compaction_trigger = 2;
    options[i].writable_file_max_buffer_size = 32768;
    options[i].listeners.emplace_back(listener[i]);
    options[i].sst_file_manager = sfm;
    DB* dbptr;
    char buf[16];

    listener[i]->EnableAutoRecovery();
    switch (i) {
      case 0:
        // Setup for returning error for the 3rd SST, which would be level 1
2020-03-04 12:30:34 -08:00
        listener[i]->InjectFileCreationError(fault_fs[i], 3,
                                             IOStatus::NoSpace("Out of space"));
2018-09-15 13:36:19 -07:00
        break;
      case 1:
        // Setup for returning error after the 1st SST, which would result
        // in a hard error
2020-03-04 12:30:34 -08:00
        listener[i]->InjectFileCreationError(fault_fs[i], 2,
                                             IOStatus::NoSpace("Out of space"));
2018-09-15 13:36:19 -07:00
        break;
      default:
        break;
    }
    snprintf(buf, sizeof(buf), "_%d", i);
    DestroyDB(dbname_ + std::string(buf), options[i]);
    ASSERT_EQ(DB::Open(options[i], dbname_ + std::string(buf), &dbptr),
              Status::OK());
    db.emplace_back(dbptr);
  }

  for (auto i = 0; i < kNumDbInstances; ++i) {
    WriteBatch batch;

    for (auto j = 0; j <= 100; ++j) {
2020-07-09 14:33:42 -07:00
      batch.Put(Key(j), rnd.RandomString(1024));
2018-09-15 13:36:19 -07:00
    }

    WriteOptions wopts;
    wopts.sync = true;
    ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
    ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
  }

  def_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
  for (auto i = 0; i < kNumDbInstances; ++i) {
    WriteBatch batch;

    // Write to one CF
    for (auto j = 100; j < 199; ++j) {
2020-07-09 14:33:42 -07:00
      batch.Put(Key(j), rnd.RandomString(1024));
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(db[i]->Write(wopts, &batch), Status::OK());
|
|
|
|
if (i != 1) {
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::OK());
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(db[i]->Flush(FlushOptions()), Status::NoSpace());
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
Status s = static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true);
|
|
|
|
switch (i) {
|
|
|
|
case 0:
|
|
|
|
ASSERT_EQ(s.severity(), Status::Severity::kSoftError);
|
|
|
|
break;
|
|
|
|
case 1:
|
|
|
|
ASSERT_EQ(s.severity(), Status::Severity::kHardError);
|
|
|
|
break;
|
|
|
|
case 2:
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
break;
|
|
|
|
}
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs[i]->SetFilesystemActive(true);
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:36:19 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
def_env->SetFilesystemActive(true);
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
std::string prop;
|
|
|
|
if (i < 2) {
|
|
|
|
ASSERT_EQ(listener[i]->WaitForRecovery(5000000), true);
|
|
|
|
}
|
|
|
|
if (i == 1) {
|
|
|
|
ASSERT_EQ(static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true),
|
|
|
|
Status::OK());
|
|
|
|
}
|
|
|
|
EXPECT_TRUE(db[i]->GetProperty(
|
|
|
|
"rocksdb.num-files-at-level" + NumberToString(0), &prop));
|
|
|
|
EXPECT_EQ(atoi(prop.c_str()), 0);
|
|
|
|
EXPECT_TRUE(db[i]->GetProperty(
|
|
|
|
"rocksdb.num-files-at-level" + NumberToString(1), &prop));
|
|
|
|
EXPECT_EQ(atoi(prop.c_str()), 1);
|
|
|
|
}
|
|
|
|
|
|
|
|
for (auto i = 0; i < kNumDbInstances; ++i) {
|
|
|
|
char buf[16];
|
|
|
|
snprintf(buf, sizeof(buf), "_%d", i);
|
2020-03-04 12:30:34 -08:00
|
|
|
fault_fs[i]->SetFilesystemActive(true);
|
Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
a) On the first occurance of an out of space error during compaction,
subsequent
compactions will be delayed until the disk free space check indicates
enough available space. The required space is computed as the sum of
input sizes.
b) The free space check requirement will be removed once the amount of
free space is greater than the size reserved by in progress
compactions when the first error occured
c) If the out of space error is a hard error, a background thread in
SFM will poll for sufficient headroom before triggering the recovery
of the database and putting it in write-only mode. The headroom is
calculated as the sum of the write_buffer_size of all the DB instances
associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()
Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164
Differential Revision: D9846378
Pulled By: anand1976
fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:36:19 -07:00
|
|
|
delete db[i];
|
|
|
|
if (getenv("KEEP_DB")) {
|
|
|
|
printf("DB is still at %s%s\n", dbname_.c_str(), buf);
|
|
|
|
} else {
|
|
|
|
DestroyDB(dbname_ + std::string(buf), options[i]);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
options.clear();
|
|
|
|
delete def_env;
|
|
|
|
}
|
2020-07-15 11:02:44 -07:00
|
|
|
|
2020-07-17 23:26:07 -07:00
|
|
|
TEST_F(DBErrorHandlingFSTest, DISABLED_FLushWritRetryableeErrorAutoRecover1) {
  // Fail the first resume and make the second resume successful
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 100000;  // 0.1 second
  Status s;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);

  Put(Key(1), "val1");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"RecoverFromRetryableBGIOError:BeforeWait0",
        "FLushWritRetryableeErrorAutoRecover1:0"},
       {"FLushWritRetryableeErrorAutoRecover1:1",
        "RecoverFromRetryableBGIOError:BeforeWait1"},
       {"RecoverFromRetryableBGIOError:RecoverSuccess",
        "FLushWritRetryableeErrorAutoRecover1:2"}});
  SyncPoint::GetInstance()->SetCallBack(
      "BuildTable:BeforeFinishBuildTable",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover1:0");
  fault_fs->SetFilesystemActive(true);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover1:1");
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover1:2");
  SyncPoint::GetInstance()->DisableProcessing();

  ASSERT_EQ("val1", Get(Key(1)));
  Reopen(options);
  ASSERT_EQ("val1", Get(Key(1)));
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_EQ(s, Status::OK());
  ASSERT_EQ("val2", Get(Key(2)));

  Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, FLushWritRetryableeErrorAutoRecover2) {
  // Activate the FS before the first resume
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 100000;  // 0.1 second
  Status s;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);

  Put(Key(1), "val1");
  SyncPoint::GetInstance()->SetCallBack(
      "BuildTable:BeforeFinishBuildTable",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });

  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  SyncPoint::GetInstance()->DisableProcessing();
  fault_fs->SetFilesystemActive(true);
  ASSERT_EQ(listener->WaitForRecovery(5000000), true);

  ASSERT_EQ("val1", Get(Key(1)));
  Reopen(options);
  ASSERT_EQ("val1", Get(Key(1)));
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_EQ(s, Status::OK());
  ASSERT_EQ("val2", Get(Key(2)));

  Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, FLushWritRetryableeErrorAutoRecover3) {
  // Fail all the auto-resume attempts and let the user resume manually
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 100000;  // 0.1 second
  Status s;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);

  Put(Key(1), "val1");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"FLushWritRetryableeErrorAutoRecover3:0",
        "RecoverFromRetryableBGIOError:BeforeStart"},
       {"RecoverFromRetryableBGIOError:LoopOut",
        "FLushWritRetryableeErrorAutoRecover3:1"}});
  SyncPoint::GetInstance()->SetCallBack(
      "BuildTable:BeforeFinishBuildTable",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover3:0");
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover3:1");
  fault_fs->SetFilesystemActive(true);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
  SyncPoint::GetInstance()->DisableProcessing();

  ASSERT_EQ("val1", Get(Key(1)));
  // Auto resume fails because the FS does not recover during resume. The
  // user calls resume manually here.
  s = dbfull()->Resume();
  ASSERT_EQ("val1", Get(Key(1)));
  ASSERT_EQ(s, Status::OK());
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_EQ(s, Status::OK());
  ASSERT_EQ("val2", Get(Key(2)));

  Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, DISABLED_FLushWritRetryableeErrorAutoRecover4) {
  // Fail the first resume and do not attempt a second resume because
  // the IO error severity is Fatal Error and not Retryable.
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 10;  // 10 microseconds
  Status s;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);
  IOStatus nr_msg = IOStatus::IOError("No Retryable Fatal IO Error");
  nr_msg.SetRetryable(false);

  Put(Key(1), "val1");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"RecoverFromRetryableBGIOError:BeforeStart",
        "FLushWritRetryableeErrorAutoRecover4:0"},
       {"FLushWritRetryableeErrorAutoRecover4:2",
        "RecoverFromRetryableBGIOError:RecoverFail0"}});
  SyncPoint::GetInstance()->SetCallBack(
      "BuildTable:BeforeFinishBuildTable",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  SyncPoint::GetInstance()->SetCallBack(
      "RecoverFromRetryableBGIOError:BeforeResume1",
      [&](void*) { fault_fs->SetFilesystemActive(false, nr_msg); });

  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover4:0");
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover4:2");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
  SyncPoint::GetInstance()->DisableProcessing();
  fault_fs->SetFilesystemActive(true);
  // Even though the FS has recovered, the Fatal Error in bg_error_ causes
  // both the resume and the flush to fail.
  ASSERT_EQ("val1", Get(Key(1)));
  s = dbfull()->Resume();
  ASSERT_NE(s, Status::OK());
  ASSERT_EQ("val1", Get(Key(1)));
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_NE(s, Status::OK());
  ASSERT_EQ("NOT_FOUND", Get(Key(2)));

  Reopen(options);
  ASSERT_EQ("val1", Get(Key(1)));
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_EQ(s, Status::OK());
  ASSERT_EQ("val2", Get(Key(2)));

  Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, DISABLED_FLushWritRetryableeErrorAutoRecover5) {
  // During the resume, call DB->Close and make sure the resume thread exits
  // before close continues. Due to the shutdown, the resume is not successful
  // and the FS does not become active, so the close status is still an IO
  // error.
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 10;  // 10 microseconds
  Status s;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);
  IOStatus nr_msg = IOStatus::IOError("No Retryable Fatal IO Error");
  nr_msg.SetRetryable(false);

  Put(Key(1), "val1");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"RecoverFromRetryableBGIOError:BeforeStart",
        "FLushWritRetryableeErrorAutoRecover5:0"}});
  SyncPoint::GetInstance()->SetCallBack(
      "BuildTable:BeforeFinishBuildTable",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover5:0");
  // The first resume will cause a recovery_error whose severity is
  // Fatal error
  s = dbfull()->Close();
  ASSERT_NE(s, Status::OK());
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
  SyncPoint::GetInstance()->DisableProcessing();
  fault_fs->SetFilesystemActive(true);

  Reopen(options);
  ASSERT_NE("val1", Get(Key(1)));
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_EQ(s, Status::OK());
  ASSERT_EQ("val2", Get(Key(2)));

  Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, FLushWritRetryableeErrorAutoRecover6) {
  // During the resume, call DB->Close and make sure the resume thread exits
  // before close continues. Due to the shutdown, the resume is not successful
  // and the FS does not become active, so the close status is still an IO
  // error.
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 10;  // 10 microseconds
  Status s;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);
  IOStatus nr_msg = IOStatus::IOError("No Retryable Fatal IO Error");
  nr_msg.SetRetryable(false);

  Put(Key(1), "val1");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"FLushWritRetryableeErrorAutoRecover6:0",
        "RecoverFromRetryableBGIOError:BeforeStart"},
       {"RecoverFromRetryableBGIOError:BeforeWait0",
        "FLushWritRetryableeErrorAutoRecover6:1"},
       {"FLushWritRetryableeErrorAutoRecover6:2",
        "RecoverFromRetryableBGIOError:BeforeWait1"},
       {"RecoverFromRetryableBGIOError:AfterWait0",
        "FLushWritRetryableeErrorAutoRecover6:3"}});
  SyncPoint::GetInstance()->SetCallBack(
      "BuildTable:BeforeFinishBuildTable",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover6:0");
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover6:1");
  fault_fs->SetFilesystemActive(true);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover6:2");
  TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover6:3");
  // The first resume will cause a recovery_error whose severity is
  // Fatal error
  s = dbfull()->Close();
  ASSERT_EQ(s, Status::OK());
  SyncPoint::GetInstance()->DisableProcessing();

  Reopen(options);
  ASSERT_EQ("val1", Get(Key(1)));
  Put(Key(2), "val2");
  s = Flush();
  ASSERT_EQ(s, Status::OK());
  ASSERT_EQ("val2", Get(Key(2)));

  Destroy(options);
}

TEST_F(DBErrorHandlingFSTest, ManifestWriteRetryableErrorAutoRecover) {
  // Fail the first resume and let the second resume be successful
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 100000;  // 0.1 second
  Status s;
  std::string old_manifest;
  std::string new_manifest;

  listener->EnableAutoRecovery(false);
  DestroyAndReopen(options);
  old_manifest = GetManifestNameFromLiveFiles();

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);

  Put(Key(0), "val");
  Flush();
  Put(Key(1), "val");
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      {{"RecoverFromRetryableBGIOError:BeforeStart",
        "ManifestWriteRetryableErrorAutoRecover:0"},
       {"ManifestWriteRetryableErrorAutoRecover:1",
        "RecoverFromRetryableBGIOError:BeforeWait1"},
       {"RecoverFromRetryableBGIOError:RecoverSuccess",
        "ManifestWriteRetryableErrorAutoRecover:2"}});
  SyncPoint::GetInstance()->SetCallBack(
      "VersionSet::LogAndApply:WriteManifest",
      [&](void*) { fault_fs->SetFilesystemActive(false, error_msg); });
  SyncPoint::GetInstance()->EnableProcessing();
  s = Flush();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("ManifestWriteRetryableErrorAutoRecover:0");
  fault_fs->SetFilesystemActive(true);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
  TEST_SYNC_POINT("ManifestWriteRetryableErrorAutoRecover:1");
  TEST_SYNC_POINT("ManifestWriteRetryableErrorAutoRecover:2");
  SyncPoint::GetInstance()->DisableProcessing();

  new_manifest = GetManifestNameFromLiveFiles();
  ASSERT_NE(new_manifest, old_manifest);

  Reopen(options);
  ASSERT_EQ("val", Get(Key(0)));
  ASSERT_EQ("val", Get(Key(1)));
  Close();
}

TEST_F(DBErrorHandlingFSTest,
       CompactionManifestWriteRetryableErrorAutoRecover) {
  std::shared_ptr<FaultInjectionTestFS> fault_fs(
      new FaultInjectionTestFS(FileSystem::Default()));
  std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
  std::shared_ptr<ErrorHandlerFSListener> listener(
      new ErrorHandlerFSListener());
  Options options = GetDefaultOptions();
  options.env = fault_fs_env.get();
  options.create_if_missing = true;
  options.level0_file_num_compaction_trigger = 2;
  options.listeners.emplace_back(listener);
  options.max_bgerror_resume_count = 2;
  options.bgerror_resume_retry_interval = 100000;  // 0.1 second
  Status s;
  std::string old_manifest;
  std::string new_manifest;
  std::atomic<bool> fail_manifest(false);
  DestroyAndReopen(options);
  old_manifest = GetManifestNameFromLiveFiles();

  IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
  error_msg.SetRetryable(true);

  Put(Key(0), "val");
  Put(Key(2), "val");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
  listener->EnableAutoRecovery(false);
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
      // Wait for flush of 2nd L0 file before starting compaction
      {{"DBImpl::FlushMemTable:FlushMemTableFinished",
        "BackgroundCallCompaction:0"},
       // Wait for compaction to detect manifest write error
       {"BackgroundCallCompaction:1", "CompactionManifestWriteErrorAR:0"},
       // Make compaction thread wait for error to be cleared
       {"CompactionManifestWriteErrorAR:1",
        "DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
       {"CompactionManifestWriteErrorAR:2",
        "RecoverFromRetryableBGIOError:BeforeStart"},
       // Fail the first resume, before the wait in resume
       {"RecoverFromRetryableBGIOError:BeforeResume0",
        "CompactionManifestWriteErrorAR:3"},
       // Activate the FS before the second resume
       {"CompactionManifestWriteErrorAR:4",
        "RecoverFromRetryableBGIOError:BeforeResume1"},
       // Wait for the auto resume to succeed
       {"RecoverFromRetryableBGIOError:RecoverSuccess",
        "CompactionManifestWriteErrorAR:5"}});
  // Trigger manifest write failure in compaction thread
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
      "BackgroundCallCompaction:0", [&](void*) { fail_manifest.store(true); });
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
      "VersionSet::LogAndApply:WriteManifest", [&](void*) {
        if (fail_manifest.load()) {
          fault_fs->SetFilesystemActive(false, error_msg);
        }
      });
  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();

  Put(Key(1), "val");
  s = Flush();
  ASSERT_EQ(s, Status::OK());

  TEST_SYNC_POINT("CompactionManifestWriteErrorAR:0");
  TEST_SYNC_POINT("CompactionManifestWriteErrorAR:1");

  s = dbfull()->TEST_WaitForCompact();
  ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
  TEST_SYNC_POINT("CompactionManifestWriteErrorAR:2");
  TEST_SYNC_POINT("CompactionManifestWriteErrorAR:3");
  fault_fs->SetFilesystemActive(true);
  SyncPoint::GetInstance()->ClearAllCallBacks();
  TEST_SYNC_POINT("CompactionManifestWriteErrorAR:4");
  TEST_SYNC_POINT("CompactionManifestWriteErrorAR:5");
  SyncPoint::GetInstance()->DisableProcessing();

  new_manifest = GetManifestNameFromLiveFiles();
  ASSERT_NE(new_manifest, old_manifest);

  Reopen(options);
  ASSERT_EQ("val", Get(Key(0)));
  ASSERT_EQ("val", Get(Key(1)));
  ASSERT_EQ("val", Get(Key(2)));
  Close();
}

TEST_F(DBErrorHandlingFSTest, CompactionWriteRetryableErrorAutoRecover) {
|
|
|
|
// In this test, in the first round of compaction, the FS is set to error.
|
|
|
|
// So the first compaction fails due to retryable IO error and it is mapped
|
|
|
|
// to soft error. Then, compaction is rescheduled, in the second round of
|
|
|
|
// compaction, the FS is set to active and compaction is successful, so
|
|
|
|
// the test will hit the CompactionJob::FinishCompactionOutputFile1 sync
|
|
|
|
// point.
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.level0_file_num_compaction_trigger = 2;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
Status s;
|
|
|
|
std::atomic<bool> fail_first(false);
|
|
|
|
std::atomic<bool> fail_second(true);
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
Put(Key(0), "val");
|
|
|
|
Put(Key(2), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
|
|
|
|
listener->EnableAutoRecovery(false);
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
|
|
|
|
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
|
|
|
"BackgroundCallCompaction:0"},
|
|
|
|
{"CompactionJob::FinishCompactionOutputFile1",
|
|
|
|
"CompactionWriteRetryableErrorAutoRecover0"}});
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"DBImpl::BackgroundCompaction:Start",
|
|
|
|
[&](void*) { fault_fs->SetFilesystemActive(true); });
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"BackgroundCallCompaction:0", [&](void*) { fail_first.store(true); });
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"CompactionJob::OpenCompactionOutputFile", [&](void*) {
|
|
|
|
if (fail_first.load() && fail_second.load()) {
|
|
|
|
fault_fs->SetFilesystemActive(false, error_msg);
|
|
|
|
fail_second.store(false);
|
|
|
|
}
|
|
|
|
});
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
|
|
|
|
Put(Key(1), "val");
|
|
|
|
s = Flush();
|
|
|
|
ASSERT_EQ(s, Status::OK());
|
|
|
|
|
|
|
|
s = dbfull()->TEST_WaitForCompact();
|
|
|
|
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
|
|
|
|
|
|
|
|
TEST_SYNC_POINT("CompactionWriteRetryableErrorAutoRecover0");
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
Destroy(options);
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingFSTest, WALWriteRetryableErrorAutoRecover1) {
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
options.paranoid_checks = true;
|
|
|
|
options.max_bgerror_resume_count = 2;
|
|
|
|
options.bgerror_resume_retry_interval = 100000; // 0.1 second
|
|
|
|
Status s;
|
|
|
|
Random rnd(301);
|
|
|
|
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
// The first batch is written successfully with sync enabled.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 0; i < 100; ++i) {
|
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
// For the second batch, the first two file Appends succeed, then the
// following Append fails with a retryable file system IOError.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
|
|
|
for (auto i = 100; i < 200; ++i) {
|
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
|
|
|
}
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
|
|
|
|
{{"RecoverFromRetryableBGIOError:BeforeResume0", "WALWriteError1:0"},
|
|
|
|
{"WALWriteError1:1", "RecoverFromRetryableBGIOError:BeforeResume1"},
|
|
|
|
{"RecoverFromRetryableBGIOError:RecoverSuccess", "WALWriteError1:2"}});
|
|
|
|
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
|
|
|
fault_fs->SetFilesystemActive(false, error_msg);
|
|
|
|
}
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_TRUE(s.IsIOError());
|
|
|
|
|
|
|
|
TEST_SYNC_POINT("WALWriteError1:0");
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
TEST_SYNC_POINT("WALWriteError1:1");
|
|
|
|
TEST_SYNC_POINT("WALWriteError1:2");
|
|
|
|
}
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
|
|
|
|
// Data written to the corrupted WAL is not stored.
|
|
|
|
for (auto i = 0; i < 200; ++i) {
|
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// After the resume, a newly written batch should be in the WAL.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 200; i < 300; ++i) {
|
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
for (auto i = 0; i < 300; ++i) {
|
|
|
|
if (i < 100 || i >= 200) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_F(DBErrorHandlingFSTest, WALWriteRetryableErrorAutoRecover2) {
|
|
|
|
// Fail the first recovery attempt and retry a second time.
|
|
|
|
std::shared_ptr<FaultInjectionTestFS> fault_fs(
|
|
|
|
new FaultInjectionTestFS(FileSystem::Default()));
|
|
|
|
std::unique_ptr<Env> fault_fs_env(NewCompositeEnv(fault_fs));
|
|
|
|
std::shared_ptr<ErrorHandlerFSListener> listener(
|
|
|
|
new ErrorHandlerFSListener());
|
|
|
|
Options options = GetDefaultOptions();
|
|
|
|
options.env = fault_fs_env.get();
|
|
|
|
options.create_if_missing = true;
|
|
|
|
options.writable_file_max_buffer_size = 32768;
|
|
|
|
options.listeners.emplace_back(listener);
|
|
|
|
options.paranoid_checks = true;
|
|
|
|
options.max_bgerror_resume_count = 2;
|
|
|
|
options.bgerror_resume_retry_interval = 100000; // 0.1 second
|
|
|
|
Status s;
|
|
|
|
Random rnd(301);
|
|
|
|
|
|
|
|
DestroyAndReopen(options);
|
|
|
|
|
|
|
|
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
|
|
|
|
error_msg.SetRetryable(true);
|
|
|
|
|
|
|
|
// The first batch is written successfully with sync enabled.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 0; i < 100; ++i) {
|
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
// For the second batch, the first two file Appends succeed, then the
// following Append fails with a retryable file system IOError.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
int write_error = 0;
|
|
|
|
|
|
|
|
for (auto i = 100; i < 200; ++i) {
|
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
|
|
|
}
|
|
|
|
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
|
|
|
|
{{"RecoverFromRetryableBGIOError:BeforeWait0", "WALWriteError2:0"},
|
|
|
|
{"WALWriteError2:1", "RecoverFromRetryableBGIOError:BeforeWait1"},
|
|
|
|
{"RecoverFromRetryableBGIOError:RecoverSuccess", "WALWriteError2:2"}});
|
|
|
|
|
|
|
|
SyncPoint::GetInstance()->SetCallBack(
|
|
|
|
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
|
|
|
|
write_error++;
|
|
|
|
if (write_error > 2) {
|
|
|
|
fault_fs->SetFilesystemActive(false, error_msg);
|
|
|
|
}
|
|
|
|
});
|
|
|
|
SyncPoint::GetInstance()->EnableProcessing();
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
s = dbfull()->Write(wopts, &batch);
|
|
|
|
ASSERT_TRUE(s.IsIOError());
|
|
|
|
|
|
|
|
TEST_SYNC_POINT("WALWriteError2:0");
|
|
|
|
fault_fs->SetFilesystemActive(true);
|
|
|
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
|
|
|
TEST_SYNC_POINT("WALWriteError2:1");
|
|
|
|
TEST_SYNC_POINT("WALWriteError2:2");
|
|
|
|
}
|
|
|
|
SyncPoint::GetInstance()->DisableProcessing();
|
|
|
|
|
|
|
|
// Data written to the corrupted WAL is not stored.
|
|
|
|
for (auto i = 0; i < 200; ++i) {
|
|
|
|
if (i < 100) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
// After the resume, a newly written batch should be in the WAL.
|
|
|
|
{
|
|
|
|
WriteBatch batch;
|
|
|
|
|
|
|
|
for (auto i = 200; i < 300; ++i) {
|
|
|
|
batch.Put(Key(i), rnd.RandomString(1024));
|
|
|
|
}
|
|
|
|
|
|
|
|
WriteOptions wopts;
|
|
|
|
wopts.sync = true;
|
|
|
|
ASSERT_EQ(dbfull()->Write(wopts, &batch), Status::OK());
|
|
|
|
}
|
|
|
|
|
|
|
|
Reopen(options);
|
|
|
|
for (auto i = 0; i < 300; ++i) {
|
|
|
|
if (i < 100 || i >= 200) {
|
|
|
|
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
|
|
|
|
} else {
|
|
|
|
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
} // namespace ROCKSDB_NAMESPACE
|
|
|
|
|
|
|
|
int main(int argc, char** argv) {
|
|
|
|
ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();
|
|
|
|
::testing::InitGoogleTest(&argc, argv);
|
|
|
|
return RUN_ALL_TESTS();
|
|
|
|
}
|
|
|
|
|
|
|
|
#else
|
|
|
|
#include <stdio.h>
|
|
|
|
|
|
|
|
int main(int /*argc*/, char** /*argv*/) {
|
|
|
|
fprintf(stderr, "SKIPPED as DBErrorHandlingFSTest is not supported in ROCKSDB_LITE\n");
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
#endif // ROCKSDB_LITE
|