rocksdb/db/db_impl
Yanqin Jin dd63f04c83 First step towards handling MANIFEST write error (#6949)
Summary:
This PR provides preliminary support for handling IO errors during MANIFEST writes.
File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly generated SST files referenced by the pending version edit(s), but the version edit(s) were in fact persisted in the MANIFEST, then the next recovery attempt will process the version edit(s) and fail, since the SST files have already been deleted.
One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach.
If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (as set by the underlying file system), either RocksDB or the application can call `DB::Resume()`, or the application can simply shut down and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage to the new file. If this succeeds, then RocksDB may proceed. Once all recovery is complete, file deletions are re-enabled.
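The flow described above can be illustrated with a small, self-contained toy model. This is only a sketch under stated assumptions: `ToyDb`, `IoResult`, `AddFile`, `PurgeObsoleteFiles`, and this `Resume` are hypothetical names invented for illustration and are not RocksDB classes or functions. It also assumes, for simplicity, that the failed edit is dropped from the in-memory state rather than retried.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy model only: none of these types are RocksDB classes.
enum class IoResult { kOk, kError };

struct ToyDb {
  std::vector<std::string> manifest;     // edits known to be in the MANIFEST
  std::set<std::string> live_files;      // in-memory version state
  std::set<std::string> files_on_disk;   // SST files that physically exist
  int manifest_number = 1;
  bool deletions_disabled = false;
  bool bg_error = false;

  // Record a new SST file. `io` simulates the MANIFEST write/sync outcome.
  bool AddFile(const std::string& sst, IoResult io) {
    files_on_disk.insert(sst);
    if (io == IoResult::kError) {
      // The edit may or may not have reached the MANIFEST, so the SST file
      // must not be deleted: disable file deletions and remember the error.
      bg_error = true;
      deletions_disabled = true;
      return false;
    }
    manifest.push_back("add " + sst);
    live_files.insert(sst);
    return true;
  }

  // Delete files not referenced by the in-memory state, unless disabled.
  std::size_t PurgeObsoleteFiles() {
    if (deletions_disabled) return 0;
    std::size_t purged = 0;
    for (auto it = files_on_disk.begin(); it != files_on_disk.end();) {
      if (live_files.count(*it) == 0) {
        it = files_on_disk.erase(it);
        ++purged;
      } else {
        ++it;
      }
    }
    return purged;
  }

  // Switch to a fresh MANIFEST holding the whole in-memory state. Only on
  // success is the error cleared and file deletion re-enabled.
  bool Resume(IoResult io) {
    if (!bg_error) return true;
    if (io == IoResult::kError) return false;
    std::vector<std::string> fresh;
    for (const auto& f : live_files) fresh.push_back("add " + f);
    manifest.swap(fresh);
    ++manifest_number;
    bg_error = false;
    deletions_disabled = false;
    return true;
  }
};
```

In this model, a failed `AddFile` leaves the new SST file on disk and makes `PurgeObsoleteFiles` a no-op. After a successful `Resume`, the new MANIFEST is known not to reference the orphaned file, so a later purge can remove it safely.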
Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will actually go through the MANIFEST write, possibly batching the version edits of other threads. When the leading MANIFEST writer finishes, all of the MANIFEST-writing threads in this batch will have the same IOError. They will all call `ErrorHandler::SetBGError()`, in which file deletion will be disabled.
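The batching behavior can be sketched without real threads. The following is a simplified, single-threaded illustration, not the RocksDB implementation: `ToyStatus`, `ToyManifestWriter`, `ToyErrorHandler`, and `LeaderWriteBatch` are hypothetical names, and the MANIFEST write is passed in as a callback so that a failure can be simulated.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; not RocksDB types.
struct ToyStatus {
  bool ok = true;
  std::string msg;
};

struct ToyManifestWriter {
  std::string edit;   // the version edit this caller wants to apply
  ToyStatus status;   // filled in by the batch leader
  bool done = false;
};

struct ToyErrorHandler {
  bool file_deletions_disabled = false;
  int calls = 0;
  // Every writer in a failed batch reports the error; disabling file
  // deletions is idempotent, so repeated calls are harmless.
  void SetBGError(const ToyStatus& s) {
    ++calls;
    if (!s.ok) file_deletions_disabled = true;
  }
};

// The leader collects the edits of every queued writer, performs a single
// MANIFEST write, and copies the one resulting status to all writers.
inline void LeaderWriteBatch(
    std::vector<ToyManifestWriter*>& batch,
    const std::function<ToyStatus(const std::vector<std::string>&)>&
        write_manifest) {
  if (batch.empty()) return;
  std::vector<std::string> edits;
  for (const ToyManifestWriter* w : batch) edits.push_back(w->edit);
  const ToyStatus s = write_manifest(edits);
  for (ToyManifestWriter* w : batch) {
    w->status = s;
    w->done = true;
  }
}
```

With three queued writers and a failing callback, the callback runs once, all three writers observe the same error, and each of them ends up calling `SetBGError`, which leaves file deletions disabled.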

Possible future directions:
- Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as this PR shows, covering a new case requires adding a new `BackgroundErrorReason`.
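One possible shape for such a structure is sketched below. This is purely speculative: the commit does not define `ErrorContext`, so every field, the `ToyErrorReason` enum, and `ContextAwareHandler` are assumptions made for illustration and do not reflect any existing RocksDB API.

```cpp
#include <string>

// Hypothetical enum standing in for a set of background error reasons.
enum class ToyErrorReason { kFlush, kCompaction, kManifestWrite };

// Hypothetical context object: extra detail travels in fields instead of
// requiring a new enum value for each new situation.
struct ToyErrorContext {
  ToyErrorReason reason = ToyErrorReason::kFlush;
  std::string file_name;   // file involved in the failed operation, if known
  bool retryable = false;  // as reported by the underlying file system
};

// Hypothetical handler that takes the context as a single argument.
struct ContextAwareHandler {
  bool file_deletions_disabled = false;
  ToyErrorContext last;  // most recent context, kept for later recovery

  void SetBGError(const ToyErrorContext& ctx) {
    last = ctx;
    // A failed MANIFEST write leaves the MANIFEST state unknown, so files
    // must not be deleted until recovery has completed.
    if (ctx.reason == ToyErrorReason::kManifestWrite) {
      file_deletions_disabled = true;
    }
  }
};
```

The design point is that a handler signature taking one context argument can stay stable while the context gains fields over time.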

Test plan (dev server):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949

Reviewed By: anand1976

Differential Revision: D22026020

Pulled By: riversand963

fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
2020-07-09 15:50:33 -07:00
db_impl_compaction_flush.cc First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
db_impl_debug.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_impl_experimental.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_impl_files.cc First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
db_impl_open.cc Let best-efforts recovery ignore CURRENT file (#6970) 2020-06-16 09:35:52 -07:00
db_impl_readonly.cc API change: DB::OpenForReadOnly will not write to the file system unless create_if_missing is true (#6900) 2020-06-03 18:57:49 -07:00
db_impl_readonly.h API change: DB::OpenForReadOnly will not write to the file system unless create_if_missing is true (#6900) 2020-06-03 18:57:49 -07:00
db_impl_secondary.cc Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621) 2020-04-15 17:40:44 -07:00
db_impl_secondary.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_impl_write.cc Add timestamp to delete (#6253) 2020-05-28 10:40:03 -07:00
db_impl.cc First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
db_impl.h First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
db_secondary_test.cc Do not swallow error returned from SaveTo() (#6801) 2020-05-05 10:46:20 -07:00