Fix MANIFEST name assignment (#6426)
Summary: Currently, a new MANIFEST file is assigned a new file number when 1) no MANIFEST is open, or 2) current MANIFEST file size exceeds a threshold. This is not sufficient. There are cases when the caller explicitly specifies that a new MANIFEST be created. For example, if user sets options.write_dbid_to_manifest = true, and there are WAL files, then RocksDB will run into an issue during recovery. `DBImpl::Recover()` will call `LogAndApply()` to write dbid. At this point, the db being recovered creates a new MANIFEST, say, MANIFEST-000003. Since there are WALs, `DBImpl::RecoverLogFiles` will be called. Towards the end of this function, we call `LogAndApply(new_descriptor_log=true)`, which explicitly creates a new MANIFEST. However, the manifest_file_number is wrong before this fix. Consequently, RocksDB opens an existing, non-empty file for append, effectively truncating the file to zero. If a crash occurs, then there will be data loss. Test Plan (devserver): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6426 Test Plan: make check Differential Revision: D19951866 Pulled By: riversand963 fbshipit-source-id: 4b1b9fc28d4fe2ac12764b388ef9e61f05e766da
This commit is contained in:
parent
fdf882ded2
commit
362b8d4393
@ -12,6 +12,7 @@
|
|||||||
* BlobDB now ignores trivially moved files when updating the mapping between blob files and SSTs. This should mitigate issue #6338 where out of order flush/compaction notifications could trigger an assertion with the earlier code.
|
* BlobDB now ignores trivially moved files when updating the mapping between blob files and SSTs. This should mitigate issue #6338 where out of order flush/compaction notifications could trigger an assertion with the earlier code.
|
||||||
* Batched MultiGet() ignores IO errors while reading data blocks, causing it to potentially continue looking for a key and returning stale results.
|
* Batched MultiGet() ignores IO errors while reading data blocks, causing it to potentially continue looking for a key and returning stale results.
|
||||||
* `WriteBatchWithIndex::DeleteRange` returns `Status::NotSupported`. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings now cannot be used. In C, that includes `rocksdb_writebatch_wi_delete_range`, `rocksdb_writebatch_wi_delete_range_cf`, `rocksdb_writebatch_wi_delete_rangev`, and `rocksdb_writebatch_wi_delete_rangev_cf`. In Java, that includes `WriteBatchWithIndex::deleteRange`.
|
* `WriteBatchWithIndex::DeleteRange` returns `Status::NotSupported`. Previously it returned success even though reads on the batch did not account for range tombstones. The corresponding language bindings now cannot be used. In C, that includes `rocksdb_writebatch_wi_delete_range`, `rocksdb_writebatch_wi_delete_range_cf`, `rocksdb_writebatch_wi_delete_rangev`, and `rocksdb_writebatch_wi_delete_rangev_cf`. In Java, that includes `WriteBatchWithIndex::deleteRange`.
|
||||||
|
* Assign new MANIFEST file number when caller tries to create a new MANIFEST by calling LogAndApply(..., new_descriptor_log=true). This bug can cause MANIFEST being overwritten during recovery if options.write_dbid_to_manifest = true and there are WAL file(s).
|
||||||
|
|
||||||
### Performance Improvements
|
### Performance Improvements
|
||||||
* Perfom readahead when reading from option files. Inside DB, options.log_readahead_size will be used as the readahead size. In other cases, a default 512KB is used.
|
* Perfom readahead when reading from option files. Inside DB, options.log_readahead_size will be used as the readahead size. In other cases, a default 512KB is used.
|
||||||
|
@ -3767,8 +3767,6 @@ Status VersionSet::ProcessManifestWrites(
|
|||||||
if (!descriptor_log_ ||
|
if (!descriptor_log_ ||
|
||||||
manifest_file_size_ > db_options_->max_manifest_file_size) {
|
manifest_file_size_ > db_options_->max_manifest_file_size) {
|
||||||
TEST_SYNC_POINT("VersionSet::ProcessManifestWrites:BeforeNewManifest");
|
TEST_SYNC_POINT("VersionSet::ProcessManifestWrites:BeforeNewManifest");
|
||||||
pending_manifest_file_number_ = NewFileNumber();
|
|
||||||
batch_edits.back()->SetNextFile(next_file_number_.load());
|
|
||||||
new_descriptor_log = true;
|
new_descriptor_log = true;
|
||||||
} else {
|
} else {
|
||||||
pending_manifest_file_number_ = manifest_file_number_;
|
pending_manifest_file_number_ = manifest_file_number_;
|
||||||
@ -3779,6 +3777,9 @@ Status VersionSet::ProcessManifestWrites(
|
|||||||
// SwitchMemtable().
|
// SwitchMemtable().
|
||||||
std::unordered_map<uint32_t, MutableCFState> curr_state;
|
std::unordered_map<uint32_t, MutableCFState> curr_state;
|
||||||
if (new_descriptor_log) {
|
if (new_descriptor_log) {
|
||||||
|
pending_manifest_file_number_ = NewFileNumber();
|
||||||
|
batch_edits.back()->SetNextFile(next_file_number_.load());
|
||||||
|
|
||||||
// if we are writing out new snapshot make sure to persist max column
|
// if we are writing out new snapshot make sure to persist max column
|
||||||
// family.
|
// family.
|
||||||
if (column_family_set_->GetMaxColumnFamily() > 0) {
|
if (column_family_set_->GetMaxColumnFamily() > 0) {
|
||||||
|
Loading…
Reference in New Issue
Block a user