BlobDB: ignore trivially moved files when updating the SST<->blob file mapping (#6381)
Summary:
BlobDB keeps track of the mapping between SSTs and blob files using the `OnFlushCompleted` and `OnCompactionCompleted` callbacks of the `EventListener` interface: upon receiving a flush notification, a link is added between the newly flushed SST and the corresponding blob file; for compactions, links are removed for the inputs and added for the outputs. The earlier code performed this link deletion and addition even for trivially moved files; the new code walks through the two lists together (in a fashion that's similar to merge sort) and skips such files. This should mitigate https://github.com/facebook/rocksdb/issues/6338, wherein an assertion is triggered with the earlier code when a compaction notification for a trivial move precedes the flush notification for the moved SST.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6381

Test Plan:
make check

Differential Revision: D19773729

Pulled By: ltamasi

fbshipit-source-id: ae0f273ded061110dd9334e8fb99b0d7786650b0
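The merge-style walk described in the summary can be illustrated in isolation. The following is a minimal sketch, not the BlobDB code itself: `FileInfo`, `LinkUpdates`, and `ComputeLinkUpdates` are hypothetical names introduced here, and the sketch leaves out the locking and the `kInvalidBlobFileNumber` checks that the real change performs. It only shows how sorting both lists by file number and advancing two iterators lets files present on both sides be skipped.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the per-file metadata in a compaction
// notification; only the file number matters for this sketch.
struct FileInfo {
  uint64_t file_number;
};

// Hypothetical result type: files that appear only among the inputs
// (to unlink) and files that appear only among the outputs (to link).
struct LinkUpdates {
  std::vector<uint64_t> to_unlink;
  std::vector<uint64_t> to_link;
};

// Both lists are taken by value so they can be sorted locally without
// modifying the caller's copies.
LinkUpdates ComputeLinkUpdates(std::vector<FileInfo> inputs,
                               std::vector<FileInfo> outputs) {
  auto cmp = [](const FileInfo& lhs, const FileInfo& rhs) {
    return lhs.file_number < rhs.file_number;
  };
  std::sort(inputs.begin(), inputs.end(), cmp);
  std::sort(outputs.begin(), outputs.end(), cmp);

  LinkUpdates updates;
  auto iit = inputs.begin();
  auto oit = outputs.begin();

  while (iit != inputs.end() && oit != outputs.end()) {
    if (iit->file_number == oit->file_number) {
      // Same file on both sides: treated as a trivial move and skipped.
      ++iit;
      ++oit;
    } else if (iit->file_number < oit->file_number) {
      updates.to_unlink.push_back(iit->file_number);
      ++iit;
    } else {
      assert(oit->file_number < iit->file_number);
      updates.to_link.push_back(oit->file_number);
      ++oit;
    }
  }

  // Whatever remains in either list has no counterpart in the other one.
  for (; iit != inputs.end(); ++iit) {
    updates.to_unlink.push_back(iit->file_number);
  }
  for (; oit != outputs.end(); ++oit) {
    updates.to_link.push_back(oit->file_number);
  }

  return updates;
}
```

With inputs numbered 12, 7, 9 and outputs numbered 15, 12, the sketch reports files 7 and 9 for unlinking and file 15 for linking; file 12 appears in both lists and is left untouched.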
parent 107a7ca930
commit 1b4be4cac9
@@ -8,6 +8,7 @@
 * Fix a bug that prevents opening a DB after two consecutive crash with TransactionDB, where the first crash recovers from a corrupted WAL with kPointInTimeRecovery but the second cannot.
 * Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
 * Add DBOptions::skip_checking_sst_file_sizes_on_db_open. It disables potentially expensive checking of all sst file sizes in DB::Open().
+* BlobDB now ignores trivially moved files when updating the mapping between blob files and SSTs. This should mitigate issue #6338 where out of order flush/compaction notifications could trigger an assertion with the earlier code.
 
 ### Public API Change
 * The BlobDB garbage collector now emits the statistics `BLOB_DB_GC_NUM_FILES` (number of blob files obsoleted during GC), `BLOB_DB_GC_NUM_NEW_FILES` (number of new blob files generated during GC), `BLOB_DB_GC_FAILURES` (number of failed GC passes), `BLOB_DB_GC_NUM_KEYS_RELOCATED` (number of blobs relocated during GC), and `BLOB_DB_GC_BYTES_RELOCATED` (total size of blobs relocated during GC). On the other hand, the following statistics, which are not relevant for the new GC implementation, are now deprecated: `BLOB_DB_GC_NUM_KEYS_OVERWRITTEN`, `BLOB_DB_GC_NUM_KEYS_EXPIRED`, `BLOB_DB_GC_BYTES_OVERWRITTEN`, `BLOB_DB_GC_BYTES_EXPIRED`, and `BLOB_DB_GC_MICROS`.
@@ -476,25 +476,69 @@ void BlobDBImpl::ProcessCompactionJobInfo(const CompactionJobInfo& info) {
   }
 
   // Note: the same SST file may appear in both the input and the output
-  // file list in case of a trivial move. We process the inputs first
-  // to ensure the blob file still has a link after processing all updates.
+  // file list in case of a trivial move. We walk through the two lists
+  // below in a fashion that's similar to merge sort to detect this.
 
+  auto cmp = [](const CompactionFileInfo& lhs, const CompactionFileInfo& rhs) {
+    return lhs.file_number < rhs.file_number;
+  };
+
+  auto inputs = info.input_file_infos;
+  auto iit = inputs.begin();
+  const auto iit_end = inputs.end();
+
+  std::sort(iit, iit_end, cmp);
+
+  auto outputs = info.output_file_infos;
+  auto oit = outputs.begin();
+  const auto oit_end = outputs.end();
+
+  std::sort(oit, oit_end, cmp);
+
   WriteLock lock(&mutex_);
 
-  for (const auto& input : info.input_file_infos) {
-    if (input.oldest_blob_file_number == kInvalidBlobFileNumber) {
-      continue;
-    }
+  while (iit != iit_end && oit != oit_end) {
+    const auto& input = *iit;
+    const auto& output = *oit;
 
-    UnlinkSstFromBlobFile(input.file_number, input.oldest_blob_file_number);
+    if (input.file_number == output.file_number) {
+      ++iit;
+      ++oit;
+    } else if (input.file_number < output.file_number) {
+      if (input.oldest_blob_file_number != kInvalidBlobFileNumber) {
+        UnlinkSstFromBlobFile(input.file_number, input.oldest_blob_file_number);
+      }
+
+      ++iit;
+    } else {
+      assert(output.file_number < input.file_number);
+
+      if (output.oldest_blob_file_number != kInvalidBlobFileNumber) {
+        LinkSstToBlobFile(output.file_number, output.oldest_blob_file_number);
+      }
+
+      ++oit;
+    }
   }
 
-  for (const auto& output : info.output_file_infos) {
-    if (output.oldest_blob_file_number == kInvalidBlobFileNumber) {
-      continue;
+  while (iit != iit_end) {
+    const auto& input = *iit;
+
+    if (input.oldest_blob_file_number != kInvalidBlobFileNumber) {
+      UnlinkSstFromBlobFile(input.file_number, input.oldest_blob_file_number);
     }
 
-    LinkSstToBlobFile(output.file_number, output.oldest_blob_file_number);
+    ++iit;
+  }
+
+  while (oit != oit_end) {
+    const auto& output = *oit;
+
+    if (output.oldest_blob_file_number != kInvalidBlobFileNumber) {
+      LinkSstToBlobFile(output.file_number, output.oldest_blob_file_number);
+    }
+
+    ++oit;
   }
 
   MarkUnreferencedBlobFilesObsolete();