Get manifest size again after getting min_log_num during checkpoint (#7836)

Summary:
Currently, manifest size is determined before getting min_log_num.

But between getting manifest size and getting min_log_num, concurrently, a flush might succeed, which will write new records to manifest to make some WALs become outdated, then min_log_num will be correspondingly increased, but the new records in manifest will not be copied into the checkpoint because the manifest's size is determined before them, then the newly outdated WALs will still exist in the checkpoint's manifest, but they are not linked/copied to the checkpoint because their log number is < min_log_num, so a corruption of missing WAL will be reported when restoring from the checkpoint.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7836

Test Plan: make crash_test

Reviewed By: ajkr

Differential Revision: D25788204

Pulled By: cheng-chang

fbshipit-source-id: a4e5acf30f08270b3c0a95304ff559a9e655252f
This commit is contained in:
Cheng Chang 2021-01-07 23:00:12 -08:00 committed by Jay Zhuang
parent 2b64cddf99
commit eeea27a048

View File

@ -235,26 +235,18 @@ Status CheckpointImpl::CreateCustomCheckpoint(
if (!db_->GetIntProperty(DB::Properties::kMinLogNumberToKeep, &min_log_num)) { if (!db_->GetIntProperty(DB::Properties::kMinLogNumberToKeep, &min_log_num)) {
return Status::InvalidArgument("cannot get the min log number to keep."); return Status::InvalidArgument("cannot get the min log number to keep.");
} }
// Between GetLiveFiles and getting min_log_num, flush might happen
if (s.ok() && db_options.allow_2pc) { // concurrently, so new WAL deletions might be tracked in MANIFEST. If we do
// We need to refetch live files with flush to handle this case: // not get the new MANIFEST size, the deleted WALs might not be reflected in
// A previous 000001.log contains the prepare record of transaction tnx1. // the checkpoint's MANIFEST.
// The current log file is 000002.log, and sequence_number points to this //
// file. // If we get min_log_num before the above GetLiveFiles, then there might
// After calling GetLiveFiles(), 000003.log is created. // be too many unnecessary WALs to be included in the checkpoint.
// Then tnx1 is committed. The commit record is written to 000003.log. //
// Now we fetch min_log_num, which will be 3. // Ideally, min_log_num should be got together with manifest_file_size in
// Then only 000002.log and 000003.log will be copied, and 000001.log will // GetLiveFiles atomically. But that needs changes to GetLiveFiles' signature
// be skipped. 000003.log contains commit message of tnx1, but we don't // which is a public API.
// have respective prepare record for it. s = db_->GetLiveFiles(live_files, &manifest_file_size, flush_memtable);
// In order to avoid this situation, we need to force flush to make sure
// all transactions committed before getting min_log_num will be flushed
// to SST files.
// We cannot get min_log_num before calling the GetLiveFiles() for the
// first time, because if we do that, all the logs files will be included,
// far more than needed.
s = db_->GetLiveFiles(live_files, &manifest_file_size, flush_memtable);
}
TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles1"); TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles1");
TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles2"); TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles2");