Fix WAL corruption from checkpoint/backup race condition
Summary:
`Writer::WriteBuffer` was always called at the beginning of checkpoint/backup. But that log writer has no internal synchronization, so under a race with a user write thread the same buffer could be flushed twice, duplicating a WAL entry. Subsequent WAL entries would then land at unexpected offsets and overlap the 32KB block boundaries, which manifested as corruption.

This PR changes checkpoint/backup to call `WriteBuffer` (via `FlushWAL`) only when manual WAL flush is enabled. In that case, users are responsible for providing synchronization between WAL flushes. We can also consider removing the call entirely.

Closes https://github.com/facebook/rocksdb/pull/3603

Differential Revision: D7277447

Pulled By: ajkr

fbshipit-source-id: 1b15bd7fd930511222b075418c10de0aaa70a35a
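The double-flush failure described in the summary can be illustrated with a small toy model. This is only a sketch: `ToyWalWriter`, `FlushOnce`, `FlushTwiceInterleaved`, and `OffsetDrift` are hypothetical names invented for illustration, not RocksDB's `log::Writer`. The racy interleaving is replayed sequentially so the outcome is deterministic.

```cpp
#include <cassert>  // for the usage checks described below
#include <cstddef>
#include <string>

// Hypothetical toy model of a buffered log writer; NOT RocksDB's log::Writer.
struct ToyWalWriter {
  std::string file;                 // bytes that reached the "WAL file"
  std::string buffer;               // bytes appended but not yet flushed
  std::size_t expected_offset = 0;  // where the writer believes the file ends

  void Append(const std::string& record) {
    buffer += record;
    expected_offset += record.size();
  }
  void WriteOut() { file += buffer; }  // first half of a flush
  void Clear() { buffer.clear(); }     // second half of a flush
};

// A properly serialized flush: write out the buffer, then clear it.
inline void FlushOnce(ToyWalWriter& w) {
  w.WriteOut();
  w.Clear();
}

// One interleaving that an unsynchronized flush allows: both callers write
// out the buffer before either clears it. Replayed sequentially here.
inline void FlushTwiceInterleaved(ToyWalWriter& w) {
  w.WriteOut();  // caller A
  w.WriteOut();  // caller B still sees the same buffer contents
  w.Clear();     // caller A
  w.Clear();     // caller B
}

// Bytes by which the file end has drifted from where the writer expects it.
inline std::size_t OffsetDrift(const ToyWalWriter& w) {
  return w.file.size() - w.expected_offset;
}
```

In this model, appending one 7-byte record and calling `FlushTwiceInterleaved` leaves the record in the file twice, with `OffsetDrift` equal to 7. Every later record then starts 7 bytes past where the writer's own bookkeeping places it, which is the kind of shift that lets records straddle fixed-size block boundaries.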
parent 449627f0ea
commit 0cdaa1a804
@@ -12,6 +12,7 @@
 ### Bug Fixes
 * Fix a leak in prepared_section_completed_ where the zeroed entries would not removed from the map.
+* Fix WAL corruption caused by race condition between user write thread and backup/checkpoint thread.
 
 ## 5.12.0 (2/14/2018)
 ### Public API Change
@@ -222,7 +222,9 @@ Status CheckpointImpl::CreateCustomCheckpoint(
     TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles1");
     TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles2");
-    db_->FlushWAL(false /* sync */);
+    if (db_options.manual_wal_flush) {
+      db_->FlushWAL(false /* sync */);
+    }
   }
   // if we have more than one column family, we need to also get WAL files
   if (s.ok()) {
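The guard in the hunk above, and the caller-side synchronization the summary asks for in manual-flush mode, can be sketched with stand-in types. `ToyOptions`, `ToyDb`, `FlushWalForCheckpoint`, and `SerializedWalAccess` are hypothetical and exist only for this illustration. Only the `manual_wal_flush` option and the `FlushWAL(bool sync)` call shape are taken from the diff.

```cpp
#include <cassert>  // for the usage checks described below
#include <mutex>

// Hypothetical stand-ins for the real options and DB types.
struct ToyOptions {
  bool manual_wal_flush = false;
};

struct ToyDb {
  ToyOptions options;
  int flush_calls = 0;  // counts flushes so the guard's effect is observable
  void FlushWAL(bool /*sync*/) { ++flush_calls; }
};

// Mirrors the patched logic: flush the WAL buffer only in manual-flush mode.
inline void FlushWalForCheckpoint(ToyDb& db) {
  if (db.options.manual_wal_flush) {
    db.FlushWAL(false /* sync */);
  }
}

// Assumed application-side wrapper: one mutex serializes the application's
// own WAL flushes and the checkpoint step, so the two never overlap.
class SerializedWalAccess {
 public:
  explicit SerializedWalAccess(ToyDb& db) : db_(db) {}

  void Flush(bool sync) {
    std::lock_guard<std::mutex> lock(mu_);
    db_.FlushWAL(sync);
  }

  void Checkpoint() {
    std::lock_guard<std::mutex> lock(mu_);
    FlushWalForCheckpoint(db_);
  }

 private:
  ToyDb& db_;
  std::mutex mu_;
};
```

With `manual_wal_flush` left at `false`, `FlushWalForCheckpoint` never touches the buffer, so it cannot race with a write thread. With it set to `true`, routing both `Flush` and `Checkpoint` through the same mutex is one way an application might meet the synchronization requirement. Other schemes, such as a single dedicated flush thread, would work as well.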