rocksdb/db
Andrew Kryczka f24c39ab3d Prevent corruption with parallel manual compactions and change_level == true (#9077)
Summary:
The bug can impact the following scenario. There must be two `CompactRange()`s, call them A and B. Compaction A must have `change_level=true`. Compactions A and B must run in parallel, and new data must be added while they run as well.

Now, on to the details of the race condition. Compaction A must reach the refitting phase while B's next step is to trivial move new data (i.e., data that has been inserted behind A) down to the same level that A's refit targets (`CompactRangeOptions::target_level`). B must be unregistered  (i.e., has not yet called `AddManualCompaction()` for the current `RunManualCompaction()`) while A invokes `DisableManualCompaction()`s to prepare for refitting. In the old code, B could still proceed to register a manual compaction, while A had disabled manual compaction.

The next part of the race condition is B picks and schedules a trivial move while A has released the lock in refitting phase in order to persist the LSM state change (i.e., the log phase of `LogAndApply()`). That way, B does not see the refitted data when picking a trivial-move compaction. So it is susceptible to picking one that overlaps.

Finally, B executes the picked trivial-move compaction. Trivial-move compactions are special in that they never check whether manual compaction is disabled. So the picked compaction causing overlap ends up being applied, leading to LSM corruption if `force_consistency_checks=false`, or entering read-only mode with `Status::Corruption` if `force_consistency_checks=true` (the default).

The fix is just to prevent B from registering itself in `RunManualCompaction()` while manual compactions are disabled, consequently preventing any trivial move or other compaction from being picked/scheduled.

Thanks to siying for finding the bug.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9077

Test Plan: The test does not go all the way in exposing the bug because it requires a compaction to be picked/scheduled while logging LSM state change for RefitLevel(). But the fix is to make such a compaction not picked/scheduled in the first place, so any repro of that scenario would end up hanging RefitLevel() logging. So instead I just verified no such compaction is registered in the scenario where `RefitLevel()` disables manual compactions.

Reviewed By: siying

Differential Revision: D31921908

Pulled By: ajkr

fbshipit-source-id: 9bb5d0e847ad428211227f40830c685c209fbecb
2021-10-27 23:08:56 -07:00
..
blob Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
compaction Incremental Space Amp Compactions in Universal Style (#8655) 2021-10-20 10:04:13 -07:00
db_impl Prevent corruption with parallel manual compactions and change_level == true (#9077) 2021-10-27 23:08:56 -07:00
arena_wrapped_db_iter.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
arena_wrapped_db_iter.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
builder.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
builder.h Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
c_test.c Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
c.cc Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
column_family_test.cc Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
column_family.cc Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
column_family.h Add DB properties for BlobDB (#8734) 2021-09-08 12:22:04 -07:00
compact_files_test.cc Compaction should not move data to up level (#8116) 2021-03-29 17:10:42 -07:00
comparator_db_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
convenience.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
corruption_test.cc Add CreateFrom methods to Env/FileSystem (#8174) 2021-06-15 03:43:48 -07:00
cuckoo_table_db_test.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
db_basic_test.cc Make DB::Close() thread-safe (#8970) 2021-10-18 20:32:35 -07:00
db_block_cache_test.cc Fix PrepopulateBlockCache::kFlushOnly (#8750) 2021-09-15 15:33:20 -07:00
db_bloom_filter_test.cc Make SliceTransform into a Customizable class (#8641) 2021-09-27 07:43:47 -07:00
db_compaction_filter_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_compaction_test.cc Prevent corruption with parallel manual compactions and change_level == true (#9077) 2021-10-27 23:08:56 -07:00
db_dynamic_level_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
db_encryption_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_filesnapshot.cc Add (Live)FileStorageInfo API (#8968) 2021-10-16 10:04:32 -07:00
db_flush_test.cc Fix atomic flush waiting forever for MANIFEST write (#9034) 2021-10-20 21:34:47 -07:00
db_info_dumper.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
db_info_dumper.h Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_inplace_update_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_io_failure_test.cc Make MemTableRepFactory into a Customizable class (#8419) 2021-09-08 07:46:44 -07:00
db_iter_stress_test.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
db_iter_test.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
db_iter.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
db_iter.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
db_iterator_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_kv_checksum_test.cc Allow WriteBatch to have keys with different timestamp sizes (#8725) 2021-09-12 15:34:26 -07:00
db_log_iter_test.cc Attempt to deflake DBTestXactLogIterator.TransactionLogIteratorCorruptedLog (#8627) 2021-08-10 11:10:07 -07:00
db_logical_block_size_cache_test.cc Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) 2020-12-09 21:21:16 -08:00
db_memtable_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_merge_operand_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_merge_operator_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_options_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_properties_test.cc Add support for building on s390x platform (#8962) 2021-10-22 10:13:15 -07:00
db_range_del_test.cc Incremental Space Amp Compactions in Universal Style (#8655) 2021-10-20 10:04:13 -07:00
db_secondary_test.cc Make MemTableRepFactory into a Customizable class (#8419) 2021-09-08 07:46:44 -07:00
db_sst_test.cc Don't ignore deletion rate limit if WAL dir is different (#8967) 2021-09-30 13:26:31 -07:00
db_statistics_test.cc Bytes read stat for VerifyChecksum() and VerifyFileChecksums() APIs (#8741) 2021-09-07 13:28:29 -07:00
db_table_properties_test.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
db_tailing_iter_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_test2.cc Fix out-of-bounds access in MultiDBParallelOpenTest (#9046) 2021-10-18 21:25:45 -07:00
db_test_util.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
db_test_util.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
db_test.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
db_universal_compaction_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
db_wal_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_with_timestamp_basic_test.cc Support SingleDelete for user-defined timestamps (#8921) 2021-09-27 11:51:07 -07:00
db_with_timestamp_compaction_test.cc Make MemTableRepFactory into a Customizable class (#8419) 2021-09-08 07:46:44 -07:00
db_write_buffer_manager_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_write_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
dbformat_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
dbformat.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
dbformat.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
deletefile_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
error_handler_fs_test.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
error_handler.cc DB::GetSortedWalFiles() to ensure file deletion is disabled (#8591) 2021-07-29 11:51:08 -07:00
error_handler.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
event_helpers.cc Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
event_helpers.h Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
experimental.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
external_sst_file_basic_test.cc Protect existing files in FaultInjectionTest{Env,FS}::ReopenWritableFile() (#8995) 2021-10-11 16:23:18 -07:00
external_sst_file_ingestion_job.cc Ingest external SST files with Temperature hints (#8949) 2021-10-08 10:32:24 -07:00
external_sst_file_ingestion_job.h Fix sequence number bump logic in multi-CF SST ingestion (#9005) 2021-10-12 20:39:52 -07:00
external_sst_file_test.cc Fix sequence number bump logic in multi-CF SST ingestion (#9005) 2021-10-12 20:39:52 -07:00
fault_injection_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
file_indexer_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
file_indexer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
file_indexer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
filename_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
flush_job_test.cc Fix NotifyOnFlushCompleted() for atomic flush (#8585) 2021-08-03 13:31:10 -07:00
flush_job.cc Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
flush_job.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
flush_scheduler.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
flush_scheduler.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
forward_iterator_bench.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
forward_iterator.cc Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
forward_iterator.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
import_column_family_job.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
import_column_family_job.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
import_column_family_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
internal_stats.cc Support GetMapProperty() with "rocksdb.dbstats" (#9057) 2021-10-20 13:17:00 -07:00
internal_stats.h Support GetMapProperty() with "rocksdb.dbstats" (#9057) 2021-10-20 13:17:00 -07:00
job_context.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
kv_checksum.h Fix some lint warnings reported on 6.25 (#8945) 2021-09-27 11:43:20 -07:00
listener_test.cc Add file operation callbacks to SequentialFileReader (#8982) 2021-10-05 10:51:59 -07:00
log_format.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
log_reader.cc Fix kPointInTimeRecovery handling of truncated WAL (#7701) 2020-11-30 18:11:38 -08:00
log_reader.h Real fix for race in backup custom checksum checking (#7309) 2020-08-26 10:39:20 -07:00
log_test.cc Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
log_writer.cc Using existing crc32c checksum in checksum handoff for Manifest and WAL (#8412) 2021-06-25 00:47:17 -07:00
log_writer.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
logs_with_prep_tracker.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
logs_with_prep_tracker.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
lookup_key.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
malloc_stats.cc Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
malloc_stats.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
manual_compaction_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
memtable_list_test.cc Fix NotifyOnFlushCompleted() for atomic flush (#8585) 2021-08-03 13:31:10 -07:00
memtable_list.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
memtable_list.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
memtable.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
memtable.h Adapt key-value checksum for timestamp-suffixed keys (#8914) 2021-09-14 13:14:39 -07:00
merge_context.h Add Merge Operator support to WriteBatchWithIndex (#8135) 2021-05-10 12:50:25 -07:00
merge_helper_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
merge_helper.cc Add support for Merge with base value during Compaction in IntegratedBlobDB (#8445) 2021-06-24 18:11:30 -07:00
merge_helper.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
merge_operator.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_test.cc Built-in support for generating unique IDs, bug fix (#8708) 2021-08-30 15:20:41 -07:00
obsolete_files_test.cc Deflaky ObsoleteFilesTest (#9049) 2021-10-18 15:15:23 -07:00
options_file_test.cc No elide constructors (#7798) 2020-12-23 16:55:53 -08:00
output_validator.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
output_validator.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
perf_context_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
periodic_work_scheduler_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
periodic_work_scheduler.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
periodic_work_scheduler.h Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
pinned_iterators_manager.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
plain_table_db_test.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
pre_release_callback.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
prefix_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
range_del_aggregator_bench.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
range_del_aggregator_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
range_del_aggregator.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
range_del_aggregator.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
range_tombstone_fragmenter_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
range_tombstone_fragmenter.cc Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
range_tombstone_fragmenter.h Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
read_callback.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
repair_test.cc Some fixes and enhancements to ldb repair (#8544) 2021-07-28 16:44:14 -07:00
repair.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
snapshot_checker.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_impl.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_impl.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
table_cache.cc Add file temperature related counter and bytes stats to and io_stats (#8710) 2021-10-07 14:58:41 -07:00
table_cache.h Add file temperature related counter and bytes stats to and io_stats (#8710) 2021-10-07 14:58:41 -07:00
table_properties_collector_test.cc Support "level_at_creation" in TablePropertiesCollectorFactory::Context (#8919) 2021-09-28 12:35:24 -07:00
table_properties_collector.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
table_properties_collector.h Support "level_at_creation" in TablePropertiesCollectorFactory::Context (#8919) 2021-09-28 12:35:24 -07:00
transaction_log_impl.cc Add file operation callbacks to SequentialFileReader (#8982) 2021-10-05 10:51:59 -07:00
transaction_log_impl.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
trim_history_scheduler.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trim_history_scheduler.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
version_builder_test.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
version_builder.cc Add file temperature related counter and bytes stats to and io_stats (#8710) 2021-10-07 14:58:41 -07:00
version_builder.h Handle blob files when options.best_efforts_recovery is true (#8180) 2021-04-19 11:56:14 -07:00
version_edit_handler.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
version_edit_handler.h Fixed manifest_dump issues when printing keys and values containing null characters (#8378) 2021-06-10 12:55:20 -07:00
version_edit_test.cc Make it able to ignore WAL related VersionEdits in older versions (#7873) 2021-01-19 19:27:53 -08:00
version_edit.cc Write file temperature information to manifest (#8284) 2021-05-17 15:15:23 -07:00
version_edit.h Write file temperature information to manifest (#8284) 2021-05-17 15:15:23 -07:00
version_set_test.cc Add support for building on s390x platform (#8962) 2021-10-22 10:13:15 -07:00
version_set.cc Add (Live)FileStorageInfo API (#8968) 2021-10-16 10:04:32 -07:00
version_set.h Add (Live)FileStorageInfo API (#8968) 2021-10-16 10:04:32 -07:00
wal_edit_test.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.h Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_manager_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
wal_manager.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
wal_manager.h Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
write_batch_base.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_batch_internal.h Adapt key-value checksum for timestamp-suffixed keys (#8914) 2021-09-14 13:14:39 -07:00
write_batch_test.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
write_batch.cc Adapt key-value checksum for timestamp-suffixed keys (#8914) 2021-09-14 13:14:39 -07:00
write_callback_test.cc Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) 2021-07-07 11:14:05 -07:00
write_callback.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_controller_test.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.h Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_thread.cc Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
write_thread.h typo: fix typo in db/write_thread's state (#8423) 2021-06-18 17:14:51 -07:00