rocksdb/db
Nikhil Benesch 5f3088d565 Range deletion performance improvements + cleanup (#4014)
Summary:
This fixes the same performance issue that #3992 fixes but with much more invasive cleanup.

I'm more excited about this PR because it paves the way for fixing another problem we uncovered at Cockroach where range deletion tombstones can cause massive compactions. For example, suppose L4 contains deletions from [a, c) and [x, z) and no other keys, and L5 is entirely empty. L6, however, is full of data. When compacting L4 -> L5, we'll end up with one file that spans, massively, from [a, z). When we go to compact L5 -> L6, we'll have to rewrite all of L6! If, instead of range deletions in L4, we had keys a, b, x, y, and z, RocksDB would have been smart enough to create two files in L5: one for a and b and another for x, y, and z.

With the changes in this PR, it will be possible to adjust the compaction logic to split tombstones/start new output files when they would span too many files in the grandparent level.

ajkr please take a look when you have a minute!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4014

Differential Revision: D8773253

Pulled By: ajkr

fbshipit-source-id: ec62fa85f648fdebe1380b83ed997f9baec35677
2018-07-12 14:42:39 -07:00
..
builder.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
builder.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
c_test.c comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
c.cc Add bottommost_compression_opts to for bottommost_compression (#3985) 2018-06-27 17:42:38 -07:00
column_family_test.cc Extend some tests to format_version=3 (#3942) 2018-06-04 20:13:00 -07:00
column_family.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
column_family.h Add max_subcompactions as a compaction option 2018-04-27 11:57:39 -07:00
compact_files_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
compacted_db_impl.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
compacted_db_impl.h Comment out unused variables 2018-03-05 13:13:41 -08:00
compaction_iteration_stats.h add counter for deletion dropping optimization 2017-08-19 14:10:08 -07:00
compaction_iterator_test.cc option for timing measurement of non-blocking ops during compaction (#4029) 2018-06-21 21:28:05 -07:00
compaction_iterator.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
compaction_iterator.h option for timing measurement of non-blocking ops during compaction (#4029) 2018-06-21 21:28:05 -07:00
compaction_job_stats_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
compaction_job_test.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
compaction_job.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
compaction_job.h Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
compaction_picker_test.cc Delete triggered compaction for universal style 2018-05-29 15:44:34 -07:00
compaction_picker_universal.cc Add bottommost_compression_opts to for bottommost_compression (#3985) 2018-06-27 17:42:38 -07:00
compaction_picker_universal.h Delete triggered compaction for universal style 2018-05-29 15:44:34 -07:00
compaction_picker.cc compaction: fix max_subcompactions option for CompactRange (#4082) 2018-07-05 20:12:56 -07:00
compaction_picker.h Add bottommost_compression_opts to for bottommost_compression (#3985) 2018-06-27 17:42:38 -07:00
compaction.cc Add bottommost_compression_opts to for bottommost_compression (#3985) 2018-06-27 17:42:38 -07:00
compaction.h Add bottommost_compression_opts to for bottommost_compression (#3985) 2018-06-27 17:42:38 -07:00
comparator_db_test.cc use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899) 2018-06-26 15:57:26 -07:00
convenience.cc Pin mmap files in ReadOnlyDB (#4053) 2018-06-27 17:13:34 -07:00
corruption_test.cc Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs 2018-05-17 02:56:56 -07:00
cuckoo_table_db_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
db_basic_test.cc Disallow compactions if there isn't enough free space 2018-03-06 16:27:54 -08:00
db_blob_index_test.cc fix lite build 2017-10-17 08:57:09 -07:00
db_block_cache_test.cc LRUCache midpoint insertion 2018-05-24 15:57:33 -07:00
db_bloom_filter_test.cc PrefixMayMatch: remove unnecessary check for prefix_extractor_ (#4067) 2018-06-27 20:42:43 -07:00
db_compaction_filter_test.cc Remove tests from ROCKSDB_VALGRIND_RUN 2018-05-30 16:15:16 -07:00
db_compaction_test.cc Check conflict at output level in CompactFiles (#3926) 2018-06-05 14:14:05 -07:00
db_dynamic_level_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
db_encryption_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_filesnapshot.cc fix behavior does not match name for "IsFileDeletionsEnabled" 2018-03-21 22:13:34 -07:00
db_flush_test.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
db_impl_compaction_flush.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
db_impl_debug.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
db_impl_experimental.cc Inform caller when rocksdb is stalling writes 2017-10-05 18:11:43 -07:00
db_impl_files.cc SetOptions Backup Race Condition (#4108) 2018-07-11 14:57:46 -07:00
db_impl_open.cc WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) 2018-06-28 18:58:29 -07:00
db_impl_readonly.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
db_impl_readonly.h allowing CompactFiles to return new file names 2018-03-15 11:58:12 -07:00
db_impl_write.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
db_impl.cc SetOptions Backup Race Condition (#4108) 2018-07-11 14:57:46 -07:00
db_impl.h WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00
db_info_dumper.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_info_dumper.h Change RocksDB License 2017-07-15 16:11:23 -07:00
db_inplace_update_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_io_failure_test.cc Fix LITE unit tests 2017-07-26 21:11:47 -07:00
db_iter_stress_test.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
db_iter_test.cc Fix regression bug of Prev() with upper bound (#3989) 2018-06-12 16:57:36 -07:00
db_iter.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
db_iter.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
db_iterator_test.cc PrefixMayMatch: remove unnecessary check for prefix_extractor_ (#4067) 2018-06-27 20:42:43 -07:00
db_log_iter_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
db_memtable_test.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
db_merge_operator_test.cc WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) 2018-06-27 12:23:07 -07:00
db_options_test.cc PersistRocksDBOptions() to use WritableFileWriter 2018-05-21 16:42:22 -07:00
db_properties_test.cc Add table property tracking number of range deletions (#4016) 2018-06-26 20:27:35 -07:00
db_range_del_test.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
db_sst_test.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
db_statistics_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_table_properties_test.cc fix deletion-triggered compaction in table builder 2017-09-28 18:17:30 -07:00
db_tailing_iter_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
db_test2.cc Remove ReadOnly part of PinnableSliceAndMmapReads from Lite (#4070) 2018-06-28 08:42:17 -07:00
db_test_util.cc Extend format 3 to partitioned index/filters (#3958) 2018-06-06 16:58:16 -07:00
db_test_util.h Extend some tests to format_version=3 (#3942) 2018-06-04 20:13:00 -07:00
db_test.cc use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899) 2018-06-26 15:57:26 -07:00
db_universal_compaction_test.cc Fix universal compaction scheduling conflict with CompactFiles (#4055) 2018-06-26 10:44:56 -07:00
db_wal_test.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
db_write_test.cc Suppress tsan lock-order-inversion on FlushWAL 2018-05-14 21:13:35 -07:00
dbformat_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
dbformat.cc add kEntryRangeDeletion 2018-04-13 11:27:17 -07:00
dbformat.h WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00
deletefile_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
error_handler_test.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
error_handler.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
error_handler.h Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
event_helpers.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
event_helpers.h Change RocksDB License 2017-07-15 16:11:23 -07:00
experimental.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
external_sst_file_basic_test.cc optimize file ingestion checks for range deletion overlap 2017-11-28 11:27:02 -08:00
external_sst_file_ingestion_job.cc Fix a map lookup that may throw exception. (#4098) 2018-07-06 16:12:49 -07:00
external_sst_file_ingestion_job.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
external_sst_file_test.cc Fix ExternalSSTFileTest::OverlappingRanges test on Solaris Sparc (#4012) 2018-06-18 14:57:37 -07:00
fault_injection_test.cc Split FaultInjectionTest.FaultTest to avoid timeout 2018-05-07 12:29:58 -07:00
file_indexer_test.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
file_indexer.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
file_indexer.h Change RocksDB License 2017-07-15 16:11:23 -07:00
filename_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
flush_job_test.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
flush_job.cc Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
flush_job.h Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
flush_scheduler.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
flush_scheduler.h Change RocksDB License 2017-07-15 16:11:23 -07:00
forward_iterator_bench.cc fix gflags namespace 2017-12-01 10:42:05 -08:00
forward_iterator.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
forward_iterator.h Comment out unused variables 2018-03-05 13:13:41 -08:00
internal_stats.cc Add kOptionsStatistics to GetProperty() (#3966) 2018-06-15 17:28:01 -07:00
internal_stats.h Add kOptionsStatistics to GetProperty() (#3966) 2018-06-15 17:28:01 -07:00
job_context.h Introduce and use the option to disable stall notifications structures 2018-05-09 10:13:53 -07:00
listener_test.cc fix some text in comments. 2018-04-10 15:59:24 -07:00
log_format.h Change RocksDB License 2017-07-15 16:11:23 -07:00
log_reader.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
log_reader.h Suppress lint in old files 2018-01-29 12:56:42 -08:00
log_test.cc Add file name info to SequentialFileReader. (#4026) 2018-06-21 08:42:24 -07:00
log_writer.cc Pass manual_wal_flush also to the first wal file 2018-05-14 10:57:56 -07:00
log_writer.h Pass manual_wal_flush also to the first wal file 2018-05-14 10:57:56 -07:00
logs_with_prep_tracker.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
logs_with_prep_tracker.h Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
malloc_stats.cc Disallow compactions if there isn't enough free space 2018-03-06 16:27:54 -08:00
malloc_stats.h Change RocksDB License 2017-07-15 16:11:23 -07:00
managed_iterator.cc Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs 2018-05-17 02:56:56 -07:00
managed_iterator.h Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs 2018-05-17 02:56:56 -07:00
manual_compaction_test.cc Speedup ManualCompactionTest.Test 2018-05-03 16:13:09 -07:00
memtable_list_test.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
memtable_list.cc Support group commits of version edits (#3944) 2018-06-28 12:34:39 -07:00
memtable_list.h Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
memtable.cc WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) 2018-06-27 12:23:07 -07:00
memtable.h InlineSkiplist: don't decode keys unnecessarily during comparisons 2018-03-23 12:14:30 -07:00
merge_context.h Change RocksDB License 2017-07-15 16:11:23 -07:00
merge_helper_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
merge_helper.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
merge_helper.h WritePrepared Txn: Support merge operator 2018-02-09 14:57:54 -08:00
merge_operator.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
merge_test.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
obsolete_files_test.cc SetOptions Backup Race Condition (#4108) 2018-07-11 14:57:46 -07:00
options_file_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
perf_context_test.cc Improve write time breakdown stats 2018-04-23 17:58:54 -07:00
pinned_iterators_manager.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_db_test.cc Should only decode restart points for uncompressed blocks (#3996) 2018-06-15 19:26:58 -07:00
pre_release_callback.h Fix pre_release callback argument list. 2018-04-05 11:12:16 -07:00
prefix_test.cc use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899) 2018-06-26 15:57:26 -07:00
range_del_aggregator_test.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
range_del_aggregator.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
range_del_aggregator.h Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
read_callback.h WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) 2018-06-27 12:23:07 -07:00
repair_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
repair.cc Add file name info to SequentialFileReader. (#4026) 2018-06-21 08:42:24 -07:00
snapshot_checker.h Comment out unused variables 2018-03-05 13:13:41 -08:00
snapshot_impl.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
snapshot_impl.h WritePrepared Txn: smallest_prepare optimization 2018-04-02 20:27:41 -07:00
table_cache.cc Pin mmap files in ReadOnlyDB (#4053) 2018-06-27 17:13:34 -07:00
table_cache.h Pin mmap files in ReadOnlyDB (#4053) 2018-06-27 17:13:34 -07:00
table_properties_collector_test.cc Should only decode restart points for uncompressed blocks (#3996) 2018-06-15 19:26:58 -07:00
table_properties_collector.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
table_properties_collector.h Comment out unused variables 2018-03-05 13:13:41 -08:00
transaction_log_impl.cc WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00
transaction_log_impl.h WritePrepared Txn: Refactor conf params 2017-11-10 17:28:12 -08:00
version_builder_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
version_builder.cc Delay verify compaction output table (#3979) 2018-06-15 12:42:53 -07:00
version_builder.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
version_edit_test.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
version_edit.cc Specify the underlying type of enums. 2018-05-23 16:12:59 -07:00
version_edit.h Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
version_set_test.cc Support group commits of version edits (#3944) 2018-06-28 12:34:39 -07:00
version_set.cc fix clang analyzer warnings (#4072) 2018-06-28 19:12:35 -07:00
version_set.h Support group commits of version edits (#3944) 2018-06-28 12:34:39 -07:00
wal_manager_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
wal_manager.cc Add file name info to SequentialFileReader. (#4026) 2018-06-21 08:42:24 -07:00
wal_manager.h Fix memleak when DB::DeleteFile() 2018-01-11 18:57:33 -08:00
write_batch_base.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_batch_internal.h WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) 2018-06-28 18:58:29 -07:00
write_batch_test.cc WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00
write_batch.cc WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00
write_callback_test.cc WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) 2018-06-28 18:58:29 -07:00
write_callback.h Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller.h Change RocksDB License 2017-07-15 16:11:23 -07:00
write_thread.cc Avoid sleep in DBTest.GroupCommitTest to fix flakiness 2018-05-22 12:16:25 -07:00
write_thread.h WritePrepared Txn: Duplicate Keys, Txn Part 2018-02-05 18:43:24 -08:00