rocksdb/db
Abhishek Madan 7528130e38 Cache fragmented range tombstones in BlockBasedTableReader (#4493)
Summary:
This allows tombstone fragmenting to only be performed when the table is opened, and cached for subsequent accesses.

On the same DB used in #4449, running `readrandom` results in the following:
```
readrandom   :       0.983 micros/op 1017076 ops/sec;   78.3 MB/s (63103 of 100000 found)
```

Now that Get performance in the presence of range tombstones is reasonable, I also compared the performance between a DB with range tombstones, "expanded" range tombstones (several point tombstones that cover the same keys the equivalent range tombstone would cover, a common workaround for DeleteRange), and no range tombstones. The created DBs had 5 million keys each, and DeleteRange was called at regular intervals (depending on the total number of range tombstones being written) after 4.5 million Puts. The table below summarizes the results of a `readwhilewriting` benchmark (in order to provide somewhat more realistic results):
```
   Tombstones?    | avg micros/op | stddev micros/op |  avg ops/s   | stddev ops/s
----------------- | ------------- | ---------------- | ------------ | ------------
None              |        0.6186 |          0.04637 | 1,625,252.90 | 124,679.41
500 Expanded      |        0.6019 |          0.03628 | 1,666,670.40 | 101,142.65
500 Unexpanded    |        0.6435 |          0.03994 | 1,559,979.40 | 104,090.52
1k Expanded       |        0.6034 |          0.04349 | 1,665,128.10 | 125,144.57
1k Unexpanded     |        0.6261 |          0.03093 | 1,600,457.50 |  79,024.94
5k Expanded       |        0.6163 |          0.05926 | 1,636,668.80 | 154,888.85
5k Unexpanded     |        0.6402 |          0.04002 | 1,567,804.70 | 100,965.55
10k Expanded      |        0.6036 |          0.05105 | 1,667,237.70 | 142,830.36
10k Unexpanded    |        0.6128 |          0.02598 | 1,634,633.40 |  72,161.82
25k Expanded      |        0.6198 |          0.04542 | 1,620,980.50 | 116,662.93
25k Unexpanded    |        0.5478 |          0.0362  | 1,833,059.10 | 121,233.81
50k Expanded      |        0.5104 |          0.04347 | 1,973,107.90 | 184,073.49
50k Unexpanded    |        0.4528 |          0.03387 | 2,219,034.50 | 170,984.32
```

After a large enough quantity of range tombstones are written, range tombstone Gets can become faster than reading from an equivalent DB with several point tombstones.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4493

Differential Revision: D10842844

Pulled By: abhimadan

fbshipit-source-id: a7d44534f8120e6aabb65779d26c6b9df954c509
2018-10-25 19:26:44 -07:00
..
builder.cc Add listener to sample file io (#3933) 2018-10-12 18:36:11 -07:00
builder.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
c_test.c Adjust c test and fix windows compilation issues 2018-09-14 20:57:22 -07:00
c.cc Memory usage stats in C API (#4340) 2018-09-13 14:27:31 -07:00
column_family_test.cc add locking around calls to RecalculateWriteStallConditions in column_family_test (#4474) 2018-10-09 14:10:13 -07:00
column_family.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
column_family.h Add max_subcompactions as a compaction option 2018-04-27 11:57:39 -07:00
compact_files_test.cc Check for compression lib support before test exec (#4443) 2018-10-02 10:42:01 -07:00
compacted_db_impl.cc move dump stats to a separate thread (#4382) 2018-10-08 22:54:43 -07:00
compacted_db_impl.h Comment out unused variables 2018-03-05 13:13:41 -08:00
compaction_iteration_stats.h add counter for deletion dropping optimization 2017-08-19 14:10:08 -07:00
compaction_iterator_test.cc Drop unnecessary deletion markers during compaction (issue - 3842) (#4289) 2018-08-24 15:17:54 -07:00
compaction_iterator.cc Avoid per-key linear scan over snapshots in compaction (#4495) 2018-10-15 16:21:22 -07:00
compaction_iterator.h option for timing measurement of non-blocking ops during compaction (#4029) 2018-06-21 21:28:05 -07:00
compaction_job_stats_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
compaction_job_test.cc Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
compaction_job.cc Add listener to sample file io (#3933) 2018-10-12 18:36:11 -07:00
compaction_job.h Allow DB resume after background errors (#3997) 2018-06-28 12:34:40 -07:00
compaction_picker_test.cc Remove random writes from SST file ingestion (#4172) 2018-07-27 16:12:23 -07:00
compaction_picker_universal.cc Remove random writes from SST file ingestion (#4172) 2018-07-27 16:12:23 -07:00
compaction_picker_universal.h Delete triggered compaction for universal style 2018-05-29 15:44:34 -07:00
compaction_picker.cc Properly determine a truncated CompactRange stop key (#4496) 2018-10-15 23:22:51 -07:00
compaction_picker.h Properly determine a truncated CompactRange stop key (#4496) 2018-10-15 23:22:51 -07:00
compaction.cc Truncate range tombstones by leveraging InternalKeys (#4432) 2018-10-09 15:19:38 -07:00
compaction.h Truncate range tombstones by leveraging InternalKeys (#4432) 2018-10-09 15:19:38 -07:00
comparator_db_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
convenience.cc Pin mmap files in ReadOnlyDB (#4053) 2018-06-27 17:13:34 -07:00
corruption_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
cuckoo_table_db_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
db_basic_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
db_blob_index_test.cc fix lite build 2017-10-17 08:57:09 -07:00
db_block_cache_test.cc LRUCache midpoint insertion 2018-05-24 15:57:33 -07:00
db_bloom_filter_test.cc use per-level perf context for bloom filter related counters (#4581) 2018-10-24 12:21:38 -07:00
db_compaction_filter_test.cc Remove tests from ROCKSDB_VALGRIND_RUN 2018-05-30 16:15:16 -07:00
db_compaction_test.cc Fix user comparator receiving internal key (#4575) 2018-10-23 08:14:46 -07:00
db_dynamic_level_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
db_encryption_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_filesnapshot.cc avoid copying when iterating using range-based for (#4459) 2018-10-09 17:15:51 -07:00
db_flush_test.cc Add inline comments to flush job (#4464) 2018-10-05 15:41:17 -07:00
db_impl_compaction_flush.cc Add support to flush multiple CFs atomically (#4262) 2018-10-15 20:01:17 -07:00
db_impl_debug.cc move dump stats to a separate thread (#4382) 2018-10-08 22:54:43 -07:00
db_impl_experimental.cc Update JobContext. (#3949) 2018-08-03 17:42:34 -07:00
db_impl_files.cc SetOptions Backup Race Condition (#4108) 2018-07-11 14:57:46 -07:00
db_impl_open.cc Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
db_impl_readonly.cc Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
db_impl_readonly.h allowing CompactFiles to return new file names 2018-03-15 11:58:12 -07:00
db_impl_write.cc Add listener to sample file io (#3933) 2018-10-12 18:36:11 -07:00
db_impl.cc Remove unused variable 2018-10-24 15:51:45 -07:00
db_impl.h Add support to flush multiple CFs atomically (#4262) 2018-10-15 20:01:17 -07:00
db_info_dumper.cc avoid copying when iterating using range-based for (#4459) 2018-10-09 17:15:51 -07:00
db_info_dumper.h Change RocksDB License 2017-07-15 16:11:23 -07:00
db_inplace_update_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_io_failure_test.cc Fix LITE unit tests 2017-07-26 21:11:47 -07:00
db_iter_stress_test.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
db_iter_test.cc Fix regression bug of Prev() with upper bound (#3989) 2018-06-12 16:57:36 -07:00
db_iter.cc Fix merge operand reappearing when covered by DeleteRange (#4481) 2018-10-10 18:16:12 -07:00
db_iter.h Add tracing function of Seek() and SeekForPrev() to trace_replay (#4228) 2018-08-10 17:57:40 -07:00
db_iterator_test.cc Add tracing function of Seek() and SeekForPrev() to trace_replay (#4228) 2018-08-10 17:57:40 -07:00
db_log_iter_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
db_memtable_test.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
db_merge_operator_test.cc WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) 2018-06-27 12:23:07 -07:00
db_options_test.cc move dump stats to a separate thread (#4382) 2018-10-08 22:54:43 -07:00
db_properties_test.cc Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
db_range_del_test.cc Lazily initialize RangeDelAggregator stripe map entries (#4497) 2018-10-17 11:47:34 -07:00
db_sst_test.cc Simplify DBWithMaxSpaceAllowedRandomized (#4235) 2018-08-08 07:27:46 -07:00
db_statistics_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_table_properties_test.cc Allow dynamic modification of window size and deletion trigger (#4403) 2018-09-20 15:15:28 -07:00
db_tailing_iter_test.cc Remove managed iterator 2018-07-17 14:43:18 -07:00
db_test2.cc support OnCompactionBegin (#4431) 2018-10-10 17:32:27 -07:00
db_test_util.cc Fix bug in partition filters with format_version=4 (#4381) 2018-09-17 17:28:15 -07:00
db_test_util.h Fix RepeatableThreadTest::MockEnvTest hang (#4560) 2018-10-21 20:17:18 -07:00
db_test.cc Disable GroupCommitTest in Appveyor (#4536) 2018-10-18 14:21:09 -07:00
db_universal_compaction_test.cc Adapt three unit tests with newer compiler/libraries (#4562) 2018-10-24 08:17:56 -07:00
db_wal_test.cc Improve log handling when recover without flush (#4405) 2018-09-26 10:37:48 -07:00
db_write_test.cc Suppress tsan lock-order-inversion on FlushWAL 2018-05-14 21:13:35 -07:00
dbformat_test.cc Relax VersionStorageInfo::GetOverlappingInputs check (#4050) 2018-07-13 17:42:38 -07:00
dbformat.cc types: add kEntryBlobIndex for TablePropertiesCollector (#4233) 2018-08-06 18:27:44 -07:00
dbformat.h Fix two contrun job failures (#4587) 2018-10-24 20:16:45 -07:00
deletefile_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
error_handler_test.cc Fix regression test failures introduced by PR #4164 (#4375) 2018-09-17 13:14:07 -07:00
error_handler.cc Fix typos in comments (#4456) 2018-10-04 20:46:50 -07:00
error_handler.h Fix typos in comments (#4456) 2018-10-04 20:46:50 -07:00
event_helpers.cc Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
event_helpers.h Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
experimental.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
external_sst_file_basic_test.cc Support range deletion tombstones in IngestExternalFile SSTs (#3778) 2018-07-13 22:43:09 -07:00
external_sst_file_ingestion_job.cc Fix typos in comments (#4456) 2018-10-04 20:46:50 -07:00
external_sst_file_ingestion_job.h Remove random writes from SST file ingestion (#4172) 2018-07-27 16:12:23 -07:00
external_sst_file_test.cc Pending output file number should be released after bulkload failure (#4145) 2018-07-17 14:13:16 -07:00
fault_injection_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
file_indexer_test.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
file_indexer.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
file_indexer.h Change RocksDB License 2017-07-15 16:11:23 -07:00
filename_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
flush_job_test.cc Add support to flush multiple CFs atomically (#4262) 2018-10-15 20:01:17 -07:00
flush_job.cc Add support to flush multiple CFs atomically (#4262) 2018-10-15 20:01:17 -07:00
flush_job.h Add support to flush multiple CFs atomically (#4262) 2018-10-15 20:01:17 -07:00
flush_scheduler.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
flush_scheduler.h Change RocksDB License 2017-07-15 16:11:23 -07:00
forward_iterator_bench.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
forward_iterator.cc FindFile: use std::lower_bound reduce the repeated code. (#4372) 2018-09-27 10:35:00 -07:00
forward_iterator.h Comment out unused variables 2018-03-05 13:13:41 -08:00
internal_stats.cc Add kOptionsStatistics to GetProperty() (#3966) 2018-06-15 17:28:01 -07:00
internal_stats.h Add kOptionsStatistics to GetProperty() (#3966) 2018-06-15 17:28:01 -07:00
job_context.h Update JobContext. (#3949) 2018-08-03 17:42:34 -07:00
listener_test.cc Add listener to sample file io (#3933) 2018-10-12 18:36:11 -07:00
log_format.h Fix an inaccurate comment (#4315) 2018-08-24 18:13:20 -07:00
log_reader.cc fix clang analyzer error (#4583) 2018-10-23 22:14:54 -07:00
log_reader.h Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
log_test.cc Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
log_writer.cc Pass manual_wal_flush also to the first wal file 2018-05-14 10:57:56 -07:00
log_writer.h Pass manual_wal_flush also to the first wal file 2018-05-14 10:57:56 -07:00
logs_with_prep_tracker.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
logs_with_prep_tracker.h Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
malloc_stats.cc Fix compile error with jemalloc (#4488) 2018-10-12 11:50:50 -07:00
malloc_stats.h Change RocksDB License 2017-07-15 16:11:23 -07:00
manual_compaction_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
memtable_list_test.cc Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
memtable_list.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
memtable_list.h Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
memtable.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
memtable.h Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
merge_context.h #3865 fix performance regression introduced by MergeOperator.ShouldMerge (#4266) 2018-08-16 10:58:05 -07:00
merge_helper_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
merge_helper.cc Range deletion performance improvements + cleanup (#4014) 2018-07-12 14:42:39 -07:00
merge_helper.h Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
merge_operator.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
merge_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
obsolete_files_test.cc Modify verification logic of ObsoleteOptionsFileTest (#4218) 2018-08-03 13:57:40 -07:00
options_file_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
perf_context_test.cc Add PerfContextByLevel to provide per level perf context information (#4226) 2018-10-17 11:19:40 -07:00
pinned_iterators_manager.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_db_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
pre_release_callback.h Fix pre_release callback argument list. 2018-04-05 11:12:16 -07:00
prefix_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
range_del_aggregator_bench.cc Improve RangeDelAggregator benchmarks (#4395) 2018-09-21 16:13:08 -07:00
range_del_aggregator_test.cc Truncate range tombstones by leveraging InternalKeys (#4432) 2018-10-09 15:19:38 -07:00
range_del_aggregator.cc Fix two contrun job failures (#4587) 2018-10-24 20:16:45 -07:00
range_del_aggregator.h Lazily initialize RangeDelAggregator stripe map entries (#4497) 2018-10-17 11:47:34 -07:00
range_tombstone_fragmenter_test.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
range_tombstone_fragmenter.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
range_tombstone_fragmenter.h Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
read_callback.h WriteUnPrepared Txn: Disable seek to snapshot optimization (#3955) 2018-06-27 12:23:07 -07:00
repair_test.cc Acquire lock on DB LOCK file before starting repair. (#4435) 2018-10-12 10:41:54 -07:00
repair.cc Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
snapshot_checker.h Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
snapshot_impl.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
snapshot_impl.h BlobDB: Fix VisibleToActiveSnapshot() (#4236) 2018-08-06 16:57:42 -07:00
table_cache.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
table_cache.h Truncate range tombstones by leveraging InternalKeys (#4432) 2018-10-09 15:19:38 -07:00
table_properties_collector_test.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
table_properties_collector.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
table_properties_collector.h Comment out unused variables 2018-03-05 13:13:41 -08:00
transaction_log_impl.cc Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
transaction_log_impl.h WritePrepared Txn: Refactor conf params 2017-11-10 17:28:12 -08:00
version_builder_test.cc Remove random writes from SST file ingestion (#4172) 2018-07-27 16:12:23 -07:00
version_builder.cc VersionBuilder: optmize SaveTo() to linear time. (#4366) 2018-09-14 19:43:04 -07:00
version_builder.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
version_edit_test.cc Update recovery code for version edits group commit. (#3945) 2018-08-20 14:58:00 -07:00
version_edit.cc Update recovery code for version edits group commit. (#3945) 2018-08-20 14:58:00 -07:00
version_edit.h Update recovery code for version edits group commit. (#3945) 2018-08-20 14:58:00 -07:00
version_set_test.cc Dynamic level to adjust level multiplier when write is too heavy (#4338) 2018-10-22 10:21:47 -07:00
version_set.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
version_set.h Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
wal_manager_test.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
wal_manager.cc Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
wal_manager.h Fix memleak when DB::DeleteFile() 2018-01-11 18:57:33 -08:00
write_batch_base.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_batch_internal.h WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) 2018-06-28 18:58:29 -07:00
write_batch_test.cc WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00
write_batch.cc WriteBatch::Iterate wrongly returns Status::Corruption (#4478) 2018-10-10 20:57:27 -07:00
write_callback_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
write_callback.h Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller.h Change RocksDB License 2017-07-15 16:11:23 -07:00
write_thread.cc Handle mixed slowdown/no_slowdown writer properly (#4475) 2018-10-09 22:52:40 -07:00
write_thread.h Handle mixed slowdown/no_slowdown writer properly (#4475) 2018-10-09 22:52:40 -07:00