rocksdb/table
Abhishek Madan 7528130e38 Cache fragmented range tombstones in BlockBasedTableReader (#4493)
Summary:
This allows tombstone fragmenting to only be performed when the table is opened, and cached for subsequent accesses.

On the same DB used in #4449, running `readrandom` results in the following:
```
readrandom   :       0.983 micros/op 1017076 ops/sec;   78.3 MB/s (63103 of 100000 found)
```

Now that Get performance in the presence of range tombstones is reasonable, I also compared the performance between a DB with range tombstones, "expanded" range tombstones (several point tombstones that cover the same keys the equivalent range tombstone would cover, a common workaround for DeleteRange), and no range tombstones. The created DBs had 5 million keys each, and DeleteRange was called at regular intervals (depending on the total number of range tombstones being written) after 4.5 million Puts. The table below summarizes the results of a `readwhilewriting` benchmark (in order to provide somewhat more realistic results):
```
   Tombstones?    | avg micros/op | stddev micros/op |  avg ops/s   | stddev ops/s
----------------- | ------------- | ---------------- | ------------ | ------------
None              |        0.6186 |          0.04637 | 1,625,252.90 | 124,679.41
500 Expanded      |        0.6019 |          0.03628 | 1,666,670.40 | 101,142.65
500 Unexpanded    |        0.6435 |          0.03994 | 1,559,979.40 | 104,090.52
1k Expanded       |        0.6034 |          0.04349 | 1,665,128.10 | 125,144.57
1k Unexpanded     |        0.6261 |          0.03093 | 1,600,457.50 |  79,024.94
5k Expanded       |        0.6163 |          0.05926 | 1,636,668.80 | 154,888.85
5k Unexpanded     |        0.6402 |          0.04002 | 1,567,804.70 | 100,965.55
10k Expanded      |        0.6036 |          0.05105 | 1,667,237.70 | 142,830.36
10k Unexpanded    |        0.6128 |          0.02598 | 1,634,633.40 |  72,161.82
25k Expanded      |        0.6198 |          0.04542 | 1,620,980.50 | 116,662.93
25k Unexpanded    |        0.5478 |          0.0362  | 1,833,059.10 | 121,233.81
50k Expanded      |        0.5104 |          0.04347 | 1,973,107.90 | 184,073.49
50k Unexpanded    |        0.4528 |          0.03387 | 2,219,034.50 | 170,984.32
```

After a large enough quantity of range tombstones are written, range tombstone Gets can become faster than reading from an equivalent DB with several point tombstones.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4493

Differential Revision: D10842844

Pulled By: abhimadan

fbshipit-source-id: a7d44534f8120e6aabb65779d26c6b9df954c509
2018-10-25 19:26:44 -07:00
..
adaptive_table_factory.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
adaptive_table_factory.h Comment out unused variables 2018-03-05 13:13:41 -08:00
block_based_filter_block_test.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
block_based_filter_block.cc PrefixMayMatch: remove unnecessary check for prefix_extractor_ (#4067) 2018-06-27 20:42:43 -07:00
block_based_filter_block.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
block_based_table_builder.cc Introduce CacheAllocator, a custom allocator for cache blocks (#4437) 2018-10-02 17:24:58 -07:00
block_based_table_builder.h Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347) 2018-09-06 09:58:34 -07:00
block_based_table_factory.cc Add db_bench options of data block hash index (#4281) 2018-08-16 18:42:46 -07:00
block_based_table_factory.h Improve point-lookup performance using a data block hash index (#4174) 2018-08-15 14:30:03 -07:00
block_based_table_reader.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
block_based_table_reader.h Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
block_builder.cc DataBlockHashIndex: Remove the division from EstimateSize() (#4293) 2018-08-20 23:13:50 -07:00
block_builder.h Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
block_fetcher.cc Introduce CacheAllocator, a custom allocator for cache blocks (#4437) 2018-10-02 17:24:58 -07:00
block_fetcher.h Introduce CacheAllocator, a custom allocator for cache blocks (#4437) 2018-10-02 17:24:58 -07:00
block_prefix_index.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
block_prefix_index.h Change RocksDB License 2017-07-15 16:11:23 -07:00
block_test.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
block.cc Fix sync-point comment in Block destructor (#4380) 2018-09-17 11:58:11 -07:00
block.h Fix typos in comments (#4456) 2018-10-04 20:46:50 -07:00
bloom_block.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
bloom_block.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cleanable_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
cuckoo_table_builder_test.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
cuckoo_table_builder.cc Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
cuckoo_table_builder.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cuckoo_table_factory.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
cuckoo_table_factory.h comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
cuckoo_table_reader_test.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
cuckoo_table_reader.cc Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
cuckoo_table_reader.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
data_block_footer.cc Add db_bench options of data block hash index (#4281) 2018-08-16 18:42:46 -07:00
data_block_footer.h Add db_bench options of data block hash index (#4281) 2018-08-16 18:42:46 -07:00
data_block_hash_index_test.cc Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
data_block_hash_index.cc DataBlockHashIndex: Remove the division from EstimateSize() (#4293) 2018-08-20 23:13:50 -07:00
data_block_hash_index.h DataBlockHashIndex: Remove the division from EstimateSize() (#4293) 2018-08-20 23:13:50 -07:00
filter_block.h use user_key and iterate_upper_bound to determine compatibility of bloom filters (#3899) 2018-06-26 15:57:26 -07:00
flush_block_policy.cc Align SST file data blocks to avoid spanning multiple pages 2018-03-26 20:26:10 -07:00
format.cc fix unused param allocator in compression.h (#4453) 2018-10-04 13:24:22 -07:00
format.h fix unused param allocator in compression.h (#4453) 2018-10-04 13:24:22 -07:00
full_filter_bits_builder.h Skip duplicate bloom keys when whole_key and prefix are mixed 2018-04-24 10:58:16 -07:00
full_filter_block_test.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
full_filter_block.cc Charging block cache more accurately (#4073) 2018-06-29 08:57:20 -07:00
full_filter_block.h Charging block cache more accurately (#4073) 2018-06-29 08:57:20 -07:00
get_context.cc Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
get_context.h Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
index_builder.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
index_builder.h Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
internal_iterator.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
iter_heap.h Make InternalKeyComparator final and directly use it in merging iterator 2017-09-11 12:04:21 -07:00
iterator_wrapper.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
iterator.cc fix typo in error message, twice (#4457) 2018-10-09 17:07:27 -07:00
merger_test.cc Make InternalKeyComparator final and directly use it in merging iterator 2017-09-11 12:04:21 -07:00
merging_iterator.cc Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
merging_iterator.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
meta_blocks.cc Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
meta_blocks.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
mock_table.cc Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
mock_table.h Make BlockBasedTableIterator compaction-aware (#4048) 2018-06-25 13:19:27 -07:00
partitioned_filter_block_test.cc use per-level perf context for bloom filter related counters (#4581) 2018-10-24 12:21:38 -07:00
partitioned_filter_block.cc Fix bug in partition filters with format_version=4 (#4381) 2018-09-17 17:28:15 -07:00
partitioned_filter_block.h Fix bug in partition filters with format_version=4 (#4381) 2018-09-17 17:28:15 -07:00
persistent_cache_helper.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
persistent_cache_helper.h Change RocksDB License 2017-07-15 16:11:23 -07:00
persistent_cache_options.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_builder.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
plain_table_builder.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
plain_table_factory.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
plain_table_factory.h Comment out unused variables 2018-03-05 13:13:41 -08:00
plain_table_index.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_index.h Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
plain_table_key_coding.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
plain_table_key_coding.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_reader.cc Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
plain_table_reader.h Introduce CacheAllocator, a custom allocator for cache blocks (#4437) 2018-10-02 17:24:58 -07:00
scoped_arena_iterator.h Change RocksDB License 2017-07-15 16:11:23 -07:00
sst_file_writer_collectors.h Comment out unused variables 2018-03-05 13:13:41 -08:00
sst_file_writer.cc Add listener to sample file io (#3933) 2018-10-12 18:36:11 -07:00
table_builder.h Remove random writes from SST file ingestion (#4172) 2018-07-27 16:12:23 -07:00
table_properties_internal.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
table_properties.cc Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
table_reader_bench.cc Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
table_reader.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
table_test.cc Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
two_level_iterator.cc Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00
two_level_iterator.h Index value delta encoding (#3983) 2018-08-09 16:58:40 -07:00