rocksdb/utilities
anand76 fefd4b98c5 Introduce a new MultiGet batching implementation (#5011)
Summary:
This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.

Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to -
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Bloom filter cachelines can be prefetched, hiding the cache miss latency

The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.

Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).

Batch   Sizes

1        | 2        | 4         | 8      | 16  | 32

Random pattern (Stride length 0)
4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074        - Get
4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching)
4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14        - MultiGet (w/ batching)

Good locality (Stride length 16)
4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753
4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135

Good locality (Stride length 256)
4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268
4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (Stride length 4096)
4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555
4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465
4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891

dbbench command used (on a DB with 4 levels, 12 million keys)-
TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011

Differential Revision: D14348703

Pulled By: anand1976

fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
2019-04-11 14:28:26 -07:00
..
backupable Remove some "using std::..." from header files. (#5113) 2019-03-27 10:28:21 -07:00
blob_db Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
cassandra Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
checkpoint Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compaction_filters Comment out unused variables 2018-03-05 13:13:41 -08:00
convenience Change RocksDB License 2017-07-15 16:11:23 -07:00
leveldb_options Change RocksDB License 2017-07-15 16:11:23 -07:00
memory Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
merge_operators Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
option_change_migration comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
options Make it easier for users to load options from option file and set shared block cache. (#5063) 2019-03-21 16:25:28 -07:00
persistent_cache Removed const fields in copyable classes (#5095) 2019-04-05 15:40:30 -07:00
simulator_cache Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
table_properties_collectors Reduce runtime of compact_on_deletion_collector_test (#4779) 2018-12-13 14:47:08 -08:00
trace Add the max trace file size limitation option to Tracing (#4610) 2018-11-27 14:27:05 -08:00
transactions Reduce copies of LockInfo (#5172) 2019-04-10 15:58:58 -07:00
ttl Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
write_batch_with_index Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
debug.cc Remove v1 RangeDelAggregator (#4778) 2018-12-17 17:33:46 -08:00
env_librados_test.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
env_librados.cc Suppress lint in old files 2018-01-29 12:56:42 -08:00
env_librados.md Add EnvLibrados - RocksDB Env of RADOS (#1222) 2016-07-21 11:16:34 -07:00
env_mirror_test.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
env_mirror.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
env_timed_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
env_timed.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
merge_operators.h Support StringAppendOperator(delimiter_char) constructor in java-api 2018-03-08 16:17:47 -08:00
object_registry_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
util_merge_operators_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00