rocksdb/util
anand76 fefd4b98c5 Introduce a new MultiGet batching implementation (#5011)
Summary:
This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.

Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to -
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Bloom filter cachelines can be prefetched, hiding the cache miss latency

The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.

Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).

Batch   Sizes

1        | 2        | 4         | 8      | 16  | 32

Random pattern (Stride length 0)
4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074        - Get
4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching)
4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14        - MultiGet (w/ batching)

Good locality (Stride length 16)
4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753
4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135

Good locality (Stride length 256)
4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268
4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (Stride length 4096)
4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555
4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465
4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891

dbbench command used (on a DB with 4 levels, 12 million keys)-
TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011

Differential Revision: D14348703

Pulled By: anand1976

fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
2019-04-11 14:28:26 -07:00
..
aligned_buffer.h Fix Copying of data between buffers in FilePrefetchBuffer (#4100) 2018-07-11 12:28:13 -07:00
allocator.h Change RocksDB License 2017-07-15 16:11:23 -07:00
arena_test.cc arena: derive alignment unit from std::max_align_t 2017-10-17 11:13:19 -07:00
arena.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
arena.h arena: derive alignment unit from std::max_align_t 2017-10-17 11:13:19 -07:00
auto_roll_logger_test.cc Fix many bugs in log statement arguments (#5089) 2019-04-04 12:12:11 -07:00
auto_roll_logger.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
auto_roll_logger.h Fix the Logger::Close() and DBImpl::Close() design pattern 2018-02-23 13:57:26 -08:00
autovector_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
autovector.h Use placement new and delete in autovector (#5080) 2019-03-20 10:42:04 -07:00
bloom_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
bloom.cc Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
build_version.cc.in Makefile: generate util/build_version.cc from .in file (#1384) 2016-10-25 11:31:39 -07:00
build_version.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cast_util.h Add a missing "once" in .h 2017-07-31 12:12:03 -07:00
channel.h Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
coding_test.cc Coding.h: Added Fixed16 support (#4142) 2018-07-16 23:43:41 -07:00
coding.cc Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
coding.h Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
compaction_job_stats_impl.cc Add a new CPU time counter to compaction report (#4889) 2019-01-29 17:24:00 -08:00
comparator.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compression_context_cache.cc run make format for PR 3838 (#3954) 2018-06-05 12:58:02 -07:00
compression_context_cache.h run make format for PR 3838 (#3954) 2018-06-05 12:58:02 -07:00
compression.h add compression options to table properties (#5081) 2019-04-02 14:52:34 -07:00
concurrent_arena.cc Cap concurrent arena's shard block size to 128KB (#4147) 2018-07-18 10:43:54 -07:00
concurrent_arena.h util: Fix coverity issues 2017-11-03 14:42:08 -07:00
concurrent_task_limiter_impl.cc Compaction limiter miscs (#4795) 2018-12-26 13:59:35 -08:00
concurrent_task_limiter_impl.h Concurrent task limiter for compaction thread control (#4332) 2018-12-13 13:18:28 -08:00
core_local.h Change RocksDB License 2017-07-15 16:11:23 -07:00
crc32c_ppc_asm.S Updated CRC32 Power Optimization Changes 2017-08-31 14:16:30 -07:00
crc32c_ppc_constants.h Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
crc32c_ppc.c Updated CRC32 Power Optimization Changes 2017-08-31 14:16:30 -07:00
crc32c_ppc.h Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
crc32c_test.cc Build and tests fixes for Solaris Sparc (#4000) 2018-06-15 12:42:53 -07:00
crc32c.cc Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
crc32c.h Updated CRC32 Power Optimization Changes 2017-08-31 14:16:30 -07:00
delete_scheduler_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
delete_scheduler.cc Always delete Blob DB files in the background (#4928) 2019-01-29 15:50:03 -08:00
delete_scheduler.h Always delete Blob DB files in the background (#4928) 2019-01-29 15:50:03 -08:00
duplicate_detector.h Fix many bugs in log statement arguments (#5089) 2019-04-04 12:12:11 -07:00
dynamic_bloom_test.cc Two code changes to make "clang analyze" happy (#4292) 2018-08-20 17:43:41 -07:00
dynamic_bloom.cc Disallow customized hash function in DynamicBloom (#4915) 2019-01-24 10:34:30 -08:00
dynamic_bloom.h Disallow customized hash function in DynamicBloom (#4915) 2019-01-24 10:34:30 -08:00
event_logger_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
event_logger.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
event_logger.h Change RocksDB License 2017-07-15 16:11:23 -07:00
fault_injection_test_env.cc Atomic ingest (#4895) 2019-02-12 19:16:17 -08:00
fault_injection_test_env.h Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
file_reader_writer_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
file_reader_writer.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
file_reader_writer.h Add a new CPU time counter to compaction report (#4889) 2019-01-29 17:24:00 -08:00
file_util.cc Smooth the deletion of WAL files (#5116) 2019-03-28 15:17:13 -07:00
file_util.h Smooth the deletion of WAL files (#5116) 2019-03-28 15:17:13 -07:00
filelock_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
filename.cc Store the return value of Fsync for check 2018-09-14 13:29:56 -07:00
filename.h BlobDB: GetLiveFiles and GetLiveFilesMetadata return relative path (#4326) 2018-08-31 12:12:49 -07:00
filter_policy.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
gflags_compat.h fix gflags namespace 2017-12-01 10:42:05 -08:00
hash_map.h Change RocksDB License 2017-07-15 16:11:23 -07:00
hash_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
hash.cc Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
hash.h Consolidate hash function used for non-persistent data in a new function (#5155) 2019-04-08 13:32:06 -07:00
heap_test.cc fix gflags namespace 2017-12-01 10:42:05 -08:00
heap.h Add compaction logic to RangeDelAggregatorV2 (#4758) 2018-12-17 13:20:51 -08:00
jemalloc_nodump_allocator.cc Detect if Jemalloc is linked with the binary (#4844) 2019-01-03 16:30:12 -08:00
jemalloc_nodump_allocator.h Detect if Jemalloc is linked with the binary (#4844) 2019-01-03 16:30:12 -08:00
kv_map.h Consolidate hash function used for non-persistent data in a new function (#5155) 2019-04-08 13:32:06 -07:00
log_buffer.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
log_buffer.h Change RocksDB License 2017-07-15 16:11:23 -07:00
log_write_bench.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
logging.h WritePrepared: Improve stress tests with slow threads (#4974) 2019-02-19 16:56:49 -08:00
memory_allocator.h s/CacheAllocator/MemoryAllocator/g (#4590) 2018-10-26 14:30:30 -07:00
memory_usage.h Change RocksDB License 2017-07-15 16:11:23 -07:00
mock_time_env.h Update RepeatableThreadTest with MockTimeEnv (#5107) 2019-03-29 10:08:50 -07:00
murmurhash.cc Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
murmurhash.h Change RocksDB License 2017-07-15 16:11:23 -07:00
mutexlock.h Change RocksDB License 2017-07-15 16:11:23 -07:00
ppc-opcode.h Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
random.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
random.h Change RocksDB License 2017-07-15 16:11:23 -07:00
rate_limiter_test.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
rate_limiter.cc rate limit auto-tuning 2017-10-04 19:15:01 -07:00
rate_limiter.h rate limit auto-tuning 2017-10-04 19:15:01 -07:00
repeatable_thread_test.cc Update RepeatableThreadTest with MockTimeEnv (#5107) 2019-03-29 10:08:50 -07:00
repeatable_thread.h Update RepeatableThreadTest with MockTimeEnv (#5107) 2019-03-29 10:08:50 -07:00
set_comparator.h WritePrepared Txn: Move DuplicateDetector to util 2018-03-05 23:57:12 -08:00
slice_transform_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
slice.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
sst_file_manager_impl.cc Fix many bugs in log statement arguments (#5089) 2019-04-04 12:12:11 -07:00
sst_file_manager_impl.h Always delete Blob DB files in the background (#4928) 2019-01-29 15:50:03 -08:00
status.cc Support for single-primary, multi-secondary instances (#4899) 2019-03-26 16:45:31 -07:00
stderr_logger.h Change RocksDB License 2017-07-15 16:11:23 -07:00
stop_watch.h Make statistics's stats_level change thread-safe (#5030) 2019-03-01 10:42:09 -08:00
string_util.cc add OptionType kInt32T and kInt64T 2019-03-12 13:49:52 -07:00
string_util.h add OptionType kInt32T and kInt64T 2019-03-12 13:49:52 -07:00
sync_point_impl.cc Avoid acquiring SyncPoint mutex when it is disabled (#3991) 2018-06-13 13:13:18 -07:00
sync_point_impl.h Avoid acquiring SyncPoint mutex when it is disabled (#3991) 2018-06-13 13:13:18 -07:00
sync_point.cc Add read retry support to log reader (#4394) 2018-10-19 11:53:00 -07:00
sync_point.h Fix singleton destruction order of PosixEnv and SyncPoint (#3951) 2018-06-04 15:58:46 -07:00
testharness.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
testharness.h Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
testutil.cc Periodic Compactions (#5166) 2019-04-10 19:31:18 -07:00
testutil.h Backup engine support for direct I/O reads (#4640) 2018-11-13 11:17:25 -08:00
thread_list_test.cc Use nullptr instead of NULL / 0 more consistently. 2018-03-07 12:42:12 -08:00
thread_local_test.cc Comment out unused variables 2018-03-05 13:13:41 -08:00
thread_local.cc Enable building of ARM32 (#4349) 2018-10-09 16:58:25 -07:00
thread_local.h Provide a way to override windows memory allocator with jemalloc for ZSTD 2018-06-04 12:12:48 -07:00
thread_operation.h Add inline comments to flush job (#4464) 2018-10-05 15:41:17 -07:00
threadpool_imp.cc Collect compaction stats by priority and dump to info LOG (#5050) 2019-03-19 17:28:19 -07:00
threadpool_imp.h Support lowering CPU priority of background threads 2018-04-24 08:41:51 -07:00
timer_queue_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
timer_queue.h When closing BlobDB, should first wait for all background tasks (#5005) 2019-02-21 17:26:01 -08:00
trace_replay.cc Add an option to filter traces (#5082) 2019-03-19 14:36:51 -07:00
trace_replay.h Add an option to filter traces (#5082) 2019-03-19 14:36:51 -07:00
transaction_test_util.cc Fix many bugs in log statement arguments (#5089) 2019-04-04 12:12:11 -07:00
transaction_test_util.h Enhance transaction_test_util with delays (#4970) 2019-02-11 16:02:37 -08:00
user_comparator_wrapper.h Fix perf_context.user_key_comparison_count for range scan (#5098) 2019-03-27 10:34:27 -07:00
util.h Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
vector_iterator.h Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
xxhash.cc xxhash 64 support 2018-11-01 15:44:06 -07:00
xxhash.h Move #include outside of namespace (#4629) 2018-11-06 17:18:28 -08:00