rocksdb/util
Mike Kolupaev b4d7209428 Add an option to put first key of each sst block in the index (#5289)
Summary:
The first key is used to defer reading the data block until this file gets to the top of merging iterator's heap. For short range scans, most files never make it to the top of the heap, so this change can reduce read amplification by a lot sometimes.

Consider the following workload. There are a few data streams (we'll be calling them "logs"), each stream consisting of a sequence of blobs (we'll be calling them "records"). Each record is identified by log ID and a sequence number within the log. RocksDB key is concatenation of log ID and sequence number (big endian). Reads are mostly relatively short range scans, each within a single log. Writes are mostly sequential for each log, but writes to different logs are randomly interleaved. Compactions are disabled; instead, when we accumulate a few tens of sst files, we create a new column family and start writing to it.

So, a typical sst file consists of a few ranges of blocks, each range corresponding to one log ID (we use FlushBlockPolicy to cut blocks at log boundaries). A typical read would go like this. First, iterator Seek() reads one block from each sst file. Then a series of Next()s move through one sst file (since writes to each log are mostly sequential) until the subiterator reaches the end of this log in this sst file; then Next() switches to the next sst file and reads sequentially from that, and so on. Often a range scan will only return records from a small number of blocks in small number of sst files; in this case, the cost of initial Seek() reading one block from each file may be bigger than the cost of reading the actually useful blocks.

Neither iterate_upper_bound nor bloom filters can prevent reading one block from each file in Seek(). But this PR can: if the index contains first key from each block, we don't have to read the block until this block actually makes it to the top of merging iterator's heap, so for short range scans we won't read any blocks from most of the sst files.

This PR does the deferred block loading inside value() call. This is not ideal: there's no good way to report an IO error from inside value(). As discussed with siying offline, it would probably be better to change InternalIterator's interface to explicitly fetch deferred value and get status. I'll do it in a separate PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5289

Differential Revision: D15256423

Pulled By: al13n321

fbshipit-source-id: 750e4c39ce88e8d41662f701cf6275d9388ba46a
2019-06-24 20:54:04 -07:00
..
aligned_buffer.h Document AlignedBuffer (#5345) 2019-05-24 10:05:40 -07:00
autovector_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
autovector.h Use placement new and delete in autovector (#5080) 2019-03-20 10:42:04 -07:00
bloom_test.cc Move some logging related files to logging/ (#5387) 2019-05-31 17:23:59 -07:00
bloom.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
build_version.cc.in Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
build_version.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cast_util.h Add a missing "once" in .h 2017-07-31 12:12:03 -07:00
channel.h Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
coding_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
coding.cc Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
coding.h Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
compaction_job_stats_impl.cc Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
comparator.cc Add support for timestamp in Get/Put (#5079) 2019-06-05 23:10:47 -07:00
compression_context_cache.cc run make format for PR 3838 (#3954) 2018-06-05 12:58:02 -07:00
compression_context_cache.h run make format for PR 3838 (#3954) 2018-06-05 12:58:02 -07:00
compression.h Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
concurrent_task_limiter_impl.cc Compaction limiter miscs (#4795) 2018-12-26 13:59:35 -08:00
concurrent_task_limiter_impl.h Concurrent task limiter for compaction thread control (#4332) 2018-12-13 13:18:28 -08:00
core_local.h Change RocksDB License 2017-07-15 16:11:23 -07:00
crc32c_arm64.cc RocksDB CRC32c optimization with ARMv8 Intrinsic (#5221) 2019-04-30 10:59:05 -07:00
crc32c_arm64.h simplify include directive involving inttypes (#5402) 2019-06-06 13:56:07 -07:00
crc32c_ppc_asm.S Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_ppc.c C file should not include <cinttypes>, it is a C++ header. (#5499) 2019-06-24 16:12:39 -07:00
crc32c_ppc.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
crc32c.cc RocksDB CRC32c optimization with ARMv8 Intrinsic (#5221) 2019-04-30 10:59:05 -07:00
crc32c.h Updated CRC32 Power Optimization Changes 2017-08-31 14:16:30 -07:00
duplicate_detector.h simplify include directive involving inttypes (#5402) 2019-06-06 13:56:07 -07:00
dynamic_bloom_test.cc simplify include directive involving inttypes (#5402) 2019-06-06 13:56:07 -07:00
dynamic_bloom.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
dynamic_bloom.h Disallow customized hash function in DynamicBloom (#4915) 2019-01-24 10:34:30 -08:00
file_reader_writer_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
file_reader_writer.cc Compaction Reads should read no more than compaction_readahead_size bytes, when set! (#5498) 2019-06-21 21:31:49 -07:00
file_reader_writer.h Combine the read-ahead logic for user reads and compaction reads (#5431) 2019-06-19 14:10:46 -07:00
filelock_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
filter_policy.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
gflags_compat.h fix gflags namespace 2017-12-01 10:42:05 -08:00
hash_map.h Change RocksDB License 2017-07-15 16:11:23 -07:00
hash_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
hash.cc Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
hash.h Consolidate hash function used for non-persistent data in a new function (#5155) 2019-04-08 13:32:06 -07:00
heap_test.cc fix gflags namespace 2017-12-01 10:42:05 -08:00
heap.h Add compaction logic to RangeDelAggregatorV2 (#4758) 2018-12-17 13:20:51 -08:00
kv_map.h Consolidate hash function used for non-persistent data in a new function (#5155) 2019-04-08 13:32:06 -07:00
log_write_bench.cc util: fix log_write_bench (#5335) 2019-05-31 17:17:57 -07:00
murmurhash.cc Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
murmurhash.h Change RocksDB License 2017-07-15 16:11:23 -07:00
mutexlock.h Change RocksDB License 2017-07-15 16:11:23 -07:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
random.h Change RocksDB License 2017-07-15 16:11:23 -07:00
rate_limiter_test.cc simplify include directive involving inttypes (#5402) 2019-06-06 13:56:07 -07:00
rate_limiter.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
rate_limiter.h rate limit auto-tuning 2017-10-04 19:15:01 -07:00
repeatable_thread_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
repeatable_thread.h Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
set_comparator.h WritePrepared Txn: Move DuplicateDetector to util 2018-03-05 23:57:12 -08:00
slice_transform_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
slice.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
status.cc Use separate status code for column family drop and db shutdown in progress (#5275) 2019-05-20 10:47:32 -07:00
stderr_logger.h Change RocksDB License 2017-07-15 16:11:23 -07:00
stop_watch.h Make statistics's stats_level change thread-safe (#5030) 2019-03-01 10:42:09 -08:00
string_util.cc simplify include directive involving inttypes (#5402) 2019-06-06 13:56:07 -07:00
string_util.h add OptionType kInt32T and kInt64T 2019-03-12 13:49:52 -07:00
thread_list_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
thread_local_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
thread_local.cc Enable building of ARM32 (#4349) 2018-10-09 16:58:25 -07:00
thread_local.h Provide a way to override windows memory allocator with jemalloc for ZSTD 2018-06-04 12:12:48 -07:00
thread_operation.h Add inline comments to flush job (#4464) 2018-10-05 15:41:17 -07:00
threadpool_imp.cc refactor SavePoints (#5192) 2019-04-19 20:33:04 -07:00
threadpool_imp.h Support lowering CPU priority of background threads 2018-04-24 08:41:51 -07:00
timer_queue_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
timer_queue.h Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
user_comparator_wrapper.h Fix perf_context.user_key_comparison_count for range scan (#5098) 2019-03-27 10:34:27 -07:00
util.h Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
vector_iterator.h Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
xxhash.cc Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
xxhash.h Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00