rocksdb/include/rocksdb
Akanksha Mahajan 2db6a4a1d6 Seek parallelization (#9994)
Summary:
The RocksDB iterator is a hierarchy of iterators. MergingIterator maintains a heap of LevelIterators, one for each L0 file and for each non-zero level. The Seek() operation naturally lends itself to parallelization, as it involves positioning every LevelIterator on the correct data block in the correct SST file. It lookups a level for a target key, to find the first key that's >= the target key. This typically involves reading one data block that is likely to contain the target key, and scan forward to find the first valid key. The forward scan may read more data blocks. In order to find the right data block, the iterator may read some metadata blocks (required for opening a file and searching the index).
This flow can be parallelized.

Design: Seek will be called two times under async_io option. First seek will send asynchronous request to prefetch the data blocks at each level and second seek will follow the normal flow and in FilePrefetchBuffer::TryReadFromCacheAsync it will wait for the Poll() to get the results and add the iterator to min_heap.
- Status::TryAgain is passed down from FilePrefetchBuffer::PrefetchAsync to block_iter_.Status indicating asynchronous request has been submitted.
- If for some reason asynchronous request returns error in submitting the request, it will fallback to sequential reading of blocks in one pass.
- If the data already exists in prefetch_buffer, it will return the data without prefetching further and it will be treated as single pass of seek.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9994

Test Plan:
- **Run Regressions.**
```
./db_bench -db=/tmp/prefix_scan_prefetch_main -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216
```
i) Previous release 7.0 run for normal prefetching with async_io disabled:
```
./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.0
Date:       Thu Mar 17 13:11:34 2022
CPU:        24 * Intel Core Processor (Broadwell)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
```

ii) normal prefetching after changes with async_io disable:
```
./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
Set seed to 1652922591315307 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.3
Date:       Wed May 18 18:09:51 2022
CPU:        32 * Intel Xeon Processor (Skylake)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
seekrandom   :  483080.466 micros/op 2 ops/sec 120.287 seconds 249 operations;  340.8 MB/s (249 of 249 found)
```
iii) db_bench with async_io enabled completed succesfully

```
./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1
Set seed to 1652924062021732 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.3
Date:       Wed May 18 18:34:22 2022
CPU:        32 * Intel Xeon Processor (Skylake)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
seekrandom   :  553913.576 micros/op 1 ops/sec 120.199 seconds 217 operations;  293.6 MB/s (217 of 217 found)
```

- db_stress with async_io disabled completed succesfully
```
 export CRASH_TEST_EXT_ARGS=" --async_io=0"
 make crash_test -j
```

I**n Progress**: db_stress with async_io is failing and working on debugging/fixing it.

Reviewed By: anand1976

Differential Revision: D36459323

Pulled By: akankshamahajan15

fbshipit-source-id: abb1cd944abe712bae3986ae5b16704b3338917c
2022-05-20 16:09:33 -07:00
..
utilities Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
advanced_options.h Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857) 2022-05-20 12:09:09 -07:00
c.h Seek parallelization (#9994) 2022-05-20 16:09:33 -07:00
cache_bench_tool.h Allow cache_bench/db_bench to use a custom secondary cache (#8312) 2021-05-19 15:26:18 -07:00
cache.h Rewrite memory-charging feature's option API (#9926) 2022-05-17 15:01:51 -07:00
cleanable.h Eliminate unnecessary (slow) block cache Ref()ing in MultiGet (#9899) 2022-04-26 21:59:24 -07:00
compaction_filter.h Rename kRemoveWithSingleDelete to kPurge (#9951) 2022-05-05 08:16:20 -07:00
compaction_job_stats.h Update compaction statistics to include the amount of data read from blob files (#8022) 2021-03-04 00:43:48 -08:00
comparator.h Mark destructors as override (#9404) 2022-01-20 08:44:27 -08:00
compression_type.h Move CompressionType to its own header file (#7162) 2020-08-03 15:49:31 -07:00
concurrent_task_limiter.h Some API clarifications (#9080) 2021-11-02 20:30:07 -07:00
configurable.h Improve performance of SliceTransform::AsString (#9401) 2022-01-27 10:05:33 -08:00
convenience.h Specify largest_seqno in VerifyChecksum (#9919) 2022-05-02 10:22:08 -07:00
customizable.h Mark destructors as override (#9404) 2022-01-20 08:44:27 -08:00
data_structure.h Add (Live)FileStorageInfo API (#8968) 2021-10-16 10:04:32 -07:00
db_bench_tool.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_dump_tool.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_stress_tool.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db.h Mark GetLiveFilesStorageInfo ready for production use (#9868) 2022-04-20 16:09:34 -07:00
env_encryption.h Some API clarifications (#9080) 2021-11-02 20:30:07 -07:00
env.h typo fix: delete duplicate comment word (#9249) 2022-05-06 18:29:33 -07:00
experimental.h Add manifest fix-up utility for file temperatures (#9683) 2022-03-18 16:35:51 -07:00
file_checksum.h Mark destructors as override (#9404) 2022-01-20 08:44:27 -08:00
file_system.h Address comments for PR #9988 and #9996 (#10020) 2022-05-19 15:23:53 -07:00
filter_policy.h Fix a major performance bug in 7.0 re: filter compatibility (#9736) 2022-03-23 10:00:54 -07:00
flush_block_policy.h Some API clarifications (#9080) 2021-11-02 20:30:07 -07:00
functor_wrapper.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
io_status.h Combine data members of IOStatus with Status (#9549) 2022-02-22 11:23:01 -08:00
iostats_context.h Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015) 2022-05-18 15:25:19 -07:00
iterator.h Fix a few documentation errors including in public APIs (#9789) 2022-04-01 10:30:17 -07:00
ldb_tool.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
listener.h Add temperature information to the event listener callbacks (#9591) 2022-02-18 11:23:18 -08:00
memory_allocator.h Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
memtablerep.h Added GetFactoryCount/Names/Types to ObjectRegistry (#9358) 2022-05-16 09:44:43 -07:00
merge_operator.h Fix compile warnings (#9199) 2021-11-24 11:19:06 -08:00
metadata.h Fix FileStorageInfo fields from GetLiveFilesMetaData (#9769) 2022-03-29 14:36:35 -07:00
options.h Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
perf_context.h Seek parallelization (#9994) 2022-05-20 16:09:33 -07:00
perf_level.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
persistent_cache.h Check for and disallow shared key space in block caches (#9172) 2021-11-16 11:16:05 -08:00
rate_limiter.h Remove deprecated option new_table_reader_for_compaction_inputs (#9443) 2022-02-08 19:31:28 -08:00
rocksdb_namespace.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
secondary_cache.h Prevent double caching in the compressed secondary cache (#9747) 2022-04-11 13:28:33 -07:00
slice_transform.h Some better API and other comments (#9533) 2022-02-17 18:51:08 -08:00
slice.h Require C++17 (#9481) 2022-02-04 17:13:10 -08:00
snapshot.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
sst_dump_tool.h Add --version and --help to ldb and sst_dump (#6951) 2020-06-09 10:04:01 -07:00
sst_file_manager.h Some API clarifications (#9080) 2021-11-02 20:30:07 -07:00
sst_file_reader.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
sst_file_writer.h Support timestamps in SstFileWriter (#8899) 2021-09-09 18:58:01 -07:00
sst_partitioner.h Mark destructors as override (#9404) 2022-01-20 08:44:27 -08:00
statistics.h Multi file concurrency in MultiGet using coroutines and async IO (#9968) 2022-05-19 15:36:27 -07:00
stats_history.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
status.h Combine data members of IOStatus with Status (#9549) 2022-02-22 11:23:01 -08:00
system_clock.h Fix compile warnings (#9199) 2021-11-24 11:19:06 -08:00
table_properties.h Account memory of big memory users in BlockBasedTable in global memory limit (#9748) 2022-04-06 10:33:00 -07:00
table.h Rewrite memory-charging feature's option API (#9926) 2022-05-17 15:01:51 -07:00
thread_status.h Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015) 2022-05-18 15:25:19 -07:00
threadpool.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trace_reader_writer.h Update comments, fix typos. (#8721) 2021-08-27 13:16:32 -07:00
trace_record_result.h Add IteratorTraceExecutionResult for iterator related trace records. (#8687) 2021-08-20 15:35:56 -07:00
trace_record.h Add IteratorTraceExecutionResult for iterator related trace records. (#8687) 2021-08-20 15:35:56 -07:00
transaction_log.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
types.h Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
unique_id.h Adjust public APIs to prefer 128-bit SST unique ID (#10009) 2022-05-17 18:43:48 -07:00
universal_compaction.h Incremental Space Amp Compactions in Universal Style (#8655) 2021-10-20 10:04:13 -07:00
version.h Update main version.h to NEXT release (7.3) (#9852) 2022-04-18 10:26:21 -07:00
wal_filter.h Fix compile warnings (#9199) 2021-11-24 11:19:06 -08:00
write_batch_base.h Revise APIs related to user-defined timestamp (#8946) 2022-02-01 22:19:01 -08:00
write_batch.h Fix various spelling errors still found in code (#9653) 2022-05-05 19:45:32 -07:00
write_buffer_manager.h Account memory of big memory users in BlockBasedTable in global memory limit (#9748) 2022-04-06 10:33:00 -07:00