rocksdb/util
Peter Dillinger 9d0cae7104 Eliminate unnecessary (slow) block cache Ref()ing in MultiGet (#9899)
Summary:
When MultiGet() determines that multiple query keys can be
served by examining the same data block in block cache (one Lookup()),
each PinnableSlice referring to data in that data block needs to hold
on to the block in cache so that they can be released at arbitrary
times by the API user. Historically this is accomplished with extra
calls to Ref() on the Handle from Lookup(), with each PinnableSlice
cleanup calling Release() on the Handle, but this creates extra
contention on the block cache for the extra Ref()s and Release()es,
especially because they hit the same cache shard repeatedly.

In the case of merge operands (possibly more cases?), the problem was
compounded by doing an extra Ref()+eventual Release() for each merge
operand for a key reusing a block (which could be the same key!), rather
than one Ref() per key. (Note: the non-shared case with `biter` was
already one per key.)

This change optimizes MultiGet not to rely on these extra, contentious
Ref()+Release() calls by instead, in the shared block case, wrapping
the cache Release() cleanup in a refcounted object referenced by the
PinnableSlices, such that after the last wrapped reference is released,
the cache entry is Release()ed. Relaxed atomic refcounts should be
much faster than mutex-guarded Ref() and Release(), and much less prone
to a performance cliff when MultiGet() does a lot of block sharing.

Note that I did not use std::shared_ptr, because that would require an
extra indirection object (shared_ptr itself new/delete) in order to
associate a ref increment/decrement with a Cleanable cleanup entry. (If
I assumed it was the size of two pointers, I could do some hackery to
make it work without the extra indirection, but that's too fragile.)

Some details:
* Fixed (removed) extra block cache tracing entries in cases of cache
entry reuse in MultiGet, but it's likely that in some other cases traces
are missing (XXX comment inserted)
* Moved existing implementations for cleanable.h from iterator.cc to
new cleanable.cc
* Improved API comments on Cleanable
* Added a public SharedCleanablePtr class to cleanable.h in case others
could benefit from the same pattern (potentially many Cleanables and/or
smart pointers referencing a shared Cleanable)
* Add a typedef for MultiGetContext::Mask
* Some variable renaming for clarity

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9899

Test Plan:
Added unit tests for SharedCleanablePtr.

Greatly enhanced ability of existing tests to detect cache use-after-free.
* Release PinnableSlices from MultiGet as they are read rather than in
bulk (in db_test_util wrapper).
* In ASAN build, default to using a trivially small LRUCache for block_cache
so that entries are immediately erased when unreferenced. (Updated two
tests that depend on caching.) New ASAN testsuite running time seems
OK to me.

If I introduce a bug into my implementation where we skip the shared
cleanups on block reuse, ASAN detects the bug in
`db_basic_test *MultiGet*`. If I remove either of the above testing
enhancements, the bug is not detected.

Consider for follow-up work: manipulate or randomize ordering of
PinnableSlice use and release from MultiGet db_test_util wrapper. But in
typical cases, natural ordering gives pretty good functional coverage.

Performance test:
In the extreme (but possible) case of MultiGetting the same or adjacent keys
in a batch, throughput can improve by an order of magnitude.
`./db_bench -benchmarks=multireadrandom -db=/dev/shm/testdb -readonly -num=5 -duration=10 -threads=20 -multiread_batched -batch_size=200`
Before ops/sec, num=5: 1,384,394
Before ops/sec, num=500: 6,423,720
After ops/sec, num=500: 10,658,794
After ops/sec, num=5: 16,027,257

Also note that previously, with high parallelism, having query keys
concentrated in a single block was worse than spreading them out a bit. Now
concentrated in a single block is faster than spread out, which is hopefully
consistent with natural expectation.

Random query performance: with num=1000000, over 999 x 10s runs running before & after simultaneously (each -threads=12):
Before: multireadrandom [AVG    999 runs] : 1088699 (± 7344) ops/sec;  120.4 (± 0.8 ) MB/sec
After: multireadrandom [AVG    999 runs] : 1090402 (± 7230) ops/sec;  120.6 (± 0.8 ) MB/sec
Possibly better, possibly in the noise.

Reviewed By: anand1976

Differential Revision: D35907003

Pulled By: pdillinger

fbshipit-source-id: bbd244d703649a8ca12d476f2d03853ed9d1a17e
2022-04-26 21:59:24 -07:00
..
aligned_buffer.h Fix wrong comments about function TruncateToPageBoundary. (#6975) 2020-10-07 12:34:34 -07:00
autovector_test.cc Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
autovector.h Add manifest fix-up utility for file temperatures (#9683) 2022-03-18 16:35:51 -07:00
bloom_impl.h FilterPolicy API changes for 7.0 (#9501) 2022-02-08 13:56:46 -08:00
bloom_test.cc Account memory of big memory users in BlockBasedTable in global memory limit (#9748) 2022-04-06 10:33:00 -07:00
build_version.cc.in Plugin Registry (#7949) 2022-04-11 13:44:09 -07:00
cast_util.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
channel.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
cleanable.cc Eliminate unnecessary (slow) block cache Ref()ing in MultiGet (#9899) 2022-04-26 21:59:24 -07:00
coding_lean.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
coding_test.cc Fix potential overflow of unsigned type in for loop (#6902) 2020-06-02 15:05:07 -07:00
coding.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
coding.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
compaction_job_stats_impl.cc Update compaction statistics to include the amount of data read from blob files (#8022) 2021-03-04 00:43:48 -08:00
comparator.cc Return different Status based on ObjectRegistry::NewObject calls (#9333) 2022-02-11 05:11:24 -08:00
compression_context_cache.cc Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
compression_context_cache.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
compression.cc Integrate WAL compression into log reader/writer. (#9642) 2022-03-09 15:49:53 -08:00
compression.h Fix minimum libzstd version that supports ZSTD_STREAMING (#9841) 2022-04-14 11:05:39 -07:00
concurrent_task_limiter_impl.cc Remove TaskLimiterToken::ReleaseOnce for fix (#8567) 2021-07-21 17:37:53 -07:00
concurrent_task_limiter_impl.h Remove TaskLimiterToken::ReleaseOnce for fix (#8567) 2021-07-21 17:37:53 -07:00
core_local.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
crc32c_arm64.cc Mac M1 crc32 intrinsics ARM64 check support proposal (#7893) 2021-03-10 09:05:56 -08:00
crc32c_arm64.h Fix compilation on Apple Silicon (#7714) 2020-12-04 15:22:33 -08:00
crc32c_ppc_asm.S Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_ppc.c Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
crc32c_test.cc Implementation of Crc32c combine function (#8305) 2021-06-16 18:30:34 -07:00
crc32c.cc Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
crc32c.h Implementation of Crc32c combine function (#8305) 2021-06-16 18:30:34 -07:00
defer_test.cc Fix insecure internal API for GetImpl (#8590) 2021-07-29 17:23:01 -07:00
defer.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
duplicate_detector.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
dynamic_bloom_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
dynamic_bloom.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
dynamic_bloom.h Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-27 14:55:04 -08:00
fastrange.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
file_checksum_helper.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
file_checksum_helper.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
file_reader_writer_test.cc Disallow a combination of options (#9348) 2022-01-27 19:30:24 -08:00
filelock_test.cc Fix MSVC-related build issues (#7439) 2020-10-01 09:23:04 -07:00
filter_bench.cc Refactor FilterPolicies toward Customizable (#9567) 2022-02-16 08:30:03 -08:00
gflags_compat.h Require C++17 (#9481) 2022-02-04 17:13:10 -08:00
hash128.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
hash_containers.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash_map.h Change HashMap::Insert()'s value to a const reference (#6567) 2020-03-20 14:59:54 -07:00
hash_test.cc Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
heap_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
heap.h Avoid self-move-assign in pop operation of binary heap. (#7942) 2021-02-19 13:47:25 -08:00
kv_map.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
log_write_bench.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
math128.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
math.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
murmurhash.cc C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
murmurhash.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
mutexlock.h Prevents Table Cache to open same files more times (#6707) 2020-04-21 13:16:31 -07:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
random.cc Add De/Serialization for CompactionInput/Result (#8247) 2021-05-12 12:36:43 -07:00
random.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
rate_limiter_test.cc Make RateLimiter Customizable (#9141) 2021-12-01 06:57:02 -08:00
rate_limiter.cc remove unused instance variable in GenericRateLimiter (#9484) 2022-02-01 14:04:12 -08:00
rate_limiter.h remove unused instance variable in GenericRateLimiter (#9484) 2022-02-01 14:04:12 -08:00
repeatable_thread_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
repeatable_thread.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
ribbon_alg.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.cc Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_impl.h Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
ribbon_test.cc Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
set_comparator.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
slice_test.cc Require C++17 (#9481) 2022-02-04 17:13:10 -08:00
slice_transform_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
slice.cc Fix LITE build for SliceTransform::AsString (#9460) 2022-01-27 16:58:22 -08:00
status.cc Combine data members of IOStatus with Status (#9549) 2022-02-22 11:23:01 -08:00
stderr_logger.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
stop_watch.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
string_util.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
string_util.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
thread_guard.h Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
thread_list_test.cc fix thread status synchronization in thread_list_test (#7825) 2021-01-04 10:46:24 -08:00
thread_local_test.cc Add StartThread type checking wrapper (#8303) 2021-05-19 16:51:13 -07:00
thread_local.cc Fix typo in ThreadData comment (#7131) 2020-07-15 09:23:23 -07:00
thread_local.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
thread_operation.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
threadpool_imp.cc Remove incremental ID from background thread pool names (#9165) 2021-11-16 18:26:12 -08:00
threadpool_imp.h Make it able to lower cpu priority to specific level in threadpool (#6969) 2020-06-13 13:25:20 -07:00
timer_queue_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
timer_queue.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
timer_test.cc Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
timer.h Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
user_comparator_wrapper.h Enable backward iterator for keys with user-defined timestamp (#8035) 2021-03-10 11:15:46 -08:00
vector_iterator.h Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
work_queue_test.cc Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
work_queue.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
xxhash.cc Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
xxhash.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
xxph3.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00