5f8f2fda0e
Summary:
FullFilterBitsReader, after being created in BloomFilterPolicy, was responsible for decoding metadata bits. This meant that FullFilterBitsReader::MayMatch had some metadata checks in order to implement "always true" or "always false" functionality in the case of inconsistent or trivial metadata. This made for ugly mixing-of-concerns code and probably had some runtime cost. It also didn't really support plugging in alternative filter implementations with extensions to the existing metadata schema.

BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible for decoding filter metadata bits and constructing appropriate instances deriving from FilterBitsReader. "Always false" and "always true" derived classes allow FullFilterBitsReader not to be concerned with handling of trivial or inconsistent metadata. This also makes for easy expansion to alternative filter implementations in new, alternative derived classes. This change makes calls to FilterBitsReader::MayMatch *necessarily* virtual because there is now more than one built-in implementation. Compared with the previous implementation's extra 'if' checks in MayMatch, there is no consistent performance difference as measured by (an older revision of) filter_bench (differences here seem to be within noise):

    Inside queries...
    -  Dry run (407) ns/op: 35.9996
    +  Dry run (407) ns/op: 35.2034
    -  Single filter ns/op: 47.5483
    +  Single filter ns/op: 47.4034
    -  Batched, prepared ns/op: 43.1559
    +  Batched, prepared ns/op: 42.2923
    ...
    -  Random filter ns/op: 150.697
    +  Random filter ns/op: 149.403
    ----------------------------
    Outside queries...
    -  Dry run (980) ns/op: 34.6114
    +  Dry run (980) ns/op: 34.0405
    -  Single filter ns/op: 56.8326
    +  Single filter ns/op: 55.8414
    -  Batched, prepared ns/op: 48.2346
    +  Batched, prepared ns/op: 47.5667
    -  Random filter ns/op: 155.377
    +  Random filter ns/op: 153.942
       Average FP rate %: 1.1386

Also, the FullFilterBitsReader ctor was responsible for a surprising amount of CPU in production, due in part to inefficient determination of the CACHE_LINE_SIZE used to construct the filter being read. The overwhelmingly common case (a filter built with the same CACHE_LINE_SIZE as the reader) is now substantially optimized, as shown with filter_bench with -new_reader_every=1 (old option - see below) (repeatable result):

    Inside queries...
    -  Dry run (453) ns/op: 118.799
    +  Dry run (453) ns/op: 105.869
    -  Single filter ns/op: 82.5831
    +  Single filter ns/op: 74.2509
    ...
    -  Random filter ns/op: 224.936
    +  Random filter ns/op: 194.833
    ----------------------------
    Outside queries...
    -  Dry run (aa1) ns/op: 118.503
    +  Dry run (aa1) ns/op: 104.925
    -  Single filter ns/op: 90.3023
    +  Single filter ns/op: 83.425
    ...
    -  Random filter ns/op: 220.455
    +  Random filter ns/op: 175.7
       Average FP rate %: 1.13886

However, PR #5936 has reclaimed (or will reclaim) most of this cost. After that PR, the benefit of optimizing this code path is likely negligible, but it is nonetheless clear that we aren't making performance any worse.

Also fixed an inadequate check of consistency between the filter data size and num_lines. (Unit test updated.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941

Test Plan: previously added unit tests FullBloomTest.CorruptFilters and FullBloomTest.RawSchema

Differential Revision: D18018353

Pulled By: pdillinger

fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c
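The division of labor described in the commit message (decode all metadata once in GetFilterBitsReader, then hand back an always-false, always-true, or full reader) can be sketched as follows. This is a minimal, self-contained illustration, not RocksDB's code. It assumes the legacy full-filter layout (bit array, then a 1-byte probe count, then a 4-byte little-endian num_lines), a local CACHE_LINE_SIZE of 64 bytes, and hypothetical names AlwaysFalseFilter, AlwaysTrueFilter, DecodeFixed32LE and EncodeFilterForTest. The FilterBitsReader base is a simplified stand-in, GetFilterBitsReader is a free function rather than a BloomFilterPolicy member, and Bloom probing is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <string_view>

// Simplified stand-in for FilterBitsReader (not RocksDB's real interface).
class FilterBitsReader {
 public:
  virtual ~FilterBitsReader() = default;
  // Necessarily virtual: there is more than one built-in implementation.
  virtual bool MayMatch(std::string_view key) = 0;
};

// Hypothetical name. Empty or truncated filter: treated as "no keys added".
class AlwaysFalseFilter : public FilterBitsReader {
 public:
  bool MayMatch(std::string_view) override { return false; }
};

// Hypothetical name. Inconsistent or reserved metadata: every query may match,
// so a bad filter costs extra reads but never hides a key.
class AlwaysTrueFilter : public FilterBitsReader {
 public:
  bool MayMatch(std::string_view) override { return true; }
};

// Receives already-validated parameters, so MayMatch needs no metadata checks.
// Bloom probing is omitted from this sketch; it conservatively answers true.
class FullFilterBitsReader : public FilterBitsReader {
 public:
  FullFilterBitsReader(int num_probes, uint32_t num_lines,
                       uint32_t log2_cache_line_size)
      : num_probes_(num_probes),
        num_lines_(num_lines),
        log2_cache_line_size_(log2_cache_line_size) {}
  bool MayMatch(std::string_view) override { return true; }
  int num_probes() const { return num_probes_; }
  uint32_t num_lines() const { return num_lines_; }
  uint32_t log2_cache_line_size() const { return log2_cache_line_size_; }

 private:
  int num_probes_;
  uint32_t num_lines_;
  uint32_t log2_cache_line_size_;
};

// Assumption: the local CACHE_LINE_SIZE is 64 bytes.
constexpr uint32_t kCacheLineSize = 64;
constexpr uint32_t kLog2CacheLineSize = 6;

// Hypothetical helper: reads a little-endian 32-bit value.
uint32_t DecodeFixed32LE(const char* p) {
  const auto* u = reinterpret_cast<const unsigned char*>(p);
  return uint32_t{u[0]} | (uint32_t{u[1]} << 8) | (uint32_t{u[2]} << 16) |
         (uint32_t{u[3]} << 24);
}

// Decodes all metadata up front and picks the reader implementation.
// Assumed layout: [bit array][1-byte num_probes][4-byte LE num_lines].
std::unique_ptr<FilterBitsReader> GetFilterBitsReader(std::string_view contents) {
  if (contents.size() <= 5) {
    return std::make_unique<AlwaysFalseFilter>();
  }
  const size_t len = contents.size() - 5;  // size of the bit array
  const int num_probes = static_cast<signed char>(contents[len]);
  if (num_probes < 1) {
    return std::make_unique<AlwaysTrueFilter>();  // zero or reserved value
  }
  const uint32_t num_lines = DecodeFixed32LE(contents.data() + len + 1);

  uint32_t log2_line = 0;
  if (uint64_t{num_lines} * kCacheLineSize == len) {
    log2_line = kLog2CacheLineSize;  // common case: same line size as reader
  } else if (num_lines == 0 || len % num_lines != 0) {
    return std::make_unique<AlwaysTrueFilter>();  // size vs num_lines mismatch
  } else {
    // Built with another cache line size, which must be a power of two.
    while ((uint64_t{num_lines} << log2_line) < len) ++log2_line;
    if ((uint64_t{num_lines} << log2_line) != len) {
      return std::make_unique<AlwaysTrueFilter>();
    }
  }
  return std::make_unique<FullFilterBitsReader>(num_probes, num_lines,
                                                log2_line);
}

// Hypothetical test helper: builds a filter image in the assumed layout.
std::string EncodeFilterForTest(size_t bit_bytes, int num_probes,
                                uint32_t num_lines) {
  std::string out(bit_bytes, '\0');
  out.push_back(static_cast<char>(num_probes));
  for (int i = 0; i < 4; ++i) {
    out.push_back(static_cast<char>((num_lines >> (8 * i)) & 0xFF));
  }
  return out;
}
```

For example, under these assumptions a 128-byte bit array with num_lines = 2 decodes on the fast path (64-byte lines); the same array labeled num_lines = 3 fails the divisibility check and yields an always-true reader; and a 256-byte array with num_lines = 2 is accepted as a filter from a system with 128-byte cache lines. Because each outcome is a separate class, the full reader's MayMatch can skip metadata checks entirely, at the price of a virtual call.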
aligned_buffer.h
autovector_test.cc
autovector.h
bloom_impl.h
bloom_test.cc
bloom.cc
build_version.cc.in
build_version.h
cast_util.h
channel.h
coding_test.cc
coding.cc
coding.h
compaction_job_stats_impl.cc
comparator.cc
compression_context_cache.cc
compression_context_cache.h
compression.h
concurrent_task_limiter_impl.cc
concurrent_task_limiter_impl.h
core_local.h
crc32c_arm64.cc
crc32c_arm64.h
crc32c_ppc_asm.S
crc32c_ppc_constants.h
crc32c_ppc.c
crc32c_ppc.h
crc32c_test.cc
crc32c.cc
crc32c.h
duplicate_detector.h
dynamic_bloom_test.cc
dynamic_bloom.cc
dynamic_bloom.h
file_reader_writer_test.cc
filelock_test.cc
filter_bench.cc
filter_policy.cc
gflags_compat.h
hash_map.h
hash_test.cc
hash.cc
hash.h
heap_test.cc
heap.h
kv_map.h
log_write_bench.cc
murmurhash.cc
murmurhash.h
mutexlock.h
ppc-opcode.h
random.cc
random.h
rate_limiter_test.cc
rate_limiter.cc
rate_limiter.h
repeatable_thread_test.cc
repeatable_thread.h
set_comparator.h
slice_transform_test.cc
slice.cc
status.cc
stderr_logger.h
stop_watch.h
string_util.cc
string_util.h
thread_list_test.cc
thread_local_test.cc
thread_local.cc
thread_local.h
thread_operation.h
threadpool_imp.cc
threadpool_imp.h
timer_queue_test.cc
timer_queue.h
user_comparator_wrapper.h
util.h
vector_iterator.h
xxhash.cc
xxhash.h