rocksdb

Peter Dillinger 8aa99fc71e Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317)	2020-01-20 21:31:47 -08:00

Summary: With many millions of keys, the old Bloom filter implementation for the block-based table (format_version <= 4) would have excessive FP rate due to the limitations of feeding the Bloom filter with a 32-bit hash. This change computes an estimated inflated FP rate due to this effect and warns in the log whenever an SST filter is constructed (almost certainly a "full" not "partitioned" filter) that exceeds 1.5x FP rate due to this effect. The detailed condition is only checked if 3 million keys or more have been added to a filter, as this should be a lower bound for common bits/key settings (< 20).

Recommended remedies include smaller SST file size, using format_version >= 5 (for new Bloom filter), or using partitioned filters.

This does not change behavior other than generating warnings for some constructed filters using the old implementation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6317

Test Plan: Example with warning, 15M keys @ 15 bits / key (working_mem_size_mb is just to stop after building one filter if it's large):

    $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=15000000 2>&1 | grep 'FP rate'
    [WARN] [/block_based/filter_policy.cc:292] Using legacy SST/BBT Bloom filter with excessive key count (15.0M @ 15bpk), causing estimated 1.8x higher filter FP rate. Consider using new Bloom with format_version>=5, smaller SST file size, or partitioned filters.
    Predicted FP rate %: 0.766702
    Average FP rate %: 0.66846

Example without warning (150K keys):

    $ ./filter_bench -quick -impl=0 -working_mem_size_mb=1 -bits_per_key=15 -average_keys_per_filter=150000 2>&1 | grep 'FP rate'
    Predicted FP rate %: 0.422857
    Average FP rate %: 0.379301

With more samples at 15 bits/key:

    150K keys -> no warning; actual: 0.379% FP rate (baseline)
    1M keys -> no warning; actual: 0.396% FP rate, 1.045x
    9M keys -> no warning; actual: 0.563% FP rate, 1.485x
    10M keys -> warning (1.5x); actual: 0.564% FP rate, 1.488x
    15M keys -> warning (1.8x); actual: 0.668% FP rate, 1.76x
    25M keys -> warning (2.4x); actual: 0.880% FP rate, 2.32x

At 10 bits/key:

    150K keys -> no warning; actual: 1.17% FP rate (baseline)
    1M keys -> no warning; actual: 1.16% FP rate
    10M keys -> no warning; actual: 1.32% FP rate, 1.13x
    25M keys -> no warning; actual: 1.63% FP rate, 1.39x
    35M keys -> warning (1.6x); actual: 1.81% FP rate, 1.55x

At 5 bits/key:

    150K keys -> no warning; actual: 9.32% FP rate (baseline)
    25M keys -> no warning; actual: 9.62% FP rate, 1.03x
    200M keys -> no warning; actual: 12.2% FP rate, 1.31x
    250M keys -> warning (1.5x); actual: 12.8% FP rate, 1.37x
    300M keys -> warning (1.6x); actual: 13.4% FP rate, 1.43x

The reason for the modest inaccuracy at low bits/key is that the assumption of independence between a collision between 32-bit hash values feeding the filter and an FP in the filter is not quite true for implementations using "simple" logic to compute indices from the stock hash result. There's math on this in my dissertation, but I don't think it's worth the effort just for these extreme cases (> 100 million keys and low-ish bits/key).

Differential Revision: D19471715
Pulled By: pdillinger
fbshipit-source-id: f80c96893a09bf1152630ff0b964e5cdd7e35c68
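
The estimate described in the commit message above can be sketched roughly as follows. This is a minimal illustration, not the code added to bloom_impl.h: the function names (HashCollisionRate32, CombineIndependent, EstimatedFpInflation, ShouldWarn) are hypothetical, and the sketch assumes uniformly distributed 32-bit hashes plus the independence assumption that the commit message itself notes is only approximately true. The 1.5x threshold and the 3 million key cutoff are taken from the summary.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// NOTE: hypothetical helper names for illustration; not RocksDB's actual API.

// Probability that a queried key's 32-bit hash equals the hash of at least
// one of `keys` previously added keys, assuming uniformly distributed hashes.
// Exact form is 1 - (1 - 2^-32)^keys, approximated here as 1 - exp(-keys / 2^32).
inline double HashCollisionRate32(uint64_t keys) {
  const double hash_space = 4294967296.0;  // 2^32
  return -std::expm1(-static_cast<double>(keys) / hash_space);
}

// Probability that at least one of two events occurs, treating them as
// independent (the simplifying assumption discussed in the commit message).
inline double CombineIndependent(double p1, double p2) {
  assert(p1 >= 0.0 && p1 <= 1.0);
  assert(p2 >= 0.0 && p2 <= 1.0);
  return p1 + p2 - p1 * p2;
}

// Ratio of the estimated FP rate (filter FP or hash collision) to the base
// FP rate the filter would have with an ideal, collision-free hash.
inline double EstimatedFpInflation(double base_fp_rate, uint64_t keys) {
  assert(base_fp_rate > 0.0 && base_fp_rate <= 1.0);
  return CombineIndependent(base_fp_rate, HashCollisionRate32(keys)) /
         base_fp_rate;
}

// Warn only for large filters (3 million keys or more) whose estimated
// inflation reaches 1.5x, mirroring the condition stated in the summary.
inline bool ShouldWarn(double base_fp_rate, uint64_t keys) {
  return keys >= 3000000 && EstimatedFpInflation(base_fp_rate, keys) >= 1.5;
}
```

Under these assumptions, a base FP rate of about 0.42% (close to the predicted rate for 150K keys at 15 bits/key) with 15M keys gives an inflation of roughly 1.8x, in line with the warning shown in the test plan.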
..
aligned_buffer.h	Document AlignedBuffer (#5345 )	2019-05-24 10:05:40 -07:00
autovector_test.cc	Move some memory related files from util/ to memory/ (#5382 )	2019-05-30 17:44:09 -07:00
autovector.h	Fix the constness issues around autovector::iterator_impl's dereference operators (#6057 )	2019-11-22 21:23:00 -08:00
bloom_impl.h	Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317 )	2020-01-20 21:31:47 -08:00
bloom_test.cc	Expose and elaborate FilterBuildingContext (#6088 )	2019-11-26 18:24:10 -08:00
build_version.cc.in	Add copyright headers per FB open-source checkup tool. (#5199 )	2019-04-18 10:55:01 -07:00
build_version.h	Change RocksDB License	2017-07-15 16:11:23 -07:00
cast_util.h	Add a missing "once" in .h	2017-07-31 12:12:03 -07:00
channel.h	Fix build breakage from lock_guard error (#6161 )	2019-12-12 13:50:27 -08:00
coding_test.cc	Move test related files under util/ to test_util/ (#5377 )	2019-05-30 11:25:51 -07:00
coding.cc	Enable MSVC W4 with a few exceptions. Fix warnings and bugs	2017-10-19 10:57:12 -07:00
coding.h	Avoid user key copying for Get/Put/Write with user-timestamp (#5502 )	2019-07-25 15:27:39 -07:00
compaction_job_stats_impl.cc	Refresh snapshot list during long compactions (2nd attempt) (#5278 )	2019-05-03 17:30:22 -07:00
comparator.cc	Add support for timestamp in Get/Put (#5079 )	2019-06-05 23:10:47 -07:00
compression_context_cache.cc	run make format for PR 3838 (#3954 )	2018-06-05 12:58:02 -07:00
compression_context_cache.h	run make format for PR 3838 (#3954 )	2018-06-05 12:58:02 -07:00
compression.h	crash_test to cover bottommost compression and some other changes (#6215 )	2019-12-20 16:14:52 -08:00
concurrent_task_limiter_impl.cc	Compaction limiter miscs (#4795 )	2018-12-26 13:59:35 -08:00
concurrent_task_limiter_impl.h	Apply formatter on recent 45 commits. (#5827 )	2019-09-19 12:34:17 -07:00
core_local.h	Change RocksDB License	2017-07-15 16:11:23 -07:00
crc32c_arm64.cc	Apply formatter to recent 200+ commits. (#5830 )	2019-09-20 12:04:26 -07:00
crc32c_arm64.h	Apply formatter to recent 200+ commits. (#5830 )	2019-09-20 12:04:26 -07:00
crc32c_ppc_asm.S	Remove PATENTS text from a few straggler files (#5326 )	2019-05-21 16:22:35 -07:00
crc32c_ppc_constants.h	Remove PATENTS text from a few straggler files (#5326 )	2019-05-21 16:22:35 -07:00
crc32c_ppc.c	C file should not include <cinttypes>, it is a C++ header. (#5499 )	2019-06-24 16:12:39 -07:00
crc32c_ppc.h	Remove PATENTS text from a few straggler files (#5326 )	2019-05-21 16:22:35 -07:00
crc32c_test.cc	Move test related files under util/ to test_util/ (#5377 )	2019-05-30 11:25:51 -07:00
crc32c.cc	Cleanup the Arm64 CRC32 unused warning (#5565 )	2019-07-15 11:20:26 -07:00
crc32c.h	Updated CRC32 Power Optimization Changes	2017-08-31 14:16:30 -07:00
duplicate_detector.h	simplify include directive involving inttypes (#5402 )	2019-06-06 13:56:07 -07:00
dynamic_bloom_test.cc	Apply formatter to recent 200+ commits. (#5830 )	2019-09-20 12:04:26 -07:00
dynamic_bloom.cc	Apply formatter to recent 200+ commits. (#5830 )	2019-09-20 12:04:26 -07:00
dynamic_bloom.h	MultiGet batching in memtable (#5818 )	2019-10-10 09:39:39 -07:00
file_reader_writer_test.cc	Introduce a new storage specific Env API (#5761 )	2019-12-13 14:48:41 -08:00
filelock_test.cc	Move some memory related files from util/ to memory/ (#5382 )	2019-05-30 17:44:09 -07:00
filter_bench.cc	Warn on excessive keys for legacy Bloom filter with 32-bit hash (#6317 )	2020-01-20 21:31:47 -08:00
gflags_compat.h	filter_bench - a prelim tool for SST filter benchmarking (#5825 )	2019-10-07 20:10:53 -07:00
hash_map.h	Change RocksDB License	2017-07-15 16:11:23 -07:00
hash_test.cc	Add new persistent 64-bit hash (#5984 )	2019-10-31 16:36:35 -07:00
hash.cc	Add new persistent 64-bit hash (#5984 )	2019-10-31 16:36:35 -07:00
hash.h	Add new persistent 64-bit hash (#5984 )	2019-10-31 16:36:35 -07:00
heap_test.cc	fix gflags namespace	2017-12-01 10:42:05 -08:00
heap.h	Add compaction logic to RangeDelAggregatorV2 (#4758 )	2018-12-17 13:20:51 -08:00
kv_map.h	Consolidate hash function used for non-persistent data in a new function (#5155 )	2019-04-08 13:32:06 -07:00
log_write_bench.cc	Divide file_reader_writer.h and .cc (#5803 )	2019-09-16 10:33:51 -07:00
murmurhash.cc	Add GCC 8 to Travis (#3433 )	2018-07-13 10:58:06 -07:00
murmurhash.h	Change RocksDB License	2017-07-15 16:11:23 -07:00
mutexlock.h	Apply formatter on recent 45 commits. (#5827 )	2019-09-19 12:34:17 -07:00
ppc-opcode.h	Remove PATENTS text from a few straggler files (#5326 )	2019-05-21 16:22:35 -07:00
random_test.cc	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154 )	2019-12-13 14:30:14 -08:00
random.cc	Change RocksDB License	2017-07-15 16:11:23 -07:00
random.h	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154 )	2019-12-13 14:30:14 -08:00
rate_limiter_test.cc	Apply formatter to recent 200+ commits. (#5830 )	2019-09-20 12:04:26 -07:00
rate_limiter.cc	Move some memory related files from util/ to memory/ (#5382 )	2019-05-30 17:44:09 -07:00
rate_limiter.h	rate limit auto-tuning	2017-10-04 19:15:01 -07:00
repeatable_thread_test.cc	Move some memory related files from util/ to memory/ (#5382 )	2019-05-30 17:44:09 -07:00
repeatable_thread.h	Move test related files under util/ to test_util/ (#5377 )	2019-05-30 11:25:51 -07:00
set_comparator.h	WritePrepared Txn: Move DuplicateDetector to util	2018-03-05 23:57:12 -08:00
slice_transform_test.cc	Move test related files under util/ to test_util/ (#5377 )	2019-05-30 11:25:51 -07:00
slice.cc	Apply modernize-use-override (2nd iteration)	2019-02-14 14:41:36 -08:00
status.cc	Work around weird unused errors with Mingw (#6075 )	2019-11-26 21:42:29 -08:00
stderr_logger.h	Change RocksDB License	2017-07-15 16:11:23 -07:00
stop_watch.h	Make statistics's stats_level change thread-safe (#5030 )	2019-03-01 10:42:09 -08:00
string_util.cc	Refactor trimming logic for immutable memtables (#5022 )	2019-08-23 13:55:34 -07:00
string_util.h	Refactor trimming logic for immutable memtables (#5022 )	2019-08-23 13:55:34 -07:00
thread_list_test.cc	Move test related files under util/ to test_util/ (#5377 )	2019-05-30 11:25:51 -07:00
thread_local_test.cc	Fix thread_local_test failure caused by recent io_uring change (#6136 )	2019-12-09 12:03:30 -08:00
thread_local.cc	Enable building of ARM32 (#4349 )	2018-10-09 16:58:25 -07:00
thread_local.h	Provide a way to override windows memory allocator with jemalloc for ZSTD	2018-06-04 12:12:48 -07:00
thread_operation.h	Add inline comments to flush job (#4464 )	2018-10-05 15:41:17 -07:00
threadpool_imp.cc	Apply formatter to recent 200+ commits. (#5830 )	2019-09-20 12:04:26 -07:00
threadpool_imp.h	Support lowering CPU priority of background threads	2018-04-24 08:41:51 -07:00
timer_queue_test.cc	Change RocksDB License	2017-07-15 16:11:23 -07:00
timer_queue.h	Move test related files under util/ to test_util/ (#5377 )	2019-05-30 11:25:51 -07:00
user_comparator_wrapper.h	Fix perf_context.user_key_comparison_count for range scan (#5098 )	2019-03-27 10:34:27 -07:00
util.h	Add GCC 8 to Travis (#3433 )	2018-07-13 10:58:06 -07:00
vector_iterator.h	Make clang-analyzer happy (#5821 )	2019-09-18 15:25:48 -07:00
xxh3p.h	Add new persistent 64-bit hash (#5984 )	2019-10-31 16:36:35 -07:00
xxhash.cc	Add new persistent 64-bit hash (#5984 )	2019-10-31 16:36:35 -07:00
xxhash.h	Misc hashing updates / upgrades (#5909 )	2019-10-24 17:16:46 -07:00