rocksdb/table/block_based
Peter Dillinger 57f3032285 Allow fractional bits/key in BloomFilterPolicy (#6092)
Summary:
There's no technological impediment to allowing the Bloom
filter bits/key to be non-integer (fractional/decimal) values, and it
provides finer control over the memory vs. accuracy trade-off. This is
especially handy when using the format_version=5 Bloom filter in place
of the old one, because bits_per_key=9.55 provides the same accuracy as
the old bits_per_key=10.

This change not only required refining the logic for choosing the best
num_probes for a given bits/key setting, but also revealed a flaw in that
logic.
As bits/key gets higher, the best num_probes for a cache-local Bloom
filter is closer to bpk / 2 than to bpk * 0.69, the best choice for a
standard Bloom filter. For example, at 16 bits per key, the best
num_probes is 9 (FP rate = 0.0843%) not 11 (FP rate = 0.0884%).
This change fixes and refines that logic (for the format_version=5
Bloom filter only, just in case) based on empirical tests to find
accuracy inflection points between each num_probes.
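
For context on the "bpk * 0.69" rule mentioned above, here is a minimal
sketch of the textbook false-positive estimate for a standard (not
cache-local) Bloom filter, (1 - e^(-k/bpk))^k, together with a brute-force
search for the best probe count k. The function names are hypothetical and
are not part of RocksDB. The sketch does not model the cache-local filter,
so it does not reproduce the FP rates quoted in this commit message; it only
shows why roughly bpk * ln(2) probes is the usual choice for a standard
filter.

```cpp
#include <cassert>
#include <cmath>

// Textbook false-positive estimate for a standard Bloom filter with
// `num_probes` probes and `bits_per_key` bits of filter per key.
// Hypothetical helper for illustration; not a RocksDB function.
double StandardBloomFpRate(double bits_per_key, int num_probes) {
  assert(bits_per_key > 0.0 && num_probes > 0);
  return std::pow(1.0 - std::exp(-num_probes / bits_per_key), num_probes);
}

// Brute-force search over probe counts 1..max_probes for the lowest
// estimated false-positive rate under the standard formula above.
int BestStandardNumProbes(double bits_per_key, int max_probes = 30) {
  int best = 1;
  for (int k = 2; k <= max_probes; ++k) {
    if (StandardBloomFpRate(bits_per_key, k) <
        StandardBloomFpRate(bits_per_key, best)) {
      best = k;
    }
  }
  return best;
}
```

Under this standard formula, BestStandardNumProbes(16.0) selects 11 probes,
which matches the "bpk * 0.69" figure the message contrasts with the
cache-local result of 9.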

Although bits_per_key is now specified as a double, the new Bloom
filter converts/rounds this to "millibits / key" for predictable/precise
internal computations. Just in case of unforeseen compatibility
issues, we round to the nearest whole number bits / key for the
legacy Bloom filter, so as not to unlock new behaviors for it.
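
The two rounding rules described above can be sketched as follows. This is
an illustration only: the helper names are hypothetical, and the actual
conversion in filter_policy.cc may be written differently.

```cpp
#include <cassert>
#include <cmath>

// New (format_version=5) filter: round the double setting to an integer
// number of millibits per key so later computations use exact integers.
// Hypothetical helper; not the RocksDB implementation.
int ToMillibitsPerKey(double bits_per_key) {
  assert(bits_per_key >= 0.0);
  return static_cast<int>(std::lround(bits_per_key * 1000.0));
}

// Legacy filter: round to the nearest whole number of bits per key so
// fractional settings do not unlock new behavior.
// Hypothetical helper; not the RocksDB implementation.
int ToWholeBitsPerKey(double bits_per_key) {
  assert(bits_per_key >= 0.0);
  return static_cast<int>(std::lround(bits_per_key));
}
```

With these helpers, a setting of 9.55 becomes 9550 millibits per key for the
new filter and 10 whole bits per key for the legacy filter.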

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6092

Test Plan: unit tests included

Differential Revision: D18711313

Pulled By: pdillinger

fbshipit-source-id: 1aa73295f152a995328cb846ef9157ae8a05522a
2019-11-26 15:59:34 -08:00
block_based_filter_block_test.cc Prepare filter tests for more implementations (#5967) 2019-10-31 14:12:33 -07:00
block_based_filter_block.cc Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705) 2019-08-14 18:16:06 -07:00
block_based_filter_block.h Use delete to disable automatic generated methods. (#5009) 2019-09-11 18:09:00 -07:00
block_based_table_builder.cc New Bloom filter implementation for full and partitioned filters (#6007) 2019-11-13 16:44:01 -08:00
block_based_table_builder.h Use delete to disable automatic generated methods. (#5009) 2019-09-11 18:09:00 -07:00
block_based_table_factory.cc Allow fractional bits/key in BloomFilterPolicy (#6092) 2019-11-26 15:59:34 -08:00
block_based_table_factory.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_based_table_reader.cc Fix a buffer overrun problem in BlockBasedTable::MultiGet (#6014) 2019-11-11 16:59:15 -08:00
block_based_table_reader.h Fix a buffer overrun problem in BlockBasedTable::MultiGet (#6014) 2019-11-11 16:59:15 -08:00
block_builder.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_builder.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_prefix_index.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
block_prefix_index.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_test.cc upgrade gtest 1.7.0 => 1.8.1 for json result writing 2019-09-09 11:24:11 -07:00
block_type.h Make the 'block read count' performance counters consistent (#5484) 2019-06-18 19:03:24 -07:00
block.cc Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
block.h Add class comment for Block 2019-09-24 11:02:11 -07:00
cachable_entry.h Move the filter readers out of the block cache (#5504) 2019-07-16 13:14:58 -07:00
data_block_footer.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
data_block_footer.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
data_block_hash_index_test.cc New API to get all merge operands for a Key (#5604) 2019-08-06 14:26:44 -07:00
data_block_hash_index.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
data_block_hash_index.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
filter_block_reader_common.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
filter_block_reader_common.h Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705) 2019-08-14 18:16:06 -07:00
filter_block.h Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
filter_policy_internal.h Allow fractional bits/key in BloomFilterPolicy (#6092) 2019-11-26 15:59:34 -08:00
filter_policy.cc Allow fractional bits/key in BloomFilterPolicy (#6092) 2019-11-26 15:59:34 -08:00
flush_block_policy.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
flush_block_policy.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
full_filter_block_test.cc New Bloom filter implementation for full and partitioned filters (#6007) 2019-11-13 16:44:01 -08:00
full_filter_block.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
full_filter_block.h Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
index_builder.cc Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
index_builder.h Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
mock_block_based_table.h filter_bench - a prelim tool for SST filter benchmarking (#5825) 2019-10-07 20:10:53 -07:00
parsed_full_filter_block.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
parsed_full_filter_block.h New Bloom filter implementation for full and partitioned filters (#6007) 2019-11-13 16:44:01 -08:00
partitioned_filter_block_test.cc New Bloom filter implementation for full and partitioned filters (#6007) 2019-11-13 16:44:01 -08:00
partitioned_filter_block.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
partitioned_filter_block.h Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
uncompression_dict_reader.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
uncompression_dict_reader.h Revert to storing UncompressionDicts in the cache (#5645) 2019-08-23 08:27:30 -07:00