rocksdb/table/block_based
Peter Dillinger 2a383f21f4 Add Bloom/Ribbon hybrid API support (#8679)
Summary:
This is essentially resurrection and fixing of the part of
https://github.com/facebook/rocksdb/issues/8198 that was reverted in https://github.com/facebook/rocksdb/issues/8212, using data added in https://github.com/facebook/rocksdb/issues/8246. Basically,
when configuring Ribbon filter, you can specify an LSM level before which
Bloom will be used instead of Ribbon. But Bloom is only considered for
Leveled and Universal compaction styles and file going into a known LSM
level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as
you would expect with NewRibbonFilterPolicy.

So that this can be controlled with a single int value and so that flushes
can be distinguished from intra-L0, we consider flush to go to level -1 for
the purposes of this option. (Explained in API comment.)

I also expect the most common and recommended Ribbon configuration to
use Bloom during flush, to minimize slowing down writes and because according
to my estimates, Ribbon only pays off if the structure lives in memory for
more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy
to be this mild hybrid configuration. I don't really want to add something like
NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for
flush, Ribbon otherwise) should be considered a natural choice.

C APIs also updated, but because they don't support overloading,
rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and
rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid
configuration. While touching C API, I changed bits per key options from
int to double.

BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit
unused fields from BloomFilterPolicy.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8679

Test Plan: new + updated tests, including crash test

Reviewed By: jay-zhuang

Differential Revision: D30445797

Pulled By: pdillinger

fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1
2021-08-20 18:00:16 -07:00
..
binary_search_index_reader.cc Separate internal and user key comparators in BlockIter (#6944) 2020-07-07 17:26:16 -07:00
binary_search_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
block_based_filter_block_test.cc Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
block_based_filter_block.cc Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
block_based_filter_block.h Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
block_based_table_builder.cc Stable cache keys using DB session ids in SSTs (#8659) 2021-08-16 20:37:20 -07:00
block_based_table_builder.h Cache warming blocks during flush (#8561) 2021-08-03 12:44:15 -07:00
block_based_table_factory.cc Dynamically configure BlockBasedTableOptions.prepopulate_block_cache (#8620) 2021-08-05 19:44:51 -07:00
block_based_table_factory.h Refactor: use TableBuilderOptions to reduce parameter lists (#8240) 2021-04-29 07:00:50 -07:00
block_based_table_iterator.cc Clean up InternalIterator upper bound logic a little bit (#7200) 2020-08-05 10:44:57 -07:00
block_based_table_iterator.h Exclude timestamp from prefix extractor (#7668) 2020-12-01 14:07:15 -08:00
block_based_table_reader_impl.h Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
block_based_table_reader_test.cc Update the block_read_count/block_read_byte counters in MultiGet (#8676) 2021-08-20 11:50:42 -07:00
block_based_table_reader.cc Update the block_read_count/block_read_byte counters in MultiGet (#8676) 2021-08-20 11:50:42 -07:00
block_based_table_reader.h Stable cache keys using DB session ids in SSTs (#8659) 2021-08-16 20:37:20 -07:00
block_builder.cc Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
block_builder.h Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
block_like_traits.h fix lru caching test and fix reference binding to null pointer (#8326) 2021-05-24 08:37:00 -07:00
block_prefetcher.cc Improve BlockPrefetcher to prefetch only for sequential scans (#7394) 2021-04-28 12:53:46 -07:00
block_prefetcher.h Improve BlockPrefetcher to prefetch only for sequential scans (#7394) 2021-04-28 12:53:46 -07:00
block_prefix_index.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
block_prefix_index.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
block_test.cc More Makefile Cleanup (#7097) 2020-07-09 14:35:17 -07:00
block_type.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
block.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
block.h Add EnvTestWithParam::OptionsTest to the ASSERT_STATUS_CHECKED passes (#7283) 2020-08-20 19:18:35 -07:00
cachable_entry.h Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
data_block_footer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_footer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index_test.cc Make it possible to apply only a subrange of table property collectors (#8298) 2021-05-17 18:28:39 -07:00
data_block_hash_index.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
filter_block_reader_common.cc Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
filter_block_reader_common.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
filter_block.h Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
filter_policy_internal.h Add Bloom/Ribbon hybrid API support (#8679) 2021-08-20 18:00:16 -07:00
filter_policy.cc Add Bloom/Ribbon hybrid API support (#8679) 2021-08-20 18:00:16 -07:00
flush_block_policy.cc Make FlushBlockPolicyFactory into a Customizable class (#8432) 2021-07-12 09:04:59 -07:00
flush_block_policy.h Make FlushBlockPolicyFactory into a Customizable class (#8432) 2021-07-12 09:04:59 -07:00
full_filter_block_test.cc Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
full_filter_block.cc Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
full_filter_block.h Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
hash_index_reader.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
hash_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
index_builder.cc Move break into block (#7468) 2020-09-30 20:24:23 -07:00
index_builder.h Make db_basic_test pass assert status checked (#7452) 2020-09-29 09:49:04 -07:00
index_reader_common.cc Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
index_reader_common.h Divide block_based_table_reader.cc (#6527) 2020-03-12 21:41:50 -07:00
mock_block_based_table.h Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
parsed_full_filter_block.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
parsed_full_filter_block.h Use new Insert and Lookup APIs in table reader to support secondary cache (#8315) 2021-05-21 18:29:12 -07:00
partitioned_filter_block_test.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
partitioned_filter_block.cc Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
partitioned_filter_block.h Add table properties for number of entries added to filters (#8323) 2021-05-21 17:11:32 -07:00
partitioned_index_iterator.cc Fix misspelling of PartitionedIndexIterator (#7450) 2020-09-29 16:28:13 -07:00
partitioned_index_iterator.h Fix misspelling of PartitionedIndexIterator (#7450) 2020-09-29 16:28:13 -07:00
partitioned_index_reader.cc Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
partitioned_index_reader.h Get() to fail with underlying failures in PartitionIndexReader::CacheDependencies() (#7297) 2020-08-25 19:01:05 -07:00
reader_common.cc Fix block checksum for >=4GB, refactor (#6978) 2020-06-19 16:18:24 -07:00
reader_common.h Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
uncompression_dict_reader.cc Parallelize secondary cache lookup in MultiGet (#8405) 2021-06-18 09:35:59 -07:00
uncompression_dict_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00