rocksdb/table
Hui Xiao 920386f2b7 Detect (new) Bloom/Ribbon Filter construction corruption (#9342)
Summary:
Note: rebase on and merge after https://github.com/facebook/rocksdb/pull/9349, https://github.com/facebook/rocksdb/pull/9345, (optional) https://github.com/facebook/rocksdb/pull/9393
**Context:**
(Quoted from pdillinger) Layers of information during new Bloom/Ribbon Filter construction in building block-based tables includes the following:
a) set of keys to add to filter
b) set of hashes to add to filter (64-bit hash applied to each key)
c) set of Bloom indices to set in filter, with duplicates
d) set of Bloom indices to set in filter, deduplicated
e) final filter and its checksum

This PR aims to detect corruption (e.g, unexpected hardware/software corruption on data structures residing in the memory for a long time) from b) to e) and leave a) as future works for application level.
- b)'s corruption is detected by verifying the xor checksum of the hash entries calculated as the entries accumulate before being added to the filter. (i.e, `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()`)
- c) - e)'s corruption is detected by verifying the hash entries indeed exists in the constructed filter by re-querying these hash entries in the filter (i.e, `FilterBitsBuilder::MaybePostVerify()`) after computing the block checksum (except for PartitionFilter, which is done right after each `FilterBitsBuilder::Finish` for impl simplicity - see code comment for more). For this stage of detection, we assume hash entries are not corrupted after checking on b) since the time interval from b) to c) is relatively short IMO.

Option to enable this feature of detection is `BlockBasedTableOptions::detect_filter_construct_corruption` which is false by default.

**Summary:**
- Implemented new functions `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()` and `FilterBitsBuilder::MaybePostVerify()`
- Ensured hash entries, final filter and banding and their [cache reservation ](https://github.com/facebook/rocksdb/issues/9073) are released properly despite corruption
   - See [Filter.construction.artifacts.release.point.pdf ](https://github.com/facebook/rocksdb/files/7923487/Design.Filter.construction.artifacts.release.point.pdf) for high-level design
   -  Bundled and refactored hash entries's related artifact in XXPH3FilterBitsBuilder into `HashEntriesInfo` for better control on lifetime of these artifact during `SwapEntires`, `ResetEntries`
- Ensured RocksDB block-based table builder calls `FilterBitsBuilder::MaybePostVerify()` after constructing the filter by `FilterBitsBuilder::Finish()`
- When encountering such filter construction corruption, stop writing the filter content to files and mark such a block-based table building non-ok by storing the corruption status in the builder.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9342

Test Plan:
- Added new unit test `DBFilterConstructionCorruptionTestWithParam.DetectCorruption`
- Included this new feature in `DBFilterConstructionReserveMemoryTestWithParam.ReserveMemory` as this feature heavily touch ReserveMemory's impl
   - For fallback case, I run `./filter_bench -impl=3 -detect_filter_construct_corruption=true -reserve_table_builder_memory=true -strict_capacity_limit=true  -quick -runs 10 | grep 'Build avg'` to make sure nothing break.
- Added to `filter_bench`: increased filter construction time by **30%**, mostly by `MaybePostVerify()`
   -  FastLocalBloom
       - Before change: `./filter_bench -impl=2 -quick -runs 10 | grep 'Build avg'`: **28.86643s**
       - After change:
          -  `./filter_bench -impl=2 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless): **27.6644s (-4% perf improvement might be due to now we don't drop bloom hash entry in `AddAllEntries` along iteration but in bulk later, same with the bypassing-MaybePostVerify case below)**
          - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (expect acceptable increase): **34.41159s (+20%)**
          - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (by-passing MaybePostVerify, expect minor increase): **27.13431s (-6%)**
    -  Standard128Ribbon
       - Before change: `./filter_bench -impl=3 -quick -runs 10 | grep 'Build avg'`: **122.5384s**
       - After change:
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless - verified by removing MaybePostVerify under this case and found only +-1ns difference): **124.3588s (+2%)**
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'`(expect acceptable increase): **159.4946s (+30%)**
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'`(by-passing MaybePostVerify, expect minor increase) : **125.258s (+2%)**
- Added to `db_stress`: `make crash_test`, `./db_stress --detect_filter_construct_corruption=true`
- Manually smoke-tested: manually corrupted the filter construction in some db level tests with basic PUT and background flush. As expected, the error did get returned to users in subsequent PUT and Flush status.

Reviewed By: pdillinger

Differential Revision: D33746928

Pulled By: hx235

fbshipit-source-id: cb056426be5a7debc1cd16f23bc250f36a08ca57
2022-02-01 17:42:35 -08:00
..
adaptive More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
block_based Detect (new) Bloom/Ribbon Filter construction corruption (#9342) 2022-02-01 17:42:35 -08:00
cuckoo Optimize & clean up footer code (#9280) 2021-12-13 17:43:07 -08:00
plain Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
block_fetcher_test.cc Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
block_fetcher.cc More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
block_fetcher.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
cleanable_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
format.cc Optimize & clean up footer code (#9280) 2021-12-13 17:43:07 -08:00
format.h Optimize & clean up footer code (#9280) 2021-12-13 17:43:07 -08:00
get_context.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
get_context.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
internal_iterator.h Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056) 2021-11-10 16:20:04 -08:00
iter_heap.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
iterator_wrapper.h Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056) 2021-11-10 16:20:04 -08:00
iterator.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merger_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
merging_iterator.cc MergingIterator: rearrange fields to reduce paddings (#9024) 2021-10-14 12:01:56 -07:00
merging_iterator.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
meta_blocks.cc Add NewMetaDataIterator method (#8692) 2021-12-21 11:32:49 -08:00
meta_blocks.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
mock_table.cc Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912) 2021-10-07 11:42:31 -07:00
mock_table.h Fix some minor issues in the Customizable infrastructure (#8566) 2021-08-19 10:10:47 -07:00
multiget_context.h Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-27 14:55:04 -08:00
persistent_cache_helper.cc New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
persistent_cache_helper.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
persistent_cache_options.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
scoped_arena_iterator.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
sst_file_dumper.cc Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
sst_file_dumper.h Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
sst_file_reader_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
sst_file_reader.cc Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
sst_file_writer_collectors.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
sst_file_writer.cc Support timestamps in SstFileWriter (#8899) 2021-09-09 18:58:01 -07:00
table_builder.h Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
table_factory.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
table_properties_internal.h Improve / clean up meta block code & integrity (#9163) 2021-11-18 11:43:44 -08:00
table_properties.cc Improve / clean up meta block code & integrity (#9163) 2021-11-18 11:43:44 -08:00
table_reader_bench.cc Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
table_reader_caller.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
table_reader.h dedup ReadOptions in iterator hierarchy (#7210) 2020-08-03 15:23:04 -07:00
table_test.cc Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
two_level_iterator.cc Clarify caching behavior for index and filter partitions (#9068) 2021-10-27 17:23:04 -07:00
two_level_iterator.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
unique_id_impl.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
unique_id.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00