rocksdb/util
Hui Xiao 74544d582f Account Bloom/Ribbon filter construction memory in global memory limit (#9073)
Summary:
Note: This PR is the 4th part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073) and will rebase/merge only after the first three PRs (https://github.com/facebook/rocksdb/pull/9070, https://github.com/facebook/rocksdb/pull/9071, https://github.com/facebook/rocksdb/pull/9130) merge.

**Context:**
Similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track memory usage during (new) Bloom Filter (i.e,FastLocalBloom) and Ribbon Filter (i.e, Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`.

The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true` which by default is set to `false`. We [decided](https://github.com/facebook/rocksdb/pull/9073#discussion_r741548409) not to have separate option for separate memory user in table building therefore their memory accounting are all bundled under one general option.

**Summary:**
- Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction:
   - hash entries (i.e`hash_entries`.size(), we bucket-charge hash entries during insertion for performance),
   - banding (Ribbon Filter only, `bytes_coeff_rows` +`bytes_result_rows` + `bytes_backtrack`),
   - final filter (i.e, `mutable_buf`'s size).
      - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for final filter in `XXPH3BitsFilterBuilder` as well as  explicitly delete the filter bits builder when done with the final filter in block based table.
- Added option fo run `filter_bench` with this memory reservation feature

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9073

Test Plan:
- Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of  `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory`
  - To address the concern for slow test: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suit running locally
- Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter
- Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over **20** run as below:
   - FastLocalBloom
      - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`:
         - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0)
      - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`:
         - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0)
      - new feature of RibbonFilter with fallback  (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
         - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0)

    - Ribbon128
       - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`:
           - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0)
       - new feature  (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `:
           - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0)
       - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` :
          - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0)
          - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console.

Reviewed By: pdillinger

Differential Revision: D31991348

Pulled By: hx235

fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
2021-11-18 09:42:20 -08:00
..
aligned_buffer.h Fix wrong comments about function TruncateToPageBoundary. (#6975) 2020-10-07 12:34:34 -07:00
autovector_test.cc Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
autovector.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
bloom_impl.h Ribbon: InterleavedSolutionStorage (#7598) 2020-11-03 12:46:36 -08:00
bloom_test.cc Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
build_version.cc.in Make builds reproducible (#7866) 2021-01-28 17:42:16 -08:00
cast_util.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
channel.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
coding_lean.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
coding_test.cc Fix potential overflow of unsigned type in for loop (#6902) 2020-06-02 15:05:07 -07:00
coding.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
coding.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
compaction_job_stats_impl.cc Update compaction statistics to include the amount of data read from blob files (#8022) 2021-03-04 00:43:48 -08:00
comparator.cc Add customizable_util.h to the public API (#8301) 2021-06-29 09:08:57 -07:00
compression_context_cache.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
compression_context_cache.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
compression.h Add support for building on s390x platform (#8962) 2021-10-22 10:13:15 -07:00
concurrent_task_limiter_impl.cc Remove TaskLimiterToken::ReleaseOnce for fix (#8567) 2021-07-21 17:37:53 -07:00
concurrent_task_limiter_impl.h Remove TaskLimiterToken::ReleaseOnce for fix (#8567) 2021-07-21 17:37:53 -07:00
core_local.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
crc32c_arm64.cc Mac M1 crc32 intrinsics ARM64 check support proposal (#7893) 2021-03-10 09:05:56 -08:00
crc32c_arm64.h Fix compilation on Apple Silicon (#7714) 2020-12-04 15:22:33 -08:00
crc32c_ppc_asm.S Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_ppc.c Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
crc32c_test.cc Implementation of Crc32c combine function (#8305) 2021-06-16 18:30:34 -07:00
crc32c.cc Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
crc32c.h Implementation of Crc32c combine function (#8305) 2021-06-16 18:30:34 -07:00
defer_test.cc Fix insecure internal API for GetImpl (#8590) 2021-07-29 17:23:01 -07:00
defer.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
duplicate_detector.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
dynamic_bloom_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
dynamic_bloom.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
dynamic_bloom.h Genericize and clean up FastRange (#7436) 2020-09-28 11:35:00 -07:00
fastrange.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
file_checksum_helper.cc Make WalFilter, SstPartitionerFactory, FileChecksumGenFactory, and TableProperties Customizable (#8638) 2021-09-28 05:32:02 -07:00
file_checksum_helper.h Make WalFilter, SstPartitionerFactory, FileChecksumGenFactory, and TableProperties Customizable (#8638) 2021-09-28 05:32:02 -07:00
file_reader_writer_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
filelock_test.cc Fix MSVC-related build issues (#7439) 2020-10-01 09:23:04 -07:00
filter_bench.cc Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
gflags_compat.h Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
hash128.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
hash_map.h Change HashMap::Insert()'s value to a const reference (#6567) 2020-03-20 14:59:54 -07:00
hash_test.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
heap_test.cc Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
heap.h Avoid self-move-assign in pop operation of binary heap. (#7942) 2021-02-19 13:47:25 -08:00
kv_map.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
log_write_bench.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
math128.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
math.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
murmurhash.cc C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
murmurhash.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
mutexlock.h Prevents Table Cache to open same files more times (#6707) 2020-04-21 13:16:31 -07:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
random.cc Add De/Serialization for CompactionInput/Result (#8247) 2021-05-12 12:36:43 -07:00
random.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
rate_limiter_test.cc Return Status::NotSupported() in RateLimiter::GetTotalPendingRequests default impl (#8950) 2021-09-22 19:36:06 -07:00
rate_limiter.cc Fix BackupEngine's internal callers of GenericRateLimiter::Request() not honoring bytes <= GetSingleBurstBytes() (#9063) 2021-11-16 09:52:16 -08:00
rate_limiter.h Fix BackupEngine's internal callers of GenericRateLimiter::Request() not honoring bytes <= GetSingleBurstBytes() (#9063) 2021-11-16 09:52:16 -08:00
regex.cc Improve support for using regexes (#8740) 2021-09-07 13:05:23 -07:00
repeatable_thread_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
repeatable_thread.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
ribbon_alg.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.cc Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_impl.h Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
ribbon_test.cc Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
set_comparator.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
slice_test.cc Improve support for using regexes (#8740) 2021-09-07 13:05:23 -07:00
slice_transform_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
slice.cc Allow unregistered options to be ignored in DBOptions from files (#9045) 2021-10-19 10:43:04 -07:00
status.cc Add remote compaction public API (#8300) 2021-05-19 21:41:31 -07:00
stderr_logger.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
stop_watch.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
string_util.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
string_util.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
thread_guard.h Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
thread_list_test.cc fix thread status synchronization in thread_list_test (#7825) 2021-01-04 10:46:24 -08:00
thread_local_test.cc Add StartThread type checking wrapper (#8303) 2021-05-19 16:51:13 -07:00
thread_local.cc Fix typo in ThreadData comment (#7131) 2020-07-15 09:23:23 -07:00
thread_local.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
thread_operation.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
threadpool_imp.cc Remove incremental ID from background thread pool names (#9165) 2021-11-16 18:26:12 -08:00
threadpool_imp.h Make it able to lower cpu priority to specific level in threadpool (#6969) 2020-06-13 13:25:20 -07:00
timer_queue_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
timer_queue.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
timer_test.cc Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
timer.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
user_comparator_wrapper.h Enable backward iterator for keys with user-defined timestamp (#8035) 2021-03-10 11:15:46 -08:00
vector_iterator.h Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
work_queue_test.cc Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
work_queue.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
xxhash.cc Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
xxhash.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
xxph3.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00