rocksdb/util
Peter Dillinger a92bd0a183 Optimize memory and CPU for building new Bloom filter (#6175)
Summary:
The filter bits builder collects all the hashes to add in memory before adding them (because the number of keys is not known until we've walked over all the keys). Existing code uses a std::vector for this, which can mean up to 2x than necessary space allocated (and not freed) and up to ~2x write amplification in memory. Using std::deque uses close to minimal space (for large filters, the only time it matters), no write amplification, frees memory while building, and no need for large contiguous memory area. The only cost is more calls to allocator, which does not appear to matter, at least in benchmark test.

For now, this change only applies to the new (format_version=5) Bloom filter implementation, to ease before-and-after comparison downstream.

Temporary memory use during build is about the only way the new Bloom filter could regress vs. the old (because of upgrade to 64-bit hash) and that should only matter for full filters. This change should largely mitigate that potential regression.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6175

Test Plan:
Using filter_bench with -new_builder option and 6M keys per filter is like large full filter (improvement). 10k keys and no -new_builder is like partitioned filters (about the same). (Corresponding configurations run simultaneously on devserver.)

std::vector impl (before)

    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -
    average_keys_per_filter=6000000
    Build avg ns/key: 52.2027
    Maximum resident set size (kbytes): 1105016
    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -
    average_keys_per_filter=10000
    Build avg ns/key: 30.5694
    Maximum resident set size (kbytes): 1208152

std::deque impl (after)

    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -
    average_keys_per_filter=6000000
    Build avg ns/key: 39.0697
    Maximum resident set size (kbytes): 1087196
    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -
    average_keys_per_filter=10000
    Build avg ns/key: 30.9348
    Maximum resident set size (kbytes): 1207980

Differential Revision: D19053431

Pulled By: pdillinger

fbshipit-source-id: 2888e748723a19d9ea40403934f13cbb8483430c
2019-12-15 21:31:08 -08:00
..
aligned_buffer.h Document AlignedBuffer (#5345) 2019-05-24 10:05:40 -07:00
autovector_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
autovector.h Fix the constness issues around autovector::iterator_impl's dereference operators (#6057) 2019-11-22 21:23:00 -08:00
bloom_impl.h Allow fractional bits/key in BloomFilterPolicy (#6092) 2019-11-26 15:59:34 -08:00
bloom_test.cc Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
build_version.cc.in Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
build_version.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cast_util.h Add a missing "once" in .h 2017-07-31 12:12:03 -07:00
channel.h Fix build breakage from lock_guard error (#6161) 2019-12-12 13:50:27 -08:00
coding_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
coding.cc Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
coding.h Avoid user key copying for Get/Put/Write with user-timestamp (#5502) 2019-07-25 15:27:39 -07:00
compaction_job_stats_impl.cc Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
comparator.cc Add support for timestamp in Get/Put (#5079) 2019-06-05 23:10:47 -07:00
compression_context_cache.cc run make format for PR 3838 (#3954) 2018-06-05 12:58:02 -07:00
compression_context_cache.h run make format for PR 3838 (#3954) 2018-06-05 12:58:02 -07:00
compression.h Revert to storing UncompressionDicts in the cache (#5645) 2019-08-23 08:27:30 -07:00
concurrent_task_limiter_impl.cc Compaction limiter miscs (#4795) 2018-12-26 13:59:35 -08:00
concurrent_task_limiter_impl.h Apply formatter on recent 45 commits. (#5827) 2019-09-19 12:34:17 -07:00
core_local.h Change RocksDB License 2017-07-15 16:11:23 -07:00
crc32c_arm64.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
crc32c_arm64.h Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
crc32c_ppc_asm.S Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_ppc.c C file should not include <cinttypes>, it is a C++ header. (#5499) 2019-06-24 16:12:39 -07:00
crc32c_ppc.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
crc32c.cc Cleanup the Arm64 CRC32 unused warning (#5565) 2019-07-15 11:20:26 -07:00
crc32c.h Updated CRC32 Power Optimization Changes 2017-08-31 14:16:30 -07:00
duplicate_detector.h simplify include directive involving inttypes (#5402) 2019-06-06 13:56:07 -07:00
dynamic_bloom_test.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
dynamic_bloom.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
dynamic_bloom.h MultiGet batching in memtable (#5818) 2019-10-10 09:39:39 -07:00
file_reader_writer_test.cc Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
filelock_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
filter_bench.cc Optimize memory and CPU for building new Bloom filter (#6175) 2019-12-15 21:31:08 -08:00
gflags_compat.h filter_bench - a prelim tool for SST filter benchmarking (#5825) 2019-10-07 20:10:53 -07:00
hash_map.h Change RocksDB License 2017-07-15 16:11:23 -07:00
hash_test.cc Add new persistent 64-bit hash (#5984) 2019-10-31 16:36:35 -07:00
hash.cc Add new persistent 64-bit hash (#5984) 2019-10-31 16:36:35 -07:00
hash.h Add new persistent 64-bit hash (#5984) 2019-10-31 16:36:35 -07:00
heap_test.cc fix gflags namespace 2017-12-01 10:42:05 -08:00
heap.h Add compaction logic to RangeDelAggregatorV2 (#4758) 2018-12-17 13:20:51 -08:00
kv_map.h Consolidate hash function used for non-persistent data in a new function (#5155) 2019-04-08 13:32:06 -07:00
log_write_bench.cc Divide file_reader_writer.h and .cc (#5803) 2019-09-16 10:33:51 -07:00
murmurhash.cc Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
murmurhash.h Change RocksDB License 2017-07-15 16:11:23 -07:00
mutexlock.h Apply formatter on recent 45 commits. (#5827) 2019-09-19 12:34:17 -07:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random_test.cc Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154) 2019-12-13 14:30:14 -08:00
random.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
random.h Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154) 2019-12-13 14:30:14 -08:00
rate_limiter_test.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
rate_limiter.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
rate_limiter.h rate limit auto-tuning 2017-10-04 19:15:01 -07:00
repeatable_thread_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
repeatable_thread.h Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
set_comparator.h WritePrepared Txn: Move DuplicateDetector to util 2018-03-05 23:57:12 -08:00
slice_transform_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
slice.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
status.cc Work around weird unused errors with Mingw (#6075) 2019-11-26 21:42:29 -08:00
stderr_logger.h Change RocksDB License 2017-07-15 16:11:23 -07:00
stop_watch.h Make statistics's stats_level change thread-safe (#5030) 2019-03-01 10:42:09 -08:00
string_util.cc Refactor trimming logic for immutable memtables (#5022) 2019-08-23 13:55:34 -07:00
string_util.h Refactor trimming logic for immutable memtables (#5022) 2019-08-23 13:55:34 -07:00
thread_list_test.cc Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
thread_local_test.cc Fix thread_local_test failure caused by recent io_uring change (#6136) 2019-12-09 12:03:30 -08:00
thread_local.cc Enable building of ARM32 (#4349) 2018-10-09 16:58:25 -07:00
thread_local.h Provide a way to override windows memory allocator with jemalloc for ZSTD 2018-06-04 12:12:48 -07:00
thread_operation.h Add inline comments to flush job (#4464) 2018-10-05 15:41:17 -07:00
threadpool_imp.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
threadpool_imp.h Support lowering CPU priority of background threads 2018-04-24 08:41:51 -07:00
timer_queue_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
timer_queue.h Move test related files under util/ to test_util/ (#5377) 2019-05-30 11:25:51 -07:00
user_comparator_wrapper.h Fix perf_context.user_key_comparison_count for range scan (#5098) 2019-03-27 10:34:27 -07:00
util.h Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
vector_iterator.h Make clang-analyzer happy (#5821) 2019-09-18 15:25:48 -07:00
xxh3p.h Add new persistent 64-bit hash (#5984) 2019-10-31 16:36:35 -07:00
xxhash.cc Add new persistent 64-bit hash (#5984) 2019-10-31 16:36:35 -07:00
xxhash.h Misc hashing updates / upgrades (#5909) 2019-10-24 17:16:46 -07:00