rocksdb/table/block_based
Peter Dillinger a92bd0a183 Optimize memory and CPU for building new Bloom filter (#6175)
Summary:
The filter bits builder collects all the hashes to add in memory before adding them (because the number of keys is not known until we've walked over all the keys). Existing code uses a std::vector for this, which can mean up to 2x than necessary space allocated (and not freed) and up to ~2x write amplification in memory. Using std::deque uses close to minimal space (for large filters, the only time it matters), no write amplification, frees memory while building, and no need for large contiguous memory area. The only cost is more calls to allocator, which does not appear to matter, at least in benchmark test.

For now, this change only applies to the new (format_version=5) Bloom filter implementation, to ease before-and-after comparison downstream.

Temporary memory use during build is about the only way the new Bloom filter could regress vs. the old (because of upgrade to 64-bit hash) and that should only matter for full filters. This change should largely mitigate that potential regression.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6175

Test Plan:
Using filter_bench with -new_builder option and 6M keys per filter is like large full filter (improvement). 10k keys and no -new_builder is like partitioned filters (about the same). (Corresponding configurations run simultaneously on devserver.)

std::vector impl (before)

    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -
    average_keys_per_filter=6000000
    Build avg ns/key: 52.2027
    Maximum resident set size (kbytes): 1105016
    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -
    average_keys_per_filter=10000
    Build avg ns/key: 30.5694
    Maximum resident set size (kbytes): 1208152

std::deque impl (after)

    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -
    average_keys_per_filter=6000000
    Build avg ns/key: 39.0697
    Maximum resident set size (kbytes): 1087196
    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -
    average_keys_per_filter=10000
    Build avg ns/key: 30.9348
    Maximum resident set size (kbytes): 1207980

Differential Revision: D19053431

Pulled By: pdillinger

fbshipit-source-id: 2888e748723a19d9ea40403934f13cbb8483430c
2019-12-15 21:31:08 -08:00
..
block_based_filter_block_test.cc Prepare filter tests for more implementations (#5967) 2019-10-31 14:12:33 -07:00
block_based_filter_block.cc Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705) 2019-08-14 18:16:06 -07:00
block_based_filter_block.h Use delete to disable automatic generated methods. (#5009) 2019-09-11 18:09:00 -07:00
block_based_table_builder.cc Apply formatter to some recent commits (#6138) 2019-12-09 15:49:49 -08:00
block_based_table_builder.h Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
block_based_table_factory.cc Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
block_based_table_factory.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_based_table_reader.cc Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
block_based_table_reader.h Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
block_builder.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_builder.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_prefix_index.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
block_prefix_index.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
block_test.cc upgrade gtest 1.7.0 => 1.8.1 for json result writing 2019-09-09 11:24:11 -07:00
block_type.h Make the 'block read count' performance counters consistent (#5484) 2019-06-18 19:03:24 -07:00
block.cc Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
block.h Add class comment for Block 2019-09-24 11:02:11 -07:00
cachable_entry.h Move the filter readers out of the block cache (#5504) 2019-07-16 13:14:58 -07:00
data_block_footer.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
data_block_footer.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
data_block_hash_index_test.cc Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
data_block_hash_index.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
data_block_hash_index.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
filter_block_reader_common.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
filter_block_reader_common.h Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705) 2019-08-14 18:16:06 -07:00
filter_block.h Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
filter_policy_internal.h Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
filter_policy.cc Optimize memory and CPU for building new Bloom filter (#6175) 2019-12-15 21:31:08 -08:00
flush_block_policy.cc Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
flush_block_policy.h Organizing rocksdb/table directory by format 2019-05-30 14:51:11 -07:00
full_filter_block_test.cc Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
full_filter_block.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
full_filter_block.h Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
index_builder.cc Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
index_builder.h Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
mock_block_based_table.h Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
parsed_full_filter_block.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
parsed_full_filter_block.h New Bloom filter implementation for full and partitioned filters (#6007) 2019-11-13 16:44:01 -08:00
partitioned_filter_block_test.cc Expose and elaborate FilterBuildingContext (#6088) 2019-11-26 18:24:10 -08:00
partitioned_filter_block.cc Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
partitioned_filter_block.h Store the filter bits reader alongside the filter block contents (#5936) 2019-10-18 19:32:59 -07:00
uncompression_dict_reader.cc Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
uncompression_dict_reader.h Revert to storing UncompressionDicts in the cache (#5645) 2019-08-23 08:27:30 -07:00