rocksdb/db/blob
Levi Tamasi 320d9a8e8a Use a sorted vector instead of a map to store blob file metadata (#9526)
Summary:
The patch replaces `std::map` with a sorted `std::vector` for
`VersionStorageInfo::blob_files_` and preallocates the space
for the `vector` before saving the `BlobFileMetaData` into the
new `VersionStorageInfo` in `VersionBuilder::Rep::SaveBlobFilesTo`.
These changes reduce the time the DB mutex is held while
saving new `Version`s, and using a sorted `vector` also makes
lookups faster thanks to better memory locality.

In addition, the patch introduces helper methods
`VersionStorageInfo::GetBlobFileMetaData` and
`VersionStorageInfo::GetBlobFileMetaDataLB` that can be used by
clients to perform lookups in the `vector`, and does some general
cleanup in the parts of code where blob file metadata are used.
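
For illustration, the sorted-`vector` approach described above can be sketched
as follows. This is a minimal sketch, not the actual RocksDB code: `BlobMeta`,
`BlobFileList`, and their methods are hypothetical stand-ins for
`BlobFileMetaData`, `VersionStorageInfo::blob_files_`, and the two helper
methods, and it assumes files are appended in increasing file number order so
that the `vector` stays sorted without an explicit sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for BlobFileMetaData; only the file number is modeled.
struct BlobMeta {
  uint64_t blob_file_number;
};

// Hypothetical container: a vector kept sorted by blob file number.
class BlobFileList {
 public:
  using Files = std::vector<std::shared_ptr<BlobMeta>>;

  // Preallocate once so that later appends do not reallocate.
  void Reserve(std::size_t n) { files_.reserve(n); }

  // Assumption: callers append in strictly increasing file number order.
  void Add(std::shared_ptr<BlobMeta> meta) {
    assert(files_.empty() ||
           files_.back()->blob_file_number < meta->blob_file_number);
    files_.push_back(std::move(meta));
  }

  // Lower-bound lookup: first entry whose file number is >= the argument.
  Files::const_iterator LowerBound(uint64_t blob_file_number) const {
    return std::lower_bound(
        files_.begin(), files_.end(), blob_file_number,
        [](const std::shared_ptr<BlobMeta>& lhs, uint64_t rhs) {
          return lhs->blob_file_number < rhs;
        });
  }

  // Exact lookup built on the lower bound; returns nullptr if absent.
  std::shared_ptr<BlobMeta> Find(uint64_t blob_file_number) const {
    const auto it = LowerBound(blob_file_number);
    if (it != files_.end() && (*it)->blob_file_number == blob_file_number) {
      return *it;
    }
    return nullptr;
  }

  Files::const_iterator end() const { return files_.end(); }
  std::size_t size() const { return files_.size(); }

 private:
  Files files_;
};
```

Both lookups are binary searches over contiguous storage, which is where the
memory locality benefit over a node-based `std::map` would come from.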

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9526

Test Plan:
Ran `make check` and the crash test script for a while.

Performance was tested using a load-optimized benchmark (`fillseq` with vector memtable, no WAL) and small file sizes so that a significant number of files are produced:

```
numactl --interleave=all ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/ltamasi-dbbench --wal_dir=/data/ltamasi-dbbench --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --enable_blob_files=1 --blob_file_size=16777216 --min_blob_size=0 --blob_compression_type=lz4 --enable_blob_garbage_collection=1 --seed=<some value>
```

Final statistics before the patch:

```
Cumulative writes: 0 writes, 700M keys, 0 commit groups, 0.0 writes per commit group, ingest: 284.62 GB, 121.27 MB/s
Interval writes: 0 writes, 334K keys, 0 commit groups, 0.0 writes per commit group, ingest: 139.28 MB, 72.46 MB/s
```

With the patch:

```
Cumulative writes: 0 writes, 760M keys, 0 commit groups, 0.0 writes per commit group, ingest: 308.66 GB, 131.52 MB/s
Interval writes: 0 writes, 445K keys, 0 commit groups, 0.0 writes per commit group, ingest: 185.35 MB, 93.15 MB/s
```

Total time to complete the benchmark is 2611 seconds with the patch, down from 2986 seconds without it (a reduction of roughly 12.6%).

Reviewed By: riversand963

Differential Revision: D34082728

Pulled By: ltamasi

fbshipit-source-id: fc598abf676dce436734d06bb9d2d99a26a004fc
2022-02-09 12:36:43 -08:00
blob_constants.h Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_counting_iterator_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
blob_counting_iterator.h Log the amount of blob garbage generated by compactions in the MANIFEST (#8450) 2021-06-24 16:11:56 -07:00
blob_fetcher.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_fetcher.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_file_addition_test.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
blob_file_addition.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
blob_file_addition.h Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_builder_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
blob_file_builder.cc Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
blob_file_builder.h Expose blob file information through the EventListener interface (#8675) 2021-09-16 17:23:36 -07:00
blob_file_cache_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
blob_file_cache.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
blob_file_cache.h Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
blob_file_completion_callback.h Fix LITE mode builds on MacOs (#8981) 2021-10-04 05:30:26 -07:00
blob_file_garbage_test.cc Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_garbage.cc Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_garbage.h Move BlobDB related files under db/ to db/blob/ (#6519) 2020-03-12 11:00:56 -07:00
blob_file_meta.cc Add BlobMetaData retrieval methods (#8273) 2021-06-28 08:13:29 -07:00
blob_file_meta.h Add BlobMetaData retrieval methods (#8273) 2021-06-28 08:13:29 -07:00
blob_file_reader_test.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_file_reader.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_file_reader.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
blob_garbage_meter_test.cc Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_garbage_meter.cc Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_garbage_meter.h Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_index.h Batch blob read IO for MultiGet (#8699) 2021-09-17 19:23:13 -07:00
blob_log_format.cc Introduce BlobFileCache and add support for blob files to Get() (#7540) 2020-10-15 13:04:47 -07:00
blob_log_format.h Add a class for measuring the amount of garbage generated during compaction (#8426) 2021-06-21 22:25:30 -07:00
blob_log_sequential_reader.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
blob_log_sequential_reader.h Fix a issue with initializing blob header buffer (#8537) 2021-08-02 17:15:06 -07:00
blob_log_writer.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
blob_log_writer.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
db_blob_basic_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
db_blob_compaction_test.cc Use a sorted vector instead of a map to store blob file metadata (#9526) 2022-02-09 12:36:43 -08:00
db_blob_corruption_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
db_blob_index_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
prefetch_buffer_collection.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
prefetch_buffer_collection.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00