rocksdb/tools
Levi Tamasi 3e1bf771a3 Make it possible to force the garbage collection of the oldest blob files (#8994)
Summary:
The current BlobDB garbage collection logic works by relocating the valid
blobs from the oldest blob files as they are encountered during compaction,
and cleaning up blob files once they contain nothing but garbage. However,
with sufficiently skewed workloads, it is theoretically possible to end up in a
situation when few or no compactions get scheduled for the SST files that contain
references to the oldest blob files, which can lead to increased space amp due
to the lack of GC.

In order to efficiently handle such workloads, the patch adds a new BlobDB
configuration option called `blob_garbage_collection_force_threshold`,
which signals to BlobDB to schedule targeted compactions for the SST files
that keep alive the oldest batch of blob files if the overall ratio of garbage in
the given blob files meets the threshold *and* all the given blob files are
eligible for GC based on `blob_garbage_collection_age_cutoff`. (For example,
if the new option is set to 0.9, targeted compactions will get scheduled if the
sum of garbage bytes meets or exceeds 90% of the sum of total bytes in the
oldest blob files, assuming all affected blob files are below the age-based cutoff.)
The net result of these targeted compactions is that the valid blobs in the oldest
blob files are relocated and the oldest blob files themselves cleaned up (since
*all* SST files that rely on them get compacted away).

These targeted compactions are similar to periodic compactions in the sense
that they force certain SST files that otherwise would not get picked up to undergo
compaction and also in the sense that instead of merging files from multiple levels,
they target a single file. (Note: such compactions might still include neighboring files
from the same level due to the need of having a "clean cut" boundary but they never
include any files from any other level.)

This functionality is currently only supported with the leveled compaction style
and is inactive by default (since the default value is set to 1.0, i.e. 100%).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8994

Test Plan: Ran `make check` and tested using `db_bench` and the stress/crash tests.

Reviewed By: riversand963

Differential Revision: D31489850

Pulled By: ltamasi

fbshipit-source-id: 44057d511726a0e2a03c5d9313d7511b3f0c4eab
2021-10-11 18:03:01 -07:00
..
advisor Update branch as "main" in tools/advisor/README.md (#8744) 2021-09-01 20:26:28 -07:00
block_cache_analyzer Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
dump Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
rdb Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
analyze_txn_stress_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
auto_sanity_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
backup_db.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
benchmark_leveldb.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
benchmark.sh Use the write amplification value calculated by RocksDB in benchmark.sh (#8915) 2021-09-15 12:16:59 -07:00
blob_dump.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
check_all_python.py Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
check_format_compatible.sh Add 6.24 to the format compatibility checker (#8716) 2021-08-26 16:35:58 -07:00
CMakeLists.txt Mark dependencies as PRIVATE and fix missing dependencies in tools. (#6790) 2020-05-12 21:07:55 -07:00
db_bench_tool_test.cc Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
db_bench_tool.cc Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
db_bench.cc Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
db_crashtest.py Make it possible to force the garbage collection of the oldest blob files (#8994) 2021-10-11 18:03:01 -07:00
db_repl_stress.cc Migrate away from Travis+Linux+amd64 (#7791) 2020-12-22 00:20:57 -08:00
db_sanity_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
dbench_monitor Fix /bin/bash shebangs 2017-08-03 15:56:46 -07:00
Dockerfile adding docker build script and dockerfile 2015-05-22 16:03:39 -07:00
generate_random_db.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
ingest_external_sst.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
io_tracer_parser_test.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
io_tracer_parser_tool.cc Add request_id in IODebugContext. (#8045) 2021-04-01 13:14:51 -07:00
io_tracer_parser_tool.h Add IO Tracer Parser (#7333) 2020-09-23 15:50:26 -07:00
io_tracer_parser.cc Add IO Tracer Parser (#7333) 2020-09-23 15:50:26 -07:00
ldb_cmd_impl.h Some fixes and enhancements to ldb repair (#8544) 2021-07-28 16:44:14 -07:00
ldb_cmd_test.cc Fix flaky ldb_cmd_test tests caused by file deletions during validation (#8942) 2021-09-21 11:27:38 -07:00
ldb_cmd.cc List blob files when using command - list_live_files_metadata (#8976) 2021-09-30 15:13:11 -07:00
ldb_test.py Add list live files metadata (#8446) 2021-06-22 19:07:46 -07:00
ldb_tool.cc Add list live files metadata (#8446) 2021-06-22 19:07:46 -07:00
ldb.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
pflag Fix /bin/bash shebangs 2017-08-03 15:56:46 -07:00
reduce_levels_test.cc Remove some unneeded code (#8736) 2021-09-01 14:28:58 -07:00
regression_test.sh Fix regression test script (#8753) 2021-09-03 19:05:33 -07:00
restore_db.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
rocksdb_dump_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
run_flash_bench.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
run_leveldb.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
sample-dump.dmp First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
simulated_hybrid_file_system.cc Change the File System File Wrappers to std::unique_ptr (#8618) 2021-09-13 08:46:19 -07:00
simulated_hybrid_file_system.h Change the File System File Wrappers to std::unique_ptr (#8618) 2021-09-13 08:46:19 -07:00
sst_dump_test.cc Add CreateFrom methods to Env/FileSystem (#8174) 2021-06-15 03:43:48 -07:00
sst_dump_tool.cc Add CreateFrom methods to Env/FileSystem (#8174) 2021-06-15 03:43:48 -07:00
sst_dump.cc Implement a new subcommand "identify" for sst_dump (#6943) 2020-06-08 13:58:28 -07:00
trace_analyzer_test.cc Simplify TraceAnalyzer (#8697) 2021-08-24 18:18:36 -07:00
trace_analyzer_tool.cc Simplify TraceAnalyzer (#8697) 2021-08-24 18:18:36 -07:00
trace_analyzer_tool.h Simplify TraceAnalyzer (#8697) 2021-08-24 18:18:36 -07:00
trace_analyzer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
verify_random_db.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
write_external_sst.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
write_stress_runner.py Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
write_stress.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00