14eca6bf04
Summary: The implementation of GetApproximateSizes was inconsistent in its treatment of the size of non-data blocks of SST files, sometimes including them and sometimes not. This was at its worst when a large portion of a table file was used by filters and the query covered a small range that crossed a table boundary: the size estimate would include the large filter size.

It's conceivable that someone might want to know the size only in terms of data blocks, but I believe that's unlikely enough to ignore for now. Similarly, there's no evidence the internal function ApproximateOffsetOf is used for anything other than a one-sided ApproximateSize, so I intend to refactor to remove redundancy in a follow-up commit.

To fix this, GetApproximateSizes (and implementation details ApproximateSize and ApproximateOffsetOf) now consistently include in their returned sizes a portion of table file metadata (including filters and indexes), in proportion to the size of the data blocks in range. In other words, if a key range covers data blocks that are X% by size of all the table's data blocks, the returned approximate size is X% of the total file size. It would technically be more accurate to attribute metadata based on number of keys, but that's not computationally efficient with the data available, and it is rarely a meaningful difference.

Also includes miscellaneous comment improvements and clarifications.

Also included is a new approximatesizerandom benchmark for db_bench. No significant performance difference was seen with this change, whether at ~700 ops/sec with cache_index_and_filter_blocks and a small cache or at ~150k ops/sec without cache_index_and_filter_blocks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784

Test Plan: Test added to DBTest.ApproximateSizesFilesWithErrorMargin. Old code running new test...
[ RUN ] DBTest.ApproximateSizesFilesWithErrorMargin
db/db_test.cc:1562: Failure
Expected: (size) <= (11 * 100), actual: 9478 vs 1100
Other tests updated to reflect consistent accounting of metadata.

Reviewed By: siying
Differential Revision: D21334706
Pulled By: pdillinger
fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185 |
||
---|---|---|
advisor | ||
block_cache_analyzer | ||
dump | ||
rdb | ||
analyze_txn_stress_test.sh | ||
auto_sanity_test.sh | ||
benchmark_leveldb.sh | ||
benchmark.sh | ||
blob_dump.cc | ||
check_all_python.py | ||
check_format_compatible.sh | ||
CMakeLists.txt | ||
db_bench_tool_test.cc | ||
db_bench_tool.cc | ||
db_bench.cc | ||
db_crashtest.py | ||
db_repl_stress.cc | ||
db_sanity_test.cc | ||
dbench_monitor | ||
Dockerfile | ||
generate_random_db.sh | ||
ingest_external_sst.sh | ||
ldb_cmd_impl.h | ||
ldb_cmd_test.cc | ||
ldb_cmd.cc | ||
ldb_test.py | ||
ldb_tool.cc | ||
ldb.cc | ||
pflag | ||
reduce_levels_test.cc | ||
regression_test.sh | ||
report_lite_binary_size.sh | ||
rocksdb_dump_test.sh | ||
run_flash_bench.sh | ||
run_leveldb.sh | ||
sample-dump.dmp | ||
sst_dump_test.cc | ||
sst_dump_tool_imp.h | ||
sst_dump_tool.cc | ||
sst_dump.cc | ||
trace_analyzer_test.cc | ||
trace_analyzer_tool.cc | ||
trace_analyzer_tool.h | ||
trace_analyzer.cc | ||
verify_random_db.sh | ||
write_external_sst.sh | ||
write_stress_runner.py | ||
write_stress.cc |
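
The proportional attribution described in the commit summary (a key range covering X% of a table's data blocks by size is reported as X% of the total file size) can be illustrated with a small sketch. This is not RocksDB's actual code: the function name `ScaleToFileSize` and its parameters are hypothetical, chosen only to show the arithmetic.

```cpp
#include <cstdint>

// Hypothetical helper, NOT part of RocksDB: illustrates the rule from the
// commit summary that metadata (filters, indexes) is attributed to a key
// range in proportion to the size of the data blocks the range covers.
//
//   data_bytes_in_range: size of the data blocks covered by the key range
//   total_data_bytes:    size of all data blocks in the table file
//   file_size:           total file size, including filters and indexes
uint64_t ScaleToFileSize(uint64_t data_bytes_in_range,
                         uint64_t total_data_bytes,
                         uint64_t file_size) {
  if (total_data_bytes == 0) {
    // No data blocks, so there is nothing to attribute to the range.
    return 0;
  }
  if (data_bytes_in_range >= total_data_bytes) {
    // The range covers every data block: report the whole file.
    return file_size;
  }
  // Fraction of the data blocks covered, applied to the whole file size.
  // Floating point is used here to avoid overflow in an integer product.
  const double fraction = static_cast<double>(data_bytes_in_range) /
                          static_cast<double>(total_data_bytes);
  return static_cast<uint64_t>(static_cast<double>(file_size) * fraction);
}
```

As an example under these assumptions, take a 4000-byte file in which only 1000 bytes are data blocks and the rest is filter and index data. A range covering 250 bytes of data blocks (25% of the data) would be estimated at 1000 bytes (25% of the file), rather than either 250 bytes or 250 bytes plus the entire filter.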