rocksdb/tools
Peter Dillinger 14eca6bf04 For ApproximateSizes, pro-rate table metadata size over data blocks (#6784)
Summary:
The implementation of GetApproximateSizes was inconsistent in
its treatment of the size of non-data blocks of SST files, sometimes
including and sometimes now. This was at its worst with large portion
of table file used by filters and querying a small range that crossed
a table boundary: the size estimate would include large filter size.

It's conceivable that someone might want only to know the size in terms
of data blocks, but I believe that's unlikely enough to ignore for now.
Similarly, there's no evidence the internal function AppoximateOffsetOf
is used for anything other than a one-sided ApproximateSize, so I intend
to refactor to remove redundancy in a follow-up commit.

So to fix this, GetApproximateSizes (and implementation details
ApproximateSize and ApproximateOffsetOf) now consistently include in
their returned sizes a portion of table file metadata (incl filters
and indexes) based on the size portion of the data blocks in range. In
other words, if a key range covers data blocks that are X% by size of all
the table's data blocks, returned approximate size is X% of the total
file size. It would technically be more accurate to attribute metadata
based on number of keys, but that's not computationally efficient with
data available and rarely a meaningful difference.

Also includes miscellaneous comment improvements / clarifications.

Also included is a new approximatesizerandom benchmark for db_bench.
No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784

Test Plan:
Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
Old code running new test...

    [ RUN      ] DBTest.ApproximateSizesFilesWithErrorMargin
    db/db_test.cc:1562: Failure
    Expected: (size) <= (11 * 100), actual: 9478 vs 1100

Other tests updated to reflect consistent accounting of metadata.

Reviewed By: siying

Differential Revision: D21334706

Pulled By: pdillinger

fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
2020-06-02 12:30:23 -07:00
..
advisor Rules Advisor: some fixes to support fetching stats from ODS (#4223) 2018-08-02 15:42:42 -07:00
block_cache_analyzer C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
dump Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
rdb Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
analyze_txn_stress_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
auto_sanity_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
benchmark_leveldb.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
benchmark.sh Fixed typo in benchmark.sh (#6434) 2020-02-19 17:08:02 -08:00
blob_dump.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
check_all_python.py Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
check_format_compatible.sh Two Improvements to tools/check_format_compatible.sh (#6702) 2020-04-15 11:28:11 -07:00
CMakeLists.txt Mark dependencies as PRIVATE and fix missing dependencies in tools. (#6790) 2020-05-12 21:07:55 -07:00
db_bench_tool_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_bench_tool.cc For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) 2020-06-02 12:30:23 -07:00
db_bench.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_crashtest.py Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
db_repl_stress.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_sanity_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
dbench_monitor Fix /bin/bash shebangs 2017-08-03 15:56:46 -07:00
Dockerfile adding docker build script and dockerfile 2015-05-22 16:03:39 -07:00
generate_random_db.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
ingest_external_sst.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
ldb_cmd_impl.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
ldb_cmd_test.cc Improve ldb consistency checks (#6802) 2020-05-08 14:17:47 -07:00
ldb_cmd.cc sst_dump to reduce number of file reads (#6836) 2020-05-12 18:23:33 -07:00
ldb_test.py Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
ldb_tool.cc Remove extraneous newline from ldb stderr (#6897) 2020-06-01 12:15:36 -07:00
ldb.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
pflag Fix /bin/bash shebangs 2017-08-03 15:56:46 -07:00
reduce_levels_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
regression_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
report_lite_binary_size.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
rocksdb_dump_test.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
run_flash_bench.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
run_leveldb.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
sample-dump.dmp First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
sst_dump_test.cc Make options length longer for sst_dump_test (#6846) 2020-05-19 09:22:12 -07:00
sst_dump_tool_imp.h sst_dump to reduce number of file reads (#6836) 2020-05-12 18:23:33 -07:00
sst_dump_tool.cc sst_dump to reduce number of file reads (#6836) 2020-05-12 18:23:33 -07:00
sst_dump.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trace_analyzer_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trace_analyzer_tool.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trace_analyzer_tool.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trace_analyzer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
verify_random_db.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
write_external_sst.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
write_stress_runner.py Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
write_stress.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00