A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Peter Dillinger 14eca6bf04 For ApproximateSizes, pro-rate table metadata size over data blocks (#6784)
Summary:
The implementation of GetApproximateSizes was inconsistent in
its treatment of the size of non-data blocks of SST files, sometimes
including and sometimes now. This was at its worst with large portion
of table file used by filters and querying a small range that crossed
a table boundary: the size estimate would include large filter size.

It's conceivable that someone might want only to know the size in terms
of data blocks, but I believe that's unlikely enough to ignore for now.
Similarly, there's no evidence the internal function AppoximateOffsetOf
is used for anything other than a one-sided ApproximateSize, so I intend
to refactor to remove redundancy in a follow-up commit.

So to fix this, GetApproximateSizes (and implementation details
ApproximateSize and ApproximateOffsetOf) now consistently include in
their returned sizes a portion of table file metadata (incl filters
and indexes) based on the size portion of the data blocks in range. In
other words, if a key range covers data blocks that are X% by size of all
the table's data blocks, returned approximate size is X% of the total
file size. It would technically be more accurate to attribute metadata
based on number of keys, but that's not computationally efficient with
data available and rarely a meaningful difference.

Also includes miscellaneous comment improvements / clarifications.

Also included is a new approximatesizerandom benchmark for db_bench.
No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784

Test Plan:
Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
Old code running new test...

    [ RUN      ] DBTest.ApproximateSizesFilesWithErrorMargin
    db/db_test.cc:1562: Failure
    Expected: (size) <= (11 * 100), actual: 9478 vs 1100

Other tests updated to reflect consistent accounting of metadata.

Reviewed By: siying

Differential Revision: D21334706

Pulled By: pdillinger

fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
2020-06-02 12:30:23 -07:00
.circleci pin image version in circle CI builds (#6876) 2020-05-23 21:23:27 -07:00
.github/workflows Clean up some code related to file checksums (#6861) 2020-05-21 08:12:51 -07:00
buckifier Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
build_tools Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
cache Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) 2020-06-02 12:30:23 -07:00
db_stress_tool fix transaction rollback in db_stress TestMultiGet (#6873) 2020-05-24 15:27:24 -07:00
docs Log warning for high bits/key in legacy Bloom filter (#6312) 2020-01-17 19:37:35 -08:00
env Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
examples Add missing my_pid to fprintf in multi_process_example (#6731) 2020-05-08 20:49:33 -07:00
file Reduce dependency on gtest dependency in release code (#6907) 2020-06-02 12:11:24 -07:00
hdfs prototype status check enforcement (#6798) 2020-05-08 12:40:43 -07:00
include/rocksdb For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) 2020-06-02 12:30:23 -07:00
java Add newer WBWI::NewIteratorWithBase functions to RocksJava (#6872) 2020-05-27 11:59:12 -07:00
logging Fix info log source file display length (#5824) 2020-04-08 20:18:08 -07:00
memory C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
memtable C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
monitoring Fix FilterBench when RTTI=0 (#6732) 2020-04-29 13:09:23 -07:00
options Allow MultiGet users to limit cumulative value size (#6826) 2020-05-27 13:07:14 -07:00
port C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
table For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) 2020-06-02 12:30:23 -07:00
test_util Reduce dependency on gtest dependency in release code (#6907) 2020-06-02 12:11:24 -07:00
third-party Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
tools For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) 2020-06-02 12:30:23 -07:00
trace_replay Fix multiple CF replay failure in db_bench replay (#6787) 2020-05-01 00:03:38 -07:00
util Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
utilities Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Misc things for ASSERT_STATUS_CHECKED, also gcc 4.8.5 (#6871) 2020-05-23 06:53:37 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md For ApproximateSizes, pro-rate table metadata size over data blocks (#6784) 2020-06-02 12:30:23 -07:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md LANGUAGE-BINDINGS.md: mention python-rocksdb 2019-03-20 11:10:48 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
README.md Add Slack forum to README (#6773) 2020-04-30 11:00:28 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
TARGETS Update googletest from 1.8.1 to 1.10.0 (#6808) 2020-06-01 20:33:42 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add YugabyteDB to USERS (#6786) 2020-05-06 10:28:29 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

Linux/Mac Build Status Windows Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.