A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Peter Dillinger b0e248e2a4 Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453)
Summary:
MemTable::MultiGet was not considering range tombstones before
querying Bloom filter. This means range tombstones would be skipped for
keys (or prefixes) with no other entries in the memtable. This could cause
old values for a key (in SST files) to still show up until the range tombstone
covering it has been flushed.

This is fixed by essentially disabling the memtable Bloom filter when there
are any range tombstones. (This could be better optimized in the future, but
good enough for now.)

Did some other cleanup/optimization in the same code to (more than) offset
the cost of checking on range tombstones in more cases. There is now
notable improvement when memtable_whole_key_filtering and prefix_extractor
are used together (unusual), and this makes MultiGet closer to the Get
implementation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9453

Test Plan:
new unit test added. Added memtable Bloom to crash test.

Performance testing
--------------------

Build WAL-only DB (recovers to memtable):
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=1000000 -write_buffer_size=250000000
```

Query test command, to maximize sensitivity to the changed code:
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=multireadrandom -num=10000000 -write_buffer_size=250000000 -memtable_bloom_size_ratio=0.015 -multiread_batched -batch_size=24 -threads=8 -memtable_whole_key_filtering=$MWKF -prefix_size=$PXS
```
(Note -num here is 10x larger for mostly memtable misses)

Before & after run simultaneously, average over 10 iterations per data point, ops/sec.

MWKF=0 PXS=0 (Bloom disabled)
Before: 5724844
After: 6722066

MWKF=0 PXS=7 (prefixes hardly unique; Bloom not useful)
Before: 9981319
After: 10237990

MWKF=0 PXS=8 (prefixes unique; Bloom useful)
Before:  12081715
After: 12117603

MWKF=1 PXS=0 (whole key Bloom useful)
Before: 11944354
After: 12096085

MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes not useful in old version)
Before: 9444299
After: 11826029

MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes useful in old version)
Before: 11784465
After: 11778591

Only in this last case is the 'before' *slightly* faster, perhaps because hashing prefixes is slightly faster than hashing whole keys. Otherwise, 'after' is faster.

Reviewed By: ajkr

Differential Revision: D33805025

Pulled By: pdillinger

fbshipit-source-id: 597523cae4f4eafdf6ae6bb2bc6cb46f83b017bf
2022-01-31 10:12:30 -08:00
.circleci Fix some CI output (#9193) 2021-11-20 10:21:58 -08:00
.github/workflows Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
buckifier Update TARGETS and related scripts (#9310) 2021-12-17 11:51:51 -08:00
build_tools More improvements to output for CircleCI (#9201) 2021-11-23 22:10:27 -08:00
cache Fix unity build with SUPPORT_CLOCK_CACHE (#9309) 2021-12-17 14:15:07 -08:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Remove asan_symbolize.py for internal asan build (#8737) 2021-09-07 15:39:11 -07:00
db Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-31 10:12:30 -08:00
db_stress_tool Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
docs Misc doc fixes (#8983) 2021-10-07 11:22:17 -07:00
env New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
examples Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
file Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
fuzz Make EventListener into a Customizable Class (#8473) 2021-07-27 07:47:02 -07:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-31 10:12:30 -08:00
java fix java doc issues (#9253) 2021-12-16 21:04:41 -08:00
logging Use system-wide thread ID in info log lines (#9164) 2021-11-12 19:46:06 -08:00
memory Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
memtable Minor improvement to CacheReservationManager/WriteBufferManager/CompressionDictBuilding (#9139) 2021-11-05 16:13:47 -07:00
microbench Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
monitoring Add tiered storage related read bytes stats to Statistic (#9123) 2021-11-16 15:17:17 -08:00
options Make RocksDB codebase compatible with newer compilers like clang-12 (#9370) 2022-01-10 11:27:59 -08:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Fix/improve 'must free heap allocations' code (#9209) 2021-11-29 10:53:52 -08:00
table Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-31 10:12:30 -08:00
test_util More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
third-party Add support for building on s390x platform (#8962) 2021-10-22 10:13:15 -07:00
tools Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-31 10:12:30 -08:00
trace_replay Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
util Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-31 10:12:30 -08:00
utilities Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Re-enable 390x+cmake* Travis jobs (#9110) 2021-11-03 20:30:15 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453) 2022-01-31 10:12:30 -08:00
INSTALL.md Update installation instructions (#8158) 2021-04-06 16:02:04 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727) 2021-09-01 15:26:34 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
PLUGINS.md Add ZenFS to plugin list (#8218) 2021-04-22 11:12:40 -07:00
README.md Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727) 2021-09-01 15:26:34 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
TARGETS Update TARGETS and related scripts (#9310) 2021-12-17 11:51:51 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Update USERS.md (#8923) 2021-10-01 16:10:35 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.