rocksdb

anand76 fefd4b98c5 Introduce a new MultiGet batching implementation (#5011 )

Summary:
This PR introduces a new MultiGet() API whose underlying implementation groups keys by SST file and batches the lookups within each file. The reason for the new API is twofold: its definition lets callers allocate storage for statuses and values on the stack instead of in a std::vector and returns values as PinnableSlices to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.
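The grouping step described above can be illustrated with a minimal, self-contained sketch. It is not RocksDB's code: `FileRange` and `GroupKeysByFile` are hypothetical names, and the sketch assumes a single level whose files are sorted and non-overlapping. Sorting the batch first means one forward pass over the file list is enough to assign every key.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the key range covered by one SST file.
struct FileRange {
  std::string smallest;
  std::string largest;
};

// Groups the keys of one batch by the file whose range covers them.
// `files` must be sorted and non-overlapping. Returns file index -> keys
// for that file (in sorted order); keys covered by no file are dropped.
std::map<std::size_t, std::vector<std::string>> GroupKeysByFile(
    const std::vector<FileRange>& files, std::vector<std::string> keys) {
  for (std::size_t i = 1; i < files.size(); ++i) {
    assert(files[i - 1].largest < files[i].smallest);
  }
  std::sort(keys.begin(), keys.end());

  std::map<std::size_t, std::vector<std::string>> groups;
  std::size_t f = 0;
  for (const std::string& key : keys) {
    // Keys are sorted, so the file cursor only ever moves forward.
    while (f < files.size() && files[f].largest < key) ++f;
    if (f == files.size()) break;  // remaining keys lie past the last file
    if (key >= files[f].smallest) groups[f].push_back(key);
  }
  return groups;
}
```

Each resulting group could then be handed to a per-file batched lookup in a single call, which is where the reduction in function calls would come from.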

Batching is useful when the keys being queried have some spatial locality and when batch sizes are larger. The main benefits come from:
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Bloom filter cachelines can be prefetched, hiding the cache miss latency
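The second point can be sketched with a toy filter. This is an illustration only, not RocksDB's FullFilterBlockReader: `BlockedBloom`, its hashing scheme, and the probe count are assumptions made for the example. Each key maps to one 64-byte block, so a batched lookup can hash every key and issue a prefetch for its block before probing any bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Toy Bloom filter in which each key touches exactly one 64-byte block.
// (Block alignment is not enforced in this sketch.)
class BlockedBloom {
 private:
  using Block = std::array<std::uint8_t, 64>;
  static constexpr std::uint32_t kProbes = 6;

 public:
  explicit BlockedBloom(std::size_t num_blocks) : blocks_(num_blocks) {
    assert(num_blocks > 0);
  }

  void Add(const std::string& key) {
    const std::uint64_t h = Hash(key);
    Block& block = blocks_[h % blocks_.size()];
    for (std::uint32_t i = 0; i < kProbes; ++i) {
      const std::uint32_t bit = BitIndex(h, i);
      block[bit / 8] |= static_cast<std::uint8_t>(1u << (bit % 8));
    }
  }

  // Batched lookup. Pass 1 hashes each key and prefetches its block;
  // pass 2 probes the bits, by which time the block may already be cached.
  std::vector<bool> KeysMayMatch(const std::vector<std::string>& keys) const {
    std::vector<std::uint64_t> hashes(keys.size());
    for (std::size_t k = 0; k < keys.size(); ++k) {
      hashes[k] = Hash(keys[k]);
      Prefetch(&blocks_[hashes[k] % blocks_.size()]);
    }
    std::vector<bool> may_match(keys.size(), true);
    for (std::size_t k = 0; k < keys.size(); ++k) {
      const Block& block = blocks_[hashes[k] % blocks_.size()];
      for (std::uint32_t i = 0; i < kProbes; ++i) {
        const std::uint32_t bit = BitIndex(hashes[k], i);
        if ((block[bit / 8] & (1u << (bit % 8))) == 0) {
          may_match[k] = false;  // a clear bit proves the key is absent
          break;
        }
      }
    }
    return may_match;
  }

 private:
  static std::uint64_t Hash(const std::string& key) {
    return std::hash<std::string>{}(key);
  }

  // Double hashing: derives the i-th bit position (0..511) within a block.
  static std::uint32_t BitIndex(std::uint64_t h, std::uint32_t i) {
    const std::uint32_t h1 = static_cast<std::uint32_t>(h >> 32);
    const std::uint32_t h2 = static_cast<std::uint32_t>(h) | 1u;
    return (h1 + i * h2) % 512;
  }

  static void Prefetch(const void* p) {
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(p);  // a hint only; the lookup is correct without it
#else
    (void)p;
#endif
  }

  std::vector<Block> blocks_;
};
```

With a single-key lookup there is nothing to overlap the cache miss with; splitting the work into two passes over the batch is what gives the prefetch time to complete.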

The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.

Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the CPU benefit of batching, the runs were single threaded and bound to the same CPU to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).
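One way to picture the stride-based locality is the sketch below. It is an assumption about how such a batch could be generated, not db_bench's actual code; `MakeBatch` and its parameters are hypothetical. A stride of 0 is treated as the fully random pattern, and any other stride spaces the keys of a batch evenly from a random starting index.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Returns the key indices for one batch out of `num_keys` total keys.
// stride == 0: every key is drawn independently at random.
// stride  > 0: a random first key, then each next key is `stride` further on.
std::vector<std::uint64_t> MakeBatch(std::mt19937_64& rng,
                                     std::uint64_t num_keys,
                                     std::size_t batch_size,
                                     std::uint64_t stride) {
  assert(num_keys > 0 && batch_size > 0);
  std::vector<std::uint64_t> batch(batch_size);
  if (stride == 0) {
    std::uniform_int_distribution<std::uint64_t> any(0, num_keys - 1);
    for (std::uint64_t& k : batch) k = any(rng);
    return batch;
  }
  // Distance between the first and last key of the batch.
  const std::uint64_t span = (batch_size - 1) * stride;
  assert(span < num_keys);
  // Choose the first key so that the last one still falls inside the range.
  std::uniform_int_distribution<std::uint64_t> first(0, num_keys - 1 - span);
  std::uint64_t key = first(rng);
  for (std::uint64_t& k : batch) {
    k = key;
    key += stride;
  }
  return batch;
}
```

With 16-byte keys and 100-byte values, a small stride keeps the keys of a batch within a few neighboring data blocks, while a large stride spreads them over more blocks or files.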

Results in micros/op:

Batch size               | 1     | 2     | 4     | 8     | 16    | 32

Random pattern (Stride length 0)
Get                      | 4.158 | 4.109 | 4.026 | 4.05  | 4.1   | 4.074
MultiGet (no batching)   | 4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075
MultiGet (w/ batching)   | 4.461 | 4.256 | 4.277 | 4.11  | 4.182 | 4.14

Good locality (Stride length 16)
Get                      | 4.048 | 3.659 | 3.248 | 2.99  | 2.84  | 2.753
MultiGet (no batching)   | 4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
MultiGet (w/ batching)   | 4.452 | 3.45  | 2.833 | 2.451 | 2.233 | 2.135

Good locality (Stride length 256)
Get                      | 4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
MultiGet (no batching)   | 4.406 | 4.005 | 3.644 | 3.49  | 3.381 | 3.268
MultiGet (w/ batching)   | 4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (Stride length 4096)
Get                      | 4.012 | 3.922 | 3.768 | 3.61  | 3.582 | 3.555
MultiGet (no batching)   | 4.364 | 4.057 | 3.791 | 3.65  | 3.57  | 3.465
MultiGet (w/ batching)   | 4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891
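The improvement figures can be checked directly against the table. The helper below is a hypothetical illustration, and it assumes the three rows of every group follow the order of the first group: Get, MultiGet (no batching), MultiGet (w/ batching).

```cpp
#include <cassert>

// Percentage reduction in micros/op when going from `baseline` to `batched`.
double ImprovementPercent(double baseline, double batched) {
  assert(baseline > 0.0);
  return (baseline - batched) / baseline * 100.0;
}
```

For stride length 16 at batch size 32, ImprovementPercent(2.781, 2.135) is about 23.2, and for stride length 4096 at batch size 4, ImprovementPercent(3.791, 3.316) is about 12.5. Both fall within the 10-25% range quoted for batch sizes 4 to 32.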

db_bench command used (on a DB with 4 levels, 12 million keys):
TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011

Differential Revision: D14348703

Pulled By: anand1976

fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b

2019-04-11 14:28:26 -07:00

buckifier

Add load statements to rocksdb TARGETS files

2019-02-13 14:08:21 -08:00

build_tools

Fix db_stress for custom env (#5122 )

2019-03-28 19:20:27 -07:00

cache

Consolidate hash function used for non-persistent data in a new function (#5155 )

2019-04-08 13:32:06 -07:00

cmake

Make FindZLIB consistent with official definitions (#4823 )

2019-01-02 12:49:57 -08:00

coverage

Remove unused imports, from python scripts. (#4057 )

2018-06-26 12:43:04 -07:00

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

docs

Blog post for format_version=4

2019-03-08 16:49:30 -08:00

env

Change OptimizeForPointLookup() and OptimizeForSmallDb() (#5165 )

2019-04-11 10:45:36 -07:00

examples

Support for single-primary, multi-secondary instances (#4899 )

2019-03-26 16:45:31 -07:00

hdfs

Fix db_stress for custom env (#5122 )

2019-03-28 19:20:27 -07:00

include/rocksdb

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

java

Document the interaction between disableWAL and BackupEngine (#5071 )

2019-03-19 14:58:14 -07:00

memtable

Consolidate hash function used for non-persistent data in a new function (#5155 )

2019-04-08 13:32:06 -07:00

monitoring

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

options

Change OptimizeForPointLookup() and OptimizeForSmallDb() (#5165 )

2019-04-11 10:45:36 -07:00

port

#5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152 )

2019-04-04 11:38:19 -07:00

table

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

third-party/gtest-1.7.0/fused-src/gtest

remove bundled but unused fbson library (#5108 )

2019-03-26 16:37:52 -07:00

tools

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

util

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

utilities

Introduce a new MultiGet batching implementation (#5011 )

2019-04-11 14:28:26 -07:00

.clang-format

A script that automatically reformat affected lines

2014-01-14 12:21:24 -08:00

.gitignore

RocksDB Trace Analyzer (#4091 )

2018-08-13 11:44:02 -07:00

.lgtm.yml

Create lgtm.yml for LGTM.com C/C++ analysis (#4058 )

2018-06-26 12:43:04 -07:00

.travis.yml

Fix printf formatting on MacOS (#4533 )

2018-10-19 14:46:09 -07:00

appveyor.yml

Add RocksJava build to AppVeyor

2019-01-03 10:44:44 -08:00

AUTHORS

Update RocksDB Authors File

2017-10-18 14:42:10 -07:00

CMakeLists.txt

Support for single-primary, multi-secondary instances (#4899 )

2019-03-26 16:45:31 -07:00

CODE_OF_CONDUCT.md

Add Code of Conduct

2017-12-05 18:42:35 -08:00

CONTRIBUTING.md

Add Code of Conduct

2017-12-05 18:42:35 -08:00

COPYING

Add GPLv2 as an alternative license.

2017-04-27 18:06:12 -07:00

DEFAULT_OPTIONS_HISTORY.md

options.delayed_write_rate use the rate of rate_limiter by default.

2017-05-24 09:58:24 -07:00

defs.bzl

[sync fix] Add defs.bzl

2019-02-28 11:35:30 -08:00

DUMP_FORMAT.md

First version of rocksdb_dump and rocksdb_undump.

2015-06-19 16:24:36 -07:00

HISTORY.md

Change OptimizeForPointLookup() and OptimizeForSmallDb() (#5165 )

2019-04-11 10:45:36 -07:00

INSTALL.md

Update the version of the dependencies used by the RocksJava static build (#4761 )

2018-12-18 20:25:43 -08:00

issue_template.md

Add a template for issues

2017-09-29 11:41:28 -07:00

LANGUAGE-BINDINGS.md

LANGUAGE-BINDINGS.md: mention python-rocksdb

2019-03-20 11:10:48 -07:00

LICENSE.Apache

Change RocksDB License

2017-07-15 16:11:23 -07:00

LICENSE.leveldb

Add back the LevelDB license file

2017-07-16 18:42:18 -07:00

Makefile

Support for single-primary, multi-secondary instances (#4899 )

2019-03-26 16:45:31 -07:00

README.md

Add LevelDB repository link in the Readme

2019-04-01 18:19:09 -07:00

ROCKSDB_LITE.md

Fix some typos in comments and docs.

2018-03-08 10:27:25 -08:00

src.mk

Support for single-primary, multi-secondary instances (#4899 )

2019-03-26 16:45:31 -07:00

TARGETS

Support for single-primary, multi-secondary instances (#4899 )

2019-03-26 16:45:31 -07:00

thirdparty.inc

Provide a way to override windows memory allocator with jemalloc for ZSTD

2018-06-04 12:12:48 -07:00

USERS.md

Adding IOTA Foundation to USERS.MD (#4436 )

2018-10-02 10:03:46 -07:00

Vagrantfile

Adding CentOS 7 Vagrantfile & build script

2018-02-26 15:27:17 -08:00

WINDOWS_PORT.md

#5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152 )

2019-04-04 11:38:19 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.

Languages

C++ 82.1%

Java 10.3%

C 2.5%

Python 1.7%

Perl 1.1%

Other 2.1%