A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Peter Dillinger 079e77ff9e Revamp cache_bench to resemble a real workload (#6629)
Summary:
I suspect LRUCache could use some optimization, and to support
such an effort, a good benchmarking tool is needed. The existing
cache_bench was heavily skewed toward insertion and lookup misses, and
did not saturate memory with other work. This change should improve
those things to better resemble a real workload.

(All below using clang compiler, for some consistency, but not
necessarily same version and settings.)

The real workload is from production MySQL on RocksDB, filtering stacks
containing "LRU", "ShardedCache" or "CacheShard."
Lookup inclusive: 66%
Insert inclusive: 17%
Release inclusive: 15%

An alternate simulated workload is MySQL running a LinkBench read test:
Lookup inclusive: 54%
Insert inclusive: 24%
Release inclusive: 21%

cache_bench default settings, prior to this change:
Lookup inclusive: 35.8%
Insert inclusive: 63.6%
Release inclusive: 0%

cache_bench after this change (intended as somewhat "tighter" workload
than average production, more like LinkBench):
Lookup inclusive: 52%
Insert inclusive: 20%
Release inclusive: 26%

And top exclusive stacks (portion of stack samples as filtered above):
Production MySQL:
LRUHandleTable::FindPointer: 25.3%
rocksdb::operator==: 15.1%  <-- Slice ==
LRUCacheShard::LRU_Remove: 13.8%
ShardedCache::Lookup: 8.9%
__pthread_mutex_lock: 7.1%
LRUCacheShard::LRU_Insert: 6.3%
MurmurHash64A: 4.8%  <-- Since upgraded to XXH3p
...

Old cache_bench:
LRUHandleTable::FindPointer: 23.6%
__pthread_mutex_lock: 15.0%
__pthread_mutex_unlock_usercnt: 11.7%
__lll_lock_wait: 8.6%
__lll_unlock_wake: 6.8%
LRUCacheShard::LRU_Insert: 6.0%
ShardedCache::Lookup: 4.4%
LRUCacheShard::LRU_Remove: 2.8%
...
rocksdb::operator==: 0.2%  <-- Slice ==
...

New cache_bench:
LRUHandleTable::FindPointer: 22.8%
__pthread_mutex_unlock_usercnt: 14.3%
rocksdb::operator==: 10.5%  <-- Slice ==
LRUCacheShard::LRU_Insert: 9.0%
__pthread_mutex_lock: 5.9%
LRUCacheShard::LRU_Remove: 5.0%
...
ShardedCache::Lookup: 2.9%
...

So there's a bit more lock contention in the benchmark than in
production, but otherwise looks similar enough to me. At least it's a
big improvement over the existing code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6629

Test Plan: No production code changes, ran cache_bench with ASAN

Reviewed By: ltamasi

Differential Revision: D20824318

Pulled By: pdillinger

fbshipit-source-id: 6f8dc5891ead0f87edbed3a615ecd5289d9abe12
2020-04-03 10:26:49 -07:00
.circleci Migrate AppVeyor to CircleCI (#6518) 2020-03-13 21:58:51 -07:00
buckifier Buck config: Re-enable liburing under Linux (#6451) 2020-02-24 15:47:34 -08:00
build_tools Force Java version on Travis CI (#6512) 2020-03-12 12:24:51 -07:00
cache Revamp cache_bench to resemble a real workload (#6629) 2020-04-03 10:26:49 -07:00
cmake cmake: do not build tests for Release build and cleanups (#5916) 2019-12-13 12:48:06 -08:00
coverage Update a few scripts to be python3 compatible (#6525) 2020-03-24 21:00:27 -07:00
db Revamp cache_bench to resemble a real workload (#6629) 2020-04-03 10:26:49 -07:00
db_stress_tool Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491) 2020-03-18 17:14:15 -07:00
docs Log warning for high bits/key in legacy Bloom filter (#6312) 2020-01-17 19:37:35 -08:00
env Add counter in perf_context to time cipher time (#6596) 2020-04-01 16:59:35 -07:00
examples Use DestroyColumnFamilyHandle instead of directly deleting column family handle (#6505) 2020-03-12 14:30:46 -07:00
file Use FileChecksumGenFactory for SST file checksum (#6600) 2020-03-29 15:58:46 -07:00
hdfs Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
include/rocksdb Add counter in perf_context to time cipher time (#6596) 2020-04-01 16:59:35 -07:00
java Use an Amazon S3 bucket for downloading deps (#6526) 2020-03-13 13:39:03 -07:00
logging Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
memory Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
memtable Fix a bug that crashes the service when write buffer manager fails to insert to block cache (#6619) 2020-04-01 11:27:40 -07:00
monitoring Fix msvc debug test failures (#6579) 2020-04-03 09:54:25 -07:00
options Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
port Fix jemalloc forward declarations (#6613) 2020-03-31 11:38:51 -07:00
table Fix msvc debug test failures (#6579) 2020-04-03 09:54:25 -07:00
test_util Simplify migration to FileSystem API (#6552) 2020-03-23 21:54:21 -07:00
third-party Add dependency of gtest on pthread (#6572) 2020-04-01 13:53:55 -07:00
tools Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
trace_replay Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
util Revamp cache_bench to resemble a real workload (#6629) 2020-04-03 10:26:49 -07:00
utilities Revert the recent cache deleter change (#6620) 2020-03-31 16:11:06 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Separate timestamp related test from db_basic_test (#6516) 2020-03-13 11:37:15 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Exclude more Travis builds for each pull request (#6557) 2020-03-20 13:20:19 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Separate timestamp related test from db_basic_test (#6516) 2020-03-13 11:37:15 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Remove redundant in HISTORY (#6627) 2020-04-02 12:12:05 -07:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md LANGUAGE-BINDINGS.md: mention python-rocksdb 2019-03-20 11:10:48 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
README.md Replaced some words (#5877) 2019-10-07 12:28:09 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
TARGETS Add pipelined & parallel compression optimization (#6262) 2020-04-01 16:40:18 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md add user nebula (#6271) 2020-01-08 13:46:43 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

Linux/Mac Build Status Windows Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.