A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Luca Giacchino 66a95f0fac Provide an allocator for new memory type to be used with RocksDB block cache (#6214)
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.

**Performance**

We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.

The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.

![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)

**Changes**

- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator

**Minimum Requirements**

- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind

**Memory Configuration**

The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.

Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.

**Usage**

When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):

```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"

NewLRUCache(
    capacity /*size_t*/,
    6 /*cache_numshardbits*/,
    false /*strict_capacity_limit*/,
    false /*cache_high_pri_pool_ratio*/,
    std::make_shared<MemkindKmemAllocator>());
```

Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214

Reviewed By: cheng-chang

Differential Revision: D19292435

fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
2020-04-09 20:47:23 -07:00
.circleci Migrate AppVeyor to CircleCI (#6518) 2020-03-13 21:58:51 -07:00
buckifier Buck config: Re-enable liburing under Linux (#6451) 2020-02-24 15:47:34 -08:00
build_tools Provide an allocator for new memory type to be used with RocksDB block cache (#6214) 2020-04-09 20:47:23 -07:00
cache Revamp cache_bench to resemble a real workload (#6629) 2020-04-03 10:26:49 -07:00
cmake cmake: do not build tests for Release build and cleanups (#5916) 2019-12-13 12:48:06 -08:00
coverage Update a few scripts to be python3 compatible (#6525) 2020-03-24 21:00:27 -07:00
db Fix wrong key being read on ingested file with global seqno and delta encoding (#6669) 2020-04-08 21:22:15 -07:00
db_stress_tool Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491) 2020-03-18 17:14:15 -07:00
docs Log warning for high bits/key in legacy Bloom filter (#6312) 2020-01-17 19:37:35 -08:00
env Add counter in perf_context to time cipher time (#6596) 2020-04-01 16:59:35 -07:00
examples Use DestroyColumnFamilyHandle instead of directly deleting column family handle (#6505) 2020-03-12 14:30:46 -07:00
file Fix result slice's address for direct io read (#6672) 2020-04-08 21:20:31 -07:00
hdfs Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
include/rocksdb added new functions to c-api (#5630) 2020-04-07 14:45:39 -07:00
java Fix crash in JNI getApproximateSizes (#6652) 2020-04-07 20:19:25 -07:00
logging Fix info log source file display length (#5824) 2020-04-08 20:18:08 -07:00
memory Provide an allocator for new memory type to be used with RocksDB block cache (#6214) 2020-04-09 20:47:23 -07:00
memtable Fix a bug that crashes the service when write buffer manager fails to insert to block cache (#6619) 2020-04-01 11:27:40 -07:00
monitoring fix compiler errors with -DNPERF_CONTEXT (#6642) 2020-04-03 13:24:16 -07:00
options Fix memory corruption caused by new test in options_settable_test (#6676) 2020-04-09 11:23:32 -07:00
port Fix jemalloc forward declarations (#6613) 2020-03-31 11:38:51 -07:00
table Fix wrong key being read on ingested file with global seqno and delta encoding (#6669) 2020-04-08 21:22:15 -07:00
test_util Simplify migration to FileSystem API (#6552) 2020-03-23 21:54:21 -07:00
third-party Add dependency of gtest on pthread (#6572) 2020-04-01 13:53:55 -07:00
tools Provide an allocator for new memory type to be used with RocksDB block cache (#6214) 2020-04-09 20:47:23 -07:00
trace_replay Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
util Add a simple timer support to schedule work at fixed times/intervals (#6543) 2020-04-07 11:55:27 -07:00
utilities Add unit test for TransactionLockMgr (#6599) 2020-04-08 13:51:51 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Separate timestamp related test from db_basic_test (#6516) 2020-03-13 11:37:15 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Temporarily disable ppc64le unit tests in PRs (#6682) 2020-04-09 16:42:44 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Separate timestamp related test from db_basic_test (#6516) 2020-03-13 11:37:15 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Provide an allocator for new memory type to be used with RocksDB block cache (#6214) 2020-04-09 20:47:23 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Add two more optimization improvements to HISTORY (#6679) 2020-04-09 11:19:51 -07:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md LANGUAGE-BINDINGS.md: mention python-rocksdb 2019-03-20 11:10:48 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Provide an allocator for new memory type to be used with RocksDB block cache (#6214) 2020-04-09 20:47:23 -07:00
README.md Replaced some words (#5877) 2019-10-07 12:28:09 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Provide an allocator for new memory type to be used with RocksDB block cache (#6214) 2020-04-09 20:47:23 -07:00
TARGETS Add unit test for TransactionLockMgr (#6599) 2020-04-08 13:51:51 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md add user nebula (#6271) 2020-01-08 13:46:43 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

Linux/Mac Build Status Windows Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.