A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Adam Retter 7242dae7fe Improve RocksJava Comparator (#6252)
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.

**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.

Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator` which would use direct and non-direct byte-buffers resepectively (via `DirectSlice` and `Slice`).

In this redesign there we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.

In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.

 ---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.

With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse mutextes, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).

These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.

 ---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.

```
ComparatorBenchmarks.put                                                native_bytewise  thrpt   25  124483.795 ± 2032.443  ops/s
ComparatorBenchmarks.put                                        native_reverse_bytewise  thrpt   25  114414.536 ± 3486.156  ops/s
ComparatorBenchmarks.put              java_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   17228.250 ± 1288.546  ops/s
ComparatorBenchmarks.put          java_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   16035.865 ± 1248.099  ops/s
ComparatorBenchmarks.put                java_bytewise_non-direct_reused-64_thread-local  thrpt   25   21571.500 ±  871.521  ops/s
ComparatorBenchmarks.put                  java_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   23613.773 ± 8465.660  ops/s
ComparatorBenchmarks.put              java_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   16768.172 ± 5618.489  ops/s
ComparatorBenchmarks.put                    java_bytewise_direct_reused-64_thread-local  thrpt   25   23921.164 ± 8734.742  ops/s
ComparatorBenchmarks.put                              java_bytewise_non-direct_no-reuse  thrpt   25   17899.684 ±  839.679  ops/s
ComparatorBenchmarks.put                                  java_bytewise_direct_no-reuse  thrpt   25   22148.316 ± 1215.527  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   11311.126 ±  820.602  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   11421.311 ±  807.210  ops/s
ComparatorBenchmarks.put        java_reverse_bytewise_non-direct_reused-64_thread-local  thrpt   25   11554.005 ±  960.556  ops/s
ComparatorBenchmarks.put          java_reverse_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   22960.523 ± 1673.421  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   18293.317 ± 1434.601  ops/s
ComparatorBenchmarks.put            java_reverse_bytewise_direct_reused-64_thread-local  thrpt   25   24479.361 ± 2157.306  ops/s
ComparatorBenchmarks.put                      java_reverse_bytewise_non-direct_no-reuse  thrpt   25    7942.286 ±  626.170  ops/s
ComparatorBenchmarks.put                          java_reverse_bytewise_direct_no-reuse  thrpt   25   11781.955 ± 1019.843  ops/s
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252

Differential Revision: D19331064

Pulled By: pdillinger

fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
2020-02-03 12:30:13 -08:00
buckifier PosixRandomAccessFile::MultiRead() to use I/O uring if supported (#5881) 2019-12-07 20:55:52 -08:00
build_tools Don't download from (unreliable) maven.org (#6348) 2020-01-30 11:02:08 -08:00
cache Remove key length assertion LRUHandle::CalcTotalCharge (#6115) 2019-12-02 15:00:07 -08:00
cmake cmake: do not build tests for Release build and cleanups (#5916) 2019-12-13 12:48:06 -08:00
coverage Fix interpreter lines for files with python2-only syntax. 2019-07-09 10:51:37 -07:00
db Fix DBTest2.ChangePrefixExtractor LITE build (#6356) 2020-01-31 15:44:14 -08:00
db_stress_tool Fix release warning for unused bg_canceled 2020-01-31 15:09:10 -08:00
docs Log warning for high bits/key in legacy Bloom filter (#6312) 2020-01-17 19:37:35 -08:00
env Fix some shadow warning (#6242) 2020-01-08 18:20:13 -08:00
examples Update example of optimistic transaction (#6074) 2020-01-16 14:04:44 -08:00
file Force a new manifest file if append to current one fails (#6331) 2020-01-30 10:56:29 -08:00
hdfs Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
include/rocksdb Add statistics for BlobDB GC (#6296) 2020-01-29 16:46:16 -08:00
java Improve RocksJava Comparator (#6252) 2020-02-03 12:30:13 -08:00
logging Increase max_log_size in FlushJob to 1024 bytes (#6258) 2020-01-06 10:16:52 -08:00
memory Charge block cache for cache internal usage (#5797) 2019-09-16 15:26:21 -07:00
memtable Misc hashing updates / upgrades (#5909) 2019-10-24 17:16:46 -07:00
monitoring Apply formatter to recent 200+ commits. (#5830) 2019-09-20 12:04:26 -07:00
options Add ReadOptions.auto_prefix_mode (#6314) 2020-01-28 14:44:05 -08:00
port Implement getfreespace for WinEnv (#6265) 2020-01-07 13:56:13 -08:00
table Add ReadOptions.auto_prefix_mode (#6314) 2020-01-28 14:44:05 -08:00
test_util unordered_write incompatible with max_successive_merges (#6284) 2020-01-10 16:53:19 -08:00
third-party Apply formatter to some recent commits (#6138) 2019-12-09 15:49:49 -08:00
tools Revert "crash_test to enable block-based table hash index (#6310)" (#6327) 2020-01-23 09:09:17 -08:00
trace_replay Misc hashing updates / upgrades (#5909) 2019-10-24 17:16:46 -07:00
util Shorten certain test names to avoid infra failure (#6352) 2020-01-30 23:10:24 -08:00
utilities Shorten certain test names to avoid infra failure (#6352) 2020-01-30 23:10:24 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Make buckifier python3 compatible (#5922) 2019-10-23 13:52:27 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Don't download from (unreliable) maven.org (#6348) 2020-01-30 11:02:08 -08:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Don't download from (unreliable) maven.org (#6348) 2020-01-30 11:02:08 -08:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Add clarifying/instructive header to TARGETS and defs.bzl 2019-11-05 20:20:33 -08:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Improve RocksJava Comparator (#6252) 2020-02-03 12:30:13 -08:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md LANGUAGE-BINDINGS.md: mention python-rocksdb 2019-03-20 11:10:48 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Don't download from (unreliable) maven.org (#6348) 2020-01-30 11:02:08 -08:00
README.md Replaced some words (#5877) 2019-10-07 12:28:09 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
TARGETS Introduce a new storage specific Env API (#5761) 2019-12-13 14:48:41 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md add user nebula (#6271) 2020-01-08 13:46:43 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

Linux/Mac Build Status Windows Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.