rocksdb/java/rocksjni/comparatorjnicallback.h
Adam Retter 7242dae7fe Improve RocksJava Comparator (#6252)
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.

**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.

Previously, when implementing a comparator in Java, the developer had a choice of subclassing either `DirectComparator` or `Comparator`, which would use direct and non-direct byte buffers respectively (via `DirectSlice` and `Slice`).

In this redesign we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.

In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.
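
As an illustration of how these settings interact, here is a minimal sketch of the reuse decision on the native side. The field names follow `ComparatorJniCallbackOptions` in `comparatorjnicallback.h`. The names `BridgeOptions`, `SyncType`, `CanReuseBuffer` and `NeedsMutex` are hypothetical. The inclusive size boundary is an assumption, since the header comment only says a buffer is reused when the data "requires less than" the maximum.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of ReusedSynchronisationType, for illustration only.
enum class SyncType { kMutex, kAdaptiveMutex, kThreadLocal };

// Hypothetical mirror of ComparatorJniCallbackOptions with the same defaults.
struct BridgeOptions {
  SyncType sync_type = SyncType::kAdaptiveMutex;
  bool direct_buffer = true;
  int32_t max_reused_buffer_size = 64;
};

// True when a key of key_size bytes may be copied into a reused buffer.
// Assumption: a key of exactly max_reused_buffer_size bytes still fits.
inline bool CanReuseBuffer(const BridgeOptions& opts, size_t key_size) {
  return opts.max_reused_buffer_size > 0 &&
         static_cast<int64_t>(key_size) <= opts.max_reused_buffer_size;
}

// True when access to the reused buffer has to be guarded by a mutex,
// i.e. the buffer is reused and is not per-thread.
inline bool NeedsMutex(const BridgeOptions& opts, size_t key_size) {
  return CanReuseBuffer(opts, key_size) &&
         opts.sync_type != SyncType::kThreadLocal;
}
```

With the defaults, a 64-byte key would go through a reused, mutex-guarded buffer, while a 65-byte key would get a freshly allocated buffer for that one callback.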

 ---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.

With these changes, when reusing buffers and guarding access to them via mutexes, the slowdown is approximately the same. However, these changes offer a new facility to not reuse buffers, which avoids the mutexes and reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces the slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).
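
To show why a per-thread buffer needs no mutex, here is a minimal sketch that uses the C++ `thread_local` keyword in place of RocksDB's `ThreadLocalPtr`. The helper name `CopyToThreadLocalBuffer` is hypothetical, and a plain `std::vector<char>` stands in for the Java `ByteBuffer`.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: copies len bytes from src into a per-thread scratch
// buffer and returns a pointer to that buffer.
inline const char* CopyToThreadLocalBuffer(const char* src, size_t len) {
  // Each thread gets its own buffer, created on that thread's first call,
  // so no other thread can touch it and no locking is required.
  thread_local std::vector<char> buf;
  if (buf.size() < len) {
    buf.resize(len);  // grow only when the current buffer is too small
  }
  if (len > 0) {
    std::memcpy(buf.data(), src, len);
  }
  return buf.data();
}
```

The trade-off is memory: each thread that calls into the comparator keeps its own buffer alive, instead of all threads sharing one buffer behind a lock.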

These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.

 ---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.

```
ComparatorBenchmarks.put                                                native_bytewise  thrpt   25  124483.795 ± 2032.443  ops/s
ComparatorBenchmarks.put                                        native_reverse_bytewise  thrpt   25  114414.536 ± 3486.156  ops/s
ComparatorBenchmarks.put              java_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   17228.250 ± 1288.546  ops/s
ComparatorBenchmarks.put          java_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   16035.865 ± 1248.099  ops/s
ComparatorBenchmarks.put                java_bytewise_non-direct_reused-64_thread-local  thrpt   25   21571.500 ±  871.521  ops/s
ComparatorBenchmarks.put                  java_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   23613.773 ± 8465.660  ops/s
ComparatorBenchmarks.put              java_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   16768.172 ± 5618.489  ops/s
ComparatorBenchmarks.put                    java_bytewise_direct_reused-64_thread-local  thrpt   25   23921.164 ± 8734.742  ops/s
ComparatorBenchmarks.put                              java_bytewise_non-direct_no-reuse  thrpt   25   17899.684 ±  839.679  ops/s
ComparatorBenchmarks.put                                  java_bytewise_direct_no-reuse  thrpt   25   22148.316 ± 1215.527  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   11311.126 ±  820.602  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   11421.311 ±  807.210  ops/s
ComparatorBenchmarks.put        java_reverse_bytewise_non-direct_reused-64_thread-local  thrpt   25   11554.005 ±  960.556  ops/s
ComparatorBenchmarks.put          java_reverse_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   22960.523 ± 1673.421  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   18293.317 ± 1434.601  ops/s
ComparatorBenchmarks.put            java_reverse_bytewise_direct_reused-64_thread-local  thrpt   25   24479.361 ± 2157.306  ops/s
ComparatorBenchmarks.put                      java_reverse_bytewise_non-direct_no-reuse  thrpt   25    7942.286 ±  626.170  ops/s
ComparatorBenchmarks.put                          java_reverse_bytewise_direct_no-reuse  thrpt   25   11781.955 ± 1019.843  ops/s
```
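
The slowdown factors can be recomputed from the table as native throughput divided by Java throughput. The sketch below does this for three rows against `native_bytewise`. The function and constant names are hypothetical, and the wide error bars on some rows mean the ratios are only indicative.

```cpp
#include <cmath>

// Slowdown of a Java comparator relative to a native one, given the mean
// throughput of each in ops/s.
inline double Slowdown(double native_ops, double java_ops) {
  return native_ops / java_ops;
}

// Mean throughputs (ops/s) copied from the benchmark table.
constexpr double kNativeBytewise = 124483.795;
constexpr double kJavaNonDirectAdaptiveMutex = 17228.250;
constexpr double kJavaDirectNoReuse = 22148.316;
constexpr double kJavaDirectThreadLocal = 23921.164;

// Slowdown(kNativeBytewise, kJavaNonDirectAdaptiveMutex) is about 7.2
// Slowdown(kNativeBytewise, kJavaDirectNoReuse)          is about 5.6
// Slowdown(kNativeBytewise, kJavaDirectThreadLocal)      is about 5.2
```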
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252

Differential Revision: D19331064

Pulled By: pdillinger

fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
2020-02-03 12:30:13 -08:00


// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// This file implements the callback "bridge" between Java and C++ for
// rocksdb::Comparator
#ifndef JAVA_ROCKSJNI_COMPARATORJNICALLBACK_H_
#define JAVA_ROCKSJNI_COMPARATORJNICALLBACK_H_
#include <jni.h>
#include <memory>
#include <string>
#include "rocksjni/jnicallback.h"
#include "rocksdb/comparator.h"
#include "rocksdb/slice.h"
#include "port/port.h"
#include "util/thread_local.h"
namespace rocksdb {
enum ReusedSynchronisationType {
  /**
   * Standard mutex.
   */
  MUTEX,

  /**
   * Use adaptive mutex, which spins in the user space before resorting
   * to kernel. This could reduce context switch when the mutex is not
   * heavily contended. However, if the mutex is hot, we could end up
   * wasting spin time.
   */
  ADAPTIVE_MUTEX,

  /**
   * There is a reused buffer per-thread.
   */
  THREAD_LOCAL
};
struct ComparatorJniCallbackOptions {
  // Set the synchronisation type used to guard the reused buffers.
  // Only used if max_reused_buffer_size > 0.
  // Default: ADAPTIVE_MUTEX
  ReusedSynchronisationType reused_synchronisation_type =
      ReusedSynchronisationType::ADAPTIVE_MUTEX;

  // Indicates if a direct byte buffer (i.e. outside of the normal
  // garbage-collected heap) is used for the callbacks to Java,
  // as opposed to a non-direct byte buffer which is a wrapper around
  // an on-heap byte[].
  // Default: true
  bool direct_buffer = true;

  // Maximum size of a buffer (in bytes) that will be reused.
  // Comparators will use 5 of these buffers,
  // so the retained memory size will be 5 * max_reused_buffer_size.
  // When a buffer is needed for transferring data to a callback,
  // if it requires less than max_reused_buffer_size, then an
  // existing buffer will be reused, else a new buffer will be
  // allocated just for that callback. -1 to disable.
  // Default: 64 bytes
  int32_t max_reused_buffer_size = 64;
};
/**
 * This class acts as a bridge between C++
 * and Java. The methods in this class are
 * called back from the RocksDB storage engine (C++),
 * and we then call back to the appropriate Java method;
 * this enables Comparators to be implemented in Java.
 *
 * The design of this Comparator can reuse the Java ByteBuffer
 * objects that are passed to the compare, findShortestSeparator
 * and findShortSuccessor method callbacks. Reusing a buffer
 * avoids creating a new object for each callback of those
 * functions. Unfortunately, when the reused buffers are shared
 * between threads, this means that we have to introduce
 * independent locking in regions of each of those methods
 * via the mutexes mtx_compare, mtx_shortest and mtx_short
 * respectively.
 */
class ComparatorJniCallback : public JniCallback, public Comparator {
 public:
  ComparatorJniCallback(
      JNIEnv* env, jobject jcomparator,
      const ComparatorJniCallbackOptions* options);
  ~ComparatorJniCallback();
  virtual const char* Name() const;
  virtual int Compare(const Slice& a, const Slice& b) const;
  virtual void FindShortestSeparator(
      std::string* start, const Slice& limit) const;
  virtual void FindShortSuccessor(std::string* key) const;
  const ComparatorJniCallbackOptions* m_options;

 private:
  struct ThreadLocalBuf {
    ThreadLocalBuf(JavaVM* _jvm, bool _direct_buffer, jobject _jbuf) :
        jvm(_jvm), direct_buffer(_direct_buffer), jbuf(_jbuf) {}
    JavaVM* jvm;
    bool direct_buffer;
    jobject jbuf;
  };
  inline void MaybeLockForReuse(const std::unique_ptr<port::Mutex>& mutex,
                                const bool cond) const;
  inline void MaybeUnlockForReuse(const std::unique_ptr<port::Mutex>& mutex,
                                  const bool cond) const;
  jobject GetBuffer(JNIEnv* env, const Slice& src, bool reuse_buffer,
                    ThreadLocalPtr* tl_buf, jobject jreuse_buffer) const;
  jobject ReuseBuffer(JNIEnv* env, const Slice& src,
                      jobject jreuse_buffer) const;
  jobject NewBuffer(JNIEnv* env, const Slice& src) const;
  void DeleteBuffer(JNIEnv* env, jobject jbuffer) const;

  // used for synchronisation in compare method
  std::unique_ptr<port::Mutex> mtx_compare;
  // used for synchronisation in findShortestSeparator method
  std::unique_ptr<port::Mutex> mtx_shortest;
  // used for synchronisation in findShortSuccessor method
  std::unique_ptr<port::Mutex> mtx_short;

  std::unique_ptr<const char[]> m_name;

  // TODO(AR) could we make this static somehow?
  jclass m_abstract_comparator_jni_bridge_clazz;
  // TODO(AR) we could cache this globally for the entire VM if we switch
  // more APIs to use ByteBuffer
  // TODO(AR) could we make this static somehow?
  jclass m_jbytebuffer_clazz;
  jmethodID m_jcompare_mid;   // TODO(AR) could we make this static somehow?
  jmethodID m_jshortest_mid;  // TODO(AR) could we make this static somehow?
  jmethodID m_jshort_mid;     // TODO(AR) could we make this static somehow?

  jobject m_jcompare_buf_a;
  jobject m_jcompare_buf_b;
  jobject m_jshortest_buf_start;
  jobject m_jshortest_buf_limit;
  jobject m_jshort_buf_key;
  ThreadLocalPtr* m_tl_buf_a;
  ThreadLocalPtr* m_tl_buf_b;
};
} // namespace rocksdb
#endif // JAVA_ROCKSJNI_COMPARATORJNICALLBACK_H_
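
The header only declares `MaybeLockForReuse` and `MaybeUnlockForReuse`, so their bodies are not shown here. A minimal sketch of the conditional-locking pattern their signatures suggest, assuming `std::mutex` in place of `port::Mutex`, could look like the following. The names `MaybeLock` and `MaybeUnlock` and the null check are hypothetical, not the actual implementation.

```cpp
#include <memory>
#include <mutex>

// Hypothetical stand-in for MaybeLockForReuse: takes the lock only when the
// caller is about to use a shared, reused buffer (cond == true).
inline void MaybeLock(const std::unique_ptr<std::mutex>& mutex, bool cond) {
  // Assumption: the mutex pointer may be empty when no mutex-based
  // synchronisation is configured (e.g. thread-local buffers).
  if (cond && mutex) {
    mutex->lock();
  }
}

// Hypothetical stand-in for MaybeUnlockForReuse: must be called with the
// same cond value that was passed to MaybeLock.
inline void MaybeUnlock(const std::unique_ptr<std::mutex>& mutex, bool cond) {
  if (cond && mutex) {
    mutex->unlock();
  }
}
```

A caller would compute `cond` once per callback, for example from whether the key fits within `max_reused_buffer_size`, and pass the same value to both functions so that the lock and unlock always pair up.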