Summary:
**Summary**:
2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present on the memtable at flush time, whereas the latter is tracks how many of these raw bytes are considered garbage, meaning that they ended up not being imported on the SSTables resulting from the flush operations.
**Unit test**: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test.
This executable includes 3 tests, that test support and correct stat calculations for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is manually called in the test file. The tests verify that the values of these 2 statistics counters introduced in this PR can be exactly predicted, showing that we have a full understanding of the underlying operations.
**Performance testing**:
`./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times.
Timing done using "date" function in a bash script.
_Results_:
Original Rocksdb fork: mean 66.6 sec, std 1.18 sec.
This feature branch: mean 67.4 sec, std 1.35 sec.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411
Reviewed By: akankshamahajan15
Differential Revision: D29150629
Pulled By: bjlemaire
fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1
Summary:
In a distributed environment, a file `rename()` operation can succeed on server (remote)
side, but the client can somehow return non-ok status to RocksDB. Possible reasons include
network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which
can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a
new MANIFEST. We currently always delete the new MANIFEST if an error occurs.
This is problematic in distributed world. If the server-side successfully updates the CURRENT
file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail.
As a fix, we can track the execution result of IO operations on the new MANIFEST.
- If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original
MANIFEST. Therefore, it is safe to remove the new MANIFEST.
- If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up
code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local
POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the
new MANIFEST.) Therefore, we keep the new MANIFEST.
- Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT.
- If process reopens the db immediately after the failure, then the CURRENT file can point
to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can
succeed and ignore the other.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192
Test Plan: make check
Reviewed By: zhichao-cao
Differential Revision: D27804648
Pulled By: riversand963
fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4
Summary:
Added `TableProperties::{fast,slow}_compression_estimated_data_size`.
These properties are present in block-based tables when
`ColumnFamilyOptions::sample_for_compression > 0` and the necessary
compression library is supported when the file is generated. They
contain estimates of what `TableProperties::data_size` would be if the
"fast"/"slow" compression library had been used instead. One
limitation is we do not record exactly which "fast" (ZSTD or Zlib)
or "slow" (LZ4 or Snappy) compression library produced the result.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139
Test Plan:
- new unit test
- ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression
Reviewed By: riversand963
Differential Revision: D27454338
Pulled By: ajkr
fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4
Summary:
Which should return 2 long instead of an array.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8098
Reviewed By: mrambacher
Differential Revision: D27308741
Pulled By: jay-zhuang
fbshipit-source-id: 44beea2bd28cf6779b048bebc98f2426fe95e25c
Summary:
This is a small fix to what I think is a mistype in two comments in `DBOptionsInterface.java`. If it was not an error, feel free to close.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8086
Reviewed By: ajkr
Differential Revision: D27260488
Pulled By: mrambacher
fbshipit-source-id: 469daadaf6039d5b5187132b8e0c7c3672842f21
Summary:
Add statistics and info log for error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume sucess; Histogram for auto resume retry count in each recovery call.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050
Test Plan: make check and add test to error_handler_fs_test
Reviewed By: anand1976
Differential Revision: D26990565
Pulled By: zhichao-cao
fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f
Summary:
support getUsage and getPinnedUsage in JavaAPI for Cache
also fix a typo in LRUCacheTest.java that the highPriPoolRatio is not valid(set 5, I guess it means 0.05)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7925
Reviewed By: mrambacher
Differential Revision: D26900241
Pulled By: ajkr
fbshipit-source-id: 735d1e40a16fa8919c89c7c7154ba7f81208ec33
Summary:
Fixes 3 minor Javadoc copy-paste errors in the `RocksDB#newIterator()` and `Transaction#getIterator()` variants that take a column family handle but are talking about iterating over "the database" or "the default column family".
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8034
Reviewed By: jay-zhuang
Differential Revision: D26877667
Pulled By: mrambacher
fbshipit-source-id: 95dd95b667c496e389f221acc9a91b340e4b63bf
Summary:
The variable `byteCompressionType` is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Byte'.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7981
Reviewed By: ajkr
Differential Revision: D26546600
Pulled By: jay-zhuang
fbshipit-source-id: 07b579cdfcfc2262a448ca3626e216416fd05892
Summary:
Haven't seen any production issues with new Bloom filter and
it's now > 1 year old (added in 6.6.0).
Updated check_format_compatible.sh and HISTORY.md
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8017
Test Plan: tests updated (or prior bugs fixed)
Reviewed By: ajkr
Differential Revision: D26762197
Pulled By: pdillinger
fbshipit-source-id: 0e755c46b443087c1544da0fd545beb9c403d1c2
Summary:
This request is adding support for using DirectSlice in ReadOptions lower/upper bounds.
To be more efficient I have added setLength to DirectSlice so I can just update the length to be used by slice from direct buffer. It is also needed, because when one creates iterator it keep pointer to original slice so setting new slice in options does not help (it needs to reuse existing one). Using this approach one can modify the slice any time during operations with iterator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7132
Reviewed By: zhichao-cao
Differential Revision: D25840092
Pulled By: jay-zhuang
fbshipit-source-id: 760167baf61568c9a35138145c4bf9b06824cb71
Summary:
Classes ColumnFamilyHandle and CapturingWriteBatchHandler.Event have
byte array fields as part of their identity, but they do not use the
arrays' content to compute the instance's hash, and instead rely on the
arrays' identity, causing instances to have different hashcodes
although they are equal.
The PR addresses it by using the arrays' content to compute the hash,
like the equals method does.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7860
Reviewed By: jay-zhuang
Differential Revision: D25901327
Pulled By: akankshamahajan15
fbshipit-source-id: 347e7b3d2ba7befe7faa956b033e6421b9d0c235
Summary:
Consider the following sequence of events:
1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
2. Syncing MANIFEST failed and db crashed.
3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
4. Db crashed again.
5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.
After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
Test Plan:
1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
2. a new unit test is added in db_basic_test to simulate step 3.
Reviewed By: riversand963
Differential Revision: D24668144
Pulled By: cheng-chang
fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
Summary:
The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608
Test Plan: watch the existing java tests to pass
Reviewed By: zhichao-cao
Differential Revision: D24611621
Pulled By: cheng-chang
fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
Summary:
- Takes the burden off developer to close ColumnFamilyHandle instances before closing RocksDB instance
- The change is backward-compatible
----
Previously the pattern for working with Column Families was:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
// list of column family descriptors, first entry must always be default column family
final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
);
// a list which will hold the handles for the column families once the db is opened
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
try (final DBOptions options = new DBOptions()
.setCreateIfMissing(true)
.setCreateMissingColumnFamilies(true);
final RocksDB db = RocksDB.open(options,
"path/to/do", cfDescriptors,
columnFamilyHandleList)) {
try {
// do something
} finally {
// NOTE user must explicitly frees the column family handles before freeing the db
for (final ColumnFamilyHandle columnFamilyHandle :
columnFamilyHandleList) {
columnFamilyHandle.close();
}
} // frees the column family options
}
} // frees the db and the db options
```
With the changes in this PR, the Java user no longer has to worry about manually closing the Column Families, which allows them to write simpler symmetrical create/free oriented code like this:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
// list of column family descriptors, first entry must always be default column family
final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
);
// a list which will hold the handles for the column families once the db is opened
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
try (final DBOptions options = new DBOptions()
.setCreateIfMissing(true)
.setCreateMissingColumnFamilies(true);
final RocksDB db = RocksDB.open(options,
"path/to/do", cfDescriptors,
columnFamilyHandleList)) {
// do something
} // frees the column family options, then frees the db and the db options
}
}
```
**NOTE**: The changes in this PR are backwards API compatible, which means existing code using the original approach will also continue to function correctly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7428
Reviewed By: cheng-chang
Differential Revision: D24063348
Pulled By: jay-zhuang
fbshipit-source-id: 648d7526669923128c863ead94516bf4d50ac658
Summary:
Allows adding event listeners in RocksJava.
* Adds listeners getter and setter in `Options` and `DBOptions` classes.
* Adds `EventListener` Java interface and base class for implementing custom event listener callbacks - `AbstractEventListener`, which has an underlying native callback class implementing C++ `EventListener` class.
* `AbstractEventListener` class has mechanism for selectively enabling its callback methods in order to prevent invoking Java method if it is not implemented. This decreases performance cost in case only subset of event listener callback methods is needed - the JNI code for remaining "no-op" callbacks is not executed.
* The code is covered by unit tests in `EventListenerTest.java`, there are also tests added for setting/getting listeners field in `OptionsTest.java` and `DBOptionsTest.java`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7425
Reviewed By: pdillinger
Differential Revision: D24063390
Pulled By: jay-zhuang
fbshipit-source-id: 508c359538983d6b765e70d9989c351794a944ee
Summary:
Add following stats for MultiGet in Histogram to get more insight on MultiGet.
1. Number of index and filter blocks read from file as part of MultiGet
request per level.
2. Number of data blocks read from file per level.
3. Number of SST files loaded from file system per level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366
Reviewed By: anand1976
Differential Revision: D24127040
Pulled By: akankshamahajan15
fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b
Summary:
as title
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7347
Test Plan: unit tests included
Reviewed By: jay-zhuang
Differential Revision: D23592552
Pulled By: pdillinger
fbshipit-source-id: 1c3571b6f42bfd0cfd723ff49d01fbc02a1be45b
Summary:
Previously RocksJava limited the format_version to 4. However, the C++ API is now at 5, and this will likely increase again in future. The Java API now allows any positive integer, and an exception is raised from JNI if the format_version is out-of-bounds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7242
Reviewed By: cheng-chang
Differential Revision: D23077941
Pulled By: pdillinger
fbshipit-source-id: ee69f7203448acddc41c6d86b470ed987d3d366d
Summary:
The PR fixes a Java test for Merge operator `uint64add`.
The current implementation uses wrong byte order for long serialization, but fails to catch this error because the merge sum is lower than `256`.
The PR makes this test case more representative (i.e. it fails with wrong byte order) and changes the byte order to little endian.
Some background: RocksDB uses LittleEndian byte order for integer serialization across all platforms. `MergeTest` uses `ByteBuffer` that defaults to BigEndian byte order.
This test case might probably be used as a sample of `MergeOperator` usage in Java.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7243
Reviewed By: ajkr
Differential Revision: D23079593
Pulled By: pdillinger
fbshipit-source-id: 82e8e166901d66733e96a0116f88d0ec4761ddf1
Summary:
Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165
Test Plan:
TTL and periodic can be checked by runnning db_bench with the options activated:
/db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1
Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.
Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking
Reviewed By: ajkr
Differential Revision: D22693050
Pulled By: akabcenell
fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d
Summary:
SST Partitioner interface that allows to split SST files during compactions.
It basically instruct compaction to create a new file when needed. When one is using well defined prefixes and prefixed way of defining tables it is good to define also partitioning so that promotion of some SST file does not cover huge key space on next level (worst case complete space).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957
Reviewed By: ajkr
Differential Revision: D22461239
fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389
Reviewed By: siying
Differential Revision: D21163707
Pulled By: zhichao-cao
fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
Summary:
This PR exposes the `Iterator::Refresh` method to the Java API by adding it on the `RocksIteratorInterface` interface. There are three concrete implementations: `RocksIterator`, `SstFileReaderIterator`, and `WBWIRocksIterator`. For the first two cases, the JNI side simply delegates to the underlying `Iterator::Refresh` method; in the last case, as it doesn't share an ancestor, and per the discussion in https://github.com/facebook/rocksdb/issues/3465, a `Status::NotSupported` exception is thrown.
As the last PR had no activity in a while, I'm opening a new one - I'm completely fine with merging the previous PR if it gets completed before this is reviewed.
Let me know if there's anything missing or anything else I can do 👍
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6573
Reviewed By: cheng-chang
Differential Revision: D20604666
Pulled By: pdillinger
fbshipit-source-id: 4de17df1180c3b87b76cfdd77b674b81fc0563f7
Summary:
This change is fixing a crash happening in getApproximateSizes JNI implementation. It also reenables Java test that was crashing most likelly because if this bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6652
Reviewed By: cheng-chang
Differential Revision: D20874865
Pulled By: pdillinger
fbshipit-source-id: da95516f15e5df2efe1a4e5690a2ce172cb53f87
Summary:
Adding a Java API for rocksdb::CancelAllBackgroundWork() so that the user can call this (when required) before closing the DB. This is to **prevent the crashes when manual compaction is running and the user decides to close the DB**.
Calling CancelAllBackgroundWork() seems to be the recommended way to make sure that it's safe to close the DB (according to RocksDB FAQ: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ#basic-readwrite).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6657
Reviewed By: cheng-chang
Differential Revision: D20896395
Pulled By: pdillinger
fbshipit-source-id: 8a8208c10093db09bd35db9af362211897870d96
Summary:
In most places in the code the variable names are spelled correctly as
COMMITTED but in a couple places not. This fixes them and ensures the
variable is always called COMMITTED everywhere.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6481
Differential Revision: D20306776
Pulled By: pdillinger
fbshipit-source-id: b6c1bfe41db559b4bc6955c530934460c07f7022
Summary:
Threw assert error at assert(isOwningHandle()) in ColumnFamilyHandle.getDescriptor(),
because default CF don't own a handle, due to [RocksDB.getDefaultColumnFamily()](3a408eeae9/java/src/main/java/org/rocksdb/RocksDB.java (L3702)) called cfHandle.disOwnNativeHandle().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6006
Differential Revision: D19031448
fbshipit-source-id: 2420c45e835bda0e552e919b1b63708472b91538
Summary:
It is very useful to support direct ByteBuffers in Java. It allows to have zero memory copy and some serializers are using that directly so one do not need to create byte[] array for it.
This change also contains some fixes for Windows JNI build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/2283
Differential Revision: D19834971
Pulled By: pdillinger
fbshipit-source-id: 44173aa02afc9836c5498c592fd1ea95b6086e8e
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.
**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.
Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator` which would use direct and non-direct byte-buffers resepectively (via `DirectSlice` and `Slice`).
In this redesign there we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.
In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.
---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.
With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse mutextes, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).
These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.
---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.
```
ComparatorBenchmarks.put native_bytewise thrpt 25 124483.795 ± 2032.443 ops/s
ComparatorBenchmarks.put native_reverse_bytewise thrpt 25 114414.536 ± 3486.156 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 17228.250 ± 1288.546 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 16035.865 ± 1248.099 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_reused-64_thread-local thrpt 25 21571.500 ± 871.521 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_adaptive-mutex thrpt 25 23613.773 ± 8465.660 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 16768.172 ± 5618.489 ops/s
ComparatorBenchmarks.put java_bytewise_direct_reused-64_thread-local thrpt 25 23921.164 ± 8734.742 ops/s
ComparatorBenchmarks.put java_bytewise_non-direct_no-reuse thrpt 25 17899.684 ± 839.679 ops/s
ComparatorBenchmarks.put java_bytewise_direct_no-reuse thrpt 25 22148.316 ± 1215.527 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_adaptive-mutex thrpt 25 11311.126 ± 820.602 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex thrpt 25 11421.311 ± 807.210 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_reused-64_thread-local thrpt 25 11554.005 ± 960.556 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_adaptive-mutex thrpt 25 22960.523 ± 1673.421 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_non-adaptive-mutex thrpt 25 18293.317 ± 1434.601 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_reused-64_thread-local thrpt 25 24479.361 ± 2157.306 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_non-direct_no-reuse thrpt 25 7942.286 ± 626.170 ops/s
ComparatorBenchmarks.put java_reverse_bytewise_direct_no-reuse thrpt 25 11781.955 ± 1019.843 ops/s
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252
Differential Revision: D19331064
Pulled By: pdillinger
fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
Summary:
There are no API changes ;-)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6218
Differential Revision: D19200373
Pulled By: pdillinger
fbshipit-source-id: 58d34b01ea53b75a1eccbd72f8b14d6256a7380f
Summary:
Should fix Travis build error that randomly showed up upon
using Java 13 version of javadoc.
AdvancedColumnFamilyOptionsInterface.java:257: error:
unexpected heading used: <H2>, compared to implicit preceding heading: <H3>
According to this reference https://bugs.openjdk.java.net/browse/JDK-8220379
it should work to start at h4, but that didn't work, so avoiding
headings should be fine.
Also fix Java EnvironmentTest for JDK13.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6208
Test Plan: Travis run on PR (don't have Java 13 handy)
Differential Revision: D19163105
Pulled By: pdillinger
fbshipit-source-id: 4a9419cbe7ef780fba771b8a1508e1ea80d17b3e
Summary:
Add the jni library for musl-libc, specifically for incorporating into Alpine based docker images. The classifier is `musl64`.
I have signed the CLA electronically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3143
Differential Revision: D18719372
fbshipit-source-id: 6189d149310b6436d6def7d808566b0234b23313
Summary:
There's no technological impediment to allowing the Bloom
filter bits/key to be non-integer (fractional/decimal) values, and it
provides finer control over the memory vs. accuracy trade-off. This is
especially handy in using the format_version=5 Bloom filter in place
of the old one, because bits_per_key=9.55 provides the same accuracy as
the old bits_per_key=10.
This change not only requires refining the logic for choosing the best
num_probes for a given bits/key setting, it revealed a flaw in that logic.
As bits/key gets higher, the best num_probes for a cache-local Bloom
filter is closer to bpk / 2 than to bpk * 0.69, the best choice for a
standard Bloom filter. For example, at 16 bits per key, the best
num_probes is 9 (FP rate = 0.0843%) not 11 (FP rate = 0.0884%).
This change fixes and refines that logic (for the format_version=5
Bloom filter only, just in case) based on empirical tests to find
accuracy inflection points between each num_probes.
Although bits_per_key is now specified as a double, the new Bloom
filter converts/rounds this to "millibits / key" for predictable/precise
internal computations. Just in case of unforeseen compatibility
issues, we round to the nearest whole number bits / key for the
legacy Bloom filter, so as not to unlock new behaviors for it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6092
Test Plan: unit tests included
Differential Revision: D18711313
Pulled By: pdillinger
fbshipit-source-id: 1aa73295f152a995328cb846ef9157ae8a05522a
Summary:
Add unordered_write option api and related ut to rocksjava
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5839
Differential Revision: D17604446
Pulled By: maysamyabandeh
fbshipit-source-id: c6b07e85ca9d5e3a92973ddb6ab2bc079e53c9c1
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830
Test Plan: Run all existing tests.
Differential Revision: D17488031
fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
Summary:
Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827
Test Plan: Run all existing tests.
Differential Revision: D17483727
fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f
Summary:
The actual value of default write buffer size within `rocksdb/include/rocksdb/options.h` is 64 MB, we should correct this value in java doc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5670
Differential Revision: D16668815
Pulled By: maysamyabandeh
fbshipit-source-id: cc3a981c9f1c2cd4a8392b0ed5f1fd0a2d729afb
Summary:
if read_options.snapshot is not set, ::Get will take the last sequence number after taking a super-version and uses that as the sequence number. Theoretically max_eviceted_seq_ could advance this sequence number. This could lead ::IsInSnapshot that will be invoked by the ReadCallback to notice the absence of the snapshot. In this case, the ReadCallback should have passed a non-value to snap_released so that it could be set by the ::IsInSnapshot. The patch does that, and adds a unit test to verify it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664
Differential Revision: D16614033
Pulled By: maysamyabandeh
fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2
Summary:
As [BlockBasedTableConfig setBlockCacheSize()](1966a7c055/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java (L728)) said, If cacheSize is non-positive, then cache will not be used. but when we configure a negative number or 0, there is an unexpected result: the block cache becomes 8M.
- Allow 0 as a valid size. When block cache size is 0, an 8MB block cache is created, as it is the default C++ API behavior. Also updated the comment.
- Set no_block_cache true if negative value is passed to block cache size, and no block cache will be created.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5465
Differential Revision: D15968788
Pulled By: sagar0
fbshipit-source-id: ee02d6e95841c9e2c316a64bfdf192d46ff5638a
Summary:
Make the generics of the Options interfaces more strict so they are usable in a Kotlin Multiplatform expect/actual typealias implementation without causing a Violation of Finite Bound Restriction.
This fix would enable the creation of a generic Kotlin multiplatform library by just typealiasing the JVM implementation to the current Java implementation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5461
Differential Revision: D15903288
Pulled By: sagar0
fbshipit-source-id: 75e83fdf5d2fcede40744a17e767563d6a4b0696
Summary:
I would like to be able to read out the current Filter that has been set (or not) for a BlockBasedTableConfig. Added one public method to BlockBasedTableConfig:
public Filter filterPolicy() {
return filterPolicy;
}
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5186
Differential Revision: D14921415
Pulled By: siying
fbshipit-source-id: 2a63c8685480197862b49fc48916c757cd6daf95
Summary:
BackupEngine relies on write-ahead logs to back up the memtable. Disabling write-ahead logs
can result in backups failing to preserve unflushed keys. This PR updates the documentation to specify this behavior, and suggest always flushing the memtable when write-ahead logs are disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5071
Differential Revision: D14524124
Pulled By: miasantreble
fbshipit-source-id: 635f855f8a42ad60273b5efd226139b511e3e5d5
Summary:
Disabling `org.rocksdb.RocksDBTest.getApproximateSizes` test as it is frequently crashing on travis (#5020). It will be re-enabled once the root-cause is found and fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5035
Differential Revision: D14294736
Pulled By: sagar0
fbshipit-source-id: e28bff0d143a58ad6c82991fec3d4cf8c0209995
Summary:
This reverts commit ee1818081f.
We are not ready to deprecate this feature. revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5034
Differential Revision: D14287246
Pulled By: siying
fbshipit-source-id: e4beafdeaee1c94364fdaa6ba198218d158339f7
Summary:
`DefaultEnvTest.incBackgroundThreadsIfNeeded` jtest should assert that the number of threads is greater than or equal to the minimum number of threads.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5021
Differential Revision: D14268311
Pulled By: sagar0
fbshipit-source-id: 01fb32b5b3ce636451d162fa1a2bbc5bd1974682
Summary:
This is my latest round of changes to add missing items to RocksJava. More to come in future PRs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4833
Differential Revision: D14152266
Pulled By: sagar0
fbshipit-source-id: d6cff67e26da06c131491b5cf6911a8cd0db0775
Summary:
The info log header feature never worked well, because log level Header was not
translated to Logger::LogHeader() call. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4980
Differential Revision: D14087283
Pulled By: siying
fbshipit-source-id: 7e7d03ce35fa8d13d4ee549f46f7326f7bc0006d
Summary:
We introduced ttl option in CompactionOptionsFIFO when ttl-based file
deletion (compaction) was supported only as part of FIFO Compaction. But
with the extension of ttl semantics even to Level compaction,
CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start
using ColumnFamilyOptions.ttl for FIFO compaction as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965
Differential Revision: D14072960
Pulled By: sagar0
fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e
Summary:
Store_index_in_file is a less useful feature. To simplify the code to maintain, we are dropping the feature.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4914
Differential Revision: D13791883
Pulled By: siying
fbshipit-source-id: d187c5d662584866103e4b77d09dfb925509ae2e
Summary:
Expose common stats min,max,count,sum via statistics JNI. These stats are not fully exposed on the Java side as is, but are available on the native side.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4742
Differential Revision: D13403766
Pulled By: ajkr
fbshipit-source-id: 5b70f7bd3fb7490aab73dcbd09f13490fce5c773
Summary:
Updating the `HistogramType.java` and `TickerType.java` to expose and correct metrics for statistics callbacks.
Moved `NO_ITERATOR_CREATED` to the proper stat name and deprecated `NO_ITERATORS`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4733
Differential Revision: D13466936
Pulled By: sagar0
fbshipit-source-id: a58d1edcc07c7b68c3525b1aa05828212c89c6c7
Summary:
This PR fixes#4721. When an exception is caught and thrown as a different exception, then the original exception should be inserted as a cause of the new exception. This bug in RocksDB was swallowing the underlying exception from `NativeLibraryLoader` and throwing the following exception
```
...
Caused by: java.lang.RuntimeException: Unable to load the RocksDB shared libraryjava.nio.channels.ClosedByInterruptException
at org.rocksdb.RocksDB.loadLibrary(RocksDB.java:67)
at org.rocksdb.RocksDB.<clinit>(RocksDB.java:35)
... 73 more
```
The fix is simple and self-explanatory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4728
Differential Revision: D13418371
Pulled By: sagar0
fbshipit-source-id: d76c25af2a83a0f8ba62cc8d7b721bfddc85fdf1
Summary:
Fixes some RocksJava regressions recently introduced, whereby RocksJava would not build on JDK 7.
These should have been visible on Travis-CI!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4768
Differential Revision: D13418173
Pulled By: sagar0
fbshipit-source-id: 57bf223188887f84d9e072031af2e0d2c8a69c30
Summary:
When adding CompactionFilter and CompactionFilterFactory settings to the Java layer, ColumnFamilyOptions was modified directly instead of ColumnFamilyOptionsInterface. This meant that the old-stye Options monolith was left behind.
This patch fixes that, by:
- promoting the CompactionFilter + CompactionFilterFactory setters from ColumnFamilyOptions -> ColumnFamilyOptionsInterface
- adding getters in ColumnFamilyOptionsInterface
- implementing setters in Options
- implementing getters in both ColumnFamilyOptions and Options
- adding testcases
- reusing a test CompactionFilterFactory by moving it to a common location
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3461
Differential Revision: D13278788
Pulled By: sagar0
fbshipit-source-id: 72602c6eb97dc80734e718abb5e2e9958d3c753b
Summary:
Compile logs have a bit of noise due to missing javadoc annotations. Updating docs to reduce.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4764
Differential Revision: D13400193
Pulled By: sagar0
fbshipit-source-id: 65c7efb70747cc3bb35a336a6881ea6536ae5ff4
Summary:
Transaction::GetForUpdate is extended with a do_validate parameter with default value of true. If false it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with default value of false. It true it indicates that call is assumed to be after a ::GetForUpdate(do_validate=false).
The Java APIs are accordingly updated.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680
Differential Revision: D13068508
Pulled By: maysamyabandeh
fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
Summary:
Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors.
This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702
Differential Revision: D13146643
Pulled By: miasantreble
fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
Summary:
Make CompactionOptionsFIFO's ttl and allow_compaction options to be available in RocksJava.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4609
Differential Revision: D12849503
Pulled By: sagar0
fbshipit-source-id: 47baa97918d252370f234c36c1af15ff2dad7658
Summary:
Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656
Differential Revision: D13039257
Pulled By: miasantreble
fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
Summary:
Currently, `Statistics` can record tick by `recordTick()` whose second parameter is an `uint64_t`.
That means tick can only increase.
If we want to reduce tick, we have to work around like `RecordTick(statistics_, NO_ITERATORS, uint64_t(-1));`.
That's kind of a hack.
So, this PR divide `NO_ITERATORS` into two counters `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETE`, making the counters increase only.
Fixes#3013 .
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4498
Differential Revision: D10395010
Pulled By: sagar0
fbshipit-source-id: cfb523b22a37411c794b4e9da090f1ae30293db2
Summary:
In 'StatisticsCollector', the call of Thread.sleep() might be better outside the loop?
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4588
Differential Revision: D12903406
Pulled By: sagar0
fbshipit-source-id: 1647ed779e9972bc2cea03f4c38e37ab3ad7c361
Summary:
1. `WriteBufferManager` should have a reference alive in Java side through `Options`/`DBOptions` otherwise, if it's GC'ed at java side, native side can seg fault.
2. native method `setWriteBufferManager()` in `DBOptions.java` doesn't have it's jni method invocation in rocksdbjni which is added in this PR
3. `DBOptionsTest.java` is referencing object of `Options`. Instead it should be testing against `DBOptions`. Seems like a copy paste error.
4. Add a getter for WriteBufferManager.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4579
Differential Revision: D10561150
Pulled By: sagar0
fbshipit-source-id: 139a15c7f051a9f77b4200215b88267b48fbc487
Summary:
Allow rocks java to explicitly create WriteBufferManager by plumbing it to the native code through JNI.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4492
Differential Revision: D10428506
Pulled By: sagar0
fbshipit-source-id: cd9dd8c2ef745a0303416b44e2080547bdcca1fd