Summary:
https://github.com/facebook/rocksdb/pull/7898 enabled the write buffer manager to stall writes when memory usage exceeds the buffer size, which is very useful for limiting memory usage when running in containers. However, this feature is not yet visible to RocksJava.
This PR introduces this feature to RocksJava.
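For context, a minimal RocksJava sketch of how the stalling write buffer manager might be wired up once exposed; the three-argument `WriteBufferManager` constructor with an `allowStall` flag is an assumption about the shape of the new Java API:
```java
import org.rocksdb.Cache;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.WriteBufferManager;

public class WriteBufferManagerStallExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Cache cache = new LRUCache(512 * 1024 * 1024);
         // allowStall = true is the flag assumed to be added by this PR:
         // writes stall instead of failing once memory usage exceeds the buffer size
         final WriteBufferManager wbm =
             new WriteBufferManager(128 * 1024 * 1024, cache, /* allowStall */ true);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setWriteBufferManager(wbm);
         final RocksDB db = RocksDB.open(options, "/tmp/wbm-stall-example")) {
      db.put("key".getBytes(), "value".getBytes());
    }
  }
}
```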
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9076
Reviewed By: akankshamahajan15
Differential Revision: D31931092
Pulled By: anand1976
fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578
Summary:
This PR adds support for building on s390x including updating travis CI. It uses the previous work in https://github.com/facebook/rocksdb/pull/6168 and adds some more changes to get all current tests (make check and jni tests) to pass. The tests were run with snappy, lz4, bzip2 and zstd all compiled in.
There are a few pieces still needed to get the travis build working that I don't think I can do. adamretter, is this something you could help with?
1. A prebuilt https://rocksdb-deps.s3-us-west-2.amazonaws.com/cmake/cmake-3.14.5-Linux-s390x.deb package
2. A https://hub.docker.com/r/evolvedbinary/rocksjava s390x image
Not sure if there is more required for travis. Happy to help in any way I can.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8962
Reviewed By: mrambacher
Differential Revision: D31802198
Pulled By: pdillinger
fbshipit-source-id: 683511466fa6b505f85ba5a9964a268c6151f0c2
Summary:
Implementation of https://github.com/facebook/rocksdb/issues/8221, including an extension of the Java options API to allow the get() of options from RocksDB. The extension allows more comprehensive testing of options on the Java side, by validating that the options are actually set on the C++ side.
Variations on the methods
`MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder getOptions()`
`MutableDBOptions.MutableDBOptionsBuilder getDBOptions()`
retrieve the options via the RocksDB C++ interfaces and parse the resulting string into one of the Java-style option objects.
This necessitated generalising the parsing of option strings in Java, which now parses the full range of option strings returned by the C++ interface, rather than just a useful subset. As a consequence, the list separator was changed from `,` (comma) to `:` (colon).
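A brief sketch of how the new getters might be used to round-trip options through RocksDB; the no-argument `getOptions()` variant and the builder getter names (`writeBufferSize()`, `maxBackgroundJobs()`) are assumptions here:
```java
import org.rocksdb.MutableColumnFamilyOptions;
import org.rocksdb.MutableDBOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class GetOptionsExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/get-options-example")) {
      // Mutate an option at runtime...
      db.setOptions(MutableColumnFamilyOptions.builder()
          .setWriteBufferSize(64 * 1024 * 1024)
          .build());
      // ...then read the effective options back from the C++ side and check them.
      final MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder cfOpts =
          db.getOptions();
      final MutableDBOptions.MutableDBOptionsBuilder dbOpts = db.getDBOptions();
      System.out.println("write_buffer_size = " + cfOpts.writeBufferSize());
      System.out.println("max_background_jobs = " + dbOpts.maxBackgroundJobs());
    }
  }
}
```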
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8999
Reviewed By: jay-zhuang
Differential Revision: D31655487
Pulled By: ltamasi
fbshipit-source-id: c38e98145c81c61dc38238b0df580db176ce4efd
Summary:
closes https://github.com/facebook/rocksdb/issues/8039
Unnecessary use of multiple local JNI references at the same time, 1 per key, was limiting the size of the key array. The local references don't need to be held simultaneously, so if we rearrange the code we can make it work for bigger key arrays.
Incidentally, make errors throw helpful exception messages rather than returning a null pointer.
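For illustration, a small sketch of the kind of large-key-count `multiGetAsList` call that previously risked exhausting the JNI local reference table; the key count and path are arbitrary:
```java
import java.util.ArrayList;
import java.util.List;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class LargeMultiGetExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/multiget-example")) {
      final List<byte[]> keys = new ArrayList<>();
      // A key count this large previously risked overflowing the JNI local
      // reference table, since one local reference was held per key.
      for (int i = 0; i < 100_000; i++) {
        final byte[] key = ("key-" + i).getBytes();
        keys.add(key);
        db.put(key, ("value-" + i).getBytes());
      }
      final List<byte[]> values = db.multiGetAsList(keys);
      System.out.println("retrieved " + values.size() + " values");
    }
  }
}
```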
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9012
Reviewed By: mrambacher
Differential Revision: D31580862
Pulled By: jay-zhuang
fbshipit-source-id: ce05831d52ede332e1b20e74d2dc621d219b9616
Summary:
Add a stat for secondary cache hits. The `Cache::Lookup` API had an unused `stats` parameter. This PR uses that to pass a pointer to a `Statistics` object that `LRUCache` uses to record the stat.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666
Test Plan: Update a unit test in lru_cache_test
Reviewed By: zhichao-cao
Differential Revision: D30353816
Pulled By: anand1976
fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77
Summary:
Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310
Test Plan: Add several unit tests.
Reviewed By: ajkr
Differential Revision: D28493792
fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501
Summary:
By default, the low priority pool is not the flush pool, so calling `Env#setBackgroundThreads` without providing a priority will not do what the caller expected.
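A short sketch of passing the priority explicitly so the intended pool is resized; the thread counts are illustrative:
```java
import org.rocksdb.Env;
import org.rocksdb.Priority;
import org.rocksdb.RocksDB;

public class BackgroundThreadsExample {
  public static void main(String[] args) {
    RocksDB.loadLibrary();
    final Env env = Env.getDefault();
    // Without a priority argument, the LOW-priority (compaction) pool is resized,
    // which may not be what the caller expected for flushes.
    env.setBackgroundThreads(4, Priority.HIGH); // flush pool
    env.setBackgroundThreads(8, Priority.LOW);  // compaction pool
  }
}
```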
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8576
Reviewed By: ajkr
Differential Revision: D29925154
Pulled By: mrambacher
fbshipit-source-id: cd7211fc374e7d9929a9b88ea0a5ba8134b76099
Summary:
An arbitrary string can now be used as the delimiter in the StringAppend merge operator.
In particular, this allows using an empty string, concatenating the binary values for
the same key byte-to-byte, one directly after another.
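A minimal RocksJava sketch of concatenating binary values with an empty delimiter, assuming the string-delimiter constructor of `StringAppendOperator` is exposed on the Java side:
```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.StringAppendOperator;

public class EmptyDelimiterMergeExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final StringAppendOperator stringAppend = new StringAppendOperator("");
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setMergeOperator(stringAppend);
         final RocksDB db = RocksDB.open(options, "/tmp/empty-delim-example")) {
      final byte[] key = "blob".getBytes();
      db.merge(key, new byte[] {0x01, 0x02});
      db.merge(key, new byte[] {0x03, 0x04});
      // With an empty delimiter the merged value is the byte-wise concatenation:
      // {0x01, 0x02, 0x03, 0x04}
      System.out.println(db.get(key).length); // 4
    }
  }
}
```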
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8536
Reviewed By: mrambacher
Differential Revision: D29962120
Pulled By: zhichao-cao
fbshipit-source-id: 4ef5d846a47835cf428a11200409e30e2dbffc4f
Summary:
I previously didn't notice that the DB mutex was being held during
block cache entry stat scans, probably because I primarily checked for
read performance regressions, since reads require the block cache and
are traditionally latency-sensitive.
This change does some refactoring to avoid holding DB mutex and to
avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
Some tests have to be updated because now the stats collector is
populated in the Cache aggressively on DB startup rather than lazily.
(I hope to clean up some of this added complexity in the future.)
This change also ensures proper treatment of need_out_of_mutex for
non-int DB properties.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
Test Plan:
Added unit test logic that uses sync points to fail if the DB mutex
is held during a scan, covering the various ways that a scan might be
triggered.
Performance test - the known impact of holding the DB mutex is on
TransactionDB, and the easiest way to see the impact is to hack the
scan code to almost always miss and take an artificially long time
scanning. Here I've injected an unconditional 5s sleep at the call to
ApplyToAllEntries.
Before (hacked):
$ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0)
rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
$ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0)
rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
Notice the 5s P100 write time.
After (hacked):
$ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0)
rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
$ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0)
rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
$ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0)
rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
$ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0)
rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
No substantial difference.
Reviewed By: siying
Differential Revision: D29738847
Pulled By: pdillinger
fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
Summary:
2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present in the memtable at flush time, whereas the latter tracks how many of those raw bytes are considered garbage, meaning they ended up not being written to the SST files resulting from the flush operations.
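For readers consuming these counters from Java, a hedged sketch of reading them through `Statistics`; whether (and under which names) the new tickers are exposed in the Java `TickerType` enum is an assumption here:
```java
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.Statistics;
import org.rocksdb.TickerType;

public class MemtableGarbageStatsExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Statistics stats = new Statistics();
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setStatistics(stats);
         final RocksDB db = RocksDB.open(options, "/tmp/memtable-garbage-example");
         final FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
      db.put("k".getBytes(), "v1".getBytes());
      db.put("k".getBytes(), "v2".getBytes()); // the first value becomes garbage
      db.flush(flushOptions);
      // Ticker names are assumed to mirror the C++ statistics added in this PR.
      System.out.println("payload bytes at flush: "
          + stats.getTickerCount(TickerType.MEMTABLE_PAYLOAD_BYTES_AT_FLUSH));
      System.out.println("garbage bytes at flush: "
          + stats.getTickerCount(TickerType.MEMTABLE_GARBAGE_BYTES_AT_FLUSH));
    }
  }
}
```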
**Unit test**: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test.
This executable includes 3 tests that check support and correct stat calculations for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is manually called in the test file. The tests verify that the values of these 2 statistics counters introduced in this PR can be exactly predicted, showing that we have a full understanding of the underlying operations.
**Performance testing**:
`./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times.
Timing done using "date" function in a bash script.
_Results_:
Original Rocksdb fork: mean 66.6 sec, std 1.18 sec.
This feature branch: mean 67.4 sec, std 1.35 sec.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411
Reviewed By: akankshamahajan15
Differential Revision: D29150629
Pulled By: bjlemaire
fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1
Summary:
In a distributed environment, a file `rename()` operation can succeed on server (remote)
side, but the client can somehow return non-ok status to RocksDB. Possible reasons include
network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which
can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a
new MANIFEST. We currently always delete the new MANIFEST if an error occurs.
This is problematic in a distributed world. If the server side successfully updates the CURRENT
file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail.
As a fix, we can track the execution result of IO operations on the new MANIFEST.
- If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original
MANIFEST. Therefore, it is safe to remove the new MANIFEST.
- If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up
code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local
POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the
new MANIFEST.) Therefore, we keep the new MANIFEST.
- Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT.
- If the process reopens the db immediately after the failure, then the CURRENT file can point
to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can
succeed and ignore the other.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192
Test Plan: make check
Reviewed By: zhichao-cao
Differential Revision: D27804648
Pulled By: riversand963
fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4
Summary:
Added `TableProperties::{fast,slow}_compression_estimated_data_size`.
These properties are present in block-based tables when
`ColumnFamilyOptions::sample_for_compression > 0` and the necessary
compression library is supported when the file is generated. They
contain estimates of what `TableProperties::data_size` would be if the
"fast"/"slow" compression library had been used instead. One
limitation is we do not record exactly which "fast" (LZ4 or Snappy)
or "slow" (ZSTD or Zlib) compression library produced the result.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139
Test Plan:
- new unit test
- ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression
Reviewed By: riversand963
Differential Revision: D27454338
Pulled By: ajkr
fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4
Summary:
The method should return two `long` values instead of an array.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8098
Reviewed By: mrambacher
Differential Revision: D27308741
Pulled By: jay-zhuang
fbshipit-source-id: 44beea2bd28cf6779b048bebc98f2426fe95e25c
Summary:
This is a small fix to what I think is a typo in two comments in `DBOptionsInterface.java`. If it was not an error, feel free to close.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8086
Reviewed By: ajkr
Differential Revision: D27260488
Pulled By: mrambacher
fbshipit-source-id: 469daadaf6039d5b5187132b8e0c7c3672842f21
Summary:
Add statistics and info log for the error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume success; a histogram for the auto resume retry count in each recovery call.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050
Test Plan: make check and add test to error_handler_fs_test
Reviewed By: anand1976
Differential Revision: D26990565
Pulled By: zhichao-cao
fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f
Summary:
Support getUsage and getPinnedUsage for Cache in the Java API.
Also fix a typo in LRUCacheTest.java where the highPriPoolRatio is not valid (set to 5; presumably 0.05 was intended).
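A small sketch of the new accessors on the Java `Cache` handle; the cache size and path are illustrative:
```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.LRUCache;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class CacheUsageExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final LRUCache cache = new LRUCache(64 * 1024 * 1024);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setTableFormatConfig(new BlockBasedTableConfig().setBlockCache(cache));
         final RocksDB db = RocksDB.open(options, "/tmp/cache-usage-example")) {
      db.put("key".getBytes(), "value".getBytes());
      db.get("key".getBytes()); // populate the block cache
      System.out.println("usage: " + cache.getUsage());
      System.out.println("pinned usage: " + cache.getPinnedUsage());
    }
  }
}
```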
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7925
Reviewed By: mrambacher
Differential Revision: D26900241
Pulled By: ajkr
fbshipit-source-id: 735d1e40a16fa8919c89c7c7154ba7f81208ec33
Summary:
Fixes 3 minor Javadoc copy-paste errors in the `RocksDB#newIterator()` and `Transaction#getIterator()` variants that take a column family handle but are talking about iterating over "the database" or "the default column family".
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8034
Reviewed By: jay-zhuang
Differential Revision: D26877667
Pulled By: mrambacher
fbshipit-source-id: 95dd95b667c496e389f221acc9a91b340e4b63bf
Summary:
The variable `byteCompressionType` is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Byte'.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7981
Reviewed By: ajkr
Differential Revision: D26546600
Pulled By: jay-zhuang
fbshipit-source-id: 07b579cdfcfc2262a448ca3626e216416fd05892
Summary:
Haven't seen any production issues with the new Bloom filter, and it's
now > 1 year old (added in 6.6.0).
Updated check_format_compatible.sh and HISTORY.md
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8017
Test Plan: tests updated (or prior bugs fixed)
Reviewed By: ajkr
Differential Revision: D26762197
Pulled By: pdillinger
fbshipit-source-id: 0e755c46b443087c1544da0fd545beb9c403d1c2
Summary:
This request adds support for using DirectSlice in the ReadOptions lower/upper bounds.
To make this more efficient, I have added setLength to DirectSlice so the length used by the slice over the direct buffer can simply be updated. It is also needed because, when one creates an iterator, it keeps a pointer to the original slice, so setting a new slice in the options does not help (the existing one needs to be reused). Using this approach, one can modify the slice at any time during operations with the iterator.
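A hedged sketch of using a `DirectSlice` over a direct `ByteBuffer` as an iterator upper bound, and later updating it in place; the exact `setIterateUpperBound` signature and the `setLength` usage shown here are assumptions based on the description above:
```java
import java.nio.ByteBuffer;
import org.rocksdb.DirectSlice;
import org.rocksdb.Options;
import org.rocksdb.ReadOptions;
import org.rocksdb.RocksDB;
import org.rocksdb.RocksIterator;

public class DirectSliceBoundExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Options options = new Options().setCreateIfMissing(true);
         final RocksDB db = RocksDB.open(options, "/tmp/direct-slice-example")) {
      final ByteBuffer upper = ByteBuffer.allocateDirect(16);
      upper.put("key-5".getBytes()).flip();
      try (final DirectSlice upperBound = new DirectSlice(upper, 5);
           final ReadOptions readOptions =
               new ReadOptions().setIterateUpperBound(upperBound);
           final RocksIterator it = db.newIterator(readOptions)) {
        for (it.seekToFirst(); it.isValid(); it.next()) {
          // iterates keys strictly below "key-5"
        }
        // Because the iterator keeps a pointer to this same slice, the bound can be
        // changed in place for subsequent iteration by rewriting the buffer
        // and updating the slice length.
        upper.clear();
        upper.put("key-30".getBytes()).flip();
        upperBound.setLength(6); // setLength added by this PR
      }
    }
  }
}
```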
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7132
Reviewed By: zhichao-cao
Differential Revision: D25840092
Pulled By: jay-zhuang
fbshipit-source-id: 760167baf61568c9a35138145c4bf9b06824cb71
Summary:
Classes ColumnFamilyHandle and CapturingWriteBatchHandler.Event have
byte array fields as part of their identity, but they do not use the
arrays' content to compute the instance's hash, and instead rely on the
arrays' identity, causing instances to have different hashcodes
although they are equal.
The PR addresses it by using the arrays' content to compute the hash,
like the equals method does.
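An illustrative sketch of the pattern being fixed (not the actual RocksJava classes): the hash is computed from the array's content with `Arrays.hashCode`, matching an `equals` that uses `Arrays.equals`:
```java
import java.util.Arrays;

final class NamedHandle {
  private final long nativeHandle;
  private final byte[] name;

  NamedHandle(final long nativeHandle, final byte[] name) {
    this.nativeHandle = nativeHandle;
    this.name = name;
  }

  @Override
  public boolean equals(final Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    final NamedHandle that = (NamedHandle) o;
    return nativeHandle == that.nativeHandle && Arrays.equals(name, that.name);
  }

  @Override
  public int hashCode() {
    // Hash the array's content (Arrays.hashCode), not its identity (name.hashCode()),
    // so equal instances produce equal hash codes.
    return 31 * Long.hashCode(nativeHandle) + Arrays.hashCode(name);
  }
}
```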
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7860
Reviewed By: jay-zhuang
Differential Revision: D25901327
Pulled By: akankshamahajan15
fbshipit-source-id: 347e7b3d2ba7befe7faa956b033e6421b9d0c235
Summary:
Consider the following sequence of events:
1. The db flushed an SST with file number N, appended it to the MANIFEST, and tried to sync the MANIFEST.
2. Syncing the MANIFEST failed and the db crashed.
3. The db tried to recover with this MANIFEST. However, no entry for the newly flushed SST was found in the MANIFEST. Therefore, RocksDB replayed the WAL and tried to flush to an SST file reusing the same file number N. This failed because the file system does not support overwrite. The db then deleted this file.
4. The db crashed again.
5. The db tried to recover. When it read the MANIFEST, there was an entry referencing N.sst. This could happen because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.
It is possible that the N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in steps 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of the MANIFEST was appended and became visible after step 5.
After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621
Test Plan:
1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
2. a new unit test is added in db_basic_test to simulate step 3.
Reviewed By: riversand963
Differential Revision: D24668144
Pulled By: cheng-chang
fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
Summary:
The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608
Test Plan: watch the existing java tests to pass
Reviewed By: zhichao-cao
Differential Revision: D24611621
Pulled By: cheng-chang
fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
Summary:
- Takes the burden off the developer of closing ColumnFamilyHandle instances before closing the RocksDB instance
- The change is backward-compatible
----
Previously the pattern for working with Column Families was:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
// list of column family descriptors, first entry must always be default column family
final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
);
// a list which will hold the handles for the column families once the db is opened
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
try (final DBOptions options = new DBOptions()
.setCreateIfMissing(true)
.setCreateMissingColumnFamilies(true);
final RocksDB db = RocksDB.open(options,
"path/to/do", cfDescriptors,
columnFamilyHandleList)) {
try {
// do something
} finally {
// NOTE: the user must explicitly free the column family handles before freeing the db
for (final ColumnFamilyHandle columnFamilyHandle :
columnFamilyHandleList) {
columnFamilyHandle.close();
}
}
} // frees the db and the db options
} // frees the column family options
```
With the changes in this PR, the Java user no longer has to worry about manually closing the Column Families, which allows them to write simpler symmetrical create/free oriented code like this:
```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {
// list of column family descriptors, first entry must always be default column family
final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
);
// a list which will hold the handles for the column families once the db is opened
final List<ColumnFamilyHandle> columnFamilyHandleList =
new ArrayList<>();
try (final DBOptions options = new DBOptions()
.setCreateIfMissing(true)
.setCreateMissingColumnFamilies(true);
final RocksDB db = RocksDB.open(options,
"path/to/do", cfDescriptors,
columnFamilyHandleList)) {
// do something
} // frees the db and the db options (the column family handles are now freed with the db)
} // frees the column family options
```
**NOTE**: The changes in this PR are backwards API compatible, which means existing code using the original approach will also continue to function correctly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7428
Reviewed By: cheng-chang
Differential Revision: D24063348
Pulled By: jay-zhuang
fbshipit-source-id: 648d7526669923128c863ead94516bf4d50ac658
Summary:
Allows adding event listeners in RocksJava.
* Adds listeners getter and setter in `Options` and `DBOptions` classes.
* Adds an `EventListener` Java interface and a base class for implementing custom event listener callbacks - `AbstractEventListener` - which has an underlying native callback class implementing the C++ `EventListener` class.
* The `AbstractEventListener` class has a mechanism for selectively enabling its callback methods in order to avoid invoking a Java method that is not implemented. This decreases the performance cost when only a subset of the event listener callback methods is needed - the JNI code for the remaining "no-op" callbacks is not executed.
* The code is covered by unit tests in `EventListenerTest.java`; there are also tests added for setting/getting the listeners field in `OptionsTest.java` and `DBOptionsTest.java`.
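A brief sketch of wiring up a custom listener through the new API; the specific callback shown (`onFlushCompleted`) and its exact signature are assumptions based on the C++ `EventListener` interface:
```java
import java.util.Collections;
import org.rocksdb.AbstractEventListener;
import org.rocksdb.FlushJobInfo;
import org.rocksdb.FlushOptions;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class EventListenerExample {
  static class FlushLogger extends AbstractEventListener {
    @Override
    public void onFlushCompleted(final RocksDB db, final FlushJobInfo flushJobInfo) {
      System.out.println("flush completed for CF " + flushJobInfo.getColumnFamilyName());
    }
  }

  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setListeners(Collections.singletonList(new FlushLogger()));
         final RocksDB db = RocksDB.open(options, "/tmp/event-listener-example");
         final FlushOptions flushOptions = new FlushOptions().setWaitForFlush(true)) {
      db.put("key".getBytes(), "value".getBytes());
      db.flush(flushOptions); // triggers onFlushCompleted on the listener
    }
  }
}
```
Only the overridden callback is invoked across the JNI boundary; the remaining callbacks stay as no-ops.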
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7425
Reviewed By: pdillinger
Differential Revision: D24063390
Pulled By: jay-zhuang
fbshipit-source-id: 508c359538983d6b765e70d9989c351794a944ee
Summary:
Add the following stats for MultiGet as histograms, to get more insight into MultiGet.
1. Number of index and filter blocks read from file as part of MultiGet
request per level.
2. Number of data blocks read from file per level.
3. Number of SST files loaded from file system per level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366
Reviewed By: anand1976
Differential Revision: D24127040
Pulled By: akankshamahajan15
fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b
Summary:
as title
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7347
Test Plan: unit tests included
Reviewed By: jay-zhuang
Differential Revision: D23592552
Pulled By: pdillinger
fbshipit-source-id: 1c3571b6f42bfd0cfd723ff49d01fbc02a1be45b
Summary:
Previously RocksJava limited the format_version to 4. However, the C++ API is now at 5, and this will likely increase again in future. The Java API now allows any positive integer, and an exception is raised from JNI if the format_version is out-of-bounds.
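A short sketch of setting a newer format_version from Java; the version number used is illustrative:
```java
import org.rocksdb.BlockBasedTableConfig;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class FormatVersionExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
        .setFormatVersion(5); // no longer capped at 4 on the Java side
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setTableFormatConfig(tableConfig);
         final RocksDB db = RocksDB.open(options, "/tmp/format-version-example")) {
      db.put("key".getBytes(), "value".getBytes());
    }
  }
}
```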
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7242
Reviewed By: cheng-chang
Differential Revision: D23077941
Pulled By: pdillinger
fbshipit-source-id: ee69f7203448acddc41c6d86b470ed987d3d366d
Summary:
The PR fixes a Java test for Merge operator `uint64add`.
The current implementation uses the wrong byte order for long serialization, but fails to catch this error because the merge sum is lower than `256`.
The PR makes this test case more representative (i.e. it fails with the wrong byte order) and changes the byte order to little endian.
Some background: RocksDB uses LittleEndian byte order for integer serialization across all platforms. `MergeTest` uses a `ByteBuffer`, which defaults to BigEndian byte order.
This test case can probably be used as a sample of `MergeOperator` usage in Java.
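To make the byte-order point concrete, a small sketch of encoding `uint64add` operands in little-endian order with `ByteBuffer` (which defaults to big-endian); the path is illustrative:
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import org.rocksdb.Options;
import org.rocksdb.RocksDB;

public class Uint64AddMergeExample {
  static byte[] longToLittleEndian(final long value) {
    // RocksDB expects little-endian uint64 operands; ByteBuffer defaults to big-endian.
    return ByteBuffer.allocate(Long.BYTES).order(ByteOrder.LITTLE_ENDIAN)
        .putLong(value).array();
  }

  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    try (final Options options = new Options()
             .setCreateIfMissing(true)
             .setMergeOperatorName("uint64add");
         final RocksDB db = RocksDB.open(options, "/tmp/uint64add-example")) {
      final byte[] key = "counter".getBytes();
      // Use values >= 256 so a byte-order mistake would change the result.
      db.merge(key, longToLittleEndian(300));
      db.merge(key, longToLittleEndian(700));
      final long sum = ByteBuffer.wrap(db.get(key))
          .order(ByteOrder.LITTLE_ENDIAN).getLong();
      System.out.println(sum); // 1000
    }
  }
}
```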
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7243
Reviewed By: ajkr
Differential Revision: D23079593
Pulled By: pdillinger
fbshipit-source-id: 82e8e166901d66733e96a0116f88d0ec4761ddf1
Summary:
Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165
Test Plan:
TTL and periodic can be checked by running db_bench with the options activated:
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1
Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.
Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking
Reviewed By: ajkr
Differential Revision: D22693050
Pulled By: akabcenell
fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d
Summary:
An SST Partitioner interface that allows SST files to be split during compactions.
It basically instructs compaction to create a new file when needed. When one uses well-defined prefixes and a prefix-based way of defining tables, it is good to also define partitioning, so that promotion of an SST file does not cover a huge key space on the next level (in the worst case, the complete key space).
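As a hedged illustration from the Java side, a sketch assuming the fixed-prefix partitioner factory is exposed to RocksJava as `SstPartitionerFixedPrefixFactory`; the prefix length and keys are illustrative:
```java
import org.rocksdb.Options;
import org.rocksdb.RocksDB;
import org.rocksdb.SstPartitionerFixedPrefixFactory;

public class SstPartitionerExample {
  public static void main(String[] args) throws Exception {
    RocksDB.loadLibrary();
    // Cut SST files at boundaries where the first 4 key bytes (the prefix) change,
    // so a compacted file never spans more than one logical table/prefix.
    try (final SstPartitionerFixedPrefixFactory partitioner =
             new SstPartitionerFixedPrefixFactory(4);
         final Options options = new Options()
             .setCreateIfMissing(true)
             .setSstPartitionerFactory(partitioner);
         final RocksDB db = RocksDB.open(options, "/tmp/sst-partitioner-example")) {
      db.put("usr1key".getBytes(), "v".getBytes());
      db.put("usr2key".getBytes(), "v".getBytes());
    }
  }
}
```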
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6957
Reviewed By: ajkr
Differential Revision: D22461239
fbshipit-source-id: 9ce07bba08b3ba89c2d45630520368f704d1316e