Commit Graph

790 Commits

Author SHA1 Message Date
Alan Paxton
f5526af8ed Fix multiget throwing NPE for num of keys > 70k (#9012)
Summary:
closes https://github.com/facebook/rocksdb/issues/8039

Unnecessary use of multiple local JNI references at the same time, 1 per key, was limiting the size of the key array. The local references don't need to be held simultaneously, so if we rearrange the code we can make it work for bigger key arrays.

Incidentally, make errors throw helpful exception messages rather than returning a null pointer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9012

Reviewed By: mrambacher

Differential Revision: D31580862

Pulled By: jay-zhuang

fbshipit-source-id: ce05831d52ede332e1b20e74d2dc621d219b9616
2021-10-14 11:48:12 -07:00
Jay Zhuang
6b34eb0ebc Add remote compaction read/write bytes statistics (#8939)
Summary:
Add basic read/write bytes statistics on the primary side:
`REMOTE_COMPACT_READ_BYTES`
`REMOTE_COMPACT_WRITE_BYTES`

Fixed existing statistics missing some IO for remote compaction.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8939

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D31074672

Pulled By: jay-zhuang

fbshipit-source-id: c57afdba369990185008ffaec7e3fe7c62e8902f
2021-09-28 14:00:37 -07:00
Peter Dillinger
cb5b851ff8 Add (& fix) some simple source code checks (#8821)
Summary:
* Don't hardcode namespace rocksdb (use ROCKSDB_NAMESPACE)
* Don't #include <rocksdb/...> (use double quotes)
* Support putting NOCOMMIT (any case) in source code that should not be
committed/pushed in current state.

These will be run with `make check` and in GitHub actions

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8821

Test Plan: existing tests, manually try out new checks

Reviewed By: zhichao-cao

Differential Revision: D30791726

Pulled By: pdillinger

fbshipit-source-id: 399c883f312be24d9e55c58951d4013e18429d92
2021-09-07 21:19:27 -07:00
Andrew Kryczka
9308ff366c Bytes read/written stats for CreateNewBackup*() (#8819)
Summary:
Gets `Statistics` from the options associated with the `DB` undergoing backup, and populates new ticker stats with the thread-local `IOContext` read/write counters for the threads doing backup work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8819

Reviewed By: pdillinger

Differential Revision: D30779238

Pulled By: ajkr

fbshipit-source-id: 75ccafc355f90906df5cf80367f7245b985772d8
2021-09-07 18:25:16 -07:00
Andrew Kryczka
941543721d Bytes read stat for VerifyChecksum() and VerifyFileChecksums() APIs (#8741)
Summary:
- Clarified some comments on compatibility for adding new ticker stats
- Added read I/O stats for `VerifyChecksum()` and `VerifyFileChecksums()` APIs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8741

Test Plan: new unit test

Reviewed By: zhichao-cao

Differential Revision: D30708578

Pulled By: ajkr

fbshipit-source-id: d06b961f7e199ae92c266b683e39870aa8f63449
2021-09-07 13:28:29 -07:00
Peter Dillinger
c9cd5d25a8 Remove some unneeded code (#8736)
Summary:
* FullKey and ParseFullKey appear to serve no purpose in the public API
(or anything else) so removed. Only use in one test updated.
* NumberToString serves no purpose vs. ToString so removed, numerous
calls updated
* Remove unnecessary forward declarations in metadata.h by re-arranging
class definitions.
* Remove some unneeded semicolons

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736

Test Plan: existing tests

Reviewed By: mrambacher

Differential Revision: D30700039

Pulled By: pdillinger

fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2
2021-09-01 14:28:58 -07:00
anand76
add68bd28a Add a stat to count secondary cache hits (#8666)
Summary:
Add a stat for secondary cache hits. The ```Cache::Lookup``` API had an unused ```stats``` parameter. This PR uses that to pass the pointer to a ```Statistics``` object that ```LRUCache``` uses to record the stat.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666

Test Plan: Update a unit test in lru_cache_test

Reviewed By: zhichao-cao

Differential Revision: D30353816

Pulled By: anand1976

fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77
2021-08-16 21:01:14 -07:00
sdong
e7c24168d8 Move old files to warm tier in FIFO compactions (#8310)
Summary:
Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310

Test Plan: Add several unit tests.

Reviewed By: ajkr

Differential Revision: D28493792

fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501
2021-08-09 12:51:14 -07:00
Brendan MacDonell
8ca081780b Correct javadoc for Env#setBackgroundThreads(int) (#8576)
Summary:
By default, the low priority pool is not the flush pool, so calling `Env#setBackgroundThreads` without providing a priority will not do what the caller expected.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8576

Reviewed By: ajkr

Differential Revision: D29925154

Pulled By: mrambacher

fbshipit-source-id: cd7211fc374e7d9929a9b88ea0a5ba8134b76099
2021-08-06 08:52:14 -07:00
Mikhail Golubev
8f52972cf9 Allow to use a string as a delimiter in StringAppendOperator (#8536)
Summary:
An arbitrary string can be used as a delimiter in StringAppend merge operator
flavor. In particular, it allows using an empty string, combining binary values for
the same key byte-to-byte one next to another.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8536

Reviewed By: mrambacher

Differential Revision: D29962120

Pulled By: zhichao-cao

fbshipit-source-id: 4ef5d846a47835cf428a11200409e30e2dbffc4f
2021-08-02 16:50:41 -07:00
Anatolii Zhmaiev
9ddb55a8f6 Add periodic_compaction_seconds option to RocksJava (#8579)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/8578

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8579

Reviewed By: ajkr

Differential Revision: D29895081

Pulled By: mrambacher

fbshipit-source-id: 3e4120e26a3e8252f8301d657c0aaa0b8550cddf
2021-07-26 17:33:42 -07:00
Peter Dillinger
df5dc73bec Don't hold DB mutex for block cache entry stat scans (#8538)
Summary:
I previously didn't notice the DB mutex was being held during
block cache entry stat scans, probably because I primarily checked for
read performance regressions, because they require the block cache and
are traditionally latency-sensitive.

This change does some refactoring to avoid holding DB mutex and to
avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
Some tests have to be updated because now the stats collector is
populated in the Cache aggressively on DB startup rather than lazily.
(I hope to clean up some of this added complexity in the future.)

This change also ensures proper treatment of need_out_of_mutex for
non-int DB properties.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538

Test Plan:
Added unit test logic that uses sync points to fail if the DB mutex
is held during a scan, covering the various ways that a scan might be
triggered.

Performance test - the known impact to holding the DB mutex is on
TransactionDB, and the easiest way to see the impact is to hack the
scan code to almost always miss and take an artificially long time
scanning. Here I've injected an unconditional 5s sleep at the call to
ApplyToAllEntries.

Before (hacked):

    $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
    randomtransaction :     433.219 micros/op 2308 ops/sec;    0.1 MB/s ( transactions:78999 aborts:0)
    rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
    $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
    randomtransaction :     448.802 micros/op 2228 ops/sec;    0.1 MB/s ( transactions:75999 aborts:0)
    rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323

Notice the 5s P100 write time.

After (hacked):

    $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
    randomtransaction :     303.645 micros/op 3293 ops/sec;    0.1 MB/s ( transactions:98999 aborts:0)
    rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
    $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
    randomtransaction :     310.383 micros/op 3221 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
    rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918

P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:

    $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
    randomtransaction :     311.365 micros/op 3211 ops/sec;    0.1 MB/s ( transactions:96999 aborts:0)
    rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
    $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
    randomtransaction :     308.395 micros/op 3242 ops/sec;    0.1 MB/s ( transactions:97999 aborts:0)
    rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832

No substantial difference.

Reviewed By: siying

Differential Revision: D29738847

Pulled By: pdillinger

fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
2021-07-16 14:13:08 -07:00
longlijian
4e4ec16957 Replace the namespace "rocksdb" to "ROCKSDB_NAMESPACE" (#8531)
Summary:
For more detail can reference the https://github.com/facebook/rocksdb/issues/6433
(https://github.com/facebook/rocksdb/pull/6433)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8531

Reviewed By: siying

Differential Revision: D29717057

Pulled By: ajkr

fbshipit-source-id: 3ccad9501e5612590e54a7cf8c447118f323c7f4
2021-07-15 17:23:39 -07:00
Baptiste Lemaire
e817bc9628 Added memtable garbage statistics (#8411)
Summary:
**Summary**:
2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present on the memtable at flush time, whereas the latter is tracks how many of these raw bytes are considered garbage, meaning that they ended up not being imported on the SSTables resulting from the flush operations.

**Unit test**: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test.
This executable includes 3 tests, that test support and correct stat calculations for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is manually called in the test file. The tests verify that the values of these 2 statistics counters introduced in this PR  can be exactly predicted, showing that we have a full understanding of the underlying operations.

**Performance testing**:
`./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times.
Timing done using "date" function in a bash script.
_Results_:
Original Rocksdb fork: mean 66.6 sec, std 1.18 sec.
This feature branch: mean 67.4 sec, std 1.35 sec.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411

Reviewed By: akankshamahajan15

Differential Revision: D29150629

Pulled By: bjlemaire

fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1
2021-06-18 04:57:27 -07:00
Sidi Mohamed EL AATIFI
298edae941 Fix a typo in Javadoc (#8394)
Summary:
iterateLowerBound Slice representing the lower bound

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8394

Reviewed By: ajkr

Differential Revision: D29085721

Pulled By: jay-zhuang

fbshipit-source-id: a154375879395c48e9bd3794d296e70316894056
2021-06-17 12:02:57 -07:00
mrambacher
8948dc8524 Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262)
Summary:
The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions.  This change cleans that up by introducing an ImmutableOptions struct.  Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).

Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR.  All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.

Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262

Reviewed By: pdillinger

Differential Revision: D28226540

Pulled By: mrambacher

fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
2021-05-05 14:00:17 -07:00
Adam Retter
69c986825e Fix javadoc for keyMayExist (#8232)
Summary:
Closes https://github.com/facebook/rocksdb/issues/6985

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8232

Reviewed By: jay-zhuang

Differential Revision: D27999779

Pulled By: mrambacher

fbshipit-source-id: a37c88d93bde2692b8be9e46e673dda7bea701b2
2021-04-26 08:34:10 -07:00
Yanqin Jin
a376c22066 Handle rename() failure in non-local FS (#8192)
Summary:
In a distributed environment, a file `rename()` operation can succeed on server (remote)
side, but the client can somehow return non-ok status to RocksDB. Possible reasons include
network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which
can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a
new MANIFEST. We currently always delete the new MANIFEST if an error occurs.

This is problematic in distributed world. If the server-side successfully updates the CURRENT
file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail.

As a fix, we can track the execution result of IO operations on the new MANIFEST.
- If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original
  MANIFEST. Therefore, it is safe to remove the new MANIFEST.
- If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up
  code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local
  POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the
  new MANIFEST.) Therefore, we keep the new MANIFEST.
    - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT.
    - If process reopens the db immediately after the failure, then the CURRENT file can point
      to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can
      succeed and ignore the other.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192

Test Plan: make check

Reviewed By: zhichao-cao

Differential Revision: D27804648

Pulled By: riversand963

fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4
2021-04-19 18:11:13 -07:00
Andrew Kryczka
1ba2b8a568 Add sample_for_compression results to table properties (#8139)
Summary:
Added `TableProperties::{fast,slow}_compression_estimated_data_size`.
These properties are present in block-based tables when
`ColumnFamilyOptions::sample_for_compression > 0` and the necessary
compression library is supported when the file is generated. They
contain estimates of what `TableProperties::data_size` would be if the
"fast"/"slow" compression library had been used instead. One
limitation is we do not record exactly which "fast" (ZSTD or Zlib)
or "slow" (LZ4 or Snappy) compression library produced the result.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139

Test Plan:
- new unit test
- ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression

Reviewed By: riversand963

Differential Revision: D27454338

Pulled By: ajkr

fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4
2021-03-31 18:21:50 -07:00
Jay Zhuang
a781b103da Fix getApproximateMemTableStats() return type (#8098)
Summary:
Which should return 2 long instead of an array.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8098

Reviewed By: mrambacher

Differential Revision: D27308741

Pulled By: jay-zhuang

fbshipit-source-id: 44beea2bd28cf6779b048bebc98f2426fe95e25c
2021-03-31 09:46:47 -07:00
Vlad Artamonov
4a6bc47b2e Fix possible mistype in a comment (#8086)
Summary:
This is a small fix to what I think is a mistype in two comments in `DBOptionsInterface.java`. If it was not an error, feel free to close.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8086

Reviewed By: ajkr

Differential Revision: D27260488

Pulled By: mrambacher

fbshipit-source-id: 469daadaf6039d5b5187132b8e0c7c3672842f21
2021-03-23 12:37:24 -07:00
Zhichao Cao
08ec5e7321 Add the statistics and info log for Error handler (#8050)
Summary:
Add statistics and info log for error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume sucess; Histogram for auto resume retry count in each recovery call.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050

Test Plan: make check and add test to error_handler_fs_test

Reviewed By: anand1976

Differential Revision: D26990565

Pulled By: zhichao-cao

fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f
2021-03-17 22:38:13 -07:00
Xiaopeng Zhang
c603f2f898 support getUsage and getPinnedUsage in JavaAPI for Cache (#7925)
Summary:
support getUsage and getPinnedUsage in JavaAPI for Cache
also fix a typo in LRUCacheTest.java that the highPriPoolRatio is not valid(set 5, I guess it means 0.05)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7925

Reviewed By: mrambacher

Differential Revision: D26900241

Pulled By: ajkr

fbshipit-source-id: 735d1e40a16fa8919c89c7c7154ba7f81208ec33
2021-03-17 09:30:33 -07:00
stefan-zobel
8d9088464b Java-API: Fix minor Javadoc copy-paste errors (#8034)
Summary:
Fixes 3 minor Javadoc copy-paste errors in the `RocksDB#newIterator()` and `Transaction#getIterator()` variants that take a column family handle but are talking about iterating over "the database" or "the default column family".

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8034

Reviewed By: jay-zhuang

Differential Revision: D26877667

Pulled By: mrambacher

fbshipit-source-id: 95dd95b667c496e389f221acc9a91b340e4b63bf
2021-03-16 18:07:09 -07:00
stefan-zobel
cc34da75b5 Java-API: byteCompressionType should be declared as primitive type byte (#7981)
Summary:
The variable `byteCompressionType` is only assigned values of primitive type and is never 'null', but it is declared with the boxed type 'Byte'.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7981

Reviewed By: ajkr

Differential Revision: D26546600

Pulled By: jay-zhuang

fbshipit-source-id: 07b579cdfcfc2262a448ca3626e216416fd05892
2021-03-09 22:05:16 -08:00
Peter Dillinger
0028e3398b Make format_version=5 new default (#8017)
Summary:
Haven't seen any production issues with new Bloom filter and
it's now > 1 year old (added in 6.6.0).

Updated check_format_compatible.sh and HISTORY.md

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8017

Test Plan: tests updated (or prior bugs fixed)

Reviewed By: ajkr

Differential Revision: D26762197

Pulled By: pdillinger

fbshipit-source-id: 0e755c46b443087c1544da0fd545beb9c403d1c2
2021-03-09 12:42:53 -08:00
stefan-zobel
430842f948 Java-API: Missing space in string literal (#7982)
Summary:
`TtlDB.open()`: missing space after 'column'
`AdvancedColumnFamilyOptionsInterface.setLevelCompactionDynamicLevelBytes()`: missing space after 'cause'

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7982

Reviewed By: ajkr

Differential Revision: D26546632

Pulled By: jay-zhuang

fbshipit-source-id: 885dedcaa2200842764fbac9ce3766d54e1c8914
2021-03-09 11:30:29 -08:00
Andrew Kryczka
d904233d2f Limit buffering for collecting samples for compression dictionary (#7970)
Summary:
For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.

However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.

Related changes include:

- Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
- Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
- Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970

Test Plan:
- updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
- looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.

Reviewed By: pdillinger

Differential Revision: D26467994

Pulled By: ajkr

fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
2021-02-19 14:09:54 -08:00
stefan-zobel
251143f8fb rocksdbjni: Possible NPE in RocksDB.setOptions #7869 (#7909)
Summary:
Fix for https://github.com/facebook/rocksdb/issues/7869

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7909

Reviewed By: akankshamahajan15

Differential Revision: D26181440

Pulled By: ajkr

fbshipit-source-id: f323aec9d91e177fa873599b99801b391cf094b1
2021-02-18 15:48:39 -08:00
Xiaopeng Zhang
bf6795aea0 fix java sample typo and replace deprecated code with latest (#7906)
Summary:
1. replace deprecated code in sample java with latest api
2. fix optimistictransaction sample code typo

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7906

Reviewed By: ajkr

Differential Revision: D26127429

Pulled By: jay-zhuang

fbshipit-source-id: f015ad1435f565cffb8798a4fb5afc44c72d73d7
2021-02-01 14:45:34 -08:00
Xiaopeng Zhang
36963dc2ca fix write option typo in java samples (#7894)
Summary:
this is a trivial PR for rocksdb java samples, I think it is a typo about write options. to do sync write, WAL should not be disabled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7894

Reviewed By: jay-zhuang

Differential Revision: D26047128

Pulled By: mrambacher

fbshipit-source-id: a06ce54cb61af0d3f2578a709c34a0b1ccecb0b2
2021-01-26 19:13:08 -08:00
Tomas Kolda
d76a8eeef7 Fixing Windows build using CMake (#7854)
Summary:
Builds were not producing Windows binaries properly in 6.15 branch:

```
00:00:46.413 Tests run: 11, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.183 sec <<< FAILURE! - in org.rocksdb.EventListenerTest
00:00:46.414 testAllCallbacksInvocation(org.rocksdb.EventListenerTest)  Time elapsed: 0.012 sec  <<< ERROR!
00:00:46.414 java.lang.UnsatisfiedLinkError: org.rocksdb.test.TestableEventListener.invokeAllCallbacks(J)V
00:00:46.414 	at org.rocksdb.test.TestableEventListener.invokeAllCallbacks(Native Method)
00:00:46.414 	at org.rocksdb.test.TestableEventListener.invokeAllCallbacks(TestableEventListener.java:19)
00:00:46.414 	at org.rocksdb.EventListenerTest.testAllCallbacksInvocation(EventListenerTest.java:436)
```

```
00:00:41.497        "D:\j\workspace\RocksDB_Build_Windows\build\java\rocksdbjni_headers.vcxproj" (default target) (3) ->
00:00:41.497        (CustomBuild target) ->
00:00:41.497          CUSTOMBUILD : error : Could not find class file for 'org.rocksdb.TestableEventListener'. [D:\j\workspace\RocksDB_Build_Windows\build\java\rocksdbjni_headers.vcxproj]
```

Also failed on Linux as library was not initialized yet:

```
00:01:25.103 Running org.rocksdb.NativeComparatorWrapperTest
00:01:25.133 Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.006 sec <<< FAILURE! - in org.rocksdb.NativeComparatorWrapperTest
00:01:25.133 rountrip(org.rocksdb.NativeComparatorWrapperTest)  Time elapsed: 0.002 sec  <<< ERROR!
00:01:25.133 java.lang.UnsatisfiedLinkError: org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.newStringComparator()J
00:01:25.133 	at org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.newStringComparator(Native Method)
00:01:25.133 	at org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.initializeNative(NativeComparatorWrapperTest.java:87)
00:01:25.133 	at org.rocksdb.RocksCallbackObject.<init>(RocksCallbackObject.java:28)
00:01:25.133 	at org.rocksdb.AbstractComparator.<init>(AbstractComparator.java:20)
00:01:25.133 	at org.rocksdb.NativeComparatorWrapper.<init>(NativeComparatorWrapper.java:16)
00:01:25.133 	at org.rocksdb.NativeComparatorWrapperTest$NativeStringComparatorWrapper.<init>(NativeComparatorWrapperTest.java:82)
00:01:25.133 	at org.rocksdb.NativeComparatorWrapperTest.rountrip(NativeComparatorWrapperTest.java:30)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7854

Reviewed By: jay-zhuang

Differential Revision: D25873378

Pulled By: ajkr

fbshipit-source-id: 88afb08bfd30edff31f17da063e636df0769cbfe
2021-01-15 17:53:16 -08:00
Tomas Kolda
1001bc01c9 Read Options to support direct slice (#7132)
Summary:
This request is adding support for using DirectSlice in ReadOptions lower/upper bounds.

To be more efficient I have added setLength to DirectSlice so I can just update the length to be used by slice from direct buffer. It is also needed, because when one creates iterator it keep pointer to original slice so setting new slice in options does not help (it needs to reuse existing one). Using this approach one can modify the slice any time during operations with iterator.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7132

Reviewed By: zhichao-cao

Differential Revision: D25840092

Pulled By: jay-zhuang

fbshipit-source-id: 760167baf61568c9a35138145c4bf9b06824cb71
2021-01-15 17:05:18 -08:00
Tomas Kolda
ac956f2bea S390 Linux is failing tests ColumnFamilyOptionsTest.cfPaths (#7853)
Summary:
Fix ColumnFamilyOptionsTest.cfPaths and OptionsTest.cfPaths in 6.15 branch (and probably other branches including master)

has_exception variable was not initialized which was causing test failures and incorrect behavior on s390 platform (and maybe others as variable content is undefined).

adamretter please take a look.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7853

Reviewed By: akankshamahajan15

Differential Revision: D25901639

Pulled By: jay-zhuang

fbshipit-source-id: 151b5db27b495fc6d8ed54c0eccbde2508215ac5
2021-01-15 16:32:31 -08:00
Adam Retter
3e6ee9f82e Update the versions of the test dependencies used for RocksJava (#7805)
Summary:
Update the versions of the dependencies used for testing RocksJava.

pdillinger Please can you add the following to your S3 bucket:
1. https://repo1.maven.org/maven2/junit/junit/4.13.1/junit-4.13.1.jar
2. https://repo1.maven.org/maven2/org/hamcrest/hamcrest/2.2/hamcrest-2.2.jar
3. https://repo1.maven.org/maven2/cglib/cglib/3.3.0/cglib-3.3.0.jar
4. https://repo1.maven.org/maven2/org/assertj/assertj-core/2.9.0/assertj-core-2.9.0.jar

Thanks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7805

Reviewed By: akankshamahajan15

Differential Revision: D25906134

Pulled By: jay-zhuang

fbshipit-source-id: 1c6c7d461a73abaff1796bb31f0ad90dcbdef1a0
2021-01-13 16:01:38 -08:00
Laurent Goujon
0426d4a4ee Fix Java hashCode implementation (#7860)
Summary:
Classes ColumnFamilyHandle and CapturingWriteBatchHandler.Event have
byte array fields as part of their identity, but they do not use the
arrays' content to compute the instance's hash, and instead rely on the
arrays' identity, causing instances to have different hashcodes
although they are equal.
The PR addresses it by using the arrays' content to compute the hash,
like the equals method does.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7860

Reviewed By: jay-zhuang

Differential Revision: D25901327

Pulled By: akankshamahajan15

fbshipit-source-id: 347e7b3d2ba7befe7faa956b033e6421b9d0c235
2021-01-13 10:04:42 -08:00
Adam Retter
62afa968c2 Fix various small build issues, Java API naming (#7776)
Summary:
* Compatibility with older GCC.
* Compatibility with older jemalloc libraries.
* Remove Docker warning when building i686 binaries.
* Fix case inconsistency in Java API naming (potential update to HISTORY.md deferred)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7776

Reviewed By: akankshamahajan15

Differential Revision: D25607235

Pulled By: pdillinger

fbshipit-source-id: 7ab0fb7fa7a34e97ed0bec991f5081acb095777d
2020-12-18 16:12:26 -08:00
Adam Retter
29d12748b0 Fix failing RocksJava test compilation and add CI (#7769)
Summary:
* Fixes a Java test compilation issue on macOS
* Cleans up CircleCI RocksDBJava build config
* Adds CircleCI for RocksDBJava on MacOS
* Ensures backwards compatibility with older macOS via CircleCI
* Fixes RocksJava static builds ordering
* Adds missing RocksJava static builds to CircleCI for Mac and Linux
* Improves parallelism in RocksJava builds
* Reduces the size of the machines used for RocksJava CircleCI as they don't need to be so large (Saves credits)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7769

Reviewed By: akankshamahajan15

Differential Revision: D25601293

Pulled By: pdillinger

fbshipit-source-id: 0a0bb9906f65438fe143487d78e37e1947364d08
2020-12-16 16:00:02 -08:00
Cheng Chang
5e794b0841 Fix a recovery corner case (#7621)
Summary:
Consider the following sequence of events:

1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST.
2. Syncing MANIFEST failed and db crashed.
3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file.
4. Db crashed again.
5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.

It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5.

After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621

Test Plan:
1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery.
2. a new unit test is added in db_basic_test to simulate step 3.

Reviewed By: riversand963

Differential Revision: D24668144

Pulled By: cheng-chang

fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
2020-11-07 22:23:27 -08:00
cheng-chang
1f627210ca Simplify a test case in Java ReadOnlyTest (#7608)
Summary:
The original test nests a lot of `try` blocks. This PR flattens these blocks into independent blocks, so that each `try` block closes the DB before opening the next DB instance.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7608

Test Plan: watch the existing java tests to pass

Reviewed By: zhichao-cao

Differential Revision: D24611621

Pulled By: cheng-chang

fbshipit-source-id: d486c5d37ac25d4b860d739ef2cdd58e6064d42d
2020-11-04 16:49:17 -08:00
Jermy Li
99a0305bb8 java: correct method name RocksDB.GetColumnFamilyMetaData() (#7606)
Summary:
update GetColumnFamilyMetaData() to getColumnFamilyMetaData()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7606

Reviewed By: zhichao-cao

Differential Revision: D24610298

Pulled By: cheng-chang

fbshipit-source-id: d24f9b65478da1456f50747637dc95688af874de
2020-10-28 18:13:27 -07:00
Ramkumar Vadivelu
9a690a74e1 In ParseInternalKey(), include corrupt key info in Status (#7515)
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497

When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`

Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.

Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test

Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515

Reviewed By: ajkr

Differential Revision: D24240264

Pulled By: ramvadiv

fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
2020-10-28 10:12:58 -07:00
Adam Retter
ccbf468cb1 Small JNI improvements (#7371)
Summary:
* Avoid some unnecessary array copy operations on read/write
* Remove some duplicated code
* Don't leak arrays on some exceptions
* Fixed some doc comments

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7371

Reviewed By: jay-zhuang

Differential Revision: D24312932

Pulled By: pdillinger

fbshipit-source-id: 422fe6b98bbdb922a148922ac0d2d965c715176e
2020-10-14 22:23:56 -07:00
Tomasz Posluszny
05fba96927 Make RocksDB instance responsible for closing associated ColumnFamilyHandle instances (#7428)
Summary:
- Takes the burden off developer to close ColumnFamilyHandle instances before closing RocksDB instance
- The change is backward-compatible

----
Previously the pattern for working with Column Families was:

```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {

  // list of column family descriptors, first entry must always be default column family
  final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
      new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
      new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
  );

  // a list which will hold the handles for the column families once the db is opened
  final List<ColumnFamilyHandle> columnFamilyHandleList =
      new ArrayList<>();

  try (final DBOptions options = new DBOptions()
      .setCreateIfMissing(true)
      .setCreateMissingColumnFamilies(true);
       final RocksDB db = RocksDB.open(options,
           "path/to/do", cfDescriptors,
           columnFamilyHandleList)) {

    try {

      // do something

    } finally {

      // NOTE user must explicitly frees the column family handles before freeing the db
      for (final ColumnFamilyHandle columnFamilyHandle :
          columnFamilyHandleList) {
        columnFamilyHandle.close();
      }
    } // frees the column family options
  }
} // frees the db and the db options
```

With the changes in this PR, the Java user no longer has to worry about manually closing the Column Families, which allows them to write simpler symmetrical create/free oriented code like this:

```java
try (final ColumnFamilyOptions cfOpts = new ColumnFamilyOptions().optimizeUniversalStyleCompaction()) {

  // list of column family descriptors, first entry must always be default column family
  final List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
      new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY, cfOpts),
      new ColumnFamilyDescriptor("my-first-columnfamily".getBytes(), cfOpts)
  );

  // a list which will hold the handles for the column families once the db is opened
  final List<ColumnFamilyHandle> columnFamilyHandleList =
      new ArrayList<>();

  try (final DBOptions options = new DBOptions()
      .setCreateIfMissing(true)
      .setCreateMissingColumnFamilies(true);
       final RocksDB db = RocksDB.open(options,
           "path/to/do", cfDescriptors,
           columnFamilyHandleList)) {

        // do something

    } // frees the column family options, then frees the db and the db options
  }
}
```

**NOTE**: The changes in this PR are backwards API compatible, which means existing code using the original approach will also continue to function correctly.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7428

Reviewed By: cheng-chang

Differential Revision: D24063348

Pulled By: jay-zhuang

fbshipit-source-id: 648d7526669923128c863ead94516bf4d50ac658
2020-10-14 14:39:14 -07:00
Tomasz Posluszny
6528ecc800 Add event listeners to RocksJava (#7425)
Summary:
Allows adding event listeners in RocksJava.

* Adds listeners getter and setter in `Options` and `DBOptions` classes.
* Adds `EventListener` Java interface and base class for implementing custom event listener callbacks - `AbstractEventListener`, which has an underlying native callback class implementing C++ `EventListener` class.
* `AbstractEventListener` class has mechanism for selectively enabling its callback methods in order to prevent invoking Java method if it is not implemented. This decreases performance cost in case only subset of event listener callback methods is needed - the JNI code for remaining "no-op" callbacks is not executed.
* The code is covered by unit tests in `EventListenerTest.java`, there are also tests added for setting/getting listeners field in `OptionsTest.java` and `DBOptionsTest.java`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7425

Reviewed By: pdillinger

Differential Revision: D24063390

Pulled By: jay-zhuang

fbshipit-source-id: 508c359538983d6b765e70d9989c351794a944ee
2020-10-14 11:33:52 -07:00
Akanksha Mahajan
38d0a365e3 Add Stats for MultiGet (#7366)
Summary:
Add following stats for MultiGet in Histogram to get more insight on MultiGet.
    1. Number of index and filter blocks read from file as part of MultiGet
    request per level.
    2. Number of data blocks read from file per level.
    3. Number of SST files loaded from file system per level.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7366

Reviewed By: anand1976

Differential Revision: D24127040

Pulled By: akankshamahajan15

fbshipit-source-id: e63a003056b833729b277edc0639c08fb432756b
2020-10-07 13:28:48 -07:00
Ramkumar Vadivelu
e04a50923d Change ParseInternalKey() to return Status instead of bool (#7457)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/7430

Change ParseInternalKey() to return Status instead of bool.

db_bench (seekrandom) based before/after results with value size of 100 bytes and 16 bytes can be found at (tests ran on an udb server):
https://www.dropbox.com/s/47bwamdy5ozngph/PIK_ret_Status_results.xlsx?dl=0

![db_bench_results](https://user-images.githubusercontent.com/62277872/94642825-2a21a800-029a-11eb-88f2-124136c83fd3.png)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7457

Reviewed By: ajkr

Differential Revision: D24002433

Pulled By: ramvadiv

fbshipit-source-id: ac253ecf577a29044c47c3fe254a01e71404c44c
2020-09-30 19:16:47 -07:00
Tomasz Posluszny
6efae4b00d Add missing Java API for boolean and numerical fields in DBOptions (#7387)
Summary:
Exposes the following previously missing DBOptions fields in the RocksJava API:
- persist_stats_to_disk
- max_write_batch_group_size_bytes
- skip_checking_sst_file_sizes_on_db_open
- avoid_unnecessary_blocking_io
- write_dbid_to_manifest
- log_readahead_size
- best_efforts_recovery
- max_bgerror_resume_count
- bgerror_resume_retry_interval

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7387

Reviewed By: siying

Differential Revision: D23707785

Pulled By: jay-zhuang

fbshipit-source-id: e5688c7d40d83128734605ef7b0720a55fdfa699
2020-09-18 12:28:40 -07:00
Adam Retter
3ac07a12fe RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046)
Summary:
Expose from C++ API to Java API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7046

Reviewed By: riversand963

Differential Revision: D23726297

Pulled By: pdillinger

fbshipit-source-id: fc66bf626ce6fe9797e7d021ac849eacab91bf6d
2020-09-17 15:41:25 -07:00
Tomasz Posluszny
6b72342a12 Implement missing Java API for ColumnFamilyOptions (#7372)
Summary:
Covered methods:
- OldDefaults()
- OptimizeForSmallDb(std::shared_ptr<Cache>)

Covered fields:
- cf_paths

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7372

Reviewed By: pdillinger

Differential Revision: D23683449

Pulled By: jay-zhuang

fbshipit-source-id: 3e5a8b657cc382c19de3a48c666a3b0e8d96968d
2020-09-14 12:09:04 -07:00