rocksdb

Author	SHA1	Message	Date
Peter Dillinger	420d51b9a0	Update Java API for FilterPolicy changes (#9569 ) Summary: Obsolete block-based filter no longer in public API, from https://github.com/facebook/rocksdb/issues/9535 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9569 Test Plan: existing tests Reviewed By: jay-zhuang Differential Revision: D34243579 Pulled By: pdillinger fbshipit-source-id: ec5127d9bb9cc3f70501c531829a735bffdd1418	2022-02-15 12:18:52 -08:00
Levi Tamasi	ac251aa641	Add Java bindings for blob compaction readahead size (#9554 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9554 Test Plan: Added new unit tests. Reviewed By: mrambacher Differential Revision: D34197121 Pulled By: ltamasi fbshipit-source-id: 15056e26d632057a7c052a5024a560ba0eac554c	2022-02-14 09:15:42 -08:00
Alan Paxton	eed71dfa82	Transaction multiGet convert to list-based (#9522 ) Summary: Transaction multiGet convert to list-based. RocksDB Java (non-transactional) has multiGetAsList() methods to expose multiGet(). These return a list of results. These methods replaced multiGet() methods returning an array of results, which were deprecated in Rocks 6 and are being removed in Rocks 7. The transactional API still presents multiGet() methods returning arrays, so in Rocks 7 we replace these with multiGetAsList() methods and deprecate the multiGet() methods. This does not require any changes to the supporting JNI/C++ code, only to the wrappers which present the Java API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9522 Reviewed By: mrambacher Differential Revision: D34114373 Pulled By: jay-zhuang fbshipit-source-id: cb22d6095934d951b6aee4aed3e07923d3c18007	2022-02-14 08:33:02 -08:00
Alan Paxton	99d86252b6	remove deprecated dispose() for Rocks JNI interface Java objects. (#9523 ) Summary: For RocksDB 7. Remove deprecated dispose() and, as a consequence, remove finalize(), which is good modern Java hygiene. It is extremely non-deterministic when `finalize()` is called on an object, and resource closure/recovery of underlying native/C++ objects and/or non-memory resources cannot be adequately controlled through GC finalization. The RocksDB Java/JNI interface provides and encourages the use of AutoCloseable objects with close() methods, allowing predictable disposal of resources at exit from try-with-resources blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9523 Reviewed By: mrambacher Differential Revision: D34079843 Pulled By: jay-zhuang fbshipit-source-id: d1f0463a89a548b5d57bfaa50154379e722d189a	2022-02-09 11:32:53 -08:00
Akanksha Mahajan	9745c68eb1	Remove deprecated option new_table_reader_for_compaction_inputs (#9443 ) Summary: In RocksDB, the option new_table_reader_for_compaction_inputs has no effect on compaction or on the behavior of the RocksDB library. Therefore, we are removing it in the upcoming 7.0 release. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9443 Test Plan: CircleCI Reviewed By: ajkr Differential Revision: D33788508 Pulled By: akankshamahajan15 fbshipit-source-id: 324ca6f12bfd019e9bd5e1b0cdac39be5c3cec7d	2022-02-08 19:31:28 -08:00
Radek Hubner	42c8afd85a	WriteOptions - add missing java API. (#9295 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9295 Reviewed By: riversand963 Differential Revision: D33672440 Pulled By: ajkr fbshipit-source-id: 85f73a9297888b00255b636e7826b37186aba45c	2022-02-04 16:08:06 -08:00
Si Ke	2c3a780901	Fixed all RocksJava test failures in Centos and Alpine (#9395 ) Summary: Fixed all RocksJava test failures in Centos and Alpine 32 bit and 64 bit OSes Pull Request resolved: https://github.com/facebook/rocksdb/pull/9395 Reviewed By: mrambacher Differential Revision: D33771987 Pulled By: ajkr fbshipit-source-id: fed91033b8df08f191ad65e1fb745a9264bbfa70	2022-02-04 16:03:56 -08:00
Jermy Li	83ff350ff2	jni: expose memtable_whole_key_filtering option (#9394 ) Summary: refer to: https://github.com/facebook/rocksdb/wiki/Prefix-Seek#configure-prefix-bloom-filter Pull Request resolved: https://github.com/facebook/rocksdb/pull/9394 Reviewed By: mrambacher Differential Revision: D33671533 Pulled By: ajkr fbshipit-source-id: d90db1712efdd5dd65020329867381d6b3cf2626	2022-02-04 16:01:16 -08:00
Yanqin Jin	d10c5c08d3	Remove iter_start_seqnum and preserve_deletes (#9430 ) Summary: According to https://github.com/facebook/rocksdb/blob/6.27.fb/db/db_impl/db_impl.cc#L2896:L2911 and https://github.com/facebook/rocksdb/blob/6.27.fb/db/db_impl/db_impl_open.cc#L203:L208, we are going to remove `iter_start_seqnum` and `preserve_deletes` starting from RocksDB 7.0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9430 Test Plan: make check and CI Reviewed By: ajkr Differential Revision: D33753639 Pulled By: riversand963 fbshipit-source-id: c80aab8e8d8fc33e52472fed524ed703d0ffc8b6	2022-01-28 13:28:38 -08:00
Jay Zhuang	22321e1027	Remove unused API base_background_compactions (#9462 ) Summary: The API was deprecated a long time ago. Clean up the codebase by removing it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9462 Test Plan: CI, fake release: D33835220 Reviewed By: riversand963 Differential Revision: D33835103 Pulled By: jay-zhuang fbshipit-source-id: 6d2dc12c8e7fdbe2700865a3e61f0e3f78bd8184	2022-01-27 21:05:18 -08:00
Peter Dillinger	78aee6fedc	Remove obsolete backupable_db.h, utility_db.h (#9438 ) Summary: This also removes the obsolete names BackupableDBOptions and UtilityDB. API users must now use BackupEngineOptions and DBWithTTL::Open. In the C API, `rocksdb_backupable_db_` is replaced by `rocksdb_backup_engine_`. Similar renaming in the Java API. In reference to https://github.com/facebook/rocksdb/issues/9389 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9438 Test Plan: CI Reviewed By: mrambacher Differential Revision: D33780269 Pulled By: pdillinger fbshipit-source-id: 4a6cfc5c1b4c78bcad790b9d3dd13c5fdf4a1fac	2022-01-27 15:45:30 -08:00
Hui Xiao	1e0e883ca5	Remove deprecated API AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit (#9452 ) Summary: Context/Summary: AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit have been marked as deprecated and it's time to actually remove the code. - Keep `soft_rate_limit`/`hard_rate_limit` in `cf_mutable_options_type_info` to prevent throwing `InvalidArgument` in `GetColumnFamilyOptionsFromMap` when reading an option file still with these options (e.g, old option file generated from RocksDB before the deprecation) - Keep `soft_rate_limit`/`hard_rate_limit` in under `OptionsOldApiTest.GetOptionsFromMapTest` to test the case mentioned above. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9452 Test Plan: Rely on my eyeball and CI Reviewed By: ajkr Differential Revision: D33804938 Pulled By: hx235 fbshipit-source-id: 133d49f7ec5238d7efceeb0a3122a5792a2b9945	2022-01-27 13:01:09 -08:00
Yanqin Jin	50135c1bf3	Move HDFS support to separate repo (#9170 ) Summary: This PR moves HDFS support from RocksDB repo to a separate repo. The new (temporary?) repo in this PR serves as an example before we finalize the decision on where and who to host hdfs support. At this point, people can start from the example repo and fork. Java/JNI is not included yet, and needs to be done later if necessary. The goal is to include this commit in RocksDB 7.0 release. Reference: https://github.com/ajkr/dedupfs by ajkr Pull Request resolved: https://github.com/facebook/rocksdb/pull/9170 Test Plan: Follow the instructions in https://github.com/riversand963/rocksdb-hdfs-env/blob/master/README.md. Build and run db_bench and db_stress. make check Reviewed By: ajkr Differential Revision: D33751662 Pulled By: riversand963 fbshipit-source-id: 22b4db7f31762ed417a20239f5a08dcd1696244f	2022-01-24 20:23:54 -08:00
Eric Thérond	5602b1d3d9	Add support for Apple Silicon to RocksJava (#9254 ) Summary: Fixes facebook/rocksdb#7720 Updated Makefile with flags to define target architecture when compiling/linking, and added goal `rocksdbjavastaticosxub` to build a OS X Universal Binary native library. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9254 Reviewed By: mrambacher Differential Revision: D33551160 Pulled By: pdillinger fbshipit-source-id: 9ce9962e03aacf55014545a6cdf638b5b14b8fa9	2022-01-12 17:20:58 -08:00
Andrew Kryczka	b860a42158	Recover to exact latest seqno of data committed to MANIFEST (#9305 ) Summary: The LastSequence field in the MANIFEST file is the baseline seqno for a recovered DB. Recovering WAL entries might cause the recovered DB's seqno to advance above this baseline, but the recovered DB will never use a smaller seqno. Before this PR, we were writing the DB's seqno at the time of LogAndApply() as the LastSequence value. This works in the sense that it is a large enough baseline for the recovered DB that it'll never overwrite any records in existing SST files. At the same time, it's arbitrarily larger than what's needed. This behavior comes from LevelDB, where there was no tracking of largest seqno in an SST file. Now we know the largest seqno of newly written SST files, so we can write an exact value in LastSequence that actually reflects the largest seqno in any file referred to by the MANIFEST. This is primarily useful for correctness testing with unsynced data loss, where the recovered DB's seqno needs to indicate what records were recovered. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9305 Test Plan: - https://github.com/facebook/rocksdb/issues/9338 adds crash-recovery correctness testing coverage for WAL disabled use cases - https://github.com/facebook/rocksdb/issues/9357 will extend that testing to cover file ingestion - Added assertion at end of LogAndApply() for `VersionSet::descriptor_last_sequence_` consistency with files - Manually tested upgrade/downgrade compatibility with a custom crash test that randomly picks between a `db_stress` built with and without this PR (for old code it must run with `-disable_wal=0`) Reviewed By: riversand963 Differential Revision: D33182770 Pulled By: ajkr fbshipit-source-id: 0bfafaf685f347cc8cb0e1d62e0186340a738f7d	2022-01-05 16:02:21 -08:00
stefan-zobel	7ae213f735	Minor Javadoc fixes (#9203 ) Summary: Added two missing parameter tags with description and added some descriptions for parameter / return tags Pull Request resolved: https://github.com/facebook/rocksdb/pull/9203 Reviewed By: jay-zhuang Differential Revision: D32990607 Pulled By: mrambacher fbshipit-source-id: 10aea4c4cf1c28d5e97d19722ee835a965d1eb55	2021-12-21 05:40:51 -08:00
Jermy Li	9828b6d5fd	fix java doc issues (#9253 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9253 Reviewed By: jay-zhuang Differential Revision: D32990516 Pulled By: mrambacher fbshipit-source-id: c7cdb6562ac6871bca6ea0d9efa454f3a902a137	2021-12-16 21:04:41 -08:00
Andrea Cavalli	9918e1ee5a	Set KeyMayExist fields visibility to public (#9285 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/9284 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9285 Reviewed By: pdillinger Differential Revision: D33062006 Pulled By: mrambacher fbshipit-source-id: c3471c2db717fa5bc2337cf996ce744af0ed877d	2021-12-16 10:59:05 -08:00
Alan Paxton	c1ec0b28eb	java / jni io_uring support (#9224 ) Summary: Existing multiGet() in java calls multi_get_helper() which then calls DB::std::vector MultiGet(). This doesn't take advantage of io_uring. This change adds another JNI level method that runs a parallel code path using the DB::void MultiGet(), using ByteBuffers at the JNI level. We call it multiGetDirect(). In addition to using the io_uring path, this code internally returns pinned slices which we can copy out of into our direct byte buffers; this should reduce the overall number of copies in the code path to/from Java. Some jmh benchmark runs (100k keys, 1000 key multiGet) suggest that for value sizes > 1k, we see about a 20% performance improvement, although performance is slightly reduced for small value sizes, there's a little bit more overhead in the JNI methods. Closes https://github.com/facebook/rocksdb/issues/8407 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9224 Reviewed By: mrambacher Differential Revision: D32951754 Pulled By: jay-zhuang fbshipit-source-id: 1f70df7334be2b6c42a9c8f92725f67c71631690	2021-12-15 18:09:25 -08:00
Radek Hubner	7ac3a5d406	ReadOptions - Add missing java API. (#9248 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9248 Reviewed By: mrambacher Differential Revision: D33011237 Pulled By: jay-zhuang fbshipit-source-id: b6544ad40cb722e327bac60a0af711db253e36d7	2021-12-15 17:46:05 -08:00
Davide Angelocola	8a97c541e4	Fix copy constructors of Options and ColumnFamilyOptions (#9166 ) Summary: Looks like some fields are not copied by the copy constructor. Please confirm if it is a real issue! Pull Request resolved: https://github.com/facebook/rocksdb/pull/9166 Reviewed By: jay-zhuang Differential Revision: D32532093 Pulled By: mrambacher fbshipit-source-id: f636ef9425a530a8655947115160ae471916252b	2021-12-13 07:22:56 -08:00
Yanqin Jin	bd513fd075	Add commit marker with timestamp (#9266 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9266 This diff adds a new tag `CommitWithTimestamp`. Currently, there is no API to trigger writing this tag to WAL, thus it is unavailable to users. This is an ongoing effort to add user-defined timestamp support to write-committed transactions. This diff also indicates all column families that may potentially participate in the same transaction must either disable timestamp or have the same timestamp format, since `CommitWithTimestamp` tag is followed by a single byte-array denoting the commit timestamp of the transaction. We will enforce this checking in a future diff. We keep this diff small. Reviewed By: ltamasi Differential Revision: D31721350 fbshipit-source-id: e1450811443647feb6ca01adec4c8aaae270ffc6	2021-12-10 11:05:35 -08:00
Jermy Li	c39a808cb6	Deprecate WriteBatch.remove() and use the new style delete() (#9256 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9256 Reviewed By: mrambacher Differential Revision: D32971447 Pulled By: jay-zhuang fbshipit-source-id: 6954d7287229a8c776092bd82af3a8a8cd92b35e	2021-12-10 09:18:17 -08:00
stefan-zobel	f57745814f	Minor RocksJava Java code cosmetics (#9204 ) Summary: Specifically: - unused imports - code formatting - typos in comments - unnecessary casts - missing default label in switch statement - explicit use of long literals in multiplication - use generics where possible without backward compatibility risk Pull Request resolved: https://github.com/facebook/rocksdb/pull/9204 Reviewed By: ajkr Differential Revision: D32955184 Pulled By: jay-zhuang fbshipit-source-id: 42d05ce42639d982b9ea34c8081266dfba7f1efa	2021-12-09 20:00:48 -08:00
Hui Xiao	9daf07305c	Replace TableProperties::properties_offsets map with external_sst_file_global_seqno_offset (#9212 ) Summary: Context: Searching `TableProperties::properties_offsets` across the codebase reveals that internally it is only used to find the external SST file's global seqno offset. Therefore we can narrow it down and replace this map property with a uint64_t property `external_sst_file_global_seqno_offset` to save memory usage related to table properties. Note: - See PR comments for discussion about potential impact on existing external usage of `TableProperties::properties_offsets` - See PR comments for discussion on keeping external SST file global seqno's offset VS using a simple flag indicating seqno's existence. Summary: - Replaced `TableProperties::properties_offsets` with `TableProperties::external_sst_file_global_seqno_offset` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9212 Test Plan: - Existing tests should be sufficient since `TableProperties::properties_offsets` existed before and should already be tested. Reviewed By: ajkr Differential Revision: D32665941 Pulled By: hx235 fbshipit-source-id: 718e44617346dc4f3b1276ee953e61c196277795	2021-12-02 08:30:36 -08:00
Adam Retter	d94932323a	Check that newIteratorWithBase regardless of WBWI Overwrite Mode (#8134 ) Summary: The behaviour of WBWI has changed when calling newIteratorWithBase when overwrite is set to true or false. This PR simply adds tests to assert the new correct behaviour. Closes https://github.com/facebook/rocksdb/issues/7370 Closes https://github.com/facebook/rocksdb/pull/8134 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9107 Reviewed By: pdillinger Differential Revision: D32099475 Pulled By: mrambacher fbshipit-source-id: 245f483f73db866cc8a51219a2bff2e09e59faa0	2021-11-18 11:53:09 -08:00
Davide Angelocola	c9539ede76	Fix integer overflow in TraceOptions (#9157 ) Summary: Hello from a happy user of rocksdb java :-) Default constructor of TraceOptions is supposed to initialize size to 64GB but the expression contains an integer overflow. Simple test case with JShell: ``` jshell> 64 * 1024 * 1024 * 1024 $1 ==> 0 jshell> 64L * 1024 * 1024 * 1024 $2 ==> 68719476736 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9157 Reviewed By: pdillinger, zhichao-cao Differential Revision: D32369273 Pulled By: mrambacher fbshipit-source-id: 6a0c95fff7a91f27ff15d65b662c6b101756b450	2021-11-17 08:41:48 -08:00
Zhichao Cao	b694cd0e0d	Add tiered storage related read bytes stats to Statistic (#9123 ) Summary: Add the 3 read bytes counter to the Statistic, which will be used by storage tiering and get the information for files with different temperature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9123 Test Plan: added new testing cases. Reviewed By: siying Differential Revision: D32154745 Pulled By: zhichao-cao fbshipit-source-id: b7905d6dae469a72428742364ec07b634b6f15da	2021-11-16 15:17:17 -08:00
Alan Paxton	e5b34f5867	Fb 5789 max total WAL size clarification (#9108 ) Summary: Add clarification/extension to comments on max_total_wal_size and the Java wrapper MaxTotalWalSize to better explain the effect of the option on log file sizes. Closes https://github.com/facebook/rocksdb/issues/5789 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9108 Reviewed By: pdillinger Differential Revision: D32066640 Pulled By: mrambacher fbshipit-source-id: 7d5affc87e4119019054af9c884a2ea01d68f5b7	2021-11-08 08:54:37 -08:00
Adam Retter	be351f4754	Restore Java 7 Compatibility (#9103 ) Summary: RocksDB should still compile on Java 7. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9103 Reviewed By: pdillinger Differential Revision: D32067561 Pulled By: mrambacher fbshipit-source-id: bbe9c18c8007ab3e113de4add56a84c9bde61c8e	2021-11-08 08:21:02 -08:00
Alan Paxton	ec9082d698	Regression tests for tickets fixed by previous change. (#9019 ) Summary: closes https://github.com/facebook/rocksdb/issues/5891 closes https://github.com/facebook/rocksdb/issues/2001 Java BytewiseComparator is now unsigned compliant, consistent with the default C++ comparator, which has always been thus. Consequently 2 tickets reporting the previous broken state can be closed. This test confirms that the following issues were in fact resolved by a change made between 6.2.2 and 6.22.1, to wit https://github.com/facebook/rocksdb/commit/7242dae7 which as part of its effect, changed the Java bytewise comparators. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9019 Reviewed By: pdillinger Differential Revision: D31610910 Pulled By: mrambacher fbshipit-source-id: 664230f1377a1aa270136edd63eea2c206b907e9	2021-11-01 15:06:47 -07:00
Alan Paxton	73e6b89fad	Java wrapper for blob_gc_force_threshold as blobGarbageCollectionForceThreshold (#9109 ) Summary: Extra option added as a supplement to https://github.com/facebook/rocksdb/pull/8999 Closes https://github.com/facebook/rocksdb/issues/8221 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9109 Reviewed By: mrambacher Differential Revision: D32065039 Pulled By: ltamasi fbshipit-source-id: 6c484050a30fe0523850a8a3c95dc85b0a501362	2021-11-01 11:59:10 -07:00
myasuka	dc00e4b120	Introduce allowStall option for write buffer manager constructor (#9076 ) Summary: https://github.com/facebook/rocksdb/pull/7898 enabled the write buffer manager to stall writes when memory_usage exceeds buffer_size, which is really useful for limiting memory usage when running in containers. However, this feature is not yet visible in RocksJava. This PR introduces the feature for RocksJava. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9076 Reviewed By: akankshamahajan15 Differential Revision: D31931092 Pulled By: anand1976 fbshipit-source-id: 5531c16a87598663a02368c07b5e13a503164578	2021-10-26 12:09:54 -07:00
Jonathan Albrecht	e970248602	Add support for building on s390x platform (#8962 ) Summary: This PR adds support for building on s390x including updating travis CI. It uses the previous work in https://github.com/facebook/rocksdb/pull/6168 and adds some more changes to get all current tests (make check and jni tests) to pass. The tests were run with snappy, lz4, bzip2 and zstd all compiled in. There are a few pieces still needed to get the travis build working that I don't think I can do. adamretter is this something you could help with? 1. A prebuilt https://rocksdb-deps.s3-us-west-2.amazonaws.com/cmake/cmake-3.14.5-Linux-s390x.deb package 2. A https://hub.docker.com/r/evolvedbinary/rocksjava s390x image Not sure if there is more required for travis. Happy to help in any way I can. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8962 Reviewed By: mrambacher Differential Revision: D31802198 Pulled By: pdillinger fbshipit-source-id: 683511466fa6b505f85ba5a9964a268c6151f0c2	2021-10-22 10:13:15 -07:00
Alan Paxton	8d615a2b1d	New-style blob option bindings, Java option getter and improve/fix option parsing (#8999 ) Summary: Implementation of https://github.com/facebook/rocksdb/issues/8221, plus/including extension of Java options API to allow the get() of options from RocksDB. The extension allows more comprehensive testing of options at the Java side, by validating that the options are set at the C++ side. Variations on methods: MutableColumnFamilyOptions.MutableColumnFamilyOptionsBuilder getOptions() MutableDBOptions.MutableDBOptionsBuilder getDBOptions() retrieve the options via RocksDB C++ interfaces, and parse the resulting string into one of the Java-style option objects. This necessitated generalising the parsing of option strings in Java, which now parses the full range of option strings returned by the C++ interface, rather than a useful subset. This necessitates the list-separator being changed to :(colon) from , (comma). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8999 Reviewed By: jay-zhuang Differential Revision: D31655487 Pulled By: ltamasi fbshipit-source-id: c38e98145c81c61dc38238b0df580db176ce4efd	2021-10-19 09:21:52 -07:00
Alan Paxton	86cf7266c3	keyMayExist() supports ByteBuffer (#9013 ) Summary: closes https://github.com/facebook/rocksdb/issues/7917 Implemented ByteBuffer API variants of Java keyMayExist() uniformly with and without column families, read options and return data values. Implemented 2 supporting C++ JNI methods. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9013 Reviewed By: mrambacher Differential Revision: D31665989 Pulled By: jay-zhuang fbshipit-source-id: 8adc1730217dba38d6fa7b31d788650a33e28af1	2021-10-18 17:20:07 -07:00
Alan Paxton	f5526af8ed	Fix multiget throwing NPE for num of keys > 70k (#9012 ) Summary: closes https://github.com/facebook/rocksdb/issues/8039 Unnecessary use of multiple local JNI references at the same time, 1 per key, was limiting the size of the key array. The local references don't need to be held simultaneously, so if we rearrange the code we can make it work for bigger key arrays. Incidentally, make errors throw helpful exception messages rather than returning a null pointer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9012 Reviewed By: mrambacher Differential Revision: D31580862 Pulled By: jay-zhuang fbshipit-source-id: ce05831d52ede332e1b20e74d2dc621d219b9616	2021-10-14 11:48:12 -07:00
Jay Zhuang	6b34eb0ebc	Add remote compaction read/write bytes statistics (#8939 ) Summary: Add basic read/write bytes statistics on the primary side: `REMOTE_COMPACT_READ_BYTES` `REMOTE_COMPACT_WRITE_BYTES` Fixed existing statistics missing some IO for remote compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8939 Test Plan: CI Reviewed By: ajkr Differential Revision: D31074672 Pulled By: jay-zhuang fbshipit-source-id: c57afdba369990185008ffaec7e3fe7c62e8902f	2021-09-28 14:00:37 -07:00
anand76	add68bd28a	Add a stat to count secondary cache hits (#8666 ) Summary: Add a stat for secondary cache hits. The ```Cache::Lookup``` API had an unused ```stats``` parameter. This PR uses that to pass the pointer to a ```Statistics``` object that ```LRUCache``` uses to record the stat. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8666 Test Plan: Update a unit test in lru_cache_test Reviewed By: zhichao-cao Differential Revision: D30353816 Pulled By: anand1976 fbshipit-source-id: 2046f78b460428877a26ffdd2bb914ae47dfbe77	2021-08-16 21:01:14 -07:00
sdong	e7c24168d8	Move old files to warm tier in FIFO compactions (#8310 ) Summary: Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310 Test Plan: Add several unit tests. Reviewed By: ajkr Differential Revision: D28493792 fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501	2021-08-09 12:51:14 -07:00
Brendan MacDonell	8ca081780b	Correct javadoc for Env#setBackgroundThreads(int) (#8576 ) Summary: By default, the low priority pool is not the flush pool, so calling `Env#setBackgroundThreads` without providing a priority will not do what the caller expected. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8576 Reviewed By: ajkr Differential Revision: D29925154 Pulled By: mrambacher fbshipit-source-id: cd7211fc374e7d9929a9b88ea0a5ba8134b76099	2021-08-06 08:52:14 -07:00
Mikhail Golubev	8f52972cf9	Allow to use a string as a delimiter in StringAppendOperator (#8536 ) Summary: An arbitrary string can be used as a delimiter in StringAppend merge operator flavor. In particular, it allows using an empty string, combining binary values for the same key byte-to-byte one next to another. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8536 Reviewed By: mrambacher Differential Revision: D29962120 Pulled By: zhichao-cao fbshipit-source-id: 4ef5d846a47835cf428a11200409e30e2dbffc4f	2021-08-02 16:50:41 -07:00
Anatolii Zhmaiev	9ddb55a8f6	Add periodic_compaction_seconds option to RocksJava (#8579 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/8578 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8579 Reviewed By: ajkr Differential Revision: D29895081 Pulled By: mrambacher fbshipit-source-id: 3e4120e26a3e8252f8301d657c0aaa0b8550cddf	2021-07-26 17:33:42 -07:00
Peter Dillinger	df5dc73bec	Don't hold DB mutex for block cache entry stat scans (#8538 ) Summary: I previously didn't notice the DB mutex was being held during block cache entry stat scans, probably because I primarily checked for read performance regressions, because they require the block cache and are traditionally latency-sensitive. This change does some refactoring to avoid holding DB mutex and to avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats"). Some tests have to be updated because now the stats collector is populated in the Cache aggressively on DB startup rather than lazily. (I hope to clean up some of this added complexity in the future.) This change also ensures proper treatment of need_out_of_mutex for non-int DB properties. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538 Test Plan: Added unit test logic that uses sync points to fail if the DB mutex is held during a scan, covering the various ways that a scan might be triggered. Performance test - the known impact to holding the DB mutex is on TransactionDB, and the easiest way to see the impact is to hack the scan code to almost always miss and take an artificially long time scanning. Here I've injected an unconditional 5s sleep at the call to ApplyToAllEntries. 
Before (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0) rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856 $ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0) rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323 Notice the 5s P100 write time.
After (hacked): $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0) rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407 $ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918 P100 write is now ~0.6s.
Not good, but it's the same even if I completely bypass all the scanning code: $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0) rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767 $ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op' randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0) rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832 No substantial difference. Reviewed By: siying Differential Revision: D29738847 Pulled By: pdillinger fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b	2021-07-16 14:13:08 -07:00
Baptiste Lemaire	e817bc9628	Added memtable garbage statistics (#8411 ) Summary: 2 new statistics counters are added to RocksDB: `MEMTABLE_PAYLOAD_BYTES_AT_FLUSH` and `MEMTABLE_GARBAGE_BYTES_AT_FLUSH`. The former tracks how many raw bytes of useful data are present on the memtable at flush time, whereas the latter tracks how many of these raw bytes are considered garbage, meaning that they ended up not being written to the SSTables resulting from the flush operations. Unit test: run `make db_flush_test -j$(nproc); ./db_flush_test` to run the unit test. This executable includes 3 tests that check support and correct stat calculations for workloads with inserts, deletes, and DeleteRanges. The parameters are set such that the workloads are performed on a single memtable, and a single SSTable is created as a result of the flush operation. The flush operation is manually called in the test file. The tests verify that the values of the 2 statistics counters introduced in this PR can be exactly predicted, showing that we have a full understanding of the underlying operations. Performance testing: `./db_bench -statistics -benchmarks=fillrandom -num=10000000` repeated 10 times. Timing done using the "date" command in a bash script. _Results_: Original RocksDB fork: mean 66.6 sec, std 1.18 sec. This feature branch: mean 67.4 sec, std 1.35 sec. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8411 Reviewed By: akankshamahajan15 Differential Revision: D29150629 Pulled By: bjlemaire fbshipit-source-id: 7b3c2e86d50c6aa34fa50fd134282eacb543a5b1	2021-06-18 04:57:27 -07:00
Sidi Mohamed EL AATIFI	298edae941	Fix a typo in Javadoc (#8394 ) Summary: iterateLowerBound Slice representing the lower bound Pull Request resolved: https://github.com/facebook/rocksdb/pull/8394 Reviewed By: ajkr Differential Revision: D29085721 Pulled By: jay-zhuang fbshipit-source-id: a154375879395c48e9bd3794d296e70316894056	2021-06-17 12:02:57 -07:00
Adam Retter	69c986825e	Fix javadoc for keyMayExist (#8232 ) Summary: Closes https://github.com/facebook/rocksdb/issues/6985 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8232 Reviewed By: jay-zhuang Differential Revision: D27999779 Pulled By: mrambacher fbshipit-source-id: a37c88d93bde2692b8be9e46e673dda7bea701b2	2021-04-26 08:34:10 -07:00
Yanqin Jin	a376c22066	Handle rename() failure in non-local FS (#8192 ) Summary: In a distributed environment, a file `rename()` operation can succeed on server (remote) side, but the client can somehow return non-ok status to RocksDB. Possible reasons include network partition, connection issue, etc. This happens in `rocksdb::SetCurrentFile()`, which can be called in `LogAndApply() -> ProcessManifestWrites()` if RocksDB tries to switch to a new MANIFEST. We currently always delete the new MANIFEST if an error occurs. This is problematic in distributed world. If the server-side successfully updates the CURRENT file via renaming, then a subsequent `DB::Open()` will try to look for the new MANIFEST and fail. As a fix, we can track the execution result of IO operations on the new MANIFEST. - If IO operations on the new MANIFEST fail, then we know the CURRENT must point to the original MANIFEST. Therefore, it is safe to remove the new MANIFEST. - If IO operations on the new MANIFEST all succeed, but somehow we end up in the clean up code block, then we do not know whether CURRENT points to the new or old MANIFEST. (For local POSIX-compliant FS, it should still point to old MANIFEST, but it does not matter if we keep the new MANIFEST.) Therefore, we keep the new MANIFEST. - Any future `LogAndApply()` will switch to a new MANIFEST and update CURRENT. - If process reopens the db immediately after the failure, then the CURRENT file can point to either the new MANIFEST or the old one, both of which exist. Therefore, recovery can succeed and ignore the other. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8192 Test Plan: make check Reviewed By: zhichao-cao Differential Revision: D27804648 Pulled By: riversand963 fbshipit-source-id: 9c16f2a5ce41bc6aadf085e48449b19ede8423e4	2021-04-19 18:11:13 -07:00
Andrew Kryczka	1ba2b8a568	Add sample_for_compression results to table properties (#8139 ) Summary: Added `TableProperties::{fast,slow}_compression_estimated_data_size`. These properties are present in block-based tables when `ColumnFamilyOptions::sample_for_compression > 0` and the necessary compression library is supported when the file is generated. They contain estimates of what `TableProperties::data_size` would be if the "fast"/"slow" compression library had been used instead. One limitation is we do not record exactly which "fast" (LZ4 or Snappy) or "slow" (ZSTD or Zlib) compression library produced the result. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139 Test Plan: - new unit test - ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression Reviewed By: riversand963 Differential Revision: D27454338 Pulled By: ajkr fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4	2021-03-31 18:21:50 -07:00
Jay Zhuang	a781b103da	Fix getApproximateMemTableStats() return type (#8098 ) Summary: Which should return 2 long instead of an array. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8098 Reviewed By: mrambacher Differential Revision: D27308741 Pulled By: jay-zhuang fbshipit-source-id: 44beea2bd28cf6779b048bebc98f2426fe95e25c	2021-03-31 09:46:47 -07:00
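
The cleanup rule described in commit a376c22066 (Handle rename() failure in non-local FS) can be sketched as a small decision function. This is an illustrative Python model only, not RocksDB code: the names `ManifestAction` and `cleanup_action` are hypothetical, and the single boolean input stands in for the tracked result of IO operations on the new MANIFEST.

```python
from enum import Enum


class ManifestAction(Enum):
    """Hypothetical outcomes for the new MANIFEST when the error path runs."""
    DELETE_NEW = "delete new MANIFEST"
    KEEP_NEW = "keep new MANIFEST"


def cleanup_action(new_manifest_io_ok: bool) -> ManifestAction:
    """Model of the decision taken in the cleanup block after an error.

    new_manifest_io_ok: whether every IO operation on the new MANIFEST
    succeeded (an assumed stand-in for the tracked IO status).
    """
    if not new_manifest_io_ok:
        # IO on the new MANIFEST failed, so CURRENT must still point to
        # the original MANIFEST; removing the new one is safe.
        return ManifestAction.DELETE_NEW
    # All IO on the new MANIFEST succeeded, yet an error was still
    # reported (for example, a rename() that succeeded remotely but
    # returned a non-ok status). CURRENT may point to either file, so
    # the new MANIFEST is kept.
    return ManifestAction.KEEP_NEW
```

Under this model, `cleanup_action(False)` selects deletion and `cleanup_action(True)` keeps the file, so a later `DB::Open()` finds whichever MANIFEST the CURRENT file names.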

316 Commits