rocksdb

Author	SHA1	Message	Date
Adam Retter	7d7e88c7d1	Improve build detect for RISCV (#9366 ) Summary: Related to: https://github.com/facebook/rocksdb/pull/9215 * Adds build_detect_platform support for RISCV on Linux (at least on SiFive Unmatched platforms) This still leaves some linking issues on RISCV remaining (e.g. when building `db_test`): ``` /usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `__gnu_cxx::new_allocator<char>::deallocate(char, unsigned long)': /usr/include/c++/10/ext/new_allocator.h:133: undefined reference to `__atomic_compare_exchange_1' /usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `std::__atomic_base<bool>::compare_exchange_weak(bool&, bool, std::memory_order, std::memory_order)': /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1' /usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1' /usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1' /usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1' /usr/bin/ld: ./librocksdb_debug.a(memtable.o):/usr/include/c++/10/bits/atomic_base.h:464: more undefined references to `__atomic_compare_exchange_1' follow /usr/bin/ld: ./librocksdb_debug.a(db_impl.o): in function `rocksdb::DBImpl::NewIteratorImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData, unsigned long, rocksdb::ReadCallback, bool, bool)': /home/adamretter/rocksdb/db/db_impl/db_impl.cc:3019: undefined reference to `__atomic_exchange_1' /usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::Writer::CreateMutex()': /home/adamretter/rocksdb/./db/write_thread.h:205: undefined reference to `__atomic_compare_exchange_1' /usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::SetState(rocksdb::WriteThread::Writer, unsigned char)': /home/adamretter/rocksdb/db/write_thread.cc:222: undefined 
reference to `__atomic_compare_exchange_1' collect2: error: ld returned 1 exit status make: *** [Makefile:1449: db_test] Error 1 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9366 Reviewed By: jay-zhuang Differential Revision: D34377664 Pulled By: mrambacher fbshipit-source-id: c86f9d0cd1cb0c18de72b06f1bf5847f23f51118	2022-03-01 04:24:54 -08:00
Andrew Kryczka	0a89cea5f5	Handle failures in block-based table size/offset approximation (#9615 ) Summary: In crash test with fault injection, we were seeing stack traces like the following: ``` #3 0x00007f75f763c533 in __GI___assert_fail (assertion=assertion@entry=0x1c5b2a0 "end_offset >= start_offset", file=file@entry=0x1c580a0 "table/block_based/block_based_table_reader.cc", line=line@entry=3245, function=function@entry=0x1c60e60 "virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller)") at assert.c:101 #4 0x00000000010ea9b4 in rocksdb::BlockBasedTable::ApproximateSize (this=<optimized out>, start=..., end=..., caller=<optimized out>) at table/block_based/block_based_table_reader.cc:3224 #5 0x0000000000be61fb in rocksdb::TableCache::ApproximateSize (this=0x60f0000161b0, start=..., end=..., fd=..., caller=caller@entry=rocksdb::kCompaction, internal_comparator=..., prefix_extractor=...) 
at db/table_cache.cc:719 #6 0x0000000000c3eaec in rocksdb::VersionSet::ApproximateSize (this=<optimized out>, v=<optimized out>, f=..., start=..., end=..., caller=<optimized out>) at ./db/version_set.h:850 #7 0x0000000000c6ebc3 in rocksdb::VersionSet::ApproximateSize (this=<optimized out>, options=..., v=v@entry=0x621000047500, start=..., end=..., start_level=start_level@entry=0, end_level=<optimized out>, caller=<optimized out>) at db/version_set.cc:5657 #8 0x000000000166e894 in rocksdb::CompactionJob::GenSubcompactionBoundaries (this=<optimized out>) at ./include/rocksdb/options.h:1869 #9 0x000000000168c526 in rocksdb::CompactionJob::Prepare (this=this@entry=0x7f75f3ffcf00) at db/compaction/compaction_job.cc:546 ``` The problem occurred in `ApproximateSize()` when the index `Seek()` for the first `ApproximateDataOffsetOf()` encountered an I/O error, while the second `Seek()` did not. In the old code that scenario caused `start_offset == data_size`, thus it was easy to trip the assertion that `end_offset >= start_offset`. The fix is to set `start_offset == 0` when the first index `Seek()` fails, and `end_offset == data_size` when the second index `Seek()` fails. I doubt these give an "on average correct" answer for how this function is used, but I/O errors in index seeks are hopefully rare, it looked consistent with what was already there, and it was easier to calculate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9615 Test Plan: run the repro command for a while and stopped seeing coredumps - ``` $ while ! 
./db_stress --block_size=128 --cache_size=32768 --clear_column_family_one_in=0 --column_families=1 --continuous_verification_interval=0 --db=/dev/shm/rocksdb_crashtest --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --index_type=2 --iterpercent=10 --kill_random_test=18887 --max_key=1000000 --max_bytes_for_level_base=2048576 --nooverwritepercent=1 --open_files=-1 --open_read_fault_one_in=32 --ops_per_thread=1000000 --prefixpercent=5 --read_fault_one_in=0 --readpercent=45 --reopen=0 --skip_verifydb=1 --subcompactions=2 --target_file_size_base=524288 --test_batches_snapshots=0 --value_size_mult=32 --write_buffer_size=524288 --writepercent=35 ; do : ; done ``` Reviewed By: pdillinger Differential Revision: D34383069 Pulled By: ajkr fbshipit-source-id: fac26c3b20ea962e75387515ba5f2724dc48719f	2022-02-28 23:45:08 -08:00
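The fallback rule described in the commit above can be sketched in isolation. This is a minimal illustration, not the RocksDB code: `SeekResult` and `ApproximateSizeSketch` are hypothetical names, and the sketch assumes that two successful seeks return offsets in order.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the outcome of one index Seek(): either an
// offset into the data section, or an I/O failure.
struct SeekResult {
  bool ok;
  uint64_t offset;  // meaningful only when ok is true
};

// Applies the fallbacks named in the commit message: a failed first seek is
// treated as offset 0, a failed second seek as data_size. With those
// fallbacks a single failed seek can no longer make start exceed end.
uint64_t ApproximateSizeSketch(const SeekResult& start_seek,
                               const SeekResult& end_seek,
                               uint64_t data_size) {
  const uint64_t start_offset = start_seek.ok ? start_seek.offset : 0;
  const uint64_t end_offset = end_seek.ok ? end_seek.offset : data_size;
  assert(end_offset >= start_offset);
  return end_offset - start_offset;
}
```

With a failed first seek the estimate grows toward the start of the file, and with a failed second seek it grows toward the end, so an I/O error can only overestimate the range.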
stefan-zobel	ddb7620a61	Fix trivial Javadoc omissions (#9534 ) Summary: - fix spelling of `valueSizeSofLimit` and add "param" description in ReadOptions - add 3 missing "return" in RocksDB Pull Request resolved: https://github.com/facebook/rocksdb/pull/9534 Reviewed By: riversand963 Differential Revision: D34131186 Pulled By: mrambacher fbshipit-source-id: 7eb7ec177906052837180b291d67fb1c838c49e1	2022-02-28 11:51:17 -08:00
Andrew Kryczka	9983eecdfb	Dedicate cacheline for DB mutex (#9637 ) Summary: We found a case of cacheline bouncing due to writers locking/unlocking `mutex_` and readers accessing `block_cache_tracer_`. We discovered it only after the issue was fixed by https://github.com/facebook/rocksdb/issues/9462 shifting the `DBImpl` members such that `mutex_` and `block_cache_tracer_` were naturally placed in separate cachelines in our regression testing setup. This PR forces the cacheline alignment of `mutex_` so we don't accidentally reintroduce the problem. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9637 Reviewed By: riversand963 Differential Revision: D34502233 Pulled By: ajkr fbshipit-source-id: 46aa313b7fe83e80c3de254e332b6fb242434c07	2022-02-27 11:36:54 -08:00
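A minimal sketch of giving a mutex its own cacheline, assuming a 64-byte line. The names `CacheAlignedMutexSketch`, `DBImplSketch` and `kAssumedCacheLineSize` are hypothetical and are not taken from the RocksDB source; the real change may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>

// Assumption: 64-byte cachelines (common on x86-64, not universal).
constexpr std::size_t kAssumedCacheLineSize = 64;

// Aligning the wrapper type also pads its size up to a multiple of the
// alignment, so neighbouring members cannot share its cacheline.
struct alignas(kAssumedCacheLineSize) CacheAlignedMutexSketch {
  std::mutex mu;
};

// Hypothetical stand-in for a class with a hot mutex between read-mostly
// members.
struct DBImplSketch {
  int read_mostly_before = 0;
  CacheAlignedMutexSketch mutex_;
  int read_mostly_after = 0;
};

// Returns which cacheline (counted from the start of the object) holds the
// given member.
inline std::size_t CacheLineIndex(const DBImplSketch& db, const void* member) {
  const auto base = reinterpret_cast<std::uintptr_t>(&db);
  const auto addr = reinterpret_cast<std::uintptr_t>(member);
  return (addr - base) / kAssumedCacheLineSize;
}
```

Forcing the alignment in the type, rather than relying on member order, keeps the layout stable when unrelated members are added or moved later.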
Changneng Chen	9ed96703d1	Add support for BlobDB to ldb (#9630 ) Summary: Add the configuration options and help messages of BlobDB to `ldb` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9630 Test Plan: `python ./tools/ldb_test.py` Reviewed By: ltamasi Differential Revision: D34443176 Pulled By: changneng fbshipit-source-id: 5b3f185cdfc2561e06dd37215c7edfbca07dbe80	2022-02-25 23:13:11 -08:00
Hui Xiao	87a8b3c8af	Deflake DBErrorHandlingFSTest.MultiCFWALWriteError (#9496 ) Summary: Context: As part of https://github.com/facebook/rocksdb/pull/6949, file deletion is disabled for faulty database on the IOError of MANIFEST write/sync and [re-enabled again during `DBImpl::Resume()` if all recovery is completed](`e66199d848 (diff-d9341fbe2a5d4089b93b22c5ed7f666bc311b378c26d0786f4b50c290e460187R396)`). Before re-enabling file deletion, it `assert(versions_->io_status().ok());`, which IMO assumes `versions_` is the `version_` in the recovery process. However, this is not necessarily true due to `s = error_handler_.ClearBGError();` happening before that assertion can unblock some foreground thread by [`EventHelpers::NotifyOnErrorRecoveryEnd()`](`3122cb4358/db/error_handler.cc (L552-L553)`) as part of the `ClearBGError()`. That foreground thread can do whatever it wants including closing/reopening the db and clean up that same `versions_`. As a consequence, `assert(versions_->io_status().ok());`, will access `io_status()` of a nullptr and test like `DBErrorHandlingFSTest.MultiCFWALWriteError` becomes flaky. The unblocked foreground thread (in this case, the testing thread) proceeds to [reopen the db](https://github.com/facebook/rocksdb/blob/6.29.fb/db/error_handler_fs_test.cc?fbclid=IwAR1kQOxSbTUmaHQPAGz5jdMHXtDsDFKiFl8rifX-vIz4B23Y0S9jBkssSCg#L1494), where [`versions_` gets reset to nullptr](https://github.com/facebook/rocksdb/blob/6.29.fb/db/db_impl/db_impl.cc?fbclid=IwAR2uRhwBiPKgmE9q_6CM2mzbfwjoRgsGpXOrHruSJUDcAKc9rYZtVSvKdOY#L678) as part of the old db clean-up. 
If this happens right before `assert(versions_->io_status().ok()); ` gets executed in the background thread, then we can see an error like ``` db/db_impl/db_impl.cc:420:5: runtime error: member call on null pointer of type 'rocksdb::VersionSet' assert(versions_->io_status().ok()); ``` Summary: - I proposed to call `s = error_handler_.ClearBGError();` after we know it's fine to wake up the foreground, which I think is right before we LOG `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` - For context, the original https://github.com/facebook/rocksdb/pull/3997 introducing `DBImpl::Resume()` calls `s = error_handler_.ClearBGError();` very close to calling `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` while the later https://github.com/facebook/rocksdb/pull/6949 moves these two calls a bit further apart. - And it seems fine to me that `s = error_handler_.ClearBGError();` happens after `EnableFileDeletions(/*force=*/true);` at least syntax-wise since these two functions are orthogonal. And it also seems okay to me that we re-enable file deletion before `s = error_handler_.ClearBGError();`, which basically is resetting some state variables. - In addition, to preserve the previous behavior of https://github.com/facebook/rocksdb/pull/6949 where the status of re-enabling file deletion is not taken into account in the general status of resuming the db, I separated `enable_file_deletion_s` from the general `s` - In addition, to make `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` clearer, I separated it into its own if-block. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9496 Test Plan: - Manually reproduce the assertion failure in `DBErrorHandlingFSTest.MultiCFWALWriteError` by injecting sleep like below so that it's more likely for `assert(versions_->io_status().ok());` to execute after [reopening the db](https://github.com/facebook/rocksdb/blob/6.29.fb/db/error_handler_fs_test.cc?fbclid=IwAR1kQOxSbTUmaHQPAGz5jdMHXtDsDFKiFl8rifX-vIz4B23Y0S9jBkssSCg#L1494) in the foreground (i.e., testing) thread ``` sleep(1); assert(versions_->io_status().ok()); ``` `python3 gtest-parallel/gtest_parallel.py -r 100 -w 100 rocksdb/error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.MultiCFWALWriteError` ``` [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBErrorHandlingFSTest [ RUN ] DBErrorHandlingFSTest.MultiCFWALWriteError Received signal 11 (Segmentation fault) #0 rocksdb/error_handler_fs_test() [0x5818a4] rocksdb::DBImpl::ResumeImpl(rocksdb::DBRecoverContext) /data/users/huixiao/rocksdb/db/db_impl/db_impl.cc:421 #1 rocksdb/error_handler_fs_test() [0x6379ff] rocksdb::ErrorHandler::RecoverFromBGError(bool) /data/users/huixiao/rocksdb/db/error_handler.cc:600 #2 rocksdb/error_handler_fs_test() [0x7c5362] rocksdb::SstFileManagerImpl::ClearError() /data/users/huixiao/rocksdb/file/sst_file_manager_impl.cc:310 #3 rocksdb/error_handler_fs_test() ``` - The assertion failure does not happen with this PR `python3 gtest-parallel/gtest_parallel.py -r 100 -w 100 rocksdb/error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.MultiCFWALWriteError` `[100/100] DBErrorHandlingFSTest.MultiCFWALWriteError (43785 ms) ` Reviewed By: riversand963, anand1976 Differential Revision: D33990099 Pulled By: hx235 fbshipit-source-id: 2e0259a471fa8892ff177da91b3e1c0792dd7bab	2022-02-25 14:44:46 -08:00
Siddhartha Roychowdhury	21345d2823	Streaming Compression API for WAL compression. (#9619 ) Summary: Implement a streaming compression API (compress/uncompress) to use for WAL compression. The log_writer would use the compress class/API to compress a record before writing it out in chunks. The log_reader would use the uncompress class/API to uncompress the chunks and combine into a single record. Added unit test to verify the API for different sizes/compression types. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9619 Test Plan: make -j24 check Reviewed By: anand1976 Differential Revision: D34437346 Pulled By: sidroyc fbshipit-source-id: b180569ad2ddcf3106380f8758b556cc0ad18382	2022-02-23 23:45:04 -08:00
Bo Wang	f706a9c199	Add a secondary cache implementation based on LRUCache 1 (#9518 ) Summary: RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO. This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518 Test Plan: In this PR, the lru_secondary_cache_test.cc includes the following tests: 1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests. 2. The integration tests with both primary cache and this secondary cache. Follow Up: 1. Statistics (e.g. compression ratio) will be added in another PR. 2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact. Reviewed By: anand1976 Differential Revision: D34430930 Pulled By: gitbw95 fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded	2022-02-23 16:06:27 -08:00
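The idea of a second tier that holds items evicted from the primary cache can be illustrated with a toy two-level LRU. This is only a sketch under stated assumptions: `LruSketch` and `TieredCacheSketch` are hypothetical names, compression is left out, and promoting an entry back to the primary tier on a secondary hit is an assumed policy rather than something the commit message specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Toy LRU map. All names here are hypothetical, not RocksDB classes.
class LruSketch {
 public:
  using Entry = std::pair<std::string, std::string>;

  explicit LruSketch(std::size_t capacity) : capacity_(capacity) {}

  // Inserts (or overwrites) a key as most recently used. Returns true and
  // fills *evicted when the least recently used entry had to be dropped.
  bool Insert(const std::string& key, const std::string& value,
              Entry* evicted) {
    Erase(key);
    items_.emplace_front(key, value);
    index_[key] = items_.begin();
    if (items_.size() <= capacity_) return false;
    *evicted = items_.back();
    index_.erase(evicted->first);
    items_.pop_back();
    return true;
  }

  // Copies the value into *value and marks the key as most recently used.
  bool Lookup(const std::string& key, std::string* value) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    items_.splice(items_.begin(), items_, it->second);
    *value = it->second->second;
    return true;
  }

  bool Contains(const std::string& key) const {
    return index_.count(key) != 0;
  }

  void Erase(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return;
    items_.erase(it->second);
    index_.erase(it);
  }

 private:
  std::size_t capacity_;
  std::list<Entry> items_;
  std::unordered_map<std::string, std::list<Entry>::iterator> index_;
};

// Primary tier backed by a secondary tier that receives primary evictions.
class TieredCacheSketch {
 public:
  TieredCacheSketch(std::size_t primary_capacity,
                    std::size_t secondary_capacity)
      : primary_(primary_capacity), secondary_(secondary_capacity) {}

  void Insert(const std::string& key, const std::string& value) {
    LruSketch::Entry evicted;
    if (primary_.Insert(key, value, &evicted)) {
      LruSketch::Entry dropped;  // anything evicted here leaves both tiers
      secondary_.Insert(evicted.first, evicted.second, &dropped);
    }
  }

  // Assumed policy: a secondary hit moves the entry back to the primary tier.
  bool Lookup(const std::string& key, std::string* value) {
    if (primary_.Lookup(key, value)) return true;
    if (!secondary_.Lookup(key, value)) return false;
    secondary_.Erase(key);
    Insert(key, *value);
    return true;
  }

  bool InPrimary(const std::string& key) const {
    return primary_.Contains(key);
  }
  bool InSecondary(const std::string& key) const {
    return secondary_.Contains(key);
  }

 private:
  LruSketch primary_;
  LruSketch secondary_;
};
```

In the setup the commit describes, the secondary tier would store compressed blocks, so the same memory budget holds more entries at the cost of decompression on a hit.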
Yanqin Jin	6f12599863	Support WBWI for keys having timestamps (#9603 ) Summary: This PR supports inserting keys to a `WriteBatchWithIndex` for column families that enable user-defined timestamps and reading the keys back. The index does not have timestamps. Writing a key to WBWI is unchanged, because the underlying WriteBatch already supports it. When reading the keys back, we need to make sure to distinguish between keys with and without timestamps before comparison. When user calls `GetFromBatchAndDB()`, no timestamp is needed to query the batch, but a timestamp has to be provided to query the db. The assumption is that data in the batch must be newer than data from the db. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9603 Test Plan: make check Reviewed By: ltamasi Differential Revision: D34354849 Pulled By: riversand963 fbshipit-source-id: d25d1f84e2240ce543e521fa30595082fb8db9a0	2022-02-22 14:23:01 -08:00
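Comparing a key that carries a timestamp with one that does not can be sketched as follows. The sketch assumes the timestamp is a fixed-size suffix of the key; the function names are hypothetical and this is not the WBWI comparator itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the key without its trailing timestamp. ts_sz == 0 means the key
// carries no timestamp. Assumption: the timestamp is a fixed-size suffix.
std::string StripTimestampSketch(const std::string& key, std::size_t ts_sz) {
  assert(key.size() >= ts_sz);
  return key.substr(0, key.size() - ts_sz);
}

// Compares two keys on their user-key bytes only, so a key read back with a
// timestamp can be matched against an index entry stored without one.
int CompareIgnoringTimestampSketch(const std::string& a, std::size_t a_ts_sz,
                                   const std::string& b, std::size_t b_ts_sz) {
  return StripTimestampSketch(a, a_ts_sz)
      .compare(StripTimestampSketch(b, b_ts_sz));
}
```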
Andrew Kryczka	8ca433f912	Fix test race conditions with OnFlushCompleted() (#9617 ) Summary: We often see flaky tests due to `DB::Flush()` or `DBImpl::TEST_WaitForFlushMemTable()` not waiting until event listeners complete. For example, https://github.com/facebook/rocksdb/issues/9084, https://github.com/facebook/rocksdb/issues/9400, https://github.com/facebook/rocksdb/issues/9528, plus two new ones this week: "EventListenerTest.OnSingleDBFlushTest" and "DBFlushTest.FireOnFlushCompletedAfterCommittedResult". I ran a `make check` with the below race condition-coercing patch and fixed issues it found besides old BlobDB. ``` diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc index 0e1864788..aaba68c4a 100644 --- a/db/db_impl/db_impl_compaction_flush.cc +++ b/db/db_impl/db_impl_compaction_flush.cc @@ -861,6 +861,8 @@ void DBImpl::NotifyOnFlushCompleted( mutable_cf_options.level0_stop_writes_trigger); // release lock while notifying events mutex_.Unlock(); + bg_cv_.SignalAll(); + sleep(1); { for (auto& info : *flush_jobs_info) { info->triggered_writes_slowdown = triggered_writes_slowdown; ``` The reason I did not fix old BlobDB issues is because it appears to have a fundamental (non-test) issue. In particular, it uses an EventListener to keep track of the files. OnFlushCompleted() could be delayed until even after a compaction involving that flushed file completes, causing the compaction to unexpectedly delete an untracked file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9617 Test Plan: `make check` including the race condition coercing patch Reviewed By: hx235 Differential Revision: D34384022 Pulled By: ajkr fbshipit-source-id: 2652ded39b415277c5d6a628414345223930514e	2022-02-22 12:23:00 -08:00
Andrew Kryczka	96978e4d96	Enable core dumps in TSAN/UBSAN crash tests (#9616 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9616 Reviewed By: hx235 Differential Revision: D34383489 Pulled By: ajkr fbshipit-source-id: e4299000ef38073ec57e6ab5836150fdf8ce43d4	2022-02-22 12:23:00 -08:00
anand76	d795a730be	Combine data members of IOStatus with Status (#9549 ) Summary: Combine the data members retryable_, data_loss_ and scope_ of IOStatus with Status, as protected members. IOStatus is now defined as a derived class of Status with no new data, but additional methods. This will allow us to eventually track the result of FileSystem calls in RocksDB with one variable instead of two. Benchmark commands and results are below. The performance after changes seems slightly better. ```./db_bench -db=/data/mysql/rocksdb/prefix_scan -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216``` ```./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,seekrandom,readseq" -key_size=32 -value_size=512 -num=5000000 -seek_nexts=10000 -use_direct_reads=true -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=false -threads=1 -cache_size=10485760000``` Before - seekrandom : 3715.432 micros/op 269 ops/sec; 1394.9 MB/s (16149 of 16149 found) seekrandom : 3687.177 micros/op 271 ops/sec; 1405.6 MB/s (16273 of 16273 found) seekrandom : 3709.646 micros/op 269 ops/sec; 1397.1 MB/s (16175 of 16175 found) readseq : 0.369 micros/op 2711321 ops/sec; 1406.6 MB/s readseq : 0.363 micros/op 2754092 ops/sec; 1428.8 MB/s readseq : 0.372 micros/op 2688046 ops/sec; 1394.6 MB/s After - seekrandom : 3606.830 micros/op 277 ops/sec; 1436.9 MB/s (16636 of 16636 found) seekrandom : 3594.467 micros/op 278 ops/sec; 1441.9 MB/s (16693 of 16693 found) seekrandom : 3597.919 micros/op 277 ops/sec; 1440.5 MB/s (16677 of 16677 found) readseq : 0.354 micros/op 2822809 ops/sec; 1464.5 MB/s readseq : 0.358 micros/op 2795080 ops/sec; 1450.1 MB/s readseq : 0.354 micros/op 2822889 ops/sec; 1464.5 MB/s Pull Request resolved: https://github.com/facebook/rocksdb/pull/9549 Reviewed By: pdillinger Differential Revision: D34310362 Pulled By: anand1976 
fbshipit-source-id: 54b27756edf9c9ecfe730a2dce542a7a46743096	2022-02-22 11:23:01 -08:00
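The layout described above, a derived class that adds methods but no data, can be sketched like this. `StatusSketch` and `IOStatusSketch` are hypothetical simplifications; only the member names `retryable_`, `data_loss_` and `scope_` come from the commit message, and their types here are assumptions.

```cpp
#include <cassert>

// Base class owns all the data, including the fields that only the
// I/O-specific subclass exposes.
class StatusSketch {
 public:
  bool ok() const { return code_ == 0; }

 protected:
  int code_ = 0;
  bool retryable_ = false;
  bool data_loss_ = false;
  unsigned char scope_ = 0;
};

// Derived class adds accessors but no new data members.
class IOStatusSketch : public StatusSketch {
 public:
  void SetRetryable(bool retryable) { retryable_ = retryable; }
  bool GetRetryable() const { return retryable_; }
  void SetDataLoss(bool data_loss) { data_loss_ = data_loss; }
  bool GetDataLoss() const { return data_loss_; }
};

// Because the derived class has no data of its own, both types have the same
// size, and a derived object can be tracked through a single base variable.
static_assert(sizeof(IOStatusSketch) == sizeof(StatusSketch),
              "derived class must not add data");
```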
Patrick Somaru	ba65cfff63	configure microbenchmarks, regenerate targets (#9599 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9599 Reviewed By: jay-zhuang, hodgesds Differential Revision: D34214408 fbshipit-source-id: 6932200772f52ce77e550646ee3d1a928295844a	2022-02-22 09:24:51 -08:00
Andrew Kryczka	3379d1466f	Fix DBTest2.BackupFileTemperature memory leak (#9610 ) Summary: Valgrind was failing with the below error because we forgot to destroy the `BackupEngine` object: ``` ==421173== Command: ./db_test2 --gtest_filter=DBTest2.BackupFileTemperature ==421173== Note: Google Test filter = DBTest2.BackupFileTemperature [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBTest2 [ RUN ] DBTest2.BackupFileTemperature --421173-- WARNING: unhandled amd64-linux syscall: 425 --421173-- You may be able to write your own handler. --421173-- Read the file README_MISSING_SYSCALL_OR_IOCTL. --421173-- Nevertheless we consider this a bug. Please report --421173-- it at http://valgrind.org/support/bug_reports.html. [ OK ] DBTest2.BackupFileTemperature (3366 ms) [----------] 1 test from DBTest2 (3371 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (3413 ms total) [ PASSED ] 1 test. 
==421173== ==421173== HEAP SUMMARY: ==421173== in use at exit: 13,042 bytes in 195 blocks ==421173== total heap usage: 26,022 allocs, 25,827 frees, 27,555,265 bytes allocated ==421173== ==421173== 8 bytes in 1 blocks are possibly lost in loss record 6 of 167 ==421173== at 0x4838DBF: operator new(unsigned long) (vg_replace_malloc.c:344) ==421173== by 0x8D4606: allocate (new_allocator.h:114) ==421173== by 0x8D4606: allocate (alloc_traits.h:445) ==421173== by 0x8D4606: _M_allocate (stl_vector.h:343) ==421173== by 0x8D4606: reserve (vector.tcc:78) ==421173== by 0x8D4606: rocksdb::BackupEngineImpl::Initialize() (backupable_db.cc:1174) ==421173== by 0x8D5473: Initialize (backupable_db.cc:918) ==421173== by 0x8D5473: rocksdb::BackupEngine::Open(rocksdb::BackupEngineOptions const&, rocksdb::Env, rocksdb::BackupEngine*) (backupable_db.cc:937) ==421173== by 0x50AC8F: Open (backup_engine.h:585) ==421173== by 0x50AC8F: rocksdb::DBTest2_BackupFileTemperature_Test::TestBody() (db_test2.cc:6996) ... ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9610 Test Plan: ``` $ make -j24 ROCKSDBTESTS_SUBSET=db_test2 valgrind_check_some ``` Reviewed By: akankshamahajan15 Differential Revision: D34371210 Pulled By: ajkr fbshipit-source-id: 68154fcb0c51b28222efa23fa4ee02df8d925a18	2022-02-21 19:23:19 -08:00
Andrew Kryczka	7ae4da924a	Update HISTORY.md and version.h for 7.0 release (#9609 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9609 Reviewed By: riversand963 Differential Revision: D34370309 Pulled By: ajkr fbshipit-source-id: 5fc9306439aefa4b2d61d847534ea6758c30b6a5	2022-02-20 15:22:54 -08:00
Akanksha Mahajan	3699b171e4	Change enum SizeApproximationFlags to enum class (#9604 ) Summary: Change enum SizeApproximationFlags to enum class and add overloaded operators for the transition between enum class and uint8_t Pull Request resolved: https://github.com/facebook/rocksdb/pull/9604 Test Plan: Circle CI jobs Reviewed By: riversand963 Differential Revision: D34360281 Pulled By: akankshamahajan15 fbshipit-source-id: 6351dfdb717ae3c4530d324c3d37a8ecb01dd1ef	2022-02-18 20:22:57 -08:00
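A scoped enum loses the implicit integer conversions that flag-style code relies on, which is why overloaded operators are needed. A minimal sketch of the pattern, using a hypothetical `FlagsSketch` type rather than the actual RocksDB definition:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag set with an explicit uint8_t underlying type.
enum class FlagsSketch : uint8_t {
  kNone = 0,
  kIncludeMemtables = 1 << 0,
  kIncludeFiles = 1 << 1,
};

// Bitwise operators are not provided for enum class, so define them by
// round-tripping through the underlying type.
inline FlagsSketch operator|(FlagsSketch a, FlagsSketch b) {
  return static_cast<FlagsSketch>(static_cast<uint8_t>(a) |
                                  static_cast<uint8_t>(b));
}

inline FlagsSketch operator&(FlagsSketch a, FlagsSketch b) {
  return static_cast<FlagsSketch>(static_cast<uint8_t>(a) &
                                  static_cast<uint8_t>(b));
}

// True when any bit of `flag` is set in `value`.
inline bool HasFlag(FlagsSketch value, FlagsSketch flag) {
  return static_cast<uint8_t>(value & flag) != 0;
}
```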
Jay Zhuang	d3a2f284d9	Add Temperature info in `NewSequentialFile()` (#9499 ) Summary: Add Temperature hints information from RocksDB in API `NewSequentialFile()`. backup and checkpoint operations need to open the source files with `NewSequentialFile()`, which will have the temperature hints. Other operations are not covered. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9499 Test Plan: Added unittest Reviewed By: pdillinger Differential Revision: D34006115 Pulled By: jay-zhuang fbshipit-source-id: 568b34602b76520e53128672bd07e9d886786a2f	2022-02-18 18:23:07 -08:00
Akanksha Mahajan	559525dcbb	Add Async Read and Poll APIs in FileSystem (#9564 ) Summary: This PR adds support for two new APIs: Async Read, which reads data asynchronously, and Poll, which checks whether a requested read has completed. Usage: In RocksDB, we are currently planning to prefetch data asynchronously during sequential scanning, and RocksDB will call these APIs to prefetch more data in advance. Design: - The ReadAsync API submits the read request to the underlying FileSystem in order to read data asynchronously. When the read request is completed, the callback function will be called. cb_arg is used by RocksDB to track the original request submitted, and IOHandle is used by FileSystem to keep track of IO requests at its level. - The Poll API is added in FileSystem because the call could end up handling completions for multiple different files, which is not specific to a FSRandomAccessFile instance. There could be multiple outstanding file reads from different files in the future, and they can complete in any order. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9564 Test Plan: Test will be added in separate PR. Reviewed By: anand1976 Differential Revision: D34226216 Pulled By: akankshamahajan15 fbshipit-source-id: 95e64edafb17f543f7232421d51e2665a3267f69	2022-02-18 17:23:18 -08:00
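The submit-then-poll shape described above can be illustrated with a single-threaded toy. Everything here is hypothetical: `AsyncFileSystemSketch`, its method signatures and the `max_completions` parameter are invented for illustration and do not match the real FileSystem interface, and the toy performs the read at poll time instead of in the background.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical read request: a byte range within one in-memory "file".
struct ReadRequestSketch {
  std::size_t offset;
  std::size_t len;
};

class AsyncFileSystemSketch {
 public:
  // The callback receives the data plus the caller's opaque cb_arg, which
  // lets the caller map a completion back to the request it submitted.
  using Callback = std::function<void(const std::string& data, void* cb_arg)>;

  explicit AsyncFileSystemSketch(std::string contents)
      : contents_(std::move(contents)) {}

  // Queues the request and returns immediately; no data is delivered yet.
  void ReadAsync(const ReadRequestSketch& req, Callback cb, void* cb_arg) {
    pending_.push_back(Pending{req, std::move(cb), cb_arg});
  }

  // Completes up to max_completions queued requests, invoking their
  // callbacks, and returns how many were completed.
  std::size_t Poll(std::size_t max_completions) {
    std::size_t done = 0;
    while (done < max_completions && !pending_.empty()) {
      Pending p = std::move(pending_.front());
      pending_.pop_front();
      p.cb(contents_.substr(p.req.offset, p.req.len), p.cb_arg);
      ++done;
    }
    return done;
  }

 private:
  struct Pending {
    ReadRequestSketch req;
    Callback cb;
    void* cb_arg;
  };

  std::string contents_;
  std::deque<Pending> pending_;
};
```

Keeping the poll call on the file system object rather than on a single file mirrors the design note in the commit: one call can then drain completions belonging to several files.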
Bo Wang	67f071fade	Fixes #9565 (#9586 ) Summary: [Compaction::IsTrivialMove](`a2b9be42b6/db/compaction/compaction.cc (L318)`) checks whether allow_trivial_move is set, and if so it returns the value of is_trivial_move_. The allow_trivial_move option is there for universal compaction. So when this is set and leveled compaction is enabled, then useful code that follows this block never gets a chance to run. A check that [compaction_style == kCompactionStyleUniversal](`320d9a8e8a/db/db_impl/db_impl_compaction_flush.cc (L1030)`) should be added to avoid doing the wrong thing for leveled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9586 Test Plan: To reproduce this: First edit db/compaction/compaction.cc with ``` diff --git a/db/compaction/compaction.cc b/db/compaction/compaction.cc index 7ae50b91e..52dd489b1 100644 --- a/db/compaction/compaction.cc +++ b/db/compaction/compaction.cc @@ -319,6 +319,8 @@ bool Compaction::IsTrivialMove() const { // input files are non overlapping if ((mutable_cf_options_.compaction_options_universal.allow_trivial_move) && (output_level_ != 0)) { + printf("IsTrivialMove:: return %d because universal allow_trivial_move\n", (int) is_trivial_move_); + // abort(); return is_trivial_move_; } ``` And then run ``` ./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/m/rx --wal_dir=/data/m/rx --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 
--delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --seed=1641328309 --universal_allow_trivial_move=1 ``` Example output with the debug code added ``` IsTrivialMove:: return 0 because universal allow_trivial_move IsTrivialMove:: return 0 because universal allow_trivial_move ``` After this PR, the bug is fixed. Reviewed By: ajkr Differential Revision: D34350451 Pulled By: gitbw95 fbshipit-source-id: 3232005cc47c40a7e75d316cfc7960beb5bdff3a	2022-02-18 14:23:07 -08:00
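The guard proposed in the commit above can be sketched as follows. `CompactionSketch` and its fields are hypothetical, and `leveled_checks_pass` stands in for the checks that follow the early return in the real function.

```cpp
#include <cassert>

enum class CompactionStyleSketch { kLevel, kUniversal };

// Hypothetical, flattened view of the inputs the decision depends on.
struct CompactionSketch {
  CompactionStyleSketch style;
  bool universal_allow_trivial_move;  // the universal-only option
  int output_level;
  bool is_trivial_move;      // precomputed answer for universal compaction
  bool leveled_checks_pass;  // stand-in for the code after the early return
};

// The universal shortcut is taken only when the compaction style really is
// universal; leveled compaction always falls through to its own checks, even
// if the universal option happens to be set.
bool IsTrivialMoveSketch(const CompactionSketch& c) {
  if (c.style == CompactionStyleSketch::kUniversal &&
      c.universal_allow_trivial_move && c.output_level != 0) {
    return c.is_trivial_move;
  }
  return c.leveled_checks_pass;
}
```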
pat somaru	736bc83270	fix issue with buckifier update (#9602 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9602 Reviewed By: jay-zhuang Differential Revision: D34350406 Pulled By: likewhatevs fbshipit-source-id: caa81f272a429fbf7293f0588ea24cc53b29ee98	2022-02-18 14:23:07 -08:00
Jay Zhuang	f4b2500e12	Add last level and non-last level read statistics (#9519 ) Summary: Add last level and non-last level read statistics: ``` LAST_LEVEL_READ_BYTES, LAST_LEVEL_READ_COUNT, NON_LAST_LEVEL_READ_BYTES, NON_LAST_LEVEL_READ_COUNT, ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9519 Test Plan: added unittest Reviewed By: siying Differential Revision: D34062539 Pulled By: jay-zhuang fbshipit-source-id: 908644c3050878b4234febdc72e3e19d89af38cd	2022-02-18 14:23:07 -08:00
mrambacher	30b08878d8	Make FilterPolicy Customizable (#9590 ) Summary: Make FilterPolicy into a Customizable class. Allow new FilterPolicy to be discovered through the ObjectRegistry Pull Request resolved: https://github.com/facebook/rocksdb/pull/9590 Reviewed By: pdillinger Differential Revision: D34327367 Pulled By: mrambacher fbshipit-source-id: 37e7edac90ec9457422b72f359ab8ef48829c190	2022-02-18 13:22:31 -08:00
Patrick Somaru	f066b5cecb	update buckifier, add support for microbenchmarks (#9598 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9598 Reviewed By: jay-zhuang, hodgesds Differential Revision: D34130191 fbshipit-source-id: e5413f7d6af70a66940022d153b64a3383eccff1	2022-02-18 11:23:18 -08:00
Jay Zhuang	2fbc672732	Add temperature information to the event listener callbacks (#9591 ) Summary: RocksDB tries to provide temperature information in the event listener callbacks. The information is not guaranteed, as some operations, like backup, won't have this information. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9591 Test Plan: Added unittest Reviewed By: siying, pdillinger Differential Revision: D34309339 Pulled By: jay-zhuang fbshipit-source-id: 4aca4f270f99fa49186d85d300da42594663d6d7	2022-02-18 11:23:18 -08:00
Andrew Kryczka	54fb2a8975	Change type of cache buffer passed to `Cache::CreateCallback()` to `const void*` (#9595 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9595 Reviewed By: riversand963 Differential Revision: D34329906 Pulled By: ajkr fbshipit-source-id: 508601856fa9bee4d40f4a68d14d333ef2143d40	2022-02-17 21:09:56 -08:00
Peter Dillinger	48b9de4a3e	Mark more OldDefaults as deprecated (#9594 ) Summary: `ColumnFamilyOptions::OldDefaults` and `DBOptions::OldDefaults` now deprecated. Were previously overlooked with `Options::OldDefaults` in https://github.com/facebook/rocksdb/issues/9363 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9594 Test Plan: comments only Reviewed By: jay-zhuang Differential Revision: D34318592 Pulled By: pdillinger fbshipit-source-id: 773c97a61e2a8290ae154f363dd61c1f35a9dd16	2022-02-17 20:28:10 -08:00
Alan Paxton	ce84e50288	Plugin java jni support (#9575 ) Summary: Extend the plugin architecture to allow for the inclusion, building and testing of Java and JNI components of a plugin. This will cause the JAR built by `$ make rocksdbjava` to include the extra functionality provided by the plugin, and will cause `$ make jtest` to add the java tests provided by the plugin to the tests built and run by Java testing. The plugin's `<plugin>.mk` file can define: ``` <plugin>_JNI_NATIVE_SOURCES <plugin>_NATIVE_JAVA_CLASSES <plugin>_JAVA_TESTS ``` The plugin should provide java/src, java/test and java/rocksjni directories. When a plugin is required to be built it must be named in the ROCKSDB_PLUGINS environment variable (as per the plugin architecture). This now has the effect of adding the files specified by the above definitions to the appropriate parts of the build. An example of a plugin with a Java component can be found as part of the hdfs plugin in https://github.com/riversand963/rocksdb-hdfs-env - at the time of writing the Java part of this fails tests, and needs a little work to complete, but it builds correctly under the plugin model. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9575 Reviewed By: hx235 Differential Revision: D34253948 Pulled By: riversand963 fbshipit-source-id: b3dde5da06f3d3c25c54246892097ae2a369b42d	2022-02-17 19:39:23 -08:00
Peter Dillinger	561be005ba	Some better API and other comments (#9533 ) Summary: Various comments, mostly about SliceTransform + prefix extractors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9533 Test Plan: comments only Reviewed By: ajkr Differential Revision: D34094367 Pulled By: pdillinger fbshipit-source-id: 9742ce3b89ef7fd5c5e748fec862e6361ed44e95	2022-02-17 18:51:08 -08:00
Alan Paxton	8d9c203f69	Remove previously deprecated Java where RocksDB also removed it, or where no direct equivalent existed. (#9576 ) Summary: For RocksDB v7 major release. Remove previously deprecated Java API methods and associated tests - where equivalent/alternative functionality exists and is already tested AND - where the core RocksDB function/feature has also been removed - OR the functionality exists only in Java so the previous deprecation only affected Java methods RETAIN deprecated Java which reflects functionality which is deprecated by, but also still supported by, the core of RocksDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9576 Reviewed By: ajkr Differential Revision: D34314983 Pulled By: jay-zhuang fbshipit-source-id: 7cf9c17e3e07be9d289beb99f81b71e8e09ac403	2022-02-17 17:29:35 -08:00
Peter Dillinger	725833a424	Hide FilterBits{Builder,Reader} from public API (#9592 ) Summary: We don't have any evidence of people using these to build custom filters. The recommended way of customizing filter handling is to defer to various built-in policies based on FilterBuildingContext (e.g. to build Monkey filtering policy). With old API, we have evidence of people modifying keys going into filter, but most cases of that can be handled with prefix_extractor. Having FilterBitsBuilder+Reader in the public API is an ongoing hindrance to code evolution (e.g. recent new Finish and MaybePostVerify), and so this change removes them from the public API for 7.0. Maybe they will come back in some form later, but lacking evidence of them providing value in the public API, we want to take back more freedom to evolve these. With this moved to internal-only, there is no rush to clean up the complex Finish signatures, or add memory allocator support, but doing so is much easier with them out of public API, for example to use CacheAllocationPtr without exposing it in the public API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9592 Test Plan: cosmetic changes only Reviewed By: hx235 Differential Revision: D34315470 Pulled By: pdillinger fbshipit-source-id: 03e03bb66a72c73df2c464d2dbbbae906dd8f99b	2022-02-17 16:34:46 -08:00
anand76	627deb7ceb	Fix some MultiGet batching stats (#9583 ) Summary: The NUM_INDEX_AND_FILTER_BLOCKS_READ_PER_LEVEL, NUM_DATA_BLOCKS_READ_PER_LEVEL, and NUM_SST_READ_PER_LEVEL stats were being recorded only when the last file in a level happened to have hits. They are supposed to be updated for every level. Also, there was some overcounting of GetContextStats. This PR fixes both the problems. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9583 Test Plan: Update the unit test in db_basic_test Reviewed By: akankshamahajan15 Differential Revision: D34308044 Pulled By: anand1976 fbshipit-source-id: b3b36020fda26ba91bc6e0e47d52d58f4d7f656e	2022-02-17 16:31:41 -08:00
Siddhartha Roychowdhury	39b0d92153	Add record to set WAL compression type if enabled (#9556 ) Summary: When WAL compression is enabled, add a record (new record type) to store the compression type to indicate that all subsequent records are compressed. The log reader will store the compression type when this record is encountered and use the type to uncompress the subsequent records. Compress and uncompress to be implemented in subsequent diffs. Enabled WAL compression in some WAL tests to check for regressions. Some tests that rely on offsets have been disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9556 Reviewed By: anand1976 Differential Revision: D34308216 Pulled By: sidroyc fbshipit-source-id: 7f10595e46f3277f1ea2d309fbf95e2e935a8705	2022-02-17 16:19:31 -08:00
Jay Zhuang	f092f0fa5d	Add subcompaction event API (#9311 ) Summary: Add event callback for subcompaction and adds a sub_job_id to identify it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9311 Reviewed By: ajkr Differential Revision: D33892707 Pulled By: jay-zhuang fbshipit-source-id: 57b5e5e594d61b2112d480c18a79a36751f65a4e	2022-02-17 15:47:10 -08:00
Peter Dillinger	a86ee02d34	Clarify compiler support release note (#9593 ) Summary: in HISTORY.md Pull Request resolved: https://github.com/facebook/rocksdb/pull/9593 Test Plan: release note only Reviewed By: siying Differential Revision: D34318189 Pulled By: pdillinger fbshipit-source-id: ba2eca8bede2d42a3fefd10b954b92cb54f831f2	2022-02-17 15:39:17 -08:00
Alan Paxton	36ce2e2a0a	Update build files for java8 build (#9541 ) Summary: For RocksJava 7 we will move from requiring Java 7 to Java 8. * This simplifies the `Makefile` as we no longer need to deal with Java 7; so we no longer use `javah`. * Added a java-version target which is invoked by the java target, and which exits if the version of java being used is not 8 or greater. * Enforces java 8 as a minimum. * Fixed CMake build. * Fixed broken java event listener test, as the test was broken and the assertions in the callbacks were not causing assertions in the tests. The callbacks now queue up assertion errors for the main thread of the tests to check. * Fixed C++ dangling pointers in the test code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9541 Reviewed By: pdillinger Differential Revision: D34214929 Pulled By: jay-zhuang fbshipit-source-id: fdff348758d0a23a742e83c87d5f54073ce16ca6	2022-02-17 13:29:21 -08:00
Adam Retter	5e64407923	Support C++17 Docker build environments for RocksJava (#9500 ) Summary: See https://github.com/facebook/rocksdb/issues/9388#issuecomment-1029583789 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9500 Reviewed By: pdillinger Differential Revision: D34114687 Pulled By: jay-zhuang fbshipit-source-id: 22129d99ccd0dba7e8f1b263ddc5520d939641bf	2022-02-17 12:48:38 -08:00
Andrew Kryczka	babe56ddba	Add rate limiter priority to ReadOptions (#9424 ) Summary: Users can set the priority for file reads associated with their operation by setting `ReadOptions::rate_limiter_priority` to something other than `Env::IO_TOTAL`. Rate limiting `VerifyChecksum()` and `VerifyFileChecksums()` is the motivation for this PR, so it also includes benchmarks and minor bug fixes to get that working. `RandomAccessFileReader::Read()` already had support for rate limiting compaction reads. I changed that rate limiting to be non-specific to compaction, but rather performed according to the passed in `Env::IOPriority`. Now the compaction read rate limiting is supported by setting `rate_limiter_priority = Env::IO_LOW` on its `ReadOptions`. There is no default value for the new `Env::IOPriority` parameter to `RandomAccessFileReader::Read()`. That means this PR goes through all callers (in some cases multiple layers up the call stack) to find a `ReadOptions` to provide the priority. There are TODOs for cases I believe it would be good to let user control the priority some day (e.g., file footer reads), and no TODO in cases I believe it doesn't matter (e.g., trace file reads). The API doc only lists the missing cases where a file read associated with a provided `ReadOptions` cannot be rate limited. For cases like file ingestion checksum calculation, there is no API to provide `ReadOptions` or `Env::IOPriority`, so I didn't count that as missing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9424 Test Plan: - new unit tests - new benchmarks on ~50MB database with 1MB/s read rate limit and 100ms refill interval; verified with strace reads are chunked (at 0.1MB per chunk) and spaced roughly 100ms apart. 
- setup command: `./db_bench -benchmarks=fillrandom,compact -db=/tmp/testdb -target_file_size_base=1048576 -disable_auto_compactions=true -file_checksum=true` - benchmarks command: `strace -ttfe pread64 ./db_bench -benchmarks=verifychecksum,verifyfilechecksums -use_existing_db=true -db=/tmp/testdb -rate_limiter_bytes_per_sec=1048576 -rate_limit_bg_reads=1 -rate_limit_user_ops=true -file_checksum=true` - crash test using IO_USER priority on non-validation reads with https://github.com/facebook/rocksdb/issues/9567 reverted: `python3 tools/db_crashtest.py blackbox --max_key=1000000 --write_buffer_size=524288 --target_file_size_base=524288 --level_compaction_dynamic_level_bytes=true --duration=3600 --rate_limit_bg_reads=true --rate_limit_user_ops=true --rate_limiter_bytes_per_sec=10485760 --interval=10` Reviewed By: hx235 Differential Revision: D33747386 Pulled By: ajkr fbshipit-source-id: a2d985e97912fba8c54763798e04f006ccc56e0c	2022-02-16 23:18:14 -08:00
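The chunk size reported in the test plan above follows from the configured rate and the refill interval. A minimal sketch of that arithmetic, assuming a limiter that splits the per-second budget evenly across refill periods; `BytesPerRefill` and `ChunksNeeded` are hypothetical helper names for illustration, not RocksDB APIs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: byte budget granted per refill period, assuming the
// per-second rate is divided evenly across refill intervals.
inline int64_t BytesPerRefill(int64_t rate_bytes_per_sec,
                              int64_t refill_period_ms) {
  assert(rate_bytes_per_sec > 0 && refill_period_ms > 0);
  return rate_bytes_per_sec * refill_period_ms / 1000;
}

// Hypothetical helper: number of chunks a read of `read_bytes` is split into
// when each chunk carries at most `chunk_bytes`.
inline int64_t ChunksNeeded(int64_t read_bytes, int64_t chunk_bytes) {
  assert(chunk_bytes > 0);
  return (read_bytes + chunk_bytes - 1) / chunk_bytes;  // ceiling division
}
```

With `-rate_limiter_bytes_per_sec=1048576` and a 100 ms refill interval, `BytesPerRefill(1048576, 100)` gives 104857 bytes, roughly the 0.1MB per chunk seen in the strace output.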
Yanqin Jin	1cda273dc3	Fix a silent data loss for write-committed txn (#9571 ) Summary: The following sequence of events can cause silent data loss for write-committed transactions. ``` Time thread 1 bg flush \| db->Put("a") \| txn = NewTxn() \| txn->Put("b", "v") \| txn->Prepare() // writes only to 5.log \| db->SwitchMemtable() // memtable 1 has "a" \| // close 5.log, \| // creates 8.log \| trigger flush \| pick memtable 1 \| unlock db mutex \| write new sst \| txn->ctwb->Put("gtid", "1") // writes 8.log \| txn->Commit() // writes to 8.log \| // writes to memtable 2 \| compute min_log_number_to_keep_2pc, this \| will be 8 (incorrect). \| \| Purge obsolete wals, including 5.log \| V ``` At this point, writes of txn exists only in memtable. Close db without flush because db thinks the data in memtable are backed by log. Then reopen, the writes are lost except key-value pair {"gtid"->"1"}, only the commit marker of txn is in 8.log The reason lies in `PrecomputeMinLogNumberToKeep2PC()` which calls `FindMinPrepLogReferencedByMemTable()`. In the above example, when bg flush thread tries to find obsolete wals, it uses the information computed by `PrecomputeMinLogNumberToKeep2PC()`. The return value of `PrecomputeMinLogNumberToKeep2PC()` depends on three components - `PrecomputeMinLogNumberToKeepNon2PC()`. This represents the WAL that has unflushed data. As the name of this method suggests, it does not account for 2PC. Although the keys reside in the prepare section of a previous WAL, the column family references the current WAL when they are actually inserted into the memtable during txn commit. - `prep_tracker->FindMinLogContainingOutstandingPrep()`. This represents the WAL with a prepare section but the txn hasn't committed. - `FindMinPrepLogReferencedByMemTable()`. This represents the WAL on which some memtables (mutable and immutable) depend for their unflushed data. The bug lies in `FindMinPrepLogReferencedByMemTable()`. 
Originally, this function skips checking the column families that are being flushed, but the unit test added in this PR shows that they should not be. In this unit test, there is only the default column family, and one of its memtables has unflushed data backed by a prepare section in 5.log. We should return this information via `FindMinPrepLogReferencedByMemTable()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9571 Test Plan: ``` ./transaction_test --gtest_filter=/TransactionTest.SwitchMemtableDuringPrepareAndCommit_WC/ make check ``` Reviewed By: siying Differential Revision: D34235236 Pulled By: riversand963 fbshipit-source-id: 120eb21a666728a38dda77b96276c6af72b008b1	2022-02-16 23:08:58 -08:00
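The effect of skipping column families that are being flushed can be illustrated with a toy model of the scenario above. This is a sketch under stated assumptions, not the RocksDB implementation: `ToyColumnFamily`, `ToyFindMinPrepLog`, and `ToyMinLogNumberToKeep` are hypothetical names, and a value of 0 is assumed to mean "no WAL referenced".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Toy column family: whether a flush is in progress, and for each memtable
// the WAL number whose prepare section backs its unflushed data (0 = none).
struct ToyColumnFamily {
  bool flush_in_progress;
  std::vector<uint64_t> memtable_prep_logs;
};

// Smallest prepare-section WAL referenced by any memtable, or 0 if none.
// `skip_flushing_cfs = true` mimics the behavior described as buggy above.
uint64_t ToyFindMinPrepLog(const std::vector<ToyColumnFamily>& cfs,
                           bool skip_flushing_cfs) {
  uint64_t min_log = 0;
  for (const ToyColumnFamily& cf : cfs) {
    if (skip_flushing_cfs && cf.flush_in_progress) continue;
    for (uint64_t log : cf.memtable_prep_logs) {
      if (log != 0 && (min_log == 0 || log < min_log)) min_log = log;
    }
  }
  return min_log;
}

// Minimum over the three components listed above; 0 means "not applicable".
uint64_t ToyMinLogNumberToKeep(uint64_t non_2pc_min,
                               uint64_t outstanding_prep_min,
                               uint64_t memtable_prep_min) {
  uint64_t result = non_2pc_min;
  if (outstanding_prep_min != 0) result = std::min(result, outstanding_prep_min);
  if (memtable_prep_min != 0) result = std::min(result, memtable_prep_min);
  return result;
}
```

In the example, the default column family is mid-flush and one of its memtables depends on 5.log. Skipping it yields 8 as the minimum WAL to keep, so 5.log is purged; including it yields 5.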
Peter Dillinger	1e403a0c6c	Fix assertion failure in FastLocalBloomBitsBuilder (#9585 ) Summary: As in ``` db_stress: table/block_based/filter_policy.cc:316: rocksdb::{anonymous}::FastLocalBloomBitsBuilder::FastLocalBloomBitsBuilder(int, std::atomic<long int>*, std::shared_ptr<rocksdb::CacheReservationManager>, bool): Assertion `millibits_per_key >= 1000' failed. ``` This assertion failure was actually happening with our RibbonFilterPolicy which falls back to Bloom for some cases, often for flush, but was missing new special logic to skip generating filter for 0 bits per key case. Fixed by adding the logic in other builtin FilterPolicy implementations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9585 Test Plan: Updated db_bloom_filter_test to do more integration testing of the RibbonFilterPolicy ("auto Ribbon") class, incl regression test this with SkipFilterOnEssentiallyZeroBpk Reviewed By: ajkr Differential Revision: D34295101 Pulled By: pdillinger fbshipit-source-id: 3488eb207fc1d67bbbd1301313714aa1b6406e6e	2022-02-16 22:43:34 -08:00
sdong	8286469b9a	LDB to add --secondary_path to help (#9582 ) Summary: Opening DB as secondary instance has been supported in ldb but it is not mentioned in --help. Mention it there. The part of the help message after the modification: ``` commands MUST specify --db=<full_path_to_db_directory> when necessary commands can optionally specify --env_uri=<uri_of_environment> or --fs_uri=<uri_of_filesystem> if necessary --secondary_path=<secondary_path> to open DB as secondary instance. Operations not supported in secondary instance will fail. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9582 Test Plan: Build and run ldb --help Reviewed By: riversand963 Differential Revision: D34286427 fbshipit-source-id: e56c5290d0548098ab6acc6dde2167f5a64f34f3	2022-02-16 17:07:37 -08:00
Jay Zhuang	31031c0210	Remove deprecated RemoteCompaction API (#9570 ) Summary: Remove deprecated remote compaction APIs `CompactionService::Start()` and `CompactionService::WaitForComplete()`. Please use `CompactionService::StartV2()`, `CompactionService::WaitForCompleteV2()` instead, which provides the same information plus extra data like priority, db_id, etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9570 Test Plan: CI Reviewed By: riversand963 Differential Revision: D34255969 Pulled By: jay-zhuang fbshipit-source-id: c6376eccdd1123f1c42ab53771b5f65f8160c325	2022-02-16 13:25:28 -08:00
mrambacher	c42d0cf862	Add support for decimals to PatternEntry (#9577 ) Summary: Add support for doubles to ObjectLibrary::PatternEntry. This support will allow patterns containing a non-integer number to be parsed correctly. Added appropriate test cases to cover this new option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9577 Reviewed By: pdillinger Differential Revision: D34269763 Pulled By: mrambacher fbshipit-source-id: b5ce16cbd3665c2974ec0f3412ef2b403ef8b155	2022-02-16 11:15:19 -08:00
Yueh-Hsuan Chiang	48f6c2a049	Add Solana's RocksDB use case in USERS.md (#9558 ) Summary: Add Solana's RocksDB use case in USERS.md. Solana is a fast, secure, scalable, and decentralized blockchain. It uses RocksDB as the underlying storage for its ledger store. github: https://github.com/solana-labs/solana Pull Request resolved: https://github.com/facebook/rocksdb/pull/9558 Reviewed By: jay-zhuang Differential Revision: D34249087 Pulled By: riversand963 fbshipit-source-id: 7524eff4952e2676e8520ac491ffb6a686fb4d7e	2022-02-16 09:23:01 -08:00
Peter Dillinger	8c681087c7	Refactor FilterPolicies toward Customizable (#9567 ) Summary: Some changes to make it easier to make FilterPolicy customizable. Especially, create distinct classes for the different testing-only and user-facing built-in FilterPolicy modes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9567 Test Plan: tests updated, with no intended difference in functionality tested. No difference in test performance seen as a result of moving to string-based filter type configuration. Reviewed By: mrambacher Differential Revision: D34234694 Pulled By: pdillinger fbshipit-source-id: 8a94931a9e04c3bcca863a4f524cfd064aaf0122	2022-02-16 08:30:03 -08:00
Jay Zhuang	a0c569ee1d	Cancel manual compaction in thread-pool queue (#9557 ) Summary: Fix the issue that `DisableManualCompaction()` has to wait for a scheduled manual compaction to start executing before it can cancel the job. When a manual compaction in the thread-pool queue is canceled, set the job's is_canceled to true and clean up the resource. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9557 Test Plan: added unittest that will hang without the change Reviewed By: ajkr Differential Revision: D34214910 Pulled By: jay-zhuang fbshipit-source-id: 89dbaee78ddf26eb13ce862c2b15f4a098b36a78	2022-02-15 19:23:01 -08:00
Andrew Kryczka	ad2cab8f0c	minor tweaks to db_crashtest.py settings (#9483 ) Summary: I did another pass through running CI jobs. It is uncommon now to see `db_stress` stuck in the setup phase, but it still happens. One reason was repeatedly reading/verifying checksum on filter blocks when `-cache_index_and_filter_blocks=1` and `-cache_size=1048576`. To address that I increased the cache size. Another reason was having a WAL with many range tombstones and every `db_stress` run using `-avoid_flush_during_recovery=1` (in that scenario, the setup phase spent too much CPU in `rocksdb::MemTable::NewRangeTombstoneIteratorInternal()`). To address that I fixed the `-avoid_flush_during_recovery` setting so it is reevaluated for every `db_stress` run. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9483 Reviewed By: riversand963 Differential Revision: D33922929 Pulled By: ajkr fbshipit-source-id: 0a298ec7c4df6f6b44620233996047a2dc7ee5f3	2022-02-15 13:56:27 -08:00
Hui Xiao	57418aba51	Fix a typo in HISTORY.md for 7.0 (#9574 ) Summary: See PR Pull Request resolved: https://github.com/facebook/rocksdb/pull/9574 Reviewed By: ajkr, mrambacher Differential Revision: D34239184 Pulled By: hx235 fbshipit-source-id: 6b5cc70d86b804ab4645bc2cd0243961c2fb00ee	2022-02-15 12:31:16 -08:00
Hui Xiao	443d8ef094	Fix PinSelf() read-after-free in DB::GetMergeOperands() (#9507 ) Summary: Context: Running the new test `DBMergeOperandTest.MergeOperandReadAfterFreeBug` prior to this fix surfaces the read-after-free bug of PinSelf() as below: ``` READ of size 8 at 0x60400002529d thread T0 #5 0x7f199a in rocksdb::PinnableSlice::PinSelf(rocksdb::Slice const&) include/rocksdb/slice.h:171 #6 0x7f199a in rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&) db/db_impl/db_impl.cc:1919 #7 0x540d63 in rocksdb::DBImpl::GetMergeOperands(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle, rocksdb::Slice const&, rocksdb::PinnableSlice, rocksdb::GetMergeOperandsOptions, int) db/db_impl/db_impl.h:203 freed by thread T0 here: #3 0x1191399 in rocksdb::cache_entry_roles_detail::RegisteredDeleter<rocksdb::Block, (rocksdb::CacheEntryRole)0>::Delete(rocksdb::Slice const&, void) cache/cache_entry_roles.h:99 #4 0x719348 in rocksdb::LRUHandle::Free() cache/lru_cache.h:205 #5 0x71047f in rocksdb::LRUCacheShard::Release(rocksdb::Cache::Handle, bool) cache/lru_cache.cc:547 #6 0xa78f0a in rocksdb::Cleanable::DoCleanup() include/rocksdb/cleanable.h:60 #7 0xa78f0a in rocksdb::Cleanable::Reset() include/rocksdb/cleanable.h:38 #8 0xa78f0a in rocksdb::PinnedIteratorsManager::ReleasePinnedData() db/pinned_iterators_manager.h:71 #9 0xd0c21b in rocksdb::PinnedIteratorsManager::~PinnedIteratorsManager() db/pinned_iterators_manager.h:24 #10 0xd0c21b in 
rocksdb::Version::Get(rocksdb::ReadOptions const&, rocksdb::LookupKey const&, rocksdb::PinnableSlice, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, rocksdb::Status, rocksdb::MergeContext, unsigned long, bool, bool, unsigned long, rocksdb::ReadCallback, bool, bool) db/pinned_iterators_manager.h:22 #11 0x7f0fdf in rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&) db/db_impl/db_impl.cc:1886 #12 0x540d63 in rocksdb::DBImpl::GetMergeOperands(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle, rocksdb::Slice const&, rocksdb::PinnableSlice, rocksdb::GetMergeOperandsOptions, int) db/db_impl/db_impl.h:203 previously allocated by thread T0 here: #1 0x1239896 in rocksdb::AllocateBlock(unsigned long, rocksdb::MemoryAllocator*) memory/memory_allocator.h:35 #2 0x1239896 in rocksdb::BlockFetcher::CopyBufferToHeapBuf() table/block_fetcher.cc:171 #3 0x1239896 in rocksdb::BlockFetcher::GetBlockContents() table/block_fetcher.cc:206 #4 0x122eae5 in rocksdb::BlockFetcher::ReadBlockContents() table/block_fetcher.cc:325 #5 0x11b1f45 in rocksdb::Status rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::Block>(rocksdb::FilePrefetchBuffer, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, bool, rocksdb::CachableEntry<rocksdb::Block>, rocksdb::BlockType, rocksdb::GetContext, rocksdb::BlockCacheLookupContext, rocksdb::BlockContents) const table/block_based/block_based_table_reader.cc:1503 ``` Here is the analysis: - We have [PinnedIteratorsManager](https://github.com/facebook/rocksdb/blob/6.28.fb/db/version_set.cc#L1980) with `Cleanable` 
capability in our `Version::Get()` path. It is responsible for managing the lifetime of pinned iterators and invoking registered cleanup functions during its own destruction. - For example, in the case above, the merge operands' clean-up gets associated with this manager in [GetContext::push_operand](https://github.com/facebook/rocksdb/blob/6.28.fb/table/get_context.cc#L405). During PinnedIteratorsManager's [destruction](https://github.com/facebook/rocksdb/blob/6.28.fb/db/pinned_iterators_manager.h#L67), the release function associated with the merge operand data is invoked. And that's what we see in "freed by thread T0 here" in ASAN. - Bug 🐛: `PinnedIteratorsManager` is local to `Version::Get()` while the data of merge operands needs to outlive `Version::Get` and stay until it gets [PinSelf()](https://github.com/facebook/rocksdb/blob/6.28.fb/db/db_impl/db_impl.cc#L1905), which is the read-after-free in ASAN. - This bug is likely an oversight regarding `PinnedIteratorsManager` when developing the API `DB::GetMergeOperands`, because the current logic works fine with the existing case of getting the merged value, where the operands do not need to live that long. - This bug did not surface much (even in its unit test) because the release function associated with the merge operands (which are actually blocks put in cache as you can see in `BlockBasedTable::MaybeReadBlockAndLoadToCache` in "previously allocated by" in ASAN report) is a cache entry deleter. The deleter will call `Cache::Release()` which, for LRU cache, won't immediately deallocate the block based on LRU policy [unless the cache is full or being instructed to force erase](https://github.com/facebook/rocksdb/blob/6.28.fb/cache/lru_cache.cc#L521-L531) - `DBMergeOperandTest.MergeOperandReadAfterFreeBug` makes the cache extremely small to force cache full. 
Summary: - Fix the bug by aligning `PinnedIteratorsManager`'s lifetime with the merge operands Pull Request resolved: https://github.com/facebook/rocksdb/pull/9507 Test Plan: - New test `DBMergeOperandTest.MergeOperandReadAfterFreeBug` - db bench on read path - Setup (LSM tree with several levels, cache the whole db to avoid read IO, warm cache with readseq to avoid read IO): `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks="fillrandom,readseq" -num=1000000 -cache_size=100000000 -write_buffer_size=10000 -statistics=1 -max_bytes_for_level_base=10000 -level0_file_num_compaction_trigger=1` `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks="readrandom" -num=1000000 -cache_size=100000000 ` - Actual command run (repeat a 20-run batch 20 times, then average the batches' average micros/op) - `for j in {1..20}; do (for i in {1..20}; do rm -rf /dev/shm/rocksdb/ && TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks="fillrandom,readseq,readrandom" -num=1000000 -cache_size=100000000 -write_buffer_size=10000 -statistics=1 -max_bytes_for_level_base=10000 -level0_file_num_compaction_trigger=1 \| egrep 'readrandom'; done > rr_output_pre.txt && (awk '{sum+=$3; sum_sqrt+=$3^2}END{print sum/20, sqrt(sum_sqrt/20-(sum/20)^2)}' rr_output_pre.txt) >> rr_output_pre_2.txt); done` - Result: Pre-change: 3.79193 micros/op; Post-change: 3.79528 micros/op (+0.09%) (pre-change) sorted avg micros/op of each 20-run \| std of micros/op of each 20-run \| (post-change) sorted avg micros/op of each 20-run \| std of micros/op of each 20-run -- \| -- \| -- \| -- 3.58355 \| 0.265209 \| 3.48715 \| 0.382076 3.58845 \| 0.519927 \| 3.5832 \| 0.382726 3.66415 \| 0.452097 \| 3.677 \| 0.563831 3.68495 \| 0.430897 \| 3.68405 \| 0.495355 3.70295 \| 0.482893 \| 3.68465 \| 0.431438 3.719 \| 0.463806 \| 3.71945 \| 0.457157 3.7393 \| 0.453423 \| 3.72795 \| 0.538604 3.7806 \| 0.527613 \| 3.75075 \| 0.444509 3.7817 \| 0.426704 \| 3.7683 \| 0.468065 3.809 \| 0.381033 \| 3.8086 \| 0.557378 3.80985 \| 0.466011 \| 3.81805 \| 
0.524833 3.8165 \| 0.500351 \| 3.83405 \| 0.529339 3.8479 \| 0.430326 \| 3.86285 \| 0.44831 3.85125 \| 0.434108 \| 3.8717 \| 0.544098 3.8556 \| 0.524602 \| 3.895 \| 0.411679 3.8656 \| 0.476383 \| 3.90965 \| 0.566636 3.8911 \| 0.488477 \| 3.92735 \| 0.608038 3.898 \| 0.493978 \| 3.9439 \| 0.524511 3.97235 \| 0.515008 \| 3.9623 \| 0.477416 3.9768 \| 0.519993 \| 3.98965 \| 0.521481 - CI Reviewed By: ajkr Differential Revision: D34030519 Pulled By: hx235 fbshipit-source-id: a99ac585c11704c5ed93af033cb29ba0a7b16ae8	2022-02-15 12:25:18 -08:00
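The lifetime issue analyzed above can be sketched with a toy model. This is only an illustration under assumptions: `ToyPinnedManager`, `ToyGet`, and `GetOperandCopy` are hypothetical stand-ins, and the copy step plays the role that `PinSelf()` plays in the description. The point is that the object running the cleanups must outlive the last read of the pinned data.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a manager that runs registered cleanups when destroyed.
class ToyPinnedManager {
 public:
  ~ToyPinnedManager() {
    for (auto& fn : cleanups_) fn();
  }
  void RegisterCleanup(std::function<void()> fn) {
    cleanups_.push_back(std::move(fn));
  }

 private:
  std::vector<std::function<void()>> cleanups_;
};

// Toy lookup: allocates a block, hands its release to the manager, and
// returns a view that stays valid only while the manager is alive.
const std::string* ToyGet(ToyPinnedManager* mgr) {
  auto* block = new std::string("operand");
  mgr->RegisterCleanup([block] { delete block; });
  return block;
}

// Safe shape: the manager lives in the caller, so the view is still valid
// when the operand is copied out.
std::string GetOperandCopy() {
  ToyPinnedManager mgr;
  const std::string* view = ToyGet(&mgr);
  std::string copy = *view;  // copy while the block is still alive
  return copy;               // mgr is destroyed only after the copy exists
}
```

Had the manager been a local inside `ToyGet`, the block would be deleted before the caller could copy it, which is the read-after-free shape described in the analysis.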
Peter Dillinger	420d51b9a0	Update Java API for FilterPolicy changes (#9569 ) Summary: Obsolete block-based filter no longer in public API, from https://github.com/facebook/rocksdb/issues/9535 Pull Request resolved: https://github.com/facebook/rocksdb/pull/9569 Test Plan: existing tests Reviewed By: jay-zhuang Differential Revision: D34243579 Pulled By: pdillinger fbshipit-source-id: ec5127d9bb9cc3f70501c531829a735bffdd1418	2022-02-15 12:18:52 -08:00
Peter Dillinger	e24734f843	Use -Wno-invalid-offsetof instead of dangerous offset_of hack (#9563 ) Summary: After https://github.com/facebook/rocksdb/issues/9515 added a unique_ptr to Status, we see some warnings-as-error in some internal builds like this: ``` stderr: rocksdb/src/db/compaction/compaction_job.cc:2839:7: error: offset of on non-standard-layout type 'struct CompactionServiceResult' [-Werror,-Winvalid-offsetof] {offsetof(struct CompactionServiceResult, status), ^ ~~~~~~ ``` I see three potential solutions to resolving this: * Expand our use of an idiom that works around the warning (see offset_of functions removed in this change, inspired by https://gist.github.com/graphitemaster/494f21190bb2c63c5516) However, this construction is invoking undefined behavior that assumes consistent layout with no compiler-introduced indirection. A compiler incompatible with our assumptions will likely compile the code and exhibit undefined behavior. * Migrate to something in place of offset, like a function mapping CompactionServiceResult* to Status* (for the `status` field). This might be required in the long term. * Selected: Use our new C++17 dependency to use offsetof in a well-defined way when the compiler allows it. From a comment on https://gist.github.com/graphitemaster/494f21190bb2c63c5516: > A final note: in C++17, offsetof is conditionally supported, which > means that you can use it on any type (not just standard layout > types) and the compiler will error if it can't compile it correctly. > That appears to be the best option if you can live with C++17 and > don't need constexpr support. The C++17 semantics are confirmed on https://en.cppreference.com/w/cpp/types/offsetof, so we can suppress the warning as long as we accept that we might run into a compiler that rejects the code, and at that point we will find a solution, such as the more intrusive "migrate" solution above. 
Although this is currently only showing in our buck build, it will surely show up also with make and cmake, so I have updated those configurations as well. Also in the buck build, -Wno-expansion-to-defined does not appear to be needed anymore (both current compiler configurations) so I removed it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9563 Test Plan: Tried out buck builds with both current compiler configurations Reviewed By: riversand963 Differential Revision: D34220931 Pulled By: pdillinger fbshipit-source-id: d39436008259bd1eaaa87c77be69fb2a5b559e1f	2022-02-15 09:19:19 -08:00
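The second option listed above, replacing the offset with a mapping from the struct to its field, can be sketched with a pointer to member, which is well-defined for any class type. This is a minimal illustration, not the change that was selected: `ToyStatus`, `ToyResult`, and `FieldAddress` are hypothetical names, and the toy structs only approximate the real `Status` and `CompactionServiceResult`.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Toy stand-in for a status type that holds a unique_ptr, the kind of member
// the message above associates with the -Winvalid-offsetof warning.
struct ToyStatus {
  int code = 0;
  std::unique_ptr<std::string> state;
};

// Toy stand-in for a result struct with a status field.
struct ToyResult {
  ToyStatus status;
  int output_level = 0;
};

// Maps a struct pointer plus a pointer-to-member to the field's address,
// without relying on offsetof or on the type being standard-layout.
template <typename Struct, typename Field>
Field* FieldAddress(Struct* base, Field Struct::* member) {
  return &(base->*member);
}
```

A caller would write `FieldAddress(&result, &ToyResult::status)` where an offset-based table would have used `offsetof(struct ToyResult, status)`. The trade-off is that the field descriptor becomes typed rather than a plain integer offset.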


10933 Commits