Summary:
1) In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
flush the data from WAL to L0 for all column families if possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
2) For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change.
If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail.
As a solution,
1. the corrupted WALs whose numbers are larger than the
corrupted wal and smaller than the new WAL will be moved to archive folder.
2. Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9634
Test Plan:
1. Added new unit tests
2. make crast_test -j
Reviewed By: riversand963
Differential Revision: D34463666
Pulled By: akankshamahajan15
fbshipit-source-id: e233d3af0ed4e2028ca0cf051e5a334a0fdc9d19
Summary:
Currently async prefetching is enabled for implicit internal auto readahead in FilePrefetchBuffer if `ReadOptions.async_io` is set. This PR enables async prefetching for `ReadOptions.readahead_size` when `ReadOptions.async_io` is set true.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9827
Test Plan: Update unit test
Reviewed By: anand1976
Differential Revision: D35552129
Pulled By: akankshamahajan15
fbshipit-source-id: d9f9a96672852a591375a21eef15355cf3289f5c
Summary:
Added a Plugin class to the ObjectRegistry. Enabled compile-time and program-time addition of plugins to the Registry.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7949
Reviewed By: mrambacher
Differential Revision: D33517674
Pulled By: pdillinger
fbshipit-source-id: c3e3270aab76a489bfa9e85d78cdfca951912557
Summary:
Options `preserve_deletes` and `iter_start_seqnum` have been removed since 7.0.
This PR removes dead code related to these two removed options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9825
Test Plan: make check
Reviewed By: akankshamahajan15
Differential Revision: D35517950
Pulled By: riversand963
fbshipit-source-id: 86282ce5ec4087acb94a06a42a1b6d55b1715482
Summary:
Since all plaftorms don't support io_uring. So updated the unit
test to take that into consideration when testing async reads in unit tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9819
Test Plan:
valgrind --error-exitcode=2 --leak-check=full ./prefetch_test
--gtest_filter=PrefetchTest2.ReadAsyncWithPosixFS
CircleCI jobs
Reviewed By: pdillinger
Differential Revision: D35469959
Pulled By: akankshamahajan15
fbshipit-source-id: b170459ec816487fc0a13b1d55dbbe4f754b2eba
Summary:
Currently RocksDB reset async_read_in_progress_ in callback
due to which underlying filesystem relying on Poll API won't be called
leading to stale memory access.
In order to fix it, async_read_in_progress_ will be reset after Poll API
is called to make sure underlying file_system waiting on Poll can clear
its state or take appropriate action.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9815
Test Plan: CircleCI tests
Reviewed By: anand1976
Differential Revision: D35451534
Pulled By: akankshamahajan15
fbshipit-source-id: b70ef6251a7aa9ed4876ba5e5100baa33d7d474c
Summary:
When sub compaction is decided for L0->L1 compaction, most of the cases, all L0 files will be involved in all sub compactions. However, it is not always the case. When files are generally (but not strictly) inserted in sequential order, there can be a subset of L0 files invovled. Yet RocksDB always open all those L0 files, and build an iterator, read many of the files' first of last block with expensive readahead. We trim some input files to reduce overhead a little bit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9802
Test Plan: Add a unit test to cover this case and manually validate the behavior while running the test.
Reviewed By: ajkr
Differential Revision: D35371031
fbshipit-source-id: 701ed7375b5cbe41672e93b38fe8a1503dad08b6
Summary:
This change adds two unit tests that would each catch the
regression fixed in https://github.com/facebook/rocksdb/issues/9736
* TableMetaIndexKeys - detects any churn in metaindex block keys
generated by SST files using standard db_test_util configurations.
* BloomFilterCompatibility - this detects if any common built-in
FilterPolicy configurations fail to read filters generated by another.
(The regression bug caused NewRibbonFilterPolicy not to read filters
from NewBloomFilterPolicy and vice-versa.) This replaces some previous
tests that didn't really appear to be testing much of anything except
basic data correctness, which doesn't tell you a filter is being used.
Light refactoring in meta_blocks.cc/h to support inspecting metaindex
keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9773
Test Plan:
this is the test. Verified that 7.0.2 fails both tests and 7.0.3 passes.
With backporting for intentional API changes in 7.0, 6.29 also passes.
Reviewed By: ajkr
Differential Revision: D35236248
Pulled By: pdillinger
fbshipit-source-id: 493dfe9ad7e27524bf7c6c1af8a4b8c31bc6ef5a
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9803
Only use Meta-internal version now. precommit_checker.py also now obsolete
Bring back `make commit_prereq` in follow-up work
Reviewed By: jay-zhuang
Differential Revision: D35372283
fbshipit-source-id: 7428438ca51f878802c301d0d5591675e551a113
Summary:
Update stats in random_access_file_reader for Read and
ReadAsync API to take into account the read latency for async
prefetching.
It also fixes ERROR_HANDLER_AUTORESUME_RETRY_COUNT stat whose value was
incorrect in portal.h
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9810
Test Plan: Update unit test
Reviewed By: anand1976
Differential Revision: D35433081
Pulled By: akankshamahajan15
fbshipit-source-id: aeec3901270e58a003ce6b5214bd25ddcb3a12a9
Summary:
Fixes https://github.com/facebook/rocksdb/issues/9779.
The padding at the end of a struct is added implicitly according to the
sizeof spec: "When applied to a class, the result is the
number of bytes in an object of that class including any padding
required for placing objects of that type in an array"
(https://eel.is/c++draft/expr.sizeof#2.sentence-2). We should drop the
explicit padding since it assumed support for zero-length arrays, which
is non-standard.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9809
Test Plan: rely on CI
Reviewed By: riversand963
Differential Revision: D35413496
Pulled By: ajkr
fbshipit-source-id: 25d52ca45e648ad0d5657149f26f6adecbed1cb4
Summary:
The name of this property "kIsFileDeletionsEnabled" is very, very easy to misunderstand.
I think 0 represents false (i.e. disabled) and non-0 means true (enabled), and this property is just the opposite.
I modified the name of this property, and as few other positions as possible, so that the final meaning remains the same, but the name of this property is more common sense.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9791
Reviewed By: ajkr
Differential Revision: D35362166
Pulled By: jay-zhuang
fbshipit-source-id: 85310d88bdd131893effb64e1adb7d0d7b202f88
Summary:
For write-prepared/write-unprepared transactions,
GetCommitTimeWriteBatch() can be used only if the transaction is started
with `TransactionOptions::use_only_the_last_commit_time_batch_for_recovery` set
to true. Otherwise, it is possible that multiple uncommitted versions of the
same key exist in the database. During bottommost compaction, RocksDB may
set the sequence numbers of both to zero once they become committed, causing
output SST file to have two identical internal keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9794
Test Plan:
make check
pay special attention to the following
```
transaction_test --gtest_filter=MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/*
```
Reviewed By: lth
Differential Revision: D35327214
Pulled By: riversand963
fbshipit-source-id: 3bae00a28359c10e96e4c6f676d20de5610d8a0f
Summary:
Various renaming and fixes to get rid of remaining uses of
"backupable" which is terminology leftover from the original, flawed
design of BackupableDB. Now any DB can be backed up, using BackupEngine.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9792
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D35334386
Pulled By: pdillinger
fbshipit-source-id: 2108a42b4575c8cccdfd791c549aae93ec2f3329
Summary:
**Context/Todo:**
As requested, allow IOOptions to take in an Env::IOPriority for convenience to pass down rate limiter related hint to file system level and for future interaction between RocksDB internal's rate limiting and custom file system level's rate-limiting.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9806
Test Plan: No actual code changes in RocksDB internals
Reviewed By: ajkr
Differential Revision: D35388966
Pulled By: hx235
fbshipit-source-id: 5891c97c3f9184cd221a9ab8536ce8dfa8526c08
Summary:
If FilePrefetchBuffer object is destroyed and then later Poll() calls callback on object which has been destroyed, it gives segfault on accessing destroyed object. It was caught after adding unit tests that tests Posix implementation of ReadAsync and Poll APIs.
This PR also updates and fixes existing IOURing tests which were not running locally because RocksDbIOUringEnable function wasn't defined and IOUring was disabled for those tests
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9777
Test Plan: Added new unit test
Reviewed By: anand1976
Differential Revision: D35254002
Pulled By: akankshamahajan15
fbshipit-source-id: 68e80054ffb14ae25c255920ebc6548ca5f130a1
Summary:
Make `commit_prereq` work and a few other improvements:
* Remove gcc 481 and gcc5xx which are no longer supported
* Remove platform007 which is gone
* `make clean` work for both mac and linux
* `precommit_checker.py` to python3
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9797
Test Plan: `make commit_prereq`
Reviewed By: ajkr
Differential Revision: D35338536
Pulled By: jay-zhuang
fbshipit-source-id: 1e159962ab9d31c43c4b85de7d0f582d3e881ffe
Summary:
Right now, parallelism information passed to "build_tools/rocksdb-lego-determinator no_compression" isn't effective when the test actually runs, as the information is dropped in the middle. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9796
Test Plan: Run "build_tools/rocksdb-lego-determinator no_compression" and execute the command line generated and observe the parallelism.
Reviewed By: jay-zhuang
Differential Revision: D35330085
fbshipit-source-id: e9b32d0520d61fbc2697ebd841099485f64482e3
Summary:
build_tools/rocksdb-lego-determinator is to generate commands for continuous tests. Recently it changed to by default run tests in parallel with parallelism to be number of CPU processors. This sometimes causes out of space when running so many tests in parallel. Reduce the parallelism by half to temporarily work it around.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9788
Test Plan: Run build_tools/rocksdb-lego-determinator and watch generated commands.
Reviewed By: pdillinger
Differential Revision: D35327704
fbshipit-source-id: 95a8c51a111bb6ab62c456c74ab9c905b457ea8f
Summary:
So the build on dev server will work.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9787
Test Plan: `$ make db_basic_bench` on dev server.
Reviewed By: ajkr
Differential Revision: D35295466
Pulled By: jay-zhuang
fbshipit-source-id: 58dccc65bc29e1185b97cbeb7630ed66deb604aa
Summary:
There's an existing benchmark, "getmergeoperands", but it is unconventional in that it has multiple phases and hardcoded setup parameters.
This PR adds a different one, "readrandomoperands", that follows the pattern of other benchmarks of having a single phase and taking its configuration from existing flags.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9785
Test Plan:
```
$ ./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -compression_type=none -disable_auto_compactions=true
$ ./db_bench -use_existing_db=true -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -disable_auto_compactions=true -duration=10
...
readrandomoperands : 542.082 micros/op 1844 ops/sec; 0.2 MB/s (11980 of 18999 found)
```
Reviewed By: jay-zhuang
Differential Revision: D35290412
Pulled By: ajkr
fbshipit-source-id: fb367ca614b128cef844a75f0e5d9dd7c3328d85
Summary:
min_log_number_to_keep denotes that the WALs whose numbers are below
this value **will** be deleted by RocksDB.
delete_wals_before will be used by RocksDB if
track_and_verify_wals_in_manifest is set to true. During recovery,
RocksDB uses the info encoded in delete_wals_before to reconstruct its
knowledge about what WALs to expect existing.
If these two tags are not encoded in the same VersionEdit, then it's
possible for min_log_number_to_keep=100 to exist, but
delete_wals_before=100 to be lost due to power failure. Subsequent
recovery will delete 99.log. If the db crashes again, the following
recovery will expect to see 99.log since there is no
delete_wals_before=100 in the MANIFEST, but the WAL is already deleted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9766
Test Plan:
First of all, make check.
Second, format compatibility.
SHORT_TEST=1 ./tools/check_format_compatible.sh
Reviewed By: ltamasi
Differential Revision: D35203623
Pulled By: riversand963
fbshipit-source-id: 45623fc4b4b50d299d5e0f9559a3a4c5e9522c8f
Summary:
Right now we log a wrong error when DB::Open() fails. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9784
Test Plan: CI runs should pass
Reviewed By: ajkr, riversand963
Differential Revision: D35290203
fbshipit-source-id: ffc640afa27f6b0a2382ee153dc43f28d9e242be
Summary:
There is no need to release-and-acquire immediately when no listener is registered. This is
what we have been doing for `NotifyOnFlushBegin()`, `NotifyOnFlushCompleted()`, `NotifyOnCompactionBegin()`,
`NotifyOnCompactionCompleted()`, and some other `NotifyOnXX` methods in event_helpers.cc.
Do the same for `NotifyOnMemTableSealed ()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9758
Test Plan: make check
Reviewed By: jay-zhuang
Differential Revision: D35159552
Pulled By: riversand963
fbshipit-source-id: 6e0aac50bd5c8f506d809b6638c33a7a28d1e87f
Summary:
much needed
Some other minor tweaks also
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9778
Test Plan: existing tests
Reviewed By: ajkr
Differential Revision: D35258195
Pulled By: pdillinger
fbshipit-source-id: 974ddafc23a540aacceb91da72e81593d818f99c
Summary:
This commit was generated using `mgt import`.
pristine code for third-party libraries:
third-party/benchmark
upgrade google benchmark to v1.6.1
contains a local patch that reverts [this](https://github.com/google/benchmark/pull/1227?fbclid=IwAR2CCmIJmjU62SPPQQf_t8kdAsMjYv_Pa_GxabYUOdQpGPZUHKwbnYS_1oE) and changs `enum Flags` to be `enum Flags : uint32_t`.
Reviewed By: chadaustin
Differential Revision: D35136540
fbshipit-source-id: f3662f953cd87956e5e9b767e55e3697f99d3b49
Summary:
In making `SstFileMetaData` inherit from `FileStorageInfo`, I
overlooked setting some `FileStorageInfo` fields when then default
`SstFileMetaData()` ctor is used. This affected `GetLiveFilesMetaData()`.
Also removed some buggy `static_cast<size_t>`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9769
Test Plan: Updated tests
Reviewed By: jay-zhuang
Differential Revision: D35220383
Pulled By: pdillinger
fbshipit-source-id: 05b4ee468258dbd3699517e1124838bf405fe7f8
Summary:
Fixes https://github.com/facebook/rocksdb/issues/9718
The verify_checksums flag of read_options should be passed to the read options used by the BlockFetcher in a couple of cases where it is not at present. It will now happen (but did not, previously) on iteration and on [multi]get, where a fetcher is created as part of the iterate/get call.
This may result in much better performance in a few workloads where the client chooses to remove verification.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9767
Reviewed By: mrambacher
Differential Revision: D35218986
Pulled By: jay-zhuang
fbshipit-source-id: 329d29764bb70fbc7f2673440bc46c107a813bc8