Summary:
- When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1`
- It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090
- I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923
Differential Revision: D13827226
Pulled By: ajkr
fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947
Summary:
Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the Internet Archive until we find a more credible source.
Summary:
Followup for #4266. There is one more place in **get_context.cc** where **MergeOperator::ShouldMerge** should be called with reversed list of operands.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4284
Differential Revision: D9380008
Pulled By: sagar0
fbshipit-source-id: 70ec26e607e5b88465e1acbdcd6c6171bd76b9f2
This PR addresses issue #3865 and implements the following approach to fix it:
- adds `MergeContext::GetOperandsDirectionForward` and `MergeContext::GetOperandsDirectionBackward` to query merge operands in a specific order
- `MergeContext::GetOperands` becomes a shortcut for `MergeContext::GetOperandsDirectionForward`
- pass `MergeContext::GetOperandsDirectionBackward` to `MergeOperator::ShouldMerge` and document the order
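A minimal sketch of the direction idea described above. `OperandList` is a hypothetical stand-in, not the actual `MergeContext` class, and it assumes that "forward" means oldest-to-newest and that operands are pushed newest-first during a lookup:

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for MergeContext, for illustration only.
class OperandList {
 public:
  // Assumption: a lookup meets the newest operand first, so operands
  // arrive in newest-to-oldest ("backward") order.
  void PushOperand(std::string op) {
    SetDirectionBackward();
    operands_.push_back(std::move(op));
  }

  // Oldest-to-newest order.
  const std::vector<std::string>& GetOperandsDirectionForward() {
    SetDirectionForward();
    return operands_;
  }

  // Newest-to-oldest order.
  const std::vector<std::string>& GetOperandsDirectionBackward() {
    SetDirectionBackward();
    return operands_;
  }

  // Shortcut for the forward direction.
  const std::vector<std::string>& GetOperands() {
    return GetOperandsDirectionForward();
  }

 private:
  void SetDirectionForward() {
    if (backward_) {
      std::reverse(operands_.begin(), operands_.end());
      backward_ = false;
    }
  }
  void SetDirectionBackward() {
    if (!backward_) {
      std::reverse(operands_.begin(), operands_.end());
      backward_ = true;
    }
  }

  std::vector<std::string> operands_;
  bool backward_ = true;  // storage order of operands_
};
```

Both getters return a reference to the same storage, so in this sketch a reference obtained from one getter should not be used after calling the other.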
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4266
Differential Revision: D9360750
Pulled By: sagar0
fbshipit-source-id: 20cb73ff017760b062ecdcf4382560767086e092
Summary:
We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205
Differential Revision: D9112711
Pulled By: riversand963
fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4
Summary:
DBImpl::FindObsoleteFiles() may call GetChildren() multiple times if different CFs are on the same path. Fix it.
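As a rough illustration only (the helper name `UniquePaths` is hypothetical and this is not the code in the PR), listing each directory once can be done by deduplicating the column family paths before scanning them:

```cpp
#include <set>
#include <string>
#include <vector>

// Returns each distinct path once, in first-seen order, so that a
// directory shared by several column families is listed only once.
std::vector<std::string> UniquePaths(const std::vector<std::string>& cf_paths) {
  std::set<std::string> seen;
  std::vector<std::string> result;
  for (const auto& path : cf_paths) {
    if (seen.insert(path).second) {
      result.push_back(path);
    }
  }
  return result;
}
```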
Closes https://github.com/facebook/rocksdb/pull/3885
Differential Revision: D8084634
Pulled By: siying
fbshipit-source-id: b471fbc251f6a05e9243304dc14c0831060cc0b0
Summary:
Two CI tests never pass because of the environment problem. Delete them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4110
Differential Revision: D8805713
Pulled By: siying
fbshipit-source-id: 6eb4813dc2094ee2045ec8ede7fe8967d546d6e8
Summary:
Now by default, with NewSstFileManager, checkpoints may be corrupted. Disable this feature to avoid this issue.
Closes https://github.com/facebook/rocksdb/pull/4092
Differential Revision: D8729856
Pulled By: siying
fbshipit-source-id: 914c321d6eaf52d8c5981171322d85dd29088307
Summary:
It seems that compilation has been made stricter about unused args.
Closes https://github.com/facebook/rocksdb/pull/4080
Differential Revision: D8712049
Pulled By: sagar0
fbshipit-source-id: 984af1982638af3568aac1a167f565f4741badee
Summary:
-Wshorten-64-to-32 is an invalid flag in fbcode. Changing it to -Wnarrowing.
Closes https://github.com/facebook/rocksdb/pull/4028
Differential Revision: D8553694
Pulled By: yiwu-arbug
fbshipit-source-id: 1523cbcb4c76cf1d2b10a4d28b5f58c78e6cb876
Summary:
This fixes a regression in one of myrocks regression tests (readwhilewriting), introduced in 8bf555f487
This PR changes two lines of code: one of them actually fixes the observed regression, the other is a mostly unrelated small fix that I'm piggy-backing here. EDIT: Nevermind, it fixes one line. More details in inline comments.
Closes https://github.com/facebook/rocksdb/pull/3953
Differential Revision: D8270664
Pulled By: al13n321
fbshipit-source-id: a7d91e196807d1e816551591257c700f70e4ccac
Summary:
https://github.com/facebook/rocksdb/pull/3764 introduced an optimization to skip duplicate prefix entries in full bloom filters. Unfortunately it also introduced a bug in partitioned full filters, where a duplicate prefix should still be inserted if it falls in a new partition. The patch fixes the bug by resetting the duplicate detection logic each time a partition is cut.
This bug could result in false negatives, which means that the DB could skip an existing key.
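A toy sketch of the fix, under loud assumptions: `PartitionedPrefixCollector` is a hypothetical class, and a `std::set` stands in for each partition's bloom filter. The point it shows is that the "last prefix seen" state must be cleared whenever a partition is cut:

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a partitioned prefix filter builder.
class PartitionedPrefixCollector {
 public:
  void AddPrefix(const std::string& prefix) {
    // Skip only if this prefix was the last one added to the CURRENT partition.
    if (has_last_ && prefix == last_prefix_) {
      return;
    }
    current_.insert(prefix);
    last_prefix_ = prefix;
    has_last_ = true;
  }

  void CutPartition() {
    partitions_.push_back(current_);
    current_.clear();
    has_last_ = false;  // reset duplicate detection for the next partition
  }

  bool PartitionMayContain(std::size_t index, const std::string& prefix) const {
    return partitions_[index].count(prefix) > 0;
  }

 private:
  std::set<std::string> current_;  // stand-in for the filter being built
  std::vector<std::set<std::string>> partitions_;
  std::string last_prefix_;
  bool has_last_ = false;
};
```

Without the reset in `CutPartition()`, the second partition in this model would not contain a prefix that was also the last prefix of the first partition, which corresponds to the false negative described above.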
Closes https://github.com/facebook/rocksdb/pull/4024
Differential Revision: D8518866
Pulled By: maysamyabandeh
fbshipit-source-id: 044f4d988e606a330ecafd8c79daceb68b8796bf
Summary:
A recent change pushed the upper bound check down to child iterators. However, this breaks the logic of the following sequence:
Seek(key);
if (!Valid()) SeekToLast();
This is because !Valid() may be caused by the upper bound rather than by reaching the end of the iterator. In that case SeekToLast() positions the iterator at a totally wrong place, which can cause wrong results, infinite loops, or segfaults.
This sequence is used when changing direction from forward to backward, and it also happens implicitly during the reseek optimization in Prev().
Fix this bug by using SeekForPrev() instead of this sequence, as is already done in the prefix extractor case.
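The failure mode can be reproduced with a toy model. Everything below is a hypothetical sketch, not RocksDB code: `BoundedIter` iterates over sorted non-negative ints, applies an exclusive upper bound only in `Seek()`, and lets `SeekToLast()` go to the physical last key (a simplifying assumption).

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Toy child iterator; the upper bound is enforced inside Seek() only.
class BoundedIter {
 public:
  BoundedIter(std::vector<int> sorted_keys, int upper_bound)
      : keys_(std::move(sorted_keys)), upper_(upper_bound) {}

  bool Valid() const { return valid_; }
  int key() const { return keys_[pos_]; }

  // First key >= target; invalid at the end OR at/after the upper bound.
  void Seek(int target) {
    auto it = std::lower_bound(keys_.begin(), keys_.end(), target);
    pos_ = static_cast<std::size_t>(it - keys_.begin());
    valid_ = pos_ < keys_.size() && keys_[pos_] < upper_;
  }

  // Physical last key, regardless of the bound (assumption of this model).
  void SeekToLast() {
    valid_ = !keys_.empty();
    pos_ = keys_.empty() ? 0 : keys_.size() - 1;
  }

  // Last key <= target.
  void SeekForPrev(int target) {
    auto it = std::upper_bound(keys_.begin(), keys_.end(), target);
    valid_ = it != keys_.begin();
    pos_ = valid_ ? static_cast<std::size_t>(it - keys_.begin()) - 1 : 0;
  }

 private:
  std::vector<int> keys_;
  int upper_;
  std::size_t pos_ = 0;
  bool valid_ = false;
};

// Where the old sequence lands; -1 means "not valid".
int KeyAfterBuggySequence(BoundedIter& it, int target) {
  it.Seek(target);
  if (!it.Valid()) it.SeekToLast();
  return it.Valid() ? it.key() : -1;
}

// Where SeekForPrev() lands; -1 means "not valid".
int KeyAfterSeekForPrev(BoundedIter& it, int target) {
  it.SeekForPrev(target);
  return it.Valid() ? it.key() : -1;
}
```

With keys {1, 3, 7, 9}, an upper bound of 6, and a target of 4, `Seek(4)` lands on 7, which is out of bounds, so the old sequence jumps to 9, while `SeekForPrev(4)` lands on 3.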
Closes https://github.com/facebook/rocksdb/pull/3989
Differential Revision: D8385422
Pulled By: siying
fbshipit-source-id: 429e869990cfd2dc389421e0836fc496bed67bb4
Summary:
Please refer to earlier discussion in [issue 3609](https://github.com/facebook/rocksdb/issues/3609).
There was also an alternative fix in [PR 3888](https://github.com/facebook/rocksdb/pull/3888), but the proposed solution requires complex change.
To summarize the cause of the problem: upon creation of a column family, a `BlockBasedTableFactory` object is `new`ed and encapsulated by a `std::shared_ptr`. When the column family is dropped, the `ColumnFamilyData` is `delete`d, which invokes the destructor of that `std::shared_ptr`. Since there is no other `std::shared_ptr` pointing to this `BlockBasedTableFactory`, the underlying memory is also freed.
Later when the db exits, it releases all the table readers, including the table readers that have been operating on the dropped column family. This needs to access the `table_options` owned by `BlockBasedTableFactory` that has already been deleted. Therefore, a segfault is raised.
The previous workaround was to purge all obsolete files upon `ColumnFamilyData` destruction, which forces the release of the table readers of the dropped column family. However, this does not work when the user disables file deletion.
Our solution in this PR is making a copy of `table_options` in `BlockBasedTable::Rep`. This solution increases memory copy and usage, but is much simpler.
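A simplified sketch of the copy-based approach. The types `TableOptions`, `TableFactory`, and `ReaderRep` are hypothetical stand-ins for `BlockBasedTableOptions`, `BlockBasedTableFactory`, and `BlockBasedTable::Rep`; the real classes have many more members.

```cpp
#include <memory>

// Hypothetical, heavily reduced stand-ins for the real types.
struct TableOptions {
  int block_size = 4096;
};

struct TableFactory {
  TableOptions table_options;
};

// The reader keeps its own copy of the options, so it does not depend on
// the factory outliving it.
struct ReaderRep {
  explicit ReaderRep(const TableOptions& options) : table_options(options) {}
  const TableOptions table_options;  // copied, not referenced
};

// Builds a reader from a factory that may be destroyed later.
ReaderRep MakeReader(const std::shared_ptr<TableFactory>& factory) {
  return ReaderRep(factory->table_options);
}
```

If `ReaderRep` instead held a `const TableOptions&` into the factory, dropping the last `std::shared_ptr` to the factory would leave that reference dangling, which is the situation described above.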
Test plan
```
$ make -j16
$ ./column_family_test --gtest_filter=ColumnFamilyTest.CreateDropAndDestroy:ColumnFamilyTest.CreateDropAndDestroyWithoutFileDeletion
```
Expected behavior:
All tests should pass.
Closes https://github.com/facebook/rocksdb/pull/3898
Differential Revision: D8149421
Pulled By: riversand963
fbshipit-source-id: eaecc2e064057ef607fbdd4cc275874f866c3438
Summary:
Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
* If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
* When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.
However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).
This PR changes the convention to:
* If status() is not ok, Valid() always returns false.
* Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)
This does sacrifice the two use cases listed above, but siying said it's ok.
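The new convention can be sketched with a toy iterator. `ToyIterator` and the reduced `Status` struct below are hypothetical and only model the two rules: a non-ok status implies `Valid() == false`, and a seek resets the status.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Reduced stand-in for rocksdb::Status.
struct Status {
  bool is_ok = true;
  std::string message;
  bool ok() const { return is_ok; }
};

class ToyIterator {
 public:
  explicit ToyIterator(std::vector<int> keys)
      : keys_(std::move(keys)), pos_(keys_.size()) {}

  // Rule 1: if status() is not ok, Valid() is always false.
  bool Valid() const { return status_.ok() && pos_ < keys_.size(); }
  const Status& status() const { return status_; }
  int key() const { return keys_[pos_]; }

  // Rule 2: any seek resets the status before repositioning.
  void SeekToFirst() {
    status_ = Status();
    pos_ = 0;
    LoadCurrent();
  }

  // Precondition: Valid().
  void Next() {
    ++pos_;
    LoadCurrent();
  }

  // Test hooks that simulate a read error at one position.
  void InjectErrorAt(std::size_t index) {
    error_index_ = index;
    has_error_ = true;
  }
  void ClearError() { has_error_ = false; }

 private:
  void LoadCurrent() {
    if (has_error_ && pos_ < keys_.size() && pos_ == error_index_) {
      status_.is_ok = false;
      status_.message = "simulated corruption";
    }
  }

  std::vector<int> keys_;
  std::size_t pos_;
  Status status_;
  bool has_error_ = false;
  std::size_t error_index_ = 0;
};
```

A caller can therefore loop on `Valid()` alone and check `status()` once after the loop to distinguish "reached the end" from "stopped on an error".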
Overview of the changes:
* A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
* Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
* A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.
To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:
Iterators that didn't need changes:
* status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name in anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
* Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.
Iterators with changes (see inline comments for details):
* DBIter - an overhaul:
- It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
- It had a few code paths silently discarding subiterator's status. The stress test caught a few.
- The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
- Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
- It used to not reset status on seek for some types of errors.
- Some simplifications and better comments.
- Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
* MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
* ForwardIterator - changed to the new convention, also slightly simplified.
* ForwardLevelIterator - fixed some bugs and simplified.
* LevelIterator - simplified.
* TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
* BlockBasedTableIterator - minor changes.
* BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
* PlainTableIterator - some seeks used to not reset status.
* CuckooTableIterator - tiny code cleanup.
* ManagedIterator - fixed some bugs.
* BaseDeltaIterator - changed to the new convention and fixed a bug.
* BlobDBIterator - seeks used to not reset status.
* KeyConvertingIterator - some small change.
Closes https://github.com/facebook/rocksdb/pull/3810
Differential Revision: D7888019
Pulled By: al13n321
fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
Summary:
The logs_ contract specifies that it should not be modified unless both mutex_ and log_write_mutex_ are held. logs_.erase, however, modifies it while holding only mutex_. This causes a race condition with two_write_queues, since logs_.back is read while holding only log_write_mutex_ (which is correct according to the logs_ contract) while logs_.erase is called concurrently. This is probably the cause of logs_.back returning nullptr in https://github.com/facebook/rocksdb/issues/3852, although I could not reproduce it.
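A minimal sketch of such a contract, assuming a hypothetical `LogList` class that is not the `DBImpl` code: writers take both mutexes, so a reader holding either one of them never observes a concurrent modification.

```cpp
#include <deque>
#include <mutex>

// Hypothetical container guarded by two mutexes.
class LogList {
 public:
  // Modifications require BOTH mutexes.
  void Append(int log_number) {
    std::lock(mutex_, log_write_mutex_);  // acquires both without deadlock
    std::lock_guard<std::mutex> a(mutex_, std::adopt_lock);
    std::lock_guard<std::mutex> b(log_write_mutex_, std::adopt_lock);
    logs_.push_back(log_number);
  }

  void EraseFront() {
    std::lock(mutex_, log_write_mutex_);
    std::lock_guard<std::mutex> a(mutex_, std::adopt_lock);
    std::lock_guard<std::mutex> b(log_write_mutex_, std::adopt_lock);
    if (!logs_.empty()) logs_.pop_front();
  }

  // Reads may hold only one of the two mutexes; here the write-path one.
  bool Back(int* out) {
    std::lock_guard<std::mutex> guard(log_write_mutex_);
    if (logs_.empty()) return false;
    *out = logs_.back();
    return true;
  }

 private:
  std::mutex mutex_;
  std::mutex log_write_mutex_;
  std::deque<int> logs_;
};
```

If `EraseFront()` took only `mutex_`, a concurrent `Back()` under `log_write_mutex_` could race with it, which is the kind of bug described above.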
Fixes https://github.com/facebook/rocksdb/issues/3852
Closes https://github.com/facebook/rocksdb/pull/3859
Differential Revision: D8026103
Pulled By: maysamyabandeh
fbshipit-source-id: ee394e00fe4aa520d884c5ef87981e9d6b5ccb28
Summary:
TSAN reports a false alarm for lock-order-inversion in DBWriteTest.IOErrorOnWALWritePropagateToWriteThreadFollower but Open and FlushWAL are not run concurrently. Suppressing the error by skipping FlushWAL in the test until TSAN is fixed.
The alternative would be to use
```
TSAN_OPTIONS="suppressions=tsan-suppressions.txt" ./db_write_test
```
but it does not seem straightforward to integrate it to our test infra.
Closes https://github.com/facebook/rocksdb/pull/3854
Differential Revision: D8000202
Pulled By: maysamyabandeh
fbshipit-source-id: fde33483d963a7ad84d3145123821f64960a4802
Summary:
This feature was introduced for universal compaction in cc01985d. At that point we thought it'd be used only to prevent long-running universal full compactions from blocking short-lived upper-level compactions. Now we have a level compaction user who could benefit from it since they use a more expensive compression algorithm in the bottom level. So enable it for level compaction.
Closes https://github.com/facebook/rocksdb/pull/3835
Differential Revision: D7957179
Pulled By: ajkr
fbshipit-source-id: 177285d2cef3b650b6a4d81dc5db84bc441c9fe4
Summary:
I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`.
```
$ make db_stress -j4
...
tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *')
status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size);
^~~~~
./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here
virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0;
^
1 error generated.
make: *** [tools/db_stress.o] Error 1
make: *** Waiting for unfinished jobs....
```
Closes https://github.com/facebook/rocksdb/pull/3839
Differential Revision: D7979236
Pulled By: sagar0
fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62
Summary:
Currently manual_wal_flush, if set in the options, is used only for the WAL files created during a WAL switch. The configuration thus does not affect the first WAL file. The patch fixes that and also updates the related unit tests.
This PR is built on top of https://github.com/facebook/rocksdb/pull/3756
Closes https://github.com/facebook/rocksdb/pull/3824
Differential Revision: D7909153
Pulled By: maysamyabandeh
fbshipit-source-id: 024ed99d2555db06bf096c902b998e432bb7b9ce
Summary:
The patch clarifies the ownership of the root db after TransactionDB::Open. On success, the ownership is with the TransactionDB, and the root db will be deleted when the destructor of the base class, StackableDB, is called. On failure, the temporarily created root db will also be deleted properly.
The patch also includes lots of useful formatting changes.
Closes https://github.com/facebook/rocksdb/pull/3714 upon which this patch is built.
Closes https://github.com/facebook/rocksdb/pull/3806
Differential Revision: D7878010
Pulled By: maysamyabandeh
fbshipit-source-id: f54f3942e29434143ae5a2423ceec9c7072cd4c2
Summary:
Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.
This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
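The resulting mapping from the two options to I/O modes can be written down as a small sketch. `IoPlan` and `ResolveIoPlan` are hypothetical names used only to restate the semantics described in this summary:

```cpp
// Which kinds of I/O use direct I/O (true) versus buffered I/O (false).
struct IoPlan {
  bool foreground_reads_direct;
  bool background_reads_direct;
  bool background_writes_direct;
};

// Semantics after this PR, as described in the summary:
// - use_direct_reads applies to ALL reads, foreground and background;
// - use_direct_io_for_flush_and_compaction applies to background writes only.
IoPlan ResolveIoPlan(bool use_direct_reads,
                     bool use_direct_io_for_flush_and_compaction) {
  IoPlan plan;
  plan.foreground_reads_direct = use_direct_reads;
  plan.background_reads_direct = use_direct_reads;
  plan.background_writes_direct = use_direct_io_for_flush_and_compaction;
  return plan;
}
```

In this model foreground and background reads always share the same mode, so the mixed-mode read situation cannot be configured.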
Closes https://github.com/facebook/rocksdb/pull/3829
Differential Revision: D7915443
Pulled By: ajkr
fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
Summary:
- Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
- Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
- Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
- Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication
Closes https://github.com/facebook/rocksdb/pull/3809
Differential Revision: D7885779
Pulled By: ajkr
fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
Summary:
The only use of RandomRW is to change seqno when bulkloading, and in this use case, the file should exist. We should fail the file opening in this case.
Closes https://github.com/facebook/rocksdb/pull/3827
Differential Revision: D7913719
Pulled By: siying
fbshipit-source-id: 62cf6734f1a6acb9e14f715b927da388131c3492
Summary:
Now BlockBasedTableIterator directly uses BlockIter. By making BlockIter final, we can prevent unintended virtual function overriding.
Closes https://github.com/facebook/rocksdb/pull/3828
Differential Revision: D7933816
Pulled By: siying
fbshipit-source-id: 026a08cb5c5b6d3d6f44743152b4251da4756f2c
Summary:
`ReadaheadRandomAccessFile` had an unwritten assumption, which was that its wrapped file's `Read()` function always copies into the provided scratch buffer. Actually this was not true when the wrapped file was `PosixMmapReadableFile`, whose `Read()` implementation does no copying and instead returns a `Slice` pointing directly into the `mmap`'d memory region. This PR:
- prevents `ReadaheadRandomAccessFile` from ever wrapping mmap readable files
- adds an assert for the assumption `ReadaheadRandomAccessFile` makes about the wrapped file's use of scratch buffer
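To illustrate the assumption, here is a self-contained sketch with hypothetical types (`Slice`, `CopyingFile`, `MmapLikeFile`, and `ResultIsInScratch` are not the RocksDB classes). One file copies into the caller's scratch buffer, the other returns a pointer into its own memory, and a check of the kind the new assert could perform tells them apart:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Reduced stand-in for rocksdb::Slice.
struct Slice {
  const char* data;
  std::size_t size;
};

// Copies the requested bytes into the caller-provided scratch buffer.
class CopyingFile {
 public:
  explicit CopyingFile(std::string contents) : contents_(std::move(contents)) {}
  Slice Read(std::size_t offset, std::size_t n, char* scratch) const {
    if (offset >= contents_.size()) return Slice{scratch, 0};
    n = std::min(n, contents_.size() - offset);
    std::memcpy(scratch, contents_.data() + offset, n);
    return Slice{scratch, n};
  }

 private:
  std::string contents_;
};

// Models an mmap-backed file: ignores scratch and points into its own memory.
class MmapLikeFile {
 public:
  explicit MmapLikeFile(std::string contents) : contents_(std::move(contents)) {}
  Slice Read(std::size_t offset, std::size_t n, char* /*scratch*/) const {
    if (offset >= contents_.size()) return Slice{contents_.data(), 0};
    n = std::min(n, contents_.size() - offset);
    return Slice{contents_.data() + offset, n};
  }

 private:
  std::string contents_;
};

// The condition a readahead wrapper relies on: non-empty results live in scratch.
bool ResultIsInScratch(const Slice& result, const char* scratch) {
  return result.size == 0 || result.data == scratch;
}
```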
Closes https://github.com/facebook/rocksdb/pull/3813
Differential Revision: D7891513
Pulled By: ajkr
fbshipit-source-id: dc64a55222d6af280c39a1852ee39e9e9d7cde7d
Summary:
People also use ON/OFF, TRUE/FALSE and other switch values that are allowed by cmake.
Closes https://github.com/facebook/rocksdb/pull/3814
Differential Revision: D7899032
Pulled By: ajkr
fbshipit-source-id: b71511af59e0a78eedafb639b5002c47050bf3c2
Summary:
TBBROOT and LIBRARY_PATH are set in env by the script.
With TBB 2018 the library path is $TBBROOT/lib/intel64/gcc4.7 for anything above gcc 4.7, which is both compiler and architecture related. We cannot simply do ${TBB_ROOT_DIR}/lib.
Closes https://github.com/facebook/rocksdb/pull/3815
Differential Revision: D7899006
Pulled By: ajkr
fbshipit-source-id: 159ab1f6a5c40452ed6aa8d79300206953d916c2
Summary:
The tsan flavor of this test occasionally times out in our test infra. The patch splits the test into two, each working on half of the option range.
Before:
[ OK ] FaultTest/FaultInjectionTest.FaultTest/0 (5918 ms)
[ OK ] FaultTest/FaultInjectionTest.FaultTest/1 (5336 ms)
After:
[ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/0 (2930 ms)
[ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/1 (2676 ms)
[ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/2 (2759 ms)
[ OK ] FaultTest/FaultInjectionTestSplitted.FaultTest/3 (2546 ms)
Closes https://github.com/facebook/rocksdb/pull/3819
Differential Revision: D7894975
Pulled By: maysamyabandeh
fbshipit-source-id: 809f1411cbcc27f8aa71a6b29a16b039f51b67c9
Summary:
The original commit #3635 hurt performance for users who aren't using range deletions, because of unneeded std::set operations, so it was reverted by commit 44653c7b7a (see #3672).
To fix this, move the set to and add a check in , i.e., file will be added only if is non-nullptr.
The db_bench command that found the performance regression:
> ./db_bench --benchmarks=fillrandom,seekrandomwhilewriting --threads=1 --num=1000000 --reads=150000 --key_size=66 --value_size=1262 --statistics=0 --compression_ratio=0.5 --histogram=1 --seek_nexts=1 --stats_per_interval=1 --stats_interval_seconds=600 --max_background_flushes=4 --num_multi_db=1 --max_background_compactions=16 --seed=1522388277 -write_buffer_size=1048576 --level0_file_num_compaction_trigger=10000 --compression_type=none
I re-ran this command on the machine before and after the modification; the results are as follows:
**fillrandom**
Table | P50 | P75 | P99 | P99.9 | P99.99 |
---- | --- | --- | --- | ----- | ------ |
before commit | 5.92 | 8.57 | 19.63 | 980.97 | 12196.00 |
after commit | 5.91 | 8.55 | 19.34 | 965.56 | 13513.56 |
**seekrandomwhilewriting**
Table | P50 | P75 | P99 | P99.9 | P99.99 |
---- | --- | --- | --- | ----- | ------ |
before commit | 1418.62 | 1867.01 | 3823.28 | 4980.99 | 9240.00 |
after commit | 1450.54 | 1880.61 | 3962.87 | 5429.60 | 7542.86 |
Closes https://github.com/facebook/rocksdb/pull/3800
Differential Revision: D7874245
Pulled By: ajkr
fbshipit-source-id: 2e8bec781b3f7399246babd66395c88619534a17
Summary:
In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed.
Closes https://github.com/facebook/rocksdb/pull/3804
Differential Revision: D7874694
Pulled By: ajkr
fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b
Summary:
Currently HarnessTest.Randomized is already split, but some of the splits are faster than others. The reason is that each split takes a contiguous range of the generated args, and the tests with later args take longer to finish. The patch splits the args evenly among the splits in a round-robin fashion.
Before:
```
[ OK ] HarnessTest.Randomized1n2 (2278 ms)
[ OK ] HarnessTest.Randomized3n4 (1095 ms)
[ OK ] HarnessTest.Randomized5 (658 ms)
[ OK ] HarnessTest.Randomized6 (1258 ms)
[ OK ] HarnessTest.Randomized7 (6476 ms)
[ OK ] HarnessTest.Randomized8 (8182 ms)
```
After:
```
[ OK ] HarnessTest.Randomized1 (2649 ms)
[ OK ] HarnessTest.Randomized2 (2645 ms)
[ OK ] HarnessTest.Randomized3 (2577 ms)
[ OK ] HarnessTest.Randomized4 (2490 ms)
[ OK ] HarnessTest.Randomized5 (2553 ms)
[ OK ] HarnessTest.Randomized6 (2560 ms)
[ OK ] HarnessTest.Randomized7 (2501 ms)
[ OK ] HarnessTest.Randomized8 (2574 ms)
```
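The round-robin assignment can be sketched as follows. `SplitRoundRobin` is a hypothetical helper for illustration, not the test's actual code:

```cpp
#include <cstddef>
#include <vector>

// Distributes args over num_splits buckets in round-robin order, so each
// bucket receives a mix of early (fast) and late (slow) arguments.
template <typename T>
std::vector<std::vector<T>> SplitRoundRobin(const std::vector<T>& args,
                                            std::size_t num_splits) {
  std::vector<std::vector<T>> splits(num_splits);
  if (num_splits == 0) return splits;
  for (std::size_t i = 0; i < args.size(); ++i) {
    splits[i % num_splits].push_back(args[i]);
  }
  return splits;
}
```

With seven args and three splits, split 0 receives args 0, 3, 6, split 1 receives 1, 4, and split 2 receives 2, 5, instead of three contiguous ranges.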
Closes https://github.com/facebook/rocksdb/pull/3808
Differential Revision: D7882663
Pulled By: maysamyabandeh
fbshipit-source-id: 09b749a9684b6d7d65466aa4b00c5334a49e833e