Summary:
In the current MultiGet, if the KV-pairs do not belong to the data blocks in the block cache, multiple blocks are read from a SST. It will trigger one block read for each block request and read them in parallel. In some cases, if some data blocks are adjacent in the SST, the reads for these blocks can be combined to a single large read, which can reduce the system calls and reduce the read latency if possible.
Considering to fill the block cache, if multiple data blocks are in the same memory buffer, we need to copy them to the heap separately. Therefore, only in the case that 1) data block compression is enabled, and 2) compressed block cache is null, we can do combined read. Otherwise, extra memory copy is needed, which may cause extra overhead. In the current case, data blocks will be uncompressed to a new memory space.
Also, in the case that 1) data block compression is enabled, and 2) compressed block cache is null, it is possible the data block is actually not compressed. In the current logic, these data blocks will not be added to the uncompressed_cache. So if memory buffer is shared and the data block is not compressed, the data block are copied to the head and fill the cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6089
Test Plan: Added test case to ParallelIO.MultiGet. Pass make asan_check
Differential Revision: D18734668
Pulled By: zhichao-cao
fbshipit-source-id: 67c5615ed373e51e42635fd74b36f8f3a66d5da4
Summary:
Several options are trivially added to crash test and random values are picked.
Made simple test run non-dynamic level and normal test run dynamic level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176
Test Plan: Run crash_test and watch the printing
Differential Revision: D19053955
fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
Summary:
compaction history is stored in manifest files. Preserve all of them in db_stress would help debugging.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6142
Test Plan: Run db_stress and observe that manifest files are preserved. Run whole crash_test and see how DB directory looks like.
Differential Revision: D19047026
fbshipit-source-id: f83c3e0bb5332b1b4768be5dcee56a24f9b760a9
Summary:
In the current db_stress, all the keys are generated randomly and follows the uniform distribution. In order to test some corner cases that some key are always updated or read, we need to generate the key based on other distributions. In this PR, the key is generated based on Zipfian distribution and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed will be. Not that, usually, if hot_key_alpha is larger than 2, there might be only 1 or 2 keys that are generated. If hot_key_alpha is 0, it generate the key follows uniform distribution (random key)
Testing plan: pass the db_stress and printed the keys to make sure it follows the distribution.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163
Differential Revision: D18978480
Pulled By: zhichao-cao
fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680
Summary:
https://github.com/facebook/rocksdb/pull/6177 introduced a data race
involving `MemTableList::InstallNewVersion` and `MemTableList::NumFlushed`.
The patch fixes this by caching whether the current version has any
memtable history (i.e. flushed memtables that are kept around for
transaction conflict checking) in an `std::atomic<bool>` member called
`current_has_history_`, similarly to how `current_memory_usage_excluding_last_`
is handled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6187
Test Plan:
```
make clean
COMPILE_WITH_TSAN=1 make db_test -j24
./db_test
```
Differential Revision: D19084059
Pulled By: ltamasi
fbshipit-source-id: 327a5af9700fb7102baea2cc8903c085f69543b9
Summary:
The filter bits builder collects all the hashes to add in memory before adding them (because the number of keys is not known until we've walked over all the keys). Existing code uses a std::vector for this, which can mean up to 2x than necessary space allocated (and not freed) and up to ~2x write amplification in memory. Using std::deque uses close to minimal space (for large filters, the only time it matters), no write amplification, frees memory while building, and no need for large contiguous memory area. The only cost is more calls to allocator, which does not appear to matter, at least in benchmark test.
For now, this change only applies to the new (format_version=5) Bloom filter implementation, to ease before-and-after comparison downstream.
Temporary memory use during build is about the only way the new Bloom filter could regress vs. the old (because of upgrade to 64-bit hash) and that should only matter for full filters. This change should largely mitigate that potential regression.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6175
Test Plan:
Using filter_bench with -new_builder option and 6M keys per filter is like large full filter (improvement). 10k keys and no -new_builder is like partitioned filters (about the same). (Corresponding configurations run simultaneously on devserver.)
std::vector impl (before)
$ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -
average_keys_per_filter=6000000
Build avg ns/key: 52.2027
Maximum resident set size (kbytes): 1105016
$ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -
average_keys_per_filter=10000
Build avg ns/key: 30.5694
Maximum resident set size (kbytes): 1208152
std::deque impl (after)
$ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -
average_keys_per_filter=6000000
Build avg ns/key: 39.0697
Maximum resident set size (kbytes): 1087196
$ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -
average_keys_per_filter=10000
Build avg ns/key: 30.9348
Maximum resident set size (kbytes): 1207980
Differential Revision: D19053431
Pulled By: pdillinger
fbshipit-source-id: 2888e748723a19d9ea40403934f13cbb8483430c
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171
Test Plan:
```
make -j32 crash_test
Differential Revision: D19038399
Pulled By: maysamyabandeh
fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
Summary:
We have observed an increase in CPU load caused by frequent calls to
`ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
when using `max_write_buffer_size_to_maintain` to limit the amount of
memtable history maintained for transaction conflict checking. Part of the issue
is that trimming can potentially be scheduled even if there is no memtable
history. The patch adds a check that fixes this.
See also https://github.com/facebook/rocksdb/pull/6169.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6177
Test Plan:
Compared `perf` output for
```
./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
```
before and after the change. There is a significant reduction for the call chain
`rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` ->
`rocksdb::ThreadLocalPtr::StaticMeta::Scrape` even without https://github.com/facebook/rocksdb/pull/6169.
Differential Revision: D19057445
Pulled By: ltamasi
fbshipit-source-id: dff81882d7b280e17eda7d9b072a2d4882c50f79
Summary:
Close asserts that there is no unreleased snapshots. For WritePrepared transaction, this means that the background work that holds on a snapshot must be canceled first. Update the stress tests to respect the sequence.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174
Test Plan:
```
make -j32 crash_test
Differential Revision: D19057322
Pulled By: maysamyabandeh
fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362
Summary:
Periodic compactions ensure that even SSTs that do not get picked up
otherwise eventually go through compaction; used in conjunction with
BlobDB's garbage collection, they enable BlobDB to reclaim space when
old blob files are used by such straggling SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6172
Test Plan: Ran `make check` and used the BlobDB mode of `db_bench`.
Differential Revision: D19045045
Pulled By: ltamasi
fbshipit-source-id: 04636ecc4b6cfe8d495bf656faa65d54a5eb1a93
Summary:
The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.
This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.
The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
Differential Revision: D18868376
Pulled By: anand1976
fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
Summary:
And clean up related code, especially in stress test.
(More clean up of db_stress_test_base.cc coming after this.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154
Test Plan: make check, make blackbox_crash_test for a bit
Differential Revision: D18938180
Pulled By: pdillinger
fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
Summary:
We have observed an increase in CPU load caused by frequent calls to
`ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
when using `max_write_buffer_size_to_maintain` to limit the amount of
memtable history maintained for transaction conflict checking. As it turns out,
this is caused by the code creating and installing a new `SuperVersion` even if
no memtables were actually trimmed. The patch adds a check to avoid this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6169
Test Plan:
Compared `perf` output for
```
./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
```
before and after the change. With the fix, the call chain `rocksdb::DBImpl::TrimMemtableHistory` ->
`rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape`
no longer registers in the `perf` report.
Differential Revision: D19031509
Pulled By: ltamasi
fbshipit-source-id: 02686fce594e5b50eba0710e4b28a9b808c8aa20
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164
Test Plan:
```
make -j32 crash_test_with_txn
Differential Revision: D18991899
Pulled By: maysamyabandeh
fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
Summary:
The patch adds logic that relocates live blobs from the oldest N non-TTL
blob files as they are encountered during compaction (assuming the BlobDB
configuration option `enable_garbage_collection` is `true`), where N is defined
as the number of immutable non-TTL blob files multiplied by the value of
a new BlobDB configuration option called `garbage_collection_cutoff`.
(The default value of this parameter is 0.25, that is, by default the valid blobs
residing in the oldest 25% of immutable non-TTL blob files are relocated.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6121
Test Plan: Added unit test and tested using the BlobDB mode of `db_bench`.
Differential Revision: D18785357
Pulled By: ltamasi
fbshipit-source-id: 8c21c512a18fba777ec28765c88682bb1a5e694e
Summary:
It's easy to cause coredump when closing ColumnFamilyHandle with unreleased iterators, especially iterators release is controlled by java GC when using JNI.
This patch fixed concurrent CF iteration and drop, we let iterators(actually SuperVersion) hold a ColumnFamilyData reference to prevent the CF from being released too early.
fixed https://github.com/facebook/rocksdb/issues/5982
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6147
Differential Revision: D18926378
fbshipit-source-id: 1dff6d068c603d012b81446812368bfee95a5e15
Summary:
Read keys from a snapshot that a range deletion were added after the snapshot was created and this range deletion was inside an immutable memtable, we will get wrong key set.
More detail rest in codes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6062
Differential Revision: D18966785
Pulled By: pdillinger
fbshipit-source-id: 38a60bb1e2d0a1dbfc8ec641617200b6a02b86c3
Summary:
**Summary:**
This PR fixes two unordered_write related issues:
- ingestion job may skip the necessary memtable flush https://github.com/facebook/rocksdb/issues/6026
- compact range may cause memtable is flushed before pending unordered write finished
1. `CompactRange` triggers memtable flush but doesn't wait for pending-writes
2. there are some pending writes but memtable is already flushed
3. the memtable related WAL is removed( note that the pending-writes were recorded in that WAL).
4. pending-writes write to newer created memtable
5. there is a restart
6. missing the previous pending-writes because WAL is removed but they aren't included in SST.
**How to solve:**
- Wait pending memtable writes before ingestion job check memtable key range
- Wait pending memtable writes before flush memtable.
**Note that: `CompactRange` calls `RangesOverlapWithMemtables` too without waiting for pending waits, but I'm not sure whether it affects the correctness.**
**Test Plan:**
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6113
Differential Revision: D18895674
Pulled By: maysamyabandeh
fbshipit-source-id: da22b4476fc7e06c176020e7cc171eb78189ecaf
Summary:
While the instruction of installing "make format" dependencies works on some platforms, it is hard to use for some others. Improve it a little bit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6162
Test Plan: Run "make format" on an envrionment missing the dependencies and see the instructions printed out
Differential Revision: D18970773
fbshipit-source-id: fd21b31053407cc171a6675f781a556a1c3e8945
Summary:
This change fixes a source issue that caused compile time error which breaks build for many fbcode services in that setup. The size() member function of channel is a const member, so member variables accessed within it are implicitly const as well. This caused error when clang fails to resolve to a constructor that takes std::mutex because the suitable constructor got rejected due to loss of constness for its argument. The fix is to add mutable modifier to the lock_ member of channel.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6161
Differential Revision: D18967685
Pulled By: maysamyabandeh
fbshipit-source-id: 698b6a5153c3c92eeacb842c467aa28cc350d432
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158
Test Plan:
```
make -j32 crash_test_with_txn
```
Differential Revision: D18946307
Pulled By: maysamyabandeh
fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
Summary:
This test was recently updated but failed to account for Bloom
schema variance by CACHE_LINE_SIZE. (Since CACHE_LINE_SIZE is not
defined in our C code, the test now simply allows a valid result for any
CACHE_LINE_SIZE, not just the current one.)
Unblock https://github.com/facebook/rocksdb/issues/5932
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6153
Test Plan:
ran unit test with builds TEST_CACHE_LINE_SIZE=128, =256, and
unset (64 on Intel)
Differential Revision: D18936015
Pulled By: pdillinger
fbshipit-source-id: e5e3852f95283d34d624632c1ae8d3adb2f2662c
Summary:
`low_pri_write_rate_limiter_` is not being used. Removing. `WriteController` has an internal low_pri rate limiter which is the real rate limiter for low-pri writes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6068
Test Plan: make
Differential Revision: D18664120
fbshipit-source-id: dfe3e4de033cf3522b67781b383aad7d0936034c
Summary:
Adds example to show the difference of reading from snapshot and from the latest state.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6059
Test Plan: cd examples && make transaction_example && ./transaction_example
Differential Revision: D18797616
fbshipit-source-id: f17a2cb12187092ea243159e6ccf55790859e0c0
Summary:
Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be
called once every N operations on average.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149
Test Plan:
```
$make db_stress
$./db_stress -sync_wal_one_in=100 -ops_per_thread=100000
```
Differential Revision: D18922529
Pulled By: riversand963
fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56
Summary:
CancelAllBackgroundWork() and Close() are frequently used features but we don't cover it in stress test. Simply execute them before closing the DB with 1/2 chance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141
Test Plan: Run "db_stress".
Differential Revision: D18900861
fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0
Summary:
This is required to compile on Windows with Visual Studio 2015, which is used for creating the RocksJava releases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5446
Differential Revision: D18924811
fbshipit-source-id: a183a62e79a2af5aaf59cd08235458a172fe7dcb
Summary:
Add an option to explicitly disable building shared versions of the
RocksDB libraries. The shared libraries cannot be built in cases where
some dependencies are only available as static libraries. This allows
still building RocksDB in these situations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122
Differential Revision: D18920740
fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
Summary:
Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simply validation which compares hash for all keys within the compact range to stay the same against the same snapshot before and after the compaction.
Also, randomly tune most knobs of CompactRangeOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140
Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually ingest some hacky code and observe the error path.
Differential Revision: D18900230
fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10
Summary:
Error message when running `make` on Mac OS with master branch (v6.6.0):
```
$ make
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
#include <folly/synchronization/WaitOptions.h>
^
1 error generated.
third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
#include <folly/synchronization/ParkingLot.h>
^
1 error generated.
third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
#include <folly/synchronization/DistributedMutex.h>
^
1 error generated.
third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
#include <folly/synchronization/AtomicNotification.h>
^
1 error generated.
third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
#include <folly/detail/Futex.h>
^
1 error generated.
GEN util/build_version.cc
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
#include <folly/synchronization/WaitOptions.h>
^
1 error generated.
third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
#include <folly/synchronization/ParkingLot.h>
^
1 error generated.
third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
#include <folly/synchronization/DistributedMutex.h>
^
1 error generated.
third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
#include <folly/synchronization/AtomicNotification.h>
^
1 error generated.
third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
#include <folly/detail/Futex.h>
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6145
Differential Revision: D18910812
fbshipit-source-id: 5a4475466c2d0601657831a0b48d34316b2f0816
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:
1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.
1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103
Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output
Differential Revision: D18734877
Pulled By: pdillinger
fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
Summary:
Formatter somehow complains some recent lines changed. Apply them to make the formatter happy.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6138
Test Plan: See CI passes.
Differential Revision: D18895950
fbshipit-source-id: 7d1696cf3e3a682bc10a30cdca748a23c6565255
Summary:
Two changes:
1. Prevent static variables in a header file
2. Add "override" keyword when virtual functions are overridden.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6137
Test Plan: Build db_stress with or without LITE.
Differential Revision: D18892007
fbshipit-source-id: 295356427a34473b23ed36d6ed4ef3ae35a32db0
Summary:
Add overrides needed in FilterPolicy wrapper to fix
rocksdb_filterpolicy_create_bloom_full (see issue https://github.com/facebook/rocksdb/issues/6129). Re-enabled
assertion in BloomFilterPolicy::CreateFilter that was being violated.
Expanded c_test to identify Bloom filter implementations by FP counts.
(Without the fix, updated test will trigger assertion and fail otherwise
without the assertion.)
Fixes https://github.com/facebook/rocksdb/issues/6129
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6132
Test Plan: updated c_test, also run under valgrind.
Differential Revision: D18864911
Pulled By: pdillinger
fbshipit-source-id: 08e81d7b5368b08e501cd402ef5583f2650c19fa
Summary:
thread_local_test now fails because it asserts no thread local instance is created when the test started. However, right now a thread local instance might be created when creating PosixEnv as a static variable. Fix the test by relaxing the assumption of starting from 0.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6136
Test Plan: Find an environment where the test fails, and see it passes with the fix applied.
Differential Revision: D18889224
fbshipit-source-id: 7946f3bfea81d236f7bb1554076696705b211b92
Summary:
BlockBasedTableBuilder uses ExtractUserKey in EnterUnbuffered. This would
cause index filter building error, since user-provided timestamp is supported
by ExtractUserKeyAndStripTimestamp, and it's used in Add. This commit changes
ExtractUserKey to ExtractUserKeyAndStripTimestamp.
A test case is also added by modifying DBBasicTestWithTimestampWithParam_
PutAndGet test in db_basic_test to cover ExtractUserKeyAndStripTimestamp usage
in both kBuffered and kUnbuffered state of BlockBasedTableBuilder.
Before the ExtractUserKeyAndStripTimstamp fix:
```
$ ./db_basic_test --gtest_filter="*PutAndGet*"
Note: Google Test filter = *PutAndGet*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam
[ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
[ FAILED ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0, where GetParam() = false (1177 ms)
[ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1
[ OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 (1056 ms)
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam (2233 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (2233 ms total)
[ PASSED ] 1 test.
[ FAILED ] 1 test, listed below:
[ FAILED ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0, where GetParam() = false
1 FAILED TEST
```
After the ExtractUserKeyAndStripTimstamp fix:
```
$ ./db_basic_test --gtest_filter="*PutAndGet*"
Note: Google Test filter = *PutAndGet*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam
[ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0
[ OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0 (1417 ms)
[ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1
[ OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 (1041 ms)
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam (2458 ms total)
[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (2458 ms total)
[ PASSED ] 2 tests.
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6100
Differential Revision: D18769654
Pulled By: riversand963
fbshipit-source-id: 76c2cf2c9a5e0d85db95d98e812e6af0c2a15c6b
Summary:
ASAN reports:
internal_repo_rocksdb/repo:db_test - MultiThreaded/MultiThreadedDBTest.MultiThreaded/43: fatal
==2692739==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6130000500ca at pc 0x0000006be780 bp 0x7efef85ccd20 sp 0x7efef85cc4d0
[CONTEXT] === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
[CONTEXT] READ of size 331 at 0x6130000500ca thread T195
[CONTEXT] #0 db_test_bin+0x6be77f __interceptor_strlen.part.35
[CONTEXT] https://github.com/facebook/rocksdb/issues/1 internal_repo_rocksdb/repo/include/rocksdb/slice.h:55 rocksdb::Slice::Slice(char const*)
[CONTEXT] https://github.com/facebook/rocksdb/issues/2 internal_repo_rocksdb/repo/env/io_posix.cc:522 rocksdb::PosixRandomAccessFile::MultiRead(rocksdb::ReadRequest*, unsigned long)
I looked at env/io_posix.cc:522 but don't see a reason why the line needs to be there at all, because it is not used before overwritten. So it must be a line that is put there as a bug. Remove it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6135
Test Plan: Rerun the same test which passes after the fix. Run all the tests and make sure they all pass.
Differential Revision: D18880251
fbshipit-source-id: 3b84ac6a05b67b529c4202e0ceb4c047460f44f2