8583 Commits

Author SHA1 Message Date
Zhichao Cao
cddd637997 Merge adjacent file block reads in RocksDB MultiGet() and Add uncompressed block to cache (#6089)
Summary:
In the current MultiGet, if the KV-pairs do not belong to data blocks already in the block cache, multiple blocks are read from an SST. This triggers one block read per block request, and the reads are issued in parallel. In some cases, if some data blocks are adjacent in the SST, the reads for these blocks can be combined into a single large read, which can reduce the number of system calls and, where possible, the read latency.

When filling the block cache, if multiple data blocks share the same memory buffer, each one has to be copied to the heap separately. Therefore, the combined read is used only when 1) data block compression is enabled, and 2) the compressed block cache is null. Otherwise, an extra memory copy is needed, which may cause extra overhead. In the supported case, data blocks are uncompressed into a new memory space anyway.

Also, when 1) data block compression is enabled, and 2) the compressed block cache is null, a data block may still not actually be compressed. In the current logic, such data blocks would not be added to the uncompressed_cache. So if the memory buffer is shared and the data block is not compressed, the data block is copied to the heap and added to the cache.
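
A minimal sketch of the read-combining idea described above, not the code from this patch. `ReadReq` and `CoalesceAdjacent` are hypothetical names, and the input is assumed to be sorted by offset.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical read request: a byte range inside one SST file.
struct ReadReq {
  uint64_t offset;
  size_t len;
};

// Merge requests whose byte ranges touch or overlap.
// Assumes `sorted` is ordered by ascending offset.
std::vector<ReadReq> CoalesceAdjacent(const std::vector<ReadReq>& sorted) {
  std::vector<ReadReq> merged;
  for (const ReadReq& r : sorted) {
    if (!merged.empty() &&
        merged.back().offset + merged.back().len >= r.offset) {
      // Extend the previous read so that it also covers this block.
      uint64_t end = std::max<uint64_t>(
          merged.back().offset + merged.back().len, r.offset + r.len);
      merged.back().len = static_cast<size_t>(end - merged.back().offset);
    } else {
      merged.push_back(r);
    }
  }
  return merged;
}
```

Each entry of the result would correspond to one file read; the individual blocks are then sliced out of the shared buffer.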
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6089

Test Plan: Added test case to ParallelIO.MultiGet. Pass make asan_check

Differential Revision: D18734668

Pulled By: zhichao-cao

fbshipit-source-id: 67c5615ed373e51e42635fd74b36f8f3a66d5da4
2019-12-16 16:26:03 -08:00
sdong
bcc372c0c3 Add some new options to crash_test (#6176)
Summary:
Several options are added to the crash test in a trivial way, with random values picked for them.
The simple test now runs with non-dynamic levels and the normal test with dynamic levels.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176

Test Plan: Run crash_test and watch the printing

Differential Revision: D19053955

fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
2019-12-16 15:43:13 -08:00
Levi Tamasi
2d095b4dbc Update HISTORY.md with the recent memtable trimming fixes
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6194

Differential Revision: D19125292

Pulled By: ltamasi

fbshipit-source-id: d41aca2755ec4bec07feedd6b561e8d18606a931
2019-12-16 15:19:52 -08:00
sdong
35126dd874 db_stress: preserve all historic manifest files (#6142)
Summary:
Compaction history is stored in manifest files. Preserving all of them in db_stress would help debugging.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6142

Test Plan: Run db_stress and observe that manifest files are preserved. Run the whole crash_test and see what the DB directory looks like.

Differential Revision: D19047026

fbshipit-source-id: f83c3e0bb5332b1b4768be5dcee56a24f9b760a9
2019-12-16 14:32:34 -08:00
Zhichao Cao
fbda25f57a db_stress: generate the key based on Zipfian distribution (hot key) (#6163)
Summary:
In the current db_stress, all keys are generated randomly and follow a uniform distribution. To test corner cases in which some keys are always updated or read, we need to generate keys based on other distributions. In this PR, keys are generated based on a Zipfian distribution, and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed the distribution will be. Note that, usually, if hot_key_alpha is larger than 2, there might be only 1 or 2 keys that are generated. If hot_key_alpha is 0, keys are generated following the uniform distribution (random keys).
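
To illustrate how an alpha parameter controls skew, here is a small sketch of Zipfian key selection. It is not the db_stress implementation: `ZipfCdf` and `PickKey` are hypothetical helpers, and the weight `1 / rank^alpha` is the textbook Zipf form, assumed here.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Cumulative probabilities for ranks 1..n with weight 1 / rank^alpha.
// alpha == 0 gives equal weights, i.e. a uniform distribution.
// Assumes n > 0.
std::vector<double> ZipfCdf(size_t n, double alpha) {
  std::vector<double> cdf(n);
  double sum = 0.0;
  for (size_t i = 0; i < n; ++i) {
    sum += 1.0 / std::pow(static_cast<double>(i + 1), alpha);
    cdf[i] = sum;
  }
  for (double& c : cdf) c /= sum;
  return cdf;
}

// Map a uniform random number u in [0, 1) to a key index in [0, n).
size_t PickKey(const std::vector<double>& cdf, double u) {
  auto it = std::upper_bound(cdf.begin(), cdf.end(), u);
  if (it == cdf.end()) --it;  // guard against rounding at the top end
  return static_cast<size_t>(it - cdf.begin());
}
```

With 4 keys and alpha = 1, the first key already receives about 48% of the draws; with alpha = 0 each key receives 25%.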

Testing plan: passed db_stress and printed the keys to make sure they follow the distribution.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163

Differential Revision: D18978480

Pulled By: zhichao-cao

fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680
2019-12-16 14:01:58 -08:00
Levi Tamasi
db7c687523 Fix a data race related to memtable trimming (#6187)
Summary:
https://github.com/facebook/rocksdb/pull/6177 introduced a data race
involving `MemTableList::InstallNewVersion` and `MemTableList::NumFlushed`.
The patch fixes this by caching whether the current version has any
memtable history (i.e. flushed memtables that are kept around for
transaction conflict checking) in an `std::atomic<bool>` member called
`current_has_history_`, similarly to how `current_memory_usage_excluding_last_`
is handled.
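
The pattern can be sketched as follows. This is a simplified stand-in, not the RocksDB `MemTableList`: `ToyMemTableList` and its `int` history entries are hypothetical, and only the `current_has_history_` name comes from the description above.

```cpp
#include <atomic>
#include <mutex>
#include <utility>
#include <vector>

// Simplified stand-in for a memtable list; not the RocksDB class.
class ToyMemTableList {
 public:
  // Writer path: swaps in a new version under the mutex and
  // refreshes the cached flag while still holding it.
  void InstallNewVersion(std::vector<int> history) {
    std::lock_guard<std::mutex> guard(mu_);
    history_ = std::move(history);
    current_has_history_.store(!history_.empty());
  }

  // Reader path: touches only the atomic, so it needs no mutex and
  // cannot race with InstallNewVersion on history_.
  bool HasHistory() const { return current_has_history_.load(); }

 private:
  std::mutex mu_;
  std::vector<int> history_;
  std::atomic<bool> current_has_history_{false};
};
```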
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6187

Test Plan:
```
make clean
COMPILE_WITH_TSAN=1 make db_test -j24
./db_test
```

Differential Revision: D19084059

Pulled By: ltamasi

fbshipit-source-id: 327a5af9700fb7102baea2cc8903c085f69543b9
2019-12-16 13:16:31 -08:00
Peter Dillinger
a92bd0a183 Optimize memory and CPU for building new Bloom filter (#6175)
Summary:
The filter bits builder collects all the hashes to add in memory before adding them (because the number of keys is not known until we've walked over all the keys). Existing code uses a std::vector for this, which can mean up to 2x more space allocated than necessary (and not freed) and up to ~2x write amplification in memory. Using std::deque uses close to minimal space (for large filters, the only time it matters), has no write amplification, frees memory while building, and needs no large contiguous memory area. The only cost is more calls to the allocator, which does not appear to matter, at least in the benchmark test.
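
A rough sketch of the buffering pattern, with a hypothetical `ToyHashCollector` class rather than the actual filter bits builder. Whether `pop_front` returns memory to the allocator right away depends on the standard library; common implementations release each internal block once it is empty.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Toy builder: buffers 64-bit hashes until the key count is known.
class ToyHashCollector {
 public:
  // push_back on a deque never relocates existing elements, so there
  // is no copy of the whole buffer on growth as with std::vector.
  void AddHash(uint64_t h) { hashes_.push_back(h); }
  size_t Count() const { return hashes_.size(); }

  // Consume hashes front to back, handing each one to the callback.
  // Popping as we go lets the deque give up storage during the build.
  template <typename Fn>
  void Drain(Fn&& add_to_filter) {
    while (!hashes_.empty()) {
      add_to_filter(hashes_.front());
      hashes_.pop_front();
    }
  }

 private:
  std::deque<uint64_t> hashes_;
};
```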

For now, this change only applies to the new (format_version=5) Bloom filter implementation, to ease before-and-after comparison downstream.

Temporary memory use during build is about the only way the new Bloom filter could regress vs. the old (because of upgrade to 64-bit hash) and that should only matter for full filters. This change should largely mitigate that potential regression.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6175

Test Plan:
Using filter_bench with the -new_builder option and 6M keys per filter resembles a large full filter (improvement). 10k keys and no -new_builder resembles partitioned filters (about the same). (Corresponding configurations were run simultaneously on a devserver.)

std::vector impl (before)

    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -average_keys_per_filter=6000000
    Build avg ns/key: 52.2027
    Maximum resident set size (kbytes): 1105016
    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -average_keys_per_filter=10000
    Build avg ns/key: 30.5694
    Maximum resident set size (kbytes): 1208152

std::deque impl (after)

    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -new_builder -working_mem_size_mb=1000 -average_keys_per_filter=6000000
    Build avg ns/key: 39.0697
    Maximum resident set size (kbytes): 1087196
    $ /usr/bin/time -v ./filter_bench -impl=2 -quick -working_mem_size_mb=1000 -average_keys_per_filter=10000
    Build avg ns/key: 30.9348
    Maximum resident set size (kbytes): 1207980

Differential Revision: D19053431

Pulled By: pdillinger

fbshipit-source-id: 2888e748723a19d9ea40403934f13cbb8483430c
2019-12-15 21:31:08 -08:00
anand76
ad34faba15 Fix unity test (#6178)
Summary:
Fix the test failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6178

Differential Revision: D19071208

Pulled By: maysamyabandeh

fbshipit-source-id: 71622832ac93ff2663946c546d9642d5b9e3d194
2019-12-14 15:39:41 -08:00
Maysam Yabandeh
4b97812da8 Add long-running snapshots to stress tests (#6171)
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
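
One way to read the percentages above, as a sketch only. `SnapshotHoldOps` is a hypothetical helper, and treating the 1% as a subset of the 10% is an assumption; the message does not say how the two groups overlap.

```cpp
#include <cstdint>

// Given a roll in [0, 100), return how many operations to hold a
// snapshot for. Assumption: the 1% group is part of the 10% group.
uint64_t SnapshotHoldOps(uint32_t roll, uint64_t base_ops) {
  uint64_t hold = base_ops;
  if (roll < 10) {
    hold *= 10;  // 10% of snapshots: held 10x longer
    if (roll < 1) {
      hold *= 10;  // 1% of snapshots: held 100x longer in total
    }
  }
  return hold;
}
```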
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171

Test Plan:
```
make -j32 crash_test
```

Differential Revision: D19038399

Pulled By: maysamyabandeh

fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
2019-12-14 15:22:40 -08:00
Levi Tamasi
bd8404feff Do not schedule memtable trimming if there is no history (#6177)
Summary:
We have observed an increase in CPU load caused by frequent calls to
`ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
when using `max_write_buffer_size_to_maintain` to limit the amount of
memtable history maintained for transaction conflict checking. Part of the issue
is that trimming can potentially be scheduled even if there is no memtable
history. The patch adds a check that fixes this.

See also https://github.com/facebook/rocksdb/pull/6169.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6177

Test Plan:
Compared `perf` output for

```
./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
```

before and after the change. There is a significant reduction for the call chain
`rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` ->
`rocksdb::ThreadLocalPtr::StaticMeta::Scrape` even without https://github.com/facebook/rocksdb/pull/6169.

Differential Revision: D19057445

Pulled By: ltamasi

fbshipit-source-id: dff81882d7b280e17eda7d9b072a2d4882c50f79
2019-12-13 19:11:19 -08:00
Maysam Yabandeh
349bd3ed82 CancelAllBackgroundWork before Close in db stress (#6174)
Summary:
Close asserts that there are no unreleased snapshots. For WritePrepared transactions, this means that the background work that holds on to a snapshot must be canceled first. Update the stress tests to respect this sequence.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174

Test Plan:
```
make -j32 crash_test
```

Differential Revision: D19057322

Pulled By: maysamyabandeh

fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362
2019-12-13 18:22:50 -08:00
Adam Retter
edbf0e2d90 Env should also load the native library (#6167)
Summary:
Closes https://github.com/facebook/rocksdb/issues/6118
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6167

Differential Revision: D19053577

Pulled By: pdillinger

fbshipit-source-id: 86aca9a5bec0947a641649b515da17b3cb12bdde
2019-12-13 16:27:55 -08:00
Levi Tamasi
0d2172f128 Make it possible to enable periodic compactions for BlobDB (#6172)
Summary:
Periodic compactions ensure that even SSTs that do not get picked up
otherwise eventually go through compaction; used in conjunction with
BlobDB's garbage collection, they enable BlobDB to reclaim space when
old blob files are used by such straggling SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6172

Test Plan: Ran `make check` and used the BlobDB mode of `db_bench`.

Differential Revision: D19045045

Pulled By: ltamasi

fbshipit-source-id: 04636ecc4b6cfe8d495bf656faa65d54a5eb1a93
2019-12-13 16:13:25 -08:00
anand76
afa2420c2b Introduce a new storage specific Env API (#5761)
Summary:
The current Env API encompasses both storage/file operations and OS-related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable or not, the scope (i.e. fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.

This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.

The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.

This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.

The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
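
The translation step can be pictured with toy types. Everything below is a hypothetical stand-in (`ToyIOStatus`, `ToyStatus`, `ToStatus`), not the RocksDB classes; only the rule "retryable becomes `kSoftError`" is taken from the description above. What happens to non-retryable errors is not stated, so the sketch leaves their severity at a placeholder value.

```cpp
// Toy stand-ins; the real RocksDB classes carry far more state.
enum class ToySeverity { kNoError, kSoftError, kUnspecified };

struct ToyIOStatus {
  bool ok;
  bool retryable;
};

struct ToyStatus {
  bool ok;
  ToySeverity severity;
};

// Convert an I/O-level result into a general status.
ToyStatus ToStatus(const ToyIOStatus& io) {
  if (io.ok) return {true, ToySeverity::kNoError};
  // Retryable I/O errors are marked soft so that recovery can run.
  // kUnspecified is a placeholder for cases the source does not describe.
  return {false,
          io.retryable ? ToySeverity::kSoftError : ToySeverity::kUnspecified};
}
```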
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761

Differential Revision: D18868376

Pulled By: anand1976

fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
2019-12-13 14:48:41 -08:00
Peter Dillinger
58d46d1915 Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154)
Summary:
And clean up related code, especially in stress test.

(More clean up of db_stress_test_base.cc coming after this.)
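
For readers unfamiliar with these idioms, a sketch of what helpers with these names plausibly do. `ToyRandom` is a hypothetical wrapper over `std::mt19937`, not the RocksDB `Random` class, and the semantics shown (0 disables `OneInOpt`; `PercentTrue` compares a draw from [0, 100) with the percentage) are assumptions inferred from the names.

```cpp
#include <cstdint>
#include <random>

// Toy RNG wrapper; not the RocksDB Random class.
class ToyRandom {
 public:
  explicit ToyRandom(uint32_t seed) : gen_(seed) {}

  // Uniform integer in [0, n). Requires n > 0.
  uint32_t Uniform(uint32_t n) {
    return std::uniform_int_distribution<uint32_t>(0, n - 1)(gen_);
  }
  // True with probability 1/n. Requires n > 0.
  bool OneIn(uint32_t n) { return Uniform(n) == 0; }
  // Like OneIn, but n == 0 means "option disabled" and is always false.
  bool OneInOpt(uint32_t n) { return n > 0 && OneIn(n); }
  // True for roughly `percent` out of every 100 calls.
  bool PercentTrue(uint32_t percent) { return Uniform(100) < percent; }

 private:
  std::mt19937 gen_;
};
```

The optional variant removes the `if (flag > 0 && rand.OneIn(flag))` boilerplate that `*_one_in` style flags otherwise need at every call site.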
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154

Test Plan: make check, make blackbox_crash_test for a bit

Differential Revision: D18938180

Pulled By: pdillinger

fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
2019-12-13 14:30:14 -08:00
Levi Tamasi
6d54eb3dc2 Do not create/install new SuperVersion if nothing was deleted during memtable trim (#6169)
Summary:
We have observed an increase in CPU load caused by frequent calls to
`ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
when using `max_write_buffer_size_to_maintain` to limit the amount of
memtable history maintained for transaction conflict checking. As it turns out,
this is caused by the code creating and installing a new `SuperVersion` even if
no memtables were actually trimmed. The patch adds a check to avoid this.
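
The shape of the check, sketched with a hypothetical `ToyColumnFamily` type (not RocksDB code): the trim step reports whether it removed anything, and the expensive install step runs only if it did.

```cpp
#include <cstddef>
#include <deque>

// Toy model of one column family's memtable history.
struct ToyColumnFamily {
  std::deque<size_t> history_sizes;  // bytes per flushed memtable kept
  size_t max_history_bytes = 0;
  int super_versions_installed = 0;

  // Drop the oldest entries until under budget.
  // Returns true only if at least one entry was dropped.
  bool TrimHistory() {
    size_t total = 0;
    for (size_t s : history_sizes) total += s;
    bool trimmed = false;
    while (!history_sizes.empty() && total > max_history_bytes) {
      total -= history_sizes.front();
      history_sizes.pop_front();
      trimmed = true;
    }
    return trimmed;
  }

  // Only pay for a new SuperVersion when the memtable list changed.
  void MaybeInstallSuperVersion() {
    if (TrimHistory()) ++super_versions_installed;
  }
};
```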
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6169

Test Plan:
Compared `perf` output for

```
./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
```

before and after the change. With the fix, the call chain `rocksdb::DBImpl::TrimMemtableHistory` ->
`rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape`
no longer registers in the `perf` report.

Differential Revision: D19031509

Pulled By: ltamasi

fbshipit-source-id: 02686fce594e5b50eba0710e4b28a9b808c8aa20
2019-12-13 13:29:29 -08:00
Kefu Chai
ac304adf46 cmake: do not build tests for Release build and cleanups (#5916)
Summary:
fixes https://github.com/facebook/rocksdb/issues/2445
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5916

Differential Revision: D19031236

fbshipit-source-id: bc3107b6b25a01958677d7cb411b1f381aae91c6
2019-12-13 12:48:06 -08:00
Maysam Yabandeh
fec7302a9d Enable unordered_write in stress tests (#6164)
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164

Test Plan:
```
make -j32 crash_test_with_txn
```

Differential Revision: D18991899

Pulled By: maysamyabandeh

fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
2019-12-13 10:25:04 -08:00
Levi Tamasi
583c6953d8 Move out valid blobs from the oldest blob files during compaction (#6121)
Summary:
The patch adds logic that relocates live blobs from the oldest N non-TTL
blob files as they are encountered during compaction (assuming the BlobDB
configuration option `enable_garbage_collection` is `true`), where N is defined
as the number of immutable non-TTL blob files multiplied by the value of
a new BlobDB configuration option called `garbage_collection_cutoff`.
(The default value of this parameter is 0.25, that is, by default the valid blobs
residing in the oldest 25% of immutable non-TTL blob files are relocated.)
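
As a worked example of the formula, with a hypothetical helper name. Rounding down to a whole number of files is an assumption; the description only gives the product.

```cpp
#include <cstddef>

// Number of oldest immutable non-TTL blob files whose live blobs are
// relocated. Assumption: the product is truncated to a whole file count.
size_t GcCutoffCount(size_t immutable_non_ttl_files,
                     double garbage_collection_cutoff) {
  return static_cast<size_t>(garbage_collection_cutoff *
                             static_cast<double>(immutable_non_ttl_files));
}
```

With the default cutoff of 0.25 and 8 immutable non-TTL blob files, the oldest 2 files are candidates.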
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6121

Test Plan: Added unit test and tested using the BlobDB mode of `db_bench`.

Differential Revision: D18785357

Pulled By: ltamasi

fbshipit-source-id: 8c21c512a18fba777ec28765c88682bb1a5e694e
2019-12-13 10:13:05 -08:00
Jermy Li
c2029f9716 Support concurrent CF iteration and drop (#6147)
Summary:
It is easy to cause a coredump by closing a ColumnFamilyHandle while it still has unreleased iterators, especially when using JNI, where the release of iterators is controlled by the Java GC.

This patch fixes concurrent CF iteration and drop: iterators (actually the SuperVersion) now hold a ColumnFamilyData reference to prevent the CF from being released too early.

Fixes https://github.com/facebook/rocksdb/issues/5982
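
The lifetime rule can be sketched with `std::shared_ptr`. This swaps in shared ownership purely for illustration and is not how the patch implements the reference; all `Toy*` types are hypothetical.

```cpp
#include <memory>

struct ToyColumnFamilyData {
  int id;
};

// The iterator's view keeps the column family alive via shared ownership.
struct ToySuperVersion {
  std::shared_ptr<ToyColumnFamilyData> cfd;
};

struct ToyIterator {
  std::shared_ptr<ToySuperVersion> sv;
  // Safe even after the handle is gone, because sv still owns cfd.
  int CfId() const { return sv->cfd->id; }
};
```

Dropping the handle's reference first no longer frees the column family data; it is released only when the last iterator goes away.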
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6147

Differential Revision: D18926378

fbshipit-source-id: 1dff6d068c603d012b81446812368bfee95a5e15
2019-12-12 19:04:48 -08:00
myasuka
4b74035e40 Correct java docs of RocksDB options (#6123)
Summary:
Correct javadocs of several RocksDB option classes to not mislead RocksJava users.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6123

Differential Revision: D18989044

Pulled By: pdillinger

fbshipit-source-id: a5ac6a415e5311084b10d973d354e6925788f01e
2019-12-12 18:10:03 -08:00
奏之章
c4ce8e637f Fix RangeDeletion bug (#6062)
Summary:
When reading keys from a snapshot, if a range deletion was added after the snapshot was created and that range deletion is inside an immutable memtable, the wrong key set is returned.
More details are in the code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6062

Differential Revision: D18966785

Pulled By: pdillinger

fbshipit-source-id: 38a60bb1e2d0a1dbfc8ec641617200b6a02b86c3
2019-12-12 15:18:02 -08:00
Connor
a844591201 wait pending memtable writes on file ingestion or compact range (#6113)
Summary:
**Summary:**
This PR fixes two unordered_write related issues:
- the ingestion job may skip a necessary memtable flush https://github.com/facebook/rocksdb/issues/6026
- compact range may cause the memtable to be flushed before pending unordered writes have finished
    1. `CompactRange` triggers a memtable flush but doesn't wait for pending writes
    2. there are some pending writes but the memtable is already flushed
    3. the WAL related to that memtable is removed (note that the pending writes were recorded in that WAL)
    4. the pending writes are written to a newly created memtable
    5. there is a restart
    6. the previous pending writes are lost because the WAL is removed but they aren't included in an SST

**How to solve:**
- Wait for pending memtable writes before the ingestion job checks the memtable key range
- Wait for pending memtable writes before flushing the memtable
**Note: `CompactRange` also calls `RangesOverlapWithMemtables` without waiting for pending writes, but I'm not sure whether that affects correctness.**
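
The "wait for pending writes" step can be sketched with a counter and a condition variable. `PendingWriteTracker` is a hypothetical class for illustration, not the mechanism used inside RocksDB.

```cpp
#include <condition_variable>
#include <mutex>

// Tracks writes that have been logged but not yet applied to the memtable.
class PendingWriteTracker {
 public:
  void BeginWrite() {
    std::lock_guard<std::mutex> guard(mu_);
    ++pending_;
  }

  void EndWrite() {
    std::lock_guard<std::mutex> guard(mu_);
    if (--pending_ == 0) cv_.notify_all();
  }

  // Call before flushing or before checking the memtable key range:
  // blocks until no write is in flight.
  void WaitForPendingWrites() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return pending_ == 0; });
  }

  int Pending() {
    std::lock_guard<std::mutex> guard(mu_);
    return pending_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  int pending_ = 0;
};
```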

**Test Plan:**
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6113

Differential Revision: D18895674

Pulled By: maysamyabandeh

fbshipit-source-id: da22b4476fc7e06c176020e7cc171eb78189ecaf
2019-12-12 14:08:02 -08:00
sdong
814d4e7ce0 Improve instructions to install formatter (#6162)
Summary:
While the instructions for installing the "make format" dependencies work on some platforms, they are hard to follow on others. Improve them a little bit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6162

Test Plan: Run "make format" in an environment missing the dependencies and see the instructions printed out

Differential Revision: D18970773

fbshipit-source-id: fd21b31053407cc171a6675f781a556a1c3e8945
2019-12-12 14:04:01 -08:00
Maysam Yabandeh
a796c06fef Fix build breakage from lock_guard error (#6161)
Summary:
This change fixes a source issue that caused a compile-time error, breaking the build for many fbcode services in that setup. The size() member function of channel is a const member function, so member variables accessed within it are implicitly const as well. This caused an error because clang failed to resolve a constructor that takes std::mutex: the suitable constructor was rejected because it would discard the constness of its argument. The fix is to add the mutable modifier to the lock_ member of channel.
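
A small illustration of the language rule involved, using a hypothetical `ToyChannel` rather than the actual channel class. Removing `mutable` below makes `size()` fail to compile, because `std::lock_guard<std::mutex>` needs a non-const `std::mutex&`.

```cpp
#include <cstddef>
#include <mutex>
#include <queue>
#include <utility>

template <typename T>
class ToyChannel {
 public:
  void Push(T value) {
    std::lock_guard<std::mutex> guard(lock_);
    queue_.push(std::move(value));
  }

  // const member function: lock_ would be const here without `mutable`,
  // and the lock_guard constructor could not accept it.
  size_t size() const {
    std::lock_guard<std::mutex> guard(lock_);
    return queue_.size();
  }

 private:
  mutable std::mutex lock_;  // locking does not change logical state
  std::queue<T> queue_;
};
```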
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6161

Differential Revision: D18967685

Pulled By: maysamyabandeh

fbshipit-source-id: 698b6a5153c3c92eeacb842c467aa28cc350d432
2019-12-12 13:50:27 -08:00
Adam Retter
b433bbefe9 Add missing mutable DBOptions to RocksJava (#6152)
Summary:
As requested in https://github.com/facebook/rocksdb/issues/6127
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6152

Differential Revision: D18955608

Pulled By: pdillinger

fbshipit-source-id: 3e1367d944e44d5f1675a422f7dd2451c86feb6f
2019-12-12 12:01:19 -08:00
Levi Tamasi
3b607610df Do not update SST <-> blob file mapping if compaction failed
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6156

Test Plan: Extended unit tests.

Differential Revision: D18943867

Pulled By: ltamasi

fbshipit-source-id: b3669d2dd6af08e987ad1a59d6712ae2514da0b1
2019-12-12 11:30:45 -08:00
Maysam Yabandeh
8613ee2e94 Enable all txn write policies in crash test (#6158)
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158

Test Plan:
```
make -j32 crash_test_with_txn
```

Differential Revision: D18946307

Pulled By: maysamyabandeh

fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
2019-12-12 10:43:49 -08:00
Levi Tamasi
e1dfe80fe0 Mark BlobIndex::DebugString const
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6157

Test Plan: make check

Differential Revision: D18944259

Pulled By: ltamasi

fbshipit-source-id: 7fb29447b52d801215bd6ab811e229a7fa2c763d
2019-12-11 17:19:43 -08:00
Maysam Yabandeh
1ad6fa9cc7 Enable txn in crash tests (#6155)
Summary:
Start daily crash tests with use_txn flag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155

Differential Revision: D18943630

Pulled By: maysamyabandeh

fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87
2019-12-11 16:01:55 -08:00
Peter Dillinger
d0ad3c59d8 Fix c_test:filter for various CACHE_LINE_SIZEs (#6153)
Summary:
This test was recently updated but failed to account for Bloom
schema variance by CACHE_LINE_SIZE. (Since CACHE_LINE_SIZE is not
defined in our C code, the test now simply allows a valid result for any
CACHE_LINE_SIZE, not just the current one.)

Unblock https://github.com/facebook/rocksdb/issues/5932
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6153

Test Plan:
ran unit test with builds TEST_CACHE_LINE_SIZE=128, =256, and
unset (64 on Intel)

Differential Revision: D18936015

Pulled By: pdillinger

fbshipit-source-id: e5e3852f95283d34d624632c1ae8d3adb2f2662c
2019-12-11 15:17:08 -08:00
奏之章
3717a88289 Fix UniversalCompaction trivial move bug (#6067)
Summary:
`curr.level` is `c->inputs_` index, not real level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6067

Differential Revision: D18935726

fbshipit-source-id: 4354e6e9cd900ca56c96e9d770f0ab6634e45daf
2019-12-11 11:27:53 -08:00
ferhat elmas
afdc58d478 Fix typos in history
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6116

Differential Revision: D18935622

fbshipit-source-id: 59f7a7bc9f0116ae6354ea217896622a34329d3c
2019-12-11 11:04:46 -08:00
Yi Wu
05a86318a7 Remove unused low_pri_write_rate_limiter_ (#6068)
Summary:
`low_pri_write_rate_limiter_` is not being used. Removing. `WriteController` has an internal low_pri rate limiter which is the real rate limiter for low-pri writes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6068

Test Plan: make

Differential Revision: D18664120

fbshipit-source-id: dfe3e4de033cf3522b67781b383aad7d0936034c
2019-12-11 10:28:33 -08:00
Cheng Chang
77565d7532 Add example to show the effect of Get in snapshot isolation (#6059)
Summary:
Adds an example to show the difference between reading from a snapshot and reading from the latest state.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6059

Test Plan: cd examples && make transaction_example && ./transaction_example

Differential Revision: D18797616

fbshipit-source-id: f17a2cb12187092ea243159e6ccf55790859e0c0
2019-12-11 09:56:42 -08:00
Yanqin Jin
383f5071f0 Add SyncWAL to db_stress (#6149)
Summary:
Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be
called once every N operations on average.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149

Test Plan:
```
$make db_stress
$./db_stress -sync_wal_one_in=100 -ops_per_thread=100000
```

Differential Revision: D18922529

Pulled By: riversand963

fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56
2019-12-10 21:55:25 -08:00
sdong
7a99162a74 db_stress: sometimes call CancelAllBackgroundWork() and Close() before closing DB (#6141)
Summary:
CancelAllBackgroundWork() and Close() are frequently used features, but we don't cover them in the stress test. Simply execute them before closing the DB with a 1/2 chance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141

Test Plan: Run "db_stress".

Differential Revision: D18900861

fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0
2019-12-10 20:04:52 -08:00
Adam Retter
984b6e71d6 Add Visual Studio 2015 to AppVeyor (#5446)
Summary:
This is required to compile on Windows with Visual Studio 2015, which is used for creating the RocksJava releases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5446

Differential Revision: D18924811

fbshipit-source-id: a183a62e79a2af5aaf59cd08235458a172fe7dcb
2019-12-10 20:02:31 -08:00
Peter Dillinger
a653857178 Add PauseBackgroundWork() to db_stress (#6148)
Summary:
Worker thread will occasionally call PauseBackgroundWork(),
briefly sleep (to avoid stalling itself) and then call
ContinueBackgroundWork().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148

Test Plan:
some running of 'make blackbox_crash_test' with temporary
printf output to confirm code occasionally reached.

Differential Revision: D18913886

Pulled By: pdillinger

fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be
2019-12-10 15:46:48 -08:00
Adam Simpkins
2bb5fc1280 Add an option to the CMake build to disable building shared libraries (#6122)
Summary:
Add an option to explicitly disable building shared versions of the
RocksDB libraries.  The shared libraries cannot be built in cases where
some dependencies are only available as static libraries.  This allows
still building RocksDB in these situations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122

Differential Revision: D18920740

fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
2019-12-10 15:20:50 -08:00
Yanqin Jin
2b060c1498 Use Env::GetChildren() instead of readdir (#6139)
Summary:
For more portability, switch from readdir to Env::GetChildren() in ldb's
manifest_dump subcommand.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6139

Test Plan:
```
$make check
```
Manually check ldb command.

Differential Revision: D18898197

Pulled By: riversand963

fbshipit-source-id: 92afca379e9fbe78ab70b2eb40d127daad8df5e2
2019-12-10 11:49:09 -08:00
sdong
14c38baca0 db_stress: sometimes validate compact range data (#6140)
Summary:
Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simple validation that checks that the hash of all keys within the compacted range, read through the same snapshot, stays the same before and after the compaction.
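
A sketch of that kind of check over an in-memory map. `ToySnapshot` and `HashRange` are hypothetical, the FNV-1a style mixing is just one convenient hash and not necessarily the one db_stress uses, and hashing values along with keys is an assumption.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Stand-in for the data visible through one snapshot.
using ToySnapshot = std::map<std::string, std::string>;

// Order-dependent hash of every entry with begin <= key < end.
uint64_t HashRange(const ToySnapshot& snap, const std::string& begin,
                   const std::string& end) {
  uint64_t h = 0xcbf29ce484222325ULL;  // FNV-1a 64-bit offset basis
  auto mix = [&h](const std::string& s) {
    for (unsigned char c : s) {
      h ^= c;
      h *= 0x100000001b3ULL;  // FNV-1a 64-bit prime
    }
    h ^= 0xff;  // separator between fields
    h *= 0x100000001b3ULL;
  };
  for (auto it = snap.lower_bound(begin);
       it != snap.end() && it->first < end; ++it) {
    mix(it->first);
    mix(it->second);
  }
  return h;
}
```

The stress test would compute the hash once before the compaction and once after it, both through the same snapshot, and report an error if the two values differ.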

Also, randomly tune most knobs of CompactRangeOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140

Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually ingest some hacky code and observe the error path.

Differential Revision: D18900230

fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10
2019-12-10 11:41:50 -08:00
Jermy Li
1dd3194f56 Fix compile error "folly/xx.h file not found" on Mac OS (#6145)
Summary:
Error message when running `make` on Mac OS with master branch (v6.6.0):
```
$ make
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
#include <folly/synchronization/WaitOptions.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
#include <folly/synchronization/ParkingLot.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
#include <folly/synchronization/DistributedMutex.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
#include <folly/synchronization/AtomicNotification.h>
         ^
1 error generated.
third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
#include <folly/detail/Futex.h>
         ^
1 error generated.
  GEN      util/build_version.cc
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
#include <folly/synchronization/WaitOptions.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
#include <folly/synchronization/ParkingLot.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
#include <folly/synchronization/DistributedMutex.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
#include <folly/synchronization/AtomicNotification.h>
         ^
1 error generated.
third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
#include <folly/detail/Futex.h>
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6145

Differential Revision: D18910812

fbshipit-source-id: 5a4475466c2d0601657831a0b48d34316b2f0816
2019-12-10 11:24:11 -08:00
Peter Dillinger
6380df5e10 Vary bloom_bits in db_crashtest (#6103)
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:

1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.

1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
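
The sampling scheme can be sketched as below. db_crashtest itself is a Python script, so this C++ version is only an illustration with a hypothetical function name. The lognormal median of 10 comes from the description (median = exp(mu), so mu = ln 10); the sigma of 1.3 is an assumed value chosen to give tails of roughly the stated size.

```cpp
#include <cmath>
#include <random>

// Draw a bloom_bits value: half the time a uniform int in [0, 19],
// otherwise a lognormal float whose median is 10.
double PickBloomBits(std::mt19937& gen) {
  if (std::bernoulli_distribution(0.5)(gen)) {
    // 0 disables the Bloom filter; overall chance 1/2 * 1/20 = 1/40.
    return static_cast<double>(std::uniform_int_distribution<int>(0, 19)(gen));
  }
  // mu = ln(10) puts the median at 10. Sigma = 1.3 is an assumption.
  return std::lognormal_distribution<double>(std::log(10.0), 1.3)(gen);
}
```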
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103

Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output

Differential Revision: D18734877

Pulled By: pdillinger

fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
2019-12-10 08:39:50 -08:00
sdong
a68dff5c35 Apply formatter to some recent commits (#6138)
Summary:
The formatter complains about some recently changed lines. Apply the formatter to them so that the check passes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6138

Test Plan: Confirm that CI passes.

Differential Revision: D18895950

fbshipit-source-id: 7d1696cf3e3a682bc10a30cdca748a23c6565255
2019-12-09 15:49:49 -08:00
sdong
a960287dee db_stress: Some code style improvements (#6137)
Summary:
Two changes:
1. Prevent static variables in a header file
2. Add "override" keyword when virtual functions are overridden.
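
The two style points can be sketched as follows. All names here (`KeyPrefix`, `ExampleBase`, `ExampleDerived`) are hypothetical, and the inline-function pattern is one common way to avoid a static variable in a header, not necessarily the one used in this commit.

```cpp
#include <string>

// A `static const std::string kPrefix = "key";` in a header gives every
// translation unit its own copy. An inline function with a function-local
// static yields a single shared object instead.
inline const std::string& KeyPrefix() {
  static const std::string prefix = "key";
  return prefix;
}

class ExampleBase {
 public:
  virtual ~ExampleBase() = default;
  virtual std::string Name() const { return "base"; }
};

class ExampleDerived : public ExampleBase {
 public:
  // `override` makes the compiler reject this declaration if it ever stops
  // matching a virtual function in the base class.
  std::string Name() const override { return "derived"; }
};
```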
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6137

Test Plan: Build db_stress with or without LITE.

Differential Revision: D18892007

fbshipit-source-id: 295356427a34473b23ed36d6ed4ef3ae35a32db0
2019-12-09 14:38:42 -08:00
Peter Dillinger
e43d2c4424 Fix & test rocksdb_filterpolicy_create_bloom_full (#6132)
Summary:
Add overrides needed in FilterPolicy wrapper to fix
rocksdb_filterpolicy_create_bloom_full (see issue https://github.com/facebook/rocksdb/issues/6129). Re-enabled
assertion in BloomFilterPolicy::CreateFilter that was being violated.
Expanded c_test to identify Bloom filter implementations by FP counts.
(Without the fix, the updated test triggers the assertion, and it still
fails if the assertion is disabled.)

Fixes https://github.com/facebook/rocksdb/issues/6129
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6132

Test Plan: updated c_test, also run under valgrind.

Differential Revision: D18864911

Pulled By: pdillinger

fbshipit-source-id: 08e81d7b5368b08e501cd402ef5583f2650c19fa
2019-12-09 12:21:14 -08:00
sdong
3c347821b7 Fix thread_local_test failure caused by recent io_uring change (#6136)
Summary:
thread_local_test now fails because it asserts that no thread-local instance exists when the test starts. However, a thread-local instance may now be created when PosixEnv is constructed as a static variable. Fix the test by relaxing the assumption that the count starts from 0.
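
The relaxed check can be pictured as comparing against a recorded baseline instead of zero. This is a minimal sketch under that assumption. `InstanceRegistry` and `CreatedSince` are hypothetical names and do not reflect the actual test code.

```cpp
#include <cstddef>

// Hypothetical stand-in for whatever tracks live thread-local instances.
class InstanceRegistry {
 public:
  void Register() { ++count_; }
  std::size_t Count() const { return count_; }

 private:
  std::size_t count_ = 0;
};

// Number of instances created since `baseline` was recorded. Asserting on
// this delta keeps working when instances already exist at test start.
inline std::size_t CreatedSince(const InstanceRegistry& registry,
                                std::size_t baseline) {
  return registry.Count() - baseline;
}
```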
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6136

Test Plan: Find an environment where the test fails, and see it passes with the fix applied.

Differential Revision: D18889224

fbshipit-source-id: 7946f3bfea81d236f7bb1554076696705b211b92
2019-12-09 12:03:30 -08:00
Ziyue Yang
7e2f831924 Fix wrong ExtractUserKey usage in BlockBasedTableBuilder::EnterUnbuff… (#6100)
Summary:
BlockBasedTableBuilder uses ExtractUserKey in EnterUnbuffered. This causes
index filter building errors, because user-provided timestamps are handled by
ExtractUserKeyAndStripTimestamp, which is what Add uses. This commit changes
ExtractUserKey to ExtractUserKeyAndStripTimestamp in EnterUnbuffered.
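
To see why the two helpers disagree once timestamps are enabled, here is a minimal sketch. It assumes an internal key laid out as user key, then a timestamp of `ts_sz` bytes, then an 8-byte footer. The names `StripFooter` and `StripFooterAndTimestamp` are hypothetical stand-ins, not the RocksDB functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumed layout: [user key][timestamp (ts_sz bytes)][8-byte footer].
constexpr std::size_t kFooterBytes = 8;

// Strips only the footer, so any timestamp stays attached to the result.
inline std::string StripFooter(const std::string& internal_key) {
  assert(internal_key.size() >= kFooterBytes);
  return internal_key.substr(0, internal_key.size() - kFooterBytes);
}

// Strips the footer and the timestamp, leaving the bare user key.
inline std::string StripFooterAndTimestamp(const std::string& internal_key,
                                           std::size_t ts_sz) {
  assert(internal_key.size() >= kFooterBytes + ts_sz);
  return internal_key.substr(0, internal_key.size() - kFooterBytes - ts_sz);
}
```

With `ts_sz == 0` the two agree. With a nonzero timestamp size the first one returns a key that still carries the timestamp, so code paths that mix the two see different keys.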

A test case is also added by modifying the
DBBasicTestWithTimestampWithParam_PutAndGet test in db_basic_test to cover
ExtractUserKeyAndStripTimestamp usage in both the kBuffered and kUnbuffered
states of BlockBasedTableBuilder.

Before the ExtractUserKeyAndStripTimestamp fix:

```
$ ./db_basic_test --gtest_filter="*PutAndGet*"
Note: Google Test filter = *PutAndGet*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam
[ RUN      ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
db/db_basic_test.cc:2109: Failure
db_->Get(ropts, cfh, "key" + std::to_string(j), &value)
NotFound:
[  FAILED  ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0, where GetParam() = false (1177 ms)
[ RUN      ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1
[       OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 (1056 ms)
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam (2233 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (2233 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0, where GetParam() = false

 1 FAILED TEST
```

After the ExtractUserKeyAndStripTimestamp fix:

```
$ ./db_basic_test --gtest_filter="*PutAndGet*"
Note: Google Test filter = *PutAndGet*
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam
[ RUN      ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0
[       OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0 (1417 ms)
[ RUN      ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1
[       OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 (1041 ms)
[----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam (2458 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (2458 ms total)
[  PASSED  ] 2 tests.
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6100

Differential Revision: D18769654

Pulled By: riversand963

fbshipit-source-id: 76c2cf2c9a5e0d85db95d98e812e6af0c2a15c6b
2019-12-09 10:57:02 -08:00
sdong
d1ae2c3faf Fix an asan warning caused by the recent io_uring change (#6135)
Summary:
ASAN reports:

internal_repo_rocksdb/repo:db_test - MultiThreaded/MultiThreadedDBTest.MultiThreaded/43: fatal
==2692739==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6130000500ca at pc 0x0000006be780 bp 0x7efef85ccd20 sp 0x7efef85cc4d0
[CONTEXT] === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
[CONTEXT] READ of size 331 at 0x6130000500ca thread T195
[CONTEXT]      #0 db_test_bin+0x6be77f                     __interceptor_strlen.part.35
[CONTEXT]      #1 internal_repo_rocksdb/repo/include/rocksdb/slice.h:55 rocksdb::Slice::Slice(char const*)
[CONTEXT]      #2 internal_repo_rocksdb/repo/env/io_posix.cc:522 rocksdb::PosixRandomAccessFile::MultiRead(rocksdb::ReadRequest*, unsigned long)

I looked at env/io_posix.cc:522 and see no reason for the line to be there at all, because the value it assigns is overwritten before it is used. It was most likely added by mistake, so remove it.
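
The trace shows strlen being reached through the `Slice(char const*)` constructor. The sketch below illustrates why that form is risky for a raw read buffer. `ByteView` is a hypothetical stand-in, not the RocksDB Slice class.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a slice type.
struct ByteView {
  const char* data;
  std::size_t size;

  // Length is discovered with strlen, so the buffer must be NUL-terminated.
  // On an unterminated read buffer this scans past the end of the buffer.
  explicit ByteView(const char* s) : data(s), size(std::strlen(s)) {}

  // Length is supplied by the caller, so no terminator is required.
  ByteView(const char* s, std::size_t n) : data(s), size(n) {}
};
```

When the length is already known, as it is for a read request, the pointer-plus-length form avoids the scan entirely.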
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6135

Test Plan: Rerun the same test and confirm it passes after the fix. Run all the tests and make sure they all pass.

Differential Revision: D18880251

fbshipit-source-id: 3b84ac6a05b67b529c4202e0ceb4c047460f44f2
2019-12-09 10:25:09 -08:00