Commit Graph

1300 Commits

Author SHA1 Message Date
anand76
bd44ec2006 Fix reopen voting logic in db_stress when using MultiGet (#5374)
Summary:
When the --reopen option is non-zero, the DB is reopened after every ops_per_thread/(reopen+1) ops, with the check being done after every op. With MultiGet, we might do multiple ops in one iteration, which broke the logic that checked when to synchronize among the threads and reopen the DB. This PR fixes that logic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5374

Differential Revision: D15559780

Pulled By: anand1976

fbshipit-source-id: ee6563a68045df7f367eca3cbc2500d3e26359ef
2019-05-30 11:41:08 -07:00
Siying Dong
e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Siying Dong
545d206040 Move some file related files outside util/ (#5375)
Summary:
util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375

Differential Revision: D15550935

Pulled By: siying

fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
2019-05-29 20:47:06 -07:00
Maysam Yabandeh
eab4f49a2c WritePrepared: skip_concurrency_control option (#5330)
Summary:
This enables the user to set TransactionDBOptions::skip_concurrency_control so the standard `DB::Write(const WriteOptions& opts, WriteBatch* updates)` would skip the concurrency control. This would give higher throughput to the users who know their use case doesn't need concurrency control.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5330

Differential Revision: D15525932

Pulled By: maysamyabandeh

fbshipit-source-id: 68421ac1ba34f549a4a8de9ce4c2dccf6fb4b06b
2019-05-28 16:29:45 -07:00
Silver Chan
2095ae8858 fixed db_stress.cc build error (#5307)
Summary:
when building this file using Xcode 10.2.1 in MacOSX10.14, the compiler report this error:
`
rocksdb/tools/db_stress.cc:3613:33: error: implicit instantiation of
      undefined template 'std::__1::array<std::__1::basic_string<char>, 10>'
    std::array<std::string, 10> keys = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};
/usr/include/c++/v1/__tuple:223:64: note:
      template is declared here
template <class _Tp, size_t _Size> struct _LIBCPP_TEMPLATE_VIS array;
                                                               ^
1 error generated.
`
if including array, this error will be fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5307

Differential Revision: D15475217

Pulled By: sagar0

fbshipit-source-id: b04a7658c2ca2573157028863b3a80f5ab52b9de
2019-05-23 14:03:25 -07:00
Zhichao Cao
a13026fb2f Added trace replay fast forward function (#5273)
Summary:
In the current db_bench trace replay, the replay process strictly follows the timestamp to issue the queries. In some cases, user does not care about the time. Therefore, fast forward is needed for users to speed up the replay process.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5273

Differential Revision: D15389232

Pulled By: zhichao-cao

fbshipit-source-id: 735d629b9d2a167b05af3e4fa0ddf9d5d0be1806
2019-05-16 20:21:18 -07:00
anand76
6492430eaf Fix a bug in db_stress and an incorrect assertion in FilePickerMultiGet (#5301)
Summary:
This PR has two fixes for crash test failures -
1. Fix a bug in TestMultiGet() in db_stress that was passing list of key to MultiGet() in the wrong order, thus ensuring that actual values don't match expected values
2. Remove an incorrect assertion in FilePickerMultiGet::GetNextFileInLevelWithKeys() that checks that files in a level are in sorted order. This is not true with MultiGet(), especially if there are duplicate keys and we may have to go back one file for the next key. Furthermore, this assertion makes more sense when a new version is created, rather than at lookup time

Test -
asan_crash and ubsan_crash tests
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5301

Differential Revision: D15337383

Pulled By: anand1976

fbshipit-source-id: 35092cb15bbc1700e5e823cbe07bfa62f1e9e6c6
2019-05-14 11:58:04 -07:00
Maysam Yabandeh
f383641a1d Unordered Writes (#5218)
Summary:
Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees.
The patch is prepared based on an original by siying
Existing unit tests are extended to include unordered_write option.

Benchmark Results:
```
TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions  --unordered_write=1
```
With WAL
- Vanilla RocksDB: 78.6 MB/s
- WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x)
- unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees)

Without WAL
- Vanilla RocksDB: 111.3 MB/s
- WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x)
- unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees)

- WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x)

Limitations:
- The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218

Differential Revision: D15219029

Pulled By: maysamyabandeh

fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759
2019-05-13 17:47:21 -07:00
Yi Wu
92c60547fe db_bench: fix hang on IO error (#5300)
Summary:
db_bench will wait indefinitely if there's background error. Fix by pass `abs_time_us` to cond var.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5300

Differential Revision: D15319945

Pulled By: miasantreble

fbshipit-source-id: 0034fb7f6ec7c3303c4ccf26e54c20fbdac8ab44
2019-05-13 11:30:35 -07:00
anand76
181bb43f08 Fix bugs in FilePickerMultiGet (#5292)
Summary:
This PR fixes a couple of bugs in FilePickerMultiGet that were causing db_stress test failures. The failures were caused by -
1. Improper handling of a key that matches the user key portion of an L0 file's largest key. In this case, the curr_index_in_curr_level file index in L0 for that key was getting incremented, but batch_iter_ was not advanced. By design, all keys in a batch are supposed to be checked against an L0 file before advancing to the next L0 file. Not advancing to the next key in the batch was causing a double increment of curr_index_in_curr_level due to the same key being processed again
2. Improper handling of a key that matches the user key portion of the largest key in the last file of L1 and higher. This was resulting in a premature end to the processing of the batch for that level when the next key in the batch is a duplicate. Typically, the keys in MultiGet will not be duplicates, but its good to handle that case correctly

Test -
asan_crash
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5292

Differential Revision: D15282530

Pulled By: anand1976

fbshipit-source-id: d1a6a86e0af273169c3632db22a44d79c66a581f
2019-05-09 13:18:00 -07:00
anand76
930bfa5750 Disable MultiGet from db_stress (#5284)
Summary:
Disable it for now until we can get stress tests to pass consistently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5284

Differential Revision: D15230727

Pulled By: anand1976

fbshipit-source-id: 239baacdb3c4cd4fb7c4447f7582b9042501d752
2019-05-06 18:26:50 -07:00
Maysam Yabandeh
6a40ee5eb1 Refresh snapshot list during long compactions (2nd attempt) (#5278)
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list.
This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278

Differential Revision: D15203291

Pulled By: maysamyabandeh

fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258
2019-05-03 17:30:22 -07:00
Zhongyi Xie
5d27d65bef multiget: fix memory issues due to vector auto resizing (#5279)
Summary:
This PR fixes three memory issues found by ASAN
* in db_stress, the key vector for MultiGet is created using `emplace_back` which could potentially invalidates references to the underlying storage (vector<string>) due to auto resizing. Fix by calling reserve in advance.
* Similar issue in construction of GetContext autovector in version_set.cc
* In multiget_context.h use T[] specialization for unique_ptr that holds a char array
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5279

Differential Revision: D15202893

Pulled By: miasantreble

fbshipit-source-id: 14cc2cda0ed64d29f2a1e264a6bfdaa4294ee75d
2019-05-03 15:58:43 -07:00
Zhongyi Xie
3e994809a1 fix implicit conversion error reported by clang check (#5277)
Summary:
fix the following clang check errors
```
tools/db_stress.cc:3609:30: error: implicit conversion loses integer precision: 'std::vector::size_type' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
    int num_keys = rand_keys.size();
        ~~~~~~~~   ~~~~~~~~~~^~~~~~
tools/db_stress.cc:3888:30: error: implicit conversion loses integer precision: 'std::vector::size_type' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
    int num_keys = rand_keys.size();
        ~~~~~~~~   ~~~~~~~~~~^~~~~~
2 errors generated.
make: *** [tools/db_stress.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5277

Differential Revision: D15196620

Pulled By: miasantreble

fbshipit-source-id: d56b1420d4a9f1df875fc52877a5fbb342bc7cae
2019-05-03 10:02:27 -07:00
anand76
434ccf2df4 Add option to use MultiGet in db_stress (#5264)
Summary:
The new option will pick a batch size randomly in the range 1-64. It will then space the keys in the batch by random intervals.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5264

Differential Revision: D15175522

Pulled By: anand1976

fbshipit-source-id: c16baa69d0f1ff4cf53c55c813ddd82c8aeb58fc
2019-05-01 23:06:56 -07:00
Andrew Kryczka
b02d0c238d Init compression dict handle before reading meta-blocks (#5267)
Summary:
At least one of the meta-block loading functions (`ReadRangeDelBlock`)
uses the same block reading function (`NewDataBlockIterator`) as data
block reads, which means it uses the dictionary handle. However, the
dictionary handle was uninitialized while reading meta-blocks, causing
readers to receive an error. This situation was only noticed when
`cache_index_and_filter_blocks=true`.

This PR initializes the handle to null while reading meta-blocks to
prevent the error. It also adds support to `db_stress` /
`db_crashtest.py` for `cache_index_and_filter_blocks`.

Fixes #5263.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5267

Differential Revision: D15149264

Pulled By: maysamyabandeh

fbshipit-source-id: 991d38a306c62db5976778bfb050fa3cd4a0671b
2019-04-30 09:50:49 -07:00
Yanqin Jin
210b49cac9 Disable pipelined write in atomic flush stress test (#5266)
Summary:
Since currently pipelined write allows one thread to perform memtable writes
while another thread is traversing the `flush_scheduler_`, it will cause an
assertion failure in `FlushScheduler::Clear`. To unblock crash recoery tests,
we temporarily disable pipelined write when atomic flush is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5266

Differential Revision: D15142285

Pulled By: riversand963

fbshipit-source-id: a0c20fe4ac543e08feaed602414f982054df7831
2019-04-30 08:12:42 -07:00
Sagar Vemuri
3548e4220d Improve explicit user readahead performance (#5246)
Summary:
Improve the iterators performance when the user explicitly sets the readahead size via `ReadOptions.readahead_size`.

1. Stop creating new table readers when the user explicitly sets readahead size.
2. Make use of an internal buffer based on `FilePrefetchBuffer` instead of using `ReadaheadRandomAccessFileReader`, to handle the user readahead requests (for both buffered and direct io cases).
3. Add `readahead_size` to db_bench.

**Benchmarks:**
https://gist.github.com/sagar0/53693edc320a18abeaeca94ca32f5737

For 1 MB readahead, Buffered IO performance improves by 28% and Direct IO performance improves by 50%.
For 512KB readahead, Buffered IO performance improves by 30% and Direct IO performance improves by 67%.

**Test Plan:**
Updated `DBIteratorTest.ReadAhead` test to make sure that:
- no new table readers are created for iterators on setting ReadOptions.readahead_size
- At least "readahead" number of bytes are actually getting read on each iterator read.

TODO later:
- Use similar logic for compactions as well.
- This ties in nicely with #4052 and paves the way for removing ReadaheadRandomAcessFile later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5246

Differential Revision: D15107946

Pulled By: sagar0

fbshipit-source-id: 2c1149729ca7d779e4e8b7710ba6f4e8cbfd3bea
2019-04-26 21:24:10 -07:00
Adam Retter
990b2f4cb3 Fix compilation on db_bench_tool.cc on Windows (#5227)
Summary:
I needed this change to be able to build the v6.0.1 release on Windows.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5227

Differential Revision: D15033815

Pulled By: sagar0

fbshipit-source-id: 579f3b8e694c34c0d43527eb2fa37175e37f5911
2019-04-23 11:16:51 -07:00
Fosco Marotto
6c2bf9e916 Add copyright headers per FB open-source checkup tool. (#5199)
Summary:
internal task: T35568575
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199

Differential Revision: D14962794

Pulled By: gfosco

fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
2019-04-18 10:55:01 -07:00
Zhongyi Xie
baa5302447 Avoid double-compacting data in bottom level in manual compactions (#5138)
Summary:
Depending on the config, manual compaction (leveled compaction style) does following compactions:
L0->L1
L1->L2
...
Ln-1 -> Ln
Ln -> Ln
The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only.
Resolves issue https://github.com/facebook/rocksdb/issues/4995
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138

Differential Revision: D14940106

Pulled By: miasantreble

fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5
2019-04-16 23:32:20 -07:00
Yi Wu
b70967aac7 db_bench: support seek to non-exist prefix (#5163)
Summary:
Add `--seek_missing_prefix` flag to db_bench to allow benchmarking seeking to non-existing prefix. Usage example:
```
./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10
./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true
```
Also adding `--total_order_seek` and `--prefix_same_as_start` flags.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163

Differential Revision: D14935724

Pulled By: riversand963

fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a
2019-04-15 10:54:58 -07:00
Yanqin Jin
3189398c00 Fix bugs detected by clang analyzer (#5185)
Summary:
as titled. False positive included, fixed anyway to make the check
pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185

Differential Revision: D14909384

Pulled By: riversand963

fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3
2019-04-12 10:45:56 -07:00
anand76
fefd4b98c5 Introduce a new MultiGet batching implementation (#5011)
Summary:
This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.

Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to -
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Bloom filter cachelines can be prefetched, hiding the cache miss latency

The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.

Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).

Batch   Sizes

1        | 2        | 4         | 8      | 16  | 32

Random pattern (Stride length 0)
4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074        - Get
4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching)
4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14        - MultiGet (w/ batching)

Good locality (Stride length 16)
4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753
4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135

Good locality (Stride length 256)
4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268
4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (Stride length 4096)
4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555
4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465
4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891

dbbench command used (on a DB with 4 levels, 12 million keys)-
TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011

Differential Revision: D14348703

Pulled By: anand1976

fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
2019-04-11 14:28:26 -07:00
datonli
f0edf9d575 #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152)
Summary:
mv port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152

Differential Revision: D14779409

Pulled By: siying

fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb
2019-04-04 11:38:19 -07:00
Yanqin Jin
d77476ef55 Fix db_stress for custom env (#5122)
Summary:
Fix some hdfs-related code so that it can compile and run 'db_stress'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122

Differential Revision: D14675495

Pulled By: riversand963

fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a
2019-03-28 19:20:27 -07:00
Siying Dong
2b4d5ceb47 Remove some "using std::..." from header files. (#5113)
Summary:
The code convention we are following, Google C++ Style, discourage
alias in header files, especially public headers:
https://google.github.io/styleguide/cppguide.html#Aliases
Remove some of them. Might removed some from .cc files as well to be consistent.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113

Differential Revision: D14633030

Pulled By: siying

fbshipit-source-id: b990edc919d5de60295992284f980195e501d424
2019-03-27 10:28:21 -07:00
Yanqin Jin
9358178edc Support for single-primary, multi-secondary instances (#4899)
Summary:
This PR allows RocksDB to run in single-primary, multi-secondary process mode.
The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary.
Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.

This PR has several components:
1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.

2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.

3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
3.1 Tail the primary's MANIFEST during recovery.
3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
3.3 Tailing WAL will be in a future PR.

4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899

Differential Revision: D14510945

Pulled By: riversand963

fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
2019-03-26 16:45:31 -07:00
Zhongyi Xie
52e6404e0f ldb command parsing: allow option values to contain equals signs (#5088)
Summary:
Right now ldb command doesn't allow cases where option values contain equals sign. For example,
```
ldb --db=/tmp/test scan --from='q=3' --max_keys=1
```
after parsing, ldb will have one option 'db', 'max_keys' and one flag 'from'.
This PR updates the parsing logic so that it now supports the above mentioned cases
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5088

Differential Revision: D14600869

Pulled By: miasantreble

fbshipit-source-id: c6ef518c74a98d7b6675ea5954ae08b1bda5554e
2019-03-25 13:23:11 -07:00
Shobhit Dayal
b45b1cde3e Feature for sampling and reporting compressibility (#4842)
Summary:
This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
1. lz4 or snappy for fast compression
2. zstd or zlib for slow but higher compression.

The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType.

The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842

Differential Revision: D13629011

Pulled By: shobhitdayal

fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
2019-03-18 12:15:34 -07:00
Andrew Kryczka
2263f86901 exercise WAL recycling in crash test (#5070)
Summary:
Since this feature affects the WAL behavior, it seems important our crash-recovery tests cover it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070

Differential Revision: D14470085

Pulled By: miasantreble

fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1
2019-03-15 12:03:26 -07:00
Zhichao Cao
dcde292c3b Add the -try_process_corrupted_trace option to trace_analyzer (#5067)
Summary:
In the current trace_analyzer implementation, once the trace file has corrupted content, which can be caused by unexpected tracing operations or other reasons, trace_analyzer will print the error and stop analyzing.

By adding the -try_process_corrupted_trace option, user can try to process the corrupted trace file and get the analyzing results of the trace records from the beginning to the the first corrupted point in the trace file. Analyzing might fail even this option is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5067

Differential Revision: D14433037

Pulled By: zhichao-cao

fbshipit-source-id: d095233ba371726869af0def0cdee23b69896831
2019-03-14 20:03:01 -07:00
Andrew Kryczka
5a5c0492db ldb: set total_order_seek for scans (#5066)
Summary:
Without `total_order_seek=true`, using this command with `prefix_extractor` set skips over lots of keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5066

Differential Revision: D14425967

Pulled By: sagar0

fbshipit-source-id: f6f142733258d92604f920615be9266e1fe797f8
2019-03-12 13:10:39 -07:00
Zhichao Cao
05ebfebc17 Fixed the potential stack overflow of MixGraph in db_bench (#5051)
Summary:
In the MixGraph benchmark of db_bench, The max buffer size used for value of KV-pair might be extremely large (64MB), which might cause function stack overflow in some platforms, reduced to 1MB.

Added the finished ops printing in MixGraph benchmark.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5051

Differential Revision: D14379571

Pulled By: zhichao-cao

fbshipit-source-id: 24084fbe38f60f2902d9a40f6bc9a25e4e2c9bb9
2019-03-08 14:10:17 -08:00
Andrew Kryczka
18d2e4beb7 Run db_bench on database generated externally (#5017)
Summary:
Added an option, `-use_existing_keys`, which can be set to run
benchmarks against an arbitrary existing database. Now users can
benchmark against their actual database rather than synthetic data.

Before the run begins, it loads all the keys into memory, then uses that
set of keys rather than synthesizing new ones in `GenerateKeyFromInt`.
This is mainly intended for small-scale DBs where the memory consumption
is not a concern.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5017

Differential Revision: D14270303

Pulled By: riversand963

fbshipit-source-id: 6328df9dffb5e19170270dd00a69f4bbe424e5ed
2019-03-01 11:19:03 -08:00
Siying Dong
aef763b6d6 Make statistics's stats_level change thread-safe (#5030)
Summary:
Right now, users can change statistics.stats_level while DB is running, but TSAN may report
data race. We make stats_level_ to be atomic, and access them using accessors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030

Differential Revision: D14267519

Pulled By: siying

fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73
2019-03-01 10:42:09 -08:00
Maysam Yabandeh
0b80f6b380 WritePrepared: script to analyze stress test failures (#5033)
Summary:
This the hackish script we used to find the root cause of failures in transaction stress tests. It is not well-written and does not require rigorous reviewing but it is better than starting from scratch each time we observe an issue. The stress tests would just say that at which snapshots the sum of all the keys in a set is inconsistent with another set. To help debugging one need to know which key exactly returned inconsistent results. The script looks at the transactions between two conflicting snapshots, and performs thee changes manually to see for which key the read value was inconsistent.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5033

Differential Revision: D14280362

Pulled By: maysamyabandeh

fbshipit-source-id: d5826055c46711460ba81480d96cb5ea082814a5
2019-03-01 09:18:40 -08:00
Siying Dong
5e298f865b Add two more StatsLevel (#5027)
Summary:
Statistics cost too much CPU for some use cases. Add two stats levels
so that people can choose to skip two types of expensive stats, timers and
histograms.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027

Differential Revision: D14252765

Pulled By: siying

fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
2019-02-28 10:27:59 -08:00
Zhongyi Xie
c4f5d0aa15 add GetStatsHistory to retrieve stats snapshots (#4748)
Summary:
This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0.

(This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748

Differential Revision: D13961471

Pulled By: miasantreble

fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04
2019-02-20 15:52:54 -08:00
Zhongyi Xie
ed995c6a69 add whole key bloom filter support in memtables (#4985)
Summary:
MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985

Differential Revision: D14118257

Pulled By: miasantreble

fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea
2019-02-19 12:15:39 -08:00
Siying Dong
4db46aa2e6 Fix LITE Build (#4989)
Summary:
LITE mode has EventListener to be an empty class. However in db_bench,
it is used. When "override" is added to the functions, the build breaks. Fix it
by keeping the listener empty in LITE mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4989

Differential Revision: D14108132

Pulled By: siying

fbshipit-source-id: 80121aab35b1120e502b37b782301dd700692697
2019-02-15 16:13:11 -08:00
Aubin Sanyal
3231a2e581 Deprecate ttl option from CompactionOptionsFIFO (#4965)
Summary:
We introduced ttl option in CompactionOptionsFIFO when ttl-based file
deletion (compaction) was supported only as part of FIFO Compaction. But
with the extension of ttl semantics even to Level compaction,
CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start
using ColumnFamilyOptions.ttl for FIFO compaction as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965

Differential Revision: D14072960

Pulled By: sagar0

fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e
2019-02-15 09:51:41 -08:00
Michael Liu
ca89ac2ba9 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14090024

fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
2019-02-14 14:41:36 -08:00
Yanqin Jin
4fc442029a Avoid using kInAtomicGroup tag for single-cf op (#4981)
Summary:
if an operation just involves a single column family, then we do
not have to set the kInAtomicGroup tag when writing to MANIFEST. This change
can fix a compatibility test failure, i.e. 5.15 and earlier cannot recognize
kInAtomicGroup tag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4981

Differential Revision: D14072687

Pulled By: riversand963

fbshipit-source-id: 46b0c61e399f16c6b7169de0b33430d0ed90d6d4
2019-02-13 18:33:42 -08:00
Andrew Kryczka
62f70f6d14 Reduce scope of compression dictionary to single SST (#4952)
Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.

So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:

- The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
- After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952

Differential Revision: D13967980

Pulled By: ajkr

fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
2019-02-11 19:47:32 -08:00
Andrew Kryczka
1218704b61 Fix compression_zstd_max_train_bytes coverage in stress test (#4957)
Summary:
Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957

Differential Revision: D13994302

Pulled By: ajkr

fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65
2019-02-11 14:56:39 -08:00
Siying Dong
cf3a671733 Remove cuckoo hash memtable (#4953)
Summary:
Cuckoo Hash is less useful than we initially expected. Remove it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953

Differential Revision: D13979264

Pulled By: siying

fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed
2019-02-07 16:15:27 -08:00
Siying Dong
d9c9f3c809 db_bench: fix "micros/op" reporting (#4949)
Summary:
4985a9f73b (diff-e5276985b26a0551957144f4420a594bR511)
changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why.
Considering that this is a counter-intuitive reporting, Reverting the change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949

Differential Revision: D13964684

Pulled By: siying

fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc
2019-02-05 17:20:02 -08:00
Alexander Zinoviev
32a6dd9a41 Add a new CPU time counter to compaction report (#4889)
Summary:
Measure CPU time consumed for a compaction and report it in the stats report
Enable NowCPUNanos() to work for MacOS
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889

Differential Revision: D13701276

Pulled By: zinoale

fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055
2019-01-29 17:24:00 -08:00
zhichao-cao
e2547103fd Fix the build error caused by the dynamic array (#4918)
Summary:
In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256.

Tested with make check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918

Differential Revision: D13844298

Pulled By: sagar0

fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab
2019-01-28 12:39:57 -08:00
Andrew Kryczka
e242fa4664 Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923)
Summary:
- When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1`
- It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090
- I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923

Differential Revision: D13827226

Pulled By: ajkr

fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947
2019-01-28 11:26:32 -08:00
Sagar Vemuri
0cead31d10 Fix Clang static analyzer warning in db_bench (#4910)
Summary:
Fixed clang static analyzer warning about division by 0.
```
ar: creating librocksdb_debug.a
tools/db_bench_tool.cc:4650:43: warning: Division by zero
      int pos = static_cast<int>(rand_num % range_);
                                 ~~~~~~~~~^~~~~~~~
1 warning generated.
make: *** [analyze] Error 1
```

This is from the new code I recently merged in ce8e88d2d7.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910

Differential Revision: D13788037

Pulled By: sagar0

fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea
2019-01-23 13:33:02 -08:00
Zhongyi Xie
cbe0239270 add cast to avoid loss of precision error (#4906)
Summary:
this PR address the following error:
> tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
        s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size));
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906

Differential Revision: D13780185

Pulled By: miasantreble

fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3
2019-01-22 22:44:17 -08:00
Zhichao Cao
ce8e88d2d7 Generate mixed workload with Get, Put, Seek in db_bench (#4788)
Summary:
Based on the specific workload models (key access distribution, value size distribution, and iterator scan length distribution, the QPS variation), the MixGraph benchmark generate the synthetic workload according to these distributions which can reflect the real-world workload characteristics.

After user enable the tracing function, they will get the trace file. By analyzing the trace file with the trace_analyzer tool, user can generate a set of statistic data files including. The *_accessed_key_stats.txt,  *-accessed_value_size_distribution.txt, *-iterator_length_distribution.txt, and *-qps_stats.txt are mainly used to fit the Matlab model fitting. After that, user can get the parameters of the workload distributions (the modeling details are described: [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer))

The key access distribution follows the The two-term power model. The probability density function is: `f(x) = ax^{b}+c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench

For the value size distribution and iterator scan length distribution, they both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)(1+k*(x-theta)/sigma))^{-1-1/k)`. The parameters are: value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, users can find the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and [Matalb page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html)

As for the QPS, it follows the diurnal pattern. So Sine is a good model to fit it. `F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d`. The trace_will tell you the average QPS in the print out resutls, which is sine_d. After user fit the "*-qps_stats.txt" to the Matlab model, user can get the sine_a, sine_b, and sine_c. By using the 4 parameters, user can control the QPS variation including the period, average, changes.

To use the bench mark, user can indicate the following parameters as examples:
```
-benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788

Differential Revision: D13573940

Pulled By: sagar0

fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222
2019-01-22 10:44:26 -08:00
Andrew Kryczka
01013ae766 Digest ZSTD compression dictionary once when writing SST file (#4849)
Summary:
This is essentially a re-submission of #4251 with a few improvements:

- Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
- Eliminated `Init` functions. Instead do all initialization work in constructors.
- Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849

Differential Revision: D13606039

Pulled By: ajkr

fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
2019-01-18 19:12:57 -08:00
Siying Dong
4e37251b4d With ldb --try_load_options and wal_dir doesn't exist, ignore it (#4875)
Summary:
LDB is frequently used to exam data copied. wal_dir in option file is not modified and it usually points to the path it copied from.
The user experience will be better if when ldb sees wal_dir pointed by the option file doesn't exist, rather than fail, just ignore it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4875

Differential Revision: D13643173

Pulled By: siying

fbshipit-source-id: 2e64d4ea2ec49a6794b9a706b7fc1ba901128bb8
2019-01-11 16:48:32 -08:00
Yanqin Jin
ffc9f84649 Free memory after use
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4857

Differential Revision: D13602688

Pulled By: riversand963

fbshipit-source-id: 993419a6afb982a7a701ff71daebebb4b4a6b265
2019-01-08 17:19:09 -08:00
Yanqin Jin
e686caffec Remove unnecessary assersion in AtomicFlushStressTest::TestCheckpoint (#4846)
Summary:
as titled.
We can remove the assersion because we do not perform verification in
AtomicFlushStressTest::TestCheckpoint for similar reasons to TestGet, TestPut,
etc.
Therefore, we override TestCheckpoint in AtomicFlushStressTest so that the
assertion `rand_column_families.size() == rand_keys.size()' is removed, and we
do not verify the DB in this function.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4846

Differential Revision: D13583377

Pulled By: riversand963

fbshipit-source-id: 03647f3da67e27a397413fd666e3bb43003bf596
2019-01-07 16:47:26 -08:00
Huachao Huang
74f7d7551e tools: use provided options instead of the default (#4839)
Summary:
The current implementation hardcode the default options in different
places, which makes it impossible to support other environments (like
encrypted environment).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4839

Differential Revision: D13573578

Pulled By: sagar0

fbshipit-source-id: 76b58b4b758902798d10ff2f52d9f39abff015e7
2019-01-03 11:23:49 -08:00
Yanqin Jin
565b5bdc42 Add support for read-only db chkpt stress (#4690)
Summary:
Updated stress test will support testing of db in read-only mode.
The user has to make sure that only read/scan operations are enabled.
This PR relies on #4681.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4690

Differential Revision: D13102741

Pulled By: riversand963

fbshipit-source-id: f5a36b34db187fe12dd355f7eda161f99d6c75e4
2019-01-02 17:40:53 -08:00
Andrew Kryczka
ace543a815 fix accounting for range tombstones in TableProperties (#4841)
Summary:
- To be consistent with the accounting of other optypes in `TableProperties`, we should count range tombstones in `TableProperties::num_entries` and `TableProperties::num_deletions`.
- Updated assertions in stress test's `OnTableFileCreated` handler to accept files with range tombstones only.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4841

Differential Revision: D13568424

Pulled By: ajkr

fbshipit-source-id: 0139d7806494eda20ece67ec460d2458dbbf6026
2019-01-02 15:08:53 -08:00
Andrew Kryczka
68d949b3e3 Enable DeleteRange in stress/crash tests (#4483)
Summary:
Set `delrangepercent=1` when `test_batches_snapshots=false`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483

Differential Revision: D10324361

Pulled By: ajkr

fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23
2018-12-18 13:42:49 -08:00
Fosco Marotto
311cd8cf2f Updated benchmark script (#4134)
Summary:
When producing the updated performance on flash results for the wiki, these are the updates which were made.

https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4134

Differential Revision: D13491052

Pulled By: gfosco

fbshipit-source-id: dcd92f24659e0917cb1ac54a4446aa8e7aac8b0d
2018-12-17 16:34:30 -08:00
Andrew Kryczka
8d2b74d287 Refine db_stress params for atomic flush (#4781)
Summary:
Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781

Differential Revision: D13463211

Pulled By: ajkr

fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd
2018-12-13 22:10:38 -08:00
DorianZheng
2670fe8c73 Get CompactionJobInfo from CompactFiles
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716

Differential Revision: D13207677

Pulled By: ajkr

fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b
2018-12-13 14:21:24 -08:00
Sagar Vemuri
70645355ad Move FIFOCompactionPicker to a separate file (#4724)
Summary:
**Summary:**
Simplified the code layout by moving FIFOCompactionPicker to a separate file.
**Why?:**
While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724

Differential Revision: D13227914

Pulled By: sagar0

fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
2018-11-29 16:04:52 -08:00
Sagar Vemuri
c94f073e5e Fix Mac build break in casting (#4722)
Summary:
Mac build is failing with the below error:
```
$ make db_bench -j8
...
...
tools/db_bench_tool.cc:4583:25: error: no matching function for call to 'max'
              (uint64_t)std::max(0l, seek_pos - FLAGS_max_scan_distance),
                        ^~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2717:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('long' vs. 'long long')
max(const _Tp& __a, const _Tp& __b)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2727:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against 'long'
max(initializer_list<_Tp> __t, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2709:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
max(const _Tp& __a, const _Tp& __b, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2735:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided
max(initializer_list<_Tp> __t)
^
1 error generated.
make: *** [tools/db_bench_tool.o] Error 1
```

My compiler version:
Mac OS X Mojave
```
$ clang++ --version
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4722

Differential Revision: D13220196

Pulled By: sagar0

fbshipit-source-id: 01e5e928288a5613027c83a26ad8aedf04438b14
2018-11-27 13:30:16 -08:00
Huachao Huang
5e72bc113a Add SstFileReader to read sst files (#4717)
Summary:
A user friendly sst file reader is useful when we want to access sst
files outside of RocksDB. For example, we can generate an sst file
with SstFileWriter and send it to other places, then use SstFileReader
to read the file and process the entries in other ways.

Also rename the original SstFileReader to SstFileDumper because of
name conflict, and seems SstFileDumper is more appropriate for tools.

TODO: there is only a very simple test now, because I want to get some feedback first.
If the changes look good, I will add more tests soon.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717

Differential Revision: D13212686

Pulled By: ajkr

fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
2018-11-27 13:02:23 -08:00
Po-Chuan Hsieh
60deb4485e Fix build with ROCKSDB_LITE and -Wunused-private-field (#4715)
Summary:
The error message of databases/rocksdb-lite (FreeBSD port) is as follows:
```
  tools/db_bench_tool.cc:1976:16: error: private field 'trace_options_' is not used [-Werror,-Wunused-private-field]
    TraceOptions trace_options_;
                 ^
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4715

Differential Revision: D13207902

Pulled By: ajkr

fbshipit-source-id: be3c612eba656aeddb77e35e2f201dd25dc92f7e
2018-11-26 21:35:38 -08:00
Abhishek Madan
0ed738fdd0 Add max_scan_distance flag to db_bench (#4660)
Summary:
The new flag makes it possible to constrain iterator traversal
by the upper/lower bound the iterator is expected to pass. This allows
seekrandom results to be more easily comparable between DBs with and
without deletions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4660

Differential Revision: D13053111

Pulled By: abhimadan

fbshipit-source-id: 33e250f2e2d210b54c7726399da30a33f723c33c
2018-11-14 10:46:12 -08:00
Yanqin Jin
de65103553 Improve result report of scan (#4648)
Summary:
When iterator becomes invalid, there are two possibilities.
First, all data in the column family have been scanned and there is nothing
more to scan.
Second, an underlying error has occurred, causing `status()` to be !ok.
Therefore, we need to check for both cases when `!iter->Valid()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4648

Differential Revision: D12959601

Pulled By: riversand963

fbshipit-source-id: 49c9382c9ea9e78f2e2b6f3708f0670b822ca8dd
2018-11-13 20:03:59 -08:00
Zhichao Cao
d761857d56 Add unique key number changing statistics to Trace_analyzer (#4646)
Summary:
Changes:
1. in current version, key size distribution is printed out as the result. In this change, the result will be output to a file to make further analyze easier
2. To understand how the unique keys are accessed over time, the total unique key number of each CF of each query type in each second over time is output to a file. In this way, user could know when the unique keys are accessed frequently or accessed rarely.
3. output the total QPS of each CF to a file
4. Add the print result of total queries of each CF of each query type.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4646

Differential Revision: D12968156

Pulled By: zhichao-cao

fbshipit-source-id: 6c411c7ec47c7843a70929136efd71a150db0e4c
2018-11-12 08:26:50 -08:00
Sagar Vemuri
dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Andrew Kryczka
8ba17f382e Verify restore from backup in db_stress (#4655)
Summary:
We already exercised backup functionality in `db_stress` according to the `-backup_one_in` flag. This PR verifies the backup can be restored/opened and sanity checks a few keys. Changes in this PR:

- Extracted existing backup-related logic to a helper function, `TestBackupRestore`
- Added restore logic, which targets a hidden directory named "./.restore\<thread number\>", similar to how backups target hidden directories named "./.backup\<thread number\>".
- After restore, check the existence/non-existence of a few keys.
- With this PR, backup is no longer compatible with clearing column families.
- Also included unrelated fixes to set `ReadOptions::total_order_seek=true` when using `-compare_full_db_state_snapshot`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4655

Differential Revision: D12972496

Pulled By: ajkr

fbshipit-source-id: 481a40052d9a38d1bd5c5159aa4d7c5a4b546b80
2018-11-08 15:15:24 -08:00
Yanqin Jin
d7a04383d1 Include newer RocksDB versions in compat test (#4634)
Summary:
Include 5.16 and 5.17 in check_format_compatible.sh
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4634

Differential Revision: D12947140

Pulled By: riversand963

fbshipit-source-id: 79852b76d5139b2f31db59ed14cb368be01f2c32
2018-11-06 14:25:39 -08:00
Yanqin Jin
50895e5f0d Update manual flush stress test (#4608)
Summary:
Originally, the manual flush calls in db_stress flushes only a single column
family, which is not sufficient when atomic flush is enabled.
With atomic flush, we should call `Flush(flush_opts, cfhs)` to better test this
new feature. Specifically, we manuall flush all column families so that
database verification is easier.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4608

Differential Revision: D12849160

Pulled By: riversand963

fbshipit-source-id: ae1f0dd825247b42c0aba520a5c967335102c876
2018-10-30 17:30:28 -07:00
Abhishek Madan
eaaf1a6f05 Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594)
Summary:
Since the number of range deletions are reported in
TableProperties, it is confusing to not report the number of merge
operands and point deletions as top-level properties; they are
accessible through the public API, but since they are not the "main"
properties, they do not appear in aggregated table properties, or the
string representation of table properties.

This change promotes those two property keys to
`rocksdb/table_properties.h`, adds corresponding uint64 members for
them, deprecates the old access methods `GetDeletedKeys()` and
`GetMergeOperands()` (though they are still usable for now), and removes
`InternalKeyPropertiesCollector`. The property key strings are the same
as before this change, so this should be able to read DBs written from older
versions (though I haven't tested this yet).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594

Differential Revision: D12826893

Pulled By: abhimadan

fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
2018-10-30 15:34:27 -07:00
Yanqin Jin
912bbbbc72 Enable crash-recovery stress test for atomic flush (#4605)
Summary:
This PR adds test of atomic flush to our continuous stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605

Differential Revision: D12840607

Pulled By: riversand963

fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9
2018-10-30 14:03:36 -07:00
Yanqin Jin
7fb39f1ae1 Fix a warning against implicit type conversion (#4593)
Summary:
Test plan
```
$USE_CLANG=1 make -j32 all check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4593

Differential Revision: D12811159

Pulled By: riversand963

fbshipit-source-id: 5e3bbe058c5a8d5a286a19d7643593fc154a2d6d
2018-10-29 09:54:36 -07:00
Yanqin Jin
5b4c709fad Enable atomic flush (#4023)
Summary:
Adds a DB option `atomic_flush` to control whether to enable this feature. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4023

Differential Revision: D8518381

Pulled By: riversand963

fbshipit-source-id: 1e3bb33e99bb102876a31b378d93b0138ff6634f
2018-10-26 15:08:43 -07:00
Zhongyi Xie
fe0d23059d Fix two contrun job failures (#4587)
Summary:
Currently there are two contrun test failures:
* rocksdb-contrun-lite:
> tools/db_bench_tool.cc: In function ‘int rocksdb::db_bench_tool(int, char**)’:
tools/db_bench_tool.cc:5814:5: error: ‘DumpMallocStats’ is not a member of ‘rocksdb’
     rocksdb::DumpMallocStats(&stats_string);
     ^
make: *** [tools/db_bench_tool.o] Error 1
* rocksdb-contrun-unity:
> In file included from unity.cc:44:0:
db/range_tombstone_fragmenter.cc: In member function ‘void rocksdb::FragmentedRangeTombstoneIterator::FragmentTombstones(std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice> >, rocksdb::SequenceNumber)’:
db/range_tombstone_fragmenter.cc:90:14: error: reference to ‘ParsedInternalKeyComparator’ is ambiguous
   auto cmp = ParsedInternalKeyComparator(icmp_);

This PR will fix them
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4587

Differential Revision: D10846554

Pulled By: miasantreble

fbshipit-source-id: 8d3358879e105060197b1379c84aecf51b352b93
2018-10-24 20:16:45 -07:00
Yi Wu
0415244bfa option to print malloc stats at the end of db_bench (#4582)
Summary:
Option to print malloc stats to stdout at the end of db_bench. This is different from `--dump_malloc_stats`, which periodically print the same information to LOG file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4582

Differential Revision: D10520814

Pulled By: yiwu-arbug

fbshipit-source-id: beff5e514e414079d31092b630813f82939ffe5c
2018-10-24 11:39:05 -07:00
Simon Grätzer
f959e88048 Fix printf formatting on MacOS (#4533)
Summary:
On MacOS with clang the compilation of _tools/db_bench_tool.cc_ always fails because the format used in a `fprintf` call has the wrong type. This PR should hopefully fix this issue
```
tools/db_bench_tool.cc:4233:61: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long')
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4533

Differential Revision: D10471657

Pulled By: maysamyabandeh

fbshipit-source-id: f20f5f3756d3571b586c895c845d0d4d1e34a398
2018-10-19 14:46:09 -07:00
Yanqin Jin
da4aa59b4c Add read retry support to log reader (#4394)
Summary:
Current `log::Reader` does not perform retry after encountering `EOF`. In the future, we need the log reader to be able to retry tailing the log even after `EOF`.

Current implementation is simple. It does not provide more advanced retry policies. Will address this in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4394

Differential Revision: D9926508

Pulled By: riversand963

fbshipit-source-id: d86d145792a41bd64a72f642a2a08c7b7b5201e1
2018-10-19 11:53:00 -07:00
Abhishek Madan
35cd754a6d Add writes_before_delete_range flag to db_bench (#4538)
Summary:
The new flag allows tombstones to be generated after enough
keys have been written to the database, which makes it easier to ensure
that tombstones cover a lot of keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4538

Differential Revision: D10455685

Pulled By: abhimadan

fbshipit-source-id: f25d5421745a353c830dea12b79784e852056551
2018-10-18 17:19:59 -07:00
Zhongyi Xie
d6ec288703 Add PerfContextByLevel to provide per level perf context information (#4226)
Summary:
Current implementation of perf context is level agnostic. Making it hard to do performance evaluation for the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level.
This will be helpful when analyzing point and range query performance as well as tuning bloom filter
Also replaced __thread with thread_local keyword for perf_context
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226

Differential Revision: D10369509

Pulled By: miasantreble

fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
2018-10-17 11:19:40 -07:00
Young Tack Jin
c648d90f8e benchmark.sh: to fix divide by zero runtime error (#4442)
Summary:
"Write (GB)" of $9 rather than "Rnp1 (GB)" of $8
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4442

Differential Revision: D10318193

Pulled By: yiwu-arbug

fbshipit-source-id: 03a7ef1938d9332e06fb3fd8490ca212f61fac6b
2018-10-10 21:03:19 -07:00
Zhichao Cao
7ca1a1f0d8 Fix trace_analyzer potential huge memory wasting due to no valid query analyzed (#4473)
Summary:
If the query types being analyzed do not appear in the trace, the current trace_analyzer will use 0 as the begin time, which create the time duration from 1970/01/01 to the now time. It will waste huge memory. Fixed by adding the trace_create_time to limit the duration.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4473

Differential Revision: D10246204

Pulled By: zhichao-cao

fbshipit-source-id: 42850b080b2e62f586fe73afd7737c2246d1a8c8
2018-10-10 10:00:00 -07:00
Igor Canadi
1cf5deb8fd Introduce CacheAllocator, a custom allocator for cache blocks (#4437)
Summary:
This is a conceptually simple change, but it touches many files to
pass the allocator through function calls.

We introduce CacheAllocator, which can be used by clients to configure
custom allocator for cache blocks. Our motivation is to hook this up
with folly's `JemallocNodumpAllocator`
(f43ce6d686/folly/experimental/JemallocNodumpAllocator.h),
but there are many other possible use cases.

Additionally, this commit cleans up memory allocation in
`util/compression.h`, making sure that all allocations are wrapped in a
unique_ptr as soon as possible.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437

Differential Revision: D10132814

Pulled By: yiwu-arbug

fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
2018-10-02 17:24:58 -07:00
Andrew Kryczka
d56070d875 Fix benchmark script with vector memtable (#4428)
Summary:
I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default.

Fixes #4413.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428

Differential Revision: D10036452

Pulled By: ajkr

fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183
2018-09-26 13:22:45 -07:00
Abhishek Madan
519f8b145f Generate appropriate number of keys in db_bench (#4404)
Summary:
If range tombstones are generated every few writes, the
KeyGenerator's limit is now extended to account for the additional
Next() calls. This is primarily important for `filluniquerandom`
benchmarks that enforce the call limit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404

Differential Revision: D9949326

Pulled By: abhimadan

fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38
2018-09-19 16:28:21 -07:00
Zhongyi Xie
9b3cf908a6 add missing range in random.choice argument (#4397)
Summary:
This will fix the broken asan crash test:
> Traceback (most recent call last):
  File "tools/db_crashtest.py", line 384, in <module>
    main()
  File "tools/db_crashtest.py", line 368, in main
    parser.add_argument("--" + k, type=type(v() if callable(v) else v))
  File "tools/db_crashtest.py", line 59, in <lambda>
    "index_block_restart_interval": lambda: random.choice(1, 16),
TypeError: choice() takes exactly 2 arguments (3 given)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397

Differential Revision: D9933041

Pulled By: miasantreble

fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
2018-09-19 12:13:20 -07:00
Maysam Yabandeh
a0ebec3804 Extend crash test with index_block_restart_interval (#4383)
Summary:
The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383

Differential Revision: D9887304

Pulled By: maysamyabandeh

fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
2018-09-18 15:43:29 -07:00
Andrew Kryczka
8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
Anand Ananthabhotla
a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurance of an out of space error during compaction,
subsequent
  compactions will be delayed until the disk free space check indicates
  enough available space. The required space is computed as the sum of
  input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occured
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it in write-only mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00
Dmitri Smirnov
879998b369 Adjust c test and fix windows compilation issues
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4369

Differential Revision: D9844200

Pulled By: sagar0

fbshipit-source-id: 0d9f5f73b28234eaac55d3551ce4e2dc177af138
2018-09-14 20:57:22 -07:00
Andrew Kryczka
c94523ee56 Delete code for WAL reader to start at nonzero offset (#4362)
Summary:
The code is dead in RocksDB as `log::Reader::initial_offset_` is always zero. We should delete it so we don't have to maintain it like in #4359.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4362

Differential Revision: D9817829

Pulled By: ajkr

fbshipit-source-id: 474a2c679e5bd273b40608f3a5332931d9eefe6d
2018-09-13 17:13:03 -07:00
kckjn97
902261519e correct mistyped msg. (#4341)
Summary:
corrected the mistyped message.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4341

Differential Revision: D9816571

Pulled By: ajkr

fbshipit-source-id: 1df0424e981a01470a638a37b925c4133d59a48b
2018-09-13 14:57:38 -07:00
Maysam Yabandeh
3f5282268f Skip concurrency control during recovery of pessimistic txn (#4346)
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346

Differential Revision: D9759149

Pulled By: maysamyabandeh

fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
2018-09-10 16:57:53 -07:00
Andrew Kryczka
2c14662213 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347)
Summary:
Reverting is needed to unblock a user building against master, who is blocked for multiple days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so reverting.

Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347

Differential Revision: D9668365

Pulled By: ajkr

fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd
2018-09-06 09:58:34 -07:00
Andrew Kryczka
1a88c43751 Reduce empty SST creation/deletion in compaction (#4336)
Summary:
This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.

Also fixed an uninitialized variable in db_stress.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336

Differential Revision: D9600080

Pulled By: ajkr

fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
2018-08-31 12:28:52 -07:00
Zhongyi Xie
1cf17ba53b Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323)
Summary:
Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc defined `DecodeCFAndKey` under anonymous namespace. It is supposed to be fine except unity test will dump all source files together and now we have a conflict.
Another issue with trace_analyzer_tool.cc is that it is using some utility functions from ldb_cmd which is not included in Makefile for unity_test, I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323

Differential Revision: D9599170

Pulled By: miasantreble

fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
2018-08-30 18:42:51 -07:00
Shrikanth Shankar
4848bd0c4e Drop unnecessary deletion markers during compaction (issue - 3842) (#4289)
Summary:
This PR fixes issue 3842. We drop deletion markers iff
1. We are the bottom most level AND
2. All other occurrences of the key are in the same snapshot range as the delete

I've also enhanced db_stress_test to add an option that does a full compare of the keys. This is done by a single thread (thread # 0). For tests I've run (so far)

make check -j64
db_stress
db_stress  --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify that new code doesnt break existing tests */
./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify new test code */
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289

Differential Revision: D9491165

Pulled By: shrikanthshankar

fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
2018-08-24 15:17:54 -07:00
Yanqin Jin
8022500ecc Add compatibility test of SST ingestion (#4310)
Summary:
Test plan
```
$cd rocksdb/
$./tools/check_format_compatible.sh
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310

Differential Revision: D9498125

Pulled By: riversand963

fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24
2018-08-24 14:27:43 -07:00
Andrew Kryczka
e7bb8e9b92 Fix clang build of db_stress (#4312)
Summary:
Blame: #4307
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312

Differential Revision: D9494093

Pulled By: ajkr

fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
2018-08-23 21:57:57 -07:00
Andrew Kryczka
6c40806e51 Digest ZSTD compression dictionary once per SST file (#4251)
Summary:
In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.

ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file's created `ZSTD_freeCDict` releases the resources held by the digested dictionary.

There are a couple other changes included in this PR:

- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether dictionary should be used. I felt that mutation was error-prone so eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251

Differential Revision: D9257078

Pulled By: ajkr

fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
2018-08-23 19:28:18 -07:00
Andrew Kryczka
ee234e83e3 Invoke OnTableFileCreated for empty SSTs (#4307)
Summary:
The API comment on `OnTableFileCreationStarted` (b6280d01f9/include/rocksdb/listener.h (L331-L333)) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.

This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307

Differential Revision: D9485201

Pulled By: ajkr

fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
2018-08-23 18:27:30 -07:00
zhichao-cao
cf7150ac2e Add the unit test of Iterator to trace_analyzer_test (#4282)
Summary:
Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282

Differential Revision: D9436758

Pulled By: zhichao-cao

fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
2018-08-23 17:28:32 -07:00
Yanqin Jin
bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Fenggang Wu
9d646a6311 Add db_bench options of data block hash index (#4281)
Summary:
Add `--data_block_index_type` and `--data_block_hash_table_util_ratio` option to `db_bench`.

`--data_block_index_type` can be either of `binary` (default) or `binary_and_hash`;
`--data_block_hash_table_util_ratio` will be a double. The default value is `0.75`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4281

Differential Revision: D9361476

Pulled By: fgwu

fbshipit-source-id: dc53e01acef9db81b9eec5e8a96f3bc8ed718c10
2018-08-16 18:42:46 -07:00
Siying Dong
9c0c8f5ff6 GetAllKeyVersions() to take an extra argument of max_num_ikeys. (#4271)
Summary:
Right now, `ldb idump` may have memory out of control if there is a big range of tombstones. Add an option to cut maxinum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271

Differential Revision: D9369149

Pulled By: siying

fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf
2018-08-16 15:57:08 -07:00
Zhichao Cao
8ae2bf5331 Fix the build and test bugs in the Trace_analyzer (#4274)
Summary:
The wrong options are used in the trace_analyzer_test, removed. The potential loses integer precision are fixed.

Pass the specified testing case, make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4274

Reviewed By: yiwu-arbug

Differential Revision: D9327811

Pulled By: zhichao-cao

fbshipit-source-id: d62cb18d6586503a490cd323bfc1c672b68b346e
2018-08-14 18:27:48 -07:00
Anand Ananthabhotla
bf07e90cf2 Fix db_stress assertion failures on 0 byte SSTs (#4273)
Summary:
In the OnTableFileCreation() listener, assert on various TableProperties
only when file size > 0 bytes. The listener can get called even for 0
byte SSTs which have been deleted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4273

Differential Revision: D9322738

Pulled By: anand1976

fbshipit-source-id: 17cdfb3d0da946b9a158d7328e5db1c87973956b
2018-08-14 14:58:26 -07:00
Maysam Yabandeh
d122025891 Extend stress test to format_version 4 (#4265)
Summary:
Stress tests currently cover format_version 2 and 3. The patch adds 4 as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265

Differential Revision: D9323185

Pulled By: maysamyabandeh

fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57
2018-08-14 14:13:33 -07:00
Zhichao Cao
999d955e4f RocksDB Trace Analyzer (#4091)
Summary:
A framework of trace analyzing for RocksDB

After collecting the trace by using the tool of [PR #3837](https://github.com/facebook/rocksdb/pull/3837). User can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
**Input:**
1. trace file
2. Whole keys space file

**Statistics:**
1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
2. Key hotness (access count) of each one
3. Key space separation based on given prefix
4. Key size distribution
5. Value size distribution if appliable
6. Top K accessed keys
7. QPS statistics including the average QPS and peak QPS
8. Top K accessed prefix
9. The query correlation analyzing, output the number of X after Y and the corresponding average time
    intervals

**Output:**
1. key access heat map (either in the accessed key space or whole key space)
2. trace sequence file (interpret the raw trace file to line base text file for future use)
3. Time serial (The key space ID and its access time)
4. Key access count distritbution
5. Key size distribution
6. Value size distribution (in each intervals)
7. whole key space separation by the prefix
8. Accessed key space separation by the prefix
9. QPS of each operation and each column family
10. Top K QPS and their accessed prefix range

**Test:**
1. Added the unit test of analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge
2. Generated the trace and analyze the trace

**Implemented but not tested (due to the limitation of trace_replay):**
1. Analyzing Iterator, supporting Seek() and SeekForPrev() analyzing
2. Analyzing the number of Key found by Get

**Future Work:**
1.  Support execution time analyzing of each requests
2.  Support cache hit situation and block read situation of Get
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091

Differential Revision: D9256157

Pulled By: zhichao-cao

fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
2018-08-13 11:44:02 -07:00
Yanqin Jin
1b1d264342 Remove an assersion about file size (#4268)
Summary:
Due to 4ea56b1bd0, we should also remove the
assersion in stress test. This removal can be temporary, and we can add it back
once we figure out the reason for the 0-byte SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4268

Differential Revision: D9297186

Pulled By: riversand963

fbshipit-source-id: cebba9a68f42e815f8cf24471176d2cfdf962f63
2018-08-13 11:12:50 -07:00
Yanqin Jin
b271f956c2 Fix a TSAN failure (#4250)
Summary:
TSAN fails due to comparison between signed int and unsigned long. Fix it by
static_casting.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4250

Differential Revision: D9256535

Pulled By: riversand963

fbshipit-source-id: c6bad23ff70c6d0ec58e2e85c401ce0ad45de609
2018-08-09 19:42:32 -07:00
Dmitri Smirnov
ab22cf349e Implement Env::NumFileLinks (#4221)
Summary:
Although delete scheduler implementation allows for the interface not to be supported, the delete_scheduler_test does not allow for that.
Address compiler warnings
Make sst_dump_test use test directory structure as the current execution directory may not be writiable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4221

Differential Revision: D9210152

Pulled By: siying

fbshipit-source-id: 381a74511e969ecb8089d5c4b4df87dc30c8df63
2018-08-09 14:29:11 -07:00
Yanqin Jin
de7f423a82 Add SST ingestion to ldb (#4205)
Summary:
We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205

Differential Revision: D9112711

Pulled By: riversand963

fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4
2018-08-09 14:29:11 -07:00
Andrew Kryczka
7a9a164276 Fix db_bench default compression level (#4248)
Summary:
db_bench's previous default compression level (-1) was not the default compression level in all libraries. In particular, in ZSTD negative values are valid compression levels, while ZSTD's default compression level is three.

This PR changes db_bench's default to be RocksDB's library-independent default compression level (see #3895). I also changed a couple other flags to get their default values from an options object directly rather than hardcoding.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4248

Differential Revision: D9235140

Pulled By: ajkr

fbshipit-source-id: be4e0722d59fa1968832183db36d1d20fcf11e5b
2018-08-09 10:28:14 -07:00
Andrew Kryczka
6175b4b294 Support dictionary compression in stress/crash tests (#4234)
Summary:
- Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
- Randomly enable/disable the above flags in crash test
- Set `--compression_type=zstd` in FB-specific crash test runs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234

Differential Revision: D9187207

Pulled By: ajkr

fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
2018-08-06 15:27:29 -07:00
Sagar Vemuri
fefdac1004 Fix lite build failure in db_bench due to trace/replay (#4225)
Summary:
Fix lite build failure in db_bench due to trace/replay feature.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4225

Differential Revision: D9153303

Pulled By: sagar0

fbshipit-source-id: 9f7a8035429d0dcdbe99616d11389ed7bccf44be
2018-08-03 11:58:55 -07:00
Pooja Malik
9dbf39399e Rules Advisor: some fixes to support fetching stats from ODS (#4223)
Summary:
This PR includes fixes for some bugs that I encountered while testing the Optimizer with ODS stats support.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4223

Differential Revision: D9140786

Pulled By: poojam23

fbshipit-source-id: 045cb3f27d075c2042040ac2d561938349419516
2018-08-02 15:42:42 -07:00
Pooja Malik
892a156267 Advisor: README and blog, and also tests for DBBenchRunner, DatabaseOptions (#4201)
Summary:
This pull request adds a README file and a blog post for the Advisor tool. It also adds the missing tests for some Optimizer modules. Some comments are added to the classes being tested for improved readability.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4201

Reviewed By: maysamyabandeh

Differential Revision: D9125311

Pulled By: poojam23

fbshipit-source-id: aefcf2f06eaa05490cc2834ef5aa6e21f0d1dc55
2018-08-01 16:13:09 -07:00
Sagar Vemuri
12b6cdeed3 Trace and Replay for RocksDB (#3837)
Summary:
A framework for tracing and replaying RocksDB operations.

A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.

- Column-families are supported
- Multi-threaded tracing is supported.
- TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
- This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.

Currently supported DB operations:
- Writes:
-- Put
-- Merge
-- Delete
-- SingleDelete
-- DeleteRange
-- Write
- Reads:
-- Get (point lookups)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837

Differential Revision: D7974837

Pulled By: sagar0

fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
2018-08-01 00:27:08 -07:00
Yanqin Jin
8abafb1feb Generalize parameters generation. (#4046)
Summary:
Making generation of column families and keys virtual function so that
subclasses of StressTest can override them to provide custom parameter
generation for more flexibility. This will be useful for future tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4046

Differential Revision: D9073382

Pulled By: riversand963

fbshipit-source-id: 2754f0fdfa5c24d95c1f92d4944bc479552fb665
2018-07-30 17:42:12 -07:00
Yanqin Jin
54de56844d Remove random writes from SST file ingestion (#4172)
Summary:
RocksDB used to store global_seqno in external SST files written by
SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the
`global_seqno`. Since random write is not supported in some non-POSIX compliant
file systems, external SST file ingestion is not supported on these file
systems. To address this limitation, we no longer update `global_seqno` during
file ingestion. Later RocksDB uses the MANIFEST and other information in table
properties to deduce global seqno for externally-ingested SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172

Differential Revision: D8961465

Pulled By: riversand963

fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071
2018-07-27 16:12:23 -07:00
DorianZheng
f5e46354d2 Protect external file when ingesting (#4099)
Summary:
If crash happen after a hard link established, Recover function may reuse the file number that has already assigned to the internal file, and this will overwrite the external file. To protect the external file, we have to make sure the file number will never being reused.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/4099

Differential Revision: D9034092

Pulled By: riversand963

fbshipit-source-id: 3f1a737440b86aa2ef01673e5013aacbb7c33e28
2018-07-27 14:13:12 -07:00
Pooja Malik
134a52e144 Optimizer's skeleton: use advisor to optimize config options (#4169)
Summary:
In https://github.com/facebook/rocksdb/pull/3934 we introduced advisor scripts that make suggestions in the config options based on the log file and stats from a run of rocksdb. The optimizer runs the advisor on a benchmark application in a loop and automatically applies the suggested changes until the config options are optimized. This is a work in progress and the patch is the initial skeleton for the optimizer. The sample application that is run in the loop is currently dbbench.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4169

Reviewed By: maysamyabandeh

Differential Revision: D9023671

Pulled By: poojam23

fbshipit-source-id: a6192d475c462cf6eb2b316716f97cb400fcb64d
2018-07-26 17:13:32 -07:00
Siying Dong
4b0a43574a db_stress to cover upper bound in iterators (#4162)
Summary:
db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162

Differential Revision: D8953507

Pulled By: siying

fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
2018-07-23 10:45:29 -07:00
Zhichao Cao
6811fb0658 Fixed the db_bench MergeRandom only access CF_default (#4155)
Summary:
When running the tracing and analyzing, I found that MergeRandom benchmark in db_bench only access the default column family even the -num_column_families is specified > 1.

changes: Using the db_with_cfh as DB to randomly select the column family to execute the Merge operation if -num_column_families is specified > 1.

Tested with make asan_check and verified in tracing
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4155

Differential Revision: D8907888

Pulled By: zhichao-cao

fbshipit-source-id: 2b4bc8fe0e99c8f262f5be6b986c7025d62cf850
2018-07-20 15:58:54 -07:00
Siying Dong
a5e851e113 Reformatting some recent changes (#4161)
Summary:
Lint is not happy with some new code recently committed. Format them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161

Differential Revision: D8940582

Pulled By: siying

fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7
2018-07-20 14:43:38 -07:00
Pooja Malik
1857576e03 db_bench support for OPTIONS+bloom and nicer output for perf_context (#4153)
Summary:
Adding the string "PERF_CONTEXT:" before the perf_context stats are printed. Setting the filter policy if it's a block based table even when options are being loaded from the provided FLAGS_options_file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4153

Differential Revision: D8905517

Pulled By: poojam23

fbshipit-source-id: 5956ed7882d39ec8ae654d5dadeb88727a36f0dd
2018-07-18 16:27:49 -07:00
Maysam Yabandeh
8581a93a6b Per-thread unique test db names (#4135)
Summary:
The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135

Differential Revision: D8846653

Pulled By: maysamyabandeh

fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
2018-07-13 17:27:39 -07:00
Zhongyi Xie
23b76252c8 db_bench: enable setting cache_size when loading options file
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4118

Differential Revision: D8845554

Pulled By: miasantreble

fbshipit-source-id: 13bd3c1259a7c30bad762a413fe3bb24eea650ba
2018-07-13 16:43:53 -07:00
Zhongyi Xie
de98fd88e3 Support compaction filter in db_bench (#4106)
Summary:
Right now there is no support for enabling compaction filter in db_bench, we should add support for that to facilitate testing of compaction filter.
This PR adds a compaction filter called KeepFilter and make `Filter` always returns false, essentially a noop compaction filter. This will allow us to test compaction filter code path without having to support arbitrary compaction filters
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4106

Differential Revision: D8828517

Pulled By: miasantreble

fbshipit-source-id: 9ad76d04103eaa9d00da98334b4a39e542d26c41
2018-07-12 19:42:27 -07:00
Andrew Kryczka
97fe23fc5c Fix unsigned int flag in db_bench (#4129)
Summary:
`DEFINE_uint32` was unavailable on some platforms, e.g., https://travis-ci.org/facebook/rocksdb/jobs/403352902. Use `DEFINE_uint64` instead which should work as it's used many times elsewhere in this file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4129

Differential Revision: D8830311

Pulled By: ajkr

fbshipit-source-id: b4fc90ba3f50e649c070ce8069c68e530d731f05
2018-07-12 18:43:23 -07:00
Andrew Kryczka
63904434eb db_bench periodically dump stats to info log (#4109)
Summary:
give control of how often stats are printed, including jemalloc stats if enabled. Previously the default was 10 minutes so we'd only see updated stats for very long benchmark runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4109

Differential Revision: D8796444

Pulled By: ajkr

fbshipit-source-id: fd7902fe3f105fae89322c4ab63316bba4a2b15e
2018-07-12 15:57:42 -07:00
Manuel Ung
b9846370e9 WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078)
Summary:
This adds support for recovering WriteUnprepared transactions through the following changes:
- The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
- `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
- `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.

Commit/Rollback of live transactions is still unimplemented and will come later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078

Differential Revision: D8703382

Pulled By: lth

fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
2018-07-06 17:59:13 -07:00
Maysam Yabandeh
235ab9dd32 Pin mmap files in ReadOnlyDB (#4053)
Summary:
https://github.com/facebook/rocksdb/pull/3881 fixed a bug where PinnableSlice pin mmap files which could be deleted with background compaction. This is however a non-issue for ReadOnlyDB when there is no compaction running and max_open_files is -1. This patch reenables the pinning feature for that case.
Closes https://github.com/facebook/rocksdb/pull/4053

Differential Revision: D8662546

Pulled By: maysamyabandeh

fbshipit-source-id: 402962602eb0f644e17822748332999c3af029fd
2018-06-27 17:13:34 -07:00
Peter (Stig) Edwards
2694b6dc26 Remove unused imports, from python scripts. (#4057)
Summary:
Also remove redefined variable.
As reported on https://lgtm.com/projects/g/facebook/rocksdb/
Closes https://github.com/facebook/rocksdb/pull/4057

Differential Revision: D8648342

Pulled By: ajkr

fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183
2018-06-26 12:43:04 -07:00
Yanqin Jin
2729dd72ad Reclaim memory allocated to backup_engine.
Summary: Closes https://github.com/facebook/rocksdb/pull/4045

Differential Revision: D8595609

Pulled By: riversand963

fbshipit-source-id: 5ba5954d804b82b0e7264b2e18e1da4c94103b53
2018-06-23 17:12:14 -07:00
Maysam Yabandeh
80ade9ad83 Pin top-level index on partitioned index/filter blocks (#4037)
Summary:
Top-level index in partitioned index/filter blocks are small and could be pinned in memory. So far we use that by cache_index_and_filter_blocks to false. This however make it difficult to keep account of the total memory usage. This patch introduces pin_top_level_index_and_filter which in combination with cache_index_and_filter_blocks=true keeps the top-level index in cache and yet pinned them to avoid cache misses and also cache lookup overhead.
Closes https://github.com/facebook/rocksdb/pull/4037

Differential Revision: D8596218

Pulled By: maysamyabandeh

fbshipit-source-id: 3a5f7f9ca6b4b525b03ff6bd82354881ae974ad2
2018-06-22 15:27:46 -07:00
Yi Wu
c726f7fda8 Fix dangling checkpoint pointer in db_stress (#4042)
Summary:
Fix db_stress failed to delete checkpoint pointer. It's caught by asan_crash test.
Closes https://github.com/facebook/rocksdb/pull/4042

Differential Revision: D8592604

Pulled By: yiwu-arbug

fbshipit-source-id: 7b2d67d5e3dfb05f71c33fcf320482303e97d3ef
2018-06-22 11:43:50 -07:00
Andrew Kryczka
0a5b16c7c5 Cleanup staging directory at start of checkpoint (#4035)
Summary:
- Attempt to clean the checkpoint staging directory before starting a checkpoint. It was already cleaned up at the end of checkpoint. But it wasn't cleaned up in the edge case where the process crashed while staging checkpoint files.
- Attempt to clean the checkpoint directory before calling `Checkpoint::Create` in `db_stress`. This handles the case where checkpoint directory was created by a previous `db_stress` run but the process crashed before cleaning it up.
- Use `DestroyDB` for cleaning checkpoint directory since a checkpoint is a DB.
Closes https://github.com/facebook/rocksdb/pull/4035

Reviewed By: yiwu-arbug

Differential Revision: D8580223

Pulled By: ajkr

fbshipit-source-id: 28c667400e249fad0fdedc664b349031b7b61599
2018-06-21 16:27:12 -07:00
Yanqin Jin
397495964b Fix a warning (treated as error) caused by type mismatch.
Summary: Closes https://github.com/facebook/rocksdb/pull/4032

Differential Revision: D8573061

Pulled By: riversand963

fbshipit-source-id: 112324dcb35956d6b3ec891073f4f21493933c8b
2018-06-21 11:13:09 -07:00
Yanqin Jin
524c6e6b72 Add file name info to SequentialFileReader. (#4026)
Summary:
We potentially need this information for tracing, profiling and diagnosis.
Closes https://github.com/facebook/rocksdb/pull/4026

Differential Revision: D8555214

Pulled By: riversand963

fbshipit-source-id: 4263e06c00b6d5410b46aa46eb4e358ff2161dd2
2018-06-21 08:42:24 -07:00
Andrew Kryczka
14cee194d6 Support file ingestion in stress test (#4018)
Summary:
Once per `ingest_external_file_one_in` operations, uses SstFileWriter to create a file containing `ingest_external_file_width` consecutive keys. The file is named containing the thread ID to avoid clashes. The file is then added to the DB using `IngestExternalFile`.

We can't enable it by default in crash test because `nooverwritepercent` and `test_batches_snapshot` both must be zero for the DB's whole lifetime. Perhaps we should setup a separate test with that config as range deletion also requires it.
Closes https://github.com/facebook/rocksdb/pull/4018

Differential Revision: D8507698

Pulled By: ajkr

fbshipit-source-id: 1437ea26fd989349a9ce8b94117241c65e40f10f
2018-06-20 22:27:45 -07:00
Andrew Kryczka
7f3a634e06 Support pipelined write in stress/crash tests
Summary: Closes https://github.com/facebook/rocksdb/pull/4019

Differential Revision: D8508681

Pulled By: ajkr

fbshipit-source-id: 23a3c07d642386446e322b02e69cdf70d12ef009
2018-06-19 09:14:12 -07:00
Andrew Kryczka
8585059ae0 Support backup and checkpoint in db_stress (#4005)
Summary:
Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow.
Closes https://github.com/facebook/rocksdb/pull/4005

Differential Revision: D8472275

Pulled By: ajkr

fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5
2018-06-18 19:28:18 -07:00
Andrew Kryczka
de2c6fb158 Fix stderr processing in crash test (#4006)
Summary:
Fixed bug where `db_stress` output a line with a warning followed by a line with an error, and `db_crashtest.py` considered that a success. For example:

```
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
open error: Corruption: SST file is ahead of WALs
```
Closes https://github.com/facebook/rocksdb/pull/4006

Differential Revision: D8473463

Pulled By: ajkr

fbshipit-source-id: 60461bdd7491d9d26c63f7d4ee522a0f88ba3de7
2018-06-18 17:58:13 -07:00
Hans-Wilhelm Warlo
4faaab70a6 Benchmark sine wave write rate limit (#3914)
Summary:
As mentioned at the [dev forum.](https://www.facebook.com/groups/rocksdb.dev/1693425187422655/)

Let me know if you would like me to do any changes!
Closes https://github.com/facebook/rocksdb/pull/3914

Differential Revision: D8452824

Pulled By: siying

fbshipit-source-id: 56439b3228ecdcc5a199d5198eff2fab553be961
2018-06-15 12:12:03 -07:00
Siying Dong
f5281a53a4 tools/check_format_compatible.sh to cover forward option reading too (#3994)
Summary:
Make sure that some recent releases can read master's option files while ignoring unknown options. Also add two more recent release branches.
Closes https://github.com/facebook/rocksdb/pull/3994

Differential Revision: D8409499

Pulled By: siying

fbshipit-source-id: 1b025f19ba288da0517f6b4572797573e23e23c2
2018-06-15 11:12:29 -07:00
Andrew Kryczka
7497f992e0 Run manual compaction in stress/crash tests (#3936)
Summary:
- Add support to `db_stress` for `CompactRange`
- Enable `CompactRange` and `CompactFiles` in crash tests
Closes https://github.com/facebook/rocksdb/pull/3936

Differential Revision: D8230953

Pulled By: ajkr

fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8
2018-06-13 16:45:28 -07:00
Andrew Kryczka
dd216dd76a Choose unique keys faster in db_stress (#3990)
Summary:
db_stress initialization randomly chooses a set of keys to not overwrite. It was doing it separately for each column family. That caused 30+ second initialization times for the non-simple crash tests, which have 10 CFs. This PR:

- reuses the same set of randomly chosen no-overwrite keys across all CFs
- logs a couple more timestamps so we can more easily see initialization time
Closes https://github.com/facebook/rocksdb/pull/3990

Differential Revision: D8393821

Pulled By: ajkr

fbshipit-source-id: d0b263a298df607285ffdd8b0983ff6575cc6c34
2018-06-13 13:43:23 -07:00
Yanqin Jin
3470c75852 Fix build errors.
Summary: Closes https://github.com/facebook/rocksdb/pull/3967

Differential Revision: D8322775

Pulled By: riversand963

fbshipit-source-id: bd73067bd5d3ed4627348f0685bc499359ad6442
2018-06-07 15:43:09 -07:00
Zhichao Cao
23e1d23675 Fixed the fprintf of uint64_t by using PRIu64 (#3963)
Summary:
Fixed the fprintf format of uint64_t by using PRIu64 in file tools/ldb_cmd.cc
Closes https://github.com/facebook/rocksdb/pull/3963

Differential Revision: D8306179

Pulled By: zhichao-cao

fbshipit-source-id: 597dcd55321576801bbf2cf4714736ebc4750a0c
2018-06-07 11:44:48 -07:00
Yanqin Jin
0a0860a5fb Refactoring db_stress.cc (#3902)
Summary:
We use `db_stress.cc` intensively to test and verify the behavior of RocksDB. Sometimes we need to add new tests for recently added features. Original `StressTest` class provides many general functionality that can be leveraged by other tests. Therefore, in this refactoring PR, I try to identify the general operations as well as operations that future tests most likely want to customize. Future tests can inherit `StressTest` and overriding the virtual functions to test custom logic.
Closes https://github.com/facebook/rocksdb/pull/3902

Differential Revision: D8284607

Pulled By: riversand963

fbshipit-source-id: 019302d04665a2b18334b6d05d04a477168c8ea4
2018-06-07 10:43:00 -07:00
Pooja Malik
5504a056f8 Adding advisor Rules and parser scripts with unit tests. (#3934)
Summary:
This adds some rules in the tools/advisor/advisor/rules.ini (refer this for more information) file and corresponding python parser scripts for parsing the rules file and the rocksdb LOG and OPTIONS files. This is WIP for adding rules depending on ODS. The starting point of the script is the rocksdb/tools/advisor/advisor/rule_parser.py file.
Closes https://github.com/facebook/rocksdb/pull/3934

Reviewed By: maysamyabandeh

Differential Revision: D8304059

Pulled By: poojam23

fbshipit-source-id: 47f2a50f04d46d40e225dd1cbf58ba490f79e239
2018-06-06 14:42:59 -07:00
Zhongyi Xie
f1592a06c2 run make format for PR 3838 (#3954)
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that triggers lint warnings.
Run `make format` to fix formatting as suggested by siying .
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954

Differential Revision: D8272041

Pulled By: miasantreble

fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
2018-06-05 12:58:02 -07:00
Maysam Yabandeh
d0c38c0c8c Extend some tests to format_version=3 (#3942)
Summary:
format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3.
Closes https://github.com/facebook/rocksdb/pull/3942

Differential Revision: D8238413

Pulled By: maysamyabandeh

fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035
2018-06-04 20:13:00 -07:00
Dmitri Smirnov
f4b72d7056 Provide a way to override windows memory allocator with jemalloc for ZSTD
Summary:
Windows does not have LD_PRELOAD mechanism to override all memory allocation functions and ZSTD makes use of C-tuntime calloc. During flushes and compactions default system allocator fragments and the system slows down considerably.

For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance.

The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression.
Closes https://github.com/facebook/rocksdb/pull/3838

Differential Revision: D8229794

Pulled By: miasantreble

fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d
2018-06-04 12:12:48 -07:00
Andrew Kryczka
4f297ad05f Fix crash test check for direct I/O
Summary:
We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing.
Closes https://github.com/facebook/rocksdb/pull/3946

Differential Revision: D8247998

Pulled By: ajkr

fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3
2018-06-03 21:42:12 -07:00
Andrew Kryczka
88c3ee2d31 Configure direct I/O statically in db_stress
Summary:
Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested):

- It's a DB option so SetDBOptions should've been called instead
- It's not a dynamic option so even SetDBOptions would fail
- It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU.

In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT.
Closes https://github.com/facebook/rocksdb/pull/3939

Differential Revision: D8238120

Pulled By: ajkr

fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279
2018-06-01 16:42:34 -07:00
Jacquin Mininger
727eb881a5 Compile error in db bench tool
Summary:
Small format error below causes build to fail. I believe that this :
```
fprintf(stderr, "num reads to do %lu\n", reads_);
```
Can be changed to this:
```
fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_);
```
Successful build
```
  CC       utilities/blob_db/blob_dump_tool.o
  AR       librocksdb_debug.a
ar: creating archive librocksdb_debug.a
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols
  CC       tools/db_bench.o
  CC       tools/db_bench_tool.o
tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
    fprintf(stderr, "num reads to do %lu\n", reads_);
                                     ~~~     ^~~~~~
                                     %lld
1 error generated.
make: *** [tools/db_bench_tool.o] Error 1
```

```
$ cd rocksdb
$ make all

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```
Closes https://github.com/facebook/rocksdb/pull/3909

Differential Revision: D8215710

Pulled By: siying

fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751
2018-05-30 18:01:36 -07:00
Yi Wu
bc7e8d472e LRUCache midpoint insertion
Summary:
Implement midpoint insertion strategy where new blocks will be insert to the middle of LRU list, then move the head on the first hit in cache.
Closes https://github.com/facebook/rocksdb/pull/3877

Differential Revision: D8100895

Pulled By: yiwu-arbug

fbshipit-source-id: f4bd83cb8be469e5d02072cfc8bd66011391f3da
2018-05-24 15:57:33 -07:00
Dmitri Smirnov
3db8504cde Catchup with posix features
Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exists
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
2018-05-24 15:13:04 -07:00
Andrew Kryczka
fcb31016e9 Avoid single-deleting merge operands in db_stress
Summary:
I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them.
Closes https://github.com/facebook/rocksdb/pull/3878

Differential Revision: D8097346

Pulled By: ajkr

fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1
2018-05-22 10:58:36 -07:00
Zhongyi Xie
c3ebc75843 Move prefix_extractor to MutableCFOptions
Summary:
Currently it is not possible to change bloom filter config without restart the db, which is causing a lot of operational complexity for users.
This PR aims to make it possible to dynamically change bloom filter config.
Closes https://github.com/facebook/rocksdb/pull/3601

Differential Revision: D7253114

Pulled By: miasantreble

fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
2018-05-21 14:43:11 -07:00
Yanqin Jin
a0c7b4d526 Set the default value of max_manifest_file_size.
Summary:
In the past, the default value of max_manifest_file_size is uint64_t::MAX,
allowing a long running RocksDB process to grow its MANIFEST file to take up
the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.

siying miasantreble Please let me know whether this looks good to you. Thanks!
Closes https://github.com/facebook/rocksdb/pull/3867

Differential Revision: D8051524

Pulled By: riversand963

fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72
2018-05-18 08:11:55 -07:00
Sagar Vemuri
ebb823f746 Fix db_stress build on mac
Summary:
I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`.
```
$ make db_stress -j4
...
tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *')
        status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size);
                                                                    ^~~~~
./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here
  virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0;
                                                                 ^
1 error generated.
make: *** [tools/db_stress.o] Error 1
make: *** Waiting for unfinished jobs....
```
Closes https://github.com/facebook/rocksdb/pull/3839

Differential Revision: D7979236

Pulled By: sagar0

fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62
2018-05-14 11:14:07 -07:00
Andrew Kryczka
072ae671a7 Apply use_direct_io_for_flush_and_compaction to writes only
Summary:
Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.

This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background direct writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
Closes https://github.com/facebook/rocksdb/pull/3829

Differential Revision: D7915443

Pulled By: ajkr

fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
2018-05-09 19:42:58 -07:00
Andrew Kryczka
d19f568abf Refactor argument handling in db_crashtest.py
Summary:
- Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
- Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
- Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
- Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication
Closes https://github.com/facebook/rocksdb/pull/3809

Differential Revision: D7885779

Pulled By: ajkr

fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
2018-05-09 13:42:41 -07:00
Andrew Kryczka
4c5a3232e4 Fix db_stress memory leak ASAN error
Summary:
In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed.
Closes https://github.com/facebook/rocksdb/pull/3804

Differential Revision: D7874694

Pulled By: ajkr

fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b
2018-05-04 16:45:15 -07:00
Zhongyi Xie
a703432808 MaxFileSizeForLevel: adjust max_file_size for dynamic level compaction
Summary:
`MutableCFOptions::RefreshDerivedOptions` always assume base level is L1, which is not true when `level_compaction_dynamic_level_bytes=true` and Level based compaction is used.
This PR fixes this by recomputing `max_file_size` at query time (in `MaxFileSizeForLevel`)
Fixes https://github.com/facebook/rocksdb/issues/3229

In master:

```
Level Files Size(MB)
--------------------
  0       14      846
  1        0        0
  2        0        0
  3        0        0
  4        0        0
  5       15      366
  6       11      481
Cumulative compaction: 3.83 GB write, 2.27 GB read
```
In branch:
```
Level Files Size(MB)
--------------------
  0        9      544
  1        0        0
  2        0        0
  3        0        0
  4        0        0
  5        0        0
  6      445      935
Cumulative compaction: 2.91 GB write, 1.46 GB read
```

db_bench command used:
```
./db_bench --benchmarks="fillrandom,deleterandom,fillrandom,levelstats,stats" --statistics -deletes=5000 -db=tmp -compression_type=none --num=20000 -value_size=100000 -level_compaction_dynamic_level_bytes=true -target_file_size_base=2097152 -target_file_size_multiplier=2
```
Closes https://github.com/facebook/rocksdb/pull/3755

Differential Revision: D7721381

Pulled By: miasantreble

fbshipit-source-id: 39afb8503190bac3b466adf9bbf2a9b3655789f8
2018-05-03 16:42:13 -07:00
Dmitri Smirnov
acb61b7a52 Adjust pread/pwrite to return Status
Summary:
Returning bytes_read causes the caller to call GetLastError()
  to report failure but the lasterror may be overwritten by then
  so we lose the error code.
  Fix up CMake file to include xpress source code only when needed.
  Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795

Differential Revision: D7832935

Pulled By: anand1976

fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
2018-05-01 13:42:46 -07:00
Andrew Kryczka
46152d53bf Second attempt at db_stress crash-recovery verification
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e

This PR includes the contents of the original commit plus two bug fixes, which are:

- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793

Differential Revision: D7811671

Pulled By: ajkr

fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
2018-04-30 12:27:34 -07:00
Andrew Kryczka
6afe22db2e revert db_stress crash-recovery verification
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786

Differential Revision: D7794516

Pulled By: ajkr

fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
2018-04-27 12:57:01 -07:00
Zhongyi Xie
459bb9028f remove prefixscanrandom from db_bench help
Summary:
fix issue reported in https://github.com/facebook/rocksdb/issues/3757
Closes https://github.com/facebook/rocksdb/pull/3784

Differential Revision: D7794107

Pulled By: miasantreble

fbshipit-source-id: 43535074fcb82adb5656bcb916284b2dfc5cbb64
2018-04-27 12:13:19 -07:00
Andrew Kryczka
db36f222d8 Allow options file in db_stress and db_crashtest
Summary:
- When options file is provided to db_stress, take supported options from the file instead of from flags
- Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file
- Allow options filename to be passed via `db_crashtest.py`
Closes https://github.com/facebook/rocksdb/pull/3768

Differential Revision: D7755331

Pulled By: ajkr

fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a
2018-04-26 18:42:07 -07:00
Andrew Kryczka
a4fb1f8c04 Add crash-recovery correctness check to db_stress
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
2018-04-24 15:58:22 -07:00
Gabriel Wicke
090c78a0d7 Support lowering CPU priority of background threads
Summary:
Background activities like compaction can negatively affect
latency of higher-priority tasks like request processing. To avoid this,
rocksdb already lowers the IO priority of background threads on Linux
systems. While this takes care of typical IO-bound systems, it does not
help much when CPU (temporarily) becomes the bottleneck. This is
especially likely when using more expensive compression settings.

This patch adds an API to allow for lowering the CPU priority of
background threads, modeled on the IO priority API. Benchmarks (see
below) show significant latency and throughput improvements when CPU
bound. As a result, workloads with some CPU usage bursts should benefit
from lower latencies at a given utilization, or should be able to push
utilization higher at a given request latency target.

A useful side effect is that compaction CPU usage is now easily visible
in common tools, allowing for an easier estimation of the contribution
of compaction vs. request processing threads.

As with IO priority, the implementation is limited to Linux, degrading
to a no-op on other systems.
Closes https://github.com/facebook/rocksdb/pull/3763

Differential Revision: D7740096

Pulled By: gwicke

fbshipit-source-id: e5d32373e8dc403a7b0c2227023f9ce4f22b413c
2018-04-24 08:41:51 -07:00
Zhongyi Xie
8a9c7f71c9 fix compilation error: implicit conversion loses integer precision
Summary:
Fix compilation error with clang:
> tools/db_stress.cc:2598:21: error: implicit conversion loses integer precision: 'gflags::uint64' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32]
        Random rand(FLAGS_seed);
               ~~~~ ^~~~~~~~~~
Closes https://github.com/facebook/rocksdb/pull/3746

Differential Revision: D7703209

Pulled By: miasantreble

fbshipit-source-id: 18c56a5138a2f308e4213594bc82e8e64bc21570
2018-04-19 18:57:43 -07:00
Maysam Yabandeh
6d06be22c0 Improve db_stress with transactions
Summary:
db_stress was already capable running transactions by setting use_txn. Running it under stress showed a couple of problems fixed in this patch.
- The uncommitted transaction must be either rolled back or commit after recovery.
- Current implementation of WritePrepared transaction cannot handle cf drop before crash. Clarified that in the comments and added safety checks. When running with use_txn, clear_column_family_one_in must be set to 0.
Closes https://github.com/facebook/rocksdb/pull/3733

Differential Revision: D7654419

Pulled By: maysamyabandeh

fbshipit-source-id: a024bad80a9dc99677398c00d29ff17d4436b7f3
2018-04-18 16:32:35 -07:00
Yanqin Jin
2ee1496c43 Add missing whitespace.
Summary: Closes https://github.com/facebook/rocksdb/pull/3729

Differential Revision: D7645465

Pulled By: riversand963

fbshipit-source-id: a64da0960fe6c39847ef848b8888fe9a9c1df25d
2018-04-17 09:57:40 -07:00
Yi Wu
2c2f388897 db_bench fillXXXdeterministic should respect compression type
Summary:
db_bench fillXXXdeterministic should respect compression type when calling CompactFiles().
Closes https://github.com/facebook/rocksdb/pull/3731

Differential Revision: D7647761

Pulled By: yiwu-arbug

fbshipit-source-id: 15e12429e0dd93ece2231b015f2e26c2d94781e6
2018-04-16 18:01:47 -07:00
Zhongyi Xie
af95aecd01 use delete[] to dealloc an array
Summary:
fix a bug in `db_stress` where an int array was incorrectly deallocated using delete instead of delete[]
Closes https://github.com/facebook/rocksdb/pull/3725

Differential Revision: D7634749

Pulled By: miasantreble

fbshipit-source-id: 489b776f5f4c03de1824edac5495787ec19cc910
2018-04-15 23:56:39 -07:00
Zhongyi Xie
954b496b3f fix memory leak in two_level_iterator
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
2018-04-15 17:26:26 -07:00
Amy Tai
28087acd79 Implemented Knuth shuffle to construct permutation for selecting no_o…
Summary:
…verwrite_keys. Also changed each no_overwrite_key set to an unordered set, otherwise Knuth shuffle only gets you 2x time improvement, because insertion (and subsequent internal sorting) into an ordered set is the bottleneck.

With this change, each iteration of permutation construction and prefix selection takes around 40 secs, as opposed to 360 secs previously. However, this still means that with the default 10 CF per blackbox test case, the test is going to time out given the default interval of 200 secs.

Also, there is currently an assertion error affecting all blackbox tests in db_crashtest.py; this assertion error will be fixed in a future PR.
Closes https://github.com/facebook/rocksdb/pull/3699

Differential Revision: D7624616

Pulled By: amytai

fbshipit-source-id: ea64fbe83407ff96c1c0ecabbc6c830576939393
2018-04-13 22:13:13 -07:00
David Lai
3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00
Maysam Yabandeh
eb5a295440 WritePrepared Txn: add write_committed option to dump_wal
Summary:
Currently dump_wal cannot print the prepared records from the WAL that is generated by WRITE_PREPARED write policy since the default reaction of the handler is to return NotSupported if markers of WRITE_PREPARED are encountered. This patch enables the admin to pass --write_committed=false option, which will be accordingly passed to the handler. Note that DBFileDumperCommand and DBDumperCommand are still not updated by this patch but firstly they are not urgent and secondly we need to revise this approach later when we also add WRITE_UNPREPARED markers so I leave it for future work.

Tested by running it on a WAL generated by WRITE_PREPARED:
$ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log  | grep BEGIN_PREARE | head -1
1,2,70,0,BEGIN_PREARE
$ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log --write_committed=false | grep BEGIN_PREARE | head -1
1,2,70,0,BEGIN_PREARE PUT(0) : 0x30303031313330313938 PUT(0) : 0x30303032353732313935 END_PREPARE(0x74786E31313535383434323738303738363938313335312D30)
Closes https://github.com/facebook/rocksdb/pull/3682

Differential Revision: D7522090

Pulled By: maysamyabandeh

fbshipit-source-id: a0332207261c61e18b2f9dfbe9feecd9a1339aca
2018-04-07 21:56:42 -07:00
Phani Shekhar Mantripragada
446b32cfc3 Support for Column family specific paths.
Summary:
In this change, an option to set different paths for different column families is added.
This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path.
To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path.

Changes :
1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions.  This member is used to identify the path information whenever files are accessed.
2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
4) Unit tests are added appropriately.
Closes https://github.com/facebook/rocksdb/pull/3102

Differential Revision: D6951697

Pulled By: ajkr

fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
2018-04-05 19:58:20 -07:00
Yi Wu
36a9f22931 Blob DB: blob_dump to show uncompressed values
Summary:
Make blob_dump tool able to show uncompressed values if the blob file is compressed. Also show total compressed vs. raw size at the end if --show_summary is provided.
Closes https://github.com/facebook/rocksdb/pull/3633

Differential Revision: D7348926

Pulled By: yiwu-arbug

fbshipit-source-id: ca709cb4ed5cf6a550ff2987df8033df81516f8e
2018-04-05 11:12:16 -07:00
Andrew Kryczka
b058a33705 Reduce default --nooverwritepercent in black-box crash tests
Summary:
Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`.

Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds.
Closes https://github.com/facebook/rocksdb/pull/3671

Differential Revision: D7457732

Pulled By: ajkr

fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0
2018-04-03 15:28:40 -07:00
Anand Ananthabhotla
f9f4d40f93 Align SST file data blocks to avoid spanning multiple pages
Summary:
Provide a block_align option in BlockBasedTableOptions to allow
alignment of SST file data blocks. This will avoid higher
IOPS/throughput load due to < 4KB data blocks spanning 2 4KB pages.
When this option is set to true, the block alignment is set to lower of
block size and 4KB.
Closes https://github.com/facebook/rocksdb/pull/3502

Differential Revision: D7400897

Pulled By: anand1976

fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44
2018-03-26 20:26:10 -07:00
Sagar Vemuri
a993c0139d Add 5.11 and 5.12 to tools/check_format_compatible.sh
Summary: Closes https://github.com/facebook/rocksdb/pull/3646

Differential Revision: D7384727

Pulled By: sagar0

fbshipit-source-id: f713af7adb2ffea5303bbf0fac8a8a1630af7b38
2018-03-23 12:43:06 -07:00
Siying Dong
6383e42362 benchmark.sh to use --max_background_job
Summary: Closes https://github.com/facebook/rocksdb/pull/3632

Differential Revision: D7347012

Pulled By: siying

fbshipit-source-id: 46230ec4a917ccf4c478825b07e92b4665a4820b
2018-03-20 18:57:55 -07:00
Bruce Mitchener
a3a3f5497c Fix some typos in comments and docs.
Summary: Closes https://github.com/facebook/rocksdb/pull/3568

Differential Revision: D7170953

Pulled By: siying

fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c
2018-03-08 10:27:25 -08:00
Yi Wu
b864bc9b5b Blob DB: Improve FIFO eviction
Summary:
Improving blob db FIFO eviction with the following changes,
* Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size.
* FIFO now only take into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion.
* FIFO eviction now also evict TTL blob files that's still open. It doesn't evict non-TTL blob files.
* If FIFO is triggered, it will pass an expiration and the current sequence number to compaction filter. Compaction filter will then filter inlined keys to evict those with an earlier expiration and smaller sequence number. So call LSM FIFO.
* Compaction filter also filter those blob indexes where corresponding blob file is gone.
* Add an event listener to listen compaction/flush event and update sst file size.
* Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db.
* More blob db statistics around FIFO.
* Fix some locking issue when accessing a blob file.
Closes https://github.com/facebook/rocksdb/pull/3556

Differential Revision: D7139328

Pulled By: yiwu-arbug

fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa
2018-03-06 11:57:42 -08:00
Pooya Shareghi
0a2354ca8f Added bytes XOR merge operator
Summary:
Closes https://github.com/facebook/rocksdb/pull/575

I fixed the merge conflicts etc.
Closes https://github.com/facebook/rocksdb/pull/3065

Differential Revision: D7128233

Pulled By: sagar0

fbshipit-source-id: 2c23a48c9f0432c290b0cd16a12fb691bb37820c
2018-03-06 10:27:36 -08:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Maysam Yabandeh
d060421c77 Fix a leak in prepared_section_completed_
Summary:
The zeroed entries were not removed from prepared_section_completed_ map. This patch adds a unit test to show the problem and fixes that by refactoring the code. The new code is more efficient since i) it uses two separate mutex to avoid contention between commit and prepare threads, ii) it uses a sorted vector for maintaining uniq log entires with prepare which avoids a very large heap with many duplicate entries.
Closes https://github.com/facebook/rocksdb/pull/3545

Differential Revision: D7106071

Pulled By: maysamyabandeh

fbshipit-source-id: b3ae17cb6cd37ef10b6b35e0086c15c758768a48
2018-03-01 20:41:56 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Andrew Kryczka
1960e73e21 fix handling of empty string as checkpoint directory
Summary:
- made `CreateCheckpoint` properly return `InvalidArgument` when called with an empty directory. Previously it triggered an assertion failure due to a bug in the logic.
- made `ldb` set empty `checkpoint_dir` if that's what the user specifies, so that we can use it to properly test `CreateCheckpoint` in the future.

Differential Revision: D6874562

fbshipit-source-id: dcc1bd41768261d9338987fa7711444289707ed7
2018-02-20 16:44:00 -08:00
Yi Wu
989d12313c Legocastle job to report lite build binary size to scuba
Summary:
Add a legocastle job to continuously build the last 10 commits every 4 hours and report lite build binary size to scuba.
Closes https://github.com/facebook/rocksdb/pull/3511

Differential Revision: D7001730

Pulled By: yiwu-arbug

fbshipit-source-id: 7c8ca87c46d663c786a0d32be69ebbe7b19a5eb9
2018-02-15 17:27:24 -08:00
Andrew Kryczka
0a0fad447b db_bench separate options for partition index and filters
Summary:
Some workloads (like my current benchmarking) may want partitioned indexes without partitioned filters. Particularly, when `-optimize_filters_for_hits=true`, the total index size may be larger than the total filter size, so it can make sense to hold all filters in-memory but not all indexes.
Closes https://github.com/facebook/rocksdb/pull/3492

Differential Revision: D6970092

Pulled By: ajkr

fbshipit-source-id: b7fa1828e1d13829339aefb90fd56eb7c5337f61
2018-02-12 14:57:13 -08:00
Chinmay Kamat
9fc72d6f16 Compilation fixes for powerpc build, -Wparentheses-equality error and missing header guards
Summary:
This pull request contains miscellaneous compilation fixes.

Thanks,
Chinmay
Closes https://github.com/facebook/rocksdb/pull/3462

Differential Revision: D6941424

Pulled By: sagar0

fbshipit-source-id: fe9c26507bf131221f2466740204bff40a15614a
2018-02-09 14:12:43 -08:00
Tamir Duberstein
cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Siying Dong
e2d4b0efb1 db_bench: sanity check CuckooTable with mmap_read option
Summary:
This is to avoid run time error. Fail the db_bench immediately if cuckoo table is used but mmap_read is not specified.
Closes https://github.com/facebook/rocksdb/pull/3420

Differential Revision: D6838284

Pulled By: siying

fbshipit-source-id: 20893fa28d40fadc31e4ff154bed02f5a1bad341
2018-01-29 14:27:32 -08:00
Mark Isaacson
b8eb32f8cf Suppress lint in old files
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.

Reviewed By: yiwu-arbug

Differential Revision: D6821806

fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
2018-01-29 12:56:42 -08:00
Andrew Kryczka
9f7ccc8445 fix db_bench filluniquerandom key count assertion
Summary:
It failed every time. I guess people usually ran with assertions disabled.
Closes https://github.com/facebook/rocksdb/pull/3422

Differential Revision: D6822984

Pulled By: ajkr

fbshipit-source-id: 2e90db75618b26ac1c46ddfa9e03c095c7bf16e3
2018-01-29 11:43:21 -08:00
Andrew Kryczka
0e6e405fec db_bench support for memtable in-place update
Summary: Closes https://github.com/facebook/rocksdb/pull/3416

Differential Revision: D6820606

Pulled By: ajkr

fbshipit-source-id: 5035ffb33ade8d50520cafeb685ee8c8fcf1cca8
2018-01-26 10:57:49 -08:00
Siying Dong
47ad6b81ff Add 5.10.fb to tools/check_format_compatible.sh
Summary: Closes https://github.com/facebook/rocksdb/pull/3383

Differential Revision: D6762375

Pulled By: siying

fbshipit-source-id: dc1e0dc9718ffb59ffe42e2a2c844b67f935a5fb
2018-01-19 12:42:07 -08:00
Anand Ananthabhotla
199405192d Add a BlockBasedTableOption to turn off index block compression.
Summary:
Add a new bool option index_uncompressed in BlockBasedTableOptions.
Closes https://github.com/facebook/rocksdb/pull/3303

Differential Revision: D6686161

Pulled By: anand1976

fbshipit-source-id: 748b46993d48a01e5f89b6bd3e41f06a59ec6054
2018-01-10 15:11:59 -08:00
Yi Wu
46ec52499e Fix db_bench write being disabled in lite build
Summary:
The macro was added by mistake in #2372
Closes https://github.com/facebook/rocksdb/pull/3343

Differential Revision: D6681356

Pulled By: yiwu-arbug

fbshipit-source-id: 4180172fb0eaef4189c07f219241e0c261c03461
2018-01-09 10:57:29 -08:00
Maysam Yabandeh
00b33c2474 WritePrepared Txn: address some pending TODOs
Summary:
This patch addresses a couple of minor TODOs for WritePrepared Txn such as double checking some assert statements at runtime as well, skip extra AddPrepared in non-2pc transactions, and safety check for infinite loops.
Closes https://github.com/facebook/rocksdb/pull/3302

Differential Revision: D6617002

Pulled By: maysamyabandeh

fbshipit-source-id: ef6673c139cb49f64c0879508d2f573b78609aca
2018-01-09 08:57:20 -08:00
yingsu00
f54d7f5fea Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**

RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.

This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.

**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.

Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.

1) ReadRandom in db_bench overall metrics

    PER RUN
    Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
    3-way      |  1   | 4.143   | 241387 | 26.7
    3-way      |  2   | 3.775   | 264872 | 29.3
    3-way      | 3    | 4.116   | 242929 | 26.9
    FastCrc32c|1  | 4.037   | 247727 | 27.4
    FastCrc32c|2  | 4.648   | 215166 | 23.8
    FastCrc32c|3  | 4.352   | 229799 | 25.4

     AVG
    Algorithm     |    Average of micros/op |   Average of ops/sec |    Average of Throughput (MB/s)
    3-way           |     4.01                               |      249,729                 |      27.63
    FastCrc32c  |     4.35                              |     230,897                  |      25.53

 2)   Crc32c computation CPU cost (inclusive samples percentage)
    PER RUN
    Implementation | run |  TotalSamples   | Crc32c percentage
    3-way                 |  1    |  4,572,250,000 | 4.37%
    3-way                 |  2    |  3,779,250,000 | 4.62%
    3-way                 |  3    |  4,129,500,000 | 4.48%
    FastCrc32c       |  1    |  4,663,500,000 | 11.24%
    FastCrc32c       |  2    |  4,047,500,000 | 12.34%
    FastCrc32c       |  3    |  4,366,750,000 | 11.68%

 **# Test Plan**
     make -j64 corruption_test && ./corruption_test
      By default it uses 3-way SSE algorithm

     NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

    make clean && DEBUG_LEVEL=0 make -j64 db_bench
    make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173

Differential Revision: D6330882

Pulled By: yingsu00

fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-19 18:26:49 -08:00
Maysam Yabandeh
95583e1532 db_stress: skip snapshot check if cf is dropped
Summary:
We added a new verification that ensures a value that snapshot reads when is released is the same as when it was created. This test however fails when the cf is dropped in between. The patch skips the tests if that was the case.
Closes https://github.com/facebook/rocksdb/pull/3279

Differential Revision: D6581584

Pulled By: maysamyabandeh

fbshipit-source-id: afe37d371c0f91818d2e279b3949b810e112e8eb
2017-12-15 16:28:04 -08:00
Maysam Yabandeh
cd2e5cae7f WritePrepared Txn: make db_stress transactional
Summary:
Add "--use_txn" option to use transactional API in db_stress, default being WRITE_PREPARED policy, which is the main intention of modifying db_stress. It also extend the existing snapshots to verify that before releasing a snapshot a read from it returns the same value as before.
Closes https://github.com/facebook/rocksdb/pull/3243

Differential Revision: D6556912

Pulled By: maysamyabandeh

fbshipit-source-id: 1ae31465be362d44bd06e635e2e9e49a1da11268
2017-12-13 11:57:29 -08:00
Yi Wu
e1c569c324 Fix clang-analyzer false-positive on ldb_cmd.cc
Summary:
clang-analyzer complaint about db_ being nullptr, but it couldn't be because it checks exec_stats before proceed. Add an assert to get around the false-positive.

Test Plan
`make analyze`
Closes https://github.com/facebook/rocksdb/pull/3236

Differential Revision: D6505417

Pulled By: yiwu-arbug

fbshipit-source-id: e5b65764ea994dd9e4bab3e697b97dc70dc22cab
2017-12-06 22:58:46 -08:00
Sagar Vemuri
d51fcb21f4 Blob DB: Add db_bench options
Summary:
Adding more BlobDB db_bench options which are needed for benchmarking.
Closes https://github.com/facebook/rocksdb/pull/3230

Differential Revision: D6500711

Pulled By: sagar0

fbshipit-source-id: 91d63122905854ef7c9148a0235568719146e6c5
2017-12-06 20:44:12 -08:00
Yi Wu
7f04af32a5 ldb to allow db with --try_load_options and without an options file
Summary:
This is to fix tools/check_format_compatible.sh. The tool try to open
old versions of rocksdb with the provided options file. When options
file is missing (e.g. rocksdb 2.2), it should still proceed with default
options.
Closes https://github.com/facebook/rocksdb/pull/3232

Differential Revision: D6503955

Pulled By: yiwu-arbug

fbshipit-source-id: e44cfcce7ddc7d12cf83466ed3f3fe7624aa78b8
2017-12-06 16:42:26 -08:00
Yi Wu
b5798bd324 Add missing recent versions to format compatible test
Summary:
Add recent versions for format compatible test. We should probably update the script to auto include available versions (by looking at include/rocksdb/versions.h and deduce branch names), but we can do it later.
Closes https://github.com/facebook/rocksdb/pull/3233

Differential Revision: D6503631

Pulled By: yiwu-arbug

fbshipit-source-id: e2b01d1ef6e784ff6ffa1bd75d741755e3c69a8c
2017-12-06 16:13:50 -08:00
Andrew Kryczka
63f1c0a57d fix gflags namespace
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212

Differential Revision: D6456973

Pulled By: ajkr

fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
2017-12-01 10:42:05 -08:00
Maysam Yabandeh
18dcf7f98d WritePrepared Txn: PreReleaseCallback
Summary:
Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePrepareTxn to i) update the commit map, ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits will go to the 2nd queue.
These changes will ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default, the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
Closes https://github.com/facebook/rocksdb/pull/3205

Differential Revision: D6438959

Pulled By: maysamyabandeh

fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
2017-11-30 23:50:45 -08:00
Andrew Kryczka
ed3af9ef99 improve ldb CLI option support
Summary:
- Made CLI arguments take precedence over options file when both are provided. Note some of the CLI args are not settable via options file, like `--compression_max_dict_bytes`, so it's necessary to allow both ways of providing options simultaneously.
- Changed `PrepareOptionsForOpenDB` to update the proper `ColumnFamilyOptions` if one exists for the user's `--column_family_name` argument. I supported this only in the base class, `LDBCommand`, so it works for the general arguments. Will defer adding support for subcommand-specific arguments.
- Made the command fail if `--try_load_options` is provided and loading options file returns NotFound. I found the previous behavior of silently continuing confusing.
Closes https://github.com/facebook/rocksdb/pull/3144

Differential Revision: D6270544

Pulled By: ajkr

fbshipit-source-id: 7c2eac9f9b38720523d74466fb9e78db53561367
2017-11-28 17:28:58 -08:00
Prashant D
c1ed005a21 tools: Fix coverity issues
Summary:
tools/ldb_cmd.cc:
```
310  ignore_unknown_options_ = IsFlagPresent(flags, ARG_IGNORE_UNKNOWN_OPTIONS);

CID 1322798 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
5. uninit_member: Non-static class member db_ttl_ is not initialized in this constructor nor in any functions that it calls.
311}
```
Closes https://github.com/facebook/rocksdb/pull/3122

Differential Revision: D6428576

Pulled By: sagar0

fbshipit-source-id: d77f04dd201f7f1d9f59ef88a215ee7ad7b934e9
2017-11-28 15:27:41 -08:00
Sagar Vemuri
8954f830a0 Blob DB: db_bench flag to control BlobDB's garbage collection
Summary:
flag: blob_db_enable_gc, to control BlobDb's enable_garbage_collection.
Closes https://github.com/facebook/rocksdb/pull/3190

Differential Revision: D6383395

Pulled By: sagar0

fbshipit-source-id: 4134e835150748c425b8187264273a54c6d8381c
2017-11-20 23:26:15 -08:00
Yi Wu
9871ea4357 Regression test build binaries with PORTABLE=1
Summary:
We hit "Illegal instruction" error in regression test with "shlx" instruction. Setting PORTABLE=1 to resolve it.
Closes https://github.com/facebook/rocksdb/pull/3165

Differential Revision: D6321972

Pulled By: yiwu-arbug

fbshipit-source-id: cc9fe0dbd4698d1b66a750a0b062f66899862719
2017-11-13 21:26:24 -08:00
Andrew Kryczka
114896c4e0 db_bench compression options
Summary:
- moved existing compression options to `InitializeOptionsGeneral` since they cannot be set through options file
- added flag for `zstd_max_train_bytes` which was recently introduced by #3057
Closes https://github.com/facebook/rocksdb/pull/3128

Differential Revision: D6240460

Pulled By: ajkr

fbshipit-source-id: 27dbebd86a55de237ba6a45cc79cff9214e82ebc
2017-11-07 14:00:03 -08:00
Andrew Kryczka
65c95d9c59 support db_bench compact benchmark on bottommost files
Summary:
Without this option, running the compact benchmark on a DB containing only bottommost files simply returned immediately.
Closes https://github.com/facebook/rocksdb/pull/3138

Differential Revision: D6256660

Pulled By: ajkr

fbshipit-source-id: e3b64543acd503d821066f4200daa201d4fb3a9d
2017-11-07 10:57:24 -08:00
Andrew Kryczka
4d43c6a6a4 db_stress snapshot compatibility with reopens
Summary:
- Release all snapshots before crashing and reopening the DB. Without this, we may attempt to release snapshots from an old DB using a new DB. That tripped an assertion.
- Release multiple snapshots in the same operation if needed. Without this, we would sometimes leak snapshots.
Closes https://github.com/facebook/rocksdb/pull/3098

Differential Revision: D6194923

Pulled By: ajkr

fbshipit-source-id: b9c89bcca7ebcbb6c7802c616f9d1175a005aadf
2017-10-31 01:26:08 -07:00
Andrew Kryczka
d75793d6b4 db_stress support long-held snapshots
Summary:
Add options to `db_stress` (correctness testing tool) to randomly acquire snapshot and release it after some period of time. It's useful for correctness testing of #3009, as well as other parts of compaction that behave differently depending on which snapshots are held.
Closes https://github.com/facebook/rocksdb/pull/3038

Differential Revision: D6086501

Pulled By: ajkr

fbshipit-source-id: 3ec0d8666c78ac507f1f808887c4ff759ba9b865
2017-10-20 15:26:59 -07:00
Dmitri Smirnov
ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Andrew Kryczka
731895214b db_bench randomtransaction print throughput
Summary:
print throughput in MB/s upon finishing randomtransaction benchmark
Closes https://github.com/facebook/rocksdb/pull/3016

Differential Revision: D6070426

Pulled By: ajkr

fbshipit-source-id: 69df43beed4c374a36d826e761ca3a83e1fdcbf5
2017-10-16 18:42:25 -07:00
Andrew Kryczka
1026e794a3 rate limit auto-tuning
Summary:
Dynamic adjustment of rate limit according to demand for background I/O. It increases by a factor when limiter is drained too frequently, and decreases by the same factor when limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes:

- make rate limiter's `Env*` configurable for testing
- track num drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter.
Closes https://github.com/facebook/rocksdb/pull/2899

Differential Revision: D5858704

Pulled By: ajkr

fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1
2017-10-04 19:15:01 -07:00
Siying Dong
2a3363d52e ldb dump can print histogram of value size
Summary:
Make "ldb dump --count_only" print histogram of value size. Also, fix a bug that "ldb dump --path=<db_path>" doesn't work.
Closes https://github.com/facebook/rocksdb/pull/2944

Differential Revision: D5954527

Pulled By: siying

fbshipit-source-id: c620a444ec544258b8d113f5f663c375dd53d6be
2017-10-02 09:41:17 -07:00
Andrew Kryczka
8fc3de3c62 make rate limiter a general option
Summary:
it's unsupported in options file, so the flag should be respected by db_bench even when an options file is provided.
Closes https://github.com/facebook/rocksdb/pull/2910

Differential Revision: D5869836

Pulled By: ajkr

fbshipit-source-id: f67f591ae083e95e989f86b6fad50765d2e3d855
2017-09-21 11:11:00 -07:00
Amy Xu
5785b1fcb8 Fix naming in InternalKey
Summary:
- Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance to InternalKeyComparator's comparison logic
Closes https://github.com/facebook/rocksdb/pull/2868

Differential Revision: D5804152

Pulled By: axxufb

fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183
2017-09-12 17:17:42 -07:00
Kamalalochana Subbaiah
e612e31740 Updated CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2716

Differential Revision: D5606836

Pulled By: siying

fbshipit-source-id: 720262453b1546e5fdbbc668eff56848164113f3
2017-08-31 14:16:30 -07:00
Dmitri Smirnov
8fa4d108a2 Try to switch to Stduio 2017
Summary: Closes https://github.com/facebook/rocksdb/pull/2802

Differential Revision: D5746710

Pulled By: siying

fbshipit-source-id: daa621ba5fccb84c0d6cdb7755c5e09319c45cb4
2017-08-31 10:30:27 -07:00
Sagar Vemuri
06b37eef7b Set defaults for high-pri and low-pri thread pools in regression test script
Summary:
**Summary**:
Set defaults for high-pri and low-pri thread pools in regression test script.

**Reason for this change**:
With #2680 , high-pri and low-pri thread pools get different numbers than before if  `num_high_pri_threads` and `num_low_pri_threads` options are not explicitly passed to db_bench in regression test script ... leading to a false-positive regression.

**Test Plan**:
REMOTE_HOST=udb1671.prn3 TEST_MODE=1 FBSOURCE=~/fbsource ~/fbsource/fbcode/rocks/tools/debug_regression_test.sh viewstate  (with very minor changes to the internals).

Observe P50 and P99 which showed up as regressions in our graphs.

Stats with the commit prior to #2680 , ie. 4f81ab3 :
seekrandomwhilewriting :      75.096 micros/op 13316 ops/sec;  168.6 MB/s (7499074 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1197.7254  StdDev: 33.35
Min: 187  Median: 980.5292  Max: 1816424
Percentiles: **P50: 980.53** P75: 1494.57 **P99: 4185.64** P99.9: 7800.11 P99.99: 15039.64

Stats at #2680, ie. at commit dce6d5a (false-positive regression):
seekrandomwhilewriting :      85.330 micros/op 11719 ops/sec;  148.4 MB/s (7499073 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1362.3261  StdDev: 27.86
Min: 185  Median: 1088.1915  Max: 652760
Percentiles: **P50: 1088.19** P75: 1658.12 **P99: 5361.15** P99.9: 7997.95 P99.99: 11730.07

Stats with the current change on top of dce6d5a :
seekrandomwhilewriting :      77.780 micros/op 12856 ops/sec;  162.8 MB/s (7499102 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1226.6744  StdDev: 17.16
Min: 185  Median: 994.2956  Max: 2553530
Percentiles: **P50: 994.30** P75: 1513.68 **P99: 4284.30** P99.9: 9338.64 P99.99: 23008.86
Closes https://github.com/facebook/rocksdb/pull/2801

Differential Revision: D5742338

Pulled By: sagar0

fbshipit-source-id: cc5d727c1a131f2a7070d1bb892efbe929b976ff
2017-08-30 17:29:34 -07:00
Andrew Kryczka
7fbb9eccaf support disabling checksum in block-based table
Summary:
store a zero as the checksum when disabled since it's easier to keep block trailer a fixed length.
Closes https://github.com/facebook/rocksdb/pull/2781

Differential Revision: D5694702

Pulled By: ajkr

fbshipit-source-id: 69cea9da415778ba2b600dfd9d0dfc8cb5188ecd
2017-08-23 19:40:47 -07:00
yiwu-arbug
c1384a7076 fix db_stress uint64_t to int32 cast
Summary:
Clang complain about an cast from uint64_t to int32 in db_stress. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/2755

Differential Revision: D5655947

Pulled By: yiwu-arbug

fbshipit-source-id: cfac10e796e0adfef4727090b50975b0d6e2c9be
2017-08-17 17:56:55 -07:00
Andrew Kryczka
23593171c4 minor improvements to db_stress
Summary:
fix some things that made this command hard to use from CLI:

- use default values for `target_file_size_base` and `max_bytes_for_level_base`. previously we were using small values for these but default value of `write_buffer_size`, which led to enormous number of L1 files.
- failure message for `value_size_mult` too big. previously there was just an assert, so in non-debug mode it'd overrun the value buffer and crash mysteriously.
- only print verification success if there's no failure. before it'd print both in the failure case.
- support `memtable_prefix_bloom_size_ratio`
- support `num_bottom_pri_threads` (universal compaction)
Closes https://github.com/facebook/rocksdb/pull/2741

Differential Revision: D5629495

Pulled By: ajkr

fbshipit-source-id: ddad97d6d4ba0884e7c0f933b0a359712514fc1d
2017-08-16 19:13:01 -07:00
Andrew Kryczka
7aa96db7a2 db_stress rolling active window
Summary:
Support a window of `active_width` keys that rolls through `[0, max_key)` over the duration of the test. Operations only affect keys inside the window. This gives us the ability to detect L0->L0 deletion bug (#2722).
Closes https://github.com/facebook/rocksdb/pull/2739

Differential Revision: D5628555

Pulled By: ajkr

fbshipit-source-id: 9cb2d8f4ab1a7c73f7797b8e19f7094970ea8749
2017-08-15 12:02:16 -07:00
Andrew Kryczka
8254e9b57c make sst_dump compression size command consistent
Summary:
- like other subcommands, reporting compression sizes should be specified with the `--command` CLI arg.
- also added `--compression_types` arg as it's useful to restrict the types of compression used, at least in my dictionary compression experiments.
Closes https://github.com/facebook/rocksdb/pull/2706

Differential Revision: D5589520

Pulled By: ajkr

fbshipit-source-id: 305bb4ebcc95eecc8a85523cd3b1050619c9ddc5
2017-08-11 16:03:44 -07:00
Andrew Kryczka
74f18c1301 db_bench support for non-uniform column family ops
Summary:
Previously we could only select the CF on which to operate uniformly at random. This is a limitation, e.g., when testing universal compaction as all CFs would need to run full compaction at roughly the same time, which isn't realistic.

This PR allows the user to specify the probability distribution for selecting CFs via the `--column_family_distribution` argument.
Closes https://github.com/facebook/rocksdb/pull/2677

Differential Revision: D5544436

Pulled By: ajkr

fbshipit-source-id: 478d56260995236ae90895ce5bd51f38882e185a
2017-08-11 13:57:17 -07:00
Siying Dong
666a005f9b Support prefetch last 512KB with direct I/O in block based file reader
Summary:
Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except compaction inputs or readahead is enabled for iterators. This can create a lot of I/O for HDD cases. To solve the problem, the 512KB is prefetched in block based table if direct I/O is enabled. The prefetched buffer is passed in totegher with random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too.
Closes https://github.com/facebook/rocksdb/pull/2708

Differential Revision: D5593091

Pulled By: siying

fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7
2017-08-11 12:16:45 -07:00
Siying Dong
b87ee6f773 Use more keys per lock in daily TSAN crash test
Summary:
TSAN shows error when we grab too many locks at the same time. In TSAN crash test, make one shard key cover 2^22 keys so that no many keys will be hold at the same time.
Closes https://github.com/facebook/rocksdb/pull/2719

Differential Revision: D5609035

Pulled By: siying

fbshipit-source-id: 930e5d63fff92dbc193dc154c4c615efbdf06c6a
2017-08-10 17:56:57 -07:00
Aaron G
7848f0b24c add VerifyChecksum() to db.h
Summary:
We need a tool to check any sst file corruption in the db.
It will check all the sst files in current version and read all the blocks (data, meta, index) with checksum verification. If any verification fails, the function will return non-OK status.
Closes https://github.com/facebook/rocksdb/pull/2498

Differential Revision: D5324269

Pulled By: lightmark

fbshipit-source-id: 6f8a272008b722402a772acfc804524c9d1a483b
2017-08-09 15:58:13 -07:00
Andrew Kryczka
dce6d5a838 db_bench background work thread pool size arguments
Summary:
The background thread pools' sizes weren't easily configurable by `max_background_compactions` and `max_background_flushes` in multi-instance setups. Introduced separate arguments for their sizes.
Closes https://github.com/facebook/rocksdb/pull/2680

Differential Revision: D5550675

Pulled By: ajkr

fbshipit-source-id: bab5f0a7bc5db63bb084d0c10facbe437096367d
2017-08-03 21:41:49 -07:00
Alan Somers
5883a1ae24 Fix /bin/bash shebangs
Summary:
"/bin/bash" is a Linuxism.  "/usr/bin/env bash" is portable.
Closes https://github.com/facebook/rocksdb/pull/2646

Differential Revision: D5556259

Pulled By: ajkr

fbshipit-source-id: cbffd38ecdbfffb2438969ec007ab345ed893ccb
2017-08-03 15:56:46 -07:00
Andrew Kryczka
cc01985db0 Introduce bottom-pri thread pool for large universal compactions
Summary:
When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs.

This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.

- Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads.
- Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
- Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
- Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`).
- Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free.
Closes https://github.com/facebook/rocksdb/pull/2580

Differential Revision: D5422916

Pulled By: ajkr

fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
2017-08-03 15:43:29 -07:00
Andrew Kryczka
060ccd4f84 support multiple CFs with OPTIONS file
Summary:
Move an option necessary for running db_bench on multiple CFs into the general initialization area, so it works with both flag-based init and OPTIONS-based init.
Closes https://github.com/facebook/rocksdb/pull/2675

Differential Revision: D5541378

Pulled By: ajkr

fbshipit-source-id: 169926cb4ae95c17974f744faf7cc794d41e5c0a
2017-08-02 16:27:01 -07:00
Yi Wu
1900771bd2 Dump Blob DB options to info log
Summary:
* Dump blob db options to info log
* Remove BlobDBOptionsImpl to disallow dynamic cast *BlobDBOptions into *BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645
* Change some of the default options
* Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon.
Closes https://github.com/facebook/rocksdb/pull/2671

Differential Revision: D5529912

Pulled By: yiwu-arbug

fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7
2017-08-01 13:01:47 -07:00
Siying Dong
21696ba502 Replace dynamic_cast<>
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645

Differential Revision: D5502723

Pulled By: siying

fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
2017-07-28 16:27:16 -07:00
Andrew Kryczka
f33f113683 fix db_bench argument type
Summary:
it should be a bool
Closes https://github.com/facebook/rocksdb/pull/2653

Differential Revision: D5506148

Pulled By: ajkr

fbshipit-source-id: f142f0f3aa8b678c68adef12e5ac6e1e163306f3
2017-07-27 12:14:37 -07:00
Siying Dong
c281b44829 Revert "CRC32 Power Optimization Changes"
Summary:
This reverts commit 2289d38115.
Closes https://github.com/facebook/rocksdb/pull/2652

Differential Revision: D5506163

Pulled By: siying

fbshipit-source-id: 105e31dd9d99090453a6b9f32c165206cd3affa3
2017-07-26 19:31:36 -07:00
Kamalalochana Subbaiah
2289d38115 CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2353

Differential Revision: D5210948

Pulled By: siying

fbshipit-source-id: 859a8c063d37697addd89ba2b8a14e5efd5d24bf
2017-07-26 09:42:29 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Maysam Yabandeh
2f375154ea checkout local branch in check_format_compatible.sh
Summary:
For forward_compatible_checkout_objs the local branch is already created in previous step. This patch avoid recreating it. This should address "fatal: A branch named '3.10.fb' already exists." errors.
Closes https://github.com/facebook/rocksdb/pull/2606

Differential Revision: D5443786

Pulled By: maysamyabandeh

fbshipit-source-id: 69d5a67b87677429cf36e3a467bd114d341f3b9c
2017-07-18 10:42:17 -07:00
Maysam Yabandeh
ddb22ac59c avoid collision with master branch in check format
Summary:
The new local branch specified with -b cannot be called master. Use tmp prefix to avoid name collision.
Closes https://github.com/facebook/rocksdb/pull/2600

Differential Revision: D5442944

Pulled By: maysamyabandeh

fbshipit-source-id: 4a623d9b21d6cc01bee812b2799790315bdf5f6e
2017-07-18 08:27:33 -07:00
Maysam Yabandeh
0c03a7f17d set the remote for git checkout
Summary:
This will fix the error: "error: pathspec '2.2.fb.branch' did not match any file(s) known to git."

Tested by manually sshing to sandcastle and running the command.
Closes https://github.com/facebook/rocksdb/pull/2599

Differential Revision: D5441130

Pulled By: maysamyabandeh

fbshipit-source-id: a22fd6a52221471bafbba8990394b499535e5812
2017-07-17 19:41:50 -07:00
Chris Lamb
b2dd192fed tools/write_stress.cc: Correct "1204" typos.
Summary:
Should be 1024, obviously :)
Closes https://github.com/facebook/rocksdb/pull/2592

Differential Revision: D5435269

Pulled By: ajkr

fbshipit-source-id: c59338a3900798a4733f0b205e534f21215cf049
2017-07-17 11:27:10 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Daniel Black
67510eeff3 db_crashtest.py: remove need for shell
Summary:
Before:
$ ps -ef

build     1713    16  0 Jul11 ?        00:00:00 make crash_test
build     3437  1713  0 Jul11 ?        00:00:00 python -u tools/db_crashtest.py --simple blackbox
build     3438  3437  0 Jul11 ?        00:00:00 [sh] <defunct>
build     3440     1 99 Jul11 ?        5-03:01:25 ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --block_size=16384 --allow_concurrent_me

After:

build      1706     16  0 02:52 ?        00:00:01 make crash_test
build      3432   1706  0 02:55 ?        00:00:00 python -u tools/db_crashtest.py --simple blackbox
build      4452   3432 99 04:35 ?        00:01:42 ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --block_size=16384 --allow_concurr
Closes https://github.com/facebook/rocksdb/pull/2571

Differential Revision: D5421580

Pulled By: maysamyabandeh

fbshipit-source-id: d6c3970c38ea0fa23da653f4385e8e25d83f5c9f
2017-07-14 09:11:03 -07:00
Yi Wu
5bfb67d90f Enable write rate limit for updaterandom benchmark
Summary:
We have FLAGS_benchmark_write_rate_limit to limit write rate in
db_bench, but it was not in use for updaterandom benchmark.
Closes https://github.com/facebook/rocksdb/pull/2578

Differential Revision: D5420328

Pulled By: yiwu-arbug

fbshipit-source-id: 5fa48c2b88f2f2dc83d615cb9c40c472bc916835
2017-07-13 16:27:38 -07:00
Siying Dong
98d1a5510b db_bench to by default verify checksum
Summary: Closes https://github.com/facebook/rocksdb/pull/2575

Differential Revision: D5417350

Pulled By: siying

fbshipit-source-id: 4bc11e35a7256167a5a7d2f586f2ac74c0deddb0
2017-07-13 12:14:17 -07:00
Yi Wu
c32f27223b Fixes db_bench with blob db
Summary:
* Create info log before db open to make blob db able to log to LOG file.
* Properly destroy blob db.
Closes https://github.com/facebook/rocksdb/pull/2567

Differential Revision: D5400034

Pulled By: yiwu-arbug

fbshipit-source-id: a49cfaf4b5c67d42d4cbb872bd5a9441828c17ce
2017-07-11 14:14:38 -07:00
Daniel Black
fcd99d27c9 db_bench_tool: fix buffer size
Summary:
Found by gcc warning:

x86_64-pc-linux-gnu-g++ --version
x86_64-pc-linux-gnu-g++ (GCC) 7.1.1 20170710

tools/db_bench_tool.cc: In member function 'void rocksdb::Benchmark::RandomWithVerify(rocksdb::ThreadState*)':
tools/db_bench_tool.cc:4430:8: error: '%lu' directive output may be truncated writing between 1 and 19 bytes into a region of size between 0 and 66 [-Werror=format-truncation=]
   void RandomWithVerify(ThreadState* thread) {
        ^~~~~~~~~~~~~~~~
tools/db_bench_tool.cc:4430:8: note: directive argument in the range [0, 9223372036854775807]
tools/db_bench_tool.cc:4492:13: note: 'snprintf' output between 37 and 128 bytes into a destination of size 100
     snprintf(msg, sizeof(msg),
     ~~~~~~~~^~~~~~~~~~~~~~~~~~
              "( get:%" PRIu64 " put:%" PRIu64 " del:%" PRIu64 " total:%" \
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              PRIu64 " found:%" PRIu64 ")",
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
              gets_done, puts_done, deletes_done, readwrites_, found);
              ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
Makefile:1707: recipe for target 'tools/db_bench_tool.o' failed
Closes https://github.com/facebook/rocksdb/pull/2558

Differential Revision: D5398703

Pulled By: siying

fbshipit-source-id: 6ffa552bbd8b59cfc2c36289f86ff9b9acca8ca6
2017-07-11 10:41:06 -07:00
Aaron Gao
87128bd5c5 fix regression test
Summary:
delete the checkout line
Closes https://github.com/facebook/rocksdb/pull/2556

Differential Revision: D5394446

Pulled By: lightmark

fbshipit-source-id: 4afaf35e7ad0378d12169d13a3131a92b428b5d2
2017-07-10 17:17:31 -07:00
Andrew Kryczka
657df29eaf Add max_background_jobs to db_bench
Summary:
As titled. Also fixed an off-by-one error causing us to add one less range deletion than the user specified.
Closes https://github.com/facebook/rocksdb/pull/2544

Differential Revision: D5383451

Pulled By: ajkr

fbshipit-source-id: cbd5890c33f09bbb5c0c1f4bb952a1add32336e0
2017-07-07 18:13:21 -07:00
Siying Dong
7c4a9e6c92 Initialize a variable in ldb to make code analysis tool happy
Summary: Closes https://github.com/facebook/rocksdb/pull/2529

Differential Revision: D5378787

Pulled By: siying

fbshipit-source-id: 801ecacc2804f77cb3c9e5e665829e439db68800
2017-07-06 15:56:57 -07:00
Andrew Kryczka
33042573db Fix GetCurrentTime() initialization for valgrind
Summary:
Valgrind had false positive complaints about the initialization pattern for `GetCurrentTime()`'s argument in #2480. We can instead have the client initialize the time variable before calling `GetCurrentTime()`, and have `GetCurrentTime()` promise to only overwrite it in success case.
Closes https://github.com/facebook/rocksdb/pull/2526

Differential Revision: D5358689

Pulled By: ajkr

fbshipit-source-id: 857b189f24c19196f6bb299216f3e23e7bc4be42
2017-07-05 12:12:00 -07:00
Siying Dong
1cb8c6de63 Add -enable_pipelined_write to db_bench and add two defaults
Summary:
Expose pipeline write in db_bench and change the default to parallel memtable inserts
Closes https://github.com/facebook/rocksdb/pull/2527

Differential Revision: D5359825

Pulled By: siying

fbshipit-source-id: e30755feb07ff19a731c4058acf101e02de4e197
2017-06-30 15:27:03 -07:00
Yi Wu
67b417d624 fix format compatible test
Summary:
The comma "," is not a valid separator for bash arrays.
Closes https://github.com/facebook/rocksdb/pull/2516

Differential Revision: D5348101

Pulled By: yiwu-arbug

fbshipit-source-id: 8f0afdac368e21076eb7366b7df7dbaaf158cf96
2017-06-29 10:42:14 -07:00
Mike Kolupaev
397ab11152 Improve Status message for block checksum mismatches
Summary:
We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages.

It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
Closes https://github.com/facebook/rocksdb/pull/2507

Differential Revision: D5345702

Pulled By: al13n321

fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
2017-06-28 21:27:01 -07:00
Sagar Vemuri
1cd45cd1b3 FIFO Compaction with TTL
Summary:
Introducing FIFO compactions with TTL.

FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size.

To address that request:
- Added a new TTL option to FIFO compaction options.
- Updated FIFO compaction score to take TTL into consideration.
- Added a new table property, creation_time, to keep track of when the SST file is created.
- Creation_time is set as below:
  - On Flush: Set to the time of flush.
  - On Compaction: Set to the max creation_time of all the files involved in the compaction.
  - On Repair and Recovery: Set to the time of repair/recovery.
  - Old files created prior to this code change will have a creation_time of 0.
- FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day.
- FIFO compaction will fall back to the prior way of deleting files based on size if:
  - the creation_time of all files involved in compaction is 0.
  - the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted.

This feature is not supported if max_open_files != -1 or with table formats other than Block-based.

**Test Plan:**
Added tests.

**Benchmark results:**
Base: FIFO with max size: 100MB ::
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100

readwhilewriting :       1.924 micros/op 519858 ops/sec;   13.6 MB/s (1176277 of 5000000 found)
```

With TTL (a low one for testing) ::
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20

readwhilewriting :       1.902 micros/op 525817 ops/sec;   13.7 MB/s (1185057 of 5000000 found)
```
Example Log lines:
```
2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
...
2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}
```

SST Files remaining in the dbbench dir, after db_bench execution completed:
```
svemuri@dev15905 ~/rocksdb (fifo-compaction)  $ ls -l /dev/shm//dbbench/*.sst
-rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
-rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
-rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst
```
Closes https://github.com/facebook/rocksdb/pull/2480

Differential Revision: D5305116

Pulled By: sagar0

fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174
2017-06-27 17:11:48 -07:00
Yi Wu
dc3d2e4d21 update compatible test
Summary:
update compatible test to include 5.5 and 5.6 branch.
Closes https://github.com/facebook/rocksdb/pull/2501

Differential Revision: D5325220

Pulled By: yiwu-arbug

fbshipit-source-id: 5f5271491e6dd2d7b2cf73a7142f38a571553bc4
2017-06-27 11:00:59 -07:00
Maysam Yabandeh
2c98b06bff Remove pin_slice option by making it the default
Summary:
This would simplify db_bench_tool.cc
Closes https://github.com/facebook/rocksdb/pull/2457

Differential Revision: D5259035

Pulled By: maysamyabandeh

fbshipit-source-id: 0a9c3abda624070fe2650200b885ad7e1c60182c
2017-06-15 16:14:08 -07:00
Maysam Yabandeh
c80c6115de add db_bench options for partitioning
Summary: Closes https://github.com/facebook/rocksdb/pull/2456

Differential Revision: D5259083

Pulled By: maysamyabandeh

fbshipit-source-id: 1ed1746da7a8baadf4772d023d927c6c4e6b112a
2017-06-15 16:14:08 -07:00
Sagar Vemuri
89ad9f3adb Allow ignoring unknown options when loading options from a file
Summary:
Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version.

Also added `--ignore_unknown_options` flag to ldb

**Example Use case:**
In MyRocks, if copying from newer version to older version, it is often impossible to start because of new RocksDB options that don't exist in older version, even though data format is compatible.
MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)).

**Test Plan:**
Updated the unit tests.
`make check`

ldb:
$ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1
OK

Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option.

Try loading the options now, and it fails:
$ ./ldb --db=/tmp/test_db --try_load_options get a1
Failed: Invalid argument: Unrecognized option DBOptions:: abcd

Passes with the new --ignore_unknown_options flag
$ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1
b1
Closes https://github.com/facebook/rocksdb/pull/2423

Differential Revision: D5212091

Pulled By: sagar0

fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7
2017-06-13 16:58:01 -07:00
Andrew Kryczka
c217e0b9c7 Call RateLimiter for compaction reads
Summary:
Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed.
Closes https://github.com/facebook/rocksdb/pull/2433

Differential Revision: D5216946

Pulled By: ajkr

fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e
2017-06-13 14:56:46 -07:00
hyunwoo
c7662a44a4 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2376

Differential Revision: D5183630

Pulled By: ajkr

fbshipit-source-id: 133cfd0445959e70aa2cd1a12151bf3c0c5c3ac5
2017-06-05 11:27:34 -07:00
Aaron Gao
7e5fac2c34 remove test dir before exit when current regression is running
Summary:
clean up the current test dir if the last regression test is still running.
Closes https://github.com/facebook/rocksdb/pull/2401

Differential Revision: D5177882

Pulled By: lightmark

fbshipit-source-id: 91d899fcc2bde841948eae71af8584d4bdb35468
2017-06-02 17:26:19 -07:00
Aaron Gao
7f6c02dda1 using ThreadLocalPtr to hide ROCKSDB_SUPPORT_THREAD_LOCAL from public…
Summary:
… headers

https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB.

We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term.

make check -j64
Closes https://github.com/facebook/rocksdb/pull/2380

Differential Revision: D5177896

Pulled By: lightmark

fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390
2017-06-02 17:26:19 -07:00
Siying Dong
95b0e89b5d Improve write buffer manager (and allow the size to be tracked in block cache)
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350

Differential Revision: D5110648

Pulled By: siying

fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
2017-06-02 14:26:56 -07:00
Aaron Gao
8721996065 add checkpoint support for single db in regression test
Summary:
For level_compaction_style regression test.
Closes https://github.com/facebook/rocksdb/pull/2397

Differential Revision: D5168545

Pulled By: lightmark

fbshipit-source-id: 195e4d84917e7c261d9f4fbe9aee5d104c9cb9a2
2017-06-01 15:56:59 -07:00
Aaron Gao
9b3ed83506 fix regression test
Summary:
fix regression test by not reporting stats when building db
Closes https://github.com/facebook/rocksdb/pull/2390

Differential Revision: D5159909

Pulled By: lightmark

fbshipit-source-id: c3f4b9deb9c6799ff84207fd341c529144f8158d
2017-05-31 15:41:45 -07:00
Aaron Gao
cbc821c25b change regression rebuild to one level
Summary:
abandon fillseqdeterministic
test locally
Closes https://github.com/facebook/rocksdb/pull/2290

Differential Revision: D5151867

Pulled By: lightmark

fbshipit-source-id: 4c8a24cc937212ffb5ceb9bfaf7288eb8726d0c1
2017-05-30 16:41:21 -07:00
Yi Wu
0be636bf70 Fix db_bench build break with blob db
Summary:
Lite build does not recognize FLAGS_use_blob_db. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/2372

Reviewed By: anirbanr-fb

Differential Revision: D5130773

Pulled By: yiwu-arbug

fbshipit-source-id: 43131d9d0be5811f2129af562be72cca26369cb3
2017-05-25 14:11:22 -07:00
Aaron Gao
135ee6a3fc fix tsan crash data race
Summary:
rand_ has data race risk
TEST_TMPDIR=\/dev\/shm\/rocksdb OPT=-g COMPILE_WITH_TSAN=1 CRASH_TEST_KILL_ODD=1887 make J=1 crash_test
Closes https://github.com/facebook/rocksdb/pull/2368

Differential Revision: D5127424

Pulled By: lightmark

fbshipit-source-id: b7f4d1430a5769b57da9f99037106749264b2ced
2017-05-25 10:44:07 -07:00
Andrew Kryczka
bb01c1880c Introduce max_background_jobs mutable option
Summary:
- `max_background_flushes` and `max_background_compactions` are still supported for backwards compatibility
- `base_background_compactions` is completely deprecated. Now we just throttle to one background compaction when there's no pressure.
- `max_background_jobs` is added to automatically partition the concurrent background jobs into flushes vs compactions. Currently it's very simple as we just allocate one-fourth of the jobs to flushes, and the remaining can be used for compactions.
- The test cases that set `base_background_compactions > 1` needed to be updated. I just grab the pressure token such that the desired number of compactions can be scheduled.
Closes https://github.com/facebook/rocksdb/pull/2205

Differential Revision: D4937461

Pulled By: ajkr

fbshipit-source-id: df52cbbd497e13bbc9a60560a5ac2a2526b3f1f9
2017-05-24 11:29:08 -07:00
Sagar Vemuri
85b8569ae8 Fix release build on Linux
Summary:
Release builds are failing on Linux with the error:
```
tools/db_stress.cc: In function ‘int main(int, char**)’:
tools/db_stress.cc:2365:12: error: ‘rocksdb::SyncPoint’ has not been declared
   rocksdb::SyncPoint::GetInstance()->SetCallBack(
            ^
tools/db_stress.cc:2370:12: error: ‘rocksdb::SyncPoint’ has not been declared
   rocksdb::SyncPoint::GetInstance()->SetCallBack(
            ^
tools/db_stress.cc:2375:12: error: ‘rocksdb::SyncPoint’ has not been declared
   rocksdb::SyncPoint::GetInstance()->EnableProcessing();
            ^
make[1]: *** [tools/db_stress.o] Error 1
make[1]: Leaving directory `/data/sandcastle/boxes/trunk-git-rocksdb-public'
make: *** [release] Error 2
```
Closes https://github.com/facebook/rocksdb/pull/2355

Differential Revision: D5113552

Pulled By: sagar0

fbshipit-source-id: 351df707277787da5633ba4a40e52edc7c895dc4
2017-05-23 14:57:05 -07:00
Yi Wu
578fb0b1dc Simple blob file dumper
Summary:
A simple blob file dumper.
Closes https://github.com/facebook/rocksdb/pull/2242

Differential Revision: D5097553

Pulled By: yiwu-arbug

fbshipit-source-id: c6e00d949fcd3658f9f68da9352f06339fac418d
2017-05-23 10:42:59 -07:00
Aaron Gao
3e86c0f07c disable direct reads for log and manifest and add direct io to tests
Summary:
Disable direct reads for log and manifest. Direct reads should not affect sequential_file
Also add kDirectIO for option_config_ in db_test_util
Closes https://github.com/facebook/rocksdb/pull/2337

Differential Revision: D5100261

Pulled By: lightmark

fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b
2017-05-22 18:41:28 -07:00
hyunwoo
f720796e24 fixed typo
Summary:
fixed exisitng -> existing
Closes https://github.com/facebook/rocksdb/pull/2305

Differential Revision: D5070169

Pulled By: yiwu-arbug

fbshipit-source-id: 8c8450acf50757b767cf78b78314018395738d96
2017-05-16 11:07:58 -07:00
Andrew Kryczka
3fa9a39c68 Add GetAllKeyVersions API
Summary:
- Introduced an include/ file dedicated to db-related debug functions to avoid making db.h more complex
- Added debugging function, `GetAllKeyVersions()`, to return a listing of internal data for a range of user keys. The new `struct KeyVersion` exposes data similar to internal key without exposing any internal type.
- Migrated the "ldb idump" subcommand to use this function
- The API takes an inclusive-exclusive range to match behavior of "ldb idump". This will be quite annoying for users who want to query a single user key's versions :(.
Closes https://github.com/facebook/rocksdb/pull/2232

Differential Revision: D4976007

Pulled By: ajkr

fbshipit-source-id: cab375da53a7595d6575af2b7e3b776aa3ad793e
2017-05-12 15:54:06 -07:00
Anirban Rahut
d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
Siying Dong
264d3f540c Allow IntraL0 compaction in FIFO Compaction
Summary:
Allow an option for users to do some compaction in FIFO compaction, to pay some write amplification for fewer number of files.
Closes https://github.com/facebook/rocksdb/pull/2163

Differential Revision: D4895953

Pulled By: siying

fbshipit-source-id: a1ab608dd0627211f3e1f588a2e97159646e1231
2017-05-04 18:16:13 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Siying Dong
963eeba488 Revert how check_format_compatible.sh checkout release branches.
Summary:
In a previous commit, I changed the way to checkout release branches from "git checkout <branch_name>" to "git checkout origin/<branch_name>". However, this doesn't seem to work in our CI environment. Revert it.
Closes https://github.com/facebook/rocksdb/pull/2189

Differential Revision: D4922294

Pulled By: siying

fbshipit-source-id: 482c17f9b05e6ccb190876b050682fe5a458103d
2017-04-20 11:06:09 -07:00
Siying Dong
97005dbd5d tools/check_format_compatible.sh to cover option file loading too
Summary:
tools/check_format_compatible.sh will check a newer version of RocksDB can open option files generated by older version releases. In order to achieve that, a new parameter "--try_load_options" is added to ldb. With this parameter set, if option file exists, we load the option file and use it to open the DB. With this opiton set, we can validate option loading logic.
Closes https://github.com/facebook/rocksdb/pull/2178

Differential Revision: D4914989

Pulled By: siying

fbshipit-source-id: db114f7724fcb41e5e9483116d84d7c4b8389ca4
2017-04-20 10:26:37 -07:00
Maysam Yabandeh
8f61967881 Add cpu usage to regression benchmarks (4th attempt)
Summary:
Tested by running it on a remote machine.

I could not run it on the particular remote machine which has a different location for time command since it is busy and the script does not allow concurrent runs. So I tested it by hacking the script and replacing the command with "\$(hostname)" and confirmed that the scripts prints out the host name of the remote machine.
Closes https://github.com/facebook/rocksdb/pull/2181

Differential Revision: D4921654

Pulled By: maysamyabandeh

fbshipit-source-id: 8abb5ea9f7234f3c50a749576ccbb47ff605beb9
2017-04-20 09:31:09 -07:00
Maysam Yabandeh
927bbab25c Revert "Add cpu usage to regression benchmarks (3rd attempt)"
Summary:
This reverts commit 476e80be80.
Closes https://github.com/facebook/rocksdb/pull/2177

Differential Revision: D4914830

Pulled By: maysamyabandeh

fbshipit-source-id: 039299348ceb325aa721eb35e3a26e890f84ee74
2017-04-19 11:11:12 -07:00
Siying Dong
1553659d6a Add more recent versions to tools/check_format_compatible.sh
Summary:
Need to add more recent versions to tools/check_format_compatible.sh to meka sure backward and forward compatibility.
Closes https://github.com/facebook/rocksdb/pull/2175

Differential Revision: D4911585

Pulled By: siying

fbshipit-source-id: 943e6488757efb11bb6720d811c7ba949915c9de
2017-04-18 18:57:11 -07:00
Maysam Yabandeh
476e80be80 Add cpu usage to regression benchmarks (3rd attempt)
Summary:
Tested by running rocks/tools/debug_regression_test.sh and verifying the local output:
```
 cat local/rocks_regression_tests/OPTIONS-viewstate-66-1262-5000/2017-04-13-11-34-51/SUMMARY.csv
                               commit id,                benchmark,                     user@host,num-dbs,key-range,key-size,value-size,compress-rate,ops-per-thread,num-threads,  cache-size,flushes,compactions,ops-per-s,       p50,       p75,       p99,     p99.9,    p99.99,debug,real-sec,user-sec,sys-sec
d2dce5611a,               readrandom,                root@localhost,     12,     5000,      66,      1262,           50,           312,         16,  1073741824,      4,         16,   138458,      9380,     11530,     55200,  16803200,  32504000,    1,    0,    0,    0
d2dce5611a,         readwhilewriting,                root@localhost,     12,     5000,      66,      1262,           50,           312,         16,  1073741824,      4,         16,   104511,
Closes https://github.com/facebook/rocksdb/pull/2157

Differential Revision: D4909238

Pulled By: maysamyabandeh

fbshipit-source-id: dc7bb8569c3c33b9f7c4ba47a757b24d27bb3b31
2017-04-18 17:11:25 -07:00
Siying Dong
c49d704656 Add DB:ResetStats()
Summary:
Add a function to allow users to reset internal stats without restarting the DB.
Closes https://github.com/facebook/rocksdb/pull/2167

Differential Revision: D4907939

Pulled By: siying

fbshipit-source-id: ab2dd85b88aabe9380da7485320a1d460d3e1f68
2017-04-18 16:56:48 -07:00
Aaron Gao
44fa8ece9b change use_direct_writes to use_direct_io_for_flush_and_compaction
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117

Differential Revision: D4860912

Pulled By: lightmark

fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
2017-04-13 16:12:04 -07:00
Aaron Gao
ba7da434ae fix db_stress crash caused by buggy kernel warning
Summary:
filter the warning out and only print it once.
Closes https://github.com/facebook/rocksdb/pull/2137

Differential Revision: D4870925

Pulled By: lightmark

fbshipit-source-id: 91b363ce7f70bce88b0780337f408fc4649139b8
2017-04-11 16:56:59 -07:00
Maysam Yabandeh
6a8d5c015b Revert "Report cpu usage using time command"
Summary:
This reverts commit 97ec8a1349.
Closes https://github.com/facebook/rocksdb/pull/2136

Differential Revision: D4870610

Pulled By: maysamyabandeh

fbshipit-source-id: cdbfba135b065562f38f704f350a9a4e63a9a122
2017-04-11 13:57:58 -07:00
Maysam Yabandeh
97ec8a1349 Report cpu usage using time command
Summary:
Run the time command before regression tests, parse the output, and add the numbers to the report.
Closes https://github.com/facebook/rocksdb/pull/2101

Differential Revision: D4862781

Pulled By: maysamyabandeh

fbshipit-source-id: 4a81caa5d14187d67093aad154c8f0ad56aba901
2017-04-10 14:59:31 -07:00
Maysam Yabandeh
9690653db5 Add a verify phase to benchmarks
Summary:
Check the result of the benchmark againt a specified truth_db, which is
expected to be produced using the same benchmark but perhaps on a
different commit or with different configs.

The verification is simple and assumes that key/values are generated
deterministically. This assumption would break if db_bench using rand
variable differently from the benchmark that produced truth_db.
Currently it is checked to work on fillrandom and readwhilewriting.

A param finish_after_writes is added to ensure that the background
writing thread will write the same number of entries between two
benchmarks.

Example:
$ TEST_TMPDIR=/dev/shm/truth_db ./db_bench
--benchmarks="fillrandom,readwhilewriting" --num=200000
--finish_after_writes=true
$ TEST_TMPDIR=/dev/shm/tmpdb ./db_bench
--benchmarks="fillrandom,readwhilewriting,verify" --truth_db
/dev/shm/truth_db/dbbench --num=200000 --finish_after_writes=true
Verifying db <= truth_db...
Verifying db >= truth_db...
...Verified
Closes https://github.com/facebook/rocksdb/pull/2098

Differential Revision: D4839233

Pulled By: maysamyabandeh

fbshipit-source-id: 2f4ed31
2017-04-07 11:39:12 -07:00
Sagar Vemuri
343b59d6ee Move various string utility functions into string_util
Summary:
This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.

Check the sub-commits for easier reviewing.
Closes https://github.com/facebook/rocksdb/pull/2094

Differential Revision: D4837730

Pulled By: sagar0

fbshipit-source-id: 344278a
2017-04-06 14:54:12 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Aaron Gao
af256eb2b7 build db every monday
Summary:
Rebuilding regression db at the first run every monday.
Closes https://github.com/facebook/rocksdb/pull/2093

Differential Revision: D4836961

Pulled By: lightmark

fbshipit-source-id: 22d6c25
2017-04-05 12:24:21 -07:00
Siying Dong
6ef8c620d3 Move auto_roll_logger and filename out of db/
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080

Differential Revision: D4821141

Pulled By: siying

fbshipit-source-id: ca7d768
2017-04-03 18:39:14 -07:00
Aaron Gao
0537f515c2 fix run_remote with strong quoting
Summary:
add \' pair to avoid bash expanding before run remotely
Closes https://github.com/facebook/rocksdb/pull/2079

Differential Revision: D4821342

Pulled By: lightmark

fbshipit-source-id: 418ba41
2017-04-03 13:09:12 -07:00
Aaron Gao
bd7d13835e test remote instead run remote in regression test
Summary:
avoid exit when $? != 0
Closes https://github.com/facebook/rocksdb/pull/2076

Differential Revision: D4815655

Pulled By: lightmark

fbshipit-source-id: 829299e
2017-03-31 18:54:11 -07:00
Aaron Gao
c81a805fe6 test db existence on the remote host
Summary:
Fix the bug that previous test existence locally
Closes https://github.com/facebook/rocksdb/pull/2074

Differential Revision: D4813631

Pulled By: lightmark

fbshipit-source-id: eceb0d9
2017-03-31 14:54:17 -07:00
Aaron Gao
5fc1e6765a add -rf when remove db in regression test
Summary:
force remove db dir when rebuilding
Closes https://github.com/facebook/rocksdb/pull/2067

Differential Revision: D4811926

Pulled By: lightmark

fbshipit-source-id: ab068a2
2017-03-31 12:10:23 -07:00
Aaron Gao
da175f7ecc exit with code 2 when there is already a db_bench running in regression test
Summary: Closes https://github.com/facebook/rocksdb/pull/2063

Differential Revision: D4805507

Pulled By: lightmark

fbshipit-source-id: 59ccb18
2017-03-30 14:09:11 -07:00
Aaron Gao
f3607640a6 add ldb build to regression test
Summary:
add ldb to regression test in order to enable reuse db by creating checkpoint
Closes https://github.com/facebook/rocksdb/pull/2030

Differential Revision: D4800549

Pulled By: lightmark

fbshipit-source-id: d3a7325
2017-03-29 18:24:11 -07:00
Yi Wu
41fe9ad75b Hide usage of compaction_options_fifo from lite build
Summary:
...to fix lite build error.
Closes https://github.com/facebook/rocksdb/pull/2046

Differential Revision: D4785910

Pulled By: yiwu-arbug

fbshipit-source-id: b591f27
2017-03-28 13:39:13 -07:00
Shu Zhang
8dee8cad9e Enable fifo compaction benchmark to db_bench
Summary:
Added fifo benchmark to db_bench.
One thing i am not sure is that i am using CompactRange() instead of CompactFiles(). (may cause performance skew because CompactionRange() is not happening in current thread?)  For CompactFiles(), for some reason FIFO compaction doesn't work as expected. More insight is welcomed. I guess FIFO compaction doesn't work with file names? igorcanadi

test cmd:
./db_bench --compaction_style=2 --benchmarks=fillseqdeterministic --disable_auto_compactions --num_levels=1 --fifo_compaction_max_table_files_size_mb=10

---------------------- DB 0 LSM ---------------------
Level[0]: /000014.sst(size: 4211014 bytes)
fillseqdeterministic :       4.731 micros/op 211381 ops/sec;   23.4 MB/s
Closes https://github.com/facebook/rocksdb/pull/1734

Differential Revision: D4774964

Pulled By: siying

fbshipit-source-id: 9d08df6
2017-03-24 17:09:15 -07:00
Siying Dong
e474df9470 db_bench: not need to check mmap for PlainTable
Summary:
PlainTable now supports non-mmap mode. We don't need to check it anymore.
Closes https://github.com/facebook/rocksdb/pull/1882

Differential Revision: D4751643

Pulled By: siying

fbshipit-source-id: ab14540
2017-03-22 11:09:13 -07:00
Siying Dong
17866ecc3a Allow Users to change customized ldb tools' header in help printing
Summary: Closes https://github.com/facebook/rocksdb/pull/2018

Differential Revision: D4748448

Pulled By: siying

fbshipit-source-id: a54c2f9
2017-03-21 17:39:12 -07:00
Leonidas Galanis
a2a883318b remove deleted option from benchmark.sh
Summary:
Removed max_grandparent_overlap_factor from benchmark.sh since it is not a valid option anymore.
Closes https://github.com/facebook/rocksdb/pull/2015

Differential Revision: D4748229

Pulled By: lgalanis

fbshipit-source-id: c3869ea
2017-03-21 12:54:13 -07:00
Aaron Gao
78cb195595 add checkpoint to ldb
Summary: Closes https://github.com/facebook/rocksdb/pull/2017

Differential Revision: D4747656

Pulled By: lightmark

fbshipit-source-id: c52f160
2017-03-21 11:54:11 -07:00
Aaron Gao
93c68b642e change regression bash file with debug mode
Summary:
add debug mode for better debugging
refactor some regex
change micros/op to ops/sec
Closes https://github.com/facebook/rocksdb/pull/1999

Differential Revision: D4742806

Pulled By: lightmark

fbshipit-source-id: 0fe3ae6
2017-03-20 17:39:17 -07:00
Andrew Kryczka
e66221add4 fix db_bench rate limiter callsites
Summary:
pass nullptr as stats object for db_bench-specific rate limiters since its stats are intended to capture background write activity only.
Closes https://github.com/facebook/rocksdb/pull/1997

Differential Revision: D4726806

Pulled By: ajkr

fbshipit-source-id: 8e4b225
2017-03-16 17:54:12 -07:00
Maysam Yabandeh
11526252cc Pinnableslice (2nd attempt)
Summary:
PinnableSlice

    Summary:
    Currently the point lookup values are copied to a string provided by the
    user. This incures an extra memcpy cost. This patch allows doing point lookup
    via a PinnableSlice which pins the source memory location (instead of
    copying their content) and releases them after the content is consumed
    by the user. The old API of Get(string) is translated to the new API
    underneath.

    Here is the summary for improvements:

    value 100 byte: 1.8% regular, 1.2% merge values
    value 1k byte: 11.5% regular, 7.5% merge values
    value 10k byte: 26% regular, 29.9% merge values
    The improvement for merge could be more if we extend this approach to
    pin the merge output and delay the full merge operation until the user
    actually needs it. We have put that for future work.

    PS:
    Sometimes we observe a small decrease in performance when switching from
    t5452014 to this patch but with the old Get(string) API. The d
Closes https://github.com/facebook/rocksdb/pull/1756

Differential Revision: D4391738

Pulled By: maysamyabandeh

fbshipit-source-id: 6f3edd3
2017-03-13 11:54:10 -07:00
Reid Horuff
ebd5639b6d Add ability to search for key prefix in sst_dump tool
Summary:
Add the flag --prefix to the sst_dump tool
This flag is similar to, and exclusive from, the --from flag.

--prefix=0x00FF will return all rows prefixed with 0x00FF.
The --to flag may also be specified and will work as expected.

These changes were used to help in debugging the power cycle corruption issue and theses changes were tested by scanning through a udb.
Closes https://github.com/facebook/rocksdb/pull/1984

Differential Revision: D4691814

Pulled By: reidHoruff

fbshipit-source-id: 027f261
2017-03-13 10:39:12 -07:00
Maysam Yabandeh
5dae019477 Revert "Report cpu usage using time command"
Summary:
This reverts commit d43adf21bb.

The patch has caused problems in regression tests. Will revert it for now until we figure how to debug the problems regression tests.
Closes https://github.com/facebook/rocksdb/pull/1975

Differential Revision: D4682880

Pulled By: maysamyabandeh

fbshipit-source-id: 84df83a
2017-03-09 11:09:13 -08:00
Maysam Yabandeh
d43adf21bb Report cpu usage using time command
Summary:
It augments the regression benchmarks with a time command, parses the output, and print them to the SUMMARY.csv file.

I tested a variation of the script locally. Any idea how to do run a test that also involves writing to scuba tables?
Closes https://github.com/facebook/rocksdb/pull/1967

Differential Revision: D4679470

Pulled By: maysamyabandeh

fbshipit-source-id: 44dac30
2017-03-08 17:54:11 -08:00
Andrew Kryczka
2a5daa06f0 Add stderr log level for ldb backup commands
Summary:
Also extracted the common logic into a base class, BackupableCommand.
Closes https://github.com/facebook/rocksdb/pull/1939

Differential Revision: D4630121

Pulled By: ajkr

fbshipit-source-id: 04bb067
2017-03-03 13:24:15 -08:00
Islam AbdelRahman
f89b3893c0 Remove skip_table_builder_flush and default it to true
Summary:
This option is needed to be enabled for Direct IO
and I cannot think of a reason where we need to disable it

remove it and default it to true
Closes https://github.com/facebook/rocksdb/pull/1944

Differential Revision: D4641088

Pulled By: IslamAbdelRahman

fbshipit-source-id: d7085b9
2017-03-02 16:54:10 -08:00
Andrew Kryczka
cc253982dd Use more default options in db_bench
Summary:
The default behavior was too weird because, previously, we got the L0 file size limit (64MB) from Options default and L1+ file size limit (2MB) from the hardcoded value. We should get both from Options default.
Closes https://github.com/facebook/rocksdb/pull/1943

Differential Revision: D4640301

Pulled By: ajkr

fbshipit-source-id: fd8c0fd
2017-03-02 10:54:11 -08:00
Islam AbdelRahman
be3e5568be Fix unaligned reads in read cache
Summary:
- Fix unaligned reads in read cache by using RandomAccessFileReader
- Allow read cache flags in db_bench
Closes https://github.com/facebook/rocksdb/pull/1916

Differential Revision: D4610885

Pulled By: IslamAbdelRahman

fbshipit-source-id: 2aa1dc8
2017-02-27 13:09:12 -08:00
Aaron Gao
e7d902e693 add direct_io and compaction_readahead_size in db_stress
Summary:
add direct_io and compaction_readahead_size in db_stress
test direct_io under db_stress with compaction_readahead_size enabled to capture bugs found in production.
`./db_stress --allow_concurrent_memtable_write=0 --use_direct_reads --use_direct_writes --compaction_readahead_size=4096`
Closes https://github.com/facebook/rocksdb/pull/1906

Differential Revision: D4604514

Pulled By: IslamAbdelRahman

fbshipit-source-id: ebbf0ee
2017-02-23 11:39:14 -08:00
Sagar Vemuri
eb912a927e Remove disableDataSync option
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859

Differential Revision: D4541292

Pulled By: sagar0

fbshipit-source-id: 5b3a6ca
2017-02-13 11:09:13 -08:00
Andrew Kryczka
3b4ac8076b Clarify ldb column family argument
Summary:
move the argument description to the right section, make it clearer that the '=' sign is required, and use it in one subcommand where it seemed forgotten before.
Closes https://github.com/facebook/rocksdb/pull/1840

Differential Revision: D4515096

Pulled By: ajkr

fbshipit-source-id: 7b5b1c1
2017-02-07 11:54:10 -08:00
Andrew Kryczka
d70ce7ee0b Move db_bench flags out of unnamed namespace
Summary:
I want to be able to, e.g., DECLARE_string(statistics_string); in my application such that I can override the default value of statistics_string. For this to work, we need to remove the unnamed namespace containing all the flags, and make sure all variables/functions covered by that namespace are static.

Replaces #1828 due to internal tool issues.
Closes https://github.com/facebook/rocksdb/pull/1844

Differential Revision: D4515124

Pulled By: ajkr

fbshipit-source-id: 23b695e
2017-02-07 11:39:12 -08:00
Dmitri Smirnov
0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread.  On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Andrew Kryczka
37d4a79e99 Deserialize custom Statistics object in db_bench
Summary:
Added -statistics_string to deserialize a Statistics object using the factory functions registered by applications.
Closes https://github.com/facebook/rocksdb/pull/1812

Differential Revision: D4469811

Pulled By: ajkr

fbshipit-source-id: 2d80862
2017-01-26 15:24:16 -08:00
Andrew Kryczka
17c1180603 Generalize Env registration framework
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.

Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.

This diff generalizes the Env registration logic to work with arbitrary types.

- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file as that's Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
Closes https://github.com/facebook/rocksdb/pull/1776

Differential Revision: D4421933

Pulled By: ajkr

fbshipit-source-id: 34647d1
2017-01-25 16:09:14 -08:00
Maysam Yabandeh
d0ba8ec8f9 Revert "PinnableSlice"
Summary:
This reverts commit 54d94e9c2c.

The pull request was landed by mistake.
Closes https://github.com/facebook/rocksdb/pull/1755

Differential Revision: D4391678

Pulled By: maysamyabandeh

fbshipit-source-id: 36d5149
2017-01-08 14:24:12 -08:00
Maysam Yabandeh
54d94e9c2c PinnableSlice
Summary:
Currently the point lookup values are copied to a string provided by the user.
This incures an extra memcpy cost. This patch allows doing point lookup
via a PinnableSlice which pins the source memory location (instead of
copying their content) and releases them after the content is consumed
by the user. The old API of Get(string) is translated to the new API
underneath.

 Here is the summary for improvements:
 1. value 100 byte: 1.8%  regular, 1.2% merge values
 2. value 1k   byte: 11.5% regular, 7.5% merge values
 3. value 10k byte: 26% regular,    29.9% merge values

 The improvement for merge could be more if we extend this approach to
 pin the merge output and delay the full merge operation until the user
 actually needs it. We have put that for future work.

PS:
Sometimes we observe a small decrease in performance when switching from
t5452014 to this patch but with the old Get(string) API. The difference
is a little and could be noise. More importantly it is safely
cancelled
Closes https://github.com/facebook/rocksdb/pull/1732

Differential Revision: D4374613

Pulled By: maysamyabandeh

fbshipit-source-id: a077f1a
2017-01-08 13:54:13 -08:00
Dmitri Smirnov
e04480faed Fix MS warnings. Use ROCKSDB_Prsz for size_t.
Summary: Closes https://github.com/facebook/rocksdb/pull/1737

Differential Revision: D4378852

Pulled By: yiwu-arbug

fbshipit-source-id: ba8b02d
2017-01-06 16:24:14 -08:00
Yi Wu
640d724808 Update db_bench and sst_dump to test with block cache mid-point inser?
Summary:
?tion

Add flags in db_bench to test with block cache mid-point insertion.
Also update sst_dump to dump total block sizes of each type. I find it
useful to look at these test db stats and I don't know if we have them
elsewhere.
Closes https://github.com/facebook/rocksdb/pull/1706

Differential Revision: D4355812

Pulled By: yiwu-arbug

fbshipit-source-id: 3e4a348
2017-01-03 18:39:14 -08:00
Aaron Gao
972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
Siying Dong
7bd725e962 db_bench: introduce --benchmark_read_rate_limit
Summary:
Add the parameter in db_bench to help users to measure latency histogram with constant read rate.
Closes https://github.com/facebook/rocksdb/pull/1683

Differential Revision: D4341387

Pulled By: siying

fbshipit-source-id: 1b4b276
2016-12-19 12:39:11 -08:00
Daniel Black
816c1e30ca gcc-7 requires include <functional> for std::function
Summary:
Fixes compile error:

In file included from ./util/statistics.h:17:0,
                 from ./util/stop_watch.h:8,
                 from ./util/perf_step_timer.h:9,
                 from ./util/iostats_context_imp.h:8,
                 from ./util/posix_logger.h:27,
                 from ./port/util_logger.h:18,
                 from ./db/auto_roll_logger.h:15,
                 from db/auto_roll_logger.cc:6:
./util/thread_local.h:65:16: error: 'function' in namespace 'std' does not name a template type
   typedef std::function<void(void*, void*)> FoldFunc;
Closes https://github.com/facebook/rocksdb/pull/1656

Differential Revision: D4318702

Pulled By: yiwu-arbug

fbshipit-source-id: 8c5d17a
2016-12-16 11:24:18 -08:00
Daniel Black
0ab6fc167f Gcc-7 buffer size insufficient
Summary:
Bunch of commits related to insufficient buffer size. Errors in individual commits.
Closes https://github.com/facebook/rocksdb/pull/1673

Differential Revision: D4332127

Pulled By: IslamAbdelRahman

fbshipit-source-id: 878f73c
2016-12-14 19:24:26 -08:00
Islam AbdelRahman
d71e728c7a Print user collected properties in sst_dump
Summary:
Include a dump of user_collected_properties in sst_dump
Closes https://github.com/facebook/rocksdb/pull/1668

Differential Revision: D4325078

Pulled By: IslamAbdelRahman

fbshipit-source-id: 226b6d6
2016-12-14 11:24:11 -08:00
Daniel Black
1f6f7e3e89 cast to signed char in ldb_cmd_test for ppc64le
Summary:
char is unsigned on power by default causing this test to fail with the FF case. ppc64 return 255 while x86 returned -1. Casting works on both platforms.
Closes https://github.com/facebook/rocksdb/pull/1500

Differential Revision: D4308775

Pulled By: yiwu-arbug

fbshipit-source-id: db3e6e0
2016-12-12 14:39:18 -08:00
Andrew Kryczka
dc64f46b1c Add db_bench option for stderr logging
Summary:
The info log file ("LOG") is stored in the db directory by default. When the db is on a distributed env, this is unnecessarily slow. So, I added an option to db_bench to just print the info log messages to stderr.
Closes https://github.com/facebook/rocksdb/pull/1641

Differential Revision: D4309348

Pulled By: ajkr

fbshipit-source-id: 1e6f851
2016-12-09 16:54:14 -08:00
Andrew Kryczka
5241e0dbfc fix db_bench argument type
Summary: Closes https://github.com/facebook/rocksdb/pull/1633

Differential Revision: D4298161

Pulled By: yiwu-arbug

fbshipit-source-id: 2c7af35
2016-12-08 11:09:14 -08:00
Andrew Kryczka
2c2ba68247 db_stress support for range deletions
Summary:
made db_stress capable of adding range deletions to its db and verifying their correctness. i'll make db_crashtest.py use this option later once the collapsing optimization (https://github.com/facebook/rocksdb/pull/1614) is committed because currently it slows down the test too much.
Closes https://github.com/facebook/rocksdb/pull/1625

Differential Revision: D4293939

Pulled By: ajkr

fbshipit-source-id: d3beb3a
2016-12-07 13:09:24 -08:00
Andrew Kryczka
fa50fffaf5 Option to expand range tombstones in db_bench
Summary:
When enabled, this option replaces range tombstones with a sequence of
point tombstones covering the same range. This can be used to A/B test perf of
range tombstones vs sequential point tombstones, and help us find the cross-over
point, i.e., the size of the range above which range tombstones outperform point
tombstones.
Closes https://github.com/facebook/rocksdb/pull/1594

Differential Revision: D4246312

Pulled By: ajkr

fbshipit-source-id: 3b00b23
2016-12-06 18:09:14 -08:00
Yi Wu
56281f3a97 Add memtable_insert_with_hint_prefix_size option to db_bench
Summary:
Add memtable_insert_with_hint_prefix_size option to db_bench
Closes https://github.com/facebook/rocksdb/pull/1604

Differential Revision: D4260549

Pulled By: yiwu-arbug

fbshipit-source-id: cee5ef7
2016-12-01 16:54:16 -08:00
Islam AbdelRahman
4a21b1402c Cache heap::downheap() root comparison (optimize heap cmp call)
Summary:
Reduce number of comparisons in heap by caching which child node in the first level is smallest (left_child or right_child)
So next time we can compare directly against the smallest child

I see that the total number of calls to comparator drops significantly when using this optimization

Before caching (~2mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000  --perf_level=2
readseq      :       0.338 micros/op 2959201 ops/sec;  327.4 MB/s user_key_comparison_count = 2000008
```
After caching (~1mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2
readseq      :       0.309 micros/op 3236801 ops/sec;  358.1 MB/s user_key_comparison_count = 1000011
```

It also improves
Closes https://github.com/facebook/rocksdb/pull/1600

Differential Revision: D4256027

Pulled By: IslamAbdelRahman

fbshipit-source-id: 76fcc66
2016-12-01 13:39:14 -08:00
Igor Canadi
3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Andrew Kryczka
fd43ee09da Range deletion microoptimizations
Summary:
- Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter.
- Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly.
Closes https://github.com/facebook/rocksdb/pull/1548

Differential Revision: D4208169

Pulled By: ajkr

fbshipit-source-id: 2fd65cf
2016-11-21 12:24:13 -08:00
Andrew Kryczka
018bb2ebf5 DeleteRange support for db_bench
Summary:
Added a few options to configure when to add range tombstones during
any benchmark involving writes.
Closes https://github.com/facebook/rocksdb/pull/1522

Differential Revision: D4187388

Pulled By: ajkr

fbshipit-source-id: 2c8a473
2016-11-15 17:39:47 -08:00
Andrew Kryczka
53b693f5fe ldb support for range delete
Summary:
Add a subcommand to ldb with which we can delete a range of keys.
Closes https://github.com/facebook/rocksdb/pull/1521

Differential Revision: D4186338

Pulled By: ajkr

fbshipit-source-id: b8e9861
2016-11-15 15:54:20 -08:00
Lijun Tang
adb665e0bf Allowed delayed_write_rate option to be dynamically set.
Summary: Closes https://github.com/facebook/rocksdb/pull/1488

Differential Revision: D4157784

Pulled By: siying

fbshipit-source-id: f150081
2016-11-12 15:54:11 -08:00
Anirban Rahut
14c0380e78 Convenience option to parse an internal key on command line
Summary:
enhancing sst_dump to be able to parse internal key
Closes https://github.com/facebook/rocksdb/pull/1482

Differential Revision: D4154175

Pulled By: siying

fbshipit-source-id: b0e28b1
2016-11-10 10:09:21 -08:00
Peter (Stig) Edwards
144cdb8f16 16384 as e.g .value for compression_max_dict_bytes
Summary:
Use 16384 as e.g .value for ldb the --compression_max_dict_bytes option.
I think 14 was copy and pasted from the options in the lines above.
Closes https://github.com/facebook/rocksdb/pull/1483

Differential Revision: D4154393

Pulled By: siying

fbshipit-source-id: ef53a69
2016-11-09 11:24:20 -08:00
Andrew Kryczka
9e7cf3469b DeleteRange user iterator support
Summary:
Note: reviewed in  https://reviews.facebook.net/D65115

- DBIter maintains a range tombstone accumulator. We don't cleanup obsolete tombstones yet, so if the user seeks back and forth, the same tombstones would be added to the accumulator multiple times.
- DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones, L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205)
- DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones
Closes https://github.com/facebook/rocksdb/pull/1464

Differential Revision: D4131753

Pulled By: ajkr

fbshipit-source-id: be86559
2016-11-04 12:09:22 -07:00
Benoit Girard
2b16d664cb Change max_bytes_for_level_multiplier to double
Summary: Closes https://github.com/facebook/rocksdb/pull/1427

Differential Revision: D4094732

Pulled By: yiwu-arbug

fbshipit-source-id: b9b79e9
2016-11-01 21:09:23 -07:00
Siying Dong
f9eb56791a db_bench: --dump_malloc_stats takes no effect
Summary:
Fix the bug that --dump_malloc_stats is set before opening the DB.
Closes https://github.com/facebook/rocksdb/pull/1447

Differential Revision: D4106001

Pulled By: siying

fbshipit-source-id: 4e746da
2016-10-31 14:54:26 -07:00
Kien-hung Li
eeb27e1bbd Add handy option to turn on direct I/O in db_bench (#1424) 2016-10-28 10:36:05 -07:00
Yueh-Hsuan Chiang
f83cd64c0b Fix a bug that mistakenly disable regression_test.sh to update commit (#1415)
Summary:
Fix a bug that mistakenly disable regression_test.sh to update commit

Test Plan:
regression_test.sh
2016-10-21 17:26:24 -07:00
Aaron Gao
08616b4933 [db_bench] add filldeterministic (Universal+level compaction)
Summary:
in db_bench, we can dynamically create a rocksdb database that guarantees the shape of its LSM.
universal + level compaction
no fifo compaction
no multi db support

Test Plan:
./db_bench -benchmarks=fillseqdeterministic -compaction_style=1 -num_levels=3 --disable_auto_compactions -num=1000000 -value_size=1000
```
---------------------- LSM ---------------------
Level[0]: /000480.sst(size: 35060275 bytes)
Level[0]: /000479.sst(size: 70443197 bytes)
Level[0]: /000478.sst(size: 141600383 bytes)
Level[1]: /000341.sst - /000475.sst(total size: 284726629 bytes)
Level[2]: /000071.sst - /000340.sst(total size: 568649806 bytes)
fillseqdeterministic :      60.447 micros/op 16543 ops/sec;   16.0 MB/s
```

Reviewers: sdong, andrewkr, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63111
2016-10-18 16:30:57 -07:00
dhruba borthakur
5027dd17a7 Fix a minor bug in the ldb tool that was not selecting the specified (#1399)
column family for compaction.
2016-10-17 10:40:30 -07:00
Siying Dong
a249a0b75b check_format_compatible.sh to use some branch which allows to run with GCC 4.8 (#1393)
Summary:
Some older tags don't run GCC 4.8 with FB internal setting. Fixed them and created branches. Change the format compatible script accordingly.

Also add more releases to check format compatibility.
2016-10-13 16:15:55 -07:00
Kefu Chai
21f4bb5a89 cmake support for linux and osx (#1358)
* enable cmake to work on linux and osx also

* port part of build_detect_platform not covered by thirdparty.inc
  to cmake.
  - detect fallocate()
  - detect malloc_usable_size()
  - detect JeMalloc
  - detect snappy
* check for asan,tsan,ubsan
* create 'build_version.cc' in build directory.
* add `check` target to support 'make check'.
* add `tools` target to match its counterpart in Makefile.
* use `date` on non-win32 platforms.
* pass different cflags on non-win32 platforms
* detect pthead library using FindThread cmake module.
* enable CMP0042 to silence the cmake warning on osx
* reorder the linked libraries. because testutillib references gtest, to
  enable the linker to find the referenced symbols, we need to put gtest
  after testutillib.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>

* hash_table_bench.cc: fix build without gflags

Signed-off-by: Kefu Chai <kchai@redhat.com>

* remove gtest from librocksdb linkage

testharness.cc is included in librocksdb sources, and it uses gtest. but
gtest is not supposed to be part of the public API of librocksdb. so, in
this change, the testharness.cc is moved out out librocksdb, and is
built as an object target, then linked with the tools and tests instead.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-09-28 11:53:15 -07:00
Yi Wu
9ed928e7a9 Split DBOptions into ImmutableDBOptions and MutableDBOptions
Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs.

Test Plan:
  make all check

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64065
2016-09-23 16:34:04 -07:00
rockeet
4c3f4496b5 Add TableBuilderOptions::level and relevant changes (#1335) 2016-09-17 22:30:43 -07:00
sdong
607628d349 Support ZSTD with finalized format
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.

Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63141
2016-09-06 12:22:16 -07:00
Yi Wu
a88677d2cf Remove ImmutableCFOptions from public API
Summary: There's no reference to ImmutableCFOptions elsewhere in /include/rocksdb. ImmutableCFOptions was introduced in this commit (5665e5e285) but later its reference in /include/rocksdb/table.h is removed.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63177
2016-09-02 14:16:31 -07:00
Islam AbdelRahman
5051755e35 Fix db_bench memory use after free (detected by clang_analyze)
Summary: Fix using `arg[i].thread` after deleting it

Test Plan: run clang_analyze

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63171
2016-09-02 10:58:08 -07:00
sdong
32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
Islam AbdelRahman
b49b92cf28 Introduce Read amplification bitmap (read amp statistics)
Summary:
Add ReadOptions::read_amp_bytes_per_bit option which allow us to create a bitmap for every data block we read
the bitmap will contain (block_size / read_amp_bytes_per_bit) bits.

We will use this bitmap to mark which bytes have been used of the block so we can calculate the read amplification

Test Plan: added new tests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: yiwu, leveldb, march, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58707
2016-08-26 18:55:58 -07:00
Islam AbdelRahman
cce702a6e4 [db_bench] Support single benchmark arguments (Repeat for X times, Warm up for X times), Support CombinedStats (AVG / MEDIAN)
Summary:
This diff allow us to run a single benchmark X times and warm it up for Y times. and see the AVG & MEDIAN throughput of these X runs
for example

```
$ ./db_bench --benchmarks="fillseq,readseq[X5-W2]"
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 4.12
Date:       Wed Aug 24 10:45:26 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-8616/dbbench]
fillseq      :       4.695 micros/op 212971 ops/sec;   23.6 MB/s
DB path: [/tmp/rocksdbtest-8616/dbbench]
Warming up benchmark by running 2 times
readseq      :       0.214 micros/op 4677005 ops/sec;  517.4 MB/s
readseq      :       0.212 micros/op 4706834 ops/sec;  520.7 MB/s
Running benchmark for 5 times
readseq      :       0.218 micros/op 4588187 ops/sec;  507.6 MB/s
readseq      :       0.208 micros/op 4816538 ops/sec;  532.8 MB/s
readseq      :       0.213 micros/op 4685376 ops/sec;  518.3 MB/s
readseq      :       0.214 micros/op 4676787 ops/sec;  517.4 MB/s
readseq      :       0.217 micros/op 4618532 ops/sec;  510.9 MB/s
readseq [AVG    5 runs] : 4677084 ops/sec;  517.4 MB/sec
readseq [MEDIAN 5 runs] : 4676787 ops/sec;  517.4 MB/sec
```

Test Plan: run db_bench

Reviewers: sdong, andrewkr, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62235
2016-08-25 12:57:35 -07:00
Yi Wu
d76ddf327d Disable ClockCache db_crashtest
Summary: Tempororily disable clock cache in db_crashtest while we investigate data race issue with clock cache.

Test Plan:
    python ./tools/db_crashtest.py blackbox

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62481
2016-08-24 14:19:05 -07:00
Yi Wu
4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
Mark Callaghan
4fe12baa66 Make db_bench less space for --stats_per_interval
Summary:
Changes compaction IO stats to be printed once per interval rather
than once per interval per thread. https://github.com/facebook/rocksdb/issues/1276

Task ID: #12698508

Test Plan: run db_bench

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62067
2016-08-15 11:07:05 -07:00
sdong
8b79422b52 [Proof-Of-Concept] RocksDB Blob Storage with a blob log file.
Summary:
This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB.

The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc.

Test Plan: Add unit tests.

Reviewers: arahut

Reviewed By: arahut

Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61485
2016-08-10 17:05:17 -07:00
sdong
f35b16f246 db_bench add an option of --base_background_compactions
Summary: As title.

Test Plan: Run the benchmark with and without the parameter.

Reviewers: yiwu, andrewkr, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61491
2016-08-04 11:06:00 -07:00
omegaga
64046e581c Write a benchmark to emulate time series data
Summary: Add a benchmark to `db_bench`. In this benchmark, a write thread will populate time series data in the format of 'id | timestamp', and multiple read threads will randomly retrieve all data from one id at a time.

Test Plan: Run the benchmark: `num=134217728;bpl=536870912;mb=67108864;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4;delay=8;stop=12;wbn=3;mbc=20;wbs=134217728;dds=0;sync=0;t=32;vs=800;bs=4096;cs=17179869184;of=500000;wps=0;si=10000000; kir=100000; dir=/data/users/jhli/test/; ./db_bench --benchmarks=timeseries --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$num --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=6 --open_files=$of --verify_checksum=1 --db=$dir --sync=$sync --disable_wal=0 --compression_type=none --stats_interval=$si --compression_ratio=1 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=0 --key_id_range=$kir`

Reviewers: andrewkr, sdong

Reviewed By: sdong

Subscribers: lgalanis, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60651
2016-08-01 12:28:51 -07:00
Islam AbdelRahman
557748ff7b Fix db_stress failure (pass merge_operator even if not used)
Summary:
db_stress test is now failing because of this scenario
- run db_stress with merge_operator enabled (now we have a db with merge operands)
- run db_stress with merge_operator disabled (now when we fail to open the db)

the solution is to pass the merge_operator to the DB even if we are not going to do any merge operations

Test Plan: Check the failure

Reviewers: andrewkr, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61311
2016-07-29 11:02:03 -07:00
sdong
d4c45428af db_stress shouldn't assert file size 0 if file creation fails
Summary: OnTableFileCreated() now is also called when the file creaion fails. In that case, we shouldn't assert the file size is not 0.

Test Plan: Run crash test

Reviewers: yiwu, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61137
2016-07-27 12:08:16 -07:00
sdong
e5b5f12b81 Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60513
2016-07-26 18:15:11 -07:00
Andrew Kryczka
bbd6a5a184 ldb restore subcommand
Summary:
- Added a new subcommand, 'ldb restore', that restores from backup
- Made backup_env_uri optional (also for 'ldb backup') because it can use db_env when backup_env isn't provided

Test Plan:
verify backup and restore commands work:

  $ ./ldb backup --db=/data/users/andrewkr/rocksdb/db_bench-out/ --num_threads 1 --backup_dir=/data/users/andrewkr/rocksdb/db_bench-out/backup/
  $ ./ldb restore --db=/data/users/andrewkr/rocksdb/db_bench-out/restore/ --num_threads 1 --backup_dir=/data/users/andrewkr/rocksdb/db_bench-out/backup/

Reviewers: sdong, wanning

Reviewed By: wanning

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60849
2016-07-26 11:13:26 -07:00
zhang-tong
b2a8016df1 Update db_bench_tool.cc (#1239)
* Update db_bench_tool.cc

* Update db_bench_tool.cc
2016-07-25 14:50:44 -07:00
sdong
f85df120f2 Re-enable tsan crash white-box test with reduced killing odds
Summary:
Tsan crash white-box test was disabled because it never ends. Re-enable it with reduced killing odds.
Add a parameter in crash test script to allow we pass it through an environment variable.

Test Plan: Run it manually.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61053
2016-07-22 12:51:27 -07:00
Peter (Stig) Edwards
b06ca5f860 ldb load, prefer ifsteam(/dev/stdin) to std::cin (#1207)
getline on std::cin can be very inefficient when ldb is loading large values, with high CPU usage in libc _IO_(un)getc, this is because of the performance penalty that comes from synchronizing stdio and iostream buffers.
See the reproducers and tests in #1133 .
If an ifstream on /dev/stdin is used (when available) then using ldb to load large values can be much more efficient.
I thought for ldb load, that this approach is preferable to using <cstdio> or std::ios_base::sync_with_stdio(false).
I couldn't think of a use case where ldb load would need to support reading unbuffered input, an alternative approach would be to add support for passing --input_file=/dev/stdin.
I have a CLA in place, thanks.

The CI tests were failing at the time of https://github.com/facebook/rocksdb/pull/1156, so this change and PR will supersede it.
2016-07-22 11:46:40 -07:00
Islam AbdelRahman
68a8e6b8fa Introduce FullMergeV2 (eliminate memcpy from merge operators)
Summary:
This diff update the code to pin the merge operator operands while the merge operation is done, so that we can eliminate the memcpy cost, to do that we need a new public API for FullMerge that replace the std::deque<std::string> with std::vector<Slice>

This diff is stacked on top of D56493 and D56511

In this diff we
- Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
- Replace std::deque<std::string> with std::vector<Slice> to pass operands
- Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
- Allow FullMergeV2 output to be an existing operand

```
[Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s

[master]
readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
```

```
[Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s

[master]
readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s

```

```
[Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]

[FullMergeV2]
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s

[master]
readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
```

```
[Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions

[FullMergeV2]
readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s

[master]
readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
```

Test Plan: COMPILE_WITH_ASAN=1 make check -j64

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: lovro, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57075
2016-07-20 09:49:03 -07:00
John Alexander
9430333f84 New Statistics to track Compression/Decompression (#1197)
* Added new statistics and refactored to allow ioptions to be passed around as required to access environment and statistics pointers (and, as a convenient side effect, info_log pointer).

* Prevent incrementing compression counter when compression is turned off in options.

* Prevent incrementing compression counter when compression is turned off in options.

* Added two more supported compression types to test code in db_test.cc

* Prevent incrementing compression counter when compression is turned off in options.

* Added new StatsLevel that excludes compression timing.

* Fixed casting error in coding.h

* Fixed CompressionStatsTest for new StatsLevel.

* Removed unused variable that was breaking the Linux build
2016-07-19 09:44:03 -07:00
Wanning Jiang
880ee363eb ldb backup support
Summary: add backup support for ldb tool, and use it to run load test for backup on two HDFS envs

Test Plan: first generate some db, then compile against load test in fbcode, run load_test --db=<db path> backup --backup_env_uri=<URI of backup env> --backup_dir=<backup directory> --num_threads=<number of thread>

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60633
2016-07-14 14:09:31 -07:00
Yi Wu
296545a2c7 Fix clang analyzer errors
Summary:
Fixing erros reported by clang static analyzer.
* Removing some unused variables.
* Adding assertions to fix false positives reported by clang analyzer.
* Adding `__clang_analyzer__` macro to suppress false positive warnings.

Test Plan:
    USE_CLANG=1 OPT=-g make analyze -j64

Reviewers: andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60549
2016-07-08 17:50:51 -07:00
sdong
32df9733d1 Add options.write_buffer_manager: control total memtable size across DB instances
Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances.

Test Plan: Add a new unit test.

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59925
2016-07-05 18:11:25 -07:00
John Alexander
43692793e3 Fixed Minor Bug on Windows Build and db_bench_tool.cc (#1189)
* Fixed Windows build error in CMakeLists.txt and perf_level error in db_bench_tool.cc

* Changed hard-coded perf levels in db_bench_tool.cc to enum values from perf_level.h

* Replaced remaining FLAGS_perf_level > 0
2016-06-29 18:44:22 -07:00
Yueh-Hsuan Chiang
faa7eb3b99 Improve regression_test.sh
Summary:
This diff makes the following improvement in regression_test.sh:

1. Add NUM_OPS and DELETE_TEST_PATH to regression_test.sh:

  * NUM_OPS:  The number of operations that will be issued
    in EACH thread.
    Default: $NUM_KEYS / $NUM_THREADS

  * DELETE_TEST_PATH: If true, then the test directory
    will be deleted after the script ends.
    Default: 0
2. Add more information in SUMMARY.csv
3. Fix a bug in regression_test.sh where each thread in fillseq will all issue $NUM_KEYS writes.
4. Add --deletes in db_bench, which allows us to control the number of deletes instead of must using FLAGS_num.

Test Plan: run regression test with and without DELETE_TEST_PATH and NUM_OPS

Reviewers: yiwu, sdong, IslamAbdelRahman, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60039
2016-06-28 14:10:24 -07:00
Islam AbdelRahman
d6b79e2fd0 Remove filter_deletes from crash_test
Summary: filter_deletes option was removed, remove it from crash_test to fix it

Test Plan: make crash_test

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59901
2016-06-21 17:30:21 -07:00
sdong
7b79238b65 Deprectate filter_deletes
Summary: filter_deltes is not a frequently used feature. Remove it.

Test Plan: Run all test suites.

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59427
2016-06-17 10:30:47 -07:00
Andrew Kryczka
ca3db54788 Fetch branches from github for format compatibility test
Summary:
The sandcastle setup doesn't provide a remote with our branches. Need
to fetch them directly from github.

Test Plan:
ran script on devserver and the relevant commands on a sandcastle
host.

Reviewers: IslamAbdelRahman, lightmark, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59511
2016-06-10 13:17:14 -07:00
sdong
20699df843 memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio

Test Plan: Run all existing tests

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59199
2016-06-10 12:12:10 -07:00
Andrew Kryczka
a683d4aba9 URI-based Env selection for db_bench
Summary:
Added an option, --env_uri. When provided, it is used as an argument to
NewEnvFromUri(), which instantiates an Env based on it.

Test Plan:
built a simple binary that registers ChrootEnv for prefix "/", then
ran:

  $ ./tmp --env_uri /tmp/ --db /abcde

/tmp/ is the chroot directory and /abcde is the db_name. Then I verified
db_bench uses /tmp/abcde

Reviewers: sdong, kradhakrishnan, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59325
2016-06-09 17:53:03 -07:00
sdong
2ae15b2d50 Format compatible test should cover forward compatibility up to 4.8.
Test Plan: Run the compatibility test

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59385
2016-06-09 10:22:31 -07:00
Andrew Kryczka
5091dfc1ed use branch names in format compatibility test
Summary:
We had to go back and update the g++ path for 4.4.fb-4.8.fb. So the
path is now fixed on the branches, but can't be fixed on the tags since they're
immutable. By making format compatibility tests use branch names (when
available), backported fixes like this will be used without having to re-release.

Also removed v1.5.7 and v2.1 because make fails.

Test Plan:
  $ build_tools/rocksdb-lego-determinator run_format_compatible

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59355
2016-06-08 15:38:04 -07:00
Yueh-Hsuan Chiang
fda098461b Allow regression_test.sh to specify OPTIONS_FILE. Add header comments.
Summary:
This patch does the following improvement for regression_test.sh
* Allow regression_test.sh to specify OPTIONS_FILE.
* Add header comments that includes examples on how to run the script
  and introduce all configurable parameters.
* bug fix.

Test Plan: Run the example commands in the header comments of regression_test.sh

Reviewers: sdong, yiwu, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59175
2016-06-06 22:57:46 -07:00
Yueh-Hsuan Chiang
88acd932f6 Allows db_bench to take an options file
Summary:
This patch allows db_bench to initialize it's RocksDB Options via a
options file, specified by the --options_file flag.  Note that if
--options_file flag is set, then it has higher priority than the
command-line argument.

Test Plan: db_bench_tool_test

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D58533
2016-06-02 16:24:14 -07:00
Yueh-Hsuan Chiang
8cf0f86d39 Allow regression test to run db_bench at a remost host
Summary:
This patch does the following improvement on the regression_test.sh
* allows db_bench being executed at a remost host while storing the
  benchmark results locally.
* kills all db_bench related processes before running db_bench
* better error handling.

Test Plan:
1. Run regression_test.sh both locally and remotely
2. Run multiple regression_test.sh at the same time and make sure
   i. Only one runs successfully.
   ii. The one that runs successfully will kill all other db_bench
       processes before it runs any benchmark.

Reviewers: sdong, yiwu, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D58611
2016-06-02 02:53:39 -07:00
sdong
c40c4cae14 LDBCommand::SelectCommand to use a struct as the parameter
Summary: The function wrapper for LDBCommand::SelectCommand is too long so that Windows build fails with warning "decorated name length exceeded, name was truncated". Shrink the length by using a struct.

Test Plan: Build on both of Linux and Windows and make sure the warning doesn't show in either platform.

Reviewers: andrewkr, adsharma, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58965
2016-05-31 10:26:53 -07:00
sdong
0e20000171 LDBCommand::InitFromCmdLineArgs() to move from template to function wrapper
Summary:
Build failure with some compiler setting with

tools/reduce_levels_test.cc:97: undefined reference to `rocksdb::LDBCommand* rocksdb::LDBCommand::InitFromCmdLineArgs<rocksdb::LDBCommand* (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, std::vector<std::string, std::allocator<std::string> > const&)>(std::vector<std::string, std::allocator<std::string> > const&, rocksdb::Options const&, rocksdb::LDBOptions const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const*, rocksdb::LDBCommand* (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, std::vector<std::string, std::allocator<std::string> > const&))'

Fix it by changing to function pointer instead

Test Plan: Run all existing tests

Reviewers: andrewkr, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adsharma, lightmark, yiwu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58905
2016-05-27 11:07:48 -07:00
Islam AbdelRahman
9dd50d9902 Fix db_bench
Summary: Fix simple issue with FLAGS_simcache_size condition

Test Plan: run db_bench

Reviewers: lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58743
2016-05-24 17:27:44 -07:00
Aaron Gao
5d660258e7 add simulator Cache as class SimCache/SimLRUCache(with test)
Summary: add class SimCache(base class with instrumentation api) and SimLRUCache(derived class with detailed implementation) which is used as an instrumented block cache that can predict hit rate for different cache size

Test Plan:
Add a test case in `db_block_cache_test.cc` called `SimCacheTest` to test basic logic of SimCache.
Also add option `-simcache_size` in db_bench. if set with a value other than -1, then the benchmark will use this value as the size of the simulator cache and finally output the simulation result.
```
[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 1000000
RocksDB:    version 4.8
Date:       Tue May 17 16:56:16 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       6.809 micros/op 146874 ops/sec;   16.2 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.343 micros/op 157665 ops/sec;   17.4 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 986559
SimCache HITs:    264760
SimCache HITRATE: 26.84%

[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 10000000
RocksDB:    version 4.8
Date:       Tue May 17 16:57:10 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       5.066 micros/op 197394 ops/sec;   21.8 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.457 micros/op 154870 ops/sec;   17.1 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 1059764
SimCache HITs:    374501
SimCache HITRATE: 35.34%

[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 100000000
RocksDB:    version 4.8
Date:       Tue May 17 16:57:32 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       5.632 micros/op 177572 ops/sec;   19.6 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.892 micros/op 145094 ops/sec;   16.1 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 1150767
SimCache HITs:    1034535
SimCache HITRATE: 89.90%
```

Reviewers: IslamAbdelRahman, andrewkr, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57999
2016-05-23 23:35:23 -07:00
Aaron Orenstein
2073cf3775 Eliminate use of 'using namespace std'. Also remove a number of ADL references to std functions.
Summary: Reduce use of argument-dependent name lookup in RocksDB.

Test Plan: 'make check' passed.

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58203
2016-05-20 07:42:18 -07:00
Richard Cairns Jr
f6e404c20a Added "number of merge operands" to statistics in ssts.
Summary:
A couple of notes from the diff:
  - The namespace block I added at the top of table_properties_collector.cc was in reaction to an issue i was having with PutVarint64 and reusing the "val" string.  I'm not sure this is the cleanest way of doing this, but abstracting this out at least results in the correct behavior.
  - I chose "rocksdb.merge.operands" as the property name.  I am open to suggestions for better names.
  - The change to sst_dump_tool.cc seems a bit inelegant to me.  Is there a better way to do the if-else block?

Test Plan:
I added a test case in table_properties_collector_test.cc.  It adds two merge operands and checks to make sure that both of them are reflected by GetMergeOperands.  It also checks to make sure the wasPropertyPresent bool is properly set in the method.

Running both of these tests should pass:
./table_properties_collector_test
./sst_dump_test

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58119
2016-05-19 14:24:48 -07:00
krad
a08c8c851a Added PersistentCache abstraction
Summary:
Added a new abstraction to cache page to RocksDB designed for the read
cache use.

RocksDB current block cache is more of an object cache. For the persistent read cache
project, what we need is a page cache equivalent. This changes adds a cache
abstraction to RocksDB to cache pages called PersistentCache. PersistentCache can cache
uncompressed pages or raw pages (content as in filesystem). The user can
choose to operate PersistentCache either in  COMPRESSED or UNCOMPRESSED mode.

Blame Rev:

Test Plan: Run unit tests

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55707
2016-05-15 22:17:18 -07:00
Arun Sharma
5c06e0814c [ldb] Templatize the Selector
Summary:
So a customized ldb tool can pass it's own Selector.
Such a selector is expected to call LDBCommand::SelectCommand
and then add some of its own customized commands

Test Plan: make ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57249
2016-05-13 12:12:39 -07:00
Arun Sharma
49815e3841 [ldb] Export LDBCommandRunner
Summary:
The implementation remains where it is. Only the
header is exported. This is so that a customized
ldb tool can print help along with its own
extra commands

Test Plan: make ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57255
2016-05-11 13:08:45 -07:00
Andrew Kryczka
5c1c904877 ldb option for compression dictionary size
Summary:
Expose the option so it's easy to run offline tests of compression
dictionary feature.

Test Plan:
verified compression dictionary is loaded into lz4 for below command:

  $ ./ldb compact --compression_type=lz4 --compression_max_dict_bytes=16384 --db=/tmp/feed-compression-test/

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57441
2016-05-10 16:33:47 -07:00
Reid Horuff
0460e9dcce Modification of WriteBatch to support two phase commit
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.

Test Plan: single unit test in write_batch_test.

Reviewers: hermanlee4, sdong, anthony

Subscribers: leveldb, dhruba, vasilep, andrewkr

Differential Revision: https://reviews.facebook.net/D57867
2016-05-10 14:06:07 -07:00
Yueh-Hsuan Chiang
fca5aa6fcc Initial script for the new regression test
Summary:
This diff includes an initial script running a set of benchmarks for
regression test.  The script does the following things:

  checkout the specified rocksdb commit (or origin/master as default)
  make clean && DEBUG_LEVEL=0 make db_bench
  setup test directories
  run set of benchmarks and store results

Currently, the script will run couple benchmarks, store all the benchmark
output, extract micros per op and percentile information for each benchmark
and store them in a single SUMMARY.csv file.  The SUMMARY.csv will make the
follow-up regression detection easier.

In addition, the current script only takes env arguments to set important
attributes of db_bench.  Will follow-up with a patch that allows db_bench
to construct options from an options file.

Test Plan:
NUM_KEYS=100 ./tools/regression_test.sh

  Sample SUMMARY.csv file:

                                     commit id,                      benchmark,  ms-per-op,        p50,        p75,        p99,      p99.9,     p99.99
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                        fillseq,      15.28,      54.66,      77.14,    5000.00,   17900.00,   18483.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                      overwrite,      13.54,      57.69,      86.39,    3000.00,   15600.00,   17013.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     readrandom,       1.04,       0.80,       1.67,     293.33,     395.00,     504.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,               readwhilewriting,       2.75,       1.01,       1.87,     200.00,     460.00,     485.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                   deleterandom,       3.64,      48.12,      70.09,     200.00,     336.67,     347.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     seekrandom,      24.31,     391.87,     513.69,     872.73,     990.00,    1048.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,         seekrandomwhilewriting,      14.02,     185.14,     294.15,     700.00,    1440.00,    1527.00

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr, gunnarku

Reviewed By: gunnarku

Subscribers: gunnarku, MarkCallaghan, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57597
2016-05-09 13:32:57 -07:00
Islam AbdelRahman
e1951b6f28 Add --index_block_restart_interval option in db_bench
Summary:
Pass --index_block_restart_interval flag to block_based_options in db_bench tool.

Test Plan: none

Reviewers: sdong, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57699
2016-05-09 12:09:05 -07:00
Andrew Kryczka
269f6b2e2d Revert "Modification of WriteBatch to support two phase commit"
Summary: Revert D54093 and D57453

Test Plan: running make check

Reviewers: horuff, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57819
2016-05-06 16:58:24 -07:00
Arun Sharma
04dec2a359 [ldb] Export ldb_cmd*.h
Summary:
This is needed so that rocksdb users can add more
commands to the included ldb tool by adding more custom
commands.

Test Plan: make -j ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57243
2016-05-06 16:09:09 -07:00
Mark Callaghan
9790b94c92 Add optimize_filters_for_hits option to db_bench
Summary:
Add optimize_filters_for_hits option to db_bench

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57621
2016-05-05 07:32:10 -07:00
Patrick Chan
cba752d588 sst_dump won't print size for unsupported compression type 2016-05-03 08:46:24 -07:00
Islam AbdelRahman
f3bb024fd6 Fix clang build
Summary: fix clang build

Test Plan: USE_CLANG make all -j64

Reviewers: horuff, sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57453
2016-04-29 15:19:19 -07:00
Reid Horuff
1d2e4ef747 ldb support new WAL records 2016-04-29 11:47:24 -07:00
Islam AbdelRahman
029022b0f1 Fix crash_test
Summary:
crash_test grep for 'fail' string in the output and if found it consider that we failed.
Update the output to use something else

Test Plan: make crash_test (still running)

Reviewers: yhchiang, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57381
2016-04-28 15:59:33 -07:00
Andrew Kryczka
4032145adc Configurable compression in db_bench
Summary:
Made compression type and dictionary size configurable via environment
variables.

Depends on D52287.

Test Plan:
check these options are passed to the db.

  $ COMPRESSION_MAX_DICT_BYTES=65536 COMPRESSION_TYPE=LZ4 NUM_KEYS=10000000 DB_DIR=./tmp/ WAL_DIR=./tmp/ ./tools/benchmark.sh filluniquerandom
  ...
  $ grep Options.compression tmp/LOG
  2016/04/22-19:11:30.397829 7f5f263a2980          Options.compression: LZ4
  ...
  2016/04/22-19:11:30.397837 7f5f263a2980         Options.compression_opts.max_dict_bytes: 65536

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57141
2016-04-27 17:39:18 -07:00
Andrew Kryczka
843d2e3137 Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.

During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.

On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.

Test Plan: new unit test

Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong

Reviewed By: sdong

Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52287
2016-04-27 17:36:03 -07:00
Yueh-Hsuan Chiang
ad573b9027 Temporarily disable CompactFiles in db_stress in its default setting
Summary:
As db_stress with CompactFiles possibly catches a previous bug currently,
temporarily disable CompactFiles in db_stress in its default setting
to allows new bug to be detected while investigating the bug in CompactFiles.

Test Plan: crash test

Reviewers: sdong, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57333
2016-04-27 16:50:51 -07:00
Sergey Makarenko
1c80dfab24 Print memory allocation counters
Summary:
Introduced option to dump malloc statistics using new option flag.
    Added new command line option to db_bench tool to enable this
    funtionality.
    Also extended build to support environments with/without jemalloc.

Test Plan:
1) Build rocksdb using `make` command. Launch the following command
    `./db_bench --benchmarks=fillrandom --dump_malloc_stats=true
    --num=10000000` end verified that jemalloc dump is present in LOG file.
    2) Build rocksdb using `DISABLE_JEMALLOC=1  make db_bench -j32` and ran
    the same db_bench tool and found the following message in LOG file:
    "Please compile with jemalloc to enable malloc dump".
    3) Also built rocksdb using `make` command on MacOS to verify behavior
    in non-FB environment.
    Also to debug build configuration change temporary changed
    AM_DEFAULT_VERBOSITY = 1 in Makefile to see compiler and build
    tools output. For case 1) -DROCKSDB_JEMALLOC was present in compiler
    command line. For both 2) and 3) this flag was not present.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57321
2016-04-27 16:23:33 -07:00
Yueh-Hsuan Chiang
644f978c18 Fix RocksDB Lite build in db_stress
Summary: Fix RocksDB Lite build in db_stress

Test Plan: OPT=-DROCKSDB_LITE db_stress

Reviewers: IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57045
2016-04-21 14:47:23 -07:00
Dmitri Smirnov
ee221d2de0 Introduce XPRESS compresssion on Windows. (#1081)
Comparable with Snappy on comp ratio.
  Implemented using Windows API, does not require external package.
  Avaiable since Windows 8 and server 2012.
  Use -DXPRESS=1 with CMake to enable.
2016-04-19 22:54:24 -07:00
Yueh-Hsuan Chiang
6cbffd50d0 Enable testing CompactFiles in db_stress
Summary:
Enable testing CompactFiles in db_stress by adding flag test_compact_files
to db_stress.

Test Plan:
./db_stress --test_compact_files=1 --compaction_style=0 --allow_concurrent_memtable_write=false --ops_per_thread=100000
./db_stress --test_compact_files=1 --compaction_style=1 --allow_concurrent_memtable_write=false --ops_per_thread=100000

Sample output (note that it's normal to have some CompactFiles() failed):
    Stress Test : 491.891 micros/op 65054 ops/sec
                : Wrote 21.98 MB (0.45 MB/sec) (45% of 3200352 ops)
                : Wrote 1440728 times
                : Deleted 441616 times
                : Single deleted 38181 times
                : 319251 read and 19025 found the key
                : Prefix scanned 640520 times
                : Iterator size sum is 9691415
                : Iterated 319704 times
                : Got errors 0 times
                : 1323 CompactFiles() succeed
                : 32 CompactFiles() failed
    2016/04/11-15:50:58  Verification successful

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56565
2016-04-19 14:36:09 -07:00
Yueh-Hsuan Chiang
a2466c8851 [db_stress] Make subcompaction random in crash_test
Summary: Make subcompaction random in crash_test

Test Plan: make crash_test and verify whether subcompaction changes randomly

Reviewers: IslamAbdelRahman, kradhakrishnan, yiwu, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56571
2016-04-18 14:43:33 -07:00
Andrew Kryczka
c3c389d542 Fix column label for L0 write sum
Summary:
This is taken from the "Write(GB)" column in compaction stats, so the
units should be GB, not MB.

Test Plan: none

Reviewers: sdong, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56889
2016-04-18 14:34:45 -07:00
sdong
4b6833aec1 Rename options.compaction_measure_io_stats to options.report_bg_io_stats and include flush too.
Summary: It is useful to print out IO stats in flush jobs too. Extend options.compaction_measure_io_stats to flush jobs and raname it.

Test Plan: Try db_bench and see the stats are printed out.

Reviewers: yhchiang

Reviewed By: yhchiang

Subscribers: kradhakrishnan, yiwu, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56769
2016-04-15 10:22:18 -07:00
Hyunyoung Lee
71303e04e7 Update db_bench_tool.cc (#1073)
* Update db_bench_tool.cc

I fixed the wrong letters, LevelDB -> rocksDB, because I thought of LevelDB as the wrong presentation.

the following show my fix :

fprintf(stderr, "LevelDB:    version %d.%d\n",
            kMajorVersion, kMinorVersion);

----------------->
fprintf(stderr, "rocksDB:    version %d.%d\n",
            kMajorVersion, kMinorVersion);

* Update db_bench_tool.cc

* Update db_bench_tool.cc
2016-04-12 17:05:09 -04:00
Andrew Kryczka
8e0e22f76b Fix Windows build by replacing strings.h include
Summary:
strings.h header does not exist on Windows. So, we can try another way
to compare strings ignoring case.

Test Plan:
built and ran:

  $ ./ldb_cmd_test

Reviewers: sdong, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56535
2016-04-11 19:21:00 -07:00
Islam AbdelRahman
0522990358 Improve sst_dump help message
Summary:
Current Message

```
sst_dump [--command=check|scan|none|raw] [--verify_checksum] --file=data_dir_OR_sst_file [--output_hex] [--input_key_hex] [--from=<user_key>] [--to=<user_key>] [--read_num=NUM] [--show_properties] [--show_compression_sizes] [--show_compression_sizes [--set_block_size=<block_size>]]
```
New message

```
sst_dump --file=<data_dir_OR_sst_file> [--command=check|scan|raw]
    --file=<data_dir_OR_sst_file>
      Path to SST file or directory containing SST files

    --command=check|scan|raw
        check: Iterate over entries in files but dont print anything except if an error is encounterd (default command)
        scan: Iterate over entries in files and print them to screen
        raw: Dump all the table contents to <file_name>_dump.txt

    --output_hex
      Can be combined with scan command to print the keys and values in Hex

    --from=<user_key>
      Key to start reading from when executing check|scan

    --to=<user_key>
      Key to stop reading at when executing check|scan

    --read_num=<num>
      Maximum number of entries to read when executing check|scan

    --verify_checksum
      Verify file checksum when executing check|scan

    --input_key_hex
      Can be combined with --from and --to to indicate that these values are encoded in Hex

    --show_properties
      Print table properties after iterating over the file

    --show_compression_sizes
      Independent command that will recreate the SST file using 16K block size with different
      compressions and report the size of the file using such compression

    --set_block_size=<block_size>
      Can be combined with --show_compression_sizes to set the block size that will be used
      when trying different compression algorithms
```

Test Plan: none

Reviewers: yhchiang, andrewkr, kradhakrishnan, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56325
2016-04-08 12:05:02 -07:00
Andrew Kryczka
2391ef7214 Embed column family name in SST file
Summary:
Added the column family name to the properties block. This property
is omitted only if the property is unavailable, such as when RepairDB()
writes SST files.

In a next diff, I will change RepairDB to use this new property for
deciding to which column family an existing SST file belongs. If this
property is missing, it will add it to the "unknown" column family (same
as its existing behavior).

Test Plan:
New unit test:

  $ ./db_table_properties_test --gtest_filter=DBTablePropertiesTest.GetColumnFamilyNameProperty

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55605
2016-04-06 23:10:32 -07:00
Andrew Kryczka
f2c43a4a27 Stderr info logger
Summary:
Adapted a stderr logger from the option tests. Moved it to a separate
header so we can reuse it, e.g., from ldb subcommands for faster debugging. This
is especially useful to make errors/warnings more visible when running
"ldb repair", which involves potential data loss.

Test Plan:
ran options_test and "ldb repair"

  $ ./ldb repair --db=./tmp/
  [WARN] **** Repaired rocksdb ./tmp/; recovered 1 files; 588bytes. Some data may have been lost. ****
  OK

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56151
2016-04-01 11:06:06 -07:00
Marton Trencseni
9b51987521 Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes.
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.

Test Plan:
'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
I didn't run the Java tests, I don't have Java set up on my devserver.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56133
2016-04-01 10:42:39 -07:00
Igor Canadi
925b5d0025 Merge pull request #1054 from DCEngines/magic12
Remove the Magic number 12 used in record size checks
2016-03-30 21:38:19 -07:00
Gunnar Kudrjavets
994b3bd693 Add support for UBsan builds to RocksDB
Summary:
Undefined Behavior Sanitizer (ubsan) is //a good thing// which will help us to find sneaky bugs with low cost. Please see http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/ for more details and official GCC documentation for more context: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html.

Changes itself are quite simple and pretty much imitating whatever is implemented for ASan.

Hooking the UBsan validation build to Sandcastle is a separate step and will be dealt as separate diff because code is in internal repository.

Test Plan: Make sure that that there no regressions when it comes to builds and test pass rate.

Reviewers: leveldb, sdong, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56049
2016-03-30 15:59:24 -07:00
Laurent Demailly
21700a5106 to/from hex refactor
Summary:
Expose the inverse of ToString(hex=true) on Slice: Slice::DecodeHex
Refactor the other implementation of to/from hex in ldb_cmd.h to use the Slice
version
(Difference between the 2 is whether 0x is expected/produced in front of the hex
string or not)
Eliminated support for invalid odd length hex string - this is now invalid
instead of having 1/2 byte set
Added (inverse of HexToString) test for LDBCommand::StringToHex which also
indirectly tests Slice::ToString(true)

After moving the original implementation from ldb_cmd.h, updated it to much simpler/efficient version
(originally/inspired from https://github.com/facebook/wdt/blob/master/util/EncryptionUtils.cpp#L140-L169 )

Test Plan: run tests

Reviewers: uddipta, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56121
2016-03-30 14:36:48 -07:00
zensan
78711524b7 In all the places where log records are read, there was a check that
record.size() should not be less than 12.

This "magic number" seems to be the WriteBatch header (8 byte sequence
and 4 byte count).   Replaced all the places where "12" was used
by WriteBatchInternal::kHeader.
2016-03-30 23:05:22 +05:30
sdong
b1fafcaca6 Revert "Adding pin_l0_filter_and_index_blocks_in_cache feature."
This reverts commit 522de4f59e.

It has bug of index block cleaning up.
2016-03-21 11:50:42 -07:00
sdong
43bbb56198 tools/check_format_compatible.sh to use consistent version when testing backward and forward compatibility
Summary: Test seems to fail if we don't use consistent version between testing forward and backward compatibility.

Test Plan: Run the script (with some version removed manually to make it shorter)

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55773
2016-03-21 11:13:26 -07:00
sdong
780d2b04cb Update format compatible checking tool
Summary: After introducing a less forward-compatible change, update the backward compatible checking tool.

Test Plan: Run it.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55695
2016-03-18 14:07:15 -07:00
Marton Trencseni
522de4f59e Adding pin_l0_filter_and_index_blocks_in_cache feature.
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce a TableReader::Close(), to guarantee the order of destruction.

Test Plan:
Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).

DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
  Mac: OK.
  Linux: with D55287 patched in it's OK.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54801
2016-03-17 22:40:01 +00:00
Gunnar Kudrjavets
90aff0c444 Update --max_write_buffer_number for compaction benchmarks
Summary: For compactions benchmarks (both level and universal) we'll use `--max_write_buffer_number=4`. For all the other benchmarks which don't customize the value of `--max_background_flushes` we'll continue using `--max_write_buffer_number=8`.

Test Plan:
To validate basic correctness and command-line options:

```
cd ~/rocksdb
NKEYS=10000000 ./tools/run_flash_bench.sh
```

Reviewers: MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55497
2016-03-17 10:14:23 -07:00
Siying Dong
774922c680 Merge pull request #1026 from SherlockNoMad/Hist
Histogram Concurrency Improvement and Time-Windowing Support
2016-03-15 11:27:54 -07:00
Igor Canadi
17b879b91e Merge pull request #1037 from SherlockNoMad/BuildFix
Fix AppVeyor build error
2016-03-15 11:15:56 -07:00
SherlockNoMad
f11b0df121 Fix AppVeyor build error 2016-03-15 10:57:33 -07:00
Gunnar Kudrjavets
697fab820a Updates to RocksDB subcompaction benchmarking script
Summary: Set of updates to the subcompaction benchmark script which are based on our internal discussions. The intent behind the changes is to make sure that the scripts will correctly reflect how we're doing the actual benchmarking.

Test Plan: Tested by exercising the full set of compaction benchmarks and validating the execution and consistency of results.

Reviewers: MarkCallaghan, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55461
2016-03-14 23:09:04 -07:00
Andrew Kryczka
08304c0867 Expose RepairDB as ldb command
Summary: This will make it easier for admins and devs to use RepairDB.

Test Plan:
Tried deleting the manifest and verified it recovers:

  $ ldb --create_if_missing --db=/tmp/test_db put ok ok
  $ rm -f /tmp/test_db/MANIFEST-000001
  $ ./ldb --db=/tmp/test_db repair
  $ ldb --db=/tmp/test_db get ok
  ok

Reviewers: yhchiang, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55359
2016-03-12 13:50:20 -08:00
SherlockNoMad
54f6b9e162 Histogram Concurrency Improvement and Time-Windowing Support 2016-03-11 16:54:25 -08:00
agiardullo
790252805d Add multithreaded transaction test
Summary: Refactored db_bench transaction stress tests so that they can be called from unit tests as well.

Test Plan: run new unit test as well as db_bench

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55203
2016-03-11 15:16:52 -08:00
Igor Canadi
ee8cc35201 Merge pull request #938 from alexander-fenster/master
added --no_value option to ldb scan to dump key only
2016-03-10 16:44:52 -08:00
Alexander Fenster
f0161c37b0 formatting fix 2016-03-10 13:34:42 -08:00
Gunnar Kudrjavets
68189f7e1b Update benchmarks used to measure subcompaction performance
Summary: After closely working with Mark, Siying, and Yueh-Hsuan this set of changes reflects the updates needed to measure RocksDB subcompaction performance in a correct manner. The essence of the benchmark is executing `fillrandom` followed by `compact` with the correct set of options for various number of subcompactions specified.

Test Plan: Tested internally to verify correctness and reliability.

Reviewers: sdong, yhchiang, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D55089
2016-03-04 12:32:11 -08:00
Jonathan Wiepert
871cc5f987 fix build without gflags
Test Plan:
Built and ran with gflags:
% ./db_bench
LevelDB:    version 4.5
Date:       Tue Feb 16 12:04:23 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
...

And without gflags:
% ./db_bench
Please install gflags to run rocksdb tools
%

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D54243
2016-02-16 12:16:47 -08:00
Jonathan Wiepert
7bd284c374 Separeate main from bench functionality to allow cusomizations
Summary: Isolate db_bench functionality from main so custom benchmark code can be written and managed

Test Plan:
Tested commands
./build_tools/regression_build_test.sh
./db_bench --db=/tmp/rocksdbtest-12321/dbbench --stats_interval_seconds=1 --num=1000
./db_bench --db=/tmp/rocksdbtest-12321/dbbench --stats_interval_seconds=1 --num=1000 --reads=500 --writes=500
./db_bench --db=/tmp/rocksdbtest-12321/dbbench --stats_interval_seconds=1 --num=1000 --merge_keys=100 --numdistinct=100 --num_column_families=3 --num_hot_column_families=1
./db_bench --stats_interval_seconds=1 --num=1000 --bloom_locality=1 --seed=5 --threads=5
./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --usee_uint64_comparator=true --batch-size=5
./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --use_uint64_comparator=true --batch_size=5
./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --usee_uint64_comparator=true --batch-size=5

Test Results - https://phabricator.fb.com/P56130387

Additional tests for:
./db_bench --duration=60 --value_size=50 --seek_nexts=10 --reverse_iterator=true --use_uint64_comparator=true --batch_size=5 --key_size=8 --merge_operator=put
./db_bench --stats_interval_seconds=1 --num=1000 --bloom_locality=1 --seed=5 --threads=5 --merge_operator=uint64add

Results: https://phabricator.fb.com/P56130607

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D53991
2016-02-16 06:17:31 -08:00
Gunnar Kudrjavets
337671b688 Add universal compaction benchmarks to run_flash_bench.sh
Summary:
Implement a benchmark for universal compaction based on the feature description (see below), in-person discussions, and reading source code:

https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide
https://github.com/facebook/rocksdb/wiki/Universal-Compaction
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#universal-compaction

Universal compaction benchmark is based on `overwrite` benchmark, adding compaction specific options to it, and executing it for different values of subcompaction to understand the impact of scaling out subcompactions for a particular scenario.

Test Plan:
  - Execute the benchmark on various machines for multiple iterations to verify the reliability.
  - Observe the output to make sure that compaction is taking place.
  - Observe the execution to make sure that arguments passed to `db_bench` are correct.

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54045
2016-02-10 15:30:47 -08:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
sdong
2608219cc9 crash_test: cover concurrent memtable insert in default crash test
Summary: Default crash test uses prefix hash memtable, which is not compatible to concurrent memtable. Allow prefix test run with skip list and use skip list memtable when concurrent insert is used.

Test Plan: Run "python -u tools/db_crashtest.py whitebox" and watch sometimes skip list is used.

Reviewers: anthony, yhchiang, kradhakrishnan, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53907
2016-02-09 08:02:38 -08:00
sdong
b1887c5dd9 Explictly fail when memtable doesn't support concurrent insert
Summary: If users turn on concurrent insert but the memtable doesn't support it, they might see unexcepted crash. Fix it by explicitly fail.

Test Plan:
Run different setting of stress_test and make sure it fails correctly.
Will add a unit test too.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, andrewkr, ngbronson

Reviewed By: ngbronson

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53895
2016-02-05 14:15:50 -08:00
Gunnar Kudrjavets
8ed3438778 Add option to run fillseq with WAL enabled in addition to WAL disabled
Summary: This set of changes is part of the work to introduce benchmark for universal style compaction in RocksDB. It's conceptually separate from the compaction work, so sending it out as a separate diff to get it out of the way.

Test Plan:
  - Run `./tools/run_flash_bench.sh`.
  - Look at the contents of `report.txt` and `report2.txt` to make sure that data is reported and attributed correctly.
  - During `db_bench` execution time make sure that the correct flags are passed to `--disable_wal` depending on the benchmark being executed.

Reviewers: MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53865
2016-02-05 13:20:56 -08:00
sdong
34a40bf911 Add --allow_concurrent_memtable_write in stress test and run it in crash_test
Summary: Add an option of --allow_concurrent_memtable_write in stress test and cover it in crash test

Test Plan: Run crash test and make sure three combinations of the two options show up randomly.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr, anthony, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53811
2016-02-04 16:15:18 -08:00
Gunnar Kudrjavets
a09ce4fcd3 Skip some of the non-critical tests in ./tools/run_flash_bench.sh
Summary:
Some of the tests aren't considered to be critical when it comes to getting key benchmarking data for RocksDB. Therefore we'll introduce an environment variable `SKIP_LOW_PRI_TESTS` which enables skipping those test cases. By default all the tests will be run. If you want to optimize the test-case execution then do the following:

`
$ export SKIP_LOW_PRI_TESTS=1
$ ./tools/run_flash_bench.sh
`

Test Plan: Verified that when  `SKIP_LOW_PRI_TESTS` is not set then `benchmark.sh` is called for all the scenarios and when `SKIP_LOW_PRI_TESTS` is set to `1` then `benchmark.sh` is called only for the test-cases which are critical.

Reviewers: MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53739
2016-02-03 09:56:56 -08:00
sdong
38e1d7fea3 ldb to support --column_family option
Summary:
Add an option --column_family option, so that users can query or update specific column family.
Also add an create column family parameter to make unit test easier.
Still need to add unit tests.

Test Plan: Will add a test case in ldb python test.

Reviewers: yhchiang, rven, andrewkr, IslamAbdelRahman, kradhakrishnan, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53265
2016-01-25 14:58:18 -08:00
sdong
fdbff42391 Crash test to make kill decision for every kill point
Summary:
In crash test, when coming to each kill point, we start a random class using seed as current second. With this approach, for every second, the random number used is the same. However, in each second, there are multiple kill points with different frequency. It makes it hard to reason about chance of kill point to trigger. With this commit, we use thread local random seed to generate the random number, so that it will take different values per second, hoping it makes chances of killing much easier to reason about.

Also significantly reduce the kill odd to make sure time before kiling is similar as before.

Test Plan: Run white box crash test and see the killing happens as expected and the run time time before killing reasonable.

Reviewers: kradhakrishnan, IslamAbdelRahman, rven, yhchiang, andrewkr, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52971
2016-01-19 18:11:24 -08:00
sdong
b54d4dd435 tools/sst_dump_tool_imp.h not to depend on "util/testutil.h"
Summary:
util/testutil.h doesn't seem to be used in tools/sst_dump_tool_imp.h. Remove it.
Also move some other include to tools/sst_dump_tool.cc instead.

Test Plan: Build with GCC, CLANG and with GCC 4.81 and 4.9.

Reviewers: yuslepukhin, yhchiang, rven, anthony, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52791
2016-01-13 11:34:53 -08:00
Alexander Fenster
e16438bb86 fixing build warning 2016-01-11 11:23:33 -08:00
Alexander Fenster
b73fbbaf64 added --no_value option to ldb scan to dump key only 2016-01-11 10:51:42 -08:00
Gunnar Kudrjavets
b1a3b4c0d0 Make ldb automagically determine the file type and use the correct dumping function
Summary:
This set of changes implements the following design: `ldb` will utilize `--path` parameter which can be used to specify a file name. Tool will then apply some heuristic to determine how to output the data properly. The design decision is not to probe the file content, but use file names to determine what dumping function to call.

Usage examples:

Understands that path points to a manifest file and dumps it.
`./ldb --path=/tmp/test_db/MANIFEST-000023 dump`

Understands that path points to a WAL file and dumps it.
`./ldb --path=/tmp/test_db/000024.log dump --header`

Understands that path points to a SST file and dumps it.
`./ldb --path=/tmp/test_db/000007.sst dump`

Figures out that none of the supported file types are applicable and outputs
an appropriate error message.
`./ldb --path=/tmp/cron.log dump`

Test Plan:
Basics:

git diff
make clean
make -j 32 commit-prereq
arc lint

More specific testing (done as part of commit-prereq, but can be iterated separately when making isolated changes):

make clean
make ldb
python tools/ldb_test.py
make rocksdb_dump
make rocksdb_undump
sh tools/rocksdb_dump_test.sh

Reviewers: rven, IslamAbdelRahman, yhchiang, kradhakrishnan, anthony, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52269
2016-01-06 14:19:08 -08:00
Mark Callaghan
4041903ecd Enhance db_bench write rate limit
Summary:
1) changes tools/{benchmark,run_flash_bench}.sh to optionally use the write rate limit
2) removes code for --writes_per_second and switches the 'background' write rate limit
to use --benchmark_write_rate_limit

Replaces https://reviews.facebook.net/D49113

Task ID: #9555881

Blame Rev:

Test Plan:
tools/run_flash_bench.sh

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D52485
2016-01-04 12:01:27 -08:00
Siying Dong
298ba27ae2 Merge pull request #846 from yuslepukhin/enble_c4244_lossofdata
Enable MS compiler warning c4244.
2015-12-23 22:59:42 -08:00
Andrew Kryczka
e089db40f9 Skip bottom-level filter block caching when hit-optimized
Summary:
When Get() or NewIterator() trigger file loads, skip caching the filter block if
(1) optimize_filters_for_hits is set and (2) the file is on the bottommost
level. Also skip checking filters under the same conditions, which means that
for a preloaded file or a file that was trivially-moved to the bottom level, its
filter block will eventually expire from the cache.

- added parameters/instance variables in various places in order to propagate the config ("skip_filters") from version_set to block_based_table_reader
- in BlockBasedTable::Rep, this optimization prevents filter from being loaded when the file is opened simply by setting filter_policy = nullptr
- in BlockBasedTable::Get/BlockBasedTable::NewIterator, this optimization prevents filter from being used (even if it was loaded already) by setting filter = nullptr

Test Plan:
updated unit test:

  $ ./db_test --gtest_filter=DBTest.OptimizeFiltersForHits

will also run 'make check'

Reviewers: sdong, igor, paultuckfield, anthony, rven, kradhakrishnan, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D51633
2015-12-23 10:15:07 -08:00
Dmitri Smirnov
236fe21c92 Enable MS compiler warning c4244.
Mostly due to the fact that there are differences in sizes of int,long
  on 64 bit systems vs GNU.
2015-12-11 16:47:34 -08:00
charsyam
c30b499541 fix typos in comments 2015-12-11 01:54:48 +09:00
yuslepukhin
78de0c9222 Fix up VS 15 build.
Fix warnings
 Take advantage of native snprintf on VS 15
2015-12-08 08:38:21 -08:00
Islam AbdelRahman
88e0527724 Reduce moving memory in LDB::ScanCommand
Summary:
Based on https://github.com/facebook/rocksdb/issues/843
It looks that when the data is hot we spend significant amount of time moving data out of RocksDB blocks. This patch reduce moving memory when possible

Original performance
```
$ time ./ldb --db=/home/tec/local/ellina_test/testdb scan > /dev/null
real	0m16.736s
user	0m11.993s
sys	0m4.725s
```

Performance after reducing memcpy
```
$ time ./ldb --db=/home/tec/local/ellina_test/testdb scan > /dev/null
real	0m11.590s
user	0m6.983s
sys	0m4.595s
```

Test Plan:
dump the output of the scan into 2 files and verifying the are exactly the same
make check

Reviewers: sdong, yhchiang, anthony, rven, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D51093
2015-11-19 22:26:37 -08:00
sdong
51fce92e11 "ldb compact" should force bottommost level compaction
Summary: Now "ldb compact" skips the bottommost level compaction. This is an unintended behavior change. Reverting it now. Maybe we need to add another mode later for it.

Test Plan: Run a manual test of 'ldb' to make sure bottom most level is compacted.

Reviewers: IslamAbdelRahman, yhchiang, anthony, kradhakrishnan, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D50925
2015-11-17 18:07:11 -08:00
Dmitri Smirnov
ee2c3236dd Fix compilation problem on Windows.
char is not a valid template parameter for std::uniform_int_distribution
  according to the standard. Replacing with int should be just fine.
2015-10-29 11:29:18 -07:00
Igor Canadi
c97667d9f1 Fix RocksDB lite build for write_stress
Summary: We don't have access to GetLiveFilesMetadata() in RocksDB lite. If compiling write_stress for lite, I skip the check for leaked files, which depends on this function.

Test Plan: OPT=-DROCKSDB_LITE m write_stress

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49647
2015-10-28 16:37:39 -07:00
Igor Canadi
4b66d95344 Write stress test
Summary:
The goal of this diff is to create a simple stress test with focus on catching:
* bugs in compaction/flush processes, especially the ones that cause assertion errors
* bugs in the code that deletes obsolete files

There are two parts of the test:
* write_stress, a binary that writes to the database
* write_stress_runner.py, a script that invokes and kills write_stress

Here are some interesting parts of write_stress:
* Runs with very high concurrency of compactions and flushes (32 threads total) and tries to create a huge amount of small files
* The keys written to the database are not uniformly distributed -- there is a 3-character prefix that mutates occasionally (in prefix mutator thread), in such a way that the first character mutates slower than second, which mutates slower than third character. That way, the compaction stress tests some interesting compaction features like trivial moves and bottommost level calculation
* There is a thread that creates an iterator, holds it for couple of seconds and then iterates over all keys. This is supposed to test RocksDB's abilities to keep the files alive when there are references to them.
* Some writes trigger WAL sync. This is stress testing our WAL sync code.
* At the end of the run, we make sure that we didn't leak any of the sst files

write_stress_runner.py changes the mode in which we run write_stress and also kills and restarts it. There are some interesting characteristics:
* At the beginning we divide the full test runtime into smaller parts -- shorter runtimes (couple of seconds) and longer runtimes (100, 1000) seconds
* The first time we run write_stress, we destroy the old DB. Every next time during the test, we use the same DB.
* We can run in kill mode or clean-restart mode. Kill mode kills the write_stress violently.
* We can run in mode where delete_obsolete_files_with_fullscan is true or false
* We can run with low_open_files mode turned on or off. When it's turned on, we configure table cache to only hold a couple of files -- that way we need to reopen files every time we access them.

Another goal was to create a stress test without a lot of parameters. So tools/write_stress_runner.py should only take one parameter -- runtime_sec and it should figure out everything else on its own.

In a separate diff, I'll add this new test to our nightly legocastle runs.

Test Plan:
The goal of this test was to retroactively catch the following bugs: D33045, D48201, D46899, D42399. I failed to reproduce D48201, but all others have been caught!

When i reverted https://reviews.facebook.net/D33045:

     ./write_stress --runtime_sec=200 --low_open_files_mode=true
     Iterator statuts not OK: IO error: /fast-rocksdb-tmp/rocksdb_test/write_stress/089166.sst: No such file or directory

When i reverted https://reviews.facebook.net/D42399:

    python tools/write_stress_runner.py --runtime_sec=5000
    Running write_stress, will kill after 5 seconds: ./write_stress --runtime_sec=-1
    Running write_stress, will kill after 2 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false --delete_obsolete_files_with_fullscan=true
    Running write_stress, will kill after 7 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false
    Running write_stress, will kill after 5 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false
    Running write_stress, will kill after 8 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false --low_open_files_mode=true
    Write to DB failed: IO error: /fast-rocksdb-tmp/rocksdb_test/write_stress/019250.sst: No such file or directory
    ERROR: write_stress died with exitcode=-6

When i reverted https://reviews.facebook.net/D46899:

    python tools/write_stress_runner.py --runtime_sec=1000
    runtime: 1000
    Going to execute write stress for [3, 3, 100, 3, 2, 100, 1, 788]
    Running write_stress for 3 seconds: ./write_stress --runtime_sec=3 --low_open_files_mode=true
    Running write_stress for 3 seconds: ./write_stress --runtime_sec=3 --destroy_db=false --delete_obsolete_files_with_fullscan=true
    Running write_stress, will kill after 100 seconds: ./write_stress --runtime_sec=-1 --destroy_db=false --delete_obsolete_files_with_fullscan=true
    write_stress: db/db_impl.cc:2070: void rocksdb::DBImpl::MarkLogsSynced(uint64_t, bool, const rocksdb::Status&): Assertion `log.getting_synced' failed.
    ERROR: write_stress died with exitcode=-6

Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, sdong, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49533
2015-10-28 16:15:07 -07:00
sdong
ab0f3b964f crash_test to trigger some less frequent crash point more frequently
Summary: crash_test still has a very low chance to hit some crash point. Have another mode for covering them more likely.

Test Plan: Run crash_test and see db_stress is called with expected prameters.

Reviewers: kradhakrishnan, igor, anthony, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49473
2015-10-27 12:06:06 -07:00
Siying Dong
138876a62c Merge pull request #746 from ceph/wip-recycle
Add Options.recycle_log_file_num for Recycling WAL Files
2015-10-26 15:01:28 -07:00
Shusen Liu
d0d13ebf67 fix bug in db_crashtest.py
Summary:
in tools/db_crashtest.py, cmd_params['db'] by default is a lambda expression, not the actual db_name.
fix by get the db_name before passing it to gen_cmd.

Test Plan: run `make crashtest`

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D49119
2015-10-20 22:01:11 -07:00
Shusen Liu
033c6f1add T7916298, bug fix
Summary: dbname => cmd_params['db']

Test Plan: Run `make crash_test`

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D49077
2015-10-19 21:09:35 -07:00
Shusen Liu
4575de5b9e #7916298: merge tools/db_crashtest2.py into tools/db_crashtest.py
Summary:
merge tools/db_crashtest2.py into tools/db_crashtest.py

python tools/db_crashtest.py -h  # show help message, ALL parameters can be overwrite by arguments

Example usages:
python tools/db_crashtest.py blackbox  # run blackbox with default parameters
python tools/db_crashtest.py blackbox --simple
python tools/db_crashtest.py whitebox  # run whitebox with default parameters
python tools/db_crashtest.py whitebox --simple

all default parameters are identical to previous version.

Test Plan: `make crash_test` and make sure it can run with expected parameters pased to db_stress.

Reviewers: igor, rven, anthony, IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48567
2015-10-19 13:24:55 -07:00
Sage Weil
3ac13c99d1 log_reader: pass log_number and optional info_log to ctor
We will need the log number to validate the recycle-style CRCs.  The log
is helpful for debugging, but optional, as not all callers have it.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-10-18 21:24:32 -04:00
sdong
f9ba79ecd6 crash_test to trigger fail points other than file appending more frequently
Summary:
For half of the crash_test run, disable fail point for file appending, in order to trigger other fail point more frequently.
Also, tune crash test parameter a little bit for it to initialize faster.

Test Plan: Run crash_test and make sure it issues db_stress commands as expected.

Reviewers: yhchiang, igor, rven, anthony, IslamAbdelRahman, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48843
2015-10-16 11:35:27 -07:00
sdong
680156ca61 crash_test to run with data sync on
Summary: Mode of data sync off is a much less used than the case of data sync on. Crash test should cover the more common case than a corner case. So turn data sync on in crash tests.

Test Plan: Run crash test and make sure it can run with expected parameters pased to db_stres.

Reviewers: IslamAbdelRahman, anthony, igor, rven, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48729
2015-10-15 14:37:08 -07:00
sdong
e1a5ff857b Allow users to disable some kill points in db_stress
Summary:
Give a name for every kill point, and allow users to disable some kill points based on prefixes. The kill points can be passed by db_stress through a command line paramter. This provides a way for users to boost the chance of triggering low frequency kill points
This allow follow up changes in crash test scripts to improve crash test coverage.

Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
 --kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync

Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48735
2015-10-15 14:33:13 -07:00
Islam AbdelRahman
6d730b4ae7 Block tests under ROCKSDB_LITE
Summary:
This patch will block all tests (not including db_test) that don't compile / fail under ROCKSDB_LITE

Test Plan:
OPT=-DROCKSDB_LITE make db_compaction_filter_test -j64 &&
OPT=-DROCKSDB_LITE make db_compaction_test -j64 &&
OPT=-DROCKSDB_LITE make db_dynamic_level_test -j64 &&
OPT=-DROCKSDB_LITE make db_log_iter_test -j64 &&
OPT=-DROCKSDB_LITE make db_tailing_iter_test -j64 &&
OPT=-DROCKSDB_LITE make db_universal_compaction_test -j64 &&
OPT=-DROCKSDB_LITE make ldb_cmd_test -j64

make clean

make db_compaction_filter_test -j64 &&
make db_compaction_test -j64 &&
make db_dynamic_level_test -j64 &&
make db_log_iter_test -j64 &&
make db_tailing_iter_test -j64 &&
make db_universal_compaction_test -j64 &&
make ldb_cmd_test -j64

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48723
2015-10-15 10:51:00 -07:00
Venkatesh Radhakrishnan
63e507c59c Move ldb and sst_dump from utils to tools.
Summary: As part of cleaning up dependencies for tech debt week, we are moving ldb and sst_dump tools from util to tools, since they are tools.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48747
2015-10-14 17:08:28 -07:00
Islam AbdelRahman
9babaeed16 Update dump_tool and undump_tool to accept Options
Summary:
Refactor dump_tool and undump_tool so that it's possible to use them with customized options
for example setting a specific comparator similar to what Dragon is doing with the LdbTool

https://phabricator.fb.com/diffusion/FBCODE/browse/master/dragon/tools/Ldb.cpp

Test Plan:
compiles
used it to dump / undump a dragon shard

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, adsharma

Differential Revision: https://reviews.facebook.net/D47853
2015-10-05 19:49:48 -07:00
Yueh-Hsuan Chiang
a263002a36 Fixed a tsan warning in db_stress.cc
Summary:
Fixed the following tsan warning in db_stress.cc

  WARNING: ThreadSanitizer: data race (pid=3163194)
  Read of size 8 at 0x7fd1797cb518 by thread T32:
    #0 VerifyDb tools/db_stress.cc:1731 (db_stress+0x000000040674)
    #1 rocksdb::StressTest::ThreadBody(void*) tools/db_stress.cc:1191 (db_stress+0x0000000625a9)
    #2 StartThreadWrapper util/env_posix.cc:1648 (db_stress+0x00000028bbbd)

  Previous write of size 8 at 0x7fd1797cb518 by thread T31:
    #0 VerifyDb tools/db_stress.cc:1726 (db_stress+0x00000004072a)
    #1 rocksdb::StressTest::ThreadBody(void*) tools/db_stress.cc:1191 (db_stress+0x0000000625a9)
    #2 StartThreadWrapper util/env_posix.cc:1648 (db_stress+0x00000028bbbd)

The cause is that in VerifyDb(), the static local const variable long max_key
can be read and written at the same time.  This patch fixed it by making it
non-static.

Test Plan: db_stress

Reviewers: igor, sdong, IslamAbdelRahman, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47703
2015-09-28 12:06:43 -07:00
Andres Noetzli
014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
Dmitry Marakasov
4b0b0201c9 Fix `integer overflow in expression' error 2015-09-15 14:41:00 +03:00
Amit Arya
7a31960ee9 Tests for ManifestDumpCommand and ListColumnFamiliesCommand
Summary:
Added tests for two LDBCommands namely i) ManifestDumpCommand and ii) ListColumnFamiliesCommand.
+ Minor fix in the sscanf formatter (along relace C cast with C++ cast) + replacing localtime with localtime_r which is thread safe.

Test Plan: make all && ./tools/ldb_test.py

Reviewers: anthony, igor, IslamAbdelRahman, kradhakrishnan, lgalanis, rven, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45819
2015-09-08 14:23:42 -07:00
sdong
7a0dbdf3ac Add ZSTD (not final format) compression type
Summary: Add ZSTD compression type. The same way as adding LZ4.

Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests.

Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor

Reviewed By: igor

Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45747
2015-08-28 11:01:13 -07:00
Mark Callaghan
4c81ac0c59 Fix benchmark report script
Summary:
db_bench output now displays Percentile many times with --statistics after
read IO latency histograms were added. So I only need the last one in the report output.

Task ID: #

Blame Rev:

Test Plan:
run run_flash_bench.sh

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D45093
2015-08-22 12:18:00 -07:00
Ari Ekmekji
b6def58f73 Changed 'num_subcompactions' to the more accurate 'max_subcompactions'
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.

I also added a --subcompactions option to db_bench

Test Plan: make all   make check

Reviewers: sdong, igor, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D45069
2015-08-21 14:25:34 -07:00
Mark Callaghan
41a0e2811d Improve defaults for benchmarks
Summary:
Changes include:
* don't sync-on-commit for single writer thread in readwhile... tests
* make default block size 8kb rather than 4kb to avoid too small blocks after compression
* use snappy instead of zlib to avoid stalls from compression latency
* disable statistics
* use bytes_per_sync=8M to reduce throughput loss on disk
* use open_files=-1 to reduce mutex contention

Task ID: #

Blame Rev:

Test Plan:
run benchmark

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44961
2015-08-20 18:59:10 -07:00
Ari Ekmekji
f0da6977a3 [Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State
Summary:
In prepration for running multiple threads at the same time during
a compaction job, this patch assigns each subcompaction its own state
(instead of sharing the one global CompactionState). Each subcompaction then
uses this state to update its statistics, keep track of its snapshots, etc.
during the course of execution. Then at the end of all the executions the
statistics are aggregated across the subcompactions so that the final result
is the same as if only one larger compaction had run.

Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test

Reviewers: sdong, anthony, igor, noetzli, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43239
2015-08-18 11:06:23 -07:00
Andres Notzli
4249f159d5 Removing duplicate code in db_bench/db_stress, fixing typos
Summary:
While working on single delete support for db_bench, I realized that
db_bench/db_stress contain a bunch of duplicate code related to
copmression and found some typos. This patch removes duplicate code,
typos and a redundant #ifndef in internal_stats.cc.

Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress

Reviewers: yhchiang, sdong, rven, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43965
2015-08-11 11:46:15 -07:00
sdong
cf3e05304f crash_test cleans up directory before testing if TEST_TMPDIR is set
Summary: In a recent change, crash_test can put data under TEST_TMPDIR. However, the directory is not cleaned before running the test, which may cause unexpected results. Clean it.

Test Plan: Run white and black box crash test against non-existing, or non-empty but not compactible DBs, and make sure it works as expected.

Reviewers: kradhakrishnan, rven, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43515
2015-08-04 14:59:28 -07:00
sdong
e2a3bfe74b First half of whitebox_crash_test to keep crashing the same DB
Summary: Currently, whitebox crash test is not really executed, because the DB is destroyed after each crash. With this fix, in the first half of the time, DB will keep opening the crashed DB and continue from there.

Test Plan: "make whitebox_crash_test" and see the same DB keeps crashing and being reopened.

Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43503
2015-08-04 12:16:44 -07:00
sdong
2e73bd4ff2 crash_test to put DB under TEST_TMPDIR
Summary: Currently crash_test only puts data under /tmp. It is less flexible if we want to cover different file systems or media. Make crash_test to appreciate TEST_TMPDIR so that users can run it against another file system.

Test Plan: Run blackbox_crash_test and whitebox_crash_test with or without TEST_TMPDIR set and make sure DBs are put in the right place

Reviewers: kradhakrishnan, yhchiang, rven, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43509
2015-08-04 12:10:55 -07:00
sdong
1205bdbcee crash_test to cover simply cases
Summary:
crash_test now only runs complicated options, multiple column families, prefix hash, frequently changing options, many compaction threads, etc. These options are good to cover new features but we loss coverage in most common use cases. Furthermore, by running only for multiple column families, we are not able to create LSM trees that are large enough to cover some stress cases.
Make half of crash_test runs the simply tests: single column family, default mem table, one compaction thread, no change options.

Test Plan: Run crash_test

Reviewers: rven, yhchiang, IslamAbdelRahman, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43461
2015-08-04 12:08:38 -07:00
Andres Noetzli
bd852bf118 Fixed typos in db_stress
Summary: Fixed typos.

Test Plan: None

Reviewers: igor, yhchiang, sdong, anthony, rven

Reviewed By: rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43365
2015-07-31 14:11:43 -07:00
sdong
7bfae3a723 tools/db_crashtest2.py should run on the same DB
Summary:
Crash tests are supposed to restart the same DB after crashing, but it is now opening a different DB. Fix it.
It's probably a leftover of https://reviews.facebook.net/D17073

Test Plan: Run the test and make sure the same Db is opened.

Reviewers: kradhakrishnan, rven, igor, IslamAbdelRahman, yhchiang, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43197
2015-07-29 15:50:37 -07:00
Dmitri Smirnov
31b35c902e Add missing tests, fix db_sanity
Add heap_test, merge_helper_test
 Fix uninitialized pointers in db_sanity_test that cause SIGSEV when DB::Open fails in case compression is not linked.
2015-07-21 18:04:28 -07:00
Igor Canadi
35ca59364c Don't let flushes preempt compactions
Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.

We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.

Test Plan: make check (there might be some unit tests that depend on this behavior)

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41931
2015-07-17 12:02:52 -07:00
Igor Canadi
e94c510c3f Make ldb_test not depend on compression
Summary: This is failing our tsan tests. Our new behavior is to fail DB::Open() if the requested compression is not available. The easiest fix is to make ldb_test not depend on compression.

Test Plan: python tools/ldb_test.py

Reviewers: sdong, yhchiang, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42075
2015-07-14 23:13:23 -07:00
Igor Canadi
a9c5109515 Deprecate purge_redundant_kvs_while_flush
Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.

Test Plan: none

Reviewers: sdong, yhchiang, anthony, dhruba

Reviewed By: dhruba

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D42063
2015-07-14 13:07:02 +02:00
Islam AbdelRahman
e290f5d3ce Block reduce_levels_test in ROCKSDB_LITE
Summary: Block reduce_levels_test in ROCKSDB_LITE as LDBCommand is not supported

Test Plan:
make reduce_levels_test -j64
OPT=-DROCKSDB_LITE make reduce_levels_test -j64
make check -j64

Reviewers: sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D41967
2015-07-13 18:45:55 -07:00
sdong
f9728640f3 "make format" against last 10 commits
Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more.

Test Plan: Build it.

Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D41961
2015-07-13 13:50:18 -07:00
Dmitri Smirnov
9dbde7277c Merge remote-tracking branch 'origin' into ms_win_port 2015-07-02 11:34:22 -07:00
Dmitri Smirnov
18285c1e2f Windows Port from Microsoft
Summary: Make RocksDb build and run on Windows to be functionally
 complete and performant. All existing test cases run with no
 regressions. Performance numbers are in the pull-request.

 Test plan: make all of the existing unit tests pass, obtain perf numbers.

 Co-authored-by: Praveen Rao praveensinghrao@outlook.com
 Co-authored-by: Sherlock Huang baihan.huang@gmail.com
 Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
 Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
2015-07-01 16:13:56 -07:00
Igor Canadi
619167ee66 Fix mac compile
Summary: as title

Test Plan: make check

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D40785
2015-06-26 10:29:24 -07:00
Michael Callahan
15325bf55b First version of rocksdb_dump and rocksdb_undump.
Summary: Hack up rocksdb_dump and rocksdb_undump utilities to get this task rolling/promote discussion.

Test Plan: Dump/undump databases recursively to see if nothing is lost.

Reviewers: sdong, yhchiang, rven, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37269
2015-06-19 16:24:36 -07:00
Yueh-Hsuan Chiang
2e764f06ea [API Change] Improve EventListener::OnFlushCompleted interface
Summary:
EventListener::OnFlushCompleted() now passes a structure instead
of a list of parameters.  This minimizes the API change in the
future.

Test Plan:
listener_test
compact_files_test
example/compact_files_example

Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor

Reviewed By: rven, igor

Subscribers: IslamAbdelRahman, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39543
2015-06-05 12:28:51 -07:00
Yueh-Hsuan Chiang
fc83821270 Add EventListener::OnTableFileCreated()
Summary:
Add EventListener::OnTableFileCreated(), which will be called
when a table file is created.  This patch is part of the
EventLogger and EventListener integration.

Test Plan: Augment existing test in db/listener_test.cc

Reviewers: anthony, kradhakrishnan, rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38865
2015-06-02 14:12:23 -07:00
Yueh-Hsuan Chiang
16c197627a Fixed db_stress
Summary:
Fixed db_stress by correcting the verification of column family
names in the Listener of db_stress

Test Plan: db_stress

Reviewers: igor, sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39255
2015-05-30 14:26:00 -07:00
Yueh-Hsuan Chiang
832271f6b1 Fixed a compile warning in db_stress in NDEBUG mode.
Summary: Fixed a compile warning in db_stress in NDEBUG mode.

Test Plan: make OPT=-DNDEBUG db_stress

Reviewers: sdong, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39213
2015-05-29 15:00:25 -07:00
agiardullo
dc9d70de65 Optimistic Transactions
Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.

Test Plan: Added a new test, but still need to look into stress testing.

Reviewers: yhchiang, igor, rven, sdong

Reviewed By: sdong

Subscribers: adamretter, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33435
2015-05-29 14:36:35 -07:00
Yueh-Hsuan Chiang
d5a0c0e69b Fixed a compile warning in db_stress
Summary:
Fixed the following compile warning in db_stress:
error: 'OnCompactionCompleted' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]

Test Plan: make db_stress

Reviewers: sdong, igor, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39207
2015-05-29 13:37:59 -07:00
Yueh-Hsuan Chiang
9ffc8ba024 Include EventListener in stress test.
Summary: Include EventListener in stress test.

Test Plan: make blackbox_crash_test whitebox_crash_test

Reviewers: anthony, igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39105
2015-05-29 13:17:49 -07:00
agiardullo
c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
Aaron Schlesinger
6116ccc232 moving dockerfile to root 2015-05-22 16:06:53 -07:00
Aaron Schlesinger
d90cee9fd3 adding docker build script and dockerfile 2015-05-22 16:03:39 -07:00
Mark Callaghan
88044340c1 Add Size-GB column to benchmark reports
Summary:
See https://gist.github.com/mdcallag/b867ee051d765760be0d for a sample

Task ID: #

Blame Rev:

Test Plan:
Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37971
2015-05-02 07:46:12 -07:00