Commit Graph

808 Commits

Author SHA1 Message Date
Michael Liu
ca89ac2ba9 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14090024

fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
2019-02-14 14:41:36 -08:00
Yanqin Jin
4fc442029a Avoid using kInAtomicGroup tag for single-cf op (#4981)
Summary:
if an operation just involves a single column family, then we do
not have to set the kInAtomicGroup tag when writing to MANIFEST. This change
can fix a compatibility test failure, i.e. 5.15 and earlier cannot recognize
kInAtomicGroup tag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4981

Differential Revision: D14072687

Pulled By: riversand963

fbshipit-source-id: 46b0c61e399f16c6b7169de0b33430d0ed90d6d4
2019-02-13 18:33:42 -08:00
Andrew Kryczka
62f70f6d14 Reduce scope of compression dictionary to single SST (#4952)
Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.

So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:

- The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
- After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952

Differential Revision: D13967980

Pulled By: ajkr

fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
2019-02-11 19:47:32 -08:00
Andrew Kryczka
1218704b61 Fix compression_zstd_max_train_bytes coverage in stress test (#4957)
Summary:
Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957

Differential Revision: D13994302

Pulled By: ajkr

fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65
2019-02-11 14:56:39 -08:00
Siying Dong
cf3a671733 Remove cuckoo hash memtable (#4953)
Summary:
Cuckoo Hash is less useful than we initially expected. Remove it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953

Differential Revision: D13979264

Pulled By: siying

fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed
2019-02-07 16:15:27 -08:00
Siying Dong
d9c9f3c809 db_bench: fix "micros/op" reporting (#4949)
Summary:
4985a9f73b (diff-e5276985b26a0551957144f4420a594bR511)
changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why.
Considering that this is a counter-intuitive reporting, Reverting the change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949

Differential Revision: D13964684

Pulled By: siying

fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc
2019-02-05 17:20:02 -08:00
Alexander Zinoviev
32a6dd9a41 Add a new CPU time counter to compaction report (#4889)
Summary:
Measure CPU time consumed for a compaction and report it in the stats report
Enable NowCPUNanos() to work for MacOS
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889

Differential Revision: D13701276

Pulled By: zinoale

fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055
2019-01-29 17:24:00 -08:00
zhichao-cao
e2547103fd Fix the build error caused by the dynamic array (#4918)
Summary:
In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256.

Tested with make check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918

Differential Revision: D13844298

Pulled By: sagar0

fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab
2019-01-28 12:39:57 -08:00
Andrew Kryczka
e242fa4664 Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923)
Summary:
- When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1`
- It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090
- I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923

Differential Revision: D13827226

Pulled By: ajkr

fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947
2019-01-28 11:26:32 -08:00
Sagar Vemuri
0cead31d10 Fix Clang static analyzer warning in db_bench (#4910)
Summary:
Fixed clang static analyzer warning about division by 0.
```
ar: creating librocksdb_debug.a
tools/db_bench_tool.cc:4650:43: warning: Division by zero
      int pos = static_cast<int>(rand_num % range_);
                                 ~~~~~~~~~^~~~~~~~
1 warning generated.
make: *** [analyze] Error 1
```

This is from the new code I recently merged in ce8e88d2d7.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910

Differential Revision: D13788037

Pulled By: sagar0

fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea
2019-01-23 13:33:02 -08:00
Zhongyi Xie
cbe0239270 add cast to avoid loss of precision error (#4906)
Summary:
this PR address the following error:
> tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
        s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size));
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906

Differential Revision: D13780185

Pulled By: miasantreble

fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3
2019-01-22 22:44:17 -08:00
Zhichao Cao
ce8e88d2d7 Generate mixed workload with Get, Put, Seek in db_bench (#4788)
Summary:
Based on the specific workload models (key access distribution, value size distribution, and iterator scan length distribution, the QPS variation), the MixGraph benchmark generate the synthetic workload according to these distributions which can reflect the real-world workload characteristics.

After user enable the tracing function, they will get the trace file. By analyzing the trace file with the trace_analyzer tool, user can generate a set of statistic data files including. The *_accessed_key_stats.txt,  *-accessed_value_size_distribution.txt, *-iterator_length_distribution.txt, and *-qps_stats.txt are mainly used to fit the Matlab model fitting. After that, user can get the parameters of the workload distributions (the modeling details are described: [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer))

The key access distribution follows the The two-term power model. The probability density function is: `f(x) = ax^{b}+c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench

For the value size distribution and iterator scan length distribution, they both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)(1+k*(x-theta)/sigma))^{-1-1/k)`. The parameters are: value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, users can find the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and [Matalb page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html)

As for the QPS, it follows the diurnal pattern. So Sine is a good model to fit it. `F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d`. The trace_will tell you the average QPS in the print out resutls, which is sine_d. After user fit the "*-qps_stats.txt" to the Matlab model, user can get the sine_a, sine_b, and sine_c. By using the 4 parameters, user can control the QPS variation including the period, average, changes.

To use the bench mark, user can indicate the following parameters as examples:
```
-benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788

Differential Revision: D13573940

Pulled By: sagar0

fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222
2019-01-22 10:44:26 -08:00
Andrew Kryczka
01013ae766 Digest ZSTD compression dictionary once when writing SST file (#4849)
Summary:
This is essentially a re-submission of #4251 with a few improvements:

- Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
- Eliminated `Init` functions. Instead do all initialization work in constructors.
- Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849

Differential Revision: D13606039

Pulled By: ajkr

fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
2019-01-18 19:12:57 -08:00
Siying Dong
4e37251b4d With ldb --try_load_options and wal_dir doesn't exist, ignore it (#4875)
Summary:
LDB is frequently used to exam data copied. wal_dir in option file is not modified and it usually points to the path it copied from.
The user experience will be better if when ldb sees wal_dir pointed by the option file doesn't exist, rather than fail, just ignore it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4875

Differential Revision: D13643173

Pulled By: siying

fbshipit-source-id: 2e64d4ea2ec49a6794b9a706b7fc1ba901128bb8
2019-01-11 16:48:32 -08:00
Yanqin Jin
ffc9f84649 Free memory after use
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4857

Differential Revision: D13602688

Pulled By: riversand963

fbshipit-source-id: 993419a6afb982a7a701ff71daebebb4b4a6b265
2019-01-08 17:19:09 -08:00
Yanqin Jin
e686caffec Remove unnecessary assersion in AtomicFlushStressTest::TestCheckpoint (#4846)
Summary:
as titled.
We can remove the assersion because we do not perform verification in
AtomicFlushStressTest::TestCheckpoint for similar reasons to TestGet, TestPut,
etc.
Therefore, we override TestCheckpoint in AtomicFlushStressTest so that the
assertion `rand_column_families.size() == rand_keys.size()' is removed, and we
do not verify the DB in this function.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4846

Differential Revision: D13583377

Pulled By: riversand963

fbshipit-source-id: 03647f3da67e27a397413fd666e3bb43003bf596
2019-01-07 16:47:26 -08:00
Huachao Huang
74f7d7551e tools: use provided options instead of the default (#4839)
Summary:
The current implementation hardcode the default options in different
places, which makes it impossible to support other environments (like
encrypted environment).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4839

Differential Revision: D13573578

Pulled By: sagar0

fbshipit-source-id: 76b58b4b758902798d10ff2f52d9f39abff015e7
2019-01-03 11:23:49 -08:00
Yanqin Jin
565b5bdc42 Add support for read-only db chkpt stress (#4690)
Summary:
Updated stress test will support testing of db in read-only mode.
The user has to make sure that only read/scan operations are enabled.
This PR relies on #4681.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4690

Differential Revision: D13102741

Pulled By: riversand963

fbshipit-source-id: f5a36b34db187fe12dd355f7eda161f99d6c75e4
2019-01-02 17:40:53 -08:00
Andrew Kryczka
ace543a815 fix accounting for range tombstones in TableProperties (#4841)
Summary:
- To be consistent with the accounting of other optypes in `TableProperties`, we should count range tombstones in `TableProperties::num_entries` and `TableProperties::num_deletions`.
- Updated assertions in stress test's `OnTableFileCreated` handler to accept files with range tombstones only.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4841

Differential Revision: D13568424

Pulled By: ajkr

fbshipit-source-id: 0139d7806494eda20ece67ec460d2458dbbf6026
2019-01-02 15:08:53 -08:00
Andrew Kryczka
68d949b3e3 Enable DeleteRange in stress/crash tests (#4483)
Summary:
Set `delrangepercent=1` when `test_batches_snapshots=false`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483

Differential Revision: D10324361

Pulled By: ajkr

fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23
2018-12-18 13:42:49 -08:00
Fosco Marotto
311cd8cf2f Updated benchmark script (#4134)
Summary:
When producing the updated performance on flash results for the wiki, these are the updates which were made.

https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4134

Differential Revision: D13491052

Pulled By: gfosco

fbshipit-source-id: dcd92f24659e0917cb1ac54a4446aa8e7aac8b0d
2018-12-17 16:34:30 -08:00
Andrew Kryczka
8d2b74d287 Refine db_stress params for atomic flush (#4781)
Summary:
Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781

Differential Revision: D13463211

Pulled By: ajkr

fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd
2018-12-13 22:10:38 -08:00
DorianZheng
2670fe8c73 Get CompactionJobInfo from CompactFiles
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716

Differential Revision: D13207677

Pulled By: ajkr

fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b
2018-12-13 14:21:24 -08:00
Sagar Vemuri
70645355ad Move FIFOCompactionPicker to a separate file (#4724)
Summary:
**Summary:**
Simplified the code layout by moving FIFOCompactionPicker to a separate file.
**Why?:**
While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724

Differential Revision: D13227914

Pulled By: sagar0

fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
2018-11-29 16:04:52 -08:00
Sagar Vemuri
c94f073e5e Fix Mac build break in casting (#4722)
Summary:
Mac build is failing with the below error:
```
$ make db_bench -j8
...
...
tools/db_bench_tool.cc:4583:25: error: no matching function for call to 'max'
              (uint64_t)std::max(0l, seek_pos - FLAGS_max_scan_distance),
                        ^~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2717:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('long' vs. 'long long')
max(const _Tp& __a, const _Tp& __b)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2727:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against 'long'
max(initializer_list<_Tp> __t, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2709:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
max(const _Tp& __a, const _Tp& __b, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2735:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided
max(initializer_list<_Tp> __t)
^
1 error generated.
make: *** [tools/db_bench_tool.o] Error 1
```

My compiler version:
Mac OS X Mojave
```
$ clang++ --version
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4722

Differential Revision: D13220196

Pulled By: sagar0

fbshipit-source-id: 01e5e928288a5613027c83a26ad8aedf04438b14
2018-11-27 13:30:16 -08:00
Huachao Huang
5e72bc113a Add SstFileReader to read sst files (#4717)
Summary:
A user friendly sst file reader is useful when we want to access sst
files outside of RocksDB. For example, we can generate an sst file
with SstFileWriter and send it to other places, then use SstFileReader
to read the file and process the entries in other ways.

Also rename the original SstFileReader to SstFileDumper because of
name conflict, and seems SstFileDumper is more appropriate for tools.

TODO: there is only a very simple test now, because I want to get some feedback first.
If the changes look good, I will add more tests soon.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717

Differential Revision: D13212686

Pulled By: ajkr

fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
2018-11-27 13:02:23 -08:00
Po-Chuan Hsieh
60deb4485e Fix build with ROCKSDB_LITE and -Wunused-private-field (#4715)
Summary:
The error message of databases/rocksdb-lite (FreeBSD port) is as follows:
```
  tools/db_bench_tool.cc:1976:16: error: private field 'trace_options_' is not used [-Werror,-Wunused-private-field]
    TraceOptions trace_options_;
                 ^
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4715

Differential Revision: D13207902

Pulled By: ajkr

fbshipit-source-id: be3c612eba656aeddb77e35e2f201dd25dc92f7e
2018-11-26 21:35:38 -08:00
Abhishek Madan
0ed738fdd0 Add max_scan_distance flag to db_bench (#4660)
Summary:
The new flag makes it possible to constrain iterator traversal
by the upper/lower bound the iterator is expected to pass. This allows
seekrandom results to be more easily comparable between DBs with and
without deletions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4660

Differential Revision: D13053111

Pulled By: abhimadan

fbshipit-source-id: 33e250f2e2d210b54c7726399da30a33f723c33c
2018-11-14 10:46:12 -08:00
Yanqin Jin
de65103553 Improve result report of scan (#4648)
Summary:
When iterator becomes invalid, there are two possibilities.
First, all data in the column family have been scanned and there is nothing
more to scan.
Second, an underlying error has occurred, causing `status()` to be !ok.
Therefore, we need to check for both cases when `!iter->Valid()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4648

Differential Revision: D12959601

Pulled By: riversand963

fbshipit-source-id: 49c9382c9ea9e78f2e2b6f3708f0670b822ca8dd
2018-11-13 20:03:59 -08:00
Zhichao Cao
d761857d56 Add unique key number changing statistics to Trace_analyzer (#4646)
Summary:
Changes:
1. in current version, key size distribution is printed out as the result. In this change, the result will be output to a file to make further analyze easier
2. To understand how the unique keys are accessed over time, the total unique key number of each CF of each query type in each second over time is output to a file. In this way, user could know when the unique keys are accessed frequently or accessed rarely.
3. output the total QPS of each CF to a file
4. Add the print result of total queries of each CF of each query type.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4646

Differential Revision: D12968156

Pulled By: zhichao-cao

fbshipit-source-id: 6c411c7ec47c7843a70929136efd71a150db0e4c
2018-11-12 08:26:50 -08:00
Sagar Vemuri
dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Andrew Kryczka
8ba17f382e Verify restore from backup in db_stress (#4655)
Summary:
We already exercised backup functionality in `db_stress` according to the `-backup_one_in` flag. This PR verifies the backup can be restored/opened and sanity checks a few keys. Changes in this PR:

- Extracted existing backup-related logic to a helper function, `TestBackupRestore`
- Added restore logic, which targets a hidden directory named "./.restore\<thread number\>", similar to how backups target hidden directories named "./.backup\<thread number\>".
- After restore, check the existence/non-existence of a few keys.
- With this PR, backup is no longer compatible with clearing column families.
- Also included unrelated fixes to set `ReadOptions::total_order_seek=true` when using `-compare_full_db_state_snapshot`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4655

Differential Revision: D12972496

Pulled By: ajkr

fbshipit-source-id: 481a40052d9a38d1bd5c5159aa4d7c5a4b546b80
2018-11-08 15:15:24 -08:00
Yanqin Jin
d7a04383d1 Include newer RocksDB versions in compat test (#4634)
Summary:
Include 5.16 and 5.17 in check_format_compatible.sh
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4634

Differential Revision: D12947140

Pulled By: riversand963

fbshipit-source-id: 79852b76d5139b2f31db59ed14cb368be01f2c32
2018-11-06 14:25:39 -08:00
Yanqin Jin
50895e5f0d Update manual flush stress test (#4608)
Summary:
Originally, the manual flush calls in db_stress flushes only a single column
family, which is not sufficient when atomic flush is enabled.
With atomic flush, we should call `Flush(flush_opts, cfhs)` to better test this
new feature. Specifically, we manuall flush all column families so that
database verification is easier.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4608

Differential Revision: D12849160

Pulled By: riversand963

fbshipit-source-id: ae1f0dd825247b42c0aba520a5c967335102c876
2018-10-30 17:30:28 -07:00
Abhishek Madan
eaaf1a6f05 Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594)
Summary:
Since the number of range deletions are reported in
TableProperties, it is confusing to not report the number of merge
operands and point deletions as top-level properties; they are
accessible through the public API, but since they are not the "main"
properties, they do not appear in aggregated table properties, or the
string representation of table properties.

This change promotes those two property keys to
`rocksdb/table_properties.h`, adds corresponding uint64 members for
them, deprecates the old access methods `GetDeletedKeys()` and
`GetMergeOperands()` (though they are still usable for now), and removes
`InternalKeyPropertiesCollector`. The property key strings are the same
as before this change, so this should be able to read DBs written from older
versions (though I haven't tested this yet).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594

Differential Revision: D12826893

Pulled By: abhimadan

fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
2018-10-30 15:34:27 -07:00
Yanqin Jin
912bbbbc72 Enable crash-recovery stress test for atomic flush (#4605)
Summary:
This PR adds test of atomic flush to our continuous stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605

Differential Revision: D12840607

Pulled By: riversand963

fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9
2018-10-30 14:03:36 -07:00
Yanqin Jin
7fb39f1ae1 Fix a warning against implicit type conversion (#4593)
Summary:
Test plan
```
$USE_CLANG=1 make -j32 all check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4593

Differential Revision: D12811159

Pulled By: riversand963

fbshipit-source-id: 5e3bbe058c5a8d5a286a19d7643593fc154a2d6d
2018-10-29 09:54:36 -07:00
Yanqin Jin
5b4c709fad Enable atomic flush (#4023)
Summary:
Adds a DB option `atomic_flush` to control whether to enable this feature. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4023

Differential Revision: D8518381

Pulled By: riversand963

fbshipit-source-id: 1e3bb33e99bb102876a31b378d93b0138ff6634f
2018-10-26 15:08:43 -07:00
Zhongyi Xie
fe0d23059d Fix two contrun job failures (#4587)
Summary:
Currently there are two contrun test failures:
* rocksdb-contrun-lite:
> tools/db_bench_tool.cc: In function ‘int rocksdb::db_bench_tool(int, char**)’:
tools/db_bench_tool.cc:5814:5: error: ‘DumpMallocStats’ is not a member of ‘rocksdb’
     rocksdb::DumpMallocStats(&stats_string);
     ^
make: *** [tools/db_bench_tool.o] Error 1
* rocksdb-contrun-unity:
> In file included from unity.cc:44:0:
db/range_tombstone_fragmenter.cc: In member function ‘void rocksdb::FragmentedRangeTombstoneIterator::FragmentTombstones(std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice> >, rocksdb::SequenceNumber)’:
db/range_tombstone_fragmenter.cc:90:14: error: reference to ‘ParsedInternalKeyComparator’ is ambiguous
   auto cmp = ParsedInternalKeyComparator(icmp_);

This PR will fix them
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4587

Differential Revision: D10846554

Pulled By: miasantreble

fbshipit-source-id: 8d3358879e105060197b1379c84aecf51b352b93
2018-10-24 20:16:45 -07:00
Yi Wu
0415244bfa option to print malloc stats at the end of db_bench (#4582)
Summary:
Option to print malloc stats to stdout at the end of db_bench. This is different from `--dump_malloc_stats`, which periodically print the same information to LOG file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4582

Differential Revision: D10520814

Pulled By: yiwu-arbug

fbshipit-source-id: beff5e514e414079d31092b630813f82939ffe5c
2018-10-24 11:39:05 -07:00
Simon Grätzer
f959e88048 Fix printf formatting on MacOS (#4533)
Summary:
On MacOS with clang the compilation of _tools/db_bench_tool.cc_ always fails because the format used in a `fprintf` call has the wrong type. This PR should hopefully fix this issue
```
tools/db_bench_tool.cc:4233:61: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long')
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4533

Differential Revision: D10471657

Pulled By: maysamyabandeh

fbshipit-source-id: f20f5f3756d3571b586c895c845d0d4d1e34a398
2018-10-19 14:46:09 -07:00
Yanqin Jin
da4aa59b4c Add read retry support to log reader (#4394)
Summary:
Current `log::Reader` does not perform retry after encountering `EOF`. In the future, we need the log reader to be able to retry tailing the log even after `EOF`.

Current implementation is simple. It does not provide more advanced retry policies. Will address this in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4394

Differential Revision: D9926508

Pulled By: riversand963

fbshipit-source-id: d86d145792a41bd64a72f642a2a08c7b7b5201e1
2018-10-19 11:53:00 -07:00
Abhishek Madan
35cd754a6d Add writes_before_delete_range flag to db_bench (#4538)
Summary:
The new flag allows tombstones to be generated after enough
keys have been written to the database, which makes it easier to ensure
that tombstones cover a lot of keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4538

Differential Revision: D10455685

Pulled By: abhimadan

fbshipit-source-id: f25d5421745a353c830dea12b79784e852056551
2018-10-18 17:19:59 -07:00
Zhongyi Xie
d6ec288703 Add PerfContextByLevel to provide per level perf context information (#4226)
Summary:
Current implementation of perf context is level agnostic. Making it hard to do performance evaluation for the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level.
This will be helpful when analyzing point and range query performance as well as tuning bloom filter
Also replaced __thread with thread_local keyword for perf_context
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226

Differential Revision: D10369509

Pulled By: miasantreble

fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
2018-10-17 11:19:40 -07:00
Young Tack Jin
c648d90f8e benchmark.sh: to fix divide by zero runtime error (#4442)
Summary:
"Write (GB)" of $9 rather than "Rnp1 (GB)" of $8
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4442

Differential Revision: D10318193

Pulled By: yiwu-arbug

fbshipit-source-id: 03a7ef1938d9332e06fb3fd8490ca212f61fac6b
2018-10-10 21:03:19 -07:00
Zhichao Cao
7ca1a1f0d8 Fix trace_analyzer potential huge memory wasting due to no valid query analyzed (#4473)
Summary:
If the query types being analyzed do not appear in the trace, the current trace_analyzer will use 0 as the begin time, which create the time duration from 1970/01/01 to the now time. It will waste huge memory. Fixed by adding the trace_create_time to limit the duration.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4473

Differential Revision: D10246204

Pulled By: zhichao-cao

fbshipit-source-id: 42850b080b2e62f586fe73afd7737c2246d1a8c8
2018-10-10 10:00:00 -07:00
Igor Canadi
1cf5deb8fd Introduce CacheAllocator, a custom allocator for cache blocks (#4437)
Summary:
This is a conceptually simple change, but it touches many files to
pass the allocator through function calls.

We introduce CacheAllocator, which can be used by clients to configure
custom allocator for cache blocks. Our motivation is to hook this up
with folly's `JemallocNodumpAllocator`
(f43ce6d686/folly/experimental/JemallocNodumpAllocator.h),
but there are many other possible use cases.

Additionally, this commit cleans up memory allocation in
`util/compression.h`, making sure that all allocations are wrapped in a
unique_ptr as soon as possible.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437

Differential Revision: D10132814

Pulled By: yiwu-arbug

fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
2018-10-02 17:24:58 -07:00
Andrew Kryczka
d56070d875 Fix benchmark script with vector memtable (#4428)
Summary:
I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default.

Fixes #4413.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428

Differential Revision: D10036452

Pulled By: ajkr

fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183
2018-09-26 13:22:45 -07:00
Abhishek Madan
519f8b145f Generate appropriate number of keys in db_bench (#4404)
Summary:
If range tombstones are generated every few writes, the
KeyGenerator's limit is now extended to account for the additional
Next() calls. This is primarily important for `filluniquerandom`
benchmarks that enforce the call limit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404

Differential Revision: D9949326

Pulled By: abhimadan

fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38
2018-09-19 16:28:21 -07:00
Zhongyi Xie
9b3cf908a6 add missing range in random.choice argument (#4397)
Summary:
This will fix the broken asan crash test:
> Traceback (most recent call last):
  File "tools/db_crashtest.py", line 384, in <module>
    main()
  File "tools/db_crashtest.py", line 368, in main
    parser.add_argument("--" + k, type=type(v() if callable(v) else v))
  File "tools/db_crashtest.py", line 59, in <lambda>
    "index_block_restart_interval": lambda: random.choice(1, 16),
TypeError: choice() takes exactly 2 arguments (3 given)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397

Differential Revision: D9933041

Pulled By: miasantreble

fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
2018-09-19 12:13:20 -07:00
Maysam Yabandeh
a0ebec3804 Extend crash test with index_block_restart_interval (#4383)
Summary:
The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383

Differential Revision: D9887304

Pulled By: maysamyabandeh

fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
2018-09-18 15:43:29 -07:00
Andrew Kryczka
8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
Anand Ananthabhotla
a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurance of an out of space error during compaction,
subsequent
  compactions will be delayed until the disk free space check indicates
  enough available space. The required space is computed as the sum of
  input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occured
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it in write-only mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00
Dmitri Smirnov
879998b369 Adjust c test and fix windows compilation issues
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4369

Differential Revision: D9844200

Pulled By: sagar0

fbshipit-source-id: 0d9f5f73b28234eaac55d3551ce4e2dc177af138
2018-09-14 20:57:22 -07:00
Andrew Kryczka
c94523ee56 Delete code for WAL reader to start at nonzero offset (#4362)
Summary:
The code is dead in RocksDB as `log::Reader::initial_offset_` is always zero. We should delete it so we don't have to maintain it like in #4359.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4362

Differential Revision: D9817829

Pulled By: ajkr

fbshipit-source-id: 474a2c679e5bd273b40608f3a5332931d9eefe6d
2018-09-13 17:13:03 -07:00
kckjn97
902261519e correct mistyped msg. (#4341)
Summary:
corrected the mistyped message.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4341

Differential Revision: D9816571

Pulled By: ajkr

fbshipit-source-id: 1df0424e981a01470a638a37b925c4133d59a48b
2018-09-13 14:57:38 -07:00
Maysam Yabandeh
3f5282268f Skip concurrency control during recovery of pessimistic txn (#4346)
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346

Differential Revision: D9759149

Pulled By: maysamyabandeh

fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
2018-09-10 16:57:53 -07:00
Andrew Kryczka
2c14662213 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347)
Summary:
Reverting is needed to unblock a user building against master, who is blocked for multiple days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so reverting.

Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347

Differential Revision: D9668365

Pulled By: ajkr

fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd
2018-09-06 09:58:34 -07:00
Andrew Kryczka
1a88c43751 Reduce empty SST creation/deletion in compaction (#4336)
Summary:
This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.

Also fixed an uninitialized variable in db_stress.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336

Differential Revision: D9600080

Pulled By: ajkr

fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
2018-08-31 12:28:52 -07:00
Zhongyi Xie
1cf17ba53b Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323)
Summary:
Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc defined `DecodeCFAndKey` under anonymous namespace. It is supposed to be fine except unity test will dump all source files together and now we have a conflict.
Another issue with trace_analyzer_tool.cc is that it is using some utility functions from ldb_cmd which is not included in Makefile for unity_test, I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323

Differential Revision: D9599170

Pulled By: miasantreble

fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
2018-08-30 18:42:51 -07:00
Shrikanth Shankar
4848bd0c4e Drop unnecessary deletion markers during compaction (issue - 3842) (#4289)
Summary:
This PR fixes issue 3842. We drop deletion markers iff
1. We are the bottom most level AND
2. All other occurrences of the key are in the same snapshot range as the delete

I've also enhanced db_stress_test to add an option that does a full compare of the keys. This is done by a single thread (thread # 0). For tests I've run (so far)

make check -j64
db_stress
db_stress  --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify that new code doesnt break existing tests */
./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify new test code */
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289

Differential Revision: D9491165

Pulled By: shrikanthshankar

fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
2018-08-24 15:17:54 -07:00
Yanqin Jin
8022500ecc Add compatibility test of SST ingestion (#4310)
Summary:
Test plan
```
$cd rocksdb/
$./tools/check_format_compatible.sh
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310

Differential Revision: D9498125

Pulled By: riversand963

fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24
2018-08-24 14:27:43 -07:00
Andrew Kryczka
e7bb8e9b92 Fix clang build of db_stress (#4312)
Summary:
Blame: #4307
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312

Differential Revision: D9494093

Pulled By: ajkr

fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
2018-08-23 21:57:57 -07:00
Andrew Kryczka
6c40806e51 Digest ZSTD compression dictionary once per SST file (#4251)
Summary:
In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.

ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file's created `ZSTD_freeCDict` releases the resources held by the digested dictionary.

There are a couple other changes included in this PR:

- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether dictionary should be used. I felt that mutation was error-prone so eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251

Differential Revision: D9257078

Pulled By: ajkr

fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
2018-08-23 19:28:18 -07:00
Andrew Kryczka
ee234e83e3 Invoke OnTableFileCreated for empty SSTs (#4307)
Summary:
The API comment on `OnTableFileCreationStarted` (b6280d01f9/include/rocksdb/listener.h (L331-L333)) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.

This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307

Differential Revision: D9485201

Pulled By: ajkr

fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
2018-08-23 18:27:30 -07:00
zhichao-cao
cf7150ac2e Add the unit test of Iterator to trace_analyzer_test (#4282)
Summary:
Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282

Differential Revision: D9436758

Pulled By: zhichao-cao

fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
2018-08-23 17:28:32 -07:00
Yanqin Jin
bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Fenggang Wu
9d646a6311 Add db_bench options of data block hash index (#4281)
Summary:
Add `--data_block_index_type` and `--data_block_hash_table_util_ratio` option to `db_bench`.

`--data_block_index_type` can be either of `binary` (default) or `binary_and_hash`;
`--data_block_hash_table_util_ratio` will be a double. The default value is `0.75`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4281

Differential Revision: D9361476

Pulled By: fgwu

fbshipit-source-id: dc53e01acef9db81b9eec5e8a96f3bc8ed718c10
2018-08-16 18:42:46 -07:00
Siying Dong
9c0c8f5ff6 GetAllKeyVersions() to take an extra argument of max_num_ikeys. (#4271)
Summary:
Right now, `ldb idump` may have memory out of control if there is a big range of tombstones. Add an option to cut maxinum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271

Differential Revision: D9369149

Pulled By: siying

fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf
2018-08-16 15:57:08 -07:00
Zhichao Cao
8ae2bf5331 Fix the build and test bugs in the Trace_analyzer (#4274)
Summary:
The wrong options are used in the trace_analyzer_test, removed. The potential loses integer precision are fixed.

Pass the specified testing case, make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4274

Reviewed By: yiwu-arbug

Differential Revision: D9327811

Pulled By: zhichao-cao

fbshipit-source-id: d62cb18d6586503a490cd323bfc1c672b68b346e
2018-08-14 18:27:48 -07:00
Anand Ananthabhotla
bf07e90cf2 Fix db_stress assertion failures on 0 byte SSTs (#4273)
Summary:
In the OnTableFileCreation() listener, assert on various TableProperties
only when file size > 0 bytes. The listener can get called even for 0
byte SSTs which have been deleted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4273

Differential Revision: D9322738

Pulled By: anand1976

fbshipit-source-id: 17cdfb3d0da946b9a158d7328e5db1c87973956b
2018-08-14 14:58:26 -07:00
Maysam Yabandeh
d122025891 Extend stress test to format_version 4 (#4265)
Summary:
Stress tests currently cover format_version 2 and 3. The patch adds 4 as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265

Differential Revision: D9323185

Pulled By: maysamyabandeh

fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57
2018-08-14 14:13:33 -07:00
Zhichao Cao
999d955e4f RocksDB Trace Analyzer (#4091)
Summary:
A framework of trace analyzing for RocksDB

After collecting the trace by using the tool of [PR #3837](https://github.com/facebook/rocksdb/pull/3837). User can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
**Input:**
1. trace file
2. Whole keys space file

**Statistics:**
1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
2. Key hotness (access count) of each one
3. Key space separation based on given prefix
4. Key size distribution
5. Value size distribution if appliable
6. Top K accessed keys
7. QPS statistics including the average QPS and peak QPS
8. Top K accessed prefix
9. The query correlation analyzing, output the number of X after Y and the corresponding average time
    intervals

**Output:**
1. key access heat map (either in the accessed key space or whole key space)
2. trace sequence file (interpret the raw trace file to line base text file for future use)
3. Time serial (The key space ID and its access time)
4. Key access count distritbution
5. Key size distribution
6. Value size distribution (in each intervals)
7. whole key space separation by the prefix
8. Accessed key space separation by the prefix
9. QPS of each operation and each column family
10. Top K QPS and their accessed prefix range

**Test:**
1. Added the unit test of analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge
2. Generated the trace and analyze the trace

**Implemented but not tested (due to the limitation of trace_replay):**
1. Analyzing Iterator, supporting Seek() and SeekForPrev() analyzing
2. Analyzing the number of Key found by Get

**Future Work:**
1.  Support execution time analyzing of each requests
2.  Support cache hit situation and block read situation of Get
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091

Differential Revision: D9256157

Pulled By: zhichao-cao

fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
2018-08-13 11:44:02 -07:00
Yanqin Jin
1b1d264342 Remove an assersion about file size (#4268)
Summary:
Due to 4ea56b1bd0, we should also remove the
assersion in stress test. This removal can be temporary, and we can add it back
once we figure out the reason for the 0-byte SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4268

Differential Revision: D9297186

Pulled By: riversand963

fbshipit-source-id: cebba9a68f42e815f8cf24471176d2cfdf962f63
2018-08-13 11:12:50 -07:00
Yanqin Jin
b271f956c2 Fix a TSAN failure (#4250)
Summary:
TSAN fails due to comparison between signed int and unsigned long. Fix it by
static_casting.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4250

Differential Revision: D9256535

Pulled By: riversand963

fbshipit-source-id: c6bad23ff70c6d0ec58e2e85c401ce0ad45de609
2018-08-09 19:42:32 -07:00
Dmitri Smirnov
ab22cf349e Implement Env::NumFileLinks (#4221)
Summary:
Although delete scheduler implementation allows for the interface not to be supported, the delete_scheduler_test does not allow for that.
Address compiler warnings
Make sst_dump_test use test directory structure as the current execution directory may not be writiable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4221

Differential Revision: D9210152

Pulled By: siying

fbshipit-source-id: 381a74511e969ecb8089d5c4b4df87dc30c8df63
2018-08-09 14:29:11 -07:00
Yanqin Jin
de7f423a82 Add SST ingestion to ldb (#4205)
Summary:
We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205

Differential Revision: D9112711

Pulled By: riversand963

fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4
2018-08-09 14:29:11 -07:00
Andrew Kryczka
7a9a164276 Fix db_bench default compression level (#4248)
Summary:
db_bench's previous default compression level (-1) was not the default compression level in all libraries. In particular, in ZSTD negative values are valid compression levels, while ZSTD's default compression level is three.

This PR changes db_bench's default to be RocksDB's library-independent default compression level (see #3895). I also changed a couple other flags to get their default values from an options object directly rather than hardcoding.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4248

Differential Revision: D9235140

Pulled By: ajkr

fbshipit-source-id: be4e0722d59fa1968832183db36d1d20fcf11e5b
2018-08-09 10:28:14 -07:00
Andrew Kryczka
6175b4b294 Support dictionary compression in stress/crash tests (#4234)
Summary:
- Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
- Randomly enable/disable the above flags in crash test
- Set `--compression_type=zstd` in FB-specific crash test runs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234

Differential Revision: D9187207

Pulled By: ajkr

fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
2018-08-06 15:27:29 -07:00
Sagar Vemuri
fefdac1004 Fix lite build failure in db_bench due to trace/replay (#4225)
Summary:
Fix lite build failure in db_bench due to trace/replay feature.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4225

Differential Revision: D9153303

Pulled By: sagar0

fbshipit-source-id: 9f7a8035429d0dcdbe99616d11389ed7bccf44be
2018-08-03 11:58:55 -07:00
Pooja Malik
9dbf39399e Rules Advisor: some fixes to support fetching stats from ODS (#4223)
Summary:
This PR includes fixes for some bugs that I encountered while testing the Optimizer with ODS stats support.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4223

Differential Revision: D9140786

Pulled By: poojam23

fbshipit-source-id: 045cb3f27d075c2042040ac2d561938349419516
2018-08-02 15:42:42 -07:00
Pooja Malik
892a156267 Advisor: README and blog, and also tests for DBBenchRunner, DatabaseOptions (#4201)
Summary:
This pull request adds a README file and a blog post for the Advisor tool. It also adds the missing tests for some Optimizer modules. Some comments are added to the classes being tested for improved readability.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4201

Reviewed By: maysamyabandeh

Differential Revision: D9125311

Pulled By: poojam23

fbshipit-source-id: aefcf2f06eaa05490cc2834ef5aa6e21f0d1dc55
2018-08-01 16:13:09 -07:00
Sagar Vemuri
12b6cdeed3 Trace and Replay for RocksDB (#3837)
Summary:
A framework for tracing and replaying RocksDB operations.

A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.

- Column-families are supported
- Multi-threaded tracing is supported.
- TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
- This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.

Currently supported DB operations:
- Writes:
-- Put
-- Merge
-- Delete
-- SingleDelete
-- DeleteRange
-- Write
- Reads:
-- Get (point lookups)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837

Differential Revision: D7974837

Pulled By: sagar0

fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
2018-08-01 00:27:08 -07:00
Yanqin Jin
8abafb1feb Generalize parameters generation. (#4046)
Summary:
Making generation of column families and keys virtual function so that
subclasses of StressTest can override them to provide custom parameter
generation for more flexibility. This will be useful for future tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4046

Differential Revision: D9073382

Pulled By: riversand963

fbshipit-source-id: 2754f0fdfa5c24d95c1f92d4944bc479552fb665
2018-07-30 17:42:12 -07:00
Yanqin Jin
54de56844d Remove random writes from SST file ingestion (#4172)
Summary:
RocksDB used to store global_seqno in external SST files written by
SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the
`global_seqno`. Since random write is not supported in some non-POSIX compliant
file systems, external SST file ingestion is not supported on these file
systems. To address this limitation, we no longer update `global_seqno` during
file ingestion. Later RocksDB uses the MANIFEST and other information in table
properties to deduce global seqno for externally-ingested SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172

Differential Revision: D8961465

Pulled By: riversand963

fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071
2018-07-27 16:12:23 -07:00
DorianZheng
f5e46354d2 Protect external file when ingesting (#4099)
Summary:
If crash happen after a hard link established, Recover function may reuse the file number that has already assigned to the internal file, and this will overwrite the external file. To protect the external file, we have to make sure the file number will never being reused.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/4099

Differential Revision: D9034092

Pulled By: riversand963

fbshipit-source-id: 3f1a737440b86aa2ef01673e5013aacbb7c33e28
2018-07-27 14:13:12 -07:00
Pooja Malik
134a52e144 Optimizer's skeleton: use advisor to optimize config options (#4169)
Summary:
In https://github.com/facebook/rocksdb/pull/3934 we introduced advisor scripts that make suggestions in the config options based on the log file and stats from a run of rocksdb. The optimizer runs the advisor on a benchmark application in a loop and automatically applies the suggested changes until the config options are optimized. This is a work in progress and the patch is the initial skeleton for the optimizer. The sample application that is run in the loop is currently dbbench.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4169

Reviewed By: maysamyabandeh

Differential Revision: D9023671

Pulled By: poojam23

fbshipit-source-id: a6192d475c462cf6eb2b316716f97cb400fcb64d
2018-07-26 17:13:32 -07:00
Siying Dong
4b0a43574a db_stress to cover upper bound in iterators (#4162)
Summary:
db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162

Differential Revision: D8953507

Pulled By: siying

fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
2018-07-23 10:45:29 -07:00
Zhichao Cao
6811fb0658 Fixed the db_bench MergeRandom only access CF_default (#4155)
Summary:
When running the tracing and analyzing, I found that MergeRandom benchmark in db_bench only access the default column family even the -num_column_families is specified > 1.

changes: Using the db_with_cfh as DB to randomly select the column family to execute the Merge operation if -num_column_families is specified > 1.

Tested with make asan_check and verified in tracing
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4155

Differential Revision: D8907888

Pulled By: zhichao-cao

fbshipit-source-id: 2b4bc8fe0e99c8f262f5be6b986c7025d62cf850
2018-07-20 15:58:54 -07:00
Siying Dong
a5e851e113 Reformatting some recent changes (#4161)
Summary:
Lint is not happy with some new code recently committed. Format them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161

Differential Revision: D8940582

Pulled By: siying

fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7
2018-07-20 14:43:38 -07:00
Pooja Malik
1857576e03 db_bench support for OPTIONS+bloom and nicer output for perf_context (#4153)
Summary:
Adding the string "PERF_CONTEXT:" before the perf_context stats are printed. Setting the filter policy if it's a block based table even when options are being loaded from the provided FLAGS_options_file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4153

Differential Revision: D8905517

Pulled By: poojam23

fbshipit-source-id: 5956ed7882d39ec8ae654d5dadeb88727a36f0dd
2018-07-18 16:27:49 -07:00
Maysam Yabandeh
8581a93a6b Per-thread unique test db names (#4135)
Summary:
The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
Example: ``` ~/gtest-parallel/gtest-parallel ./table_test```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135

Differential Revision: D8846653

Pulled By: maysamyabandeh

fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
2018-07-13 17:27:39 -07:00
Zhongyi Xie
23b76252c8 db_bench: enable setting cache_size when loading options file
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4118

Differential Revision: D8845554

Pulled By: miasantreble

fbshipit-source-id: 13bd3c1259a7c30bad762a413fe3bb24eea650ba
2018-07-13 16:43:53 -07:00
Zhongyi Xie
de98fd88e3 Support compaction filter in db_bench (#4106)
Summary:
Right now there is no support for enabling compaction filter in db_bench, we should add support for that to facilitate testing of compaction filter.
This PR adds a compaction filter called KeepFilter and make `Filter` always returns false, essentially a noop compaction filter. This will allow us to test compaction filter code path without having to support arbitrary compaction filters
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4106

Differential Revision: D8828517

Pulled By: miasantreble

fbshipit-source-id: 9ad76d04103eaa9d00da98334b4a39e542d26c41
2018-07-12 19:42:27 -07:00
Andrew Kryczka
97fe23fc5c Fix unsigned int flag in db_bench (#4129)
Summary:
`DEFINE_uint32` was unavailable on some platforms, e.g., https://travis-ci.org/facebook/rocksdb/jobs/403352902. Use `DEFINE_uint64` instead which should work as it's used many times elsewhere in this file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4129

Differential Revision: D8830311

Pulled By: ajkr

fbshipit-source-id: b4fc90ba3f50e649c070ce8069c68e530d731f05
2018-07-12 18:43:23 -07:00
Andrew Kryczka
63904434eb db_bench periodically dump stats to info log (#4109)
Summary:
give control of how often stats are printed, including jemalloc stats if enabled. Previously the default was 10 minutes so we'd only see updated stats for very long benchmark runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4109

Differential Revision: D8796444

Pulled By: ajkr

fbshipit-source-id: fd7902fe3f105fae89322c4ab63316bba4a2b15e
2018-07-12 15:57:42 -07:00
Manuel Ung
b9846370e9 WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078)
Summary:
This adds support for recovering WriteUnprepared transactions through the following changes:
- The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
- `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
- `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.

Commit/Rollback of live transactions is still unimplemented and will come later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078

Differential Revision: D8703382

Pulled By: lth

fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
2018-07-06 17:59:13 -07:00
Maysam Yabandeh
235ab9dd32 Pin mmap files in ReadOnlyDB (#4053)
Summary:
https://github.com/facebook/rocksdb/pull/3881 fixed a bug where PinnableSlice pin mmap files which could be deleted with background compaction. This is however a non-issue for ReadOnlyDB when there is no compaction running and max_open_files is -1. This patch reenables the pinning feature for that case.
Closes https://github.com/facebook/rocksdb/pull/4053

Differential Revision: D8662546

Pulled By: maysamyabandeh

fbshipit-source-id: 402962602eb0f644e17822748332999c3af029fd
2018-06-27 17:13:34 -07:00
Peter (Stig) Edwards
2694b6dc26 Remove unused imports, from python scripts. (#4057)
Summary:
Also remove redefined variable.
As reported on https://lgtm.com/projects/g/facebook/rocksdb/
Closes https://github.com/facebook/rocksdb/pull/4057

Differential Revision: D8648342

Pulled By: ajkr

fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183
2018-06-26 12:43:04 -07:00
Yanqin Jin
2729dd72ad Reclaim memory allocated to backup_engine.
Summary: Closes https://github.com/facebook/rocksdb/pull/4045

Differential Revision: D8595609

Pulled By: riversand963

fbshipit-source-id: 5ba5954d804b82b0e7264b2e18e1da4c94103b53
2018-06-23 17:12:14 -07:00