Commit Graph

807 Commits

Author SHA1 Message Date
Andrew Kryczka
8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
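A "one in N" flag such as `--flush_one_in` is commonly implemented as a random draw made on every operation. The sketch below illustrates that idea only; `OneIn` and `CountFlushes` are hypothetical helpers, not the db_stress code, and treating a value of 0 as "disabled" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: returns true on average once every `one_in` calls.
// Assumption: a value of 0 disables the action entirely.
bool OneIn(std::mt19937_64& rng, uint64_t one_in) {
  if (one_in == 0) return false;
  return std::uniform_int_distribution<uint64_t>(0, one_in - 1)(rng) == 0;
}

// Runs `ops` simulated operations and counts how many would trigger a flush.
uint64_t CountFlushes(uint64_t ops, uint64_t flush_one_in, uint64_t seed) {
  std::mt19937_64 rng(seed);
  uint64_t flushes = 0;
  for (uint64_t i = 0; i < ops; ++i) {
    if (OneIn(rng, flush_one_in)) ++flushes;  // a real test would call Flush() here
  }
  return flushes;
}
```

With `flush_one_in` set to 1 every operation triggers a flush; larger values make flushes proportionally rarer.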
Anand Ananthabhotla
a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accommodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to a persistent low disk space
condition.
2. Errors during write callback and flush are treated as hard errors,
i.e., the database is put in read-only mode and goes back to read-write
only after certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurrence of an out of space error during compaction,
  subsequent compactions will be delayed until the disk free space check
  indicates enough available space. The required space is computed as the
  sum of input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in-progress
  compactions when the first error occurred.
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it back in read-write mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM.
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recovery in the start
callback, and later initiate it manually by calling DB::Resume().

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00
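The two space checks described in the commit above (sum of input sizes for a delayed compaction, sum of write_buffer_size values for hard-error recovery) can be sketched as plain arithmetic. This is a simplified model under stated assumptions, not the SstFileManagerImpl code: the function names are hypothetical, and the use of `>=` for "enough space" is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Soft-error path: space needed by a compaction, estimated as the sum of
// its input file sizes.
uint64_t EstimateCompactionSpace(const std::vector<uint64_t>& input_sizes) {
  return std::accumulate(input_sizes.begin(), input_sizes.end(), uint64_t{0});
}

// A delayed compaction may run once free space covers the estimate
// (assumption: greater-or-equal counts as enough).
bool CanRunCompaction(uint64_t free_bytes,
                      const std::vector<uint64_t>& input_sizes) {
  return free_bytes >= EstimateCompactionSpace(input_sizes);
}

// Hard-error path: headroom is the sum of write_buffer_size over all DB
// instances that share one file manager.
uint64_t RecoveryHeadroom(const std::vector<uint64_t>& write_buffer_sizes) {
  return std::accumulate(write_buffer_sizes.begin(), write_buffer_sizes.end(),
                         uint64_t{0});
}
```

A background poller would compare the free space reported by the file system against these values before resuming work.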
Dmitri Smirnov
879998b369 Adjust c test and fix windows compilation issues
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4369

Differential Revision: D9844200

Pulled By: sagar0

fbshipit-source-id: 0d9f5f73b28234eaac55d3551ce4e2dc177af138
2018-09-14 20:57:22 -07:00
Andrew Kryczka
c94523ee56 Delete code for WAL reader to start at nonzero offset (#4362)
Summary:
The code is dead in RocksDB as `log::Reader::initial_offset_` is always zero. We should delete it so we don't have to maintain it like in #4359.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4362

Differential Revision: D9817829

Pulled By: ajkr

fbshipit-source-id: 474a2c679e5bd273b40608f3a5332931d9eefe6d
2018-09-13 17:13:03 -07:00
kckjn97
902261519e correct mistyped msg. (#4341)
Summary:
corrected the mistyped message.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4341

Differential Revision: D9816571

Pulled By: ajkr

fbshipit-source-id: 1df0424e981a01470a638a37b925c4133d59a48b
2018-09-13 14:57:38 -07:00
Maysam Yabandeh
3f5282268f Skip concurrency control during recovery of pessimistic txn (#4346)
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This can serve as an optimization if the application knows that the transaction will not conflict with concurrent transactions. It is currently used during recovery, assuming (i) the application guarantees no conflict between prepared transactions in the WAL, and (ii) the application guarantees that recovered transactions will be rolled back or committed before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346

Differential Revision: D9759149

Pulled By: maysamyabandeh

fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
2018-09-10 16:57:53 -07:00
Andrew Kryczka
2c14662213 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347)
Summary:
Reverting is needed to unblock a user building against master, who is blocked for multiple days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so reverting.

Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347

Differential Revision: D9668365

Pulled By: ajkr

fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd
2018-09-06 09:58:34 -07:00
Andrew Kryczka
1a88c43751 Reduce empty SST creation/deletion in compaction (#4336)
Summary:
This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.

Also fixed an uninitialized variable in db_stress.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336

Differential Revision: D9600080

Pulled By: ajkr

fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
2018-08-31 12:28:52 -07:00
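The commit above does not show the underflow itself. The sketch below illustrates the general class of bug with a hypothetical container that keeps one sentinel entry whenever it is non-empty: subtracting one from an unsigned size wraps around when the container is empty. It is not the `CollapsedRangeDelMap` code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Buggy variant: when `rep` is empty, 0 - 1 wraps around to the largest
// std::size_t value, so an "is it empty?" check based on this never fires.
std::size_t BuggySize(const std::map<int, int>& rep) {
  return rep.size() - 1;
}

// Fixed variant: handle the empty case before subtracting the sentinel.
std::size_t SafeSize(const std::map<int, int>& rep) {
  return rep.empty() ? 0 : rep.size() - 1;
}
```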
Zhongyi Xie
1cf17ba53b Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323)
Summary:
Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc define `DecodeCFAndKey` in an anonymous namespace. That would normally be fine, except that the unity test concatenates all source files together, so now we have a conflict.
Another issue with trace_analyzer_tool.cc is that it uses some utility functions from ldb_cmd, which is not included in the Makefile for unity_test. I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323

Differential Revision: D9599170

Pulled By: miasantreble

fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
2018-08-30 18:42:51 -07:00
Shrikanth Shankar
4848bd0c4e Drop unnecessary deletion markers during compaction (issue - 3842) (#4289)
Summary:
This PR fixes issue 3842. We drop deletion markers iff
1. We are the bottom most level AND
2. All other occurrences of the key are in the same snapshot range as the delete

I've also enhanced db_stress_test to add an option that does a full compare of the keys. This is done by a single thread (thread # 0). For tests I've run (so far)

make check -j64
db_stress
db_stress  --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify that new code doesnt break existing tests */
./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify new test code */
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289

Differential Revision: D9491165

Pulled By: shrikanthshankar

fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
2018-08-24 15:17:54 -07:00
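The two-part condition in the commit above can be written as a small predicate. This is a sketch under assumptions, not the compaction iterator code: all names are hypothetical, a sequence number `seq` is taken to be visible to a snapshot `s` when `seq <= s`, and two sequence numbers share a snapshot range when no snapshot separates them.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the earliest snapshot that can see `seq`, given snapshots sorted
// in ascending order. Entries with the same index are in the same range.
std::size_t SnapshotRange(const std::vector<uint64_t>& sorted_snapshots,
                          uint64_t seq) {
  return static_cast<std::size_t>(
      std::lower_bound(sorted_snapshots.begin(), sorted_snapshots.end(), seq) -
      sorted_snapshots.begin());
}

// True iff the deletion marker may be dropped: the compaction outputs to the
// bottommost level AND every other occurrence of the key falls into the same
// snapshot range as the delete.
bool CanDropDeletionMarker(bool bottommost_level, uint64_t delete_seq,
                           const std::vector<uint64_t>& other_seqs,
                           const std::vector<uint64_t>& sorted_snapshots) {
  if (!bottommost_level) return false;
  const std::size_t range = SnapshotRange(sorted_snapshots, delete_seq);
  return std::all_of(other_seqs.begin(), other_seqs.end(), [&](uint64_t s) {
    return SnapshotRange(sorted_snapshots, s) == range;
  });
}
```

If a snapshot sits between the delete and an older version of the key, that snapshot still needs the older version, so the marker has to stay.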
Yanqin Jin
8022500ecc Add compatibility test of SST ingestion (#4310)
Summary:
Test plan
```
$cd rocksdb/
$./tools/check_format_compatible.sh
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310

Differential Revision: D9498125

Pulled By: riversand963

fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24
2018-08-24 14:27:43 -07:00
Andrew Kryczka
e7bb8e9b92 Fix clang build of db_stress (#4312)
Summary:
Blame: #4307
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312

Differential Revision: D9494093

Pulled By: ajkr

fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
2018-08-23 21:57:57 -07:00
Andrew Kryczka
6c40806e51 Digest ZSTD compression dictionary once per SST file (#4251)
Summary:
In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.

ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary.

There are a couple other changes included in this PR:

- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether dictionary should be used. I felt that mutation was error-prone so eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251

Differential Revision: D9257078

Pulled By: ajkr

fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
2018-08-23 19:28:18 -07:00
Andrew Kryczka
ee234e83e3 Invoke OnTableFileCreated for empty SSTs (#4307)
Summary:
The API comment on `OnTableFileCreationStarted` (b6280d01f9/include/rocksdb/listener.h (L331-L333)) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.

This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307

Differential Revision: D9485201

Pulled By: ajkr

fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
2018-08-23 18:27:30 -07:00
zhichao-cao
cf7150ac2e Add the unit test of Iterator to trace_analyzer_test (#4282)
Summary:
Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282

Differential Revision: D9436758

Pulled By: zhichao-cao

fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
2018-08-23 17:28:32 -07:00
Yanqin Jin
bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Fenggang Wu
9d646a6311 Add db_bench options of data block hash index (#4281)
Summary:
Add `--data_block_index_type` and `--data_block_hash_table_util_ratio` option to `db_bench`.

`--data_block_index_type` can be either of `binary` (default) or `binary_and_hash`;
`--data_block_hash_table_util_ratio` will be a double. The default value is `0.75`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4281

Differential Revision: D9361476

Pulled By: fgwu

fbshipit-source-id: dc53e01acef9db81b9eec5e8a96f3bc8ed718c10
2018-08-16 18:42:46 -07:00
Siying Dong
9c0c8f5ff6 GetAllKeyVersions() to take an extra argument of max_num_ikeys. (#4271)
Summary:
Right now, `ldb idump` may use an unbounded amount of memory if there is a big range of tombstones. Add an option to cap the maximum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271

Differential Revision: D9369149

Pulled By: siying

fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf
2018-08-16 15:57:08 -07:00
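Capping the number of collected entries is what bounds memory in the commit above. The sketch below shows the pattern on an in-memory list; `CollectKeyVersions` is a hypothetical stand-in and does not reproduce the real `GetAllKeyVersions()` signature.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Copies entries from `all_versions` until `max_num_ikeys` have been
// collected, so the result size (and memory use) stays bounded no matter how
// many entries the source holds.
std::vector<std::string> CollectKeyVersions(
    const std::vector<std::string>& all_versions, std::size_t max_num_ikeys) {
  std::vector<std::string> out;
  for (const std::string& v : all_versions) {
    if (out.size() >= max_num_ikeys) break;  // stop scanning once the cap is hit
    out.push_back(v);
  }
  return out;
}
```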
Zhichao Cao
8ae2bf5331 Fix the build and test bugs in the Trace_analyzer (#4274)
Summary:
The wrong options used in trace_analyzer_test are removed. The potential losses of integer precision are fixed.

Passes the specified test cases and `make asan_check`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4274

Reviewed By: yiwu-arbug

Differential Revision: D9327811

Pulled By: zhichao-cao

fbshipit-source-id: d62cb18d6586503a490cd323bfc1c672b68b346e
2018-08-14 18:27:48 -07:00
Anand Ananthabhotla
bf07e90cf2 Fix db_stress assertion failures on 0 byte SSTs (#4273)
Summary:
In the OnTableFileCreation() listener, assert on various TableProperties
only when file size > 0 bytes. The listener can get called even for 0
byte SSTs which have been deleted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4273

Differential Revision: D9322738

Pulled By: anand1976

fbshipit-source-id: 17cdfb3d0da946b9a158d7328e5db1c87973956b
2018-08-14 14:58:26 -07:00
Maysam Yabandeh
d122025891 Extend stress test to format_version 4 (#4265)
Summary:
Stress tests currently cover format_version 2 and 3. The patch adds 4 as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265

Differential Revision: D9323185

Pulled By: maysamyabandeh

fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57
2018-08-14 14:13:33 -07:00
Zhichao Cao
999d955e4f RocksDB Trace Analyzer (#4091)
Summary:
A framework of trace analyzing for RocksDB

After collecting a trace with the tool from [PR #3837](https://github.com/facebook/rocksdb/pull/3837), the user can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
**Input:**
1. trace file
2. Whole keys space file

**Statistics:**
1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
2. Key hotness (access count) of each one
3. Key space separation based on given prefix
4. Key size distribution
5. Value size distribution if applicable
6. Top K accessed keys
7. QPS statistics including the average QPS and peak QPS
8. Top K accessed prefix
9. Query correlation analysis: outputs the number of X after Y and the corresponding average time intervals

**Output:**
1. key access heat map (either in the accessed key space or whole key space)
2. trace sequence file (the raw trace file interpreted into a line-based text file for future use)
3. Time series (the key space ID and its access time)
4. Key access count distribution
5. Key size distribution
6. Value size distribution (in each interval)
7. whole key space separation by the prefix
8. Accessed key space separation by the prefix
9. QPS of each operation and each column family
10. Top K QPS and their accessed prefix range

**Test:**
1. Added the unit test of analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge
2. Generated the trace and analyze the trace

**Implemented but not tested (due to the limitation of trace_replay):**
1. Analyzing Iterator, supporting Seek() and SeekForPrev() analyzing
2. Analyzing the number of Key found by Get

**Future Work:**
1.  Support execution time analysis of each request
2.  Support analysis of cache hits and block reads for Get
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091

Differential Revision: D9256157

Pulled By: zhichao-cao

fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
2018-08-13 11:44:02 -07:00
Yanqin Jin
1b1d264342 Remove an assertion about file size (#4268)
Summary:
Due to 4ea56b1bd0, we should also remove the
assertion in stress test. This removal can be temporary, and we can add it back
once we figure out the reason for the 0-byte SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4268

Differential Revision: D9297186

Pulled By: riversand963

fbshipit-source-id: cebba9a68f42e815f8cf24471176d2cfdf962f63
2018-08-13 11:12:50 -07:00
Yanqin Jin
b271f956c2 Fix a TSAN failure (#4250)
Summary:
TSAN fails due to comparison between signed int and unsigned long. Fix it by
static_casting.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4250

Differential Revision: D9256535

Pulled By: riversand963

fbshipit-source-id: c6bad23ff70c6d0ec58e2e85c401ce0ad45de609
2018-08-09 19:42:32 -07:00
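The commit above does not show the offending comparison. As a generic illustration of the fix it names, the sketch below compares a signed index against an unsigned size with an explicit `static_cast`; `IndexInRange` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Comparing `index < v.size()` directly mixes signed and unsigned operands:
// the compiler warns, and a negative index would be converted to a huge
// unsigned value. Checking the sign first and casting explicitly avoids both.
bool IndexInRange(int index, const std::vector<int>& v) {
  return index >= 0 && static_cast<std::size_t>(index) < v.size();
}
```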
Dmitri Smirnov
ab22cf349e Implement Env::NumFileLinks (#4221)
Summary:
Although the delete scheduler implementation allows for the interface not to be supported, delete_scheduler_test does not allow for that.
Address compiler warnings.
Make sst_dump_test use the test directory structure, as the current execution directory may not be writable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4221

Differential Revision: D9210152

Pulled By: siying

fbshipit-source-id: 381a74511e969ecb8089d5c4b4df87dc30c8df63
2018-08-09 14:29:11 -07:00
Yanqin Jin
de7f423a82 Add SST ingestion to ldb (#4205)
Summary:
We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205

Differential Revision: D9112711

Pulled By: riversand963

fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4
2018-08-09 14:29:11 -07:00
Andrew Kryczka
7a9a164276 Fix db_bench default compression level (#4248)
Summary:
db_bench's previous default compression level (-1) was not the default compression level in all libraries. In particular, in ZSTD negative values are valid compression levels, while ZSTD's default compression level is three.

This PR changes db_bench's default to be RocksDB's library-independent default compression level (see #3895). I also changed a couple other flags to get their default values from an options object directly rather than hardcoding.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4248

Differential Revision: D9235140

Pulled By: ajkr

fbshipit-source-id: be4e0722d59fa1968832183db36d1d20fcf11e5b
2018-08-09 10:28:14 -07:00
Andrew Kryczka
6175b4b294 Support dictionary compression in stress/crash tests (#4234)
Summary:
- Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
- Randomly enable/disable the above flags in crash test
- Set `--compression_type=zstd` in FB-specific crash test runs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234

Differential Revision: D9187207

Pulled By: ajkr

fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
2018-08-06 15:27:29 -07:00
Sagar Vemuri
fefdac1004 Fix lite build failure in db_bench due to trace/replay (#4225)
Summary:
Fix lite build failure in db_bench due to trace/replay feature.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4225

Differential Revision: D9153303

Pulled By: sagar0

fbshipit-source-id: 9f7a8035429d0dcdbe99616d11389ed7bccf44be
2018-08-03 11:58:55 -07:00
Pooja Malik
9dbf39399e Rules Advisor: some fixes to support fetching stats from ODS (#4223)
Summary:
This PR includes fixes for some bugs that I encountered while testing the Optimizer with ODS stats support.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4223

Differential Revision: D9140786

Pulled By: poojam23

fbshipit-source-id: 045cb3f27d075c2042040ac2d561938349419516
2018-08-02 15:42:42 -07:00
Pooja Malik
892a156267 Advisor: README and blog, and also tests for DBBenchRunner, DatabaseOptions (#4201)
Summary:
This pull request adds a README file and a blog post for the Advisor tool. It also adds the missing tests for some Optimizer modules. Some comments are added to the classes being tested for improved readability.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4201

Reviewed By: maysamyabandeh

Differential Revision: D9125311

Pulled By: poojam23

fbshipit-source-id: aefcf2f06eaa05490cc2834ef5aa6e21f0d1dc55
2018-08-01 16:13:09 -07:00
Sagar Vemuri
12b6cdeed3 Trace and Replay for RocksDB (#3837)
Summary:
A framework for tracing and replaying RocksDB operations.

A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.

- Column-families are supported
- Multi-threaded tracing is supported.
- TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
- This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.

Currently supported DB operations:
- Writes:
-- Put
-- Merge
-- Delete
-- SingleDelete
-- DeleteRange
-- Write
- Reads:
-- Get (point lookups)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837

Differential Revision: D7974837

Pulled By: sagar0

fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
2018-08-01 00:27:08 -07:00
Yanqin Jin
8abafb1feb Generalize parameters generation. (#4046)
Summary:
Making generation of column families and keys virtual function so that
subclasses of StressTest can override them to provide custom parameter
generation for more flexibility. This will be useful for future tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4046

Differential Revision: D9073382

Pulled By: riversand963

fbshipit-source-id: 2754f0fdfa5c24d95c1f92d4944bc479552fb665
2018-07-30 17:42:12 -07:00
Yanqin Jin
54de56844d Remove random writes from SST file ingestion (#4172)
Summary:
RocksDB used to store global_seqno in external SST files written by
SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the
`global_seqno`. Since random write is not supported in some non-POSIX compliant
file systems, external SST file ingestion is not supported on these file
systems. To address this limitation, we no longer update `global_seqno` during
file ingestion. Later RocksDB uses the MANIFEST and other information in table
properties to deduce global seqno for externally-ingested SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172

Differential Revision: D8961465

Pulled By: riversand963

fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071
2018-07-27 16:12:23 -07:00
DorianZheng
f5e46354d2 Protect external file when ingesting (#4099)
Summary:
If a crash happens after a hard link is established, the Recover function may reuse the file number that has already been assigned to the internal file, and this will overwrite the external file. To protect the external file, we have to make sure the file number will never be reused.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/4099

Differential Revision: D9034092

Pulled By: riversand963

fbshipit-source-id: 3f1a737440b86aa2ef01673e5013aacbb7c33e28
2018-07-27 14:13:12 -07:00
Pooja Malik
134a52e144 Optimizer's skeleton: use advisor to optimize config options (#4169)
Summary:
In https://github.com/facebook/rocksdb/pull/3934 we introduced advisor scripts that make suggestions in the config options based on the log file and stats from a run of rocksdb. The optimizer runs the advisor on a benchmark application in a loop and automatically applies the suggested changes until the config options are optimized. This is a work in progress and the patch is the initial skeleton for the optimizer. The sample application that is run in the loop is currently dbbench.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4169

Reviewed By: maysamyabandeh

Differential Revision: D9023671

Pulled By: poojam23

fbshipit-source-id: a6192d475c462cf6eb2b316716f97cb400fcb64d
2018-07-26 17:13:32 -07:00
Siying Dong
4b0a43574a db_stress to cover upper bound in iterators (#4162)
Summary:
db_stress doesn't cover upper or lower bounds in iterators. Try to cover them by randomly assigning one. Also, in prefix scan tests, set the next prefix as the upper bound with 50% probability.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162

Differential Revision: D8953507

Pulled By: siying

fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
2018-07-23 10:45:29 -07:00
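"Next prefix as the upper bound" can be computed by incrementing the last byte of the prefix that is not 0xff. The sketch below assumes bytewise key ordering; `NextPrefix` is a hypothetical helper and the db_stress implementation may differ.

```cpp
#include <cassert>
#include <string>

// Returns the smallest string that sorts after every key starting with
// `prefix` under bytewise ordering. Trailing 0xff bytes cannot be
// incremented, so they are dropped first. An empty result means no finite
// upper bound exists (the prefix was empty or all 0xff).
std::string NextPrefix(std::string prefix) {
  while (!prefix.empty()) {
    const unsigned char last = static_cast<unsigned char>(prefix.back());
    if (last != 0xff) {
      prefix.back() = static_cast<char>(last + 1);
      return prefix;
    }
    prefix.pop_back();
  }
  return prefix;
}
```

For example, `NextPrefix("abc")` yields `"abd"`, so an iterator bounded above by that value visits only keys that start with `"abc"`.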
Zhichao Cao
6811fb0658 Fixed the db_bench MergeRandom only access CF_default (#4155)
Summary:
While running tracing and analysis, I found that the MergeRandom benchmark in db_bench only accesses the default column family even when -num_column_families is set to a value greater than 1.

Changes: use db_with_cfh as the DB to randomly select the column family on which to execute the Merge operation if -num_column_families is set to a value greater than 1.

Tested with make asan_check and verified in tracing
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4155

Differential Revision: D8907888

Pulled By: zhichao-cao

fbshipit-source-id: 2b4bc8fe0e99c8f262f5be6b986c7025d62cf850
2018-07-20 15:58:54 -07:00
Siying Dong
a5e851e113 Reformatting some recent changes (#4161)
Summary:
Lint is not happy with some new code recently committed. Format them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4161

Differential Revision: D8940582

Pulled By: siying

fbshipit-source-id: c9b43b1ef8c88b5e923911058b44eb77234b36b7
2018-07-20 14:43:38 -07:00
Pooja Malik
1857576e03 db_bench support for OPTIONS+bloom and nicer output for perf_context (#4153)
Summary:
Adding the string "PERF_CONTEXT:" before the perf_context stats are printed. Setting the filter policy if it's a block based table even when options are being loaded from the provided FLAGS_options_file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4153

Differential Revision: D8905517

Pulled By: poojam23

fbshipit-source-id: 5956ed7882d39ec8ae654d5dadeb88727a36f0dd
2018-07-18 16:27:49 -07:00
Maysam Yabandeh
8581a93a6b Per-thread unique test db names (#4135)
Summary:
The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel.
Example: `~/gtest-parallel/gtest-parallel ./table_test`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135

Differential Revision: D8846653

Pulled By: maysamyabandeh

fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84
2018-07-13 17:27:39 -07:00
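One way to give each test thread its own database path is to append a thread-derived suffix to a base name. The sketch below uses `std::hash<std::thread::id>`; `PerThreadDBName` is a hypothetical helper and the commit above may derive the suffix differently.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <thread>

// Builds "<base>_<hash of thread id>". The same thread always gets the same
// name; different threads get different names unless their hashes collide.
std::string PerThreadDBName(
    const std::string& base,
    std::thread::id tid = std::this_thread::get_id()) {
  return base + "_" + std::to_string(std::hash<std::thread::id>{}(tid));
}
```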
Zhongyi Xie
23b76252c8 db_bench: enable setting cache_size when loading options file
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4118

Differential Revision: D8845554

Pulled By: miasantreble

fbshipit-source-id: 13bd3c1259a7c30bad762a413fe3bb24eea650ba
2018-07-13 16:43:53 -07:00
Zhongyi Xie
de98fd88e3 Support compaction filter in db_bench (#4106)
Summary:
Right now there is no support for enabling a compaction filter in db_bench. We should add support for that to facilitate testing of compaction filters.
This PR adds a compaction filter called KeepFilter and makes `Filter` always return false, essentially a no-op compaction filter. This will allow us to test the compaction filter code path without having to support arbitrary compaction filters.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4106

Differential Revision: D8828517

Pulled By: miasantreble

fbshipit-source-id: 9ad76d04103eaa9d00da98334b4a39e542d26c41
2018-07-12 19:42:27 -07:00
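A filter whose `Filter` method always returns false keeps every entry, which is why it exercises the filter code path without changing the data. The sketch below uses a simplified, hypothetical interface (`SimpleCompactionFilter`); the real `rocksdb::CompactionFilter::Filter` takes more parameters.

```cpp
#include <cassert>
#include <string>

// Simplified stand-in for a compaction filter interface (hypothetical).
class SimpleCompactionFilter {
 public:
  virtual ~SimpleCompactionFilter() = default;
  // Returns true if the entry should be dropped during compaction.
  virtual bool Filter(const std::string& key,
                      const std::string& value) const = 0;
};

// No-op filter: never asks for an entry to be dropped.
class KeepFilterSketch : public SimpleCompactionFilter {
 public:
  bool Filter(const std::string& /*key*/,
              const std::string& /*value*/) const override {
    return false;
  }
};
```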
Andrew Kryczka
97fe23fc5c Fix unsigned int flag in db_bench (#4129)
Summary:
`DEFINE_uint32` was unavailable on some platforms, e.g., https://travis-ci.org/facebook/rocksdb/jobs/403352902. Use `DEFINE_uint64` instead which should work as it's used many times elsewhere in this file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4129

Differential Revision: D8830311

Pulled By: ajkr

fbshipit-source-id: b4fc90ba3f50e649c070ce8069c68e530d731f05
2018-07-12 18:43:23 -07:00
Andrew Kryczka
63904434eb db_bench periodically dump stats to info log (#4109)
Summary:
Give control over how often stats are printed, including jemalloc stats if enabled. Previously the default was 10 minutes so we'd only see updated stats for very long benchmark runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4109

Differential Revision: D8796444

Pulled By: ajkr

fbshipit-source-id: fd7902fe3f105fae89322c4ab63316bba4a2b15e
2018-07-12 15:57:42 -07:00
Manuel Ung
b9846370e9 WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078)
Summary:
This adds support for recovering WriteUnprepared transactions through the following changes:
- The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
- `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
- `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.

Commit/Rollback of live transactions is still unimplemented and will come later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078

Differential Revision: D8703382

Pulled By: lth

fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
2018-07-06 17:59:13 -07:00
Maysam Yabandeh
235ab9dd32 Pin mmap files in ReadOnlyDB (#4053)
Summary:
https://github.com/facebook/rocksdb/pull/3881 fixed a bug where PinnableSlice pins mmap files that could be deleted by background compaction. This is, however, a non-issue for ReadOnlyDB when there is no compaction running and max_open_files is -1. This patch re-enables the pinning feature for that case.
Closes https://github.com/facebook/rocksdb/pull/4053

Differential Revision: D8662546

Pulled By: maysamyabandeh

fbshipit-source-id: 402962602eb0f644e17822748332999c3af029fd
2018-06-27 17:13:34 -07:00
Peter (Stig) Edwards
2694b6dc26 Remove unused imports, from python scripts. (#4057)
Summary:
Also remove redefined variable.
As reported on https://lgtm.com/projects/g/facebook/rocksdb/
Closes https://github.com/facebook/rocksdb/pull/4057

Differential Revision: D8648342

Pulled By: ajkr

fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183
2018-06-26 12:43:04 -07:00
Yanqin Jin
2729dd72ad Reclaim memory allocated to backup_engine.
Summary: Closes https://github.com/facebook/rocksdb/pull/4045

Differential Revision: D8595609

Pulled By: riversand963

fbshipit-source-id: 5ba5954d804b82b0e7264b2e18e1da4c94103b53
2018-06-23 17:12:14 -07:00
Maysam Yabandeh
80ade9ad83 Pin top-level index on partitioned index/filter blocks (#4037)
Summary:
The top-level index in partitioned index/filter blocks is small and could be pinned in memory. So far we have achieved that by setting cache_index_and_filter_blocks to false. This, however, makes it difficult to keep account of the total memory usage. This patch introduces pin_top_level_index_and_filter, which in combination with cache_index_and_filter_blocks=true keeps the top-level index in cache and yet pins it to avoid cache misses and cache lookup overhead.
Closes https://github.com/facebook/rocksdb/pull/4037

Differential Revision: D8596218

Pulled By: maysamyabandeh

fbshipit-source-id: 3a5f7f9ca6b4b525b03ff6bd82354881ae974ad2
2018-06-22 15:27:46 -07:00
Yi Wu
c726f7fda8 Fix dangling checkpoint pointer in db_stress (#4042)
Summary:
Fix db_stress failing to delete the checkpoint pointer. It was caught by the asan_crash test.
Closes https://github.com/facebook/rocksdb/pull/4042

Differential Revision: D8592604

Pulled By: yiwu-arbug

fbshipit-source-id: 7b2d67d5e3dfb05f71c33fcf320482303e97d3ef
2018-06-22 11:43:50 -07:00
Andrew Kryczka
0a5b16c7c5 Cleanup staging directory at start of checkpoint (#4035)
Summary:
- Attempt to clean the checkpoint staging directory before starting a checkpoint. It was already cleaned up at the end of checkpoint. But it wasn't cleaned up in the edge case where the process crashed while staging checkpoint files.
- Attempt to clean the checkpoint directory before calling `Checkpoint::Create` in `db_stress`. This handles the case where checkpoint directory was created by a previous `db_stress` run but the process crashed before cleaning it up.
- Use `DestroyDB` for cleaning checkpoint directory since a checkpoint is a DB.
Closes https://github.com/facebook/rocksdb/pull/4035

Reviewed By: yiwu-arbug

Differential Revision: D8580223

Pulled By: ajkr

fbshipit-source-id: 28c667400e249fad0fdedc664b349031b7b61599
2018-06-21 16:27:12 -07:00
Yanqin Jin
397495964b Fix a warning (treated as error) caused by type mismatch.
Summary: Closes https://github.com/facebook/rocksdb/pull/4032

Differential Revision: D8573061

Pulled By: riversand963

fbshipit-source-id: 112324dcb35956d6b3ec891073f4f21493933c8b
2018-06-21 11:13:09 -07:00
Yanqin Jin
524c6e6b72 Add file name info to SequentialFileReader. (#4026)
Summary:
We potentially need this information for tracing, profiling and diagnosis.
Closes https://github.com/facebook/rocksdb/pull/4026

Differential Revision: D8555214

Pulled By: riversand963

fbshipit-source-id: 4263e06c00b6d5410b46aa46eb4e358ff2161dd2
2018-06-21 08:42:24 -07:00
Andrew Kryczka
14cee194d6 Support file ingestion in stress test (#4018)
Summary:
Once per `ingest_external_file_one_in` operations, uses SstFileWriter to create a file containing `ingest_external_file_width` consecutive keys. The file is named containing the thread ID to avoid clashes. The file is then added to the DB using `IngestExternalFile`.

We can't enable it by default in crash test because `nooverwritepercent` and `test_batches_snapshot` both must be zero for the DB's whole lifetime. Perhaps we should set up a separate test with that config, as range deletion also requires it.
Closes https://github.com/facebook/rocksdb/pull/4018

Differential Revision: D8507698

Pulled By: ajkr

fbshipit-source-id: 1437ea26fd989349a9ce8b94117241c65e40f10f
2018-06-20 22:27:45 -07:00
Andrew Kryczka
7f3a634e06 Support pipelined write in stress/crash tests
Summary: Closes https://github.com/facebook/rocksdb/pull/4019

Differential Revision: D8508681

Pulled By: ajkr

fbshipit-source-id: 23a3c07d642386446e322b02e69cdf70d12ef009
2018-06-19 09:14:12 -07:00
Andrew Kryczka
8585059ae0 Support backup and checkpoint in db_stress (#4005)
Summary:
Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow.
Closes https://github.com/facebook/rocksdb/pull/4005

Differential Revision: D8472275

Pulled By: ajkr

fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5
2018-06-18 19:28:18 -07:00
Andrew Kryczka
de2c6fb158 Fix stderr processing in crash test (#4006)
Summary:
Fixed bug where `db_stress` output a line with a warning followed by a line with an error, and `db_crashtest.py` considered that a success. For example:

```
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
open error: Corruption: SST file is ahead of WALs
```
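
The check described above can be sketched as follows. `db_crashtest.py` itself is a Python script, so this is only a C++ illustration of the rule "any non-warning line on stderr means failure"; the helper name `StderrHasError` and the `WARNING` prefix test are assumptions, not the script's actual code.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns true if any non-empty stderr line is
// something other than a warning, so a warning followed by an error
// is still reported as a failure.
bool StderrHasError(const std::string& stderr_text) {
  std::istringstream lines(stderr_text);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.empty()) continue;
    // A line counts as a warning only if it starts with "WARNING".
    if (line.rfind("WARNING", 0) != 0) return true;
  }
  return false;
}
```

Scanning every line, rather than only the first, is what keeps the two-line output shown above from being treated as a success.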
Closes https://github.com/facebook/rocksdb/pull/4006

Differential Revision: D8473463

Pulled By: ajkr

fbshipit-source-id: 60461bdd7491d9d26c63f7d4ee522a0f88ba3de7
2018-06-18 17:58:13 -07:00
Hans-Wilhelm Warlo
4faaab70a6 Benchmark sine wave write rate limit (#3914)
Summary:
As mentioned at the [dev forum.](https://www.facebook.com/groups/rocksdb.dev/1693425187422655/)

Let me know if you would like me to make any changes!
Closes https://github.com/facebook/rocksdb/pull/3914

Differential Revision: D8452824

Pulled By: siying

fbshipit-source-id: 56439b3228ecdcc5a199d5198eff2fab553be961
2018-06-15 12:12:03 -07:00
Siying Dong
f5281a53a4 tools/check_format_compatible.sh to cover forward option reading too (#3994)
Summary:
Make sure that some recent releases can read master's option files while ignoring unknown options. Also add two more recent release branches.
Closes https://github.com/facebook/rocksdb/pull/3994

Differential Revision: D8409499

Pulled By: siying

fbshipit-source-id: 1b025f19ba288da0517f6b4572797573e23e23c2
2018-06-15 11:12:29 -07:00
Andrew Kryczka
7497f992e0 Run manual compaction in stress/crash tests (#3936)
Summary:
- Add support to `db_stress` for `CompactRange`
- Enable `CompactRange` and `CompactFiles` in crash tests
Closes https://github.com/facebook/rocksdb/pull/3936

Differential Revision: D8230953

Pulled By: ajkr

fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8
2018-06-13 16:45:28 -07:00
Andrew Kryczka
dd216dd76a Choose unique keys faster in db_stress (#3990)
Summary:
db_stress initialization randomly chooses a set of keys to not overwrite. It was doing it separately for each column family. That caused 30+ second initialization times for the non-simple crash tests, which have 10 CFs. This PR:

- reuses the same set of randomly chosen no-overwrite keys across all CFs
- logs a couple more timestamps so we can more easily see initialization time
Closes https://github.com/facebook/rocksdb/pull/3990

Differential Revision: D8393821

Pulled By: ajkr

fbshipit-source-id: d0b263a298df607285ffdd8b0983ff6575cc6c34
2018-06-13 13:43:23 -07:00
Yanqin Jin
3470c75852 Fix build errors.
Summary: Closes https://github.com/facebook/rocksdb/pull/3967

Differential Revision: D8322775

Pulled By: riversand963

fbshipit-source-id: bd73067bd5d3ed4627348f0685bc499359ad6442
2018-06-07 15:43:09 -07:00
Zhichao Cao
23e1d23675 Fixed the fprintf of uint64_t by using PRIu64 (#3963)
Summary:
Fixed the fprintf format of uint64_t by using PRIu64 in file tools/ldb_cmd.cc
Closes https://github.com/facebook/rocksdb/pull/3963

Differential Revision: D8306179

Pulled By: zhichao-cao

fbshipit-source-id: 597dcd55321576801bbf2cf4714736ebc4750a0c
2018-06-07 11:44:48 -07:00
Yanqin Jin
0a0860a5fb Refactoring db_stress.cc (#3902)
Summary:
We use `db_stress.cc` intensively to test and verify the behavior of RocksDB. Sometimes we need to add new tests for recently added features. The original `StressTest` class provides a lot of general functionality that can be leveraged by other tests. Therefore, in this refactoring PR, I try to identify the general operations as well as operations that future tests will most likely want to customize. Future tests can inherit from `StressTest` and override the virtual functions to test custom logic.
Closes https://github.com/facebook/rocksdb/pull/3902

Differential Revision: D8284607

Pulled By: riversand963

fbshipit-source-id: 019302d04665a2b18334b6d05d04a477168c8ea4
2018-06-07 10:43:00 -07:00
Pooja Malik
5504a056f8 Adding advisor Rules and parser scripts with unit tests. (#3934)
Summary:
This adds some rules in the tools/advisor/advisor/rules.ini file (refer to this file for more information) and corresponding python parser scripts for parsing the rules file and the rocksdb LOG and OPTIONS files. This is WIP for adding rules depending on ODS. The starting point of the script is the rocksdb/tools/advisor/advisor/rule_parser.py file.
Closes https://github.com/facebook/rocksdb/pull/3934

Reviewed By: maysamyabandeh

Differential Revision: D8304059

Pulled By: poojam23

fbshipit-source-id: 47f2a50f04d46d40e225dd1cbf58ba490f79e239
2018-06-06 14:42:59 -07:00
Zhongyi Xie
f1592a06c2 run make format for PR 3838 (#3954)
Summary:
PR https://github.com/facebook/rocksdb/pull/3838 made some changes that trigger lint warnings.
Run `make format` to fix formatting as suggested by siying.
Also piggyback two changes:
1) fix singleton destruction order for windows and posix env
2) fix two clang warnings
Closes https://github.com/facebook/rocksdb/pull/3954

Differential Revision: D8272041

Pulled By: miasantreble

fbshipit-source-id: 7c4fd12bd17aac13534520de0c733328aa3c6c9f
2018-06-05 12:58:02 -07:00
Maysam Yabandeh
d0c38c0c8c Extend some tests to format_version=3 (#3942)
Summary:
format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3.
Closes https://github.com/facebook/rocksdb/pull/3942

Differential Revision: D8238413

Pulled By: maysamyabandeh

fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035
2018-06-04 20:13:00 -07:00
Dmitri Smirnov
f4b72d7056 Provide a way to override windows memory allocator with jemalloc for ZSTD
Summary:
Windows does not have an LD_PRELOAD mechanism to override all memory allocation functions, and ZSTD makes use of C-runtime calloc. During flushes and compactions the default system allocator fragments and the system slows down considerably.

For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance.

The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression.
Closes https://github.com/facebook/rocksdb/pull/3838

Differential Revision: D8229794

Pulled By: miasantreble

fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d
2018-06-04 12:12:48 -07:00
Andrew Kryczka
4f297ad05f Fix crash test check for direct I/O
Summary:
We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing.
Closes https://github.com/facebook/rocksdb/pull/3946

Differential Revision: D8247998

Pulled By: ajkr

fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3
2018-06-03 21:42:12 -07:00
Andrew Kryczka
88c3ee2d31 Configure direct I/O statically in db_stress
Summary:
Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested):

- It's a DB option so SetDBOptions should've been called instead
- It's not a dynamic option so even SetDBOptions would fail
- It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU.

In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT.
Closes https://github.com/facebook/rocksdb/pull/3939

Differential Revision: D8238120

Pulled By: ajkr

fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279
2018-06-01 16:42:34 -07:00
Jacquin Mininger
727eb881a5 Compile error in db bench tool
Summary:
A small format error below causes the build to fail. I believe that this:
```
fprintf(stderr, "num reads to do %lu\n", reads_);
```
Can be changed to this:
```
fprintf(stderr, "num reads to do %" PRIu64 "\n", reads_);
```
Successful build
```
  CC       utilities/blob_db/blob_dump_tool.o
  AR       librocksdb_debug.a
ar: creating archive librocksdb_debug.a
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: librocksdb_debug.a(rocks_lua_compaction_filter.o) has no symbols
  CC       tools/db_bench.o
  CC       tools/db_bench_tool.o
tools/db_bench_tool.cc:4532:46: error: format specifies type 'unsigned long' but the argument has type 'int64_t' (aka 'long long') [-Werror,-Wformat]
    fprintf(stderr, "num reads to do %lu\n", reads_);
                                     ~~~     ^~~~~~
                                     %lld
1 error generated.
make: *** [tools/db_bench_tool.o] Error 1
```

```
$ cd rocksdb
$ make all

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```
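
As a standalone illustration of the portable-format idea, here is a minimal sketch. The helper name `FormatReads` is hypothetical. Since the compiler output above reports `reads_` as `int64_t`, the sketch uses `PRId64`, the signed counterpart of the `PRIu64` macro mentioned in this message; both come from `<cinttypes>`.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a signed 64-bit counter portably.
// PRId64 expands to the correct length modifier for int64_t on each
// platform, so the same source builds cleanly on Linux and macOS.
std::string FormatReads(int64_t reads) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "num reads to do %" PRId64 "\n", reads);
  return std::string(buf);
}
```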
Closes https://github.com/facebook/rocksdb/pull/3909

Differential Revision: D8215710

Pulled By: siying

fbshipit-source-id: 15e49fb02a818fec846e9f9b2a50e372b6b67751
2018-05-30 18:01:36 -07:00
Yi Wu
bc7e8d472e LRUCache midpoint insertion
Summary:
Implement a midpoint insertion strategy where new blocks are inserted into the middle of the LRU list, then moved to the head on the first cache hit.
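
A minimal sketch of the midpoint insertion idea, not the LRUCache implementation from this PR. It models the list as a "hot" segment followed by a "cold" segment, so the head of the cold segment is the midpoint. The class name `MidpointLruSketch`, the entry-count capacities, and the two-list layout are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical two-segment LRU: hot_ is the front half, cold_ the back half.
class MidpointLruSketch {
 public:
  MidpointLruSketch(size_t capacity, size_t hot_capacity)
      : capacity_(capacity), hot_capacity_(hot_capacity) {}

  // New keys enter at the head of the cold segment, i.e. the midpoint.
  void Insert(const std::string& key) {
    if (index_.count(key) != 0) return;
    if (index_.size() >= capacity_) EvictOne();
    cold_.push_front(key);
    index_[key] = {cold_.begin(), false};
  }

  // A hit moves the key to the head of the hot segment.
  bool Lookup(const std::string& key) {
    auto it = index_.find(key);
    if (it == index_.end()) return false;
    Slot& slot = it->second;
    (slot.hot ? hot_ : cold_).erase(slot.pos);
    hot_.push_front(key);
    slot = {hot_.begin(), true};
    // Keep the hot segment bounded by demoting its tail to the midpoint.
    if (hot_.size() > hot_capacity_) {
      const std::string demoted = hot_.back();
      hot_.pop_back();
      cold_.push_front(demoted);
      index_[demoted] = {cold_.begin(), false};
    }
    return true;
  }

  bool Contains(const std::string& key) const { return index_.count(key) != 0; }

 private:
  struct Slot {
    std::list<std::string>::iterator pos;
    bool hot;
  };

  // Evicts from the cold tail first, so entries that were never hit go first.
  void EvictOne() {
    std::list<std::string>& victims = cold_.empty() ? hot_ : cold_;
    if (victims.empty()) return;
    index_.erase(victims.back());
    victims.pop_back();
  }

  size_t capacity_;
  size_t hot_capacity_;
  std::list<std::string> hot_;
  std::list<std::string> cold_;
  std::unordered_map<std::string, Slot> index_;
};
```

In this sketch a block that is read once and never touched again only ever occupies the cold segment, so a long scan cannot push out blocks that have already been hit.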
Closes https://github.com/facebook/rocksdb/pull/3877

Differential Revision: D8100895

Pulled By: yiwu-arbug

fbshipit-source-id: f4bd83cb8be469e5d02072cfc8bd66011391f3da
2018-05-24 15:57:33 -07:00
Dmitri Smirnov
3db8504cde Catchup with posix features
Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exist
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
2018-05-24 15:13:04 -07:00
Andrew Kryczka
fcb31016e9 Avoid single-deleting merge operands in db_stress
Summary:
I repro'd some of the "unexpected value" failures showing up in our CI lately and they always happened on keys that have a mix of single deletes and merge operands. The `SingleDelete()` API comment mentions it's incompatible with `Merge()`, so this PR prevents `db_stress` from mixing them.
Closes https://github.com/facebook/rocksdb/pull/3878

Differential Revision: D8097346

Pulled By: ajkr

fbshipit-source-id: 357a48c6a31156f4f8db3ce565638ad924c437a1
2018-05-22 10:58:36 -07:00
Zhongyi Xie
c3ebc75843 Move prefix_extractor to MutableCFOptions
Summary:
Currently it is not possible to change the bloom filter config without restarting the DB, which causes a lot of operational complexity for users.
This PR aims to make it possible to dynamically change bloom filter config.
Closes https://github.com/facebook/rocksdb/pull/3601

Differential Revision: D7253114

Pulled By: miasantreble

fbshipit-source-id: f22595437d3e0b86c95918c484502de2ceca120c
2018-05-21 14:43:11 -07:00
Yanqin Jin
a0c7b4d526 Set the default value of max_manifest_file_size.
Summary:
In the past, the default value of max_manifest_file_size was uint64_t::MAX,
allowing a long running RocksDB process to grow its MANIFEST file to take up
the entire disk, as reported in [issue 3851](https://github.com/facebook/rocksdb/issues/3851). It is reasonable and common to provide a default non-max value for this option. Therefore, I set the value to 1GB.

siying miasantreble Please let me know whether this looks good to you. Thanks!
Closes https://github.com/facebook/rocksdb/pull/3867

Differential Revision: D8051524

Pulled By: riversand963

fbshipit-source-id: 50251f0804b1fa933a19a30d19d261ea8b9d2b72
2018-05-18 08:11:55 -07:00
Sagar Vemuri
ebb823f746 Fix db_stress build on mac
Summary:
I noticed, while debugging an unrelated issue, that db_stress is failing to build on mac, leading to a failed `make all`.
```
$ make db_stress -j4
...
tools/db_stress.cc:862:69: error: cannot initialize a parameter of type 'uint64_t *' (aka 'unsigned long long *') with an rvalue of type 'size_t *' (aka 'unsigned long *')
        status = FLAGS_env->GetFileSize(FLAGS_expected_values_path, &size);
                                                                    ^~~~~
./include/rocksdb/env.h:277:66: note: passing argument to parameter 'file_size' here
  virtual Status GetFileSize(const std::string& fname, uint64_t* file_size) = 0;
                                                                 ^
1 error generated.
make: *** [tools/db_stress.o] Error 1
make: *** Waiting for unfinished jobs....
```
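
A minimal sketch of one way to resolve this kind of mismatch, not necessarily the change made in this PR: declare the variable with the exact type the API expects, then narrow explicitly. `FakeGetFileSize` is a hypothetical stand-in for a function such as `GetFileSize` that reports the size through a `uint64_t*`; `GetSizeAsSizeT` is likewise an illustrative name.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for an API that writes a size through uint64_t*.
bool FakeGetFileSize(uint64_t* file_size) {
  *file_size = 4096;
  return true;
}

// Reads the size into a uint64_t, matching the parameter type exactly,
// then narrows to size_t only after a range check.
bool GetSizeAsSizeT(size_t* out) {
  uint64_t size = 0;
  if (!FakeGetFileSize(&size)) return false;
  if (size > std::numeric_limits<size_t>::max()) return false;
  *out = static_cast<size_t>(size);
  return true;
}
```

On macOS `uint64_t` is `unsigned long long` while `size_t` is `unsigned long`, which is why passing a `size_t*` compiles on some platforms and fails on others.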
Closes https://github.com/facebook/rocksdb/pull/3839

Differential Revision: D7979236

Pulled By: sagar0

fbshipit-source-id: 0615e7bb5405bade71e4203803bf723720422d62
2018-05-14 11:14:07 -07:00
Andrew Kryczka
072ae671a7 Apply use_direct_io_for_flush_and_compaction to writes only
Summary:
Previously `DBOptions::use_direct_io_for_flush_and_compaction=true` combined with `DBOptions::use_direct_reads=false` could cause RocksDB to simultaneously read from two file descriptors for the same file, where background reads used direct I/O and foreground reads used buffered I/O. Our measurements found this mixed-mode I/O negatively impacted foreground read perf, compared to when only buffered I/O was used.

This PR makes the mixed-mode I/O situation impossible by repurposing `DBOptions::use_direct_io_for_flush_and_compaction` to only apply to background writes, and `DBOptions::use_direct_reads` to apply to all reads. There is no risk of direct background writes happening simultaneously with buffered reads since we never read from and write to the same file simultaneously.
Closes https://github.com/facebook/rocksdb/pull/3829

Differential Revision: D7915443

Pulled By: ajkr

fbshipit-source-id: 78bcbf276449b7e7766ab6b0db246f789fb1b279
2018-05-09 19:42:58 -07:00
Andrew Kryczka
d19f568abf Refactor argument handling in db_crashtest.py
Summary:
- Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
- Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
- Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
- Apply the simple option maps in addition to the regular option maps. Previously they were exclusive, which led to lots of duplication
Closes https://github.com/facebook/rocksdb/pull/3809

Differential Revision: D7885779

Pulled By: ajkr

fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
2018-05-09 13:42:41 -07:00
Andrew Kryczka
4c5a3232e4 Fix db_stress memory leak ASAN error
Summary:
In case `--expected_values_path` is unset, we allocate a buffer internally to hold the expected DB state. This PR makes sure it is freed.
Closes https://github.com/facebook/rocksdb/pull/3804

Differential Revision: D7874694

Pulled By: ajkr

fbshipit-source-id: a8f7655e009507c4e639ceebfc3525d69c856e3b
2018-05-04 16:45:15 -07:00
Zhongyi Xie
a703432808 MaxFileSizeForLevel: adjust max_file_size for dynamic level compaction
Summary:
`MutableCFOptions::RefreshDerivedOptions` always assumes the base level is L1, which is not true when `level_compaction_dynamic_level_bytes=true` and level-based compaction is used.
This PR fixes this by recomputing `max_file_size` at query time (in `MaxFileSizeForLevel`)
Fixes https://github.com/facebook/rocksdb/issues/3229
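
The effect of the base-level assumption can be illustrated with a simplified model. This is not RocksDB's actual `MaxFileSizeForLevel`; the function name `MaxFileSizeForLevelSketch` and the rule "multiply once per level above the base level" are assumptions made purely for illustration.

```cpp
#include <cstdint>

// Simplified, hypothetical model: the target file size starts at
// target_file_size_base on the base level and is multiplied once for
// every level above it. Levels at or below the base level use the base size.
uint64_t MaxFileSizeForLevelSketch(int level, int base_level,
                                   uint64_t target_file_size_base,
                                   uint64_t target_file_size_multiplier) {
  uint64_t size = target_file_size_base;
  for (int l = base_level; l < level; ++l) {
    size *= target_file_size_multiplier;
  }
  return size;
}
```

Under this model, with `target_file_size_base=2097152` and `target_file_size_multiplier=2`, assuming a base level of L1 gives L6 a 64 MB target, whereas a dynamic base level of L5 gives L6 a 4 MB target. Computing the value at query time lets the current base level be taken into account.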

In master:

```
Level Files Size(MB)
--------------------
  0       14      846
  1        0        0
  2        0        0
  3        0        0
  4        0        0
  5       15      366
  6       11      481
Cumulative compaction: 3.83 GB write, 2.27 GB read
```
In branch:
```
Level Files Size(MB)
--------------------
  0        9      544
  1        0        0
  2        0        0
  3        0        0
  4        0        0
  5        0        0
  6      445      935
Cumulative compaction: 2.91 GB write, 1.46 GB read
```

db_bench command used:
```
./db_bench --benchmarks="fillrandom,deleterandom,fillrandom,levelstats,stats" --statistics -deletes=5000 -db=tmp -compression_type=none --num=20000 -value_size=100000 -level_compaction_dynamic_level_bytes=true -target_file_size_base=2097152 -target_file_size_multiplier=2
```
Closes https://github.com/facebook/rocksdb/pull/3755

Differential Revision: D7721381

Pulled By: miasantreble

fbshipit-source-id: 39afb8503190bac3b466adf9bbf2a9b3655789f8
2018-05-03 16:42:13 -07:00
Dmitri Smirnov
acb61b7a52 Adjust pread/pwrite to return Status
Summary:
Returning bytes_read causes the caller to call GetLastError()
  to report failure but the lasterror may be overwritten by then
  so we lose the error code.
  Fix up CMake file to include xpress source code only when needed.
  Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795

Differential Revision: D7832935

Pulled By: anand1976

fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
2018-05-01 13:42:46 -07:00
Andrew Kryczka
46152d53bf Second attempt at db_stress crash-recovery verification
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e

This PR includes the contents of the original commit plus two bug fixes, which are:

- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793

Differential Revision: D7811671

Pulled By: ajkr

fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
2018-04-30 12:27:34 -07:00
Andrew Kryczka
6afe22db2e revert db_stress crash-recovery verification
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786

Differential Revision: D7794516

Pulled By: ajkr

fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
2018-04-27 12:57:01 -07:00
Zhongyi Xie
459bb9028f remove prefixscanrandom from db_bench help
Summary:
fix issue reported in https://github.com/facebook/rocksdb/issues/3757
Closes https://github.com/facebook/rocksdb/pull/3784

Differential Revision: D7794107

Pulled By: miasantreble

fbshipit-source-id: 43535074fcb82adb5656bcb916284b2dfc5cbb64
2018-04-27 12:13:19 -07:00
Andrew Kryczka
db36f222d8 Allow options file in db_stress and db_crashtest
Summary:
- When options file is provided to db_stress, take supported options from the file instead of from flags
- Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file
- Allow options filename to be passed via `db_crashtest.py`
Closes https://github.com/facebook/rocksdb/pull/3768

Differential Revision: D7755331

Pulled By: ajkr

fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a
2018-04-26 18:42:07 -07:00
Andrew Kryczka
a4fb1f8c04 Add crash-recovery correctness check to db_stress
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_path`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_path` file must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Strictly speaking, this does not fully guarantee what we want, as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.
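
A minimal sketch of how that assumption could be checked, not the code added by this PR. The compile-time check uses `std::atomic<uint32_t>::is_always_lock_free`, which requires C++17, so it is guarded by a language-version test. The helper names `StoreExpected` and `LoadExpected` are hypothetical.

```cpp
#include <atomic>
#include <cstdint>

#if __cplusplus >= 201703L
// Fails the build if a 32-bit atomic would be implemented with a lock,
// in which case a store could be torn by a process kill.
static_assert(std::atomic<uint32_t>::is_always_lock_free,
              "32-bit expected-value slots must be lock-free");
#endif

// Hypothetical helper: writes one expected-value slot as a single atomic store.
inline void StoreExpected(std::atomic<uint32_t>* slot, uint32_t value) {
  slot->store(value, std::memory_order_relaxed);
}

// Hypothetical helper: reads one expected-value slot.
inline uint32_t LoadExpected(const std::atomic<uint32_t>* slot) {
  return slot->load(std::memory_order_relaxed);
}
```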

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
2018-04-24 15:58:22 -07:00
Gabriel Wicke
090c78a0d7 Support lowering CPU priority of background threads
Summary:
Background activities like compaction can negatively affect
latency of higher-priority tasks like request processing. To avoid this,
rocksdb already lowers the IO priority of background threads on Linux
systems. While this takes care of typical IO-bound systems, it does not
help much when CPU (temporarily) becomes the bottleneck. This is
especially likely when using more expensive compression settings.

This patch adds an API to allow for lowering the CPU priority of
background threads, modeled on the IO priority API. Benchmarks (see
below) show significant latency and throughput improvements when CPU
bound. As a result, workloads with some CPU usage bursts should benefit
from lower latencies at a given utilization, or should be able to push
utilization higher at a given request latency target.

A useful side effect is that compaction CPU usage is now easily visible
in common tools, allowing for an easier estimation of the contribution
of compaction vs. request processing threads.

As with IO priority, the implementation is limited to Linux, degrading
to a no-op on other systems.
Closes https://github.com/facebook/rocksdb/pull/3763

Differential Revision: D7740096

Pulled By: gwicke

fbshipit-source-id: e5d32373e8dc403a7b0c2227023f9ce4f22b413c
2018-04-24 08:41:51 -07:00
Zhongyi Xie
8a9c7f71c9 fix compilation error: implicit conversion loses integer precision
Summary:
Fix compilation error with clang:
> tools/db_stress.cc:2598:21: error: implicit conversion loses integer precision: 'gflags::uint64' (aka 'unsigned long') to 'uint32_t' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32]
        Random rand(FLAGS_seed);
               ~~~~ ^~~~~~~~~~
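
A minimal sketch of the usual remedy for `-Wshorten-64-to-32`, assuming the intent is to keep only the low 32 bits of the seed; the helper name `NarrowSeed` is hypothetical and this is not necessarily the exact change made here.

```cpp
#include <cstdint>

// Hypothetical helper: makes the 64-bit to 32-bit narrowing explicit.
// Unsigned narrowing is well defined and keeps the low 32 bits.
uint32_t NarrowSeed(uint64_t seed_flag) {
  return static_cast<uint32_t>(seed_flag);
}
```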
Closes https://github.com/facebook/rocksdb/pull/3746

Differential Revision: D7703209

Pulled By: miasantreble

fbshipit-source-id: 18c56a5138a2f308e4213594bc82e8e64bc21570
2018-04-19 18:57:43 -07:00
Maysam Yabandeh
6d06be22c0 Improve db_stress with transactions
Summary:
db_stress was already capable of running transactions by setting use_txn. Running it under stress showed a couple of problems, which are fixed in this patch.
- The uncommitted transaction must be either rolled back or committed after recovery.
- Current implementation of WritePrepared transaction cannot handle cf drop before crash. Clarified that in the comments and added safety checks. When running with use_txn, clear_column_family_one_in must be set to 0.
Closes https://github.com/facebook/rocksdb/pull/3733

Differential Revision: D7654419

Pulled By: maysamyabandeh

fbshipit-source-id: a024bad80a9dc99677398c00d29ff17d4436b7f3
2018-04-18 16:32:35 -07:00
Yanqin Jin
2ee1496c43 Add missing whitespace.
Summary: Closes https://github.com/facebook/rocksdb/pull/3729

Differential Revision: D7645465

Pulled By: riversand963

fbshipit-source-id: a64da0960fe6c39847ef848b8888fe9a9c1df25d
2018-04-17 09:57:40 -07:00
Yi Wu
2c2f388897 db_bench fillXXXdeterministic should respect compression type
Summary:
db_bench fillXXXdeterministic should respect compression type when calling CompactFiles().
Closes https://github.com/facebook/rocksdb/pull/3731

Differential Revision: D7647761

Pulled By: yiwu-arbug

fbshipit-source-id: 15e12429e0dd93ece2231b015f2e26c2d94781e6
2018-04-16 18:01:47 -07:00
Zhongyi Xie
af95aecd01 use delete[] to dealloc an array
Summary:
fix a bug in `db_stress` where an int array was incorrectly deallocated using delete instead of delete[]
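
A minimal sketch of the rule, assuming nothing about the actual `db_stress` code: memory obtained with `new[]` must be released with `delete[]`. Holding the array in `std::unique_ptr<int[]>` makes the matching `delete[]` automatic. The function name `SumOfSquares` is purely illustrative.

```cpp
#include <cstddef>
#include <memory>

// Allocates an int array with new[] and lets unique_ptr<int[]> release it.
// The array specialization calls delete[], never plain delete.
int SumOfSquares(size_t n) {
  std::unique_ptr<int[]> values(new int[n]);
  for (size_t i = 0; i < n; ++i) {
    values[i] = static_cast<int>(i * i);
  }
  int total = 0;
  for (size_t i = 0; i < n; ++i) {
    total += values[i];
  }
  return total;
}
```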
Closes https://github.com/facebook/rocksdb/pull/3725

Differential Revision: D7634749

Pulled By: miasantreble

fbshipit-source-id: 489b776f5f4c03de1824edac5495787ec19cc910
2018-04-15 23:56:39 -07:00
Zhongyi Xie
954b496b3f fix memory leak in two_level_iterator
Summary:
this PR fixes a few contbuild failures:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
2018-04-15 17:26:26 -07:00
Amy Tai
28087acd79 Implemented Knuth shuffle to construct permutation for selecting no_o…
Summary:
…verwrite_keys. Also changed each no_overwrite_key set to an unordered set, otherwise Knuth shuffle only gets you 2x time improvement, because insertion (and subsequent internal sorting) into an ordered set is the bottleneck.
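
A minimal sketch of the approach described above, not the `db_stress` code itself: a partial Knuth (Fisher-Yates) shuffle selects distinct keys, which are collected in an unordered set to avoid the cost of ordered insertion. The function name `PickNoOverwriteKeys`, its parameters, and the use of `std::mt19937` are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical helper: picks `count` distinct keys from [0, max_key)
// by shuffling only the first `count` positions of the permutation.
std::unordered_set<int64_t> PickNoOverwriteKeys(int64_t max_key, size_t count,
                                                uint32_t seed) {
  std::vector<int64_t> permutation(static_cast<size_t>(max_key));
  std::iota(permutation.begin(), permutation.end(), int64_t{0});
  std::mt19937 rng(seed);
  std::unordered_set<int64_t> chosen;
  chosen.reserve(count);
  for (size_t i = 0; i < count && i < permutation.size(); ++i) {
    // Swap position i with a uniformly chosen position in [i, n - 1].
    std::uniform_int_distribution<size_t> pick(i, permutation.size() - 1);
    std::swap(permutation[i], permutation[pick(rng)]);
    chosen.insert(permutation[i]);
  }
  return chosen;
}
```

Each swap takes constant time and each insertion is constant time on average, so selecting `count` keys costs time linear in `max_key` (for building the permutation) plus `count`.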

With this change, each iteration of permutation construction and prefix selection takes around 40 secs, as opposed to 360 secs previously. However, this still means that with the default 10 CF per blackbox test case, the test is going to time out given the default interval of 200 secs.

Also, there is currently an assertion error affecting all blackbox tests in db_crashtest.py; this assertion error will be fixed in a future PR.
Closes https://github.com/facebook/rocksdb/pull/3699

Differential Revision: D7624616

Pulled By: amytai

fbshipit-source-id: ea64fbe83407ff96c1c0ecabbc6c830576939393
2018-04-13 22:13:13 -07:00
David Lai
3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments, which allows us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00
Maysam Yabandeh
eb5a295440 WritePrepared Txn: add write_committed option to dump_wal
Summary:
Currently dump_wal cannot print the prepared records from a WAL generated by the WRITE_PREPARED write policy, since the handler's default reaction is to return NotSupported when it encounters WRITE_PREPARED markers. This patch lets the admin pass the --write_committed=false option, which is then passed on to the handler. Note that DBFileDumperCommand and DBDumperCommand are not yet updated by this patch: they are not urgent, and the approach will need revisiting when WRITE_UNPREPARED markers are added, so I leave that for future work.

Tested by running it on a WAL generated by WRITE_PREPARED:
$ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log  | grep BEGIN_PREARE | head -1
1,2,70,0,BEGIN_PREARE
$ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log --write_committed=false | grep BEGIN_PREARE | head -1
1,2,70,0,BEGIN_PREARE PUT(0) : 0x30303031313330313938 PUT(0) : 0x30303032353732313935 END_PREPARE(0x74786E31313535383434323738303738363938313335312D30)
Closes https://github.com/facebook/rocksdb/pull/3682

Differential Revision: D7522090

Pulled By: maysamyabandeh

fbshipit-source-id: a0332207261c61e18b2f9dfbe9feecd9a1339aca
2018-04-07 21:56:42 -07:00
Phani Shekhar Mantripragada
446b32cfc3 Support for Column family specific paths.
Summary:
In this change, an option to set different paths for different column families is added.
This option is set via the cf_paths setting of ColumnFamilyOptions and works in a similar fashion to the db_paths setting. cf_paths is a vector of DbPath values, each of which contains an absolute path and a target size. Multiple levels in a column family can go to different paths if cf_paths has more than one path.
To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path.

Changes :
1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions.  This member is used to identify the path information whenever files are accessed.
2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
4) Unit tests are added appropriately.
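
The backward-compatible fallback order described above (cf_paths, then db_paths, then db_name) can be sketched as below. This is an illustrative Python model, not the RocksDB implementation: `effective_paths` is a hypothetical name, and using `None` as the target size of the db_name fallback is a placeholder assumption.

```python
def effective_paths(cf_paths, db_paths, db_name):
    """Pick the list of (path, target_size) pairs a column family would use.

    Hypothetical helper modelling the fallback order from the summary.
    """
    if cf_paths:
        # The column family has its own paths configured.
        return list(cf_paths)
    if db_paths:
        # No cf_paths: fall back to the DB-wide paths.
        return list(db_paths)
    # Neither is set: db_name is the only path. The target size here is a
    # placeholder, since the summary does not state one.
    return [(db_name, None)]

chosen = effective_paths([], [("/data/db", 10 << 30)], "/data/default")
```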
Closes https://github.com/facebook/rocksdb/pull/3102

Differential Revision: D6951697

Pulled By: ajkr

fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
2018-04-05 19:58:20 -07:00
Yi Wu
36a9f22931 Blob DB: blob_dump to show uncompressed values
Summary:
Make blob_dump tool able to show uncompressed values if the blob file is compressed. Also show total compressed vs. raw size at the end if --show_summary is provided.
Closes https://github.com/facebook/rocksdb/pull/3633

Differential Revision: D7348926

Pulled By: yiwu-arbug

fbshipit-source-id: ca709cb4ed5cf6a550ff2987df8033df81516f8e
2018-04-05 11:12:16 -07:00
Andrew Kryczka
b058a33705 Reduce default --nooverwritepercent in black-box crash tests
Summary:
Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`.

Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds.
Closes https://github.com/facebook/rocksdb/pull/3671

Differential Revision: D7457732

Pulled By: ajkr

fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0
2018-04-03 15:28:40 -07:00
Anand Ananthabhotla
f9f4d40f93 Align SST file data blocks to avoid spanning multiple pages
Summary:
Provide a block_align option in BlockBasedTableOptions to allow
alignment of SST file data blocks. This avoids the higher
IOPS/throughput load caused by data blocks smaller than 4KB spanning two 4KB pages.
When this option is set to true, the block alignment is set to the lower of
block size and 4KB.
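
To make the alignment rule concrete, here is a small Python sketch of the arithmetic. The function names and the padding formula are assumptions for illustration; the summary only states that the alignment is the lower of block size and 4KB.

```python
PAGE_SIZE = 4096  # 4KB page, as in the summary

def block_alignment(block_size):
    """Alignment used when block_align is true: lower of block size and 4KB."""
    return min(block_size, PAGE_SIZE)

def padding_needed(offset, alignment):
    """Bytes to skip so the next block starts on an alignment boundary (assumed scheme)."""
    return (-offset) % alignment

def spans_two_pages(offset, size):
    """True if a block of `size` bytes at `offset` touches more than one page."""
    return offset // PAGE_SIZE != (offset + size - 1) // PAGE_SIZE

pad = padding_needed(5000, block_alignment(16384))
```

For example, a 2000-byte block written at offset 3000 straddles the page boundary at 4096 and costs two page reads, while the same block written at offset 4096 costs one.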
Closes https://github.com/facebook/rocksdb/pull/3502

Differential Revision: D7400897

Pulled By: anand1976

fbshipit-source-id: 04cc3bd144e88e3431a4f97604e63ad7a0f06d44
2018-03-26 20:26:10 -07:00
Sagar Vemuri
a993c0139d Add 5.11 and 5.12 to tools/check_format_compatible.sh
Summary: Closes https://github.com/facebook/rocksdb/pull/3646

Differential Revision: D7384727

Pulled By: sagar0

fbshipit-source-id: f713af7adb2ffea5303bbf0fac8a8a1630af7b38
2018-03-23 12:43:06 -07:00
Siying Dong
6383e42362 benchmark.sh to use --max_background_job
Summary: Closes https://github.com/facebook/rocksdb/pull/3632

Differential Revision: D7347012

Pulled By: siying

fbshipit-source-id: 46230ec4a917ccf4c478825b07e92b4665a4820b
2018-03-20 18:57:55 -07:00
Bruce Mitchener
a3a3f5497c Fix some typos in comments and docs.
Summary: Closes https://github.com/facebook/rocksdb/pull/3568

Differential Revision: D7170953

Pulled By: siying

fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c
2018-03-08 10:27:25 -08:00
Yi Wu
b864bc9b5b Blob DB: Improve FIFO eviction
Summary:
Improving blob db FIFO eviction with the following changes,
* Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size.
* FIFO now only takes into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion.
* FIFO eviction now also evicts TTL blob files that are still open. It doesn't evict non-TTL blob files.
* If FIFO is triggered, it will pass an expiration and the current sequence number to the compaction filter. The compaction filter will then filter inlined keys, evicting those with an earlier expiration and a smaller sequence number. We call this LSM FIFO.
* The compaction filter also filters out blob indexes whose corresponding blob file is gone.
* Add an event listener to listen for compaction/flush events and update sst file size.
* Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db.
* More blob db statistics around FIFO.
* Fix some locking issues when accessing a blob file.
Closes https://github.com/facebook/rocksdb/pull/3556

Differential Revision: D7139328

Pulled By: yiwu-arbug

fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa
2018-03-06 11:57:42 -08:00
Pooya Shareghi
0a2354ca8f Added bytes XOR merge operator
Summary:
Closes https://github.com/facebook/rocksdb/pull/575

I fixed the merge conflicts etc.
Closes https://github.com/facebook/rocksdb/pull/3065

Differential Revision: D7128233

Pulled By: sagar0

fbshipit-source-id: 2c23a48c9f0432c290b0cd16a12fb691bb37820c
2018-03-06 10:27:36 -08:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Maysam Yabandeh
d060421c77 Fix a leak in prepared_section_completed_
Summary:
The zeroed entries were not removed from the prepared_section_completed_ map. This patch adds a unit test to show the problem and fixes it by refactoring the code. The new code is more efficient since i) it uses two separate mutexes to avoid contention between commit and prepare threads, and ii) it uses a sorted vector to maintain unique log entries with prepare, which avoids a very large heap with many duplicate entries.
Closes https://github.com/facebook/rocksdb/pull/3545

Differential Revision: D7106071

Pulled By: maysamyabandeh

fbshipit-source-id: b3ae17cb6cd37ef10b6b35e0086c15c758768a48
2018-03-01 20:41:56 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 [codemod] - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Andrew Kryczka
1960e73e21 fix handling of empty string as checkpoint directory
Summary:
- made `CreateCheckpoint` properly return `InvalidArgument` when called with an empty directory. Previously it triggered an assertion failure due to a bug in the logic.
- made `ldb` set empty `checkpoint_dir` if that's what the user specifies, so that we can use it to properly test `CreateCheckpoint` in the future.

Differential Revision: D6874562

fbshipit-source-id: dcc1bd41768261d9338987fa7711444289707ed7
2018-02-20 16:44:00 -08:00
Yi Wu
989d12313c Legocastle job to report lite build binary size to scuba
Summary:
Add a legocastle job to continuously build the last 10 commits every 4 hours and report lite build binary size to scuba.
Closes https://github.com/facebook/rocksdb/pull/3511

Differential Revision: D7001730

Pulled By: yiwu-arbug

fbshipit-source-id: 7c8ca87c46d663c786a0d32be69ebbe7b19a5eb9
2018-02-15 17:27:24 -08:00
Andrew Kryczka
0a0fad447b db_bench separate options for partition index and filters
Summary:
Some workloads (like my current benchmarking) may want partitioned indexes without partitioned filters. Particularly, when `-optimize_filters_for_hits=true`, the total index size may be larger than the total filter size, so it can make sense to hold all filters in-memory but not all indexes.
Closes https://github.com/facebook/rocksdb/pull/3492

Differential Revision: D6970092

Pulled By: ajkr

fbshipit-source-id: b7fa1828e1d13829339aefb90fd56eb7c5337f61
2018-02-12 14:57:13 -08:00
Chinmay Kamat
9fc72d6f16 Compilation fixes for powerpc build, -Wparentheses-equality error and missing header guards
Summary:
This pull request contains miscellaneous compilation fixes.

Thanks,
Chinmay
Closes https://github.com/facebook/rocksdb/pull/3462

Differential Revision: D6941424

Pulled By: sagar0

fbshipit-source-id: fe9c26507bf131221f2466740204bff40a15614a
2018-02-09 14:12:43 -08:00
Tamir Duberstein
cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Siying Dong
e2d4b0efb1 db_bench: sanity check CuckooTable with mmap_read option
Summary:
This is to avoid run time error. Fail the db_bench immediately if cuckoo table is used but mmap_read is not specified.
Closes https://github.com/facebook/rocksdb/pull/3420

Differential Revision: D6838284

Pulled By: siying

fbshipit-source-id: 20893fa28d40fadc31e4ff154bed02f5a1bad341
2018-01-29 14:27:32 -08:00
Mark Isaacson
b8eb32f8cf Suppress lint in old files
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.

Reviewed By: yiwu-arbug

Differential Revision: D6821806

fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
2018-01-29 12:56:42 -08:00
Andrew Kryczka
9f7ccc8445 fix db_bench filluniquerandom key count assertion
Summary:
It failed every time. I guess people usually ran with assertions disabled.
Closes https://github.com/facebook/rocksdb/pull/3422

Differential Revision: D6822984

Pulled By: ajkr

fbshipit-source-id: 2e90db75618b26ac1c46ddfa9e03c095c7bf16e3
2018-01-29 11:43:21 -08:00
Andrew Kryczka
0e6e405fec db_bench support for memtable in-place update
Summary: Closes https://github.com/facebook/rocksdb/pull/3416

Differential Revision: D6820606

Pulled By: ajkr

fbshipit-source-id: 5035ffb33ade8d50520cafeb685ee8c8fcf1cca8
2018-01-26 10:57:49 -08:00
Siying Dong
47ad6b81ff Add 5.10.fb to tools/check_format_compatible.sh
Summary: Closes https://github.com/facebook/rocksdb/pull/3383

Differential Revision: D6762375

Pulled By: siying

fbshipit-source-id: dc1e0dc9718ffb59ffe42e2a2c844b67f935a5fb
2018-01-19 12:42:07 -08:00
Anand Ananthabhotla
199405192d Add a BlockBasedTableOption to turn off index block compression.
Summary:
Add a new bool option index_uncompressed in BlockBasedTableOptions.
Closes https://github.com/facebook/rocksdb/pull/3303

Differential Revision: D6686161

Pulled By: anand1976

fbshipit-source-id: 748b46993d48a01e5f89b6bd3e41f06a59ec6054
2018-01-10 15:11:59 -08:00
Yi Wu
46ec52499e Fix db_bench write being disabled in lite build
Summary:
The macro was added by mistake in #2372
Closes https://github.com/facebook/rocksdb/pull/3343

Differential Revision: D6681356

Pulled By: yiwu-arbug

fbshipit-source-id: 4180172fb0eaef4189c07f219241e0c261c03461
2018-01-09 10:57:29 -08:00
Maysam Yabandeh
00b33c2474 WritePrepared Txn: address some pending TODOs
Summary:
This patch addresses a couple of minor TODOs for WritePrepared Txn, such as double-checking some assert statements at runtime as well, skipping the extra AddPrepared in non-2pc transactions, and adding a safety check for infinite loops.
Closes https://github.com/facebook/rocksdb/pull/3302

Differential Revision: D6617002

Pulled By: maysamyabandeh

fbshipit-source-id: ef6673c139cb49f64c0879508d2f573b78609aca
2018-01-09 08:57:20 -08:00
yingsu00
f54d7f5fea Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**

RocksDB uses SSE crc32 intrinsics to calculate crc32 values, but it does so in a single-way fashion (not pipelined on a single CPU core). Intel's whitepaper describes an algorithm that uses 3-way pipelining for the crc32 intrinsics and then uses the pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead of its own, this algorithm shows perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB calls crc32c on are over 4KB. Initial db_bench runs show a tremendous CPU gain.

This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag, NO_THREEWAY_CRC32C. If the user compiles the code with NO_THREEWAY_CRC32C=1, the old SSE Crc32c algorithm is used. If the server does not have SSE4.2 at run time, the slow (non-SSE) path is used.
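
For readers unfamiliar with the checksum itself, the following is a bitwise software reference for CRC32C (the Castagnoli polynomial) in Python. It is not RocksDB's code and it is not the 3-way algorithm: it only shows the value that every implementation, slow or hardware-accelerated, has to agree on. The 3-way variant splits a buffer into three streams and combines the partial results, which is not shown here.

```python
CRC32C_POLY_REFLECTED = 0x82F63B78  # Castagnoli polynomial, bit-reflected

def crc32c(data, crc=0):
    """Bit-at-a-time CRC32C; pass a previous result as `crc` to extend it."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; fold in the polynomial if that bit was set.
            crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

check = crc32c(b"123456789")  # standard CRC32C check value is 0xE3069283
```

Because the function accepts a running value, a buffer can be checksummed in consecutive chunks, which is a convenient way to cross-check a faster implementation against this reference.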

**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.

Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.

1) ReadRandom in db_bench overall metrics

    PER RUN
    Algorithm  | run | micros/op | ops/sec | Throughput (MB/s)
    3-way      | 1   | 4.143     | 241387  | 26.7
    3-way      | 2   | 3.775     | 264872  | 29.3
    3-way      | 3   | 4.116     | 242929  | 26.9
    FastCrc32c | 1   | 4.037     | 247727  | 27.4
    FastCrc32c | 2   | 4.648     | 215166  | 23.8
    FastCrc32c | 3   | 4.352     | 229799  | 25.4

    AVG
    Algorithm  | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
    3-way      | 4.01                 | 249,729            | 27.63
    FastCrc32c | 4.35                 | 230,897            | 25.53

 2)   Crc32c computation CPU cost (inclusive samples percentage)
    PER RUN
    Implementation | run | TotalSamples  | Crc32c percentage
    3-way          | 1   | 4,572,250,000 | 4.37%
    3-way          | 2   | 3,779,250,000 | 4.62%
    3-way          | 3   | 4,129,500,000 | 4.48%
    FastCrc32c     | 1   | 4,663,500,000 | 11.24%
    FastCrc32c     | 2   | 4,047,500,000 | 12.34%
    FastCrc32c     | 3   | 4,366,750,000 | 11.68%

 **# Test Plan**
     make -j64 corruption_test && ./corruption_test
      By default it uses 3-way SSE algorithm

     NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

    make clean && DEBUG_LEVEL=0 make -j64 db_bench
    make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173

Differential Revision: D6330882

Pulled By: yingsu00

fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-19 18:26:49 -08:00
Maysam Yabandeh
95583e1532 db_stress: skip snapshot check if cf is dropped
Summary:
We added a new verification that ensures the value read from a snapshot when it is released is the same as when the snapshot was created. This test, however, fails when the cf is dropped in between. The patch skips the test in that case.
Closes https://github.com/facebook/rocksdb/pull/3279

Differential Revision: D6581584

Pulled By: maysamyabandeh

fbshipit-source-id: afe37d371c0f91818d2e279b3949b810e112e8eb
2017-12-15 16:28:04 -08:00
Maysam Yabandeh
cd2e5cae7f WritePrepared Txn: make db_stress transactional
Summary:
Add "--use_txn" option to use the transactional API in db_stress, with the default being the WRITE_PREPARED policy, which is the main intention of modifying db_stress. It also extends the existing snapshots to verify that, before a snapshot is released, a read from it returns the same value as before.
Closes https://github.com/facebook/rocksdb/pull/3243

Differential Revision: D6556912

Pulled By: maysamyabandeh

fbshipit-source-id: 1ae31465be362d44bd06e635e2e9e49a1da11268
2017-12-13 11:57:29 -08:00
Yi Wu
e1c569c324 Fix clang-analyzer false-positive on ldb_cmd.cc
Summary:
clang-analyzer complains about db_ being nullptr, but it cannot be, because the code checks exec_stats before proceeding. Add an assert to get around the false positive.

Test Plan
`make analyze`
Closes https://github.com/facebook/rocksdb/pull/3236

Differential Revision: D6505417

Pulled By: yiwu-arbug

fbshipit-source-id: e5b65764ea994dd9e4bab3e697b97dc70dc22cab
2017-12-06 22:58:46 -08:00
Sagar Vemuri
d51fcb21f4 Blob DB: Add db_bench options
Summary:
Adding more BlobDB db_bench options which are needed for benchmarking.
Closes https://github.com/facebook/rocksdb/pull/3230

Differential Revision: D6500711

Pulled By: sagar0

fbshipit-source-id: 91d63122905854ef7c9148a0235568719146e6c5
2017-12-06 20:44:12 -08:00
Yi Wu
7f04af32a5 ldb to allow db with --try_load_options and without an options file
Summary:
This is to fix tools/check_format_compatible.sh. The tool tries to open
old versions of rocksdb with the provided options file. When options
file is missing (e.g. rocksdb 2.2), it should still proceed with default
options.
Closes https://github.com/facebook/rocksdb/pull/3232

Differential Revision: D6503955

Pulled By: yiwu-arbug

fbshipit-source-id: e44cfcce7ddc7d12cf83466ed3f3fe7624aa78b8
2017-12-06 16:42:26 -08:00
Yi Wu
b5798bd324 Add missing recent versions to format compatible test
Summary:
Add recent versions for format compatible test. We should probably update the script to auto include available versions (by looking at include/rocksdb/versions.h and deduce branch names), but we can do it later.
Closes https://github.com/facebook/rocksdb/pull/3233

Differential Revision: D6503631

Pulled By: yiwu-arbug

fbshipit-source-id: e2b01d1ef6e784ff6ffa1bd75d741755e3c69a8c
2017-12-06 16:13:50 -08:00
Andrew Kryczka
63f1c0a57d fix gflags namespace
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212

Differential Revision: D6456973

Pulled By: ajkr

fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
2017-12-01 10:42:05 -08:00
Maysam Yabandeh
18dcf7f98d WritePrepared Txn: PreReleaseCallback
Summary:
Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePrepareTxn to i) update the commit map, ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits will go to the 2nd queue.
These changes will ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default), the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
Closes https://github.com/facebook/rocksdb/pull/3205

Differential Revision: D6438959

Pulled By: maysamyabandeh

fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
2017-11-30 23:50:45 -08:00
Andrew Kryczka
ed3af9ef99 improve ldb CLI option support
Summary:
- Made CLI arguments take precedence over options file when both are provided. Note some of the CLI args are not settable via options file, like `--compression_max_dict_bytes`, so it's necessary to allow both ways of providing options simultaneously.
- Changed `PrepareOptionsForOpenDB` to update the proper `ColumnFamilyOptions` if one exists for the user's `--column_family_name` argument. I supported this only in the base class, `LDBCommand`, so it works for the general arguments. Will defer adding support for subcommand-specific arguments.
- Made the command fail if `--try_load_options` is provided and loading options file returns NotFound. I found the previous behavior of silently continuing confusing.
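
The precedence and failure rules listed above can be modelled with a short Python sketch. It is not the `ldb` implementation: `resolve_options`, its dictionary-based options, and the use of `FileNotFoundError` to stand in for a NotFound status are all assumptions made for illustration.

```python
def resolve_options(file_options, cli_args, try_load_options):
    """Merge options so that CLI arguments win over the options file.

    `file_options` is None when no options file could be loaded
    (hypothetical convention for this sketch).
    """
    if try_load_options and file_options is None:
        # Mirrors the described behavior: fail instead of silently continuing.
        raise FileNotFoundError("options file not found")
    merged = dict(file_options or {})
    # CLI arguments are applied last, so they take precedence.
    merged.update(cli_args)
    return merged

opts = resolve_options(
    {"compression": "snappy", "block_size": 4096},
    {"compression": "zstd"},
    try_load_options=True,
)
```

Applying the CLI values last also covers arguments that cannot be expressed in an options file at all, since they simply appear as extra keys in the merged result.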
Closes https://github.com/facebook/rocksdb/pull/3144

Differential Revision: D6270544

Pulled By: ajkr

fbshipit-source-id: 7c2eac9f9b38720523d74466fb9e78db53561367
2017-11-28 17:28:58 -08:00
Prashant D
c1ed005a21 tools: Fix coverity issues
Summary:
tools/ldb_cmd.cc:
```
310  ignore_unknown_options_ = IsFlagPresent(flags, ARG_IGNORE_UNKNOWN_OPTIONS);

CID 1322798 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
5. uninit_member: Non-static class member db_ttl_ is not initialized in this constructor nor in any functions that it calls.
311}
```
Closes https://github.com/facebook/rocksdb/pull/3122

Differential Revision: D6428576

Pulled By: sagar0

fbshipit-source-id: d77f04dd201f7f1d9f59ef88a215ee7ad7b934e9
2017-11-28 15:27:41 -08:00
Sagar Vemuri
8954f830a0 Blob DB: db_bench flag to control BlobDB's garbage collection
Summary:
flag: blob_db_enable_gc, to control BlobDb's enable_garbage_collection.
Closes https://github.com/facebook/rocksdb/pull/3190

Differential Revision: D6383395

Pulled By: sagar0

fbshipit-source-id: 4134e835150748c425b8187264273a54c6d8381c
2017-11-20 23:26:15 -08:00
Yi Wu
9871ea4357 Regression test build binaries with PORTABLE=1
Summary:
We hit "Illegal instruction" error in regression test with "shlx" instruction. Setting PORTABLE=1 to resolve it.
Closes https://github.com/facebook/rocksdb/pull/3165

Differential Revision: D6321972

Pulled By: yiwu-arbug

fbshipit-source-id: cc9fe0dbd4698d1b66a750a0b062f66899862719
2017-11-13 21:26:24 -08:00
Andrew Kryczka
114896c4e0 db_bench compression options
Summary:
- moved existing compression options to `InitializeOptionsGeneral` since they cannot be set through options file
- added flag for `zstd_max_train_bytes` which was recently introduced by #3057
Closes https://github.com/facebook/rocksdb/pull/3128

Differential Revision: D6240460

Pulled By: ajkr

fbshipit-source-id: 27dbebd86a55de237ba6a45cc79cff9214e82ebc
2017-11-07 14:00:03 -08:00
Andrew Kryczka
65c95d9c59 support db_bench compact benchmark on bottommost files
Summary:
Without this option, running the compact benchmark on a DB containing only bottommost files simply returned immediately.
Closes https://github.com/facebook/rocksdb/pull/3138

Differential Revision: D6256660

Pulled By: ajkr

fbshipit-source-id: e3b64543acd503d821066f4200daa201d4fb3a9d
2017-11-07 10:57:24 -08:00
Andrew Kryczka
4d43c6a6a4 db_stress snapshot compatibility with reopens
Summary:
- Release all snapshots before crashing and reopening the DB. Without this, we may attempt to release snapshots from an old DB using a new DB. That tripped an assertion.
- Release multiple snapshots in the same operation if needed. Without this, we would sometimes leak snapshots.
Closes https://github.com/facebook/rocksdb/pull/3098

Differential Revision: D6194923

Pulled By: ajkr

fbshipit-source-id: b9c89bcca7ebcbb6c7802c616f9d1175a005aadf
2017-10-31 01:26:08 -07:00
Andrew Kryczka
d75793d6b4 db_stress support long-held snapshots
Summary:
Add options to `db_stress` (correctness testing tool) to randomly acquire snapshot and release it after some period of time. It's useful for correctness testing of #3009, as well as other parts of compaction that behave differently depending on which snapshots are held.
Closes https://github.com/facebook/rocksdb/pull/3038

Differential Revision: D6086501

Pulled By: ajkr

fbshipit-source-id: 3ec0d8666c78ac507f1f808887c4ff759ba9b865
2017-10-20 15:26:59 -07:00
Dmitri Smirnov
ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Andrew Kryczka
731895214b db_bench randomtransaction print throughput
Summary:
print throughput in MB/s upon finishing randomtransaction benchmark
Closes https://github.com/facebook/rocksdb/pull/3016

Differential Revision: D6070426

Pulled By: ajkr

fbshipit-source-id: 69df43beed4c374a36d826e761ca3a83e1fdcbf5
2017-10-16 18:42:25 -07:00
Andrew Kryczka
1026e794a3 rate limit auto-tuning
Summary:
Dynamic adjustment of rate limit according to demand for background I/O. It increases by a factor when limiter is drained too frequently, and decreases by the same factor when limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes:

- make rate limiter's `Env*` configurable for testing
- track num drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter.
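
The tuning rule described above can be sketched in Python as follows. This is a simplified model, not `GenericRateLimiter::Tune`: the function name, the watermark percentages, the adjustment factor, and the clamping bounds are placeholder assumptions chosen for the example.

```python
def tune_rate(current_rate, drained_intervals, total_intervals,
              min_rate, max_rate, low_pct=50, high_pct=90, factor=1.05):
    """Return the next rate limit given how often the limiter was drained.

    All thresholds and the factor are illustrative placeholders.
    """
    if total_intervals == 0:
        return current_rate  # nothing observed, leave the rate unchanged
    drained_pct = 100 * drained_intervals / total_intervals
    if drained_pct > high_pct:
        # Drained too frequently: demand exceeds the limit, so raise it.
        new_rate = current_rate * factor
    elif drained_pct < low_pct:
        # Rarely drained: the limit is looser than needed, so lower it.
        new_rate = current_rate / factor
    else:
        new_rate = current_rate
    return max(min_rate, min(max_rate, new_rate))

next_rate = tune_rate(100, 95, 100, min_rate=10, max_rate=1000, factor=2)
```

Using the same factor in both directions means one increase followed by one decrease returns to the starting rate, as long as neither step hits a bound.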
Closes https://github.com/facebook/rocksdb/pull/2899

Differential Revision: D5858704

Pulled By: ajkr

fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1
2017-10-04 19:15:01 -07:00
Siying Dong
2a3363d52e ldb dump can print histogram of value size
Summary:
Make "ldb dump --count_only" print histogram of value size. Also, fix a bug that "ldb dump --path=<db_path>" doesn't work.
Closes https://github.com/facebook/rocksdb/pull/2944

Differential Revision: D5954527

Pulled By: siying

fbshipit-source-id: c620a444ec544258b8d113f5f663c375dd53d6be
2017-10-02 09:41:17 -07:00
Andrew Kryczka
8fc3de3c62 make rate limiter a general option
Summary:
it's unsupported in options file, so the flag should be respected by db_bench even when an options file is provided.
Closes https://github.com/facebook/rocksdb/pull/2910

Differential Revision: D5869836

Pulled By: ajkr

fbshipit-source-id: f67f591ae083e95e989f86b6fad50765d2e3d855
2017-09-21 11:11:00 -07:00
Amy Xu
5785b1fcb8 Fix naming in InternalKey
Summary:
- Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance with InternalKeyComparator's comparison logic
Closes https://github.com/facebook/rocksdb/pull/2868

Differential Revision: D5804152

Pulled By: axxufb

fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183
2017-09-12 17:17:42 -07:00
Kamalalochana Subbaiah
e612e31740 Updated CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2716

Differential Revision: D5606836

Pulled By: siying

fbshipit-source-id: 720262453b1546e5fdbbc668eff56848164113f3
2017-08-31 14:16:30 -07:00
Dmitri Smirnov
8fa4d108a2 Try to switch to Studio 2017
Summary: Closes https://github.com/facebook/rocksdb/pull/2802

Differential Revision: D5746710

Pulled By: siying

fbshipit-source-id: daa621ba5fccb84c0d6cdb7755c5e09319c45cb4
2017-08-31 10:30:27 -07:00
Sagar Vemuri
06b37eef7b Set defaults for high-pri and low-pri thread pools in regression test script
Summary:
**Summary**:
Set defaults for high-pri and low-pri thread pools in regression test script.

**Reason for this change**:
With #2680 , high-pri and low-pri thread pools get different numbers than before if  `num_high_pri_threads` and `num_low_pri_threads` options are not explicitly passed to db_bench in regression test script ... leading to a false-positive regression.

**Test Plan**:
REMOTE_HOST=udb1671.prn3 TEST_MODE=1 FBSOURCE=~/fbsource ~/fbsource/fbcode/rocks/tools/debug_regression_test.sh viewstate  (with very minor changes to the internals).

Observe P50 and P99 which showed up as regressions in our graphs.

Stats with the commit prior to #2680 , ie. 4f81ab3 :
seekrandomwhilewriting :      75.096 micros/op 13316 ops/sec;  168.6 MB/s (7499074 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1197.7254  StdDev: 33.35
Min: 187  Median: 980.5292  Max: 1816424
Percentiles: **P50: 980.53** P75: 1494.57 **P99: 4185.64** P99.9: 7800.11 P99.99: 15039.64

Stats at #2680, ie. at commit dce6d5a (false-positive regression):
seekrandomwhilewriting :      85.330 micros/op 11719 ops/sec;  148.4 MB/s (7499073 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1362.3261  StdDev: 27.86
Min: 185  Median: 1088.1915  Max: 652760
Percentiles: **P50: 1088.19** P75: 1658.12 **P99: 5361.15** P99.9: 7997.95 P99.99: 11730.07

Stats with the current change on top of dce6d5a :
seekrandomwhilewriting :      77.780 micros/op 12856 ops/sec;  162.8 MB/s (7499102 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1226.6744  StdDev: 17.16
Min: 185  Median: 994.2956  Max: 2553530
Percentiles: **P50: 994.30** P75: 1513.68 **P99: 4284.30** P99.9: 9338.64 P99.99: 23008.86
Closes https://github.com/facebook/rocksdb/pull/2801

Differential Revision: D5742338

Pulled By: sagar0

fbshipit-source-id: cc5d727c1a131f2a7070d1bb892efbe929b976ff
2017-08-30 17:29:34 -07:00