Commit Graph

1141 Commits

Author SHA1 Message Date
sdong
df3f33dd05 Fix db_bench LITE build recently broken (#6411)
Summary:
A recent change https://github.com/facebook/rocksdb/pull/6386 broke LITE build in a trivial way. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6411

Test Plan: Run "LITE=1 make all"

Differential Revision: D19871765

fbshipit-source-id: 74f0ad3f8a9d666fbde0da7fd29ba1547a811f77
2020-02-13 10:52:50 -08:00
Burton Li
e64508917b db_bench supports for generating random variable sized value. (#6386)
Summary:
1. `db_bench` now supports `value_size_distribution_type`, `value_size_min`, `value_size_max` options for generating random variable sized value.
2. Added `blob_db_compression_type` option for BlobDB to enable blob compression.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6386

Differential Revision: D19859406

Pulled By: zhichao-cao

fbshipit-source-id: ace52674090023fde15d832392110bf288a8e215
2020-02-12 14:47:03 -08:00
Zhichao Cao
4369f2c7bb Checksum for each SST file and stores in MANIFEST (#6216)
Summary:
In the current code base, RocksDB generate the checksum for each block and verify the checksum at usage. Current PR enable SST file checksum. After a SST file is generated by Flush or Compaction, RocksDB generate the SST file checksum and store the checksum value and checksum method name in the vs_info and MANIFEST as part for the FileMetadata.

Added the enable_sst_file_checksum to Options to enable or disable file checksum. Added sst_file_checksum to Options such that user can plugin their own SST file checksum calculate method via overriding the SstFileChecksum class. The checksum information inlcuding uint32_t checksum value and a checksum name (string).  A new tool is added to LDB such that user can dump out a list of file checksum information from MANIFEST. If user enables the file checksum but does not provide the sst_file_checksum instance, RocksDB will use the default crc32checksum implemented in table/sst_file_checksum_crc32c.h
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6216

Test Plan: Added the testing case in table_test and ldb_cmd_test to verify checksum is correct in different level. Pass make asan_check.

Differential Revision: D19171461

Pulled By: zhichao-cao

fbshipit-source-id: b2e53479eefc5bb0437189eaa1941670e5ba8b87
2020-02-10 15:52:52 -08:00
sdong
876c2dbff4 Allow readahead when reading option files. (#6372)
Summary:
Right, when reading from option files, no readahead is used and 8KB buffer is used. It might introduce high latency if the file system provide high latency and doesn't do readahead. Instead, introduce a readahead to the file. When calling inside DB, infer the value from options.log_readahead. Otherwise, a default 512KB readahead size is used.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6372

Test Plan: Add --log_readahead_size in db_bench. Run it with several options and observe read size from option files using strace.

Differential Revision: D19727739

fbshipit-source-id: e6d8053b0a64259abc087f1f388b9cd66fa8a583
2020-02-07 15:18:26 -08:00
sdong
24c9dce825 Remove include math.h (#6373)
Summary:
We see some odd errors complaining math. However, it doesn't seem that it is needed to be included. Remove the include of math.h. Just removing it from db_bench doesn't seem to break anything. Replacing sqrt from std::sqrt seems to work for histogram.cc
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6373

Test Plan: Watch Travis and appveyor to run.

Differential Revision: D19730068

fbshipit-source-id: d3ad41defcdd9f51c2da1a3673fb258f5dfacf47
2020-02-05 21:00:49 -08:00
Maysam Yabandeh
967a2d953f Revert "crash_test to enable block-based table hash index (#6310)" (#6327)
Summary:
This reverts commit 8e309b35bb.
The stress tests are failing . Revert it until we figure the root cause.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327

Differential Revision: D19537657

Pulled By: maysamyabandeh

fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a
2020-01-23 09:09:17 -08:00
Maysam Yabandeh
cb1142e00d Set index_block_restart_interval of kHashSearch to 1 in stress test (#6324)
Summary:
kHashSearch is incompatible with larger than 1 values for index_block_restart_interval. Setting it to 1 in stress tests would avoid confusion about the test parameters.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6324

Differential Revision: D19525669

Pulled By: maysamyabandeh

fbshipit-source-id: fbf3a797e0ebcebb4d32eba3728cf3583906fc8a
2020-01-22 16:33:21 -08:00
sdong
8e309b35bb crash_test to enable block-based table hash index (#6310)
Summary:
Block-based table has index has been disabled in crash test due to bugs. We fixed a bug and re-enable it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310

Test Plan: Finish one round of "crash_test_with_atomic_flush" test successfully while exclusively running has index. Another run also ran for several hours without failure.

Differential Revision: D19455856

fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274
2020-01-21 12:27:30 -08:00
sdong
6b64aed4c0 Fix bug which causes crash_test to always run on sync mode (#6304)
Summary:
A previous change meant to make db_stress to run on sync=1 mode for 1/20 of the time in crash_test, but a bug caused to to always run on sync=1 mode. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6304

Test Plan: Start and kill "python -u tools/db_crashtest.py --simple whitebox" multiple times and observe that most times sync=0 is used while some times sync=1 is used.

Differential Revision: D19433000

fbshipit-source-id: 7a0adba39b17a1b3acbbd791bb0cdb743b91fa95
2020-01-17 01:46:48 -08:00
anand76
687119aeaf Variable key length in db_stress (#6273)
Summary:
Undo https://github.com/facebook/rocksdb/issues/6243 and fix the crash test failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6273

Test Plan: Run make ubsan_crash_test

Differential Revision: D19331472

Pulled By: anand1976

fbshipit-source-id: 30aa4a36c1b0f77a97159d82bbfd1cd767878e28
2020-01-09 21:27:18 -08:00
Peter Dillinger
83108d23e8 Re-enable level_compaction_dynamic_level_bytes in crash test (#6251)
Summary:
Was probably a false signal suggesting a problem in https://github.com/facebook/rocksdb/issues/6217
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6251

Test Plan: 'make crash_test'

Differential Revision: D19246951

Pulled By: pdillinger

fbshipit-source-id: 3e4fafcd9a7cf5f19ffd207b90279ba615145d6f
2019-12-30 10:15:49 -08:00
Peter Dillinger
37fd2b9694 Revert "Generate variable length keys in db_stress (#6165)" and follow-ups (#6243)
Summary:
This commit is suspected in some crash test failures such as

Verification failed for column family 0 key 78438077: Value not found: NotFound:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243

Test Plan: 'make check' and start 'make crash_test'

Differential Revision: D19220495

Pulled By: pdillinger

fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa
2019-12-23 16:32:57 -08:00
anand76
3160edfdc7 Generate variable length keys in db_stress (#6165)
Summary:
Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165

Test Plan: run make crash_test

Differential Revision: D19204646

Pulled By: anand1976

fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7
2019-12-20 21:10:33 -08:00
sdong
338c149b92 crash_test to cover bottommost compression and some other changes (#6215)
Summary:
Several improvements to crash_test/stress_test:
(1) Stress_test to support an parameter of bottommost compression
(2) Rename those FLAGS_* variables that are not gflags to avoid confusion
(3) Crash_test to randomly generate compression type for bottommost compression with half the chance.
(4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment.
(5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215

Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files.

Differential Revision: D19171255

fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735
2019-12-20 16:14:52 -08:00
Zhichao Cao
f89dea4fec db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224)
Summary:
Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224

Test Plan: pass db_stress default run.

Differential Revision: D19183358

Pulled By: zhichao-cao

fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a
2019-12-20 12:07:30 -08:00
Yanqin Jin
670a916d01 Add more verification to db_stress (#6173)
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.

Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173

Differential Revision: D19047367

Pulled By: riversand963

fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
2019-12-20 08:49:29 -08:00
Peter Dillinger
1ba92b8582 Only search specific directories in Python check (#6225)
Summary:
The new Python syntax check could fail if external entities
were cloned or symlinked to a subdir in a rocksdb git clone. (E.g.
Facebook internal LITE build.) Only look for Python files in specific
subdirs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6225

Test Plan: python tools/check_all_python.py (still 34 files checked)

Reviewed By: gfosco

Differential Revision: D19186110

Pulled By: pdillinger

fbshipit-source-id: 1fefa54e36b32cd5d96d3d1a43e8a2a694c22ea5
2019-12-19 15:37:30 -08:00
Maysam Yabandeh
77d5ba7887 Revert "Add kHashSearch to stress tests (#6210)" (#6220)
Summary:
This reverts commit 54f9092b0c.
It making our daily stress tests fail. Revert it until the issues are fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6220

Differential Revision: D19179881

Pulled By: maysamyabandeh

fbshipit-source-id: 99de0eaf776567fa81110b9ad2608234a16083ce
2019-12-19 10:46:55 -08:00
Peter Dillinger
9ff569bdfc Temporarily disable level_compaction_dynamic_level_bytes in crash test (#6217)
Summary:
We're seeing assertion violations like this in crash test:

db_stress: table/block_based/block_based_table_reader.cc:4129: virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller): Assertion `end_offset >= start_offset' failed.***

And ApproximateSize appears only to be called with the level_compaction_dynamic_level_bytes option.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6217

Test Plan:
temporarily put an assert(false) in ApproximateSize and
briefly run 'make crash_test'

Differential Revision: D19179174

Pulled By: pdillinger

fbshipit-source-id: 506e6549aea0da19b363a1a6da04373c364d92e4
2019-12-19 10:24:49 -08:00
Peter Dillinger
5b18729d7d Syntax check python files on testing (#6209)
Summary:
Adds a python script to syntax check all python files in the
repository and report any errors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6209

Test Plan:
'make check' with and without seeded syntax errors. Also look
for "No syntax errors in 34 .py files" on success, and in java_test CI output

Differential Revision: D19166756

Pulled By: pdillinger

fbshipit-source-id: 537df464b767260d66810b4cf4c9808a026c58a4
2019-12-19 08:31:11 -08:00
Maysam Yabandeh
54f9092b0c Add kHashSearch to stress tests (#6210)
Summary:
Beside extending index_type to kHashSearch, it clarifies in the code base that this feature is incompatible with index_block_restart_interval > 1.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6210

Test Plan:
```
make -j32 crash_test

Differential Revision: D19166567

Pulled By: maysamyabandeh

fbshipit-source-id: 3aaf75a70a8b462d372d43aac69dbd10df303ec7
2019-12-18 18:09:30 -08:00
Levi Tamasi
130e710056 Add BlobDB GC cutoff parameter to db_bench (#6211)
Summary:
The patch makes it possible to set the BlobDB configuration option
`garbage_collection_cutoff` on the command line. In addition, it changes
the `db_bench` code so that the default values of BlobDB related
parameters are taken from the defaults of the actual BlobDB
configuration options (note: this changes the the default of
`blob_db_bytes_per_sync`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6211

Test Plan: Ran `db_bench` with various values of the new parameter.

Differential Revision: D19166895

Pulled By: ltamasi

fbshipit-source-id: 305ccdf0123b9db032b744715810babdc3e3b7d5
2019-12-18 17:46:08 -08:00
Peter Dillinger
dfb259e48d Fix syntax error (!) in db_crashtest.py (#6207)
Summary:
Fixes syntax error from https://github.com/facebook/rocksdb/pull/6203

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6207

Test Plan: make blackbox_crash_test -> no more syntax error

Differential Revision: D19161752

Pulled By: pdillinger

fbshipit-source-id: b3032f296041ab56307762622b9ef6c03a8379aa
2019-12-18 09:32:52 -08:00
anand76
2afea29762 Add VerifyChecksum() to db_stress (#6203)
Summary:
Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203

Differential Revision: D19145753

Pulled By: anand1976

fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b
2019-12-17 20:44:58 -08:00
sdong
9f250dd88e crash_test: two fixes (#6200)
Summary:
Fix two crash test issues:
1. sync mode should not run with disable_wal=true
2. disable "compaction_readahead_size" for now. With it on, some block checksum verification failure will happen in compaction paths. Not sure why, but disable it for now to keep the test clean.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6200

Test Plan: Run "make crash_test" and "make crash_test_with_atomic_flush" and see it runs way longer than before the fix without failing.

Differential Revision: D19143493

fbshipit-source-id: 438fad52fbda60aafd142e1b65578addbe7d72b1
2019-12-17 18:25:04 -08:00
sdong
bcc372c0c3 Add some new options to crash_test (#6176)
Summary:
Several options are trivially added to crash test and random values are picked.
Made simple test run non-dynamic level and normal test run dynamic level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176

Test Plan: Run crash_test and watch the printing

Differential Revision: D19053955

fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
2019-12-16 15:43:13 -08:00
Maysam Yabandeh
4b97812da8 Add long-running snapshots to stress tests (#6171)
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171

Test Plan:
```
make -j32 crash_test

Differential Revision: D19038399

Pulled By: maysamyabandeh

fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
2019-12-14 15:22:40 -08:00
anand76
afa2420c2b Introduce a new storage specific Env API (#5761)
Summary:
The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.

This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.

The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.

This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.

The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761

Differential Revision: D18868376

Pulled By: anand1976

fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
2019-12-13 14:48:41 -08:00
Maysam Yabandeh
fec7302a9d Enable unordered_write in stress tests (#6164)
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164

Test Plan:
```
make -j32 crash_test_with_txn

Differential Revision: D18991899

Pulled By: maysamyabandeh

fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
2019-12-13 10:25:04 -08:00
Maysam Yabandeh
8613ee2e94 Enable all txn write policies in crash test (#6158)
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158

Test Plan:
```
make -j32 crash_test_with_txn
```

Differential Revision: D18946307

Pulled By: maysamyabandeh

fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
2019-12-12 10:43:49 -08:00
Maysam Yabandeh
1ad6fa9cc7 Enable txn in crash tests (#6155)
Summary:
Start daily crash tests with use_txn flag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155

Differential Revision: D18943630

Pulled By: maysamyabandeh

fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87
2019-12-11 16:01:55 -08:00
Peter Dillinger
a653857178 Add PauseBackgroundWork() to db_stress (#6148)
Summary:
Worker thread will occasionally call PauseBackgroundWork(),
briefly sleep (to avoid stalling itself) and then call
ContinueBackgroundWork().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148

Test Plan:
some running of 'make blackbox_crash_test' with temporary
printf output to confirm code occasionally reached.

Differential Revision: D18913886

Pulled By: pdillinger

fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be
2019-12-10 15:46:48 -08:00
Adam Simpkins
2bb5fc1280 Add an option to the CMake build to disable building shared libraries (#6122)
Summary:
Add an option to explicitly disable building shared versions of the
RocksDB libraries.  The shared libraries cannot be built in cases where
some dependencies are only available as static libraries.  This allows
still building RocksDB in these situations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122

Differential Revision: D18920740

fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
2019-12-10 15:20:50 -08:00
Yanqin Jin
2b060c1498 Use Env::GetChildren() instead of readdir (#6139)
Summary:
For more portability, switch from readdir to Env::GetChildren() in ldb's
manifest_dump subcommand.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6139

Test Plan:
```
$make check
```
Manually check ldb command.

Differential Revision: D18898197

Pulled By: riversand963

fbshipit-source-id: 92afca379e9fbe78ab70b2eb40d127daad8df5e2
2019-12-10 11:49:09 -08:00
Peter Dillinger
6380df5e10 Vary bloom_bits in db_crashtest (#6103)
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:

1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.

1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103

Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output

Differential Revision: D18734877

Pulled By: pdillinger

fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
2019-12-10 08:39:50 -08:00
sdong
7d79b32618 Break db_stress_tool.cc to a list of source files (#6134)
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134

Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1

Differential Revision: D18876064

fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d
2019-12-08 23:51:01 -08:00
Adam Simpkins
100b5e69f3 Fix build failure for db_stress tool when building with CMake (#6117)
Summary:
PR https://github.com/facebook/rocksdb/issues/5937 changed the db_stress tool to also require db_stress_tool.cc,
and updated the Makefile but not the CMakeLists.txt file.  This updates
the CMakeLists.txt file so that the CMake build succeeds again.

PR https://github.com/facebook/rocksdb/issues/5950 updated the Makefile build to package db_stress_tool.cc into
its own librocksdb_stress.a library.  I haven't done that here since
there didn't really seem to be much benefit: the Makefile-based build
does not install this library.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6117

Test Plan: Confirmed the CMake build succeeds on an Ubuntu 18.04 system.

Differential Revision: D18835053

Pulled By: riversand963

fbshipit-source-id: 6e2a66834716e73b1eb736d9b7159870defffec5
2019-12-05 15:34:54 -08:00
Connor
f32a311f0d Fix compliation error on GCC4.8.2 (#6106)
Summary:
```
In file included from /usr/include/c++/4.8.2/algorithm:62:0,
                 from ./db/merge_context.h:7,
                 from ./db/dbformat.h:16,
                 from ./tools/block_cache_analyzer/block_cache_trace_analyzer.h:12,
                 from tools/block_cache_analyzer/block_cache_trace_analyzer.cc:8:
/usr/include/c++/4.8.2/bits/stl_algo.h: In instantiation of ‘_RandomAccessIterator std::__unguarded_partition(_RandomAccessIterator, _RandomAccessIterator, const _Tp&, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<std::pair<std::basic_string<char>, long unsigned int>*, std::vector<std::pair<std::basic_string<char>, long unsigned int> > >; _Tp = std::pair<std::basic_string<char>, long unsigned int>; _Compare = rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1]’:
/usr/include/c++/4.8.2/bits/stl_algo.h:2296:78:   required from ‘_RandomAccessIterator std::__unguarded_partition_pivot(_RandomAccessIterator, _RandomAccessIterator, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<std::pair<std::basic_string<char>, long unsigned int>*, std::vector<std::pair<std::basic_string<char>, long unsigned int> > >; _Compare = rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1]’
/usr/include/c++/4.8.2/bits/stl_algo.h:2337:62:   required from ‘void std::__introsort_loop(_RandomAccessIterator, _RandomAccessIterator, _Size, _Compare) [with _RandomAccessIterator = __gnu_cxx::__normal_iterator<std::pair<std::basic_string<char>, long unsigned int>*, std::vector<std::pair<std::basic_string<char>, long unsigned int> > >; _Size = long int; _Compare = rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1]’
/usr/include/c++/4.8.2/bits/stl_algo.h:5499:44:   required from ‘void std::sort(_RAIter, _RAIter, _Compare) [with _RAIter = __gnu_cxx::__normal_iterator<std::pair<std::basic_string<char>, long unsigned int>*, std::vector<std::pair<std::basic_string<char>, long unsigned int> > >; _Compare = rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1’
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:583:79:   required from here
/usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: error: no match for call to ‘(rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1) (std::pair<std::basic_string<char>, long unsigned int>&, const std::pair<std::basic_string<char>, long unsigned int>&)’
    while (__comp(*__first, __pivot))
                                   ^
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:582:9: note: candidates are:
       [=](std::pair<std::string, uint64_t>& a,
         ^
In file included from /usr/include/c++/4.8.2/algorithm:62:0,
                 from ./db/merge_context.h:7,
                 from ./db/dbformat.h:16,
                 from ./tools/block_cache_analyzer/block_cache_trace_analyzer.h:12,
                 from tools/block_cache_analyzer/block_cache_trace_analyzer.cc:8:
/usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: note: bool (*)(std::pair<std::basic_string<char>, long unsigned int>&, std::pair<std::basic_string<char>, long unsigned int>&) <conversion>
    while (__comp(*__first, __pivot))
                                   ^
/usr/include/c++/4.8.2/bits/stl_algo.h:2263:35: note:   candidate expects 3 arguments, 3 provided
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:583:46: note: rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1
           std::pair<std::string, uint64_t>& b) { return b.second < a.second; });
                                              ^
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:583:46: note:   no known conversion for argument 2 from ‘const std::pair<std::basic_string<char>, long unsigned int>’ to ‘std::pair<std::basic_string<char>, long unsigned int>&’
In file included from /usr/include/c++/4.8.2/algorithm:62:0,
                 from ./db/merge_context.h:7,
                 from ./db/dbformat.h:16,
                 from ./tools/block_cache_analyzer/block_cache_trace_analyzer.h:12,
                 from tools/block_cache_analyzer/block_cache_trace_analyzer.cc:8:
/usr/include/c++/4.8.2/bits/stl_algo.h:2266:34: error: no match for call to ‘(rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1) (const std::pair<std::basic_string<char>, long unsigned int>&, std::pair<std::basic_string<char>, long unsigned int>&)’
    while (__comp(__pivot, *__last))
                                  ^
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:582:9: note: candidates are:
       [=](std::pair<std::string, uint64_t>& a,
         ^
In file included from /usr/include/c++/4.8.2/algorithm:62:0,
                 from ./db/merge_context.h:7,
                 from ./db/dbformat.h:16,
                 from ./tools/block_cache_analyzer/block_cache_trace_analyzer.h:12,
                 from tools/block_cache_analyzer/block_cache_trace_analyzer.cc:8:
/usr/include/c++/4.8.2/bits/stl_algo.h:2266:34: note: bool (*)(std::pair<std::basic_string<char>, long unsigned int>&, std::pair<std::basic_string<char>, long unsigned int>&) <conversion>
    while (__comp(__pivot, *__last))
                                  ^
/usr/include/c++/4.8.2/bits/stl_algo.h:2266:34: note:   candidate expects 3 arguments, 3 provided
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:583:46: note: rocksdb::BlockCacheTraceAnalyzer::WriteSkewness(const string&, const std::vector<long unsigned int>&, rocksdb::TraceType) const::__lambda1
           std::pair<std::string, uint64_t>& b) { return b.second < a.second; });
                                              ^
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:583:46: note:   no known conversion for argument 1 from ‘const std::pair<std::basic_string<char>, long unsigned int>’ to ‘std::pair<std::basic_string<char>, long unsigned int>&’
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6106

Differential Revision: D18783943

Pulled By: riversand963

fbshipit-source-id: cc7fc10565f0210b9eebf46b95cb4950ec0b15fa
2019-12-03 11:59:21 -08:00
Peter Dillinger
f19faf7814 Add format_version=5 to db_crashtest (#6102)
Summary:
format_version=5 enables new Bloom filter. Using 2/5
probability for "latest and greatest" rather than naive 1/4.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6102

Test Plan: start 'make blackbox_crash_test'

Differential Revision: D18735685

Pulled By: pdillinger

fbshipit-source-id: e81529c8a3f53560d246086ee5f92ee7d79a2eab
2019-11-27 13:19:11 -08:00
Adam Retter
6d58ea901d Fix compilation under MSVC VS2015 (#6081)
Summary:
**NOTE**: this also needs to be back-ported to 6.4.6 and possibly older branches if further releases from them is envisaged.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6081

Differential Revision: D18710107

Pulled By: zhichao-cao

fbshipit-source-id: 03260f9316566e2bfc12c7d702d6338bb7941e01
2019-11-26 18:24:09 -08:00
sdong
0bc87442ae Update HISTORY.md for forward compatibility (#6085)
Summary:
https://github.com/facebook/rocksdb/pull/6060 broke forward compatiblity for releases from 3.10 to 4.2. Update HISTORY.md to mention it. Also remove it from the compatibility tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6085

Differential Revision: D18691694

fbshipit-source-id: 4ef903783dc722b8a4d3e8229abbf0f021a114c9
2019-11-26 10:00:31 -08:00
sdong
4e0dcd36df db_stress sometimes generates keys close to SST file boundaries (#6037)
Summary:
Recently, a bug was found related to a seek key that is close to SST file boundary. However, it only occurs in a very small chance in db_stress, because the chance that a random key hits SST file boundaries is small. To boost the chance, with 1/16 chance, we pick keys that are close to SST file boundaries.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6037

Test Plan: Did some manual printing out, and hack to cover the key generation logic to be correct.

Differential Revision: D18598476

fbshipit-source-id: 13b76687d106c5be4e3e02a0c77fa5578105a071
2019-11-19 13:17:03 -08:00
sdong
a150604e10 db_stress to cover total order seek (#6039)
Summary:
Right now, in db_stress, as long as prefix extractor is defined, TestIterator always uses. There is value of cover total_order_seek = true when prefix extractor is define. Add a small chance that this flag is turned on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6039

Test Plan: Run the test for a while.

Differential Revision: D18539689

fbshipit-source-id: 568790dd7789c9986b83764b870df0423a122d99
2019-11-18 15:01:38 -08:00
sdong
6123611c42 crash_test: use large max_manifest_file_size most of the time. (#6034)
Summary:
Right now, crash_test always uses 16KB max_manifest_file_size value. It is good to cover logic of manifest file switch. However, information stored in manifest files might be useful in debugging failures. Switch to only use small manifest file size in 1/15 of the time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6034

Test Plan: Observe command generated by db_crash_test.py multiple times and see the --max_manifest_file_size value distribution.

Differential Revision: D18513824

fbshipit-source-id: 7b3ae6dbe521a0918df41064e3fa5ecbf2466e04
2019-11-14 14:01:06 -08:00
sdong
a19de78da5 db_stress to cover SeekForPrev() (#6022)
Summary:
Right now, db_stress doesn't cover SeekForPrev(). Add the coverage, which mirrors what we do for Seek().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6022

Test Plan: Run "make crash_test". Do some manual source code hack to simular iterator wrong results and see it caught.

Differential Revision: D18442193

fbshipit-source-id: 879b79000d5e33c625c7e970636de191ccd7776c
2019-11-11 17:33:54 -08:00
sdong
1da1f04231 Stress test to relax the iterator verification case for lower bound (#5869)
Summary:
In stress test, all iterator verification is turned off is lower bound is enabled. This might be stricter than needed. This PR relaxes the condition and include the case where lower bound is lower than both of seek key and upper bound. It seems to work mostly fine when I run crash test locally.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5869

Test Plan: Run crash_test

Differential Revision: D18363578

fbshipit-source-id: 23d57e11ea507949b8100f4190ddfbe8db052d5a
2019-11-07 11:16:59 -08:00
sdong
111ebf3161 db_stress: improve TestGet() failure printing (#5989)
Summary:
Right now, in db_stress's CF consistency test's TestGet case, if failure happens, we do normal string printing, rather than hex printing, so that some text is not printed out, which makes debugging harder. Fix it by printing hex instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5989

Test Plan: Build db_stress and see t passes.

Differential Revision: D18363552

fbshipit-source-id: 09d1b8f6fbff37441cbe7e63a1aef27551226cec
2019-11-06 17:38:25 -08:00
Zhichao Cao
8ea087ad16 Workload generator (Mixgraph) based on prefix hotness (#5953)
Summary:
In the previous PR https://github.com/facebook/rocksdb/issues/4788, user can use db_bench mix_graph option to generate the workload that is from the social graph. The key is generated based on the key access hotness. In this PR, user can further model the key-range hotness and fit those to two-term-exponential distribution. First, user cuts the whole key space into small key ranges (e.g., key-ranges are the same size and the key-range number is the number of SST files). Then, user calculates the average access count per key of each key-range as the key-range hotness. Next, user fits the key-range hotness to two-term-exponential distribution (f(x) = f(x) = a*exp(b*x) + c*exp(d*x)) and generate the value of a, b, c, and d. They are the parameters in db_bench: prefix_dist_a, prefix_dist_b, prefix_dist_c, and prefix_dist_d. Finally, user can run db_bench by specify the parameters.
For example:
`./db_bench --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=268435456 -key_dist_a=0.002312 -key_dist_b=0.3467 -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=350 -sine_b=0.0105 -sine_d=50000 --perf_level=2 -reads=1000000 -num=5000000 -key_size=48`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5953

Test Plan: run db_bench with different parameters and checked the results.

Differential Revision: D18053527

Pulled By: zhichao-cao

fbshipit-source-id: 171f8b3142bd76462f1967c58345ad7e4f84bab7
2019-11-06 13:02:20 -08:00
Maysam Yabandeh
50804656d2 Enable write-conflict snapshot in stress tests (#5897)
Summary:
DBImpl extends the public GetSnapshot() with GetSnapshotForWriteConflictBoundary() method that takes snapshots specially for write-write conflict checking. Compaction treats such snapshots differently to avoid GCing a value written after that, so that the write conflict remains visible even after the compaction. The patch extends stress tests with such snapshots.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5897

Differential Revision: D17937476

Pulled By: maysamyabandeh

fbshipit-source-id: bd8b0c578827990302194f63ae0181e15752951d
2019-11-06 11:13:22 -08:00
sdong
e4e1d35cc2 Revert "Disable pre-5.5 versions in the format compatibility test (#5990)" (#5999)
Summary:
This reverts commit 351e25401b.

All branches have been fixed to buildable on FB environments, so we can revert it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5999

Differential Revision: D18281947

fbshipit-source-id: 6deaaf1b5df2349eee5d6ed9b91208cd7e23ec8e
2019-11-01 15:57:15 -07:00
sdong
5b656584af crash_test: disable periodic compaction in FIFO compaction. (#5993)
Summary:
A recent commit make periodic compaction option valid in FIFO, which means TTL. But we fail to disable it in crash test, causing assert failure. Fix it by having it disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5993

Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2

Differential Revision: D18263223

fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814
2019-10-31 17:28:03 -07:00
Levi Tamasi
351e25401b Disable pre-5.5 versions in the format compatibility test (#5990)
Summary:
We have updated earlier release branches going back to 5.5 so they are
built using gcc7 by default. Disabling ancient versions before that
until we figure out a plan for them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5990

Test Plan: Ran the script locally.

Differential Revision: D18252386

Pulled By: ltamasi

fbshipit-source-id: a7bbb30dc52ff2eaaf31a29ecc79f7cf4e2834dc
2019-10-31 13:45:02 -07:00
sdong
0337d87b42 crash_test: disable atomic flush with pipelined write (#5986)
Summary:
Recently, pipelined write is enabled even if atomic flush is enabled, which causing sanitizing failure in db_stress. Revert this change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5986

Test Plan: Run "make crash_test_with_atomic_flush" and see it to run for some while so that the old sanitizing error (which showed up quickly) doesn't show up.

Differential Revision: D18228278

fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553
2019-10-30 11:36:55 -07:00
sdong
15119f08e2 Add more release branches to tools/check_format_compatible.sh (#5985)
Summary:
More release branches are created. We should include them in continuous format compatibility checks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5985

Test Plan: Let's see whether it is passes.

Differential Revision: D18226532

fbshipit-source-id: 75d8cad5b03ccea4ce16f00cea1f8b7893b0c0c8
2019-10-30 11:20:49 -07:00
sdong
a3960fc875 Move pipeline write waiting logic into WaitForPendingWrites() (#5716)
Summary:
In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush().
This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5716

Test Plan: Run all tests with ASAN and TSAN.

Differential Revision: D18217658

fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3
2019-10-29 18:16:36 -07:00
sdong
f22aaf8b3f db_stress: CF Consistency check to use random CF to validate iterator results (#5983)
Summary:
Right now, in db_stress's iterator tests, we always use the same CF to validate iterator results. This commit changes it so that a randomized CF is used in Cf consistency test, where every CF should have exactly the same data. This would help catch more bugs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5983

Test Plan: Run "make crash_test_with_atomic_flush".

Differential Revision: D18217643

fbshipit-source-id: 3ac998852a0378bb59790b20c5f236f6a5d681fe
2019-10-29 18:16:35 -07:00
sdong
9f1e5a0b87 CfConsistencyStressTest to validate key consistent across CFs in TestGet() (#5863)
Summary:
Right now in CF consitency stres test's TestGet(), keys are just fetched without validation. With this change, in 1/2 the time, compare all the CFs share the same value with the same key.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5863

Test Plan: Run "make crash_test_with_atomic_flush" and see tests pass. Hack the code to generate some inconsistency and observe the test fails as expected.

Differential Revision: D17934206

fbshipit-source-id: 00ba1a130391f28785737b677f80f366fb83cced
2019-10-23 16:57:16 -07:00
Yanqin Jin
c0abc6bbc1 Use FLAGS_env for certain operations in db_bench (#5943)
Summary:
Since we already parse env_uri from command line and creates custom Env
accordingly, we should invoke the methods of such Envs instead of using
Env::Default().

Test Plan (on devserver):
```
$make db_bench db_stress
$./db_bench -benchmarks=fillseq
./db_stress
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5943

Differential Revision: D18018550

Pulled By: riversand963

fbshipit-source-id: 03b61329aaae0dfd914a0b902cc677f570f102e3
2019-10-22 11:43:21 -07:00
Yanqin Jin
c53db172a1 Fix TestIterate for HashSkipList in db_stress (#5942)
Summary:
Since SeekForPrev (used by Prev) is not supported by HashSkipList when prefix is used, we disable it when stress testing HashSkipList.

- Change the default memtablerep to skip list.
- Avoid Prev() when memtablerep is HashSkipList and prefix is used.

Test Plan (on devserver):
```
$make db_stress
$./db_stress -ops_per_thread=10000 -reopen=1 -destroy_db_initially=true -column_families=1 -threads=1 -column_families=1 -memtablerep=prefix_hash
$# or simply
$./db_stress
$./db_stress -memtablerep=prefix_hash
```
Results must print "Verification successful".
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5942

Differential Revision: D18017062

Pulled By: riversand963

fbshipit-source-id: af867e59aa9e6f533143c984d7d529febf232fd7
2019-10-18 15:49:12 -07:00
Peter Dillinger
fe464bca5c Fix PlainTableReader not to crash sst_dump (#5940)
Summary:
Plain table SSTs could crash sst_dump because of a bug in
PlainTableReader that can leave table_properties_ as null. Even if it
was intended not to keep the table properties in some cases, they were
leaked on the offending code path.

Steps to reproduce:

    $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12
    $ sst_dump --file=0000xx.sst --show_properties
    from [] to []
    Process /dev/shm/dbbench/000014.sst
    Sst file format: plain table
    Raw user collected properties
    ------------------------------
    Segmentation fault (core dumped)

Also added missing unit testing of plain table full_scan_mode, and
an assertion in NewIterator to check for regression.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940

Test Plan: new unit test, manual, make check

Differential Revision: D18018145

Pulled By: pdillinger

fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07
2019-10-18 14:44:42 -07:00
Zhichao Cao
526e3b9763 Enable trace_replay with multi-threads (#5934)
Summary:
In the current trace replay, all the queries are serialized and called by single threads. It may not simulate the original application query situations closely. The multi-threads replay is implemented in this PR. Users can set the number of threads to replay the trace. The queries generated according to the trace records are scheduled in the thread pool job queue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5934

Test Plan: test with make check and real trace replay.

Differential Revision: D17998098

Pulled By: zhichao-cao

fbshipit-source-id: 87eecf6f7c17a9dc9d7ab29dd2af74f6f60212c8
2019-10-18 14:13:50 -07:00
Yanqin Jin
e60cc0925c Expose db stress tests (#5937)
Summary:
expose db stress test by providing db_stress_tool.h in public header.
This PR does the following:
- adds a new header, db_stress_tool.h, in include/rocksdb/
- renames db_stress.cc to db_stress_tool.cc
- adds a db_stress.cc which simply invokes a test function.
- update Makefile accordingly.

Test Plan (dev server):
```
make db_stress
./db_stress
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937

Differential Revision: D17997647

Pulled By: riversand963

fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410
2019-10-18 09:46:44 -07:00
Levi Tamasi
fdc1cb43a6 Support decoding blob indexes in sst_dump (#5926)
Summary:
The patch adds a new command line parameter --decode_blob_index to sst_dump.
If this switch is specified, sst_dump prints blob indexes in a human readable format,
printing the blob file number, offset, size, and expiration (if applicable) for blob
references, and the blob value (and expiration) for inlined blobs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926

Test Plan:
Used db_bench's BlobDB mode to generate SST files containing blob references with
and without expiration, as well as inlined blobs with and without expiration (note: the
latter are stored as plain values), and confirmed sst_dump correctly prints all four types
of records.

Differential Revision: D17939077

Pulled By: ltamasi

fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d
2019-10-17 19:36:54 -07:00
Levi Tamasi
78b28d80b0 Support non-TTL Puts for BlobDB in db_bench (#5921)
Summary:
Currently, db_bench only supports PutWithTTL operations for BlobDB but
not regular Puts. The patch adds support for regular (non-TTL) Puts and also
changes the default for blob_db_max_ttl_range to zero, which corresponds
to no TTL.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5921

Test Plan:
make check

./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1
-duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000
-target_file_size_base=1000000 (issues Put operations with no TTL)

./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1
-duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000
-target_file_size_base=1000000 -blob_db_max_ttl_range=86400 (issues
PutWithTTL operations with random TTLs in the [0, blob_db_max_ttl_range)
interval, as before)

Differential Revision: D17919798

Pulled By: ltamasi

fbshipit-source-id: b946c3522b836b92b4c157ffbad24f92ba2b0a16
2019-10-14 17:49:20 -07:00
Maysam Yabandeh
a6e615a7ba Enable partitioned index/filter in stress tests (#5918)
Summary:
This is the 3rd attempt after the revert of https://github.com/facebook/rocksdb/issues/4020 and https://github.com/facebook/rocksdb/issues/5895
The last bug is fixed https://github.com/facebook/rocksdb/pull/5907
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5918

Test Plan:
```
make -j32 crash_test
```

Differential Revision: D17909489

Pulled By: maysamyabandeh

fbshipit-source-id: 7dfb8cf998c2d295c86465dd21734593d277887e
2019-10-14 10:35:18 -07:00
Yanqin Jin
bc8b05cb77 Revert "Enable partitioned index/filter in stress tests (#5895)" (#5904)
Summary:
This reverts commit 2f4e288143.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904

Differential Revision: D17871282

Pulled By: riversand963

fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e
2019-10-10 19:19:39 -07:00
anand76
80ad996b35 Make the db_stress reopen loop in OperateDb() more robust (#5893)
Summary:
The loop in OperateDb() is getting quite complicated with the introduction of multiple key operations such as MultiGet and Reseeks. This is resulting in a number of corner cases that hangs db_stress due to synchronization problems during reopen (i.e when -reopen=<> option is specified). This PR makes it more robust by ensuring all db_stress threads vote to reopen the DB the exact same number of times.
Most of the changes in this diff are due to indentation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5893

Test Plan: Run crash test

Differential Revision: D17823827

Pulled By: anand1976

fbshipit-source-id: ec893829f611ac7cac4057c0d3d99f9ffb6a6dd9
2019-10-09 09:27:10 -07:00
Yanqin Jin
167cdc9f17 Support custom env in sst_dump (#5845)
Summary:
This PR allows for the creation of custom env when using sst_dump. If
the user does not set options.env or set options.env to nullptr, then sst_dump
will automatically try to create a custom env depending on the path to the sst
file or db directory. In order to use this feature, the user must call
ObjectRegistry::Register() beforehand.

Test Plan (on devserver):
```
$make all && make check
```
All tests must pass to ensure this change does not break anything.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845

Differential Revision: D17678038

Pulled By: riversand963

fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626
2019-10-08 19:19:12 -07:00
Maysam Yabandeh
2f4e288143 Enable partitioned index/filter in stress tests (#5895)
Summary:
This is the 2nd attempt after the revert of https://github.com/facebook/rocksdb/pull/4020
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5895

Test Plan:
```
./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
```

Differential Revision: D17822137

Pulled By: maysamyabandeh

fbshipit-source-id: 3d148c0d8cc129080410ff859c04b544223c8ea3
2019-10-08 16:50:21 -07:00
anand76
cca87d7722 Fix reopen voting logic in db_stress to prevent hangs (#5876)
Summary:
When multiple operations are performed in a db_stress thread in one loop
iteration, the reopen voting logic needs to take that into account. It
was doing that for MultiGet, but a new option was introduced recently to
do multiple iterator seeks per iteration, which broke it again. Fix the
logic to be more robust and agnostic of the type of operation performed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5876

Test Plan: Run db_stress

Differential Revision: D17733590

Pulled By: anand1976

fbshipit-source-id: 787f01abefa1e83bba43e0b4f4abb26699b2089e
2019-10-03 10:22:26 -07:00
sdong
503a756e42 Fix clang analyze warning in db_stress (#5870)
Summary:
Recent changes trigger clang analyze warning. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5870

Test Plan: "USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j60 analyze" and make sure it passes.

Differential Revision: D17682533

fbshipit-source-id: 02716f2a24572550a22db4bbe9b54d4872dfae32
2019-09-30 22:15:27 -07:00
Jay Zhuang
51413e0a85 Fix a compile error (#5864)
Summary:
```
tools/block_cache_analyzer/block_cache_trace_analyzer.cc:653:48: error: implicit conversion loses integer precision: 'uint64_t' (aka 'unsigned long long') to 'std::__1::linear_congruential_engine<unsigned int, 48271, 0, 2147483647>::result_type' (aka 'unsigned int') [-Werror,-Wshorten-64-to-32]
  std::default_random_engine rand_engine(env_->NowMicros());
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5864

Differential Revision: D17668962

fbshipit-source-id: e08fa58b2a78a8dd8b334862b5714208f696b8ab
2019-09-30 14:02:19 -07:00
sdong
69c4ccb970 Fix three more db_stress bugs (#5867)
Summary:
Two more bug fixes in db_stress:
1. this is to complete the fix of the regression bug causing overflowing when supporting FLAGS_prefix_size = -1.
2. Fix regression bug in compare iterator itself:
(1) when creating control iterator, which used the same read option as the normal iterator by mistake; (2) the logic of comparing has some problems. Fix them.
(3) disable validation for lower bound now, which generated some wildly different results. Disabling it to make normal tests pass while investigating it.
3. Cleaning up snapshots in verification failure cases. Memory is leaked otherwise.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5867

Test Plan: Run "make crash_test" for a while and see at least 1 is fixed.

Differential Revision: D17671712

fbshipit-source-id: 011f98ea1a72aef23e19ff28656830c78699b402
2019-09-30 12:38:23 -07:00
sdong
5cd8aaf75f db_stress: fix run time error when prefix_size = -1 (#5862)
Summary:
When prefix_size = -1, stress test crashes with run time error because of overflow. Fix it by not using -1 but 7 in prefix scan mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5862

Test Plan:
Run
python -u tools/db_crashtest.py --simple whitebox --random_kill_odd \
      888887 --compression_type=zstd
and see it doesn't crash.

Differential Revision: D17642313

fbshipit-source-id: f029e7651498c905af1b1bee6d310ae50cdcda41
2019-09-27 16:55:57 -07:00
sdong
679a45d0cb crash_test to do some verification for prefix extractor and iterator bounds. (#5846)
Summary:
For now, crash_test is not able to report any failure for the logic related to iterator upper, lower bounds or iterators, or reseek. These are features prone to errors. Improve db_stress in several ways:
(1) For each iterator run, reseek up to 3 times.
(2) For every iterator, create control iterator with upper or lower bound, with total order seek. Compare the results with the iterator.
(3) Make simple crash test to avoid prefix size to have more coverage.
(4) make prefix_size = 0 a valid size and -1 to indicate disabling prefix extractor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5846

Test Plan: Manually hack the code to create wrong results and see they are caught by the tool.

Differential Revision: D17631760

fbshipit-source-id: acd460a177bd2124a5ffd7fff490702dba63030b
2019-09-27 11:10:44 -07:00
sdong
e8263dbdaa Apply formatter to recent 200+ commits. (#5830)
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830

Test Plan: Run all existing tests.

Differential Revision: D17488031

fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
2019-09-20 12:04:26 -07:00
sdong
c06b54d0c6 Apply formatter on recent 45 commits. (#5827)
Summary:
Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827

Test Plan: Run all existing tests.

Differential Revision: D17483727

fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f
2019-09-19 12:34:17 -07:00
Maysam Yabandeh
6ec6a4a9a4 Remove snap_refresh_nanos option (#5826)
Summary:
The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826

Differential Revision: D17467147

Pulled By: maysamyabandeh

fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0
2019-09-18 20:26:04 -07:00
Levi Tamasi
94d62d771e Temporarily disable partitioned index/filter in stress test (#5811)
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however,
this causes assertion failures in BatchedOpsStressTest. This patch
disables them until we can root cause the failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811

Test Plan: Ran the script and made sure it only uses the binary search index.

Differential Revision: D17399366

Pulled By: ltamasi

fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5
2019-09-16 11:41:35 -07:00
sdong
b931f84e56 Divide file_reader_writer.h and .cc (#5803)
Summary:
file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803

Test Plan: Build whole project using make and cmake.

Differential Revision: D17374550

fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987
2019-09-16 10:33:51 -07:00
Peter (Stig) Edwards
2ed91622fb sst_dump recompress show #blocks compressed and not compressed (#5791)
Summary:
Closes https://github.com/facebook/rocksdb/issues/1474
Helps show when the 12.5% threshold for GoodCompressionRatio (originally from ldb) is hit.

Example output:

```
> ./sst_dump --file=/tmp/test.sst --command=recompress
from [] to []
Process /tmp/test.sst
Sst file format: block-based
Block Size: 16384
Compression: kNoCompression           Size:  122579836 Blocks:   2300 Compressed:      0 (  0.0%) Not compressed (ratio):   2300 (100.0%) Not compressed (abort):      0 (  0.0%)
Compression: kSnappyCompression       Size:   46289962 Blocks:   2300 Compressed:   2119 ( 92.1%) Not compressed (ratio):    181 (  7.9%) Not compressed (abort):      0 (  0.0%)
Compression: kZlibCompression         Size:   29689825 Blocks:   2300 Compressed:   2301 (100.0%) Not compressed (ratio):      0 (  0.0%) Not compressed (abort):      0 (  0.0%)
Unsupported compression type: kBZip2Compression.
Compression: kLZ4Compression          Size:   44785490 Blocks:   2300 Compressed:   1950 ( 84.8%) Not compressed (ratio):    350 ( 15.2%) Not compressed (abort):      0 (  0.0%)
Compression: kLZ4HCCompression        Size:   37498895 Blocks:   2300 Compressed:   2301 (100.0%) Not compressed (ratio):      0 (  0.0%) Not compressed (abort):      0 (  0.0%)
Unsupported compression type: kXpressCompression.
Compression: kZSTD                    Size:   32208707 Blocks:   2300 Compressed:   2301 (100.0%) Not compressed (ratio):      0 (  0.0%) Not compressed (abort):      0 (  0.0%)
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5791

Differential Revision: D17347870

fbshipit-source-id: af10849c010b46b20e54162b70123c2805ffe526
2019-09-13 16:30:41 -07:00
Lingjing You
1a928c22a0 Add insert hints for each writebatch (#5728)
Summary:
Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it.

Bench result (qps):

`./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4`

master:

| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 387883  | 220790  | 308294  | 490998  |
| 10                      | 1397208 | 978911  | 1275684 | 1733395 |
| 100                     | 2045414 | 1589927 | 1798782 | 2681039 |
| 1000                    | 2228038 | 1698252 | 1839877 | 2863490 |

fillseq with writebatch hint:

| batch size \ thread num | 1       | 2       | 4       | 8       |
| ----------------------- | ------- | ------- | ------- | ------- |
| 1                       | 286005  | 223570  | 300024  | 466981  |
| 10                      | 970374  | 813308  | 1399299 | 1753588 |
| 100                     | 1962768 | 1983023 | 2676577 | 3086426 |
| 1000                    | 2195853 | 2676782 | 3231048 | 3638143 |
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5728

Differential Revision: D17297240

fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c
2019-09-12 17:15:18 -07:00
Levi Tamasi
d35ffd569c Temporarily disable hash index in stress tests (#5792)
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash
tests, resulting in assertion failures in Block. This patch disables
the hash index until we can pinpoint the root cause of these issues.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792

Test Plan:
Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2
(binary search and partitioned index).

Differential Revision: D17346777

Pulled By: ltamasi

fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551
2019-09-12 12:11:34 -07:00
Adam Retter
e8c2e68b4e Fix RocksDB bug in block_cache_trace_analyzer.cc on Windows (#5786)
Summary:
This is required to compile on Windows with Visual Studio 2015.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786

Differential Revision: D17335994

fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011
2019-09-11 18:36:41 -07:00
Andrew Kryczka
dd2a35f13f Support partitioned index and filters in stress/crash tests (#4020)
Summary:
- In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test
- When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020

Test Plan:
currently this is blocked on fixing the bug that crash test caught:

```
$ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
...
Verification failed for column family 0 key 937501: Value not found: NotFound:
Crash-recovery verification failed :(
```

Differential Revision: D8508683

Pulled By: maysamyabandeh

fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629
2019-09-11 14:13:38 -07:00
anand76
eb9026f09b Add a db_bench benchmark to warm up the row cache
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5707

Differential Revision: D17242698

Pulled By: anand1976

fbshipit-source-id: 5d1bfda3c9e8f56176ae391cae6c91e6262016b8
2019-09-10 11:06:36 -07:00
sdong
1daff8f85a crash_test to skip compaction TTL for FIFO compaction. (#5749)
Summary:
https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749

Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0.

Differential Revision: D17078292

fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d
2019-08-27 17:55:37 -07:00
sdong
1d6a10f52d Extend stress test to cover periodic compaction and compaction TTL (#5741)
Summary:
Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there.
Randomly select value for these two options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741

Test Plan: Run crash_test and see the perameters generated.

Differential Revision: D17059515

fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097
2019-08-26 15:03:25 -07:00
Zhongyi Xie
2f41ecfe75 Refactor trimming logic for immutable memtables (#5022)
Summary:
MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory.
We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one.
The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming.
In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022

Differential Revision: D14394062

Pulled By: miasantreble

fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5
2019-08-23 13:55:34 -07:00
sdong
d8a27d9331 Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729)
Summary:
AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same.
Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729

Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed.

Differential Revision: D16969791

fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163
2019-08-22 16:32:55 -07:00
sdong
8e12638f3d Slightly adjust atomic white box test's kill odd (#5717)
Summary:
Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout.

If WAL is disabled, make the odds 5x likely to trigger.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717

Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out.

Differential Revision: D16897237

fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c
2019-08-19 10:51:59 -07:00
sdong
e1c468d16f Do readahead in VerifyChecksum() (#5713)
Summary:
Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713

Test Plan: Add a new unit test.

Differential Revision: D16860874

fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413
2019-08-16 16:42:56 -07:00
sdong
bd2c753dd0 Add command "list_file_range_deletes" in ldb (#5615)
Summary:
Add a command in ldb so that users can print out tombstones in SST files.
In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't return from the program, but return the status code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5615

Test Plan: Add a new unit test

Differential Revision: D16550326

fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e
2019-08-15 17:01:03 -07:00
haoyuhuang
3da225716c Block cache analyzer: Support reading from human readable trace file. (#5679)
Summary:
This PR adds support in block cache trace analyzer to read from human readable trace file. This is needed when a user does not have access to the binary trace file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5679

Test Plan: USE_CLANG=1 make check -j32

Differential Revision: D16697239

Pulled By: HaoyuHuang

fbshipit-source-id: f2e29d7995816c389b41458f234ec8e184a924db
2019-08-09 13:13:54 -07:00
haoyuhuang
6e78fe3c8d Pysim more algorithms (#5644)
Summary:
This PR adds four more eviction policies.
- OPT [1]
- Hyperbolic caching [2]
- ARC [3]
- GreedyDualSize [4]

[1] L. A. Belady. 1966. A Study of Replacement Algorithms for a Virtual-storage Computer. IBM Syst. J. 5, 2 (June 1966), 78-101. DOI=http://dx.doi.org/10.1147/sj.52.0078
[2] Aaron Blankstein, Siddhartha Sen, and Michael J. Freedman. 2017. Hyperbolic caching: flexible caching for web applications. In Proceedings of the 2017 USENIX Conference on Usenix Annual Technical Conference (USENIX ATC '17). USENIX Association, Berkeley, CA, USA, 499-511.
[3] Nimrod Megiddo and Dharmendra S. Modha. 2003. ARC: A Self-Tuning, Low Overhead Replacement Cache. In Proceedings of the 2nd USENIX Conference on File and Storage Technologies (FAST '03). USENIX Association, Berkeley, CA, USA, 115-130.
[4] N. Young. The k-server dual and loose competitiveness for paging. Algorithmica, June 1994, vol. 11,(no.6):525-41. Rewritten version of ''On-line caching as cache size varies'', in The 2nd Annual ACM-SIAM Symposium on Discrete Algorithms, 241-250, 1991.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5644

Differential Revision: D16548817

Pulled By: HaoyuHuang

fbshipit-source-id: 838f76db9179f07911abaab46c97e1c929cfcd63
2019-08-06 18:50:59 -07:00
Vijay Nadimpalli
d150e01474 New API to get all merge operands for a Key (#5604)
Summary:
This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
1. Update subset of columns and read subset of columns -
Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
2. Updating very few attributes in a value which is a JSON-like document -
Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
----------------------------------------------------------------------------------------------------
API :
Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector<PinnableSlice> values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

Differential Revision: D16657366

Pulled By: vjnadimpalli

fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
2019-08-06 14:26:44 -07:00
haoyuhuang
f4a616ebf9 Block cache analyzer: python script to plot graphs (#5673)
Summary:
This PR updated the python script to plot graphs for stats output from block cache analyzer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5673

Test Plan: Manually run the script to generate graphs.

Differential Revision: D16657145

Pulled By: HaoyuHuang

fbshipit-source-id: fd510b5fd4307835f9a986fac545734dbe003d28
2019-08-05 18:35:52 -07:00
haoyuhuang
70c7302fb5 Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610)
Summary:
This PR implements cache eviction using reinforcement learning. It includes two implementations:
1. An implementation of Thompson Sampling for the Bernoulli Bandit [1].
2. An implementation of LinUCB with disjoint linear models [2].

The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss.
Thompson Sampling is contextless and does not include any features.
LinUCB includes features such as level, block type, caller, column family id to decide which eviction policy to use.

[1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070
[2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610

Differential Revision: D16435067

Pulled By: HaoyuHuang

fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb
2019-07-26 14:41:13 -07:00
Mark Rambacher
cfcf045acc The ObjectRegistry class replaces the Registrar and NewCustomObjects.… (#5293)
Summary:
The ObjectRegistry class replaces the Registrar and NewCustomObjects.  Objects are registered with the registry by Type (the class must implement the static const char *Type() method).

This change is necessary for a few reasons:
- By having a class (rather than static template instances), the class can be passed between compilation units, meaning that objects could be registered and shared from a dynamic library with an executable.
- By having a class with instances, different units could have different objects registered.  This could be useful if, for example, one Option allowed for a dynamic library and one did not.

When combined with some other PRs (being able to load shared libraries, a Configurable interface to configure objects to/from string), this code will allow objects in external shared libraries to be added to a RocksDB image at run-time, rather than requiring every new extension to be built into the main library and called explicitly by every program.

Test plan (on riversand963's  devserver)
```
$COMPILE_WITH_ASAN=1 make -j32 all && sleep 1 && make check
```
All tests pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5293

Differential Revision: D16363396

Pulled By: riversand963

fbshipit-source-id: fbe4acb615bfc11103eef40a0b288845791c0180
2019-07-23 17:13:05 -07:00
sdong
3782accf7d ldb sometimes specify a string-append merge operator (#5607)
Summary:
Right now, ldb cannot scan a DB with merge operands with default ldb. There is no hard to give a general merge operator so that it can at least print out something
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607

Test Plan: Run ldb against a DB with merge operands and see the outputs.

Differential Revision: D16442634

fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54
2019-07-23 14:25:18 -07:00
haoyuhuang
3778470061 Block cache analyzer: Compute correlation of features and human readable trace file. (#5596)
Summary:
- Compute correlation between a few features and predictions, e.g., number of accesses since the last access vs number of accesses till the next access on a block.
- Output human readable trace file so python can consume it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5596

Test Plan: make clean && USE_CLANG=1 make check -j32

Differential Revision: D16373200

Pulled By: HaoyuHuang

fbshipit-source-id: c848d26bc2e9210461f317d7dbee42d55be5a0cc
2019-07-22 17:51:34 -07:00
Yanqin Jin
a78503bd6c Temporarily disable snapshot list refresh for atomic flush stress test (#5581)
Summary:
Atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix after
which the same error occurred much less frequently. However it still occur
occasionally. Not sure what the root cause is. This PR disables the feature of
snapshot list refresh, and we should keep an eye on the failure in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581

Differential Revision: D16295985

Pulled By: riversand963

fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4
2019-07-22 14:38:16 -07:00
sdong
6bb3b4b567 ldb idump to support non-default column families. (#5594)
Summary:
ldb idump now only works for default column family. Extend it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5594

Test Plan: Compile and run the tool against a multiple CF DB.

Differential Revision: D16380684

fbshipit-source-id: bfb8af36fdad1806837c90aaaab492d71528aceb
2019-07-19 11:36:59 -07:00
haoyuhuang
8a008d4170 Block access tracing: Trace referenced key for Get on non-data blocks. (#5548)
Summary:
This PR traces the referenced key for Get for all types of blocks. This is useful when evaluating hybrid row-block caches.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5548

Test Plan: make clean && USE_CLANG=1 make check -j32

Differential Revision: D16157979

Pulled By: HaoyuHuang

fbshipit-source-id: f6327411c9deb74e35e22a35f66cdbae09ab9d87
2019-07-17 13:05:58 -07:00
Levi Tamasi
3bde41b5a3 Move the filter readers out of the block cache (#5504)
Summary:
Currently, when the block cache is used for the filter block, it is not
really the block itself that is stored in the cache but a FilterBlockReader
object. Since this object is not pure data (it has, for instance, pointers that
might dangle, including in one case a back pointer to the TableReader), it's not
really sharable. To avoid the issues around this, the current code erases the
cache entries when the TableReader is closed (which, BTW, is not sufficient
since a concurrent TableReader might have picked up the object in the meantime).
Instead of doing this, the patch moves the FilterBlockReader out of the cache
altogether, and decouples the filter reader object from the filter block.
In particular, instead of the TableReader owning, or caching/pinning the
FilterBlockReader (based on the customer's settings), with the change the
TableReader unconditionally owns the FilterBlockReader, which in turn
owns/caches/pins the filter block. This change also enables us to reuse the code
paths historically used for data blocks for filters as well.

Note:
Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a
separate phase.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504

Test Plan: make asan_check

Differential Revision: D16036974

Pulled By: ltamasi

fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091
2019-07-16 13:14:58 -07:00
haoyuhuang
68d43b4d30 A python script to plot graphs for cvs files generated by block_cache_trace_analyzer
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5563

Test Plan: Manually run the script on files generated by block_cache_trace_analyzer.

Differential Revision: D16214400

Pulled By: HaoyuHuang

fbshipit-source-id: 94485eed995e9b2b63e197c5dfeb80129fa7897f
2019-07-12 18:56:20 -07:00
haoyuhuang
3e9c5a3523 Block cache analyzer: Add more stats (#5516)
Summary:
This PR provides more command line options for block cache analyzer to better understand block cache access pattern.
-analyze_bottom_k_access_count_blocks
-analyze_top_k_access_count_blocks
-reuse_lifetime_labels
-reuse_lifetime_buckets
-analyze_callers
-access_count_buckets
-analyze_blocks_reuse_k_reuse_window
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5516

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16037440

Pulled By: HaoyuHuang

fbshipit-source-id: b9a4ac0d4712053fab910732077a4d4b91400bc8
2019-07-12 16:55:34 -07:00
haoyuhuang
1a59b6e2a9 Cache simulator: Add a ghost cache for admission control and a hybrid row-block cache. (#5534)
Summary:
This PR adds a ghost cache for admission control. Specifically, it admits an entry on its second access.
It also adds a hybrid row-block cache that caches the referenced key-value pairs of a Get/MultiGet request instead of its blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5534

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16101124

Pulled By: HaoyuHuang

fbshipit-source-id: b99edda6418a888e94eb40f71ece45d375e234b1
2019-07-11 12:43:29 -07:00
Yanqin Jin
f786b4a5b4 Improve result print on atomic flush stress test failure (#5549)
Summary:
When atomic flush stress test fails, we print internal keys within the range with mismatched key/values for all column families.

Test plan (on devserver)
Manually hack the code to randomly insert wrong data. Run the test.
```
$make clean && COMPILE_WITH_TSAN=1 make -j32 db_stress
$./db_stress -test_atomic_flush=true -ops_per_thread=10000
```
Check that proper error messages are printed, as follows:
```
2019/07/08-17:40:14  Starting verification
Verification failed
Latest Sequence Number: 190903
[default] 000000000000050B => 56290000525350515E5F5C5D5A5B5859
[3] 0000000000000533 => EE100000EAEBE8E9E6E7E4E5E2E3E0E1FEFFFCFDFAFBF8F9
Internal keys in CF 'default', [000000000000050B, 0000000000000533] (max 8)
  key 000000000000050B seq 139920 type 1
  key 0000000000000533 seq 0 type 1
Internal keys in CF '3', [000000000000050B, 0000000000000533] (max 8)
  key 0000000000000533 seq 0 type 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5549

Differential Revision: D16158709

Pulled By: riversand963

fbshipit-source-id: f07fa87763f87b3bd908da03c956709c6456bcab
2019-07-09 16:27:22 -07:00
sdong
aa0367aabb Allow ldb to open DB as secondary (#5537)
Summary:
Right now ldb can open running DB through read-only DB. However, it might leave info logs files to the read-only DB directory. Add an option to open the DB as secondary to avoid it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5537

Test Plan:
Run
./ldb scan  --max_keys=10 --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp --no_value --hex
and
./ldb get 0x00000000000000103030303030303030 --hex --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp
against a normal db_bench run and observe the output changes. Also observe that no new info logs files are created under /tmp/rocksdbtest-2491/dbbench.
Run without --secondary_path and observe that new info logs created under /tmp/rocksdbtest-2491/dbbench.

Differential Revision: D16113886

fbshipit-source-id: 4e09dec47c2528f6ca08a9e7a7894ba2d9daebbb
2019-07-09 12:51:28 -07:00
Tim Hatch
a6a9213a36 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:37 -07:00
sdong
872a261ffc db_stress to print some internal keys after verification failure (#5543)
Summary:
Print out some more information when db_tress fails with verification failures to help debugging problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5543

Test Plan:
Manually ingest some failures and observe the outputs are like this:

Verification failed
[default] 0000000000199A5A => 7C3D000078797A7B74757677707172736C6D6E6F68696A6B
[6] 000000000019C8BD => 65380000616063626D6C6F6E69686B6A
internal keys in default CF [0000000000199A5A, 000000000019C8BD] (max 8)
  key 0000000000199A5A seq 179246 type 1
  key 000000000019C8BD seq 163970 type 1
Lastest Sequence Number: 292234

Differential Revision: D16153717

fbshipit-source-id: b33fa50a828c190cbf8249a37955432044f92daf
2019-07-08 13:36:37 -07:00
sdong
e4dcf5fd22 db_bench to add a new "benchmark" to print out all stats history (#5532)
Summary:
Sometimes it is helpful to fetch the whole history of stats after benchmark runs. Add such an option
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5532

Test Plan: Run the benchmark manually and observe the output is as expected.

Differential Revision: D16097764

fbshipit-source-id: 10b5b735a22a18be198b8f348be11f11f8806904
2019-07-03 20:03:28 -07:00
haoyuhuang
66464d1fde Remove multiple declarations o kMicrosInSecond.
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5526

Test Plan:
OPT=-g V=1 make J=1 unity_test -j32
make clean && make -j32

Differential Revision: D16079315

Pulled By: HaoyuHuang

fbshipit-source-id: 294ab439cf0db8dd5da44e30eabf0cbb2bb8c4f6
2019-07-01 15:15:12 -07:00
Eli Pozniansky
3e6c185381 Formatting fixes in db_bench_tool (#5525)
Summary:
Formatting fixes in db_bench_tool that were accidentally omitted
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5525

Test Plan: Unit tests

Differential Revision: D16078516

Pulled By: elipoz

fbshipit-source-id: bf8df0e3f08092a91794ebf285396d9b8a335bb9
2019-07-01 14:57:28 -07:00
Eli Pozniansky
f872009237 Fix from some C-style casting (#5524)
Summary:
Fix from some C-style casting in bloom.cc and ./tools/db_bench_tool.cc
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5524

Differential Revision: D16075626

Pulled By: elipoz

fbshipit-source-id: 352948885efb64a7ef865942c75c3c727a914207
2019-07-01 13:05:34 -07:00
haoyuhuang
9f0bd56889 Cache simulator: Refactor the cache simulator so that we can add alternative policies easily (#5517)
Summary:
This PR creates cache_simulator.h file. It contains a CacheSimulator that runs against a block cache trace record. We can add alternative cache simulators derived from CacheSimulator later. For example, this PR adds a PrioritizedCacheSimulator that inserts filter/index/uncompressed dictionary blocks with high priority.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5517

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16043689

Pulled By: HaoyuHuang

fbshipit-source-id: 65f28ed52b866ffb0e6eceffd7f9ca7c45bb680d
2019-07-01 12:46:32 -07:00
Yanqin Jin
c360675750 Add secondary instance to stress test (#5479)
Summary:
This PR allows users to run stress tests on secondary instance.

Test plan (on devserver)
```
./db_stress -ops_per_thread=100000 -enable_secondary=true -threads=32 -secondary_catch_up_one_in=10000 -clear_column_family_one_in=1000 -reopen=100
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5479

Differential Revision: D16074325

Pulled By: riversand963

fbshipit-source-id: c0ed959e7b6c7cda3efd0b3070ab379de3b29f1c
2019-07-01 11:49:50 -07:00
sdong
10bae8ceb3 Add more release versions to tools/check_format_compatible.sh (#5518)
Summary:
tools/check_format_compatible.sh is lagged behind. Catch up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5518

Test Plan: Run the command

Differential Revision: D16063180

fbshipit-source-id: d063eb42df9653dec06a2cf0fb982b8a60ca3d2f
2019-06-28 17:41:58 -07:00
Aaron Gao
5c2f13fb14 add create_column_family and drop_column_family cmd to ldb tool (#5503)
Summary:
`create_column_family` cmd already exists but was somehow missed in the help message.
also add `drop_column_family` cmd which can drop a cf without opening db.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5503

Test Plan: Updated existing ldb_test.py to test deleting a column family.

Differential Revision: D16018414

Pulled By: lightmark

fbshipit-source-id: 1fc33680b742104fea86b10efc8499f79e722301
2019-06-27 11:11:48 -07:00
haoyuhuang
554a6456aa Block cache trace analysis: Write time series graphs in csv files (#5490)
Summary:
This PR adds a feature in block cache trace analysis tool to write statistics into csv files.
1. The analysis tool supports grouping the number of accesses per second by various labels, e.g., block, column family, block type, or a combination of them.
2. It also computes reuse distance and reuse interval.

Reuse distance: The cumulated size of unique blocks read between two consecutive accesses on the same block.
Reuse interval: The time between two consecutive accesses on the same block.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5490

Differential Revision: D15901322

Pulled By: HaoyuHuang

fbshipit-source-id: b5454fea408a32757a80be63de6fe1c8149ca70e
2019-06-24 20:42:12 -07:00
Yanqin Jin
1bfeffab2d Stop printing after verification fails (#5493)
Summary:
Stop verification and printing once verification fails.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5493

Differential Revision: D15928992

Pulled By: riversand963

fbshipit-source-id: 699feac034a217d57280aa3fb50f5aba06adf317
2019-06-20 22:16:58 -07:00
haoyuhuang
705b8eecb4 Add more callers for table reader. (#5454)
Summary:
This PR adds more callers for table readers. These information are only used for block cache analysis so that we can know which caller accesses a block.
1. It renames the BlockCacheLookupCaller to TableReaderCaller as passing the caller from upstream requires changes to table_reader.h and TableReaderCaller is a more appropriate name.
2. It adds more table reader callers in table/table_reader_caller.h, e.g., kCompactionRefill, kExternalSSTIngestion, and kBuildTable.

This PR is long as it requires modification of interfaces in table_reader.h, e.g., NewIterator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5454

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32.

Differential Revision: D15819451

Pulled By: HaoyuHuang

fbshipit-source-id: b6caa704c8fb96ddd15b9a934b7e7ea87f88092d
2019-06-20 14:31:48 -07:00
haoyuhuang
2e8ad03ab3 Add more stats in the block cache trace analyzer (#5482)
Summary:
This PR adds more stats in the block cache trace analyzer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5482

Differential Revision: D15883553

Pulled By: HaoyuHuang

fbshipit-source-id: 6d440e4f657af75690420102d532d0ee1ed4e9cf
2019-06-18 18:38:42 -07:00
Huisheng Liu
92f631da33 replace sprintf with its safe version snprintf (#5475)
Summary:
sprintf is unsafe and has buffer overrun risk. Replace it with the safer version snprintf where buffer size is supplied to avoid overrun.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5475

Differential Revision: D15879481

Pulled By: sagar0

fbshipit-source-id: 7ae1958ffc9727fa50261dfbb98ddd74e70a72d8
2019-06-18 16:42:26 -07:00
haoyuhuang
bcfc53b436 Block cache tracing: Fix minor bugs with downsampling and some benchmark results. (#5473)
Summary:
As the code changes for block cache tracing are almost complete, I did a benchmark to compare the performance when block cache tracing is enabled/disabled.

 With 1% downsampling ratio, the performance overhead of block cache tracing is negligible. When we trace all block accesses, the throughput drops by 6 folds with 16 threads issuing random reads and all reads are served in block cache.

Setup:
RocksDB:    version 6.2
Date:       Mon Jun 17 17:11:13 2019
CPU:        24 * Intel Core Processor (Skylake)
CPUCache:   16384 KB
Keys:       20 bytes each
Values:     100 bytes each (100 bytes after compression)
Entries:    10000000
Prefix:    20 bytes
Keys per prefix:    0
RawSize:    1144.4 MB (estimated)
FileSize:   1144.4 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: NoCompression
Compression sampling rate: 0
Memtablerep: skip_list
Perf Level: 1

I ran the readrandom workload for 1 minute. Detailed throughput results:  (ops/second)
Sample rate 0: no block cache tracing.
Sample rate 1: trace all block accesses.
Sample rate 100: trace accesses 1% blocks.
1 thread |   |   |  -- | -- | -- | --
Sample rate | 0 | 1 | 100
1 MB block cache size | 13,094 | 13,166 | 13,341
10 GB block cache size | 202,243 | 188,677 | 229,182

16 threads |   |   |  -- | -- | -- | --
Sample rate | 0 | 1 | 100
1 MB block cache size | 208,761 | 178,700 | 201,872
10 GB block cache size | 2,645,996 | 426,295 | 2,587,605
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5473

Differential Revision: D15869479

Pulled By: HaoyuHuang

fbshipit-source-id: 7ae802abe84811281a6af8649f489887cd7c4618
2019-06-17 17:59:02 -07:00
haoyuhuang
2d1dd5bce7 Support computing miss ratio curves using sim_cache. (#5449)
Summary:
This PR adds a BlockCacheTraceSimulator that reports the miss ratios given different cache configurations. A cache configuration contains "cache_name,num_shard_bits,cache_capacities". For example, "lru, 1, 1K, 2K, 4M, 4G".

When we replay the trace, we also perform lookups and inserts on the simulated caches.
In the end, it reports the miss ratio for each tuple <cache_name, num_shard_bits, cache_capacity> in a output file.

This PR also adds a main source block_cache_trace_analyzer so that we can run the analyzer in command line.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5449

Test Plan:
Added tests for block_cache_trace_analyzer.
COMPILE_WITH_ASAN=1 make check -j32.

Differential Revision: D15797073

Pulled By: HaoyuHuang

fbshipit-source-id: aef0c5c2e7938f3e8b6a10d4a6a50e6928ecf408
2019-06-17 16:41:12 -07:00
Zhongyi Xie
671d15cbdd Persistent Stats: persist stats history to disk (#5046)
Summary:
This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and  both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046

Differential Revision: D15863138

Pulled By: miasantreble

fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b
2019-06-17 15:21:50 -07:00
haoyuhuang
d43b4cd570 Integrate block cache tracing into db_bench (#5459)
Summary:
This PR integrates the block cache tracing into db_bench. It adds three command line arguments.
-block_cache_trace_file (Block cache trace file path.) type: string default: ""
-block_cache_trace_max_trace_file_size_in_bytes (The maximum block cache
trace file size in bytes. Block cache accesses will not be logged if the
trace file size exceeds this threshold. Default is 64 GB.) type: int64
default: 68719476736
-block_cache_trace_sampling_frequency (Block cache trace sampling
frequency, termed s. It uses spatial downsampling and samples accesses to
one out of s blocks.) type: int32 default: 1
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5459

Differential Revision: D15832031

Pulled By: HaoyuHuang

fbshipit-source-id: 0ecf2f2686557251fe741a2769b21170777efa3d
2019-06-17 11:08:21 -07:00
haoyuhuang
7a8d7358bb Integrate block cache tracer in block based table reader. (#5441)
Summary:
This PR integrates the block cache tracer into block based table reader. The tracer will write the block cache accesses using the trace_writer. The tracer is null in this PR so that nothing will be logged.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5441

Differential Revision: D15772029

Pulled By: HaoyuHuang

fbshipit-source-id: a64adb92642cd23222e0ba8b10d86bf522b42f9b
2019-06-14 17:40:31 -07:00
haoyuhuang
bb4178066d Integrate block cache tracer into db_impl (#5433)
Summary:
This PR integrates the block cache tracer class into db_impl.cc.
db_impl.cc contains a member variable of AtomicBlockCacheTraceWriter class and passes its reference to the block_based_table_reader.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5433

Differential Revision: D15728016

Pulled By: HaoyuHuang

fbshipit-source-id: 23d5659e8c82d556833dcc1a5558aac8c1f7db71
2019-06-13 15:43:10 -07:00
Maysam Yabandeh
f9842869cf Disable pipeline writes in stress test (#5445)
Summary:
The tsan crash tests are failing with a data race compliant with pipelined write option. Temporarily disable it until its concurrency issue are fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5445

Differential Revision: D15783824

Pulled By: maysamyabandeh

fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b
2019-06-12 11:12:36 -07:00
haoyuhuang
9bbccda01e First commit for block cache trace analyzer (#5425)
Summary:
This PR contains the first commit for block cache trace analyzer. It reads a block cache trace file and prints statistics of the traces.

We will extend this class to provide more functionalities.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5425

Differential Revision: D15709580

Pulled By: HaoyuHuang

fbshipit-source-id: 2f43bd2311f460ab569880819d95eeae217c20bb
2019-06-11 12:22:44 -07:00
Zhongyi Xie
d68f9f4580 simplify include directive involving inttypes (#5402)
Summary:
When using `PRIu64` type of printf specifier, current code base does the following:
```
#ifndef __STDC_FORMAT_MACROS
#define __STDC_FORMAT_MACROS
#endif
#include <inttypes.h>
```
However, this can be simplified to
```
#include <cinttypes>
```
as long as flag `-std=c++11` is used.
This should solve issues like https://github.com/facebook/rocksdb/issues/5159
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5402

Differential Revision: D15701195

Pulled By: miasantreble

fbshipit-source-id: 6dac0a05f52aadb55e9728038599d3d2e4b59d03
2019-06-06 13:56:07 -07:00
Siying Dong
5851cb7fdb Move util/trace_replay.* to trace_replay/ (#5376)
Summary:
util/ means for lower level libraries. trace_replay is highly integrated to DB and sometimes call DB. Move it out to a separate directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5376

Differential Revision: D15550938

Pulled By: siying

fbshipit-source-id: f46dce5ceffdc05a73f26379c7bb1b79ebe6c207
2019-06-03 13:25:26 -07:00
Siying Dong
000b9ec217 Move some logging related files to logging/ (#5387)
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387

Differential Revision: D15579036

Pulled By: siying

fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
2019-05-31 17:23:59 -07:00
Vijay Nadimpalli
49c5a12dbe Organizing rocksdb/db directory
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390

Differential Revision: D15579388

Pulled By: vjnadimpalli

fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366
2019-05-31 11:57:01 -07:00
Yanqin Jin
83f7a8eed0 Fix compilation error in LITE mode (#5391)
Summary:
Add macro ROCKSDB_LITE to fix compilation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5391

Differential Revision: D15574522

Pulled By: riversand963

fbshipit-source-id: 95aea83c5d9b2bf98a3ba0ef9167b63c9be2988b
2019-05-31 08:32:22 -07:00
Yanqin Jin
b9f5900658 Fix WAL replay by skipping old write batches (#5170)
Summary:
1. Fix a bug in WAL replay in which write batches with old sequence numbers are mistakenly inserted into memtables.
2. Add support for benchmarking secondary instance to db_bench_tool.
With changes made in this PR, we can start benchmarking secondary instance
using two processes. It is also possible to vary the frequency at which the
secondary instance tries to catch up with the primary. The info log of the
secondary can be found in a directory whose path can be specified with
'-secondary_path'.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5170

Differential Revision: D15564608

Pulled By: riversand963

fbshipit-source-id: ce97688ed3d33f69d3a0b9266ebbbbf887aa0ec8
2019-05-30 19:33:33 -07:00
Siying Dong
8843129ece Move some memory related files from util/ to memory/ (#5382)
Summary:
Move arena, allocator, and memory tools under util to a separate memory/ directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382

Differential Revision: D15564655

Pulled By: siying

fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
2019-05-30 17:44:09 -07:00
Vijay Nadimpalli
50e470791d Organizing rocksdb/table directory by format
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373

Differential Revision: D15559425

Pulled By: vjnadimpalli

fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55
2019-05-30 14:51:11 -07:00
anand76
bd44ec2006 Fix reopen voting logic in db_stress when using MultiGet (#5374)
Summary:
When the --reopen option is non-zero, the DB is reopened after every ops_per_thread/(reopen+1) ops, with the check being done after every op. With MultiGet, we might do multiple ops in one iteration, which broke the logic that checked when to synchronize among the threads and reopen the DB. This PR fixes that logic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5374

Differential Revision: D15559780

Pulled By: anand1976

fbshipit-source-id: ee6563a68045df7f367eca3cbc2500d3e26359ef
2019-05-30 11:41:08 -07:00
Siying Dong
e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Siying Dong
545d206040 Move some file related files outside util/ (#5375)
Summary:
util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375

Differential Revision: D15550935

Pulled By: siying

fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
2019-05-29 20:47:06 -07:00
Maysam Yabandeh
eab4f49a2c WritePrepared: skip_concurrency_control option (#5330)
Summary:
This enables the user to set TransactionDBOptions::skip_concurrency_control so the standard `DB::Write(const WriteOptions& opts, WriteBatch* updates)` would skip the concurrency control. This would give higher throughput to the users who know their use case doesn't need concurrency control.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5330

Differential Revision: D15525932

Pulled By: maysamyabandeh

fbshipit-source-id: 68421ac1ba34f549a4a8de9ce4c2dccf6fb4b06b
2019-05-28 16:29:45 -07:00
Silver Chan
2095ae8858 fixed db_stress.cc build error (#5307)
Summary:
when building this file using Xcode 10.2.1 in MacOSX10.14, the compiler report this error:
`
rocksdb/tools/db_stress.cc:3613:33: error: implicit instantiation of
      undefined template 'std::__1::array<std::__1::basic_string<char>, 10>'
    std::array<std::string, 10> keys = {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"};
/usr/include/c++/v1/__tuple:223:64: note:
      template is declared here
template <class _Tp, size_t _Size> struct _LIBCPP_TEMPLATE_VIS array;
                                                               ^
1 error generated.
`
if including array, this error will be fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5307

Differential Revision: D15475217

Pulled By: sagar0

fbshipit-source-id: b04a7658c2ca2573157028863b3a80f5ab52b9de
2019-05-23 14:03:25 -07:00
Zhichao Cao
a13026fb2f Added trace replay fast forward function (#5273)
Summary:
In the current db_bench trace replay, the replay process strictly follows the timestamp to issue the queries. In some cases, user does not care about the time. Therefore, fast forward is needed for users to speed up the replay process.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5273

Differential Revision: D15389232

Pulled By: zhichao-cao

fbshipit-source-id: 735d629b9d2a167b05af3e4fa0ddf9d5d0be1806
2019-05-16 20:21:18 -07:00
anand76
6492430eaf Fix a bug in db_stress and an incorrect assertion in FilePickerMultiGet (#5301)
Summary:
This PR has two fixes for crash test failures -
1. Fix a bug in TestMultiGet() in db_stress that was passing list of key to MultiGet() in the wrong order, thus ensuring that actual values don't match expected values
2. Remove an incorrect assertion in FilePickerMultiGet::GetNextFileInLevelWithKeys() that checks that files in a level are in sorted order. This is not true with MultiGet(), especially if there are duplicate keys and we may have to go back one file for the next key. Furthermore, this assertion makes more sense when a new version is created, rather than at lookup time

Test -
asan_crash and ubsan_crash tests
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5301

Differential Revision: D15337383

Pulled By: anand1976

fbshipit-source-id: 35092cb15bbc1700e5e823cbe07bfa62f1e9e6c6
2019-05-14 11:58:04 -07:00
Maysam Yabandeh
f383641a1d Unordered Writes (#5218)
Summary:
Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees.
The patch is prepared based on an original by siying
Existing unit tests are extended to include unordered_write option.

Benchmark Results:
```
TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions  --unordered_write=1
```
With WAL
- Vanilla RocksDB: 78.6 MB/s
- WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x)
- unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees)

Without WAL
- Vanilla RocksDB: 111.3 MB/s
- WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x)
- unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees)

- WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x)

Limitations:
- The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218

Differential Revision: D15219029

Pulled By: maysamyabandeh

fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759
2019-05-13 17:47:21 -07:00
Yi Wu
92c60547fe db_bench: fix hang on IO error (#5300)
Summary:
db_bench will wait indefinitely if there's background error. Fix by pass `abs_time_us` to cond var.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5300

Differential Revision: D15319945

Pulled By: miasantreble

fbshipit-source-id: 0034fb7f6ec7c3303c4ccf26e54c20fbdac8ab44
2019-05-13 11:30:35 -07:00
anand76
181bb43f08 Fix bugs in FilePickerMultiGet (#5292)
Summary:
This PR fixes a couple of bugs in FilePickerMultiGet that were causing db_stress test failures. The failures were caused by -
1. Improper handling of a key that matches the user key portion of an L0 file's largest key. In this case, the curr_index_in_curr_level file index in L0 for that key was getting incremented, but batch_iter_ was not advanced. By design, all keys in a batch are supposed to be checked against an L0 file before advancing to the next L0 file. Not advancing to the next key in the batch was causing a double increment of curr_index_in_curr_level due to the same key being processed again
2. Improper handling of a key that matches the user key portion of the largest key in the last file of L1 and higher. This was resulting in a premature end to the processing of the batch for that level when the next key in the batch is a duplicate. Typically, the keys in MultiGet will not be duplicates, but its good to handle that case correctly

Test -
asan_crash
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5292

Differential Revision: D15282530

Pulled By: anand1976

fbshipit-source-id: d1a6a86e0af273169c3632db22a44d79c66a581f
2019-05-09 13:18:00 -07:00
anand76
930bfa5750 Disable MultiGet from db_stress (#5284)
Summary:
Disable it for now until we can get stress tests to pass consistently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5284

Differential Revision: D15230727

Pulled By: anand1976

fbshipit-source-id: 239baacdb3c4cd4fb7c4447f7582b9042501d752
2019-05-06 18:26:50 -07:00
Maysam Yabandeh
6a40ee5eb1 Refresh snapshot list during long compactions (2nd attempt) (#5278)
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list.
This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278

Differential Revision: D15203291

Pulled By: maysamyabandeh

fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258
2019-05-03 17:30:22 -07:00
Zhongyi Xie
5d27d65bef multiget: fix memory issues due to vector auto resizing (#5279)
Summary:
This PR fixes three memory issues found by ASAN
* in db_stress, the key vector for MultiGet is created using `emplace_back` which could potentially invalidates references to the underlying storage (vector<string>) due to auto resizing. Fix by calling reserve in advance.
* Similar issue in construction of GetContext autovector in version_set.cc
* In multiget_context.h use T[] specialization for unique_ptr that holds a char array
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5279

Differential Revision: D15202893

Pulled By: miasantreble

fbshipit-source-id: 14cc2cda0ed64d29f2a1e264a6bfdaa4294ee75d
2019-05-03 15:58:43 -07:00
Zhongyi Xie
3e994809a1 fix implicit conversion error reported by clang check (#5277)
Summary:
fix the following clang check errors
```
tools/db_stress.cc:3609:30: error: implicit conversion loses integer precision: 'std::vector::size_type' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
    int num_keys = rand_keys.size();
        ~~~~~~~~   ~~~~~~~~~~^~~~~~
tools/db_stress.cc:3888:30: error: implicit conversion loses integer precision: 'std::vector::size_type' (aka 'unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
    int num_keys = rand_keys.size();
        ~~~~~~~~   ~~~~~~~~~~^~~~~~
2 errors generated.
make: *** [tools/db_stress.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5277

Differential Revision: D15196620

Pulled By: miasantreble

fbshipit-source-id: d56b1420d4a9f1df875fc52877a5fbb342bc7cae
2019-05-03 10:02:27 -07:00
anand76
434ccf2df4 Add option to use MultiGet in db_stress (#5264)
Summary:
The new option will pick a batch size randomly in the range 1-64. It will then space the keys in the batch by random intervals.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5264

Differential Revision: D15175522

Pulled By: anand1976

fbshipit-source-id: c16baa69d0f1ff4cf53c55c813ddd82c8aeb58fc
2019-05-01 23:06:56 -07:00
Andrew Kryczka
b02d0c238d Init compression dict handle before reading meta-blocks (#5267)
Summary:
At least one of the meta-block loading functions (`ReadRangeDelBlock`)
uses the same block reading function (`NewDataBlockIterator`) as data
block reads, which means it uses the dictionary handle. However, the
dictionary handle was uninitialized while reading meta-blocks, causing
readers to receive an error. This situation was only noticed when
`cache_index_and_filter_blocks=true`.

This PR initializes the handle to null while reading meta-blocks to
prevent the error. It also adds support to `db_stress` /
`db_crashtest.py` for `cache_index_and_filter_blocks`.

Fixes #5263.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5267

Differential Revision: D15149264

Pulled By: maysamyabandeh

fbshipit-source-id: 991d38a306c62db5976778bfb050fa3cd4a0671b
2019-04-30 09:50:49 -07:00
Yanqin Jin
210b49cac9 Disable pipelined write in atomic flush stress test (#5266)
Summary:
Since currently pipelined write allows one thread to perform memtable writes
while another thread is traversing the `flush_scheduler_`, it will cause an
assertion failure in `FlushScheduler::Clear`. To unblock crash recoery tests,
we temporarily disable pipelined write when atomic flush is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5266

Differential Revision: D15142285

Pulled By: riversand963

fbshipit-source-id: a0c20fe4ac543e08feaed602414f982054df7831
2019-04-30 08:12:42 -07:00
Sagar Vemuri
3548e4220d Improve explicit user readahead performance (#5246)
Summary:
Improve the iterators performance when the user explicitly sets the readahead size via `ReadOptions.readahead_size`.

1. Stop creating new table readers when the user explicitly sets readahead size.
2. Make use of an internal buffer based on `FilePrefetchBuffer` instead of using `ReadaheadRandomAccessFileReader`, to handle the user readahead requests (for both buffered and direct io cases).
3. Add `readahead_size` to db_bench.

**Benchmarks:**
https://gist.github.com/sagar0/53693edc320a18abeaeca94ca32f5737

For 1 MB readahead, Buffered IO performance improves by 28% and Direct IO performance improves by 50%.
For 512KB readahead, Buffered IO performance improves by 30% and Direct IO performance improves by 67%.

**Test Plan:**
Updated `DBIteratorTest.ReadAhead` test to make sure that:
- no new table readers are created for iterators on setting ReadOptions.readahead_size
- At least "readahead" number of bytes are actually getting read on each iterator read.

TODO later:
- Use similar logic for compactions as well.
- This ties in nicely with #4052 and paves the way for removing ReadaheadRandomAcessFile later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5246

Differential Revision: D15107946

Pulled By: sagar0

fbshipit-source-id: 2c1149729ca7d779e4e8b7710ba6f4e8cbfd3bea
2019-04-26 21:24:10 -07:00
Adam Retter
990b2f4cb3 Fix compilation on db_bench_tool.cc on Windows (#5227)
Summary:
I needed this change to be able to build the v6.0.1 release on Windows.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5227

Differential Revision: D15033815

Pulled By: sagar0

fbshipit-source-id: 579f3b8e694c34c0d43527eb2fa37175e37f5911
2019-04-23 11:16:51 -07:00
Fosco Marotto
6c2bf9e916 Add copyright headers per FB open-source checkup tool. (#5199)
Summary:
internal task: T35568575
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199

Differential Revision: D14962794

Pulled By: gfosco

fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
2019-04-18 10:55:01 -07:00
Zhongyi Xie
baa5302447 Avoid double-compacting data in bottom level in manual compactions (#5138)
Summary:
Depending on the config, manual compaction (leveled compaction style) does following compactions:
L0->L1
L1->L2
...
Ln-1 -> Ln
Ln -> Ln
The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only.
Resolves issue https://github.com/facebook/rocksdb/issues/4995
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138

Differential Revision: D14940106

Pulled By: miasantreble

fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5
2019-04-16 23:32:20 -07:00
Yi Wu
b70967aac7 db_bench: support seek to non-exist prefix (#5163)
Summary:
Add `--seek_missing_prefix` flag to db_bench to allow benchmarking seeking to non-existing prefix. Usage example:
```
./db_bench --db=/dev/shm/db_bench --use_existing_db=false --benchmarks=fillrandom --num=100000000 --prefix_size=9 --keys_per_prefix=10
./db_bench --db=/dev/shm/db_bench --use_existing_db=true --benchmarks=seekrandom --disable_auto_compactions=true --num=100000000 --prefix_size=9 --keys_per_prefix=10 --reads=1000 --prefix_same_as_start=true --seek_missing_prefix=true
```
Also adding `--total_order_seek` and `--prefix_same_as_start` flags.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5163

Differential Revision: D14935724

Pulled By: riversand963

fbshipit-source-id: 7c41023f007febe373eb1589861f215432a9e18a
2019-04-15 10:54:58 -07:00
Yanqin Jin
3189398c00 Fix bugs detected by clang analyzer (#5185)
Summary:
as titled. False positive included, fixed anyway to make the check
pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185

Differential Revision: D14909384

Pulled By: riversand963

fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3
2019-04-12 10:45:56 -07:00
anand76
fefd4b98c5 Introduce a new MultiGet batching implementation (#5011)
Summary:
This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.

Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to -
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Bloom filter cachelines can be prefetched, hiding the cache miss latency

The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.

Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).

Batch   Sizes

1        | 2        | 4         | 8      | 16  | 32

Random pattern (Stride length 0)
4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074        - Get
4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching)
4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14        - MultiGet (w/ batching)

Good locality (Stride length 16)
4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753
4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135

Good locality (Stride length 256)
4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268
4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (Stride length 4096)
4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555
4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465
4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891

dbbench command used (on a DB with 4 levels, 12 million keys)-
TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011

Differential Revision: D14348703

Pulled By: anand1976

fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
2019-04-11 14:28:26 -07:00
datonli
f0edf9d575 #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152)
Summary:
mv port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5152

Differential Revision: D14779409

Pulled By: siying

fbshipit-source-id: d4162c47c979c6e8cc6a9e601802864ab3768ecb
2019-04-04 11:38:19 -07:00
Yanqin Jin
d77476ef55 Fix db_stress for custom env (#5122)
Summary:
Fix some hdfs-related code so that it can compile and run 'db_stress'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5122

Differential Revision: D14675495

Pulled By: riversand963

fbshipit-source-id: cac280479efcf5451982558947eac1732e8bc45a
2019-03-28 19:20:27 -07:00
Siying Dong
2b4d5ceb47 Remove some "using std::..." from header files. (#5113)
Summary:
The code convention we are following, Google C++ Style, discourage
alias in header files, especially public headers:
https://google.github.io/styleguide/cppguide.html#Aliases
Remove some of them. Might removed some from .cc files as well to be consistent.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5113

Differential Revision: D14633030

Pulled By: siying

fbshipit-source-id: b990edc919d5de60295992284f980195e501d424
2019-03-27 10:28:21 -07:00
Yanqin Jin
9358178edc Support for single-primary, multi-secondary instances (#4899)
Summary:
This PR allows RocksDB to run in single-primary, multi-secondary process mode.
The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary.
Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.

This PR has several components:
1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.

2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.

3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
3.1 Tail the primary's MANIFEST during recovery.
3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
3.3 Tailing WAL will be in a future PR.

4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899

Differential Revision: D14510945

Pulled By: riversand963

fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
2019-03-26 16:45:31 -07:00
Zhongyi Xie
52e6404e0f ldb command parsing: allow option values to contain equals signs (#5088)
Summary:
Right now ldb command doesn't allow cases where option values contain equals sign. For example,
```
ldb --db=/tmp/test scan --from='q=3' --max_keys=1
```
after parsing, ldb will have one option 'db', 'max_keys' and one flag 'from'.
This PR updates the parsing logic so that it now supports the above mentioned cases
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5088

Differential Revision: D14600869

Pulled By: miasantreble

fbshipit-source-id: c6ef518c74a98d7b6675ea5954ae08b1bda5554e
2019-03-25 13:23:11 -07:00
Shobhit Dayal
b45b1cde3e Feature for sampling and reporting compressibility (#4842)
Summary:
This is a feature to sample data-block compressibility and and report them as stats. 1 in N (tunable) blocks is sampled for compressibility using two algorithms:
1. lz4 or snappy for fast compression
2. zstd or zlib for slow but higher compression.

The stats are reported to the caller as raw-bytes and compressed-bytes. The block continues to be compressed for storage using the specified CompressionType.

The db_bench_tool how has a command line option for specifying the sampling rate. It's default value is 0 (no sampling). To test the overhead for a certain value, users can compare the performance of db_bench_tool, varying the sampling rate. It is unlikely to have a noticeable impact for high values like 20.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4842

Differential Revision: D13629011

Pulled By: shobhitdayal

fbshipit-source-id: 14ca668bcab6499b2a1734edf848eb62a4f4fafa
2019-03-18 12:15:34 -07:00
Andrew Kryczka
2263f86901 exercise WAL recycling in crash test (#5070)
Summary:
Since this feature affects the WAL behavior, it seems important our crash-recovery tests cover it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070

Differential Revision: D14470085

Pulled By: miasantreble

fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1
2019-03-15 12:03:26 -07:00
Zhichao Cao
dcde292c3b Add the -try_process_corrupted_trace option to trace_analyzer (#5067)
Summary:
In the current trace_analyzer implementation, once the trace file has corrupted content, which can be caused by unexpected tracing operations or other reasons, trace_analyzer will print the error and stop analyzing.

By adding the -try_process_corrupted_trace option, user can try to process the corrupted trace file and get the analyzing results of the trace records from the beginning to the the first corrupted point in the trace file. Analyzing might fail even this option is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5067

Differential Revision: D14433037

Pulled By: zhichao-cao

fbshipit-source-id: d095233ba371726869af0def0cdee23b69896831
2019-03-14 20:03:01 -07:00
Andrew Kryczka
5a5c0492db ldb: set total_order_seek for scans (#5066)
Summary:
Without `total_order_seek=true`, using this command with `prefix_extractor` set skips over lots of keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5066

Differential Revision: D14425967

Pulled By: sagar0

fbshipit-source-id: f6f142733258d92604f920615be9266e1fe797f8
2019-03-12 13:10:39 -07:00
Zhichao Cao
05ebfebc17 Fixed the potential stack overflow of MixGraph in db_bench (#5051)
Summary:
In the MixGraph benchmark of db_bench, The max buffer size used for value of KV-pair might be extremely large (64MB), which might cause function stack overflow in some platforms, reduced to 1MB.

Added the finished ops printing in MixGraph benchmark.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5051

Differential Revision: D14379571

Pulled By: zhichao-cao

fbshipit-source-id: 24084fbe38f60f2902d9a40f6bc9a25e4e2c9bb9
2019-03-08 14:10:17 -08:00
Andrew Kryczka
18d2e4beb7 Run db_bench on database generated externally (#5017)
Summary:
Added an option, `-use_existing_keys`, which can be set to run
benchmarks against an arbitrary existing database. Now users can
benchmark against their actual database rather than synthetic data.

Before the run begins, it loads all the keys into memory, then uses that
set of keys rather than synthesizing new ones in `GenerateKeyFromInt`.
This is mainly intended for small-scale DBs where the memory consumption
is not a concern.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5017

Differential Revision: D14270303

Pulled By: riversand963

fbshipit-source-id: 6328df9dffb5e19170270dd00a69f4bbe424e5ed
2019-03-01 11:19:03 -08:00
Siying Dong
aef763b6d6 Make statistics's stats_level change thread-safe (#5030)
Summary:
Right now, users can change statistics.stats_level while DB is running, but TSAN may report
data race. We make stats_level_ to be atomic, and access them using accessors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030

Differential Revision: D14267519

Pulled By: siying

fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73
2019-03-01 10:42:09 -08:00
Maysam Yabandeh
0b80f6b380 WritePrepared: script to analyze stress test failures (#5033)
Summary:
This the hackish script we used to find the root cause of failures in transaction stress tests. It is not well-written and does not require rigorous reviewing but it is better than starting from scratch each time we observe an issue. The stress tests would just say that at which snapshots the sum of all the keys in a set is inconsistent with another set. To help debugging one need to know which key exactly returned inconsistent results. The script looks at the transactions between two conflicting snapshots, and performs thee changes manually to see for which key the read value was inconsistent.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5033

Differential Revision: D14280362

Pulled By: maysamyabandeh

fbshipit-source-id: d5826055c46711460ba81480d96cb5ea082814a5
2019-03-01 09:18:40 -08:00
Siying Dong
5e298f865b Add two more StatsLevel (#5027)
Summary:
Statistics cost too much CPU for some use cases. Add two stats levels
so that people can choose to skip two types of expensive stats, timers and
histograms.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5027

Differential Revision: D14252765

Pulled By: siying

fbshipit-source-id: 75ecec9eaa44c06118229df4f80c366115346592
2019-02-28 10:27:59 -08:00
Zhongyi Xie
c4f5d0aa15 add GetStatsHistory to retrieve stats snapshots (#4748)
Summary:
This PR adds public `GetStatsHistory` API to retrieve stats history in the form of an std map. The key of the map is the timestamp in microseconds when the stats snapshot is taken, the value is another std map from stats name to stats value (stored in std string). Two DBOptions are introduced: `stats_persist_period_sec` (default 10 minutes) controls the intervals between two snapshots are taken; `max_stats_history_count` (default 10) controls the max number of history snapshots to keep in memory. RocksDB will stop collecting stats snapshots if `stats_persist_period_sec` is set to 0.

(This PR is the in-memory part of https://github.com/facebook/rocksdb/pull/4535)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4748

Differential Revision: D13961471

Pulled By: miasantreble

fbshipit-source-id: ac836d401ecb84ea92216bf9966f969dedf4ad04
2019-02-20 15:52:54 -08:00
Zhongyi Xie
ed995c6a69 add whole key bloom filter support in memtables (#4985)
Summary:
MyRocks calls `GetForUpdate` on `INSERT`, for unique key check, and in almost all cases GetForUpdate returns empty result. For such cases, whole key bloom filter is helpful.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4985

Differential Revision: D14118257

Pulled By: miasantreble

fbshipit-source-id: d35cb7109c62fd5ad541a26968e3a3e16d3e85ea
2019-02-19 12:15:39 -08:00
Siying Dong
4db46aa2e6 Fix LITE Build (#4989)
Summary:
LITE mode has EventListener to be an empty class. However in db_bench,
it is used. When "override" is added to the functions, the build breaks. Fix it
by keeping the listener empty in LITE mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4989

Differential Revision: D14108132

Pulled By: siying

fbshipit-source-id: 80121aab35b1120e502b37b782301dd700692697
2019-02-15 16:13:11 -08:00
Aubin Sanyal
3231a2e581 Deprecate ttl option from CompactionOptionsFIFO (#4965)
Summary:
We introduced ttl option in CompactionOptionsFIFO when ttl-based file
deletion (compaction) was supported only as part of FIFO Compaction. But
with the extension of ttl semantics even to Level compaction,
CompactionOptionsFIFO.ttl can now be deprecated. Instead we will start
using ColumnFamilyOptions.ttl for FIFO compaction as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4965

Differential Revision: D14072960

Pulled By: sagar0

fbshipit-source-id: c98cc2ae695a28136295787cd88d36a220fc219e
2019-02-15 09:51:41 -08:00
Michael Liu
ca89ac2ba9 Apply modernize-use-override (2nd iteration)
Summary:
Use C++11’s override and remove virtual where applicable.
Change are automatically generated.

Reviewed By: Orvid

Differential Revision: D14090024

fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a
2019-02-14 14:41:36 -08:00
Yanqin Jin
4fc442029a Avoid using kInAtomicGroup tag for single-cf op (#4981)
Summary:
if an operation just involves a single column family, then we do
not have to set the kInAtomicGroup tag when writing to MANIFEST. This change
can fix a compatibility test failure, i.e. 5.15 and earlier cannot recognize
kInAtomicGroup tag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4981

Differential Revision: D14072687

Pulled By: riversand963

fbshipit-source-id: 46b0c61e399f16c6b7169de0b33430d0ed90d6d4
2019-02-13 18:33:42 -08:00
Andrew Kryczka
62f70f6d14 Reduce scope of compression dictionary to single SST (#4952)
Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.

So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:

- The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
- After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952

Differential Revision: D13967980

Pulled By: ajkr

fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
2019-02-11 19:47:32 -08:00
Andrew Kryczka
1218704b61 Fix compression_zstd_max_train_bytes coverage in stress test (#4957)
Summary:
Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957

Differential Revision: D13994302

Pulled By: ajkr

fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65
2019-02-11 14:56:39 -08:00
Siying Dong
cf3a671733 Remove cuckoo hash memtable (#4953)
Summary:
Cuckoo Hash is less useful than we initially expected. Remove it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953

Differential Revision: D13979264

Pulled By: siying

fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed
2019-02-07 16:15:27 -08:00
Siying Dong
d9c9f3c809 db_bench: fix "micros/op" reporting (#4949)
Summary:
4985a9f73b (diff-e5276985b26a0551957144f4420a594bR511)
changes the meaning of latency reporting from running time per query, to elapse_time / #ops, without providing a reason why.
Considering that this is a counter-intuitive reporting, Reverting the change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4949

Differential Revision: D13964684

Pulled By: siying

fbshipit-source-id: d6304d3d4b5a802daa292302623c7dbca9a680bc
2019-02-05 17:20:02 -08:00
Alexander Zinoviev
32a6dd9a41 Add a new CPU time counter to compaction report (#4889)
Summary:
Measure CPU time consumed for a compaction and report it in the stats report
Enable NowCPUNanos() to work for MacOS
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4889

Differential Revision: D13701276

Pulled By: zinoale

fbshipit-source-id: 5024e5bbccd4dd10fd90d947870237f436445055
2019-01-29 17:24:00 -08:00
zhichao-cao
e2547103fd Fix the build error caused by the dynamic array (#4918)
Summary:
In the MixGraph benchmark of db_bench #4788 , the char array is initialized with an argument from user's input, which can cause build error on some platforms. Also, the msg char array size can be potentially smaller than the printed data, which should be extended from 100 to 256.

Tested with make check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4918

Differential Revision: D13844298

Pulled By: sagar0

fbshipit-source-id: 33c4809c5c4438f0a9f7b289d3f42e20c545bbab
2019-01-28 12:39:57 -08:00
Andrew Kryczka
e242fa4664 Add latest toolchain (gcc-8, etc.) build support for fbcode users (#4923)
Summary:
- When building with internal dependencies, specify this toolchain by setting `ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007=1`
- It is not enabled by default. However, it is enabled for TSAN builds in CI since there is a known problem with TSAN in gcc-5: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71090
- I did not add support for Lua since (1) we agreed to deprecate it, and (2) we only have an internal build for v5.3 with this toolchain while that has breaking changes compared to our current version (v5.2).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4923

Differential Revision: D13827226

Pulled By: ajkr

fbshipit-source-id: 9aa3388ed3679777cfb15ef8cbcb83c07f62f947
2019-01-28 11:26:32 -08:00
Sagar Vemuri
0cead31d10 Fix Clang static analyzer warning in db_bench (#4910)
Summary:
Fixed clang static analyzer warning about division by 0.
```
ar: creating librocksdb_debug.a
tools/db_bench_tool.cc:4650:43: warning: Division by zero
      int pos = static_cast<int>(rand_num % range_);
                                 ~~~~~~~~~^~~~~~~~
1 warning generated.
make: *** [analyze] Error 1
```

This is from the new code I recently merged in ce8e88d2d7.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4910

Differential Revision: D13788037

Pulled By: sagar0

fbshipit-source-id: f48851dca85047c19fbb1a361e25ce643aa4c7ea
2019-01-23 13:33:02 -08:00
Zhongyi Xie
cbe0239270 add cast to avoid loss of precision error (#4906)
Summary:
this PR address the following error:
> tools/db_bench_tool.cc:4776:68: error: implicit conversion loses integer precision: 'int64_t' (aka 'long') to 'unsigned int' [-Werror,-Wshorten-64-to-32]
        s = db_with_cfh->db->Put(write_options_, key, gen.Generate(value_size));
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4906

Differential Revision: D13780185

Pulled By: miasantreble

fbshipit-source-id: 1c83a77d341099518c72f0f4a63e97ab9c4784b3
2019-01-22 22:44:17 -08:00
Zhichao Cao
ce8e88d2d7 Generate mixed workload with Get, Put, Seek in db_bench (#4788)
Summary:
Based on the specific workload models (key access distribution, value size distribution, and iterator scan length distribution, the QPS variation), the MixGraph benchmark generate the synthetic workload according to these distributions which can reflect the real-world workload characteristics.

After user enable the tracing function, they will get the trace file. By analyzing the trace file with the trace_analyzer tool, user can generate a set of statistic data files including. The *_accessed_key_stats.txt,  *-accessed_value_size_distribution.txt, *-iterator_length_distribution.txt, and *-qps_stats.txt are mainly used to fit the Matlab model fitting. After that, user can get the parameters of the workload distributions (the modeling details are described: [here](https://github.com/facebook/rocksdb/wiki/RocksDB-Trace%2C-Replay%2C-and-Analyzer))

The key access distribution follows the The two-term power model. The probability density function is: `f(x) = ax^{b}+c`. The corresponding parameters are key_dist_a, key_dist_b, and key_dist_c in db_bench

For the value size distribution and iterator scan length distribution, they both follow the Generalized Pareto Distribution. The probability density function is `f(x) = (1/sigma)(1+k*(x-theta)/sigma))^{-1-1/k)`. The parameters are: value_k, value_theta, value_sigma and iter_k, iter_theta, iter_sigma. For more information about the Generalized Pareto Distribution, users can find the [wiki](https://en.wikipedia.org/wiki/Generalized_Pareto_distribution) and [Matalb page](https://www.mathworks.com/help/stats/generalized-pareto-distribution.html)

As for the QPS, it follows the diurnal pattern. So Sine is a good model to fit it. `F(x) = sine_a*sin(sine_b*x + sine_c) + sine_d`. The trace_will tell you the average QPS in the print out resutls, which is sine_d. After user fit the "*-qps_stats.txt" to the Matlab model, user can get the sine_a, sine_b, and sine_c. By using the 4 parameters, user can control the QPS variation including the period, average, changes.

To use the bench mark, user can indicate the following parameters as examples:
```
-benchmarks="mixgraph" -key_dist_a=0.002312 -key_dist_b=0.3467 -value_k=0.9233 -value_sigma=226.4092 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.7 -mix_put_ratio=0.25 -mix_seek_ratio=0.05 -sine_mix_rate_interval_milliseconds=500 -sine_a=15000 -sine_b=1 -sine_d=20000
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4788

Differential Revision: D13573940

Pulled By: sagar0

fbshipit-source-id: e184c27e07b4f1bc0b436c2be36c5090c1fb0222
2019-01-22 10:44:26 -08:00
Andrew Kryczka
01013ae766 Digest ZSTD compression dictionary once when writing SST file (#4849)
Summary:
This is essentially a re-submission of #4251 with a few improvements:

- Split `CompressionDict` into two separate classes: `CompressionDict` and `UncompressionDict`
- Eliminated `Init` functions. Instead do all initialization work in constructors.
- Added test case for parallel DB open, which is the scenario where #4251 failed under TSAN.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4849

Differential Revision: D13606039

Pulled By: ajkr

fbshipit-source-id: 08c236059798c710db9cbf545fce0f371232d447
2019-01-18 19:12:57 -08:00
Siying Dong
4e37251b4d With ldb --try_load_options and wal_dir doesn't exist, ignore it (#4875)
Summary:
LDB is frequently used to exam data copied. wal_dir in option file is not modified and it usually points to the path it copied from.
The user experience will be better if when ldb sees wal_dir pointed by the option file doesn't exist, rather than fail, just ignore it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4875

Differential Revision: D13643173

Pulled By: siying

fbshipit-source-id: 2e64d4ea2ec49a6794b9a706b7fc1ba901128bb8
2019-01-11 16:48:32 -08:00
Yanqin Jin
ffc9f84649 Free memory after use
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4857

Differential Revision: D13602688

Pulled By: riversand963

fbshipit-source-id: 993419a6afb982a7a701ff71daebebb4b4a6b265
2019-01-08 17:19:09 -08:00
Yanqin Jin
e686caffec Remove unnecessary assersion in AtomicFlushStressTest::TestCheckpoint (#4846)
Summary:
as titled.
We can remove the assersion because we do not perform verification in
AtomicFlushStressTest::TestCheckpoint for similar reasons to TestGet, TestPut,
etc.
Therefore, we override TestCheckpoint in AtomicFlushStressTest so that the
assertion `rand_column_families.size() == rand_keys.size()' is removed, and we
do not verify the DB in this function.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4846

Differential Revision: D13583377

Pulled By: riversand963

fbshipit-source-id: 03647f3da67e27a397413fd666e3bb43003bf596
2019-01-07 16:47:26 -08:00
Huachao Huang
74f7d7551e tools: use provided options instead of the default (#4839)
Summary:
The current implementation hardcode the default options in different
places, which makes it impossible to support other environments (like
encrypted environment).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4839

Differential Revision: D13573578

Pulled By: sagar0

fbshipit-source-id: 76b58b4b758902798d10ff2f52d9f39abff015e7
2019-01-03 11:23:49 -08:00
Yanqin Jin
565b5bdc42 Add support for read-only db chkpt stress (#4690)
Summary:
Updated stress test will support testing of db in read-only mode.
The user has to make sure that only read/scan operations are enabled.
This PR relies on #4681.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4690

Differential Revision: D13102741

Pulled By: riversand963

fbshipit-source-id: f5a36b34db187fe12dd355f7eda161f99d6c75e4
2019-01-02 17:40:53 -08:00
Andrew Kryczka
ace543a815 fix accounting for range tombstones in TableProperties (#4841)
Summary:
- To be consistent with the accounting of other optypes in `TableProperties`, we should count range tombstones in `TableProperties::num_entries` and `TableProperties::num_deletions`.
- Updated assertions in stress test's `OnTableFileCreated` handler to accept files with range tombstones only.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4841

Differential Revision: D13568424

Pulled By: ajkr

fbshipit-source-id: 0139d7806494eda20ece67ec460d2458dbbf6026
2019-01-02 15:08:53 -08:00
Andrew Kryczka
68d949b3e3 Enable DeleteRange in stress/crash tests (#4483)
Summary:
Set `delrangepercent=1` when `test_batches_snapshots=false`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483

Differential Revision: D10324361

Pulled By: ajkr

fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23
2018-12-18 13:42:49 -08:00
Fosco Marotto
311cd8cf2f Updated benchmark script (#4134)
Summary:
When producing the updated performance on flash results for the wiki, these are the updates which were made.

https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4134

Differential Revision: D13491052

Pulled By: gfosco

fbshipit-source-id: dcd92f24659e0917cb1ac54a4446aa8e7aac8b0d
2018-12-17 16:34:30 -08:00
Andrew Kryczka
8d2b74d287 Refine db_stress params for atomic flush (#4781)
Summary:
Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781

Differential Revision: D13463211

Pulled By: ajkr

fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd
2018-12-13 22:10:38 -08:00
DorianZheng
2670fe8c73 Get CompactionJobInfo from CompactFiles
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4716

Differential Revision: D13207677

Pulled By: ajkr

fbshipit-source-id: d0ccf5a66df6cbb07288b0c5ebad81fd9df3926b
2018-12-13 14:21:24 -08:00
Sagar Vemuri
70645355ad Move FIFOCompactionPicker to a separate file (#4724)
Summary:
**Summary:**
Simplified the code layout by moving FIFOCompactionPicker to a separate file.
**Why?:**
While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724

Differential Revision: D13227914

Pulled By: sagar0

fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
2018-11-29 16:04:52 -08:00
Sagar Vemuri
c94f073e5e Fix Mac build break in casting (#4722)
Summary:
Mac build is failing with the below error:
```
$ make db_bench -j8
...
...
tools/db_bench_tool.cc:4583:25: error: no matching function for call to 'max'
              (uint64_t)std::max(0l, seek_pos - FLAGS_max_scan_distance),
                        ^~~~~~~~
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2717:1: note: candidate template ignored: deduced conflicting types for parameter '_Tp' ('long' vs. 'long long')
max(const _Tp& __a, const _Tp& __b)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2727:1: note: candidate template ignored: could not match 'initializer_list<type-parameter-0-0>' against 'long'
max(initializer_list<_Tp> __t, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2709:1: note: candidate function template not viable: requires 3 arguments, but 2 were provided
max(const _Tp& __a, const _Tp& __b, _Compare __comp)
^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/include/c++/v1/algorithm:2735:1: note: candidate function template not viable: requires single argument '__t', but 2 arguments were provided
max(initializer_list<_Tp> __t)
^
1 error generated.
make: *** [tools/db_bench_tool.o] Error 1
```

My compiler version:
Mac OS X Mojave
```
$ clang++ --version
Apple LLVM version 10.0.0 (clang-1000.11.45.5)
Target: x86_64-apple-darwin18.2.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4722

Differential Revision: D13220196

Pulled By: sagar0

fbshipit-source-id: 01e5e928288a5613027c83a26ad8aedf04438b14
2018-11-27 13:30:16 -08:00
Huachao Huang
5e72bc113a Add SstFileReader to read sst files (#4717)
Summary:
A user friendly sst file reader is useful when we want to access sst
files outside of RocksDB. For example, we can generate an sst file
with SstFileWriter and send it to other places, then use SstFileReader
to read the file and process the entries in other ways.

Also rename the original SstFileReader to SstFileDumper because of
name conflict, and seems SstFileDumper is more appropriate for tools.

TODO: there is only a very simple test now, because I want to get some feedback first.
If the changes look good, I will add more tests soon.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717

Differential Revision: D13212686

Pulled By: ajkr

fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
2018-11-27 13:02:23 -08:00
Po-Chuan Hsieh
60deb4485e Fix build with ROCKSDB_LITE and -Wunused-private-field (#4715)
Summary:
The error message of databases/rocksdb-lite (FreeBSD port) is as follows:
```
  tools/db_bench_tool.cc:1976:16: error: private field 'trace_options_' is not used [-Werror,-Wunused-private-field]
    TraceOptions trace_options_;
                 ^
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4715

Differential Revision: D13207902

Pulled By: ajkr

fbshipit-source-id: be3c612eba656aeddb77e35e2f201dd25dc92f7e
2018-11-26 21:35:38 -08:00
Abhishek Madan
0ed738fdd0 Add max_scan_distance flag to db_bench (#4660)
Summary:
The new flag makes it possible to constrain iterator traversal
by the upper/lower bound the iterator is expected to pass. This allows
seekrandom results to be more easily comparable between DBs with and
without deletions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4660

Differential Revision: D13053111

Pulled By: abhimadan

fbshipit-source-id: 33e250f2e2d210b54c7726399da30a33f723c33c
2018-11-14 10:46:12 -08:00
Yanqin Jin
de65103553 Improve result report of scan (#4648)
Summary:
When iterator becomes invalid, there are two possibilities.
First, all data in the column family have been scanned and there is nothing
more to scan.
Second, an underlying error has occurred, causing `status()` to be !ok.
Therefore, we need to check for both cases when `!iter->Valid()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4648

Differential Revision: D12959601

Pulled By: riversand963

fbshipit-source-id: 49c9382c9ea9e78f2e2b6f3708f0670b822ca8dd
2018-11-13 20:03:59 -08:00
Zhichao Cao
d761857d56 Add unique key number changing statistics to Trace_analyzer (#4646)
Summary:
Changes:
1. in current version, key size distribution is printed out as the result. In this change, the result will be output to a file to make further analyze easier
2. To understand how the unique keys are accessed over time, the total unique key number of each CF of each query type in each second over time is output to a file. In this way, user could know when the unique keys are accessed frequently or accessed rarely.
3. output the total QPS of each CF to a file
4. Add the print result of total queries of each CF of each query type.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4646

Differential Revision: D12968156

Pulled By: zhichao-cao

fbshipit-source-id: 6c411c7ec47c7843a70929136efd71a150db0e4c
2018-11-12 08:26:50 -08:00
Sagar Vemuri
dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Andrew Kryczka
8ba17f382e Verify restore from backup in db_stress (#4655)
Summary:
We already exercised backup functionality in `db_stress` according to the `-backup_one_in` flag. This PR verifies the backup can be restored/opened and sanity checks a few keys. Changes in this PR:

- Extracted existing backup-related logic to a helper function, `TestBackupRestore`
- Added restore logic, which targets a hidden directory named "./.restore\<thread number\>", similar to how backups target hidden directories named "./.backup\<thread number\>".
- After restore, check the existence/non-existence of a few keys.
- With this PR, backup is no longer compatible with clearing column families.
- Also included unrelated fixes to set `ReadOptions::total_order_seek=true` when using `-compare_full_db_state_snapshot`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4655

Differential Revision: D12972496

Pulled By: ajkr

fbshipit-source-id: 481a40052d9a38d1bd5c5159aa4d7c5a4b546b80
2018-11-08 15:15:24 -08:00
Yanqin Jin
d7a04383d1 Include newer RocksDB versions in compat test (#4634)
Summary:
Include 5.16 and 5.17 in check_format_compatible.sh
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4634

Differential Revision: D12947140

Pulled By: riversand963

fbshipit-source-id: 79852b76d5139b2f31db59ed14cb368be01f2c32
2018-11-06 14:25:39 -08:00
Yanqin Jin
50895e5f0d Update manual flush stress test (#4608)
Summary:
Originally, the manual flush calls in db_stress flushes only a single column
family, which is not sufficient when atomic flush is enabled.
With atomic flush, we should call `Flush(flush_opts, cfhs)` to better test this
new feature. Specifically, we manuall flush all column families so that
database verification is easier.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4608

Differential Revision: D12849160

Pulled By: riversand963

fbshipit-source-id: ae1f0dd825247b42c0aba520a5c967335102c876
2018-10-30 17:30:28 -07:00
Abhishek Madan
eaaf1a6f05 Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594)
Summary:
Since the number of range deletions are reported in
TableProperties, it is confusing to not report the number of merge
operands and point deletions as top-level properties; they are
accessible through the public API, but since they are not the "main"
properties, they do not appear in aggregated table properties, or the
string representation of table properties.

This change promotes those two property keys to
`rocksdb/table_properties.h`, adds corresponding uint64 members for
them, deprecates the old access methods `GetDeletedKeys()` and
`GetMergeOperands()` (though they are still usable for now), and removes
`InternalKeyPropertiesCollector`. The property key strings are the same
as before this change, so this should be able to read DBs written from older
versions (though I haven't tested this yet).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594

Differential Revision: D12826893

Pulled By: abhimadan

fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
2018-10-30 15:34:27 -07:00
Yanqin Jin
912bbbbc72 Enable crash-recovery stress test for atomic flush (#4605)
Summary:
This PR adds test of atomic flush to our continuous stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605

Differential Revision: D12840607

Pulled By: riversand963

fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9
2018-10-30 14:03:36 -07:00
Yanqin Jin
7fb39f1ae1 Fix a warning against implicit type conversion (#4593)
Summary:
Test plan
```
$USE_CLANG=1 make -j32 all check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4593

Differential Revision: D12811159

Pulled By: riversand963

fbshipit-source-id: 5e3bbe058c5a8d5a286a19d7643593fc154a2d6d
2018-10-29 09:54:36 -07:00
Yanqin Jin
5b4c709fad Enable atomic flush (#4023)
Summary:
Adds a DB option `atomic_flush` to control whether to enable this feature. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4023

Differential Revision: D8518381

Pulled By: riversand963

fbshipit-source-id: 1e3bb33e99bb102876a31b378d93b0138ff6634f
2018-10-26 15:08:43 -07:00
Zhongyi Xie
fe0d23059d Fix two contrun job failures (#4587)
Summary:
Currently there are two contrun test failures:
* rocksdb-contrun-lite:
> tools/db_bench_tool.cc: In function ‘int rocksdb::db_bench_tool(int, char**)’:
tools/db_bench_tool.cc:5814:5: error: ‘DumpMallocStats’ is not a member of ‘rocksdb’
     rocksdb::DumpMallocStats(&stats_string);
     ^
make: *** [tools/db_bench_tool.o] Error 1
* rocksdb-contrun-unity:
> In file included from unity.cc:44:0:
db/range_tombstone_fragmenter.cc: In member function ‘void rocksdb::FragmentedRangeTombstoneIterator::FragmentTombstones(std::unique_ptr<rocksdb::InternalIteratorBase<rocksdb::Slice> >, rocksdb::SequenceNumber)’:
db/range_tombstone_fragmenter.cc:90:14: error: reference to ‘ParsedInternalKeyComparator’ is ambiguous
   auto cmp = ParsedInternalKeyComparator(icmp_);

This PR will fix them
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4587

Differential Revision: D10846554

Pulled By: miasantreble

fbshipit-source-id: 8d3358879e105060197b1379c84aecf51b352b93
2018-10-24 20:16:45 -07:00
Yi Wu
0415244bfa option to print malloc stats at the end of db_bench (#4582)
Summary:
Option to print malloc stats to stdout at the end of db_bench. This is different from `--dump_malloc_stats`, which periodically print the same information to LOG file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4582

Differential Revision: D10520814

Pulled By: yiwu-arbug

fbshipit-source-id: beff5e514e414079d31092b630813f82939ffe5c
2018-10-24 11:39:05 -07:00
Simon Grätzer
f959e88048 Fix printf formatting on MacOS (#4533)
Summary:
On MacOS with clang the compilation of _tools/db_bench_tool.cc_ always fails because the format used in a `fprintf` call has the wrong type. This PR should hopefully fix this issue
```
tools/db_bench_tool.cc:4233:61: error: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long')
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4533

Differential Revision: D10471657

Pulled By: maysamyabandeh

fbshipit-source-id: f20f5f3756d3571b586c895c845d0d4d1e34a398
2018-10-19 14:46:09 -07:00
Yanqin Jin
da4aa59b4c Add read retry support to log reader (#4394)
Summary:
Current `log::Reader` does not perform retry after encountering `EOF`. In the future, we need the log reader to be able to retry tailing the log even after `EOF`.

Current implementation is simple. It does not provide more advanced retry policies. Will address this in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4394

Differential Revision: D9926508

Pulled By: riversand963

fbshipit-source-id: d86d145792a41bd64a72f642a2a08c7b7b5201e1
2018-10-19 11:53:00 -07:00
Abhishek Madan
35cd754a6d Add writes_before_delete_range flag to db_bench (#4538)
Summary:
The new flag allows tombstones to be generated after enough
keys have been written to the database, which makes it easier to ensure
that tombstones cover a lot of keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4538

Differential Revision: D10455685

Pulled By: abhimadan

fbshipit-source-id: f25d5421745a353c830dea12b79784e852056551
2018-10-18 17:19:59 -07:00
Zhongyi Xie
d6ec288703 Add PerfContextByLevel to provide per level perf context information (#4226)
Summary:
Current implementation of perf context is level agnostic. Making it hard to do performance evaluation for the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level.
This will be helpful when analyzing point and range query performance as well as tuning bloom filter
Also replaced __thread with thread_local keyword for perf_context
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226

Differential Revision: D10369509

Pulled By: miasantreble

fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
2018-10-17 11:19:40 -07:00
Young Tack Jin
c648d90f8e benchmark.sh: to fix divide by zero runtime error (#4442)
Summary:
"Write (GB)" of $9 rather than "Rnp1 (GB)" of $8
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4442

Differential Revision: D10318193

Pulled By: yiwu-arbug

fbshipit-source-id: 03a7ef1938d9332e06fb3fd8490ca212f61fac6b
2018-10-10 21:03:19 -07:00
Zhichao Cao
7ca1a1f0d8 Fix trace_analyzer potential huge memory wasting due to no valid query analyzed (#4473)
Summary:
If the query types being analyzed do not appear in the trace, the current trace_analyzer will use 0 as the begin time, which create the time duration from 1970/01/01 to the now time. It will waste huge memory. Fixed by adding the trace_create_time to limit the duration.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4473

Differential Revision: D10246204

Pulled By: zhichao-cao

fbshipit-source-id: 42850b080b2e62f586fe73afd7737c2246d1a8c8
2018-10-10 10:00:00 -07:00
Igor Canadi
1cf5deb8fd Introduce CacheAllocator, a custom allocator for cache blocks (#4437)
Summary:
This is a conceptually simple change, but it touches many files to
pass the allocator through function calls.

We introduce CacheAllocator, which can be used by clients to configure
custom allocator for cache blocks. Our motivation is to hook this up
with folly's `JemallocNodumpAllocator`
(f43ce6d686/folly/experimental/JemallocNodumpAllocator.h),
but there are many other possible use cases.

Additionally, this commit cleans up memory allocation in
`util/compression.h`, making sure that all allocations are wrapped in a
unique_ptr as soon as possible.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437

Differential Revision: D10132814

Pulled By: yiwu-arbug

fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
2018-10-02 17:24:58 -07:00
Andrew Kryczka
d56070d875 Fix benchmark script with vector memtable (#4428)
Summary:
I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default.

Fixes #4413.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428

Differential Revision: D10036452

Pulled By: ajkr

fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183
2018-09-26 13:22:45 -07:00
Abhishek Madan
519f8b145f Generate appropriate number of keys in db_bench (#4404)
Summary:
If range tombstones are generated every few writes, the
KeyGenerator's limit is now extended to account for the additional
Next() calls. This is primarily important for `filluniquerandom`
benchmarks that enforce the call limit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404

Differential Revision: D9949326

Pulled By: abhimadan

fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38
2018-09-19 16:28:21 -07:00
Zhongyi Xie
9b3cf908a6 add missing range in random.choice argument (#4397)
Summary:
This will fix the broken asan crash test:
> Traceback (most recent call last):
  File "tools/db_crashtest.py", line 384, in <module>
    main()
  File "tools/db_crashtest.py", line 368, in main
    parser.add_argument("--" + k, type=type(v() if callable(v) else v))
  File "tools/db_crashtest.py", line 59, in <lambda>
    "index_block_restart_interval": lambda: random.choice(1, 16),
TypeError: choice() takes exactly 2 arguments (3 given)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397

Differential Revision: D9933041

Pulled By: miasantreble

fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
2018-09-19 12:13:20 -07:00
Maysam Yabandeh
a0ebec3804 Extend crash test with index_block_restart_interval (#4383)
Summary:
The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383

Differential Revision: D9887304

Pulled By: maysamyabandeh

fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
2018-09-18 15:43:29 -07:00
Andrew Kryczka
8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
Anand Ananthabhotla
a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurance of an out of space error during compaction,
subsequent
  compactions will be delayed until the disk free space check indicates
  enough available space. The required space is computed as the sum of
  input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occured
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it in write-only mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00
Dmitri Smirnov
879998b369 Adjust c test and fix windows compilation issues
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4369

Differential Revision: D9844200

Pulled By: sagar0

fbshipit-source-id: 0d9f5f73b28234eaac55d3551ce4e2dc177af138
2018-09-14 20:57:22 -07:00
Andrew Kryczka
c94523ee56 Delete code for WAL reader to start at nonzero offset (#4362)
Summary:
The code is dead in RocksDB as `log::Reader::initial_offset_` is always zero. We should delete it so we don't have to maintain it like in #4359.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4362

Differential Revision: D9817829

Pulled By: ajkr

fbshipit-source-id: 474a2c679e5bd273b40608f3a5332931d9eefe6d
2018-09-13 17:13:03 -07:00
kckjn97
902261519e correct mistyped msg. (#4341)
Summary:
corrected the mistyped message.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4341

Differential Revision: D9816571

Pulled By: ajkr

fbshipit-source-id: 1df0424e981a01470a638a37b925c4133d59a48b
2018-09-13 14:57:38 -07:00
Maysam Yabandeh
3f5282268f Skip concurrency control during recovery of pessimistic txn (#4346)
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346

Differential Revision: D9759149

Pulled By: maysamyabandeh

fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
2018-09-10 16:57:53 -07:00
Andrew Kryczka
2c14662213 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347)
Summary:
Reverting is needed to unblock a user building against master, who is blocked for multiple days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so reverting.

Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347

Differential Revision: D9668365

Pulled By: ajkr

fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd
2018-09-06 09:58:34 -07:00
Andrew Kryczka
1a88c43751 Reduce empty SST creation/deletion in compaction (#4336)
Summary:
This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.

Also fixed an uninitialized variable in db_stress.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336

Differential Revision: D9600080

Pulled By: ajkr

fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
2018-08-31 12:28:52 -07:00
Zhongyi Xie
1cf17ba53b Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323)
Summary:
Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc defined `DecodeCFAndKey` under anonymous namespace. It is supposed to be fine except unity test will dump all source files together and now we have a conflict.
Another issue with trace_analyzer_tool.cc is that it is using some utility functions from ldb_cmd which is not included in Makefile for unity_test, I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323

Differential Revision: D9599170

Pulled By: miasantreble

fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
2018-08-30 18:42:51 -07:00
Shrikanth Shankar
4848bd0c4e Drop unnecessary deletion markers during compaction (issue - 3842) (#4289)
Summary:
This PR fixes issue 3842. We drop deletion markers iff
1. We are the bottom most level AND
2. All other occurrences of the key are in the same snapshot range as the delete

I've also enhanced db_stress_test to add an option that does a full compare of the keys. This is done by a single thread (thread # 0). For tests I've run (so far)

make check -j64
db_stress
db_stress  --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify that new code doesnt break existing tests */
./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000 /* to verify new test code */
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289

Differential Revision: D9491165

Pulled By: shrikanthshankar

fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
2018-08-24 15:17:54 -07:00
Yanqin Jin
8022500ecc Add compatibility test of SST ingestion (#4310)
Summary:
Test plan
```
$cd rocksdb/
$./tools/check_format_compatible.sh
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310

Differential Revision: D9498125

Pulled By: riversand963

fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24
2018-08-24 14:27:43 -07:00
Andrew Kryczka
e7bb8e9b92 Fix clang build of db_stress (#4312)
Summary:
Blame: #4307
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312

Differential Revision: D9494093

Pulled By: ajkr

fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
2018-08-23 21:57:57 -07:00
Andrew Kryczka
6c40806e51 Digest ZSTD compression dictionary once per SST file (#4251)
Summary:
In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.

ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file's created `ZSTD_freeCDict` releases the resources held by the digested dictionary.

There are a couple other changes included in this PR:

- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling a (un)compression function depending on whether dictionary should be used. I felt that mutation was error-prone so eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251

Differential Revision: D9257078

Pulled By: ajkr

fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
2018-08-23 19:28:18 -07:00
Andrew Kryczka
ee234e83e3 Invoke OnTableFileCreated for empty SSTs (#4307)
Summary:
The API comment on `OnTableFileCreationStarted` (b6280d01f9/include/rocksdb/listener.h (L331-L333)) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.

This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307

Differential Revision: D9485201

Pulled By: ajkr

fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
2018-08-23 18:27:30 -07:00
zhichao-cao
cf7150ac2e Add the unit test of Iterator to trace_analyzer_test (#4282)
Summary:
Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282

Differential Revision: D9436758

Pulled By: zhichao-cao

fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
2018-08-23 17:28:32 -07:00
Yanqin Jin
bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00