Summary:
New experimental option BBTO::optimize_filters_for_memory builds
filters that maximize their use of "usable size" from malloc_usable_size,
which is also used to compute block cache charges.
Rather than always "rounding up," we track state in the
BloomFilterPolicy object to mix essentially "rounding down" and
"rounding up" so that the average FP rate of all generated filters is
the same as without the option. (YMMV as heavily accessed filters might
be unluckily lower accuracy.)
Thus, the option near-minimizes what the block cache considers as
"memory used" for a given target Bloom filter false positive rate and
Bloom filter implementation. There are no forward or backward
compatibility issues with this change, though it only works on the
format_version=5 Bloom filter.
With Jemalloc, we see about 10% reduction in memory footprint (and block
cache charge) for Bloom filters, but 1-2% increase in storage footprint,
due to encoding efficiency losses (FP rate is non-linear with bits/key).
Why not weighted random round up/down rather than state tracking? By
only requiring malloc_usable_size, we don't actually know what the next
larger and next smaller usable sizes for the allocator are. We pick a
requested size, accept and use whatever usable size it has, and use the
difference to inform our next choice. This allows us to narrow in on the
right balance without tracking/predicting usable sizes.
Why not weight history of generated filter false positive rates by
number of keys? This could lead to excess skew in small filters after
generating a large filter.
Results from filter_bench with jemalloc (irrelevant details omitted):
(normal keys/filter, but high variance)
$ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
Build avg ns/key: 29.6278
Number of filters: 5516
Total size (MB): 200.046
Reported total allocated memory (MB): 220.597
Reported internal fragmentation: 10.2732%
Bits/key stored: 10.0097
Average FP rate %: 0.965228
$ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
Build avg ns/key: 30.5104
Number of filters: 5464
Total size (MB): 200.015
Reported total allocated memory (MB): 200.322
Reported internal fragmentation: 0.153709%
Bits/key stored: 10.1011
Average FP rate %: 0.966313
(very few keys / filter, optimization not as effective due to ~59 byte
internal fragmentation in blocked Bloom filter representation)
$ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
Build avg ns/key: 29.5649
Number of filters: 162950
Total size (MB): 200.001
Reported total allocated memory (MB): 224.624
Reported internal fragmentation: 12.3117%
Bits/key stored: 10.2951
Average FP rate %: 0.821534
$ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
Build avg ns/key: 31.8057
Number of filters: 159849
Total size (MB): 200
Reported total allocated memory (MB): 208.846
Reported internal fragmentation: 4.42297%
Bits/key stored: 10.4948
Average FP rate %: 0.811006
(high keys/filter)
$ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
Build avg ns/key: 29.7017
Number of filters: 164
Total size (MB): 200.352
Reported total allocated memory (MB): 221.5
Reported internal fragmentation: 10.5552%
Bits/key stored: 10.0003
Average FP rate %: 0.969358
$ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
Build avg ns/key: 30.7131
Number of filters: 160
Total size (MB): 200.928
Reported total allocated memory (MB): 200.938
Reported internal fragmentation: 0.00448054%
Bits/key stored: 10.1852
Average FP rate %: 0.963387
And from db_bench (block cache) with jemalloc:
$ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
$ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
$ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
17063835
$ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
17430747
$ #^ 2.1% additional filter storage
$ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
rocksdb.block.cache.index.add COUNT : 33
rocksdb.block.cache.index.bytes.insert COUNT : 8440400
rocksdb.block.cache.filter.add COUNT : 33
rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
rocksdb.bloom.filter.useful COUNT : 4963889
rocksdb.bloom.filter.full.positive COUNT : 1214081
rocksdb.bloom.filter.full.true.positive COUNT : 1161999
$ #^ 1.04 % observed FP rate
$ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
rocksdb.block.cache.index.add COUNT : 33
rocksdb.block.cache.index.bytes.insert COUNT : 8448592
rocksdb.block.cache.filter.add COUNT : 33
rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
rocksdb.bloom.filter.useful COUNT : 5360933
rocksdb.bloom.filter.full.positive COUNT : 1321315
rocksdb.bloom.filter.full.true.positive COUNT : 1262999
$ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters
(Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427
Test Plan: unit test added, 'make check' with gcc, clang and valgrind
Reviewed By: siying
Differential Revision: D22124374
Pulled By: pdillinger
fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
Summary:
Avoid using `cf_consistency` together with `enable_compaction_filter` as
the former heavily uses snapshots while the latter is incompatible with
snapshots.
Also fix a clang-analyze error for a write to a variable that is never
read.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006
Reviewed By: zhichao-cao
Differential Revision: D22141679
Pulled By: ajkr
fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447
Summary:
Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988
Test Plan:
running a minified blackbox crash test
```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600
```
Reviewed By: anand1976
Differential Revision: D22072888
Pulled By: ajkr
fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320
Summary:
Add crash test for the case of best-efforts recovery.
After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819
Test Plan:
```
./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0
./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0
make crash_test_with_best_efforts_recovery
```
Reviewed By: anand1976
Differential Revision: D21436753
Pulled By: riversand963
fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926
Summary:
x.size() -1 or y - 1 can overflow to an extremely large value when x.size() pr y is 0 when they are unsigned type. The end condition of i in the for loop will be extremely large, potentially causes segment fault. Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902
Test Plan: pass make asan_check
Reviewed By: ajkr
Differential Revision: D21843767
Pulled By: zhichao-cao
fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
Summary:
In NoBatchedOpsStress::TestMultiGet, call txn->Get() when transactions
are in use.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6860
Test Plan: make crash_test_with_txn
Reviewed By: pdillinger
Differential Revision: D21667249
Pulled By: anand1976
fbshipit-source-id: 194bd7b9630a8efc3ae29d85422a61214e9e200e
Summary:
Add MultiGet to VerifyDb and check consistency with Get in TestMultiGet.
Test plan -
make crash_test
ASAN crash test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6849
Reviewed By: pdillinger
Differential Revision: D21635011
Pulled By: anand1976
fbshipit-source-id: deb5a79d08fefd8d8010204f1f20b83adc92310e
Summary:
This commit adds an `compression_parallel_threads` option in
db_stress. It also fixes the naming of parallel compression
option in db_bench to keep it aligned with others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722
Reviewed By: pdillinger
Differential Revision: D21091385
fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100
Summary:
In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test.
This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727
Test Plan:
export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"
make -j24 crash_test
Reviewed By: zhichao-cao
Differential Revision: D21139250
Pulled By: cheng-chang
fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee
Summary:
Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712
Test Plan: Run crash_test and see the option can be used or not used by chance.
Reviewed By: ltamasi
Differential Revision: D21056566
fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793
Summary:
Add env_fault_injection argument to db_stress. When enabled,
FaultInjectionTestEnv will be used instead. Currently this
option does not support running with other env setting.
This will allow
us to later manually produce error when running db_crashtest.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687
Test Plan:
make db_stress -j32
./db_stress --env_fault_injection
./db_stress --env_fault_injection --hdfs // expect error message
Reviewed By: ajkr
Differential Revision: D21014683
Pulled By: yhchiang
fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a
Summary:
This was causing db_crashtest.py to wrongly assume an error by parsing the output. Hopefully this will stabilize the crash tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6705
Test Plan: make blackbox_crash_test
Reviewed By: ltamasi
Differential Revision: D21043335
Pulled By: anand1976
fbshipit-source-id: 5cddd112b124d4e2ebd11724a17d4ef0f50c1cf8
Summary:
1. Fix a memory leak in FaultInjectionTestFS in the stack trace related
code
2. Check status of all MultiGet keys before deciding whether an error
was swallowed, instead of assuming an ok status for any key means an
undetected error
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6700
Test Plan: Run db_stress with asan and fault injection
Reviewed By: cheng-chang
Differential Revision: D21021498
Pulled By: anand1976
fbshipit-source-id: 489191efd1ab0fa834923a1e1d57253a7a315465
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538
Test Plan:
crash_test
make check
Reviewed By: riversand963
Differential Revision: D20714347
Pulled By: anand1976
fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
Summary:
Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`,
`GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the
command line parameter `get_live_files_and_wal_files_one_in` is specified.
The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable
in the sense that they can return errors if another thread removes a WAL file
while they are executing (which is a perfectly plausible and legitimate scenario).
The patch splits this command line parameter into three (one for each API),
and changes the crash test script so that only `GetLiveFiles` is tested during
our continuous crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491
Test Plan:
```
make check
python tools/db_crashtest.py whitebox
```
Reviewed By: siying
Differential Revision: D20312200
Pulled By: ltamasi
fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8
Summary:
When using custom Env, trying to call DestroyDB() with default Options will
fail.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6539
Test Plan: ./db_stress
Differential Revision: D20476204
Pulled By: riversand963
fbshipit-source-id: 612c6754660cc9b5bb3e9c2dbb2f6ecd7f648797
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
Background jobs in WritePrepared DB might access the db via a snapshot checker callback. The stress tests therefore should cancel background jobs before deleting the db in ::Reopen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6355
Differential Revision: D19664132
Pulled By: maysamyabandeh
fbshipit-source-id: 6060a830e8aad0015c10448286ad37c8a346ac01
Summary:
Add a new option ReadOptions.auto_prefix_mode. When set to true, iterator should return the same result as total order seek, but may choose to do prefix seek internally, based on iterator upper bounds. Also fix two previous bugs when handling prefix extrator changes: (1) reverse iterator should not rely on upper bound to determine prefix. Fix it with skipping prefix check. (2) block-based filter is not handled properly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314
Test Plan: (1) add a unit test; (2) add the check to stress test and run see whether it can pass at least one run.
Differential Revision: D19458717
fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce
Summary:
Right now, when validating prefix iterator, if control iterator is invalidate but prefix iterator shows value, we determine it as a test failure. However, this fails to consider the case where a file or memtable containing a tombstone is filtered out by a prefix bloom filter. The fix is to relax the check in this case. If we are out of prefix range, then ignore the check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6269
Test Plan: Run crash_test for a short while and it still passes.
Differential Revision: D19317594
fbshipit-source-id: b964a1cdc1df5efe439d4b32f8023e1fbc8598c1
Summary:
The crash test is failing with non-ok status after TransactionDB::Open. This patch adds more debugging information.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6272
Differential Revision: D19314527
Pulled By: maysamyabandeh
fbshipit-source-id: d45ecb0f2144e052fb4b5fdd483150440991a3b4
Summary:
When called on transactions, MultiGet could return a legit MergeInProgress status. The patch excludes this case from errors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6257
Differential Revision: D19275787
Pulled By: maysamyabandeh
fbshipit-source-id: f7158229422af015947e592ae066b4273c9fb9a4
Summary:
Stress tests count number of errors and report them at the end. Not all the cases are accompanied with a log line which makes debugging difficult. The patch adds a log line to the remaining cases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6256
Differential Revision: D19268785
Pulled By: maysamyabandeh
fbshipit-source-id: bdabcaa5c5c7edcb4ce4f25e38fd8a3fd9c7700b
Summary:
allow_concurrent_memtable_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254
Differential Revision: D19265819
Pulled By: maysamyabandeh
fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179
Summary:
Clang analyzer was falsely reporting on use of txn=nullptr.
Added a new const variable so that it can properly prune impossible
control flows.
Also, 'make analyze' previously required setting USE_CLANG=1 as an
environment variable, not a make variable, or else compilation errors
like
g++: error: unrecognized command line option ‘-Wshorten-64-to-32’
Now USE_CLANG is not required for 'make analyze' (it's implied) and you
can do an incremental analysis (recompile what has changed) with
'USE_CLANG=1 make analyze_incremental'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6244
Test Plan: 'make -j24 analyze', 'make crash_test'
Differential Revision: D19225950
Pulled By: pdillinger
fbshipit-source-id: 14f4039aa552228826a2de62b2671450e0fed3cb
Summary:
This commit is suspected in some crash test failures such as
Verification failed for column family 0 key 78438077: Value not found: NotFound:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243
Test Plan: 'make check' and start 'make crash_test'
Differential Revision: D19220495
Pulled By: pdillinger
fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa
Summary:
Call Transaction::MultiGet from TestMultiGet() in db_stress. We add some Puts/Merges/Deletes into the transaction in order to exercise MultiGetFromBatchAndDB. There is no data verification on read, just check status. There is currently no read data verification in db_stress as it requires synchronization with writes. It needs to be tackled separately.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6227
Test Plan: make crash_test_with_txn
Differential Revision: D19204611
Pulled By: anand1976
fbshipit-source-id: 770d0e30d002e88626c264c58103f1d709bb060c
Summary:
Recently db_stress starts to use a special Env that keeps all manifest files. This should not apply to checkpoint directory and causes test failure like this:
Verification failed: Checkpoint gave inconsistent state. Status is IO error: While mkdir: /dev/shm/rocksdb/rocksdb_crashtest_whitebox/.checkpoint27.tmp: File exists
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6233
Test Plan: Run crash_test with high chance of checkpoint and make sure it doesn't reproduce.
Differential Revision: D19207250
fbshipit-source-id: 12a931379e2e0572bb84aa658b6d03770c8551d4
Summary:
Listners are one source of bugs because we frequently release some mutex to invoke them, which introduce race conditions. Implement all callback functions in db_stress's listener class, and randomly sleep.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6197
Test Plan: Run crash_test for a while and see no obvious problem.
Differential Revision: D19134015
fbshipit-source-id: b9ea8be9366e4501759119520cd4f204943538f6
Summary:
db_stress to execute DB::GetApproximateSizes() with randomized keys and options. Return value is not validated but error will be reported.
Two ways to generate the range keys: (1) two random keys; (2) a small range.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6213
Test Plan: (1) run "make crash_test" for a while; (2) hack the code to ingest some errors to see it is reported.
Differential Revision: D19204665
fbshipit-source-id: 652db36f13bcb5a3bd8fe4a10c0aa22a77a0bce2
Summary:
Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165
Test Plan: run make crash_test
Differential Revision: D19204646
Pulled By: anand1976
fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7
Summary:
Several improvements to crash_test/stress_test:
(1) Stress_test to support an parameter of bottommost compression
(2) Rename those FLAGS_* variables that are not gflags to avoid confusion
(3) Crash_test to randomly generate compression type for bottommost compression with half the chance.
(4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment.
(5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215
Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files.
Differential Revision: D19171255
fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735
Summary:
1. Cover SeekToFirst() and SeekToLast().
2. Try to record the history of iterator operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6166
Test Plan: Do some manual changes in the code to cover the failure cases and see the error printing is correct and SeekToFirst() and SeekToLast() sometimes show up.
Differential Revision: D19047079
fbshipit-source-id: 1ed616f919fe4d32c0a021fc37932a7bd3063bcd
Summary:
Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224
Test Plan: pass db_stress default run.
Differential Revision: D19183358
Pulled By: zhichao-cao
fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a
Summary:
As title. We can run non-cf-consistency stress tests with verify_db_one_in>0,
thus remove the check added previously.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6231
Test Plan:
```
make crash_test
```
Differential Revision: D19198295
Pulled By: riversand963
fbshipit-source-id: e874c701bb03ab76eaab00f059dd4032bb2f537f
Summary:
The patch adds support for BlobDB to `db_stress`. Note that BlobDB currently does
not support (amongst other features) Column Families or the `SingleDelete` API,
so for now, those should be disabled on the command line when running `db_stress` in
BlobDB mode (using `-column_families=1` and `-nooverwritepercent=0`,
respectively). Also, some BlobDB features that do not go well with the verification logic
in `db_stress` like TTL and FIFO eviction are not supported currently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6230
Test Plan:
```
./db_stress -max_key=100000 -use_blob_db -column_families=1 -nooverwritepercent=0 -reopen=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_enable_gc -blob_db_gc_cutoff=0.1 -blob_db_min_blob_size=10 -blob_db_bytes_per_sync=16384
```
Differential Revision: D19191476
Pulled By: ltamasi
fbshipit-source-id: 35840452af8c5e6095249c7fd9a53a119a0985fc
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.
Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173
Differential Revision: D19047367
Pulled By: riversand963
fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
Summary:
Complete some refactoring called for in https://github.com/facebook/rocksdb/issues/6148. Somehow I got some 'make format' in here for files I didn't change, but that should be OK. (I'm not sure why "hide whitespace changes" doesn't seem to help in review.)
Not addressed in this PR: some operations simply print to stdout rather than failing on discovering a bad status or inconsistency.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6195
Differential Revision: D19131067
Pulled By: pdillinger
fbshipit-source-id: 4f416e6b792023989ef119f385fe122426cb825d
Summary:
Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203
Differential Revision: D19145753
Pulled By: anand1976
fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b
Summary:
In https://github.com/facebook/rocksdb/issues/6174 we fixed the stress test to respect the CancelAllBackgroundWork + Close order for WritePrepared transactions. The fix missed to take into account that some invocation of CancelAllBackgroundWork are with wait=false parameter which essentially breaks the order.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6191
Differential Revision: D19102709
Pulled By: maysamyabandeh
fbshipit-source-id: f4e7b5fdae47ff1c1ac284ba1cf67d5d3f3d03eb
Summary:
Several options are trivially added to crash test and random values are picked.
Made simple test run non-dynamic level and normal test run dynamic level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176
Test Plan: Run crash_test and watch the printing
Differential Revision: D19053955
fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
Summary:
compaction history is stored in manifest files. Preserve all of them in db_stress would help debugging.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6142
Test Plan: Run db_stress and observe that manifest files are preserved. Run whole crash_test and see how DB directory looks like.
Differential Revision: D19047026
fbshipit-source-id: f83c3e0bb5332b1b4768be5dcee56a24f9b760a9
Summary:
In the current db_stress, all the keys are generated randomly and follows the uniform distribution. In order to test some corner cases that some key are always updated or read, we need to generate the key based on other distributions. In this PR, the key is generated based on Zipfian distribution and the skewness can be controlled by setting hot_key_alpha (0.8 to 1.5 is suggested). The larger hot_key_alpha is, the more skewed will be. Not that, usually, if hot_key_alpha is larger than 2, there might be only 1 or 2 keys that are generated. If hot_key_alpha is 0, it generate the key follows uniform distribution (random key)
Testing plan: pass the db_stress and printed the keys to make sure it follows the distribution.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6163
Differential Revision: D18978480
Pulled By: zhichao-cao
fbshipit-source-id: e123b4865477f7478e83fb581f9576bada334680
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171
Test Plan:
```
make -j32 crash_test
Differential Revision: D19038399
Pulled By: maysamyabandeh
fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
Summary:
Close asserts that there is no unreleased snapshots. For WritePrepared transaction, this means that the background work that holds on a snapshot must be canceled first. Update the stress tests to respect the sequence.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174
Test Plan:
```
make -j32 crash_test
Differential Revision: D19057322
Pulled By: maysamyabandeh
fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362
Summary:
And clean up related code, especially in stress test.
(More clean up of db_stress_test_base.cc coming after this.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154
Test Plan: make check, make blackbox_crash_test for a bit
Differential Revision: D18938180
Pulled By: pdillinger
fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164
Test Plan:
```
make -j32 crash_test_with_txn
Differential Revision: D18991899
Pulled By: maysamyabandeh
fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158
Test Plan:
```
make -j32 crash_test_with_txn
```
Differential Revision: D18946307
Pulled By: maysamyabandeh
fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
Summary:
Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be
called once every N operations on average.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149
Test Plan:
```
$make db_stress
$./db_stress -sync_wal_one_in=100 -ops_per_thread=100000
```
Differential Revision: D18922529
Pulled By: riversand963
fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56
Summary:
CancelAllBackgroundWork() and Close() are frequently used features but we don't cover it in stress test. Simply execute them before closing the DB with 1/2 chance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141
Test Plan: Run "db_stress".
Differential Revision: D18900861
fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0
Summary:
Add an option to explicitly disable building shared versions of the
RocksDB libraries. The shared libraries cannot be built in cases where
some dependencies are only available as static libraries. This allows
still building RocksDB in these situations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122
Differential Revision: D18920740
fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
Summary:
Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simply validation which compares hash for all keys within the compact range to stay the same against the same snapshot before and after the compaction.
Also, randomly tune most knobs of CompactRangeOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140
Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually ingest some hacky code and observe the error path.
Differential Revision: D18900230
fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:
1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.
1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103
Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output
Differential Revision: D18734877
Pulled By: pdillinger
fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
Summary:
Two changes:
1. Prevent static variables in a header file
2. Add "override" keyword when virtual functions are overridden.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6137
Test Plan: Build db_stress with or without LITE.
Differential Revision: D18892007
fbshipit-source-id: 295356427a34473b23ed36d6ed4ef3ae35a32db0
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134
Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1
Differential Revision: D18876064
fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d