Summary:
Users can set the priority for file reads associated with their operation by setting `ReadOptions::rate_limiter_priority` to something other than `Env::IO_TOTAL`. Rate limiting `VerifyChecksum()` and `VerifyFileChecksums()` is the motivation for this PR, so it also includes benchmarks and minor bug fixes to get that working.
`RandomAccessFileReader::Read()` already had support for rate limiting compaction reads. I changed that rate limiting to be non-specific to compaction, but rather performed according to the passed in `Env::IOPriority`. Now the compaction read rate limiting is supported by setting `rate_limiter_priority = Env::IO_LOW` on its `ReadOptions`.
There is no default value for the new `Env::IOPriority` parameter to `RandomAccessFileReader::Read()`. That means this PR goes through all callers (in some cases multiple layers up the call stack) to find a `ReadOptions` to provide the priority. There are TODOs for cases I believe it would be good to let user control the priority some day (e.g., file footer reads), and no TODO in cases I believe it doesn't matter (e.g., trace file reads).
The API doc only lists the missing cases where a file read associated with a provided `ReadOptions` cannot be rate limited. For cases like file ingestion checksum calculation, there is no API to provide `ReadOptions` or `Env::IOPriority`, so I didn't count that as missing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9424
Test Plan:
- new unit tests
- new benchmarks on ~50MB database with 1MB/s read rate limit and 100ms refill interval; verified with strace reads are chunked (at 0.1MB per chunk) and spaced roughly 100ms apart.
- setup command: `./db_bench -benchmarks=fillrandom,compact -db=/tmp/testdb -target_file_size_base=1048576 -disable_auto_compactions=true -file_checksum=true`
- benchmarks command: `strace -ttfe pread64 ./db_bench -benchmarks=verifychecksum,verifyfilechecksums -use_existing_db=true -db=/tmp/testdb -rate_limiter_bytes_per_sec=1048576 -rate_limit_bg_reads=1 -rate_limit_user_ops=true -file_checksum=true`
- crash test using IO_USER priority on non-validation reads with https://github.com/facebook/rocksdb/issues/9567 reverted: `python3 tools/db_crashtest.py blackbox --max_key=1000000 --write_buffer_size=524288 --target_file_size_base=524288 --level_compaction_dynamic_level_bytes=true --duration=3600 --rate_limit_bg_reads=true --rate_limit_user_ops=true --rate_limiter_bytes_per_sec=10485760 --interval=10`
Reviewed By: hx235
Differential Revision: D33747386
Pulled By: ajkr
fbshipit-source-id: a2d985e97912fba8c54763798e04f006ccc56e0c
Summary:
ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`.
Namely, `WriteOptions` should not include information about "what-to-write", but should just
include information about "how-to-write".
According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore,
this PR removes `WriteOptions::timestamp` for compliance.
After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set
of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and
`SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity
made me reconsider doing it in another PR (maybe).
For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take
extra `timestamp` information when writing to `WriteBatch`es.
These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list.
Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to
`WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps
allocated already and multiple timestamps can be updated.
The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp
size of the default column family. This will be used to allocate space when calling APIs that do not
specify a column family handle.
Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing
some assertions about timestamp to returning Status code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8946
Test Plan:
make check
./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8
./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0
Make sure there is no perf regression by running the following
```
./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom
```
Before this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom : 1.831 micros/op 546235 ops/sec; 60.4 MB/s
```
After this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom : 1.820 micros/op 549404 ops/sec; 60.8 MB/s
```
Reviewed By: ltamasi
Differential Revision: D33721359
Pulled By: riversand963
fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09
Summary:
When `--sync_fault_injection` is set, this PR takes a snapshot of the expected values and starts an operation trace when the DB is opened. These files are stored in `--expected_values_dir`. They will be used for recovering the expected state of the DB following a crash where a suffix of unsynced operations are allowed to be lost.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8960
Test Plan: injected crashed at various points in `FileExpectedStateManager` and verified the next run recovers the state/trace file with highest seqno and removes all older/temporary files. Note we don't use sync_fault_injection in CI crash tests yet.
Reviewed By: pdillinger
Differential Revision: D31194941
Pulled By: ajkr
fbshipit-source-id: b0f935a529a0186c5a9c7709fcaa8829de8a84cf
Summary:
Several improvements to MultiRead:
1. Fix a bug in stress test which causes false positive when both MultiRead() return and individual read request have failure injected.
2. Add two more types of fault that should be handled: empty read results and checksum mismatch
3. Add a message indicating which type of fault is injected
4. Increase the failure rate
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8937
Reviewed By: anand1976
Differential Revision: D31085930
fbshipit-source-id: 3a04994a3cadebf9a64d25e1fe12b14b7a272fba
Summary:
add the injest_error_severity to control if it is a retryable IO Error or a fatal or unrecoverable error. Use a flag to indicate, if fatal error comes, the flag is set and db is stopped (but not corrupted).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8479
Test Plan: run ./db_stress --reopen=0 --read_fault_one_in=1000 --write_fault_one_in=5 --disable_wal=true --write_buffer_size=3000000 -writepercent=5 -readpercent=50 --injest_error_severity=2 --column_families=1, make check
Reviewed By: anand1976
Differential Revision: D29524271
Pulled By: zhichao-cao
fbshipit-source-id: 1aa9fb9b5655b0adba6f5ad12005ca8c074c795b
Summary:
Add some basic test for user-defined timestamp to db_stress. Currently,
read with timestamp always tries to read using the current timestamp.
Due to the per-key timestamp-sequence ordering constraint, we only add timestamp-
related tests to the `NonBatchedOpsStressTest` since this test serializes accesses
to the same key and uses a file to cross-check data correctness.
The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete,
DeleteRange, CompactionFilter, Readonly instance, secondary instance, SST file ingestion, transaction,
etc. Therefore, db_stress should exit if user enables both timestamp and these features at the same
time. The (currently) incompatible features can be found in
`CheckAndSetOptionsForUserTimestamp`.
This PR also fixes a bug triggered when timestamp is enabled together with
`index_type=kBinarySearchWithFirstKey`. This bug fix will also be in another separate PR
with more unit tests coverage. Fixing it here because I do not want to exclude the index type
from crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061
Test Plan: make crash_test_with_ts
Reviewed By: jay-zhuang
Differential Revision: D27056282
Pulled By: riversand963
fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env
These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.
Tested both release and debug builds via Make and CMake for both static and shared libraries.
More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097
Reviewed By: riversand963
Differential Revision: D22463160
Pulled By: pdillinger
fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
Summary:
In NoBatchedOpsStress::TestMultiGet, call txn->Get() when transactions
are in use.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6860
Test Plan: make crash_test_with_txn
Reviewed By: pdillinger
Differential Revision: D21667249
Pulled By: anand1976
fbshipit-source-id: 194bd7b9630a8efc3ae29d85422a61214e9e200e
Summary:
Add MultiGet to VerifyDb and check consistency with Get in TestMultiGet.
Test plan -
make crash_test
ASAN crash test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6849
Reviewed By: pdillinger
Differential Revision: D21635011
Pulled By: anand1976
fbshipit-source-id: deb5a79d08fefd8d8010204f1f20b83adc92310e
Summary:
1. Fix a memory leak in FaultInjectionTestFS in the stack trace related
code
2. Check status of all MultiGet keys before deciding whether an error
was swallowed, instead of assuming an ok status for any key means an
undetected error
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6700
Test Plan: Run db_stress with asan and fault injection
Reviewed By: cheng-chang
Differential Revision: D21021498
Pulled By: anand1976
fbshipit-source-id: 489191efd1ab0fa834923a1e1d57253a7a315465
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538
Test Plan:
crash_test
make check
Reviewed By: riversand963
Differential Revision: D20714347
Pulled By: anand1976
fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
When called on transactions, MultiGet could return a legit MergeInProgress status. The patch excludes this case from errors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6257
Differential Revision: D19275787
Pulled By: maysamyabandeh
fbshipit-source-id: f7158229422af015947e592ae066b4273c9fb9a4
Summary:
Stress tests count number of errors and report them at the end. Not all the cases are accompanied with a log line which makes debugging difficult. The patch adds a log line to the remaining cases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6256
Differential Revision: D19268785
Pulled By: maysamyabandeh
fbshipit-source-id: bdabcaa5c5c7edcb4ce4f25e38fd8a3fd9c7700b
Summary:
Clang analyzer was falsely reporting on use of txn=nullptr.
Added a new const variable so that it can properly prune impossible
control flows.
Also, 'make analyze' previously required setting USE_CLANG=1 as an
environment variable, not a make variable, or else compilation errors
like
g++: error: unrecognized command line option ‘-Wshorten-64-to-32’
Now USE_CLANG is not required for 'make analyze' (it's implied) and you
can do an incremental analysis (recompile what has changed) with
'USE_CLANG=1 make analyze_incremental'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6244
Test Plan: 'make -j24 analyze', 'make crash_test'
Differential Revision: D19225950
Pulled By: pdillinger
fbshipit-source-id: 14f4039aa552228826a2de62b2671450e0fed3cb
Summary:
This commit is suspected in some crash test failures such as
Verification failed for column family 0 key 78438077: Value not found: NotFound:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243
Test Plan: 'make check' and start 'make crash_test'
Differential Revision: D19220495
Pulled By: pdillinger
fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa
Summary:
Call Transaction::MultiGet from TestMultiGet() in db_stress. We add some Puts/Merges/Deletes into the transaction in order to exercise MultiGetFromBatchAndDB. There is no data verification on read, just check status. There is currently no read data verification in db_stress as it requires synchronization with writes. It needs to be tackled separately.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6227
Test Plan: make crash_test_with_txn
Differential Revision: D19204611
Pulled By: anand1976
fbshipit-source-id: 770d0e30d002e88626c264c58103f1d709bb060c
Summary:
Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165
Test Plan: run make crash_test
Differential Revision: D19204646
Pulled By: anand1976
fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7
Summary:
Several improvements to crash_test/stress_test:
(1) Stress_test to support an parameter of bottommost compression
(2) Rename those FLAGS_* variables that are not gflags to avoid confusion
(3) Crash_test to randomly generate compression type for bottommost compression with half the chance.
(4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment.
(5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215
Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files.
Differential Revision: D19171255
fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.
Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173
Differential Revision: D19047367
Pulled By: riversand963
fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
Summary:
And clean up related code, especially in stress test.
(More clean up of db_stress_test_base.cc coming after this.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154
Test Plan: make check, make blackbox_crash_test for a bit
Differential Revision: D18938180
Pulled By: pdillinger
fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
Summary:
Two changes:
1. Prevent static variables in a header file
2. Add "override" keyword when virtual functions are overridden.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6137
Test Plan: Build db_stress with or without LITE.
Differential Revision: D18892007
fbshipit-source-id: 295356427a34473b23ed36d6ed4ef3ae35a32db0
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134
Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1
Differential Revision: D18876064
fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d