rocksdb/db_stress_tool
Andrew Kryczka 559943cdc0 Refactor expected state in stress/crash test (#8913)
Summary:
This is a precursor refactoring to enable an upcoming feature: persistence failure correctness testing.

- Changed `--expected_values_path` to `--expected_values_dir` and migrated "db_crashtest.py" to use the new flag. For persistence failure correctness testing there are multiple possible correct states since unsynced data is allowed to be dropped. Making it possible to restore all these possible correct states will eventually involve files containing snapshots of expected values and DB trace files.
- The expected values directory is managed by an `ExpectedStateManager` instance. Managing expected state files is separated out of `SharedState` to prevent `SharedState` from becoming too complex when the new files and features (snapshotting, tracing, and restoring) are introduced.
- Migrated expected values file access/management out of `SharedState` into a separate class called `ExpectedState`. This is not exposed directly to the test but rather the `ExpectedState` for the latest values file is accessed via a pass-through API on `ExpectedStateManager`. This forces the test to always access the single latest `ExpectedState`.
- Changed the initialization of the latest expected values file to use a tempfile followed by rename, and also add cleanup logic for possible stranded tempfiles.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8913

Test Plan:
run in several ways; try to make sure it's not obviously broken.

- crashtest blackbox without TEST_TMPDIR
```
$ python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none
```
- crashtest blackbox with TEST_TMPDIR
```
$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none
```
- crashtest whitebox with TEST_TMPDIR
```
$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py whitebox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none --random_kill_odd=88887
```
- db_stress without expected_values_dir
```
$ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true
```
- db_stress with expected_values_dir and manual corruption
```
$ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true --expected_values_dir=./
// modify one byte in "./LATEST.state"
$ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=false --expected_values_dir=./
...
Verification failed for column family 0 key 0000000000000000 (0): Value not found: NotFound:
...
```

Reviewed By: riversand963

Differential Revision: D30921951

Pulled By: ajkr

fbshipit-source-id: babfe218062e55d018c9b046536c0289fb78f41c
2021-09-28 14:13:33 -07:00
..
batched_ops_stress.cc Allow WriteBatch to have keys with different timestamp sizes (#8725) 2021-09-12 15:34:26 -07:00
cf_consistency_stress.cc Fix cf_consistency_stress for backup/restore, harmonize (#7373) 2020-09-10 22:55:06 -07:00
CMakeLists.txt Refactor expected state in stress/crash test (#8913) 2021-09-28 14:13:33 -07:00
db_stress_common.cc Add user-defined timestamps to db_stress (#8061) 2021-03-23 05:13:30 -07:00
db_stress_common.h Refactor expected state in stress/crash test (#8913) 2021-09-28 14:13:33 -07:00
db_stress_compaction_filter.h Prevent deadlock in db_stress with DbStressCompactionFilter (#8956) 2021-09-24 16:54:02 -07:00
db_stress_driver.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
db_stress_driver.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
db_stress_env_wrapper.h db_stress to add --open_metadata_write_fault_one_in (#8235) 2021-04-28 10:58:05 -07:00
db_stress_gflags.cc Refactor expected state in stress/crash test (#8913) 2021-09-28 14:13:33 -07:00
db_stress_listener.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
db_stress_shared_state.cc Silence false alarms in db_stress fault injection (#6741) 2020-04-24 13:06:12 -07:00
db_stress_shared_state.h Refactor expected state in stress/crash test (#8913) 2021-09-28 14:13:33 -07:00
db_stress_stat.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
db_stress_table_properties_collector.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
db_stress_test_base.cc Avoid overwriting first non-OK Status in db_stress setup (#8907) 2021-09-15 14:28:09 -07:00
db_stress_test_base.h Inject fatal write failures to db_stress when DB is running (#8479) 2021-07-01 14:16:47 -07:00
db_stress_tool.cc Stress test to inject read failures in DB reopen (#8476) 2021-07-06 11:05:27 -07:00
db_stress.cc Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
expected_state.cc Refactor expected state in stress/crash test (#8913) 2021-09-28 14:13:33 -07:00
expected_state.h Refactor expected state in stress/crash test (#8913) 2021-09-28 14:13:33 -07:00
no_batched_ops_stress.cc Improve fault injection to MultiRead (#8937) 2021-09-21 14:48:15 -07:00