Summary:
When the trace contains the MultiGet record, with this PR, it can replay the MultiGet.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8577
Test Plan: make check and replay the real trace.
Reviewed By: anand1976
Differential Revision: D29864060
Pulled By: zhichao-cao
fbshipit-source-id: 5288d4fc9b6a3cb331de1e0c635d4e044dcb534a
Summary:
Allow extra arguments to be passed to db_stress in fbcode crash tests by the ```rocksdb-lego-determinator``` invoker.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8587
Reviewed By: zhichao-cao
Differential Revision: D29940217
Pulled By: anand1976
fbshipit-source-id: 17cbcd2def60eff2a895553f917694496c4742aa
Summary:
- Added Type/CreateFromString
- Added ability to load EventListeners to DBOptions
- Since EventListeners did not previously have a Name(), defaulted to "". If there is no name, the listener cannot be loaded from the ObjectRegistry.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8473
Reviewed By: zhichao-cao
Differential Revision: D29901488
Pulled By: mrambacher
fbshipit-source-id: 2d3a4aa6db1562ac03e7ad41b360e3521d486254
Summary:
Add `experimental_mempurge_policy` option flag and introduce two new `MemPurge` (Memtable Garbage Collection) policies: 'ALWAYS' and 'ALTERNATE'. Default value: ALTERNATE.
`ALWAYS`: every flush will first go through a `MemPurge` process. If the output is too big to fit into a single memtable, then the mempurge is aborted and a regular flush process carries on. `ALWAYS` is designed for user that need to reduce the number of L0 SST file created to a strict minimum, and can afford a small dent in performance (possibly hits to CPU usage, read efficiency, and maximum burst write throughput).
`ALTERNATE`: a flush is transformed into a `MemPurge` except if one of the memtables being flushed is the product of a previous `MemPurge`. `ALTERNATE` is a good tradeoff between reduction in number of L0 SST files created and performance. `ALTERNATE` perform particularly well for completely random garbage ratios, or garbage ratios anywhere in (0%,50%], and even higher when there is a wild variability in garbage ratios.
This PR also includes support for `experimental_mempurge_policy` in `db_bench`.
Testing was done locally by replacing all the `MemPurge` policies of the unit tests with `ALTERNATE`, as well as local testing with `db_crashtest.py` `whitebox` and `blackbox`. Overall, if an `ALWAYS` mempurge policy passes the tests, there is no reasons why an `ALTERNATE` policy would fail, and therefore the mempurge policy was set to `ALWAYS` for all mempurge unit tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8583
Reviewed By: pdillinger
Differential Revision: D29888050
Pulled By: bjlemaire
fbshipit-source-id: e2cf26646d66679f6f5fb29842624615610759c1
Summary:
DistributedMutex hasn't been used in the code base and enabling
`USE_FOLLY_DISTRIBUTED_MUTEX` only runs the mutex tests from third-party
lib. So disabling it for now.
The implementation may also out of date, should re-sync with folly before
using.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8584
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D29888960
Pulled By: jay-zhuang
fbshipit-source-id: 3e75f73386c6ed03efb96a1400258d602a724f17
Summary:
PR https://github.com/facebook/rocksdb/issues/8519 fix db_bench_tool.cc for MSVC build errors by simply copy-paste, this PR fix the copy-paste while also works for MSVC.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8553
Reviewed By: ajkr
Differential Revision: D29838056
Pulled By: jay-zhuang
fbshipit-source-id: 0cd60c146b87a355c3dc1061dfe813169d75cea4
Summary:
event log info may be truncated, the default buffer size is 512, this PR changes buffer size to 8192.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8563
Reviewed By: ajkr
Differential Revision: D29838229
Pulled By: jay-zhuang
fbshipit-source-id: 00c5dea3caff0641a209f02c972e92d65b505f50
Summary:
Originally the 2 options `db_log_dir` and `wal_dir` will be reused in a snapshot db since the options files are just copied. By default, if `wal_dir` was not set when a db was created, it is set to the db's dir. Therefore, the snapshot db will use the same WAL dir. If both the original db and the snapshot db write to or delete from the WAL dir, one may modify or delete files which belong to the other. The same applies to `db_log_dir` as well, but as info log files are not copied or linked, it is simpler for this option.
2 arguments are added to `Checkpoint::CreateCheckpoint()`, allowing to override these 2 options.
`wal_dir`: If the function argument `wal_dir` is empty, or set to the original db location, or the checkpoint location, the snapshot's `wal_dir` option will be updated to the checkpoint location. Otherwise, the absolute path specified in the argument will be used. During checkpointing, live WAL files will be copied or linked the new location, instead of the current WAL dir specified in the original db.
`db_log_dir`: Same as `wal_dir`, but no files will be copied or linked.
A new unit test was added: `CheckpointTest.CheckpointWithOptionsDirsTest`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8572
Test Plan:
New unit test
```
checkpoint_test --gtest_filter="CheckpointTest.CheckpointWithOptionsDirsTest"
```
Output
```
Note: Google Test filter = CheckpointTest.CheckpointWithOptionsDirsTest
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from CheckpointTest
[ RUN ] CheckpointTest.CheckpointWithOptionsDirsTest
[ OK ] CheckpointTest.CheckpointWithOptionsDirsTest (11712 ms)
[----------] 1 test from CheckpointTest (11712 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (11713 ms total)
[ PASSED ] 1 test.
```
This test will fail without this patch. Just modify the code to remove the 2 arguments introduced in this patch in `CreateCheckpoint()`.
Reviewed By: zhichao-cao
Differential Revision: D29832761
Pulled By: autopear
fbshipit-source-id: e6a639b4d674380df82998c0839e79cab695fe29
Summary:
The PerThreadDBPath has already specified a slash. It does not need to be specified when initializing the test path.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8555
Reviewed By: ajkr
Differential Revision: D29758399
Pulled By: jay-zhuang
fbshipit-source-id: 6d2b878523e3e8580536e2829cb25489844d9011
Summary:
The main challenge to make the memtable garbage collection prototype (nicknamed `mempurge`) was to not get rid of WAL files that contain unflushed (but mempurged) data. That was successfully guaranteed by not writing the VersionEdit to the MANIFEST file after a successful mempurge.
By not writing VersionEdits to the `MANIFEST` file after a succesful mempurge operation, we do not change the earliest log file number that contains unflushed data: `cfd->GetLogNumber()` (`cfd->SetLogNumber()` is only called in `VersionSet::ProcessManifestWrites`). As a result, a number of functions introduced earlier just for the mempurge operation are not obscolete/redundant. (e.g.: `FlushJob::ExtractEarliestLogFileNumber`), and this PR aims at cleaning up all these now-unnecessary functions. In particular, we no longer need to store the earliest log file number in the `MemTable` struct itself. This PR therefore also reverts the `MemTable` struct to its original form.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8558
Test Plan: Already included in `db_flush_test.cc`.
Reviewed By: anand1976
Differential Revision: D29764351
Pulled By: bjlemaire
fbshipit-source-id: 0f43b260fa270251862512f397d3f24ee62e8437
Summary:
Now we can analyze the MultiGet queries in the trace file and generate a set of the statistic and analysis files. Note that, when one MultiGet access N keys, we count each sub-get-query individually. But the over all query number is still the MultiGet not the sub-get-query.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8575
Test Plan: added new unit test and make check
Reviewed By: anand1976
Differential Revision: D29860633
Pulled By: zhichao-cao
fbshipit-source-id: a132128527f36828d266df8e36e3ec626c2170be
Summary:
If the primary's CURRENT file is missing or inaccessible, the secondary should not hang
trying repeatedly to switch to the next MANIFEST.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8200
Test Plan: make check
Reviewed By: jay-zhuang
Differential Revision: D27840627
Pulled By: riversand963
fbshipit-source-id: 071fed97cbab1bc5cdefd1dc235e5cd406c174e1
Summary:
ObjectLibrary is shared between multiple DB instances, the
Register() could have race condition.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8574
Test Plan: pass the failed test
Reviewed By: ajkr
Differential Revision: D29855096
Pulled By: jay-zhuang
fbshipit-source-id: 541eed0bd495d2c963d858d81e7eabf1ba16153c
Summary:
Rare TSAN and valgrind failures are caused by unnecessary
reading of a field on the TaskLimiterToken::limiter_ for an assertion
after the token has been released and the limiter destroyed. To simplify
we can simply destroy the token before triggering DB shutdown
(potentially destroying the limiter). This makes the ReleaseOnce logic
unnecessary.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8567
Test Plan: watch for more failures in CI
Reviewed By: ajkr
Differential Revision: D29811795
Pulled By: pdillinger
fbshipit-source-id: 135549ebb98fe4f176d1542ed85d5bd6350a40b3
Summary:
https://github.com/facebook/rocksdb/pull/8548 is not complete. We should instead cover all cases writable files are buffered, not just when failures are ingested. Extend it to any case where failures are ingested in DB open.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8570
Test Plan: Run db_stress and see it doesn't break
Reviewed By: jay-zhuang
Differential Revision: D29830415
fbshipit-source-id: 94449a0468fb2f7eec17423724008c9c63b2445d
Summary:
Try avoid expensive updating options operation if
`SetDBOptions()` does not change any option value.
Skip updating is not guaranteed, for example, changing `bytes_per_sync`
to `0` may still trigger updating, as the value could be sanitized.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8518
Test Plan: added unittest
Reviewed By: riversand963
Differential Revision: D29672639
Pulled By: jay-zhuang
fbshipit-source-id: b7931de62ceea6f1bdff0d1209adf1197d3ed1f4
Summary:
Add flags `overwrite_probability` and `overwrite_window_size` flag to `db_bench`.
Add the possibility of performing a `filluniquerandom` benchmark with an overwrite probability.
For each write operation, there is a probability _p_ that the write is an overwrite (_p_=`overwrite_probability`).
When an overwrite is decided, the key is randomly chosen from the last _N_ keys previously inserted into the DB (with _N_=`overwrite_window_size`).
When a pure write is decided, the key inserted into the DB is unique and therefore will not be an overwrite.
The `overwrite_window_size` is used so that the user can decide if the overwrite are mostly targeting recently inserted keys (when `overwrite_window_size` is small compared to the total number of writes), or can also target keys inserted "a long time ago" (when `overwrite_window_size` is comparable to total number of writes).
Note that total number of writes = # of unique insertions + # of overwrites.
No unit test specifically added.
Local testing show the following **throughputs** for `filluniquerandom` with 1M total writes:
- bypass the code inserts (no `overwrite_probability` flag specified): ~14.0MB/s
- `overwrite_probability=0.99`, `overwrite_window_size=10`: ~17.0MB/s
- `overwrite_probability=0.10`, `overwrite_window_size=10`: ~14.0MB/s
- `overwrite_probability=0.99`, `overwrite_window_size=1M`: ~14.5MB/s
- `overwrite_probability=0.10`, `overwrite_window_size=1M`: ~14.0MB/s
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8569
Reviewed By: pdillinger
Differential Revision: D29818631
Pulled By: bjlemaire
fbshipit-source-id: d472b4ea4e457a4da7c4ee4f14b40cccd6a4587a
Summary:
If we want to check whether a Status s is NoSpace() or not, we should check the subcode instread of using s==Status::NoSpace(). Fix some of the incorrect check in the ErrorHandler.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8504
Test Plan: make check
Reviewed By: anand1976
Differential Revision: D29601764
Pulled By: zhichao-cao
fbshipit-source-id: cdab56a827891c23746bba9cbb53f169fe35f086
Summary:
Delete column family handlers before deleting db to avoid `last_ref`
assert.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8564
Test Plan: Inject compaction test in db_stress test
Reviewed By: pdillinger
Differential Revision: D29797375
Pulled By: jay-zhuang
fbshipit-source-id: e8baf4d279f4db5d963db95c9445454156205501
Summary:
Tiny PR to add the `experimental_allow_mempurge` to the `db_bench` tool (`Mempurge` is the current prototype for memtable garbage collection).
This is useful to benchmark the prototype of this new feature, stress test it and help find new meaningful heuristics for GC.
By default, the flag to allow `mempurge` is set to `false`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8546
Reviewed By: anand1976
Differential Revision: D29738338
Pulled By: bjlemaire
fbshipit-source-id: 01892883a2f1c714c110718674da05992d6e2dd6
Summary:
Fixed a stats_history_test failure on Windows
* In StatsHistoryTest.InMemoryStatsHistoryPurging test, the capping memory cost of stats_history_size on Windows increases to 15000 bytes with latest changes
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8520
Reviewed By: ajkr
Differential Revision: D29734631
Pulled By: mrambacher
fbshipit-source-id: 461698fcf22ef06acfb7f7aa86f8415aaffe7f1e
Summary:
Already has good coverage for GetProperty and GetIntProperty
but this one was missing.
This should add more confidence to https://github.com/facebook/rocksdb/issues/8538
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8551
Test Plan:
brief local run with boosted probability showed no immediate
issues
Reviewed By: siying
Differential Revision: D29746383
Pulled By: pdillinger
fbshipit-source-id: 9f9f525bc1a7607f85e563e33bda1979ef197127
Summary:
Currently, the code shows that we delete memtables immedately after it is trimmed from history. Although it should never happen as the super version still holds the memtable, which is only switched after it, it feels a good practice not to do it, but use clean it up in the standard way: put it to WriteContext and clean it after DB mutex.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8530
Test Plan: Run all existing tests.
Reviewed By: ajkr
Differential Revision: D29703410
fbshipit-source-id: 21d8068ac6377de4b6fa7a89697195742659fde4
Summary:
There is an extra " in options.h (`"index block""`)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8550
Test Plan: None
Reviewed By: jay-zhuang
Differential Revision: D29746077
Pulled By: autopear
fbshipit-source-id: 2e5117296e5414b7c7440d990926bc1e567a0b4f
Summary:
When DB Stress enables write failure in reopen, WAL files are also created with a wrapper writalbe file which buffers write until fsync. However, crash test currently expects all writes to WAL is persistent. This is at odd with the unsynced bytes dropped. To work it around temporarily, we disable WAL write failure for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8548
Test Plan: Run db_stress. Manual printf to make sure only WAL files are skipped.
Reviewed By: jay-zhuang
Differential Revision: D29745095
fbshipit-source-id: 1879dd2c01abad7879ca243ee94570ec47c347f3
Summary:
Otherwise the build may report warning about missing
`benchmark.h` for some targets, the error won't break the build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8523
Test Plan:
`make blackbox_ubsan_crash_test` on a machine without
benchmark lib installed.
Reviewed By: pdillinger
Differential Revision: D29682478
Pulled By: jay-zhuang
fbshipit-source-id: e1261fbcda46bc6bd3cd39b7b03b7f78927d0430
Summary:
Some URIs for creating instances (ala SecondaryCache) use complex URIs like (cache://name;prop=value). These URIs were treated as name-value properties. With this change, if the URI does not contain an "id=XX" setting, it will be treated as a single string value (and not an ID and map of name-value properties).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8547
Reviewed By: anand1976
Differential Revision: D29741386
Pulled By: mrambacher
fbshipit-source-id: 0621f62bec3a6699a7b66c7c0b5634b2856653aa
Summary:
I previously didn't notice the DB mutex was being held during
block cache entry stat scans, probably because I primarily checked for
read performance regressions, because they require the block cache and
are traditionally latency-sensitive.
This change does some refactoring to avoid holding DB mutex and to
avoid triggering and waiting for a scan in GetProperty("rocksdb.cfstats").
Some tests have to be updated because now the stats collector is
populated in the Cache aggressively on DB startup rather than lazily.
(I hope to clean up some of this added complexity in the future.)
This change also ensures proper treatment of need_out_of_mutex for
non-int DB properties.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8538
Test Plan:
Added unit test logic that uses sync points to fail if the DB mutex
is held during a scan, covering the various ways that a scan might be
triggered.
Performance test - the known impact to holding the DB mutex is on
TransactionDB, and the easiest way to see the impact is to hack the
scan code to almost always miss and take an artificially long time
scanning. Here I've injected an unconditional 5s sleep at the call to
ApplyToAllEntries.
Before (hacked):
$ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 433.219 micros/op 2308 ops/sec; 0.1 MB/s ( transactions:78999 aborts:0)
rocksdb.db.write.micros P50 : 16.135883 P95 : 36.622503 P99 : 66.036115 P100 : 5000614.000000 COUNT : 149677 SUM : 8364856
$ TEST_TMPDIR=/dev/shm ./db_bench.base_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 448.802 micros/op 2228 ops/sec; 0.1 MB/s ( transactions:75999 aborts:0)
rocksdb.db.write.micros P50 : 16.629221 P95 : 37.320607 P99 : 72.144341 P100 : 5000871.000000 COUNT : 143995 SUM : 13472323
Notice the 5s P100 write time.
After (hacked):
$ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 303.645 micros/op 3293 ops/sec; 0.1 MB/s ( transactions:98999 aborts:0)
rocksdb.db.write.micros P50 : 16.061871 P95 : 33.978834 P99 : 60.018017 P100 : 616315.000000 COUNT : 187619 SUM : 4097407
$ TEST_TMPDIR=/dev/shm ./db_bench.new_xxx -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 310.383 micros/op 3221 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0)
rocksdb.db.write.micros P50 : 16.270026 P95 : 35.786844 P99 : 64.302878 P100 : 603088.000000 COUNT : 183819 SUM : 4095918
P100 write is now ~0.6s. Not good, but it's the same even if I completely bypass all the scanning code:
$ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 311.365 micros/op 3211 ops/sec; 0.1 MB/s ( transactions:96999 aborts:0)
rocksdb.db.write.micros P50 : 16.274362 P95 : 36.221184 P99 : 68.809783 P100 : 649808.000000 COUNT : 183819 SUM : 4156767
$ TEST_TMPDIR=/dev/shm ./db_bench.new_skip -benchmarks=randomtransaction,stats -cache_index_and_filter_blocks=1 -bloom_bits=10 -partition_index_and_filters=1 -duration=30 -stats_dump_period_sec=12 -cache_size=100000000 -statistics -transaction_db 2>&1 | egrep 'db.db.write.micros|micros/op'
randomtransaction : 308.395 micros/op 3242 ops/sec; 0.1 MB/s ( transactions:97999 aborts:0)
rocksdb.db.write.micros P50 : 16.106222 P95 : 37.202403 P99 : 67.081875 P100 : 598091.000000 COUNT : 185714 SUM : 4098832
No substantial difference.
Reviewed By: siying
Differential Revision: D29738847
Pulled By: pdillinger
fbshipit-source-id: 1c5c155f5a1b62e4fea0fd4eeb515a8b7474027b
Summary:
Right now, db_bench with seekrandom and multiple DB setup creates iterator for all DBs just to query one of them. It's different from most real workloads. Fix it by only creating iterators that will be queried.
Also fix a bug that DBs are not destroyed in multi-DB mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7818
Test Plan: Run db_bench with single/multiDB X using/not using tailing iterator with ASAN build, and validate the behavior is expected.
Reviewed By: ajkr
Differential Revision: D25720226
fbshipit-source-id: c2ff7ff7120e5ba64287a30b057c5d29b2cbe20b
Summary:
Add `experiemental_allow_mempurge` flag support for `db_stress` and `db_crashtest.py`, with a `false` default value.
I succesfully tested locally both `whitebox` and `blackbox` crash tests with `experiemental_allow_mempurge` flag set as true.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545
Reviewed By: akankshamahajan15
Differential Revision: D29734513
Pulled By: bjlemaire
fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae
Summary:
Made the EncryptionProvider and BlockCipher classes inherit from Customizable. Added/fixed the CreateFromString method to these classes to create instances from builtin or registered classes. Added tests to verify that instances can be registered and retrieved as appropriate.
Added the ability to configure the builtin (CTR, ROT13) classes from configurable properties. Added the appropriate tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8354
Reviewed By: zhichao-cao
Differential Revision: D29558949
Pulled By: mrambacher
fbshipit-source-id: c20286b32d179777e060f51a58943e9b0cf81d04
Summary:
In https://github.com/facebook/rocksdb/issues/8539 I accidentally only checked for GCC TSAN, which is
what I tested locally, while CircleCI and FB CI use clang TSAN. Related:
other existing code like in stack_trace.cc only check for clang TSAN.
I've now standardized these to the GCC convention in port/lang.h, so now
#ifdef __SANITIZE_THREAD__
can check for any TSAN (assuming lang.h include)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8543
Test Plan:
Put an assert(false) in slice_test and look for the NOTE
about "signal-unsafe call", both GCC and clang. Eventually, CircleCI
TSAN in https://github.com/facebook/rocksdb/issues/8538
Reviewed By: zhichao-cao
Differential Revision: D29728483
Pulled By: pdillinger
fbshipit-source-id: 8a3b8015c2ed48078214c3ee17146a2c3f11c9f7
Summary:
In this PR, `mempurge` is made compatible with the Write Ahead Log: in case of recovery, the DB is now capable of recovering the data that was "mempurged" and kept in the `imm()` list of immutable memtables.
The twist was to add a uint64_t to the `memtable` struct to store the number of the earliest log file containing entries from the `memtable`. When a `Flush` operation is replaced with a `MemPurge`, the `VersionEdit` (which usually contains the new min log file number to pick up for recovery and the level 0 file path of the newly created SST file) is no longer appended to the manifest log, and every time the `deleteWal` method is called, a check is made on the list of immutable memtables.
This PR also includes a unit test that verifies that no data is lost upon Reopening of the database when the mempurge feature is activated. This extensive unit test includes two column families, with valid data contained in the imm() at time of "crash"/reopening (recovery).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8528
Reviewed By: pdillinger
Differential Revision: D29701097
Pulled By: bjlemaire
fbshipit-source-id: 072a900fb6ccc1edcf5eef6caf88f3060238edf9
Summary:
Some bits are mutated and read while holding a lock, other
immutable bits (esp. secondary cache compatibility) can be read by
arbitrary threads without holding a lock. AFAIK, this doesn't cause an
issue on any architecture we care about, because you will get some
legitimate version of the value that includes the initialization, as
long as synchronization guarantees the initialization happens before the
read.
I've only seen this in https://github.com/facebook/rocksdb/issues/8538 so far, but it should be fixed regardless.
Otherwise, we'll surely get these false reports again some time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8539
Test Plan: some local TSAN test runs and in CircleCI
Reviewed By: zhichao-cao
Differential Revision: D29720262
Pulled By: pdillinger
fbshipit-source-id: 365fd7e565577c648815161f71b339bcb5ce12d5
Summary:
Some cmake and test configuration are set in pre-steps
enviroment variables. Add the missing steps.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8524
Test Plan: CI pass
Reviewed By: siying
Differential Revision: D29682731
Pulled By: jay-zhuang
fbshipit-source-id: afda1acf6a7b76989db450442b0b27f387388b9d
Summary:
The removed function in this PR, just only have declared and dose not have any reference used.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8508
Reviewed By: mrambacher
Differential Revision: D29649033
Pulled By: jay-zhuang
fbshipit-source-id: df98143b73d6c184a2a60c9f7ea2548a065ee35d
Summary:
This PR is for https://github.com/facebook/rocksdb/issues/8453
We need to update `s = biter.status();` when `biter.status().IsIncomplete()` is true. By doing this, can fix the problem in issue.
Besides, we still need to update `db_statistics` in `get_context.ReportCounters()` before return back.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8485
Reviewed By: jay-zhuang
Differential Revision: D29604835
Pulled By: ajkr
fbshipit-source-id: c7f2f1cd058223ce1b507ec05d57cf264b9c9710
Summary:
Fixed a few MSVC (VCToolsVersion=14.0) build errors and warnings
* `DEFINE_string` is a macro and VC compiler complains that it cannot put [ifdef-inside-define](https://stackoverflow.com/questions/5586429/ifdef-inside-define)
* `sleep()` is not a recognizable function. Use `FLAGS_env->SleepForMicroseconds` instead
* Define precise type in comparison to avoid mismatch warning
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8519
Reviewed By: jay-zhuang
Differential Revision: D29683086
fbshipit-source-id: 8c80941472089f8daba84ae29597e75e603850e4
Summary:
Bumps [addressable](https://github.com/sporkmonger/addressable) from 2.7.0 to 2.8.0.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sporkmonger/addressable/blob/main/CHANGELOG.md">addressable's changelog</a>.</em></p>
<blockquote>
<h1>Addressable 2.8.0</h1>
<ul>
<li>fixes ReDoS vulnerability in Addressable::Template#match</li>
<li>no longer replaces <code>+</code> with spaces in queries for non-http(s) schemes</li>
<li>fixed encoding ipv6 literals</li>
<li>the <code>:compacted</code> flag for <code>normalized_query</code> now dedupes parameters</li>
<li>fix broken <code>escape_component</code> alias</li>
<li>dropping support for Ruby 2.0 and 2.1</li>
<li>adding Ruby 3.0 compatibility for development tasks</li>
<li>drop support for <code>rack-mount</code> and remove Addressable::Template#generate</li>
<li>performance improvements</li>
<li>switch CI/CD to GitHub Actions</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="6469a232c0"><code>6469a23</code></a> Updating gemspec again</li>
<li><a href="24336385de"><code>2433638</code></a> Merge branch 'main' of github.com:sporkmonger/addressable into main</li>
<li><a href="e9c76b8897"><code>e9c76b8</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/378">https://github.com/facebook/rocksdb/issues/378</a> from ashmaroli/flat-map</li>
<li><a href="56c5cf7ece"><code>56c5cf7</code></a> Update the gemspec</li>
<li><a href="c1fed1ca0a"><code>c1fed1c</code></a> Require a non-vulnerable rake</li>
<li><a href="0d8a3127e3"><code>0d8a312</code></a> Adding note about ReDoS vulnerability</li>
<li><a href="89c76130ce"><code>89c7613</code></a> Merge branch 'template-regexp' into main</li>
<li><a href="cf8884f815"><code>cf8884f</code></a> Note about alias fix</li>
<li><a href="bb03f7112e"><code>bb03f71</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sporkmonger/addressable/issues/371">https://github.com/facebook/rocksdb/issues/371</a> from charleystran/add_missing_encode_component_doc_entry</li>
<li><a href="6d1d8094a6"><code>6d1d809</code></a> Adding note about :compacted normalization</li>
<li>Additional commits viewable in <a href="https://github.com/sporkmonger/addressable/compare/addressable-2.7.0...addressable-2.8.0">compare view</a></li>
</ul>
</details>
<br />
[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=addressable&package-manager=bundler&previous-version=2.7.0&new-version=2.8.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.
[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)
---
<details>
<summary>Dependabot commands and options</summary>
<br />
You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts).
</details>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8515
Reviewed By: jay-zhuang
Differential Revision: D29668988
Pulled By: ajkr
fbshipit-source-id: c4b7abd4a879a7b562cb8ba745088dba6644f503
Summary:
MyRocks apparently uses valgrind to check for unreachable
unfreed data, which is stricter than our valgrind checks. Internal ref:
D29257815
This patch adds valgrind support to STATIC_AVOID_DESTRUCTION so that it's
not reported with those stricter checks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8503
Test Plan:
make valgrind_test
Also, with modified VALGRIND_OPTS (see Makefile), more kinds of
failures seen before than after this commit.
Reviewed By: ajkr, yizhang82
Differential Revision: D29597784
Pulled By: pdillinger
fbshipit-source-id: 360de157a176aec4d1be99ca20d160ecd47c0873
Summary:
1. Fix printing of stats when there are no writes (wamp=0). Previously had a div0 error
2. Added multireadrandom command as a valid target
3. Added ability to pass additional command line options to db_bench. Now can say things like benchmark.sh readrandom --mmap_read and the option will be passed to db_bench.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8346
Reviewed By: zhichao-cao
Differential Revision: D29500436
Pulled By: mrambacher
fbshipit-source-id: 54e90708aae9133be3a903e35efdf8f8abbd86fa
Summary:
The MemPurge output status can either be an Abort if the mempurge is aborted due to the new_mem memtable reaching more than the target capacity (currently 60%), or for other reasons. As a result, in the log, we want to differentiate between an abort status, which in this PR only leads to a ROCKS_LOG_INFO, and any other status, which in this PR leads to a ROCKS_LOG_WARN.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8514
Reviewed By: pdillinger
Differential Revision: D29662446
Pulled By: bjlemaire
fbshipit-source-id: c9bec8e238ebc7ecb14fbbddf580e6887e281c16
Summary:
When db is open as secondary, there are basically 2 step process:
1) Collect column families from wal log
2) Apply changes to Memtable
In case primary db is TransactionDB instance, wal log will contain some additional data, like noop, etc. ColumnFamilyCollector doesn't implement methods to handle these, so it fails to open a wal log written by TransactionDB. (Everything works fine with standard DB::Open).
Memtable recovery process knows how to handle such wal logs, so only missing piece seems to be ColumnFamilyCollector.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8456
Reviewed By: ajkr
Differential Revision: D29455945
Pulled By: mrambacher
fbshipit-source-id: 5b29560fcbc008e17e95d0dc4b07558f3d63e26f