Commit Graph

10814 Commits

Author SHA1 Message Date
Siddhartha Roychowdhury
fec4403ff1 Integrate WAL compression into log reader/writer. (#9642)
Summary:
Integrate the streaming compress/uncompress API into WAL compression.
The streaming compression object is stored in the log_writer along with a reusable output buffer to store the compressed buffer(s).
The streaming uncompress object is stored in the log_reader along with a reusable output buffer to store the uncompressed buffer(s).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9642

Test Plan:
Added unit tests to verify different scenarios - large buffers, split compressed buffers, etc.

Future optimizations:
The overhead for small records is quite high, so it makes sense to compress only buffers above a certain threshold and use a separate record type to indicate that those records are compressed.

Reviewed By: anand1976

Differential Revision: D34709167

Pulled By: sidroyc

fbshipit-source-id: a37a3cd1301adff6152fb3fcd23726106af07dd4
2022-03-09 15:49:53 -08:00
Yanqin Jin
565fcead22 Fix clang-analyze by adding assertion (#9682)
Summary:
Clang-analyze complains about potential nullptr dereference.
Fix by adding an assertion to make clang happy.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9682

Test Plan: USE_CLANG=1 make -j20 analyze_incremental

Reviewed By: ltamasi

Differential Revision: D34755210

Pulled By: riversand963

fbshipit-source-id: 948e1899846ee1aa05a1b500a11ff43b0b412e0a
2022-03-09 10:13:02 -08:00
Yanqin Jin
3b6dc049f7 Support user-defined timestamps in write-committed txns (#9629)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9629

Pessimistic transactions use pessimistic concurrency control, i.e. locking. Keys are
locked upon first operation that writes the key or has the intention of writing. For example,
`PessimisticTransaction::Put()`, `PessimisticTransaction::Delete()`,
`PessimisticTransaction::SingleDelete()` will write to or delete a key, while
`PessimisticTransaction::GetForUpdate()` is used by application to indicate
to RocksDB that the transaction has the intention of performing write operation later
in the same transaction.
Pessimistic transactions support two-phase commit (2PC). A transaction can be
`Prepared()`'ed and then `Commit()`. The prepare phase is similar to a promise: once
`Prepare()` succeeds, the transaction has acquired the necessary resources to commit.
The resources include locks, persistence of WAL, etc.
Write-committed transaction is the default pessimistic transaction implementation. In
RocksDB write-committed transaction, `Prepare()` will write data to the WAL as a prepare
section. `Commit()` will write a commit marker to the WAL and then write data to the
memtables. While writing to the memtables, different keys in the transaction's write batch
will be assigned different sequence numbers in ascending order.
Until commit/rollback, the transaction holds locks on the keys so that no other transaction
can write to the same keys. Furthermore, the keys' sequence numbers represent the order
in which they are committed and should be made visible. This is convenient for us to
implement support for user-defined timestamps.
Since column families with and without timestamps can co-exist in the same database,
a transaction may or may not involve timestamps. Based on this observation, we add two
optional members to each `PessimisticTransaction`, `read_timestamp_` and
`commit_timestamp_`. If no key in the transaction's write batch has timestamp, then
setting these two variables do not have any effect. For the rest of this commit, we discuss
only the cases when these two variables are meaningful.

read_timestamp_ is used mainly for validation, and should be set before first call to
`GetForUpdate()`. Otherwise, the latter will return non-ok status. `GetForUpdate()` calls
`TryLock()` that can verify if another transaction has written the same key since
`read_timestamp_` till this call to `GetForUpdate()`. If another transaction has indeed
written the same key, then validation fails, and RocksDB allows this transaction to
refine `read_timestamp_` by increasing it. Note that a transaction can still use `Get()`
with a different timestamp to read, but the result of the read should not be used to
determine data that will be written later.

commit_timestamp_ must be set after finishing writing and before transaction commit.
This applies to both 2PC and non-2PC cases. In the case of 2PC, it's usually set after
prepare phase succeeds.

We currently require that the commit timestamp be chosen after all keys are locked. This
means we disallow the `TransactionDB`-level APIs if user-defined timestamp is used
by the transaction. Specifically, calling `PessimisticTransactionDB::Put()`,
`PessimisticTransactionDB::Delete()`, `PessimisticTransactionDB::SingleDelete()`,
etc. will return non-ok status because they specify timestamps before locking the keys.
Users are also prompted to use the `Transaction` APIs when they receive the non-ok status.

Reviewed By: ltamasi

Differential Revision: D31822445

fbshipit-source-id: b82abf8e230216dc89cc519564a588224a88fd43
2022-03-08 16:20:59 -08:00
Hui Xiao
ca0ef54f16 Rate-limit automatic WAL flush after each user write (#9607)
Summary:
**Context:**
WAL flush is currently not rate-limited by `Options::rate_limiter`. This PR is to provide rate-limiting to auto WAL flush, the one that automatically happen after each user write operation (i.e, `Options::manual_wal_flush == false`), by adding `WriteOptions::rate_limiter_options`.

Note that we are NOT rate-limiting WAL flush that do NOT automatically happen after each user write, such as  `Options::manual_wal_flush == true + manual FlushWAL()` (rate-limiting multiple WAL flushes),  for the benefits of:
- being consistent with [ReadOptions::rate_limiter_priority](https://github.com/facebook/rocksdb/blob/7.0.fb/include/rocksdb/options.h#L515)
- being able to turn off some WAL flush's rate-limiting but not all (e.g, turn off specific the WAL flush of a critical user write like a service's heartbeat)

`WriteOptions::rate_limiter_options` only accept `Env::IO_USER` and `Env::IO_TOTAL` currently due to an implementation constraint.
- The constraint is that we currently queue parallel writes (including WAL writes) based on FIFO policy which does not factor rate limiter priority into this layer's scheduling. If we allow lower priorities such as `Env::IO_HIGH/MID/LOW` and such writes specified with lower priorities occurs before ones specified with higher priorities (even just by a tiny bit in arrival time), the former would have blocked the latter, leading to a "priority inversion" issue and contradictory to what we promise for rate-limiting priority. Therefore we only allow `Env::IO_USER` and `Env::IO_TOTAL`  right now before improving that scheduling.

A pre-requisite to this feature is to support operation-level rate limiting in `WritableFileWriter`, which is also included in this PR.

**Summary:**
- Renamed test suite `DBRateLimiterTest to DBRateLimiterOnReadTest` for adding a new test suite
- Accept `rate_limiter_priority` in `WritableFileWriter`'s private and public write functions
- Passed `WriteOptions::rate_limiter_options` to `WritableFileWriter` in the path of automatic WAL flush.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9607

Test Plan:
- Added new unit test to verify existing flush/compaction rate-limiting does not break, since `DBTest, RateLimitingTest` is disabled and current db-level rate-limiting tests focus on read only (e.g, `db_rate_limiter_test`, `DBTest2, RateLimitedCompactionReads`).
- Added new unit test `DBRateLimiterOnWriteWALTest, AutoWalFlush`
- `strace -ftt -e trace=write ./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -rate_limit_auto_wal_flush=1 -rate_limiter_bytes_per_sec=15 -rate_limiter_refill_period_us=1000000 -write_buffer_size=100000000 -disable_auto_compactions=1 -num=100`
   - verified that WAL flush(i.e, system-call _write_) were chunked into 15 bytes and each _write_ was roughly 1 second apart
   - verified the chunking disappeared when `-rate_limit_auto_wal_flush=0`
- crash test: `python3 tools/db_crashtest.py blackbox --disable_wal=0  --rate_limit_auto_wal_flush=1 --rate_limiter_bytes_per_sec=10485760 --interval=10` killed as normal

**Benchmarked on flush/compaction to ensure no performance regression:**
- compaction with rate-limiting  (see table 1, avg over 1280-run):  pre-change: **915635 micros/op**; post-change:
   **907350 micros/op (improved by 0.106%)**
```
#!/bin/bash
TEST_TMPDIR=/dev/shm/testdb
START=1
NUM_DATA_ENTRY=8
N=10

rm -f compact_bmk_output.txt compact_bmk_output_2.txt dont_care_output.txt
for i in $(eval echo "{$START..$NUM_DATA_ENTRY}")
do
    NUM_RUN=$(($N*(2**($i-1))))
    for j in $(eval echo "{$START..$NUM_RUN}")
    do
       ./db_bench --benchmarks=fillrandom -db=$TEST_TMPDIR -disable_auto_compactions=1 -write_buffer_size=6710886 > dont_care_output.txt && ./db_bench --benchmarks=compact -use_existing_db=1 -db=$TEST_TMPDIR -level0_file_num_compaction_trigger=1 -rate_limiter_bytes_per_sec=100000000 | egrep 'compact'
    done > compact_bmk_output.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' compact_bmk_output.txt >> compact_bmk_output_2.txt
done
```
- compaction w/o rate-limiting  (see table 2, avg over 640-run):  pre-change: **822197 micros/op**; post-change: **823148 micros/op (regressed by 0.12%)**
```
Same as above script, except that -rate_limiter_bytes_per_sec=0
```
- flush with rate-limiting (see table 3, avg over 320-run, run on the [patch](ee5c6023a9) to augment current db_bench ): pre-change: **745752 micros/op**; post-change: **745331 micros/op (regressed by 0.06 %)**
```
 #!/bin/bash
TEST_TMPDIR=/dev/shm/testdb
START=1
NUM_DATA_ENTRY=8
N=10

rm -f flush_bmk_output.txt flush_bmk_output_2.txt

for i in $(eval echo "{$START..$NUM_DATA_ENTRY}")
do
    NUM_RUN=$(($N*(2**($i-1))))
    for j in $(eval echo "{$START..$NUM_RUN}")
    do
       ./db_bench -db=$TEST_TMPDIR -write_buffer_size=1048576000 -num=1000000 -rate_limiter_bytes_per_sec=100000000 -benchmarks=fillseq,flush | egrep 'flush'
    done > flush_bmk_output.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' flush_bmk_output.txt >> flush_bmk_output_2.txt
done

```
- flush w/o rate-limiting (see table 4, avg over 320-run, run on the [patch](ee5c6023a9) to augment current db_bench): pre-change: **487512 micros/op**, post-change: **485856 micors/ops (improved by 0.34%)**
```
Same as above script, except that -rate_limiter_bytes_per_sec=0
```

| table 1 - compact with rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op  (%)
-- | -- | -- | -- | -- | --
10 | 896978 | 16046.9 | 901242 | 15670.9 | 0.475373978
20 | 893718 | 15813 | 886505 | 17544.7 | -0.8070778478
40 | 900426 | 23882.2 | 894958 | 15104.5 | -0.6072681153
80 | 906635 | 21761.5 | 903332 | 23948.3 | -0.3643141948
160 | 898632 | 21098.9 | 907583 | 21145 | 0.9960695813
3.20E+02 | 905252 | 22785.5 | 908106 | 25325.5 | 0.3152713278
6.40E+02 | 905213 | 23598.6 | 906741 | 21370.5 | 0.1688000504
**1.28E+03** | **908316** | **23533.1** | **907350** | **24626.8** | **-0.1063506533**
average over #-run | 901896.25 | 21064.9625 | 901977.125 | 20592.025 | 0.008967217682

| table 2 - compact w/o rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op  (%)
-- | -- | -- | -- | -- | --
10 | 811211 | 26996.7 | 807586 | 28456.4 | -0.4468627768
20 | 815465 | 14803.7 | 814608 | 28719.7 | -0.105093413
40 | 809203 | 26187.1 | 797835 | 25492.1 | -1.404839082
80 | 822088 | 28765.3 | 822192 | 32840.4 | 0.01265071379
160 | 821719 | 36344.7 | 821664 | 29544.9 | -0.006693285661
3.20E+02 | 820921 | 27756.4 | 821403 | 28347.7 | 0.05871454135
**6.40E+02** | **822197** | **28960.6** | **823148** | **30055.1** | **0.1156657103**
average over #-run | 8.18E+05 | 2.71E+04 | 8.15E+05 | 2.91E+04 |  -0.25

| table 3 - flush with rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op  (%)
-- | -- | -- | -- | -- | --
10 | 741721 | 11770.8 | 740345 | 5949.76 | -0.1855144994
20 | 735169 | 3561.83 | 743199 | 9755.77 | 1.09226586
40 | 743368 | 8891.03 | 742102 | 8683.22 | -0.1703059588
80 | 742129 | 8148.51 | 743417 | 9631.58| 0.1735547324
160 | 749045 | 9757.21 | 746256 | 9191.86 | -0.3723407806
**3.20E+02** | **745752** | **9819.65** | **745331** | **9840.62** | **-0.0564530836**
6.40E+02 | 749006 | 11080.5 | 748173 | 10578.7 | -0.1112140624
average over #-run | 743741.4286 | 9004.218571 | 744117.5714 | 9090.215714 | 0.05057441238

| table 4 - flush w/o rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op (%)
-- | -- | -- | -- | -- | --
10 | 477283 | 24719.6 | 473864 | 12379 | -0.7163464863
20 | 486743 | 20175.2 | 502296 | 23931.3 | 3.195320734
40 | 482846 | 15309.2 | 489820 | 22259.5 | 1.444352858
80 | 491490 | 21883.1 | 490071 | 23085.7 | -0.2887139108
160 | 493347 | 28074.3 | 483609 | 21211.7 | -1.973864238
**3.20E+02** | **487512** | **21401.5** | **485856** | **22195.2** | **-0.3396839462**
6.40E+02 | 490307 | 25418.6 | 485435 | 22405.2 | -0.9936631539
average over #-run | 4.87E+05 | 2.24E+04 | 4.87E+05 | 2.11E+04 | 0.00E+00

Reviewed By: ajkr

Differential Revision: D34442441

Pulled By: hx235

fbshipit-source-id: 4790f13e1e5c0a95ae1d1cc93ffcf69dc6e78bdd
2022-03-08 13:19:39 -08:00
Ezgi Çiçek
27d6ef8e60 Rename mutable_cf_options to signify explicity copy (#9666)
Summary:
Signify explicit copy with comment and better name for variable `mutable_cf_options`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9666

Reviewed By: riversand963

Differential Revision: D34680934

Pulled By: ezgicicek

fbshipit-source-id: b64ef18725fe523835d14ceb4b29bcdfe493f8ed
2022-03-08 11:26:40 -08:00
GuKaifeng
c967436453 remove redundant assignment code for member state (#9665)
Summary:
Remove redundant assignment code for member `state` in the constructor of `ImmutableDBOptions`.
There are two identical and redundant statements `stats = statistics.get();` in lines 740 and 748 of the code.
This commit removed the line 740.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9665

Reviewed By: ajkr

Differential Revision: D34686649

Pulled By: riversand963

fbshipit-source-id: 8f246ece382b6845528f4e2c843ce09bb66b2b0f
2022-03-08 11:03:56 -08:00
Peter Dillinger
4a9ae4f713 Avoid .trash handling race in db_stress Checkpoint (#9673)
Summary:
The shared SstFileManager in db_stress can create background
work that races with TestCheckpoint such that DestroyDir fails because
of file rename while it is running. Analogous to change already made
for TestBackupRestore

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9673

Test Plan:
make blackbox_crash_test for a while with
checkpoint_one_in=100

Reviewed By: ajkr

Differential Revision: D34702215

Pulled By: pdillinger

fbshipit-source-id: ac3e166efa28cba6c6f4b9b391e799394603ebfd
2022-03-08 08:36:25 -08:00
Jay Zhuang
36aec94d85 compression_per_level should be used for flush and changeable (#9658)
Summary:
- Make `compression_per_level` dynamical changeable with `SetOptions`;
- Fix a bug that `compression_per_level` is not used for flush;

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9658

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D34700749

Pulled By: jay-zhuang

fbshipit-source-id: a23b9dfa7ad03d393c1d71781d19e91de796f49c
2022-03-07 18:06:19 -08:00
Peter Dillinger
9b8b8b1504 Remove remaining SKIP_LINK=1 in circleci config (#9669)
Summary:
Should be unnecessary

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9669

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D34689277

Pulled By: pdillinger

fbshipit-source-id: 5d44de1f851503fd1777b869c06c330f3c4deade
2022-03-07 15:23:25 -08:00
Yanqin Jin
785b804a9a Update Githubpages version (#9670)
Summary:
According to https://pages.github.com/versions/, bump the version from 209 to
225 to address https://github.com/facebook/rocksdb/security/dependabot/2

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9670

Test Plan:
```
cd docs && bundle check
```

Reviewed By: ajkr

Differential Revision: D34690813

Pulled By: riversand963

fbshipit-source-id: c9b5fb8a5e3f2a176672480bcb4068befa3e2158
2022-03-07 14:48:06 -08:00
anand76
7574841aac Fix issue #9627 (#9657)
Summary:
SMB mounts do not support hard links. The ENOTSUP error code is
returned, which should be interpreted by PosixFileSystem as
IOStatus::NotSupported().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9657

Reviewed By: mrambacher

Differential Revision: D34634783

Pulled By: anand1976

fbshipit-source-id: 0d57f5b2e6118e4c20e9ed1a293327428c3aecac
2022-03-07 11:39:31 -08:00
Adam Retter
dab19afe56 Fix RocksJava releases for macOS (#9662)
Summary:
Addresses the problems described in https://github.com/facebook/rocksdb/pull/9254#issuecomment-1054598516 and https://github.com/facebook/rocksdb/pull/9254#issuecomment-1059574837 that have blocked a RocksJava release

**NOTE** Also needs to be ported to 6.29.fb branch.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9662

Reviewed By: ajkr

Differential Revision: D34689200

Pulled By: pdillinger

fbshipit-source-id: c62fe34c54f05be5a00ee1daec8ec7454baa5eb8
2022-03-07 10:50:52 -08:00
Dmitry Vinnik
f20b674796 Adding Social Banner in Support of Ukraine (#9652)
Summary:
Our mission at [Meta Open Source](https://opensource.facebook.com/) is to empower communities through open source, and we believe that it means building a welcoming and safe environment for all. As a part of this work, we are adding this banner in support for Ukraine during this crisis.

## Testing
<img width="1080" alt="image" src="https://user-images.githubusercontent.com/12485205/156454047-9c153135-f3a6-41f7-adbe-8139759565ae.png">

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9652

Reviewed By: jay-zhuang

Differential Revision: D34647211

Pulled By: dmitryvinn-fb

fbshipit-source-id: b89cdc7eafcc58b1f503ee8e1939e43bffcb3b3f
2022-03-04 14:51:59 -08:00
Peter Dillinger
ce60d0cbe5 Test refactoring for Backups+Temperatures (#9655)
Summary:
In preparation for more support for file Temperatures in BackupEngine,
this change does some test refactoring:
* Move DBTest2::BackupFileTemperature test to
BackupEngineTest::FileTemperatures, with some updates to make it work
in the new home. This test will soon be expanded for deeper backup work.
* Move FileTemperatureTestFS from db_test2.cc to db_test_util.h, to
support sharing because of above moved test, but split off the "no link"
part to the test needing it.
* Use custom FileSystems in backupable_db_test rather than custom Envs,
because going through Env file interfaces doesn't support temperatures.
* Fix RemapFileSystem to map DirFsyncOptions::renamed_new_name
parameter to FsyncWithDirOptions, which was required because this
limitation caused a crash only after moving to higher fidelity of
FileSystem interface (vs. LegacyDirectoryWrapper throwing away some
parameter details)
* `backupable_options_` -> `engine_options_` as part of the ongoing
work to get rid of the obsolete "backupable" naming.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9655

Test Plan: test code updates only

Reviewed By: jay-zhuang

Differential Revision: D34622183

Pulled By: pdillinger

fbshipit-source-id: f24b7a596a89b9e089e960f4e5d772575513e93f
2022-03-04 12:32:30 -08:00
Hui Xiao
fc61e98ae6 Attempt to deflake DBLogicalBlockSizeCacheTest.CreateColumnFamilies (#9516)
Summary:
**Context:**
`DBLogicalBlockSizeCacheTest.CreateColumnFamilies` is flaky on a rare occurrence of assertion failure below
```
db/db_logical_block_size_cache_test.cc:210
Expected equality of these values:
  1
  cache_->GetRefCount(cf_path_0_)
    Which is: 2
```

Root-cause: `ASSERT_OK(db->DestroyColumnFamilyHandle(cfs[0]));` in the test may not successfully decrease the ref count of `cf_path_0_` since the decreasing only happens in the clean-up of `ColumnFamilyData` when `ColumnFamilyData` has no referencing to it, which may not be true when `db->DestroyColumnFamilyHandle(cfs[0])` is called since background work such as `DumpStats()` can hold reference to that `ColumnFamilyData` (suggested and repro-d by ajkr ). Similar case `ASSERT_OK(db->DestroyColumnFamilyHandle(cfs[1]));`.

See following for a deterministic repro:
```
 diff --git a/db/db_impl/db_impl.cc b/db/db_impl/db_impl.cc
index 196b428a3..4e7a834c4 100644
 --- a/db/db_impl/db_impl.cc
+++ b/db/db_impl/db_impl.cc
@@ -956,10 +956,16 @@ void DBImpl::DumpStats() {
         // near-atomically.
         // Get a ref before unlocking
         cfd->Ref();
+        if (cfd->GetName() == "cf1" || cfd->GetName() == "cf2") {
+          TEST_SYNC_POINT("DBImpl::DumpStats:PostCFDRef");
+        }
         {
           InstrumentedMutexUnlock u(&mutex_);
           cfd->internal_stats()->CollectCacheEntryStats(/*foreground=*/false);
         }
+        if (cfd->GetName() == "cf1" || cfd->GetName() == "cf2") {
+          TEST_SYNC_POINT("DBImpl::DumpStats::PreCFDUnrefAndTryDelete");
+        }
         cfd->UnrefAndTryDelete();
       }
     }
 diff --git a/db/db_logical_block_size_cache_test.cc b/db/db_logical_block_size_cache_test.cc
index 1057871c9..c3872c036 100644
 --- a/db/db_logical_block_size_cache_test.cc
+++ b/db/db_logical_block_size_cache_test.cc
@@ -9,6 +9,7 @@
 #include "env/io_posix.h"
 #include "rocksdb/db.h"
 #include "rocksdb/env.h"
+#include "test_util/sync_point.h"

 namespace ROCKSDB_NAMESPACE {
 class EnvWithCustomLogicalBlockSizeCache : public EnvWrapper {
@@ -183,6 +184,15 @@ TEST_F(DBLogicalBlockSizeCacheTest, CreateColumnFamilies) {
   ASSERT_EQ(1, cache_->GetRefCount(dbname_));

   std::vector<ColumnFamilyHandle*> cfs;
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
+      {{"DBLogicalBlockSizeCacheTest::CreateColumnFamilies::PostSetupTwoCFH",
+        "DBImpl::DumpStats:StartRunning"},
+       {"DBImpl::DumpStats:PostCFDRef",
+        "DBLogicalBlockSizeCacheTest::CreateColumnFamilies::PreDeleteTwoCFH"},
+       {"DBLogicalBlockSizeCacheTest::CreateColumnFamilies::"
+        "PostFinishCheckingRef",
+        "DBImpl::DumpStats::PreCFDUnrefAndTryDelete"}});
   ASSERT_OK(db->CreateColumnFamilies(cf_options, {"cf1", "cf2"}, &cfs));
   ASSERT_EQ(2, cache_->Size());
   ASSERT_TRUE(cache_->Contains(dbname_));
@@ -190,7 +200,7 @@ TEST_F(DBLogicalBlockSizeCacheTest, CreateColumnFamilies) {
   ASSERT_TRUE(cache_->Contains(cf_path_0_));
   ASSERT_EQ(2, cache_->GetRefCount(cf_path_0_));
   }

    // Delete one handle will not drop cache because another handle is still
   // referencing cf_path_0_.
+  TEST_SYNC_POINT(
+      "DBLogicalBlockSizeCacheTest::CreateColumnFamilies::PostSetupTwoCFH");
+  TEST_SYNC_POINT(
+      "DBLogicalBlockSizeCacheTest::CreateColumnFamilies::PreDeleteTwoCFH");
   ASSERT_OK(db->DestroyColumnFamilyHandle(cfs[0]));
   ASSERT_EQ(2, cache_->Size());
   ASSERT_TRUE(cache_->Contains(dbname_));
@@ -209,16 +221,20 @@ TEST_F(DBLogicalBlockSizeCacheTest, CreateColumnFamilies) {
   ASSERT_TRUE(cache_->Contains(cf_path_0_));
    // Will fail
   ASSERT_EQ(1, cache_->GetRefCount(cf_path_0_));

   // Delete the last handle will drop cache.
   ASSERT_OK(db->DestroyColumnFamilyHandle(cfs[1]));
   ASSERT_EQ(1, cache_->Size());
   ASSERT_TRUE(cache_->Contains(dbname_));
   // Will fail
   ASSERT_EQ(1, cache_->GetRefCount(dbname_));

+  TEST_SYNC_POINT(
+      "DBLogicalBlockSizeCacheTest::CreateColumnFamilies::"
+      "PostFinishCheckingRef");
   delete db;
   ASSERT_EQ(0, cache_->Size());
   ASSERT_OK(DestroyDB(dbname_, options,
       {{"cf1", cf_options}, {"cf2", cf_options}}));
+  ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
 }
```

**Summary**
- Removed the flaky assertion
- Clarified the comments for the test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9516

Test Plan:
- CI
- Monitor for future flakiness

Reviewed By: ajkr

Differential Revision: D34055232

Pulled By: hx235

fbshipit-source-id: 9bf83ae5fa88bf6fc829876494d4692082e4c357
2022-03-04 11:35:28 -08:00
Hui Xiao
4a776d81cc Dynamic toggling of BlockBasedTableOptions::detect_filter_construct_corruption (#9654)
Summary:
**Context/Summary:**
As requested, `BlockBasedTableOptions::detect_filter_construct_corruption` can now be dynamically configured using `DB::SetOptions` after this PR

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9654

Test Plan: - New unit test

Reviewed By: pdillinger

Differential Revision: D34622609

Pulled By: hx235

fbshipit-source-id: c06773ef3d029e6bf1724d3a72dffd37a8ec66d9
2022-03-04 10:35:08 -08:00
anand76
3362a730dc Avoid usage of ReopenWritableFile in db_stress (#9649)
Summary:
The UniqueIdVerifier constructor currently calls ReopenWritableFile on
the FileSystem, which might not be supported. Instead of relying on
reopening the unique IDs file for writing, create a new file and copy
the original contents.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9649

Test Plan: Run db_stress

Reviewed By: pdillinger

Differential Revision: D34572307

Pulled By: anand1976

fbshipit-source-id: 3a777908582d79dae57488d4278bad126774f698
2022-03-04 10:30:10 -08:00
Jay Zhuang
67542bfab5 Improve build speed (#9605)
Summary:
Improve the CI build speed:
- split the macos tests to 2 parallel jobs
- split tsan tests to 2 parallel jobs
- move non-shm tests to nightly build
- slow jobs use lager machine
- fast jobs use smaller machine
- add microbench to no-test jobs
- add run-microbench to nightly build

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9605

Reviewed By: riversand963

Differential Revision: D34358982

Pulled By: jay-zhuang

fbshipit-source-id: d5091b3f4ef6d25c5c37920fb614f3342ee60e4a
2022-03-03 11:58:51 -08:00
Yanqin Jin
659a16d52b Fix bug causing incorrect data returned by snapshot read (#9648)
Summary:
This bug affects use cases that meet the following conditions
- (has only the default column family or disables WAL) and
- has at least one event listener
- atomic flush is NOT affected.

If the above conditions meet, then RocksDB can release the db mutex before picking all the
existing memtables to flush. In the meantime, a snapshot can be created and db's sequence
number can still be incremented. The upcoming flush will ignore this snapshot.
A later read using this snapshot can return incorrect result.

To fix this issue, we call the listeners callbacks after picking the memtables so that we avoid
creating snapshots during this interval.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9648

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D34555456

Pulled By: riversand963

fbshipit-source-id: 1438981e9f069a5916686b1a0ad7627f734cf0ee
2022-03-02 21:03:14 -08:00
Yuriy Chernyshov
73fd589b1a Do not rely on ADL when invoking std::max_element (#9608)
Summary:
Certain STLs use raw pointers and ADL does not work for them.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9608

Reviewed By: ajkr

Differential Revision: D34583012

Pulled By: riversand963

fbshipit-source-id: 7de6bbc8a080c3e7243ce0d758fe83f1663168aa
2022-03-02 17:41:02 -08:00
jingkai.yuan
926ee13811 Fix corruption error when compressing blob data with zlib. (#9572)
Summary:
The plain data length may not be big enough if the compression actually expands data. So use deflateBound() to get the upper limit on the compressed output before deflate().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9572

Reviewed By: riversand963

Differential Revision: D34326475

Pulled By: ajkr

fbshipit-source-id: 4b679cb7a83a62782a127785b4d5eb9aa4646449
2022-03-02 16:35:21 -08:00
Jay Zhuang
db8647969d Unschedule manual compaction from thread-pool queue (#9625)
Summary:
PR https://github.com/facebook/rocksdb/issues/9557 introduced a race condition between manual compaction
foreground thread and background compaction thread.
This PR adds the ability to really unschedule manual compaction from
thread-pool queue by differentiate tag name for manual compaction and
other tasks.
Also fix an issue that db `close()` didn't cancel the manual compaction thread.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9625

Test Plan: unittest not hang

Reviewed By: ajkr

Differential Revision: D34410811

Pulled By: jay-zhuang

fbshipit-source-id: cb14065eabb8cf1345fa042b5652d4f788c0c40c
2022-03-02 13:43:00 -08:00
Akanksha Mahajan
d74468e348 Update Poll and ReadAsync API in File System (#9623)
Summary:
Update the signature of Poll and ReadAsync APIs in filesystem.
Instead of unique_ptr, void** will be passed as io_handle and the delete function.
io_handle and delete function should be provided by underlying
FileSystem and its lifetime will be maintained by RocksDB. io_handle
will be deleted by RocksDB once callback is made to update the results or Poll is
called to get the results.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9623

Test Plan: Add a new unit test.

Reviewed By: anand1976

Differential Revision: D34403529

Pulled By: akankshamahajan15

fbshipit-source-id: ea185a5f4c7bec334631e4f781ea7ba4135645f0
2022-03-01 17:11:42 -08:00
Patrick Somaru
ff8763c187 regenerate config jsons, reduce noise (#9644)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9644

Reviewed By: jay-zhuang

Differential Revision: D34543778

fbshipit-source-id: eae5f2c0ced4c11d365d0049bdb288598e364e8f
2022-03-01 15:09:45 -08:00
Patrick Somaru
af6cb50bc4 update buckifier for new json format and updated macros (#9643)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9643

Reviewed By: jay-zhuang

Differential Revision: D34543573

fbshipit-source-id: fec0c81ece37ca5eb958cef13ac9657cca6338b7
2022-03-01 15:09:45 -08:00
sdong
33742c2a9f Remove BlockBasedTableOptions.hash_index_allow_collision (#9454)
Summary:
BlockBasedTableOptions.hash_index_allow_collision is already deprecated and has no effect. Delete it for preparing 7.0 release.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9454

Test Plan: Run all existing tests.

Reviewed By: ajkr

Differential Revision: D33805827

fbshipit-source-id: ed8a436d1d083173ec6aef2a762ba02e1eefdc9d
2022-03-01 13:58:02 -08:00
Jonathan Albrecht
3edbeeaa50 Reenable s390x platform_dependent travis job (#9631)
Summary:
Fix g++ -march=native detection and reenable s390x in travis

This PR fixes s390x assembler messages:
```
Error: invalid switch -march=z14
Error: unrecognized option -march=z14
```

The s390x travis build was failing with gcc-7 because the assembler on
ubuntu 16.04 is too old to recognize the z14 model so it doesn't work
with -march=native on a z14 machine. It fixes the check for the
-march=native flag so that the assembler will get called and correctly
fail on ubuntu 16.04 which will cause the build to fall back to
-march=z196 which works.

The other changes are needed so builds work more consistently on
s390x:

1. Set make parallelism to 1 for s390x: The default was 4 previously
but I saw frequent internal compiler errors on travis probably due to
low resources. The `platform_dependent` job works more consistently
but is roughly 10 minutes slower although it varies.
2. Remove status_checked jobs, as we are relying on CircleCI for
these now and do not really need platform coverage on them.

Fixes https://github.com/facebook/rocksdb/issues/9524

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9631

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D34553989

Pulled By: pdillinger

fbshipit-source-id: a6e3a7276446721c4c0bebc4ed217c2ca2b53f11
2022-03-01 13:50:41 -08:00
dependabot[bot]
9e9e3d16b9 Bump nokogiri from 1.12.5 to 1.13.3 in /docs (#9636)
Summary:
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.12.5 to 1.13.3.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p>
<blockquote>
<h2>1.13.3 / 2022-02-21</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] Revert a HTML4 parser bug in libxml 2.9.13 (introduced in Nokogiri v1.13.2). The bug causes libxml2's HTML4 parser to fail to recover when encountering a bare <code>&lt;</code> character in some contexts. This version of Nokogiri restores the earlier behavior, which is to recover from the parse error and treat the <code>&lt;</code> as normal character data (which will be serialized as <code>&amp;lt;</code> in a text node). The bug (and the fix) is only relevant when the <code>RECOVER</code> parse option is set, as it is by default. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2461">https://github.com/facebook/rocksdb/issues/2461</a>]</li>
</ul>
<hr />
<p>SHA256 checksums:</p>
<pre><code>025a4e333f6f903072a919f5f75b03a8f70e4969dab4280375b73f9d8ff8d2c0  nokogiri-1.13.3-aarch64-linux.gem
b9cb59c6a6da8cf4dbee5dbb569c7cc95a6741392e69053544e0f40b15ab9ad5  nokogiri-1.13.3-arm64-darwin.gem
e55d18cee64c19d51d35ad80634e465dbcdd46ac4233cb42c1e410307244ebae  nokogiri-1.13.3-java.gem
53e2d68116cd00a873406b8bdb90c78a6f10e00df7ddf917a639ac137719b67b  nokogiri-1.13.3-x64-mingw-ucrt.gem
b5f39ebb662a1be7d1c61f8f0a2a683f1bb11690a6f00a99a1aa23a071f80145  nokogiri-1.13.3-x64-mingw32.gem
7c0de5863aace4bbbc73c4766cf084d1f0b7a495591e46d1666200cede404432  nokogiri-1.13.3-x86-linux.gem
675cc3e7d7cca0d6790047a062cd3aa3eab59e3cb9b19374c34f98bade588c66  nokogiri-1.13.3-x86-mingw32.gem
f445596a5a76941a9d1980747535ab50d3399d1b46c32989bc26b7dd988ee498  nokogiri-1.13.3-x86_64-darwin.gem
3f6340661c2a283b337d227ea224f859623775b2f5c09a6bf197b786563958df  nokogiri-1.13.3-x86_64-linux.gem
bf1b1bceff910abb0b7ad825535951101a0361b859c2ad1be155c010081ecbdc  nokogiri-1.13.3.gem
</code></pre>
<h2>1.13.2 / 2022-02-21</h2>
<h3>Security</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-23308">CVE-2022-23308</a>.</li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2021-30560">CVE-2021-30560</a>.</li>
</ul>
<p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-fq42-c5rg-92c2">GHSA-fq42-c5rg-92c2</a> for more information about these CVEs.</p>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. Full changelog is available at <a href="https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news">https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news</a></li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. Full changelog is available at <a href="https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news">https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news</a></li>
</ul>
<hr />
<p>SHA256 checksums:</p>
<pre><code>63469a9bb56a21c62fbaea58d15f54f8f167ff6fde51c5c2262072f939926fdd  nokogiri-1.13.2-aarch64-linux.gem
2986617f982f645c06f22515b721e6d2613dd69493e5c41ddd03c4830c3b3065  nokogiri-1.13.2-arm64-darwin.gem
aca1d66206740b29d0d586b1d049116adcb31e6cdd7c4dd3a96eb77da215a0c4  nokogiri-1.13.2-java.gem
b9e4eea1a200d9a927a5bc7d662c427e128779cba0098ea49ddbdb3ffc3ddaec  nokogiri-1.13.2-x64-mingw-ucrt.gem
48d5493fec495867c5516a908a068c1387a1d17c5aeca6a1c98c089d9d9fdcf8  nokogiri-1.13.2-x64-mingw32.gem
62034d7aaaa83fbfcb8876273cc5551489396841a66230d3200b67919ef76cf9  nokogiri-1.13.2-x86-linux.gem
e07237b82394017c2bfec73c637317ee7dbfb56e92546151666abec551e46d1d  nokogiri-1.13.2-x86-mingw32.gem
&lt;/tr&gt;&lt;/table&gt;
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md">nokogiri's changelog</a>.</em></p>
<blockquote>
<h2>1.13.3 / 2022-02-21</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] Revert a HTML4 parser bug in libxml 2.9.13 (introduced in Nokogiri v1.13.2). The bug causes libxml2's HTML4 parser to fail to recover when encountering a bare <code>&lt;</code> character in some contexts. This version of Nokogiri restores the earlier behavior, which is to recover from the parse error and treat the <code>&lt;</code> as normal character data (which will be serialized as <code>&amp;lt;</code> in a text node). The bug (and the fix) is only relevant when the <code>RECOVER</code> parse option is set, as it is by default. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2461">https://github.com/facebook/rocksdb/issues/2461</a>]</li>
</ul>
<h2>1.13.2 / 2022-02-21</h2>
<h3>Security</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-23308">CVE-2022-23308</a>.</li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. This update addresses <a href="https://nvd.nist.gov/vuln/detail/CVE-2021-30560">CVE-2021-30560</a>.</li>
</ul>
<p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-fq42-c5rg-92c2">GHSA-fq42-c5rg-92c2</a> for more information about these CVEs.</p>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] Vendored libxml2 is updated from 2.9.12 to 2.9.13. Full changelog is available at <a href="https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news">https://download.gnome.org/sources/libxml2/2.9/libxml2-2.9.13.news</a></li>
<li>[CRuby] Vendored libxslt is updated from 1.1.34 to 1.1.35. Full changelog is available at <a href="https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news">https://download.gnome.org/sources/libxslt/1.1/libxslt-1.1.35.news</a></li>
</ul>
<h2>1.13.1 / 2022-01-13</h2>
<h3>Fixed</h3>
<ul>
<li>Fix <code>Nokogiri::XSLT.quote_params</code> regression in v1.13.0 that raised an exception when non-string stylesheet parameters were passed. Non-string parameters (e.g., integers and symbols) are now explicitly supported and both keys and values will be stringified with <code>#to_s</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2418">https://github.com/facebook/rocksdb/issues/2418</a>]</li>
<li>Fix CSS selector query regression in v1.13.0 that raised an <code>Nokogiri::XML::XPath::SyntaxError</code> when parsing XPath attributes mixed into the CSS query. Although this mash-up of XPath and CSS syntax previously worked unintentionally, it is now an officially supported feature and is documented as such. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2419">https://github.com/facebook/rocksdb/issues/2419</a>]</li>
</ul>
<h2>1.13.0 / 2022-01-06</h2>
<h3>Notes</h3>
<h4>Ruby</h4>
<p>This release introduces native gem support for Ruby 3.1. Please note that Windows users should use the <code>x64-mingw-ucrt</code> platform gem for Ruby 3.1, and <code>x64-mingw32</code> for Ruby 2.6–3.0 (see <a href="https://rubyinstaller.org/2021/12/31/rubyinstaller-3.1.0-1-released.html">RubyInstaller 3.1.0 release notes</a>).</p>
<p>This release ends support for:</p>
<ul>
<li>Ruby 2.5, for which <a href="https://www.ruby-lang.org/en/downloads/branches/">official support ended 2021-03-31</a>.</li>
<li>JRuby 9.2, which is a Ruby 2.5-compatible release.</li>
</ul>
<h4>Faster, more reliable installation: Native Gem for ARM64 Linux</h4>
<p>This version of Nokogiri ships experimental native gem support for the <code>aarch64-linux</code> platform, which should support AWS Graviton and other ARM Linux platforms. We don't yet have CI running for this platform, and so we're interested in hearing back from y'all whether this is working, and what problems you're seeing. Please send us feedback here: <a href="https://github.com/sparklemotion/nokogiri/discussions/2359">Feedback: Have you used the <code>aarch64-linux</code> native gem?</a></p>

</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="7d74cedf27"><code>7d74ced</code></a> version bump to v1.13.3</li>
<li><a href="5970fd95c8"><code>5970fd9</code></a> fix: revert libxml2 regression with HTML4 recovery</li>
<li><a href="49b86631b7"><code>49b8663</code></a> version bump to v1.13.2</li>
<li><a href="4729133787"><code>4729133</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2457">https://github.com/facebook/rocksdb/issues/2457</a> from sparklemotion/flavorjones-libxml-2.9.13-v1.13.x</li>
<li><a href="379f757ef5"><code>379f757</code></a> dev(package): work around gnome mirrors with expired certs</li>
<li><a href="95cf66ca9f"><code>95cf66c</code></a> dep: upgrade libxml2 2.9.12 → 2.9.13</li>
<li><a href="d37dd02ea5"><code>d37dd02</code></a> dep: upgrade libxslt 1.1.34 → 1.1.35</li>
<li><a href="59a93986ec"><code>59a9398</code></a> dep: upgrade mini_portile 2.7 to 2.8</li>
<li><a href="e885463285"><code>e885463</code></a> dev(package): handle either .tar.gz or .tar.xz archive names</li>
<li><a href="7957c7b009"><code>7957c7b</code></a> style: rubocop</li>
<li>Additional commits viewable in <a href="https://github.com/sparklemotion/nokogiri/compare/v1.12.5...v1.13.3">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nokogiri&package-manager=bundler&previous-version=1.12.5&new-version=1.13.3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

 ---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts).

</details>

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9636

Reviewed By: anand1976

Differential Revision: D34556272

Pulled By: jay-zhuang

fbshipit-source-id: 76aa7e92ca3fcf5d34c53091b94bfe5b0af7b55d
2022-03-01 11:24:16 -08:00
ehds@qq.com
d95e13e9cc typo(clock_cache) fix incomplete message typo (#9638)
Summary:
`LRU` should be `CLOCK`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9638

Reviewed By: mrambacher

Differential Revision: D34523550

Pulled By: jay-zhuang

fbshipit-source-id: ca06ada1aac45d3707016c1590541287dab6ef79
2022-03-01 10:57:09 -08:00
Jay Zhuang
e3ef41b02f Use released clang-format instead of the one from dev branch (#9646)
Summary:
We should use the released clang-format version instead of the one from
dev branch. Otherwise the format report could be inconsistent with local
development env and CI. e.g.: https://github.com/facebook/rocksdb/issues/9644

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9646

Test Plan: CI

Reviewed By: riversand963

Differential Revision: D34554065

Pulled By: jay-zhuang

fbshipit-source-id: b841bc400becb4272be18c803eb03a7a1172da6f
2022-03-01 10:51:38 -08:00
Si Ke
06c8afeff5 Fix pointer to jlong conversion in 32 bit OS (#9396)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9396

Reviewed By: jay-zhuang

Differential Revision: D34529654

Pulled By: pdillinger

fbshipit-source-id: cf62152ba86b02f9ffa7780f370ad49089e56a0b
2022-03-01 09:02:15 -08:00
Adam Retter
7d7e88c7d1 Improve build detect for RISCV (#9366)
Summary:
Related to: https://github.com/facebook/rocksdb/pull/9215

* Adds build_detect_platform support for RISCV on Linux (at least on SiFive Unmatched platforms)

This still leaves some linking issues on RISCV remaining (e.g. when building `db_test`):
```
/usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `__gnu_cxx::new_allocator<char>::deallocate(char*, unsigned long)':
/usr/include/c++/10/ext/new_allocator.h:133: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(memtable.o): in function `std::__atomic_base<bool>::compare_exchange_weak(bool&, bool, std::memory_order, std::memory_order)':
/usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: /usr/include/c++/10/bits/atomic_base.h:464: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(memtable.o):/usr/include/c++/10/bits/atomic_base.h:464: more undefined references to `__atomic_compare_exchange_1' follow
/usr/bin/ld: ./librocksdb_debug.a(db_impl.o): in function `rocksdb::DBImpl::NewIteratorImpl(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, unsigned long, rocksdb::ReadCallback*, bool, bool)':
/home/adamretter/rocksdb/db/db_impl/db_impl.cc:3019: undefined reference to `__atomic_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::Writer::CreateMutex()':
/home/adamretter/rocksdb/./db/write_thread.h:205: undefined reference to `__atomic_compare_exchange_1'
/usr/bin/ld: ./librocksdb_debug.a(write_thread.o): in function `rocksdb::WriteThread::SetState(rocksdb::WriteThread::Writer*, unsigned char)':
/home/adamretter/rocksdb/db/write_thread.cc:222: undefined reference to `__atomic_compare_exchange_1'
collect2: error: ld returned 1 exit status
make: *** [Makefile:1449: db_test] Error 1
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9366

Reviewed By: jay-zhuang

Differential Revision: D34377664

Pulled By: mrambacher

fbshipit-source-id: c86f9d0cd1cb0c18de72b06f1bf5847f23f51118
2022-03-01 04:24:54 -08:00
Andrew Kryczka
0a89cea5f5 Handle failures in block-based table size/offset approximation (#9615)
Summary:
In crash test with fault injection, we were seeing stack traces like the following:

```
https://github.com/facebook/rocksdb/issues/3 0x00007f75f763c533 in __GI___assert_fail (assertion=assertion@entry=0x1c5b2a0 "end_offset >= start_offset", file=file@entry=0x1c580a0 "table/block_based/block_based_table_reader.cc", line=line@entry=3245,
function=function@entry=0x1c60e60 "virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller)") at assert.c:101
https://github.com/facebook/rocksdb/issues/4 0x00000000010ea9b4 in rocksdb::BlockBasedTable::ApproximateSize (this=<optimized out>, start=..., end=..., caller=<optimized out>) at table/block_based/block_based_table_reader.cc:3224
https://github.com/facebook/rocksdb/issues/5 0x0000000000be61fb in rocksdb::TableCache::ApproximateSize (this=0x60f0000161b0, start=..., end=..., fd=..., caller=caller@entry=rocksdb::kCompaction, internal_comparator=..., prefix_extractor=...) at db/table_cache.cc:719
https://github.com/facebook/rocksdb/issues/6 0x0000000000c3eaec in rocksdb::VersionSet::ApproximateSize (this=<optimized out>, v=<optimized out>, f=..., start=..., end=..., caller=<optimized out>) at ./db/version_set.h:850
https://github.com/facebook/rocksdb/issues/7 0x0000000000c6ebc3 in rocksdb::VersionSet::ApproximateSize (this=<optimized out>, options=..., v=v@entry=0x621000047500, start=..., end=..., start_level=start_level@entry=0, end_level=<optimized out>, caller=<optimized out>)
at db/version_set.cc:5657
https://github.com/facebook/rocksdb/issues/8 0x000000000166e894 in rocksdb::CompactionJob::GenSubcompactionBoundaries (this=<optimized out>) at ./include/rocksdb/options.h:1869
https://github.com/facebook/rocksdb/issues/9 0x000000000168c526 in rocksdb::CompactionJob::Prepare (this=this@entry=0x7f75f3ffcf00) at db/compaction/compaction_job.cc:546
```

The problem occurred in `ApproximateSize()` when the index `Seek()` for the first `ApproximateDataOffsetOf()` encountered an I/O error, while the second `Seek()` did not. In the old code that scenario caused `start_offset == data_size` , thus it was easy to trip the assertion that `end_offset >= start_offset`.

The fix is to set `start_offset == 0` when the first index `Seek()` fails, and `end_offset == data_size` when the second index `Seek()` fails. I doubt these give an "on average correct" answer for how this function is used, but I/O errors in index seeks are hopefully rare, it looked consistent with what was already there, and it was easier to calculate.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9615

Test Plan:
run the repro command for a while and stopped seeing coredumps -

```
$ while !  ./db_stress --block_size=128 --cache_size=32768 --clear_column_family_one_in=0 --column_families=1 --continuous_verification_interval=0 --db=/dev/shm/rocksdb_crashtest --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --index_type=2 --iterpercent=10  --kill_random_test=18887 --max_key=1000000 --max_bytes_for_level_base=2048576 --nooverwritepercent=1 --open_files=-1 --open_read_fault_one_in=32 --ops_per_thread=1000000 --prefixpercent=5 --read_fault_one_in=0 --readpercent=45 --reopen=0 --skip_verifydb=1 --subcompactions=2 --target_file_size_base=524288 --test_batches_snapshots=0 --value_size_mult=32 --write_buffer_size=524288 --writepercent=35  ; do : ; done
```

Reviewed By: pdillinger

Differential Revision: D34383069

Pulled By: ajkr

fbshipit-source-id: fac26c3b20ea962e75387515ba5f2724dc48719f
2022-02-28 23:45:08 -08:00
stefan-zobel
ddb7620a61 Fix trivial Javadoc omissions (#9534)
Summary:
- fix spelling of `valueSizeSofLimit` and add "param" description in ReadOptions
- add 3 missing "return" in RocksDB

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9534

Reviewed By: riversand963

Differential Revision: D34131186

Pulled By: mrambacher

fbshipit-source-id: 7eb7ec177906052837180b291d67fb1c838c49e1
2022-02-28 11:51:17 -08:00
Andrew Kryczka
9983eecdfb Dedicate cacheline for DB mutex (#9637)
Summary:
We found a case of cacheline bouncing due to writers locking/unlocking `mutex_` and readers accessing `block_cache_tracer_`. We discovered it only after the issue was fixed by https://github.com/facebook/rocksdb/issues/9462 shifting the `DBImpl` members such that `mutex_` and `block_cache_tracer_` were naturally placed in separate cachelines in our regression testing setup. This PR forces the cacheline alignment of `mutex_` so we don't accidentally reintroduce the problem.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9637

Reviewed By: riversand963

Differential Revision: D34502233

Pulled By: ajkr

fbshipit-source-id: 46aa313b7fe83e80c3de254e332b6fb242434c07
2022-02-27 11:36:54 -08:00
Changneng Chen
9ed96703d1 Add support for BlobDB to ldb (#9630)
Summary:
Add the configuration options and help messages of BlobDB to `ldb`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9630

Test Plan: `python ./tools/ldb_test.py`

Reviewed By: ltamasi

Differential Revision: D34443176

Pulled By: changneng

fbshipit-source-id: 5b3f185cdfc2561e06dd37215c7edfbca07dbe80
2022-02-25 23:13:11 -08:00
Hui Xiao
87a8b3c8af Deflake DBErrorHandlingFSTest.MultiCFWALWriteError (#9496)
Summary:
**Context:**
As part of https://github.com/facebook/rocksdb/pull/6949, file deletion is disabled for faulty database on the IOError of MANIFEST write/sync and [re-enabled again during `DBImpl::Resume()` if all recovery is completed](e66199d848 (diff-d9341fbe2a5d4089b93b22c5ed7f666bc311b378c26d0786f4b50c290e460187R396)). Before re-enabling file deletion, it `assert(versions_->io_status().ok());`, which IMO assumes `versions_` is **the** `version_` in the recovery process.

However, this is not necessarily true due to `s = error_handler_.ClearBGError();` happening before that assertion can unblock some foreground thread by [`EventHelpers::NotifyOnErrorRecoveryEnd()`](3122cb4358/db/error_handler.cc (L552-L553)) as part of the `ClearBGError()`. That foreground thread can do whatever it wants including closing/reopening the db and clean up that same `versions_`.

As a consequence,  `assert(versions_->io_status().ok());`, will access `io_status()` of a nullptr and test like `DBErrorHandlingFSTest.MultiCFWALWriteError` becomes flaky. The unblocked foreground thread (in this case, the testing thread) proceeds to [reopen the db](https://github.com/facebook/rocksdb/blob/6.29.fb/db/error_handler_fs_test.cc?fbclid=IwAR1kQOxSbTUmaHQPAGz5jdMHXtDsDFKiFl8rifX-vIz4B23Y0S9jBkssSCg#L1494), where [`versions_` gets reset to nullptr](https://github.com/facebook/rocksdb/blob/6.29.fb/db/db_impl/db_impl.cc?fbclid=IwAR2uRhwBiPKgmE9q_6CM2mzbfwjoRgsGpXOrHruSJUDcAKc9rYZtVSvKdOY#L678) as part of the old db clean-up. If this happens right before `assert(versions_->io_status().ok()); ` gets excuted in the background thread, then we can see error like
```
db/db_impl/db_impl.cc:420:5: runtime error: member call on null pointer of type 'rocksdb::VersionSet'
assert(versions_->io_status().ok());
```

**Summary:**
- I proposed to call `s = error_handler_.ClearBGError();` after we know it's fine to wake up foreground, which I think is right before we LOG `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");`
   - As the context,  the orignal https://github.com/facebook/rocksdb/pull/3997  introducing `DBImpl::Resume()` calls `s = error_handler_.ClearBGError();` very close to calling `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` while the later https://github.com/facebook/rocksdb/pull/6949 distances these two calls a bit.
   - And it seems fine to me that `s = error_handler_.ClearBGError();` happens after `EnableFileDeletions(/*force=*/true);` at least syntax-wise since these two functions are orthogonal. And it also seems okay to me that we re-enable file deletion before `s = error_handler_.ClearBGError();`, which basically is resetting some state variables.
- In addition, to preserve the previous behavior of  https://github.com/facebook/rocksdb/pull/6949 where status of re-enabling file deletion is not taken account into the general status of resuming the db, I separated `enable_file_deletion_s` from the general `s`
- In addition, to make `ROCKS_LOG_INFO(immutable_db_options_.info_log, "Successfully resumed DB");` more clear, I separated it into its own if-block.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9496

Test Plan:
- Manually reproduce the assertion failure in`DBErrorHandlingFSTest.MultiCFWALWriteError` by injecting sleep like below so that it's more likely for `assert(versions_->io_status().ok());` to execute after [reopening the db](https://github.com/facebook/rocksdb/blob/6.29.fb/db/error_handler_fs_test.cc?fbclid=IwAR1kQOxSbTUmaHQPAGz5jdMHXtDsDFKiFl8rifX-vIz4B23Y0S9jBkssSCg#L1494) in the foreground (i.e, testing) thread
```
sleep(1);
assert(versions_->io_status().ok());
```
   `python3 gtest-parallel/gtest_parallel.py -r 100 -w 100 rocksdb/error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.MultiCFWALWriteError`
   ```
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBErrorHandlingFSTest
[ RUN      ] DBErrorHandlingFSTest.MultiCFWALWriteError
Received signal 11 (Segmentation fault)
#0   rocksdb/error_handler_fs_test() [0x5818a4] rocksdb::DBImpl::ResumeImpl(rocksdb::DBRecoverContext)  /data/users/huixiao/rocksdb/db/db_impl/db_impl.cc:421
https://github.com/facebook/rocksdb/issues/1   rocksdb/error_handler_fs_test() [0x6379ff] rocksdb::ErrorHandler::RecoverFromBGError(bool) /data/users/huixiao/rocksdb/db/error_handler.cc:600
https://github.com/facebook/rocksdb/issues/2   rocksdb/error_handler_fs_test() [0x7c5362] rocksdb::SstFileManagerImpl::ClearError()       /data/users/huixiao/rocksdb/file/sst_file_manager_impl.cc:310
https://github.com/facebook/rocksdb/issues/3   rocksdb/error_handler_fs_test()
   ```
- The assertion failure does not happen with PR
`python3 gtest-parallel/gtest_parallel.py -r 100 -w 100 rocksdb/error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.MultiCFWALWriteError`
`[100/100] DBErrorHandlingFSTest.MultiCFWALWriteError (43785 ms)  `

Reviewed By: riversand963, anand1976

Differential Revision: D33990099

Pulled By: hx235

fbshipit-source-id: 2e0259a471fa8892ff177da91b3e1c0792dd7bab
2022-02-25 14:44:46 -08:00
Siddhartha Roychowdhury
21345d2823 Streaming Compression API for WAL compression. (#9619)
Summary:
Implement a streaming compression API (compress/uncompress) to use for WAL compression. The log_writer would use the compress class/API to compress a record before writing it out in chunks. The log_reader would use the uncompress class/API to uncompress the chunks and combine into a single record.

Added unit test to verify the API for different sizes/compression types.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9619

Test Plan: make -j24 check

Reviewed By: anand1976

Differential Revision: D34437346

Pulled By: sidroyc

fbshipit-source-id: b180569ad2ddcf3106380f8758b556cc0ad18382
2022-02-23 23:45:04 -08:00
Bo Wang
f706a9c199 Add a secondary cache implementation based on LRUCache 1 (#9518)
Summary:
**Summary:**
RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO.

This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518

Test Plan:
In this PR, the lru_secondary_cache_test.cc includes the following tests:
1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests.
2. The integration tests with both primary cache and this secondary cache .

**Follow Up:**

1. Statistics (e.g. compression ratio) will be added in another PR.
2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact.

Reviewed By: anand1976

Differential Revision: D34430930

Pulled By: gitbw95

fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
2022-02-23 16:06:27 -08:00
Yanqin Jin
6f12599863 Support WBWI for keys having timestamps (#9603)
Summary:
This PR supports inserting keys to a `WriteBatchWithIndex` for column families that enable user-defined timestamps
and reading the keys back. **The index does not have timestamps.**

Writing a key to WBWI is unchanged, because the underlying WriteBatch already supports it.
When reading the keys back, we need to make sure to distinguish between keys with and without timestamps before
comparison.

When user calls `GetFromBatchAndDB()`, no timestamp is needed to query the batch, but a timestamp has to be
provided to query the db. The assumption is that data in the batch must be newer than data from the db.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9603

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D34354849

Pulled By: riversand963

fbshipit-source-id: d25d1f84e2240ce543e521fa30595082fb8db9a0
2022-02-22 14:23:01 -08:00
Andrew Kryczka
8ca433f912 Fix test race conditions with OnFlushCompleted() (#9617)
Summary:
We often see flaky tests due to `DB::Flush()` or `DBImpl::TEST_WaitForFlushMemTable()` not waiting until event listeners complete. For example, https://github.com/facebook/rocksdb/issues/9084, https://github.com/facebook/rocksdb/issues/9400, https://github.com/facebook/rocksdb/issues/9528, plus two new ones this week: "EventListenerTest.OnSingleDBFlushTest" and "DBFlushTest.FireOnFlushCompletedAfterCommittedResult". I ran a `make check` with the below race condition-coercing patch and fixed  issues it found besides old BlobDB.

```
 diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index 0e1864788..aaba68c4a 100644
 --- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -861,6 +861,8 @@ void DBImpl::NotifyOnFlushCompleted(
        mutable_cf_options.level0_stop_writes_trigger);
   // release lock while notifying events
   mutex_.Unlock();
+  bg_cv_.SignalAll();
+  sleep(1);
   {
     for (auto& info : *flush_jobs_info) {
       info->triggered_writes_slowdown = triggered_writes_slowdown;
```

The reason I did not fix old BlobDB issues is because it appears to have a fundamental (non-test) issue. In particular, it uses an EventListener to keep track of the files. OnFlushCompleted() could be delayed until even after a compaction involving that flushed file completes, causing the compaction to unexpectedly delete an untracked file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9617

Test Plan: `make check` including the race condition coercing patch

Reviewed By: hx235

Differential Revision: D34384022

Pulled By: ajkr

fbshipit-source-id: 2652ded39b415277c5d6a628414345223930514e
2022-02-22 12:23:00 -08:00
Andrew Kryczka
96978e4d96 Enable core dumps in TSAN/UBSAN crash tests (#9616)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9616

Reviewed By: hx235

Differential Revision: D34383489

Pulled By: ajkr

fbshipit-source-id: e4299000ef38073ec57e6ab5836150fdf8ce43d4
2022-02-22 12:23:00 -08:00
anand76
d795a730be Combine data members of IOStatus with Status (#9549)
Summary:
Combine the data members retryable_, data_loss_ and scope_ of IOStatus
with Status, as protected members. IOStatus is now defined as a derived class of Status with
no new data, but additional methods. This will allow us to eventually
track the result of FileSystem calls in RocksDB with one variable
instead of two.

Benchmark commands and results are below. The performance after changes seems slightly better.

```./db_bench -db=/data/mysql/rocksdb/prefix_scan -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216```

```./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,seekrandom,readseq" -key_size=32 -value_size=512 -num=5000000 -seek_nexts=10000 -use_direct_reads=true -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=false -threads=1 -cache_size=10485760000```

Before -
seekrandom   :    3715.432 micros/op 269 ops/sec; 1394.9 MB/s (16149 of 16149 found)
seekrandom   :    3687.177 micros/op 271 ops/sec; 1405.6 MB/s (16273 of 16273 found)
seekrandom   :    3709.646 micros/op 269 ops/sec; 1397.1 MB/s (16175 of 16175 found)

readseq      :       0.369 micros/op 2711321 ops/sec; 1406.6 MB/s
readseq      :       0.363 micros/op 2754092 ops/sec; 1428.8 MB/s
readseq      :       0.372 micros/op 2688046 ops/sec; 1394.6 MB/s

After -
seekrandom   :    3606.830 micros/op 277 ops/sec; 1436.9 MB/s (16636 of 16636 found)
seekrandom   :    3594.467 micros/op 278 ops/sec; 1441.9 MB/s (16693 of 16693 found)
seekrandom   :    3597.919 micros/op 277 ops/sec; 1440.5 MB/s (16677 of 16677 found)

readseq      :       0.354 micros/op 2822809 ops/sec; 1464.5 MB/s
readseq      :       0.358 micros/op 2795080 ops/sec; 1450.1 MB/s
readseq      :       0.354 micros/op 2822889 ops/sec; 1464.5 MB/s

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9549

Reviewed By: pdillinger

Differential Revision: D34310362

Pulled By: anand1976

fbshipit-source-id: 54b27756edf9c9ecfe730a2dce542a7a46743096
2022-02-22 11:23:01 -08:00
Patrick Somaru
ba65cfff63 configure microbenchmarks, regenerate targets (#9599)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9599

Reviewed By: jay-zhuang, hodgesds

Differential Revision: D34214408

fbshipit-source-id: 6932200772f52ce77e550646ee3d1a928295844a
2022-02-22 09:24:51 -08:00
Andrew Kryczka
3379d1466f Fix DBTest2.BackupFileTemperature memory leak (#9610)
Summary:
Valgrind was failing with the below error because we forgot to destroy
the `BackupEngine` object:

```
==421173== Command: ./db_test2 --gtest_filter=DBTest2.BackupFileTemperature
==421173==
Note: Google Test filter = DBTest2.BackupFileTemperature
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBTest2
[ RUN      ] DBTest2.BackupFileTemperature
--421173-- WARNING: unhandled amd64-linux syscall: 425
--421173-- You may be able to write your own handler.
--421173-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--421173-- Nevertheless we consider this a bug.  Please report
--421173-- it at http://valgrind.org/support/bug_reports.html.
[       OK ] DBTest2.BackupFileTemperature (3366 ms)
[----------] 1 test from DBTest2 (3371 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (3413 ms total)
[  PASSED  ] 1 test.
==421173==
==421173== HEAP SUMMARY:
==421173==     in use at exit: 13,042 bytes in 195 blocks
==421173==   total heap usage: 26,022 allocs, 25,827 frees, 27,555,265 bytes allocated
==421173==
==421173== 8 bytes in 1 blocks are possibly lost in loss record 6 of 167
==421173==    at 0x4838DBF: operator new(unsigned long) (vg_replace_malloc.c:344)
==421173==    by 0x8D4606: allocate (new_allocator.h:114)
==421173==    by 0x8D4606: allocate (alloc_traits.h:445)
==421173==    by 0x8D4606: _M_allocate (stl_vector.h:343)
==421173==    by 0x8D4606: reserve (vector.tcc:78)
==421173==    by 0x8D4606: rocksdb::BackupEngineImpl::Initialize() (backupable_db.cc:1174)
==421173==    by 0x8D5473: Initialize (backupable_db.cc:918)
==421173==    by 0x8D5473: rocksdb::BackupEngine::Open(rocksdb::BackupEngineOptions const&, rocksdb::Env*, rocksdb::BackupEngine**) (backupable_db.cc:937)
==421173==    by 0x50AC8F: Open (backup_engine.h:585)
==421173==    by 0x50AC8F: rocksdb::DBTest2_BackupFileTemperature_Test::TestBody() (db_test2.cc:6996)
...
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9610

Test Plan:
```
$ make -j24 ROCKSDBTESTS_SUBSET=db_test2 valgrind_check_some
```

Reviewed By: akankshamahajan15

Differential Revision: D34371210

Pulled By: ajkr

fbshipit-source-id: 68154fcb0c51b28222efa23fa4ee02df8d925a18
2022-02-21 19:23:19 -08:00
Andrew Kryczka
7ae4da924a Update HISTORY.md and version.h for 7.0 release (#9609)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9609

Reviewed By: riversand963

Differential Revision: D34370309

Pulled By: ajkr

fbshipit-source-id: 5fc9306439aefa4b2d61d847534ea6758c30b6a5
2022-02-20 15:22:54 -08:00
Akanksha Mahajan
3699b171e4 Change enum SizeApproximationFlags to enum class (#9604)
Summary:
Change enum SizeApproximationFlags to enum and class and add
overloaded operators for the transition between enum class and uint8_t

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9604

Test Plan: Circle CI jobs

Reviewed By: riversand963

Differential Revision: D34360281

Pulled By: akankshamahajan15

fbshipit-source-id: 6351dfdb717ae3c4530d324c3d37a8ecb01dd1ef
2022-02-18 20:22:57 -08:00
Jay Zhuang
d3a2f284d9 Add Temperature info in NewSequentialFile() (#9499)
Summary:
Add Temperature hints information from RocksDB in API
`NewSequentialFile()`. backup and checkpoint operations need to open the
source files with `NewSequentialFile()`, which will have the temperature
hints. Other operations are not covered.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9499

Test Plan: Added unittest

Reviewed By: pdillinger

Differential Revision: D34006115

Pulled By: jay-zhuang

fbshipit-source-id: 568b34602b76520e53128672bd07e9d886786a2f
2022-02-18 18:23:07 -08:00
Akanksha Mahajan
559525dcbb Add Async Read and Poll APIs in FileSystem (#9564)
Summary:
This PR adds support for new APIs Async Read that reads the data
asynchronously and Poll API that checks if requested read request has
completed or not.

Usage: In RocksDB, we are currently planning to prefetch data
asynchronously during sequential scanning and RocksDB will call these
APIs to prefetch more data in advanced.

Design:
- ReadAsync API submits the read request to underlying FileSystem in
order to read data asynchronously. When read request is completed,
callback function will be called. cb_arg is used by RocksDB to track the
original request submitted and IOHandle is used by FileSystem to keep track
of IO requests at their level.

- The Poll API  is added in FileSystem because the call could end up handling
completions for multiple different files which is not specific to a
FSRandomAccessFile instance. There could be multiple outstanding file reads
from different files in future and they can complete in any order.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9564

Test Plan: Test will be added in separate PR.

Reviewed By: anand1976

Differential Revision: D34226216

Pulled By: akankshamahajan15

fbshipit-source-id: 95e64edafb17f543f7232421d51e2665a3267f69
2022-02-18 17:23:18 -08:00
Bo Wang
67f071fade Fixes #9565 (#9586)
Summary:
[Compaction::IsTrivialMove](a2b9be42b6/db/compaction/compaction.cc (L318)) checks whether allow_trivial_move is set, and if so it returns the value of is_trivial_move_. The allow_trivial_move option is there for universal compaction. So when this is set and leveled compaction is enabled, then useful code that follows this block never gets a chance to run.

A check that [compaction_style == kCompactionStyleUniversal](320d9a8e8a/db/db_impl/db_impl_compaction_flush.cc (L1030)) should be added to avoid doing the wrong thing for leveled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9586

Test Plan:
To reproduce this:
First edit db/compaction/compaction.cc with
```
 diff --git a/db/compaction/compaction.cc b/db/compaction/compaction.cc
index 7ae50b91e..52dd489b1 100644
 --- a/db/compaction/compaction.cc
+++ b/db/compaction/compaction.cc
@@ -319,6 +319,8 @@ bool Compaction::IsTrivialMove() const {
   // input files are non overlapping
   if ((mutable_cf_options_.compaction_options_universal.allow_trivial_move) &&
       (output_level_ != 0)) {
+    printf("IsTrivialMove:: return %d because universal allow_trivial_move\n", (int) is_trivial_move_);
+    // abort();
     return is_trivial_move_;
   }
```

And then run
```
./db_bench --benchmarks=fillseq --allow_concurrent_memtable_write=false --level0_file_num_compaction_trigger=4 --level0_slowdown_writes_trigger=20 --level0_stop_writes_trigger=30 --max_background_jobs=8 --max_write_buffer_number=8 --db=/data/m/rx --wal_dir=/data/m/rx --num=800000000 --num_levels=8 --key_size=20 --value_size=400 --block_size=8192 --cache_size=51539607552 --cache_numshardbits=6 --compression_max_dict_bytes=0 --compression_ratio=0.5 --compression_type=lz4 --bytes_per_sync=8388608 --cache_index_and_filter_blocks=1 --cache_high_pri_pool_ratio=0.5 --benchmark_write_rate_limit=0 --write_buffer_size=16777216 --target_file_size_base=16777216 --max_bytes_for_level_base=67108864 --verify_checksum=1 --delete_obsolete_files_period_micros=62914560 --max_bytes_for_level_multiplier=8 --statistics=0 --stats_per_interval=1 --stats_interval_seconds=20 --histogram=1 --memtablerep=skip_list --bloom_bits=10 --open_files=-1 --subcompactions=1 --compaction_style=0 --min_level_to_compress=3 --level_compaction_dynamic_level_bytes=true --pin_l0_filter_and_index_blocks_in_cache=1 --soft_pending_compaction_bytes_limit=167503724544 --hard_pending_compaction_bytes_limit=335007449088 --min_level_to_compress=0 --use_existing_db=0 --sync=0 --threads=1 --memtablerep=vector --allow_concurrent_memtable_write=false --disable_wal=1 --seed=1641328309 --universal_allow_trivial_move=1
```
Example output with the debug code added

```
IsTrivialMove:: return 0 because universal allow_trivial_move
IsTrivialMove:: return 0 because universal allow_trivial_move
```

After this PR, the bug is fixed.

Reviewed By: ajkr

Differential Revision: D34350451

Pulled By: gitbw95

fbshipit-source-id: 3232005cc47c40a7e75d316cfc7960beb5bdff3a
2022-02-18 14:23:07 -08:00