Compare commits

..

180 Commits

Author SHA1 Message Date
Hui Xiao
f6339de0d2 Clarify some SequentialFileReader::Read logic (#10002)
Summary:
**Context/Summary:**
The logic related to PositionedRead in SequentialFileReader::Read confused me a bit as discussed here https://github.com/facebook/rocksdb/pull/9973#discussion_r872869256. Therefore I added a drawing with help from cbi42.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10002

Test Plan: - no code change

Reviewed By: anand1976, cbi42

Differential Revision: D36422632

Pulled By: hx235

fbshipit-source-id: 9a8311d2365564f90d216c430f542fc11b2d9cde
2022-05-17 10:24:04 -07:00
mrambacher
b11ff347b4 Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958)
Summary:
Changed the static objects that had non-trivial destructors to use the STATIC_AVOID_DESTRUCTION construct.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9958

Reviewed By: pdillinger

Differential Revision: D36442982

Pulled By: mrambacher

fbshipit-source-id: 029d47b1374d30d198bfede369a4c0ae7a4eb519
2022-05-17 09:39:22 -07:00
Yanqin Jin
3f263ef536 Add a temporary option for user to opt-out enforcement of SingleDelete contract (#9983)
Summary:
PR https://github.com/facebook/rocksdb/issues/9888 started to enforce the contract of single delete described in https://github.com/facebook/rocksdb/wiki/Single-Delete.

For some of existing use cases, it is desirable to have a transition during which compaction will not fail
if the contract is violated. Therefore, we add a temporary option `enforce_single_del_contracts` to allow
application to opt out from this new strict behavior. Once transition completes, the flag can be set to `true` again.

In a future release, the option will be removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9983

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36333672

Pulled By: riversand963

fbshipit-source-id: dcb703ea0ed08076a1422f1bfb9914afe3c2caa2
2022-05-16 15:44:59 -07:00
Hui Xiao
e66e6d2faa Use SpecialEnv to speed up some slow BackupEngineRateLimitingTestWithParam (#9974)
Summary:
**Context:**
`BackupEngineRateLimitingTestWithParam.RateLimiting` and `BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup` involve creating backup and restoring of a big database with rate-limiting. Using the normal env with a normal clock requires real elapse of time (13702 - 19848 ms/per test). As suggested in https://github.com/facebook/rocksdb/pull/8722#discussion_r703698603, this PR is to speed it up with SpecialEnv (`time_elapse_only_sleep=true`) where its clock accepts fake elapse of time during rate-limiting (100 - 600 ms/per test)

**Summary:**
- Added TEST_ function to set clock of the default rate limiters in backup engine
- Shrunk testdb by 10 times while keeping it big enough for testing
- Renamed some test variables and reorganized some if-else branch for clarity without changing the test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9974

Test Plan:
- Run tests pre/post PR the same time to verify the tests are sped up by 90 - 95%
`BackupEngineRateLimitingTestWithParam.RateLimiting`
Pre:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 (11123 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 (9441 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 (11096 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 (9339 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 (11121 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 (9413 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 (11185 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 (9511 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (82230 ms total)
```
Post:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 (395 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 (564 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 (358 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 (567 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 (173 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 (176 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 (191 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 (177 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (2601 ms total)
```
`BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup`
Pre:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 (7275 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 (3961 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 (7117 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 (3921 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 (19862 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 (10231 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 (19848 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 (10372 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (82587 ms total)
```
Post:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 (157 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 (152 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 (160 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 (158 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 (155 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 (151 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 (146 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 (153 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (1232 ms total)
```

Reviewed By: pdillinger

Differential Revision: D36336345

Pulled By: hx235

fbshipit-source-id: 724c6ba745f95f56d4440a6d2f1e4512a2987589
2022-05-16 10:54:02 -07:00
mrambacher
204a42ca97 Added GetFactoryCount/Names/Types to ObjectRegistry (#9358)
Summary:
These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner.  With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion.

With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes.

Note that there were a few other minor changes.  There was a "posix://*" register with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that.  The nickname and default names for the FileSystem classes was also inverted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358

Reviewed By: pdillinger

Differential Revision: D33433542

Pulled By: mrambacher

fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e
2022-05-16 09:44:43 -07:00
sdong
c4cd8e1acc Fix a bug handling multiget index I/O error. (#9993)
Summary:
In one path of BlockBasedTable::MultiGet(), Next() is directly called after calling Seek() against the index iterator. This might cause crash if an I/O error happens in Seek().
The bug is discovered in crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9993

Test Plan: See existing CI tests pass.

Reviewed By: anand1976

Differential Revision: D36381758

fbshipit-source-id: a11e0aa48dcee168c2554c33b532646ffdb68877
2022-05-13 13:15:10 -07:00
Yanqin Jin
b58a1a035b Revert "Bugfix/fix manual flush blocking bug (#9893)" (#9992)
Summary:
This reverts commit 6d2577e567.

A proposal for resolving our current internal test failures. A fix is
being planned.

More context can be found: https://github.com/facebook/rocksdb/pull/9893#issuecomment-1126230634
TSAN error: https://github.com/facebook/rocksdb/pull/9893#issuecomment-1126233132

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9992

Reviewed By: siying

Differential Revision: D36379154

Pulled By: riversand963

fbshipit-source-id: b240261e766eff099513799cf5631832093f4cd2
2022-05-13 12:31:30 -07:00
Yanqin Jin
f6d9730ea1 Fix stress test with best-efforts-recovery (#9986)
Summary:
This PR

- since we are testing with disable_wal = true and best_efforts_recovery, we should set column family count to 1, due to the requirement of `ExpectedState` tracking and replaying logic.
- during backup and checkpoint restore, disable best-efforts-recovery. This does not matter now because db_crashtest.py always disables wal when testing best-efforts-recovery. In the future, if we enable wal, then not setting `restore_opitions.best_efforts_recovery` will cause backup db not to recover the WALs, and differ from db (that enables WAL).
- during verification of backup and checkpoint restore, print the key where inconsistency exists between expected state and db.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9986

Test Plan: TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery

Reviewed By: siying

Differential Revision: D36353105

Pulled By: riversand963

fbshipit-source-id: a484da161273e6216a1f7e245bac15a349693917
2022-05-13 12:29:20 -07:00
mrambacher
bfc6a8ee4a Option type info functions (#9411)
Summary:
Add methods to set the various functions (Parse, Serialize, Equals) to the OptionTypeInfo.  These methods simplify the number of constructors required for OptionTypeInfo and make the code a little clearer.

Add functions to the OptionTypeInfo for Prepare and Validate.  These methods allow types other than Configurable and Customizable to have Prepare and Validate logic.  These methods could be used by an option to guarantee that its settings were in a range or that a value was initialized.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9411

Reviewed By: pdillinger

Differential Revision: D36174849

Pulled By: mrambacher

fbshipit-source-id: 72517d8c6bab4723788a4c1a9e16590bff870125
2022-05-13 04:57:08 -07:00
Peter Dillinger
cdaa9576bb Put build size checking logic in Makefile (#9989)
Summary:
... for better maintainability, in case of Makefile changes /
refactoring. This is lightly modified from rocksd-lego-determinator, and
will be used by Meta-internal CI with custom REPORT_BUILD_STATISTIC

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9989

Test Plan: some manual stuff

Reviewed By: jay-zhuang

Differential Revision: D36362362

Pulled By: pdillinger

fbshipit-source-id: 52b65b6282fe839dc6d906ff95a3ed66ca1574ba
2022-05-12 22:54:18 -07:00
Chen Xiao
07c6807113 Add pmem-rocksdb-plugin link in PLUGINs.md (#9934)
Summary:
This change adds pmem-rocksdb-plugin link in PLUGINS.md. The link is: https://github.com/pmem/pmem-rocksdb-plugin. It provides a collection plugins to enable Persistent Memory (PMEM) on RocksDB.

The pmem-rocksdb-plugin repo contains RocksDB’s plugins for LSM-tree based KV store to fit it on the PMEM by effectively utilize its characteristics. The first two basic plugins are:
1) Providing a filesystem API wrapper to write RocksDB's WAL (Write Ahead Log) files on PMEM to optimize write performance. 2) Using PMEM as secondary cache to optimize read performance.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9934

Reviewed By: jay-zhuang

Differential Revision: D36366893

Pulled By: riversand963

fbshipit-source-id: d58a39365e9b5d6a3249d4e9b377c7fb2c79badb
2022-05-12 22:02:28 -07:00
Yueh-Hsuan Chiang
bcb1287235 Port the batched version of MultiGet() to RocksDB's C API (#9952)
Summary:
The batched version of MultiGet() is not available in RocksDB's C API.
This PR implements rocksdb_batched_multi_get_cf which is a C wrapper function
that invokes the batched version of MultiGet() which takes one single column family.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9952

Test Plan: Added a new test case under "columnfamilies" test case in c_test.cc

Reviewed By: riversand963

Differential Revision: D36302888

Pulled By: ajkr

fbshipit-source-id: fa134c4a1c8e7d72dd4ae8649a74e3797b5cf4e6
2022-05-12 18:17:36 -07:00
Akanksha Mahajan
6442a62e46 Update WAL corruption test so that it fails without fix (#9942)
Summary:
In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
flush the data from WAL to L0 for all column families if possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change.
If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted WAL, and the "column family inconsistency" error will be hit, causing recovery to fail.

This PR update unit tests to emulate the errors and tests are failing without a fix.

Error:
```
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/0
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/0, where GetParam() = (true, false) (91 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/1
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/1, where GetParam() = (false, false) (92 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/2
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/2, where GetParam() = (true, true) (95 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/3
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/3, where GetParam() = (false, true) (92 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/0
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/0, where GetParam() = (true, false) (94 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/1
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/1, where GetParam() = (false, false) (97 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/2
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/2, where GetParam() = (true, true) (94 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/3
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/3, where GetParam() = (false, true) (91 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/0
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/0, where GetParam() = (true, false) (93 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/1
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/1, where GetParam() = (false, false) (94 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/2
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/2, where GetParam() = (true, true) (90 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/3
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/3, where GetParam() = (false, true) (93 ms)
[----------] 12 tests from CorruptionTest/CrashDuringRecoveryWithCorruptionTest (1116 ms total)

```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9942

Test Plan: Not needed

Reviewed By: riversand963

Differential Revision: D36324112

Pulled By: akankshamahajan15

fbshipit-source-id: cab2075ac4ebe48f5ef93a6ea162558aa4fc334d
2022-05-11 16:12:55 -07:00
Peter Dillinger
e96e8e2d05 Remove slack CircleCI hook (#9982)
Summary:
Our Slack site is deprecated

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9982

Test Plan: CircleCI

Reviewed By: siying

Differential Revision: D36322050

Pulled By: pdillinger

fbshipit-source-id: 678202404d307e1547e4203d7e6bd467803ccd5e
2022-05-11 13:17:21 -07:00
Andrew Kryczka
e943bbdd2f Temporarily disable sync_fault_injection (#9979)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9979

Reviewed By: siying

Differential Revision: D36301555

Pulled By: ajkr

fbshipit-source-id: ed298d3484b6aad3ef19746e984bf4c52be33a9f
2022-05-11 12:19:07 -07:00
Peter Dillinger
e8d604cf85 Reorganize CircleCI workflows (#9981)
Summary:
Condense down to 8 groups rather than 20+ for ease of browsing
pages like
https://app.circleci.com/pipelines/github/facebook/rocksdb?branch=main&filter=all

Also, run nightly builds at 1AM or 2AM Pacific (depending on daylight
time) rather than 4PM or 5PM Pacific, so that they actually use each
day's landed changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9981

Test Plan:
CI
And manually inspected
```
grep -Eo 'build-[^: ]*' .circleci/config.yml | sort | uniq -c | less
```
to ensure I didn't orphan anything

Reviewed By: jay-zhuang

Differential Revision: D36317634

Pulled By: pdillinger

fbshipit-source-id: 1c10d29d6b5d60ce3dd1364cd91f175380075ff3
2022-05-11 11:16:09 -07:00
yaphet
26768edb65 Support single delete in ldb (#9469)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9469

Reviewed By: riversand963

Differential Revision: D33953484

fbshipit-source-id: f4e84a2d9865957d744c7e84ff02ffbb0a62b0a8
2022-05-10 16:37:19 -07:00
Peter Dillinger
0d1613aad6 Avoid some warnings-as-error in CircleCI+unity+AVX512F (#9978)
Summary:
Example failure when compiling on sufficiently new hardware and built-in headers:

```
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/12.1.0/include/immintrin.h:49,
                 from ./util/bloom_impl.h:21,
                 from table/block_based/filter_policy.cc:31,
                 from unity.cc:167:
In function '__m512i _mm512_shuffle_epi32(__m512i, _MM_PERM_ENUM)',
    inlined from 'void XXH3_accumulate_512_avx512(void*, const void*, const void*)' at util/xxhash.h:3605:58,
    inlined from 'void XXH3_accumulate(xxh_u64*, const xxh_u8*, const xxh_u8*, size_t, XXH3_f_accumulate_512)' at util/xxhash.h:4229:17,
    inlined from 'void XXH3_hashLong_internal_loop(xxh_u64*, const xxh_u8*, size_t, const xxh_u8*, size_t, XXH3_f_accumulate_512, XXH3_f_scrambleAcc)' at util/xxhash.h:4251:24,
    inlined from 'XXH128_hash_t XXH3_hashLong_128b_internal(const void*, size_t, const xxh_u8*, size_t, XXH3_f_accumulate_512, XXH3_f_scrambleAcc)' at util/xxhash.h:5065:32,
    inlined from 'XXH128_hash_t XXH3_hashLong_128b_withSecret(const void*, size_t, XXH64_hash_t, const void*, size_t)' at util/xxhash.h:5104:39:
/usr/local/lib/gcc/x86_64-linux-gnu/12.1.0/include/avx512fintrin.h:4459:50: error: '__Y' may be used uninitialized [-Werror=maybe-uninitialized]
```

https://app.circleci.com/pipelines/github/facebook/rocksdb/13295/workflows/1695fb5c-40c1-423b-96b4-45107dc3012d/jobs/360416

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9978

Test Plan:
I was able to re-run in CircleCI with ssh, see the failure, ssh in and
verify that adding -fno-avx512f fixed the failure. Will watch build-linux-unity-and-headers

Reviewed By: riversand963

Differential Revision: D36296028

Pulled By: pdillinger

fbshipit-source-id: ba5955cf2ac730f57d1d18c2f517e92f34be77a3
2022-05-10 15:24:40 -07:00
Peter Dillinger
e78451f3f6 Increase soft open file limit for mini-crashtest on Linux (#9972)
Summary:
CircleCI was using a soft open file limit of 1024 which would
frequently be exceeded during test runs. Now using
```
ulimit -S -n `ulimit -H -n`
```
to set soft limit up to the hard limit (524288 in my test). I've also
applied this same idiom to existing applicable MacOS configurations to
reduce hard-coding numbers.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9972

Test Plan: CI

Reviewed By: riversand963

Differential Revision: D36262943

Pulled By: pdillinger

fbshipit-source-id: 86320cdf9b68a97fdb73531a7b4a59b4c2d2f73f
2022-05-10 09:51:03 -07:00
Andrew Kryczka
7b7a37c069 Add microbenchmarks for DB::GetMergeOperands() (#9971)
Summary:
The new microbenchmarks, DBGetMergeOperandsInMemtable and DBGetMergeOperandsInSstFile, correspond to the two different LSMs tested: all data in one memtable and all data in one SST file, respectively. Both cases are parameterized by thread count (1 or 8) and merge operands per key (1, 32, or 1024). The SST file case is additionally parameterized by whether data is in block cache or mmap'd memory.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9971

Test Plan:
```
$ TEST_TMPDIR=/dev/shm/db_basic_bench/ ./db_basic_bench --benchmark_filter=DBGetMergeOperands
The number of inputs is very large. DBGet will be repeated at least 192 times.
The number of inputs is very large. DBGet will be repeated at least 192 times.
2022-05-09T13:15:40-07:00
Running ./db_basic_bench
Run on (36 X 2570.91 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x18)
  L1 Instruction 32 KiB (x18)
  L2 Unified 1024 KiB (x18)
  L3 Unified 25344 KiB (x1)
Load Average: 4.50, 4.33, 4.37
----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                  Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------
DBGetMergeOperandsInMemtable/entries_per_key:1/threads:1                 846 ns          846 ns       849893 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:32/threads:1               2436 ns         2436 ns       305779 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:1024/threads:1            77226 ns        77224 ns         8152 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:1/threads:8                 116 ns          929 ns       779368 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:32/threads:8                330 ns         2644 ns       280824 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:1024/threads:8            12466 ns        99718 ns         7200 db_size=0
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:0/threads:1          1640 ns         1640 ns       461262 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:1/threads:1          1693 ns         1693 ns       439936 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:0/threads:1         3999 ns         3999 ns       172881 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:1/threads:1         5544 ns         5543 ns       135657 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:0/threads:1      78767 ns        78761 ns         8395 db_size=19.6389M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:1/threads:1     157242 ns       157238 ns         4495 db_size=19.6389M
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:0/threads:8           231 ns         1848 ns       347768 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:1/threads:8           214 ns         1715 ns       393312 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:0/threads:8          596 ns         4767 ns       142088 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:1/threads:8          720 ns         5757 ns       118200 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:0/threads:8      11613 ns        92460 ns         7344 db_size=19.6389M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:1/threads:8      19989 ns       159908 ns         4440 db_size=19.6389M
```

Reviewed By: jay-zhuang

Differential Revision: D36258861

Pulled By: ajkr

fbshipit-source-id: 04b733e1cc3a4a70ed9baa894c50fdf96c0d6064
2022-05-09 15:17:19 -07:00
Peter Dillinger
c5c58708db Fix format_compatible blowing away its TEST_TMPDIR (#9970)
Summary:
https://github.com/facebook/rocksdb/issues/9961 broke format_compatible check because of `make clean`
referencing TEST_TMPDIR. The Makefile behavior seems reasonable to me,
so here's a fix in check_format_compatible.sh

Apparently I also included removing a redundant part of our CircleCI config.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9970

Test Plan: manual run: SHORT_TEST=1 ./tools/check_format_compatible.sh

Reviewed By: riversand963

Differential Revision: D36258172

Pulled By: pdillinger

fbshipit-source-id: d46507f04614e888b414ff23b88d040ae2b5c294
2022-05-09 13:38:46 -07:00
Davide Angelocola
4527bb2fed Fix conversion issues in MutableOptions (#9194)
Summary:
Removing unnecessary checks around conversion from int/long to double as it does not lose information (see https://docs.oracle.com/javase/specs/jls/se9/html/jls-5.html#jls-5.1.2).

For example, `value > Double.MAX_VALUE` is always false when value is long or int.

Can you please have a look adamretter? Also fixed some other minor issues (do you prefer a separate PR?)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9194

Reviewed By: ajkr

Differential Revision: D36221694

fbshipit-source-id: bf327c07386560b87ddc0c98039e8d6e8f2f1e82
2022-05-09 12:34:26 -07:00
Wang Yuan
89571b30e5 Improve the precision of row entry charge in row_cache (#9337)
Summary:
- For entry charge, we should only calculate the value size instead of including key size in LRUCache
- The capacity of string could show the memory usage precisely

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9337

Reviewed By: ajkr

Differential Revision: D36219855

fbshipit-source-id: 393e48ca419d230dc552ae62dd0eb1cc9f45961d
2022-05-09 12:27:38 -07:00
Luca Giacchino
39b6c5791a Improve memkind library detection (#9134)
Summary:
Improve memkind library detection in build_detect_platform:

- The current position of -lmemkind does not work with all versions of gcc
- LDFLAGS allows specifying non-standard library path through EXTRA_LDFLAGS

After the change, the options match TBB detection.
This is a follow-up to https://github.com/facebook/rocksdb/issues/6214.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9134

Reviewed By: ajkr, mrambacher

Differential Revision: D32192028

fbshipit-source-id: 115fafe8d93f1fe6aaf80afb32b2cb67aad074c7
2022-05-09 12:26:09 -07:00
leipeng
9f7968b2ed arena.h: fix Arena::IsInInlineBlock() (#9317)
Summary:
When I enable hugepage on my box, unit test fails, this PR fixes this issue:

[  FAILED  ] ArenaTest.ApproximateMemoryUsage (1 ms)

memory/arena_test.cc:127: Failure
Value of: arena.IsInInlineBlock()
  Actual: true
Expected: false
arena.IsInInlineBlock() = 1
memory/arena_test.cc:127: Failure
Value of: arena.IsInInlineBlock()
  Actual: true
Expected: false

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9317

Reviewed By: ajkr

Differential Revision: D36219813

fbshipit-source-id: 08d040d9f37ec4c16987e4150c2db876180d163d
2022-05-09 12:21:21 -07:00
Qingyou Meng
7b55b50839 util/ribbon_alg.h: removed duplicate word "vector" (#9216)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9216

Reviewed By: riversand963

Differential Revision: D36219934

fbshipit-source-id: 8253b4e3eacceb8b040eeaa45cd5a50570a4eba6
2022-05-06 18:38:13 -07:00
aierui
d1cc91c142 typo fix: delete duplicate comment word (#9249)
Summary:
typo fix: delete duplicate comment word

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9249

Reviewed By: riversand963

Differential Revision: D36219911

fbshipit-source-id: 01e2fda65590f18fe46eefb56e049e6f2d028ae8
2022-05-06 18:29:33 -07:00
♚ PH⑦ de Soria™♛
9381436bf3 Fixed typo (#9331)
Summary:
Just fixing a very minor typo in the doc block :) Hope it will help anyway 😊

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9331

Reviewed By: riversand963

Differential Revision: D34339823

fbshipit-source-id: b76104bc3efbc9d1f38cbf5c6dd7648dc909ced3
2022-05-06 17:41:07 -07:00
Peter Dillinger
e03d958b91 Clean up variables for temporary directory (#9961)
Summary:
Having all of TMPD, TMPDIR and TEST_TMPDIR as configuration
parameters is confusing. This change simplifies a number of things by
standardizing on TEST_TMPDIR, while still recognizing the old names
also. In detail:
* crash_test.mk also needs to use TEST_TMPDIR for crash test, so put in
shared common.mk (an upgrade of python.mk)
* Always exporting TEST_TMPDIR eliminates the need to propagate TMPD or
export TEST_TMPDIR in selective places.
* Use --tmpdir option to gnu_parallel so that it doesn't need TMPDIR
environment variable
* Remove obsolete parloop and parallel_check Makefile targets
* Remove undefined, unused function ResetTmpDirForDirectIO()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9961

Test Plan: manual + CI

Reviewed By: riversand963

Differential Revision: D36212178

Pulled By: pdillinger

fbshipit-source-id: b76c1876c4f4d38b37789c2779eaa7c3026824dd
2022-05-06 16:38:06 -07:00
Roman Puchkovskiy
00889cf8f2 Never use String#getBytes() in the production code (#9487)
Summary:
There are encodings that are not ASCII-compatible (like cp1140), so it is possible that a JVM is run with a default encoding in which String#getBytes() would return unexpected values even for ASCII strings.

A little bit of context: https://stackoverflow.com/questions/70913929/can-an-encoding-incompatible-with-ascii-encoding-be-set-as-a-default-encoding-in/70914154

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9487

Reviewed By: riversand963

Differential Revision: D34097728

fbshipit-source-id: afd654ecaf20f6d30d9fc20c6a090398de2585eb
2022-05-06 16:22:15 -07:00
sdong
736a7b5433 Remove own ToString() (#9955)
Summary:
ToString() is created as some platform doesn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just remove ToString().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955

Test Plan: Watch CI tests

Reviewed By: riversand963

Differential Revision: D36176799

fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
2022-05-06 13:03:58 -07:00
Andrew Kryczka
62d84e2a2b db_stress fault injection in release mode (#9957)
Summary:
Previously all fault injection was ignored in release mode. This PR adds it back except for read fault injection (`--read_fault_one_in > 0`) since its dependency (`IGNORE_STATUS_IF_ERROR`) is unavailable in release mode.

Other notable changes include:

- Moved `EnableWriteErrorInjection()` for `--write_fault_one_in > 0` so it's after `DB::Open()` without depending on `SyncPoint`
- Made `--read_fault_one_in > 0` return an error in release mode
- Updated `db_crashtest.py` to always set `--read_fault_one_in=0` in release mode

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9957

Test Plan:
```
$ DEBUG_LEVEL=0 make -j24 db_stress
$ DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox
```

Reviewed By: anand1976

Differential Revision: D36193830

Pulled By: ajkr

fbshipit-source-id: 0b97946b4e3f06e3e0f6e7833c2763da08ec5321
2022-05-06 11:17:08 -07:00
Otto Kekäläinen
b7aaa98762 Fix various spelling errors still found in code (#9653)
Summary:
dont -> don't
refered -> referred

This is a re-run of PR#7785 and acc9679 since these typos keep coming back.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9653

Reviewed By: jay-zhuang

Differential Revision: D34879593

fbshipit-source-id: d7631fb779ea0129beae92abfb838038e60790f8
2022-05-05 19:45:32 -07:00
Andrew Kryczka
a62506aee2 Enable unsynced data loss in crash test (#9947)
Summary:
`db_stress` already tracks expected state history to verify prefix-recoverability when `sync_fault_injection` is enabled. This PR enables `sync_fault_injection` in `db_crashtest.py`.

Previously enabling `sync_fault_injection` would cause whole unsynced files to be dropped. This PR adds a more interesting case of losing only the tail of unsynced data by implementing `TestFSWritableFile::RangeSync()` and enabling `{wal_,}bytes_per_sync`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9947

Test Plan:
- regular blackbox, blackbox --simple
- various commands to stress this new case, such as `TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=100000 --write_buffer_size=2097152 --avoid_flush_during_recovery=1 --disable_wal=0 --interval=10 --db_write_buffer_size=0 --sync_fault_injection=1 --wal_compression=none --delpercent=0 --delrangepercent=0 --prefixpercent=0 --iterpercent=0 --writepercent=100 --readpercent=0 --wal_bytes_per_sync=131072 --duration=36000 --sync=0 --open_write_fault_one_in=16`

Reviewed By: riversand963

Differential Revision: D36152775

Pulled By: ajkr

fbshipit-source-id: 44b68a7fad0a4cf74af9fe1f39be01baab8141d8
2022-05-05 13:21:03 -07:00
sdong
49628c9a83 Use std::numeric_limits<> (#9954)
Summary:
Right now we still don't fully use std::numeric_limits but use a macro, mainly for supporting VS 2013. Right now we only support VS 2017 and up so it is not a problem. The code comment claims that MinGW still needs it. We don't have a CI running MinGW so it's hard to validate. since we now require C++17, it's hard to imagine MinGW would still build RocksDB but doesn't support std::numeric_limits<>.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9954

Test Plan: See CI Runs.

Reviewed By: riversand963

Differential Revision: D36173954

fbshipit-source-id: a35a73af17cdcae20e258cdef57fcf29a50b49e0
2022-05-05 13:08:21 -07:00
sdong
46f8889b6a platform010 gcc (#9946)
Summary:
Make platform010 gcc build work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9946

Test Plan:
ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 make release -j
ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 make all check -j

Reviewed By: pdillinger, mdcallag

Differential Revision: D36152684

fbshipit-source-id: ca7b0916c51501a72bb15ad33a85e8c5cac5b505
2022-05-05 11:45:51 -07:00
Trynity Mirell
e62c23cce4 Generate pkg-config file via CMake (#9945)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/7934

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9945

Test Plan:
Built via Homebrew pointing to my fork/branch:

```
  ~/src/github.com/facebook/fbthrift on   main ❯ cat ~/.homebrew/opt/rocksdb/lib/pkgconfig/rocksdb.pc                                                                                                                                                     took  1h 17m 48s at  04:24:54 pm
prefix="/Users/trynity/.homebrew/Cellar/rocksdb/HEAD-968e4dd"
exec_prefix="${prefix}"
libdir="${prefix}/lib"
includedir="${prefix}/include"

Name: rocksdb
Description: An embeddable persistent key-value store for fast storage
URL: https://rocksdb.org/
Version: 7.3.0
Cflags: -I"${includedir}"
Libs: -L"${libdir}" -lrocksdb
```

Reviewed By: riversand963

Differential Revision: D36161635

Pulled By: trynity

fbshipit-source-id: 0f1a9c30e43797ee65e6696896e06fde0658456e
2022-05-05 09:03:31 -07:00
Yanqin Jin
9d634dd5b6 Rename kRemoveWithSingleDelete to kPurge (#9951)
Summary:
PR 9929 adds a new CompactionFilter::Decision, i.e.
kRemoveWithSingleDelete so that CompactionFilter can indicate to
CompactionIterator that a PUT can only be removed with SD. However, how
CompactionIterator handles such a key is implementation detail which
should not be implied in the public API. In fact,
such a PUT can just be dropped. This is an optimization which we will apply in the near future.

Discussion thread: https://github.com/facebook/rocksdb/pull/9929#discussion_r863198964

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9951

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36156590

Pulled By: riversand963

fbshipit-source-id: 7b7d01f47bba4cad7d9cca6ca52984f27f88b372
2022-05-05 08:16:20 -07:00
sdong
68ac507f96 Printing IO Error in DumpDBFileSummary (#9940)
Summary:
Right now in DumpDBFileSummary, IO error isn't printed out, but they are sometimes helpful. Print it out instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9940

Test Plan: Watch existing tests to pass.

Reviewed By: riversand963

Differential Revision: D36113016

fbshipit-source-id: 13002080fa4dc76589e2c1c5a1079df8a3c9391c
2022-05-04 10:19:53 -07:00
Mark Callaghan
bf68d1c93d Print elapsed time and number of operations completed (#9886)
Summary:
This is inspired by debugging a regression test that runs for ~0.05 seconds and the short
running time makes it prone to variance. While db_bench ran for ~60 seconds, 59.95 seconds
was spent opening 128 databases (and doing recovery). So it was harder to notice that the
benchmark only ran for 0.05 seconds.

Normally I add output to the end of the line to make life easier for existing tools that parse it
but in this case the output near the end of the line has two optional parts and one of the optional
parts adds an extra newline.

This is for https://github.com/facebook/rocksdb/issues/9856

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9886

Test Plan:
./db_bench --benchmarks=overwrite,readrandom --num=1000000 --threads=4

old output:
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 overwrite    :      14.108 micros/op 283338 ops/sec;   31.3 MB/s
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 readrandom   :       7.994 micros/op 496788 ops/sec;   55.0 MB/s (1000000 of 1000000 found)

new output:
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 overwrite    :      14.117 micros/op 282862 ops/sec 14.141 seconds 4000000 operations;   31.3 MB/s
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 readrandom   :       8.649 micros/op 458475 ops/sec 8.725 seconds 4000000 operations;   49.8 MB/s (981548 of 1000000 found)

Reviewed By: ajkr

Differential Revision: D36102269

Pulled By: mdcallag

fbshipit-source-id: 5cd8a9e11f5cbe2a46809571afd83335b6b0caa0
2022-05-04 10:15:49 -07:00
jsteemann
95663ff763 do not call DeleteFile for not-created sst files (#9920)
Summary:
When a memtable is flushed and the flush would lead to a 0 byte .sst
file being created, RocksDB does not write out the empty .sst file to
disk.
However it still calls Env::DeleteFile() on the file as part of some
cleanup procedure at the end of BuildTable().
Because the to-be-deleted file does not exist, this requires
implementors of the DeleteFile() API to check if the file exists on
their own code, or otherwise risk running into PathNotFound errors when
DeleteFile is invoked on non-existing files.
This PR fixes the situation so that when no .sst file is created,
Deletefile will not be called either.
TableFileCreationStarted() will still be called as before.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9920

Reviewed By: ajkr

Differential Revision: D36107102

Pulled By: riversand963

fbshipit-source-id: 15881ba3fa3192dd448f906280a1cfc7a68a114a
2022-05-04 10:15:30 -07:00
Hui Xiao
de537dcaf1 Fix a comment in RateLimiter::RequestToken (#9933)
Summary:
**Context/Summary:**
- As titled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9933

Test Plan: - No code change

Reviewed By: ajkr

Differential Revision: D36086544

Pulled By: hx235

fbshipit-source-id: 2bdd19f67e45df1e3af4121b0c1a5e866a57826d
2022-05-04 10:10:36 -07:00
Jay Zhuang
270179bb12 Default try_load_options to true when DB is specified (#9937)
Summary:
If the DB path is specified, the user would expect ldb loads the
options from the path, but it's not:
```
$ ldb list_live_files_metadata --db=`pwd`
```
Default `try_load_options` to true in that case. The user can still
disable that by:
```
$ ldb list_live_files_metadata --db=`pwd` --try_load_options=false
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9937

Test Plan:
`ldb list_live_files_metadata --db=`pwd`` is able to work for
a db generated with different options.num_levels.

Reviewed By: ajkr

Differential Revision: D36106708

Pulled By: jay-zhuang

fbshipit-source-id: 2732fdc027a4d172436b2c9b6a9787b56b10c710
2022-05-04 08:49:46 -07:00
Xinyu Zeng
8b74cea7fe Reduce comparator objects init cost in BlockIter (#9611)
Summary:
This PR solves the problem discussed in https://github.com/facebook/rocksdb/issues/7149. By storing the pointer of InternalKeyComparator as icmp_ in BlockIter, the object size remains the same. And for each call to CompareCurrentKey, there is no need to create Comparator objects. One can use icmp_ directly or use the "user_comparator" from the icmp_.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9611

Test Plan:
with https://github.com/facebook/rocksdb/issues/9903,

```
$ TEST_TMPDIR=/dev/shm python3.6 ../benchmark/tools/compare.py benchmarks ./db_basic_bench ../rocksdb-pr9611/db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1 --benchmark_repetitions=50
...
Comparing ./db_basic_bench to ../rocksdb-pr9611/db_basic_bench
Benchmark                                                                                                                                                               Time             CPU      Time Old      Time New       CPU Old       CPU New
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
...
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_pvalue                 0.0001          0.0001      U Test, Repetitions: 50 vs 50
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_mean                  -0.0483         -0.0483          3924          3734          3924          3734
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_median                -0.0713         -0.0713          3971          3687          3970          3687
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_stddev                -0.0342         -0.0344           225           217           225           217
DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:1/iterations:262144/threads:1_cv                    +0.0148         +0.0146             0             0             0             0
OVERALL_GEOMEAN                                                                                                                                                      -0.0483         -0.0483             0             0             0             0
```

Reviewed By: akankshamahajan15

Differential Revision: D35882037

Pulled By: ajkr

fbshipit-source-id: 9e5337bbad8f1239dff7aa9f6549020d599bfcdf
2022-05-03 17:37:19 -07:00
Siying Dong
b82edffc7b Improve comments to options.allow_mmap_reads (#9936)
Summary:
It confused users and use that with options.allow_mmap_reads = true, CPU is high with checksum verification. Add a comment to explain it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9936

Reviewed By: anand1976

Differential Revision: D36106529

fbshipit-source-id: 3d723bd686f96a84c694c8b2d91ad28d9ccfd979
2022-05-03 16:21:31 -07:00
Andrew Kryczka
440c7f6306 db_basic_bench fix for DB object cleanup (#9939)
Summary:
Use `unique_ptr<DB>` to make sure the DB object is deleted. Previously it was not, which led to accumulating file descriptors for deleted directories because a `DBImpl::db_dir_` from each test remained alive.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9939

Test Plan: run `lsof -p $(pidof db_basic_bench)` while benchmark runs; verify no FDs for deleted directories.

Reviewed By: jay-zhuang

Differential Revision: D36108761

Pulled By: ajkr

fbshipit-source-id: cfe02646b038a445af7d5db8989eb1f40d658359
2022-05-03 13:38:38 -07:00
Peter Dillinger
bb87164db3 Fork and simplify LRUCache for developing enhancements (#9917)
Summary:
To support a project to prototype and evaluate algorithmic
enhancments and alternatives to LRUCache, here I have separated out
LRUCache into internal-only "FastLRUCache" and cut it down to
essentials, so that details like secondary cache handling and
priorities do not interfere with prototyping. These can be
re-integrated later as needed, along with refactoring to minimize code
duplication (which would slow down prototyping for now).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9917

Test Plan:
unit tests updated to ensure basic functionality has (likely)
been preserved

Reviewed By: anand1976

Differential Revision: D35995554

Pulled By: pdillinger

fbshipit-source-id: d67b20b7ada3b5d3bfe56d897a73885894a1d9db
2022-05-03 12:32:02 -07:00
Peter Dillinger
4b9a1a2f56 Fix db_crashtest.py call inconsistency in crash_test.mk (#9935)
Summary:
Some tests crashing because not using custom DB_STRESS_CMD

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9935

Test Plan: internal tests

Reviewed By: riversand963

Differential Revision: D36104347

Pulled By: pdillinger

fbshipit-source-id: 23f080704a124174203f54ffd85578c2047effe5
2022-05-03 12:03:57 -07:00
Mark Callaghan
b6ec3328af Make --benchmarks=flush flush the default column family (#9887)
Summary:
db_bench --benchmarks=flush wasn't flushing the default column family.

This is for https://github.com/facebook/rocksdb/issues/9880

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9887

Test Plan:
Confirm that flush works (*.log is empty) when "flush" added to benchmark list
Confirm that *.log is not empty otherwise.

Repeat for all combinations for: uses column families, uses multiple databases

./db_bench --benchmarks=overwrite --num=10000
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
-rw-r--r-- 1 me users 1380286 Apr 21 10:47 /tmp/rocksdbtest-2260/dbbench/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
 -rw-r--r-- 1 me users 0 Apr 21 10:48 /tmp/rocksdbtest-2260/dbbench/000008.log

./db_bench --benchmarks=overwrite --num=10000 --num_column_families=4
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
  -rw-r--r-- 1 me users 1387823 Apr 21 10:49 /tmp/rocksdbtest-2260/dbbench/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000 --num_column_families=4
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
-rw-r--r-- 1 me users 0 Apr 21 10:51 /tmp/rocksdbtest-2260/dbbench/000014.log

./db_bench --benchmarks=overwrite --num=10000 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
 -rw-r--r-- 1 me users 1380838 Apr 21 10:55 /tmp/rocksdbtest-2260/dbbench/0/000004.log
 -rw-r--r-- 1 me users 1379734 Apr 21 10:55 /tmp/rocksdbtest-2260/dbbench/1/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
-rw-r--r-- 1 me users 0 Apr 21 10:57 /tmp/rocksdbtest-2260/dbbench/0/000013.log
-rw-r--r-- 1 me users 0 Apr 21 10:57 /tmp/rocksdbtest-2260/dbbench/1/000013.log

./db_bench --benchmarks=overwrite --num=10000 --num_column_families=4 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
-rw-r--r-- 1 me users 1395108 Apr 21 10:52 /tmp/rocksdbtest-2260/dbbench/1/000004.log
-rw-r--r-- 1 me users 1380411 Apr 21 10:52 /tmp/rocksdbtest-2260/dbbench/0/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000 --num_column_families=4 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
-rw-r--r-- 1 me users 0 Apr 21 10:54 /tmp/rocksdbtest-2260/dbbench/0/000022.log
-rw-r--r-- 1 me users 0 Apr 21 10:54 /tmp/rocksdbtest-2260/dbbench/1/000022.log

Reviewed By: ajkr

Differential Revision: D36026777

Pulled By: mdcallag

fbshipit-source-id: d42d3d7efceea7b9a25bbbc0f04461d2b7301122
2022-05-03 09:37:49 -07:00
Yanqin Jin
2b5df21e95 Remove ifdef for try_emplace after upgrading to c++17 (#9932)
Summary:
Test plan
make check

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9932

Reviewed By: ajkr

Differential Revision: D36085404

Pulled By: riversand963

fbshipit-source-id: 2ece14ca0e2e4c1288339ff79e7e126b76eaf786
2022-05-02 19:39:24 -07:00
Andrew Kryczka
cda34dd64a Allow consecutive SingleDelete() in stress/crash test (#9930)
Summary:
We need to support consecutive SingleDelete(), so this PR adds it to the stress/crash tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9930

Test Plan: `python3 tools/db_crashtest.py blackbox --simple --nooverwritepercent=50 --writepercent=90 --delpercent=10 --readpercent=0 --prefixpercent=0 --delrangepercent=0 --iterpercent=0 --max_key=1000000 --duration=3600 --interval=10 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33`

Reviewed By: riversand963

Differential Revision: D36081863

Pulled By: ajkr

fbshipit-source-id: 3566cdbaed375b8003126fc298968eb1a854317f
2022-05-02 16:19:00 -07:00
Yanqin Jin
06394ff4e7 Fix a bug of CompactionIterator/CompactionFilter using Delete (#9929)
Summary:
When compaction filter determines that a key should be removed, it updates the internal key's type
to `Delete`. If this internal key is preserved in current compaction but seen by a later compaction
together with `SingleDelete`, it will cause compaction iterator to return Corruption.

To fix the issue, compaction filter should return more information in addition to the intention of removing
a key. Therefore, we add a new `kRemoveWithSingleDelete` to `CompactionFilter::Decision`. Seeing
`kRemoveWithSingleDelete`, compaction iterator will update the op type of the internal key to `kTypeSingleDelete`.

In addition, I updated db_stress_shared_state.[cc|h] so that `no_overwrite_ids_` becomes `const`. It is easier to
reason about thread-safety if accessed from multiple threads. This information is passed to `PrepareTxnDBOptions()`
when calling from `Open()` so that we can set up the rollback deletion type callback for transactions.

Finally, disable compaction filter for multiops_txn because the key removal logic of `DbStressCompactionFilter` does
not quite work with `MultiOpsTxnsStressTest`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9929

Test Plan:
make check
make crash_test
make crash_test_with_txn

Reviewed By: anand1976

Differential Revision: D36069678

Pulled By: riversand963

fbshipit-source-id: cedd2f1ba958af59ad3916f1ba6f424307955f92
2022-05-02 13:25:45 -07:00
Changyu Bi
37f490834d Specify largest_seqno in VerifyChecksum (#9919)
Summary:
`VerifyChecksum()` does not specify `largest_seqno` when creating a `TableReader`. As a result, the `TableReader` uses the `TableReaderOptions` default value (0) for `largest_seqno`. This causes the following error when the file has a nonzero global seqno in its properties:
```
Corruption: An external sst file with version 2 have global seqno property with value , while largest seqno in the file is 0
```
This PR fixes this by specifying `largest_seqno` in `VerifyChecksumInternal` with `largest_seqno` from the file metadata.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9919

Test Plan: `make check`

Reviewed By: ajkr

Differential Revision: D36028824

Pulled By: cbi42

fbshipit-source-id: 428d028a79386f46ef97bb6b6051dc76c83e1f2b
2022-05-02 10:22:08 -07:00
Yanqin Jin
2b5c29f9f3 Enforce the contract of SingleDelete (#9888)
Summary:
Enforce the contract of SingleDelete so that they are not mixed with
Delete for the same key. Otherwise, it will lead to undefined behavior.
See https://github.com/facebook/rocksdb/wiki/Single-Delete#notes.

Also fix unit tests and write-unprepared.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9888

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D35837817

Pulled By: riversand963

fbshipit-source-id: acd06e4dcba8cb18df92b44ed18c57e10e5a7635
2022-04-28 14:48:27 -07:00
Anvesh Komuravelli
aafb377bb5 Update protection info on recovered logs data (#9875)
Summary:
Update protection info on recovered logs data

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9875

Test Plan:
- Benchmark setup: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576000`
- Benchmark command: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=overwrite -write_buffer_size=1048576000 -writes=1 -report_open_timing=true`
- Results before this PR
```
OpenDb:     2350.14 milliseconds
OpenDb:     2296.94 milliseconds
OpenDb:     2184.29 milliseconds
OpenDb:     2167.59 milliseconds
OpenDb:     2231.24 milliseconds
OpenDb:     2109.57 milliseconds
OpenDb:     2197.71 milliseconds
OpenDb:     2120.8 milliseconds
OpenDb:     2148.12 milliseconds
OpenDb:     2207.95 milliseconds
```
- Results after this PR
```
OpenDb:     2424.52 milliseconds
OpenDb:     2359.84 milliseconds
OpenDb:     2317.68 milliseconds
OpenDb:     2339.4 milliseconds
OpenDb:     2325.36 milliseconds
OpenDb:     2321.06 milliseconds
OpenDb:     2353.98 milliseconds
OpenDb:     2344.64 milliseconds
OpenDb:     2384.09 milliseconds
OpenDb:     2428.58 milliseconds
```

Mean regressed 7.2% (2201.4 -> 2359.9)

Reviewed By: ajkr

Differential Revision: D36012787

Pulled By: akomurav

fbshipit-source-id: d2aba09f29c6beb2fd0fe8e1e359be910b4ef02a
2022-04-28 14:42:00 -07:00
Akanksha Mahajan
fce65e7e4f Fix bug in async_io path which reads incorrect length (#9916)
Summary:
In FilePrefetchBuffer, in case data is overlapping between two
buffers and more data is required to read and copy that to third buffer,
incorrect length was updated resulting in
```
Iterator diverged from control iterator which has value 00000000000310C3000000000000012B0000000000000274 total_order_seek: 1 auto_prefix_mode: 0 S 000000000002C37F000000000000012B000000000000001C NNNPPPPPNN; total_order_seek: 1 auto_prefix_mode: 0 S 000000000002F10B00000000000000BF78787878787878 NNNPNNNNPN; total_order_seek: 1 auto_prefix_mode: 0 S 00000000000310C3000000000000012B000000000000026B
iterator is not valid
Control CF default
db_stress: db_stress_tool/db_stress_test_base.cc:1388: void rocksdb::StressTest::VerifyIterator(rocksdb::ThreadState*, rocksdb::ColumnFamilyHandle*, const rocksdb::ReadOptions&, rocksdb::Iterator*, rocksdb::Iterator*, rocksdb::StressTest::LastIterateOp, const rocksdb::Slice&, const string&, bool*): Assertion `false' failed.
Aborted (core dumped)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9916

Test Plan:
```
- CircleCI jobs
- Ran db_stress with OPTIONS file which caught the bug
 ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=42.26248932628998 --bottommost_compression_type=disable --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compression_max_dict_buffer_bytes=1073741823 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/ --db_write_buffer_size=134217728 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_blob_files=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --fail_if_options_file_error=0 --file_checksum_impl=none --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=12 --index_type=2 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=0 --mark_for_compaction_one_file_in=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_prefix_bloom_size_ratio=0.001 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --read_only=0 --readpercent=50 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_catch_up_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=3 --unpartitioned_pinning=3 --use_blob_db=0 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=0 --use_txn=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --writepercent=35 --options_file=/home/akankshamahajan/OPTIONS.orig -column_families=1

db_bench with async_io enabled to make sure db_bench completes successfully without any failure.
- ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1
```

crash_test in progress

Reviewed By: anand1976

Differential Revision: D35985789

Pulled By: akankshamahajan15

fbshipit-source-id: 5abe185f34caa99ca587d4bdc8954bd0802b1bf9
2022-04-27 22:33:29 -07:00
Yanqin Jin
94e245a14d Improve stress test for MultiOpsTxnsStressTest (#9829)
Summary:
Adds more coverage to `MultiOpsTxnsStressTest` with a focus on write-prepared transactions.

1. Add a hack to manually evict commit cache entries. We currently cannot assign small values to `wp_commit_cache_bits` because it requires a prepared transaction to commit within a certain range of sequence numbers, otherwise it will throw.
2. Add coverage for commit-time-write-batch. If write policy is write-prepared, we need to set `use_only_the_last_commit_time_batch_for_recovery` to true.
3. After each flush/compaction, verify data consistency. This is possible since data size can be small: default numbers of primary/secondary keys are just 1000.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9829

Test Plan:
```
TEST_TMPDIR=/dev/shm/rocksdb_crashtest_blackbox/ make blackbox_crash_test_with_multiops_wp_txn
```

Reviewed By: pdillinger

Differential Revision: D35806678

Pulled By: riversand963

fbshipit-source-id: d7fde7a29fda0fb481a61f553e0ca0c47da93616
2022-04-27 17:50:54 -07:00
Herman Lee
d9d456de49 Fix locktree accesses to PessimisticTransactions (#9898)
Summary:
The current locktree implementation stores the address of the
PessimisticTransactions object as the TXNID. However, when a transaction
is blocked on a lock, it records the list of waitees with conflicting
locks using the rocksdb assigned TransactionID. This is performed by
calling GetID() on PessimisticTransactions objects of the waitees,
and then recorded in the waiter's list.

However, there is no guarantee the objects are valid when recording the
waitee list during the conflict callbacks because the waitee
could have released the lock and freed the PessimisticTransactions
object.

The waitee/txnid values are only valid PessimisticTransaction objects
while the mutex for the root of the locktree is held.

The simplest fix for this problem is to use the address of the
PessimisticTransaction as the TransactionID so that it is consistent
with its usage in the locktree. The TXNID is only converted back to a
PessimisticTransaction for the report_wait callbacks. Since
these callbacks are now all made within the critical section where the
lock_request queue mutx is held, these conversions will be safe.
Otherwise, only the uint64_t TXNID of the waitee is registerd
with the waiter transaction. The PessimisitcTransaction object of the
waitee is never referenced.

The main downside of this approach is the TransactionID will not change
if the PessimisticTransaction object is reused for new transactions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9898

Test Plan:
Add a new test case and run unit tests.
Also verified with MyRocks workloads using range locks that the
crash no longer happens.

Reviewed By: riversand963

Differential Revision: D35950376

Pulled By: hermanlee

fbshipit-source-id: 8c9cae272e23e487fc139b6a8ed5b8f8f24b1570
2022-04-27 09:12:52 -07:00
Paras Sethia
68ee228dec RocksDB: fix bug in crash-recovery correctness testing (#9897)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9897

Fixes https://github.com/facebook/rocksdb/issues/9385.

Update State to reflect the value in the DB after a crash

Reviewed By: ajkr

Differential Revision: D35788808

fbshipit-source-id: 2d21d8537ab380a17cad3e90ac72b3eb1b56de9f
2022-04-27 06:01:09 -07:00
Peter Dillinger
9d0cae7104 Eliminate unnecessary (slow) block cache Ref()ing in MultiGet (#9899)
Summary:
When MultiGet() determines that multiple query keys can be
served by examining the same data block in block cache (one Lookup()),
each PinnableSlice referring to data in that data block needs to hold
on to the block in cache so that they can be released at arbitrary
times by the API user. Historically this is accomplished with extra
calls to Ref() on the Handle from Lookup(), with each PinnableSlice
cleanup calling Release() on the Handle, but this creates extra
contention on the block cache for the extra Ref()s and Release()es,
especially because they hit the same cache shard repeatedly.

In the case of merge operands (possibly more cases?), the problem was
compounded by doing an extra Ref()+eventual Release() for each merge
operand for a key reusing a block (which could be the same key!), rather
than one Ref() per key. (Note: the non-shared case with `biter` was
already one per key.)

This change optimizes MultiGet not to rely on these extra, contentious
Ref()+Release() calls by instead, in the shared block case, wrapping
the cache Release() cleanup in a refcounted object referenced by the
PinnableSlices, such that after the last wrapped reference is released,
the cache entry is Release()ed. Relaxed atomic refcounts should be
much faster than mutex-guarded Ref() and Release(), and much less prone
to a performance cliff when MultiGet() does a lot of block sharing.

Note that I did not use std::shared_ptr, because that would require an
extra indirection object (shared_ptr itself new/delete) in order to
associate a ref increment/decrement with a Cleanable cleanup entry. (If
I assumed it was the size of two pointers, I could do some hackery to
make it work without the extra indirection, but that's too fragile.)

Some details:
* Fixed (removed) extra block cache tracing entries in cases of cache
entry reuse in MultiGet, but it's likely that in some other cases traces
are missing (XXX comment inserted)
* Moved existing implementations for cleanable.h from iterator.cc to
new cleanable.cc
* Improved API comments on Cleanable
* Added a public SharedCleanablePtr class to cleanable.h in case others
could benefit from the same pattern (potentially many Cleanables and/or
smart pointers referencing a shared Cleanable)
* Add a typedef for MultiGetContext::Mask
* Some variable renaming for clarity

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9899

Test Plan:
Added unit tests for SharedCleanablePtr.

Greatly enhanced ability of existing tests to detect cache use-after-free.
* Release PinnableSlices from MultiGet as they are read rather than in
bulk (in db_test_util wrapper).
* In ASAN build, default to using a trivially small LRUCache for block_cache
so that entries are immediately erased when unreferenced. (Updated two
tests that depend on caching.) New ASAN testsuite running time seems
OK to me.

If I introduce a bug into my implementation where we skip the shared
cleanups on block reuse, ASAN detects the bug in
`db_basic_test *MultiGet*`. If I remove either of the above testing
enhancements, the bug is not detected.

Consider for follow-up work: manipulate or randomize ordering of
PinnableSlice use and release from MultiGet db_test_util wrapper. But in
typical cases, natural ordering gives pretty good functional coverage.

Performance test:
In the extreme (but possible) case of MultiGetting the same or adjacent keys
in a batch, throughput can improve by an order of magnitude.
`./db_bench -benchmarks=multireadrandom -db=/dev/shm/testdb -readonly -num=5 -duration=10 -threads=20 -multiread_batched -batch_size=200`
Before ops/sec, num=5: 1,384,394
Before ops/sec, num=500: 6,423,720
After ops/sec, num=500: 10,658,794
After ops/sec, num=5: 16,027,257

Also note that previously, with high parallelism, having query keys
concentrated in a single block was worse than spreading them out a bit. Now
concentrated in a single block is faster than spread out, which is hopefully
consistent with natural expectation.

Random query performance: with num=1000000, over 999 x 10s runs running before & after simultaneously (each -threads=12):
Before: multireadrandom [AVG    999 runs] : 1088699 (± 7344) ops/sec;  120.4 (± 0.8 ) MB/sec
After: multireadrandom [AVG    999 runs] : 1090402 (± 7230) ops/sec;  120.6 (± 0.8 ) MB/sec
Possibly better, possibly in the noise.

Reviewed By: anand1976

Differential Revision: D35907003

Pulled By: pdillinger

fbshipit-source-id: bbd244d703649a8ca12d476f2d03853ed9d1a17e
2022-04-26 21:59:24 -07:00
Andrew Kryczka
ce2d8a4239 fix clang-analyze in corruption_test (#9908)
Summary:
This PR fixes a clang-analyze error that I introduced in https://github.com/facebook/rocksdb/issues/9906:

```
db/corruption_test.cc:358:15: warning: Called C++ object pointer is null
    ASSERT_OK(db_->Put(WriteOptions(), cfhs[0], "k", "v"));
              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./test_util/testharness.h:76:62: note: expanded from macro 'ASSERT_OK'
  ASSERT_PRED_FORMAT1(ROCKSDB_NAMESPACE::test::AssertStatus, s)
                                                             ^
third-party/gtest-1.8.1/fused-src/gtest/gtest.h:19909:36: note: expanded
from macro 'ASSERT_PRED_FORMAT1'
  GTEST_PRED_FORMAT1_(pred_format, v1, GTEST_FATAL_FAILURE_)
                                   ^~
third-party/gtest-1.8.1/fused-src/gtest/gtest.h:19892:34: note: expanded
from macro 'GTEST_PRED_FORMAT1_'
  GTEST_ASSERT_(pred_format(#v1, v1), \
                                 ^~
third-party/gtest-1.8.1/fused-src/gtest/gtest.h:19868:52: note: expanded
from macro 'GTEST_ASSERT_'
  if (const ::testing::AssertionResult gtest_ar = (expression)) \
                                                   ^~~~~~~~~~
1 warning generated.
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9908

Reviewed By: riversand963

Differential Revision: D35953147

Pulled By: ajkr

fbshipit-source-id: 9b837bd7581c6e1e2cdbc961c099652256eb9d4b
2022-04-26 19:21:34 -07:00
Andrew Kryczka
1eb279dcce Add mmap DBGet microbench parameters (#9903)
Summary:
I tried evaluating https://github.com/facebook/rocksdb/issues/9611 using DBGet microbenchmarks but mostly found the change is well within the noise even for hundreds of repetitions; meanwhile, the InternalKeyComparator CPU it saves is 1-2% according to perf so it should be measurable. In this PR I tried adding a mmap mode that will bypass compression/checksum/block cache/file read to focus more on the block lookup paths, and also increased the Get() count.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9903

Reviewed By: jay-zhuang, riversand963

Differential Revision: D35907375

Pulled By: ajkr

fbshipit-source-id: 69490d5040ef0863e1ce296724104d0aa7667215
2022-04-26 16:46:39 -07:00
Andrew Kryczka
c5d367f472 Revert open logic changes in #9634 (#9906)
Summary:
Left HISTORY.md and unit tests.
Added a new unit test to repro the corruption scenario that this PR fixes, and HISTORY.md line for that.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9906

Reviewed By: riversand963

Differential Revision: D35940093

Pulled By: ajkr

fbshipit-source-id: 9816f99e1ce405ba36f316beb4f6378c37c8c86b
2022-04-26 14:46:53 -07:00
Akanksha Mahajan
3653029dda Add stats related to async prefetching (#9845)
Summary:
Add stats PREFETCHED_BYTES_DISCARDED and POLL_WAIT_MICROS.
PREFETCHED_BYTES_DISCARDED records number of prefetched bytes discarded by
FilePrefetchBuffer. POLL_WAIT_MICROS records the time taken by underling
file_system Poll API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9845

Test Plan: Update existing tests

Reviewed By: anand1976

Differential Revision: D35909694

Pulled By: akankshamahajan15

fbshipit-source-id: e009ef940bb9ed72c9446f5529095caabb8a1e36
2022-04-25 21:58:22 -07:00
RoeyMaor
6d2577e567 Bugfix/fix manual flush blocking bug (#9893)
Summary:
Fix https://github.com/facebook/rocksdb/issues/9892

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9893

Reviewed By: jay-zhuang

Differential Revision: D35880959

Pulled By: ajkr

fbshipit-source-id: dad1139ad0983cfbd5c5cd6fa6b71022f889735a
2022-04-25 18:52:33 -07:00
Jaromir Vanek
fb9a167a55 Add 95% confidence intervals to db_bench output (#9882)
Summary:
Enhancing `db_bench` output with 95% statistical confidence intervals for better performance evaluation. The goal is to unambiguously separate random variance when running benchmark over multiple iterations.

Output enhanced with confidence intervals exposed in brackets:

```
$ ./db_bench --benchmarks=fillseq[-X10]

Running benchmark for 10 times
fillseq      :       4.961 micros/op 201578 ops/sec;   22.3 MB/s
fillseq      :       5.030 micros/op 198824 ops/sec;   22.0 MB/s
fillseq [AVG 2 runs] : 200201 (± 2698) ops/sec;   22.1 (± 0.3) MB/sec
fillseq      :       4.963 micros/op 201471 ops/sec;   22.3 MB/s
fillseq [AVG 3 runs] : 200624 (± 1765) ops/sec;   22.2 (± 0.2) MB/sec
fillseq      :       5.035 micros/op 198625 ops/sec;   22.0 MB/s
fillseq [AVG 4 runs] : 200124 (± 1586) ops/sec;   22.1 (± 0.2) MB/sec
fillseq      :       4.979 micros/op 200861 ops/sec;   22.2 MB/s
fillseq [AVG 5 runs] : 200272 (± 1262) ops/sec;   22.2 (± 0.1) MB/sec
fillseq      :       4.893 micros/op 204367 ops/sec;   22.6 MB/s
fillseq [AVG 6 runs] : 200954 (± 1688) ops/sec;   22.2 (± 0.2) MB/sec
fillseq      :       4.914 micros/op 203502 ops/sec;   22.5 MB/s
fillseq [AVG 7 runs] : 201318 (± 1595) ops/sec;   22.3 (± 0.2) MB/sec
fillseq      :       4.998 micros/op 200074 ops/sec;   22.1 MB/s
fillseq [AVG 8 runs] : 201163 (± 1415) ops/sec;   22.3 (± 0.2) MB/sec
fillseq      :       4.946 micros/op 202188 ops/sec;   22.4 MB/s
fillseq [AVG 9 runs] : 201277 (± 1267) ops/sec;   22.3 (± 0.1) MB/sec
fillseq      :       5.093 micros/op 196331 ops/sec;   21.7 MB/s
fillseq [AVG 10 runs] : 200782 (± 1491) ops/sec;   22.2 (± 0.2) MB/sec
fillseq [AVG    10 runs] : 200782 (± 1491) ops/sec;   22.2 (± 0.2) MB/sec
fillseq [MEDIAN 10 runs] : 201166 ops/sec;   22.3 MB/s
```

For more explicit interval representation, use `--confidence_interval_only` flag:

```
$ ./db_bench --benchmarks=fillseq[-X10] --confidence_interval_only

Running benchmark for 10 times
fillseq      :       4.935 micros/op 202648 ops/sec;   22.4 MB/s
fillseq      :       5.078 micros/op 196943 ops/sec;   21.8 MB/s
fillseq [CI95 2 runs] : (194205, 205385) ops/sec; (21.5, 22.7) MB/sec
fillseq      :       5.159 micros/op 193816 ops/sec;   21.4 MB/s
fillseq [CI95 3 runs] : (192735, 202869) ops/sec; (21.3, 22.4) MB/sec
fillseq      :       4.947 micros/op 202158 ops/sec;   22.4 MB/s
fillseq [CI95 4 runs] : (194721, 203061) ops/sec; (21.5, 22.5) MB/sec
fillseq      :       4.908 micros/op 203756 ops/sec;   22.5 MB/s
fillseq [CI95 5 runs] : (196113, 203615) ops/sec; (21.7, 22.5) MB/sec
fillseq      :       5.063 micros/op 197528 ops/sec;   21.9 MB/s
fillseq [CI95 6 runs] : (196319, 202631) ops/sec; (21.7, 22.4) MB/sec
fillseq      :       5.214 micros/op 191799 ops/sec;   21.2 MB/s
fillseq [CI95 7 runs] : (194953, 201803) ops/sec; (21.6, 22.3) MB/sec
fillseq      :       5.260 micros/op 190095 ops/sec;   21.0 MB/s
fillseq [CI95 8 runs] : (193749, 200937) ops/sec; (21.4, 22.2) MB/sec
fillseq      :       5.076 micros/op 196992 ops/sec;   21.8 MB/s
fillseq [CI95 9 runs] : (194134, 200474) ops/sec; (21.5, 22.2) MB/sec
fillseq      :       5.388 micros/op 185603 ops/sec;   20.5 MB/s
fillseq [CI95 10 runs] : (192487, 199781) ops/sec; (21.3, 22.1) MB/sec
fillseq [AVG    10 runs] : 196134 (± 3647) ops/sec;   21.7 (± 0.4) MB/sec
fillseq [MEDIAN 10 runs] : 196968 ops/sec;   21.8 MB/sec
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9882

Reviewed By: pdillinger

Differential Revision: D35796148

Pulled By: vanekjar

fbshipit-source-id: 8313712d16728ff982b8aff28195ee56622385b8
2022-04-25 14:49:54 -07:00
Akanksha Mahajan
5bd374b392 Add experimental new FS API AbortIO to cancel read request (#9901)
Summary:
Add experimental new API AbortIO in FileSystem to abort the
read requests submitted asynchronously through ReadAsync API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9901

Test Plan: Existing tests

Reviewed By: anand1976

Differential Revision: D35885591

Pulled By: akankshamahajan15

fbshipit-source-id: df3944e6e9e6e487af1fa688376b4abb6837fb02
2022-04-25 14:20:03 -07:00
yuzhangyu
ac29645743 Add blob dump support to the dump_live_files command (#9896)
Summary:
This patch completes the second part of the task: "Add blob support to the dump and dump_live_files command"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9896

Reviewed By: ltamasi

Differential Revision: D35852667

Pulled By: jowlyzhang

fbshipit-source-id: a006456c881f468a92da689e895134762e9574e1
2022-04-22 16:54:43 -07:00
yuzhangyu
fff28a7725 Add blob dump support to the dump command (#9881)
Summary:
This patch is the first part of adding blob dump support. It only adds blob dump support to the dump command. A follow up patch will add blob dump support to the dump_live_files command.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9881

Reviewed By: ltamasi

Differential Revision: D35796731

Pulled By: jowlyzhang

fbshipit-source-id: 2cc5973b222d505a331ac7b969edcf992b47c5ee
2022-04-21 20:37:07 -07:00
Yanqin Jin
d13825e586 Add rollback_deletion_type_callback to TxnDBOptions (#9873)
Summary:
This PR does not affect write-committed.

Add a member, `rollback_deletion_type_callback` to TransactionDBOptions
so that a write-prepared transaction, when rolling back, can call this
callback to decide if a `Delete` or `SingleDelete` should be used to
cancel a prior `Put` written to the database during prepare phase.

The purpose of this PR is to prevent mixing `Delete` and `SingleDelete`
for the same key, causing undefined behaviors. Without this PR, the
following can happen:

```
// The application always issues SingleDelete when deleting keys.

txn1->Put('a');
txn1->Prepare(); // writes to memtable and potentially gets flushed/compacted to Lmax
txn1->Rollback();  // inserts DELETE('a')

txn2->Put('a');
txn2->Commit();  // writes to memtable and potentially gets flushed/compacted
```

In the database, we may have
```
L0:   [PUT('a', s=100)]
L1:   [DELETE('a', s=90)]
Lmax: [PUT('a', s=0)]
```

If a compaction compacts L0 and L1, then we have
```
L1:    [PUT('a', s=100)]
Lmax:  [PUT('a', s=0)]
```

If a future transaction issues a SingleDelete, we have
```
L0:    [SD('a', s=110)]
L1:    [PUT('a', s=100)]
Lmax:  [PUT('a', s=0)]
```

Then, a compaction including L0, L1 and Lmax leads to
```
Lmax:  [PUT('a', s=0)]
```

which is incorrect.

Similar bugs reported and addressed in
https://github.com/cockroachdb/pebble/issues/1255. Based on our team's
current priority, we have decided to take this approach for now. We may
come back and revisit in the future.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9873

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D35762170

Pulled By: riversand963

fbshipit-source-id: b28d56eefc786b53c9844b9ef4a7807acdd82c8d
2022-04-20 18:57:32 -07:00
Peter Dillinger
1bac873fcf Mark GetLiveFilesStorageInfo ready for production use (#9868)
Summary:
... by filling out remaining testing hole: handling of
db_pathsi+cf_paths. (Note that while GetLiveFilesStorageInfo works
with db_paths / cf_paths, Checkpoint and BackupEngine do not and
are marked appropriately.)

Also improved comments for "live files" APIs, and grouped them
together in db.h.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9868

Test Plan: Adding to existing unit tests

Reviewed By: jay-zhuang

Differential Revision: D35752254

Pulled By: pdillinger

fbshipit-source-id: c70eb67748fad61826e2f554b674638700abefb2
2022-04-20 16:09:34 -07:00
Jay Zhuang
2ea4205a69 Add 7.2 to compatible check (#9858)
Summary:
Add 7.2 to compatible check (should change it with version update).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9858

Reviewed By: riversand963

Differential Revision: D35722897

Pulled By: jay-zhuang

fbshipit-source-id: 08c782b9344599d7296543eb0c61afcd9a869a1a
2022-04-20 11:34:20 -07:00
yuzhangyu
9b5790f018 Add --decode_blob_index option to idump and dump commands (#9870)
Summary:
This patch completes the first part of the task: "Extend all three commands so they can decode and print blob references if a new option --decode_blob_index is specified"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9870

Reviewed By: ltamasi

Differential Revision: D35753932

Pulled By: jowlyzhang

fbshipit-source-id: 9d2bbba0eef2ed86b982767eba9de1b4881f35c9
2022-04-20 11:10:20 -07:00
Hui Xiao
a5063c8931 Fix issue of opening too many files in BlockBasedTableReaderCapMemoryTest.CapMemoryUsageUnderCacheCapacity (#9869)
Summary:
**Context:**
Unit test for https://github.com/facebook/rocksdb/pull/9748 keeps opening new files to see whether the new feature is able to correctly constrain the opening based on block cache capacity.

However, the unit test has two places written inefficiently that can lead to opening too many new files relative to underlying operating system/file system constraint, even before hitting the block cache capacity:
(1) [opened_table_reader_num < 2 * max_table_reader_num](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR438), which can leads to 1200 + open files because of (2) below
(2) NewLRUCache(6 * CacheReservationManagerImpl<CacheEntryRole::kBlockBasedTableReader>::GetDummyEntrySize()) in [here](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR364)

Therefore we see CI failures like this on machine with a strict open file limit ~1000 (see the "table_1021" naming in following error msg)
https://app.circleci.com/pipelines/github/facebook/rocksdb/12886/workflows/75524682-3fa4-41ee-9a61-81827b51d99b/jobs/345270
```
fs_->NewWritableFile(path, foptions, &file, nullptr)
IO error: While open a file for appending: /dev/shm/rocksdb.Jedwt/run-block_based_table_reader_test-CapMemoryUsageUnderCacheCapacity-BlockBasedTableReaderCapMemoryTest.CapMemoryUsageUnderCacheCapacity-0/block_based_table_reader_test_1668910_829492452552920927/**table_1021**: Too many open files
```

**Summary:**
- Revised the test more efficiently on the above 2 places,  including using 1.1 instead 2 in the threshold and lowering down the block cache capacity a bit
- Renamed some variables for clarity

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9869

Test Plan:
- Manual inspection of max opened table reader in all test case, which is around ~389
- Circle CI to see if error is gone

Reviewed By: ajkr

Differential Revision: D35752655

Pulled By: hx235

fbshipit-source-id: 8a0953d39d561babfa4257b8ed8550bb21b04839
2022-04-19 19:02:00 -07:00
Bo Wang
01fdec23fe Add release note for #9747 (#9874)
Summary:
Add release note for CompressedSecondaryCache and the update of SecondaryCache::Lookup().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9874

Reviewed By: jay-zhuang

Differential Revision: D35765973

Pulled By: gitbw95

fbshipit-source-id: 98232508c4f2047216def9c11a038cfb98709690
2022-04-19 18:24:58 -07:00
Peter Dillinger
682fc8ba6a Release note for #9546 (#9872)
Summary:
We don't really have a mechanism for internal-only release
notes, so adding this to the standard release notes. For picking into
7.2 release.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9872

Test Plan: release note only

Reviewed By: jay-zhuang

Differential Revision: D35761307

Pulled By: pdillinger

fbshipit-source-id: 5d1932767fff48456323df948604dbb956ac27b2
2022-04-19 16:44:05 -07:00
Federico Guerinoni
bbf5867353 Add C API for setting strict_capacity_limit (#9855)
Summary:
This allows to set with true the field `strict_capacity_limit` from C
API and other languages that wrap that.

Signed-off-by: Federico Guerinoni <guerinoni.federico@gmail.com>

Closes: https://github.com/facebook/rocksdb/issues/9707

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9855

Reviewed By: ajkr

Differential Revision: D35724150

Pulled By: jay-zhuang

fbshipit-source-id: d8514797e9d90b1cd88329018f9ac4776722aa0f
2022-04-19 09:34:02 -07:00
Andrew Kryczka
690f1edf37 Avoid overwriting OPTIONS file settings in db_bench (#9862)
Summary:
`InitializeOptionsGeneral()` was overwriting many options that were already configured by OPTIONS file, potentially with the flag default values. This PR changes that function to only overwrite options in limited scenarios, as described at the top of its definition. Block cache is still a violation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9862

Test Plan: ran under various scenarios (multi-DB, single DB, OPTIONS file, flags) and verified options are set as expected

Reviewed By: jay-zhuang

Differential Revision: D35736960

Pulled By: ajkr

fbshipit-source-id: 75b77740af37e6f5741618f8a8f5685df2417d03
2022-04-18 23:46:16 -07:00
Peter Dillinger
1601433b3a Misc CI improvements / additions (#9859)
Summary:
* Add valgrind test to nightly CircleCI (in case it can catch something that
ASAN/UBSAN does not)
* Add clang13+asan+ubsan+folly test to nightly CircleCI, for broader testing
* Consolidate many copies of ASAN_OPTIONS= while also allowing it to be
inherited from parent environment rather than always overridden.
* Move UBSAN exclusion from Makefile into options_settable_test.cc

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9859

Test Plan: CI

Reviewed By: jay-zhuang

Differential Revision: D35730903

Pulled By: pdillinger

fbshipit-source-id: 6f5464034e8115f9a07f6f7aec1de9219ec2837c
2022-04-18 20:26:37 -07:00
Hui Xiao
e83c55439a Conditionally declare and define variable that is unused in LITE mode (#9854)
Summary:
Context:
As mentioned in https://github.com/facebook/rocksdb/issues/9701, we have the following in LITE=1 make static_lib for v7.0.2
```
  CC       file/sequence_file_reader.o
  CC       file/sst_file_manager_impl.o
  CC       file/writable_file_writer.o
In file included from file/writable_file_writer.cc:10:
./file/writable_file_writer.h:163:15: error: private field 'temperature_' is not used [-Werror,-Wunused-private-field]
  Temperature temperature_;
              ^
1 error generated.
make: *** [file/writable_file_writer.o] Error 1
```

 as titled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9854

Test Plan:
- Local `LITE=1 make static_lib` reveals the same error and error is gone after this fix
- CI

Reviewed By: ajkr, jay-zhuang

Differential Revision: D35706585

Pulled By: hx235

fbshipit-source-id: 7743310298231ad6866304ffa2225c8abdc91d9a
2022-04-18 14:16:35 -07:00
Peter Dillinger
41237dd306 Add "no compression" job to CircleCI (#9850)
Summary:
Since they operate at distinct abstraction layers, I thought it
was prudent to combine with EncryptedEnv CI test for each PR, for efficiency
in testing. Also added supported compressions to sst_dump --help output
so that CI job can verify no compiled-in compression support.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9850

Test Plan: CI, some manual stuff

Reviewed By: riversand963

Differential Revision: D35682346

Pulled By: pdillinger

fbshipit-source-id: be9879c1533fed304ee32c89fd9ba4b07c2b90cc
2022-04-18 12:47:16 -07:00
Jay Zhuang
3d473235d4 Update main version.h to NEXT release (7.3) (#9852)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9852

Reviewed By: ajkr

Differential Revision: D35694753

Pulled By: jay-zhuang

fbshipit-source-id: 729d416afc588e5db2367e899589bbb5419820d6
2022-04-18 10:26:21 -07:00
Jay Zhuang
673ada8225 Update HISTORY.md for 7.2 release (#9848)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9848

Reviewed By: riversand963

Differential Revision: D35677606

Pulled By: jay-zhuang

fbshipit-source-id: 8a597ea47f302a6f51fb6672a33c848d613bccfc
2022-04-16 17:15:47 -07:00
sdong
4f9c0fd083 Add Aggregation Merge Operator (#9780)
Summary:
Add a merge operator that allows users to register specific aggregation function so that they can does aggregation based per key using different aggregation types.
See comments of function CreateAggMergeOperator() for actual usage.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9780

Test Plan: Add a unit test to coverage various cases.

Reviewed By: ltamasi

Differential Revision: D35267444

fbshipit-source-id: 5b02f31c4f3e17e96dd4025cdc49fca8c2868628
2022-04-15 23:24:05 -07:00
Levi Tamasi
db536ee045 Propagate errors from UpdateBoundaries (#9851)
Summary:
In `FileMetaData`, we keep track of the lowest-numbered blob file
referenced by the SST file in question for the purposes of BlobDB's
garbage collection in the `oldest_blob_file_number` field, which is
updated in `UpdateBoundaries`. However, with the current code,
`BlobIndex` decoding errors (or invalid blob file numbers) are swallowed
in this method. The patch changes this by propagating these errors
and failing the corresponding flush/compaction. (Note that since blob
references are generated by the BlobDB code and also parsed by
`CompactionIterator`, in reality this can only happen in the case of
memory corruption.)

This change necessitated updating some unit tests that involved
fake/corrupt `BlobIndex` objects. Some of these just used a dummy string like
`"blob_index"` as a placeholder; these were replaced with real `BlobIndex`es.
Some were relying on the earlier behavior to simulate corruption; these
were replaced with `SyncPoint`-based test code that corrupts a valid
blob reference at read time.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9851

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D35683671

Pulled By: ltamasi

fbshipit-source-id: f7387af9945c48e4d5c4cd864f1ba425c7ad51f6
2022-04-15 20:25:48 -07:00
Yanqin Jin
be81609b43 Add a fail_if_not_bottommost_level to IngestExternalFileOptions (#9849)
Summary:
This new options allows application to specify that files must be
ingested to bottommost level, otherwise the ingestion will fail instead
of silently ingesting to a non-bottommost level.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9849

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D35680307

Pulled By: riversand963

fbshipit-source-id: 01cf54ef6c76198f7654dc06b5544631dea1be1e
2022-04-15 18:12:06 -07:00
Akanksha Mahajan
0c7f455f85 Make initial auto readahead_size configurable (#9836)
Summary:
Make initial auto readahead_size configurable

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9836

Test Plan:
Added new unit test
Ran regression:
Without change:

```
./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.0
Date:       Thu Mar 17 13:11:34 2022
CPU:        24 * Intel Core Processor (Broadwell)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
```

With this change:
```
 ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
Set seed to 1649895440554504 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.2
Date:       Wed Apr 13 17:17:20 2022
CPU:        24 * Intel Core Processor (Broadwell)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
... finished 100 ops
seekrandom   :  476892.488 micros/op 2 ops/sec;  344.6 MB/s (252 of 252 found)
```

Reviewed By: anand1976

Differential Revision: D35632815

Pulled By: akankshamahajan15

fbshipit-source-id: c8057a88f9294c9d03b1d434b03affe02f74d796
2022-04-15 17:28:09 -07:00
sdong
d5dfa8c6fe Upgrade development environment. (#9843)
Summary:
It's to support Meta's internal environment platform010. Gcc still doesn't work but USE_CLANG=1 should work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9843

Test Plan: Try to make and ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_CLANG=1 make

Reviewed By: pdillinger

Differential Revision: D35652507

fbshipit-source-id: a4a14b2fa4a2d6ca6fbf1b65060e81c39f079363
2022-04-15 16:05:38 -07:00
Jay Zhuang
e91ec64cac Remove flaky servicelab metrics DBPut P95/P99 (#9844)
Summary:
The P95 and P99 metrics are flaky, similar to DBGet ones which removed
in https://github.com/facebook/rocksdb/issues/9742 .

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9844

Test Plan: `$ ./buckifier/buckify_rocksdb.py`

Reviewed By: ajkr

Differential Revision: D35655531

Pulled By: jay-zhuang

fbshipit-source-id: c1409f0fba4e23d461a65f988c27ac5e2ae85d13
2022-04-15 13:56:22 -07:00
yuzhangyu
082eb04200 Add option --decode_blob_index to dump_live_files command (#9842)
Summary:
This change only add decode blob index support to dump_live_files command, which is part of a task to add blob support to a few commands.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842

Reviewed By: ltamasi

Differential Revision: D35650167

Pulled By: jowlyzhang

fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14
2022-04-15 09:04:04 -07:00
Yanqin Jin
fe63899d1a Add checks to GetUpdatesSince (#9459)
Summary:
Make `DB::GetUpdatesSince` return early if told to scan WALs generated by transactions
with write-prepared or write-unprepared policies (`seq_per_batch` is true), as indicated by
API comment.

Also add checks to `TransactionLogIterator` to clarify some conditions.

No API change.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9459

Test Plan:
make check

Closing https://github.com/facebook/rocksdb/issues/1565

Reviewed By: akankshamahajan15

Differential Revision: D33821243

Pulled By: riversand963

fbshipit-source-id: c8b155d020ce0980e2d3b3b1da40b96e65b48d79
2022-04-14 17:12:16 -07:00
Yanqin Jin
0bd4dcde6b CompactionIterator sees consistent view of which keys are committed (#9830)
Summary:
**This PR does not affect the functionality of `DB` and write-committed transactions.**

`CompactionIterator` uses `KeyCommitted(seq)` to determine if a key in the database is committed.
As the name 'write-committed' implies, if write-committed policy is used, a key exists in the database only if
it is committed. In fact, the implementation of `KeyCommitted()` is as follows:

```
inline bool KeyCommitted(SequenceNumber seq) {
  // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
  return snapshot_checker_ == nullptr ||
         snapshot_checker_->CheckInSnapshot(seq, kMaxSequence) == SnapshotCheckerResult::kInSnapshot;
}
```

With that being said, we focus on write-prepared/write-unprepared transactions.

A few notes:
- A key can exist in the db even if it's uncommitted. Therefore, we rely on `snapshot_checker_` to determine data visibility. We also require that all writes go through transaction API instead of the raw `WriteBatch` + `Write`, thus at most one uncommitted version of one user key can exist in the database.
- `CompactionIterator` outputs a key as long as the key is uncommitted.

Due to the above reasons, it is possible that `CompactionIterator` decides to output an uncommitted key without
doing further checks on the key (`NextFromInput()`). By the time the key is being prepared for output, the key becomes
committed because the `snapshot_checker_(seq, kMaxSequence)` becomes true in the implementation of `KeyCommitted()`.
Then `CompactionIterator` will try to zero its sequence number and hit assertion error if the key is a tombstone.

To fix this issue, we should make the `CompactionIterator` see a consistent view of the input keys. Note that
for write-prepared/write-unprepared, the background flush/compaction jobs already take a "job snapshot" before starting
processing keys. The job snapshot is released only after the entire flush/compaction finishes. We can use this snapshot
to determine whether a key is committed or not with minor change to `KeyCommitted()`.

```
inline bool KeyCommitted(SequenceNumber sequence) {
  // For non-txn-db and write-committed, snapshot_checker_ is always nullptr.
  return snapshot_checker_ == nullptr ||
         snapshot_checker_->CheckInSnapshot(sequence, job_snapshot_) ==
             SnapshotCheckerResult::kInSnapshot;
}
```

As a result, whether a key is committed or not will remain a constant throughout compaction, causing no trouble
for `CompactionIterator`s assertions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9830

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D35561162

Pulled By: riversand963

fbshipit-source-id: 0e00d200c195240341cfe6d34cbc86798b315b9f
2022-04-14 11:11:04 -07:00
Jonathan Albrecht
844a35108b Fix minimum libzstd version that supports ZSTD_STREAMING (#9841)
Summary:
The minimum libzstd version that has `ZSTD_compressStream2` is
1.4.0 so only define ZSTD_STREAMING in that case.

Fixes building on Ubuntu 18.04 which has libzstd 1.3.3 as its
repository version.

Fixes https://github.com/facebook/rocksdb/issues/9795

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9841

Test Plan:
Build and test on Ubuntu 18.04 with:
  apt-get install libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev \
    libzstd-dev libgflags-dev g++ make curl

Reviewed By: ajkr

Differential Revision: D35648738

Pulled By: jay-zhuang

fbshipit-source-id: 2a9e969bcc17a7dc10172f3817283409de885811
2022-04-14 11:05:39 -07:00
Andrew Kryczka
d6e016be6d Expose CacheEntryRole and map keys for block cache stat collections (#9838)
Summary:
This gives users the ability to examine the map populated by `GetMapProperty()` with property `kBlockCacheEntryStats`. It also sets us up for a possible future where cache reservations are configured according to `CacheEntryRole`s rather than flags coupled to roles.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9838

Test Plan:
- migrated test DBBlockCacheTest.CacheEntryRoleStats to use this API. That test verifies some of the contents are as expected
- added a DBPropertiesTest to verify the public map keys are present, and nothing else

Reviewed By: hx235

Differential Revision: D35629493

Pulled By: ajkr

fbshipit-source-id: 5c4356b8560e85d1f881fd32c44c15960b02fc68
2022-04-14 09:38:55 -07:00
Peter Dillinger
fefacd33e3 Add db_stress to buck build (#9840)
Summary:
For internal testing purposes (minimal deps)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9840

Test Plan: buck build :db_stress

Reviewed By: hx235

Differential Revision: D35635192

Pulled By: pdillinger

fbshipit-source-id: eefca3bcea174de6fdcdc1c763774f3134c7342c
2022-04-13 23:54:35 -07:00
Peter Dillinger
b3a6fb7e86 Serialize a space-hungry test (#9837)
Summary:
Tends to fill up /dev/shm

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9837

Test Plan: Some manual testing

Reviewed By: hx235

Differential Revision: D35627568

Pulled By: pdillinger

fbshipit-source-id: 22710f7b10bc287570475dae42318dd346f78db9
2022-04-13 17:10:43 -07:00
Levi Tamasi
5645207758 Expose the amount of garbage in live blob files as a dedicated DB property (#9835)
Summary:
This information has been already available as part of the `rocksdb.blob-stats`
string property. The patch adds a dedicated integer property to make it easier
to surface this information in monitoring systems.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9835

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D35619495

Pulled By: ltamasi

fbshipit-source-id: 03fb0b228aa27d3859a1e3783bcb7eca095607f8
2022-04-13 13:36:30 -07:00
Jay Zhuang
dc1c90c4e3 Support canceling running RemoteCompaction on remote side (#9725)
Summary:
Add the ability to cancel remote compaction on the remote side by
setting `OpenAndCompactOptions.canceled` to true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9725

Test Plan: added unittest

Reviewed By: ajkr

Differential Revision: D35018800

Pulled By: jay-zhuang

fbshipit-source-id: be3652f9645e0347df429e42a5614d5a9b3a1ec4
2022-04-13 13:28:09 -07:00
Siying Dong
9454e744ed Update supported VS versions in INSTALL.md (#9823)
Summary:
We only run CI for VS2017 and VS2019 now, so the claim that users can build with "VS13" is stale.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9823

Reviewed By: riversand963

Differential Revision: D35511401

fbshipit-source-id: e3ae2643e26ab46753fea439599d2ed98abba439
2022-04-13 13:03:40 -07:00
Peter Dillinger
7c7df1850a Update main version.h to NEXT release (#9834)
Summary:
Henceforth, the version number in version.h shall reflect the
*next* version number to be tagged (to the best of our knowledge) rather
than the *previous* (unpatched) version.

The primary advantage is being able to distinguish (in source code `#if`s
or human running tools) the development version from the last released
version.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9834

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D35617373

Pulled By: pdillinger

fbshipit-source-id: f3286089d17b82409e6af08e5aa9c1affefe2862
2022-04-13 12:16:07 -07:00
Peter Dillinger
efd035164b Meta-internal folly integration with F14FastMap (#9546)
Summary:
Especially after updating to C++17, I don't see a compelling case for
*requiring* any folly components in RocksDB. I was able to purge the existing
hard dependencies, and it can be quite difficult to strip out non-trivial components
from folly for use in RocksDB. (The prospect of doing that on F14 has changed
my mind on the best approach here.)

But this change creates an optional integration where we can plug in
components from folly at compile time, starting here with F14FastMap to replace
std::unordered_map when possible (probably no public APIs for example). I have
replaced the biggest CPU users of std::unordered_map with compile-time
pluggable UnorderedMap which will use F14FastMap when USE_FOLLY is set.
USE_FOLLY is always set in the Meta-internal buck build, and a simulation of
that is in the Makefile for public CI testing. A full folly build is not needed, but
checking out the full folly repo is much simpler for getting the dependency,
and anything else we might want to optionally integrate in the future.

Some picky details:
* I don't think the distributed mutex stuff is actually used, so it was easy to remove.
* I implemented an alternative to `folly::constexpr_log2` (which is much easier
in C++17 than C++11) so that I could pull out the hard dependencies on
`ConstexprMath.h`
* I had to add noexcept move constructors/operators to some types to make
F14's complainUnlessNothrowMoveAndDestroy check happy, and I added a
macro to make that easier in some common cases.
* Updated Meta-internal buck build to use folly F14Map (always)

No updates to HISTORY.md nor INSTALL.md as this is not (yet?) considered a
production integration for open source users.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9546

Test Plan:
CircleCI tests updated so that a couple of them use folly.

Most internal unit & stress/crash tests updated to use Meta-internal latest folly.
(Note: they should probably use buck but they currently use Makefile.)

Example performance improvement: when filter partitions are pinned in cache,
they are tracked by PartitionedFilterBlockReader::filter_map_ and we can build
a test that exercises that heavily. Build DB with

```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters
```

and test with (simultaneous runs with & without folly, ~20 times each to see
convergence)

```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench_folly -readonly -use_existing_db -benchmarks=readrandom -num=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters -duration=40 -pin_l0_filter_and_index_blocks_in_cache
```

Average ops/s no folly: 26229.2
Average ops/s with folly: 26853.3 (+2.4%)

Reviewed By: ajkr

Differential Revision: D34181736

Pulled By: pdillinger

fbshipit-source-id: ffa6ad5104c2880321d8a1aa7187e00ab0d02e94
2022-04-13 07:34:01 -07:00
Jay Zhuang
f934a0af46 Add event listener support on remote compactor side (#9821)
Summary:
So the user is able to set event listener on the compactor
side.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9821

Test Plan: unittest added

Reviewed By: ajkr

Differential Revision: D35485388

Pulled By: jay-zhuang

fbshipit-source-id: 669d8a3aaee012b75b940470306756c03ffa09b2
2022-04-12 17:25:36 -07:00
Kinan Dak Albab
1eee99fc8c Fix usage of USE_RTTI flag in CMakeLists. (#9760)
Summary:
By default, rocksdb release compiles with `-fno-rtti`. This causes issues when linking with other code that requires RTTI. Documentation indicate that setting the environment variable `USE_RTTI=1` when compiling rocksdb can override this behavior so that `-fno-rtti` is not used (http://rocksdb.org/blog/2017/09/28/rocksdb-5-8-released.html). However, this environment flag had no effect due to a bug in how `CMakeLists.txt` refers to `USE_RTTI`. This PR fixes this issue.

Now, running `USE_RTTI=1 cmake <......>` is correctly recognized by cmake, and causes `ROCKSDB_USE_RTTI `to be defined and `-fno-rtti` not to be issued for release builds. Behavior when USE_RTTI=0 or USE_RTTI is not provided is unchanged.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9760

Reviewed By: jay-zhuang

Differential Revision: D35334552

Pulled By: mrambacher

fbshipit-source-id: e405fcac4e14b246642e52bc7e73b04bf143e5b6
2022-04-12 12:12:23 -07:00
dependabot[bot]
0b81efed1d Bump nokogiri from 1.13.3 to 1.13.4 in /docs (#9831)
Summary:
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.3 to 1.13.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p>
<blockquote>
<h2>1.13.4 / 2022-04-11</h2>
<h3>Security</h3>
<ul>
<li>Address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-24836">CVE-2022-24836</a>, a regular expression denial-of-service vulnerability. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-crjr-9rc5-ghw8">GHSA-crjr-9rc5-ghw8</a> for more information.</li>
<li>[CRuby] Vendored zlib is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2018-25032">CVE-2018-25032</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-v6gp-9mmm-c6p5">GHSA-v6gp-9mmm-c6p5</a> for more information.</li>
<li>[JRuby] Vendored Xerces-J (<code>xerces:xercesImpl</code>) is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-23437">CVE-2022-23437</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-xxx9-3xcr-gjj3">GHSA-xxx9-3xcr-gjj3</a> for more information.</li>
<li>[JRuby] Vendored nekohtml (<code>org.cyberneko.html</code>) is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-24839">CVE-2022-24839</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-gx8x-g87m-h5q6">GHSA-gx8x-g87m-h5q6</a> for more information.</li>
</ul>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] Vendored zlib is updated from 1.2.11 to 1.2.12. (See <a href="https://github.com/sparklemotion/nokogiri/blob/v1.13.x/LICENSE-DEPENDENCIES.md#platform-releases">LICENSE-DEPENDENCIES.md</a> for details on which packages redistribute this library.)</li>
<li>[JRuby] Vendored Xerces-J (<code>xerces:xercesImpl</code>) is updated from 2.12.0 to 2.12.2.</li>
<li>[JRuby] Vendored nekohtml (<code>org.cyberneko.html</code>) is updated from a fork of 1.9.21 to 1.9.22.noko2. This fork is now publicly developed at <a href="https://github.com/sparklemotion/nekohtml">https://github.com/sparklemotion/nekohtml</a></li>
</ul>
<hr />
<p>sha256sum:</p>
<pre><code>095ff1995ed3dda3ea98a5f08bdc54bef02be1ce4e7c81034c4812e5e7c6e7e3  nokogiri-1.13.4-aarch64-linux.gem
7ebfc7415c819bcd4e849627e879cef2fb328bec90e802e50d74ccd13a60ec75  nokogiri-1.13.4-arm64-darwin.gem
41efd87c121991de26ef0393ac713d687e539813c3b79e454a2e3ffeecd107ea  nokogiri-1.13.4-java.gem
ab547504692ada0cec9d2e4e15afab659677c3f4c1ac3ea639bf5212b65246a1  nokogiri-1.13.4-x64-mingw-ucrt.gem
fa5c64cfdb71642ed647428e4d0d75ee0f4d189cfb63560c66fd8bdf99eb146b  nokogiri-1.13.4-x64-mingw32.gem
d6f07cbcbc28b75e8ac5d6e729ffba3602dffa0ad16ffac2322c9b4eb9b971fc  nokogiri-1.13.4-x86-linux.gem
0f7a4fd13e25abe3f98663fef0d115d58fdeff62cf23fef12d368e42adad2ce6  nokogiri-1.13.4-x86-mingw32.gem
3eef282f00ad360304fbcd5d72eb1710ff41138efda9513bb49eec832db5fa3e  nokogiri-1.13.4-x86_64-darwin.gem
3978610354ec67b59c128d23259c87b18374ee1f61cb9ed99de7143a88e70204  nokogiri-1.13.4-x86_64-linux.gem
0d46044eb39271e3360dae95ed6061ce17bc0028d475651dc48db393488c83bc  nokogiri-1.13.4.gem
</code></pre>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/blob/v1.13.4/CHANGELOG.md">nokogiri's changelog</a>.</em></p>
<blockquote>
<h2>1.13.4 / 2022-04-11</h2>
<h3>Security</h3>
<ul>
<li>Address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-24836">CVE-2022-24836</a>, a regular expression denial-of-service vulnerability. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-crjr-9rc5-ghw8">GHSA-crjr-9rc5-ghw8</a> for more information.</li>
<li>[CRuby] Vendored zlib is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2018-25032">CVE-2018-25032</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-v6gp-9mmm-c6p5">GHSA-v6gp-9mmm-c6p5</a> for more information.</li>
<li>[JRuby] Vendored Xerces-J (<code>xerces:xercesImpl</code>) is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-23437">CVE-2022-23437</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-xxx9-3xcr-gjj3">GHSA-xxx9-3xcr-gjj3</a> for more information.</li>
<li>[JRuby] Vendored nekohtml (<code>org.cyberneko.html</code>) is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-24839">CVE-2022-24839</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-gx8x-g87m-h5q6">GHSA-gx8x-g87m-h5q6</a> for more information.</li>
</ul>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] Vendored zlib is updated from 1.2.11 to 1.2.12. (See <a href="https://github.com/sparklemotion/nokogiri/blob/v1.13.x/LICENSE-DEPENDENCIES.md#platform-releases">LICENSE-DEPENDENCIES.md</a> for details on which packages redistribute this library.)</li>
<li>[JRuby] Vendored Xerces-J (<code>xerces:xercesImpl</code>) is updated from 2.12.0 to 2.12.2.</li>
<li>[JRuby] Vendored nekohtml (<code>org.cyberneko.html</code>) is updated from a fork of 1.9.21 to 1.9.22.noko2. This fork is now publicly developed at <a href="https://github.com/sparklemotion/nekohtml">https://github.com/sparklemotion/nekohtml</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="4e2c4b2571"><code>4e2c4b2</code></a> version bump to v1.13.4</li>
<li><a href="6a20ee4d5d"><code>6a20ee4</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2510">https://github.com/facebook/rocksdb/issues/2510</a> from sparklemotion/flavorjones-encoding-reader-perfo...</li>
<li><a href="b848031a59"><code>b848031</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2509">https://github.com/facebook/rocksdb/issues/2509</a> from sparklemotion/flavorjones-parse-processing-inst...</li>
<li><a href="c0ecf3b6ef"><code>c0ecf3b</code></a> test: pend the LIBXML_LOADED_VERSION test on freebsd</li>
<li><a href="e444525ef1"><code>e444525</code></a> fix(perf): HTML4::EncodingReader detection</li>
<li><a href="1eb5580666"><code>1eb5580</code></a> style(rubocop): allow intentional use of empty initializer</li>
<li><a href="0feac5af68"><code>0feac5a</code></a> fix(dep): HTML parsing of processing instructions</li>
<li><a href="db72b906c5"><code>db72b90</code></a> test: recent nekohtml versions do not consider 'a' to be inline</li>
<li><a href="2af2a87985"><code>2af2a87</code></a> style(rubocop): allow intentional use of empty initializer</li>
<li><a href="ba7a28c9a2"><code>ba7a28c</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2499">https://github.com/facebook/rocksdb/issues/2499</a> from sparklemotion/2441-xerces-2.12.2-backport-v1.13.x</li>
<li>Additional commits viewable in <a href="https://github.com/sparklemotion/nokogiri/compare/v1.13.3...v1.13.4">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nokogiri&package-manager=bundler&previous-version=1.13.3&new-version=1.13.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

 ---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts).

</details>

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9831

Reviewed By: akankshamahajan15

Differential Revision: D35580365

Pulled By: jay-zhuang

fbshipit-source-id: f9d7d3096598418740e2c174d4dbc99a73e02dc6
2022-04-12 09:07:14 -07:00
Akanksha Mahajan
ae82d91492 Remove corrupted WAL files in kPointRecoveryMode with avoid_flush_duing_recovery set true (#9634)
Summary:
1) In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
flush the data from WAL to L0 for all column families if possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
2) For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change.

If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail.

As a solution,
1. the corrupted WALs whose numbers are larger than the
corrupted wal and smaller than the new WAL will be moved to archive folder.
2. Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9634

Test Plan:
1. Added new unit tests
                2. make crast_test -j

Reviewed By: riversand963

Differential Revision: D34463666

Pulled By: akankshamahajan15

fbshipit-source-id: e233d3af0ed4e2028ca0cf051e5a334a0fdc9d19
2022-04-11 15:39:31 -07:00
Akanksha Mahajan
63e68a4e77 Enable async prefetching for ReadOptions.readahead_size (#9827)
Summary:
Currently async prefetching is enabled for implicit internal auto readahead in FilePrefetchBuffer if `ReadOptions.async_io` is set. This PR enables async prefetching for `ReadOptions.readahead_size` when `ReadOptions.async_io` is set true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9827

Test Plan: Update unit test

Reviewed By: anand1976

Differential Revision: D35552129

Pulled By: akankshamahajan15

fbshipit-source-id: d9f9a96672852a591375a21eef15355cf3289f5c
2022-04-11 13:46:57 -07:00
mrambacher
b7db7eae26 Plugin Registry (#7949)
Summary:
Added a Plugin class to the ObjectRegistry.  Enabled compile-time and program-time addition of plugins to the Registry.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7949

Reviewed By: mrambacher

Differential Revision: D33517674

Pulled By: pdillinger

fbshipit-source-id: c3e3270aab76a489bfa9e85d78cdfca951912557
2022-04-11 13:44:09 -07:00
gitbw95
f241d082b6 Prevent double caching in the compressed secondary cache (#9747)
Summary:
###  **Summary:**
When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached.

**Changes include:**
1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions.
2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache().
3. Rename LRUSecondaryCache to CompressedSecondaryCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747

Test Plan:
**Test Scripts:**
1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB.
./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1

2. overwrite it to a stable state:
./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1

4. Run read tests with diffeernt cache setting:

T1:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000  --statistics -db=/db_bench_1

T2:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

T3:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

T4:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

**Before this PR**
| Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
|------------|-------------------------------------|----------------|
|520 MB | 0 MB | 85.5% |
|320 MB | 400 MB | 96.2% |
|520 MB | 400 MB | 98.3% |
|20 MB | 500 MB | 98.8% |

**Before this PR**
| Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
|------------|-------------------------------------|----------------|
|520 MB | 0 MB | 85.5% |
|320 MB | 400 MB | 99.9% |
|520 MB | 400 MB | 99.9% |
|20 MB | 500 MB | 99.2% |

Reviewed By: anand1976

Differential Revision: D35117499

Pulled By: gitbw95

fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2022-04-11 13:28:33 -07:00
Akanksha Mahajan
f3bcac39a6 Fix stress test failure in ReadAsync. (#9824)
Summary:
Fix stress test failure in ReadAsync by ignoring errors
injected during async read by FaultInjectionFS.
Failure:
```
 WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Didn't get expected error from MultiGet.
num_keys 14 Expected 1 errors, seen 0
Callstack that injected the fault
Injected error type = 32538
Message: error;
#0   ./db_stress() [0x6f7dd4] rocksdb::port::SaveStack(int*, int)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/port/stack_trace.cc:152
https://github.com/facebook/rocksdb/issues/1   ./db_stress() [0x7f2bda] rocksdb::FaultInjectionTestFS::InjectThreadSpecificReadError(rocksdb::FaultInjectionTestFS::ErrorOperation, rocksdb::Slice*, bool, char*, bool, bool*)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:891
https://github.com/facebook/rocksdb/issues/2   ./db_stress() [0x7f2e78] rocksdb::TestFSRandomAccessFile::Read(unsigned long, unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, rocksdb::IODebugContext*) const	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/utilities/fault_injection_fs.cc:367
https://github.com/facebook/rocksdb/issues/3   ./db_stress() [0x6483d7] rocksdb::(anonymous namespace)::CompositeRandomAccessFileWrapper::Read(unsigned long, unsigned long, rocksdb::Slice*, char*) const	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/env/composite_env.cc:61
https://github.com/facebook/rocksdb/issues/4   ./db_stress() [0x654564] rocksdb::(anonymous namespace)::LegacyRandomAccessFileWrapper::Read(unsigned long, unsigned long, rocksdb::IOOptions const&, rocksdb::Slice*, char*, rocksdb::IODebugContext*) const	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/env/env.cc:152
https://github.com/facebook/rocksdb/issues/5   ./db_stress() [0x659b3b] rocksdb::FSRandomAccessFile::ReadAsync(rocksdb::FSReadRequest&, rocksdb::IOOptions const&, std::function<void (rocksdb::FSReadRequest const&, void*)>, void*, void**, std::function<void (void*)>*, rocksdb::IODebugContext*)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/./include/rocksdb/file_system.h:896
https://github.com/facebook/rocksdb/issues/6   ./db_stress() [0x8b8bab] rocksdb::RandomAccessFileReader::ReadAsync(rocksdb::FSReadRequest&, rocksdb::IOOptions const&, std::function<void (rocksdb::FSReadRequest const&, void*)>, void*, void**, std::function<void (void*)>*, rocksdb::Env::IOPriority)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/random_access_file_reader.cc:459
https://github.com/facebook/rocksdb/issues/7   ./db_stress() [0x8b501f] rocksdb::FilePrefetchBuffer::ReadAsync(rocksdb::IOOptions const&, rocksdb::RandomAccessFileReader*, rocksdb::Env::IOPriority, unsigned long, unsigned long, unsigned long, unsigned int)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/file_prefetch_buffer.cc:124
https://github.com/facebook/rocksdb/issues/8   ./db_stress() [0x8b55fc] rocksdb::FilePrefetchBuffer::PrefetchAsync(rocksdb::IOOptions const&, rocksdb::RandomAccessFileReader*, unsigned long, unsigned long, unsigned long, rocksdb::Env::IOPriority, bool&)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/file_prefetch_buffer.cc:363
https://github.com/facebook/rocksdb/issues/9   ./db_stress() [0x8b61f8] rocksdb::FilePrefetchBuffer::TryReadFromCacheAsync(rocksdb::IOOptions const&, rocksdb::RandomAccessFileReader*, unsigned long, unsigned long, rocksdb::Slice*, rocksdb::Status*, rocksdb::Env::IOPriority, bool)	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/file/file_prefetch_buffer.cc:482
https://github.com/facebook/rocksdb/issues/10  ./db_stress() [0x745e04] rocksdb::BlockFetcher::TryGetFromPrefetchBuffer()	/data/sandcastle/boxes/trunk-hg-fbcode-fbsource/fbcode/internal_repo_rocksdb/repo/table/block_fetcher.cc:76
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9824

Test Plan:
```
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_concurrent_memtable_write=0 --async_io=1 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 -- backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=5.037629726741734 --bottommost_compression_type=lz4hc --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=100 --compression_max_dict_buffer_bytes=1073741823 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --db=/home/akankshamahajan/dev/shm/rocksdb/rocksdb_crashtest_blackbox --db_write_buffer_size=8388608 --delpercent=0 --delrangepercent=0 --destroy_db_initially=0 - detect_filter_construct_corruption=1 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir=/home/akankshamahajan/dev/shm/rocksdb/rocksdb_crashtest_expected --experimental_mempurge_threshold=8.772789063014715 --fail_if_options_file_error=0 --file_checksum_impl=crc32c --flush_one_in=1000000 --format_version=3 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=False --long_running_snapshots=0 --mark_for_compaction_one_file_in=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_prefix_bloom_size_ratio=0.001 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=32 --readpercent=100 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=1 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=3 --unpartitioned_pinning=2 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=1 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=0
```

Reviewed By: anand1976

Differential Revision: D35514566

Pulled By: akankshamahajan15

fbshipit-source-id: e2a868fdd7422604774c1419738f9926a21e92a4
2022-04-11 10:56:11 -07:00
Yanqin Jin
0ad9ee30ce Remove dead code (#9825)
Summary:
Options `preserve_deletes` and `iter_start_seqnum` have been removed since 7.0.

This PR removes dead code related to these two removed options.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9825

Test Plan: make check

Reviewed By: akankshamahajan15

Differential Revision: D35517950

Pulled By: riversand963

fbshipit-source-id: 86282ce5ec4087acb94a06a42a1b6d55b1715482
2022-04-11 10:26:55 -07:00
Duncan Bellamy
25e31d1a94 tools/db_bench_tool.cc use uint64_t instead of size_t (#9800)
Summary:
to fix compilation for 32bit

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9800

Reviewed By: riversand963

Differential Revision: D35404447

fbshipit-source-id: 6a1185bb38f3a718357aa120e3b26a1ea77f023d
2022-04-08 13:29:19 -07:00
Hui Xiao
f337542948 Fix a bug of TEST_SetRandomTableProperties due to non-zero padding between fields in TableProperties struct (#9812)
Summary:
Context:
https://github.com/facebook/rocksdb/pull/9748#discussion_r843134214 reveals an issue with TEST_SetRandomTableProperties when non-zero padding is used between the last string field and first non-string field in TableProperties.
Fixed by https://github.com/facebook/rocksdb/pull/9748#discussion_r843244375

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9812

Test Plan: No production code changes and rely on existing CI

Reviewed By: ajkr

Differential Revision: D35423680

Pulled By: hx235

fbshipit-source-id: fd855eef3d32771bb79c65bd7012ab8bb3c400ab
2022-04-07 12:25:43 -07:00
Akanksha Mahajan
3fc2eaf561 Fix valgrind test failure for async read (#9819)
Summary:
Since all plaftorms don't support io_uring. So updated the unit
test to take that into consideration when testing async reads in unit tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9819

Test Plan:
valgrind --error-exitcode=2 --leak-check=full ./prefetch_test
--gtest_filter=PrefetchTest2.ReadAsyncWithPosixFS
CircleCI jobs

Reviewed By: pdillinger

Differential Revision: D35469959

Pulled By: akankshamahajan15

fbshipit-source-id: b170459ec816487fc0a13b1d55dbbe4f754b2eba
2022-04-07 10:31:50 -07:00
Akanksha Mahajan
7ea26abb8b Fix reseting of async_read_in_progress_ variable in FilePrefetchBuffer to call Poll API (#9815)
Summary:
Currently RocksDB reset async_read_in_progress_ in callback
due to which underlying filesystem relying on Poll API won't be called
leading to stale memory access.
In order to fix it, async_read_in_progress_ will be reset after Poll API
is called to make sure underlying file_system waiting on Poll can clear
its state or take appropriate action.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9815

Test Plan: CircleCI tests

Reviewed By: anand1976

Differential Revision: D35451534

Pulled By: akankshamahajan15

fbshipit-source-id: b70ef6251a7aa9ed4876ba5e5100baa33d7d474c
2022-04-06 18:36:23 -07:00
sdong
e03f8a0c12 L0 Subcompaction to trim input files (#9802)
Summary:
When sub compaction is decided for L0->L1 compaction, most of the cases, all L0 files will be involved in all sub compactions. However, it is not always the case. When files are generally (but not strictly) inserted in sequential order, there can be a subset of L0 files invovled. Yet RocksDB always open all those L0 files, and build an iterator, read many of the files' first of last block with expensive readahead. We trim some input files to reduce overhead a little bit.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9802

Test Plan: Add a unit test to cover this case and manually validate the behavior while running the test.

Reviewed By: ajkr

Differential Revision: D35371031

fbshipit-source-id: 701ed7375b5cbe41672e93b38fe8a1503dad08b6
2022-04-06 18:19:19 -07:00
Peter Dillinger
8ce7cea93f Tests for filter compatibility (#9773)
Summary:
This change adds two unit tests that would each catch the
regression fixed in https://github.com/facebook/rocksdb/issues/9736

* TableMetaIndexKeys - detects any churn in metaindex block keys
generated by SST files using standard db_test_util configurations.
* BloomFilterCompatibility - this detects if any common built-in
FilterPolicy configurations fail to read filters generated by another.
(The regression bug caused NewRibbonFilterPolicy not to read filters
from NewBloomFilterPolicy and vice-versa.) This replaces some previous
tests that didn't really appear to be testing much of anything except
basic data correctness, which doesn't tell you a filter is being used.

Light refactoring in meta_blocks.cc/h to support inspecting metaindex
keys.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9773

Test Plan:
this is the test. Verified that 7.0.2 fails both tests and 7.0.3 passes.
With backporting for intentional API changes in 7.0, 6.29 also passes.

Reviewed By: ajkr

Differential Revision: D35236248

Pulled By: pdillinger

fbshipit-source-id: 493dfe9ad7e27524bf7c6c1af8a4b8c31bc6ef5a
2022-04-06 15:54:40 -07:00
anand76
c3d7e16252 Add WAL compression to stress tests (#9811)
Summary:
Add the WAL compression feature to the stress test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9811

Reviewed By: riversand963

Differential Revision: D35414316

Pulled By: anand1976

fbshipit-source-id: 0c17b1ec55679a52f088ad368798b57139bd921a
2022-04-06 15:47:09 -07:00
Peter Dillinger
ad32646e18 Remove public rocksdb-lego-determinator (#9803)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9803

Only use Meta-internal version now. precommit_checker.py also now obsolete

Bring back `make commit_prereq` in follow-up work

Reviewed By: jay-zhuang

Differential Revision: D35372283

fbshipit-source-id: 7428438ca51f878802c301d0d5591675e551a113
2022-04-06 14:27:01 -07:00
Akanksha Mahajan
0b8f885939 Update stats for Read and ReadAsync in random_access_file_reader for async prefetching (#9810)
Summary:
Update stats in random_access_file_reader for Read and
ReadAsync API to take into account the read latency for async
prefetching.

It also fixes ERROR_HANDLER_AUTORESUME_RETRY_COUNT stat whose value was
incorrect in portal.h

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9810

Test Plan: Update unit test

Reviewed By: anand1976

Differential Revision: D35433081

Pulled By: akankshamahajan15

fbshipit-source-id: aeec3901270e58a003ce6b5214bd25ddcb3a12a9
2022-04-06 14:26:53 -07:00
Hui Xiao
49623f9c8e Account memory of big memory users in BlockBasedTable in global memory limit (#9748)
Summary:
**Context:**
Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g, `max_open_file = -1`). These memories are currently not saved, not tracked, not constrained and not cache evict-able. As a first step to improve this, similar to https://github.com/facebook/rocksdb/pull/8428,  this PR is to track an estimate of `BlockBasedTableReader` object's memory in block cache and fail future creation if the memory usage exceeds the available space of cache at the time of creation.

**Summary:**
- Approximate big memory users  (`BlockBasedTable::Rep` and `TableProperties` )' memory usage in addition to the existing estimated ones (filter block/index block/un-compression dictionary)
- Charge all of these memory usages to block cache on `BlockBasedTable::Open()` and release them on `~BlockBasedTable()` as there is no memory usage fluctuation of concern in between
- Refactor on CacheReservationManager (and its call-sites) to add concurrent support for BlockBasedTable  used in this PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9748

Test Plan:
- New unit tests
- db bench: `OpenDb` : **-0.52% in ms**
  - Setup `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576`
  - Repeated run with pre-change w/o feature and post-change with feature, benchmark `OpenDb`:  `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this when running w/o feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true| egrep 'OpenDb:'`

#-run | (feature-off) avg milliseconds | std milliseconds | (feature-on) avg milliseconds | std milliseconds | change (%)
-- | -- | -- | -- | -- | --
10 | 11.4018 | 5.95173 | 9.47788 | 1.57538 | -16.87382694
20 | 9.23746 | 0.841053 | 9.32377 | 1.14074 | 0.9343477536
40 | 9.0876 | 0.671129 | 9.35053 | 1.11713 | 2.893283155
80 | 9.72514 | 2.28459 | 9.52013 | 1.0894 | -2.108041632
160 | 9.74677 | 0.991234 | 9.84743 | 1.73396 | 1.032752389
320 | 10.7297 | 5.11555 | 10.547 | 1.97692 | **-1.70275031**
640 | 11.7092 | 2.36565 | 11.7869 | 2.69377 | **0.6635807741**

-  db bench on write with cost to cache in WriteBufferManager (just in case this PR's CRM refactoring accidentally slows down anything in WBM) : `fillseq` : **+0.54% in micros/op**
`./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  avg micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 6.15 | 0.260187 | 6.289 | 0.371192 | 2.260162602
20 | 7.28025 | 0.465402 | 7.37255 | 0.451256 | 1.267813605
40 | 7.06312 | 0.490654 | 7.13803 | 0.478676 | **1.060579461**
80 | 7.14035 | 0.972831 | 7.14196 | 0.92971 | **0.02254791432**

-  filter bench: `bloom filter`: **-0.78% in ms/key**
    - ` ./filter_bench -impl=2 -quick -reserve_table_builder_memory=true | grep 'Build avg'`

#-run | (pre-PR) avg ns/key | std ns/key | (post-PR)  ns/key | std ns/key | change (%)
-- | -- | -- | -- | -- | --
10 | 26.4369 | 0.442182 | 26.3273 | 0.422919 | **-0.4145720565**
20 | 26.4451 | 0.592787 | 26.1419 | 0.62451 | **-1.1465262**

- Crash test `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D35136549

Pulled By: hx235

fbshipit-source-id: 146978858d0f900f43f4eb09bfd3e83195e3be28
2022-04-06 10:33:00 -07:00
Ramkumar Vadivelu
633b7f15d5 Update/Fix API comments for OpenForReadOnly() and OpenAsSecondary() (#9807)
Summary:
Updates/fixes to API comments for OpenForReadOnly() and OpenAsSecondary()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9807

Reviewed By: ajkr

Differential Revision: D35419206

Pulled By: ramvadiv

fbshipit-source-id: ac2514a14e4ec77b2ed34c5dca6251528c5b92f1
2022-04-05 20:22:47 -07:00
Andrew Kryczka
3ae9c5309b Remove explicit padding from CacheAlignedInstrumentedMutex (#9809)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/9779.

The padding at the end of a struct is added implicitly according to the
sizeof spec: "When applied to a class, the result is the
number of bytes in an object of that class including any padding
required for placing objects of that type in an array"
(https://eel.is/c++draft/expr.sizeof#2.sentence-2). We should drop the
explicit padding since it assumed support for zero-length arrays, which
is non-standard.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9809

Test Plan: rely on CI

Reviewed By: riversand963

Differential Revision: D35413496

Pulled By: ajkr

fbshipit-source-id: 25d52ca45e648ad0d5657149f26f6adecbed1cb4
2022-04-05 18:32:05 -07:00
gukaifeng
60ceb8d0e2 rename property "kIsFileDeletionsEnabled" to "kIsFileDeletionsDisabled" (#9791)
Summary:
The name of this property "kIsFileDeletionsEnabled" is very, very easy to misunderstand.

I think 0 represents false (i.e. disabled) and non-0 means true (enabled), and this property is just the opposite.

I modified the name of this property, and as few other positions as possible, so that the final meaning remains the same, but the name of this property is more common sense.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9791

Reviewed By: ajkr

Differential Revision: D35362166

Pulled By: jay-zhuang

fbshipit-source-id: 85310d88bdd131893effb64e1adb7d0d7b202f88
2022-04-05 17:16:47 -07:00
Changyu Bi
a180c5cc3a Added GetMergeOperands() to stress test (#9804)
Summary:
db_stress does not yet cover is GetMergeOperands(), added GetMergeOperands() to db_stress.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9804

Test Plan:
```make -j32 db_stress```

```python3 tools/db_crashtest.py blackbox --simple --interval=30 --duration=2400 --max_key=100000 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152 --value_size_mult=33```

Reviewed By: ajkr

Differential Revision: D35387137

Pulled By: cbi42

fbshipit-source-id: 8f851ef68b5af4d824128ad55ebe564f7ad6f7e6
2022-04-05 14:56:28 -07:00
Andrew Kryczka
04623e7cd4 Fix GetMergeOperands() heap-use-after-free on flushed memtable (#9805)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/9066.

Prior to the fix in this PR, this PR's unit test reported the following error under ASAN:

```
==2175705==ERROR: AddressSanitizer: heap-use-after-free on address 0x61f0000012a5 at pc 0x7f0fc36e76ce bp 0x7ffc103e9ca0 sp 0x7ffc103e9450
READ of size 5 at 0x61f0000012a5 thread T0
    #0 0x7f0fc36e76cd in __interceptor_memcpy /home/engshare/third-party2/gcc/9.x/src/gcc-10.x/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:790
    https://github.com/facebook/rocksdb/issues/1 0x7f0fc35a207e in std::char_traits<char>::copy(char*, char const*, unsigned long) /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/include/bits/char_traits.h:365
    https://github.com/facebook/rocksdb/issues/2 0x7f0fc35a207e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_S_copy(char*, char const*, unsigned long) /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/include/bits/basic_string.h:351
    https://github.com/facebook/rocksdb/issues/3 0x7f0fc35a207e in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_replace(unsigned long, unsigned long, char const*, unsigned long) /home/engshare/third-party2/libgcc/9.x/src/gcc-9.x/x86_64-facebook-linux/libstdc++-v3/include/bits/basic_string.tcc:440
    https://github.com/facebook/rocksdb/issues/4 0x8679ca in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::assign(char const*, unsigned long) /mnt/gvfs/third-party2/libgcc/4959b39cfbe5965a37c861c4c327fa7c5c759b87/9.x/platform009/9202ce7/include/c++/9.3.0/bits/basic_string.h:1422
    https://github.com/facebook/rocksdb/issues/5 0x8679ca in rocksdb::PinnableSlice::PinSelf(rocksdb::Slice const&) include/rocksdb/slice.h:171
    https://github.com/facebook/rocksdb/issues/6 0x8679ca in rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&) db/db_impl/db_impl.cc:1930
    https://github.com/facebook/rocksdb/issues/7 0x547324 in rocksdb::DBImpl::GetMergeOperands(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, rocksdb::GetMergeOperandsOptions*, int*) db/db_impl/db_impl.h:203
    https://github.com/facebook/rocksdb/issues/8 0x547324 in rocksdb::DBMergeOperandTest_FlushedMergeOperandReadAfterFreeBug_Test::TestBody() db/db_merge_operand_test.cc:117
    https://github.com/facebook/rocksdb/issues/9 0x7241da in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
    https://github.com/facebook/rocksdb/issues/10 0x7241da in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
    https://github.com/facebook/rocksdb/issues/11 0x701a47 in testing::Test::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3973
    https://github.com/facebook/rocksdb/issues/12 0x702040 in testing::Test::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3965
    https://github.com/facebook/rocksdb/issues/13 0x702040 in testing::TestInfo::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4149
    https://github.com/facebook/rocksdb/issues/14 0x7025f7 in testing::TestInfo::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4124
    https://github.com/facebook/rocksdb/issues/15 0x7025f7 in testing::TestCase::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4267
    https://github.com/facebook/rocksdb/issues/16 0x704217 in testing::TestCase::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4253
    https://github.com/facebook/rocksdb/issues/17 0x704217 in testing::internal::UnitTestImpl::RunAllTests() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6633
    https://github.com/facebook/rocksdb/issues/18 0x72505a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
    https://github.com/facebook/rocksdb/issues/19 0x72505a in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
    https://github.com/facebook/rocksdb/issues/20 0x704aa1 in testing::UnitTest::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6242
    https://github.com/facebook/rocksdb/issues/21 0x4c4aff in RUN_ALL_TESTS() third-party/gtest-1.8.1/fused-src/gtest/gtest.h:22110
    https://github.com/facebook/rocksdb/issues/22 0x4c4aff in main db/db_merge_operand_test.cc:404
    https://github.com/facebook/rocksdb/issues/23 0x7f0fc3108dc4 in __libc_start_main ../csu/libc-start.c:308
    https://github.com/facebook/rocksdb/issues/24 0x5445fd in _start (/data/users/andrewkr/rocksdb/db_merge_operand_test+0x5445fd)

0x61f0000012a5 is located 1061 bytes inside of 3264-byte region [0x61f000000e80,0x61f000001b40)
freed by thread T0 here:
    #0 0x7f0fc375b6af in operator delete(void*, unsigned long) /home/engshare/third-party2/gcc/9.x/src/gcc-10.x/libsanitizer/asan/asan_new_delete.cc:177
    https://github.com/facebook/rocksdb/issues/1 0x743be8 in rocksdb::SuperVersion::~SuperVersion() db/column_family.cc:432
    https://github.com/facebook/rocksdb/issues/2 0x8052aa in rocksdb::DBImpl::CleanupSuperVersion(rocksdb::SuperVersion*) db/db_impl/db_impl.cc:3534
    https://github.com/facebook/rocksdb/issues/3 0x8676c2 in rocksdb::DBImpl::ReturnAndCleanupSuperVersion(rocksdb::ColumnFamilyData*, rocksdb::SuperVersion*) db/db_impl/db_impl.cc:3544
    https://github.com/facebook/rocksdb/issues/4 0x8676c2 in rocksdb::DBImpl::GetImpl(rocksdb::ReadOptions const&, rocksdb::Slice const&, rocksdb::DBImpl::GetImplOptions&) db/db_impl/db_impl.cc:1911
    https://github.com/facebook/rocksdb/issues/5 0x547324 in rocksdb::DBImpl::GetMergeOperands(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*, rocksdb::Slice const&, rocksdb::PinnableSlice*, rocksdb::GetMergeOperandsOptions*, int*) db/db_impl/db_impl.h:203
    https://github.com/facebook/rocksdb/issues/6 0x547324 in rocksdb::DBMergeOperandTest_FlushedMergeOperandReadAfterFreeBug_Test::TestBody() db/db_merge_operand_test.cc:117
    https://github.com/facebook/rocksdb/issues/7 0x7241da in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
    https://github.com/facebook/rocksdb/issues/8 0x7241da in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
    https://github.com/facebook/rocksdb/issues/9 0x701a47 in testing::Test::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3973
    https://github.com/facebook/rocksdb/issues/10 0x702040 in testing::Test::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3965
    https://github.com/facebook/rocksdb/issues/11 0x702040 in testing::TestInfo::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4149
    https://github.com/facebook/rocksdb/issues/12 0x7025f7 in testing::TestInfo::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4124
    https://github.com/facebook/rocksdb/issues/13 0x7025f7 in testing::TestCase::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4267
    https://github.com/facebook/rocksdb/issues/14 0x704217 in testing::TestCase::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:4253
    https://github.com/facebook/rocksdb/issues/15 0x704217 in testing::internal::UnitTestImpl::RunAllTests() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6633
    https://github.com/facebook/rocksdb/issues/16 0x72505a in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3899
    https://github.com/facebook/rocksdb/issues/17 0x72505a in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:3935
    https://github.com/facebook/rocksdb/issues/18 0x704aa1 in testing::UnitTest::Run() third-party/gtest-1.8.1/fused-src/gtest/gtest-all.cc:6242
    https://github.com/facebook/rocksdb/issues/19 0x4c4aff in RUN_ALL_TESTS() third-party/gtest-1.8.1/fused-src/gtest/gtest.h:22110
    https://github.com/facebook/rocksdb/issues/20 0x4c4aff in main db/db_merge_operand_test.cc:404
    https://github.com/facebook/rocksdb/issues/21 0x7f0fc3108dc4 in __libc_start_main ../csu/libc-start.c:308
    https://github.com/facebook/rocksdb/issues/22 0x5445fd in _start (/data/users/andrewkr/rocksdb/db_merge_operand_test+0x5445fd)
...
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9805

Test Plan: following the fix in this PR, the new unit test passes

Reviewed By: jay-zhuang

Differential Revision: D35388415

Pulled By: ajkr

fbshipit-source-id: b39c5d002155906c8abc4a3429eca696dbf916d0
2022-04-05 12:26:36 -07:00
Yanqin Jin
1a1c5bda23 Disallow commit-time-batch for write-prepared/write-unprepared txn conditionally (#9794)
Summary:
For write-prepared/write-unprepared transactions,
GetCommitTimeWriteBatch() can be used only if the transaction is started
with `TransactionOptions::use_only_the_last_commit_time_batch_for_recovery` set
to true. Otherwise, it is possible that multiple uncommitted versions of the
same key exist in the database. During bottommost compaction, RocksDB may
set the sequence numbers of both to zero once they become committed, causing
output SST file to have two identical internal keys.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9794

Test Plan:
make check
pay special attention to the following
```
transaction_test --gtest_filter=MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest/*
```

Reviewed By: lth

Differential Revision: D35327214

Pulled By: riversand963

fbshipit-source-id: 3bae00a28359c10e96e4c6f676d20de5610d8a0f
2022-04-05 11:10:20 -07:00
Peter Dillinger
6534c6dea4 Fix remaining uses of "backupable" (#9792)
Summary:
Various renaming and fixes to get rid of remaining uses of
"backupable" which is terminology leftover from the original, flawed
design of BackupableDB. Now any DB can be backed up, using BackupEngine.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9792

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D35334386

Pulled By: pdillinger

fbshipit-source-id: 2108a42b4575c8cccdfd791c549aae93ec2f3329
2022-04-05 09:52:33 -07:00
Hui Xiao
9cd47ce554 Add Env::IOPriority to IOOptions (#9806)
Summary:
**Context/Todo:**
As requested, allow IOOptions to take in an Env::IOPriority for convenience to pass down rate limiter related hint to file system level and for future interaction between RocksDB internal's rate limiting and custom file system level's rate-limiting.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9806

Test Plan: No actual code changes in RocksDB internals

Reviewed By: ajkr

Differential Revision: D35388966

Pulled By: hx235

fbshipit-source-id: 5891c97c3f9184cd221a9ab8536ce8dfa8526c08
2022-04-05 08:46:48 -07:00
Akanksha Mahajan
36bc3da97f Fix segfault in FilePrefetchBuffer with async_io enabled (#9777)
Summary:
If FilePrefetchBuffer object is destroyed and then later Poll() calls callback on object which has been destroyed, it gives segfault on accessing destroyed object. It was caught after adding unit tests that tests Posix implementation of ReadAsync and Poll APIs.
This PR also updates and fixes existing IOURing tests which were not running locally because RocksDbIOUringEnable function wasn't defined and IOUring was disabled for those tests

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9777

Test Plan: Added new unit test

Reviewed By: anand1976

Differential Revision: D35254002

Pulled By: akankshamahajan15

fbshipit-source-id: 68e80054ffb14ae25c255920ebc6548ca5f130a1
2022-04-04 15:35:43 -07:00
Jay Zhuang
ec77a92882 Fix commit_prereq and other targets (#9797)
Summary:
Make `commit_prereq` work and a few other improvements:
* Remove gcc 481 and gcc5xx which are no longer supported
* Remove platform007 which is gone
* `make clean` work for both mac and linux
* `precommit_checker.py` to python3

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9797

Test Plan: `make commit_prereq`

Reviewed By: ajkr

Differential Revision: D35338536

Pulled By: jay-zhuang

fbshipit-source-id: 1e159962ab9d31c43c4b85de7d0f582d3e881ffe
2022-04-04 09:58:18 -07:00
SGZW
f68706409d Fix typo about file/sst_file_manager_impl.h (#9799)
Summary:
Fix typo deletition-> deletion

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9799

Reviewed By: ajkr

Differential Revision: D35341617

Pulled By: jay-zhuang

fbshipit-source-id: 32bc384b99e5564f6a673076c6a4f160ee6c2e46
2022-04-04 09:57:33 -07:00
sdong
d4159c8046 build_tools/rocksdb-lego-determinator to pass parallelism information for no_compression (#9796)
Summary:
Right now, parallelism information passed to "build_tools/rocksdb-lego-determinator no_compression" isn't effective when the test actually runs, as the information is dropped in the middle. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9796

Test Plan: Run "build_tools/rocksdb-lego-determinator no_compression" and execute the command line generated and observe the parallelism.

Reviewed By: jay-zhuang

Differential Revision: D35330085

fbshipit-source-id: e9b32d0520d61fbc2697ebd841099485f64482e3
2022-04-04 09:51:05 -07:00
Chen Lixiang
cd59b139fc Fix some typos in comments and HISTORY.md (#9798)
Summary:
compation --> compaction

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9798

Reviewed By: ajkr

Differential Revision: D35341611

Pulled By: jay-zhuang

fbshipit-source-id: 5ea07527c311de75cade219456b6ee52b23020f6
2022-04-04 09:32:57 -07:00
yaphet
fcd32e687b remove some break line (#9716)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9716

Reviewed By: mrambacher

Differential Revision: D35026096

Pulled By: jay-zhuang

fbshipit-source-id: 296c38418e2bb7948d7802e439a08c6621bdb49b
2022-04-02 09:51:53 -07:00
sdong
190d5c1318 Reduce build/test parallelism in build_tools/rocksdb-lego-determinator (#9788)
Summary:
build_tools/rocksdb-lego-determinator is to generate commands for continuous tests. Recently it changed to by default run tests in parallel with parallelism to be number of CPU processors. This sometimes causes out of space when running so many tests in parallel. Reduce the parallelism by half to temporarily work it around.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9788

Test Plan: Run build_tools/rocksdb-lego-determinator and watch generated commands.

Reviewed By: pdillinger

Differential Revision: D35327704

fbshipit-source-id: 95a8c51a111bb6ab62c456c74ab9c905b457ea8f
2022-04-01 16:38:08 -07:00
Bo Wang
bcabee737f Improve comments for some files (#9793)
Summary:
Update the comments, e.g. fixing typo, formatting, etc.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9793

Reviewed By: jay-zhuang

Differential Revision: D35323989

Pulled By: gitbw95

fbshipit-source-id: 4a72fc02b67abaae8be0d1439b68f9967a68052d
2022-04-01 16:06:14 -07:00
Andrew Kryczka
f246e56d0a Fix a few documentation errors including in public APIs (#9789)
Summary:
The internal WriteBatch doc wrongly indicated which optypes are followed by varstring. Updated some optypes according to the following code: 76383bea5d/db/write_batch.cc (L418-L429)

The `Iterator::Refresh()` + `DeleteRange()` bug was fixed in https://github.com/facebook/rocksdb/issues/9258; removed the warnings.

`GetMergeOperands()` does populate `*number_of_operands` including upon successful return: 76383bea5d/db/db_impl/db_impl.cc (L1917-L1919)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9789

Reviewed By: riversand963

Differential Revision: D35303421

Pulled By: ajkr

fbshipit-source-id: 9b0e1be5f6b2e2b31461e6c33ecb5f5381824452
2022-04-01 10:30:17 -07:00
Jay Zhuang
2876e6a13b Update internal benchmark version (#9787)
Summary:
So the build on dev server will work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9787

Test Plan: `$ make db_basic_bench` on dev server.

Reviewed By: ajkr

Differential Revision: D35295466

Pulled By: jay-zhuang

fbshipit-source-id: 58dccc65bc29e1185b97cbeb7630ed66deb604aa
2022-04-01 10:29:09 -07:00
Andrew Kryczka
bfea9e7c02 Add benchmark for GetMergeOperands() (#9785)
Summary:
There's an existing benchmark, "getmergeoperands", but it is unconventional in that it has multiple phases and hardcoded setup parameters.

This PR adds a different one, "readrandomoperands", that follows the pattern of other benchmarks of having a single phase and taking its configuration from existing flags.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9785

Test Plan:
```
$ ./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -compression_type=none -disable_auto_compactions=true
$ ./db_bench -use_existing_db=true -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -disable_auto_compactions=true -duration=10
...
readrandomoperands :     542.082 micros/op 1844 ops/sec;    0.2 MB/s (11980 of 18999 found)
```

Reviewed By: jay-zhuang

Differential Revision: D35290412

Pulled By: ajkr

fbshipit-source-id: fb367ca614b128cef844a75f0e5d9dd7c3328d85
2022-03-31 21:23:58 -07:00
Yanqin Jin
6eafdf135a Encode min_log_number_to_keep and delete_wals_before in one version edit (#9766)
Summary:
min_log_number_to_keep denotes that the WALs whose numbers are below
this value **will** be deleted by RocksDB.
delete_wals_before will be used by RocksDB if
track_and_verify_wals_in_manifest is set to true. During recovery,
RocksDB uses the info encoded in delete_wals_before to reconstruct its
knowledge about what WALs to expect existing.
If these two tags are not encoded in the same VersionEdit, then it's
possible for min_log_number_to_keep=100 to exist, but
delete_wals_before=100 to be lost due to power failure. Subsequent
recovery will delete 99.log. If the db crashes again, the following
recovery will expect to see 99.log since there is no
delete_wals_before=100 in the MANIFEST, but the WAL is already deleted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9766

Test Plan:
First of all, make check.
Second, format compatibility.
SHORT_TEST=1 ./tools/check_format_compatible.sh

Reviewed By: ltamasi

Differential Revision: D35203623

Pulled By: riversand963

fbshipit-source-id: 45623fc4b4b50d299d5e0f9559a3a4c5e9522c8f
2022-03-31 20:00:52 -07:00
Jay Zhuang
76383bea5d Add microbench document (#9781)
Summary:
Add basic microbenchmark document

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9781

Reviewed By: gitbw95

Differential Revision: D35272866

Pulled By: jay-zhuang

fbshipit-source-id: f482e652151fd05ca46e29629261833f038a6075
2022-03-31 17:17:44 -07:00
sdong
bbcf7b192c Fix DB::Open() error logging (#9784)
Summary:
Right now we log a wrong error when DB::Open() fails. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9784

Test Plan: CI runs should pass

Reviewed By: ajkr, riversand963

Differential Revision: D35290203

fbshipit-source-id: ffc640afa27f6b0a2382ee153dc43f28d9e242be
2022-03-31 15:52:01 -07:00
Yanqin Jin
de9df6e818 Do not release and re-acquire dbmutex on memtable-switch if no listener (#9758)
Summary:
There is no need to release-and-acquire immediately when no listener is registered. This is
what we have been doing for `NotifyOnFlushBegin()`, `NotifyOnFlushCompleted()`, `NotifyOnCompactionBegin()`,
`NotifyOnCompactionCompleted()`, and some other `NotifyOnXX` methods in event_helpers.cc.
Do the same for `NotifyOnMemTableSealed ()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9758

Test Plan: make check

Reviewed By: jay-zhuang

Differential Revision: D35159552

Pulled By: riversand963

fbshipit-source-id: 6e0aac50bd5c8f506d809b6638c33a7a28d1e87f
2022-03-30 20:48:23 -07:00
bbkot
e55018a8ce fixing issue #8345 RocksDB does not work when using UNC network paths (#9384)
Summary:
Fix https://github.com/facebook/rocksdb/issues/8345
RocksDB does not work with network filesystem paths on Windows, e.g. "\\hostname\folder\..."

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9384

Reviewed By: mrambacher

Differential Revision: D33830622

Pulled By: riversand963

fbshipit-source-id: 2a99dc3c94415eb1460e110784b97d71600218f1
2022-03-30 15:55:31 -07:00
Peter Dillinger
105d7f0c7c Document SetOptions API (#9778)
Summary:
much needed

Some other minor tweaks also

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9778

Test Plan: existing tests

Reviewed By: ajkr

Differential Revision: D35258195

Pulled By: pdillinger

fbshipit-source-id: 974ddafc23a540aacceb91da72e81593d818f99c
2022-03-30 14:51:12 -07:00
Akanksha Mahajan
fd66005628 Add 'adaptive_readahead' and 'async_io' options to db_stress (#9750)
Summary:
Same as title

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9750

Test Plan:
export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1;
make -j crash_test

Reviewed By: jay-zhuang

Differential Revision: D35114326

Pulled By: akankshamahajan15

fbshipit-source-id: 8b05c95be09f7aff6cb9eb757aa20a6520349d45
2022-03-30 13:52:37 -07:00
Hui Xiao
60106b91ac Add 7.0.fb/7.1.fb to check_format_compatible.sh (#9772)
Summary:
As titled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9772

Test Plan: `./tools/check_format_compatible.sh 7.1.fb` (and manually removed 2.7.fb due to pre-existing assertion failure) passed compatibility test

Reviewed By: ajkr

Differential Revision: D35233659

Pulled By: hx235

fbshipit-source-id: 6b93263a5724d752347e04f1396628804c24a880
2022-03-30 11:11:39 -07:00
Jay Zhuang
d5c34fa8f4 Upgrade gbenchmark to 1.6.1 (#9775)
Summary:
Upgrade google benchmark to the latest 1.6.1.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9775

Test Plan: CI

Reviewed By: riversand963

Differential Revision: D35252889

Pulled By: jay-zhuang

fbshipit-source-id: 4d60dd1c6f522d0af0b3942ae8fa88e5ae17f34a
2022-03-30 10:09:49 -07:00
Jingjing Wang
5a085d789d pristine code
Summary:
This commit was generated using `mgt import`.
pristine code for third-party libraries:
third-party/benchmark

upgrade google benchmark to v1.6.1

contains a local patch that reverts [this](https://github.com/google/benchmark/pull/1227?fbclid=IwAR2CCmIJmjU62SPPQQf_t8kdAsMjYv_Pa_GxabYUOdQpGPZUHKwbnYS_1oE) and changs `enum Flags` to be `enum Flags : uint32_t`.

Reviewed By: chadaustin

Differential Revision: D35136540

fbshipit-source-id: f3662f953cd87956e5e9b767e55e3697f99d3b49
2022-03-29 15:06:17 -07:00
Peter Dillinger
40e3f30a28 Fix FileStorageInfo fields from GetLiveFilesMetaData (#9769)
Summary:
In making `SstFileMetaData` inherit from `FileStorageInfo`, I
overlooked setting some `FileStorageInfo` fields when then default
`SstFileMetaData()` ctor is used. This affected `GetLiveFilesMetaData()`.

Also removed some buggy `static_cast<size_t>`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9769

Test Plan: Updated tests

Reviewed By: jay-zhuang

Differential Revision: D35220383

Pulled By: pdillinger

fbshipit-source-id: 05b4ee468258dbd3699517e1124838bf405fe7f8
2022-03-29 14:36:35 -07:00
Jack Robison
5dbdb197f1 Fix broken zlib dependency, update it from 1.2.11 to 1.2.12 (#9764)
Summary:
Zlib (https://www.zlib.net/) has been updated to 1.2.12 due to CVE-2018-25032

- https://nvd.nist.gov/vuln/detail/CVE-2018-25032
- https://github.com/madler/zlib/issues/605

The source .tar.gz is no longer available, and the Makefile for rocksdb now fails as a result. This PR updates the dependency to the newer (and available) version, 1.2.12

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9764

Reviewed By: ajkr

Differential Revision: D35220367

Pulled By: jay-zhuang

fbshipit-source-id: 1f68ff8f048a6dba42077f048ac143468f0e2478
2022-03-29 13:35:09 -07:00
Adam Retter
f61df6524a Update the version of Visual Studio required (#9765)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9765

Reviewed By: ajkr

Differential Revision: D35220757

Pulled By: jay-zhuang

fbshipit-source-id: b7749aa9bd04e3c3d7757e5e64921ff422600ec0
2022-03-29 13:23:31 -07:00
Alan Paxton
b6ad0d958f Fb 9718 verify checksums is ignored (#9767)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/9718

The verify_checksums flag of read_options should be passed to the read options used by the BlockFetcher in a couple of cases where it is not at present. It will now happen (but did not, previously) on iteration and on [multi]get, where a fetcher is created as part of the iterate/get call.

This may result in much better performance in a few workloads where the client chooses to remove verification.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9767

Reviewed By: mrambacher

Differential Revision: D35218986

Pulled By: jay-zhuang

fbshipit-source-id: 329d29764bb70fbc7f2673440bc46c107a813bc8
2022-03-29 11:54:54 -07:00
Mark Callaghan
a5e5130556 Update HISTORY for db_bench changes (#9759)
Summary:
These should have been part of the original PRs that changed db_bench, but I forgot to do that.
The PRs are:
* https://github.com/facebook/rocksdb/pull/9740
* https://github.com/facebook/rocksdb/pull/9733

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9759

Test Plan: No test needed.

Reviewed By: jay-zhuang

Differential Revision: D35159553

Pulled By: mdcallag

fbshipit-source-id: b44d075527309ee0bd4c5a92e5dd94ebf72f363e
2022-03-28 16:02:53 -07:00
Akanksha Mahajan
33f8a08af2 Fix some errors in async prefetching in FilePrefetchBuffer (#9734)
Summary:
In ReadOption `async_io` which prefetches the data asynchronously, db_bench and db_stress runs were failing  because wrong data was prefetched which resulted in Error: Checksum mismatched. Wrong data was copied because capacity was less than actual size needed. It has been fixed in this PR.

Since there are two separate methods for async and sync prefetching, these changes are in async prefetching methods and any changes would not effect normal prefetching. I ran the regressions to make sure normal prefetching is fine.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9734

Test Plan:
1. CircleCI jobs

2.  Ran db_bench
```
. /db_bench -use_existing_db=true
-db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32
-value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680
-duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1

```
3. Ran db_stress test
```
export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1"
make crash_test -j
```

4. Run regressions for async_io disabled.

Old flow without any async changes:
```
./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.0
Date:       Thu Mar 17 13:11:34 2022
CPU:        24 * Intel Core Processor (Broadwell)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
seekrandom   :  483618.390 micros/op 2 ops/sec;  338.9 MB/s (249 of 249 found)
```

With async prefetching changes and async_io disabled to make sure in normal prefetching there is no regression.
 ```
 ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 --async_io=0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 7.1
Date:       Wed Mar 23 15:56:37 2022
CPU:        24 * Intel Core Processor (Broadwell)
CPUCache:   16384 KB
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    2594.0 MB (estimated)
FileSize:   1373.3 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
DB path: [/tmp/prefix_scan_prefetch_main]
seekrandom   :  481819.816 micros/op 2 ops/sec;  340.2 MB/s (250 of 250 found)
```

Reviewed By: riversand963

Differential Revision: D35058471

Pulled By: akankshamahajan15

fbshipit-source-id: 9233a1e6d97cea0c7a8111bfb9e8ac3251c341ce
2022-03-25 18:26:22 -07:00
Mark Callaghan
37de4e1d08 Correctly set ThreadState::tid (#9757)
Summary:
Fixes a bug introduced by me in https://github.com/facebook/rocksdb/pull/9733
That PR added a counter so that the per-thread seeds in ThreadState would
be unique even when --benchmarks had more than one test. But it incorrectly
used this counter as the value for ThreadState::tid as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9757

Test Plan:
Confirm that unexpectedly good QPS results on the regression tests return
to normal with this fix. I have confirmed that the QPS increase starts with
the PR 9733 diff.

Reviewed By: jay-zhuang

Differential Revision: D35149303

Pulled By: mdcallag

fbshipit-source-id: dee5cc36b7faaba6c3be6d6a253d3c2eaad72864
2022-03-25 15:30:28 -07:00
Hui Xiao
e2cb9aa27c Clarify Options::rate_limiter api doc for #9607 Rate-limit automatic WAL flush after each user write (#9745)
Summary:
As title for https://github.com/facebook/rocksdb/pull/9607

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9745

Test Plan: No code change

Reviewed By: ajkr

Differential Revision: D35096901

Pulled By: hx235

fbshipit-source-id: 6bd3671baecfdc04579b0a81a957bfaa7bed81e1
2022-03-25 15:16:07 -07:00
Jermy Li
b83263bbe4 jni: uniformly use GetByteArrayRegion() to copy bytes (#9380)
Summary:
Uniformly use GetByteArrayRegion() instead of GetByteArrayElements()
to copy bytes.
In addition, it can avoid an inefficient ReleaseByteArrayElements()
operation.
Some benefits of GetByteArrayRegion() can be referred to:
https://stackoverflow.com/a/2480493

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9380

Reviewed By: ajkr

Differential Revision: D35135474

Pulled By: jay-zhuang

fbshipit-source-id: a32c1774d37f2d22b9bcd105d83e0bb984b71b54
2022-03-25 10:24:58 -07:00
Mark Callaghan
1a130fa3c1 db_bench should use a good seed when --seed is not set or set to 0 (#9740)
Summary:
This is for https://github.com/facebook/rocksdb/issues/9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.

Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
  Set seed to 1647992570365606 because --seed was 0

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9740

Test Plan:
Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
  readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
  readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
  readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
  readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)

Reviewed By: jay-zhuang

Differential Revision: D35145361

Pulled By: mdcallag

fbshipit-source-id: 2b35b153ccec46b27d7c9405997523555fc51267
2022-03-25 10:12:27 -07:00
myasuka
98130c5a26 Enable READ_BLOCK_COMPACTION_MICROS to track stats (#9722)
Summary:
After commit [d642c60](d642c60bdc), the stats `READ_BLOCK_COMPACTION_MICROS` cannot record any compaction read duration, and it always report zero.

This PR targets to distinguish `READ_BLOCK_COMPACTION_MICROS` with `READ_BLOCK_GET_MICROS` so that `READ_BLOCK_COMPACTION_MICROS` could record the correct stats.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9722

Reviewed By: ajkr

Differential Revision: D35021870

Pulled By: jay-zhuang

fbshipit-source-id: f1a804994265e51465de64c2a08f2e0eeb6fc5a3
2022-03-24 15:06:24 -07:00
Jay Zhuang
81d1cdca7f Fix make clean fail after java build (#9710)
Summary:
Seems clean-rocksjava and clean-rocks conflict.
Also remove unnecessary step in java CI build, otherwise it will rebuild
the code again as java make sample do clean up first.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9710

Test Plan: `make rocksdbjava && make clean` should return success

Reviewed By: riversand963

Differential Revision: D35122872

Pulled By: jay-zhuang

fbshipit-source-id: 2a15b83e7a763c0fc0e42e1f35aac9551f951ece
2022-03-24 13:39:15 -07:00
Mark Callaghan
409635cb2a Add --slow_usecs option to determine when long op message is printed (#9732)
Summary:
This adds the --slow_usecs option with a default value of 1M. Operations that
take this much time have a message printed when --histogram=1, --stats_interval=0
and --stats_interval_seconds=0. The current code hardwired this to 20,000 usecs
and for some stress tests that reduced throughput by 20% or more.

This is for https://github.com/facebook/rocksdb/issues/9620

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9732

Test Plan:
./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100 --histogram=1
./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100000 --histogram=1

Reviewed By: jay-zhuang

Differential Revision: D35121522

Pulled By: mdcallag

fbshipit-source-id: daf27f937efd748980545d6395db332712fc078b
2022-03-24 13:39:01 -07:00
Peter Dillinger
cad809978a Fix heap use-after-free race with DropColumnFamily (#9730)
Summary:
Although ColumnFamilySet comments say that DB mutex can be
freed during iteration, as long as you hold a ref while releasing DB
mutex, this is not quite true because UnrefAndTryDelete might delete cfd
right before it is needed to get ->next_ for the next iteration of the
loop.

This change solves the problem by making a wrapper class that makes such
iteration easier while handling the tricky details of UnrefAndTryDelete
on the previous cfd only after getting next_ in operator++.

FreeDeadColumnFamilies should already have been obsolete; this removes
it for good. Similarly, ColumnFamilySet::iterator doesn't need to check
for cfd with 0 refs, because those are immediately deleted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9730

Test Plan:
was reported with ASAN on unit tests like
DBLogicalBlockSizeCacheTest.CreateColumnFamily (very rare); keep watching

Reviewed By: ltamasi

Differential Revision: D35038143

Pulled By: pdillinger

fbshipit-source-id: 0a5478d5be96c135343a00603711b7df43ae19c9
2022-03-24 13:05:17 -07:00
Alan Paxton
dec144f172 Extend Java RocksDB iterators to support indirect Byte Buffers (#9222)
Summary:
Extend Java RocksDB iterators to support indirect byte buffers, to add to the existing support for direct byte buffers.
Code to distinguish direct/indirect buffers is switched in Java, and a 2nd separate JNI call implemented to support indirect
buffers. Indirect support passes contained buffers using byte[]

There are some Java subclasses of iterator (WBWIIterator, SstFileReaderIterator) which also now have parallel JNI support functions implemented, along with direct/indirect switches in Java methods.

Closes https://github.com/facebook/rocksdb/issues/6282

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9222

Reviewed By: ajkr

Differential Revision: D35115283

Pulled By: jay-zhuang

fbshipit-source-id: f8d5d20b975aef700560fbcc99f707bb028dc42e
2022-03-24 12:50:38 -07:00
Alan Paxton
8ae0c33a7a Add new checksum type kXXH3 to Java API (#9749)
Summary:
Fix https://github.com/facebook/rocksdb/issues/9720

And make a couple of incidental tests test the thing they were meant to test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9749

Reviewed By: ajkr

Differential Revision: D35115298

Pulled By: jay-zhuang

fbshipit-source-id: d687d1f070d29216be9693601c71131bbea87c79
2022-03-24 12:33:12 -07:00
Mark Callaghan
f219e3d5d8 db_bench should fail on bad values for --compaction_fadvice and --value_size_distribution_type (#9741)
Summary:
db_bench quietly parses and ignores bad values for --compaction_fadvice and --value_size_distribution_type
I prefer that it fail for them as it does for bad option values in most other cases. Otherwise a benchmark
result will be provided for the wrong configuration and the result will be misleading.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9741

Test Plan:
These now fail:
./db_bench --compaction_fadvice=noney
Unknown compaction fadvice:noney

./db_bench --value_size_distribution_type=norma
Cannot parse distribution type 'norma'

While correct values continue to work:
 ./db_bench --value_size_distribution_type=normal
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags

./db_bench --compaction_fadvice=none
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags

Reviewed By: siying

Differential Revision: D35115973

Pulled By: mdcallag

fbshipit-source-id: c2b10de5c2d1ea7c7539e676f5bd556351f5d370
2022-03-24 11:46:27 -07:00
Yanqin Jin
862304a1fc Add two new targets to determinator (#9753)
Summary:
Test plan
```
build_tools/rocksdb-lego-determinator stress_crash_with_multiops_wc_txn
build_tools/rocksdb-lego-determinator stress_crash_with_multiops_wp_txn
```

Spot check the printed job spec.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9753

Reviewed By: jay-zhuang

Differential Revision: D35117116

Pulled By: riversand963

fbshipit-source-id: a7ed82e8cb9bc2fd13f4f00291c6a39457415fb0
2022-03-24 11:27:12 -07:00
Jay Zhuang
18463f8c00 Remove DBGet P95/P99 benchmark metrics (#9742)
Summary:
DBGet p95 and p99 have high variation, remove them for now.
Also increase the iteration to 3 to avoid false positive.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9742

Test Plan: Internal CI

Reviewed By: ajkr

Differential Revision: D35082820

Pulled By: jay-zhuang

fbshipit-source-id: facc1d56b94e54aa8c8852c207aae2ae4e4924b0
2022-03-24 10:08:35 -07:00
Mark Callaghan
d583d23d86 Avoid seed reuse when --benchmarks has more than one test (#9733)
Summary:
When --benchmarks has more than one test then the threads in one benchmark
will use the same set of seeds as the threads in the previous benchmark.
This diff fixe that.

This fixes https://github.com/facebook/rocksdb/issues/9632

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9733

Test Plan:
For this command line the block cache is 8GB, so it caches at most 1024 8KB blocks. Note that without
this diff the second run of readrandom has a much better response time because seed reuse means the
second run reads the same 1000 blocks as the first run and they are cached at that point. But with
this diff that does not happen.

./db_bench --benchmarks=fillseq,flush,compact0,waitforcompaction,levelstats,readrandom,readrandom --compression_type=zlib --num=10000000 --reads=1000 --block_size=8192

...

```
Level Files Size(MB)
--------------------
  0        0        0
  1       11      238
  2        9      253
  3        0        0
  4        0        0
  5        0        0
  6        0        0
```

 --- perf results without this diff

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      46.212 micros/op 21618 ops/sec;    2.4 MB/s (1000 of 1000 found)

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      21.963 micros/op 45450 ops/sec;    5.0 MB/s (1000 of 1000 found)

 --- perf results with this diff

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      47.213 micros/op 21126 ops/sec;    2.3 MB/s (1000 of 1000 found)

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      42.880 micros/op 23299 ops/sec;    2.6 MB/s (1000 of 1000 found)

Reviewed By: jay-zhuang

Differential Revision: D35089763

Pulled By: mdcallag

fbshipit-source-id: 1b50143a07afe876b8c8e5fa50dd94a8ce57fc6b
2022-03-24 08:57:48 -07:00
Peter Dillinger
727d11ceb4 Revise history of 7.1.0 for patch (#9746)
Summary:
This updates main branch with a HISTORY update going into
7.1.fb branch before tagging 7.1.0.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9746

Test Plan: HISTORY.md only

Reviewed By: ajkr, hx235

Differential Revision: D35099194

Pulled By: pdillinger

fbshipit-source-id: b74ea8b626118dac235e387038420829850b8da2
2022-03-24 08:48:45 -07:00
Yanqin Jin
c18c4a081c Add new determinators for multiops transactions stress test (#9708)
Summary:
Add determinators for multiops transactions stress test with
write-committed and write-prepared policies.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9708

Test Plan: Internal CI

Reviewed By: jay-zhuang

Differential Revision: D34967263

Pulled By: riversand963

fbshipit-source-id: 170a0842d56dccb6ed6bc0c5adfd33849acd6b31
2022-03-23 22:29:50 -07:00
Yanqin Jin
e0c84aa0dc Fix a race condition in WAL tracking causing DB open failure (#9715)
Summary:
There is a race condition if WAL tracking in the MANIFEST is enabled in a database that disables 2PC.

The race condition is between two background flush threads trying to install flush results to the MANIFEST.

Consider an example database with two column families: "default" (cfd0) and "cf1" (cfd1). Initially,
both column families have one mutable (active) memtable whose data backed by 6.log.

1. Trigger a manual flush for "cf1", creating a 7.log
2. Insert another key to "default", and trigger flush for "default", creating 8.log
3. BgFlushThread1 finishes writing 9.sst
4. BgFlushThread2 finishes writing 10.sst

```
Time  BgFlushThread1                                    BgFlushThread2
 |    mutex_.Lock()
 |    precompute min_wal_to_keep as 6
 |    mutex_.Unlock()
 |                                                     mutex_.Lock()
 |                                                     precompute min_wal_to_keep as 6
 |                                                     join MANIFEST write queue and mutex_.Unlock()
 |    write to MANIFEST
 |    mutex_.Lock()
 |    cfd1->log_number = 7
 |    Signal bg_flush_2 and mutex_.Unlock()
 |                                                     wake up and mutex_.Lock()
 |                                                     cfd0->log_number = 8
 |                                                     FindObsoleteFiles() with job_context->log_number == 7
 |                                                     mutex_.Unlock()
 |                                                     PurgeObsoleteFiles() deletes 6.log
 V
```

As shown in the above, BgFlushThread2 thinks that the min wal to keep is 6.log because "cf1" has unflushed data in 6.log (cf1.log_number=6).
Similarly, BgThread1 thinks that min wal to keep is also 6.log because "default" has unflushed data (default.log_number=6).
No WAL deletion will be written to MANIFEST because 6 is equal to `versions_->wals_.min_wal_number_to_keep`,
due to https://github.com/facebook/rocksdb/blob/7.1.fb/db/memtable_list.cc#L513:L514.
The bg flush thread that finishes last will perform file purging. `job_context.log_number` will be evaluated as 7, i.e.
the min wal that contains unflushed data, causing 6.log to be deleted. However, MANIFEST thinks 6.log should still exist.
If you close the db at this point, you won't be able to re-open it if `track_and_verify_wal_in_manifest` is true.

We must handle the case of multiple bg flush threads, and it is difficult for one bg flush thread to know
the correct min wal number until the other bg flush threads have finished committing to the manifest and updated
the `cfd::log_number`.
To fix this issue, we rename an existing variable `min_log_number_to_keep_2pc` to `min_log_number_to_keep`,
and use it to track WAL file deletion in non-2pc mode as well.
This variable is updated only 1) during recovery with mutex held, or 2) in the MANIFEST write thread.
`min_log_number_to_keep` means RocksDB will delete WALs below it, although there may be WALs
above it which are also obsolete. Formally, we will have [min_wal_to_keep, max_obsolete_wal]. During recovery, we
make sure that only WALs above max_obsolete_wal are checked and added back to `alive_log_files_`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9715

Test Plan:
```
make check
```
Also ran stress test below (with asan) to make sure it completes successfully.
```
TEST_TMPDIR=/dev/shm/rocksdb OPT=-g ASAN_OPTIONS=disable_coredump=0 \
CRASH_TEST_EXT_ARGS=--compression_type=zstd SKIP_FORMAT_BUCK_CHECKS=1 \
make J=52 -j52 blackbox_asan_crash_test
```

Reviewed By: ltamasi

Differential Revision: D34984412

Pulled By: riversand963

fbshipit-source-id: c7b21a8d84751bb55ea79c9f387103d21b231005
2022-03-23 19:41:31 -07:00
Yanqin Jin
29bec740f5 Return invalid argument if batch is null (#9744)
Summary:
Originally, a corruption will be returned by `DBImpl::WriteImpl(batch...)` if batch is
null. This is inaccurate since there is no data corruption.
Return `Status::InvalidArgument()` instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9744

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D35086268

Pulled By: riversand963

fbshipit-source-id: 677397b007a53bc25210eac0178d49c9797b5951
2022-03-23 14:28:13 -07:00
Mark Callaghan
6904fd0c86 db_bench should fail when an option uses an invalid compression type (#9729)
Summary:
This changes db_bench to fail at startup for invalid compression types. It had been
changing them to Snappy. For other invalid options it fails at startup.

This is for https://github.com/facebook/rocksdb/issues/9621

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9729

Test Plan:
This continues to work:
./db_bench --benchmarks=fillrandom --compression_type=lz4

This now fails rather than changing the compression type to Snappy
./db_bench --benchmarks=fillrandom --compression_type=lz44
Cannot parse compression type 'lz44'

Reviewed By: jay-zhuang

Differential Revision: D35081323

Pulled By: mdcallag

fbshipit-source-id: 9b38c835abddce11aa7feb235df63f53cf829981
2022-03-23 12:26:34 -07:00
Peter Dillinger
91687d70ea Fix a major performance bug in 7.0 re: filter compatibility (#9736)
Summary:
Bloom filters generated by pre-7.0 releases are not read by
7.0.x releases (and vice-versa) due to changes to FilterPolicy::Name()
in https://github.com/facebook/rocksdb/issues/9590. This can severely impact read performance and read I/O on
upgrade or downgrade with existing DB, but not data correctness.

To fix, we go back using the old, unified name in SST metadata but (for
a while anyway) recognize the aliases that could be generated by early
7.0.x releases. This unfortunately requires a public API change to avoid
interfering with all the good changes from https://github.com/facebook/rocksdb/issues/9590, but the API change
only affects users with custom FilterPolicy, which should be very few.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9736

Test Plan:
manual

Generate DBs with
```
./db_bench.7.0 -db=/dev/shm/rocksdb.7.0 -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=fillrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0
```
and similar. Compare with
```
for IMPL in 6.29 7.0 fixed; do for DB in 6.29 7.0 fixed; do echo "Testing $IMPL on $DB:"; ./db_bench.$IMPL -db=/dev/shm/rocksdb.$DB -use_existing_db -readonly -bloom_bits=10 -benchmarks=readrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -duration=10 2>&1 | grep micros/op; done; done
```

Results:
```
Testing 6.29 on 6.29:
readrandom   :      34.381 micros/op 29085 ops/sec;    3.2 MB/s (291999 of 291999 found)
Testing 6.29 on 7.0:
readrandom   :     190.443 micros/op 5249 ops/sec;    0.6 MB/s (52999 of 52999 found)
Testing 6.29 on fixed:
readrandom   :      40.148 micros/op 24907 ops/sec;    2.8 MB/s (249999 of 249999 found)
Testing 7.0 on 6.29:
readrandom   :     229.430 micros/op 4357 ops/sec;    0.5 MB/s (43999 of 43999 found)
Testing 7.0 on 7.0:
readrandom   :      33.348 micros/op 29986 ops/sec;    3.3 MB/s (299999 of 299999 found)
Testing 7.0 on fixed:
readrandom   :     152.734 micros/op 6546 ops/sec;    0.7 MB/s (65999 of 65999 found)
Testing fixed on 6.29:
readrandom   :      32.024 micros/op 31224 ops/sec;    3.5 MB/s (312999 of 312999 found)
Testing fixed on 7.0:
readrandom   :      33.990 micros/op 29390 ops/sec;    3.3 MB/s (294999 of 294999 found)
Testing fixed on fixed:
readrandom   :      28.714 micros/op 34825 ops/sec;    3.9 MB/s (348999 of 348999 found)
```

Just paying attention to order of magnitude of ops/sec (short test
durations, lots of noise), it's clear that with the fix we can read <= 6.29
& >= 7.0 at full speed, where neither 6.29 nor 7.0 can on both. And 6.29
release can properly read fixed DB at full speed.

Reviewed By: siying, ajkr

Differential Revision: D35057844

Pulled By: pdillinger

fbshipit-source-id: a46893a6af4bf084375ebe4728066d00eb08f050
2022-03-23 10:00:54 -07:00
Mark Callaghan
d71e5a5beb Add number of running flushes & compactions to --stats_per_interval output (#9726)
Summary:
This is for https://github.com/facebook/rocksdb/issues/9709 and add two lines to the end of DB Stats
for num-running-compactions and num-running-flushes.

For example ...

** DB Stats **
Uptime(secs): 6.0 total, 1.0 interval
Cumulative writes: 915K writes, 915K keys, 915K commit groups, 1.0 writes per commit group, ingest: 0.11 GB, 18.95 MB/s
Cumulative WAL: 915K writes, 0 syncs, 915000.00 writes per sync, written: 0.11 GB, 18.95 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 133K writes, 133K keys, 133K commit groups, 1.0 writes per commit group, ingest: 16.62 MB, 16.53 MB/s
Interval WAL: 133K writes, 0 syncs, 133000.00 writes per sync, written: 0.02 GB, 16.53 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
num-running-compactions: 0
num-running-flushes: 0

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9726

Reviewed By: jay-zhuang

Differential Revision: D35066759

Pulled By: mdcallag

fbshipit-source-id: c161fadd3c15c5aa715a820dab6bfedb46dc099b
2022-03-23 09:33:41 -07:00
Yanqin Jin
3bd150c442 Print information about all column families when using ldb (#9719)
Summary:
Before this PR, the following command prints only the default column
family's information in the end:
```
ldb --db=. --hex manifest_dump --verbose
```

We should print all column families instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9719

Test Plan:
`make check` makes sure nothing breaks.

Generate a DB, use the above command to verify all column families are
printed.

Reviewed By: akankshamahajan15

Differential Revision: D34992453

Pulled By: riversand963

fbshipit-source-id: de1d38c4539cd89f74e1a6240ad7a6e2416bf198
2022-03-22 20:29:01 -07:00
Akanksha Mahajan
f07eec1bf8 Add async_io read option in db_bench (#9735)
Summary:
Add async_io Read option in db_bench

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9735

Test Plan:
./db_bench -use_existing_db=true
-db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32
-value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680
-duration=120 -ops_between_duration_checks=1 -async_io=1

Reviewed By: riversand963

Differential Revision: D35058482

Pulled By: akankshamahajan15

fbshipit-source-id: 1522b638c79f6d85bb7408c67f6ab76dbabeeee7
2022-03-22 17:21:35 -07:00
Mark Callaghan
63a284a6ad For db_bench --benchmarks=fillseq with --num_multi_db load databases … (#9713)
Summary:
…in order

This fixes https://github.com/facebook/rocksdb/issues/9650
For db_bench --benchmarks=fillseq --num_multi_db=X it loads databases in sequence
rather than randomly choosing a database per Put. The benefits are:
1) avoids long delays between flushing memtables
2) avoids flushing memtables for all of them at the same point in time
3) puts same number of keys per database so that query tests will find keys as expected

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9713

Test Plan:
Using db_bench.1 without the change and db_bench.2 with the change:

for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq --num_multi_db=4 --num=10000000; du -hs /data/m/rx ; done

 --- without the change
    fillseq      :       3.188 micros/op 313682 ops/sec;   34.7 MB/s
    real    2m7.787s
    user    1m52.776s
    sys     0m46.549s
    2.7G    /data/m/rx

 --- with the change

    fillseq      :       3.149 micros/op 317563 ops/sec;   35.1 MB/s
    real    2m6.196s
    user    1m51.482s
    sys     0m46.003s
    2.7G    /data/m/rx

    Also, temporarily added a printf to confirm that the code switches to the next database at the right time
    ZZ switch to db 1 at 10000000
    ZZ switch to db 2 at 20000000
    ZZ switch to db 3 at 30000000

for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq,readrandom --num_multi_db=4 --num=100000; du -hs /data/m/rx ; done

 --- without the change, smaller database, note that not all keys are found by readrandom because databases have < and > --num keys

    fillseq      :       3.176 micros/op 314805 ops/sec;   34.8 MB/s
    readrandom   :       1.913 micros/op 522616 ops/sec;   57.7 MB/s (99873 of 100000 found)

 --- with the change, smaller database, note that all keys are found by readrandom

    fillseq      :       3.110 micros/op 321566 ops/sec;   35.6 MB/s
    readrandom   :       1.714 micros/op 583257 ops/sec;   64.5 MB/s (100000 of 100000 found)

Reviewed By: jay-zhuang

Differential Revision: D35030168

Pulled By: mdcallag

fbshipit-source-id: 2a18c4ec571d954cf5a57b00a11802a3608823ee
2022-03-22 10:36:24 -07:00
gitbw95
8102690a52 Update Cache::Release param from force_erase to erase_if_last_ref (#9728)
Summary:
The param name force_erase may be misleading, since the handle is erased only if it has last reference even if the param is set true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9728

Reviewed By: pdillinger

Differential Revision: D35038673

Pulled By: gitbw95

fbshipit-source-id: 0d16d1e8fed17b97eba7fb53207119332f659a5f
2022-03-22 10:22:18 -07:00
452 changed files with 21837 additions and 23317 deletions

View File

@ -2,7 +2,6 @@ version: 2.1
orbs:
win: circleci/windows@2.4.0
slack: circleci/slack@3.4.2
aliases:
- &notify-on-main-failure
@ -57,7 +56,6 @@ commands:
post-steps:
steps:
- slack/status: *notify-on-main-failure
- store_test_results: # store test result if there's any
path: /tmp/test-results
- store_artifacts: # store LOG for debugging if there's any
@ -102,10 +100,22 @@ commands:
install-benchmark:
steps:
- run:
name: Install ninja build
command: sudo apt-get update -y && sudo apt-get install -y ninja-build
- run:
name: Install benchmark
command: |
sudo apt-get update -y && sudo apt-get install -y libbenchmark-dev
git clone --depth 1 --branch v1.6.1 https://github.com/google/benchmark.git ~/benchmark
cd ~/benchmark && mkdir build && cd build
cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_GTEST_TESTS=0
ninja && sudo ninja install
install-valgrind:
steps:
- run:
name: Install valgrind
command: sudo apt-get update -y && sudo apt-get install -y valgrind
upgrade-cmake:
steps:
@ -170,7 +180,7 @@ jobs:
- increase-max-open-files-on-macos
- install-gflags-on-macos
- pre-steps-macos
- run: ulimit -S -n 1048576 && OPT=-DCIRCLECI make V=1 J=32 -j32 all
- run: ulimit -S -n `ulimit -H -n` && OPT=-DCIRCLECI make V=1 J=32 -j32 all
- post-steps
build-macos-cmake:
@ -189,7 +199,7 @@ jobs:
- pre-steps-macos
- run:
name: "cmake generate project file"
command: ulimit -S -n 1048576 && mkdir build && cd build && cmake -DWITH_GFLAGS=1 ..
command: ulimit -S -n `ulimit -H -n` && mkdir build && cd build && cmake -DWITH_GFLAGS=1 ..
- run:
name: "Build tests"
command: cd build && make V=1 -j32
@ -198,14 +208,14 @@ jobs:
steps:
- run:
name: "Run even tests"
command: ulimit -S -n 1048576 && cd build && ctest -j32 -I 0,,2
command: ulimit -S -n `ulimit -H -n` && cd build && ctest -j32 -I 0,,2
- when:
condition:
not: << parameters.run_even_tests >>
steps:
- run:
name: "Run odd tests"
command: ulimit -S -n 1048576 && cd build && ctest -j32 -I 1,,2
command: ulimit -S -n `ulimit -H -n` && cd build && ctest -j32 -I 1,,2
- post-steps
build-linux:
@ -218,14 +228,16 @@ jobs:
- run: make V=1 J=32 -j32 check
- post-steps
build-linux-encrypted-env:
build-linux-encrypted_env-no_compression:
machine:
image: ubuntu-2004:202111-02
resource_class: 2xlarge
steps:
- pre-steps
- install-gflags
- run: ENCRYPTED_ENV=1 make V=1 J=32 -j32 check
- run: ENCRYPTED_ENV=1 ROCKSDB_DISABLE_SNAPPY=1 ROCKSDB_DISABLE_ZLIB=1 ROCKSDB_DISABLE_BZIP=1 ROCKSDB_DISABLE_LZ4=1 ROCKSDB_DISABLE_ZSTD=1 make V=1 J=32 -j32 check
- run: |
./sst_dump --help | egrep -q 'Supported compression types: kNoCompression$' # Verify no compiled in compression
- post-steps
build-linux-shared_lib-alt_namespace-status_checked:
@ -306,7 +318,7 @@ jobs:
- pre-steps
- install-gflags
- install-clang-10
- run: ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 CC=clang-10 CXX=clang++-10 ROCKSDB_DISABLE_ALIGNED_NEW=1 USE_CLANG=1 make V=1 -j32 check # aligned new doesn't work for reason we haven't figured out
- run: COMPILE_WITH_ASAN=1 CC=clang-10 CXX=clang++-10 ROCKSDB_DISABLE_ALIGNED_NEW=1 USE_CLANG=1 make V=1 -j32 check # aligned new doesn't work for reason we haven't figured out
- post-steps
build-linux-clang10-mini-tsan:
@ -350,6 +362,17 @@ jobs:
- run: COMPILE_WITH_UBSAN=1 OPT="-fsanitize-blacklist=.circleci/ubsan_suppression_list.txt" CC=clang-10 CXX=clang++-10 ROCKSDB_DISABLE_ALIGNED_NEW=1 USE_CLANG=1 make V=1 -j32 ubsan_check # aligned new doesn't work for reason we haven't figured out
- post-steps
build-linux-valgrind:
machine:
image: ubuntu-2004:202111-02
resource_class: 2xlarge
steps:
- pre-steps
- install-gflags
- install-valgrind
- run: PORTABLE=1 make V=1 -j32 valgrind_test
- post-steps
build-linux-clang10-clang-analyze:
machine:
image: ubuntu-2004:202111-02
@ -362,7 +385,7 @@ jobs:
- run: CC=clang-10 CXX=clang++-10 ROCKSDB_DISABLE_ALIGNED_NEW=1 CLANG_ANALYZER="/usr/bin/clang++-10" CLANG_SCAN_BUILD=scan-build-10 USE_CLANG=1 make V=1 -j32 analyze # aligned new doesn't work for reason we haven't figured out. For unknown, reason passing "clang++-10" as CLANG_ANALYZER doesn't work, and we need a full path.
- post-steps
build-linux-cmake:
build-linux-cmake-with-folly:
machine:
image: ubuntu-2004:202111-02
resource_class: 2xlarge
@ -370,10 +393,11 @@ jobs:
- pre-steps
- install-gflags
- upgrade-cmake
- run: (mkdir build && cd build && cmake -DWITH_GFLAGS=1 .. && make V=1 -j20 && ctest -j20)
- run: make checkout_folly
- run: (mkdir build && cd build && cmake -DUSE_FOLLY=1 -DWITH_GFLAGS=1 .. && make V=1 -j20 && ctest -j20)
- post-steps
build-linux-cmake-ubuntu-20:
build-linux-cmake-with-benchmark:
machine:
image: ubuntu-2004:202111-02
resource_class: 2xlarge
@ -387,22 +411,25 @@ jobs:
build-linux-unity-and-headers:
docker: # executor type
- image: gcc:latest
environment:
EXTRA_CXXFLAGS: -mno-avx512f # Warnings-as-error in avx512fintrin.h, would be used on newer hardware
resource_class: large
steps:
- checkout # check out the code in the project directory
- run: apt-get update -y && apt-get install -y libgflags-dev
- run: TEST_TMPDIR=/dev/shm && make V=1 -j8 unity_test
- run: make V=1 -j8 unity_test
- run: make V=1 -j8 -k check-headers # could be moved to a different build
- post-steps
build-linux-gcc-7:
build-linux-gcc-7-with-folly:
machine:
image: ubuntu-2004:202111-02
resource_class: 2xlarge
steps:
- pre-steps
- run: sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test && sudo apt-get update -y && sudo apt-get install gcc-7 g++-7 libgflags-dev
- run: CC=gcc-7 CXX=g++-7 V=1 make -j32 check
- run: make checkout_folly
- run: USE_FOLLY=1 CC=gcc-7 CXX=g++-7 V=1 make -j32 check
- post-steps
build-linux-gcc-8-no_test_run:
@ -431,8 +458,8 @@ jobs:
resource_class: xlarge
steps:
- pre-steps
- install-benchmark
- run: sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test && sudo apt-get update -y && sudo apt-get install gcc-11 g++-11 libgflags-dev
- install-benchmark
- run: CC=gcc-11 CXX=g++-11 V=1 make -j16 all microbench
- post-steps
@ -447,6 +474,19 @@ jobs:
- run: CC=clang-13 CXX=clang++-13 USE_CLANG=1 make -j16 all microbench
- post-steps
# Ensure ASAN+UBSAN with folly, and full testsuite with clang 13
build-linux-clang-13-asan-ubsan-with-folly:
machine:
image: ubuntu-2004:202111-02
resource_class: 2xlarge
steps:
- pre-steps
- install-clang-13
- install-gflags
- run: make checkout_folly
- run: CC=clang-13 CXX=clang++-13 USE_CLANG=1 USE_FOLLY=1 COMPILE_WITH_UBSAN=1 COMPILE_WITH_ASAN=1 make -j32 check
- post-steps
# This job is only to make sure the microbench tests are able to run, the benchmark result is not meaningful as the CI host is changing.
build-linux-run-microbench:
machine:
@ -466,7 +506,7 @@ jobs:
- pre-steps
- install-gflags
- install-compression-libs
- run: make V=1 -j8 CRASH_TEST_EXT_ARGS=--duration=960 blackbox_crash_test_with_atomic_flush
- run: ulimit -S -n `ulimit -H -n` && make V=1 -j8 CRASH_TEST_EXT_ARGS=--duration=960 blackbox_crash_test_with_atomic_flush
- post-steps
build-windows:
@ -553,9 +593,6 @@ jobs:
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> $BASH_ENV
which java && java -version
which javac && javac -version
- run:
name: "Build RocksDBJava Shared Library"
command: make V=1 J=8 -j8 rocksdbjava
- run:
name: "Test RocksDBJava"
command: make V=1 J=8 -j8 jtest
@ -601,9 +638,6 @@ jobs:
echo 'export PATH=$JAVA_HOME/bin:$PATH' >> $BASH_ENV
which java && java -version
which javac && javac -version
- run:
name: "Build RocksDBJava Shared Library"
command: make V=1 J=16 -j16 rocksdbjava
- run:
name: "Test RocksDBJava"
command: make V=1 J=16 -j16 jtest
@ -794,108 +828,76 @@ jobs:
workflows:
version: 2
build-linux:
jobs-linux-run-tests:
jobs:
- build-linux
build-linux-cmake:
jobs:
- build-linux-cmake
- build-linux-cmake-ubuntu-20
build-linux-encrypted-env:
jobs:
- build-linux-encrypted-env
build-linux-shared_lib-alt_namespace-status_checked:
jobs:
- build-linux-shared_lib-alt_namespace-status_checked
build-linux-lite:
jobs:
- build-linux-cmake-with-folly
- build-linux-gcc-7-with-folly
- build-linux-cmake-with-benchmark
- build-linux-encrypted_env-no_compression
- build-linux-lite
build-linux-release:
jobs:
- build-linux-release
build-linux-release-rtti:
jobs:
- build-linux-release-rtti
build-linux-lite-release:
jobs:
- build-linux-lite-release
build-linux-clang10-asan:
jobs-linux-run-tests-san:
jobs:
- build-linux-clang10-asan
build-linux-clang10-mini-tsan:
jobs:
- build-linux-clang10-ubsan
- build-linux-clang10-mini-tsan:
start_test: ""
end_test: "env_test"
- build-linux-clang10-mini-tsan:
start_test: "env_test"
end_test: ""
build-linux-clang10-ubsan:
- build-linux-shared_lib-alt_namespace-status_checked
jobs-linux-no-test-run:
jobs:
- build-linux-clang10-ubsan
build-linux-clang10-clang-analyze:
- build-linux-release
- build-linux-release-rtti
- build-linux-lite-release
- build-examples
- build-fuzzers
- build-linux-clang-no_test_run
- build-linux-clang-13-no_test_run
- build-linux-gcc-8-no_test_run
- build-linux-gcc-10-cxx20-no_test_run
- build-linux-gcc-11-no_test_run
- build-linux-arm-cmake-no_test_run
jobs-linux-other-checks:
jobs:
- build-linux-clang10-clang-analyze
build-linux-unity-and-headers:
jobs:
- build-linux-unity-and-headers
build-linux-mini-crashtest:
jobs:
- build-linux-mini-crashtest
build-windows-vs2019:
jobs-windows:
jobs:
- build-windows:
name: "build-windows-vs2019"
build-windows-vs2019-cxx20:
jobs:
- build-windows:
name: "build-windows-vs2019-cxx20"
extra_cmake_opt: -DCMAKE_CXX_STANDARD=20
build-windows-vs2017:
jobs:
- build-windows:
name: "build-windows-vs2017"
vs_year: "2017"
cmake_generator: "Visual Studio 15 Win64"
build-java:
- build-cmake-mingw
jobs-java:
jobs:
- build-linux-java
- build-linux-java-static
- build-macos-java
- build-macos-java-static
- build-macos-java-static-universal
build-examples:
jobs:
- build-examples
build-linux-compilers-no_test_run:
jobs:
- build-linux-clang-no_test_run
- build-linux-clang-13-no_test_run
- build-linux-gcc-7
- build-linux-gcc-8-no_test_run
- build-linux-gcc-10-cxx20-no_test_run
- build-linux-gcc-11-no_test_run
- build-linux-arm-cmake-no_test_run
build-macos:
jobs-macos:
jobs:
- build-macos
- build-macos-cmake:
run_even_tests: true
- build-macos-cmake:
run_even_tests: false
build-cmake-mingw:
jobs:
- build-cmake-mingw
build-linux-arm:
jobs-linux-arm:
jobs:
- build-linux-arm
build-fuzzers:
jobs:
- build-fuzzers
nightly:
triggers:
- schedule:
cron: "0 0 * * *"
cron: "0 9 * * *"
filters:
branches:
only:
@ -905,3 +907,5 @@ workflows:
- build-linux-arm-test-full
- build-linux-run-microbench
- build-linux-non-shm
- build-linux-clang-13-asan-ubsan-with-folly
- build-linux-valgrind

1
.gitignore vendored
View File

@ -95,3 +95,4 @@ fuzz/proto/gen/
fuzz/crash-*
cmake-build-*
third-party/folly/

View File

@ -239,10 +239,10 @@ script:
OPT=-DTRAVIS LIB_MODE=shared V=1 ROCKSDBTESTS_PLATFORM_DEPENDENT=only make -j$MK_PARALLEL all_but_some_tests check_some
;;
1)
OPT=-DTRAVIS LIB_MODE=shared V=1 ROCKSDBTESTS_PLATFORM_DEPENDENT=exclude ROCKSDBTESTS_END=backupable_db_test make -j$MK_PARALLEL check_some
OPT=-DTRAVIS LIB_MODE=shared V=1 ROCKSDBTESTS_PLATFORM_DEPENDENT=exclude ROCKSDBTESTS_END=backup_engine_test make -j$MK_PARALLEL check_some
;;
2)
OPT="-DTRAVIS -DROCKSDB_NAMESPACE=alternative_rocksdb_ns" LIB_MODE=shared V=1 make -j$MK_PARALLEL tools && OPT="-DTRAVIS -DROCKSDB_NAMESPACE=alternative_rocksdb_ns" LIB_MODE=shared V=1 ROCKSDBTESTS_PLATFORM_DEPENDENT=exclude ROCKSDBTESTS_START=backupable_db_test ROCKSDBTESTS_END=db_universal_compaction_test make -j$MK_PARALLEL check_some
OPT="-DTRAVIS -DROCKSDB_NAMESPACE=alternative_rocksdb_ns" LIB_MODE=shared V=1 make -j$MK_PARALLEL tools && OPT="-DTRAVIS -DROCKSDB_NAMESPACE=alternative_rocksdb_ns" LIB_MODE=shared V=1 ROCKSDBTESTS_PLATFORM_DEPENDENT=exclude ROCKSDBTESTS_START=backup_engine_test ROCKSDBTESTS_END=db_universal_compaction_test make -j$MK_PARALLEL check_some
;;
3)
OPT=-DTRAVIS LIB_MODE=shared V=1 ROCKSDBTESTS_PLATFORM_DEPENDENT=exclude ROCKSDBTESTS_START=db_universal_compaction_test ROCKSDBTESTS_END=table_properties_collector_test make -j$MK_PARALLEL check_some

View File

@ -2,7 +2,7 @@
# This cmake build is for Windows 64-bit only.
#
# Prerequisites:
# You must have at least Visual Studio 2015 Update 3. Start the Developer Command Prompt window that is a part of Visual Studio installation.
# You must have at least Visual Studio 2019. Start the Developer Command Prompt window that is a part of Visual Studio installation.
# Run the build commands from within the Developer Command Prompt window to have paths to the compiler and runtime libraries set.
# You must have git.exe in your %PATH% environment variable.
#
@ -40,6 +40,8 @@ include(GoogleTest)
get_rocksdb_version(rocksdb_VERSION)
project(rocksdb
VERSION ${rocksdb_VERSION}
DESCRIPTION "An embeddable persistent key-value store for fast storage"
HOMEPAGE_URL https://rocksdb.org/
LANGUAGES CXX C ASM)
if(POLICY CMP0042)
@ -78,19 +80,6 @@ if ($ENV{CIRCLECI})
add_definitions(-DCIRCLECI)
endif()
# third-party/folly is only validated to work on Linux and Windows for now.
# So only turn it on there by default.
if(CMAKE_SYSTEM_NAME MATCHES "Linux|Windows")
if(MSVC AND MSVC_VERSION LESS 1910)
# Folly does not compile with MSVC older than VS2017
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" OFF)
else()
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" ON)
endif()
else()
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" OFF)
endif()
if( NOT DEFINED CMAKE_CXX_STANDARD )
set(CMAKE_CXX_STANDARD 17)
endif()
@ -182,26 +171,6 @@ else()
endif()
endif()
string(TIMESTAMP TS "%Y-%m-%d %H:%M:%S" UTC)
set(BUILD_DATE "${TS}" CACHE STRING "the time we first built rocksdb")
find_package(Git)
if(GIT_FOUND AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git")
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_SHA COMMAND "${GIT_EXECUTABLE}" rev-parse HEAD )
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" RESULT_VARIABLE GIT_MOD COMMAND "${GIT_EXECUTABLE}" diff-index HEAD --quiet)
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_DATE COMMAND "${GIT_EXECUTABLE}" log -1 --date=format:"%Y-%m-%d %T" --format="%ad")
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_TAG RESULT_VARIABLE rv COMMAND "${GIT_EXECUTABLE}" symbolic-ref -q --short HEAD OUTPUT_STRIP_TRAILING_WHITESPACE)
if (rv AND NOT rv EQUAL 0)
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_TAG COMMAND "${GIT_EXECUTABLE}" describe --tags --exact-match OUTPUT_STRIP_TRAILING_WHITESPACE)
endif()
else()
set(GIT_SHA 0)
set(GIT_MOD 1)
endif()
string(REGEX REPLACE "[^0-9a-fA-F]+" "" GIT_SHA "${GIT_SHA}")
string(REGEX REPLACE "[^0-9: /-]+" "" GIT_DATE "${GIT_DATE}")
option(WITH_MD_LIBRARY "build with MD" ON)
if(WIN32 AND MSVC)
if(WITH_MD_LIBRARY)
@ -211,9 +180,6 @@ if(WIN32 AND MSVC)
endif()
endif()
set(BUILD_VERSION_CC ${CMAKE_BINARY_DIR}/build_version.cc)
configure_file(util/build_version.cc.in ${BUILD_VERSION_CC} @ONLY)
if(MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /Zi /nologo /EHsc /GS /Gd /GR /GF /fp:precise /Zc:wchar_t /Zc:forScope /errorReport:queue")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} /FC /d2Zi+ /W4 /wd4127 /wd4800 /wd4996 /wd4351 /wd4100 /wd4204 /wd4324")
@ -456,30 +422,32 @@ if (ASSERT_STATUS_CHECKED)
add_definitions(-DROCKSDB_ASSERT_STATUS_CHECKED)
endif()
if(DEFINED USE_RTTI)
if(USE_RTTI)
message(STATUS "Enabling RTTI")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -DROCKSDB_USE_RTTI")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DROCKSDB_USE_RTTI")
else()
if(MSVC)
message(STATUS "Disabling RTTI in Release builds. Always on in Debug.")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -DROCKSDB_USE_RTTI")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GR-")
else()
message(STATUS "Disabling RTTI in Release builds")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-rtti")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -fno-rtti")
endif()
endif()
else()
# RTTI is by default AUTO which enables it in debug and disables it in release.
set(USE_RTTI AUTO CACHE STRING "Enable RTTI in builds")
set_property(CACHE USE_RTTI PROPERTY STRINGS AUTO ON OFF)
if(USE_RTTI STREQUAL "AUTO")
message(STATUS "Enabling RTTI in Debug builds only (default)")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -DROCKSDB_USE_RTTI")
if(MSVC)
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GR-")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GR-")
else()
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -fno-rtti")
endif()
elseif(USE_RTTI)
message(STATUS "Enabling RTTI in all builds")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -DROCKSDB_USE_RTTI")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DROCKSDB_USE_RTTI")
else()
if(MSVC)
message(STATUS "Disabling RTTI in Release builds. Always on in Debug.")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -DROCKSDB_USE_RTTI")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} /GR-")
else()
message(STATUS "Disabling RTTI in all builds")
set(CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-rtti")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -fno-rtti")
endif()
endif()
# Used to run CI build and tests so we can run faster
@ -615,8 +583,9 @@ endif()
include_directories(${PROJECT_SOURCE_DIR})
include_directories(${PROJECT_SOURCE_DIR}/include)
if(WITH_FOLLY_DISTRIBUTED_MUTEX)
if(USE_FOLLY)
include_directories(${PROJECT_SOURCE_DIR}/third-party/folly)
add_definitions(-DUSE_FOLLY -DFOLLY_NO_CONFIG)
endif()
find_package(Threads REQUIRED)
@ -628,8 +597,9 @@ set(SOURCES
cache/cache_key.cc
cache/cache_reservation_manager.cc
cache/clock_cache.cc
cache/compressed_secondary_cache.cc
cache/fast_lru_cache.cc
cache/lru_cache.cc
cache/lru_secondary_cache.cc
cache/sharded_cache.cc
db/arena_wrapped_db_iter.cc
db/blob/blob_fetcher.cc
@ -829,6 +799,7 @@ set(SOURCES
trace_replay/trace_record_result.cc
trace_replay/trace_record.cc
trace_replay/trace_replay.cc
util/cleanable.cc
util/coding.cc
util/compaction_job_stats_impl.cc
util/comparator.cc
@ -849,7 +820,8 @@ set(SOURCES
util/thread_local.cc
util/threadpool_imp.cc
util/xxhash.cc
utilities/backupable/backupable_db.cc
utilities/agg_merge/agg_merge.cc
utilities/backup/backup_engine.cc
utilities/blob_db/blob_compaction_filter.cc
utilities/blob_db/blob_db.cc
utilities/blob_db/blob_db_impl.cc
@ -998,13 +970,12 @@ else()
env/io_posix.cc)
endif()
if(WITH_FOLLY_DISTRIBUTED_MUTEX)
if(USE_FOLLY)
list(APPEND SOURCES
third-party/folly/folly/detail/Futex.cpp
third-party/folly/folly/synchronization/AtomicNotification.cpp
third-party/folly/folly/synchronization/DistributedMutex.cpp
third-party/folly/folly/synchronization/ParkingLot.cpp
third-party/folly/folly/synchronization/WaitOptions.cpp)
third-party/folly/folly/container/detail/F14Table.cpp
third-party/folly/folly/lang/SafeAssert.cpp
third-party/folly/folly/lang/ToAscii.cpp
third-party/folly/folly/ScopeGuard.cpp)
endif()
set(ROCKSDB_STATIC_LIB rocksdb${ARTIFACT_SUFFIX})
@ -1019,6 +990,60 @@ else()
set(SYSTEM_LIBS ${CMAKE_THREAD_LIBS_INIT})
endif()
set(ROCKSDB_PLUGIN_EXTERNS "")
set(ROCKSDB_PLUGIN_BUILTINS "")
message(STATUS "ROCKSDB PLUGINS TO BUILD ${ROCKSDB_PLUGINS}")
list(APPEND PLUGINS ${ROCKSDB_PLUGINS})
foreach(PLUGIN IN LISTS PLUGINS)
set(PLUGIN_ROOT "${CMAKE_SOURCE_DIR}/plugin/${PLUGIN}/")
message("including rocksb plugin ${PLUGIN_ROOT}")
set(PLUGINMKFILE "${PLUGIN_ROOT}${PLUGIN}.mk")
if (NOT EXISTS ${PLUGINMKFILE})
message(FATAL_ERROR "Missing plugin makefile: ${PLUGINMKFILE}")
endif()
file(READ ${PLUGINMKFILE} PLUGINMK)
string(REGEX MATCH "SOURCES = ([^\n]*)" FOO ${PLUGINMK})
set(MK_SOURCES ${CMAKE_MATCH_1})
separate_arguments(MK_SOURCES)
foreach(MK_FILE IN LISTS MK_SOURCES)
list(APPEND SOURCES "${PLUGIN_ROOT}${MK_FILE}")
endforeach()
string(REGEX MATCH "_FUNC = ([^\n]*)" FOO ${PLUGINMK})
if (NOT ${CMAKE_MATCH_1} STREQUAL "")
string(APPEND ROCKSDB_PLUGIN_BUILTINS "{\"${PLUGIN}\", " ${CMAKE_MATCH_1} "},")
string(APPEND ROCKSDB_PLUGIN_EXTERNS "int " ${CMAKE_MATCH_1} "(ROCKSDB_NAMESPACE::ObjectLibrary&, const std::string&); ")
endif()
string(REGEX MATCH "_LIBS = ([^\n]*)" FOO ${PLUGINMK})
if (NOT ${CMAKE_MATCH_1} STREQUAL "")
list(APPEND THIRDPARTY_LIBS "${CMAKE_MATCH_1}")
endif()
message("THIRDPARTY_LIBS=${THIRDPARTY_LIBS}")
#TODO: We need to set any compile/link-time flags and add any link libraries
endforeach()
string(TIMESTAMP TS "%Y-%m-%d %H:%M:%S" UTC)
set(BUILD_DATE "${TS}" CACHE STRING "the time we first built rocksdb")
find_package(Git)
if(GIT_FOUND AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/.git")
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_SHA COMMAND "${GIT_EXECUTABLE}" rev-parse HEAD )
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" RESULT_VARIABLE GIT_MOD COMMAND "${GIT_EXECUTABLE}" diff-index HEAD --quiet)
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_DATE COMMAND "${GIT_EXECUTABLE}" log -1 --date=format:"%Y-%m-%d %T" --format="%ad")
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_TAG RESULT_VARIABLE rv COMMAND "${GIT_EXECUTABLE}" symbolic-ref -q --short HEAD OUTPUT_STRIP_TRAILING_WHITESPACE)
if (rv AND NOT rv EQUAL 0)
execute_process(WORKING_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}" OUTPUT_VARIABLE GIT_TAG COMMAND "${GIT_EXECUTABLE}" describe --tags --exact-match OUTPUT_STRIP_TRAILING_WHITESPACE)
endif()
else()
set(GIT_SHA 0)
set(GIT_MOD 1)
endif()
string(REGEX REPLACE "[^0-9a-fA-F]+" "" GIT_SHA "${GIT_SHA}")
string(REGEX REPLACE "[^0-9: /-]+" "" GIT_DATE "${GIT_DATE}")
set(BUILD_VERSION_CC ${CMAKE_BINARY_DIR}/build_version.cc)
configure_file(util/build_version.cc.in ${BUILD_VERSION_CC} @ONLY)
add_library(${ROCKSDB_STATIC_LIB} STATIC ${SOURCES} ${BUILD_VERSION_CC})
target_link_libraries(${ROCKSDB_STATIC_LIB} PRIVATE
${THIRDPARTY_LIBS} ${SYSTEM_LIBS})
@ -1098,6 +1123,12 @@ if(NOT WIN32 OR ROCKSDB_INSTALL_ON_WINDOWS)
COMPATIBILITY SameMajorVersion
)
configure_file(
${CMAKE_CURRENT_SOURCE_DIR}/${PROJECT_NAME}.pc.in
${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}.pc
@ONLY
)
install(DIRECTORY include/rocksdb COMPONENT devel DESTINATION "${CMAKE_INSTALL_INCLUDEDIR}")
install(DIRECTORY "${PROJECT_SOURCE_DIR}/cmake/modules" COMPONENT devel DESTINATION ${package_config_destination})
@ -1136,6 +1167,13 @@ if(NOT WIN32 OR ROCKSDB_INSTALL_ON_WINDOWS)
COMPONENT devel
DESTINATION ${package_config_destination}
)
install(
FILES
${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}.pc
COMPONENT devel
DESTINATION ${CMAKE_INSTALL_LIBDIR}/pkgconfig
)
endif()
option(WITH_ALL_TESTS "Build all test, rather than a small subset" ON)
@ -1157,8 +1195,8 @@ if(WITH_TESTS)
list(APPEND TESTS
cache/cache_reservation_manager_test.cc
cache/cache_test.cc
cache/compressed_secondary_cache_test.cc
cache/lru_cache_test.cc
cache/lru_secondary_cache_test.cc
db/blob/blob_counting_iterator_test.cc
db/blob/blob_file_addition_test.cc
db/blob/blob_file_builder_test.cc
@ -1315,7 +1353,8 @@ if(WITH_TESTS)
util/thread_list_test.cc
util/thread_local_test.cc
util/work_queue_test.cc
utilities/backupable/backupable_db_test.cc
utilities/agg_merge/agg_merge_test.cc
utilities/backup/backup_engine_test.cc
utilities/blob_db/blob_db_test.cc
utilities/cassandra/cassandra_functional_test.cc
utilities/cassandra/cassandra_format_test.cc
@ -1346,14 +1385,11 @@ if(WITH_TESTS)
)
endif()
if(WITH_FOLLY_DISTRIBUTED_MUTEX)
list(APPEND TESTS third-party/folly/folly/synchronization/test/DistributedMutexTest.cpp)
endif()
set(TESTUTIL_SOURCE
db/db_test_util.cc
monitoring/thread_status_updater_debug.cc
table/mock_table.cc
utilities/agg_merge/test_agg_merge.cc
utilities/cassandra/test_utils.cc
)
enable_testing()

View File

@ -1,4 +1,67 @@
# Rocksdb Change Log
## Unreleased
### Bug Fixes
* Fixed a bug where manual flush would block forever even though flush options had wait=false.
* Fixed a bug where RocksDB could corrupt DBs with `avoid_flush_during_recovery == true` by removing valid WALs, leading to `Status::Corruption` with message like "SST file is ahead of WALs" when attempting to reopen.
* Fixed a bug in async_io path where incorrect length of data is read by FilePrefetchBuffer if data is consumed from two populated buffers and request for more data is sent.
* Fixed a CompactionFilter bug. Compaction filter used to use `Delete` to remove keys, even if the keys should be removed with `SingleDelete`. Mixing `Delete` and `SingleDelete` may cause undefined behavior.
* Fixed a bug which might cause process crash when I/O error happens when reading an index block in MultiGet().
### New Features
* DB::GetLiveFilesStorageInfo is ready for production use.
* Add new stats PREFETCHED_BYTES_DISCARDED which records number of prefetched bytes discarded by RocksDB FilePrefetchBuffer on destruction and POLL_WAIT_MICROS records wait time for FS::Poll API completion.
### Public API changes
* Add rollback_deletion_type_callback to TransactionDBOptions so that write-prepared transactions know whether to issue a Delete or SingleDelete to cancel a previous key written during prior prepare phase. The PR aims to prevent mixing SingleDeletes and Deletes for the same key that can lead to undefined behaviors for write-prepared transactions.
* EXPERIMENTAL: Add new API AbortIO in file_system to abort the read requests submitted asynchronously.
* CompactionFilter::Decision has a new value: kRemoveWithSingleDelete. If CompactionFilter returns this decision, then CompactionIterator will use `SingleDelete` to mark a key as removed.
* Renamed CompactionFilter::Decision::kRemoveWithSingleDelete to kPurge since the latter sounds more general and hides the implementation details of how compaction iterator handles keys.
* Added ability to specify functions for Prepare and Validate to OptionsTypeInfo. Added methods to OptionTypeInfo to set the functions via an API. These methods are intended for RocksDB plugin developers for configuration management.
* Added a new immutable db options, enforce_single_del_contracts. If set to false (default is true), compaction will NOT fail due to a single delete followed by a delete for the same key. The purpose of this temporay option is to help existing use cases migrate.
### Bug Fixes
* RocksDB calls FileSystem::Poll API during FilePrefetchBuffer destruction which impacts performance as it waits for read requets completion which is not needed anymore. Calling FileSystem::AbortIO to abort those requests instead fixes that performance issue.
* Fixed unnecessary block cache contention when queries within a MultiGet batch and across parallel batches access the same data block, which previously could cause severely degraded performance in this unusual case. (In more typical MultiGet cases, this fix is expected to yield a small or negligible performance improvement.)
### Behavior changes
* Enforce the existing contract of SingleDelete so that SingleDelete cannot be mixed with Delete because it leads to undefined behavior. Fix a number of unit tests that violate the contract but happen to pass.
* ldb `--try_load_options` default to true if `--db` is specified and not creating a new DB, the user can still explicitly disable that by `--try_load_options=false` (or explicitly enable that by `--try_load_options`).
## 7.2.0 (04/15/2022)
### Bug Fixes
* Fixed bug which caused rocksdb failure in the situation when rocksdb was accessible using UNC path
* Fixed a race condition when 2PC is disabled and WAL tracking in the MANIFEST is enabled. The race condition is between two background flush threads trying to install flush results, causing a WAL deletion not tracked in the MANIFEST. A future DB open may fail.
* Fixed a heap use-after-free race with DropColumnFamily.
* Fixed a bug that `rocksdb.read.block.compaction.micros` cannot track compaction stats (#9722).
* Fixed `file_type`, `relative_filename` and `directory` fields returned by `GetLiveFilesMetaData()`, which were added in inheriting from `FileStorageInfo`.
* Fixed a bug affecting `track_and_verify_wals_in_manifest`. Without the fix, application may see "open error: Corruption: Missing WAL with log number" while trying to open the db. The corruption is a false alarm but prevents DB open (#9766).
* Fix segfault in FilePrefetchBuffer with async_io as it doesn't wait for pending jobs to complete on destruction.
* Fix ERROR_HANDLER_AUTORESUME_RETRY_COUNT stat whose value was set wrong in portal.h
* Fixed a bug for non-TransactionDB with avoid_flush_during_recovery = true and TransactionDB where in case of crash, min_log_number_to_keep may not change on recovery and persisting a new MANIFEST with advanced log_numbers for some column families, results in "column family inconsistency" error on second recovery. As a solution the corrupted WALs whose numbers are larger than the corrupted wal and smaller than the new WAL will be moved to archive folder.
* Fixed a bug in RocksDB DB::Open() which may creates and writes to two new MANIFEST files even before recovery succeeds. Now writes to MANIFEST are persisted only after recovery is successful.
### New Features
* For db_bench when --seed=0 or --seed is not set then it uses the current time as the seed value. Previously it used the value 1000.
* For db_bench when --benchmark lists multiple tests and each test uses a seed for a RNG then the seeds across tests will no longer be repeated.
* Added an option to dynamically charge an updating estimated memory usage of block-based table reader to block cache if block cache available. To enable this feature, set `BlockBasedTableOptions::reserve_table_reader_memory = true`.
* Add new stat ASYNC_READ_BYTES that calculates number of bytes read during async read call and users can check if async code path is being called by RocksDB internal automatic prefetching for sequential reads.
* Enable async prefetching if ReadOptions.readahead_size is set along with ReadOptions.async_io in FilePrefetchBuffer.
* Add event listener support on remote compaction compactor side.
* Added a dedicated integer DB property `rocksdb.live-blob-file-garbage-size` that exposes the total amount of garbage in the blob files in the current version.
* RocksDB does internal auto prefetching if it notices sequential reads. It starts with readahead size `initial_auto_readahead_size` which now can be configured through BlockBasedTableOptions.
* Add a merge operator that allows users to register specific aggregation function so that they can does aggregation using different aggregation types for different keys. See comments in include/rocksdb/utilities/agg_merge.h for actual usage. The feature is experimental and the format is subject to change and we won't provide a migration tool.
* Meta-internal / Experimental: Improve CPU performance by replacing many uses of std::unordered_map with folly::F14FastMap when RocksDB is compiled together with Folly.
* Experimental: Add CompressedSecondaryCache, a concrete implementation of rocksdb::SecondaryCache, that integrates with compression libraries (e.g. LZ4) to hold compressed blocks.
### Behavior changes
* Disallow usage of commit-time-write-batch for write-prepared/write-unprepared transactions if TransactionOptions::use_only_the_last_commit_time_batch_for_recovery is false to prevent two (or more) uncommitted versions of the same key in the database. Otherwise, bottommost compaction may violate the internal key uniqueness invariant of SSTs if the sequence numbers of both internal keys are zeroed out (#9794).
* Make DB::GetUpdatesSince() return NotSupported early for write-prepared/write-unprepared transactions, as the API contract indicates.
### Public API changes
* Exposed APIs to examine results of block cache stats collections in a structured way. In particular, users of `GetMapProperty()` with property `kBlockCacheEntryStats` can now use the functions in `BlockCacheEntryStatsMapKeys` to find stats in the map.
* Add `fail_if_not_bottommost_level` to IngestExternalFileOptions so that ingestion will fail if the file(s) cannot be ingested to the bottommost level.
* Add output parameter `is_in_sec_cache` to `SecondaryCache::Lookup()`. It is to indicate whether the handle is possibly erased from the secondary cache after the Lookup.
## 7.1.0 (03/23/2022)
### New Features
* Allow WriteBatchWithIndex to index a WriteBatch that includes keys with user-defined timestamps. The index itself does not have timestamp.
@ -1658,7 +1721,7 @@ if set to something > 0 user will see 2 changes in iterators behavior 1) only ke
* Added a new way to report QPS from db_bench (check out --report_file and --report_interval_seconds)
* Added a cache for individual rows. See DBOptions::row_cache for more info.
* Several new features on EventListener (see include/rocksdb/listener.h):
- OnCompationCompleted() now returns per-compaction job statistics, defined in include/rocksdb/compaction_job_stats.h.
- OnCompactionCompleted() now returns per-compaction job statistics, defined in include/rocksdb/compaction_job_stats.h.
- Added OnTableFileCreated() and OnTableFileDeleted().
* Add compaction_options_universal.enable_trivial_move to true, to allow trivial move while performing universal compaction. Trivial move will happen only when all the input files are non overlapping.

View File

@ -47,7 +47,7 @@ to build a portable binary, add `PORTABLE=1` before your make commands, like thi
* If you wish to build the RocksJava static target, then cmake is required for building Snappy.
* If you wish to run microbench (e.g, `make microbench`, `make ribbon_bench` or `cmake -DWITH_BENCHMARK=1`), Google benchmark < 1.6.0 is needed.
* If you wish to run microbench (e.g, `make microbench`, `make ribbon_bench` or `cmake -DWITH_BENCHMARK=1`), Google benchmark >= 1.6.0 is needed.
## Supported platforms
@ -180,8 +180,7 @@ to build a portable binary, add `PORTABLE=1` before your make commands, like thi
* **iOS**:
* Run: `TARGET_OS=IOS make static_lib`. When building the project which uses rocksdb iOS library, make sure to define two important pre-processing macros: `ROCKSDB_LITE` and `IOS_CROSS_COMPILE`.
* **Windows**:
* For building with MS Visual Studio 13 you will need Update 4 installed.
* **Windows** (Visual Studio 2017 to up):
* Read and follow the instructions at CMakeLists.txt
* Or install via [vcpkg](https://github.com/microsoft/vcpkg)
* run `vcpkg install rocksdb:x64-windows`

239
Makefile
View File

@ -8,7 +8,7 @@
BASH_EXISTS := $(shell which bash)
SHELL := $(shell which bash)
include python.mk
include common.mk
CLEAN_FILES = # deliberately empty, so we can append below.
CFLAGS += ${EXTRA_CFLAGS}
@ -232,14 +232,20 @@ include make_config.mk
ROCKSDB_PLUGIN_MKS = $(foreach plugin, $(ROCKSDB_PLUGINS), plugin/$(plugin)/*.mk)
include $(ROCKSDB_PLUGIN_MKS)
ROCKSDB_PLUGIN_SOURCES = $(foreach plugin, $(ROCKSDB_PLUGINS), $(foreach source, $($(plugin)_SOURCES), plugin/$(plugin)/$(source)))
ROCKSDB_PLUGIN_HEADERS = $(foreach plugin, $(ROCKSDB_PLUGINS), $(foreach header, $($(plugin)_HEADERS), plugin/$(plugin)/$(header)))
ROCKSDB_PLUGIN_PROTO =ROCKSDB_NAMESPACE::ObjectLibrary\&, const std::string\&
ROCKSDB_PLUGIN_SOURCES = $(foreach p, $(ROCKSDB_PLUGINS), $(foreach source, $($(p)_SOURCES), plugin/$(p)/$(source)))
ROCKSDB_PLUGIN_HEADERS = $(foreach p, $(ROCKSDB_PLUGINS), $(foreach header, $($(p)_HEADERS), plugin/$(p)/$(header)))
ROCKSDB_PLUGIN_LIBS = $(foreach p, $(ROCKSDB_PLUGINS), $(foreach lib, $($(p)_LIBS), -l$(lib)))
ROCKSDB_PLUGIN_W_FUNCS = $(foreach p, $(ROCKSDB_PLUGINS), $(if $($(p)_FUNC), $(p)))
ROCKSDB_PLUGIN_EXTERNS = $(foreach p, $(ROCKSDB_PLUGIN_W_FUNCS), int $($(p)_FUNC)($(ROCKSDB_PLUGIN_PROTO));)
ROCKSDB_PLUGIN_BUILTINS = $(foreach p, $(ROCKSDB_PLUGIN_W_FUNCS), {\"$(p)\"\, $($(p)_FUNC)}\,)
ROCKSDB_PLUGIN_LDFLAGS = $(foreach plugin, $(ROCKSDB_PLUGINS), $($(plugin)_LDFLAGS))
ROCKSDB_PLUGIN_PKGCONFIG_REQUIRES = $(foreach plugin, $(ROCKSDB_PLUGINS), $($(plugin)_PKGCONFIG_REQUIRES))
CXXFLAGS += $(foreach plugin, $(ROCKSDB_PLUGINS), $($(plugin)_CXXFLAGS))
PLATFORM_LDFLAGS += $(ROCKSDB_PLUGIN_LDFLAGS)
# Patch up the link flags for JNI from the plugins
ROCKSDB_PLUGIN_LDFLAGS = $(foreach plugin, $(ROCKSDB_PLUGINS), $($(plugin)_LDFLAGS))
PLATFORM_LDFLAGS += $(ROCKSDB_PLUGIN_LDFLAGS)
JAVA_LDFLAGS += $(ROCKSDB_PLUGIN_LDFLAGS)
JAVA_STATIC_LDFLAGS += $(ROCKSDB_PLUGIN_LDFLAGS)
@ -282,7 +288,7 @@ missing_make_config_paths := $(shell \
grep "\./\S*\|/\S*" -o $(CURDIR)/make_config.mk | \
while read path; \
do [ -e $$path ] || echo $$path; \
done | sort | uniq)
done | sort | uniq | grep -v "/DOES/NOT/EXIST")
$(foreach path, $(missing_make_config_paths), \
$(warning Warning: $(path) does not exist))
@ -334,6 +340,8 @@ endif
# ASAN doesn't work well with jemalloc. If we're compiling with ASAN, we should use regular malloc.
ifdef COMPILE_WITH_ASAN
DISABLE_JEMALLOC=1
ASAN_OPTIONS?=detect_stack_use_after_return=1
export ASAN_OPTIONS
EXEC_LDFLAGS += -fsanitize=address
PLATFORM_CCFLAGS += -fsanitize=address
PLATFORM_CXXFLAGS += -fsanitize=address
@ -394,6 +402,10 @@ ifndef DISABLE_JEMALLOC
ifdef JEMALLOC
PLATFORM_CXXFLAGS += -DROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE
PLATFORM_CCFLAGS += -DROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE
ifeq ($(USE_FOLLY),1)
PLATFORM_CXXFLAGS += -DUSE_JEMALLOC
PLATFORM_CCFLAGS += -DUSE_JEMALLOC
endif
endif
ifdef WITH_JEMALLOC_FLAG
PLATFORM_LDFLAGS += -ljemalloc
@ -404,8 +416,8 @@ ifndef DISABLE_JEMALLOC
PLATFORM_CCFLAGS += $(JEMALLOC_INCLUDE)
endif
ifndef USE_FOLLY_DISTRIBUTED_MUTEX
USE_FOLLY_DISTRIBUTED_MUTEX=0
ifndef USE_FOLLY
USE_FOLLY=0
endif
ifndef GTEST_THROW_ON_FAILURE
@ -425,8 +437,12 @@ else
PLATFORM_CXXFLAGS += -isystem $(GTEST_DIR)
endif
ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
FOLLY_DIR = ./third-party/folly
# This provides a Makefile simulation of a Meta-internal folly integration.
# It is not validated for general use.
ifeq ($(USE_FOLLY),1)
ifeq (,$(FOLLY_DIR))
FOLLY_DIR = ./third-party/folly
endif
# AIX: pre-defined system headers are surrounded by an extern "C" block
ifeq ($(PLATFORM), OS_AIX)
PLATFORM_CCFLAGS += -I$(FOLLY_DIR)
@ -435,6 +451,8 @@ ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
PLATFORM_CCFLAGS += -isystem $(FOLLY_DIR)
PLATFORM_CXXFLAGS += -isystem $(FOLLY_DIR)
endif
PLATFORM_CCFLAGS += -DUSE_FOLLY -DFOLLY_NO_CONFIG
PLATFORM_CXXFLAGS += -DUSE_FOLLY -DFOLLY_NO_CONFIG
endif
ifdef TEST_CACHE_LINE_SIZE
@ -521,7 +539,7 @@ LIB_OBJECTS += $(patsubst %.c, $(OBJ_DIR)/%.o, $(LIB_SOURCES_C))
LIB_OBJECTS += $(patsubst %.S, $(OBJ_DIR)/%.o, $(LIB_SOURCES_ASM))
endif
ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
ifeq ($(USE_FOLLY),1)
LIB_OBJECTS += $(patsubst %.cpp, $(OBJ_DIR)/%.o, $(FOLLY_SOURCES))
endif
@ -556,11 +574,6 @@ ALL_SOURCES += $(ROCKSDB_PLUGIN_SOURCES)
TESTS = $(patsubst %.cc, %, $(notdir $(TEST_MAIN_SOURCES)))
TESTS += $(patsubst %.c, %, $(notdir $(TEST_MAIN_SOURCES_C)))
ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
TESTS += folly_synchronization_distributed_mutex_test
ALL_SOURCES += third-party/folly/folly/synchronization/test/DistributedMutexTest.cc
endif
# `make check-headers` to very that each header file includes its own
# dependencies
ifneq ($(filter check-headers, $(MAKECMDGOALS)),)
@ -585,9 +598,6 @@ am__v_CCH_1 =
check-headers: $(HEADER_OK_FILES)
# options_settable_test doesn't pass with UBSAN as we use hack in the test
ifdef COMPILE_WITH_UBSAN
TESTS := $(shell echo $(TESTS) | sed 's/\boptions_settable_test\b//g')
endif
ifdef ASSERT_STATUS_CHECKED
# TODO: finish fixing all tests to pass this check
TESTS_FAILING_ASC = \
@ -607,10 +617,13 @@ ROCKSDBTESTS_SUBSET ?= $(TESTS)
# env_test - suspicious use of test::TmpDir
# deletefile_test - serial because it generates giant temporary files in
# its various tests. Parallel can fill up your /dev/shm
# db_bloom_filter_test - serial because excessive space usage by instances
# of DBFilterConstructionReserveMemoryTestWithParam can fill up /dev/shm
NON_PARALLEL_TEST = \
c_test \
env_test \
deletefile_test \
db_bloom_filter_test \
PARALLEL_TEST = $(filter-out $(NON_PARALLEL_TEST), $(TESTS))
@ -728,7 +741,7 @@ else
git_mod := $(shell git diff-index HEAD --quiet 2>/dev/null; echo $$?)
git_date := $(shell git log -1 --date=format:"%Y-%m-%d %T" --format="%ad" 2>/dev/null)
endif
gen_build_version = sed -e s/@GIT_SHA@/$(git_sha)/ -e s:@GIT_TAG@:"$(git_tag)": -e s/@GIT_MOD@/"$(git_mod)"/ -e s/@BUILD_DATE@/"$(build_date)"/ -e s/@GIT_DATE@/"$(git_date)"/ util/build_version.cc.in
gen_build_version = sed -e s/@GIT_SHA@/$(git_sha)/ -e s:@GIT_TAG@:"$(git_tag)": -e s/@GIT_MOD@/"$(git_mod)"/ -e s/@BUILD_DATE@/"$(build_date)"/ -e s/@GIT_DATE@/"$(git_date)"/ -e s/@ROCKSDB_PLUGIN_BUILTINS@/'$(ROCKSDB_PLUGIN_BUILTINS)'/ -e s/@ROCKSDB_PLUGIN_EXTERNS@/"$(ROCKSDB_PLUGIN_EXTERNS)"/ util/build_version.cc.in
# Record the version of the source that we are compiling.
# We keep a record of the git revision in this file. It is then built
@ -782,17 +795,10 @@ $(SHARED4): $(LIB_OBJECTS)
$(AM_V_CCLD) $(CXX) $(PLATFORM_SHARED_LDFLAGS)$(SHARED3) $(LIB_OBJECTS) $(LDFLAGS) -o $@
endif # PLATFORM_SHARED_EXT
.PHONY: blackbox_crash_test check clean coverage crash_test ldb_tests package \
release tags tags0 valgrind_check whitebox_crash_test format static_lib shared_lib all \
dbg rocksdbjavastatic rocksdbjava gen-pc install install-static install-shared uninstall \
analyze tools tools_lib check-headers \
blackbox_crash_test_with_atomic_flush whitebox_crash_test_with_atomic_flush \
blackbox_crash_test_with_txn whitebox_crash_test_with_txn \
blackbox_crash_test_with_best_efforts_recovery \
blackbox_crash_test_with_ts whitebox_crash_test_with_ts \
blackbox_crash_test_with_multiops_wc_txn \
blackbox_crash_test_with_multiops_wp_txn
.PHONY: check clean coverage ldb_tests package dbg gen-pc build_size \
release tags tags0 valgrind_check format static_lib shared_lib all \
rocksdbjavastatic rocksdbjava install install-static install-shared \
uninstall analyze tools tools_lib check-headers checkout_folly
all: $(LIBRARY) $(BENCHMARKS) tools tools_lib test_libs $(TESTS)
@ -826,20 +832,8 @@ release: clean
coverage: clean
COVERAGEFLAGS="-fprofile-arcs -ftest-coverage" LDFLAGS+="-lgcov" $(MAKE) J=1 all check
cd coverage && ./coverage_test.sh
# Delete intermediate files
$(FIND) . -type f -regex ".*\.\(\(gcda\)\|\(gcno\)\)" -exec rm -f {} \;
ifneq (,$(filter check parallel_check,$(MAKECMDGOALS)),)
# Use /dev/shm if it has the sticky bit set (otherwise, /tmp),
# and create a randomly-named rocksdb.XXXX directory therein.
# We'll use that directory in the "make check" rules.
ifeq ($(TMPD),)
TMPDIR := $(shell echo $${TMPDIR:-/tmp})
TMPD := $(shell f=/dev/shm; test -k $$f || f=$(TMPDIR); \
perl -le 'use File::Temp "tempdir";' \
-e 'print tempdir("'$$f'/rocksdb.XXXX", CLEANUP => 0)')
endif
endif
# Delete intermediate files
$(FIND) . -type f \( -name "*.gcda" -o -name "*.gcno" \) -exec rm -f {} \;
# Run all tests in parallel, accumulating per-test logs in t/log-*.
#
@ -880,7 +874,7 @@ $(parallel_tests):
TEST_SCRIPT=t/run-$$TEST_BINARY-$${TEST_NAME//\//-}; \
printf '%s\n' \
'#!/bin/sh' \
"d=\$(TMPD)$$TEST_SCRIPT" \
"d=\$(TEST_TMPDIR)$$TEST_SCRIPT" \
'mkdir -p $$d' \
"TEST_TMPDIR=\$$d $(DRIVER) ./$$TEST_BINARY --gtest_filter=$$TEST_NAME" \
> $$TEST_SCRIPT; \
@ -940,7 +934,6 @@ endif
.PHONY: check_0
check_0:
$(AM_V_GEN)export TEST_TMPDIR=$(TMPD); \
printf '%s\n' '' \
'To monitor subtest <duration,pass/fail,name>,' \
' run "make watch-log" in a separate window' ''; \
@ -951,7 +944,8 @@ check_0:
| $(prioritize_long_running_tests) \
| grep -E '$(tests-regexp)' \
| grep -E -v '$(EXCLUDE_TESTS_REGEX)' \
| build_tools/gnu_parallel -j$(J) --plain --joblog=LOG --eta --gnu '{} $(parallel_redir)' ; \
| build_tools/gnu_parallel -j$(J) --plain --joblog=LOG --eta --gnu \
--tmpdir=$(TEST_TMPDIR) '{} $(parallel_redir)' ; \
parallel_retcode=$$? ; \
awk '{ if ($$7 != 0 || $$8 != 0) { if ($$7 == "Exitval") { h = $$0; } else { if (!f) print h; print; f = 1 } } } END { if(f) exit 1; }' < LOG ; \
awk_retcode=$$?; \
@ -962,7 +956,6 @@ valgrind-exclude-regexp = InlineSkipTest.ConcurrentInsert|TransactionStressTest.
.PHONY: valgrind_check_0
valgrind_check_0: test_log_prefix := valgrind_
valgrind_check_0:
$(AM_V_GEN)export TEST_TMPDIR=$(TMPD); \
printf '%s\n' '' \
'To monitor subtest <duration,pass/fail,name>,' \
' run "make watch-log" in a separate window' ''; \
@ -974,10 +967,11 @@ valgrind_check_0:
| grep -E '$(tests-regexp)' \
| grep -E -v '$(valgrind-exclude-regexp)' \
| build_tools/gnu_parallel -j$(J) --plain --joblog=LOG --eta --gnu \
'(if [[ "{}" == "./"* ]] ; then $(DRIVER) {}; else {}; fi) \
--tmpdir=$(TEST_TMPDIR) \
'(if [[ "{}" == "./"* ]] ; then $(DRIVER) {}; else {}; fi) \
$(parallel_redir)' \
CLEAN_FILES += t LOG $(TMPD)
CLEAN_FILES += t LOG $(TEST_TMPDIR)
# When running parallel "make check", you can monitor its progress
# from another window.
@ -1000,12 +994,12 @@ check: all
&& (build_tools/gnu_parallel --gnu --help 2>/dev/null) | \
grep -q 'GNU Parallel'; \
then \
$(MAKE) T="$$t" TMPD=$(TMPD) check_0; \
$(MAKE) T="$$t" check_0; \
else \
for t in $(TESTS); do \
echo "===== Running $$t (`date`)"; ./$$t || exit 1; done; \
fi
rm -rf $(TMPD)
rm -rf $(TEST_TMPDIR)
ifneq ($(PLATFORM), OS_AIX)
$(PYTHON) tools/check_all_python.py
ifeq ($(filter -DROCKSDB_LITE,$(OPT)),)
@ -1032,31 +1026,31 @@ ldb_tests: ldb
include crash_test.mk
asan_check: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) check -j32
COMPILE_WITH_ASAN=1 $(MAKE) check -j32
$(MAKE) clean
asan_crash_test: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) crash_test
COMPILE_WITH_ASAN=1 $(MAKE) crash_test
$(MAKE) clean
whitebox_asan_crash_test: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) whitebox_crash_test
COMPILE_WITH_ASAN=1 $(MAKE) whitebox_crash_test
$(MAKE) clean
blackbox_asan_crash_test: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) blackbox_crash_test
COMPILE_WITH_ASAN=1 $(MAKE) blackbox_crash_test
$(MAKE) clean
asan_crash_test_with_atomic_flush: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) crash_test_with_atomic_flush
COMPILE_WITH_ASAN=1 $(MAKE) crash_test_with_atomic_flush
$(MAKE) clean
asan_crash_test_with_txn: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) crash_test_with_txn
COMPILE_WITH_ASAN=1 $(MAKE) crash_test_with_txn
$(MAKE) clean
asan_crash_test_with_best_efforts_recovery: clean
ASAN_OPTIONS=detect_stack_use_after_return=1 COMPILE_WITH_ASAN=1 $(MAKE) crash_test_with_best_efforts_recovery
COMPILE_WITH_ASAN=1 $(MAKE) crash_test_with_best_efforts_recovery
$(MAKE) clean
ubsan_check: clean
@ -1102,11 +1096,11 @@ valgrind_test_some:
valgrind_check: $(TESTS)
$(MAKE) DRIVER="$(VALGRIND_VER) $(VALGRIND_OPTS)" gen_parallel_tests
$(AM_V_GEN)if test "$(J)" != 1 \
&& (build_tools/gnu_parallel --gnu --help 2>/dev/null) | \
&& (build_tools/gnu_parallel --gnu --help 2>/dev/null) | \
grep -q 'GNU Parallel'; \
then \
$(MAKE) TMPD=$(TMPD) \
DRIVER="$(VALGRIND_VER) $(VALGRIND_OPTS)" valgrind_check_0; \
$(MAKE) \
DRIVER="$(VALGRIND_VER) $(VALGRIND_OPTS)" valgrind_check_0; \
else \
for t in $(filter-out %skiplist_test options_settable_test,$(TESTS)); do \
$(VALGRIND_VER) $(VALGRIND_OPTS) ./$$t; \
@ -1126,27 +1120,6 @@ valgrind_check_some: $(ROCKSDBTESTS_SUBSET)
fi; \
done
ifneq ($(PAR_TEST),)
parloop:
ret_bad=0; \
for t in $(PAR_TEST); do \
echo "===== Running $$t in parallel $(NUM_PAR) (`date`)";\
if [ $(db_test) -eq 1 ]; then \
seq $(J) | v="$$t" build_tools/gnu_parallel --gnu --plain 's=$(TMPD)/rdb-{}; export TEST_TMPDIR=$$s;' \
'timeout 2m ./db_test --gtest_filter=$$v >> $$s/log-{} 2>1'; \
else\
seq $(J) | v="./$$t" build_tools/gnu_parallel --gnu --plain 's=$(TMPD)/rdb-{};' \
'export TEST_TMPDIR=$$s; timeout 10m $$v >> $$s/log-{} 2>1'; \
fi; \
ret_code=$$?; \
if [ $$ret_code -ne 0 ]; then \
ret_bad=$$ret_code; \
echo $$t exited with $$ret_code; \
fi; \
done; \
exit $$ret_bad;
endif
test_names = \
./db_test --gtest_list_tests \
| perl -n \
@ -1154,24 +1127,6 @@ test_names = \
-e '/^(\s*)(\S+)/; !$$1 and do {$$p=$$2; break};' \
-e 'print qq! $$p$$2!'
parallel_check: $(TESTS)
$(AM_V_GEN)if test "$(J)" > 1 \
&& (build_tools/gnu_parallel --gnu --help 2>/dev/null) | \
grep -q 'GNU Parallel'; \
then \
echo Running in parallel $(J); \
else \
echo "Need to have GNU Parallel and J > 1"; exit 1; \
fi; \
ret_bad=0; \
echo $(J);\
echo Test Dir: $(TMPD); \
seq $(J) | build_tools/gnu_parallel --gnu --plain 's=$(TMPD)/rdb-{}; rm -rf $$s; mkdir $$s'; \
$(MAKE) PAR_TEST="$(shell $(test_names))" TMPD=$(TMPD) \
J=$(J) db_test=1 parloop; \
$(MAKE) PAR_TEST="$(filter-out db_test, $(TESTS))" \
TMPD=$(TMPD) J=$(J) db_test=0 parloop;
analyze: clean
USE_CLANG=1 $(MAKE) analyze_incremental
@ -1214,9 +1169,9 @@ clean-rocks:
rm -f $(BENCHMARKS) $(TOOLS) $(TESTS) $(PARALLEL_TEST) $(ALL_STATIC_LIBS) $(ALL_SHARED_LIBS) $(MICROBENCHS)
rm -rf $(CLEAN_FILES) ios-x86 ios-arm scan_build_report
$(FIND) . -name "*.[oda]" -exec rm -f {} \;
$(FIND) . -type f -regex ".*\.\(\(gcda\)\|\(gcno\)\)" -exec rm -f {} \;
$(FIND) . -type f \( -name "*.gcda" -o -name "*.gcno" \) -exec rm -f {} \;
clean-rocksjava:
clean-rocksjava: clean-rocks
rm -rf jl jls
cd java && $(MAKE) clean
@ -1300,11 +1255,6 @@ trace_analyzer: $(OBJ_DIR)/tools/trace_analyzer.o $(ANALYZE_OBJECTS) $(TOOLS_LIB
block_cache_trace_analyzer: $(OBJ_DIR)/tools/block_cache_analyzer/block_cache_trace_analyzer_tool.o $(ANALYZE_OBJECTS) $(TOOLS_LIBRARY) $(LIBRARY)
$(AM_LINK)
ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
folly_synchronization_distributed_mutex_test: $(OBJ_DIR)/third-party/folly/folly/synchronization/test/DistributedMutexTest.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
endif
cache_bench: $(OBJ_DIR)/cache/cache_bench.o $(CACHE_BENCH_OBJECTS) $(LIBRARY)
$(AM_LINK)
@ -1371,6 +1321,9 @@ ribbon_test: $(OBJ_DIR)/util/ribbon_test.o $(TEST_LIBRARY) $(LIBRARY)
option_change_migration_test: $(OBJ_DIR)/utilities/option_change_migration/option_change_migration_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
agg_merge_test: $(OBJ_DIR)/utilities/agg_merge/agg_merge_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
stringappend_test: $(OBJ_DIR)/utilities/merge_operators/string_append/stringappend_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
@ -1551,7 +1504,7 @@ perf_context_test: $(OBJ_DIR)/db/perf_context_test.o $(TEST_LIBRARY) $(LIBRARY)
prefix_test: $(OBJ_DIR)/db/prefix_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
backupable_db_test: $(OBJ_DIR)/utilities/backupable/backupable_db_test.o $(TEST_LIBRARY) $(LIBRARY)
backup_engine_test: $(OBJ_DIR)/utilities/backup/backup_engine_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
checkpoint_test: $(OBJ_DIR)/utilities/checkpoint/checkpoint_test.o $(TEST_LIBRARY) $(LIBRARY)
@ -1842,7 +1795,7 @@ statistics_test: $(OBJ_DIR)/monitoring/statistics_test.o $(TEST_LIBRARY) $(LIBRA
stats_history_test: $(OBJ_DIR)/monitoring/stats_history_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
lru_secondary_cache_test: $(OBJ_DIR)/cache/lru_secondary_cache_test.o $(TEST_LIBRARY) $(LIBRARY)
compressed_secondary_cache_test: $(OBJ_DIR)/cache/compressed_secondary_cache_test.o $(TEST_LIBRARY) $(LIBRARY)
$(AM_LINK)
lru_cache_test: $(OBJ_DIR)/cache/lru_cache_test.o $(TEST_LIBRARY) $(LIBRARY)
@ -2044,8 +1997,8 @@ ROCKSDB_JAVADOCS_JAR = rocksdbjni-$(ROCKSDB_JAVA_VERSION)-javadoc.jar
ROCKSDB_SOURCES_JAR = rocksdbjni-$(ROCKSDB_JAVA_VERSION)-sources.jar
SHA256_CMD = sha256sum
ZLIB_VER ?= 1.2.11
ZLIB_SHA256 ?= c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1
ZLIB_VER ?= 1.2.12
ZLIB_SHA256 ?= 91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9
ZLIB_DOWNLOAD_BASE ?= http://zlib.net
BZIP2_VER ?= 1.0.8
BZIP2_SHA256 ?= ab5a03176ee106d3f0fa90e381da478ddae405918153cca248e682cd0c4a2269
@ -2372,10 +2325,58 @@ jtest: rocksdbjava
jdb_bench:
cd java;$(MAKE) db_bench;
commit_prereq: build_tools/rocksdb-lego-determinator \
build_tools/precommit_checker.py
J=$(J) build_tools/precommit_checker.py unit unit_481 clang_unit release release_481 clang_release tsan asan ubsan lite unit_non_shm
$(MAKE) clean && $(MAKE) jclean && $(MAKE) rocksdbjava;
commit_prereq:
echo "TODO: bring this back using parts of old precommit_checker.py and rocksdb-lego-determinator"
false # J=$(J) build_tools/precommit_checker.py unit clang_unit release clang_release tsan asan ubsan lite unit_non_shm
# $(MAKE) clean && $(MAKE) jclean && $(MAKE) rocksdbjava;
# For public CI runs, checkout folly in a way that can build with RocksDB.
# This is mostly intended as a test-only simulation of Meta-internal folly
# integration.
checkout_folly:
if [ -e third-party/folly ]; then \
cd third-party/folly && git fetch origin; \
else \
cd third-party && git clone https://github.com/facebook/folly.git; \
fi
@# Pin to a particular version for public CI, so that PR authors don't
@# need to worry about folly breaking our integration. Update periodically
cd third-party/folly && git reset --hard 98b9b2c1124e99f50f9085ddee74ce32afffc665
@# A hack to remove boost dependency.
@# NOTE: this hack is not needed if using FBCODE compiler config
perl -pi -e 's/^(#include <boost)/\/\/$$1/' third-party/folly/folly/functional/Invoke.h
# ---------------------------------------------------------------------------
# Build size testing
# ---------------------------------------------------------------------------
REPORT_BUILD_STATISTIC?=echo STATISTIC:
build_size:
# === normal build, static ===
$(MAKE) clean
$(MAKE) static_lib
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.static_lib $$(stat --printf="%s" librocksdb.a)
strip librocksdb.a
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.static_lib_stripped $$(stat --printf="%s" librocksdb.a)
# === normal build, shared ===
$(MAKE) clean
$(MAKE) shared_lib
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.shared_lib $$(stat --printf="%s" `readlink -f librocksdb.so`)
strip `readlink -f librocksdb.so`
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.shared_lib_stripped $$(stat --printf="%s" `readlink -f librocksdb.so`)
# === lite build, static ===
$(MAKE) clean
$(MAKE) LITE=1 static_lib
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.static_lib_lite $$(stat --printf="%s" librocksdb.a)
strip librocksdb.a
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.static_lib_lite_stripped $$(stat --printf="%s" librocksdb.a)
# === lite build, shared ===
$(MAKE) clean
$(MAKE) LITE=1 shared_lib
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.shared_lib_lite $$(stat --printf="%s" `readlink -f librocksdb.so`)
strip `readlink -f librocksdb.so`
$(REPORT_BUILD_STATISTIC) rocksdb.build_size.shared_lib_lite_stripped $$(stat --printf="%s" `readlink -f librocksdb.so`)
# ---------------------------------------------------------------------------
# Platform-specific compilation
@ -2429,7 +2430,7 @@ endif
ifneq ($(SKIP_DEPENDS), 1)
DEPFILES = $(patsubst %.cc, $(OBJ_DIR)/%.cc.d, $(ALL_SOURCES))
DEPFILES+ = $(patsubst %.c, $(OBJ_DIR)/%.c.d, $(LIB_SOURCES_C) $(TEST_MAIN_SOURCES_C))
ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
ifeq ($(USE_FOLLY),1)
DEPFILES +=$(patsubst %.cpp, $(OBJ_DIR)/%.cpp.d, $(FOLLY_SOURCES))
endif
endif
@ -2477,7 +2478,7 @@ list_all_tests:
# Remove the rules for which dependencies should not be generated and see if any are left.
#If so, include the dependencies; if not, do not include the dependency files
ROCKS_DEP_RULES=$(filter-out clean format check-format check-buck-targets check-headers check-sources jclean jtest package analyze tags rocksdbjavastatic% unity.% unity_test, $(MAKECMDGOALS))
ROCKS_DEP_RULES=$(filter-out clean format check-format check-buck-targets check-headers check-sources jclean jtest package analyze tags rocksdbjavastatic% unity.% unity_test checkout_folly, $(MAKECMDGOALS))
ifneq ("$(ROCKS_DEP_RULES)", "")
-include $(DEPFILES)
endif

View File

@ -4,3 +4,4 @@ This is the list of all known third-party plugins for RocksDB. If something is m
* [HDFS](https://github.com/riversand963/rocksdb-hdfs-env): an Env used for interacting with HDFS. Migrated from main RocksDB repo
* [ZenFS](https://github.com/westerndigitalcorporation/zenfs): a file system for zoned block devices
* [RADOS](https://github.com/riversand963/rocksdb-rados-env): an Env used for interacting with RADOS. Migrated from RocksDB main repo.
* [PMEM](https://github.com/pmem/pmem-rocksdb-plugin): a collection of plugins to enable Persistent Memory on RocksDB.

View File

@ -4,7 +4,7 @@ RocksDBLite is a project focused on mobile use cases, which don't need a lot of
Some examples of the features disabled by ROCKSDB_LITE:
* compiled-in support for LDB tool
* No backupable DB
* No backup engine
* No support for replication (which we provide in form of TransactionalIterator)
* No advanced monitoring tools
* No special-purpose memtables that are highly optimized for specific use cases

931
TARGETS

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

10
buckifier/buckify_rocksdb.py Normal file → Executable file
View File

@ -1,3 +1,4 @@
#!/usr/bin/env python3
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from __future__ import absolute_import
from __future__ import division
@ -143,7 +144,8 @@ def generate_targets(repo_path, deps_map):
src_mk["LIB_SOURCES"] +
# always add range_tree, it's only excluded on ppc64, which we don't use internally
src_mk["RANGE_TREE_SOURCES"] +
src_mk["TOOL_LIB_SOURCES"])
src_mk["TOOL_LIB_SOURCES"],
deps=["//folly/container:f14_hash"])
# rocksdb_whole_archive_lib
TARGETS.add_library(
"rocksdb_whole_archive_lib",
@ -151,7 +153,7 @@ def generate_targets(repo_path, deps_map):
# always add range_tree, it's only excluded on ppc64, which we don't use internally
src_mk["RANGE_TREE_SOURCES"] +
src_mk["TOOL_LIB_SOURCES"],
deps=None,
deps=["//folly/container:f14_hash"],
headers=None,
extra_external_deps="",
link_whole=True)
@ -183,6 +185,10 @@ def generate_targets(repo_path, deps_map):
src_mk.get("ANALYZER_LIB_SOURCES", [])
+ src_mk.get('STRESS_LIB_SOURCES', [])
+ ["test_util/testutil.cc"])
# db_stress binary
TARGETS.add_binary("db_stress",
["db_stress_tool/db_stress.cc"],
[":rocksdb_stress_lib"])
# bench binaries
for src in src_mk.get("MICROBENCH_SOURCES", []):
name = src.rsplit('/',1)[1].split('.')[0] if '/' in src else src.split('.')[0]

View File

@ -63,13 +63,8 @@ if [ -z "$ROCKSDB_NO_FBCODE" -a -d /mnt/gvfs/third-party ]; then
if [ "$LIB_MODE" == "shared" ]; then
PIC_BUILD=1
fi
if [ -n "$ROCKSDB_FBCODE_BUILD_WITH_481" ]; then
# we need this to build with MySQL. Don't use for other purposes.
source "$PWD/build_tools/fbcode_config4.8.1.sh"
elif [ -n "$ROCKSDB_FBCODE_BUILD_WITH_5xx" ]; then
source "$PWD/build_tools/fbcode_config.sh"
elif [ -n "$ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007" ]; then
source "$PWD/build_tools/fbcode_config_platform007.sh"
if [ -n "$ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010" ]; then
source "$PWD/build_tools/fbcode_config_platform010.sh"
elif [ -n "$ROCKSDB_FBCODE_BUILD_WITH_PLATFORM009" ]; then
source "$PWD/build_tools/fbcode_config_platform009.sh"
else
@ -474,7 +469,7 @@ EOF
if ! test $ROCKSDB_DISABLE_MEMKIND; then
# Test whether memkind library is installed
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -lmemkind -x c++ - -o test.o 2>/dev/null <<EOF
$CXX $PLATFORM_CXXFLAGS $LDFLAGS -x c++ - -o test.o -lmemkind 2>/dev/null <<EOF
#include <memkind.h>
int main() {
memkind_malloc(MEMKIND_DAX_KMEM, 1024);
@ -598,7 +593,7 @@ EOF
fi
if ! test $ROCKSDB_DISABLE_BENCHMARK; then
# Test whether google benchmark is available
$CXX $PLATFORM_CXXFLAGS -x c++ - -o /dev/null -lbenchmark 2>/dev/null <<EOF
$CXX $PLATFORM_CXXFLAGS -x c++ - -o /dev/null -lbenchmark -lpthread 2>/dev/null <<EOF
#include <benchmark/benchmark.h>
int main() {}
EOF
@ -885,8 +880,8 @@ if test -n "$WITH_JEMALLOC_FLAG"; then
echo "WITH_JEMALLOC_FLAG=$WITH_JEMALLOC_FLAG" >> "$OUTPUT"
fi
echo "LUA_PATH=$LUA_PATH" >> "$OUTPUT"
if test -n "$USE_FOLLY_DISTRIBUTED_MUTEX"; then
echo "USE_FOLLY_DISTRIBUTED_MUTEX=$USE_FOLLY_DISTRIBUTED_MUTEX" >> "$OUTPUT"
if test -n "$USE_FOLLY"; then
echo "USE_FOLLY=$USE_FOLLY" >> "$OUTPUT"
fi
if test -n "$PPC_LIBC_IS_GNU"; then
echo "PPC_LIBC_IS_GNU=$PPC_LIBC_IS_GNU" >> "$OUTPUT"

View File

@ -1,19 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
GCC_BASE=/mnt/gvfs/third-party2/gcc/7331085db891a2ef4a88a48a751d834e8d68f4cb/5.x/centos7-native/c447969
CLANG_BASE=/mnt/gvfs/third-party2/llvm-fb/1bd23f9917738974ad0ff305aa23eb5f93f18305/9.0.0/centos7-native/c9f9104
LIBGCC_BASE=/mnt/gvfs/third-party2/libgcc/6ace84e956873d53638c738b6f65f3f469cca74c/5.x/gcc-5-glibc-2.23/339d858
GLIBC_BASE=/mnt/gvfs/third-party2/glibc/192b0f42d63dcf6210d6ceae387b49af049e6e0c/2.23/gcc-5-glibc-2.23/ca1d1c0
SNAPPY_BASE=/mnt/gvfs/third-party2/snappy/7f9bdaada18f59bc27ec2b0871eb8a6144343aef/1.1.3/gcc-5-glibc-2.23/9bc6787
ZLIB_BASE=/mnt/gvfs/third-party2/zlib/2d9f0b9a4274cc21f61272a9e89bdb859bce8f1f/1.2.8/gcc-5-glibc-2.23/9bc6787
BZIP2_BASE=/mnt/gvfs/third-party2/bzip2/dc49a21c5fceec6456a7a28a94dcd16690af1337/1.0.6/gcc-5-glibc-2.23/9bc6787
LZ4_BASE=/mnt/gvfs/third-party2/lz4/0f607f8fc442ea7d6b876931b1898bb573d5e5da/1.9.1/gcc-5-glibc-2.23/9bc6787
ZSTD_BASE=/mnt/gvfs/third-party2/zstd/ca22bc441a4eb709e9e0b1f9fec9750fed7b31c5/1.4.x/gcc-5-glibc-2.23/03859b5
GFLAGS_BASE=/mnt/gvfs/third-party2/gflags/0b9929d2588991c65a57168bf88aff2db87c5d48/2.2.0/gcc-5-glibc-2.23/9bc6787
JEMALLOC_BASE=/mnt/gvfs/third-party2/jemalloc/c26f08f47ac35fc31da2633b7da92d6b863246eb/master/gcc-5-glibc-2.23/0c8f76d
NUMA_BASE=/mnt/gvfs/third-party2/numa/3f3fb57a5ccc5fd21c66416c0b83e0aa76a05376/2.0.11/gcc-5-glibc-2.23/9bc6787
LIBUNWIND_BASE=/mnt/gvfs/third-party2/libunwind/40c73d874898b386a71847f1b99115d93822d11f/1.4/gcc-5-glibc-2.23/b443de1
TBB_BASE=/mnt/gvfs/third-party2/tbb/4ce8e8dba77cdbd81b75d6f0c32fd7a1b76a11ec/2018_U5/gcc-5-glibc-2.23/9bc6787
KERNEL_HEADERS_BASE=/mnt/gvfs/third-party2/kernel-headers/fb251ecd2f5ae16f8671f7014c246e52a748fe0b/4.0.9-36_fbk5_2933_gd092e3f/gcc-5-glibc-2.23/da39a3e
BINUTILS_BASE=/mnt/gvfs/third-party2/binutils/2e3cb7d119b3cea5f1e738cc13a1ac69f49eb875/2.29.1/centos7-native/da39a3e
VALGRIND_BASE=/mnt/gvfs/third-party2/valgrind/d42d152a15636529b0861ec493927200ebebca8e/3.15.0/gcc-5-glibc-2.23/9bc6787
LUA_BASE=/mnt/gvfs/third-party2/lua/f0cd714433206d5139df61659eb7b28b1dea6683/5.2.3/gcc-5-glibc-2.23/65372bd

View File

@ -1,20 +0,0 @@
# shellcheck disable=SC2148
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
GCC_BASE=/mnt/gvfs/third-party2/gcc/cf7d14c625ce30bae1a4661c2319c5a283e4dd22/4.8.1/centos6-native/cc6c9dc
CLANG_BASE=/mnt/gvfs/third-party2/llvm-fb/8598c375b0e94e1448182eb3df034704144a838d/stable/centos6-native/3f16ddd
LIBGCC_BASE=/mnt/gvfs/third-party2/libgcc/d6e0a7da6faba45f5e5b1638f9edd7afc2f34e7d/4.8.1/gcc-4.8.1-glibc-2.17/8aac7fc
GLIBC_BASE=/mnt/gvfs/third-party2/glibc/d282e6e8f3d20f4e40a516834847bdc038e07973/2.17/gcc-4.8.1-glibc-2.17/99df8fc
SNAPPY_BASE=/mnt/gvfs/third-party2/snappy/8c38a4c1e52b4c2cc8a9cdc31b9c947ed7dbfcb4/1.1.3/gcc-4.8.1-glibc-2.17/c3f970a
ZLIB_BASE=/mnt/gvfs/third-party2/zlib/0882df3713c7a84f15abe368dc004581f20b39d7/1.2.8/gcc-4.8.1-glibc-2.17/c3f970a
BZIP2_BASE=/mnt/gvfs/third-party2/bzip2/740325875f6729f42d28deaa2147b0854f3a347e/1.0.6/gcc-4.8.1-glibc-2.17/c3f970a
LZ4_BASE=/mnt/gvfs/third-party2/lz4/0e790b441e2d9acd68d51e1d2e028f88c6a79ddf/r131/gcc-4.8.1-glibc-2.17/c3f970a
ZSTD_BASE=/mnt/gvfs/third-party2/zstd/9455f75ff7f4831dc9fda02a6a0f8c68922fad8f/1.0.0/gcc-4.8.1-glibc-2.17/c3f970a
GFLAGS_BASE=/mnt/gvfs/third-party2/gflags/f001a51b2854957676d07306ef3abf67186b5c8b/2.1.1/gcc-4.8.1-glibc-2.17/c3f970a
JEMALLOC_BASE=/mnt/gvfs/third-party2/jemalloc/fc8a13ca1fffa4d0765c716c5a0b49f0c107518f/master/gcc-4.8.1-glibc-2.17/8d31e51
NUMA_BASE=/mnt/gvfs/third-party2/numa/17c514c4d102a25ca15f4558be564eeed76f4b6a/2.0.8/gcc-4.8.1-glibc-2.17/c3f970a
LIBUNWIND_BASE=/mnt/gvfs/third-party2/libunwind/ad576de2a1ea560c4d3434304f0fc4e079bede42/trunk/gcc-4.8.1-glibc-2.17/675d945
TBB_BASE=/mnt/gvfs/third-party2/tbb/9d9a554877d0c5bef330fe818ab7178806dd316a/4.0_update2/gcc-4.8.1-glibc-2.17/c3f970a
KERNEL_HEADERS_BASE=/mnt/gvfs/third-party2/kernel-headers/7c111ff27e0c466235163f00f280a9d617c3d2ec/4.0.9-36_fbk5_2933_gd092e3f/gcc-4.8.1-glibc-2.17/da39a3e
BINUTILS_BASE=/mnt/gvfs/third-party2/binutils/b7fd454c4b10c6a81015d4524ed06cdeab558490/2.26/centos6-native/da39a3e
VALGRIND_BASE=/mnt/gvfs/third-party2/valgrind/d7f4d4d86674a57668e3a96f76f0e17dd0eb8765/3.8.1/gcc-4.8.1-glibc-2.17/c3f970a
LUA_BASE=/mnt/gvfs/third-party2/lua/61e4abf5813bbc39bc4f548757ccfcadde175a48/5.2.3/centos6-native/730f94e

View File

@ -1,20 +0,0 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
GCC_BASE=/mnt/gvfs/third-party2/gcc/7331085db891a2ef4a88a48a751d834e8d68f4cb/7.x/centos7-native/b2ef2b6
CLANG_BASE=/mnt/gvfs/third-party2/llvm-fb/963d9aeda70cc4779885b1277484fe7544a04e3e/9.0.0/platform007/9e92d53/
LIBGCC_BASE=/mnt/gvfs/third-party2/libgcc/6ace84e956873d53638c738b6f65f3f469cca74c/7.x/platform007/5620abc
GLIBC_BASE=/mnt/gvfs/third-party2/glibc/192b0f42d63dcf6210d6ceae387b49af049e6e0c/2.26/platform007/f259413
SNAPPY_BASE=/mnt/gvfs/third-party2/snappy/7f9bdaada18f59bc27ec2b0871eb8a6144343aef/1.1.3/platform007/ca4da3d
ZLIB_BASE=/mnt/gvfs/third-party2/zlib/2d9f0b9a4274cc21f61272a9e89bdb859bce8f1f/1.2.8/platform007/ca4da3d
BZIP2_BASE=/mnt/gvfs/third-party2/bzip2/dc49a21c5fceec6456a7a28a94dcd16690af1337/1.0.6/platform007/ca4da3d
LZ4_BASE=/mnt/gvfs/third-party2/lz4/0f607f8fc442ea7d6b876931b1898bb573d5e5da/1.9.1/platform007/ca4da3d
ZSTD_BASE=/mnt/gvfs/third-party2/zstd/ca22bc441a4eb709e9e0b1f9fec9750fed7b31c5/1.4.x/platform007/15a3614
GFLAGS_BASE=/mnt/gvfs/third-party2/gflags/0b9929d2588991c65a57168bf88aff2db87c5d48/2.2.0/platform007/ca4da3d
JEMALLOC_BASE=/mnt/gvfs/third-party2/jemalloc/c26f08f47ac35fc31da2633b7da92d6b863246eb/master/platform007/c26c002
NUMA_BASE=/mnt/gvfs/third-party2/numa/3f3fb57a5ccc5fd21c66416c0b83e0aa76a05376/2.0.11/platform007/ca4da3d
LIBUNWIND_BASE=/mnt/gvfs/third-party2/libunwind/40c73d874898b386a71847f1b99115d93822d11f/1.4/platform007/6f3e0a9
TBB_BASE=/mnt/gvfs/third-party2/tbb/4ce8e8dba77cdbd81b75d6f0c32fd7a1b76a11ec/2018_U5/platform007/ca4da3d
LIBURING_BASE=/mnt/gvfs/third-party2/liburing/79427253fd0d42677255aacfe6d13bfe63f752eb/20190828/platform007/ca4da3d
KERNEL_HEADERS_BASE=/mnt/gvfs/third-party2/kernel-headers/fb251ecd2f5ae16f8671f7014c246e52a748fe0b/fb/platform007/da39a3e
BINUTILS_BASE=/mnt/gvfs/third-party2/binutils/ab9f09bba370e7066cafd4eb59752db93f2e8312/2.29.1/platform007/15a3614
VALGRIND_BASE=/mnt/gvfs/third-party2/valgrind/d42d152a15636529b0861ec493927200ebebca8e/3.15.0/platform007/ca4da3d
LUA_BASE=/mnt/gvfs/third-party2/lua/f0cd714433206d5139df61659eb7b28b1dea6683/5.3.4/platform007/5007832

View File

@ -18,4 +18,5 @@ KERNEL_HEADERS_BASE=/mnt/gvfs/third-party2/kernel-headers/32b8a2407b634df3f8f948
BINUTILS_BASE=/mnt/gvfs/third-party2/binutils/08634589372fa5f237bfd374e8c644a8364e78c1/2.32/platform009/ba86d1f/
VALGRIND_BASE=/mnt/gvfs/third-party2/valgrind/6ae525939ad02e5e676855082fbbc7828dbafeac/3.15.0/platform009/7f3b187
LUA_BASE=/mnt/gvfs/third-party2/lua/162efd9561a3d21f6869f4814011e9cf1b3ff4dc/5.3.4/platform009/a6271c4
BENCHMARK_BASE=/mnt/gvfs/third-party2/benchmark/bce8d9564eaf161700aa3a20b1051564acf555fb/1.5.5/platform009/7f3b187
BENCHMARK_BASE=/mnt/gvfs/third-party2/benchmark/30bf49ad6414325e17f3425b0edcb64239427ae3/1.6.1/platform009/7f3b187
BOOST_BASE=/mnt/gvfs/third-party2/boost/201b7d74941e54b436dfa364a063aa6d2cd7de4c/1.69.0/platform009/8a7ffdf

View File

@ -0,0 +1,22 @@
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
# The file is generated using update_dependencies.sh.
GCC_BASE=/mnt/gvfs/third-party2/gcc/e40bde78650fa91b8405a857e3f10bf336633fb0/11.x/centos7-native/886b5eb
CLANG_BASE=/mnt/gvfs/third-party2/llvm-fb/2043340983c032915adbb6f78903dc855b65aee8/12/platform010/9520e0f
LIBGCC_BASE=/mnt/gvfs/third-party2/libgcc/c00dcc6a3e4125c7e8b248e9a79c14b78ac9e0ca/11.x/platform010/5684a5a
GLIBC_BASE=/mnt/gvfs/third-party2/glibc/0b9c8e4b060eda62f3bc1c6127bbe1256697569b/2.34/platform010/f259413
SNAPPY_BASE=/mnt/gvfs/third-party2/snappy/bc9647f7912b131315827d65cb6189c21f381d05/1.1.3/platform010/76ebdda
ZLIB_BASE=/mnt/gvfs/third-party2/zlib/a6f5f3f1d063d2d00cd02fc12f0f05fc3ab3a994/1.2.11/platform010/76ebdda
BZIP2_BASE=/mnt/gvfs/third-party2/bzip2/09703139cfc376bd8a82642385a0e97726b28287/1.0.6/platform010/76ebdda
LZ4_BASE=/mnt/gvfs/third-party2/lz4/60220d6a5bf7722b9cc239a1368c596619b12060/1.9.1/platform010/76ebdda
ZSTD_BASE=/mnt/gvfs/third-party2/zstd/50eace8143eaaea9473deae1f3283e0049e05633/1.4.x/platform010/64091f4
GFLAGS_BASE=/mnt/gvfs/third-party2/gflags/5d27e5919771603da06000a027b12f799e58a4f7/2.2.0/platform010/76ebdda
JEMALLOC_BASE=/mnt/gvfs/third-party2/jemalloc/b62912d333ef33f9760efa6219dbe3fe6abb3b0e/master/platform010/f57cc4a
NUMA_BASE=/mnt/gvfs/third-party2/numa/6b412770957aa3c8a87e5e0dcd8cc2f45f393bc0/2.0.11/platform010/76ebdda
LIBUNWIND_BASE=/mnt/gvfs/third-party2/libunwind/52f69816e936e147664ad717eb71a1a0e9dc973a/1.4/platform010/5074a48
TBB_BASE=/mnt/gvfs/third-party2/tbb/c9cc192099fa84c0dcd0ffeedd44a373ad6e4925/2018_U5/platform010/76ebdda
LIBURING_BASE=/mnt/gvfs/third-party2/liburing/a98e2d137007e3ebf7f33bd6f99c2c56bdaf8488/20210212/platform010/76ebdda
BENCHMARK_BASE=/mnt/gvfs/third-party2/benchmark/780c7a0f9cf0967961e69ad08e61cddd85d61821/trunk/platform010/76ebdda
KERNEL_HEADERS_BASE=/mnt/gvfs/third-party2/kernel-headers/02d9f76aaaba580611cf75e741753c800c7fdc12/fb/platform010/da39a3e
BINUTILS_BASE=/mnt/gvfs/third-party2/binutils/938dc3f064ef3a48c0446f5b11d788d50b3eb5ee/2.37/centos7-native/da39a3e
VALGRIND_BASE=/mnt/gvfs/third-party2/valgrind/429a6b3203eb415f1599bd15183659153129188e/3.15.0/platform010/76ebdda
LUA_BASE=/mnt/gvfs/third-party2/lua/363787fa5cac2a8aa20638909210443278fa138e/5.3.4/platform010/9079c97

View File

@ -21,38 +21,48 @@ LIBGCC_LIBS=" -L $LIBGCC_BASE/lib"
GLIBC_INCLUDE="$GLIBC_BASE/include"
GLIBC_LIBS=" -L $GLIBC_BASE/lib"
# snappy
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include/"
if test -z $PIC_BUILD; then
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy.a"
else
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy_pic.a"
fi
CFLAGS+=" -DSNAPPY"
if test -z $PIC_BUILD; then
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include/"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz.a"
CFLAGS+=" -DZLIB"
# location of bzip headers and libraries
BZIP_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP_LIBS=" $BZIP2_BASE/lib/libbz2.a"
CFLAGS+=" -DBZIP2"
LZ4_INCLUDE=" -I $LZ4_BASE/include/"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4.a"
CFLAGS+=" -DLZ4"
if ! test $ROCKSDB_DISABLE_SNAPPY; then
# snappy
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include/"
if test -z $PIC_BUILD; then
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy.a"
else
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy_pic.a"
fi
CFLAGS+=" -DSNAPPY"
fi
ZSTD_INCLUDE=" -I $ZSTD_BASE/include/"
if test -z $PIC_BUILD; then
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd.a"
else
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd_pic.a"
if ! test $ROCKSDB_DISABLE_ZLIB; then
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include/"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz.a"
CFLAGS+=" -DZLIB"
fi
if ! test $ROCKSDB_DISABLE_BZIP; then
# location of bzip headers and libraries
BZIP_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP_LIBS=" $BZIP2_BASE/lib/libbz2.a"
CFLAGS+=" -DBZIP2"
fi
if ! test $ROCKSDB_DISABLE_LZ4; then
LZ4_INCLUDE=" -I $LZ4_BASE/include/"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4.a"
CFLAGS+=" -DLZ4"
fi
fi
if ! test $ROCKSDB_DISABLE_ZSTD; then
ZSTD_INCLUDE=" -I $ZSTD_BASE/include/"
if test -z $PIC_BUILD; then
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd.a"
else
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd_pic.a"
fi
CFLAGS+=" -DZSTD -DZSTD_STATIC_LINKING_ONLY"
fi
CFLAGS+=" -DZSTD -DZSTD_STATIC_LINKING_ONLY"
# location of gflags headers and libraries
GFLAGS_INCLUDE=" -I $GFLAGS_BASE/include/"
@ -162,6 +172,4 @@ else
LUA_LIB=" $LUA_PATH/lib/liblua_pic.a"
fi
USE_FOLLY_DISTRIBUTED_MUTEX=1
export CC CXX AR CFLAGS CXXFLAGS EXEC_LDFLAGS EXEC_LDFLAGS_SHARED VALGRIND_VER JEMALLOC_LIB JEMALLOC_INCLUDE CLANG_ANALYZER CLANG_SCAN_BUILD LUA_PATH LUA_LIB

View File

@ -1,120 +0,0 @@
#!/bin/sh
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
#
# Set environment variables so that we can compile rocksdb using
# fbcode settings. It uses the latest g++ compiler and also
# uses jemalloc
BASEDIR=`dirname $BASH_SOURCE`
source "$BASEDIR/dependencies_4.8.1.sh"
# location of libgcc
LIBGCC_INCLUDE="$LIBGCC_BASE/include"
LIBGCC_LIBS=" -L $LIBGCC_BASE/lib"
# location of glibc
GLIBC_INCLUDE="$GLIBC_BASE/include"
GLIBC_LIBS=" -L $GLIBC_BASE/lib"
# location of snappy headers and libraries
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include"
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy.a"
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz.a"
# location of bzip headers and libraries
BZIP2_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP2_LIBS=" $BZIP2_BASE/lib/libbz2.a"
LZ4_INCLUDE=" -I $LZ4_BASE/include"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4.a"
ZSTD_INCLUDE=" -I $ZSTD_BASE/include"
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd.a"
# location of gflags headers and libraries
GFLAGS_INCLUDE=" -I $GFLAGS_BASE/include/"
GFLAGS_LIBS=" $GFLAGS_BASE/lib/libgflags.a"
# location of jemalloc
JEMALLOC_INCLUDE=" -I $JEMALLOC_BASE/include"
JEMALLOC_LIB="$JEMALLOC_BASE/lib/libjemalloc.a"
# location of numa
NUMA_INCLUDE=" -I $NUMA_BASE/include/"
NUMA_LIB=" $NUMA_BASE/lib/libnuma.a"
# location of libunwind
LIBUNWIND="$LIBUNWIND_BASE/lib/libunwind.a"
# location of tbb
TBB_INCLUDE=" -isystem $TBB_BASE/include/"
TBB_LIBS="$TBB_BASE/lib/libtbb.a"
test "$USE_SSE" || USE_SSE=1
export USE_SSE
test "$PORTABLE" || PORTABLE=1
export PORTABLE
BINUTILS="$BINUTILS_BASE/bin"
AR="$BINUTILS/ar"
DEPS_INCLUDE="$SNAPPY_INCLUDE $ZLIB_INCLUDE $BZIP2_INCLUDE $LZ4_INCLUDE $ZSTD_INCLUDE $GFLAGS_INCLUDE $NUMA_INCLUDE $TBB_INCLUDE"
STDLIBS="-L $GCC_BASE/lib64"
if [ -z "$USE_CLANG" ]; then
# gcc
CC="$GCC_BASE/bin/gcc"
CXX="$GCC_BASE/bin/g++"
CXX="$GCC_BASE/bin/gcc-ar"
CFLAGS="-B$BINUTILS/gold -m64 -mtune=generic"
CFLAGS+=" -isystem $GLIBC_INCLUDE"
CFLAGS+=" -isystem $LIBGCC_INCLUDE"
JEMALLOC=1
else
# clang
CLANG_BIN="$CLANG_BASE/bin"
CLANG_LIB="$CLANG_BASE/lib"
CLANG_INCLUDE="$CLANG_LIB/clang/*/include"
CC="$CLANG_BIN/clang"
CXX="$CLANG_BIN/clang++"
AR="$CLANG_BIN/llvm-ar"
KERNEL_HEADERS_INCLUDE="$KERNEL_HEADERS_BASE/include/"
CFLAGS="-B$BINUTILS/gold -nostdinc -nostdlib"
CFLAGS+=" -isystem $LIBGCC_BASE/include/c++/4.8.1 "
CFLAGS+=" -isystem $LIBGCC_BASE/include/c++/4.8.1/x86_64-facebook-linux "
CFLAGS+=" -isystem $GLIBC_INCLUDE"
CFLAGS+=" -isystem $LIBGCC_INCLUDE"
CFLAGS+=" -isystem $CLANG_INCLUDE"
CFLAGS+=" -isystem $KERNEL_HEADERS_INCLUDE/linux "
CFLAGS+=" -isystem $KERNEL_HEADERS_INCLUDE "
CXXFLAGS="-nostdinc++"
fi
CFLAGS+=" $DEPS_INCLUDE"
CFLAGS+=" -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DROCKSDB_FALLOCATE_PRESENT -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_SUPPORT_THREAD_LOCAL -DHAVE_SSE42"
CFLAGS+=" -DSNAPPY -DGFLAGS=google -DZLIB -DBZIP2 -DLZ4 -DZSTD -DNUMA -DTBB"
CXXFLAGS+=" $CFLAGS"
EXEC_LDFLAGS=" $SNAPPY_LIBS $ZLIB_LIBS $BZIP2_LIBS $LZ4_LIBS $ZSTD_LIBS $GFLAGS_LIBS $NUMA_LIB $TBB_LIBS"
EXEC_LDFLAGS+=" -Wl,--dynamic-linker,/usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/ld.so"
EXEC_LDFLAGS+=" $LIBUNWIND"
EXEC_LDFLAGS+=" -Wl,-rpath=/usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib"
# required by libtbb
EXEC_LDFLAGS+=" -ldl"
PLATFORM_LDFLAGS="$LIBGCC_LIBS $GLIBC_LIBS $STDLIBS -lgcc -lstdc++"
EXEC_LDFLAGS_SHARED="$SNAPPY_LIBS $ZLIB_LIBS $BZIP2_LIBS $LZ4_LIBS $ZSTD_LIBS $GFLAGS_LIBS"
VALGRIND_VER="$VALGRIND_BASE/bin/"
LUA_PATH="$LUA_BASE"
export CC CXX AR CFLAGS CXXFLAGS EXEC_LDFLAGS EXEC_LDFLAGS_SHARED VALGRIND_VER JEMALLOC_LIB JEMALLOC_INCLUDE LUA_PATH

View File

@ -1,170 +0,0 @@
#!/bin/sh
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
#
# Set environment variables so that we can compile rocksdb using
# fbcode settings. It uses the latest g++ and clang compilers and also
# uses jemalloc
# Environment variables that change the behavior of this script:
# PIC_BUILD -- if true, it will only take pic versions of libraries from fbcode. libraries that don't have pic variant will not be included
BASEDIR=`dirname $BASH_SOURCE`
source "$BASEDIR/dependencies_platform007.sh"
CFLAGS=""
# libgcc
LIBGCC_INCLUDE="$LIBGCC_BASE/include/c++/7.3.0"
LIBGCC_LIBS=" -L $LIBGCC_BASE/lib"
# glibc
GLIBC_INCLUDE="$GLIBC_BASE/include"
GLIBC_LIBS=" -L $GLIBC_BASE/lib"
# snappy
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include/"
if test -z $PIC_BUILD; then
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy.a"
else
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy_pic.a"
fi
CFLAGS+=" -DSNAPPY"
if test -z $PIC_BUILD; then
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include/"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz.a"
CFLAGS+=" -DZLIB"
# location of bzip headers and libraries
BZIP_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP_LIBS=" $BZIP2_BASE/lib/libbz2.a"
CFLAGS+=" -DBZIP2"
LZ4_INCLUDE=" -I $LZ4_BASE/include/"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4.a"
CFLAGS+=" -DLZ4"
fi
ZSTD_INCLUDE=" -I $ZSTD_BASE/include/"
if test -z $PIC_BUILD; then
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd.a"
else
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd_pic.a"
fi
CFLAGS+=" -DZSTD"
# location of gflags headers and libraries
GFLAGS_INCLUDE=" -I $GFLAGS_BASE/include/"
if test -z $PIC_BUILD; then
GFLAGS_LIBS=" $GFLAGS_BASE/lib/libgflags.a"
else
GFLAGS_LIBS=" $GFLAGS_BASE/lib/libgflags_pic.a"
fi
CFLAGS+=" -DGFLAGS=gflags"
# location of jemalloc
JEMALLOC_INCLUDE=" -I $JEMALLOC_BASE/include/"
JEMALLOC_LIB=" $JEMALLOC_BASE/lib/libjemalloc.a"
if test -z $PIC_BUILD; then
# location of numa
NUMA_INCLUDE=" -I $NUMA_BASE/include/"
NUMA_LIB=" $NUMA_BASE/lib/libnuma.a"
CFLAGS+=" -DNUMA"
# location of libunwind
LIBUNWIND="$LIBUNWIND_BASE/lib/libunwind.a"
fi
# location of TBB
TBB_INCLUDE=" -isystem $TBB_BASE/include/"
if test -z $PIC_BUILD; then
TBB_LIBS="$TBB_BASE/lib/libtbb.a"
else
TBB_LIBS="$TBB_BASE/lib/libtbb_pic.a"
fi
CFLAGS+=" -DTBB"
# location of LIBURING
LIBURING_INCLUDE=" -isystem $LIBURING_BASE/include/"
if test -z $PIC_BUILD; then
LIBURING_LIBS="$LIBURING_BASE/lib/liburing.a"
else
LIBURING_LIBS="$LIBURING_BASE/lib/liburing_pic.a"
fi
CFLAGS+=" -DLIBURING"
test "$USE_SSE" || USE_SSE=1
export USE_SSE
test "$PORTABLE" || PORTABLE=1
export PORTABLE
BINUTILS="$BINUTILS_BASE/bin"
AR="$BINUTILS/ar"
DEPS_INCLUDE="$SNAPPY_INCLUDE $ZLIB_INCLUDE $BZIP_INCLUDE $LZ4_INCLUDE $ZSTD_INCLUDE $GFLAGS_INCLUDE $NUMA_INCLUDE $TBB_INCLUDE $LIBURING_INCLUDE"
STDLIBS="-L $GCC_BASE/lib64"
CLANG_BIN="$CLANG_BASE/bin"
CLANG_LIB="$CLANG_BASE/lib"
CLANG_SRC="$CLANG_BASE/../../src"
CLANG_ANALYZER="$CLANG_BIN/clang++"
CLANG_SCAN_BUILD="$CLANG_SRC/llvm/tools/clang/tools/scan-build/bin/scan-build"
if [ -z "$USE_CLANG" ]; then
# gcc
CC="$GCC_BASE/bin/gcc"
CXX="$GCC_BASE/bin/g++"
AR="$GCC_BASE/bin/gcc-ar"
CFLAGS+=" -B$BINUTILS/gold"
CFLAGS+=" -isystem $LIBGCC_INCLUDE"
CFLAGS+=" -isystem $GLIBC_INCLUDE"
JEMALLOC=1
else
# clang
CLANG_INCLUDE="$CLANG_LIB/clang/stable/include"
CC="$CLANG_BIN/clang"
CXX="$CLANG_BIN/clang++"
AR="$CLANG_BIN/llvm-ar"
KERNEL_HEADERS_INCLUDE="$KERNEL_HEADERS_BASE/include"
CFLAGS+=" -B$BINUTILS/gold -nostdinc -nostdlib"
CFLAGS+=" -isystem $LIBGCC_BASE/include/c++/7.x "
CFLAGS+=" -isystem $LIBGCC_BASE/include/c++/7.x/x86_64-facebook-linux "
CFLAGS+=" -isystem $GLIBC_INCLUDE"
CFLAGS+=" -isystem $LIBGCC_INCLUDE"
CFLAGS+=" -isystem $CLANG_INCLUDE"
CFLAGS+=" -isystem $KERNEL_HEADERS_INCLUDE/linux "
CFLAGS+=" -isystem $KERNEL_HEADERS_INCLUDE "
CFLAGS+=" -Wno-expansion-to-defined "
CXXFLAGS="-nostdinc++"
fi
CFLAGS+=" $DEPS_INCLUDE"
CFLAGS+=" -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DROCKSDB_FALLOCATE_PRESENT -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_SUPPORT_THREAD_LOCAL -DHAVE_SSE42 -DROCKSDB_IOURING_PRESENT"
CXXFLAGS+=" $CFLAGS"
EXEC_LDFLAGS=" $SNAPPY_LIBS $ZLIB_LIBS $BZIP_LIBS $LZ4_LIBS $ZSTD_LIBS $GFLAGS_LIBS $NUMA_LIB $TBB_LIBS $LIBURING_LIBS"
EXEC_LDFLAGS+=" -B$BINUTILS/gold"
EXEC_LDFLAGS+=" -Wl,--dynamic-linker,/usr/local/fbcode/platform007/lib/ld.so"
EXEC_LDFLAGS+=" $LIBUNWIND"
EXEC_LDFLAGS+=" -Wl,-rpath=/usr/local/fbcode/platform007/lib"
# required by libtbb
EXEC_LDFLAGS+=" -ldl"
PLATFORM_LDFLAGS="$LIBGCC_LIBS $GLIBC_LIBS $STDLIBS -lgcc -lstdc++"
EXEC_LDFLAGS_SHARED="$SNAPPY_LIBS $ZLIB_LIBS $BZIP_LIBS $LZ4_LIBS $ZSTD_LIBS $GFLAGS_LIBS $TBB_LIBS $LIBURING_LIBS"
VALGRIND_VER="$VALGRIND_BASE/bin/"
# lua not supported because it's on track for deprecation, I think
LUA_PATH=
LUA_LIB=
export CC CXX AR CFLAGS CXXFLAGS EXEC_LDFLAGS EXEC_LDFLAGS_SHARED VALGRIND_VER JEMALLOC_LIB JEMALLOC_INCLUDE CLANG_ANALYZER CLANG_SCAN_BUILD LUA_PATH LUA_LIB

View File

@ -27,28 +27,38 @@ else
MAYBE_PIC=_pic
fi
# snappy
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include/"
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy${MAYBE_PIC}.a"
CFLAGS+=" -DSNAPPY"
if ! test $ROCKSDB_DISABLE_SNAPPY; then
# snappy
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include/"
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy${MAYBE_PIC}.a"
CFLAGS+=" -DSNAPPY"
fi
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include/"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz${MAYBE_PIC}.a"
CFLAGS+=" -DZLIB"
if ! test $ROCKSDB_DISABLE_ZLIB; then
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include/"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz${MAYBE_PIC}.a"
CFLAGS+=" -DZLIB"
fi
# location of bzip headers and libraries
BZIP_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP_LIBS=" $BZIP2_BASE/lib/libbz2${MAYBE_PIC}.a"
CFLAGS+=" -DBZIP2"
if ! test $ROCKSDB_DISABLE_BZIP; then
# location of bzip headers and libraries
BZIP_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP_LIBS=" $BZIP2_BASE/lib/libbz2${MAYBE_PIC}.a"
CFLAGS+=" -DBZIP2"
fi
LZ4_INCLUDE=" -I $LZ4_BASE/include/"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4${MAYBE_PIC}.a"
CFLAGS+=" -DLZ4"
if ! test $ROCKSDB_DISABLE_LZ4; then
LZ4_INCLUDE=" -I $LZ4_BASE/include/"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4${MAYBE_PIC}.a"
CFLAGS+=" -DLZ4"
fi
ZSTD_INCLUDE=" -I $ZSTD_BASE/include/"
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd${MAYBE_PIC}.a"
CFLAGS+=" -DZSTD"
if ! test $ROCKSDB_DISABLE_ZSTD; then
ZSTD_INCLUDE=" -I $ZSTD_BASE/include/"
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd${MAYBE_PIC}.a"
CFLAGS+=" -DZSTD"
fi
# location of gflags headers and libraries
GFLAGS_INCLUDE=" -I $GFLAGS_BASE/include/"
@ -58,6 +68,8 @@ CFLAGS+=" -DGFLAGS=gflags"
BENCHMARK_INCLUDE=" -I $BENCHMARK_BASE/include/"
BENCHMARK_LIBS=" $BENCHMARK_BASE/lib/libbenchmark${MAYBE_PIC}.a"
BOOST_INCLUDE=" -I $BOOST_BASE/include/"
# location of jemalloc
JEMALLOC_INCLUDE=" -I $JEMALLOC_BASE/include/"
JEMALLOC_LIB=" $JEMALLOC_BASE/lib/libjemalloc${MAYBE_PIC}.a"
@ -89,7 +101,7 @@ BINUTILS="$BINUTILS_BASE/bin"
AR="$BINUTILS/ar"
AS="$BINUTILS/as"
DEPS_INCLUDE="$SNAPPY_INCLUDE $ZLIB_INCLUDE $BZIP_INCLUDE $LZ4_INCLUDE $ZSTD_INCLUDE $GFLAGS_INCLUDE $NUMA_INCLUDE $TBB_INCLUDE $LIBURING_INCLUDE $BENCHMARK_INCLUDE"
DEPS_INCLUDE="$SNAPPY_INCLUDE $ZLIB_INCLUDE $BZIP_INCLUDE $LZ4_INCLUDE $ZSTD_INCLUDE $GFLAGS_INCLUDE $NUMA_INCLUDE $TBB_INCLUDE $LIBURING_INCLUDE $BENCHMARK_INCLUDE $BOOST_INCLUDE"
STDLIBS="-L $GCC_BASE/lib64"

View File

@ -0,0 +1,175 @@
#!/bin/sh
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
#
# Set environment variables so that we can compile rocksdb using
# fbcode settings. It uses the latest g++ and clang compilers and also
# uses jemalloc
# Environment variables that change the behavior of this script:
# PIC_BUILD -- if true, it will only take pic versions of libraries from fbcode. libraries that don't have pic variant will not be included
BASEDIR=`dirname $BASH_SOURCE`
source "$BASEDIR/dependencies_platform010.sh"
# Disallow using libraries from default locations as they might not be compatible with platform010 libraries.
CFLAGS=" --sysroot=/DOES/NOT/EXIST"
# libgcc
LIBGCC_INCLUDE="$LIBGCC_BASE/include/c++/trunk"
LIBGCC_LIBS=" -L $LIBGCC_BASE/lib -B$LIBGCC_BASE/lib/gcc/x86_64-facebook-linux/trunk/"
# glibc
GLIBC_INCLUDE="$GLIBC_BASE/include"
GLIBC_LIBS=" -L $GLIBC_BASE/lib"
GLIBC_LIBS+=" -B$GLIBC_BASE/lib"
if test -z $PIC_BUILD; then
MAYBE_PIC=
else
MAYBE_PIC=_pic
fi
if ! test $ROCKSDB_DISABLE_SNAPPY; then
# snappy
SNAPPY_INCLUDE=" -I $SNAPPY_BASE/include/"
SNAPPY_LIBS=" $SNAPPY_BASE/lib/libsnappy${MAYBE_PIC}.a"
CFLAGS+=" -DSNAPPY"
fi
if ! test $ROCKSDB_DISABLE_ZLIB; then
# location of zlib headers and libraries
ZLIB_INCLUDE=" -I $ZLIB_BASE/include/"
ZLIB_LIBS=" $ZLIB_BASE/lib/libz${MAYBE_PIC}.a"
CFLAGS+=" -DZLIB"
fi
if ! test $ROCKSDB_DISABLE_BZIP; then
# location of bzip headers and libraries
BZIP_INCLUDE=" -I $BZIP2_BASE/include/"
BZIP_LIBS=" $BZIP2_BASE/lib/libbz2${MAYBE_PIC}.a"
CFLAGS+=" -DBZIP2"
fi
if ! test $ROCKSDB_DISABLE_LZ4; then
LZ4_INCLUDE=" -I $LZ4_BASE/include/"
LZ4_LIBS=" $LZ4_BASE/lib/liblz4${MAYBE_PIC}.a"
CFLAGS+=" -DLZ4"
fi
if ! test $ROCKSDB_DISABLE_ZSTD; then
ZSTD_INCLUDE=" -I $ZSTD_BASE/include/"
ZSTD_LIBS=" $ZSTD_BASE/lib/libzstd${MAYBE_PIC}.a"
CFLAGS+=" -DZSTD"
fi
# location of gflags headers and libraries
GFLAGS_INCLUDE=" -I $GFLAGS_BASE/include/"
GFLAGS_LIBS=" $GFLAGS_BASE/lib/libgflags${MAYBE_PIC}.a"
CFLAGS+=" -DGFLAGS=gflags"
BENCHMARK_INCLUDE=" -I $BENCHMARK_BASE/include/"
BENCHMARK_LIBS=" $BENCHMARK_BASE/lib/libbenchmark${MAYBE_PIC}.a"
# location of jemalloc
JEMALLOC_INCLUDE=" -I $JEMALLOC_BASE/include/"
JEMALLOC_LIB=" $JEMALLOC_BASE/lib/libjemalloc${MAYBE_PIC}.a"
# location of numa
NUMA_INCLUDE=" -I $NUMA_BASE/include/"
NUMA_LIB=" $NUMA_BASE/lib/libnuma${MAYBE_PIC}.a"
CFLAGS+=" -DNUMA"
# location of libunwind
LIBUNWIND="$LIBUNWIND_BASE/lib/libunwind${MAYBE_PIC}.a"
# location of TBB
TBB_INCLUDE=" -isystem $TBB_BASE/include/"
TBB_LIBS="$TBB_BASE/lib/libtbb${MAYBE_PIC}.a"
CFLAGS+=" -DTBB"
# location of LIBURING
LIBURING_INCLUDE=" -isystem $LIBURING_BASE/include/"
LIBURING_LIBS="$LIBURING_BASE/lib/liburing${MAYBE_PIC}.a"
CFLAGS+=" -DLIBURING"
test "$USE_SSE" || USE_SSE=1
export USE_SSE
test "$PORTABLE" || PORTABLE=1
export PORTABLE
BINUTILS="$BINUTILS_BASE/bin"
AR="$BINUTILS/ar"
AS="$BINUTILS/as"
DEPS_INCLUDE="$SNAPPY_INCLUDE $ZLIB_INCLUDE $BZIP_INCLUDE $LZ4_INCLUDE $ZSTD_INCLUDE $GFLAGS_INCLUDE $NUMA_INCLUDE $TBB_INCLUDE $LIBURING_INCLUDE $BENCHMARK_INCLUDE"
STDLIBS="-L $GCC_BASE/lib64"
CLANG_BIN="$CLANG_BASE/bin"
CLANG_LIB="$CLANG_BASE/lib"
CLANG_SRC="$CLANG_BASE/../../src"
CLANG_ANALYZER="$CLANG_BIN/clang++"
CLANG_SCAN_BUILD="$CLANG_SRC/llvm/clang/tools/scan-build/bin/scan-build"
if [ -z "$USE_CLANG" ]; then
# gcc
CC="$GCC_BASE/bin/gcc"
CXX="$GCC_BASE/bin/g++"
AR="$GCC_BASE/bin/gcc-ar"
CFLAGS+=" -B$BINUTILS -nostdinc -nostdlib"
CFLAGS+=" -I$GCC_BASE/include"
CFLAGS+=" -isystem $GCC_BASE/lib/gcc/x86_64-redhat-linux-gnu/11.2.1/include"
CFLAGS+=" -isystem $GCC_BASE/lib/gcc/x86_64-redhat-linux-gnu/11.2.1/install-tools/include"
CFLAGS+=" -isystem $GCC_BASE/lib/gcc/x86_64-redhat-linux-gnu/11.2.1/include-fixed/"
CFLAGS+=" -isystem $LIBGCC_INCLUDE"
CFLAGS+=" -isystem $GLIBC_INCLUDE"
CFLAGS+=" -I$GLIBC_INCLUDE"
CFLAGS+=" -I$LIBGCC_BASE/include"
CFLAGS+=" -I$LIBGCC_BASE/include/c++/11.x/"
CFLAGS+=" -I$LIBGCC_BASE/include/c++/11.x/x86_64-facebook-linux/"
CFLAGS+=" -I$LIBGCC_BASE/include/c++/11.x/backward"
CFLAGS+=" -isystem $GLIBC_INCLUDE -I$GLIBC_INCLUDE"
JEMALLOC=1
else
# clang
CLANG_INCLUDE="$CLANG_LIB/clang/stable/include"
CC="$CLANG_BIN/clang"
CXX="$CLANG_BIN/clang++"
AR="$CLANG_BIN/llvm-ar"
CFLAGS+=" -B$BINUTILS -nostdinc -nostdlib"
CFLAGS+=" -isystem $LIBGCC_BASE/include/c++/trunk "
CFLAGS+=" -isystem $LIBGCC_BASE/include/c++/trunk/x86_64-facebook-linux "
CFLAGS+=" -isystem $GLIBC_INCLUDE"
CFLAGS+=" -isystem $LIBGCC_INCLUDE"
CFLAGS+=" -isystem $CLANG_INCLUDE"
CFLAGS+=" -Wno-expansion-to-defined "
CXXFLAGS="-nostdinc++"
fi
KERNEL_HEADERS_INCLUDE="$KERNEL_HEADERS_BASE/include"
CFLAGS+=" -isystem $KERNEL_HEADERS_INCLUDE/linux "
CFLAGS+=" -isystem $KERNEL_HEADERS_INCLUDE "
CFLAGS+=" $DEPS_INCLUDE"
CFLAGS+=" -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -DROCKSDB_FALLOCATE_PRESENT -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_RANGESYNC_PRESENT -DROCKSDB_SCHED_GETCPU_PRESENT -DROCKSDB_SUPPORT_THREAD_LOCAL -DHAVE_SSE42 -DROCKSDB_IOURING_PRESENT"
CXXFLAGS+=" $CFLAGS"
EXEC_LDFLAGS=" $SNAPPY_LIBS $ZLIB_LIBS $BZIP_LIBS $LZ4_LIBS $ZSTD_LIBS $GFLAGS_LIBS $NUMA_LIB $TBB_LIBS $LIBURING_LIBS $BENCHMARK_LIBS"
EXEC_LDFLAGS+=" -Wl,--dynamic-linker,/usr/local/fbcode/platform010/lib/ld.so"
EXEC_LDFLAGS+=" $LIBUNWIND"
EXEC_LDFLAGS+=" -Wl,-rpath=/usr/local/fbcode/platform010/lib"
EXEC_LDFLAGS+=" -Wl,-rpath=$GCC_BASE/lib64"
# required by libtbb
EXEC_LDFLAGS+=" -ldl"
PLATFORM_LDFLAGS="$LIBGCC_LIBS $GLIBC_LIBS $STDLIBS -lgcc -lstdc++"
PLATFORM_LDFLAGS+=" -B$BINUTILS"
EXEC_LDFLAGS_SHARED="$SNAPPY_LIBS $ZLIB_LIBS $BZIP_LIBS $LZ4_LIBS $ZSTD_LIBS $GFLAGS_LIBS $TBB_LIBS $LIBURING_LIBS $BENCHMARK_LIBS"
VALGRIND_VER="$VALGRIND_BASE/bin/"
export CC CXX AR AS CFLAGS CXXFLAGS EXEC_LDFLAGS EXEC_LDFLAGS_SHARED VALGRIND_VER JEMALLOC_LIB JEMALLOC_INCLUDE CLANG_ANALYZER CLANG_SCAN_BUILD LUA_PATH LUA_LIB

View File

@ -1,209 +0,0 @@
#!/usr/bin/env python2.7
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals
import argparse
import commands
import subprocess
import sys
import re
import os
import time
#
# Simple logger
#
class Log:
def __init__(self, filename):
self.filename = filename
self.f = open(self.filename, 'w+', 0)
def caption(self, str):
line = "\n##### %s #####\n" % str
if self.f:
self.f.write("%s \n" % line)
else:
print(line)
def error(self, str):
data = "\n\n##### ERROR ##### %s" % str
if self.f:
self.f.write("%s \n" % data)
else:
print(data)
def log(self, str):
if self.f:
self.f.write("%s \n" % str)
else:
print(str)
#
# Shell Environment
#
class Env(object):
def __init__(self, logfile, tests):
self.tests = tests
self.log = Log(logfile)
def shell(self, cmd, path=os.getcwd()):
if path:
os.chdir(path)
self.log.log("==== shell session ===========================")
self.log.log("%s> %s" % (path, cmd))
status = subprocess.call("cd %s; %s" % (path, cmd), shell=True,
stdout=self.log.f, stderr=self.log.f)
self.log.log("status = %s" % status)
self.log.log("============================================== \n\n")
return status
def GetOutput(self, cmd, path=os.getcwd()):
if path:
os.chdir(path)
self.log.log("==== shell session ===========================")
self.log.log("%s> %s" % (path, cmd))
status, out = commands.getstatusoutput(cmd)
self.log.log("status = %s" % status)
self.log.log("out = %s" % out)
self.log.log("============================================== \n\n")
return status, out
#
# Pre-commit checker
#
class PreCommitChecker(Env):
def __init__(self, args):
Env.__init__(self, args.logfile, args.tests)
self.ignore_failure = args.ignore_failure
#
# Get commands for a given job from the determinator file
#
def get_commands(self, test):
status, out = self.GetOutput(
"RATIO=1 build_tools/rocksdb-lego-determinator %s" % test, ".")
return status, out
#
# Run a specific CI job
#
def run_test(self, test):
self.log.caption("Running test %s locally" % test)
# get commands for the CI job determinator
status, cmds = self.get_commands(test)
if status != 0:
self.log.error("Error getting commands for test %s" % test)
return False
# Parse the JSON to extract the commands to run
cmds = re.findall("'shell':'([^\']*)'", cmds)
if len(cmds) == 0:
self.log.log("No commands found")
return False
# Run commands
for cmd in cmds:
# Replace J=<..> with the local environment variable
if "J" in os.environ:
cmd = cmd.replace("J=1", "J=%s" % os.environ["J"])
cmd = cmd.replace("make ", "make -j%s " % os.environ["J"])
# Run the command
status = self.shell(cmd, ".")
if status != 0:
self.log.error("Error running command %s for test %s"
% (cmd, test))
return False
return True
#
# Run specified CI jobs
#
def run_tests(self):
if not self.tests:
self.log.error("Invalid args. Please provide tests")
return False
self.print_separator()
self.print_row("TEST", "RESULT")
self.print_separator()
result = True
for test in self.tests:
start_time = time.time()
self.print_test(test)
result = self.run_test(test)
elapsed_min = (time.time() - start_time) / 60
if not result:
self.log.error("Error running test %s" % test)
self.print_result("FAIL (%dm)" % elapsed_min)
if not self.ignore_failure:
return False
result = False
else:
self.print_result("PASS (%dm)" % elapsed_min)
self.print_separator()
return result
#
# Print a line
#
def print_separator(self):
print("".ljust(60, "-"))
#
# Print two colums
#
def print_row(self, c0, c1):
print("%s%s" % (c0.ljust(40), c1.ljust(20)))
def print_test(self, test):
print(test.ljust(40), end="")
sys.stdout.flush()
def print_result(self, result):
print(result.ljust(20))
#
# Main
#
parser = argparse.ArgumentParser(description='RocksDB pre-commit checker.')
# --log <logfile>
parser.add_argument('--logfile', default='/tmp/precommit-check.log',
help='Log file. Default is /tmp/precommit-check.log')
# --ignore_failure
parser.add_argument('--ignore_failure', action='store_true', default=False,
help='Stop when an error occurs')
# <test ....>
parser.add_argument('tests', nargs='+',
help='CI test(s) to run. e.g: unit punit asan tsan ubsan')
args = parser.parse_args()
checker = PreCommitChecker(args)
print("Please follow log %s" % checker.log.filename)
if not checker.run_tests():
print("Error running tests. Please check log file %s"
% checker.log.filename)
sys.exit(1)
sys.exit(0)

File diff suppressed because it is too large Load Diff

View File

@ -42,7 +42,7 @@ $RunOnly.Add("c_test") | Out-Null
$RunOnly.Add("compact_on_deletion_collector_test") | Out-Null
$RunOnly.Add("merge_test") | Out-Null
$RunOnly.Add("stringappend_test") | Out-Null # Apparently incorrectly written
$RunOnly.Add("backupable_db_test") | Out-Null # Disabled
$RunOnly.Add("backup_engine_test") | Out-Null # Disabled
$RunOnly.Add("timer_queue_test") | Out-Null # Not a gtest
if($RunAll -and $SuiteRun -ne "") {
@ -491,5 +491,3 @@ if(!$script:success) {
}
exit 0

View File

@ -9,6 +9,7 @@ OUTPUT=""
function log_header()
{
echo "# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved." >> "$OUTPUT"
echo "# The file is generated using update_dependencies.sh." >> "$OUTPUT"
}
@ -18,7 +19,7 @@ function log_variable()
}
TP2_LATEST="/mnt/vol/engshare/fbcode/third-party2"
TP2_LATEST="/data/users/$USER/fbsource/fbcode/third-party2/"
## $1 => lib name
## $2 => lib version (if not provided, will try to pick latest)
## $3 => platform (if not provided, will try to pick latest gcc)
@ -50,6 +51,8 @@ function get_lib_base()
fi
result=`ls -1d $result/*/ | head -n1`
echo Finding link $result
# lib_name => LIB_NAME_BASE
local __res_var=${lib_name^^}"_BASE"
@ -61,10 +64,10 @@ function get_lib_base()
}
###########################################################
# platform007 dependencies #
# platform010 dependencies #
###########################################################
OUTPUT="$BASEDIR/dependencies_platform007.sh"
OUTPUT="$BASEDIR/dependencies_platform010.sh"
rm -f "$OUTPUT"
touch "$OUTPUT"
@ -72,40 +75,42 @@ touch "$OUTPUT"
echo "Writing dependencies to $OUTPUT"
# Compilers locations
GCC_BASE=`readlink -f $TP2_LATEST/gcc/7.x/centos7-native/*/`
CLANG_BASE=`readlink -f $TP2_LATEST/llvm-fb/stable/centos7-native/*/`
GCC_BASE=`readlink -f $TP2_LATEST/gcc/11.x/centos7-native/*/`
CLANG_BASE=`readlink -f $TP2_LATEST/llvm-fb/12/platform010/*/`
log_header
log_variable GCC_BASE
log_variable CLANG_BASE
# Libraries locations
get_lib_base libgcc 7.x platform007
get_lib_base glibc 2.26 platform007
get_lib_base snappy LATEST platform007
get_lib_base zlib LATEST platform007
get_lib_base bzip2 LATEST platform007
get_lib_base lz4 LATEST platform007
get_lib_base zstd LATEST platform007
get_lib_base gflags LATEST platform007
get_lib_base jemalloc LATEST platform007
get_lib_base numa LATEST platform007
get_lib_base libunwind LATEST platform007
get_lib_base tbb LATEST platform007
get_lib_base liburing LATEST platform007
get_lib_base libgcc 11.x platform010
get_lib_base glibc 2.34 platform010
get_lib_base snappy LATEST platform010
get_lib_base zlib LATEST platform010
get_lib_base bzip2 LATEST platform010
get_lib_base lz4 LATEST platform010
get_lib_base zstd LATEST platform010
get_lib_base gflags LATEST platform010
get_lib_base jemalloc LATEST platform010
get_lib_base numa LATEST platform010
get_lib_base libunwind LATEST platform010
get_lib_base tbb 2018_U5 platform010
get_lib_base liburing LATEST platform010
get_lib_base benchmark LATEST platform010
get_lib_base kernel-headers fb platform007
get_lib_base kernel-headers fb platform010
get_lib_base binutils LATEST centos7-native
get_lib_base valgrind LATEST platform007
get_lib_base lua 5.3.4 platform007
get_lib_base valgrind LATEST platform010
get_lib_base lua 5.3.4 platform010
git diff $OUTPUT
###########################################################
# 5.x dependencies #
# platform009 dependencies #
###########################################################
OUTPUT="$BASEDIR/dependencies.sh"
OUTPUT="$BASEDIR/dependencies_platform009.sh"
rm -f "$OUTPUT"
touch "$OUTPUT"
@ -113,70 +118,32 @@ touch "$OUTPUT"
echo "Writing dependencies to $OUTPUT"
# Compilers locations
GCC_BASE=`readlink -f $TP2_LATEST/gcc/5.x/centos7-native/*/`
CLANG_BASE=`readlink -f $TP2_LATEST/llvm-fb/stable/centos7-native/*/`
GCC_BASE=`readlink -f $TP2_LATEST/gcc/9.x/centos7-native/*/`
CLANG_BASE=`readlink -f $TP2_LATEST/llvm-fb/9.0.0/platform009/*/`
log_header
log_variable GCC_BASE
log_variable CLANG_BASE
# Libraries locations
get_lib_base libgcc 5.x gcc-5-glibc-2.23
get_lib_base glibc 2.23 gcc-5-glibc-2.23
get_lib_base snappy LATEST gcc-5-glibc-2.23
get_lib_base zlib LATEST gcc-5-glibc-2.23
get_lib_base bzip2 LATEST gcc-5-glibc-2.23
get_lib_base lz4 LATEST gcc-5-glibc-2.23
get_lib_base zstd LATEST gcc-5-glibc-2.23
get_lib_base gflags LATEST gcc-5-glibc-2.23
get_lib_base jemalloc LATEST gcc-5-glibc-2.23
get_lib_base numa LATEST gcc-5-glibc-2.23
get_lib_base libunwind LATEST gcc-5-glibc-2.23
get_lib_base tbb LATEST gcc-5-glibc-2.23
get_lib_base libgcc 9.x platform009
get_lib_base glibc 2.30 platform009
get_lib_base snappy LATEST platform009
get_lib_base zlib LATEST platform009
get_lib_base bzip2 LATEST platform009
get_lib_base lz4 LATEST platform009
get_lib_base zstd LATEST platform009
get_lib_base gflags LATEST platform009
get_lib_base jemalloc LATEST platform009
get_lib_base numa LATEST platform009
get_lib_base libunwind LATEST platform009
get_lib_base tbb 2018_U5 platform009
get_lib_base liburing LATEST platform009
get_lib_base benchmark LATEST platform009
get_lib_base kernel-headers 4.0.9-36_fbk5_2933_gd092e3f gcc-5-glibc-2.23
get_lib_base kernel-headers fb platform009
get_lib_base binutils LATEST centos7-native
get_lib_base valgrind LATEST gcc-5-glibc-2.23
get_lib_base lua 5.2.3 gcc-5-glibc-2.23
git diff $OUTPUT
###########################################################
# 4.8.1 dependencies #
###########################################################
OUTPUT="$BASEDIR/dependencies_4.8.1.sh"
rm -f "$OUTPUT"
touch "$OUTPUT"
echo "Writing 4.8.1 dependencies to $OUTPUT"
# Compilers locations
GCC_BASE=`readlink -f $TP2_LATEST/gcc/4.8.1/centos6-native/*/`
CLANG_BASE=`readlink -f $TP2_LATEST/llvm-fb/stable/centos6-native/*/`
log_header
log_variable GCC_BASE
log_variable CLANG_BASE
# Libraries locations
get_lib_base libgcc 4.8.1 gcc-4.8.1-glibc-2.17
get_lib_base glibc 2.17 gcc-4.8.1-glibc-2.17
get_lib_base snappy LATEST gcc-4.8.1-glibc-2.17
get_lib_base zlib LATEST gcc-4.8.1-glibc-2.17
get_lib_base bzip2 LATEST gcc-4.8.1-glibc-2.17
get_lib_base lz4 LATEST gcc-4.8.1-glibc-2.17
get_lib_base zstd LATEST gcc-4.8.1-glibc-2.17
get_lib_base gflags LATEST gcc-4.8.1-glibc-2.17
get_lib_base jemalloc LATEST gcc-4.8.1-glibc-2.17
get_lib_base numa LATEST gcc-4.8.1-glibc-2.17
get_lib_base libunwind LATEST gcc-4.8.1-glibc-2.17
get_lib_base tbb 4.0_update2 gcc-4.8.1-glibc-2.17
get_lib_base kernel-headers LATEST gcc-4.8.1-glibc-2.17
get_lib_base binutils LATEST centos6-native
get_lib_base valgrind 3.8.1 gcc-4.8.1-glibc-2.17
get_lib_base lua 5.2.3 centos6-native
get_lib_base valgrind LATEST platform009
get_lib_base lua 5.3.4 platform009
git diff $OUTPUT

View File

@ -11,7 +11,7 @@
namespace ROCKSDB_NAMESPACE {
std::array<const char*, kNumCacheEntryRoles> kCacheEntryRoleToCamelString{{
std::array<std::string, kNumCacheEntryRoles> kCacheEntryRoleToCamelString{{
"DataBlock",
"FilterBlock",
"FilterMetaBlock",
@ -21,10 +21,11 @@ std::array<const char*, kNumCacheEntryRoles> kCacheEntryRoleToCamelString{{
"WriteBuffer",
"CompressionDictionaryBuildingBuffer",
"FilterConstruction",
"BlockBasedTableReader",
"Misc",
}};
std::array<const char*, kNumCacheEntryRoles> kCacheEntryRoleToHyphenString{{
std::array<std::string, kNumCacheEntryRoles> kCacheEntryRoleToHyphenString{{
"data-block",
"filter-block",
"filter-meta-block",
@ -34,19 +35,76 @@ std::array<const char*, kNumCacheEntryRoles> kCacheEntryRoleToHyphenString{{
"write-buffer",
"compression-dictionary-building-buffer",
"filter-construction",
"block-based-table-reader",
"misc",
}};
const std::string& GetCacheEntryRoleName(CacheEntryRole role) {
return kCacheEntryRoleToHyphenString[static_cast<size_t>(role)];
}
const std::string& BlockCacheEntryStatsMapKeys::CacheId() {
static const std::string kCacheId = "id";
return kCacheId;
}
const std::string& BlockCacheEntryStatsMapKeys::CacheCapacityBytes() {
static const std::string kCacheCapacityBytes = "capacity";
return kCacheCapacityBytes;
}
const std::string&
BlockCacheEntryStatsMapKeys::LastCollectionDurationSeconds() {
static const std::string kLastCollectionDurationSeconds =
"secs_for_last_collection";
return kLastCollectionDurationSeconds;
}
const std::string& BlockCacheEntryStatsMapKeys::LastCollectionAgeSeconds() {
static const std::string kLastCollectionAgeSeconds =
"secs_since_last_collection";
return kLastCollectionAgeSeconds;
}
namespace {
std::string GetPrefixedCacheEntryRoleName(const std::string& prefix,
CacheEntryRole role) {
const std::string& role_name = GetCacheEntryRoleName(role);
std::string prefixed_role_name;
prefixed_role_name.reserve(prefix.size() + role_name.size());
prefixed_role_name.append(prefix);
prefixed_role_name.append(role_name);
return prefixed_role_name;
}
} // namespace
std::string BlockCacheEntryStatsMapKeys::EntryCount(CacheEntryRole role) {
const static std::string kPrefix = "count.";
return GetPrefixedCacheEntryRoleName(kPrefix, role);
}
std::string BlockCacheEntryStatsMapKeys::UsedBytes(CacheEntryRole role) {
const static std::string kPrefix = "bytes.";
return GetPrefixedCacheEntryRoleName(kPrefix, role);
}
std::string BlockCacheEntryStatsMapKeys::UsedPercent(CacheEntryRole role) {
const static std::string kPrefix = "percent.";
return GetPrefixedCacheEntryRoleName(kPrefix, role);
}
namespace {
struct Registry {
std::mutex mutex;
std::unordered_map<Cache::DeleterFn, CacheEntryRole> role_map;
UnorderedMap<Cache::DeleterFn, CacheEntryRole> role_map;
void Register(Cache::DeleterFn fn, CacheEntryRole role) {
std::lock_guard<std::mutex> lock(mutex);
role_map[fn] = role;
}
std::unordered_map<Cache::DeleterFn, CacheEntryRole> Copy() {
UnorderedMap<Cache::DeleterFn, CacheEntryRole> Copy() {
std::lock_guard<std::mutex> lock(mutex);
return role_map;
}
@ -63,7 +121,7 @@ void RegisterCacheDeleterRole(Cache::DeleterFn fn, CacheEntryRole role) {
GetRegistry().Register(fn, role);
}
std::unordered_map<Cache::DeleterFn, CacheEntryRole> CopyCacheDeleterRoleMap() {
UnorderedMap<Cache::DeleterFn, CacheEntryRole> CopyCacheDeleterRoleMap() {
return GetRegistry().Copy();
}

View File

@ -9,46 +9,15 @@
#include <cstdint>
#include <memory>
#include <type_traits>
#include <unordered_map>
#include "rocksdb/cache.h"
#include "util/hash_containers.h"
namespace ROCKSDB_NAMESPACE {
// Classifications of block cache entries, for reporting statistics
// Adding new enum to this class requires corresponding updates to
// kCacheEntryRoleToCamelString and kCacheEntryRoleToHyphenString
enum class CacheEntryRole {
// Block-based table data block
kDataBlock,
// Block-based table filter block (full or partitioned)
kFilterBlock,
// Block-based table metadata block for partitioned filter
kFilterMetaBlock,
// Block-based table deprecated filter block (old "block-based" filter)
kDeprecatedFilterBlock,
// Block-based table index block
kIndexBlock,
// Other kinds of block-based table block
kOtherBlock,
// WriteBufferManager reservations to account for memtable usage
kWriteBuffer,
// BlockBasedTableBuilder reservations to account for
// compression dictionary building buffer's memory usage
kCompressionDictionaryBuildingBuffer,
// Filter reservations to account for
// (new) bloom and ribbon filter construction's memory usage
kFilterConstruction,
// Default bucket, for miscellaneous cache entries. Do not use for
// entries that could potentially add up to large usage.
kMisc,
};
constexpr uint32_t kNumCacheEntryRoles =
static_cast<uint32_t>(CacheEntryRole::kMisc) + 1;
extern std::array<const char*, kNumCacheEntryRoles>
extern std::array<std::string, kNumCacheEntryRoles>
kCacheEntryRoleToCamelString;
extern std::array<const char*, kNumCacheEntryRoles>
extern std::array<std::string, kNumCacheEntryRoles>
kCacheEntryRoleToHyphenString;
// To associate cache entries with their role, we use a hack on the
@ -75,7 +44,7 @@ void RegisterCacheDeleterRole(Cache::DeleterFn fn, CacheEntryRole role);
// * This is suitable for preparing for batch operations, like with
// CacheEntryStatsCollector.
// * The number of mappings should be sufficiently small (dozens).
std::unordered_map<Cache::DeleterFn, CacheEntryRole> CopyCacheDeleterRoleMap();
UnorderedMap<Cache::DeleterFn, CacheEntryRole> CopyCacheDeleterRoleMap();
// ************************************************************** //
// An automatic registration infrastructure. This enables code

View File

@ -17,12 +17,30 @@
#include "rocksdb/cache.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/reader_common.h"
#include "util/coding.h"
namespace ROCKSDB_NAMESPACE {
CacheReservationManager::CacheReservationManager(std::shared_ptr<Cache> cache,
bool delayed_decrease)
template <CacheEntryRole R>
CacheReservationManagerImpl<R>::CacheReservationHandle::CacheReservationHandle(
std::size_t incremental_memory_used,
std::shared_ptr<CacheReservationManagerImpl> cache_res_mgr)
: incremental_memory_used_(incremental_memory_used) {
assert(cache_res_mgr);
cache_res_mgr_ = cache_res_mgr;
}
template <CacheEntryRole R>
CacheReservationManagerImpl<
R>::CacheReservationHandle::~CacheReservationHandle() {
Status s = cache_res_mgr_->ReleaseCacheReservation(incremental_memory_used_);
s.PermitUncheckedError();
}
template <CacheEntryRole R>
CacheReservationManagerImpl<R>::CacheReservationManagerImpl(
std::shared_ptr<Cache> cache, bool delayed_decrease)
: delayed_decrease_(delayed_decrease),
cache_allocated_size_(0),
memory_used_(0) {
@ -30,14 +48,15 @@ CacheReservationManager::CacheReservationManager(std::shared_ptr<Cache> cache,
cache_ = cache;
}
CacheReservationManager::~CacheReservationManager() {
template <CacheEntryRole R>
CacheReservationManagerImpl<R>::~CacheReservationManagerImpl() {
for (auto* handle : dummy_handles_) {
cache_->Release(handle, true);
}
}
template <CacheEntryRole R>
Status CacheReservationManager::UpdateCacheReservation(
Status CacheReservationManagerImpl<R>::UpdateCacheReservation(
std::size_t new_mem_used) {
memory_used_ = new_mem_used;
std::size_t cur_cache_allocated_size =
@ -45,7 +64,7 @@ Status CacheReservationManager::UpdateCacheReservation(
if (new_mem_used == cur_cache_allocated_size) {
return Status::OK();
} else if (new_mem_used > cur_cache_allocated_size) {
Status s = IncreaseCacheReservation<R>(new_mem_used);
Status s = IncreaseCacheReservation(new_mem_used);
return s;
} else {
// In delayed decrease mode, we don't decrease cache reservation
@ -66,41 +85,32 @@ Status CacheReservationManager::UpdateCacheReservation(
}
}
// Explicitly instantiate templates for "CacheEntryRole" values we use.
// This makes it possible to keep the template definitions in the .cc file.
template Status CacheReservationManager::UpdateCacheReservation<
CacheEntryRole::kWriteBuffer>(std::size_t new_mem_used);
template Status CacheReservationManager::UpdateCacheReservation<
CacheEntryRole::kCompressionDictionaryBuildingBuffer>(
std::size_t new_mem_used);
// For cache reservation manager unit tests
template Status CacheReservationManager::UpdateCacheReservation<
CacheEntryRole::kMisc>(std::size_t new_mem_used);
template <CacheEntryRole R>
Status CacheReservationManager::MakeCacheReservation(
Status CacheReservationManagerImpl<R>::MakeCacheReservation(
std::size_t incremental_memory_used,
std::unique_ptr<CacheReservationHandle<R>>* handle) {
assert(handle != nullptr);
std::unique_ptr<CacheReservationManager::CacheReservationHandle>* handle) {
assert(handle);
Status s =
UpdateCacheReservation<R>(GetTotalMemoryUsed() + incremental_memory_used);
(*handle).reset(new CacheReservationHandle<R>(incremental_memory_used,
shared_from_this()));
UpdateCacheReservation(GetTotalMemoryUsed() + incremental_memory_used);
(*handle).reset(new CacheReservationManagerImpl::CacheReservationHandle(
incremental_memory_used,
std::enable_shared_from_this<
CacheReservationManagerImpl<R>>::shared_from_this()));
return s;
}
template Status
CacheReservationManager::MakeCacheReservation<CacheEntryRole::kMisc>(
std::size_t incremental_memory_used,
std::unique_ptr<CacheReservationHandle<CacheEntryRole::kMisc>>* handle);
template Status CacheReservationManager::MakeCacheReservation<
CacheEntryRole::kFilterConstruction>(
std::size_t incremental_memory_used,
std::unique_ptr<
CacheReservationHandle<CacheEntryRole::kFilterConstruction>>* handle);
template <CacheEntryRole R>
Status CacheReservationManagerImpl<R>::ReleaseCacheReservation(
std::size_t incremental_memory_used) {
assert(GetTotalMemoryUsed() >= incremental_memory_used);
std::size_t updated_total_mem_used =
GetTotalMemoryUsed() - incremental_memory_used;
Status s = UpdateCacheReservation(updated_total_mem_used);
return s;
}
template <CacheEntryRole R>
Status CacheReservationManager::IncreaseCacheReservation(
Status CacheReservationManagerImpl<R>::IncreaseCacheReservation(
std::size_t new_mem_used) {
Status return_status = Status::OK();
while (new_mem_used > cache_allocated_size_.load(std::memory_order_relaxed)) {
@ -118,7 +128,8 @@ Status CacheReservationManager::IncreaseCacheReservation(
return return_status;
}
Status CacheReservationManager::DecreaseCacheReservation(
template <CacheEntryRole R>
Status CacheReservationManagerImpl<R>::DecreaseCacheReservation(
std::size_t new_mem_used) {
Status return_status = Status::OK();
@ -137,15 +148,18 @@ Status CacheReservationManager::DecreaseCacheReservation(
return return_status;
}
std::size_t CacheReservationManager::GetTotalReservedCacheSize() {
template <CacheEntryRole R>
std::size_t CacheReservationManagerImpl<R>::GetTotalReservedCacheSize() {
return cache_allocated_size_.load(std::memory_order_relaxed);
}
std::size_t CacheReservationManager::GetTotalMemoryUsed() {
template <CacheEntryRole R>
std::size_t CacheReservationManagerImpl<R>::GetTotalMemoryUsed() {
return memory_used_;
}
Slice CacheReservationManager::GetNextCacheKey() {
template <CacheEntryRole R>
Slice CacheReservationManagerImpl<R>::GetNextCacheKey() {
// Calling this function will have the side-effect of changing the
// underlying cache_key_ that is shared among other keys generated from this
// fucntion. Therefore please make sure the previous keys are saved/copied
@ -155,34 +169,15 @@ Slice CacheReservationManager::GetNextCacheKey() {
}
template <CacheEntryRole R>
Cache::DeleterFn CacheReservationManager::TEST_GetNoopDeleterForRole() {
Cache::DeleterFn CacheReservationManagerImpl<R>::TEST_GetNoopDeleterForRole() {
return GetNoopDeleterForRole<R>();
}
template Cache::DeleterFn CacheReservationManager::TEST_GetNoopDeleterForRole<
CacheEntryRole::kFilterConstruction>();
template <CacheEntryRole R>
CacheReservationHandle<R>::CacheReservationHandle(
std::size_t incremental_memory_used,
std::shared_ptr<CacheReservationManager> cache_res_mgr)
: incremental_memory_used_(incremental_memory_used) {
assert(cache_res_mgr != nullptr);
cache_res_mgr_ = cache_res_mgr;
}
template <CacheEntryRole R>
CacheReservationHandle<R>::~CacheReservationHandle() {
assert(cache_res_mgr_ != nullptr);
assert(cache_res_mgr_->GetTotalMemoryUsed() >= incremental_memory_used_);
Status s = cache_res_mgr_->UpdateCacheReservation<R>(
cache_res_mgr_->GetTotalMemoryUsed() - incremental_memory_used_);
s.PermitUncheckedError();
}
// Explicitly instantiate templates for "CacheEntryRole" values we use.
// This makes it possible to keep the template definitions in the .cc file.
template class CacheReservationHandle<CacheEntryRole::kMisc>;
template class CacheReservationHandle<CacheEntryRole::kFilterConstruction>;
template class CacheReservationManagerImpl<
CacheEntryRole::kBlockBasedTableReader>;
template class CacheReservationManagerImpl<
CacheEntryRole::kCompressionDictionaryBuildingBuffer>;
template class CacheReservationManagerImpl<CacheEntryRole::kFilterConstruction>;
template class CacheReservationManagerImpl<CacheEntryRole::kMisc>;
template class CacheReservationManagerImpl<CacheEntryRole::kWriteBuffer>;
} // namespace ROCKSDB_NAMESPACE

View File

@ -13,54 +13,90 @@
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <vector>
#include "cache/cache_entry_roles.h"
#include "cache/cache_key.h"
#include "rocksdb/cache.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"
#include "table/block_based/block_based_table_reader.h"
#include "util/coding.h"
namespace ROCKSDB_NAMESPACE {
// CacheReservationManager is an interface for reserving cache space for the
// memory used
class CacheReservationManager {
public:
// CacheReservationHandle is for managing the lifetime of a cache reservation
// for an incremental amount of memory used (i.e, incremental_memory_used)
class CacheReservationHandle {
public:
virtual ~CacheReservationHandle() {}
};
virtual ~CacheReservationManager() {}
virtual Status UpdateCacheReservation(std::size_t new_memory_used) = 0;
virtual Status MakeCacheReservation(
std::size_t incremental_memory_used,
std::unique_ptr<CacheReservationManager::CacheReservationHandle>
*handle) = 0;
virtual std::size_t GetTotalReservedCacheSize() = 0;
virtual std::size_t GetTotalMemoryUsed() = 0;
};
template <CacheEntryRole R>
class CacheReservationHandle;
// CacheReservationManager is for reserving cache space for the memory used
// through inserting/releasing dummy entries in the cache.
// CacheReservationManagerImpl implements interface CacheReservationManager
// for reserving cache space for the memory used by inserting/releasing dummy
// entries in the cache.
//
// This class is NOT thread-safe, except that GetTotalReservedCacheSize()
// can be called without external synchronization.
class CacheReservationManager
: public std::enable_shared_from_this<CacheReservationManager> {
template <CacheEntryRole R>
class CacheReservationManagerImpl
: public CacheReservationManager,
public std::enable_shared_from_this<CacheReservationManagerImpl<R>> {
public:
// Construct a CacheReservationManager
class CacheReservationHandle
: public CacheReservationManager::CacheReservationHandle {
public:
CacheReservationHandle(
std::size_t incremental_memory_used,
std::shared_ptr<CacheReservationManagerImpl> cache_res_mgr);
~CacheReservationHandle() override;
private:
std::size_t incremental_memory_used_;
std::shared_ptr<CacheReservationManagerImpl> cache_res_mgr_;
};
// Construct a CacheReservationManagerImpl
// @param cache The cache where dummy entries are inserted and released for
// reserving cache space
// @param delayed_decrease If set true, then dummy entries won't be released
// immediately when memory usage decreases.
// immediately when memory usage decreases.
// Instead, it will be released when the memory usage
// decreases to 3/4 of what we have reserved so far.
// This is for saving some future dummy entry
// insertion when memory usage increases are likely to
// happen in the near future.
explicit CacheReservationManager(std::shared_ptr<Cache> cache,
bool delayed_decrease = false);
//
// REQUIRED: cache is not nullptr
explicit CacheReservationManagerImpl(std::shared_ptr<Cache> cache,
bool delayed_decrease = false);
// no copy constructor, copy assignment, move constructor, move assignment
CacheReservationManager(const CacheReservationManager &) = delete;
CacheReservationManager &operator=(const CacheReservationManager &) = delete;
CacheReservationManager(CacheReservationManager &&) = delete;
CacheReservationManager &operator=(CacheReservationManager &&) = delete;
CacheReservationManagerImpl(const CacheReservationManagerImpl &) = delete;
CacheReservationManagerImpl &operator=(const CacheReservationManagerImpl &) =
delete;
CacheReservationManagerImpl(CacheReservationManagerImpl &&) = delete;
CacheReservationManagerImpl &operator=(CacheReservationManagerImpl &&) =
delete;
~CacheReservationManager();
~CacheReservationManagerImpl() override;
template <CacheEntryRole R>
// One of the two ways of reserving/releasing cache,
// see CacheReservationManager::MakeCacheReservation() for the other.
// Use ONLY one of them to prevent unexpected behavior.
// One of the two ways of reserving/releasing cache space,
// see MakeCacheReservation() for the other.
//
// Use ONLY one of these two ways to prevent unexpected behavior.
//
// Insert and release dummy entries in the cache to
// match the size of total dummy entries with the least multiple of
@ -90,11 +126,13 @@ class CacheReservationManager
// Otherwise, it returns the first non-ok status;
// On releasing dummy entries, it always returns Status::OK().
// On keeping dummy entries the same, it always returns Status::OK().
Status UpdateCacheReservation(std::size_t new_memory_used);
Status UpdateCacheReservation(std::size_t new_memory_used) override;
// One of the two ways of reserving/releasing cache,
// see CacheReservationManager::UpdateCacheReservation() for the other.
// Use ONLY one of them to prevent unexpected behavior.
// One of the two ways of reserving cache space and releasing is done through
// destruction of CacheReservationHandle.
// See UpdateCacheReservation() for the other way.
//
// Use ONLY one of these two ways to prevent unexpected behavior.
//
// Insert dummy entries in the cache for the incremental memory usage
// to match the size of total dummy entries with the least multiple of
@ -118,21 +156,19 @@ class CacheReservationManager
// calling MakeCacheReservation() is needed if you want
// GetTotalMemoryUsed() indeed returns the latest memory used.
//
// @param handle An pointer to std::unique_ptr<CacheReservationHandle<R>> that
// manages the lifetime of the handle and its cache reservation.
// @param handle An pointer to std::unique_ptr<CacheReservationHandle> that
// manages the lifetime of the cache reservation represented by the
// handle.
//
// @return It returns Status::OK() if all dummy
// entry insertions succeed.
// Otherwise, it returns the first non-ok status;
//
// REQUIRES: handle != nullptr
// REQUIRES: The CacheReservationManager object is NOT managed by
// std::unique_ptr as CacheReservationHandle needs to
// shares ownership to the CacheReservationManager object.
template <CacheEntryRole R>
Status MakeCacheReservation(
std::size_t incremental_memory_used,
std::unique_ptr<CacheReservationHandle<R>> *handle);
std::unique_ptr<CacheReservationManager::CacheReservationHandle> *handle)
override;
// Return the size of the cache (which is a multiple of kSizeDummyEntry)
// successfully reserved by calling UpdateCacheReservation().
@ -142,25 +178,25 @@ class CacheReservationManager
// smaller number than the actual reserved cache size due to
// the returned number will always be a multiple of kSizeDummyEntry
// and cache full might happen in the middle of inserting a dummy entry.
std::size_t GetTotalReservedCacheSize();
std::size_t GetTotalReservedCacheSize() override;
// Return the latest total memory used indicated by the most recent call of
// UpdateCacheReservation(std::size_t new_memory_used);
std::size_t GetTotalMemoryUsed();
std::size_t GetTotalMemoryUsed() override;
static constexpr std::size_t GetDummyEntrySize() { return kSizeDummyEntry; }
// For testing only - it is to help ensure the NoopDeleterForRole<R>
// accessed from CacheReservationManager and the one accessed from the test
// are from the same translation units
template <CacheEntryRole R>
// accessed from CacheReservationManagerImpl and the one accessed from the
// test are from the same translation units
static Cache::DeleterFn TEST_GetNoopDeleterForRole();
private:
static constexpr std::size_t kSizeDummyEntry = 256 * 1024;
Slice GetNextCacheKey();
template <CacheEntryRole R>
Status ReleaseCacheReservation(std::size_t incremental_memory_used);
Status IncreaseCacheReservation(std::size_t new_mem_used);
Status DecreaseCacheReservation(std::size_t new_mem_used);
@ -172,20 +208,81 @@ class CacheReservationManager
CacheKey cache_key_;
};
// CacheReservationHandle is for managing the lifetime of a cache reservation
// This class is NOT thread-safe
template <CacheEntryRole R>
class CacheReservationHandle {
class ConcurrentCacheReservationManager
: public CacheReservationManager,
public std::enable_shared_from_this<ConcurrentCacheReservationManager> {
public:
// REQUIRES: cache_res_mgr != nullptr
explicit CacheReservationHandle(
std::size_t incremental_memory_used,
std::shared_ptr<CacheReservationManager> cache_res_mgr);
class CacheReservationHandle
: public CacheReservationManager::CacheReservationHandle {
public:
CacheReservationHandle(
std::shared_ptr<ConcurrentCacheReservationManager> cache_res_mgr,
std::unique_ptr<CacheReservationManager::CacheReservationHandle>
cache_res_handle) {
assert(cache_res_mgr && cache_res_handle);
cache_res_mgr_ = cache_res_mgr;
cache_res_handle_ = std::move(cache_res_handle);
}
~CacheReservationHandle();
~CacheReservationHandle() override {
std::lock_guard<std::mutex> lock(cache_res_mgr_->cache_res_mgr_mu_);
cache_res_handle_.reset();
}
private:
std::shared_ptr<ConcurrentCacheReservationManager> cache_res_mgr_;
std::unique_ptr<CacheReservationManager::CacheReservationHandle>
cache_res_handle_;
};
explicit ConcurrentCacheReservationManager(
std::shared_ptr<CacheReservationManager> cache_res_mgr) {
cache_res_mgr_ = std::move(cache_res_mgr);
}
ConcurrentCacheReservationManager(const ConcurrentCacheReservationManager &) =
delete;
ConcurrentCacheReservationManager &operator=(
const ConcurrentCacheReservationManager &) = delete;
ConcurrentCacheReservationManager(ConcurrentCacheReservationManager &&) =
delete;
ConcurrentCacheReservationManager &operator=(
ConcurrentCacheReservationManager &&) = delete;
~ConcurrentCacheReservationManager() override {}
inline Status UpdateCacheReservation(std::size_t new_memory_used) override {
std::lock_guard<std::mutex> lock(cache_res_mgr_mu_);
return cache_res_mgr_->UpdateCacheReservation(new_memory_used);
}
inline Status MakeCacheReservation(
std::size_t incremental_memory_used,
std::unique_ptr<CacheReservationManager::CacheReservationHandle> *handle)
override {
std::unique_ptr<CacheReservationManager::CacheReservationHandle>
wrapped_handle;
Status s;
{
std::lock_guard<std::mutex> lock(cache_res_mgr_mu_);
s = cache_res_mgr_->MakeCacheReservation(incremental_memory_used,
&wrapped_handle);
}
(*handle).reset(
new ConcurrentCacheReservationManager::CacheReservationHandle(
std::enable_shared_from_this<
ConcurrentCacheReservationManager>::shared_from_this(),
std::move(wrapped_handle)));
return s;
}
inline std::size_t GetTotalReservedCacheSize() override {
return cache_res_mgr_->GetTotalReservedCacheSize();
}
inline std::size_t GetTotalMemoryUsed() override {
std::lock_guard<std::mutex> lock(cache_res_mgr_mu_);
return cache_res_mgr_->GetTotalMemoryUsed();
}
private:
std::size_t incremental_memory_used_;
std::mutex cache_res_mgr_mu_;
std::shared_ptr<CacheReservationManager> cache_res_mgr_;
};
} // namespace ROCKSDB_NAMESPACE

View File

@ -15,7 +15,6 @@
#include "cache/cache_entry_roles.h"
#include "rocksdb/cache.h"
#include "rocksdb/slice.h"
#include "table/block_based/block_based_table_reader.h"
#include "test_util/testharness.h"
#include "util/coding.h"
@ -23,25 +22,24 @@ namespace ROCKSDB_NAMESPACE {
class CacheReservationManagerTest : public ::testing::Test {
protected:
static constexpr std::size_t kSizeDummyEntry =
CacheReservationManager::GetDummyEntrySize();
CacheReservationManagerImpl<CacheEntryRole::kMisc>::GetDummyEntrySize();
static constexpr std::size_t kCacheCapacity = 4096 * kSizeDummyEntry;
static constexpr int kNumShardBits = 0; // 2^0 shard
static constexpr std::size_t kMetaDataChargeOverhead = 10000;
std::shared_ptr<Cache> cache = NewLRUCache(kCacheCapacity, kNumShardBits);
std::unique_ptr<CacheReservationManager> test_cache_rev_mng;
std::shared_ptr<CacheReservationManager> test_cache_rev_mng;
CacheReservationManagerTest() {
test_cache_rev_mng.reset(new CacheReservationManager(cache));
test_cache_rev_mng =
std::make_shared<CacheReservationManagerImpl<CacheEntryRole::kMisc>>(
cache);
}
};
TEST_F(CacheReservationManagerTest, GenerateCacheKey) {
std::size_t new_mem_used = 1 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
ASSERT_EQ(s, Status::OK());
ASSERT_GE(cache->GetPinnedUsage(), 1 * kSizeDummyEntry);
ASSERT_LT(cache->GetPinnedUsage(),
@ -49,13 +47,14 @@ TEST_F(CacheReservationManagerTest, GenerateCacheKey) {
// Next unique Cache key
CacheKey ckey = CacheKey::CreateUniqueForCacheLifetime(cache.get());
// Back it up to the one used by CRM (using CacheKey implementation details)
using PairU64 = std::pair<uint64_t, uint64_t>;
// Get to the underlying values
using PairU64 = std::array<uint64_t, 2>;
auto& ckey_pair = *reinterpret_cast<PairU64*>(&ckey);
ckey_pair.second--;
// Back it up to the one used by CRM (using CacheKey implementation details)
ckey_pair[1]--;
// Specific key (subject to implementation details)
EXPECT_EQ(ckey_pair, PairU64(0, 2));
EXPECT_EQ(ckey_pair, PairU64({0, 2}));
Cache::Handle* handle = cache->Lookup(ckey.AsSlice());
EXPECT_NE(handle, nullptr)
@ -66,10 +65,7 @@ TEST_F(CacheReservationManagerTest, GenerateCacheKey) {
TEST_F(CacheReservationManagerTest, KeepCacheReservationTheSame) {
std::size_t new_mem_used = 1 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
ASSERT_EQ(s, Status::OK());
ASSERT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
1 * kSizeDummyEntry);
@ -79,9 +75,7 @@ TEST_F(CacheReservationManagerTest, KeepCacheReservationTheSame) {
ASSERT_LT(initial_pinned_usage,
1 * kSizeDummyEntry + kMetaDataChargeOverhead);
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to keep cache reservation the same when new_mem_used equals "
"to current cache reservation";
@ -100,10 +94,7 @@ TEST_F(CacheReservationManagerTest, KeepCacheReservationTheSame) {
TEST_F(CacheReservationManagerTest,
IncreaseCacheReservationByMultiplesOfDummyEntrySize) {
std::size_t new_mem_used = 2 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to increase cache reservation correctly";
EXPECT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
@ -121,10 +112,7 @@ TEST_F(CacheReservationManagerTest,
TEST_F(CacheReservationManagerTest,
IncreaseCacheReservationNotByMultiplesOfDummyEntrySize) {
std::size_t new_mem_used = 2 * kSizeDummyEntry + kSizeDummyEntry / 2;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to increase cache reservation correctly";
EXPECT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
@ -143,7 +131,7 @@ TEST(CacheReservationManagerIncreaseReservcationOnFullCacheTest,
IncreaseCacheReservationOnFullCache) {
;
constexpr std::size_t kSizeDummyEntry =
CacheReservationManager::GetDummyEntrySize();
CacheReservationManagerImpl<CacheEntryRole::kMisc>::GetDummyEntrySize();
constexpr std::size_t kSmallCacheCapacity = 4 * kSizeDummyEntry;
constexpr std::size_t kBigCacheCapacity = 4096 * kSizeDummyEntry;
constexpr std::size_t kMetaDataChargeOverhead = 10000;
@ -153,14 +141,12 @@ TEST(CacheReservationManagerIncreaseReservcationOnFullCacheTest,
lo.num_shard_bits = 0; // 2^0 shard
lo.strict_capacity_limit = true;
std::shared_ptr<Cache> cache = NewLRUCache(lo);
std::unique_ptr<CacheReservationManager> test_cache_rev_mng(
new CacheReservationManager(cache));
std::shared_ptr<CacheReservationManager> test_cache_rev_mng =
std::make_shared<CacheReservationManagerImpl<CacheEntryRole::kMisc>>(
cache);
std::size_t new_mem_used = kSmallCacheCapacity + 1;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::Incomplete())
<< "Failed to return status to indicate failure of dummy entry insertion "
"during cache reservation on full cache";
@ -183,9 +169,7 @@ TEST(CacheReservationManagerIncreaseReservcationOnFullCacheTest,
"encountering cache resevation failure due to full cache";
new_mem_used = kSmallCacheCapacity / 2; // 2 dummy entries
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to decrease cache reservation after encountering cache "
"reservation failure due to full cache";
@ -207,9 +191,7 @@ TEST(CacheReservationManagerIncreaseReservcationOnFullCacheTest,
// Create cache full again for subsequent tests
new_mem_used = kSmallCacheCapacity + 1;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::Incomplete())
<< "Failed to return status to indicate failure of dummy entry insertion "
"during cache reservation on full cache";
@ -235,9 +217,7 @@ TEST(CacheReservationManagerIncreaseReservcationOnFullCacheTest,
// succeed
cache->SetCapacity(kBigCacheCapacity);
new_mem_used = kSmallCacheCapacity + 1;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to increase cache reservation after increasing cache capacity "
"and mitigating cache full error";
@ -259,10 +239,7 @@ TEST(CacheReservationManagerIncreaseReservcationOnFullCacheTest,
TEST_F(CacheReservationManagerTest,
DecreaseCacheReservationByMultiplesOfDummyEntrySize) {
std::size_t new_mem_used = 2 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
ASSERT_EQ(s, Status::OK());
ASSERT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
2 * kSizeDummyEntry);
@ -272,9 +249,7 @@ TEST_F(CacheReservationManagerTest,
2 * kSizeDummyEntry + kMetaDataChargeOverhead);
new_mem_used = 1 * kSizeDummyEntry;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to decrease cache reservation correctly";
EXPECT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
@ -292,10 +267,7 @@ TEST_F(CacheReservationManagerTest,
TEST_F(CacheReservationManagerTest,
DecreaseCacheReservationNotByMultiplesOfDummyEntrySize) {
std::size_t new_mem_used = 2 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
ASSERT_EQ(s, Status::OK());
ASSERT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
2 * kSizeDummyEntry);
@ -305,9 +277,7 @@ TEST_F(CacheReservationManagerTest,
2 * kSizeDummyEntry + kMetaDataChargeOverhead);
new_mem_used = kSizeDummyEntry / 2;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to decrease cache reservation correctly";
EXPECT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
@ -325,7 +295,7 @@ TEST_F(CacheReservationManagerTest,
TEST(CacheReservationManagerWithDelayedDecreaseTest,
DecreaseCacheReservationWithDelayedDecrease) {
constexpr std::size_t kSizeDummyEntry =
CacheReservationManager::GetDummyEntrySize();
CacheReservationManagerImpl<CacheEntryRole::kMisc>::GetDummyEntrySize();
constexpr std::size_t kCacheCapacity = 4096 * kSizeDummyEntry;
constexpr std::size_t kMetaDataChargeOverhead = 10000;
@ -333,14 +303,12 @@ TEST(CacheReservationManagerWithDelayedDecreaseTest,
lo.capacity = kCacheCapacity;
lo.num_shard_bits = 0;
std::shared_ptr<Cache> cache = NewLRUCache(lo);
std::unique_ptr<CacheReservationManager> test_cache_rev_mng(
new CacheReservationManager(cache, true /* delayed_decrease */));
std::shared_ptr<CacheReservationManager> test_cache_rev_mng =
std::make_shared<CacheReservationManagerImpl<CacheEntryRole::kMisc>>(
cache, true /* delayed_decrease */);
std::size_t new_mem_used = 8 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
ASSERT_EQ(s, Status::OK());
ASSERT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
8 * kSizeDummyEntry);
@ -351,9 +319,7 @@ TEST(CacheReservationManagerWithDelayedDecreaseTest,
8 * kSizeDummyEntry + kMetaDataChargeOverhead);
new_mem_used = 6 * kSizeDummyEntry;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK()) << "Failed to delay decreasing cache reservation";
EXPECT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
8 * kSizeDummyEntry)
@ -365,9 +331,7 @@ TEST(CacheReservationManagerWithDelayedDecreaseTest,
<< "Failed to delay decreasing underlying dummy entries in cache";
new_mem_used = 7 * kSizeDummyEntry;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK()) << "Failed to delay decreasing cache reservation";
EXPECT_EQ(test_cache_rev_mng->GetTotalReservedCacheSize(),
8 * kSizeDummyEntry)
@ -379,9 +343,7 @@ TEST(CacheReservationManagerWithDelayedDecreaseTest,
<< "Failed to delay decreasing underlying dummy entries in cache";
new_mem_used = 6 * kSizeDummyEntry - 1;
s = test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
EXPECT_EQ(s, Status::OK())
<< "Failed to decrease cache reservation correctly when new_mem_used < "
"GetTotalReservedCacheSize() * 3 / 4 on delayed decrease mode";
@ -405,7 +367,7 @@ TEST(CacheReservationManagerWithDelayedDecreaseTest,
TEST(CacheReservationManagerDestructorTest,
ReleaseRemainingDummyEntriesOnDestruction) {
constexpr std::size_t kSizeDummyEntry =
CacheReservationManager::GetDummyEntrySize();
CacheReservationManagerImpl<CacheEntryRole::kMisc>::GetDummyEntrySize();
constexpr std::size_t kCacheCapacity = 4096 * kSizeDummyEntry;
constexpr std::size_t kMetaDataChargeOverhead = 10000;
@ -414,13 +376,11 @@ TEST(CacheReservationManagerDestructorTest,
lo.num_shard_bits = 0;
std::shared_ptr<Cache> cache = NewLRUCache(lo);
{
std::unique_ptr<CacheReservationManager> test_cache_rev_mng(
new CacheReservationManager(cache));
std::shared_ptr<CacheReservationManager> test_cache_rev_mng =
std::make_shared<CacheReservationManagerImpl<CacheEntryRole::kMisc>>(
cache);
std::size_t new_mem_used = 1 * kSizeDummyEntry;
Status s =
test_cache_rev_mng
->UpdateCacheReservation<ROCKSDB_NAMESPACE::CacheEntryRole::kMisc>(
new_mem_used);
Status s = test_cache_rev_mng->UpdateCacheReservation(new_mem_used);
ASSERT_EQ(s, Status::OK());
ASSERT_GE(cache->GetPinnedUsage(), 1 * kSizeDummyEntry);
ASSERT_LT(cache->GetPinnedUsage(),
@ -442,18 +402,19 @@ TEST(CacheReservationHandleTest, HandleTest) {
std::shared_ptr<Cache> cache = NewLRUCache(lo);
std::shared_ptr<CacheReservationManager> test_cache_rev_mng(
std::make_shared<CacheReservationManager>(cache));
std::make_shared<CacheReservationManagerImpl<CacheEntryRole::kMisc>>(
cache));
std::size_t mem_used = 0;
const std::size_t incremental_mem_used_handle_1 = 1 * kSizeDummyEntry;
const std::size_t incremental_mem_used_handle_2 = 2 * kSizeDummyEntry;
std::unique_ptr<CacheReservationHandle<CacheEntryRole::kMisc>> handle_1,
std::unique_ptr<CacheReservationManager::CacheReservationHandle> handle_1,
handle_2;
// To test consecutive CacheReservationManager::MakeCacheReservation works
// correctly in terms of returning the handle as well as updating cache
// reservation and the latest total memory used
Status s = test_cache_rev_mng->MakeCacheReservation<CacheEntryRole::kMisc>(
Status s = test_cache_rev_mng->MakeCacheReservation(
incremental_mem_used_handle_1, &handle_1);
mem_used = mem_used + incremental_mem_used_handle_1;
ASSERT_EQ(s, Status::OK());
@ -463,8 +424,8 @@ TEST(CacheReservationHandleTest, HandleTest) {
EXPECT_GE(cache->GetPinnedUsage(), mem_used);
EXPECT_LT(cache->GetPinnedUsage(), mem_used + kMetaDataChargeOverhead);
s = test_cache_rev_mng->MakeCacheReservation<CacheEntryRole::kMisc>(
incremental_mem_used_handle_2, &handle_2);
s = test_cache_rev_mng->MakeCacheReservation(incremental_mem_used_handle_2,
&handle_2);
mem_used = mem_used + incremental_mem_used_handle_2;
ASSERT_EQ(s, Status::OK());
EXPECT_TRUE(handle_2 != nullptr);
@ -473,8 +434,9 @@ TEST(CacheReservationHandleTest, HandleTest) {
EXPECT_GE(cache->GetPinnedUsage(), mem_used);
EXPECT_LT(cache->GetPinnedUsage(), mem_used + kMetaDataChargeOverhead);
// To test CacheReservationHandle::~CacheReservationHandle() works correctly
// in releasing the cache reserved for the handle
// To test
// CacheReservationManager::CacheReservationHandle::~CacheReservationHandle()
// works correctly in releasing the cache reserved for the handle
handle_1.reset();
EXPECT_TRUE(handle_1 == nullptr);
mem_used = mem_used - incremental_mem_used_handle_1;

46
cache/cache_test.cc vendored
View File

@ -14,7 +14,9 @@
#include <iostream>
#include <string>
#include <vector>
#include "cache/clock_cache.h"
#include "cache/fast_lru_cache.h"
#include "cache/lru_cache.h"
#include "test_util/testharness.h"
#include "util/coding.h"
@ -39,6 +41,7 @@ static int DecodeValue(void* v) {
const std::string kLRU = "lru";
const std::string kClock = "clock";
const std::string kFast = "fast";
void dumbDeleter(const Slice& /*key*/, void* /*value*/) {}
@ -83,6 +86,9 @@ class CacheTest : public testing::TestWithParam<std::string> {
if (type == kClock) {
return NewClockCache(capacity);
}
if (type == kFast) {
return NewFastLRUCache(capacity);
}
return nullptr;
}
@ -103,6 +109,10 @@ class CacheTest : public testing::TestWithParam<std::string> {
return NewClockCache(capacity, num_shard_bits, strict_capacity_limit,
charge_policy);
}
if (type == kFast) {
return NewFastLRUCache(capacity, num_shard_bits, strict_capacity_limit,
charge_policy);
}
return nullptr;
}
@ -183,7 +193,7 @@ TEST_P(CacheTest, UsageTest) {
// make sure the cache will be overloaded
for (uint64_t i = 1; i < kCapacity; ++i) {
auto key = ToString(i);
auto key = std::to_string(i);
ASSERT_OK(cache->Insert(key, reinterpret_cast<void*>(value), key.size() + 5,
dumbDeleter));
ASSERT_OK(precise_cache->Insert(key, reinterpret_cast<void*>(value),
@ -255,7 +265,7 @@ TEST_P(CacheTest, PinnedUsageTest) {
// check that overloading the cache does not change the pinned usage
for (uint64_t i = 1; i < 2 * kCapacity; ++i) {
auto key = ToString(i);
auto key = std::to_string(i);
ASSERT_OK(cache->Insert(key, reinterpret_cast<void*>(value), key.size() + 5,
dumbDeleter));
ASSERT_OK(precise_cache->Insert(key, reinterpret_cast<void*>(value),
@ -575,7 +585,7 @@ TEST_P(CacheTest, SetCapacity) {
std::vector<Cache::Handle*> handles(10);
// Insert 5 entries, but not releasing.
for (size_t i = 0; i < 5; i++) {
std::string key = ToString(i+1);
std::string key = std::to_string(i + 1);
Status s = cache->Insert(key, new Value(i + 1), 1, &deleter, &handles[i]);
ASSERT_TRUE(s.ok());
}
@ -590,7 +600,7 @@ TEST_P(CacheTest, SetCapacity) {
// then decrease capacity to 7, final capacity should be 7
// and usage should be 7
for (size_t i = 5; i < 10; i++) {
std::string key = ToString(i+1);
std::string key = std::to_string(i + 1);
Status s = cache->Insert(key, new Value(i + 1), 1, &deleter, &handles[i]);
ASSERT_TRUE(s.ok());
}
@ -621,7 +631,7 @@ TEST_P(LRUCacheTest, SetStrictCapacityLimit) {
std::vector<Cache::Handle*> handles(10);
Status s;
for (size_t i = 0; i < 10; i++) {
std::string key = ToString(i + 1);
std::string key = std::to_string(i + 1);
s = cache->Insert(key, new Value(i + 1), 1, &deleter, &handles[i]);
ASSERT_OK(s);
ASSERT_NE(nullptr, handles[i]);
@ -645,7 +655,7 @@ TEST_P(LRUCacheTest, SetStrictCapacityLimit) {
// test3: init with flag being true.
std::shared_ptr<Cache> cache2 = NewCache(5, 0, true);
for (size_t i = 0; i < 5; i++) {
std::string key = ToString(i + 1);
std::string key = std::to_string(i + 1);
s = cache2->Insert(key, new Value(i + 1), 1, &deleter, &handles[i]);
ASSERT_OK(s);
ASSERT_NE(nullptr, handles[i]);
@ -675,14 +685,14 @@ TEST_P(CacheTest, OverCapacity) {
// Insert n+1 entries, but not releasing.
for (size_t i = 0; i < n + 1; i++) {
std::string key = ToString(i+1);
std::string key = std::to_string(i + 1);
Status s = cache->Insert(key, new Value(i + 1), 1, &deleter, &handles[i]);
ASSERT_TRUE(s.ok());
}
// Guess what's in the cache now?
for (size_t i = 0; i < n + 1; i++) {
std::string key = ToString(i+1);
std::string key = std::to_string(i + 1);
auto h = cache->Lookup(key);
ASSERT_TRUE(h != nullptr);
if (h) cache->Release(h);
@ -703,7 +713,7 @@ TEST_P(CacheTest, OverCapacity) {
// This is consistent with the LRU policy since the element 0
// was released first
for (size_t i = 0; i < n + 1; i++) {
std::string key = ToString(i+1);
std::string key = std::to_string(i + 1);
auto h = cache->Lookup(key);
if (h) {
ASSERT_NE(i, 0U);
@ -744,9 +754,9 @@ TEST_P(CacheTest, ApplyToAllEntriesTest) {
std::vector<std::string> callback_state;
const auto callback = [&](const Slice& key, void* value, size_t charge,
Cache::DeleterFn deleter) {
callback_state.push_back(ToString(DecodeKey(key)) + "," +
ToString(DecodeValue(value)) + "," +
ToString(charge));
callback_state.push_back(std::to_string(DecodeKey(key)) + "," +
std::to_string(DecodeValue(value)) + "," +
std::to_string(charge));
assert(deleter == &CacheTest::Deleter);
};
@ -755,8 +765,8 @@ TEST_P(CacheTest, ApplyToAllEntriesTest) {
for (int i = 0; i < 10; ++i) {
Insert(i, i * 2, i + 1);
inserted.push_back(ToString(i) + "," + ToString(i * 2) + "," +
ToString(i + 1));
inserted.push_back(std::to_string(i) + "," + std::to_string(i * 2) + "," +
std::to_string(i + 1));
}
cache_->ApplyToAllEntries(callback, /*opts*/ {});
@ -838,11 +848,13 @@ TEST_P(CacheTest, GetChargeAndDeleter) {
std::shared_ptr<Cache> (*new_clock_cache_func)(
size_t, int, bool, CacheMetadataChargePolicy) = NewClockCache;
INSTANTIATE_TEST_CASE_P(CacheTestInstance, CacheTest,
testing::Values(kLRU, kClock));
testing::Values(kLRU, kClock, kFast));
#else
INSTANTIATE_TEST_CASE_P(CacheTestInstance, CacheTest, testing::Values(kLRU));
INSTANTIATE_TEST_CASE_P(CacheTestInstance, CacheTest,
testing::Values(kLRU, kFast));
#endif // SUPPORT_CLOCK_CACHE
INSTANTIATE_TEST_CASE_P(CacheTestInstance, LRUCacheTest, testing::Values(kLRU));
INSTANTIATE_TEST_CASE_P(CacheTestInstance, LRUCacheTest,
testing::Values(kLRU, kFast));
} // namespace ROCKSDB_NAMESPACE

10
cache/clock_cache.cc vendored
View File

@ -286,8 +286,8 @@ class ClockCacheShard final : public CacheShard {
return Lookup(key, hash);
}
bool Release(Cache::Handle* handle, bool /*useful*/,
bool force_erase) override {
return Release(handle, force_erase);
bool erase_if_last_ref) override {
return Release(handle, erase_if_last_ref);
}
bool IsReady(Cache::Handle* /*handle*/) override { return true; }
void Wait(Cache::Handle* /*handle*/) override {}
@ -297,7 +297,7 @@ class ClockCacheShard final : public CacheShard {
//
// Not necessary to hold mutex_ before being called.
bool Ref(Cache::Handle* handle) override;
bool Release(Cache::Handle* handle, bool force_erase = false) override;
bool Release(Cache::Handle* handle, bool erase_if_last_ref = false) override;
void Erase(const Slice& key, uint32_t hash) override;
bool EraseAndConfirm(const Slice& key, uint32_t hash,
CleanupContext* context);
@ -725,11 +725,11 @@ Cache::Handle* ClockCacheShard::Lookup(const Slice& key, uint32_t hash) {
return reinterpret_cast<Cache::Handle*>(handle);
}
bool ClockCacheShard::Release(Cache::Handle* h, bool force_erase) {
bool ClockCacheShard::Release(Cache::Handle* h, bool erase_if_last_ref) {
CleanupContext context;
CacheHandle* handle = reinterpret_cast<CacheHandle*>(h);
bool erased = Unref(handle, true, &context);
if (force_erase && !erased) {
if (erase_if_last_ref && !erased) {
erased = EraseAndConfirm(handle->key, handle->hash, &context);
}
Cleanup(context);

View File

@ -3,7 +3,7 @@
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#include "cache/lru_secondary_cache.h"
#include "cache/compressed_secondary_cache.h"
#include <memory>
@ -22,7 +22,7 @@ void DeletionCallback(const Slice& /*key*/, void* obj) {
} // namespace
LRUSecondaryCache::LRUSecondaryCache(
CompressedSecondaryCache::CompressedSecondaryCache(
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
double high_pri_pool_ratio,
std::shared_ptr<MemoryAllocator> memory_allocator, bool use_adaptive_mutex,
@ -37,11 +37,13 @@ LRUSecondaryCache::LRUSecondaryCache(
use_adaptive_mutex, metadata_charge_policy);
}
LRUSecondaryCache::~LRUSecondaryCache() { cache_.reset(); }
CompressedSecondaryCache::~CompressedSecondaryCache() { cache_.reset(); }
std::unique_ptr<SecondaryCacheResultHandle> LRUSecondaryCache::Lookup(
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/) {
std::unique_ptr<SecondaryCacheResultHandle> CompressedSecondaryCache::Lookup(
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/,
bool& is_in_sec_cache) {
std::unique_ptr<SecondaryCacheResultHandle> handle;
is_in_sec_cache = false;
Cache::Handle* lru_handle = cache_->Lookup(key);
if (lru_handle == nullptr) {
return handle;
@ -69,24 +71,25 @@ std::unique_ptr<SecondaryCacheResultHandle> LRUSecondaryCache::Lookup(
cache_options_.memory_allocator.get());
if (!uncompressed) {
cache_->Release(lru_handle, true);
cache_->Release(lru_handle, /* erase_if_last_ref */ true);
return handle;
}
s = create_cb(uncompressed.get(), uncompressed_size, &value, &charge);
}
if (!s.ok()) {
cache_->Release(lru_handle, true);
cache_->Release(lru_handle, /* erase_if_last_ref */ true);
return handle;
}
handle.reset(new LRUSecondaryCacheResultHandle(value, charge));
cache_->Release(lru_handle);
cache_->Release(lru_handle, /* erase_if_last_ref */ true);
handle.reset(new CompressedSecondaryCacheResultHandle(value, charge));
return handle;
}
Status LRUSecondaryCache::Insert(const Slice& key, void* value,
const Cache::CacheItemHelper* helper) {
Status CompressedSecondaryCache::Insert(const Slice& key, void* value,
const Cache::CacheItemHelper* helper) {
size_t size = (*helper->size_cb)(value);
CacheAllocationPtr ptr =
AllocateBlock(size, cache_options_.memory_allocator.get());
@ -125,9 +128,9 @@ Status LRUSecondaryCache::Insert(const Slice& key, void* value,
return cache_->Insert(key, buf, size, DeletionCallback);
}
void LRUSecondaryCache::Erase(const Slice& key) { cache_->Erase(key); }
void CompressedSecondaryCache::Erase(const Slice& key) { cache_->Erase(key); }
std::string LRUSecondaryCache::GetPrintableOptions() const {
std::string CompressedSecondaryCache::GetPrintableOptions() const {
std::string ret;
ret.reserve(20000);
const int kBufferSize = 200;
@ -142,23 +145,23 @@ std::string LRUSecondaryCache::GetPrintableOptions() const {
return ret;
}
std::shared_ptr<SecondaryCache> NewLRUSecondaryCache(
std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache(
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
double high_pri_pool_ratio,
std::shared_ptr<MemoryAllocator> memory_allocator, bool use_adaptive_mutex,
CacheMetadataChargePolicy metadata_charge_policy,
CompressionType compression_type, uint32_t compress_format_version) {
return std::make_shared<LRUSecondaryCache>(
return std::make_shared<CompressedSecondaryCache>(
capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio,
memory_allocator, use_adaptive_mutex, metadata_charge_policy,
compression_type, compress_format_version);
}
std::shared_ptr<SecondaryCache> NewLRUSecondaryCache(
const LRUSecondaryCacheOptions& opts) {
std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache(
const CompressedSecondaryCacheOptions& opts) {
// The secondary_cache is disabled for this LRUCache instance.
assert(opts.secondary_cache == nullptr);
return NewLRUSecondaryCache(
return NewCompressedSecondaryCache(
opts.capacity, opts.num_shard_bits, opts.strict_capacity_limit,
opts.high_pri_pool_ratio, opts.memory_allocator, opts.use_adaptive_mutex,
opts.metadata_charge_policy, opts.compression_type,

View File

@ -16,15 +16,16 @@
namespace ROCKSDB_NAMESPACE {
class LRUSecondaryCacheResultHandle : public SecondaryCacheResultHandle {
class CompressedSecondaryCacheResultHandle : public SecondaryCacheResultHandle {
public:
LRUSecondaryCacheResultHandle(void* value, size_t size)
CompressedSecondaryCacheResultHandle(void* value, size_t size)
: value_(value), size_(size) {}
virtual ~LRUSecondaryCacheResultHandle() override = default;
virtual ~CompressedSecondaryCacheResultHandle() override = default;
LRUSecondaryCacheResultHandle(const LRUSecondaryCacheResultHandle&) = delete;
LRUSecondaryCacheResultHandle& operator=(
const LRUSecondaryCacheResultHandle&) = delete;
CompressedSecondaryCacheResultHandle(
const CompressedSecondaryCacheResultHandle&) = delete;
CompressedSecondaryCacheResultHandle& operator=(
const CompressedSecondaryCacheResultHandle&) = delete;
bool IsReady() override { return true; }
@ -39,19 +40,19 @@ class LRUSecondaryCacheResultHandle : public SecondaryCacheResultHandle {
size_t size_;
};
// The LRUSecondaryCache is a concrete implementation of
// The CompressedSecondaryCache is a concrete implementation of
// rocksdb::SecondaryCache.
//
// Users can also cast a pointer to it and call methods on
// it directly, especially custom methods that may be added
// in the future. For example -
// std::unique_ptr<rocksdb::SecondaryCache> cache =
// NewLRUSecondaryCache(opts);
// static_cast<LRUSecondaryCache*>(cache.get())->Erase(key);
// NewCompressedSecondaryCache(opts);
// static_cast<CompressedSecondaryCache*>(cache.get())->Erase(key);
class LRUSecondaryCache : public SecondaryCache {
class CompressedSecondaryCache : public SecondaryCache {
public:
LRUSecondaryCache(
CompressedSecondaryCache(
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
double high_pri_pool_ratio,
std::shared_ptr<MemoryAllocator> memory_allocator = nullptr,
@ -60,16 +61,16 @@ class LRUSecondaryCache : public SecondaryCache {
kDontChargeCacheMetadata,
CompressionType compression_type = CompressionType::kLZ4Compression,
uint32_t compress_format_version = 2);
virtual ~LRUSecondaryCache() override;
virtual ~CompressedSecondaryCache() override;
const char* Name() const override { return "LRUSecondaryCache"; }
const char* Name() const override { return "CompressedSecondaryCache"; }
Status Insert(const Slice& key, void* value,
const Cache::CacheItemHelper* helper) override;
std::unique_ptr<SecondaryCacheResultHandle> Lookup(
const Slice& key, const Cache::CreateCallback& create_cb,
bool /*wait*/) override;
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/,
bool& is_in_sec_cache) override;
void Erase(const Slice& key) override;
@ -79,7 +80,7 @@ class LRUSecondaryCache : public SecondaryCache {
private:
std::shared_ptr<Cache> cache_;
LRUSecondaryCacheOptions cache_options_;
CompressedSecondaryCacheOptions cache_options_;
};
} // namespace ROCKSDB_NAMESPACE

View File

@ -3,7 +3,7 @@
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#include "cache/lru_secondary_cache.h"
#include "cache/compressed_secondary_cache.h"
#include <algorithm>
#include <cstdint>
@ -17,10 +17,10 @@
namespace ROCKSDB_NAMESPACE {
class LRUSecondaryCacheTest : public testing::Test {
class CompressedSecondaryCacheTest : public testing::Test {
public:
LRUSecondaryCacheTest() : fail_create_(false) {}
~LRUSecondaryCacheTest() {}
CompressedSecondaryCacheTest() : fail_create_(false) {}
~CompressedSecondaryCacheTest() {}
protected:
class TestItem {
@ -80,7 +80,7 @@ class LRUSecondaryCacheTest : public testing::Test {
void SetFailCreate(bool fail) { fail_create_ = fail; }
void BasicTest(bool sec_cache_is_compressed, bool use_jemalloc) {
LRUSecondaryCacheOptions opts;
CompressedSecondaryCacheOptions opts;
opts.capacity = 2048;
opts.num_shard_bits = 0;
opts.metadata_charge_policy = kDontChargeCacheMetadata;
@ -107,11 +107,13 @@ class LRUSecondaryCacheTest : public testing::Test {
ROCKSDB_GTEST_BYPASS("JEMALLOC not supported");
}
}
std::shared_ptr<SecondaryCache> cache = NewLRUSecondaryCache(opts);
std::shared_ptr<SecondaryCache> sec_cache =
NewCompressedSecondaryCache(opts);
bool is_in_sec_cache{true};
// Lookup an non-existent key.
std::unique_ptr<SecondaryCacheResultHandle> handle0 =
cache->Lookup("k0", test_item_creator, true);
sec_cache->Lookup("k0", test_item_creator, true, is_in_sec_cache);
ASSERT_EQ(handle0, nullptr);
Random rnd(301);
@ -119,51 +121,47 @@ class LRUSecondaryCacheTest : public testing::Test {
std::string str1;
test::CompressibleString(&rnd, 0.25, 1000, &str1);
TestItem item1(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", &item1, &LRUSecondaryCacheTest::helper_));
ASSERT_OK(sec_cache->Insert("k1", &item1,
&CompressedSecondaryCacheTest::helper_));
std::unique_ptr<SecondaryCacheResultHandle> handle1 =
cache->Lookup("k1", test_item_creator, true);
sec_cache->Lookup("k1", test_item_creator, true, is_in_sec_cache);
ASSERT_NE(handle1, nullptr);
// delete reinterpret_cast<TestItem*>(handle1->Value());
ASSERT_FALSE(is_in_sec_cache);
std::unique_ptr<TestItem> val1 =
std::unique_ptr<TestItem>(static_cast<TestItem*>(handle1->Value()));
ASSERT_NE(val1, nullptr);
ASSERT_EQ(memcmp(val1->Buf(), item1.Buf(), item1.Size()), 0);
// Lookup the first item again.
std::unique_ptr<SecondaryCacheResultHandle> handle1_1 =
sec_cache->Lookup("k1", test_item_creator, true, is_in_sec_cache);
ASSERT_EQ(handle1_1, nullptr);
// Insert and Lookup the second item.
std::string str2;
test::CompressibleString(&rnd, 0.5, 1000, &str2);
TestItem item2(str2.data(), str2.length());
ASSERT_OK(cache->Insert("k2", &item2, &LRUSecondaryCacheTest::helper_));
ASSERT_OK(sec_cache->Insert("k2", &item2,
&CompressedSecondaryCacheTest::helper_));
std::unique_ptr<SecondaryCacheResultHandle> handle2 =
cache->Lookup("k2", test_item_creator, true);
sec_cache->Lookup("k2", test_item_creator, true, is_in_sec_cache);
ASSERT_NE(handle2, nullptr);
std::unique_ptr<TestItem> val2 =
std::unique_ptr<TestItem>(static_cast<TestItem*>(handle2->Value()));
ASSERT_NE(val2, nullptr);
ASSERT_EQ(memcmp(val2->Buf(), item2.Buf(), item2.Size()), 0);
// Lookup the first item again to make sure it is still in the cache.
std::unique_ptr<SecondaryCacheResultHandle> handle1_1 =
cache->Lookup("k1", test_item_creator, true);
ASSERT_NE(handle1_1, nullptr);
std::unique_ptr<TestItem> val1_1 =
std::unique_ptr<TestItem>(static_cast<TestItem*>(handle1_1->Value()));
ASSERT_NE(val1_1, nullptr);
ASSERT_EQ(memcmp(val1_1->Buf(), item1.Buf(), item1.Size()), 0);
std::vector<SecondaryCacheResultHandle*> handles = {handle1.get(),
handle2.get()};
cache->WaitAll(handles);
sec_cache->WaitAll(handles);
cache->Erase("k1");
handle1 = cache->Lookup("k1", test_item_creator, true);
ASSERT_EQ(handle1, nullptr);
cache.reset();
sec_cache.reset();
}
void FailsTest(bool sec_cache_is_compressed) {
LRUSecondaryCacheOptions secondary_cache_opts;
CompressedSecondaryCacheOptions secondary_cache_opts;
if (sec_cache_is_compressed) {
if (!LZ4_Supported()) {
ROCKSDB_GTEST_SKIP("This test requires LZ4 support.");
@ -176,32 +174,28 @@ class LRUSecondaryCacheTest : public testing::Test {
secondary_cache_opts.capacity = 1100;
secondary_cache_opts.num_shard_bits = 0;
secondary_cache_opts.metadata_charge_policy = kDontChargeCacheMetadata;
std::shared_ptr<SecondaryCache> cache =
NewLRUSecondaryCache(secondary_cache_opts);
std::shared_ptr<SecondaryCache> sec_cache =
NewCompressedSecondaryCache(secondary_cache_opts);
// Insert and Lookup the first item.
Random rnd(301);
std::string str1(rnd.RandomString(1000));
TestItem item1(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", &item1, &LRUSecondaryCacheTest::helper_));
std::unique_ptr<SecondaryCacheResultHandle> handle1 =
cache->Lookup("k1", test_item_creator, true);
ASSERT_NE(handle1, nullptr);
std::unique_ptr<TestItem> val1 =
std::unique_ptr<TestItem>(static_cast<TestItem*>(handle1->Value()));
ASSERT_NE(val1, nullptr);
ASSERT_EQ(memcmp(val1->Buf(), item1.Buf(), item1.Size()), 0);
ASSERT_OK(sec_cache->Insert("k1", &item1,
&CompressedSecondaryCacheTest::helper_));
// Insert and Lookup the second item.
std::string str2(rnd.RandomString(200));
TestItem item2(str2.data(), str2.length());
// k1 is evicted.
ASSERT_OK(cache->Insert("k2", &item2, &LRUSecondaryCacheTest::helper_));
ASSERT_OK(sec_cache->Insert("k2", &item2,
&CompressedSecondaryCacheTest::helper_));
bool is_in_sec_cache{false};
std::unique_ptr<SecondaryCacheResultHandle> handle1_1 =
cache->Lookup("k1", test_item_creator, true);
sec_cache->Lookup("k1", test_item_creator, true, is_in_sec_cache);
ASSERT_EQ(handle1_1, nullptr);
std::unique_ptr<SecondaryCacheResultHandle> handle2 =
cache->Lookup("k2", test_item_creator, true);
sec_cache->Lookup("k2", test_item_creator, true, is_in_sec_cache);
ASSERT_NE(handle2, nullptr);
std::unique_ptr<TestItem> val2 =
std::unique_ptr<TestItem>(static_cast<TestItem*>(handle2->Value()));
@ -211,20 +205,20 @@ class LRUSecondaryCacheTest : public testing::Test {
// Create Fails.
SetFailCreate(true);
std::unique_ptr<SecondaryCacheResultHandle> handle2_1 =
cache->Lookup("k2", test_item_creator, true);
sec_cache->Lookup("k2", test_item_creator, true, is_in_sec_cache);
ASSERT_EQ(handle2_1, nullptr);
// Save Fails.
std::string str3 = rnd.RandomString(10);
TestItem item3(str3.data(), str3.length());
ASSERT_NOK(
cache->Insert("k3", &item3, &LRUSecondaryCacheTest::helper_fail_));
ASSERT_NOK(sec_cache->Insert("k3", &item3,
&CompressedSecondaryCacheTest::helper_fail_));
cache.reset();
sec_cache.reset();
}
void BasicIntegrationTest(bool sec_cache_is_compressed) {
LRUSecondaryCacheOptions secondary_cache_opts;
CompressedSecondaryCacheOptions secondary_cache_opts;
if (sec_cache_is_compressed) {
if (!LZ4_Supported()) {
@ -239,7 +233,7 @@ class LRUSecondaryCacheTest : public testing::Test {
secondary_cache_opts.num_shard_bits = 0;
secondary_cache_opts.metadata_charge_policy = kDontChargeCacheMetadata;
std::shared_ptr<SecondaryCache> secondary_cache =
NewLRUSecondaryCache(secondary_cache_opts);
NewCompressedSecondaryCache(secondary_cache_opts);
LRUCacheOptions lru_cache_opts(1024, 0, false, 0.5, nullptr,
kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
@ -252,26 +246,26 @@ class LRUSecondaryCacheTest : public testing::Test {
std::string str1 = rnd.RandomString(1010);
std::string str1_clone{str1};
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1, &CompressedSecondaryCacheTest::helper_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// After Insert, lru cache contains k2 and secondary cache contains k1.
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k2", item2, &CompressedSecondaryCacheTest::helper_,
str2.length()));
std::string str3 = rnd.RandomString(1020);
TestItem* item3 = new TestItem(str3.data(), str3.length());
// After Insert, lru cache contains k3 and secondary cache contains k1 and
// k2
ASSERT_OK(cache->Insert("k3", item3, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k3", item3, &CompressedSecondaryCacheTest::helper_,
str3.length()));
Cache::Handle* handle;
handle =
cache->Lookup("k3", &LRUSecondaryCacheTest::helper_, test_item_creator,
Cache::Priority::LOW, true, stats.get());
handle = cache->Lookup("k3", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true,
stats.get());
ASSERT_NE(handle, nullptr);
TestItem* val3 = static_cast<TestItem*>(cache->Value(handle));
ASSERT_NE(val3, nullptr);
@ -279,34 +273,35 @@ class LRUSecondaryCacheTest : public testing::Test {
cache->Release(handle);
// Lookup an non-existent key.
handle =
cache->Lookup("k0", &LRUSecondaryCacheTest::helper_, test_item_creator,
Cache::Priority::LOW, true, stats.get());
handle = cache->Lookup("k0", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true,
stats.get());
ASSERT_EQ(handle, nullptr);
// This Lookup should promote k1 and demote k3, so k2 is evicted from the
// secondary cache. The lru cache contains k1 and secondary cache contains
// k3. item1 was Free(), so it cannot be compared against the item1.
handle =
cache->Lookup("k1", &LRUSecondaryCacheTest::helper_, test_item_creator,
Cache::Priority::LOW, true, stats.get());
// This Lookup should promote k1 and erase k1 from the secondary cache,
// then k3 is demoted. So k2 and k3 are in the secondary cache.
handle = cache->Lookup("k1", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true,
stats.get());
ASSERT_NE(handle, nullptr);
TestItem* val1_1 = static_cast<TestItem*>(cache->Value(handle));
ASSERT_NE(val1_1, nullptr);
ASSERT_EQ(memcmp(val1_1->Buf(), str1_clone.data(), str1_clone.size()), 0);
cache->Release(handle);
handle =
cache->Lookup("k2", &LRUSecondaryCacheTest::helper_, test_item_creator,
Cache::Priority::LOW, true, stats.get());
ASSERT_EQ(handle, nullptr);
handle = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true,
stats.get());
ASSERT_NE(handle, nullptr);
cache->Release(handle);
cache.reset();
secondary_cache.reset();
}
void BasicIntegrationFailTest(bool sec_cache_is_compressed) {
LRUSecondaryCacheOptions secondary_cache_opts;
CompressedSecondaryCacheOptions secondary_cache_opts;
if (sec_cache_is_compressed) {
if (!LZ4_Supported()) {
@ -321,7 +316,7 @@ class LRUSecondaryCacheTest : public testing::Test {
secondary_cache_opts.num_shard_bits = 0;
secondary_cache_opts.metadata_charge_policy = kDontChargeCacheMetadata;
std::shared_ptr<SecondaryCache> secondary_cache =
NewLRUSecondaryCache(secondary_cache_opts);
NewCompressedSecondaryCache(secondary_cache_opts);
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
@ -333,7 +328,8 @@ class LRUSecondaryCacheTest : public testing::Test {
auto item1 =
std::unique_ptr<TestItem>(new TestItem(str1.data(), str1.length()));
ASSERT_NOK(cache->Insert("k1", item1.get(), nullptr, str1.length()));
ASSERT_OK(cache->Insert("k1", item1.get(), &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1.get(),
&CompressedSecondaryCacheTest::helper_,
str1.length()));
item1.release(); // Appease clang-analyze "potential memory leak"
@ -341,7 +337,7 @@ class LRUSecondaryCacheTest : public testing::Test {
handle = cache->Lookup("k2", nullptr, test_item_creator,
Cache::Priority::LOW, true);
ASSERT_EQ(handle, nullptr);
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, false);
ASSERT_EQ(handle, nullptr);
@ -350,7 +346,7 @@ class LRUSecondaryCacheTest : public testing::Test {
}
void IntegrationSaveFailTest(bool sec_cache_is_compressed) {
LRUSecondaryCacheOptions secondary_cache_opts;
CompressedSecondaryCacheOptions secondary_cache_opts;
if (sec_cache_is_compressed) {
if (!LZ4_Supported()) {
@ -366,7 +362,7 @@ class LRUSecondaryCacheTest : public testing::Test {
secondary_cache_opts.metadata_charge_policy = kDontChargeCacheMetadata;
std::shared_ptr<SecondaryCache> secondary_cache =
NewLRUSecondaryCache(secondary_cache_opts);
NewCompressedSecondaryCache(secondary_cache_opts);
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
@ -376,25 +372,27 @@ class LRUSecondaryCacheTest : public testing::Test {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_fail_,
ASSERT_OK(cache->Insert("k1", item1,
&CompressedSecondaryCacheTest::helper_fail_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to the secondary cache.
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_fail_,
ASSERT_OK(cache->Insert("k2", item2,
&CompressedSecondaryCacheTest::helper_fail_,
str2.length()));
Cache::Handle* handle;
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_fail_,
handle = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_fail_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
// This lookup should fail, since k1 demotion would have failed
handle = cache->Lookup("k1", &LRUSecondaryCacheTest::helper_fail_,
handle = cache->Lookup("k1", &CompressedSecondaryCacheTest::helper_fail_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_EQ(handle, nullptr);
// Since k1 didn't get promoted, k2 should still be in cache
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_fail_,
handle = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_fail_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
@ -404,7 +402,7 @@ class LRUSecondaryCacheTest : public testing::Test {
}
void IntegrationCreateFailTest(bool sec_cache_is_compressed) {
LRUSecondaryCacheOptions secondary_cache_opts;
CompressedSecondaryCacheOptions secondary_cache_opts;
if (sec_cache_is_compressed) {
if (!LZ4_Supported()) {
@ -420,7 +418,7 @@ class LRUSecondaryCacheTest : public testing::Test {
secondary_cache_opts.metadata_charge_policy = kDontChargeCacheMetadata;
std::shared_ptr<SecondaryCache> secondary_cache =
NewLRUSecondaryCache(secondary_cache_opts);
NewCompressedSecondaryCache(secondary_cache_opts);
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
@ -430,27 +428,27 @@ class LRUSecondaryCacheTest : public testing::Test {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1, &CompressedSecondaryCacheTest::helper_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to the secondary cache.
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k2", item2, &CompressedSecondaryCacheTest::helper_,
str2.length()));
Cache::Handle* handle;
SetFailCreate(true);
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
// This lookup should fail, since k1 creation would have failed
handle = cache->Lookup("k1", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k1", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_EQ(handle, nullptr);
// Since k1 didn't get promoted, k2 should still be in cache
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
@ -460,7 +458,7 @@ class LRUSecondaryCacheTest : public testing::Test {
}
void IntegrationFullCapacityTest(bool sec_cache_is_compressed) {
LRUSecondaryCacheOptions secondary_cache_opts;
CompressedSecondaryCacheOptions secondary_cache_opts;
if (sec_cache_is_compressed) {
if (!LZ4_Supported()) {
@ -476,7 +474,7 @@ class LRUSecondaryCacheTest : public testing::Test {
secondary_cache_opts.metadata_charge_policy = kDontChargeCacheMetadata;
std::shared_ptr<SecondaryCache> secondary_cache =
NewLRUSecondaryCache(secondary_cache_opts);
NewCompressedSecondaryCache(secondary_cache_opts);
LRUCacheOptions opts(1024, 0, /*_strict_capacity_limit=*/true, 0.5, nullptr,
kDefaultToAdaptiveMutex, kDontChargeCacheMetadata);
@ -486,31 +484,32 @@ class LRUSecondaryCacheTest : public testing::Test {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1, &CompressedSecondaryCacheTest::helper_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to the secondary cache.
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k2", item2, &CompressedSecondaryCacheTest::helper_,
str2.length()));
Cache::Handle* handle;
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
// k1 promotion should fail due to the block cache being at capacity,
// but the lookup should still succeed
Cache::Handle* handle2;
handle2 = cache->Lookup("k1", &LRUSecondaryCacheTest::helper_,
handle2 = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle2, nullptr);
cache->Release(handle2);
// k1 promotion should fail due to the block cache being at capacity,
// but the lookup should still succeed
Cache::Handle* handle1;
handle1 = cache->Lookup("k1", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle1, nullptr);
cache->Release(handle1);
// Since k1 didn't get inserted, k2 should still be in cache
handle2 = cache->Lookup("k2", &CompressedSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle2, nullptr);
// Since k1 didn't get inserted, k2 should still be in cache
cache->Release(handle);
cache->Release(handle2);
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
cache.reset();
secondary_cache.reset();
@ -520,72 +519,83 @@ class LRUSecondaryCacheTest : public testing::Test {
bool fail_create_;
};
Cache::CacheItemHelper LRUSecondaryCacheTest::helper_(
LRUSecondaryCacheTest::SizeCallback, LRUSecondaryCacheTest::SaveToCallback,
LRUSecondaryCacheTest::DeletionCallback);
Cache::CacheItemHelper CompressedSecondaryCacheTest::helper_(
CompressedSecondaryCacheTest::SizeCallback,
CompressedSecondaryCacheTest::SaveToCallback,
CompressedSecondaryCacheTest::DeletionCallback);
Cache::CacheItemHelper LRUSecondaryCacheTest::helper_fail_(
LRUSecondaryCacheTest::SizeCallback,
LRUSecondaryCacheTest::SaveToCallbackFail,
LRUSecondaryCacheTest::DeletionCallback);
Cache::CacheItemHelper CompressedSecondaryCacheTest::helper_fail_(
CompressedSecondaryCacheTest::SizeCallback,
CompressedSecondaryCacheTest::SaveToCallbackFail,
CompressedSecondaryCacheTest::DeletionCallback);
TEST_F(LRUSecondaryCacheTest, BasicTestWithNoCompression) {
TEST_F(CompressedSecondaryCacheTest, BasicTestWithNoCompression) {
BasicTest(false, false);
}
TEST_F(LRUSecondaryCacheTest, BasicTestWithMemoryAllocatorAndNoCompression) {
TEST_F(CompressedSecondaryCacheTest,
BasicTestWithMemoryAllocatorAndNoCompression) {
BasicTest(false, true);
}
TEST_F(LRUSecondaryCacheTest, BasicTestWithCompression) {
TEST_F(CompressedSecondaryCacheTest, BasicTestWithCompression) {
BasicTest(true, false);
}
TEST_F(LRUSecondaryCacheTest, BasicTestWithMemoryAllocatorAndCompression) {
TEST_F(CompressedSecondaryCacheTest,
BasicTestWithMemoryAllocatorAndCompression) {
BasicTest(true, true);
}
TEST_F(LRUSecondaryCacheTest, FailsTestWithNoCompression) { FailsTest(false); }
TEST_F(CompressedSecondaryCacheTest, FailsTestWithNoCompression) {
FailsTest(false);
}
TEST_F(LRUSecondaryCacheTest, FailsTestWithCompression) { FailsTest(true); }
TEST_F(CompressedSecondaryCacheTest, FailsTestWithCompression) {
FailsTest(true);
}
TEST_F(LRUSecondaryCacheTest, BasicIntegrationTestWithNoCompression) {
TEST_F(CompressedSecondaryCacheTest, BasicIntegrationTestWithNoCompression) {
BasicIntegrationTest(false);
}
TEST_F(LRUSecondaryCacheTest, BasicIntegrationTestWithCompression) {
TEST_F(CompressedSecondaryCacheTest, BasicIntegrationTestWithCompression) {
BasicIntegrationTest(true);
}
TEST_F(LRUSecondaryCacheTest, BasicIntegrationFailTestWithNoCompression) {
TEST_F(CompressedSecondaryCacheTest,
BasicIntegrationFailTestWithNoCompression) {
BasicIntegrationFailTest(false);
}
TEST_F(LRUSecondaryCacheTest, BasicIntegrationFailTestWithCompression) {
TEST_F(CompressedSecondaryCacheTest, BasicIntegrationFailTestWithCompression) {
BasicIntegrationFailTest(true);
}
TEST_F(LRUSecondaryCacheTest, IntegrationSaveFailTestWithNoCompression) {
TEST_F(CompressedSecondaryCacheTest, IntegrationSaveFailTestWithNoCompression) {
IntegrationSaveFailTest(false);
}
TEST_F(LRUSecondaryCacheTest, IntegrationSaveFailTestWithCompression) {
TEST_F(CompressedSecondaryCacheTest, IntegrationSaveFailTestWithCompression) {
IntegrationSaveFailTest(true);
}
TEST_F(LRUSecondaryCacheTest, IntegrationCreateFailTestWithNoCompression) {
TEST_F(CompressedSecondaryCacheTest,
IntegrationCreateFailTestWithNoCompression) {
IntegrationCreateFailTest(false);
}
TEST_F(LRUSecondaryCacheTest, IntegrationCreateFailTestWithCompression) {
TEST_F(CompressedSecondaryCacheTest, IntegrationCreateFailTestWithCompression) {
IntegrationCreateFailTest(true);
}
TEST_F(LRUSecondaryCacheTest, IntegrationFullCapacityTestWithNoCompression) {
TEST_F(CompressedSecondaryCacheTest,
IntegrationFullCapacityTestWithNoCompression) {
IntegrationFullCapacityTest(false);
}
TEST_F(LRUSecondaryCacheTest, IntegrationFullCapacityTestWithCompression) {
TEST_F(CompressedSecondaryCacheTest,
IntegrationFullCapacityTestWithCompression) {
IntegrationFullCapacityTest(true);
}

511
cache/fast_lru_cache.cc vendored Normal file
View File

@ -0,0 +1,511 @@
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#include "cache/fast_lru_cache.h"
#include <cassert>
#include <cstdint>
#include <cstdio>
#include "monitoring/perf_context_imp.h"
#include "monitoring/statistics.h"
#include "port/lang.h"
#include "util/mutexlock.h"
namespace ROCKSDB_NAMESPACE {
namespace fast_lru_cache {
LRUHandleTable::LRUHandleTable(int max_upper_hash_bits)
: length_bits_(/* historical starting size*/ 4),
list_(new LRUHandle* [size_t{1} << length_bits_] {}),
elems_(0),
max_length_bits_(max_upper_hash_bits) {}
LRUHandleTable::~LRUHandleTable() {
ApplyToEntriesRange(
[](LRUHandle* h) {
if (!h->HasRefs()) {
h->Free();
}
},
0, uint32_t{1} << length_bits_);
}
LRUHandle* LRUHandleTable::Lookup(const Slice& key, uint32_t hash) {
return *FindPointer(key, hash);
}
LRUHandle* LRUHandleTable::Insert(LRUHandle* h) {
LRUHandle** ptr = FindPointer(h->key(), h->hash);
LRUHandle* old = *ptr;
h->next_hash = (old == nullptr ? nullptr : old->next_hash);
*ptr = h;
if (old == nullptr) {
++elems_;
if ((elems_ >> length_bits_) > 0) { // elems_ >= length
// Since each cache entry is fairly large, we aim for a small
// average linked list length (<= 1).
Resize();
}
}
return old;
}
LRUHandle* LRUHandleTable::Remove(const Slice& key, uint32_t hash) {
LRUHandle** ptr = FindPointer(key, hash);
LRUHandle* result = *ptr;
if (result != nullptr) {
*ptr = result->next_hash;
--elems_;
}
return result;
}
LRUHandle** LRUHandleTable::FindPointer(const Slice& key, uint32_t hash) {
LRUHandle** ptr = &list_[hash >> (32 - length_bits_)];
while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
ptr = &(*ptr)->next_hash;
}
return ptr;
}
void LRUHandleTable::Resize() {
if (length_bits_ >= max_length_bits_) {
// Due to reaching limit of hash information, if we made the table bigger,
// we would allocate more addresses but only the same number would be used.
return;
}
if (length_bits_ >= 31) {
// Avoid undefined behavior shifting uint32_t by 32.
return;
}
uint32_t old_length = uint32_t{1} << length_bits_;
int new_length_bits = length_bits_ + 1;
std::unique_ptr<LRUHandle* []> new_list {
new LRUHandle* [size_t{1} << new_length_bits] {}
};
uint32_t count = 0;
for (uint32_t i = 0; i < old_length; i++) {
LRUHandle* h = list_[i];
while (h != nullptr) {
LRUHandle* next = h->next_hash;
uint32_t hash = h->hash;
LRUHandle** ptr = &new_list[hash >> (32 - new_length_bits)];
h->next_hash = *ptr;
*ptr = h;
h = next;
count++;
}
}
assert(elems_ == count);
list_ = std::move(new_list);
length_bits_ = new_length_bits;
}
LRUCacheShard::LRUCacheShard(size_t capacity, bool strict_capacity_limit,
CacheMetadataChargePolicy metadata_charge_policy,
int max_upper_hash_bits)
: capacity_(0),
strict_capacity_limit_(strict_capacity_limit),
table_(max_upper_hash_bits),
usage_(0),
lru_usage_(0) {
set_metadata_charge_policy(metadata_charge_policy);
// Make empty circular linked list.
lru_.next = &lru_;
lru_.prev = &lru_;
lru_low_pri_ = &lru_;
SetCapacity(capacity);
}
void LRUCacheShard::EraseUnRefEntries() {
autovector<LRUHandle*> last_reference_list;
{
MutexLock l(&mutex_);
while (lru_.next != &lru_) {
LRUHandle* old = lru_.next;
// LRU list contains only elements which can be evicted.
assert(old->InCache() && !old->HasRefs());
LRU_Remove(old);
table_.Remove(old->key(), old->hash);
old->SetInCache(false);
size_t total_charge = old->CalcTotalCharge(metadata_charge_policy_);
assert(usage_ >= total_charge);
usage_ -= total_charge;
last_reference_list.push_back(old);
}
}
// Free the entries here outside of mutex for performance reasons.
for (auto entry : last_reference_list) {
entry->Free();
}
}
void LRUCacheShard::ApplyToSomeEntries(
const std::function<void(const Slice& key, void* value, size_t charge,
DeleterFn deleter)>& callback,
uint32_t average_entries_per_lock, uint32_t* state) {
// The state is essentially going to be the starting hash, which works
// nicely even if we resize between calls because we use upper-most
// hash bits for table indexes.
MutexLock l(&mutex_);
uint32_t length_bits = table_.GetLengthBits();
uint32_t length = uint32_t{1} << length_bits;
assert(average_entries_per_lock > 0);
// Assuming we are called with same average_entries_per_lock repeatedly,
// this simplifies some logic (index_end will not overflow).
assert(average_entries_per_lock < length || *state == 0);
uint32_t index_begin = *state >> (32 - length_bits);
uint32_t index_end = index_begin + average_entries_per_lock;
if (index_end >= length) {
// Going to end
index_end = length;
*state = UINT32_MAX;
} else {
*state = index_end << (32 - length_bits);
}
table_.ApplyToEntriesRange(
[callback](LRUHandle* h) {
callback(h->key(), h->value, h->charge, h->deleter);
},
index_begin, index_end);
}
void LRUCacheShard::LRU_Remove(LRUHandle* e) {
assert(e->next != nullptr);
assert(e->prev != nullptr);
e->next->prev = e->prev;
e->prev->next = e->next;
e->prev = e->next = nullptr;
size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
assert(lru_usage_ >= total_charge);
lru_usage_ -= total_charge;
}
void LRUCacheShard::LRU_Insert(LRUHandle* e) {
assert(e->next == nullptr);
assert(e->prev == nullptr);
size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
// Inset "e" to head of LRU list.
e->next = &lru_;
e->prev = lru_.prev;
e->prev->next = e;
e->next->prev = e;
lru_usage_ += total_charge;
}
void LRUCacheShard::EvictFromLRU(size_t charge,
autovector<LRUHandle*>* deleted) {
while ((usage_ + charge) > capacity_ && lru_.next != &lru_) {
LRUHandle* old = lru_.next;
// LRU list contains only elements which can be evicted.
assert(old->InCache() && !old->HasRefs());
LRU_Remove(old);
table_.Remove(old->key(), old->hash);
old->SetInCache(false);
size_t old_total_charge = old->CalcTotalCharge(metadata_charge_policy_);
assert(usage_ >= old_total_charge);
usage_ -= old_total_charge;
deleted->push_back(old);
}
}
void LRUCacheShard::SetCapacity(size_t capacity) {
autovector<LRUHandle*> last_reference_list;
{
MutexLock l(&mutex_);
capacity_ = capacity;
EvictFromLRU(0, &last_reference_list);
}
// Free the entries here outside of mutex for performance reasons.
for (auto entry : last_reference_list) {
entry->Free();
}
}
void LRUCacheShard::SetStrictCapacityLimit(bool strict_capacity_limit) {
MutexLock l(&mutex_);
strict_capacity_limit_ = strict_capacity_limit;
}
Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
bool free_handle_on_fail) {
Status s = Status::OK();
autovector<LRUHandle*> last_reference_list;
size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
{
MutexLock l(&mutex_);
// Free the space following strict LRU policy until enough space
// is freed or the lru list is empty.
EvictFromLRU(total_charge, &last_reference_list);
if ((usage_ + total_charge) > capacity_ &&
(strict_capacity_limit_ || handle == nullptr)) {
e->SetInCache(false);
if (handle == nullptr) {
// Don't insert the entry but still return ok, as if the entry inserted
// into cache and get evicted immediately.
last_reference_list.push_back(e);
} else {
if (free_handle_on_fail) {
delete[] reinterpret_cast<char*>(e);
*handle = nullptr;
}
s = Status::Incomplete("Insert failed due to LRU cache being full.");
}
} else {
// Insert into the cache. Note that the cache might get larger than its
// capacity if not enough space was freed up.
LRUHandle* old = table_.Insert(e);
usage_ += total_charge;
if (old != nullptr) {
s = Status::OkOverwritten();
assert(old->InCache());
old->SetInCache(false);
if (!old->HasRefs()) {
// old is on LRU because it's in cache and its reference count is 0.
LRU_Remove(old);
size_t old_total_charge =
old->CalcTotalCharge(metadata_charge_policy_);
assert(usage_ >= old_total_charge);
usage_ -= old_total_charge;
last_reference_list.push_back(old);
}
}
if (handle == nullptr) {
LRU_Insert(e);
} else {
// If caller already holds a ref, no need to take one here.
if (!e->HasRefs()) {
e->Ref();
}
*handle = reinterpret_cast<Cache::Handle*>(e);
}
}
}
// Free the entries here outside of mutex for performance reasons.
for (auto entry : last_reference_list) {
entry->Free();
}
return s;
}
Cache::Handle* LRUCacheShard::Lookup(const Slice& key, uint32_t hash) {
LRUHandle* e = nullptr;
{
MutexLock l(&mutex_);
e = table_.Lookup(key, hash);
if (e != nullptr) {
assert(e->InCache());
if (!e->HasRefs()) {
// The entry is in LRU since it's in hash and has no external references
LRU_Remove(e);
}
e->Ref();
}
}
return reinterpret_cast<Cache::Handle*>(e);
}
bool LRUCacheShard::Ref(Cache::Handle* h) {
LRUHandle* e = reinterpret_cast<LRUHandle*>(h);
MutexLock l(&mutex_);
// To create another reference - entry must be already externally referenced.
assert(e->HasRefs());
e->Ref();
return true;
}
bool LRUCacheShard::Release(Cache::Handle* handle, bool erase_if_last_ref) {
if (handle == nullptr) {
return false;
}
LRUHandle* e = reinterpret_cast<LRUHandle*>(handle);
bool last_reference = false;
{
MutexLock l(&mutex_);
last_reference = e->Unref();
if (last_reference && e->InCache()) {
// The item is still in cache, and nobody else holds a reference to it.
if (usage_ > capacity_ || erase_if_last_ref) {
// The LRU list must be empty since the cache is full.
assert(lru_.next == &lru_ || erase_if_last_ref);
// Take this opportunity and remove the item.
table_.Remove(e->key(), e->hash);
e->SetInCache(false);
} else {
// Put the item back on the LRU list, and don't free it.
LRU_Insert(e);
last_reference = false;
}
}
// If it was the last reference, then decrement the cache usage.
if (last_reference) {
size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
assert(usage_ >= total_charge);
usage_ -= total_charge;
}
}
// Free the entry here outside of mutex for performance reasons.
if (last_reference) {
e->Free();
}
return last_reference;
}
Status LRUCacheShard::Insert(const Slice& key, uint32_t hash, void* value,
size_t charge, Cache::DeleterFn deleter,
Cache::Handle** handle,
Cache::Priority /*priority*/) {
// Allocate the memory here outside of the mutex.
// If the cache is full, we'll have to release it.
// It shouldn't happen very often though.
LRUHandle* e = reinterpret_cast<LRUHandle*>(
new char[sizeof(LRUHandle) - 1 + key.size()]);
e->value = value;
e->flags = 0;
e->deleter = deleter;
e->charge = charge;
e->key_length = key.size();
e->hash = hash;
e->refs = 0;
e->next = e->prev = nullptr;
e->SetInCache(true);
memcpy(e->key_data, key.data(), key.size());
return InsertItem(e, handle, /* free_handle_on_fail */ true);
}
void LRUCacheShard::Erase(const Slice& key, uint32_t hash) {
LRUHandle* e;
bool last_reference = false;
{
MutexLock l(&mutex_);
e = table_.Remove(key, hash);
if (e != nullptr) {
assert(e->InCache());
e->SetInCache(false);
if (!e->HasRefs()) {
// The entry is in LRU since it's in hash and has no external references
LRU_Remove(e);
size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
assert(usage_ >= total_charge);
usage_ -= total_charge;
last_reference = true;
}
}
}
// Free the entry here outside of mutex for performance reasons.
// last_reference will only be true if e != nullptr.
if (last_reference) {
e->Free();
}
}
size_t LRUCacheShard::GetUsage() const {
MutexLock l(&mutex_);
return usage_;
}
size_t LRUCacheShard::GetPinnedUsage() const {
MutexLock l(&mutex_);
assert(usage_ >= lru_usage_);
return usage_ - lru_usage_;
}
std::string LRUCacheShard::GetPrintableOptions() const { return std::string{}; }
LRUCache::LRUCache(size_t capacity, int num_shard_bits,
bool strict_capacity_limit,
CacheMetadataChargePolicy metadata_charge_policy)
: ShardedCache(capacity, num_shard_bits, strict_capacity_limit) {
num_shards_ = 1 << num_shard_bits;
shards_ = reinterpret_cast<LRUCacheShard*>(
port::cacheline_aligned_alloc(sizeof(LRUCacheShard) * num_shards_));
size_t per_shard = (capacity + (num_shards_ - 1)) / num_shards_;
for (int i = 0; i < num_shards_; i++) {
new (&shards_[i])
LRUCacheShard(per_shard, strict_capacity_limit, metadata_charge_policy,
/* max_upper_hash_bits */ 32 - num_shard_bits);
}
}
LRUCache::~LRUCache() {
if (shards_ != nullptr) {
assert(num_shards_ > 0);
for (int i = 0; i < num_shards_; i++) {
shards_[i].~LRUCacheShard();
}
port::cacheline_aligned_free(shards_);
}
}
CacheShard* LRUCache::GetShard(uint32_t shard) {
return reinterpret_cast<CacheShard*>(&shards_[shard]);
}
const CacheShard* LRUCache::GetShard(uint32_t shard) const {
return reinterpret_cast<CacheShard*>(&shards_[shard]);
}
void* LRUCache::Value(Handle* handle) {
return reinterpret_cast<const LRUHandle*>(handle)->value;
}
size_t LRUCache::GetCharge(Handle* handle) const {
return reinterpret_cast<const LRUHandle*>(handle)->charge;
}
Cache::DeleterFn LRUCache::GetDeleter(Handle* handle) const {
auto h = reinterpret_cast<const LRUHandle*>(handle);
return h->deleter;
}
uint32_t LRUCache::GetHash(Handle* handle) const {
return reinterpret_cast<const LRUHandle*>(handle)->hash;
}
void LRUCache::DisownData() {
// Leak data only if that won't generate an ASAN/valgrind warning.
if (!kMustFreeHeapAllocations) {
shards_ = nullptr;
num_shards_ = 0;
}
}
} // namespace fast_lru_cache
std::shared_ptr<Cache> NewFastLRUCache(
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
CacheMetadataChargePolicy metadata_charge_policy) {
if (num_shard_bits >= 20) {
return nullptr; // The cache cannot be sharded into too many fine pieces.
}
if (num_shard_bits < 0) {
num_shard_bits = GetDefaultCacheShardBits(capacity);
}
return std::make_shared<fast_lru_cache::LRUCache>(
capacity, num_shard_bits, strict_capacity_limit, metadata_charge_policy);
}
} // namespace ROCKSDB_NAMESPACE

299
cache/fast_lru_cache.h vendored Normal file
View File

@ -0,0 +1,299 @@
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#pragma once
#include <memory>
#include <string>
#include "cache/sharded_cache.h"
#include "port/lang.h"
#include "port/malloc.h"
#include "port/port.h"
#include "rocksdb/secondary_cache.h"
#include "util/autovector.h"
namespace ROCKSDB_NAMESPACE {
namespace fast_lru_cache {
// An experimental (under development!) alternative to LRUCache
struct LRUHandle {
void* value;
Cache::DeleterFn deleter;
LRUHandle* next_hash;
LRUHandle* next;
LRUHandle* prev;
size_t charge; // TODO(opt): Only allow uint32_t?
size_t key_length;
// The hash of key(). Used for fast sharding and comparisons.
uint32_t hash;
// The number of external refs to this entry. The cache itself is not counted.
uint32_t refs;
enum Flags : uint8_t {
// Whether this entry is referenced by the hash table.
IN_CACHE = (1 << 0),
};
uint8_t flags;
// Beginning of the key (MUST BE THE LAST FIELD IN THIS STRUCT!)
char key_data[1];
Slice key() const { return Slice(key_data, key_length); }
// Increase the reference count by 1.
void Ref() { refs++; }
// Just reduce the reference count by 1. Return true if it was last reference.
bool Unref() {
assert(refs > 0);
refs--;
return refs == 0;
}
// Return true if there are external refs, false otherwise.
bool HasRefs() const { return refs > 0; }
bool InCache() const { return flags & IN_CACHE; }
void SetInCache(bool in_cache) {
if (in_cache) {
flags |= IN_CACHE;
} else {
flags &= ~IN_CACHE;
}
}
void Free() {
assert(refs == 0);
if (deleter) {
(*deleter)(key(), value);
}
delete[] reinterpret_cast<char*>(this);
}
// Calculate the memory usage by metadata.
inline size_t CalcTotalCharge(
CacheMetadataChargePolicy metadata_charge_policy) {
size_t meta_charge = 0;
if (metadata_charge_policy == kFullChargeCacheMetadata) {
#ifdef ROCKSDB_MALLOC_USABLE_SIZE
meta_charge += malloc_usable_size(static_cast<void*>(this));
#else
// This is the size that is used when a new handle is created.
meta_charge += sizeof(LRUHandle) - 1 + key_length;
#endif
}
return charge + meta_charge;
}
};
// We provide our own simple hash table since it removes a whole bunch
// of porting hacks and is also faster than some of the built-in hash
// table implementations in some of the compiler/runtime combinations
// we have tested. E.g., readrandom speeds up by ~5% over the g++
// 4.4.3's builtin hashtable.
class LRUHandleTable {
public:
// If the table uses more hash bits than `max_upper_hash_bits`,
// it will eat into the bits used for sharding, which are constant
// for a given LRUHandleTable.
explicit LRUHandleTable(int max_upper_hash_bits);
~LRUHandleTable();
LRUHandle* Lookup(const Slice& key, uint32_t hash);
LRUHandle* Insert(LRUHandle* h);
LRUHandle* Remove(const Slice& key, uint32_t hash);
template <typename T>
void ApplyToEntriesRange(T func, uint32_t index_begin, uint32_t index_end) {
for (uint32_t i = index_begin; i < index_end; i++) {
LRUHandle* h = list_[i];
while (h != nullptr) {
auto n = h->next_hash;
assert(h->InCache());
func(h);
h = n;
}
}
}
int GetLengthBits() const { return length_bits_; }
private:
// Return a pointer to slot that points to a cache entry that
// matches key/hash. If there is no such cache entry, return a
// pointer to the trailing slot in the corresponding linked list.
LRUHandle** FindPointer(const Slice& key, uint32_t hash);
void Resize();
// Number of hash bits (upper because lower bits used for sharding)
// used for table index. Length == 1 << length_bits_
int length_bits_;
// The table consists of an array of buckets where each bucket is
// a linked list of cache entries that hash into the bucket.
std::unique_ptr<LRUHandle*[]> list_;
// Number of elements currently in the table.
uint32_t elems_;
// Set from max_upper_hash_bits (see constructor).
const int max_length_bits_;
};
// A single shard of sharded cache.
class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
public:
LRUCacheShard(size_t capacity, bool strict_capacity_limit,
CacheMetadataChargePolicy metadata_charge_policy,
int max_upper_hash_bits);
~LRUCacheShard() override = default;
// Separate from constructor so caller can easily make an array of LRUCache
// if current usage is more than new capacity, the function will attempt to
// free the needed space.
void SetCapacity(size_t capacity) override;
// Set the flag to reject insertion if cache if full.
void SetStrictCapacityLimit(bool strict_capacity_limit) override;
// Like Cache methods, but with an extra "hash" parameter.
Status Insert(const Slice& key, uint32_t hash, void* value, size_t charge,
Cache::DeleterFn deleter, Cache::Handle** handle,
Cache::Priority priority) override;
Status Insert(const Slice& key, uint32_t hash, void* value,
const Cache::CacheItemHelper* helper, size_t charge,
Cache::Handle** handle, Cache::Priority priority) override {
return Insert(key, hash, value, charge, helper->del_cb, handle, priority);
}
Cache::Handle* Lookup(const Slice& key, uint32_t hash,
const Cache::CacheItemHelper* /*helper*/,
const Cache::CreateCallback& /*create_cb*/,
Cache::Priority /*priority*/, bool /*wait*/,
Statistics* /*stats*/) override {
return Lookup(key, hash);
}
Cache::Handle* Lookup(const Slice& key, uint32_t hash) override;
bool Release(Cache::Handle* handle, bool /*useful*/,
bool erase_if_last_ref) override {
return Release(handle, erase_if_last_ref);
}
bool IsReady(Cache::Handle* /*handle*/) override { return true; }
void Wait(Cache::Handle* /*handle*/) override {}
bool Ref(Cache::Handle* handle) override;
bool Release(Cache::Handle* handle, bool erase_if_last_ref = false) override;
void Erase(const Slice& key, uint32_t hash) override;
size_t GetUsage() const override;
size_t GetPinnedUsage() const override;
void ApplyToSomeEntries(
const std::function<void(const Slice& key, void* value, size_t charge,
DeleterFn deleter)>& callback,
uint32_t average_entries_per_lock, uint32_t* state) override;
void EraseUnRefEntries() override;
std::string GetPrintableOptions() const override;
private:
friend class LRUCache;
// Insert an item into the hash table and, if handle is null, insert into
// the LRU list. Older items are evicted as necessary. If the cache is full
// and free_handle_on_fail is true, the item is deleted and handle is set to
// nullptr.
Status InsertItem(LRUHandle* item, Cache::Handle** handle,
bool free_handle_on_fail);
void LRU_Remove(LRUHandle* e);
void LRU_Insert(LRUHandle* e);
// Free some space following strict LRU policy until enough space
// to hold (usage_ + charge) is freed or the lru list is empty
// This function is not thread safe - it needs to be executed while
// holding the mutex_.
void EvictFromLRU(size_t charge, autovector<LRUHandle*>* deleted);
// Initialized before use.
size_t capacity_;
// Whether to reject insertion if cache reaches its full capacity.
bool strict_capacity_limit_;
// Dummy head of LRU list.
// lru.prev is newest entry, lru.next is oldest entry.
// LRU contains items which can be evicted, ie reference only by cache
LRUHandle lru_;
// Pointer to head of low-pri pool in LRU list.
LRUHandle* lru_low_pri_;
// ------------^^^^^^^^^^^^^-----------
// Not frequently modified data members
// ------------------------------------
//
// We separate data members that are updated frequently from the ones that
// are not frequently updated so that they don't share the same cache line
// which will lead into false cache sharing
//
// ------------------------------------
// Frequently modified data members
// ------------vvvvvvvvvvvvv-----------
LRUHandleTable table_;
// Memory size for entries residing in the cache.
size_t usage_;
// Memory size for entries residing only in the LRU list.
size_t lru_usage_;
// mutex_ protects the following state.
// We don't count mutex_ as the cache's internal state so semantically we
// don't mind mutex_ invoking the non-const actions.
mutable port::Mutex mutex_;
};
class LRUCache
#ifdef NDEBUG
final
#endif
: public ShardedCache {
public:
LRUCache(size_t capacity, int num_shard_bits, bool strict_capacity_limit,
CacheMetadataChargePolicy metadata_charge_policy =
kDontChargeCacheMetadata);
~LRUCache() override;
const char* Name() const override { return "LRUCache"; }
CacheShard* GetShard(uint32_t shard) override;
const CacheShard* GetShard(uint32_t shard) const override;
void* Value(Handle* handle) override;
size_t GetCharge(Handle* handle) const override;
uint32_t GetHash(Handle* handle) const override;
DeleterFn GetDeleter(Handle* handle) const override;
void DisownData() override;
private:
LRUCacheShard* shards_ = nullptr;
int num_shards_ = 0;
};
} // namespace fast_lru_cache
std::shared_ptr<Cache> NewFastLRUCache(
size_t capacity, int num_shard_bits = -1,
bool strict_capacity_limit = false,
CacheMetadataChargePolicy metadata_charge_policy =
kDefaultCacheMetadataChargePolicy);
} // namespace ROCKSDB_NAMESPACE

84
cache/lru_cache.cc vendored
View File

@ -19,6 +19,7 @@
#include "util/mutexlock.h"
namespace ROCKSDB_NAMESPACE {
namespace lru_cache {
LRUHandleTable::LRUHandleTable(int max_upper_hash_bits)
: length_bits_(/* historical starting size*/ 4),
@ -76,13 +77,12 @@ LRUHandle** LRUHandleTable::FindPointer(const Slice& key, uint32_t hash) {
void LRUHandleTable::Resize() {
if (length_bits_ >= max_length_bits_) {
// Due to reaching limit of hash information, if we made the table
// bigger, we would allocate more addresses but only the same
// number would be used.
// Due to reaching limit of hash information, if we made the table bigger,
// we would allocate more addresses but only the same number would be used.
return;
}
if (length_bits_ >= 31) {
// Avoid undefined behavior shifting uint32_t by 32
// Avoid undefined behavior shifting uint32_t by 32.
return;
}
@ -125,7 +125,7 @@ LRUCacheShard::LRUCacheShard(
mutex_(use_adaptive_mutex),
secondary_cache_(secondary_cache) {
set_metadata_charge_policy(metadata_charge_policy);
// Make empty circular linked list
// Make empty circular linked list.
lru_.next = &lru_;
lru_.prev = &lru_;
lru_low_pri_ = &lru_;
@ -138,7 +138,7 @@ void LRUCacheShard::EraseUnRefEntries() {
MutexLock l(&mutex_);
while (lru_.next != &lru_) {
LRUHandle* old = lru_.next;
// LRU list contains only elements which can be evicted
// LRU list contains only elements which can be evicted.
assert(old->InCache() && !old->HasRefs());
LRU_Remove(old);
table_.Remove(old->key(), old->hash);
@ -168,7 +168,7 @@ void LRUCacheShard::ApplyToSomeEntries(
assert(average_entries_per_lock > 0);
// Assuming we are called with same average_entries_per_lock repeatedly,
// this simplifies some logic (index_end will not overflow)
// this simplifies some logic (index_end will not overflow).
assert(average_entries_per_lock < length || *state == 0);
uint32_t index_begin = *state >> (32 - length_bits);
@ -274,7 +274,7 @@ void LRUCacheShard::EvictFromLRU(size_t charge,
autovector<LRUHandle*>* deleted) {
while ((usage_ + charge) > capacity_ && lru_.next != &lru_) {
LRUHandle* old = lru_.next;
// LRU list contains only elements which can be evicted
// LRU list contains only elements which can be evicted.
assert(old->InCache() && !old->HasRefs());
LRU_Remove(old);
table_.Remove(old->key(), old->hash);
@ -295,11 +295,11 @@ void LRUCacheShard::SetCapacity(size_t capacity) {
EvictFromLRU(0, &last_reference_list);
}
// Try to insert the evicted entries into tiered cache
// Free the entries outside of mutex for performance reasons
// Try to insert the evicted entries into tiered cache.
// Free the entries outside of mutex for performance reasons.
for (auto entry : last_reference_list) {
if (secondary_cache_ && entry->IsSecondaryCacheCompatible() &&
!entry->IsPromoted()) {
!entry->IsInSecondaryCache()) {
secondary_cache_->Insert(entry->key(), entry->value, entry->info_.helper)
.PermitUncheckedError();
}
@ -322,7 +322,7 @@ Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
MutexLock l(&mutex_);
// Free the space following strict LRU policy until enough space
// is freed or the lru list is empty
// is freed or the lru list is empty.
EvictFromLRU(total_charge, &last_reference_list);
if ((usage_ + total_charge) > capacity_ &&
@ -349,7 +349,7 @@ Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
assert(old->InCache());
old->SetInCache(false);
if (!old->HasRefs()) {
// old is on LRU because it's in cache and its reference count is 0
// old is on LRU because it's in cache and its reference count is 0.
LRU_Remove(old);
size_t old_total_charge =
old->CalcTotalCharge(metadata_charge_policy_);
@ -361,7 +361,7 @@ Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
if (handle == nullptr) {
LRU_Insert(e);
} else {
// If caller already holds a ref, no need to take one here
// If caller already holds a ref, no need to take one here.
if (!e->HasRefs()) {
e->Ref();
}
@ -370,11 +370,11 @@ Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
}
}
// Try to insert the evicted entries into the secondary cache
// Free the entries here outside of mutex for performance reasons
// Try to insert the evicted entries into the secondary cache.
// Free the entries here outside of mutex for performance reasons.
for (auto entry : last_reference_list) {
if (secondary_cache_ && entry->IsSecondaryCacheCompatible() &&
!entry->IsPromoted()) {
!entry->IsInSecondaryCache()) {
secondary_cache_->Insert(entry->key(), entry->value, entry->info_.helper)
.PermitUncheckedError();
}
@ -390,7 +390,6 @@ void LRUCacheShard::Promote(LRUHandle* e) {
assert(secondary_handle->IsReady());
e->SetIncomplete(false);
e->SetInCache(true);
e->SetPromoted(true);
e->value = secondary_handle->Value();
e->charge = secondary_handle->Size();
delete secondary_handle;
@ -404,7 +403,7 @@ void LRUCacheShard::Promote(LRUHandle* e) {
Status s = InsertItem(e, &handle, /*free_handle_on_fail=*/false);
if (!s.ok()) {
// Item is in memory, but not accounted against the cache capacity.
// When the handle is released, the item should get deleted
// When the handle is released, the item should get deleted.
assert(!e->InCache());
}
} else {
@ -437,8 +436,8 @@ Cache::Handle* LRUCacheShard::Lookup(
}
// If handle table lookup failed, then allocate a handle outside the
// mutex if we're going to lookup in the secondary cache
// Only support synchronous for now
// mutex if we're going to lookup in the secondary cache.
// Only support synchronous for now.
// TODO: Support asynchronous lookup in secondary cache
if (!e && secondary_cache_ && helper && helper->saveto_cb) {
// For objects from the secondary cache, we expect the caller to provide
@ -447,8 +446,9 @@ Cache::Handle* LRUCacheShard::Lookup(
// accounting purposes, which we won't demote to the secondary cache
// anyway.
assert(create_cb && helper->del_cb);
bool is_in_sec_cache{false};
std::unique_ptr<SecondaryCacheResultHandle> secondary_handle =
secondary_cache_->Lookup(key, create_cb, wait);
secondary_cache_->Lookup(key, create_cb, wait, is_in_sec_cache);
if (secondary_handle != nullptr) {
e = reinterpret_cast<LRUHandle*>(
new char[sizeof(LRUHandle) - 1 + key.size()]);
@ -468,8 +468,9 @@ Cache::Handle* LRUCacheShard::Lookup(
if (wait) {
Promote(e);
e->SetIsInSecondaryCache(is_in_sec_cache);
if (!e->value) {
// The secondary cache returned a handle, but the lookup failed
// The secondary cache returned a handle, but the lookup failed.
e->Unref();
e->Free();
e = nullptr;
@ -479,8 +480,9 @@ Cache::Handle* LRUCacheShard::Lookup(
}
} else {
// If wait is false, we always return a handle and let the caller
// release the handle after checking for success or failure
// release the handle after checking for success or failure.
e->SetIncomplete(true);
e->SetIsInSecondaryCache(is_in_sec_cache);
// This may be slightly inaccurate, if the lookup eventually fails.
// But the probability is very low.
PERF_COUNTER_ADD(secondary_cache_hit_count, 1);
@ -494,7 +496,7 @@ Cache::Handle* LRUCacheShard::Lookup(
bool LRUCacheShard::Ref(Cache::Handle* h) {
LRUHandle* e = reinterpret_cast<LRUHandle*>(h);
MutexLock l(&mutex_);
// To create another reference - entry must be already externally referenced
// To create another reference - entry must be already externally referenced.
assert(e->HasRefs());
e->Ref();
return true;
@ -507,7 +509,7 @@ void LRUCacheShard::SetHighPriorityPoolRatio(double high_pri_pool_ratio) {
MaintainPoolSize();
}
bool LRUCacheShard::Release(Cache::Handle* handle, bool force_erase) {
bool LRUCacheShard::Release(Cache::Handle* handle, bool erase_if_last_ref) {
if (handle == nullptr) {
return false;
}
@ -517,15 +519,15 @@ bool LRUCacheShard::Release(Cache::Handle* handle, bool force_erase) {
MutexLock l(&mutex_);
last_reference = e->Unref();
if (last_reference && e->InCache()) {
// The item is still in cache, and nobody else holds a reference to it
if (usage_ > capacity_ || force_erase) {
// The LRU list must be empty since the cache is full
assert(lru_.next == &lru_ || force_erase);
// Take this opportunity and remove the item
// The item is still in cache, and nobody else holds a reference to it.
if (usage_ > capacity_ || erase_if_last_ref) {
// The LRU list must be empty since the cache is full.
assert(lru_.next == &lru_ || erase_if_last_ref);
// Take this opportunity and remove the item.
table_.Remove(e->key(), e->hash);
e->SetInCache(false);
} else {
// Put the item back on the LRU list, and don't free it
// Put the item back on the LRU list, and don't free it.
LRU_Insert(e);
last_reference = false;
}
@ -542,7 +544,7 @@ bool LRUCacheShard::Release(Cache::Handle* handle, bool force_erase) {
}
}
// Free the entry here outside of mutex for performance reasons
// Free the entry here outside of mutex for performance reasons.
if (last_reference) {
e->Free();
}
@ -554,8 +556,8 @@ Status LRUCacheShard::Insert(const Slice& key, uint32_t hash, void* value,
void (*deleter)(const Slice& key, void* value),
const Cache::CacheItemHelper* helper,
Cache::Handle** handle, Cache::Priority priority) {
// Allocate the memory here outside of the mutex
// If the cache is full, we'll have to release it
// Allocate the memory here outside of the mutex.
// If the cache is full, we'll have to release it.
// It shouldn't happen very often though.
LRUHandle* e = reinterpret_cast<LRUHandle*>(
new char[sizeof(LRUHandle) - 1 + key.size()]);
@ -603,8 +605,8 @@ void LRUCacheShard::Erase(const Slice& key, uint32_t hash) {
}
}
// Free the entry here outside of mutex for performance reasons
// last_reference will only be true if e != nullptr
// Free the entry here outside of mutex for performance reasons.
// last_reference will only be true if e != nullptr.
if (last_reference) {
e->Free();
}
@ -705,7 +707,7 @@ uint32_t LRUCache::GetHash(Handle* handle) const {
}
void LRUCache::DisownData() {
// Leak data only if that won't generate an ASAN/valgrind warning
// Leak data only if that won't generate an ASAN/valgrind warning.
if (!kMustFreeHeapAllocations) {
shards_ = nullptr;
num_shards_ = 0;
@ -758,6 +760,8 @@ void LRUCache::WaitAll(std::vector<Handle*>& handles) {
}
}
} // namespace lru_cache
std::shared_ptr<Cache> NewLRUCache(
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
double high_pri_pool_ratio,
@ -765,10 +769,10 @@ std::shared_ptr<Cache> NewLRUCache(
CacheMetadataChargePolicy metadata_charge_policy,
const std::shared_ptr<SecondaryCache>& secondary_cache) {
if (num_shard_bits >= 20) {
return nullptr; // the cache cannot be sharded into too many fine pieces
return nullptr; // The cache cannot be sharded into too many fine pieces.
}
if (high_pri_pool_ratio < 0.0 || high_pri_pool_ratio > 1.0) {
// invalid high_pri_pool_ratio
// Invalid high_pri_pool_ratio
return nullptr;
}
if (num_shard_bits < 0) {

66
cache/lru_cache.h vendored
View File

@ -19,6 +19,7 @@
#include "util/autovector.h"
namespace ROCKSDB_NAMESPACE {
namespace lru_cache {
// LRU cache implementation. This class is not thread-safe.
@ -81,12 +82,12 @@ struct LRUHandle {
IN_HIGH_PRI_POOL = (1 << 2),
// Whether this entry has had any lookups (hits).
HAS_HIT = (1 << 3),
// Can this be inserted into the secondary cache
// Can this be inserted into the secondary cache.
IS_SECONDARY_CACHE_COMPATIBLE = (1 << 4),
// Is the handle still being read from a lower tier
// Is the handle still being read from a lower tier.
IS_PENDING = (1 << 5),
// Has the item been promoted from a lower tier
IS_PROMOTED = (1 << 6),
// Whether this handle is still in a lower tier
IS_IN_SECONDARY_CACHE = (1 << 6),
};
uint8_t flags;
@ -129,7 +130,7 @@ struct LRUHandle {
#endif // __SANITIZE_THREAD__
}
bool IsPending() const { return flags & IS_PENDING; }
bool IsPromoted() const { return flags & IS_PROMOTED; }
bool IsInSecondaryCache() const { return flags & IS_IN_SECONDARY_CACHE; }
void SetInCache(bool in_cache) {
if (in_cache) {
@ -176,11 +177,11 @@ struct LRUHandle {
}
}
void SetPromoted(bool promoted) {
if (promoted) {
flags |= IS_PROMOTED;
void SetIsInSecondaryCache(bool is_in_secondary_cache) {
if (is_in_secondary_cache) {
flags |= IS_IN_SECONDARY_CACHE;
} else {
flags &= ~IS_PROMOTED;
flags &= ~IS_IN_SECONDARY_CACHE;
}
}
@ -208,7 +209,7 @@ struct LRUHandle {
delete[] reinterpret_cast<char*>(this);
}
// Calculate the memory usage by metadata
// Calculate the memory usage by metadata.
inline size_t CalcTotalCharge(
CacheMetadataChargePolicy metadata_charge_policy) {
size_t meta_charge = 0;
@ -216,7 +217,7 @@ struct LRUHandle {
#ifdef ROCKSDB_MALLOC_USABLE_SIZE
meta_charge += malloc_usable_size(static_cast<void*>(this));
#else
// This is the size that is used when a new handle is created
// This is the size that is used when a new handle is created.
meta_charge += sizeof(LRUHandle) - 1 + key_length;
#endif
}
@ -272,10 +273,10 @@ class LRUHandleTable {
// a linked list of cache entries that hash into the bucket.
std::unique_ptr<LRUHandle*[]> list_;
// Number of elements currently in the table
// Number of elements currently in the table.
uint32_t elems_;
// Set from max_upper_hash_bits (see constructor)
// Set from max_upper_hash_bits (see constructor).
const int max_length_bits_;
};
@ -291,7 +292,7 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
// Separate from constructor so caller can easily make an array of LRUCache
// if current usage is more than new capacity, the function will attempt to
// free the needed space
// free the needed space.
virtual void SetCapacity(size_t capacity) override;
// Set the flag to reject insertion if cache if full.
@ -314,8 +315,7 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
assert(helper);
return Insert(key, hash, value, charge, nullptr, helper, handle, priority);
}
// If helper_cb is null, the values of the following arguments don't
// matter
// If helper_cb is null, the values of the following arguments don't matter.
virtual Cache::Handle* Lookup(const Slice& key, uint32_t hash,
const ShardedCache::CacheItemHelper* helper,
const ShardedCache::CreateCallback& create_cb,
@ -326,14 +326,14 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
nullptr);
}
virtual bool Release(Cache::Handle* handle, bool /*useful*/,
bool force_erase) override {
return Release(handle, force_erase);
bool erase_if_last_ref) override {
return Release(handle, erase_if_last_ref);
}
virtual bool IsReady(Cache::Handle* /*handle*/) override;
virtual void Wait(Cache::Handle* /*handle*/) override {}
virtual bool Ref(Cache::Handle* handle) override;
virtual bool Release(Cache::Handle* handle,
bool force_erase = false) override;
bool erase_if_last_ref = false) override;
virtual void Erase(const Slice& key, uint32_t hash) override;
// Although in some platforms the update of size_t is atomic, to make sure
@ -354,8 +354,8 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
void TEST_GetLRUList(LRUHandle** lru, LRUHandle** lru_low_pri);
// Retrieves number of elements in LRU, for unit test purpose only
// not threadsafe
// Retrieves number of elements in LRU, for unit test purpose only.
// Not threadsafe.
size_t TEST_GetLRUSize();
// Retrieves high pri pool ratio
@ -365,14 +365,16 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
friend class LRUCache;
// Insert an item into the hash table and, if handle is null, insert into
// the LRU list. Older items are evicted as necessary. If the cache is full
// and free_handle_on_fail is true, the item is deleted and handle is set to.
// and free_handle_on_fail is true, the item is deleted and handle is set to
// nullptr.
Status InsertItem(LRUHandle* item, Cache::Handle** handle,
bool free_handle_on_fail);
Status Insert(const Slice& key, uint32_t hash, void* value, size_t charge,
DeleterFn deleter, const Cache::CacheItemHelper* helper,
Cache::Handle** handle, Cache::Priority priority);
// Promote an item looked up from the secondary cache to the LRU cache. The
// item is only inserted into the hash table and not the LRU list, and only
// Promote an item looked up from the secondary cache to the LRU cache.
// The item may be still in the secondary cache.
// It is only inserted into the hash table and not the LRU list, and only
// if the cache is not at full capacity, as is the case during Insert. The
// caller should hold a reference on the LRUHandle. When the caller releases
// the last reference, the item is added to the LRU list.
@ -389,7 +391,7 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
// Free some space following strict LRU policy until enough space
// to hold (usage_ + charge) is freed or the lru list is empty
// This function is not thread safe - it needs to be executed while
// holding the mutex_
// holding the mutex_.
void EvictFromLRU(size_t charge, autovector<LRUHandle*>* deleted);
// Initialized before use.
@ -429,10 +431,10 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
// ------------vvvvvvvvvvvvv-----------
LRUHandleTable table_;
// Memory size for entries residing in the cache
// Memory size for entries residing in the cache.
size_t usage_;
// Memory size for entries residing only in the LRU list
// Memory size for entries residing only in the LRU list.
size_t lru_usage_;
// mutex_ protects the following state.
@ -467,9 +469,9 @@ class LRUCache
virtual void DisownData() override;
virtual void WaitAll(std::vector<Handle*>& handles) override;
// Retrieves number of elements in LRU, for unit test purpose only
// Retrieves number of elements in LRU, for unit test purpose only.
size_t TEST_GetLRUSize();
// Retrieves high pri pool ratio
// Retrieves high pri pool ratio.
double GetHighPriPoolRatio();
private:
@ -478,4 +480,10 @@ class LRUCache
std::shared_ptr<SecondaryCache> secondary_cache_;
};
} // namespace lru_cache
using LRUCache = lru_cache::LRUCache;
using LRUHandle = lru_cache::LRUHandle;
using LRUCacheShard = lru_cache::LRUCacheShard;
} // namespace ROCKSDB_NAMESPACE

View File

@ -266,12 +266,13 @@ class TestSecondaryCache : public SecondaryCache {
}
std::unique_ptr<SecondaryCacheResultHandle> Lookup(
const Slice& key, const Cache::CreateCallback& create_cb,
bool /*wait*/) override {
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/,
bool& is_in_sec_cache) override {
std::string key_str = key.ToString();
TEST_SYNC_POINT_CALLBACK("TestSecondaryCache::Lookup", &key_str);
std::unique_ptr<SecondaryCacheResultHandle> secondary_handle;
is_in_sec_cache = false;
ResultType type = ResultType::SUCCESS;
auto iter = result_map_.find(key.ToString());
if (iter != result_map_.end()) {
@ -296,6 +297,7 @@ class TestSecondaryCache : public SecondaryCache {
if (s.ok()) {
secondary_handle.reset(new TestSecondaryCacheResultHandle(
cache_.get(), handle, value, charge, type));
is_in_sec_cache = true;
} else {
cache_->Release(handle);
}
@ -383,10 +385,10 @@ class DBSecondaryCacheTest : public DBTestBase {
std::unique_ptr<Env> fault_env_;
};
class LRUSecondaryCacheTest : public LRUCacheTest {
class LRUCacheSecondaryCacheTest : public LRUCacheTest {
public:
LRUSecondaryCacheTest() : fail_create_(false) {}
~LRUSecondaryCacheTest() {}
LRUCacheSecondaryCacheTest() : fail_create_(false) {}
~LRUCacheSecondaryCacheTest() {}
protected:
class TestItem {
@ -449,16 +451,17 @@ class LRUSecondaryCacheTest : public LRUCacheTest {
bool fail_create_;
};
Cache::CacheItemHelper LRUSecondaryCacheTest::helper_(
LRUSecondaryCacheTest::SizeCallback, LRUSecondaryCacheTest::SaveToCallback,
LRUSecondaryCacheTest::DeletionCallback);
Cache::CacheItemHelper LRUCacheSecondaryCacheTest::helper_(
LRUCacheSecondaryCacheTest::SizeCallback,
LRUCacheSecondaryCacheTest::SaveToCallback,
LRUCacheSecondaryCacheTest::DeletionCallback);
Cache::CacheItemHelper LRUSecondaryCacheTest::helper_fail_(
LRUSecondaryCacheTest::SizeCallback,
LRUSecondaryCacheTest::SaveToCallbackFail,
LRUSecondaryCacheTest::DeletionCallback);
Cache::CacheItemHelper LRUCacheSecondaryCacheTest::helper_fail_(
LRUCacheSecondaryCacheTest::SizeCallback,
LRUCacheSecondaryCacheTest::SaveToCallbackFail,
LRUCacheSecondaryCacheTest::DeletionCallback);
TEST_F(LRUSecondaryCacheTest, BasicTest) {
TEST_F(LRUCacheSecondaryCacheTest, BasicTest) {
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
std::shared_ptr<TestSecondaryCache> secondary_cache =
@ -470,25 +473,25 @@ TEST_F(LRUSecondaryCacheTest, BasicTest) {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1, &LRUCacheSecondaryCacheTest::helper_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to NVM
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k2", item2, &LRUCacheSecondaryCacheTest::helper_,
str2.length()));
get_perf_context()->Reset();
Cache::Handle* handle;
handle =
cache->Lookup("k2", &LRUSecondaryCacheTest::helper_, test_item_creator,
Cache::Priority::LOW, true, stats.get());
cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true, stats.get());
ASSERT_NE(handle, nullptr);
cache->Release(handle);
// This lookup should promote k1 and demote k2
handle =
cache->Lookup("k1", &LRUSecondaryCacheTest::helper_, test_item_creator,
Cache::Priority::LOW, true, stats.get());
cache->Lookup("k1", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true, stats.get());
ASSERT_NE(handle, nullptr);
cache->Release(handle);
ASSERT_EQ(secondary_cache->num_inserts(), 2u);
@ -502,7 +505,7 @@ TEST_F(LRUSecondaryCacheTest, BasicTest) {
secondary_cache.reset();
}
TEST_F(LRUSecondaryCacheTest, BasicFailTest) {
TEST_F(LRUCacheSecondaryCacheTest, BasicFailTest) {
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
std::shared_ptr<TestSecondaryCache> secondary_cache =
@ -515,15 +518,15 @@ TEST_F(LRUSecondaryCacheTest, BasicFailTest) {
auto item1 = std::make_unique<TestItem>(str1.data(), str1.length());
ASSERT_TRUE(cache->Insert("k1", item1.get(), nullptr, str1.length())
.IsInvalidArgument());
ASSERT_OK(cache->Insert("k1", item1.get(), &LRUSecondaryCacheTest::helper_,
str1.length()));
ASSERT_OK(cache->Insert("k1", item1.get(),
&LRUCacheSecondaryCacheTest::helper_, str1.length()));
item1.release(); // Appease clang-analyze "potential memory leak"
Cache::Handle* handle;
handle = cache->Lookup("k2", nullptr, test_item_creator, Cache::Priority::LOW,
true);
ASSERT_EQ(handle, nullptr);
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, false);
ASSERT_EQ(handle, nullptr);
@ -531,7 +534,7 @@ TEST_F(LRUSecondaryCacheTest, BasicFailTest) {
secondary_cache.reset();
}
TEST_F(LRUSecondaryCacheTest, SaveFailTest) {
TEST_F(LRUCacheSecondaryCacheTest, SaveFailTest) {
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
std::shared_ptr<TestSecondaryCache> secondary_cache =
@ -542,25 +545,25 @@ TEST_F(LRUSecondaryCacheTest, SaveFailTest) {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_fail_,
str1.length()));
ASSERT_OK(cache->Insert(
"k1", item1, &LRUCacheSecondaryCacheTest::helper_fail_, str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to NVM
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_fail_,
str2.length()));
ASSERT_OK(cache->Insert(
"k2", item2, &LRUCacheSecondaryCacheTest::helper_fail_, str2.length()));
Cache::Handle* handle;
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_fail_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_fail_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
// This lookup should fail, since k1 demotion would have failed
handle = cache->Lookup("k1", &LRUSecondaryCacheTest::helper_fail_,
handle = cache->Lookup("k1", &LRUCacheSecondaryCacheTest::helper_fail_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_EQ(handle, nullptr);
// Since k1 didn't get promoted, k2 should still be in cache
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_fail_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_fail_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
@ -571,7 +574,7 @@ TEST_F(LRUSecondaryCacheTest, SaveFailTest) {
secondary_cache.reset();
}
TEST_F(LRUSecondaryCacheTest, CreateFailTest) {
TEST_F(LRUCacheSecondaryCacheTest, CreateFailTest) {
LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
std::shared_ptr<TestSecondaryCache> secondary_cache =
@ -582,26 +585,26 @@ TEST_F(LRUSecondaryCacheTest, CreateFailTest) {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1, &LRUCacheSecondaryCacheTest::helper_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to NVM
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k2", item2, &LRUCacheSecondaryCacheTest::helper_,
str2.length()));
Cache::Handle* handle;
SetFailCreate(true);
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
// This lookup should fail, since k1 creation would have failed
handle = cache->Lookup("k1", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k1", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_EQ(handle, nullptr);
// Since k1 didn't get promoted, k2 should still be in cache
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
@ -612,7 +615,7 @@ TEST_F(LRUSecondaryCacheTest, CreateFailTest) {
secondary_cache.reset();
}
TEST_F(LRUSecondaryCacheTest, FullCapacityTest) {
TEST_F(LRUCacheSecondaryCacheTest, FullCapacityTest) {
LRUCacheOptions opts(1024, 0, /*_strict_capacity_limit=*/true, 0.5, nullptr,
kDefaultToAdaptiveMutex, kDontChargeCacheMetadata);
std::shared_ptr<TestSecondaryCache> secondary_cache =
@ -623,28 +626,28 @@ TEST_F(LRUSecondaryCacheTest, FullCapacityTest) {
Random rnd(301);
std::string str1 = rnd.RandomString(1020);
TestItem* item1 = new TestItem(str1.data(), str1.length());
ASSERT_OK(cache->Insert("k1", item1, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k1", item1, &LRUCacheSecondaryCacheTest::helper_,
str1.length()));
std::string str2 = rnd.RandomString(1020);
TestItem* item2 = new TestItem(str2.data(), str2.length());
// k1 should be demoted to NVM
ASSERT_OK(cache->Insert("k2", item2, &LRUSecondaryCacheTest::helper_,
ASSERT_OK(cache->Insert("k2", item2, &LRUCacheSecondaryCacheTest::helper_,
str2.length()));
Cache::Handle* handle;
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
// k1 promotion should fail due to the block cache being at capacity,
// but the lookup should still succeed
Cache::Handle* handle2;
handle2 = cache->Lookup("k1", &LRUSecondaryCacheTest::helper_,
handle2 = cache->Lookup("k1", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle2, nullptr);
// Since k1 didn't get inserted, k2 should still be in cache
cache->Release(handle);
cache->Release(handle2);
handle = cache->Lookup("k2", &LRUSecondaryCacheTest::helper_,
handle = cache->Lookup("k2", &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, true);
ASSERT_NE(handle, nullptr);
cache->Release(handle);
@ -1046,7 +1049,7 @@ TEST_F(DBSecondaryCacheTest, SecondaryCacheFailureTest) {
Destroy(options);
}
TEST_F(LRUSecondaryCacheTest, BasicWaitAllTest) {
TEST_F(LRUCacheSecondaryCacheTest, BasicWaitAllTest) {
LRUCacheOptions opts(1024, 2, false, 0.5, nullptr, kDefaultToAdaptiveMutex,
kDontChargeCacheMetadata);
std::shared_ptr<TestSecondaryCache> secondary_cache =
@ -1062,7 +1065,8 @@ TEST_F(LRUSecondaryCacheTest, BasicWaitAllTest) {
values.emplace_back(str);
TestItem* item = new TestItem(str.data(), str.length());
ASSERT_OK(cache->Insert("k" + std::to_string(i), item,
&LRUSecondaryCacheTest::helper_, str.length()));
&LRUCacheSecondaryCacheTest::helper_,
str.length()));
}
// Force all entries to be evicted to the secondary cache
cache->SetCapacity(0);
@ -1075,9 +1079,9 @@ TEST_F(LRUSecondaryCacheTest, BasicWaitAllTest) {
{"k5", TestSecondaryCache::ResultType::FAIL}});
std::vector<Cache::Handle*> results;
for (int i = 0; i < 6; ++i) {
results.emplace_back(
cache->Lookup("k" + std::to_string(i), &LRUSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, false));
results.emplace_back(cache->Lookup(
"k" + std::to_string(i), &LRUCacheSecondaryCacheTest::helper_,
test_item_creator, Cache::Priority::LOW, false));
}
cache->WaitAll(results);
for (int i = 0; i < 6; ++i) {

View File

@ -104,14 +104,15 @@ bool ShardedCache::Ref(Handle* handle) {
return GetShard(Shard(hash))->Ref(handle);
}
bool ShardedCache::Release(Handle* handle, bool force_erase) {
bool ShardedCache::Release(Handle* handle, bool erase_if_last_ref) {
uint32_t hash = GetHash(handle);
return GetShard(Shard(hash))->Release(handle, force_erase);
return GetShard(Shard(hash))->Release(handle, erase_if_last_ref);
}
bool ShardedCache::Release(Handle* handle, bool useful, bool force_erase) {
bool ShardedCache::Release(Handle* handle, bool useful,
bool erase_if_last_ref) {
uint32_t hash = GetHash(handle);
return GetShard(Shard(hash))->Release(handle, useful, force_erase);
return GetShard(Shard(hash))->Release(handle, useful, erase_if_last_ref);
}
void ShardedCache::Erase(const Slice& key) {

View File

@ -37,11 +37,11 @@ class CacheShard {
Cache::Priority priority, bool wait,
Statistics* stats) = 0;
virtual bool Release(Cache::Handle* handle, bool useful,
bool force_erase) = 0;
bool erase_if_last_ref) = 0;
virtual bool IsReady(Cache::Handle* handle) = 0;
virtual void Wait(Cache::Handle* handle) = 0;
virtual bool Ref(Cache::Handle* handle) = 0;
virtual bool Release(Cache::Handle* handle, bool force_erase) = 0;
virtual bool Release(Cache::Handle* handle, bool erase_if_last_ref) = 0;
virtual void Erase(const Slice& key, uint32_t hash) = 0;
virtual void SetCapacity(size_t capacity) = 0;
virtual void SetStrictCapacityLimit(bool strict_capacity_limit) = 0;
@ -94,11 +94,11 @@ class ShardedCache : public Cache {
const CreateCallback& create_cb, Priority priority,
bool wait, Statistics* stats = nullptr) override;
virtual bool Release(Handle* handle, bool useful,
bool force_erase = false) override;
bool erase_if_last_ref = false) override;
virtual bool IsReady(Handle* handle) override;
virtual void Wait(Handle* handle) override;
virtual bool Ref(Handle* handle) override;
virtual bool Release(Handle* handle, bool force_erase = false) override;
virtual bool Release(Handle* handle, bool erase_if_last_ref = false) override;
virtual void Erase(const Slice& key) override;
virtual uint64_t NewId() override;
virtual size_t GetCapacity() const override;

30
common.mk Normal file
View File

@ -0,0 +1,30 @@
ifndef PYTHON
# Default to python3. Some distros like CentOS 8 do not have `python`.
ifeq ($(origin PYTHON), undefined)
PYTHON := $(shell which python3 || which python || echo python3)
endif
export PYTHON
endif
# To setup tmp directory, first recognize some old variables for setting
# test tmp directory or base tmp directory. TEST_TMPDIR is usually read
# by RocksDB tools though Env/FileSystem::GetTestDirectory.
ifeq ($(TEST_TMPDIR),)
TEST_TMPDIR := $(TMPD)
endif
ifeq ($(TEST_TMPDIR),)
ifeq ($(BASE_TMPDIR),)
BASE_TMPDIR :=$(TMPDIR)
endif
ifeq ($(BASE_TMPDIR),)
BASE_TMPDIR :=/tmp
endif
# Use /dev/shm if it has the sticky bit set (otherwise, /tmp or other
# base dir), and create a randomly-named rocksdb.XXXX directory therein.
TEST_TMPDIR := $(shell f=/dev/shm; test -k $$f || f=$(BASE_TMPDIR); \
perl -le 'use File::Temp "tempdir";' \
-e 'print tempdir("'$$f'/rocksdb.XXXX", CLEANUP => 0)')
endif
export TEST_TMPDIR

View File

@ -12,7 +12,7 @@ fi
ROOT=".."
# Fetch right version of gcov
if [ -d /mnt/gvfs/third-party -a -z "$CXX" ]; then
source $ROOT/build_tools/fbcode_config_platform007.sh
source $ROOT/build_tools/fbcode_config_platform009.sh
GCOV=$GCC_BASE/bin/gcov
else
GCOV=$(which gcov)

View File

@ -5,7 +5,7 @@
# build DB_STRESS_CMD so it must exist prior.
DB_STRESS_CMD?=./db_stress
include python.mk
include common.mk
CRASHTEST_MAKE=$(MAKE) -f crash_test.mk
CRASHTEST_PY=$(PYTHON) -u tools/db_crashtest.py --stress_cmd=$(DB_STRESS_CMD)
@ -42,6 +42,12 @@ crash_test_with_ts: $(DB_STRESS_CMD)
$(CRASHTEST_MAKE) whitebox_crash_test_with_ts
$(CRASHTEST_MAKE) blackbox_crash_test_with_ts
crash_test_with_multiops_wc_txn: $(DB_STRESS_CMD)
$(CRASHTEST_MAKE) blackbox_crash_test_with_multiops_wc_txn
crash_test_with_multiops_wp_txn: $(DB_STRESS_CMD)
$(CRASHTEST_MAKE) blackbox_crash_test_with_multiops_wp_txn
blackbox_crash_test: $(DB_STRESS_CMD)
$(CRASHTEST_PY) --simple blackbox $(CRASH_TEST_EXT_ARGS)
$(CRASHTEST_PY) blackbox $(CRASH_TEST_EXT_ARGS)
@ -59,10 +65,10 @@ blackbox_crash_test_with_ts: $(DB_STRESS_CMD)
$(CRASHTEST_PY) --enable_ts blackbox $(CRASH_TEST_EXT_ARGS)
blackbox_crash_test_with_multiops_wc_txn: $(DB_STRESS_CMD)
$(PYTHON) -u tools/db_crashtest.py --test_multiops_txn --write_policy write_committed blackbox $(CRASH_TEST_EXT_ARGS)
$(CRASHTEST_PY) --test_multiops_txn --write_policy write_committed blackbox $(CRASH_TEST_EXT_ARGS)
blackbox_crash_test_with_multiops_wp_txn: $(DB_STRESS_CMD)
$(PYTHON) -u tools/db_crashtest.py --test_multiops_txn --write_policy write_prepared blackbox $(CRASH_TEST_EXT_ARGS)
$(CRASHTEST_PY) --test_multiops_txn --write_policy write_prepared blackbox $(CRASH_TEST_EXT_ARGS)
ifeq ($(CRASH_TEST_KILL_ODD),)
CRASH_TEST_KILL_ODD=888887

View File

@ -23,7 +23,7 @@ Status ArenaWrappedDBIter::GetProperty(std::string prop_name,
if (prop_name == "rocksdb.iterator.super-version-number") {
// First try to pass the value returned from inner iterator.
if (!db_iter_->GetProperty(prop_name, prop).ok()) {
*prop = ToString(sv_number_);
*prop = std::to_string(sv_number_);
}
return Status::OK();
}

View File

@ -96,9 +96,9 @@ class BlobIndex {
assert(slice.size() > 0);
type_ = static_cast<Type>(*slice.data());
if (type_ >= Type::kUnknown) {
return Status::Corruption(
kErrorMessage,
"Unknown blob index type: " + ToString(static_cast<char>(type_)));
return Status::Corruption(kErrorMessage,
"Unknown blob index type: " +
std::to_string(static_cast<char>(type_)));
}
slice = Slice(slice.data() + 1, slice.size() - 1);
if (HasTTL()) {

View File

@ -366,19 +366,26 @@ TEST_F(DBBlobBasicTest, GetBlob_CorruptIndex) {
Reopen(options);
constexpr char key[] = "key";
constexpr char blob[] = "blob";
// Fake a corrupt blob index.
const std::string blob_index("foobar");
WriteBatch batch;
ASSERT_OK(WriteBatchInternal::PutBlobIndex(&batch, 0, key, blob_index));
ASSERT_OK(db_->Write(WriteOptions(), &batch));
ASSERT_OK(Put(key, blob));
ASSERT_OK(Flush());
SyncPoint::GetInstance()->SetCallBack(
"Version::Get::TamperWithBlobIndex", [](void* arg) {
Slice* const blob_index = static_cast<Slice*>(arg);
assert(blob_index);
assert(!blob_index->empty());
blob_index->remove_prefix(1);
});
SyncPoint::GetInstance()->EnableProcessing();
PinnableSlice result;
ASSERT_TRUE(db_->Get(ReadOptions(), db_->DefaultColumnFamily(), key, &result)
.IsCorruption());
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
}
TEST_F(DBBlobBasicTest, MultiGetBlob_CorruptIndex) {
@ -401,17 +408,27 @@ TEST_F(DBBlobBasicTest, MultiGetBlob_CorruptIndex) {
}
constexpr char key[] = "key";
{
// Fake a corrupt blob index.
const std::string blob_index("foobar");
WriteBatch batch;
ASSERT_OK(WriteBatchInternal::PutBlobIndex(&batch, 0, key, blob_index));
ASSERT_OK(db_->Write(WriteOptions(), &batch));
keys[kNumOfKeys] = Slice(static_cast<const char*>(key), sizeof(key) - 1);
}
constexpr char blob[] = "blob";
ASSERT_OK(Put(key, blob));
keys[kNumOfKeys] = key;
ASSERT_OK(Flush());
SyncPoint::GetInstance()->SetCallBack(
"Version::MultiGet::TamperWithBlobIndex", [&key](void* arg) {
KeyContext* const key_context = static_cast<KeyContext*>(arg);
assert(key_context);
assert(key_context->key);
if (*(key_context->key) == key) {
Slice* const blob_index = key_context->value;
assert(blob_index);
assert(!blob_index->empty());
blob_index->remove_prefix(1);
}
});
SyncPoint::GetInstance()->EnableProcessing();
std::array<PinnableSlice, kNumOfKeys + 1> values;
std::array<Status, kNumOfKeys + 1> statuses;
db_->MultiGet(ReadOptions(), dbfull()->DefaultColumnFamily(), kNumOfKeys + 1,
@ -425,6 +442,9 @@ TEST_F(DBBlobBasicTest, MultiGetBlob_CorruptIndex) {
ASSERT_TRUE(statuses[i].IsCorruption());
}
}
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
}
TEST_F(DBBlobBasicTest, MultiGetBlob_ExceedSoftLimit) {
@ -733,6 +753,14 @@ TEST_F(DBBlobBasicTest, Properties) {
&live_blob_file_size));
ASSERT_EQ(live_blob_file_size, total_expected_size);
// Total amount of garbage in live blob files
{
uint64_t live_blob_file_garbage_size = 0;
ASSERT_TRUE(db_->GetIntProperty(DB::Properties::kLiveBlobFileGarbageSize,
&live_blob_file_garbage_size));
ASSERT_EQ(live_blob_file_garbage_size, 0);
}
// Total size of all blob files across all versions
// Note: this should be the same as above since we only have one
// version at this point.
@ -768,6 +796,14 @@ TEST_F(DBBlobBasicTest, Properties) {
<< "\nBlob file space amplification: " << expected_space_amp << '\n';
ASSERT_EQ(blob_stats, oss.str());
// Total amount of garbage in live blob files
{
uint64_t live_blob_file_garbage_size = 0;
ASSERT_TRUE(db_->GetIntProperty(DB::Properties::kLiveBlobFileGarbageSize,
&live_blob_file_garbage_size));
ASSERT_EQ(live_blob_file_garbage_size, expected_garbage_size);
}
}
TEST_F(DBBlobBasicTest, PropertiesMultiVersion) {

View File

@ -415,16 +415,30 @@ TEST_F(DBBlobCompactionTest, CorruptedBlobIndex) {
new ValueMutationFilter(""));
options.compaction_filter = compaction_filter_guard.get();
DestroyAndReopen(options);
// Mock a corrupted blob index
constexpr char key[] = "key";
std::string blob_idx("blob_idx");
WriteBatch write_batch;
ASSERT_OK(WriteBatchInternal::PutBlobIndex(&write_batch, 0, key, blob_idx));
ASSERT_OK(db_->Write(WriteOptions(), &write_batch));
constexpr char blob[] = "blob";
ASSERT_OK(Put(key, blob));
ASSERT_OK(Flush());
SyncPoint::GetInstance()->SetCallBack(
"CompactionIterator::InvokeFilterIfNeeded::TamperWithBlobIndex",
[](void* arg) {
Slice* const blob_index = static_cast<Slice*>(arg);
assert(blob_index);
assert(!blob_index->empty());
blob_index->remove_prefix(1);
});
SyncPoint::GetInstance()->EnableProcessing();
ASSERT_TRUE(db_->CompactRange(CompactRangeOptions(), /*begin=*/nullptr,
/*end=*/nullptr)
.IsCorruption());
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
Close();
}

View File

@ -13,6 +13,7 @@
#include <vector>
#include "db/arena_wrapped_db_iter.h"
#include "db/blob/blob_index.h"
#include "db/column_family.h"
#include "db/db_iter.h"
#include "db/db_test_util.h"
@ -138,20 +139,39 @@ class DBBlobIndexTest : public DBTestBase {
}
};
// Should be able to write kTypeBlobIndex to memtables and SST files.
// Note: the following test case pertains to the StackableDB-based BlobDB
// implementation. We should be able to write kTypeBlobIndex to memtables and
// SST files.
TEST_F(DBBlobIndexTest, Write) {
for (auto tier : kAllTiers) {
DestroyAndReopen(GetTestOptions());
for (int i = 1; i <= 5; i++) {
std::string index = ToString(i);
std::vector<std::pair<std::string, std::string>> key_values;
constexpr size_t num_key_values = 5;
key_values.reserve(num_key_values);
for (size_t i = 1; i <= num_key_values; ++i) {
std::string key = "key" + std::to_string(i);
std::string blob_index;
BlobIndex::EncodeInlinedTTL(&blob_index, /* expiration */ 9876543210,
"blob" + std::to_string(i));
key_values.emplace_back(std::move(key), std::move(blob_index));
}
for (const auto& key_value : key_values) {
WriteBatch batch;
ASSERT_OK(PutBlobIndex(&batch, "key" + index, "blob" + index));
ASSERT_OK(PutBlobIndex(&batch, key_value.first, key_value.second));
ASSERT_OK(Write(&batch));
}
MoveDataTo(tier);
for (int i = 1; i <= 5; i++) {
std::string index = ToString(i);
ASSERT_EQ("blob" + index, GetBlobIndex("key" + index));
for (const auto& key_value : key_values) {
ASSERT_EQ(GetBlobIndex(key_value.first), key_value.second);
}
}
}
@ -164,13 +184,19 @@ TEST_F(DBBlobIndexTest, Write) {
// accidentally opening the base DB of a stacked BlobDB and actual corruption
// when using the integrated BlobDB.
TEST_F(DBBlobIndexTest, Get) {
std::string blob_index;
BlobIndex::EncodeInlinedTTL(&blob_index, /* expiration */ 9876543210, "blob");
for (auto tier : kAllTiers) {
DestroyAndReopen(GetTestOptions());
WriteBatch batch;
ASSERT_OK(batch.Put("key", "value"));
ASSERT_OK(PutBlobIndex(&batch, "blob_key", "blob_index"));
ASSERT_OK(PutBlobIndex(&batch, "blob_key", blob_index));
ASSERT_OK(Write(&batch));
MoveDataTo(tier);
// Verify normal value
bool is_blob_index = false;
PinnableSlice value;
@ -178,6 +204,7 @@ TEST_F(DBBlobIndexTest, Get) {
ASSERT_EQ("value", GetImpl("key"));
ASSERT_EQ("value", GetImpl("key", &is_blob_index));
ASSERT_FALSE(is_blob_index);
// Verify blob index
if (tier <= kImmutableMemtables) {
ASSERT_TRUE(Get("blob_key", &value).IsNotSupported());
@ -186,7 +213,7 @@ TEST_F(DBBlobIndexTest, Get) {
ASSERT_TRUE(Get("blob_key", &value).IsCorruption());
ASSERT_EQ("CORRUPTION", GetImpl("blob_key"));
}
ASSERT_EQ("blob_index", GetImpl("blob_key", &is_blob_index));
ASSERT_EQ(blob_index, GetImpl("blob_key", &is_blob_index));
ASSERT_TRUE(is_blob_index);
}
}
@ -196,11 +223,14 @@ TEST_F(DBBlobIndexTest, Get) {
// if blob index is updated with a normal value. See the test case above for
// more details.
TEST_F(DBBlobIndexTest, Updated) {
std::string blob_index;
BlobIndex::EncodeInlinedTTL(&blob_index, /* expiration */ 9876543210, "blob");
for (auto tier : kAllTiers) {
DestroyAndReopen(GetTestOptions());
WriteBatch batch;
for (int i = 0; i < 10; i++) {
ASSERT_OK(PutBlobIndex(&batch, "key" + ToString(i), "blob_index"));
ASSERT_OK(PutBlobIndex(&batch, "key" + std::to_string(i), blob_index));
}
ASSERT_OK(Write(&batch));
// Avoid blob values from being purged.
@ -218,7 +248,7 @@ TEST_F(DBBlobIndexTest, Updated) {
ASSERT_OK(dbfull()->DeleteRange(WriteOptions(), cfh(), "key6", "key9"));
MoveDataTo(tier);
for (int i = 0; i < 10; i++) {
ASSERT_EQ("blob_index", GetBlobIndex("key" + ToString(i), snapshot));
ASSERT_EQ(blob_index, GetBlobIndex("key" + std::to_string(i), snapshot));
}
ASSERT_EQ("new_value", Get("key1"));
if (tier <= kImmutableMemtables) {
@ -230,9 +260,9 @@ TEST_F(DBBlobIndexTest, Updated) {
ASSERT_EQ("NOT_FOUND", Get("key4"));
ASSERT_EQ("a,b,c", GetImpl("key5"));
for (int i = 6; i < 9; i++) {
ASSERT_EQ("NOT_FOUND", Get("key" + ToString(i)));
ASSERT_EQ("NOT_FOUND", Get("key" + std::to_string(i)));
}
ASSERT_EQ("blob_index", GetBlobIndex("key9"));
ASSERT_EQ(blob_index, GetBlobIndex("key9"));
dbfull()->ReleaseSnapshot(snapshot);
}
}
@ -271,7 +301,7 @@ TEST_F(DBBlobIndexTest, Iterate) {
};
auto get_value = [&](int index, int version) {
return get_key(index) + "_value" + ToString(version);
return get_key(index) + "_value" + std::to_string(version);
};
auto check_iterator = [&](Iterator* iterator, Status::Code expected_status,
@ -471,7 +501,7 @@ TEST_F(DBBlobIndexTest, IntegratedBlobIterate) {
auto get_key = [](size_t index) { return ("key" + std::to_string(index)); };
auto get_value = [&](size_t index, size_t version) {
return get_key(index) + "_value" + ToString(version);
return get_key(index) + "_value" + std::to_string(version);
};
auto check_iterator = [&](Iterator* iterator, Status expected_status,

View File

@ -62,9 +62,9 @@ Status BuildTable(
FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions,
std::vector<SequenceNumber> snapshots,
SequenceNumber earliest_write_conflict_snapshot,
SnapshotChecker* snapshot_checker, bool paranoid_file_checks,
InternalStats* internal_stats, IOStatus* io_status,
const std::shared_ptr<IOTracer>& io_tracer,
SequenceNumber job_snapshot, SnapshotChecker* snapshot_checker,
bool paranoid_file_checks, InternalStats* internal_stats,
IOStatus* io_status, const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCreationReason blob_creation_reason, EventLogger* event_logger,
int job_id, const Env::IOPriority io_priority,
TableProperties* table_properties, Env::WriteLifeTimeHint write_hint,
@ -115,6 +115,7 @@ Status BuildTable(
assert(fs);
TableProperties tp;
bool table_file_created = false;
if (iter->Valid() || !range_del_agg->IsEmpty()) {
std::unique_ptr<CompactionFilter> compaction_filter;
if (ioptions.compaction_filter_factory != nullptr &&
@ -158,6 +159,8 @@ Status BuildTable(
file_checksum_func_name);
return s;
}
table_file_created = true;
FileTypeSet tmp_set = ioptions.checksum_handoff_file_types;
file->SetIOPriority(io_priority);
file->SetWriteLifeTimeHint(write_hint);
@ -189,12 +192,14 @@ Status BuildTable(
CompactionIterator c_iter(
iter, tboptions.internal_comparator.user_comparator(), &merge,
kMaxSequenceNumber, &snapshots, earliest_write_conflict_snapshot,
snapshot_checker, env, ShouldReportDetailedTime(env, ioptions.stats),
job_snapshot, snapshot_checker, env,
ShouldReportDetailedTime(env, ioptions.stats),
true /* internal key corruption is not ok */, range_del_agg.get(),
blob_file_builder.get(), ioptions.allow_data_in_errors,
ioptions.enforce_single_del_contracts,
/*compaction=*/nullptr, compaction_filter.get(),
/*shutting_down=*/nullptr,
/*preserve_deletes_seqnum=*/0, /*manual_compaction_paused=*/nullptr,
/*manual_compaction_paused=*/nullptr,
/*manual_compaction_canceled=*/nullptr, db_options.info_log,
full_history_ts_low);
@ -211,7 +216,11 @@ Status BuildTable(
break;
}
builder->Add(key, value);
meta->UpdateBoundaries(key, value, ikey.sequence, ikey.type);
s = meta->UpdateBoundaries(key, value, ikey.sequence, ikey.type);
if (!s.ok()) {
break;
}
// TODO(noetzli): Update stats after flush, too.
if (io_priority == Env::IO_HIGH &&
@ -366,15 +375,17 @@ Status BuildTable(
constexpr IODebugContext* dbg = nullptr;
Status ignored = fs->DeleteFile(fname, IOOptions(), dbg);
ignored.PermitUncheckedError();
if (table_file_created) {
Status ignored = fs->DeleteFile(fname, IOOptions(), dbg);
ignored.PermitUncheckedError();
}
assert(blob_file_additions || blob_file_paths.empty());
if (blob_file_additions) {
for (const std::string& blob_file_path : blob_file_paths) {
ignored = DeleteDBFile(&db_options, blob_file_path, dbname,
/*force_bg=*/false, /*force_fg=*/false);
Status ignored = DeleteDBFile(&db_options, blob_file_path, dbname,
/*force_bg=*/false, /*force_fg=*/false);
ignored.PermitUncheckedError();
TEST_SYNC_POINT("BuildTable::AfterDeleteFile");
}

View File

@ -57,9 +57,9 @@ extern Status BuildTable(
FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions,
std::vector<SequenceNumber> snapshots,
SequenceNumber earliest_write_conflict_snapshot,
SnapshotChecker* snapshot_checker, bool paranoid_file_checks,
InternalStats* internal_stats, IOStatus* io_status,
const std::shared_ptr<IOTracer>& io_tracer,
SequenceNumber job_snapshot, SnapshotChecker* snapshot_checker,
bool paranoid_file_checks, InternalStats* internal_stats,
IOStatus* io_status, const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCreationReason blob_creation_reason,
EventLogger* event_logger = nullptr, int job_id = 0,
const Env::IOPriority io_priority = Env::IO_HIGH,

45
db/c.cc
View File

@ -1163,6 +1163,43 @@ void rocksdb_multi_get_cf(
}
}
void rocksdb_batched_multi_get_cf(rocksdb_t* db,
const rocksdb_readoptions_t* options,
rocksdb_column_family_handle_t* column_family,
size_t num_keys, const char* const* keys_list,
const size_t* keys_list_sizes,
rocksdb_pinnableslice_t** values, char** errs,
const bool sorted_input) {
Slice* key_slices = new Slice[num_keys];
PinnableSlice* value_slices = new PinnableSlice[num_keys];
Status* statuses = new Status[num_keys];
for (size_t i = 0; i < num_keys; ++i) {
key_slices[i] = Slice(keys_list[i], keys_list_sizes[i]);
}
db->rep->MultiGet(options->rep, column_family->rep, num_keys, key_slices,
value_slices, statuses, sorted_input);
for (size_t i = 0; i < num_keys; ++i) {
if (statuses[i].ok()) {
values[i] = new (rocksdb_pinnableslice_t);
values[i]->rep = std::move(value_slices[i]);
errs[i] = nullptr;
} else {
values[i] = nullptr;
if (!statuses[i].IsNotFound()) {
errs[i] = strdup(statuses[i].ToString().c_str());
} else {
errs[i] = nullptr;
}
}
}
delete[] key_slices;
delete[] value_slices;
delete[] statuses;
}
unsigned char rocksdb_key_may_exist(rocksdb_t* db,
const rocksdb_readoptions_t* options,
const char* key, size_t key_len,
@ -4193,6 +4230,14 @@ rocksdb_cache_t* rocksdb_cache_create_lru(size_t capacity) {
return c;
}
rocksdb_cache_t* rocksdb_cache_create_lru_with_strict_capacity_limit(
size_t capacity) {
rocksdb_cache_t* c = new rocksdb_cache_t;
c->rep = NewLRUCache(capacity);
c->rep->SetStrictCapacityLimit(true);
return c;
}
rocksdb_cache_t* rocksdb_cache_create_lru_opts(
rocksdb_lru_cache_options_t* opt) {
rocksdb_cache_t* c = new rocksdb_cache_t;

View File

@ -1260,15 +1260,18 @@ int main(int argc, char** argv) {
rocksdb_writebatch_clear(wb);
rocksdb_writebatch_put_cf(wb, handles[1], "bar", 3, "b", 1);
rocksdb_writebatch_put_cf(wb, handles[1], "box", 3, "c", 1);
rocksdb_writebatch_put_cf(wb, handles[1], "buff", 4, "rocksdb", 7);
rocksdb_writebatch_delete_cf(wb, handles[1], "bar", 3);
rocksdb_write(db, woptions, wb, &err);
CheckNoError(err);
CheckGetCF(db, roptions, handles[1], "baz", NULL);
CheckGetCF(db, roptions, handles[1], "bar", NULL);
CheckGetCF(db, roptions, handles[1], "box", "c");
CheckGetCF(db, roptions, handles[1], "buff", "rocksdb");
CheckPinGetCF(db, roptions, handles[1], "baz", NULL);
CheckPinGetCF(db, roptions, handles[1], "bar", NULL);
CheckPinGetCF(db, roptions, handles[1], "box", "c");
CheckPinGetCF(db, roptions, handles[1], "buff", "rocksdb");
rocksdb_writebatch_destroy(wb);
rocksdb_flush_wal(db, 1, &err);
@ -1299,6 +1302,26 @@ int main(int argc, char** argv) {
Free(&vals[i]);
}
{
const char* batched_keys[4] = {"box", "buff", "barfooxx", "box"};
const size_t batched_keys_sizes[4] = {3, 4, 8, 3};
const char* expected_value[4] = {"c", "rocksdb", NULL, "c"};
char* batched_errs[4];
rocksdb_pinnableslice_t* pvals[4];
rocksdb_batched_multi_get_cf(db, roptions, handles[1], 4, batched_keys,
batched_keys_sizes, pvals, batched_errs,
false);
const char* val;
size_t val_len;
for (i = 0; i < 4; ++i) {
val = rocksdb_pinnableslice_value(pvals[i], &val_len);
CheckNoError(batched_errs[i]);
CheckEqual(expected_value[i], val, val_len);
rocksdb_pinnableslice_destroy(pvals[i]);
}
}
{
unsigned char value_found = 0;
@ -1330,7 +1353,7 @@ int main(int argc, char** argv) {
for (i = 0; rocksdb_iter_valid(iter) != 0; rocksdb_iter_next(iter)) {
i++;
}
CheckCondition(i == 3);
CheckCondition(i == 4);
rocksdb_iter_get_error(iter, &err);
CheckNoError(err);
rocksdb_iter_destroy(iter);
@ -1354,7 +1377,7 @@ int main(int argc, char** argv) {
for (i = 0; rocksdb_iter_valid(iter) != 0; rocksdb_iter_next(iter)) {
i++;
}
CheckCondition(i == 3);
CheckCondition(i == 4);
rocksdb_iter_get_error(iter, &err);
CheckNoError(err);
rocksdb_iter_destroy(iter);
@ -2448,7 +2471,7 @@ int main(int argc, char** argv) {
rocksdb_fifo_compaction_options_destroy(fco);
}
StartPhase("backupable_db_option");
StartPhase("backup_engine_option");
{
rocksdb_backup_engine_options_t* bdo;
bdo = rocksdb_backup_engine_options_create("path");

View File

@ -501,7 +501,8 @@ std::vector<std::string> ColumnFamilyData::GetDbPaths() const {
return paths;
}
const uint32_t ColumnFamilyData::kDummyColumnFamilyDataId = port::kMaxUint32;
const uint32_t ColumnFamilyData::kDummyColumnFamilyDataId =
std::numeric_limits<uint32_t>::max();
ColumnFamilyData::ColumnFamilyData(
uint32_t id, const std::string& name, Version* _dummy_versions,
@ -826,8 +827,8 @@ int GetL0ThresholdSpeedupCompaction(int level0_file_num_compaction_trigger,
// condition.
// Or twice as compaction trigger, if it is smaller.
int64_t res = std::min(twice_level0_trigger, one_fourth_trigger_slowdown);
if (res >= port::kMaxInt32) {
return port::kMaxInt32;
if (res >= std::numeric_limits<int32_t>::max()) {
return std::numeric_limits<int32_t>::max();
} else {
// res fits in int
return static_cast<int>(res);
@ -1562,20 +1563,6 @@ ColumnFamilyData* ColumnFamilySet::CreateColumnFamily(
return new_cfd;
}
// REQUIRES: DB mutex held
void ColumnFamilySet::FreeDeadColumnFamilies() {
autovector<ColumnFamilyData*> to_delete;
for (auto cfd = dummy_cfd_->next_; cfd != dummy_cfd_; cfd = cfd->next_) {
if (cfd->refs_.load(std::memory_order_relaxed) == 0) {
to_delete.push_back(cfd);
}
}
for (auto cfd : to_delete) {
// this is very rare, so it's not a problem that we do it under a mutex
delete cfd;
}
}
// under a DB mutex AND from a write thread
void ColumnFamilySet::RemoveColumnFamily(ColumnFamilyData* cfd) {
auto cfd_iter = column_family_data_.find(cfd->GetID());

View File

@ -9,10 +9,10 @@
#pragma once
#include <unordered_map>
#include <string>
#include <vector>
#include <atomic>
#include <string>
#include <unordered_map>
#include <vector>
#include "db/memtable_list.h"
#include "db/table_cache.h"
@ -25,6 +25,7 @@
#include "rocksdb/env.h"
#include "rocksdb/options.h"
#include "trace_replay/block_cache_tracer.h"
#include "util/hash_containers.h"
#include "util/thread_local.h"
namespace ROCKSDB_NAMESPACE {
@ -520,9 +521,10 @@ class ColumnFamilyData {
ThreadLocalPtr* TEST_GetLocalSV() { return local_sv_.get(); }
WriteBufferManager* write_buffer_mgr() { return write_buffer_manager_; }
static const uint32_t kDummyColumnFamilyDataId;
private:
friend class ColumnFamilySet;
static const uint32_t kDummyColumnFamilyDataId;
ColumnFamilyData(uint32_t id, const std::string& name,
Version* dummy_versions, Cache* table_cache,
WriteBufferManager* write_buffer_manager,
@ -628,10 +630,8 @@ class ColumnFamilyData {
// held and it needs to be executed from the write thread. SetDropped() also
// guarantees that it will be called only from single-threaded LogAndApply(),
// but this condition is not that important.
// * Iteration -- hold DB mutex, but you can release it in the body of
// iteration. If you release DB mutex in body, reference the column
// family before the mutex and unreference after you unlock, since the column
// family might get dropped when the DB mutex is released
// * Iteration -- hold DB mutex. If you want to release the DB mutex in the
// body of the iteration, wrap in a RefedColumnFamilySet.
// * GetDefault() -- thread safe
// * GetColumnFamily() -- either inside of DB mutex or from a write thread
// * GetNextColumnFamilyID(), GetMaxColumnFamily(), UpdateMaxColumnFamily(),
@ -643,17 +643,12 @@ class ColumnFamilySet {
public:
explicit iterator(ColumnFamilyData* cfd)
: current_(cfd) {}
// NOTE: minimum operators for for-loop iteration
iterator& operator++() {
// dropped column families might still be included in this iteration
// (we're only removing them when client drops the last reference to the
// column family).
// dummy is never dead, so this will never be infinite
do {
current_ = current_->next_;
} while (current_->refs_.load(std::memory_order_relaxed) == 0);
current_ = current_->next_;
return *this;
}
bool operator!=(const iterator& other) {
bool operator!=(const iterator& other) const {
return this->current_ != other.current_;
}
ColumnFamilyData* operator*() { return current_; }
@ -692,10 +687,6 @@ class ColumnFamilySet {
iterator begin() { return iterator(dummy_cfd_->next_); }
iterator end() { return iterator(dummy_cfd_); }
// REQUIRES: DB mutex held
// Don't call while iterating over ColumnFamilySet
void FreeDeadColumnFamilies();
Cache* get_table_cache() { return table_cache_; }
WriteBufferManager* write_buffer_manager() { return write_buffer_manager_; }
@ -715,8 +706,8 @@ class ColumnFamilySet {
// * when reading, at least one condition needs to be satisfied:
// 1. DB mutex locked
// 2. accessed from a single-threaded write thread
std::unordered_map<std::string, uint32_t> column_families_;
std::unordered_map<uint32_t, ColumnFamilyData*> column_family_data_;
UnorderedMap<std::string, uint32_t> column_families_;
UnorderedMap<uint32_t, ColumnFamilyData*> column_family_data_;
uint32_t max_column_family_;
const FileOptions file_options_;
@ -738,6 +729,55 @@ class ColumnFamilySet {
std::string db_session_id_;
};
// A wrapper for ColumnFamilySet that supports releasing DB mutex during each
// iteration over the iterator, because the cfd is Refed and Unrefed during
// each iteration to prevent concurrent CF drop from destroying it (until
// Unref).
class RefedColumnFamilySet {
public:
explicit RefedColumnFamilySet(ColumnFamilySet* cfs) : wrapped_(cfs) {}
class iterator {
public:
explicit iterator(ColumnFamilySet::iterator wrapped) : wrapped_(wrapped) {
MaybeRef(*wrapped_);
}
~iterator() { MaybeUnref(*wrapped_); }
inline void MaybeRef(ColumnFamilyData* cfd) {
if (cfd->GetID() != ColumnFamilyData::kDummyColumnFamilyDataId) {
cfd->Ref();
}
}
inline void MaybeUnref(ColumnFamilyData* cfd) {
if (cfd->GetID() != ColumnFamilyData::kDummyColumnFamilyDataId) {
cfd->UnrefAndTryDelete();
}
}
// NOTE: minimum operators for for-loop iteration
inline iterator& operator++() {
ColumnFamilyData* old = *wrapped_;
++wrapped_;
// Can only unref & potentially free cfd after accessing its next_
MaybeUnref(old);
MaybeRef(*wrapped_);
return *this;
}
inline bool operator!=(const iterator& other) const {
return this->wrapped_ != other.wrapped_;
}
inline ColumnFamilyData* operator*() { return *wrapped_; }
private:
ColumnFamilySet::iterator wrapped_;
};
iterator begin() { return iterator(wrapped_->begin()); }
iterator end() { return iterator(wrapped_->end()); }
private:
ColumnFamilySet* wrapped_;
};
// We use ColumnFamilyMemTablesImpl to provide WriteBatch a way to access
// memtables of different column families (specified by ID in the write batch)
class ColumnFamilyMemTablesImpl : public ColumnFamilyMemTables {

View File

@ -383,7 +383,7 @@ class ColumnFamilyTestBase : public testing::Test {
int NumTableFilesAtLevel(int level, int cf) {
return GetProperty(cf,
"rocksdb.num-files-at-level" + ToString(level));
"rocksdb.num-files-at-level" + std::to_string(level));
}
#ifndef ROCKSDB_LITE
@ -783,7 +783,7 @@ TEST_P(ColumnFamilyTest, BulkAddDrop) {
std::vector<std::string> cf_names;
std::vector<ColumnFamilyHandle*> cf_handles;
for (int i = 1; i <= kNumCF; i++) {
cf_names.push_back("cf1-" + ToString(i));
cf_names.push_back("cf1-" + std::to_string(i));
}
ASSERT_OK(db_->CreateColumnFamilies(cf_options, cf_names, &cf_handles));
for (int i = 1; i <= kNumCF; i++) {
@ -796,7 +796,8 @@ TEST_P(ColumnFamilyTest, BulkAddDrop) {
}
cf_handles.clear();
for (int i = 1; i <= kNumCF; i++) {
cf_descriptors.emplace_back("cf2-" + ToString(i), ColumnFamilyOptions());
cf_descriptors.emplace_back("cf2-" + std::to_string(i),
ColumnFamilyOptions());
}
ASSERT_OK(db_->CreateColumnFamilies(cf_descriptors, &cf_handles));
for (int i = 1; i <= kNumCF; i++) {
@ -820,7 +821,7 @@ TEST_P(ColumnFamilyTest, DropTest) {
Open({"default"});
CreateColumnFamiliesAndReopen({"pikachu"});
for (int i = 0; i < 100; ++i) {
ASSERT_OK(Put(1, ToString(i), "bar" + ToString(i)));
ASSERT_OK(Put(1, std::to_string(i), "bar" + std::to_string(i)));
}
ASSERT_OK(Flush(1));
@ -1344,7 +1345,7 @@ TEST_P(ColumnFamilyTest, DifferentCompactionStyles) {
PutRandomData(1, 10, 12000);
PutRandomData(1, 1, 10);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
// SETUP column family "two" -- level style with 4 levels
@ -1352,7 +1353,7 @@ TEST_P(ColumnFamilyTest, DifferentCompactionStyles) {
PutRandomData(2, 10, 12000);
PutRandomData(2, 1, 10);
WaitForFlush(2);
AssertFilesPerLevel(ToString(i + 1), 2);
AssertFilesPerLevel(std::to_string(i + 1), 2);
}
// TRIGGER compaction "one"
@ -1416,7 +1417,7 @@ TEST_P(ColumnFamilyTest, MultipleManualCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
bool cf_1_1 = true;
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
@ -1446,7 +1447,7 @@ TEST_P(ColumnFamilyTest, MultipleManualCompactions) {
PutRandomData(2, 10, 12000);
PutRandomData(2, 1, 10);
WaitForFlush(2);
AssertFilesPerLevel(ToString(i + 1), 2);
AssertFilesPerLevel(std::to_string(i + 1), 2);
}
threads.emplace_back([&] {
TEST_SYNC_POINT("ColumnFamilyTest::MultiManual:1");
@ -1533,7 +1534,7 @@ TEST_P(ColumnFamilyTest, AutomaticAndManualCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
TEST_SYNC_POINT("ColumnFamilyTest::AutoManual:1");
@ -1543,7 +1544,7 @@ TEST_P(ColumnFamilyTest, AutomaticAndManualCompactions) {
PutRandomData(2, 10, 12000);
PutRandomData(2, 1, 10);
WaitForFlush(2);
AssertFilesPerLevel(ToString(i + 1), 2);
AssertFilesPerLevel(std::to_string(i + 1), 2);
}
ROCKSDB_NAMESPACE::port::Thread threads([&] {
CompactRangeOptions compact_options;
@ -1615,7 +1616,7 @@ TEST_P(ColumnFamilyTest, ManualAndAutomaticCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
bool cf_1_1 = true;
bool cf_1_2 = true;
@ -1650,7 +1651,7 @@ TEST_P(ColumnFamilyTest, ManualAndAutomaticCompactions) {
PutRandomData(2, 10, 12000);
PutRandomData(2, 1, 10);
WaitForFlush(2);
AssertFilesPerLevel(ToString(i + 1), 2);
AssertFilesPerLevel(std::to_string(i + 1), 2);
}
TEST_SYNC_POINT("ColumnFamilyTest::ManualAuto:5");
threads.join();
@ -1709,7 +1710,7 @@ TEST_P(ColumnFamilyTest, SameCFManualManualCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
bool cf_1_1 = true;
bool cf_1_2 = true;
@ -1748,8 +1749,8 @@ TEST_P(ColumnFamilyTest, SameCFManualManualCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(one.level0_file_num_compaction_trigger + i),
1);
AssertFilesPerLevel(
std::to_string(one.level0_file_num_compaction_trigger + i), 1);
}
ROCKSDB_NAMESPACE::port::Thread threads1([&] {
@ -1811,7 +1812,7 @@ TEST_P(ColumnFamilyTest, SameCFManualAutomaticCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
bool cf_1_1 = true;
bool cf_1_2 = true;
@ -1849,8 +1850,8 @@ TEST_P(ColumnFamilyTest, SameCFManualAutomaticCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(one.level0_file_num_compaction_trigger + i),
1);
AssertFilesPerLevel(
std::to_string(one.level0_file_num_compaction_trigger + i), 1);
}
TEST_SYNC_POINT("ColumnFamilyTest::ManualAuto:1");
@ -1904,7 +1905,7 @@ TEST_P(ColumnFamilyTest, SameCFManualAutomaticCompactionsLevel) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
bool cf_1_1 = true;
bool cf_1_2 = true;
@ -1942,8 +1943,8 @@ TEST_P(ColumnFamilyTest, SameCFManualAutomaticCompactionsLevel) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(one.level0_file_num_compaction_trigger + i),
1);
AssertFilesPerLevel(
std::to_string(one.level0_file_num_compaction_trigger + i), 1);
}
TEST_SYNC_POINT("ColumnFamilyTest::ManualAuto:1");
@ -2024,7 +2025,7 @@ TEST_P(ColumnFamilyTest, SameCFAutomaticManualCompactions) {
PutRandomData(1, 10, 12000, true);
PutRandomData(1, 1, 10, true);
WaitForFlush(1);
AssertFilesPerLevel(ToString(i + 1), 1);
AssertFilesPerLevel(std::to_string(i + 1), 1);
}
TEST_SYNC_POINT("ColumnFamilyTest::AutoManual:5");

View File

@ -91,8 +91,8 @@ TEST_F(CompactFilesTest, L0ConflictsFiles) {
// create couple files
// Background compaction starts and waits in BackgroundCallCompaction:0
for (int i = 0; i < kLevel0Trigger * 4; ++i) {
ASSERT_OK(db->Put(WriteOptions(), ToString(i), ""));
ASSERT_OK(db->Put(WriteOptions(), ToString(100 - i), ""));
ASSERT_OK(db->Put(WriteOptions(), std::to_string(i), ""));
ASSERT_OK(db->Put(WriteOptions(), std::to_string(100 - i), ""));
ASSERT_OK(db->Flush(FlushOptions()));
}
@ -136,7 +136,7 @@ TEST_F(CompactFilesTest, MultipleLevel) {
// create couple files in L0, L3, L4 and L5
for (int i = 5; i > 2; --i) {
collector->ClearFlushedFiles();
ASSERT_OK(db->Put(WriteOptions(), ToString(i), ""));
ASSERT_OK(db->Put(WriteOptions(), std::to_string(i), ""));
ASSERT_OK(db->Flush(FlushOptions()));
// Ensure background work is fully finished including listener callbacks
// before accessing listener state.
@ -145,11 +145,11 @@ TEST_F(CompactFilesTest, MultipleLevel) {
ASSERT_OK(db->CompactFiles(CompactionOptions(), l0_files, i));
std::string prop;
ASSERT_TRUE(
db->GetProperty("rocksdb.num-files-at-level" + ToString(i), &prop));
ASSERT_TRUE(db->GetProperty(
"rocksdb.num-files-at-level" + std::to_string(i), &prop));
ASSERT_EQ("1", prop);
}
ASSERT_OK(db->Put(WriteOptions(), ToString(0), ""));
ASSERT_OK(db->Put(WriteOptions(), std::to_string(0), ""));
ASSERT_OK(db->Flush(FlushOptions()));
ColumnFamilyMetaData meta;
@ -218,7 +218,7 @@ TEST_F(CompactFilesTest, ObsoleteFiles) {
// create couple files
for (int i = 1000; i < 2000; ++i) {
ASSERT_OK(db->Put(WriteOptions(), ToString(i),
ASSERT_OK(db->Put(WriteOptions(), std::to_string(i),
std::string(kWriteBufferSize / 10, 'a' + (i % 26))));
}
@ -257,14 +257,14 @@ TEST_F(CompactFilesTest, NotCutOutputOnLevel0) {
// create couple files
for (int i = 0; i < 500; ++i) {
ASSERT_OK(db->Put(WriteOptions(), ToString(i),
ASSERT_OK(db->Put(WriteOptions(), std::to_string(i),
std::string(1000, 'a' + (i % 26))));
}
ASSERT_OK(static_cast_with_check<DBImpl>(db)->TEST_WaitForFlushMemTable());
auto l0_files_1 = collector->GetFlushedFiles();
collector->ClearFlushedFiles();
for (int i = 0; i < 500; ++i) {
ASSERT_OK(db->Put(WriteOptions(), ToString(i),
ASSERT_OK(db->Put(WriteOptions(), std::to_string(i),
std::string(1000, 'a' + (i % 26))));
}
ASSERT_OK(static_cast_with_check<DBImpl>(db)->TEST_WaitForFlushMemTable());
@ -295,7 +295,7 @@ TEST_F(CompactFilesTest, CapturingPendingFiles) {
// Create 5 files.
for (int i = 0; i < 5; ++i) {
ASSERT_OK(db->Put(WriteOptions(), "key" + ToString(i), "value"));
ASSERT_OK(db->Put(WriteOptions(), "key" + std::to_string(i), "value"));
ASSERT_OK(db->Flush(FlushOptions()));
}
@ -465,7 +465,7 @@ TEST_F(CompactFilesTest, GetCompactionJobInfo) {
// create couple files
for (int i = 0; i < 500; ++i) {
ASSERT_OK(db->Put(WriteOptions(), ToString(i),
ASSERT_OK(db->Put(WriteOptions(), std::to_string(i),
std::string(1000, 'a' + (i % 26))));
}
ASSERT_OK(static_cast_with_check<DBImpl>(db)->TEST_WaitForFlushMemTable());

View File

@ -518,7 +518,7 @@ uint64_t Compaction::OutputFilePreallocationSize() const {
}
}
if (max_output_file_size_ != port::kMaxUint64 &&
if (max_output_file_size_ != std::numeric_limits<uint64_t>::max() &&
(immutable_options_.compaction_style == kCompactionStyleLevel ||
output_level() > 0)) {
preallocation_size = std::min(max_output_file_size_, preallocation_size);
@ -616,7 +616,7 @@ bool Compaction::DoesInputReferenceBlobFiles() const {
uint64_t Compaction::MinInputFileOldestAncesterTime(
const InternalKey* start, const InternalKey* end) const {
uint64_t min_oldest_ancester_time = port::kMaxUint64;
uint64_t min_oldest_ancester_time = std::numeric_limits<uint64_t>::max();
const InternalKeyComparator& icmp =
column_family_data()->internal_comparator();
for (const auto& level_files : inputs_) {

View File

@ -24,40 +24,39 @@ CompactionIterator::CompactionIterator(
InternalIterator* input, const Comparator* cmp, MergeHelper* merge_helper,
SequenceNumber last_sequence, std::vector<SequenceNumber>* snapshots,
SequenceNumber earliest_write_conflict_snapshot,
const SnapshotChecker* snapshot_checker, Env* env,
bool report_detailed_time, bool expect_valid_internal_key,
SequenceNumber job_snapshot, const SnapshotChecker* snapshot_checker,
Env* env, bool report_detailed_time, bool expect_valid_internal_key,
CompactionRangeDelAggregator* range_del_agg,
BlobFileBuilder* blob_file_builder, bool allow_data_in_errors,
const Compaction* compaction, const CompactionFilter* compaction_filter,
bool enforce_single_del_contracts, const Compaction* compaction,
const CompactionFilter* compaction_filter,
const std::atomic<bool>* shutting_down,
const SequenceNumber preserve_deletes_seqnum,
const std::atomic<int>* manual_compaction_paused,
const std::atomic<bool>* manual_compaction_canceled,
const std::shared_ptr<Logger> info_log,
const std::string* full_history_ts_low)
: CompactionIterator(
input, cmp, merge_helper, last_sequence, snapshots,
earliest_write_conflict_snapshot, snapshot_checker, env,
earliest_write_conflict_snapshot, job_snapshot, snapshot_checker, env,
report_detailed_time, expect_valid_internal_key, range_del_agg,
blob_file_builder, allow_data_in_errors,
blob_file_builder, allow_data_in_errors, enforce_single_del_contracts,
std::unique_ptr<CompactionProxy>(
compaction ? new RealCompaction(compaction) : nullptr),
compaction_filter, shutting_down, preserve_deletes_seqnum,
manual_compaction_paused, manual_compaction_canceled, info_log,
full_history_ts_low) {}
compaction_filter, shutting_down, manual_compaction_paused,
manual_compaction_canceled, info_log, full_history_ts_low) {}
CompactionIterator::CompactionIterator(
InternalIterator* input, const Comparator* cmp, MergeHelper* merge_helper,
SequenceNumber /*last_sequence*/, std::vector<SequenceNumber>* snapshots,
SequenceNumber earliest_write_conflict_snapshot,
const SnapshotChecker* snapshot_checker, Env* env,
bool report_detailed_time, bool expect_valid_internal_key,
SequenceNumber job_snapshot, const SnapshotChecker* snapshot_checker,
Env* env, bool report_detailed_time, bool expect_valid_internal_key,
CompactionRangeDelAggregator* range_del_agg,
BlobFileBuilder* blob_file_builder, bool allow_data_in_errors,
bool enforce_single_del_contracts,
std::unique_ptr<CompactionProxy> compaction,
const CompactionFilter* compaction_filter,
const std::atomic<bool>* shutting_down,
const SequenceNumber preserve_deletes_seqnum,
const std::atomic<int>* manual_compaction_paused,
const std::atomic<bool>* manual_compaction_canceled,
const std::shared_ptr<Logger> info_log,
@ -68,6 +67,7 @@ CompactionIterator::CompactionIterator(
merge_helper_(merge_helper),
snapshots_(snapshots),
earliest_write_conflict_snapshot_(earliest_write_conflict_snapshot),
job_snapshot_(job_snapshot),
snapshot_checker_(snapshot_checker),
env_(env),
clock_(env_->GetSystemClock().get()),
@ -80,9 +80,9 @@ CompactionIterator::CompactionIterator(
shutting_down_(shutting_down),
manual_compaction_paused_(manual_compaction_paused),
manual_compaction_canceled_(manual_compaction_canceled),
preserve_deletes_seqnum_(preserve_deletes_seqnum),
info_log_(info_log),
allow_data_in_errors_(allow_data_in_errors),
enforce_single_del_contracts_(enforce_single_del_contracts),
timestamp_size_(cmp_ ? cmp_->timestamp_size() : 0),
full_history_ts_low_(full_history_ts_low),
current_user_key_sequence_(0),
@ -237,6 +237,10 @@ bool CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
return false;
}
TEST_SYNC_POINT_CALLBACK(
"CompactionIterator::InvokeFilterIfNeeded::TamperWithBlobIndex",
&value_);
// For integrated BlobDB impl, CompactionIterator reads blob value.
// For Stacked BlobDB impl, the corresponding CompactionFilter's
// FilterV2 method should read the blob value.
@ -306,6 +310,14 @@ bool CompactionIterator::InvokeFilterIfNeeded(bool* need_skip,
// no value associated with delete
value_.clear();
iter_stats_.num_record_drop_user++;
} else if (filter == CompactionFilter::Decision::kPurge) {
// convert the current key to a single delete; key_ is pointing into
// current_key_ at this point, so updating current_key_ updates key()
ikey_.type = kTypeSingleDeletion;
current_key_.UpdateInternalKey(ikey_.sequence, kTypeSingleDeletion);
// no value associated with single delete
value_.clear();
iter_stats_.num_record_drop_user++;
} else if (filter == CompactionFilter::Decision::kChangeValue) {
if (ikey_.type == kTypeBlobIndex) {
// value transfer from blob file to inlined data
@ -624,24 +636,39 @@ void CompactionIterator::NextFromInput() {
TEST_SYNC_POINT_CALLBACK(
"CompactionIterator::NextFromInput:SingleDelete:2", nullptr);
if (next_ikey.type == kTypeSingleDeletion ||
next_ikey.type == kTypeDeletion) {
if (next_ikey.type == kTypeSingleDeletion) {
// We encountered two SingleDeletes for same key in a row. This
// could be due to unexpected user input. If write-(un)prepared
// transaction is used, this could also be due to releasing an old
// snapshot between a Put and its matching SingleDelete.
// Furthermore, if write-(un)prepared transaction is rolled back
// after prepare, we will write a Delete to cancel a prior Put. If
// old snapshot is released between a later Put and its matching
// SingleDelete, we will end up with a Delete followed by
// SingleDelete.
// Skip the first SingleDelete and let the next iteration decide
// how to handle the second SingleDelete or Delete.
// how to handle the second SingleDelete.
// First SingleDelete has been skipped since we already called
// input_.Next().
++iter_stats_.num_record_drop_obsolete;
++iter_stats_.num_single_del_mismatch;
} else if (next_ikey.type == kTypeDeletion) {
std::ostringstream oss;
oss << "Found SD and type: " << static_cast<int>(next_ikey.type)
<< " on the same key, violating the contract "
"of SingleDelete. Check your application to make sure the "
"application does not mix SingleDelete and Delete for "
"the same key. If you are using "
"write-prepared/write-unprepared transactions, and use "
"SingleDelete to delete certain keys, then make sure "
"TransactionDBOptions::rollback_deletion_type_callback is "
"configured properly. Mixing SD and DEL can lead to "
"undefined behaviors";
++iter_stats_.num_record_drop_obsolete;
++iter_stats_.num_single_del_mismatch;
if (enforce_single_del_contracts_) {
ROCKS_LOG_ERROR(info_log_, "%s", oss.str().c_str());
valid_ = false;
status_ = Status::Corruption(oss.str());
return;
}
ROCKS_LOG_WARN(info_log_, "%s", oss.str().c_str());
} else if (!is_timestamp_eligible_for_gc) {
// We cannot drop the SingleDelete as timestamp is enabled, and
// timestamp of this key is greater than or equal to
@ -758,7 +785,6 @@ void CompactionIterator::NextFromInput() {
(ikey_.type == kTypeDeletionWithTimestamp &&
cmp_with_history_ts_low_ < 0)) &&
DefinitelyInSnapshot(ikey_.sequence, earliest_snapshot_) &&
ikeyNotNeededForIncrementalSnapshot() &&
compaction_->KeyNotExistsBeyondOutputLevel(ikey_.user_key,
&level_ptrs_)) {
// TODO(noetzli): This is the only place where we use compaction_
@ -792,7 +818,7 @@ void CompactionIterator::NextFromInput() {
} else if ((ikey_.type == kTypeDeletion ||
(ikey_.type == kTypeDeletionWithTimestamp &&
cmp_with_history_ts_low_ < 0)) &&
bottommost_level_ && ikeyNotNeededForIncrementalSnapshot()) {
bottommost_level_) {
// Handle the case where we have a delete key at the bottom most level
// We can skip outputting the key iff there are no subsequent puts for this
// key
@ -954,6 +980,10 @@ void CompactionIterator::GarbageCollectBlobIfNeeded() {
// GC for integrated BlobDB
if (compaction_->enable_blob_garbage_collection()) {
TEST_SYNC_POINT_CALLBACK(
"CompactionIterator::GarbageCollectBlobIfNeeded::TamperWithBlobIndex",
&value_);
BlobIndex blob_index;
{
@ -1060,10 +1090,9 @@ void CompactionIterator::PrepareOutput() {
// Can we do the same for levels above bottom level as long as
// KeyNotExistsBeyondOutputLevel() return true?
if (valid_ && compaction_ != nullptr &&
!compaction_->allow_ingest_behind() &&
ikeyNotNeededForIncrementalSnapshot() && bottommost_level_ &&
!compaction_->allow_ingest_behind() && bottommost_level_ &&
DefinitelyInSnapshot(ikey_.sequence, earliest_snapshot_) &&
ikey_.type != kTypeMerge) {
ikey_.type != kTypeMerge && current_key_committed_) {
assert(ikey_.type != kTypeDeletion);
assert(ikey_.type != kTypeSingleDeletion ||
(timestamp_size_ || full_history_ts_low_));
@ -1139,13 +1168,6 @@ inline SequenceNumber CompactionIterator::findEarliestVisibleSnapshot(
return kMaxSequenceNumber;
}
// used in 2 places - prevents deletion markers to be dropped if they may be
// needed and disables seqnum zero-out in PrepareOutput for recent keys.
inline bool CompactionIterator::ikeyNotNeededForIncrementalSnapshot() {
return (!compaction_->preserve_deletes()) ||
(ikey_.sequence < preserve_deletes_seqnum_);
}
uint64_t CompactionIterator::ComputeBlobGarbageCollectionCutoffFileNumber(
const CompactionProxy* compaction) {
if (!compaction) {

View File

@ -92,8 +92,6 @@ class CompactionIterator {
virtual bool allow_ingest_behind() const = 0;
virtual bool preserve_deletes() const = 0;
virtual bool allow_mmap_reads() const = 0;
virtual bool enable_blob_garbage_collection() const = 0;
@ -139,8 +137,6 @@ class CompactionIterator {
return compaction_->immutable_options()->allow_ingest_behind;
}
bool preserve_deletes() const override { return false; }
bool allow_mmap_reads() const override {
return compaction_->immutable_options()->allow_mmap_reads;
}
@ -176,14 +172,13 @@ class CompactionIterator {
InternalIterator* input, const Comparator* cmp, MergeHelper* merge_helper,
SequenceNumber last_sequence, std::vector<SequenceNumber>* snapshots,
SequenceNumber earliest_write_conflict_snapshot,
const SnapshotChecker* snapshot_checker, Env* env,
bool report_detailed_time, bool expect_valid_internal_key,
SequenceNumber job_snapshot, const SnapshotChecker* snapshot_checker,
Env* env, bool report_detailed_time, bool expect_valid_internal_key,
CompactionRangeDelAggregator* range_del_agg,
BlobFileBuilder* blob_file_builder, bool allow_data_in_errors,
const Compaction* compaction = nullptr,
bool enforce_single_del_contracts, const Compaction* compaction = nullptr,
const CompactionFilter* compaction_filter = nullptr,
const std::atomic<bool>* shutting_down = nullptr,
const SequenceNumber preserve_deletes_seqnum = 0,
const std::atomic<int>* manual_compaction_paused = nullptr,
const std::atomic<bool>* manual_compaction_canceled = nullptr,
const std::shared_ptr<Logger> info_log = nullptr,
@ -194,14 +189,14 @@ class CompactionIterator {
InternalIterator* input, const Comparator* cmp, MergeHelper* merge_helper,
SequenceNumber last_sequence, std::vector<SequenceNumber>* snapshots,
SequenceNumber earliest_write_conflict_snapshot,
const SnapshotChecker* snapshot_checker, Env* env,
bool report_detailed_time, bool expect_valid_internal_key,
SequenceNumber job_snapshot, const SnapshotChecker* snapshot_checker,
Env* env, bool report_detailed_time, bool expect_valid_internal_key,
CompactionRangeDelAggregator* range_del_agg,
BlobFileBuilder* blob_file_builder, bool allow_data_in_errors,
bool enforce_single_del_contracts,
std::unique_ptr<CompactionProxy> compaction,
const CompactionFilter* compaction_filter = nullptr,
const std::atomic<bool>* shutting_down = nullptr,
const SequenceNumber preserve_deletes_seqnum = 0,
const std::atomic<int>* manual_compaction_paused = nullptr,
const std::atomic<bool>* manual_compaction_canceled = nullptr,
const std::shared_ptr<Logger> info_log = nullptr,
@ -272,14 +267,9 @@ class CompactionIterator {
inline SequenceNumber findEarliestVisibleSnapshot(
SequenceNumber in, SequenceNumber* prev_snapshot);
// Checks whether the currently seen ikey_ is needed for
// incremental (differential) snapshot and hence can't be dropped
// or seqnum be zero-ed out even if all other conditions for it are met.
inline bool ikeyNotNeededForIncrementalSnapshot();
inline bool KeyCommitted(SequenceNumber sequence) {
return snapshot_checker_ == nullptr ||
snapshot_checker_->CheckInSnapshot(sequence, kMaxSequenceNumber) ==
snapshot_checker_->CheckInSnapshot(sequence, job_snapshot_) ==
SnapshotCheckerResult::kInSnapshot;
}
@ -320,6 +310,7 @@ class CompactionIterator {
std::unordered_set<SequenceNumber> released_snapshots_;
std::vector<SequenceNumber>::const_iterator earliest_snapshot_iter_;
const SequenceNumber earliest_write_conflict_snapshot_;
const SequenceNumber job_snapshot_;
const SnapshotChecker* const snapshot_checker_;
Env* env_;
SystemClock* clock_;
@ -332,7 +323,6 @@ class CompactionIterator {
const std::atomic<bool>* shutting_down_;
const std::atomic<int>* manual_compaction_paused_;
const std::atomic<bool>* manual_compaction_canceled_;
const SequenceNumber preserve_deletes_seqnum_;
bool bottommost_level_;
bool valid_ = false;
bool visible_at_tip_;
@ -343,6 +333,8 @@ class CompactionIterator {
bool allow_data_in_errors_;
const bool enforce_single_del_contracts_;
// Comes from comparator.
const size_t timestamp_size_;

View File

@ -166,8 +166,6 @@ class FakeCompaction : public CompactionIterator::CompactionProxy {
bool allow_ingest_behind() const override { return is_allow_ingest_behind; }
bool preserve_deletes() const override { return false; }
bool allow_mmap_reads() const override { return false; }
bool enable_blob_garbage_collection() const override { return false; }
@ -277,11 +275,12 @@ class CompactionIteratorTest : public testing::TestWithParam<bool> {
iter_->SeekToFirst();
c_iter_.reset(new CompactionIterator(
iter_.get(), cmp_, merge_helper_.get(), last_sequence, &snapshots_,
earliest_write_conflict_snapshot, snapshot_checker_.get(),
Env::Default(), false /* report_detailed_time */, false,
range_del_agg_.get(), nullptr /* blob_file_builder */,
true /*allow_data_in_errors*/, std::move(compaction), filter,
&shutting_down_, /*preserve_deletes_seqnum=*/0,
earliest_write_conflict_snapshot, kMaxSequenceNumber,
snapshot_checker_.get(), Env::Default(),
false /* report_detailed_time */, false, range_del_agg_.get(),
nullptr /* blob_file_builder */, true /*allow_data_in_errors*/,
true /*enforce_single_del_contracts*/, std::move(compaction), filter,
&shutting_down_,
/*manual_compaction_paused=*/nullptr,
/*manual_compaction_canceled=*/nullptr, /*info_log=*/nullptr,
full_history_ts_low));
@ -315,7 +314,7 @@ class CompactionIteratorTest : public testing::TestWithParam<bool> {
key_not_exists_beyond_output_level, full_history_ts_low);
c_iter_->SeekToFirst();
for (size_t i = 0; i < expected_keys.size(); i++) {
std::string info = "i = " + ToString(i);
std::string info = "i = " + std::to_string(i);
ASSERT_TRUE(c_iter_->Valid()) << info;
ASSERT_OK(c_iter_->status()) << info;
ASSERT_EQ(expected_keys[i], c_iter_->key().ToString()) << info;

View File

@ -417,16 +417,17 @@ CompactionJob::CompactionJob(
int job_id, Compaction* compaction, const ImmutableDBOptions& db_options,
const MutableDBOptions& mutable_db_options, const FileOptions& file_options,
VersionSet* versions, const std::atomic<bool>* shutting_down,
const SequenceNumber preserve_deletes_seqnum, LogBuffer* log_buffer,
FSDirectory* db_directory, FSDirectory* output_directory,
FSDirectory* blob_output_directory, Statistics* stats,
InstrumentedMutex* db_mutex, ErrorHandler* db_error_handler,
LogBuffer* log_buffer, FSDirectory* db_directory,
FSDirectory* output_directory, FSDirectory* blob_output_directory,
Statistics* stats, InstrumentedMutex* db_mutex,
ErrorHandler* db_error_handler,
std::vector<SequenceNumber> existing_snapshots,
SequenceNumber earliest_write_conflict_snapshot,
const SnapshotChecker* snapshot_checker, std::shared_ptr<Cache> table_cache,
EventLogger* event_logger, bool paranoid_file_checks, bool measure_io_stats,
const std::string& dbname, CompactionJobStats* compaction_job_stats,
Env::Priority thread_pri, const std::shared_ptr<IOTracer>& io_tracer,
const SnapshotChecker* snapshot_checker, JobContext* job_context,
std::shared_ptr<Cache> table_cache, EventLogger* event_logger,
bool paranoid_file_checks, bool measure_io_stats, const std::string& dbname,
CompactionJobStats* compaction_job_stats, Env::Priority thread_pri,
const std::shared_ptr<IOTracer>& io_tracer,
const std::atomic<int>* manual_compaction_paused,
const std::atomic<bool>* manual_compaction_canceled,
const std::string& db_id, const std::string& db_session_id,
@ -456,7 +457,6 @@ CompactionJob::CompactionJob(
shutting_down_(shutting_down),
manual_compaction_paused_(manual_compaction_paused),
manual_compaction_canceled_(manual_compaction_canceled),
preserve_deletes_seqnum_(preserve_deletes_seqnum),
db_directory_(db_directory),
blob_output_directory_(blob_output_directory),
db_mutex_(db_mutex),
@ -464,6 +464,7 @@ CompactionJob::CompactionJob(
existing_snapshots_(std::move(existing_snapshots)),
earliest_write_conflict_snapshot_(earliest_write_conflict_snapshot),
snapshot_checker_(snapshot_checker),
job_context_(job_context),
table_cache_(std::move(table_cache)),
event_logger_(event_logger),
paranoid_file_checks_(paranoid_file_checks),
@ -1252,7 +1253,7 @@ void CompactionJob::NotifyOnSubcompactionBegin(
if (shutting_down_->load(std::memory_order_acquire)) {
return;
}
if (c->is_manual_compaction() &&
if (c->is_manual_compaction() && manual_compaction_paused_ &&
manual_compaction_paused_->load(std::memory_order_acquire) > 0) {
return;
}
@ -1341,6 +1342,8 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
CompactionRangeDelAggregator range_del_agg(&cfd->internal_comparator(),
existing_snapshots_);
// TODO: since we already use C++17, should use
// std::optional<const Slice> instead.
const Slice* const start = sub_compact->start;
const Slice* const end = sub_compact->end;
@ -1362,9 +1365,13 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
// Although the v2 aggregator is what the level iterator(s) know about,
// the AddTombstones calls will be propagated down to the v1 aggregator.
std::unique_ptr<InternalIterator> raw_input(
versions_->MakeInputIterator(read_options, sub_compact->compaction,
&range_del_agg, file_options_for_read_));
std::unique_ptr<InternalIterator> raw_input(versions_->MakeInputIterator(
read_options, sub_compact->compaction, &range_del_agg,
file_options_for_read_,
(start == nullptr) ? std::optional<const Slice>{}
: std::optional<const Slice>{*start},
(end == nullptr) ? std::optional<const Slice>{}
: std::optional<const Slice>{*end}));
InternalIterator* input = raw_input.get();
IterKey start_ikey;
@ -1463,14 +1470,17 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
Status status;
const std::string* const full_history_ts_low =
full_history_ts_low_.empty() ? nullptr : &full_history_ts_low_;
const SequenceNumber job_snapshot_seq =
job_context_ ? job_context_->GetJobSnapshotSequence()
: kMaxSequenceNumber;
sub_compact->c_iter.reset(new CompactionIterator(
input, cfd->user_comparator(), &merge, versions_->LastSequence(),
&existing_snapshots_, earliest_write_conflict_snapshot_,
&existing_snapshots_, earliest_write_conflict_snapshot_, job_snapshot_seq,
snapshot_checker_, env_, ShouldReportDetailedTime(env_, stats_),
/*expect_valid_internal_key=*/true, &range_del_agg,
blob_file_builder.get(), db_options_.allow_data_in_errors,
sub_compact->compaction, compaction_filter, shutting_down_,
preserve_deletes_seqnum_, manual_compaction_paused_,
db_options_.enforce_single_del_contracts, sub_compact->compaction,
compaction_filter, shutting_down_, manual_compaction_paused_,
manual_compaction_canceled_, db_options_.info_log, full_history_ts_low));
auto c_iter = sub_compact->c_iter.get();
c_iter->SeekToFirst();
@ -1523,11 +1533,15 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
break;
}
const ParsedInternalKey& ikey = c_iter->ikey();
status = sub_compact->current_output()->meta.UpdateBoundaries(
key, value, ikey.sequence, ikey.type);
if (!status.ok()) {
break;
}
sub_compact->current_output_file_size =
sub_compact->builder->EstimatedFileSize();
const ParsedInternalKey& ikey = c_iter->ikey();
sub_compact->current_output()->meta.UpdateBoundaries(
key, value, ikey.sequence, ikey.type);
sub_compact->num_output_records++;
// Close output file if it is big enough. Two possibilities determine it's
@ -1960,7 +1974,8 @@ Status CompactionJob::FinishCompactionOutputFile(
refined_oldest_ancester_time =
sub_compact->compaction->MinInputFileOldestAncesterTime(
&(meta->smallest), &(meta->largest));
if (refined_oldest_ancester_time != port::kMaxUint64) {
if (refined_oldest_ancester_time !=
std::numeric_limits<uint64_t>::max()) {
meta->oldest_ancester_time = refined_oldest_ancester_time;
}
}
@ -2250,7 +2265,7 @@ Status CompactionJob::OpenCompactionOutputFile(
sub_compact->compaction->MinInputFileOldestAncesterTime(
(sub_compact->start != nullptr) ? &tmp_start : nullptr,
(sub_compact->end != nullptr) ? &tmp_end : nullptr);
if (oldest_ancester_time == port::kMaxUint64) {
if (oldest_ancester_time == std::numeric_limits<uint64_t>::max()) {
oldest_ancester_time = current_time;
}
@ -2444,7 +2459,7 @@ void CompactionJob::LogCompaction() {
<< "compaction_reason"
<< GetCompactionReasonString(compaction->compaction_reason());
for (size_t i = 0; i < compaction->num_input_levels(); ++i) {
stream << ("files_L" + ToString(compaction->level(i)));
stream << ("files_L" + std::to_string(compaction->level(i)));
stream.StartArray();
for (auto f : *compaction->inputs(i)) {
stream << f->fd.GetNumber();
@ -2482,19 +2497,20 @@ CompactionServiceCompactionJob::CompactionServiceCompactionJob(
std::vector<SequenceNumber> existing_snapshots,
std::shared_ptr<Cache> table_cache, EventLogger* event_logger,
const std::string& dbname, const std::shared_ptr<IOTracer>& io_tracer,
const std::atomic<bool>* manual_compaction_canceled,
const std::string& db_id, const std::string& db_session_id,
const std::string& output_path,
const CompactionServiceInput& compaction_service_input,
CompactionServiceResult* compaction_service_result)
: CompactionJob(
job_id, compaction, db_options, mutable_db_options, file_options,
versions, shutting_down, 0, log_buffer, nullptr, output_directory,
versions, shutting_down, log_buffer, nullptr, output_directory,
nullptr, stats, db_mutex, db_error_handler, existing_snapshots,
kMaxSequenceNumber, nullptr, table_cache, event_logger,
kMaxSequenceNumber, nullptr, nullptr, table_cache, event_logger,
compaction->mutable_cf_options()->paranoid_file_checks,
compaction->mutable_cf_options()->report_bg_io_stats, dbname,
&(compaction_service_result->stats), Env::Priority::USER, io_tracer,
nullptr, nullptr, db_id, db_session_id,
nullptr, manual_compaction_canceled, db_id, db_session_id,
compaction->column_family_data()->GetFullHistoryTsLow()),
output_path_(output_path),
compaction_input_(compaction_service_input),
@ -2936,6 +2952,7 @@ static std::unordered_map<std::string, OptionTypeInfo> cs_result_type_info = {
const void* addr1, const void* addr2, std::string* mismatch) {
const auto status1 = static_cast<const Status*>(addr1);
const auto status2 = static_cast<const Status*>(addr2);
StatusSerializationAdapter adatper1(*status1);
StatusSerializationAdapter adapter2(*status2);
return OptionTypeInfo::TypesAreEqual(opts, status_adapter_type_info,
@ -2993,7 +3010,7 @@ Status CompactionServiceInput::Read(const std::string& data_str,
} else {
return Status::NotSupported(
"Compaction Service Input data version not supported: " +
ToString(format_version));
std::to_string(format_version));
}
}
@ -3022,7 +3039,7 @@ Status CompactionServiceResult::Read(const std::string& data_str,
} else {
return Status::NotSupported(
"Compaction Service Result data version not supported: " +
ToString(format_version));
std::to_string(format_version));
}
}

View File

@ -67,14 +67,13 @@ class CompactionJob {
int job_id, Compaction* compaction, const ImmutableDBOptions& db_options,
const MutableDBOptions& mutable_db_options,
const FileOptions& file_options, VersionSet* versions,
const std::atomic<bool>* shutting_down,
const SequenceNumber preserve_deletes_seqnum, LogBuffer* log_buffer,
const std::atomic<bool>* shutting_down, LogBuffer* log_buffer,
FSDirectory* db_directory, FSDirectory* output_directory,
FSDirectory* blob_output_directory, Statistics* stats,
InstrumentedMutex* db_mutex, ErrorHandler* db_error_handler,
std::vector<SequenceNumber> existing_snapshots,
SequenceNumber earliest_write_conflict_snapshot,
const SnapshotChecker* snapshot_checker,
const SnapshotChecker* snapshot_checker, JobContext* job_context,
std::shared_ptr<Cache> table_cache, EventLogger* event_logger,
bool paranoid_file_checks, bool measure_io_stats,
const std::string& dbname, CompactionJobStats* compaction_job_stats,
@ -196,7 +195,6 @@ class CompactionJob {
const std::atomic<bool>* shutting_down_;
const std::atomic<int>* manual_compaction_paused_;
const std::atomic<bool>* manual_compaction_canceled_;
const SequenceNumber preserve_deletes_seqnum_;
FSDirectory* db_directory_;
FSDirectory* blob_output_directory_;
InstrumentedMutex* db_mutex_;
@ -214,6 +212,8 @@ class CompactionJob {
const SnapshotChecker* const snapshot_checker_;
JobContext* job_context_;
std::shared_ptr<Cache> table_cache_;
EventLogger* event_logger_;
@ -345,6 +345,7 @@ class CompactionServiceCompactionJob : private CompactionJob {
std::vector<SequenceNumber> existing_snapshots,
std::shared_ptr<Cache> table_cache, EventLogger* event_logger,
const std::string& dbname, const std::shared_ptr<IOTracer>& io_tracer,
const std::atomic<bool>* manual_compaction_canceled,
const std::string& db_id, const std::string& db_session_id,
const std::string& output_path,
const CompactionServiceInput& compaction_service_input,

View File

@ -268,10 +268,10 @@ class CompactionJobStatsTest : public testing::Test,
if (cf == 0) {
// default cfd
EXPECT_TRUE(db_->GetProperty(
"rocksdb.num-files-at-level" + ToString(level), &property));
"rocksdb.num-files-at-level" + std::to_string(level), &property));
} else {
EXPECT_TRUE(db_->GetProperty(
handles_[cf], "rocksdb.num-files-at-level" + ToString(level),
handles_[cf], "rocksdb.num-files-at-level" + std::to_string(level),
&property));
}
return atoi(property.c_str());
@ -672,7 +672,7 @@ TEST_P(CompactionJobStatsTest, CompactionJobStatsTest) {
snprintf(buf, kBufSize, "%d", ++num_L0_files);
ASSERT_EQ(std::string(buf), FilesPerLevel(1));
}
ASSERT_EQ(ToString(num_L0_files), FilesPerLevel(1));
ASSERT_EQ(std::to_string(num_L0_files), FilesPerLevel(1));
// 2nd Phase: perform L0 -> L1 compaction.
int L0_compaction_count = 6;

View File

@ -87,7 +87,6 @@ class CompactionJobTestBase : public testing::Test {
/*block_cache_tracer=*/nullptr,
/*io_tracer=*/nullptr, /*db_session_id*/ "")),
shutting_down_(false),
preserve_deletes_seqnum_(0),
mock_table_factory_(new mock::MockTableFactory()),
error_handler_(nullptr, db_options_, &mutex_),
encode_u64_ts_(std::move(encode_u64_ts)) {
@ -237,8 +236,8 @@ class CompactionJobTestBase : public testing::Test {
for (int i = 0; i < 2; ++i) {
auto contents = mock::MakeMockFile();
for (int k = 0; k < kKeysPerFile; ++k) {
auto key = ToString(i * kMatchingKeys + k);
auto value = ToString(i * kKeysPerFile + k);
auto key = std::to_string(i * kMatchingKeys + k);
auto value = std::to_string(i * kKeysPerFile + k);
InternalKey internal_key(key, ++sequence_number, kTypeValue);
// This is how the key will look like once it's written in bottommost
@ -354,11 +353,11 @@ class CompactionJobTestBase : public testing::Test {
ucmp_->timestamp_size() == full_history_ts_low_.size());
CompactionJob compaction_job(
0, &compaction, db_options_, mutable_db_options_, env_options_,
versions_.get(), &shutting_down_, preserve_deletes_seqnum_, &log_buffer,
nullptr, nullptr, nullptr, nullptr, &mutex_, &error_handler_, snapshots,
earliest_write_conflict_snapshot, snapshot_checker, table_cache_,
&event_logger, false, false, dbname_, &compaction_job_stats_,
Env::Priority::USER, nullptr /* IOTracer */,
versions_.get(), &shutting_down_, &log_buffer, nullptr, nullptr,
nullptr, nullptr, &mutex_, &error_handler_, snapshots,
earliest_write_conflict_snapshot, snapshot_checker, nullptr,
table_cache_, &event_logger, false, false, dbname_,
&compaction_job_stats_, Env::Priority::USER, nullptr /* IOTracer */,
/*manual_compaction_paused=*/nullptr,
/*manual_compaction_canceled=*/nullptr, /*db_id=*/"",
/*db_session_id=*/"", full_history_ts_low_);
@ -409,7 +408,6 @@ class CompactionJobTestBase : public testing::Test {
std::unique_ptr<VersionSet> versions_;
InstrumentedMutex mutex_;
std::atomic<bool> shutting_down_;
SequenceNumber preserve_deletes_seqnum_;
std::shared_ptr<mock::MockTableFactory> mock_table_factory_;
CompactionJobStats compaction_job_stats_;
ColumnFamilyData* cfd_;
@ -892,10 +890,10 @@ TEST_F(CompactionJobTest, MultiSingleDelete) {
// -> Snapshot Put
// K: SDel SDel Put SDel Put Put Snapshot SDel Put SDel SDel Put SDel
// -> Snapshot Put Snapshot SDel
// L: SDel Put Del Put SDel Snapshot Del Put Del SDel Put SDel
// -> Snapshot SDel
// M: (Put) SDel Put Del Put SDel Snapshot Put Del SDel Put SDel Del
// -> SDel Snapshot Del
// L: SDel Put SDel Put SDel Snapshot SDel Put SDel SDel Put SDel
// -> Snapshot SDel Put SDel
// M: (Put) SDel Put SDel Put SDel Snapshot Put SDel SDel Put SDel SDel
// -> SDel Snapshot Put SDel
NewDB();
auto file1 = mock::MakeMockFile({
@ -926,14 +924,14 @@ TEST_F(CompactionJobTest, MultiSingleDelete) {
{KeyStr("L", 16U, kTypeSingleDeletion), ""},
{KeyStr("L", 15U, kTypeValue), "val"},
{KeyStr("L", 14U, kTypeSingleDeletion), ""},
{KeyStr("L", 13U, kTypeDeletion), ""},
{KeyStr("L", 13U, kTypeSingleDeletion), ""},
{KeyStr("L", 12U, kTypeValue), "val"},
{KeyStr("L", 11U, kTypeDeletion), ""},
{KeyStr("M", 16U, kTypeDeletion), ""},
{KeyStr("L", 11U, kTypeSingleDeletion), ""},
{KeyStr("M", 16U, kTypeSingleDeletion), ""},
{KeyStr("M", 15U, kTypeSingleDeletion), ""},
{KeyStr("M", 14U, kTypeValue), "val"},
{KeyStr("M", 13U, kTypeSingleDeletion), ""},
{KeyStr("M", 12U, kTypeDeletion), ""},
{KeyStr("M", 12U, kTypeSingleDeletion), ""},
{KeyStr("M", 11U, kTypeValue), "val"},
});
AddMockFile(file1);
@ -974,12 +972,12 @@ TEST_F(CompactionJobTest, MultiSingleDelete) {
{KeyStr("K", 1U, kTypeSingleDeletion), ""},
{KeyStr("L", 5U, kTypeSingleDeletion), ""},
{KeyStr("L", 4U, kTypeValue), "val"},
{KeyStr("L", 3U, kTypeDeletion), ""},
{KeyStr("L", 3U, kTypeSingleDeletion), ""},
{KeyStr("L", 2U, kTypeValue), "val"},
{KeyStr("L", 1U, kTypeSingleDeletion), ""},
{KeyStr("M", 10U, kTypeSingleDeletion), ""},
{KeyStr("M", 7U, kTypeValue), "val"},
{KeyStr("M", 5U, kTypeDeletion), ""},
{KeyStr("M", 5U, kTypeSingleDeletion), ""},
{KeyStr("M", 4U, kTypeValue), "val"},
{KeyStr("M", 3U, kTypeSingleDeletion), ""},
});
@ -1021,7 +1019,9 @@ TEST_F(CompactionJobTest, MultiSingleDelete) {
{KeyStr("K", 8U, kTypeValue), "val3"},
{KeyStr("L", 16U, kTypeSingleDeletion), ""},
{KeyStr("L", 15U, kTypeValue), ""},
{KeyStr("M", 16U, kTypeDeletion), ""},
{KeyStr("L", 11U, kTypeSingleDeletion), ""},
{KeyStr("M", 15U, kTypeSingleDeletion), ""},
{KeyStr("M", 14U, kTypeValue), ""},
{KeyStr("M", 3U, kTypeSingleDeletion), ""}});
SetLastSequence(22U);
@ -1107,6 +1107,21 @@ TEST_F(CompactionJobTest, OldestBlobFileNumber) {
/* expected_oldest_blob_file_number */ 19);
}
TEST_F(CompactionJobTest, NoEnforceSingleDeleteContract) {
db_options_.enforce_single_del_contracts = false;
NewDB();
auto file =
mock::MakeMockFile({{KeyStr("a", 4U, kTypeSingleDeletion), ""},
{KeyStr("a", 3U, kTypeDeletion), "dontcare"}});
AddMockFile(file);
SetLastSequence(4U);
auto expected_results = mock::MakeMockFile();
auto files = cfd_->current()->storage_info()->LevelFiles(0);
RunCompaction({files}, expected_results);
}
TEST_F(CompactionJobTest, InputSerialization) {
// Setup a random CompactionServiceInput
CompactionServiceInput input;

View File

@ -65,7 +65,7 @@ bool FindIntraL0Compaction(const std::vector<FileMetaData*>& level_files,
size_t compact_bytes = static_cast<size_t>(level_files[start]->fd.file_size);
uint64_t compensated_compact_bytes =
level_files[start]->compensated_file_size;
size_t compact_bytes_per_del_file = port::kMaxSizet;
size_t compact_bytes_per_del_file = std::numeric_limits<size_t>::max();
// Compaction range will be [start, limit).
size_t limit;
// Pull in files until the amount of compaction work per deleted file begins
@ -401,7 +401,7 @@ Status CompactionPicker::GetCompactionInputsFromFileNumbers(
"Cannot find matched SST files for the following file numbers:");
for (auto fn : *input_set) {
message += " ";
message += ToString(fn);
message += std::to_string(fn);
}
return Status::InvalidArgument(message);
}
@ -717,7 +717,7 @@ Compaction* CompactionPicker::CompactRange(
// files that are created during the current compaction.
if (compact_range_options.bottommost_level_compaction ==
BottommostLevelCompaction::kForceOptimized &&
max_file_num_to_ignore != port::kMaxUint64) {
max_file_num_to_ignore != std::numeric_limits<uint64_t>::max()) {
assert(input_level == output_level);
// inputs_shrunk holds a continuous subset of input files which were all
// created before the current manual compaction
@ -1004,14 +1004,14 @@ Status CompactionPicker::SanitizeCompactionInputFiles(
return Status::InvalidArgument(
"Output level for column family " + cf_meta.name +
" must between [0, " +
ToString(cf_meta.levels[cf_meta.levels.size() - 1].level) + "].");
std::to_string(cf_meta.levels[cf_meta.levels.size() - 1].level) + "].");
}
if (output_level > MaxOutputLevel()) {
return Status::InvalidArgument(
"Exceed the maximum output level defined by "
"the current compaction algorithm --- " +
ToString(MaxOutputLevel()));
std::to_string(MaxOutputLevel()));
}
if (output_level < 0) {
@ -1061,8 +1061,8 @@ Status CompactionPicker::SanitizeCompactionInputFiles(
return Status::InvalidArgument(
"Cannot compact file to up level, input file: " +
MakeTableFileName("", file_num) + " level " +
ToString(input_file_level) + " > output level " +
ToString(output_level));
std::to_string(input_file_level) + " > output level " +
std::to_string(output_level));
}
}

View File

@ -504,7 +504,7 @@ bool LevelCompactionBuilder::PickIntraL0Compaction() {
return false;
}
return FindIntraL0Compaction(level_files, kMinFilesForIntraL0Compaction,
port::kMaxUint64,
std::numeric_limits<uint64_t>::max(),
mutable_cf_options_.max_compaction_bytes,
&start_level_inputs_, earliest_mem_seqno_);
}

View File

@ -273,9 +273,9 @@ TEST_F(CompactionPickerTest, NeedsCompactionLevel) {
// start a brand new version in each test.
NewVersionStorage(kLevels, kCompactionStyleLevel);
for (int i = 0; i < file_count; ++i) {
Add(level, i, ToString((i + 100) * 1000).c_str(),
ToString((i + 100) * 1000 + 999).c_str(),
file_size, 0, i * 100, i * 100 + 99);
Add(level, i, std::to_string((i + 100) * 1000).c_str(),
std::to_string((i + 100) * 1000 + 999).c_str(), file_size, 0,
i * 100, i * 100 + 99);
}
UpdateVersionStorageInfo();
ASSERT_EQ(vstorage_->CompactionScoreLevel(0), level);
@ -439,8 +439,8 @@ TEST_F(CompactionPickerTest, NeedsCompactionUniversal) {
for (int i = 1;
i <= mutable_cf_options_.level0_file_num_compaction_trigger * 2; ++i) {
NewVersionStorage(1, kCompactionStyleUniversal);
Add(0, i, ToString((i + 100) * 1000).c_str(),
ToString((i + 100) * 1000 + 999).c_str(), 1000000, 0, i * 100,
Add(0, i, std::to_string((i + 100) * 1000).c_str(),
std::to_string((i + 100) * 1000 + 999).c_str(), 1000000, 0, i * 100,
i * 100 + 99);
UpdateVersionStorageInfo();
ASSERT_EQ(level_compaction_picker.NeedsCompaction(vstorage_.get()),
@ -852,17 +852,17 @@ TEST_F(CompactionPickerTest, UniversalIncrementalSpace4) {
// L3: (1101, 1180) (1201, 1280) ... (7901, 7908)
// L4: (1130, 1150) (1160, 1210) (1230, 1250) (1260 1310) ... (7960, 8010)
for (int i = 11; i < 79; i++) {
Add(3, 100 + i * 3, ToString(i * 100).c_str(),
ToString(i * 100 + 80).c_str(), kFileSize, 0, 200, 251);
Add(3, 100 + i * 3, std::to_string(i * 100).c_str(),
std::to_string(i * 100 + 80).c_str(), kFileSize, 0, 200, 251);
// Add a tie breaker
if (i == 66) {
Add(3, 10000U, "6690", "6699", kFileSize, 0, 200, 251);
}
Add(4, 100 + i * 3 + 1, ToString(i * 100 + 30).c_str(),
ToString(i * 100 + 50).c_str(), kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 2, ToString(i * 100 + 60).c_str(),
ToString(i * 100 + 110).c_str(), kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 1, std::to_string(i * 100 + 30).c_str(),
std::to_string(i * 100 + 50).c_str(), kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 2, std::to_string(i * 100 + 60).c_str(),
std::to_string(i * 100 + 110).c_str(), kFileSize, 0, 200, 251);
}
UpdateVersionStorageInfo();
@ -899,14 +899,14 @@ TEST_F(CompactionPickerTest, UniversalIncrementalSpace5) {
// L3: (1101, 1180) (1201, 1280) ... (7901, 7908)
// L4: (1130, 1150) (1160, 1210) (1230, 1250) (1260 1310) ... (7960, 8010)
for (int i = 11; i < 70; i++) {
Add(3, 100 + i * 3, ToString(i * 100).c_str(),
ToString(i * 100 + 80).c_str(),
Add(3, 100 + i * 3, std::to_string(i * 100).c_str(),
std::to_string(i * 100 + 80).c_str(),
i % 10 == 9 ? kFileSize * 100 : kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 1, ToString(i * 100 + 30).c_str(),
ToString(i * 100 + 50).c_str(), kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 2, ToString(i * 100 + 60).c_str(),
ToString(i * 100 + 110).c_str(), kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 1, std::to_string(i * 100 + 30).c_str(),
std::to_string(i * 100 + 50).c_str(), kFileSize, 0, 200, 251);
Add(4, 100 + i * 3 + 2, std::to_string(i * 100 + 60).c_str(),
std::to_string(i * 100 + 110).c_str(), kFileSize, 0, 200, 251);
}
UpdateVersionStorageInfo();
@ -941,8 +941,8 @@ TEST_F(CompactionPickerTest, NeedsCompactionFIFO) {
// size of L0 files.
for (int i = 1; i <= kFileCount; ++i) {
NewVersionStorage(1, kCompactionStyleFIFO);
Add(0, i, ToString((i + 100) * 1000).c_str(),
ToString((i + 100) * 1000 + 999).c_str(), kFileSize, 0, i * 100,
Add(0, i, std::to_string((i + 100) * 1000).c_str(),
std::to_string((i + 100) * 1000 + 999).c_str(), kFileSize, 0, i * 100,
i * 100 + 99);
UpdateVersionStorageInfo();
ASSERT_EQ(fifo_compaction_picker.NeedsCompaction(vstorage_.get()),
@ -2653,8 +2653,8 @@ TEST_F(CompactionPickerTest, UniversalMarkedManualCompaction) {
universal_compaction_picker.CompactRange(
cf_name_, mutable_cf_options_, mutable_db_options_, vstorage_.get(),
ColumnFamilyData::kCompactAllLevels, 6, CompactRangeOptions(),
nullptr, nullptr, &manual_end, &manual_conflict, port::kMaxUint64,
""));
nullptr, nullptr, &manual_end, &manual_conflict,
std::numeric_limits<uint64_t>::max(), ""));
ASSERT_TRUE(compaction);

View File

@ -1371,7 +1371,7 @@ Compaction* UniversalCompactionBuilder::PickPeriodicCompaction() {
uint64_t UniversalCompactionBuilder::GetMaxOverlappingBytes() const {
if (!mutable_cf_options_.compaction_options_universal.incremental) {
return port::kMaxUint64;
return std::numeric_limits<uint64_t>::max();
} else {
// Try to align cutting boundary with files at the next level if the
// file isn't end up with 1/2 of target size, or it would overlap

View File

@ -12,13 +12,16 @@ namespace ROCKSDB_NAMESPACE {
class MyTestCompactionService : public CompactionService {
public:
MyTestCompactionService(std::string db_path, Options& options,
std::shared_ptr<Statistics>& statistics)
MyTestCompactionService(
std::string db_path, Options& options,
std::shared_ptr<Statistics>& statistics,
std::vector<std::shared_ptr<EventListener>>& listeners)
: db_path_(std::move(db_path)),
options_(options),
statistics_(statistics),
start_info_("na", "na", "na", 0, Env::TOTAL),
wait_info_("na", "na", "na", 0, Env::TOTAL) {}
wait_info_("na", "na", "na", 0, Env::TOTAL),
listeners_(listeners) {}
static const char* kClassName() { return "MyTestCompactionService"; }
@ -71,9 +74,15 @@ class MyTestCompactionService : public CompactionService {
options_override.table_factory = options_.table_factory;
options_override.sst_partitioner_factory = options_.sst_partitioner_factory;
options_override.statistics = statistics_;
if (!listeners_.empty()) {
options_override.listeners = listeners_;
}
OpenAndCompactOptions options;
options.canceled = &canceled_;
Status s = DB::OpenAndCompact(
db_path_, db_path_ + "/" + ROCKSDB_NAMESPACE::ToString(info.job_id),
options, db_path_, db_path_ + "/" + std::to_string(info.job_id),
compaction_input, compaction_service_result, options_override);
if (is_override_wait_result_) {
*compaction_service_result = override_wait_result_;
@ -112,6 +121,8 @@ class MyTestCompactionService : public CompactionService {
is_override_wait_status_ = false;
}
void SetCanceled(bool canceled) { canceled_ = canceled; }
private:
InstrumentedMutex mutex_;
std::atomic_int compaction_num_{0};
@ -129,6 +140,8 @@ class MyTestCompactionService : public CompactionService {
CompactionServiceJobStatus::kFailure;
bool is_override_wait_result_ = false;
std::string override_wait_result_;
std::vector<std::shared_ptr<EventListener>> listeners_;
std::atomic_bool canceled_{false};
};
class CompactionServiceTest : public DBTestBase {
@ -144,7 +157,7 @@ class CompactionServiceTest : public DBTestBase {
compactor_statistics_ = CreateDBStatistics();
compaction_service_ = std::make_shared<MyTestCompactionService>(
dbname_, *options, compactor_statistics_);
dbname_, *options, compactor_statistics_, remote_listeners);
options->compaction_service = compaction_service_;
DestroyAndReopen(*options);
}
@ -163,7 +176,7 @@ class CompactionServiceTest : public DBTestBase {
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -173,7 +186,7 @@ class CompactionServiceTest : public DBTestBase {
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -185,13 +198,15 @@ class CompactionServiceTest : public DBTestBase {
for (int i = 0; i < 200; i++) {
auto result = Get(Key(i));
if (i % 2) {
ASSERT_EQ(result, "value" + ToString(i));
ASSERT_EQ(result, "value" + std::to_string(i));
} else {
ASSERT_EQ(result, "value_new" + ToString(i));
ASSERT_EQ(result, "value_new" + std::to_string(i));
}
}
}
std::vector<std::shared_ptr<EventListener>> remote_listeners;
private:
std::shared_ptr<Statistics> compactor_statistics_;
std::shared_ptr<Statistics> primary_statistics_;
@ -208,7 +223,7 @@ TEST_F(CompactionServiceTest, BasicCompactions) {
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -216,7 +231,7 @@ TEST_F(CompactionServiceTest, BasicCompactions) {
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -226,9 +241,9 @@ TEST_F(CompactionServiceTest, BasicCompactions) {
for (int i = 0; i < 200; i++) {
auto result = Get(Key(i));
if (i % 2) {
ASSERT_EQ(result, "value" + ToString(i));
ASSERT_EQ(result, "value" + std::to_string(i));
} else {
ASSERT_EQ(result, "value_new" + ToString(i));
ASSERT_EQ(result, "value_new" + std::to_string(i));
}
}
auto my_cs = GetCompactionService();
@ -265,7 +280,7 @@ TEST_F(CompactionServiceTest, BasicCompactions) {
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
s = Put(Key(key_id), "value_new" + ToString(key_id));
s = Put(Key(key_id), "value_new" + std::to_string(key_id));
if (s.IsAborted()) {
break;
}
@ -322,6 +337,51 @@ TEST_F(CompactionServiceTest, ManualCompaction) {
VerifyTestData();
}
TEST_F(CompactionServiceTest, CancelCompactionOnRemoteSide) {
Options options = CurrentOptions();
options.disable_auto_compactions = true;
ReopenWithCompactionService(&options);
GenerateTestData();
auto my_cs = GetCompactionService();
std::string start_str = Key(15);
std::string end_str = Key(45);
Slice start(start_str);
Slice end(end_str);
uint64_t comp_num = my_cs->GetCompactionNum();
// Test cancel compaction at the beginning
my_cs->SetCanceled(true);
auto s = db_->CompactRange(CompactRangeOptions(), &start, &end);
ASSERT_TRUE(s.IsIncomplete());
// compaction number is not increased
ASSERT_GE(my_cs->GetCompactionNum(), comp_num);
VerifyTestData();
// Test cancel compaction in progress
ReopenWithCompactionService(&options);
GenerateTestData();
my_cs = GetCompactionService();
my_cs->SetCanceled(false);
std::atomic_bool cancel_issued{false};
SyncPoint::GetInstance()->SetCallBack("CompactionJob::Run():Inprogress",
[&](void* /*arg*/) {
cancel_issued = true;
my_cs->SetCanceled(true);
});
SyncPoint::GetInstance()->EnableProcessing();
s = db_->CompactRange(CompactRangeOptions(), &start, &end);
ASSERT_TRUE(s.IsIncomplete());
ASSERT_TRUE(cancel_issued);
// compaction number is not increased
ASSERT_GE(my_cs->GetCompactionNum(), comp_num);
VerifyTestData();
}
TEST_F(CompactionServiceTest, FailedToStart) {
Options options = CurrentOptions();
options.disable_auto_compactions = true;
@ -407,7 +467,7 @@ TEST_F(CompactionServiceTest, CompactionFilter) {
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -415,7 +475,7 @@ TEST_F(CompactionServiceTest, CompactionFilter) {
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -429,9 +489,9 @@ TEST_F(CompactionServiceTest, CompactionFilter) {
if (i > 5 && i <= 105) {
ASSERT_EQ(result, "NOT_FOUND");
} else if (i % 2) {
ASSERT_EQ(result, "value" + ToString(i));
ASSERT_EQ(result, "value" + std::to_string(i));
} else {
ASSERT_EQ(result, "value_new" + ToString(i));
ASSERT_EQ(result, "value_new" + std::to_string(i));
}
}
auto my_cs = GetCompactionService();
@ -486,9 +546,9 @@ TEST_F(CompactionServiceTest, ConcurrentCompaction) {
for (int i = 0; i < 200; i++) {
auto result = Get(Key(i));
if (i % 2) {
ASSERT_EQ(result, "value" + ToString(i));
ASSERT_EQ(result, "value" + std::to_string(i));
} else {
ASSERT_EQ(result, "value_new" + ToString(i));
ASSERT_EQ(result, "value_new" + std::to_string(i));
}
}
auto my_cs = GetCompactionService();
@ -503,7 +563,7 @@ TEST_F(CompactionServiceTest, CompactionInfo) {
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -511,7 +571,7 @@ TEST_F(CompactionServiceTest, CompactionInfo) {
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -556,7 +616,7 @@ TEST_F(CompactionServiceTest, CompactionInfo) {
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -564,7 +624,7 @@ TEST_F(CompactionServiceTest, CompactionInfo) {
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -592,7 +652,7 @@ TEST_F(CompactionServiceTest, FallbackLocalAuto) {
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -600,7 +660,7 @@ TEST_F(CompactionServiceTest, FallbackLocalAuto) {
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -610,9 +670,9 @@ TEST_F(CompactionServiceTest, FallbackLocalAuto) {
for (int i = 0; i < 200; i++) {
auto result = Get(Key(i));
if (i % 2) {
ASSERT_EQ(result, "value" + ToString(i));
ASSERT_EQ(result, "value" + std::to_string(i));
} else {
ASSERT_EQ(result, "value_new" + ToString(i));
ASSERT_EQ(result, "value_new" + std::to_string(i));
}
}
@ -685,6 +745,88 @@ TEST_F(CompactionServiceTest, FallbackLocalManual) {
VerifyTestData();
}
TEST_F(CompactionServiceTest, RemoteEventListener) {
class RemoteEventListenerTest : public EventListener {
public:
const char* Name() const override { return "RemoteEventListenerTest"; }
void OnSubcompactionBegin(const SubcompactionJobInfo& info) override {
auto result = on_going_compactions.emplace(info.job_id);
ASSERT_TRUE(result.second); // make sure there's no duplication
compaction_num++;
EventListener::OnSubcompactionBegin(info);
}
void OnSubcompactionCompleted(const SubcompactionJobInfo& info) override {
auto num = on_going_compactions.erase(info.job_id);
ASSERT_TRUE(num == 1); // make sure the compaction id exists
EventListener::OnSubcompactionCompleted(info);
}
void OnTableFileCreated(const TableFileCreationInfo& info) override {
ASSERT_EQ(on_going_compactions.count(info.job_id), 1);
file_created++;
EventListener::OnTableFileCreated(info);
}
void OnTableFileCreationStarted(
const TableFileCreationBriefInfo& info) override {
ASSERT_EQ(on_going_compactions.count(info.job_id), 1);
file_creation_started++;
EventListener::OnTableFileCreationStarted(info);
}
bool ShouldBeNotifiedOnFileIO() override {
file_io_notified++;
return EventListener::ShouldBeNotifiedOnFileIO();
}
std::atomic_uint64_t file_io_notified{0};
std::atomic_uint64_t file_creation_started{0};
std::atomic_uint64_t file_created{0};
std::set<int> on_going_compactions; // store the job_id
std::atomic_uint64_t compaction_num{0};
};
auto listener = new RemoteEventListenerTest();
remote_listeners.emplace_back(listener);
Options options = CurrentOptions();
ReopenWithCompactionService(&options);
for (int i = 0; i < 20; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value_new" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
ASSERT_OK(dbfull()->TEST_WaitForCompact());
// check the events are triggered
ASSERT_TRUE(listener->file_io_notified > 0);
ASSERT_TRUE(listener->file_creation_started > 0);
ASSERT_TRUE(listener->file_created > 0);
ASSERT_TRUE(listener->compaction_num > 0);
ASSERT_TRUE(listener->on_going_compactions.empty());
// verify result
for (int i = 0; i < 200; i++) {
auto result = Get(Key(i));
if (i % 2) {
ASSERT_EQ(result, "value" + std::to_string(i));
} else {
ASSERT_EQ(result, "value_new" + std::to_string(i));
}
}
}
} // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) {

View File

@ -397,7 +397,7 @@ TEST_P(ComparatorDBTest, DoubleComparator) {
for (uint32_t j = 0; j < divide_order; j++) {
to_divide *= 10.0;
}
source_strings.push_back(ToString(r / to_divide));
source_strings.push_back(std::to_string(r / to_divide));
}
DoRandomIteraratorTest(GetDB(), source_strings, &rnd, 200, 1000, 66);

View File

@ -40,7 +40,8 @@ Status VerifySstFileChecksum(const Options& options,
Status VerifySstFileChecksum(const Options& options,
const EnvOptions& env_options,
const ReadOptions& read_options,
const std::string& file_path) {
const std::string& file_path,
const SequenceNumber& largest_seqno) {
std::unique_ptr<FSRandomAccessFile> file;
uint64_t file_size;
InternalKeyComparator internal_comparator(options.comparator);
@ -61,12 +62,13 @@ Status VerifySstFileChecksum(const Options& options,
nullptr /* stats */, 0 /* hist_type */, nullptr /* file_read_hist */,
ioptions.rate_limiter.get()));
const bool kImmortal = true;
auto reader_options = TableReaderOptions(
ioptions, options.prefix_extractor, env_options, internal_comparator,
false /* skip_filters */, !kImmortal, false /* force_direct_prefetch */,
-1 /* level */);
reader_options.largest_seqno = largest_seqno;
s = ioptions.table_factory->NewTableReader(
TableReaderOptions(ioptions, options.prefix_extractor, env_options,
internal_comparator, false /* skip_filters */,
!kImmortal, false /* force_direct_prefetch */,
-1 /* level */),
std::move(file_reader), file_size, &table_reader,
reader_options, std::move(file_reader), file_size, &table_reader,
false /* prefetch_index_and_filter_in_cache */);
if (!s.ok()) {
return s;

View File

@ -26,6 +26,7 @@
#include "rocksdb/db.h"
#include "rocksdb/env.h"
#include "rocksdb/table.h"
#include "rocksdb/utilities/transaction_db.h"
#include "rocksdb/write_batch.h"
#include "table/block_based/block_based_table_builder.h"
#include "table/meta_blocks.h"
@ -275,6 +276,42 @@ class CorruptionTest : public testing::Test {
}
return Slice(*storage);
}
void GetSortedWalFiles(std::vector<uint64_t>& file_nums) {
std::vector<std::string> tmp_files;
ASSERT_OK(env_->GetChildren(dbname_, &tmp_files));
FileType type = kWalFile;
for (const auto& file : tmp_files) {
uint64_t number = 0;
if (ParseFileName(file, &number, &type) && type == kWalFile) {
file_nums.push_back(number);
}
}
std::sort(file_nums.begin(), file_nums.end());
}
void CorruptFileWithTruncation(FileType file, uint64_t number,
uint64_t bytes_to_truncate = 0) {
std::string path;
switch (file) {
case FileType::kWalFile:
path = LogFileName(dbname_, number);
break;
// TODO: Add other file types as this method is being used for those file
// types.
default:
return;
}
uint64_t old_size = 0;
ASSERT_OK(env_->GetFileSize(path, &old_size));
assert(old_size > bytes_to_truncate);
uint64_t new_size = old_size - bytes_to_truncate;
// If bytes_to_truncate == 0, it will do full truncation.
if (bytes_to_truncate == 0) {
new_size = 0;
}
ASSERT_OK(test::TruncateFile(env_, path, new_size));
}
};
TEST_F(CorruptionTest, Recovery) {
@ -300,6 +337,72 @@ TEST_F(CorruptionTest, Recovery) {
Check(36, 36);
}
TEST_F(CorruptionTest, PostPITRCorruptionWALsRetained) {
// Repro for bug where WALs following the point-in-time recovery were not
// retained leading to the next recovery failing.
CloseDb();
options_.wal_recovery_mode = WALRecoveryMode::kPointInTimeRecovery;
const std::string test_cf_name = "test_cf";
std::vector<ColumnFamilyDescriptor> cf_descs;
cf_descs.emplace_back(kDefaultColumnFamilyName, ColumnFamilyOptions());
cf_descs.emplace_back(test_cf_name, ColumnFamilyOptions());
uint64_t log_num;
{
options_.create_missing_column_families = true;
std::vector<ColumnFamilyHandle*> cfhs;
ASSERT_OK(DB::Open(options_, dbname_, cf_descs, &cfhs, &db_));
assert(db_ != nullptr); // suppress false clang-analyze report
ASSERT_OK(db_->Put(WriteOptions(), cfhs[0], "k", "v"));
ASSERT_OK(db_->Put(WriteOptions(), cfhs[1], "k", "v"));
ASSERT_OK(db_->Put(WriteOptions(), cfhs[0], "k2", "v2"));
std::vector<uint64_t> file_nums;
GetSortedWalFiles(file_nums);
log_num = file_nums.back();
for (auto* cfh : cfhs) {
delete cfh;
}
CloseDb();
}
CorruptFileWithTruncation(FileType::kWalFile, log_num,
/*bytes_to_truncate=*/1);
{
// Recover "k" -> "v" for both CFs. "k2" -> "v2" is lost due to truncation.
options_.avoid_flush_during_recovery = true;
std::vector<ColumnFamilyHandle*> cfhs;
ASSERT_OK(DB::Open(options_, dbname_, cf_descs, &cfhs, &db_));
assert(db_ != nullptr); // suppress false clang-analyze report
// Flush one but not both CFs and write some data so there's a seqno gap
// between the PITR corruption and the next DB session's first WAL.
ASSERT_OK(db_->Put(WriteOptions(), cfhs[1], "k2", "v2"));
ASSERT_OK(db_->Flush(FlushOptions(), cfhs[1]));
for (auto* cfh : cfhs) {
delete cfh;
}
CloseDb();
}
// With the bug, this DB open would remove the WALs following the PITR
// corruption. Then, the next recovery would fail.
for (int i = 0; i < 2; ++i) {
std::vector<ColumnFamilyHandle*> cfhs;
ASSERT_OK(DB::Open(options_, dbname_, cf_descs, &cfhs, &db_));
assert(db_ != nullptr); // suppress false clang-analyze report
for (auto* cfh : cfhs) {
delete cfh;
}
CloseDb();
}
}
TEST_F(CorruptionTest, RecoverWriteError) {
env_->writable_file_error_ = true;
Status s = TryReopen();
@ -912,6 +1015,480 @@ TEST_F(CorruptionTest, VerifyWholeTableChecksum) {
ASSERT_EQ(1, count);
}
class CrashDuringRecoveryWithCorruptionTest
: public CorruptionTest,
public testing::WithParamInterface<std::tuple<bool, bool>> {
public:
explicit CrashDuringRecoveryWithCorruptionTest()
: CorruptionTest(),
avoid_flush_during_recovery_(std::get<0>(GetParam())),
track_and_verify_wals_in_manifest_(std::get<1>(GetParam())) {}
protected:
const bool avoid_flush_during_recovery_;
const bool track_and_verify_wals_in_manifest_;
};
INSTANTIATE_TEST_CASE_P(CorruptionTest, CrashDuringRecoveryWithCorruptionTest,
::testing::Values(std::make_tuple(true, false),
std::make_tuple(false, false),
std::make_tuple(true, true),
std::make_tuple(false, true)));
// In case of non-TransactionDB with avoid_flush_during_recovery = true, RocksDB
// won't flush the data from WAL to L0 for all column families if possible. As a
// result, not all column families can increase their log_numbers, and
// min_log_number_to_keep won't change.
// It may prematurely persist a new MANIFEST even before we can declare the DB
// is in consistent state after recovery (this is when the new WAL is synced)
// and advances log_numbers for some column families.
//
// If there is power failure before we sync the new WAL, we will end up in
// a situation in which after persisting the MANIFEST, RocksDB will see some
// column families' log_numbers larger than the corrupted wal, and
// "Column family inconsistency: SST file contains data beyond the point of
// corruption" error will be hit, causing recovery to fail.
//
// After adding the fix, only after new WAL is synced, RocksDB persist a new
// MANIFEST with column families to ensure RocksDB is in consistent state.
// RocksDB writes an empty WriteBatch as a sentinel to the new WAL which is
// synced immediately afterwards. The sequence number of the sentinel
// WriteBatch will be the next sequence number immediately after the largest
// sequence number recovered from previous WALs and MANIFEST because of which DB
// will be in consistent state.
// If a future recovery starts from the new MANIFEST, then it means the new WAL
// is successfully synced. Due to the sentinel empty write batch at the
// beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point.
// If future recovery starts from the old MANIFEST, it means the writing the new
// MANIFEST failed. It won't have the "SST ahead of WAL" error.
//
// The combination of corrupting a WAL and injecting an error during subsequent
// re-open exposes the bug of prematurely persisting a new MANIFEST with
// advanced ColumnFamilyData::log_number.
TEST_P(CrashDuringRecoveryWithCorruptionTest, DISABLED_CrashDuringRecovery) {
CloseDb();
Options options;
options.track_and_verify_wals_in_manifest =
track_and_verify_wals_in_manifest_;
options.wal_recovery_mode = WALRecoveryMode::kPointInTimeRecovery;
options.avoid_flush_during_recovery = false;
options.env = env_;
ASSERT_OK(DestroyDB(dbname_, options));
options.create_if_missing = true;
options.max_write_buffer_number = 8;
Reopen(&options);
Status s;
const std::string test_cf_name = "test_cf";
ColumnFamilyHandle* cfh = nullptr;
s = db_->CreateColumnFamily(options, test_cf_name, &cfh);
ASSERT_OK(s);
delete cfh;
CloseDb();
std::vector<ColumnFamilyDescriptor> cf_descs;
cf_descs.emplace_back(kDefaultColumnFamilyName, options);
cf_descs.emplace_back(test_cf_name, options);
std::vector<ColumnFamilyHandle*> handles;
// 1. Open and populate the DB. Write and flush default_cf several times to
// advance wal number so that some column families have advanced log_number
// while other don't.
{
ASSERT_OK(DB::Open(options, dbname_, cf_descs, &handles, &db_));
auto* dbimpl = static_cast_with_check<DBImpl>(db_);
assert(dbimpl);
// Write one key to test_cf.
ASSERT_OK(db_->Put(WriteOptions(), handles[1], "old_key", "dontcare"));
ASSERT_OK(db_->Flush(FlushOptions(), handles[1]));
// Write to default_cf and flush this cf several times to advance wal
// number. TEST_SwitchMemtable makes sure WALs are not synced and test can
// corrupt un-sync WAL.
for (int i = 0; i < 2; ++i) {
ASSERT_OK(db_->Put(WriteOptions(), "key" + std::to_string(i), "value"));
ASSERT_OK(dbimpl->TEST_SwitchMemtable());
}
for (auto* h : handles) {
delete h;
}
handles.clear();
CloseDb();
}
// 2. Corrupt second last un-syned wal file to emulate power reset which
// caused the DB to lose the un-synced WAL.
{
std::vector<uint64_t> file_nums;
GetSortedWalFiles(file_nums);
size_t size = file_nums.size();
assert(size >= 2);
uint64_t log_num = file_nums[size - 2];
CorruptFileWithTruncation(FileType::kWalFile, log_num,
/*bytes_to_truncate=*/8);
}
// 3. After first crash reopen the DB which contains corrupted WAL. Default
// family has higher log number than corrupted wal number.
//
// Case1: If avoid_flush_during_recovery = true, RocksDB won't flush the data
// from WAL to L0 for all column families (test_cf_name in this case). As a
// result, not all column families can increase their log_numbers, and
// min_log_number_to_keep won't change.
//
// Case2: If avoid_flush_during_recovery = false, all column families have
// flushed their data from WAL to L0 during recovery, and none of them will
// ever need to read the WALs again.
// 4. Fault is injected to fail the recovery.
{
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->SetCallBack(
"DBImpl::GetLogSizeAndMaybeTruncate:0", [&](void* arg) {
auto* tmp_s = reinterpret_cast<Status*>(arg);
assert(tmp_s);
*tmp_s = Status::IOError("Injected");
});
SyncPoint::GetInstance()->EnableProcessing();
handles.clear();
options.avoid_flush_during_recovery = true;
s = DB::Open(options, dbname_, cf_descs, &handles, &db_);
ASSERT_TRUE(s.IsIOError());
ASSERT_EQ("IO error: Injected", s.ToString());
for (auto* h : handles) {
delete h;
}
CloseDb();
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
}
// 5. After second crash reopen the db with second corruption. Default family
// has higher log number than corrupted wal number.
//
// Case1: If avoid_flush_during_recovery = true, we persist a new
// MANIFEST with advanced log_numbers for some column families only after
// syncing the WAL. So during second crash, RocksDB will skip the corrupted
// WAL files as they have been moved to different folder. Since newly synced
// WAL file's sequence number (sentinel WriteBatch) will be the next
// sequence number immediately after the largest sequence number recovered
// from previous WALs and MANIFEST, db will be in consistent state and opens
// successfully.
//
// Case2: If avoid_flush_during_recovery = false, the corrupted WAL is below
// this number. So during a second crash after persisting the new MANIFEST,
// RocksDB will skip the corrupted WAL(s) because they are all below this
// bound. Therefore, we won't hit the "column family inconsistency" error
// message.
{
options.avoid_flush_during_recovery = avoid_flush_during_recovery_;
ASSERT_OK(DB::Open(options, dbname_, cf_descs, &handles, &db_));
for (auto* h : handles) {
delete h;
}
handles.clear();
CloseDb();
}
}
// In case of TransactionDB, it enables two-phase-commit. The prepare section of
// an uncommitted transaction always need to be kept. Even if we perform flush
// during recovery, we may still need to hold an old WAL. The
// min_log_number_to_keep won't change, and "Column family inconsistency: SST
// file contains data beyond the point of corruption" error will be hit, causing
// recovery to fail.
//
// After adding the fix, only after new WAL is synced, RocksDB persist a new
// MANIFEST with column families to ensure RocksDB is in consistent state.
// RocksDB writes an empty WriteBatch as a sentinel to the new WAL which is
// synced immediately afterwards. The sequence number of the sentinel
// WriteBatch will be the next sequence number immediately after the largest
// sequence number recovered from previous WALs and MANIFEST because of which DB
// will be in consistent state.
// If a future recovery starts from the new MANIFEST, then it means the new WAL
// is successfully synced. Due to the sentinel empty write batch at the
// beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point.
// If future recovery starts from the old MANIFEST, it means the writing the new
// MANIFEST failed. It won't have the "SST ahead of WAL" error.
//
// The combination of corrupting a WAL and injecting an error during subsequent
// re-open exposes the bug of prematurely persisting a new MANIFEST with
// advanced ColumnFamilyData::log_number.
TEST_P(CrashDuringRecoveryWithCorruptionTest,
DISABLED_TxnDbCrashDuringRecovery) {
CloseDb();
Options options;
options.wal_recovery_mode = WALRecoveryMode::kPointInTimeRecovery;
options.track_and_verify_wals_in_manifest =
track_and_verify_wals_in_manifest_;
options.avoid_flush_during_recovery = false;
options.env = env_;
ASSERT_OK(DestroyDB(dbname_, options));
options.create_if_missing = true;
options.max_write_buffer_number = 3;
Reopen(&options);
// Create cf test_cf_name.
ColumnFamilyHandle* cfh = nullptr;
const std::string test_cf_name = "test_cf";
Status s = db_->CreateColumnFamily(options, test_cf_name, &cfh);
ASSERT_OK(s);
delete cfh;
CloseDb();
std::vector<ColumnFamilyDescriptor> cf_descs;
cf_descs.emplace_back(kDefaultColumnFamilyName, options);
cf_descs.emplace_back(test_cf_name, options);
std::vector<ColumnFamilyHandle*> handles;
TransactionDB* txn_db = nullptr;
TransactionDBOptions txn_db_opts;
// 1. Open and populate the DB. Write and flush default_cf several times to
// advance wal number so that some column families have advanced log_number
// while other don't.
{
ASSERT_OK(TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs,
&handles, &txn_db));
auto* txn = txn_db->BeginTransaction(WriteOptions(), TransactionOptions());
// Put cf1
ASSERT_OK(txn->Put(handles[1], "foo", "value"));
ASSERT_OK(txn->SetName("txn0"));
ASSERT_OK(txn->Prepare());
ASSERT_OK(txn_db->Flush(FlushOptions()));
delete txn;
txn = nullptr;
auto* dbimpl = static_cast_with_check<DBImpl>(txn_db->GetRootDB());
assert(dbimpl);
// Put and flush cf0
for (int i = 0; i < 2; ++i) {
ASSERT_OK(txn_db->Put(WriteOptions(), "dontcare", "value"));
ASSERT_OK(dbimpl->TEST_SwitchMemtable());
}
// Put cf1
txn = txn_db->BeginTransaction(WriteOptions(), TransactionOptions());
ASSERT_OK(txn->Put(handles[1], "foo1", "value"));
ASSERT_OK(txn->Commit());
delete txn;
txn = nullptr;
for (auto* h : handles) {
delete h;
}
handles.clear();
delete txn_db;
}
// 2. Corrupt second last wal to emulate power reset which caused the DB to
// lose the un-synced WAL.
{
std::vector<uint64_t> file_nums;
GetSortedWalFiles(file_nums);
size_t size = file_nums.size();
assert(size >= 2);
uint64_t log_num = file_nums[size - 2];
CorruptFileWithTruncation(FileType::kWalFile, log_num,
/*bytes_to_truncate=*/8);
}
// 3. After first crash reopen the DB which contains corrupted WAL. Default
// family has higher log number than corrupted wal number. There may be old
// WAL files that it must not delete because they can contain data of
// uncommitted transactions. As a result, min_log_number_to_keep won't change.
{
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->SetCallBack(
"DBImpl::Open::BeforeSyncWAL", [&](void* arg) {
auto* tmp_s = reinterpret_cast<Status*>(arg);
assert(tmp_s);
*tmp_s = Status::IOError("Injected");
});
SyncPoint::GetInstance()->EnableProcessing();
handles.clear();
s = TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles,
&txn_db);
ASSERT_TRUE(s.IsIOError());
ASSERT_EQ("IO error: Injected", s.ToString());
for (auto* h : handles) {
delete h;
}
CloseDb();
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
}
// 4. Corrupt max_wal_num.
{
std::vector<uint64_t> file_nums;
GetSortedWalFiles(file_nums);
size_t size = file_nums.size();
assert(size >= 2);
uint64_t log_num = file_nums[size - 1];
CorruptFileWithTruncation(FileType::kWalFile, log_num);
}
// 5. After second crash reopen the db with second corruption. Default family
// has higher log number than corrupted wal number.
// We persist a new MANIFEST with advanced log_numbers for some column
// families only after syncing the WAL. So during second crash, RocksDB will
// skip the corrupted WAL files as they have been moved to different folder.
// Since newly synced WAL file's sequence number (sentinel WriteBatch) will be
// the next sequence number immediately after the largest sequence number
// recovered from previous WALs and MANIFEST, db will be in consistent state
// and opens successfully.
{
ASSERT_OK(TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs,
&handles, &txn_db));
for (auto* h : handles) {
delete h;
}
delete txn_db;
}
}
// This test is similar to
// CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery except it calls
// flush and corrupts Last WAL. It calls flush to sync some of the WALs and
// remaining are unsyned one of which is then corrupted to simulate crash.
//
// In case of non-TransactionDB with avoid_flush_during_recovery = true, RocksDB
// won't flush the data from WAL to L0 for all column families if possible. As a
// result, not all column families can increase their log_numbers, and
// min_log_number_to_keep won't change.
// It may prematurely persist a new MANIFEST even before we can declare the DB
// is in consistent state after recovery (this is when the new WAL is synced)
// and advances log_numbers for some column families.
//
// If there is power failure before we sync the new WAL, we will end up in
// a situation in which after persisting the MANIFEST, RocksDB will see some
// column families' log_numbers larger than the corrupted wal, and
// "Column family inconsistency: SST file contains data beyond the point of
// corruption" error will be hit, causing recovery to fail.
//
// After adding the fix, only after new WAL is synced, RocksDB persist a new
// MANIFEST with column families to ensure RocksDB is in consistent state.
// RocksDB writes an empty WriteBatch as a sentinel to the new WAL which is
// synced immediately afterwards. The sequence number of the sentinel
// WriteBatch will be the next sequence number immediately after the largest
// sequence number recovered from previous WALs and MANIFEST because of which DB
// will be in consistent state.
// If a future recovery starts from the new MANIFEST, then it means the new WAL
// is successfully synced. Due to the sentinel empty write batch at the
// beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point.
// If future recovery starts from the old MANIFEST, it means the writing the new
// MANIFEST failed. It won't have the "SST ahead of WAL" error.
// The combination of corrupting a WAL and injecting an error during subsequent
// re-open exposes the bug of prematurely persisting a new MANIFEST with
// advanced ColumnFamilyData::log_number.
TEST_P(CrashDuringRecoveryWithCorruptionTest,
DISABLED_CrashDuringRecoveryWithFlush) {
CloseDb();
Options options;
options.wal_recovery_mode = WALRecoveryMode::kPointInTimeRecovery;
options.avoid_flush_during_recovery = false;
options.env = env_;
options.create_if_missing = true;
ASSERT_OK(DestroyDB(dbname_, options));
Reopen(&options);
ColumnFamilyHandle* cfh = nullptr;
const std::string test_cf_name = "test_cf";
Status s = db_->CreateColumnFamily(options, test_cf_name, &cfh);
ASSERT_OK(s);
delete cfh;
CloseDb();
std::vector<ColumnFamilyDescriptor> cf_descs;
cf_descs.emplace_back(kDefaultColumnFamilyName, options);
cf_descs.emplace_back(test_cf_name, options);
std::vector<ColumnFamilyHandle*> handles;
{
ASSERT_OK(DB::Open(options, dbname_, cf_descs, &handles, &db_));
// Write one key to test_cf.
ASSERT_OK(db_->Put(WriteOptions(), handles[1], "old_key", "dontcare"));
// Write to default_cf and flush this cf several times to advance wal
// number.
for (int i = 0; i < 2; ++i) {
ASSERT_OK(db_->Put(WriteOptions(), "key" + std::to_string(i), "value"));
ASSERT_OK(db_->Flush(FlushOptions()));
}
ASSERT_OK(db_->Put(WriteOptions(), handles[1], "dontcare", "dontcare"));
for (auto* h : handles) {
delete h;
}
handles.clear();
CloseDb();
}
// Corrupt second last un-syned wal file to emulate power reset which
// caused the DB to lose the un-synced WAL.
{
std::vector<uint64_t> file_nums;
GetSortedWalFiles(file_nums);
size_t size = file_nums.size();
uint64_t log_num = file_nums[size - 1];
CorruptFileWithTruncation(FileType::kWalFile, log_num,
/*bytes_to_truncate=*/8);
}
// Fault is injected to fail the recovery.
{
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->SetCallBack(
"DBImpl::GetLogSizeAndMaybeTruncate:0", [&](void* arg) {
auto* tmp_s = reinterpret_cast<Status*>(arg);
assert(tmp_s);
*tmp_s = Status::IOError("Injected");
});
SyncPoint::GetInstance()->EnableProcessing();
handles.clear();
options.avoid_flush_during_recovery = true;
s = DB::Open(options, dbname_, cf_descs, &handles, &db_);
ASSERT_TRUE(s.IsIOError());
ASSERT_EQ("IO error: Injected", s.ToString());
for (auto* h : handles) {
delete h;
}
CloseDb();
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
}
// Reopen db again
{
options.avoid_flush_during_recovery = avoid_flush_during_recovery_;
ASSERT_OK(DB::Open(options, dbname_, cf_descs, &handles, &db_));
for (auto* h : handles) {
delete h;
}
}
}
} // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) {

View File

@ -95,8 +95,8 @@ class CuckooTableDBTest : public testing::Test {
int NumTableFilesAtLevel(int level) {
std::string property;
EXPECT_TRUE(db_->GetProperty("rocksdb.num-files-at-level" + ToString(level),
&property));
EXPECT_TRUE(db_->GetProperty(
"rocksdb.num-files-at-level" + std::to_string(level), &property));
return atoi(property.c_str());
}

View File

@ -175,7 +175,10 @@ TEST_F(DBBasicTest, ReadOnlyDB) {
ASSERT_TRUE(db_->SyncWAL().IsNotSupported());
}
TEST_F(DBBasicTest, ReadOnlyDBWithWriteDBIdToManifestSet) {
// TODO akanksha: Update the test to check that combination
// does not actually write to FS (use open read-only with
// CompositeEnvWrapper+ReadOnlyFileSystem).
TEST_F(DBBasicTest, DISABLED_ReadOnlyDBWithWriteDBIdToManifestSet) {
ASSERT_OK(Put("foo", "v1"));
ASSERT_OK(Put("bar", "v2"));
ASSERT_OK(Put("foo", "v3"));
@ -3780,7 +3783,7 @@ TEST_P(DBBasicTestDeadline, PointLookupDeadline) {
Random rnd(301);
for (int i = 0; i < 400; ++i) {
std::string key = "k" + ToString(i);
std::string key = "k" + std::to_string(i);
ASSERT_OK(Put(key, rnd.RandomString(100)));
}
ASSERT_OK(Flush());
@ -3863,7 +3866,7 @@ TEST_P(DBBasicTestDeadline, IteratorDeadline) {
Random rnd(301);
for (int i = 0; i < 400; ++i) {
std::string key = "k" + ToString(i);
std::string key = "k" + std::to_string(i);
ASSERT_OK(Put(key, rnd.RandomString(100)));
}
ASSERT_OK(Flush());

View File

@ -13,6 +13,7 @@
#include "cache/cache_entry_roles.h"
#include "cache/cache_key.h"
#include "cache/fast_lru_cache.h"
#include "cache/lru_cache.h"
#include "db/column_family.h"
#include "db/db_impl/db_impl.h"
@ -75,7 +76,7 @@ class DBBlockCacheTest : public DBTestBase {
void InitTable(const Options& /*options*/) {
std::string value(kValueSize, 'a');
for (size_t i = 0; i < kNumBlocks; i++) {
ASSERT_OK(Put(ToString(i), value.c_str()));
ASSERT_OK(Put(std::to_string(i), value.c_str()));
}
}
@ -204,7 +205,7 @@ TEST_F(DBBlockCacheTest, IteratorBlockCacheUsage) {
ASSERT_EQ(0, cache->GetUsage());
iter = db_->NewIterator(read_options);
iter->Seek(ToString(0));
iter->Seek(std::to_string(0));
ASSERT_LT(0, cache->GetUsage());
delete iter;
iter = nullptr;
@ -235,7 +236,7 @@ TEST_F(DBBlockCacheTest, TestWithoutCompressedBlockCache) {
// Load blocks into cache.
for (size_t i = 0; i + 1 < kNumBlocks; i++) {
iter = db_->NewIterator(read_options);
iter->Seek(ToString(i));
iter->Seek(std::to_string(i));
ASSERT_OK(iter->status());
CheckCacheCounters(options, 1, 0, 1, 0);
iterators[i].reset(iter);
@ -248,7 +249,7 @@ TEST_F(DBBlockCacheTest, TestWithoutCompressedBlockCache) {
// Test with strict capacity limit.
cache->SetStrictCapacityLimit(true);
iter = db_->NewIterator(read_options);
iter->Seek(ToString(kNumBlocks - 1));
iter->Seek(std::to_string(kNumBlocks - 1));
ASSERT_TRUE(iter->status().IsIncomplete());
CheckCacheCounters(options, 1, 0, 0, 1);
delete iter;
@ -262,7 +263,7 @@ TEST_F(DBBlockCacheTest, TestWithoutCompressedBlockCache) {
ASSERT_EQ(0, cache->GetPinnedUsage());
for (size_t i = 0; i + 1 < kNumBlocks; i++) {
iter = db_->NewIterator(read_options);
iter->Seek(ToString(i));
iter->Seek(std::to_string(i));
ASSERT_OK(iter->status());
CheckCacheCounters(options, 0, 1, 0, 0);
iterators[i].reset(iter);
@ -288,7 +289,7 @@ TEST_F(DBBlockCacheTest, TestWithCompressedBlockCache) {
std::string value(kValueSize, 'a');
for (size_t i = 0; i < kNumBlocks; i++) {
ASSERT_OK(Put(ToString(i), value));
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Flush());
}
@ -312,7 +313,7 @@ TEST_F(DBBlockCacheTest, TestWithCompressedBlockCache) {
// Load blocks into cache.
for (size_t i = 0; i < kNumBlocks - 1; i++) {
ASSERT_EQ(value, Get(ToString(i)));
ASSERT_EQ(value, Get(std::to_string(i)));
CheckCacheCounters(options, 1, 0, 1, 0);
CheckCompressedCacheCounters(options, 1, 0, 1, 0);
}
@ -333,7 +334,7 @@ TEST_F(DBBlockCacheTest, TestWithCompressedBlockCache) {
// Load last key block.
ASSERT_EQ("Result incomplete: Insert failed due to LRU cache being full.",
Get(ToString(kNumBlocks - 1)));
Get(std::to_string(kNumBlocks - 1)));
// Failure will also record the miss counter.
CheckCacheCounters(options, 1, 0, 0, 1);
CheckCompressedCacheCounters(options, 1, 0, 1, 0);
@ -342,7 +343,7 @@ TEST_F(DBBlockCacheTest, TestWithCompressedBlockCache) {
// cache and load into block cache.
cache->SetStrictCapacityLimit(false);
// Load last key block.
ASSERT_EQ(value, Get(ToString(kNumBlocks - 1)));
ASSERT_EQ(value, Get(std::to_string(kNumBlocks - 1)));
CheckCacheCounters(options, 1, 0, 1, 0);
CheckCompressedCacheCounters(options, 0, 1, 0, 0);
}
@ -567,7 +568,7 @@ TEST_F(DBBlockCacheTest, FillCacheAndIterateDB) {
Iterator* iter = nullptr;
iter = db_->NewIterator(read_options);
iter->Seek(ToString(0));
iter->Seek(std::to_string(0));
while (iter->Valid()) {
iter->Next();
}
@ -645,10 +646,10 @@ TEST_F(DBBlockCacheTest, WarmCacheWithDataBlocksDuringFlush) {
std::string value(kValueSize, 'a');
for (size_t i = 1; i <= kNumBlocks; i++) {
ASSERT_OK(Put(ToString(i), value));
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Flush());
ASSERT_EQ(i, options.statistics->getTickerCount(BLOCK_CACHE_DATA_ADD));
ASSERT_EQ(value, Get(ToString(i)));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(0, options.statistics->getTickerCount(BLOCK_CACHE_DATA_MISS));
ASSERT_EQ(i, options.statistics->getTickerCount(BLOCK_CACHE_DATA_HIT));
}
@ -705,7 +706,7 @@ TEST_P(DBBlockCacheTest1, WarmCacheWithBlocksDuringFlush) {
std::string value(kValueSize, 'a');
for (size_t i = 1; i <= kNumBlocks; i++) {
ASSERT_OK(Put(ToString(i), value));
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Flush());
ASSERT_EQ(i, options.statistics->getTickerCount(BLOCK_CACHE_DATA_ADD));
if (filter_type == 1) {
@ -717,7 +718,7 @@ TEST_P(DBBlockCacheTest1, WarmCacheWithBlocksDuringFlush) {
ASSERT_EQ(i, options.statistics->getTickerCount(BLOCK_CACHE_INDEX_ADD));
ASSERT_EQ(i, options.statistics->getTickerCount(BLOCK_CACHE_FILTER_ADD));
}
ASSERT_EQ(value, Get(ToString(i)));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(0, options.statistics->getTickerCount(BLOCK_CACHE_DATA_MISS));
ASSERT_EQ(i, options.statistics->getTickerCount(BLOCK_CACHE_DATA_HIT));
@ -772,12 +773,12 @@ TEST_F(DBBlockCacheTest, DynamicallyWarmCacheDuringFlush) {
std::string value(kValueSize, 'a');
for (size_t i = 1; i <= 5; i++) {
ASSERT_OK(Put(ToString(i), value));
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Flush());
ASSERT_EQ(1,
options.statistics->getAndResetTickerCount(BLOCK_CACHE_DATA_ADD));
ASSERT_EQ(value, Get(ToString(i)));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(0,
options.statistics->getAndResetTickerCount(BLOCK_CACHE_DATA_ADD));
ASSERT_EQ(
@ -790,12 +791,12 @@ TEST_F(DBBlockCacheTest, DynamicallyWarmCacheDuringFlush) {
{{"block_based_table_factory", "{prepopulate_block_cache=kDisable;}"}}));
for (size_t i = 6; i <= kNumBlocks; i++) {
ASSERT_OK(Put(ToString(i), value));
ASSERT_OK(Put(std::to_string(i), value));
ASSERT_OK(Flush());
ASSERT_EQ(0,
options.statistics->getAndResetTickerCount(BLOCK_CACHE_DATA_ADD));
ASSERT_EQ(value, Get(ToString(i)));
ASSERT_EQ(value, Get(std::to_string(i)));
ASSERT_EQ(1,
options.statistics->getAndResetTickerCount(BLOCK_CACHE_DATA_ADD));
ASSERT_EQ(
@ -934,7 +935,8 @@ TEST_F(DBBlockCacheTest, AddRedundantStats) {
int iterations_tested = 0;
for (std::shared_ptr<Cache> base_cache :
{NewLRUCache(capacity, num_shard_bits),
NewClockCache(capacity, num_shard_bits)}) {
NewClockCache(capacity, num_shard_bits),
NewFastLRUCache(capacity, num_shard_bits)}) {
if (!base_cache) {
// Skip clock cache when not supported
continue;
@ -1288,7 +1290,8 @@ TEST_F(DBBlockCacheTest, CacheEntryRoleStats) {
int iterations_tested = 0;
for (bool partition : {false, true}) {
for (std::shared_ptr<Cache> cache :
{NewLRUCache(capacity), NewClockCache(capacity)}) {
{NewLRUCache(capacity), NewClockCache(capacity),
NewFastLRUCache(capacity)}) {
if (!cache) {
// Skip clock cache when not supported
continue;
@ -1404,21 +1407,11 @@ TEST_F(DBBlockCacheTest, CacheEntryRoleStats) {
ASSERT_TRUE(
db_->GetMapProperty(DB::Properties::kBlockCacheEntryStats, &values));
EXPECT_EQ(
ToString(expected[static_cast<size_t>(CacheEntryRole::kIndexBlock)]),
values["count.index-block"]);
EXPECT_EQ(
ToString(expected[static_cast<size_t>(CacheEntryRole::kDataBlock)]),
values["count.data-block"]);
EXPECT_EQ(
ToString(expected[static_cast<size_t>(CacheEntryRole::kFilterBlock)]),
values["count.filter-block"]);
EXPECT_EQ(
ToString(
prev_expected[static_cast<size_t>(CacheEntryRole::kWriteBuffer)]),
values["count.write-buffer"]);
EXPECT_EQ(ToString(expected[static_cast<size_t>(CacheEntryRole::kMisc)]),
values["count.misc"]);
for (size_t i = 0; i < kNumCacheEntryRoles; ++i) {
auto role = static_cast<CacheEntryRole>(i);
EXPECT_EQ(std::to_string(expected[i]),
values[BlockCacheEntryStatsMapKeys::EntryCount(role)]);
}
// Add one for kWriteBuffer
{
@ -1429,18 +1422,20 @@ TEST_F(DBBlockCacheTest, CacheEntryRoleStats) {
// re-scanning stats, but not totally aggressive.
// Within some time window, we will get cached entry stats
env_->MockSleepForSeconds(1);
EXPECT_EQ(ToString(prev_expected[static_cast<size_t>(
EXPECT_EQ(std::to_string(prev_expected[static_cast<size_t>(
CacheEntryRole::kWriteBuffer)]),
values["count.write-buffer"]);
values[BlockCacheEntryStatsMapKeys::EntryCount(
CacheEntryRole::kWriteBuffer)]);
// Not enough for a "background" miss but enough for a "foreground" miss
env_->MockSleepForSeconds(45);
ASSERT_TRUE(db_->GetMapProperty(DB::Properties::kBlockCacheEntryStats,
&values));
EXPECT_EQ(
ToString(
std::to_string(
expected[static_cast<size_t>(CacheEntryRole::kWriteBuffer)]),
values["count.write-buffer"]);
values[BlockCacheEntryStatsMapKeys::EntryCount(
CacheEntryRole::kWriteBuffer)]);
}
prev_expected = expected;
@ -1645,7 +1640,7 @@ TEST_P(DBBlockCacheKeyTest, StableCacheKeys) {
SstFileWriter sst_file_writer(EnvOptions(), options);
std::vector<std::string> external;
for (int i = 0; i < 2; ++i) {
std::string f = dbname_ + "/external" + ToString(i) + ".sst";
std::string f = dbname_ + "/external" + std::to_string(i) + ".sst";
external.push_back(f);
ASSERT_OK(sst_file_writer.Open(f));
ASSERT_OK(sst_file_writer.Put(Key(key_count), "abc"));
@ -1729,7 +1724,7 @@ class CacheKeyTest : public testing::Test {
// Like SemiStructuredUniqueIdGen::GenerateNext
tp_.db_session_id = EncodeSessionId(base_session_upper_,
base_session_lower_ ^ session_counter_);
tp_.db_id = ToString(db_id_);
tp_.db_id = std::to_string(db_id_);
tp_.orig_file_number = file_number_;
bool is_stable;
std::string cur_session_id = ""; // ignored

View File

@ -17,8 +17,11 @@
#include "db/db_test_util.h"
#include "options/options_helper.h"
#include "port/stack_trace.h"
#include "rocksdb/advanced_options.h"
#include "rocksdb/convenience.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/statistics.h"
#include "rocksdb/table.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/filter_policy_internal.h"
@ -108,6 +111,7 @@ TEST_P(DBBloomFilterTestDefFormatVersion, KeyMayExist) {
options_override.filter_policy = Create(20, bfp_impl_);
options_override.partition_filters = partition_filters_;
options_override.metadata_block_size = 32;
options_override.full_block_cache = true;
Options options = CurrentOptions(options_override);
if (partition_filters_) {
auto* table_options =
@ -798,83 +802,87 @@ TEST_F(DBBloomFilterTest, BloomFilterRate) {
}
}
namespace {
struct CompatibilityConfig {
std::shared_ptr<const FilterPolicy> policy;
bool partitioned;
uint32_t format_version;
void SetInTableOptions(BlockBasedTableOptions* table_options) {
table_options->filter_policy = policy;
table_options->partition_filters = partitioned;
if (partitioned) {
table_options->index_type =
BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
} else {
table_options->index_type =
BlockBasedTableOptions::IndexType::kBinarySearch;
}
table_options->format_version = format_version;
}
};
// High bits per key -> almost no FPs
std::shared_ptr<const FilterPolicy> kCompatibilityBloomPolicy{
NewBloomFilterPolicy(20)};
// bloom_before_level=-1 -> always use Ribbon
std::shared_ptr<const FilterPolicy> kCompatibilityRibbonPolicy{
NewRibbonFilterPolicy(20, -1)};
std::vector<CompatibilityConfig> kCompatibilityConfigs = {
{Create(20, kDeprecatedBlock), false,
BlockBasedTableOptions().format_version},
{kCompatibilityBloomPolicy, false, BlockBasedTableOptions().format_version},
{kCompatibilityBloomPolicy, true, BlockBasedTableOptions().format_version},
{kCompatibilityBloomPolicy, false, /* legacy Bloom */ 4U},
{kCompatibilityRibbonPolicy, false,
BlockBasedTableOptions().format_version},
{kCompatibilityRibbonPolicy, true, BlockBasedTableOptions().format_version},
};
} // namespace
TEST_F(DBBloomFilterTest, BloomFilterCompatibility) {
Options options = CurrentOptions();
options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics();
BlockBasedTableOptions table_options;
table_options.filter_policy.reset(NewBloomFilterPolicy(10, true));
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
options.level0_file_num_compaction_trigger =
static_cast<int>(kCompatibilityConfigs.size()) + 1;
options.max_open_files = -1;
// Create with block based filter
CreateAndReopenWithCF({"pikachu"}, options);
Close();
const int maxKey = 10000;
for (int i = 0; i < maxKey; i++) {
ASSERT_OK(Put(1, Key(i), Key(i)));
}
ASSERT_OK(Put(1, Key(maxKey + 55555), Key(maxKey + 55555)));
Flush(1);
// Check db with full filter
table_options.filter_policy.reset(NewBloomFilterPolicy(10, false));
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
ReopenWithColumnFamilies({"default", "pikachu"}, options);
// Check if they can be found
for (int i = 0; i < maxKey; i++) {
ASSERT_EQ(Key(i), Get(1, Key(i)));
}
ASSERT_EQ(TestGetTickerCount(options, BLOOM_FILTER_USEFUL), 0);
// Check db with partitioned full filter
table_options.partition_filters = true;
table_options.index_type =
BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
table_options.filter_policy.reset(NewBloomFilterPolicy(10, false));
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
ReopenWithColumnFamilies({"default", "pikachu"}, options);
// Check if they can be found
for (int i = 0; i < maxKey; i++) {
ASSERT_EQ(Key(i), Get(1, Key(i)));
}
ASSERT_EQ(TestGetTickerCount(options, BLOOM_FILTER_USEFUL), 0);
}
TEST_F(DBBloomFilterTest, BloomFilterReverseCompatibility) {
for (bool partition_filters : {true, false}) {
Options options = CurrentOptions();
options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics();
// Create one file for each kind of filter. Each file covers a distinct key
// range.
for (size_t i = 0; i < kCompatibilityConfigs.size(); ++i) {
BlockBasedTableOptions table_options;
if (partition_filters) {
table_options.partition_filters = true;
table_options.index_type =
BlockBasedTableOptions::IndexType::kTwoLevelIndexSearch;
}
table_options.filter_policy.reset(NewBloomFilterPolicy(10, false));
kCompatibilityConfigs[i].SetInTableOptions(&table_options);
ASSERT_TRUE(table_options.filter_policy != nullptr);
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
DestroyAndReopen(options);
Reopen(options);
// Create with full filter
CreateAndReopenWithCF({"pikachu"}, options);
std::string prefix = std::to_string(i) + "_";
ASSERT_OK(Put(prefix + "A", "val"));
ASSERT_OK(Put(prefix + "Z", "val"));
ASSERT_OK(Flush());
}
const int maxKey = 10000;
for (int i = 0; i < maxKey; i++) {
ASSERT_OK(Put(1, Key(i), Key(i)));
}
ASSERT_OK(Put(1, Key(maxKey + 55555), Key(maxKey + 55555)));
Flush(1);
// Check db with block_based filter
table_options.filter_policy.reset(NewBloomFilterPolicy(10, true));
// Test filter is used between each pair of {reader,writer} configurations,
// because any built-in FilterPolicy should be able to read filters from any
// other built-in FilterPolicy
for (size_t i = 0; i < kCompatibilityConfigs.size(); ++i) {
BlockBasedTableOptions table_options;
kCompatibilityConfigs[i].SetInTableOptions(&table_options);
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
ReopenWithColumnFamilies({"default", "pikachu"}, options);
// Check if they can be found
for (int i = 0; i < maxKey; i++) {
ASSERT_EQ(Key(i), Get(1, Key(i)));
Reopen(options);
for (size_t j = 0; j < kCompatibilityConfigs.size(); ++j) {
std::string prefix = std::to_string(j) + "_";
ASSERT_EQ("val", Get(prefix + "A")); // Filter positive
ASSERT_EQ("val", Get(prefix + "Z")); // Filter positive
// Filter negative, with high probability
ASSERT_EQ("NOT_FOUND", Get(prefix + "Q"));
// FULL_POSITIVE does not include block-based filter case (j == 0)
EXPECT_EQ(TestGetAndResetTickerCount(options, BLOOM_FILTER_FULL_POSITIVE),
j == 0 ? 0 : 2);
EXPECT_EQ(TestGetAndResetTickerCount(options, BLOOM_FILTER_USEFUL), 1);
}
ASSERT_EQ(TestGetTickerCount(options, BLOOM_FILTER_USEFUL), 0);
}
}
@ -919,7 +927,7 @@ class FilterConstructResPeakTrackingCache : public CacheWrapper {
}
using Cache::Release;
bool Release(Handle* handle, bool force_erase = false) override {
bool Release(Handle* handle, bool erase_if_last_ref = false) override {
auto deleter = GetDeleter(handle);
if (deleter == kNoopDeleterForFilterConstruction) {
if (!last_peak_tracked_) {
@ -929,7 +937,7 @@ class FilterConstructResPeakTrackingCache : public CacheWrapper {
}
cur_cache_res_ -= GetCharge(handle);
}
bool is_successful = target_->Release(handle, force_erase);
bool is_successful = target_->Release(handle, erase_if_last_ref);
return is_successful;
}
@ -952,8 +960,8 @@ class FilterConstructResPeakTrackingCache : public CacheWrapper {
const Cache::DeleterFn
FilterConstructResPeakTrackingCache::kNoopDeleterForFilterConstruction =
CacheReservationManager::TEST_GetNoopDeleterForRole<
CacheEntryRole::kFilterConstruction>();
CacheReservationManagerImpl<
CacheEntryRole::kFilterConstruction>::TEST_GetNoopDeleterForRole();
// To align with the type of hash entry being reserved in implementation.
using FilterConstructionReserveMemoryHash = uint64_t;
@ -983,7 +991,9 @@ class DBFilterConstructionReserveMemoryTestWithParam
// trigger at least 1 dummy entry reservation each for hash entries and
// final filter, we need a large number of keys to ensure we have at least
// two partitions.
num_key_ = 18 * CacheReservationManager::GetDummyEntrySize() /
num_key_ = 18 *
CacheReservationManagerImpl<
CacheEntryRole::kFilterConstruction>::GetDummyEntrySize() /
sizeof(FilterConstructionReserveMemoryHash);
} else if (policy_ == kFastLocalBloom) {
// For Bloom Filter + FullFilter case, since we design the num_key_ to
@ -993,7 +1003,9 @@ class DBFilterConstructionReserveMemoryTestWithParam
// behavior and we don't need a large number of keys to verify we
// indeed charge the final filter for cache reservation, even though final
// filter is a lot smaller than hash entries.
num_key_ = 1 * CacheReservationManager::GetDummyEntrySize() /
num_key_ = 1 *
CacheReservationManagerImpl<
CacheEntryRole::kFilterConstruction>::GetDummyEntrySize() /
sizeof(FilterConstructionReserveMemoryHash);
} else {
// For Ribbon Filter + FullFilter case, we need a large enough number of
@ -1001,7 +1013,9 @@ class DBFilterConstructionReserveMemoryTestWithParam
// reservation will trigger at least another dummy entry (or equivalently
// to saying, causing another peak in cache reservation) as banding
// reservation might not be a multiple of dummy entry.
num_key_ = 12 * CacheReservationManager::GetDummyEntrySize() /
num_key_ = 12 *
CacheReservationManagerImpl<
CacheEntryRole::kFilterConstruction>::GetDummyEntrySize() /
sizeof(FilterConstructionReserveMemoryHash);
}
}
@ -1061,7 +1075,8 @@ class DBFilterConstructionReserveMemoryTestWithParam
};
INSTANTIATE_TEST_CASE_P(
BlockBasedTableOptions, DBFilterConstructionReserveMemoryTestWithParam,
DBFilterConstructionReserveMemoryTestWithParam,
DBFilterConstructionReserveMemoryTestWithParam,
::testing::Values(std::make_tuple(false, kFastLocalBloom, false, false),
std::make_tuple(true, kFastLocalBloom, false, false),
@ -1077,7 +1092,7 @@ INSTANTIATE_TEST_CASE_P(
std::make_tuple(true, kDeprecatedBlock, false, false),
std::make_tuple(true, kLegacyBloom, false, false)));
// TODO: Speed up this test.
// TODO: Speed up this test, and reduce disk space usage (~700MB)
// The current test inserts many keys (on the scale of dummy entry size)
// in order to make small memory user (e.g, final filter, partitioned hash
// entries/filter/banding) , which is proportional to the number of
@ -1156,8 +1171,8 @@ TEST_P(DBFilterConstructionReserveMemoryTestWithParam, ReserveMemory) {
return;
}
const std::size_t kDummyEntrySize =
CacheReservationManager::GetDummyEntrySize();
const std::size_t kDummyEntrySize = CacheReservationManagerImpl<
CacheEntryRole::kFilterConstruction>::GetDummyEntrySize();
const std::size_t predicted_hash_entries_cache_res =
num_key * sizeof(FilterConstructionReserveMemoryHash);
@ -1345,9 +1360,12 @@ TEST_P(DBFilterConstructionReserveMemoryTestWithParam, ReserveMemory) {
*
*/
if (!partition_filters) {
ASSERT_GE(std::floor(1.0 * predicted_final_filter_cache_res /
CacheReservationManager::GetDummyEntrySize()),
1)
ASSERT_GE(
std::floor(
1.0 * predicted_final_filter_cache_res /
CacheReservationManagerImpl<
CacheEntryRole::kFilterConstruction>::GetDummyEntrySize()),
1)
<< "Final filter cache reservation too small for this test - please "
"increase the number of keys";
if (!detect_filter_construct_corruption) {
@ -1695,11 +1713,11 @@ class TestingContextCustomFilterPolicy
test_report_ +=
OptionsHelper::compaction_style_to_string[context.compaction_style];
test_report_ += ",n=";
test_report_ += ROCKSDB_NAMESPACE::ToString(context.num_levels);
test_report_ += std::to_string(context.num_levels);
test_report_ += ",l=";
test_report_ += ROCKSDB_NAMESPACE::ToString(context.level_at_creation);
test_report_ += std::to_string(context.level_at_creation);
test_report_ += ",b=";
test_report_ += ROCKSDB_NAMESPACE::ToString(int{context.is_bottommost});
test_report_ += std::to_string(int{context.is_bottommost});
test_report_ += ",r=";
test_report_ += table_file_creation_reason_to_string[context.reason];
test_report_ += "\n";

View File

@ -454,7 +454,7 @@ TEST_F(DBTestCompactionFilter, CompactionFilterDeletesAll) {
// put some data
for (int table = 0; table < 4; ++table) {
for (int i = 0; i < 10 + table; ++i) {
ASSERT_OK(Put(ToString(table * 100 + i), "val"));
ASSERT_OK(Put(std::to_string(table * 100 + i), "val"));
}
ASSERT_OK(Flush());
}
@ -755,7 +755,7 @@ TEST_F(DBTestCompactionFilter, CompactionFilterContextCfId) {
#ifndef ROCKSDB_LITE
// Compaction filters aplies to all records, regardless snapshots.
TEST_F(DBTestCompactionFilter, CompactionFilterIgnoreSnapshot) {
std::string five = ToString(5);
std::string five = std::to_string(5);
Options options = CurrentOptions();
options.compaction_filter_factory = std::make_shared<DeleteISFilterFactory>();
options.disable_auto_compactions = true;
@ -766,7 +766,7 @@ TEST_F(DBTestCompactionFilter, CompactionFilterIgnoreSnapshot) {
const Snapshot* snapshot = nullptr;
for (int table = 0; table < 4; ++table) {
for (int i = 0; i < 10; ++i) {
ASSERT_OK(Put(ToString(table * 100 + i), "val"));
ASSERT_OK(Put(std::to_string(table * 100 + i), "val"));
}
ASSERT_OK(Flush());
@ -968,6 +968,71 @@ TEST_F(DBTestCompactionFilter, IgnoreSnapshotsFalseRecovery) {
ASSERT_TRUE(TryReopen(options).IsNotSupported());
}
TEST_F(DBTestCompactionFilter, DropKeyWithSingleDelete) {
Options options = GetDefaultOptions();
options.create_if_missing = true;
Reopen(options);
ASSERT_OK(Put("a", "v0"));
ASSERT_OK(Put("b", "v0"));
const Snapshot* snapshot = db_->GetSnapshot();
ASSERT_OK(SingleDelete("b"));
ASSERT_OK(Flush());
{
CompactRangeOptions cro;
cro.change_level = true;
cro.target_level = options.num_levels - 1;
ASSERT_OK(db_->CompactRange(cro, nullptr, nullptr));
}
db_->ReleaseSnapshot(snapshot);
Close();
class DeleteFilterV2 : public CompactionFilter {
public:
Decision FilterV2(int /*level*/, const Slice& key, ValueType /*value_type*/,
const Slice& /*existing_value*/,
std::string* /*new_value*/,
std::string* /*skip_until*/) const override {
if (key.starts_with("b")) {
return Decision::kPurge;
}
return Decision::kRemove;
}
const char* Name() const override { return "DeleteFilterV2"; }
} delete_filter_v2;
options.compaction_filter = &delete_filter_v2;
options.level0_file_num_compaction_trigger = 2;
Reopen(options);
ASSERT_OK(Put("b", "v1"));
ASSERT_OK(Put("x", "v1"));
ASSERT_OK(Flush());
ASSERT_OK(Put("r", "v1"));
ASSERT_OK(Put("z", "v1"));
ASSERT_OK(Flush());
ASSERT_OK(dbfull()->TEST_WaitForCompact());
Close();
options.compaction_filter = nullptr;
Reopen(options);
ASSERT_OK(SingleDelete("b"));
ASSERT_OK(Flush());
{
CompactRangeOptions cro;
cro.bottommost_level_compaction = BottommostLevelCompaction::kForce;
ASSERT_OK(db_->CompactRange(cro, nullptr, nullptr));
}
}
} // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) {

View File

@ -1354,6 +1354,74 @@ TEST_P(DBCompactionTestWithParam, TrivialMoveTargetLevel) {
}
}
TEST_P(DBCompactionTestWithParam, PartialOverlappingL0) {
class SubCompactionEventListener : public EventListener {
public:
void OnSubcompactionCompleted(const SubcompactionJobInfo&) override {
sub_compaction_finished_++;
}
std::atomic<int> sub_compaction_finished_{0};
};
Options options = CurrentOptions();
options.disable_auto_compactions = true;
options.write_buffer_size = 10 * 1024 * 1024;
options.max_subcompactions = max_subcompactions_;
SubCompactionEventListener* listener = new SubCompactionEventListener();
options.listeners.emplace_back(listener);
DestroyAndReopen(options);
// For subcompactino to trigger, output level needs to be non-empty.
ASSERT_OK(Put("key", ""));
ASSERT_OK(Put("kez", ""));
ASSERT_OK(Flush());
ASSERT_OK(Put("key", ""));
ASSERT_OK(Put("kez", ""));
ASSERT_OK(Flush());
ASSERT_OK(db_->CompactRange(CompactRangeOptions(), nullptr, nullptr));
// Ranges that are only briefly overlapping so that they won't be trivially
// moved but subcompaction ranges would only contain a subset of files.
std::vector<std::pair<int32_t, int32_t>> ranges = {
{100, 199}, {198, 399}, {397, 600}, {598, 800}, {799, 900}, {895, 999},
};
int32_t value_size = 10 * 1024; // 10 KB
Random rnd(301);
std::map<int32_t, std::string> values;
for (size_t i = 0; i < ranges.size(); i++) {
for (int32_t j = ranges[i].first; j <= ranges[i].second; j++) {
values[j] = rnd.RandomString(value_size);
ASSERT_OK(Put(Key(j), values[j]));
}
ASSERT_OK(Flush());
}
int32_t level0_files = NumTableFilesAtLevel(0, 0);
ASSERT_EQ(level0_files, ranges.size()); // Multiple files in L0
ASSERT_EQ(NumTableFilesAtLevel(1, 0), 1); // One file in L1
listener->sub_compaction_finished_ = 0;
ASSERT_OK(db_->EnableAutoCompaction({db_->DefaultColumnFamily()}));
ASSERT_OK(dbfull()->TEST_WaitForCompact());
if (max_subcompactions_ > 3) {
// RocksDB might not generate the exact number of sub compactions.
// Here we validate that at least subcompaction happened.
ASSERT_GT(listener->sub_compaction_finished_.load(), 2);
}
// We expect that all the files were compacted to L1
ASSERT_EQ(NumTableFilesAtLevel(0, 0), 0);
ASSERT_GT(NumTableFilesAtLevel(1, 0), 1);
for (size_t i = 0; i < ranges.size(); i++) {
for (int32_t j = ranges[i].first; j <= ranges[i].second; j++) {
ASSERT_EQ(Get(Key(j)), values[j]);
}
}
}
TEST_P(DBCompactionTestWithParam, ManualCompactionPartial) {
int32_t trivial_move = 0;
int32_t non_trivial_move = 0;
@ -2341,6 +2409,30 @@ TEST_P(DBCompactionTestWithParam, LevelCompactionCFPathUse) {
check_getvalues();
{ // Also verify GetLiveFilesStorageInfo with db_paths / cf_paths
std::vector<LiveFileStorageInfo> new_infos;
LiveFilesStorageInfoOptions lfsio;
lfsio.wal_size_for_flush = UINT64_MAX; // no flush
ASSERT_OK(db_->GetLiveFilesStorageInfo(lfsio, &new_infos));
std::unordered_map<std::string, int> live_sst_by_dir;
for (auto& info : new_infos) {
if (info.file_type == kTableFile) {
live_sst_by_dir[info.directory]++;
// Verify file on disk (no directory confusion)
uint64_t size;
ASSERT_OK(env_->GetFileSize(
info.directory + "/" + info.relative_filename, &size));
ASSERT_EQ(info.size, size);
}
}
ASSERT_EQ(3U * 3U, live_sst_by_dir.size());
for (auto& paths : {options.db_paths, cf_opt1.cf_paths, cf_opt2.cf_paths}) {
ASSERT_EQ(1, live_sst_by_dir[paths[0].path]);
ASSERT_EQ(4, live_sst_by_dir[paths[1].path]);
ASSERT_EQ(2, live_sst_by_dir[paths[2].path]);
}
}
ReopenWithColumnFamilies({"default", "one", "two"}, option_vector);
check_getvalues();
@ -2725,7 +2817,7 @@ TEST_P(DBCompactionTestWithParam, DISABLED_CompactFilesOnLevelCompaction) {
Random rnd(301);
for (int key = 64 * kEntriesPerBuffer; key >= 0; --key) {
ASSERT_OK(Put(1, ToString(key), rnd.RandomString(kTestValueSize)));
ASSERT_OK(Put(1, std::to_string(key), rnd.RandomString(kTestValueSize)));
}
ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable(handles_[1]));
ASSERT_OK(dbfull()->TEST_WaitForCompact());
@ -2757,7 +2849,7 @@ TEST_P(DBCompactionTestWithParam, DISABLED_CompactFilesOnLevelCompaction) {
// make sure all key-values are still there.
for (int key = 64 * kEntriesPerBuffer; key >= 0; --key) {
ASSERT_NE(Get(1, ToString(key)), "NOT_FOUND");
ASSERT_NE(Get(1, std::to_string(key)), "NOT_FOUND");
}
}
@ -4312,7 +4404,8 @@ TEST_F(DBCompactionTest, LevelPeriodicCompactionWithCompactionFilters) {
for (CompactionFilterType comp_filter_type :
{kUseCompactionFilter, kUseCompactionFilterFactory}) {
// Assert that periodic compactions are not enabled.
ASSERT_EQ(port::kMaxUint64 - 1, options.periodic_compaction_seconds);
ASSERT_EQ(std::numeric_limits<uint64_t>::max() - 1,
options.periodic_compaction_seconds);
if (comp_filter_type == kUseCompactionFilter) {
options.compaction_filter = &test_compaction_filter;
@ -4575,9 +4668,9 @@ TEST_F(DBCompactionTest, CompactRangeSkipFlushAfterDelay) {
});
TEST_SYNC_POINT("DBCompactionTest::CompactRangeSkipFlushAfterDelay:PreFlush");
ASSERT_OK(Put(ToString(0), rnd.RandomString(1024)));
ASSERT_OK(Put(std::to_string(0), rnd.RandomString(1024)));
ASSERT_OK(dbfull()->Flush(flush_opts));
ASSERT_OK(Put(ToString(0), rnd.RandomString(1024)));
ASSERT_OK(Put(std::to_string(0), rnd.RandomString(1024)));
TEST_SYNC_POINT("DBCompactionTest::CompactRangeSkipFlushAfterDelay:PostFlush");
manual_compaction_thread.join();
@ -4586,7 +4679,7 @@ TEST_F(DBCompactionTest, CompactRangeSkipFlushAfterDelay) {
std::string num_keys_in_memtable;
ASSERT_TRUE(db_->GetProperty(DB::Properties::kNumEntriesActiveMemTable,
&num_keys_in_memtable));
ASSERT_EQ(ToString(1), num_keys_in_memtable);
ASSERT_EQ(std::to_string(1), num_keys_in_memtable);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
}
@ -4735,7 +4828,7 @@ TEST_F(DBCompactionTest, SubcompactionEvent) {
for (int i = 0; i < 4; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 10 + j;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -4745,7 +4838,7 @@ TEST_F(DBCompactionTest, SubcompactionEvent) {
for (int i = 0; i < 2; i++) {
for (int j = 0; j < 10; j++) {
int key_id = i * 20 + j * 2;
ASSERT_OK(Put(Key(key_id), "value" + ToString(key_id)));
ASSERT_OK(Put(Key(key_id), "value" + std::to_string(key_id)));
}
ASSERT_OK(Flush());
}
@ -5737,7 +5830,7 @@ TEST_P(DBCompactionTestWithBottommostParam, SequenceKeysManualCompaction) {
}
ASSERT_OK(dbfull()->TEST_WaitForFlushMemTable());
ASSERT_EQ(ToString(kSstNum), FilesPerLevel(0));
ASSERT_EQ(std::to_string(kSstNum), FilesPerLevel(0));
auto cro = CompactRangeOptions();
cro.bottommost_level_compaction = bottommost_level_compaction_;
@ -5750,7 +5843,7 @@ TEST_P(DBCompactionTestWithBottommostParam, SequenceKeysManualCompaction) {
ASSERT_EQ("0,1", FilesPerLevel(0));
} else {
// Just trivial move from level 0 -> 1
ASSERT_EQ("0," + ToString(kSstNum), FilesPerLevel(0));
ASSERT_EQ("0," + std::to_string(kSstNum), FilesPerLevel(0));
}
}
@ -6416,20 +6509,29 @@ TEST_F(DBCompactionTest, CompactionWithBlobGCError_CorruptIndex) {
ASSERT_OK(Put(third_key, third_value));
constexpr char fourth_key[] = "fourth_key";
constexpr char corrupt_blob_index[] = "foobar";
WriteBatch batch;
ASSERT_OK(WriteBatchInternal::PutBlobIndex(&batch, 0, fourth_key,
corrupt_blob_index));
ASSERT_OK(db_->Write(WriteOptions(), &batch));
constexpr char fourth_value[] = "fourth_value";
ASSERT_OK(Put(fourth_key, fourth_value));
ASSERT_OK(Flush());
SyncPoint::GetInstance()->SetCallBack(
"CompactionIterator::GarbageCollectBlobIfNeeded::TamperWithBlobIndex",
[](void* arg) {
Slice* const blob_index = static_cast<Slice*>(arg);
assert(blob_index);
assert(!blob_index->empty());
blob_index->remove_prefix(1);
});
SyncPoint::GetInstance()->EnableProcessing();
constexpr Slice* begin = nullptr;
constexpr Slice* end = nullptr;
ASSERT_TRUE(
db_->CompactRange(CompactRangeOptions(), begin, end).IsCorruption());
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
}
TEST_F(DBCompactionTest, CompactionWithBlobGCError_InlinedTTLIndex) {
@ -7072,7 +7174,7 @@ TEST_F(DBCompactionTest, DisableManualCompactionThreadQueueFull) {
ASSERT_OK(Put(Key(2), "value2"));
ASSERT_OK(Flush());
}
ASSERT_EQ(ToString(kNumL0Files + (kNumL0Files / 2)), FilesPerLevel(0));
ASSERT_EQ(std::to_string(kNumL0Files + (kNumL0Files / 2)), FilesPerLevel(0));
db_->DisableManualCompaction();
@ -7129,7 +7231,7 @@ TEST_F(DBCompactionTest, DisableManualCompactionThreadQueueFullDBClose) {
ASSERT_OK(Put(Key(2), "value2"));
ASSERT_OK(Flush());
}
ASSERT_EQ(ToString(kNumL0Files + (kNumL0Files / 2)), FilesPerLevel(0));
ASSERT_EQ(std::to_string(kNumL0Files + (kNumL0Files / 2)), FilesPerLevel(0));
db_->DisableManualCompaction();
@ -7189,7 +7291,7 @@ TEST_F(DBCompactionTest, DBCloseWithManualCompaction) {
ASSERT_OK(Put(Key(2), "value2"));
ASSERT_OK(Flush());
}
ASSERT_EQ(ToString(kNumL0Files + (kNumL0Files / 2)), FilesPerLevel(0));
ASSERT_EQ(std::to_string(kNumL0Files + (kNumL0Files / 2)), FilesPerLevel(0));
// Close DB with manual compaction and auto triggered compaction in the queue.
auto s = db_->Close();

View File

@ -45,17 +45,15 @@ Status DBImpl::FlushForGetLiveFiles() {
}
mutex_.Lock();
} else {
for (auto cfd : *versions_->GetColumnFamilySet()) {
for (auto cfd : versions_->GetRefedColumnFamilySet()) {
if (cfd->IsDropped()) {
continue;
}
cfd->Ref();
mutex_.Unlock();
status = FlushMemTable(cfd, FlushOptions(), FlushReason::kGetLiveFiles);
TEST_SYNC_POINT("DBImpl::GetLiveFiles:1");
TEST_SYNC_POINT("DBImpl::GetLiveFiles:2");
mutex_.Lock();
cfd->UnrefAndTryDelete();
if (!status.ok() && !status.IsColumnFamilyDropped()) {
break;
} else if (status.IsColumnFamilyDropped()) {
@ -63,7 +61,6 @@ Status DBImpl::FlushForGetLiveFiles() {
}
}
}
versions_->GetColumnFamilySet()->FreeDeadColumnFamilies();
return status;
}
@ -99,7 +96,7 @@ Status DBImpl::GetLiveFiles(std::vector<std::string>& ret,
3); // for CURRENT + MANIFEST + OPTIONS
// create names of the live files. The names are not absolute
// paths, instead they are relative to dbname_;
// paths, instead they are relative to dbname_.
for (const auto& table_file_number : live_table_files) {
ret.emplace_back(MakeTableFileName("", table_file_number));
}
@ -169,7 +166,7 @@ Status DBImpl::GetCurrentWalFile(std::unique_ptr<LogFile>* current_log_file) {
Status DBImpl::GetLiveFilesStorageInfo(
const LiveFilesStorageInfoOptions& opts,
std::vector<LiveFileStorageInfo>* files) {
// To avoid returning partial results, only move to ouput on success
// To avoid returning partial results, only move results to files on success.
assert(files);
files->clear();
std::vector<LiveFileStorageInfo> results;
@ -180,10 +177,10 @@ Status DBImpl::GetLiveFilesStorageInfo(
VectorLogPtr live_wal_files;
bool flush_memtable = true;
if (!immutable_db_options_.allow_2pc) {
if (opts.wal_size_for_flush == port::kMaxUint64) {
if (opts.wal_size_for_flush == std::numeric_limits<uint64_t>::max()) {
flush_memtable = false;
} else if (opts.wal_size_for_flush > 0) {
// If out standing log files are small, we skip the flush.
// If the outstanding log files are small, we skip the flush.
s = GetSortedWalFiles(live_wal_files);
if (!s.ok()) {
@ -316,8 +313,7 @@ Status DBImpl::GetLiveFilesStorageInfo(
info.relative_filename = kCurrentFileName;
info.directory = GetName();
info.file_type = kCurrentFile;
// CURRENT could be replaced so we have to record the contents we want
// for it
// CURRENT could be replaced so we have to record the contents as needed.
info.replacement_contents = manifest_fname + "\n";
info.size = manifest_fname.size() + 1;
if (opts.include_checksum_info) {
@ -359,7 +355,7 @@ Status DBImpl::GetLiveFilesStorageInfo(
TEST_SYNC_POINT("CheckpointImpl::CreateCustomCheckpoint:AfterGetLive1");
TEST_SYNC_POINT("CheckpointImpl::CreateCustomCheckpoint:AfterGetLive2");
// if we have more than one column family, we need to also get WAL files
// If we have more than one column family, we also need to get WAL files.
if (s.ok()) {
s = GetSortedWalFiles(live_wal_files);
}
@ -397,7 +393,7 @@ Status DBImpl::GetLiveFilesStorageInfo(
}
if (s.ok()) {
// Only move output on success
// Only move results to output on success.
*files = std::move(results);
}
return s;

View File

@ -2271,7 +2271,7 @@ TEST_P(DBAtomicFlushTest, ManualFlushUnder2PC) {
// The recovered min log number with prepared data should be non-zero.
// In 2pc mode, MinLogNumberToKeep returns the
// VersionSet::min_log_number_to_keep_2pc recovered from MANIFEST, if it's 0,
// VersionSet::min_log_number_to_keep recovered from MANIFEST, if it's 0,
// it means atomic flush didn't write the min_log_number_to_keep to MANIFEST.
cfs.push_back(kDefaultColumnFamilyName);
ASSERT_OK(TryReopenWithColumnFamilies(cfs, options));
@ -2356,7 +2356,7 @@ TEST_P(DBAtomicFlushTest, PrecomputeMinLogNumberToKeepNon2PC) {
ASSERT_OK(Flush(cf_ids));
uint64_t log_num_after_flush = dbfull()->TEST_GetCurrentLogNumber();
uint64_t min_log_number_to_keep = port::kMaxUint64;
uint64_t min_log_number_to_keep = std::numeric_limits<uint64_t>::max();
autovector<ColumnFamilyData*> flushed_cfds;
autovector<autovector<VersionEdit*>> flush_edits;
for (size_t i = 0; i != num_cfs; ++i) {

View File

@ -101,6 +101,7 @@
#include "util/compression.h"
#include "util/crc32c.h"
#include "util/defer.h"
#include "util/hash_containers.h"
#include "util/mutexlock.h"
#include "util/stop_watch.h"
#include "util/string_util.h"
@ -375,15 +376,12 @@ Status DBImpl::ResumeImpl(DBRecoverContext context) {
s = AtomicFlushMemTables(cfds, flush_opts, context.flush_reason);
mutex_.Lock();
} else {
for (auto cfd : *versions_->GetColumnFamilySet()) {
for (auto cfd : versions_->GetRefedColumnFamilySet()) {
if (cfd->IsDropped()) {
continue;
}
cfd->Ref();
mutex_.Unlock();
InstrumentedMutexUnlock u(&mutex_);
s = FlushMemTable(cfd, flush_opts, context.flush_reason);
mutex_.Lock();
cfd->UnrefAndTryDelete();
if (!s.ok()) {
break;
}
@ -495,18 +493,14 @@ void DBImpl::CancelAllBackgroundWork(bool wait) {
s.PermitUncheckedError(); //**TODO: What to do on error?
mutex_.Lock();
} else {
for (auto cfd : *versions_->GetColumnFamilySet()) {
for (auto cfd : versions_->GetRefedColumnFamilySet()) {
if (!cfd->IsDropped() && cfd->initialized() && !cfd->mem()->IsEmpty()) {
cfd->Ref();
mutex_.Unlock();
InstrumentedMutexUnlock u(&mutex_);
Status s = FlushMemTable(cfd, FlushOptions(), FlushReason::kShutDown);
s.PermitUncheckedError(); //**TODO: What to do on error?
mutex_.Lock();
cfd->UnrefAndTryDelete();
}
}
}
versions_->GetColumnFamilySet()->FreeDeadColumnFamilies();
}
shutting_down_.store(true, std::memory_order_release);
@ -573,7 +567,7 @@ Status DBImpl::CloseHelper() {
// flushing by first checking if there is a need for
// flushing (but need to implement something
// else than imm()->IsFlushPending() because the output
// memtables added to imm() dont trigger flushes).
// memtables added to imm() don't trigger flushes).
if (immutable_db_options_.experimental_mempurge_threshold > 0.0) {
Status flush_ret;
mutex_.Unlock();
@ -855,7 +849,8 @@ void DBImpl::PersistStats() {
if (stats_slice_.find(stat.first) != stats_slice_.end()) {
uint64_t delta = stat.second - stats_slice_[stat.first];
s = batch.Put(persist_stats_cf_handle_,
Slice(key, std::min(100, length)), ToString(delta));
Slice(key, std::min(100, length)),
std::to_string(delta));
}
}
}
@ -969,18 +964,13 @@ void DBImpl::DumpStats() {
TEST_SYNC_POINT("DBImpl::DumpStats:StartRunning");
{
InstrumentedMutexLock l(&mutex_);
for (auto cfd : *versions_->GetColumnFamilySet()) {
for (auto cfd : versions_->GetRefedColumnFamilySet()) {
if (cfd->initialized()) {
// Release DB mutex for gathering cache entry stats. Pass over all
// column families for this first so that other stats are dumped
// near-atomically.
// Get a ref before unlocking
cfd->Ref();
{
InstrumentedMutexUnlock u(&mutex_);
cfd->internal_stats()->CollectCacheEntryStats(/*foreground=*/false);
}
cfd->UnrefAndTryDelete();
InstrumentedMutexUnlock u(&mutex_);
cfd->internal_stats()->CollectCacheEntryStats(/*foreground=*/false);
}
}
@ -1093,7 +1083,6 @@ Status DBImpl::SetOptions(
MutableCFOptions new_options;
Status s;
Status persist_options_status;
persist_options_status.PermitUncheckedError(); // Allow uninitialized access
SuperVersionContext sv_context(/* create_superversion */ true);
{
auto db_options = GetDBOptions();
@ -1129,9 +1118,11 @@ Status DBImpl::SetOptions(
"[%s] SetOptions() succeeded", cfd->GetName().c_str());
new_options.Dump(immutable_db_options_.info_log.get());
if (!persist_options_status.ok()) {
// NOTE: WriteOptionsFile already logs on failure
s = persist_options_status;
}
} else {
persist_options_status.PermitUncheckedError(); // less important
ROCKS_LOG_WARN(immutable_db_options_.info_log, "[%s] SetOptions() failed",
cfd->GetName().c_str());
}
@ -1900,6 +1891,8 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
return s;
}
}
TEST_SYNC_POINT("DBImpl::GetImpl:PostMemTableGet:0");
TEST_SYNC_POINT("DBImpl::GetImpl:PostMemTableGet:1");
PinnedIteratorsManager pinned_iters_mgr;
if (!done) {
PERF_TIMER_GUARD(get_from_output_files_time);
@ -1917,8 +1910,6 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
{
PERF_TIMER_GUARD(get_post_process_time);
ReturnAndCleanupSuperVersion(cfd, sv);
RecordTick(stats_, NUMBER_KEYS_READ);
size_t size = 0;
if (s.ok()) {
@ -1945,6 +1936,8 @@ Status DBImpl::GetImpl(const ReadOptions& read_options, const Slice& key,
PERF_COUNTER_ADD(get_read_bytes, size);
}
RecordInHistogram(stats_, BYTES_PER_READ, size);
ReturnAndCleanupSuperVersion(cfd, sv);
}
return s;
}
@ -2009,7 +2002,7 @@ std::vector<Status> DBImpl::MultiGet(
SequenceNumber consistent_seqnum;
std::unordered_map<uint32_t, MultiGetColumnFamilyData> multiget_cf_data(
UnorderedMap<uint32_t, MultiGetColumnFamilyData> multiget_cf_data(
column_family.size());
for (auto cf : column_family) {
auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(cf);
@ -2021,13 +2014,13 @@ std::vector<Status> DBImpl::MultiGet(
}
std::function<MultiGetColumnFamilyData*(
std::unordered_map<uint32_t, MultiGetColumnFamilyData>::iterator&)>
UnorderedMap<uint32_t, MultiGetColumnFamilyData>::iterator&)>
iter_deref_lambda =
[](std::unordered_map<uint32_t, MultiGetColumnFamilyData>::iterator&
[](UnorderedMap<uint32_t, MultiGetColumnFamilyData>::iterator&
cf_iter) { return &cf_iter->second; };
bool unref_only =
MultiCFSnapshot<std::unordered_map<uint32_t, MultiGetColumnFamilyData>>(
MultiCFSnapshot<UnorderedMap<uint32_t, MultiGetColumnFamilyData>>(
read_options, nullptr, iter_deref_lambda, &multiget_cf_data,
&consistent_seqnum);
@ -3363,7 +3356,7 @@ bool DBImpl::GetProperty(ColumnFamilyHandle* column_family,
bool ret_value =
GetIntPropertyInternal(cfd, *property_info, false, &int_value);
if (ret_value) {
*value = ToString(int_value);
*value = std::to_string(int_value);
}
return ret_value;
} else if (property_info->handle_string) {
@ -3490,15 +3483,13 @@ bool DBImpl::GetAggregatedIntProperty(const Slice& property,
// Needs mutex to protect the list of column families.
InstrumentedMutexLock l(&mutex_);
uint64_t value;
for (auto* cfd : *versions_->GetColumnFamilySet()) {
for (auto* cfd : versions_->GetRefedColumnFamilySet()) {
if (!cfd->initialized()) {
continue;
}
cfd->Ref();
ret = GetIntPropertyInternal(cfd, *property_info, true, &value);
// GetIntPropertyInternal may release db mutex and re-acquire it.
mutex_.AssertHeld();
cfd->UnrefAndTryDelete();
if (ret) {
sum += value;
} else {
@ -3692,6 +3683,11 @@ Status DBImpl::GetUpdatesSince(
SequenceNumber seq, std::unique_ptr<TransactionLogIterator>* iter,
const TransactionLogIterator::ReadOptions& read_options) {
RecordTick(stats_, GET_UPDATES_SINCE_CALLS);
if (seq_per_batch_) {
return Status::NotSupported(
"This API is not yet compatible with write-prepared/write-unprepared "
"transactions");
}
if (seq > versions_->LastSequence()) {
return Status::NotFound("Requested sequence not yet written in the db");
}
@ -3995,8 +3991,8 @@ Status DBImpl::CheckConsistency() {
} else if (fsize != md.size) {
corruption_messages += "Sst file size mismatch: " + file_path +
". Size recorded in manifest " +
ToString(md.size) + ", actual size " +
ToString(fsize) + "\n";
std::to_string(md.size) + ", actual size " +
std::to_string(fsize) + "\n";
}
}
}
@ -5089,6 +5085,7 @@ Status DBImpl::VerifyChecksumInternal(const ReadOptions& read_options,
}
}
// TODO: simplify using GetRefedColumnFamilySet?
std::vector<ColumnFamilyData*> cfd_list;
{
InstrumentedMutexLock l(&mutex_);
@ -5127,8 +5124,8 @@ Status DBImpl::VerifyChecksumInternal(const ReadOptions& read_options,
fmeta->file_checksum_func_name, fname,
read_options);
} else {
s = ROCKSDB_NAMESPACE::VerifySstFileChecksum(opts, file_options_,
read_options, fname);
s = ROCKSDB_NAMESPACE::VerifySstFileChecksum(
opts, file_options_, read_options, fname, fd.largest_seqno);
}
RecordTick(stats_, VERIFY_CHECKSUM_READ_BYTES,
IOSTATS(bytes_read) - prev_bytes_read);
@ -5342,7 +5339,7 @@ Status DBImpl::ReserveFileNumbersBeforeIngestion(
Status DBImpl::GetCreationTimeOfOldestFile(uint64_t* creation_time) {
if (mutable_db_options_.max_open_files == -1) {
uint64_t oldest_time = port::kMaxUint64;
uint64_t oldest_time = std::numeric_limits<uint64_t>::max();
for (auto cfd : *versions_->GetColumnFamilySet()) {
if (!cfd->IsDropped()) {
uint64_t ctime;

View File

@ -2144,7 +2144,7 @@ class DBImpl : public DB {
// only during recovery. Using this feature enables not writing the state to
// memtable on normal writes and hence improving the throughput. Each new
// write of the state will replace the previous state entirely even if the
// keys in the two consecuitive states do not overlap.
// keys in the two consecutive states do not overlap.
// It is protected by log_write_mutex_ when two_write_queues_ is enabled.
// Otherwise only the heaad of write_thread_ can access it.
WriteBatch cached_recoverable_state_;
@ -2299,7 +2299,7 @@ class DBImpl : public DB {
static const int KEEP_LOG_FILE_NUM = 1000;
// MSVC version 1800 still does not have constexpr for ::max()
static const uint64_t kNoTimeOut = port::kMaxUint64;
static const uint64_t kNoTimeOut = std::numeric_limits<uint64_t>::max();
std::string db_absolute_path_;
@ -2369,11 +2369,6 @@ class DBImpl : public DB {
// DB::Open() or passed to us
bool own_sfm_;
// Default value is 0 which means ALL deletes are
// preserved. Note that this has no effect if preserve_deletes is false.
const std::atomic<SequenceNumber> preserve_deletes_seqnum_{0};
const bool preserve_deletes_ = false;
// Flag to check whether Close() has been called on this DB
bool closed_;
// save the closing status, for re-calling the close()

View File

@ -188,7 +188,7 @@ Status DBImpl::FlushMemTableToOutputFile(
// a memtable without knowing such snapshot(s).
uint64_t max_memtable_id = needs_to_sync_closed_wals
? cfd->imm()->GetLatestMemTableID()
: port::kMaxUint64;
: std::numeric_limits<uint64_t>::max();
// If needs_to_sync_closed_wals is false, then the flush job will pick ALL
// existing memtables of the column family when PickMemTable() is called
@ -1041,7 +1041,8 @@ Status DBImpl::CompactRangeInternal(const CompactRangeOptions& options,
}
s = RunManualCompaction(cfd, ColumnFamilyData::kCompactAllLevels,
final_output_level, options, begin, end, exclusive,
false, port::kMaxUint64, trim_ts);
false, std::numeric_limits<uint64_t>::max(),
trim_ts);
} else {
int first_overlapped_level = kInvalidLevel;
int max_overlapped_level = kInvalidLevel;
@ -1078,7 +1079,7 @@ Status DBImpl::CompactRangeInternal(const CompactRangeOptions& options,
if (s.ok() && first_overlapped_level != kInvalidLevel) {
// max_file_num_to_ignore can be used to filter out newly created SST
// files, useful for bottom level compaction in a manual compaction
uint64_t max_file_num_to_ignore = port::kMaxUint64;
uint64_t max_file_num_to_ignore = std::numeric_limits<uint64_t>::max();
uint64_t next_file_number = versions_->current_next_file_number();
final_output_level = max_overlapped_level;
int output_level;
@ -1363,11 +1364,11 @@ Status DBImpl::CompactFilesImpl(
CompactionJob compaction_job(
job_context->job_id, c.get(), immutable_db_options_, mutable_db_options_,
file_options_for_compaction_, versions_.get(), &shutting_down_,
preserve_deletes_seqnum_.load(), log_buffer, directories_.GetDbDir(),
log_buffer, directories_.GetDbDir(),
GetDataDir(c->column_family_data(), c->output_path_id()),
GetDataDir(c->column_family_data(), 0), stats_, &mutex_, &error_handler_,
snapshot_seqs, earliest_write_conflict_snapshot, snapshot_checker,
table_cache_, &event_logger_,
job_context, table_cache_, &event_logger_,
c->mutable_cf_options()->paranoid_file_checks,
c->mutable_cf_options()->report_bg_io_stats, dbname_,
&compaction_job_stats, Env::Priority::USER, io_tracer_,
@ -2013,7 +2014,7 @@ Status DBImpl::FlushMemTable(ColumnFamilyData* cfd,
// be created and scheduled, status::OK() will be returned.
s = SwitchMemtable(cfd, &context);
}
const uint64_t flush_memtable_id = port::kMaxUint64;
const uint64_t flush_memtable_id = std::numeric_limits<uint64_t>::max();
if (s.ok()) {
if (cfd->imm()->NumNotFlushed() != 0 || !cfd->mem()->IsEmpty() ||
!cached_recoverable_state_empty_.load()) {
@ -2973,8 +2974,6 @@ void DBImpl::BackgroundCallCompaction(PrepickedCompaction* prepicked_compaction,
bg_bottom_compaction_scheduled_--;
}
versions_->GetColumnFamilySet()->FreeDeadColumnFamilies();
// See if there's more work to be done
MaybeScheduleFlushOrCompaction();
@ -3359,12 +3358,11 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
CompactionJob compaction_job(
job_context->job_id, c.get(), immutable_db_options_,
mutable_db_options_, file_options_for_compaction_, versions_.get(),
&shutting_down_, preserve_deletes_seqnum_.load(), log_buffer,
directories_.GetDbDir(),
&shutting_down_, log_buffer, directories_.GetDbDir(),
GetDataDir(c->column_family_data(), c->output_path_id()),
GetDataDir(c->column_family_data(), 0), stats_, &mutex_,
&error_handler_, snapshot_seqs, earliest_write_conflict_snapshot,
snapshot_checker, table_cache_, &event_logger_,
snapshot_checker, job_context, table_cache_, &event_logger_,
c->mutable_cf_options()->paranoid_file_checks,
c->mutable_cf_options()->report_bg_io_stats, dbname_,
&compaction_job_stats, thread_pri, io_tracer_,

View File

@ -118,10 +118,11 @@ Status DBImpl::TEST_CompactRange(int level, const Slice* begin,
cfd->ioptions()->compaction_style == kCompactionStyleFIFO)
? level
: level + 1;
return RunManualCompaction(cfd, level, output_level, CompactRangeOptions(),
begin, end, true, disallow_trivial_move,
port::kMaxUint64 /*max_file_num_to_ignore*/,
"" /*trim_ts*/);
return RunManualCompaction(
cfd, level, output_level, CompactRangeOptions(), begin, end, true,
disallow_trivial_move,
std::numeric_limits<uint64_t>::max() /*max_file_num_to_ignore*/,
"" /*trim_ts*/);
}
Status DBImpl::TEST_SwitchMemtable(ColumnFamilyData* cfd) {

View File

@ -23,11 +23,7 @@
namespace ROCKSDB_NAMESPACE {
uint64_t DBImpl::MinLogNumberToKeep() {
if (allow_2pc()) {
return versions_->min_log_number_to_keep_2pc();
} else {
return versions_->MinLogNumberWithUnflushedData();
}
return versions_->min_log_number_to_keep();
}
uint64_t DBImpl::MinObsoleteSstNumberToKeep() {
@ -224,7 +220,6 @@ void DBImpl::FindObsoleteFiles(JobContext* job_context, bool force,
}
// Add log files in wal_dir
if (!immutable_db_options_.IsWalDirSameAsDBPath(dbname_)) {
std::vector<std::string> log_files;
Status s = env_->GetChildren(immutable_db_options_.wal_dir, &log_files);
@ -234,6 +229,7 @@ void DBImpl::FindObsoleteFiles(JobContext* job_context, bool force,
log_file, immutable_db_options_.wal_dir);
}
}
// Add info log files in db_log_dir
if (!immutable_db_options_.db_log_dir.empty() &&
immutable_db_options_.db_log_dir != dbname_) {
@ -765,7 +761,7 @@ uint64_t PrecomputeMinLogNumberToKeepNon2PC(
assert(!cfds_to_flush.empty());
assert(cfds_to_flush.size() == edit_lists.size());
uint64_t min_log_number_to_keep = port::kMaxUint64;
uint64_t min_log_number_to_keep = std::numeric_limits<uint64_t>::max();
for (const auto& edit_list : edit_lists) {
uint64_t log = 0;
for (const auto& e : edit_list) {
@ -777,7 +773,7 @@ uint64_t PrecomputeMinLogNumberToKeepNon2PC(
min_log_number_to_keep = std::min(min_log_number_to_keep, log);
}
}
if (min_log_number_to_keep == port::kMaxUint64) {
if (min_log_number_to_keep == std::numeric_limits<uint64_t>::max()) {
min_log_number_to_keep = cfds_to_flush[0]->GetLogNumber();
for (size_t i = 1; i < cfds_to_flush.size(); i++) {
min_log_number_to_keep =

View File

@ -760,11 +760,11 @@ Status DBImpl::PersistentStatsProcessFormatVersion() {
WriteBatch batch;
if (s.ok()) {
s = batch.Put(persist_stats_cf_handle_, kFormatVersionKeyString,
ToString(kStatsCFCurrentFormatVersion));
std::to_string(kStatsCFCurrentFormatVersion));
}
if (s.ok()) {
s = batch.Put(persist_stats_cf_handle_, kCompatibleVersionKeyString,
ToString(kStatsCFCompatibleFormatVersion));
std::to_string(kStatsCFCompatibleFormatVersion));
}
if (s.ok()) {
WriteOptions wo;
@ -866,6 +866,11 @@ Status DBImpl::RecoverLogFiles(const std::vector<uint64_t>& wal_numbers,
bool flushed = false;
uint64_t corrupted_wal_number = kMaxSequenceNumber;
uint64_t min_wal_number = MinLogNumberToKeep();
if (!allow_2pc()) {
// In non-2pc mode, we skip WALs that do not back unflushed data.
min_wal_number =
std::max(min_wal_number, versions_->MinLogNumberWithUnflushedData());
}
for (auto wal_number : wal_numbers) {
if (wal_number < min_wal_number) {
ROCKS_LOG_INFO(immutable_db_options_.info_log,
@ -942,7 +947,6 @@ Status DBImpl::RecoverLogFiles(const std::vector<uint64_t>& wal_numbers,
// Read all the records and add to a memtable
std::string scratch;
Slice record;
WriteBatch batch;
TEST_SYNC_POINT_CALLBACK("DBImpl::RecoverLogFiles:BeforeReadWal",
/*arg=*/nullptr);
@ -956,10 +960,15 @@ Status DBImpl::RecoverLogFiles(const std::vector<uint64_t>& wal_numbers,
continue;
}
// We create a new batch and initialize with a valid prot_info_ to store
// the data checksums
WriteBatch batch(0, 0, 8, 0);
status = WriteBatchInternal::SetContents(&batch, record);
if (!status.ok()) {
return status;
}
SequenceNumber sequence = WriteBatchInternal::Sequence(&batch);
if (immutable_db_options_.wal_recovery_mode ==
@ -1270,9 +1279,16 @@ Status DBImpl::RecoverLogFiles(const std::vector<uint64_t>& wal_numbers,
}
std::unique_ptr<VersionEdit> wal_deletion;
if (immutable_db_options_.track_and_verify_wals_in_manifest) {
wal_deletion.reset(new VersionEdit);
wal_deletion->DeleteWalsBefore(max_wal_number + 1);
if (flushed) {
wal_deletion = std::make_unique<VersionEdit>();
if (immutable_db_options_.track_and_verify_wals_in_manifest) {
wal_deletion->DeleteWalsBefore(max_wal_number + 1);
}
if (!allow_2pc()) {
// In non-2pc mode, flushing the memtables of the column families
// means we can advance min_log_number_to_keep.
wal_deletion->SetMinLogNumberToKeep(max_wal_number + 1);
}
edit_lists.back().push_back(wal_deletion.get());
}
@ -1310,6 +1326,7 @@ Status DBImpl::GetLogSizeAndMaybeTruncate(uint64_t wal_number, bool truncate,
Status s;
// This gets the appear size of the wals, not including preallocated space.
s = env_->GetFileSize(fname, &log.size);
TEST_SYNC_POINT_CALLBACK("DBImpl::GetLogSizeAndMaybeTruncate:0", /*arg=*/&s);
if (s.ok() && truncate) {
std::unique_ptr<FSWritableFile> last_log;
Status truncate_status = fs_->ReopenWritableFile(
@ -1351,7 +1368,14 @@ Status DBImpl::RestoreAliveLogFiles(const std::vector<uint64_t>& wal_numbers) {
// FindObsoleteFiles()
total_log_size_ = 0;
log_empty_ = false;
uint64_t min_wal_with_unflushed_data =
versions_->MinLogNumberWithUnflushedData();
for (auto wal_number : wal_numbers) {
if (!allow_2pc() && wal_number < min_wal_with_unflushed_data) {
// In non-2pc mode, the WAL files not backing unflushed data are not
// alive, thus should not be added to the alive_log_files_.
continue;
}
// We preallocate space for wals, but then after a crash and restart, those
// preallocated space are not needed anymore. It is likely only the last
// log has such preallocated space, so we only truncate for the last log.
@ -1446,9 +1470,9 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
dbname_, versions_.get(), immutable_db_options_, tboptions,
file_options_for_compaction_, cfd->table_cache(), iter.get(),
std::move(range_del_iters), &meta, &blob_file_additions,
snapshot_seqs, earliest_write_conflict_snapshot, snapshot_checker,
paranoid_file_checks, cfd->internal_stats(), &io_s, io_tracer_,
BlobFileCreationReason::kRecovery, &event_logger_, job_id,
snapshot_seqs, earliest_write_conflict_snapshot, kMaxSequenceNumber,
snapshot_checker, paranoid_file_checks, cfd->internal_stats(), &io_s,
io_tracer_, BlobFileCreationReason::kRecovery, &event_logger_, job_id,
Env::IO_HIGH, nullptr /* table_properties */, write_hint,
nullptr /*full_history_ts_low*/, &blob_callback_);
LogFlush(immutable_db_options_.info_log);
@ -1802,6 +1826,7 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
if (s.ok()) {
// Need to fsync, otherwise it might get lost after a power reset.
s = impl->FlushWAL(false);
TEST_SYNC_POINT_CALLBACK("DBImpl::Open::BeforeSyncWAL", /*arg=*/&s);
if (s.ok()) {
s = log_writer->file()->Sync(impl->immutable_db_options_.use_fsync);
}
@ -1952,10 +1977,10 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
"DB::Open() failed --- Unable to persist Options file",
persist_options_status.ToString());
}
} else {
}
if (!s.ok()) {
ROCKS_LOG_WARN(impl->immutable_db_options_.info_log,
"Persisting Option File error: %s",
persist_options_status.ToString().c_str());
"DB::Open() failed: %s", s.ToString().c_str());
}
if (s.ok()) {
s = impl->StartPeriodicWorkScheduler();

View File

@ -247,15 +247,16 @@ Status DBImplSecondary::RecoverLogFiles(
if (seq_of_batch <= seq) {
continue;
}
auto curr_log_num = port::kMaxUint64;
auto curr_log_num = std::numeric_limits<uint64_t>::max();
if (cfd_to_current_log_.count(cfd) > 0) {
curr_log_num = cfd_to_current_log_[cfd];
}
// If the active memtable contains records added by replaying an
// earlier WAL, then we need to seal the memtable, add it to the
// immutable memtable list and create a new active memtable.
if (!cfd->mem()->IsEmpty() && (curr_log_num == port::kMaxUint64 ||
curr_log_num != log_number)) {
if (!cfd->mem()->IsEmpty() &&
(curr_log_num == std::numeric_limits<uint64_t>::max() ||
curr_log_num != log_number)) {
const MutableCFOptions mutable_cf_options =
*cfd->GetLatestMutableCFOptions();
MemTable* new_mem =
@ -710,8 +711,11 @@ Status DB::OpenAsSecondary(
}
Status DBImplSecondary::CompactWithoutInstallation(
ColumnFamilyHandle* cfh, const CompactionServiceInput& input,
CompactionServiceResult* result) {
const OpenAndCompactOptions& options, ColumnFamilyHandle* cfh,
const CompactionServiceInput& input, CompactionServiceResult* result) {
if (options.canceled && options.canceled->load(std::memory_order_acquire)) {
return Status::Incomplete(Status::SubCode::kManualCompactionPaused);
}
InstrumentedMutexLock l(&mutex_);
auto cfd = static_cast_with_check<ColumnFamilyHandleImpl>(cfh)->cfd();
if (!cfd) {
@ -773,7 +777,7 @@ Status DBImplSecondary::CompactWithoutInstallation(
file_options_for_compaction_, versions_.get(), &shutting_down_,
&log_buffer, output_dir.get(), stats_, &mutex_, &error_handler_,
input.snapshots, table_cache_, &event_logger_, dbname_, io_tracer_,
db_id_, db_session_id_, secondary_path_, input, result);
options.canceled, db_id_, db_session_id_, secondary_path_, input, result);
mutex_.Unlock();
s = compaction_job.Run();
@ -792,9 +796,13 @@ Status DBImplSecondary::CompactWithoutInstallation(
}
Status DB::OpenAndCompact(
const std::string& name, const std::string& output_directory,
const std::string& input, std::string* result,
const OpenAndCompactOptions& options, const std::string& name,
const std::string& output_directory, const std::string& input,
std::string* output,
const CompactionServiceOptionsOverride& override_options) {
if (options.canceled && options.canceled->load(std::memory_order_acquire)) {
return Status::Incomplete(Status::SubCode::kManualCompactionPaused);
}
CompactionServiceInput compaction_input;
Status s = CompactionServiceInput::Read(input, &compaction_input);
if (!s.ok()) {
@ -824,6 +832,7 @@ Status DB::OpenAndCompact(
override_options.table_factory;
compaction_input.column_family.options.sst_partitioner_factory =
override_options.sst_partitioner_factory;
compaction_input.db_options.listeners = override_options.listeners;
std::vector<ColumnFamilyDescriptor> column_families;
column_families.push_back(compaction_input.column_family);
@ -847,10 +856,10 @@ Status DB::OpenAndCompact(
CompactionServiceResult compaction_result;
DBImplSecondary* db_secondary = static_cast_with_check<DBImplSecondary>(db);
assert(handles.size() > 0);
s = db_secondary->CompactWithoutInstallation(handles[0], compaction_input,
&compaction_result);
s = db_secondary->CompactWithoutInstallation(
options, handles[0], compaction_input, &compaction_result);
Status serialization_status = compaction_result.Write(result);
Status serialization_status = compaction_result.Write(output);
for (auto& handle : handles) {
delete handle;
@ -862,6 +871,14 @@ Status DB::OpenAndCompact(
return s;
}
Status DB::OpenAndCompact(
const std::string& name, const std::string& output_directory,
const std::string& input, std::string* output,
const CompactionServiceOptionsOverride& override_options) {
return OpenAndCompact(OpenAndCompactOptions(), name, output_directory, input,
output, override_options);
}
#else // !ROCKSDB_LITE
Status DB::OpenAsSecondary(const Options& /*options*/,

View File

@ -228,10 +228,11 @@ class DBImplSecondary : public DBImpl {
Status CheckConsistency() override;
#ifndef NDEBUG
Status TEST_CompactWithoutInstallation(ColumnFamilyHandle* cfh,
Status TEST_CompactWithoutInstallation(const OpenAndCompactOptions& options,
ColumnFamilyHandle* cfh,
const CompactionServiceInput& input,
CompactionServiceResult* result) {
return CompactWithoutInstallation(cfh, input, result);
return CompactWithoutInstallation(options, cfh, input, result);
}
#endif // NDEBUG
@ -346,7 +347,8 @@ class DBImplSecondary : public DBImpl {
// Run compaction without installation, the output files will be placed in the
// secondary DB path. The LSM tree won't be changed, the secondary DB is still
// in read-only mode.
Status CompactWithoutInstallation(ColumnFamilyHandle* cfh,
Status CompactWithoutInstallation(const OpenAndCompactOptions& options,
ColumnFamilyHandle* cfh,
const CompactionServiceInput& input,
CompactionServiceResult* result);

View File

@ -129,7 +129,7 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
PreReleaseCallback* pre_release_callback) {
assert(!seq_per_batch_ || batch_cnt != 0);
if (my_batch == nullptr) {
return Status::Corruption("Batch is nullptr!");
return Status::InvalidArgument("Batch is nullptr!");
} else if (!disable_memtable &&
WriteBatchInternal::TimestampsUpdateNeeded(*my_batch)) {
// If writing to memtable, then we require the caller to set/update the
@ -1882,9 +1882,11 @@ void DBImpl::NotifyOnMemTableSealed(ColumnFamilyData* /*cfd*/,
return;
}
mutex_.Unlock();
for (auto listener : immutable_db_options_.listeners) {
listener->OnMemTableSealed(mem_table_info);
}
mutex_.Lock();
}
#endif // ROCKSDB_LITE
@ -2085,11 +2087,9 @@ Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
mutable_cf_options);
#ifndef ROCKSDB_LITE
mutex_.Unlock();
// Notify client that memtable is sealed, now that we have successfully
// installed a new memtable
NotifyOnMemTableSealed(cfd, memtable_info);
mutex_.Lock();
#endif // ROCKSDB_LITE
// It is possible that we got here without checking the value of i_os, but
// that is okay. If we did, it most likely means that s was already an error.

View File

@ -35,10 +35,12 @@ void DumpDBFileSummary(const ImmutableDBOptions& options,
Header(options.info_log, "DB SUMMARY\n");
Header(options.info_log, "DB Session ID: %s\n", session_id.c_str());
Status s;
// Get files in dbname dir
if (!env->GetChildren(dbname, &files).ok()) {
Error(options.info_log,
"Error when reading %s dir\n", dbname.c_str());
s = env->GetChildren(dbname, &files);
if (!s.ok()) {
Error(options.info_log, "Error when reading %s dir %s\n", dbname.c_str(),
s.ToString().c_str());
}
std::sort(files.begin(), files.end());
for (const std::string& file : files) {
@ -53,24 +55,27 @@ void DumpDBFileSummary(const ImmutableDBOptions& options,
Header(options.info_log, "IDENTITY file: %s\n", file.c_str());
break;
case kDescriptorFile:
if (env->GetFileSize(dbname + "/" + file, &file_size).ok()) {
s = env->GetFileSize(dbname + "/" + file, &file_size);
if (s.ok()) {
Header(options.info_log,
"MANIFEST file: %s size: %" PRIu64 " Bytes\n", file.c_str(),
file_size);
} else {
Error(options.info_log, "Error when reading MANIFEST file: %s/%s\n",
dbname.c_str(), file.c_str());
Error(options.info_log,
"Error when reading MANIFEST file: %s/%s %s\n", dbname.c_str(),
file.c_str(), s.ToString().c_str());
}
break;
case kWalFile:
if (env->GetFileSize(dbname + "/" + file, &file_size).ok()) {
s = env->GetFileSize(dbname + "/" + file, &file_size);
if (s.ok()) {
wal_info.append(file)
.append(" size: ")
.append(std::to_string(file_size))
.append(" ; ");
} else {
Error(options.info_log, "Error when reading LOG file: %s/%s\n",
dbname.c_str(), file.c_str());
Error(options.info_log, "Error when reading LOG file: %s/%s %s\n",
dbname.c_str(), file.c_str(), s.ToString().c_str());
}
break;
case kTableFile:
@ -86,10 +91,10 @@ void DumpDBFileSummary(const ImmutableDBOptions& options,
// Get sst files in db_path dir
for (auto& db_path : options.db_paths) {
if (dbname.compare(db_path.path) != 0) {
if (!env->GetChildren(db_path.path, &files).ok()) {
Error(options.info_log,
"Error when reading %s dir\n",
db_path.path.c_str());
s = env->GetChildren(db_path.path, &files);
if (!s.ok()) {
Error(options.info_log, "Error when reading %s dir %s\n",
db_path.path.c_str(), s.ToString().c_str());
continue;
}
std::sort(files.begin(), files.end());
@ -111,22 +116,25 @@ void DumpDBFileSummary(const ImmutableDBOptions& options,
// Get wal file in wal_dir
const auto& wal_dir = options.GetWalDir(dbname);
if (!options.IsWalDirSameAsDBPath(dbname)) {
if (!env->GetChildren(wal_dir, &files).ok()) {
Error(options.info_log, "Error when reading %s dir\n", wal_dir.c_str());
s = env->GetChildren(wal_dir, &files);
if (!s.ok()) {
Error(options.info_log, "Error when reading %s dir %s\n", wal_dir.c_str(),
s.ToString().c_str());
return;
}
wal_info.clear();
for (const std::string& file : files) {
if (ParseFileName(file, &number, &type)) {
if (type == kWalFile) {
if (env->GetFileSize(wal_dir + "/" + file, &file_size).ok()) {
s = env->GetFileSize(wal_dir + "/" + file, &file_size);
if (s.ok()) {
wal_info.append(file)
.append(" size: ")
.append(std::to_string(file_size))
.append(" ; ");
} else {
Error(options.info_log, "Error when reading LOG file %s/%s\n",
wal_dir.c_str(), file.c_str());
Error(options.info_log, "Error when reading LOG file %s/%s %s\n",
wal_dir.c_str(), file.c_str(), s.ToString().c_str());
}
}
}

View File

@ -78,7 +78,6 @@ DBIter::DBIter(Env* _env, const ReadOptions& read_options,
range_del_agg_(&ioptions.internal_comparator, s),
db_impl_(db_impl),
cfd_(cfd),
start_seqnum_(0ULL),
timestamp_ub_(read_options.timestamp),
timestamp_lb_(read_options.iter_start_ts),
timestamp_size_(timestamp_ub_ ? timestamp_ub_->size() : 0) {
@ -328,25 +327,7 @@ bool DBIter::FindNextUserEntryInternal(bool skipping_saved_key,
case kTypeSingleDeletion:
// Arrange to skip all upcoming entries for this key since
// they are hidden by this deletion.
// if iterartor specified start_seqnum we
// 1) return internal key, including the type
// 2) return ikey only if ikey.seqnum >= start_seqnum_
// note that if deletion seqnum is < start_seqnum_ we
// just skip it like in normal iterator.
if (start_seqnum_ > 0) {
if (ikey_.sequence >= start_seqnum_) {
saved_key_.SetInternalKey(ikey_);
valid_ = true;
return true;
} else {
saved_key_.SetUserKey(
ikey_.user_key,
!pin_thru_lifetime_ ||
!iter_.iter()->IsKeyPinned() /* copy */);
skipping_saved_key = true;
PERF_COUNTER_ADD(internal_delete_skipped_count, 1);
}
} else if (timestamp_lb_) {
if (timestamp_lb_) {
saved_key_.SetInternalKey(ikey_);
valid_ = true;
return true;
@ -360,28 +341,7 @@ bool DBIter::FindNextUserEntryInternal(bool skipping_saved_key,
break;
case kTypeValue:
case kTypeBlobIndex:
if (start_seqnum_ > 0) {
if (ikey_.sequence >= start_seqnum_) {
saved_key_.SetInternalKey(ikey_);
if (ikey_.type == kTypeBlobIndex) {
if (!SetBlobValueIfNeeded(ikey_.user_key, iter_.value())) {
return false;
}
}
valid_ = true;
return true;
} else {
// this key and all previous versions shouldn't be included,
// skipping_saved_key
saved_key_.SetUserKey(
ikey_.user_key,
!pin_thru_lifetime_ ||
!iter_.iter()->IsKeyPinned() /* copy */);
skipping_saved_key = true;
}
} else if (timestamp_lb_) {
if (timestamp_lb_) {
saved_key_.SetInternalKey(ikey_);
if (ikey_.type == kTypeBlobIndex) {

View File

@ -151,7 +151,7 @@ class DBIter final : public Iterator {
}
Slice key() const override {
assert(valid_);
if (start_seqnum_ > 0 || timestamp_lb_) {
if (timestamp_lb_) {
return saved_key_.GetInternalKey();
} else {
const Slice ukey_and_ts = saved_key_.GetUserKey();
@ -371,9 +371,6 @@ class DBIter final : public Iterator {
ROCKSDB_FIELD_UNUSED
#endif
ColumnFamilyData* cfd_;
// for diff snapshots we want the lower bound on the seqnum;
// if this value > 0 iterator will return internal keys
SequenceNumber start_seqnum_;
const Slice* const timestamp_ub_;
const Slice* const timestamp_lb_;
const size_t timestamp_size_;

View File

@ -414,7 +414,7 @@ TEST_F(DBIteratorStressTest, StressTest) {
a /= 10;
++len;
}
std::string s = ToString(rnd.Next() % static_cast<uint64_t>(max_key));
std::string s = std::to_string(rnd.Next() % static_cast<uint64_t>(max_key));
s.insert(0, len - (int)s.size(), '0');
return s;
};
@ -444,12 +444,13 @@ TEST_F(DBIteratorStressTest, StressTest) {
for (double mutation_probability : {0.01, 0.5}) {
for (double target_hidden_fraction : {0.1, 0.5}) {
std::string trace_str =
"entries: " + ToString(num_entries) +
", key_space: " + ToString(key_space) +
", error_probability: " + ToString(error_probability) +
", mutation_probability: " + ToString(mutation_probability) +
"entries: " + std::to_string(num_entries) +
", key_space: " + std::to_string(key_space) +
", error_probability: " + std::to_string(error_probability) +
", mutation_probability: " +
std::to_string(mutation_probability) +
", target_hidden_fraction: " +
ToString(target_hidden_fraction);
std::to_string(target_hidden_fraction);
SCOPED_TRACE(trace_str);
if (trace) {
std::cout << trace_str << std::endl;
@ -470,7 +471,7 @@ TEST_F(DBIteratorStressTest, StressTest) {
types[rnd.Next() % (sizeof(types) / sizeof(types[0]))];
}
e.sequence = i;
e.value = "v" + ToString(i);
e.value = "v" + std::to_string(i);
ParsedInternalKey internal_key(e.key, e.sequence, e.type);
AppendInternalKey(&e.ikey, internal_key);

View File

@ -766,7 +766,7 @@ TEST_F(DBIteratorTest, DBIteratorUseSkip) {
internal_iter->AddMerge("b", "merge_1");
internal_iter->AddMerge("a", "merge_2");
for (size_t k = 0; k < 200; ++k) {
internal_iter->AddPut("c", ToString(k));
internal_iter->AddPut("c", std::to_string(k));
}
internal_iter->Finish();
@ -780,7 +780,7 @@ TEST_F(DBIteratorTest, DBIteratorUseSkip) {
ASSERT_TRUE(db_iter->Valid());
ASSERT_EQ(db_iter->key().ToString(), "c");
ASSERT_EQ(db_iter->value().ToString(), ToString(i));
ASSERT_EQ(db_iter->value().ToString(), std::to_string(i));
db_iter->Prev();
ASSERT_TRUE(db_iter->Valid());
@ -925,11 +925,11 @@ TEST_F(DBIteratorTest, DBIteratorUseSkip) {
internal_iter->AddMerge("b", "merge_1");
internal_iter->AddMerge("a", "merge_2");
for (size_t k = 0; k < 200; ++k) {
internal_iter->AddPut("d", ToString(k));
internal_iter->AddPut("d", std::to_string(k));
}
for (size_t k = 0; k < 200; ++k) {
internal_iter->AddPut("c", ToString(k));
internal_iter->AddPut("c", std::to_string(k));
}
internal_iter->Finish();
@ -942,7 +942,7 @@ TEST_F(DBIteratorTest, DBIteratorUseSkip) {
ASSERT_TRUE(db_iter->Valid());
ASSERT_EQ(db_iter->key().ToString(), "d");
ASSERT_EQ(db_iter->value().ToString(), ToString(i));
ASSERT_EQ(db_iter->value().ToString(), std::to_string(i));
db_iter->Prev();
ASSERT_TRUE(db_iter->Valid());
@ -966,7 +966,7 @@ TEST_F(DBIteratorTest, DBIteratorUseSkip) {
internal_iter->AddMerge("b", "b");
internal_iter->AddMerge("a", "a");
for (size_t k = 0; k < 200; ++k) {
internal_iter->AddMerge("c", ToString(k));
internal_iter->AddMerge("c", std::to_string(k));
}
internal_iter->Finish();
@ -981,7 +981,7 @@ TEST_F(DBIteratorTest, DBIteratorUseSkip) {
ASSERT_EQ(db_iter->key().ToString(), "c");
std::string merge_result = "0";
for (size_t j = 1; j <= i; ++j) {
merge_result += "," + ToString(j);
merge_result += "," + std::to_string(j);
}
ASSERT_EQ(db_iter->value().ToString(), merge_result);
@ -3156,7 +3156,7 @@ TEST_F(DBIteratorTest, ReverseToForwardWithDisappearingKeys) {
internal_iter->AddPut("a", "A");
internal_iter->AddPut("b", "B");
for (int i = 0; i < 100; ++i) {
internal_iter->AddPut("c" + ToString(i), "");
internal_iter->AddPut("c" + std::to_string(i), "");
}
internal_iter->Finish();

View File

@ -111,9 +111,12 @@ TEST_P(DBIteratorTest, PersistedTierOnIterator) {
TEST_P(DBIteratorTest, NonBlockingIteration) {
do {
ReadOptions non_blocking_opts, regular_opts;
Options options = CurrentOptions();
anon::OptionsOverride options_override;
options_override.full_block_cache = true;
Options options = CurrentOptions(options_override);
options.statistics = ROCKSDB_NAMESPACE::CreateDBStatistics();
non_blocking_opts.read_tier = kBlockCacheTier;
CreateAndReopenWithCF({"pikachu"}, options);
// write one kv to the database.
ASSERT_OK(Put(1, "a", "b"));
@ -3157,7 +3160,7 @@ TEST_F(DBIteratorWithReadCallbackTest, ReadCallback) {
uint64_t num_versions =
CurrentOptions().max_sequential_skip_in_iterations + 10;
for (uint64_t i = 0; i < num_versions; i++) {
ASSERT_OK(Put("bar", ToString(i)));
ASSERT_OK(Put("bar", std::to_string(i)));
}
SequenceNumber seq3 = db_->GetLatestSequenceNumber();
TestReadCallback callback2(seq3);
@ -3186,7 +3189,7 @@ TEST_F(DBIteratorWithReadCallbackTest, ReadCallback) {
ASSERT_TRUE(iter->Valid());
ASSERT_OK(iter->status());
ASSERT_EQ("bar", iter->key());
ASSERT_EQ(ToString(num_versions - 1), iter->value());
ASSERT_EQ(std::to_string(num_versions - 1), iter->value());
delete iter;
}

Some files were not shown because too many files have changed in this diff Show More