Commit Graph

1193 Commits

Author SHA1 Message Date
Zitan Chen
b578ca2e4d BackupEngine supports custom file checksums (#7085)
Summary:
A new option `std::shared_ptr<FileChecksumGenFactory> backup_checksum_gen_factory` is added to `BackupableDBOptions`. This allows custom checksum functions to be used for creating, verifying, or restoring backups.

Tests are added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7085

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22390756

Pulled By: gg814

fbshipit-source-id: 3b7756ca444c2129844536b91c3ca09f53b6248f
2020-08-12 13:31:09 -07:00
Peter Dillinger
6ac1d25fd0 Fix+clean up handling of mock sleeps (#7101)
Summary:
We have a number of tests hanging on MacOS and windows due to
mishandling of code for mock sleeps. In addition, the code was in
terrible shape because the same variable (addon_time_) would sometimes
refer to microseconds and sometimes to seconds. One test even assumed it
was nanoseconds but was written to pass anyway.

This has been cleaned up so that DB tests generally use a SpecialEnv
function to mock sleep, for either some number of microseconds or seconds
depending on the function called. But to call one of these, the test must first
call SetMockSleep (precondition enforced with assertion), which also turns
sleeps in RocksDB into mock sleeps. To also removes accounting for actual
clock time, call SetTimeElapseOnlySleepOnReopen, which implies
SetMockSleep (on DB re-open). This latter setting only works by applying
on DB re-open, otherwise havoc can ensue if Env goes back in time with
DB open.

More specifics:

Removed some unused test classes, and updated comments on the general
problem.

Fixed DBSSTTest.GetTotalSstFilesSize using a sync point callback instead
of mock time. For this we have the only modification to production code,
inserting a sync point callback in flush_job.cc, which is not a change to
production behavior.

Removed unnecessary resetting of mock times to 0 in many tests. RocksDB
deals in relative time. Any behaviors relying on absolute date/time are likely
a bug. (The above test DBSSTTest.GetTotalSstFilesSize was the only one
clearly injecting a specific absolute time for actual testing convenience.) Just
in case I misunderstood some test, I put this note in each replacement:
// NOTE: Presumed unnecessary and removed: resetting mock time in env

Strengthened some tests like MergeTestTime, MergeCompactionTimeTest, and
FilterCompactionTimeTest in db_test.cc

stats_history_test and blob_db_test are each their own beast, rather deeply
dependent on MockTimeEnv. Each gets its own variant of a work-around for
TimedWait in a mock time environment. (Reduces redundancy and
inconsistency in stats_history_test.)

Intended follow-up:

Remove TimedWait from the public API of InstrumentedCondVar, and only
make that accessible through Env by passing in an InstrumentedCondVar and
a deadline. Then the Env implementations mocking time can fix this problem
without using sync points. (Test infrastructure using sync points interferes
with individual tests' control over sync points.)

With that change, we can simplify/consolidate the scattered work-arounds.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7101

Test Plan: make check on Linux and MacOS

Reviewed By: zhichao-cao

Differential Revision: D23032815

Pulled By: pdillinger

fbshipit-source-id: 7f33967ada8b83011fb54e8279365c008bd6610b
2020-08-11 12:41:30 -07:00
Cheng Chang
71c7e4935e Replace tracked_keys with a new LockTracker interface in TransactionDB (#7013)
Summary:
We're going to support more locking protocols such as range lock in transaction.

However, in current design, `TransactionBase` has a member `tracked_keys` which assumes that point lock (lock a single key) is used, and is used in snapshot checking (isolation protocol). When using range lock, we may use read committed instead of snapshot checking as the isolation protocol.

The most significant usage scenarios of `tracked_keys` are:
1. pessimistic transaction uses it to track the locked keys, and unlock these keys when commit or rollback.
2. optimistic transaction does not lock keys upfront, it only tracks the lock intentions in tracked_keys, and do write conflict checking when commit.
3. each `SavePoint` tracks the keys that are locked since the `SavePoint`, `RollbackToSavePoint` or `PopSavePoint` relies on both the tracked keys in `SavePoint`s and `tracked_keys`.

Based on these scenarios, if we can abstract out a `LockTracker` interface to hold a set of tracked locks (can be keys or key ranges), and have methods that can be composed together to implement the scenarios, then `tracked_keys` can be an internal data structure of one implementation of `LockTracker`. See `utilities/transactions/lock/lock_tracker.h` for the detailed interface design, and `utilities/transactions/lock/point_lock_tracker.cc` for the implementation.

In the future, a `RangeLockTracker` can be implemented to track range locks without affecting other components.

After this PR, a clean interface for lock manager should be possible, and then ideally, we can have pluggable locking protocols.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7013

Test Plan: Run `transaction_test` and `optimistic_transaction_test`.

Reviewed By: ajkr

Differential Revision: D22163706

Pulled By: cheng-chang

fbshipit-source-id: f2860577b5334e31dd2994f5bc6d7c40d502b1b4
2020-08-06 12:38:00 -07:00
Levi Tamasi
124fbd96d8 Remove assertion from FaultInjectionTestFS::NewDirectory (#7220)
Summary:
FaultInjectionTestFS::NewDirectory currently asserts that the directory
creation on the target filesystem succeeds. This is actually not
guaranteed since there might be a legitimate I/O error when creating the
directory. The patch removes this assertion.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7220

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D22957990

Pulled By: ltamasi

fbshipit-source-id: b2e221320d8ce7235cb4897ef5936072412a25b6
2020-08-05 16:25:14 -07:00
Andrew Kryczka
a4a4a2dabd dedup ReadOptions in iterator hierarchy (#7210)
Summary:
Previously, a `ReadOptions` object was stored in every `BlockBasedTableIterator`
and every `LevelIterator`. This redundancy consumes extra memory,
resulting in the `Arena` making more allocations, and iteration
observing worse cache performance.

This PR migrates callers of `NewInternalIterator()` and
`MakeInputIterator()` to provide a `ReadOptions` object guaranteed to
outlive the returned iterator. When the iterator's lifetime will be managed by the
user, this lifetime guarantee is achieved by storing the `ReadOptions`
value in `ArenaWrappedDBIter`. Then, sub-iterators of `NewInternalIterator()` and
`MakeInputIterator()` can hold a reference-to-const `ReadOptions`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7210

Test Plan:
- `make check` under ASAN and valgrind
- benchmark: on a DB with 2 L0 files and 3 L1+ levels, this PR reduced `Arena` allocation 4792 -> 4160 bytes.

Reviewed By: anand1976

Differential Revision: D22861323

Pulled By: ajkr

fbshipit-source-id: 54aebb3e89c872eeab0f5793b4b6e42878d093ce
2020-08-03 15:23:04 -07:00
mrambacher
d9d190742c Make env*_test work with ASSERT_STATUS_CHECKED (#7176)
Summary:
Make (most of) the env*_test pass when ASSERT_STATUS_CHECKED is enabled.

One test that opens a database is currently disabled in this mode, as there are many errors that need revisited for DB tests and status checks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7176

Reviewed By: cheng-chang

Differential Revision: D22799278

Pulled By: ajkr

fbshipit-source-id: 16d8a02eaeecd6df1060249b6a5811292801f2ed
2020-07-28 22:59:48 -07:00
zitan
4496719450 Fix data race warning of BackupableDBTest.TableFileWithDbChecksumCorruptedDuringBackup (#7177)
Summary:
Fix the data race warning by removing an unnecessary variable that causes the warning.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7177

Test Plan:
`COMPILE_WITH_TSAN=1 make backupable_db_test`
`./backupable_db_test --gtest_filter=*TableFileWithDbChecksumCorruptedDuringBackup*`

Reviewed By: riversand963

Differential Revision: D22774430

Pulled By: gg814

fbshipit-source-id: 3b0b1ac344d0375c64da564cc97f98745c289959
2020-07-28 12:10:39 -07:00
Akanksha Mahajan
7e37a5918c Fix for flaky test BackupableDBTest.RateLimiting (#7167)
Summary:
BackupableDBTest.RateLimiting test is failing due to timed out
on our test server. It might be because of nested loops run sequentially that test different type of combinations of parameters. This patch converts the test into parameterized test so that all combinations can be tested out.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7167

Test Plan: make check -j64

Reviewed By: zhichao-cao

Differential Revision: D22709531

Pulled By: akankshamahajan15

fbshipit-source-id: 95518153e87b3b5311a6c1960a191bca58898786
2020-07-24 14:47:00 -07:00
Levi Tamasi
0d04a8434a Sync blob files before closing them (#7160)
Summary:
BlobDB currently syncs each blob file periodically after writing a certain amount of
data (as specified by the configuration option `BlobDBOptions::bytes_per_sync`)
and all open blob files when the base DB's memtables are flushed. With the patch,
in addition to the above, blob files are also synced right before being closed, after
the footer has been written. This will be beneficial for the new integrated blob file
write path as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7160

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D22672646

Pulled By: ltamasi

fbshipit-source-id: 62b34263543a7e74abcbb7adf011daa1e699998f
2020-07-22 17:25:20 -07:00
Zitan Chen
b923dc720b BackupEngine computes table checksums only once if db session ids are available (#7110)
Summary:
BackupEngine requires computing table checksums twice when backing up table files to the `shared_checksum` directory.

The repeated computation can be avoided by utilizing the db session id stored as a part of the table properties.

Filenames of table files in the `shared_checksum` directory depend on the following conditions:
1. the naming scheme is `kOptionalChecksumAndDbSessionId`,
2. `db_session_id` is not empty,
3. checksum is available in the DB manifest.

If 1,2,3 are satisfied, then the filenames will be of the form `<file_number>_<checksum>_<db_session_id>.sst`.
If 1,2 are satisfied, then the filenames will be of the form `<file_number>_<db_session_id>.sst`.
In all other cases, the filenames are of the form `<file_number>_<checksum>_<size>.sst`.

Additionally, if `kOptionalChecksumAndDbSessionId` is used (and not falling back to `kChecksumAndFileSize`), the `<checksum>` appeared in the filenames is hexadecimally encoded, instead of being plain `uint32_t` value.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7110

Test Plan: backupable_db_test and manual tests.

Reviewed By: ajkr

Differential Revision: D22508992

Pulled By: gg814

fbshipit-source-id: 5669f0ea9ad5a097f69f6d87aca4abba15032389
2020-07-21 10:35:40 -07:00
Levi Tamasi
c5ddeceba0 Remove some more dead code around syncing blob files (#7138)
Summary:
Periodic syncing of blob files is handled by a lower layer, namely by
`WritableFileWriter`; the `NeedsFsync` method of `BlobFile` and the
`last_fsync_` member variable are actually unused and thus can be
removed. See also https://github.com/facebook/rocksdb/pull/7125 .

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7138

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D22562981

Pulled By: ltamasi

fbshipit-source-id: c235aad94a7c27120528c9ec270a7a5b9154e49f
2020-07-15 18:53:54 -07:00
Levi Tamasi
ee8c79d40d Turn the compression_type check in BlobDBImpl::DecompressSlice into an assertion (#7127)
Summary:
In both cases where `BlobDBImpl::DecompressSlice` is called,
`compression_type` is already checked at the call site; thus, the check
inside the method is redundant and can be turned into an assertion.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7127

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D22533454

Pulled By: ltamasi

fbshipit-source-id: ae524443fc6abe0a5fb12327a3fe761a9cd2c831
2020-07-15 13:19:14 -07:00
Levi Tamasi
687fbd0270 Update some log messages in BlobDB to account for compaction filters (#7128)
Summary:
https://github.com/facebook/rocksdb/pull/6850, which added compaction
filter support to BlobDB, reused elements of the BlobDB GC mechanism.
This patch updates some log messages in this logic to account for this
fact; namely, it replaces mentions of "GC" with "compaction/GC" to avoid
confusion in cases when GC is not enabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7128

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D22535371

Pulled By: ltamasi

fbshipit-source-id: 1f14f3b02ab9983728bbca1cf680420208d9a195
2020-07-14 16:53:33 -07:00
Levi Tamasi
bdf4de6cb9 Remove some dead code from BlobLogWriter (#7125)
Summary:
Periodic syncing of blob files is performed by `WritableFileWriter`;
`bytes_per_sync_` and `next_sync_offset_` in `BlobLogWriter` are
actually unused (or more precisely, only used by methods that are
themselves unused). The patch removes all this dead code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7125

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D22531021

Pulled By: ltamasi

fbshipit-source-id: 6b293ad5a79d3e6bf15c5c68f7aedd7ce7a15f10
2020-07-14 13:51:54 -07:00
sdong
43cc622d09 Add CLANG analyze to CircleCI (#7114)
Summary:
CLANG analyze is useful before pull request. Add it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7114

Test Plan: Watch the CI results to succeed.

Reviewed By: riversand963

Differential Revision: D22491942

fbshipit-source-id: 9ccad91c6142fedc3d3dd491cf55054827908f36
2020-07-13 12:33:16 -07:00
mrambacher
c7c7b07f06 More Makefile Cleanup (#7097)
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env

These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.

Tested both release and debug builds via Make and CMake for both static and shared libraries.

More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097

Reviewed By: riversand963

Differential Revision: D22463160

Pulled By: pdillinger

fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
2020-07-09 14:35:17 -07:00
Zitan Chen
b35a2f9146 Fix GetFileDbIdentities (#7104)
Summary:
Although PR https://github.com/facebook/rocksdb/issues/7032 fixes the construction of the `SstFileDumper` in `GetFileDbIdentities` by setting a proper `Env` of the `Options` passed in the constructor, the file path was not corrected accordingly. This actually disables backup engine to use db session ids in the file names since the `db_session_id` is always empty.

Now it is fixed by setting the correct path in the construction of `SstFileDumper`. Furthermore, to preserve the Direct IO property that backup engine already has, parameter `EnvOptions` is added to `GetFileDbIdentities` and `SstFileDumper`.

The `BackupUsingDirectIO` test is updated accordingly.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7104

Test Plan: backupable_db_test and some manual tests.

Reviewed By: ajkr

Differential Revision: D22443245

Pulled By: gg814

fbshipit-source-id: 056a9bb8b82947c5e73d7c3fbb62bfe23af5e562
2020-07-09 08:37:59 -07:00
Zitan Chen
cc5c68084b Fix flaky BackupableDBTest.TableFileCorruptedBeforeBackup (#7102)
Summary:
The fix in PR https://github.com/facebook/rocksdb/issues/7082 is not really successful because there is still a small chance that the test will fail.

In addtion to flushing, we close the DB and then reopen before corrupting a table file in the DB. Specifically, we corrupt a table file before backup takes place as follows.
* Open DB
* Fill DB
* Flush DB (optional, no flushing here also works)
* Close DB
* Reopen DB
* Corrupt a table file in the DB

This should make the test reliable.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7102

Test Plan:
`while ./backupable_db_test --gtest_filter=*TableFileCorruptedBeforeBackup*; do true; done`
(kept running for an hour or so :)

Reviewed By: pdillinger

Differential Revision: D22432417

Pulled By: gg814

fbshipit-source-id: d407eee93ff428bb662f80cde1659fbf0149d0cd
2020-07-08 12:16:19 -07:00
rockeet
b649d8cb97 Fixed Factory construct just for calling .Name() (#7080)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7080

Reviewed By: riversand963

Differential Revision: D22412352

Pulled By: ajkr

fbshipit-source-id: 1d7f4c1621040a0130245139b52c3f4d3deac865
2020-07-08 11:54:00 -07:00
Levi Tamasi
a693341604 Move the blob file format related classes to the main namespace, rename reader/writer (#7086)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7086

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D22395420

Pulled By: ltamasi

fbshipit-source-id: 088a20097bd6b73b0c433cd79725779f97ec04f2
2020-07-06 17:18:14 -07:00
Zitan Chen
147f7b472a Fix flakiness of BackupableDBTest.TableFileCorruptedBeforeBackup (#7082)
Summary:
If the corruption of a table file is done before flushing, then db manifest may record the checksum for the corrupted table, which results in "matching checksums" when backup engine tries to verfiy the checksum, and causes a flaky test.

Fix the issue by adding `Flush()` before trying to corrupt a table file in *db*.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7082

Test Plan:
`buck test`
Without the fix, failed 5 of 100 tests.
Suspected whether the pseudo randomness causes the issue: doubling `keys_iteration` resulted in 2 of 100 tests failed; deterministically corrupting tables file also caused 2 of 100 tests to fail.
With the fix, passed 200 of 200 tests.

Reviewed By: pdillinger

Differential Revision: D22375421

Pulled By: gg814

fbshipit-source-id: 7304618e7520684b6087e42d0b58329c5ad18329
2020-07-03 15:40:04 -07:00
Jay Zhuang
00de699096 Replace reinterpret_cast with static_cast_with_check (#7067)
Summary:
Replace `reinterpret_cast` with `static_cast_with_check` for `DBImpl` and `ColumnFamilyHandleImpl`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7067

Reviewed By: siying

Differential Revision: D22361587

Pulled By: jay-zhuang

fbshipit-source-id: dfe9e8f3af39c3d27cc372c55ab9ad905eb0a5a1
2020-07-02 19:25:41 -07:00
Zitan Chen
373d5ac485 BackupEngine verifies table file checksums on creating new backups (#7015)
Summary:
When table file checksums are enabled and stored in the DB manifest by using the RocksDB default crc32c checksum function, BackupEngine will calculate the crc32c checksum of the file to be copied and compare the calculated result with the one stored in the DB manifest before copying the file to the backup directory.

After copying to the backup directory, BackupEngine will verify the checksum of the copied file with the one calculated before copying. This helps detect some rare corruption events such as bit-flips during the copying process.

No verification with checksums in DB manifest will be performed if the table file checksum function is not the RocksDB default crc32c checksum function.

In addition, If `share_table_files` and `share_files_with_checksum` are true, BackupEngine will compare the checksums computed before and after copying of the table files.

Corresponding tests are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7015

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22165732

Pulled By: gg814

fbshipit-source-id: ee0e8cc397c455eba64545c29380b9d9853588ec
2020-07-02 18:15:12 -07:00
Peter Dillinger
52d59e0c93 Revert "Whole DBTest to skip fsync (#7049)" (#7070)
Summary:
This reverts commit 4f1534bdb0.

This commit caused failures and deadlocks in
MultiThreadedDBTest.MultiThreaded/69 and others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7070

Reviewed By: riversand963

Differential Revision: D22358778

Pulled By: pdillinger

fbshipit-source-id: faf8f2cb469a7063a113921c8e9c64a9f7610dac
2020-07-02 10:22:43 -07:00
sdong
4f1534bdb0 Whole DBTest to skip fsync (#7049)
Summary:
After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7049

Test Plan: Run all existing files.

Reviewed By: pdillinger

Differential Revision: D22301700

fbshipit-source-id: f9a9e3b3b26ce640665a47cb8bff33ba0c89b565
2020-07-01 19:37:56 -07:00
Zitan Chen
b5bae48c8a Fix db_id and db_session_id nullptr warning by clang analyzer (#7063)
Summary:
GetFileDbIdentities requires either db_id non-null or db_session_id non-null.
Passing nullptr for db_id or db_session_id in CopyOrCreateFile indicates the caller does not want to obtain the value for db_id or db_session_id.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7063

Test Plan:
USE_CLANG=1 make analyze
backupable_db_test

Reviewed By: pdillinger

Differential Revision: D22338497

Pulled By: gg814

fbshipit-source-id: 2aa2dcc14d156b0f99b07d6cf3c731ee088272cd
2020-07-01 17:28:28 -07:00
Zitan Chen
6a243b3ade Generalize BackupEngine naming option for share_files_with_checksum SSTs and revert BackupEngine::VerifyBackup to check only file sizes by default (#7032)
Summary:
`bool BackupableDBOptions::new_naming_for_backup_files` is updated to `BackupTableNameOption BackupableDBOptions::share_files_with_checksum_naming`, where `BackupTableNameOption` is an `enum` type with two enumerators `kChecksumAndFileSize` and `kChecksumAndFileSize`. This opens up possibilities of extenting the current naming scheme for backup table files. By default, `BackupTableNameOption BackupableDBOptions::share_files_with_checksum_naming` is set to `kChecksumAndDbSessionId`.

Revert `BackupEngine::VerifyBackup` to only check file sizes by default.

Also fix the construction of the `SstFileDumper` in `GetFileDbIdentities` by setting a proper `Env` of the `Options` passed in the constructor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7032

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D22237763

Pulled By: gg814

fbshipit-source-id: 466902a4e731babd64e30f0e82ca1aa82962e52e
2020-06-30 18:47:16 -07:00
Burton Li
5be2cb6948 Compaction filter support for BlobDB (#6850)
Summary:
Added compaction filter support for BlobDB non-TTL values. Same as vanilla RocksDB, user compaction filter applies to all k/v pairs of the compaction for non-TTL values. It honors `min_blob_size`, which potentially results value transitions between inlined data and stored-in-blob data when size of value is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6850

Reviewed By: siying

Differential Revision: D22263487

Pulled By: ltamasi

fbshipit-source-id: 8fc03f8cde2a5c831e63b436b3dbf1b7f90939e8
2020-06-29 17:32:14 -07:00
Anand Ananthabhotla
9a5886bd8c Extend Get/MultiGet deadline support to table open (#6982)
Summary:
Current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open.

The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future.

Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982

Test Plan: Add new unit tests in db_basic_test

Reviewed By: riversand963

Differential Revision: D22219515

Pulled By: anand1976

fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b
2020-06-29 14:53:17 -07:00
Zitan Chen
1569dc48f5 BackupEngine::VerifyBackup verifies checksum by default (#7014)
Summary:
A parameter `verify_with_checksum` is added to `BackupEngine::VerifyBackup`, which is true by default. So now `BackupEngine::VerifyBackup` verifies backup files with checksum AND file size by default. When `verify_with_checksum` is false, `BackupEngine::VerifyBackup` only compares file sizes to verify backup files.

Also add a test for the case when corruption does not change the file size.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7014

Test Plan: Passed backupable_db_test

Reviewed By: zhichao-cao

Differential Revision: D22165590

Pulled By: gg814

fbshipit-source-id: 606a7450714e868bceb38598c89fd356c6004f4f
2020-06-26 11:42:12 -07:00
sdong
d64cf0e4ee Move away from direct TmpDir() call in some tests (#7030)
Summary:
Some tests directly uses TmpDir() as temporary directory without adding any randomize factor. This would cause failures when tests run in parallel. Fix it by moving some of them to test::PerThreadDBPath()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7030

Test Plan: Watch existing tests pass

Reviewed By: zhichao-cao

Differential Revision: D22224710

fbshipit-source-id: 28c9932fede0a4a64670e5b5fdb08f4fb5dccdd0
2020-06-25 12:09:57 -07:00
Zitan Chen
be41c61f22 Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997)
Summary:
`BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.

Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.

Three new test cases are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997

Test Plan: Passed make check.

Reviewed By: ajkr

Differential Revision: D22098895

Pulled By: gg814

fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76
2020-06-24 19:31:25 -07:00
Levi Tamasi
9f21d08660 Move kNoExpiration to blob_db.h (#7018)
Summary:
The constant `kNoExpiration` is currently defined in an
internal/implementation header (`blob_log_format.h`); the patch moves it
to the public header `blob_db.h` so it is accessible to users.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7018

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D22191354

Pulled By: ltamasi

fbshipit-source-id: 98c8012a83b999a3f1a30e955ce6bb71ba29dc5c
2020-06-23 13:45:06 -07:00
Zhen Li
9c24a5cb4d Fix persistent cache on windows (#6932)
Summary:
Persistent cache feature caused rocks db crash on windows. I posted a issue for it, https://github.com/facebook/rocksdb/issues/6919. I found this is because no "persistent_cache_key_prefix" is generated for persistent cache. Looking repo history, "GetUniqueIdFromFile" is not implemented on Windows. So my fix is adding "NewId()" function in "persistent_cache" and using it to generate prefix for persistent cache. In this PR, i also re-enable related test cases defined in "db_test2" and "persistent_cache_test" for windows.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6932

Test Plan:
1. run related test cases in "db_test2" and "persistent_cache_test" on windows and see it passed.
2. manually run db_bench.exe with "read_cache_path" and verified.

Reviewed By: riversand963

Differential Revision: D21911608

Pulled By: cheng-chang

fbshipit-source-id: cdfd938d54a385edbb2836b13aaa1d39b0a6f1c2
2020-06-13 13:28:31 -07:00
Levi Tamasi
5abda3bb8b Move blob_log_{format,reader,writer}.{cc,h} to db/blob/ (#6960)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6960

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D21958416

Pulled By: ltamasi

fbshipit-source-id: 97cf05027b7363014b07836e7f158c23827bd661
2020-06-09 15:16:05 -07:00
Peter Dillinger
aaece2a98d Fix some defects reported by Coverity Scan (#6933)
Summary:
Confusing checks for null that are never null
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6933

Test Plan: make check

Reviewed By: cheng-chang

Differential Revision: D21885466

Pulled By: pdillinger

fbshipit-source-id: 4b48e03c2a33727f2702b0d12292f9fda5a3c475
2020-06-04 15:46:27 -07:00
Peter Dillinger
c7432cc3c0 Fix more defects reported by Coverity Scan (#6935)
Summary:
Mostly uninitialized values: some probably written before use, but some seem like bugs. Also, destructor needs to be virtual, and possible use-after-free in test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6935

Test Plan: make check

Reviewed By: siying

Differential Revision: D21885484

Pulled By: pdillinger

fbshipit-source-id: e2e7cb0a0cf196f2b55edd16f0634e81f6cc8e08
2020-06-04 15:35:08 -07:00
sdong
6cbe9d9762 Make StringAppendOperatorTest a parameterized test (#6930)
Summary:
StringAppendOperatorTest right now runs in a mode where RUN_ALL_TESTS() is executed twice for the same test but different settings. This creates a problem with a tool that expects every test to run once. Fix it by using a parameterized test instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6930

Test Plan: Run the test and see it passed.

Reviewed By: ltamasi

Differential Revision: D21874145

fbshipit-source-id: 55520b2d7f1ba9f3cba1e2d087fe86f43fb06145
2020-06-04 14:17:11 -07:00
sdong
afa3518839 Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923)
Summary:
This reverts commit 8d87e9cea1.

Based on offline discussions, it's too early to upgrade to gtest 1.10, as it prevents some developers from using an older version of gtest to integrate to some other systems. Revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6923

Reviewed By: pdillinger

Differential Revision: D21864799

fbshipit-source-id: d0726b1ff649fc911b9378f1763316200bd363fc
2020-06-03 15:55:03 -07:00
Zhichao Cao
2adb7e3768 Fix potential overflow of unsigned type in for loop (#6902)
Summary:
x.size() -1 or y - 1 can overflow to an extremely large value when x.size() pr y is 0 when they are unsigned type. The end condition of i in the for loop will be extremely large, potentially causes segment fault. Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902

Test Plan: pass make asan_check

Reviewed By: ajkr

Differential Revision: D21843767

Pulled By: zhichao-cao

fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
2020-06-02 15:05:07 -07:00
sdong
bfc9737aca Remove gtest dependency in non-test code under utilities/cassandra (#6908)
Summary:
production code under utilities/cassandra depends on gtest.h. Remove them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6908

Test Plan: Run all existing tests.

Reviewed By: ajkr

Differential Revision: D21842606

fbshipit-source-id: a098e0b49c9aeac51cc90a79562ad9897a36122c
2020-06-02 13:56:29 -07:00
Adam Retter
8d87e9cea1 Update googletest from 1.8.1 to 1.10.0 (#6808)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6808

Reviewed By: anand1976

Differential Revision: D21483984

Pulled By: pdillinger

fbshipit-source-id: 70c5eff2bd54ddba469761d95e4cd4611fb8e598
2020-06-01 20:33:42 -07:00
Marek Kurdej
bcd32560dd Fix warning -Wextra-semi. NFC. (#6869)
Summary:
Minor fix.

CLA signed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6869

Reviewed By: ajkr

Differential Revision: D21704001

Pulled By: pdillinger

fbshipit-source-id: 57fd08114f3234f51f34758e25e708cc70962582
2020-05-22 11:20:13 -07:00
Peter Dillinger
c7aedf1b48 Clean up some code related to file checksums (#6861)
Summary:
* Add missing unit test for schema stability of FileChecksumGenCrc32c
  (previously was only comparing to itself)
* A lot of clarifying comments
* Add some assertions for preconditions
* Rename WritableFileWriter::CalculateFileChecksum -> UpdateFileChecksum
* Simplify FileChecksumGenCrc32c with shared functions
* Implement EndianSwapValue to replace unused EndianTransform

And incidentally since I had trouble with 'make check-format' GitHub action disagreeing with local run,
* Output full diagnostic information when 'make check-format' fails in CI
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6861

Test Plan: new unit test passes before & after other changes

Reviewed By: zhichao-cao

Differential Revision: D21667115

Pulled By: pdillinger

fbshipit-source-id: 6a99970f87605aa024fa540c78cd519ff322c3e6
2020-05-21 08:12:51 -07:00
Cheng Chang
b9d65f5aa6 Trigger compaction in CompactOnDeletionCollector based on deletion ratio (#6806)
Summary:
In level compaction, if the total size (even if compensated after taking account of the deletions) of a level hasn't exceeded the limit, but there are lots of deletion entries in some SST files of the level, these files should also be good candidates for compaction. Otherwise, queries for the deleted keys might be slow because they need to go over all the tombstones.

This PR adds an option `deletion_ratio` to the factory of `CompactOnDeletionCollector` to configure it to trigger compaction when the ratio of tombstones >= `deletion_ratio`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6806

Test Plan:
Added new unit test in `compact_on_deletion_collector_test.cc`.
make compact_on_deletion_collector_test && ./compact_on_deletion_collector_test

Reviewed By: ajkr

Differential Revision: D21511981

Pulled By: cheng-chang

fbshipit-source-id: 65a9d0150e8c9c00337787686475252e4535a3e1
2020-05-18 08:42:05 -07:00
Levi Tamasi
06c3b85b9a Disallow using the base DB's storage directory as blob_dir in BlobDB (#6810)
Summary:
https://github.com/facebook/rocksdb/pull/6807 extends the logic that
identifies and purges obsolete files to blob files handled by RocksDB
itself. In order to prevent that from interfering with the current BlobDB code,
we need to make sure that `BlobDBOptions::blob_dir` is different from
the storage directories used by the base DB. (Note: this is true by default.)
The patch adds a check that explicitly disallows this configuration and
returns `Status::NotSupported` from `BlobDB::Open` in such cases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6810

Test Plan: Tested using the BlobDB mode of `db_bench`.

Reviewed By: riversand963

Differential Revision: D21412676

Pulled By: ltamasi

fbshipit-source-id: 6630cc7481e48c8bf55d59423b25f14d52ffe681
2020-05-06 14:00:46 -07:00
Cheng Chang
211088df6e Remove redundant update of txn_state_ in transaction Prepare (#6778)
Summary:
When  expiration is set in a pessimistic transaction, `txn_state_` is already updated to `AWAITING_PREPARE` in the `if (expiration_time_ > 0)` block, there is  no need to update the state in `if (can_prepare)` block again.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6778

Test Plan: make check

Reviewed By: lth

Differential Revision: D21335319

Pulled By: cheng-chang

fbshipit-source-id: 251d634cc7d1a0e86e673a59f0bda8584da5a35f
2020-05-01 17:37:33 -07:00
Cheng Chang
ef0c3eda27 Make users explicitly be aware of prepare before commit (#6775)
Summary:
In current commit protocol of pessimistic transaction, if the transaction is not prepared before commit, the commit protocol implicitly assumes that the user wants to commit without prepare.

This PR adds TransactionOptions::skip_prepare, the default value is `true` because if set to `false`, all existing users who commit without prepare need to update their code to set skip_prepare to true. Although this does not force the user to explicitly express their intention of skip_prepare, it at least lets the user be aware of the assumption of being able to commit without prepare.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6775

Test Plan: added a new unit test TransactionTest::CommitWithoutPrepare

Reviewed By: lth

Differential Revision: D21313270

Pulled By: cheng-chang

fbshipit-source-id: 3d95b7c9b2d6cdddc09bdd66c561bc4fae8c3251
2020-04-30 16:24:20 -07:00
anand76
ab13d43e1d Pass a timeout to FileSystem for random reads (#6751)
Summary:
Calculate ```IOOptions::timeout``` using ```ReadOptions::deadline``` and pass it to ```FileSystem::Read/FileSystem::MultiRead```. This allows us to impose a tighter bound on the time taken by Get/MultiGet on FileSystem/Envs that support IO timeouts. Even on those that don't support, check in ```RandomAccessFileReader::Read``` and ```MultiRead``` and return ```Status::TimedOut()``` if the deadline is exceeded.

For now, TableReader creation, which might do file opens and reads, are not covered. It will be implemented in another PR.

Tests:
Update existing unit tests to verify the correct timeout value is being passed
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6751

Reviewed By: riversand963

Differential Revision: D21285631

Pulled By: anand1976

fbshipit-source-id: d89af843e5a91ece866e87aa29438b52a65a8567
2020-04-30 14:50:39 -07:00
Derrick Pallas
5272305437 Fix FilterBench when RTTI=0 (#6732)
Summary:
The dynamic_cast in the filter benchmark causes release mode to fail due to
no-rtti.  Replace with static_cast_with_check.

Signed-off-by: Derrick Pallas <derrick@pallas.us>

Addition by peterd: Remove unnecessary 2nd template arg on all static_cast_with_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6732

Reviewed By: ltamasi

Differential Revision: D21304260

Pulled By: pdillinger

fbshipit-source-id: 6e8eb437c4ca5a16dbbfa4053d67c4ad55f1608c
2020-04-29 13:09:23 -07:00
Levi Tamasi
bea91d5d61 Destroy any ColumnFamilyHandles in BlobDB::Open upon error (#6763)
Summary:
If an error happens during BlobDBImpl::Open after the base DB has been
opened, we need to destroy the `ColumnFamilyHandle`s returned by `DB::Open`
to prevent an assertion in `ColumnFamilySet`'s destructor from being hit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6763

Test Plan: Ran `make check` and tested using the BlobDB mode of `db_bench`.

Reviewed By: riversand963

Differential Revision: D21262643

Pulled By: ltamasi

fbshipit-source-id: 60ebc7ab19be66cf37fbe5f6d8957d58470f3d3b
2020-04-27 16:45:13 -07:00
mrambacher
4cbc19d2a1 Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389)
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings.  There is a mishmash of parameters in use here with more needed in the future.  This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389

Reviewed By: siying

Differential Revision: D21163707

Pulled By: zhichao-cao

fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
2020-04-21 17:38:17 -07:00
Peter Dillinger
31da5e34c1 C++20 compatibility (#6697)
Summary:
Based on https://github.com/facebook/rocksdb/issues/6648 (CLA Signed), but heavily modified / extended:

* Implicit capture of this via [=] deprecated in C++20, and [=,this] not standard before C++20 -> now using explicit capture lists
* Implicit copy operator deprecated in gcc 9 -> add explicit '= default' definition
* std::random_shuffle deprecated in C++17 and removed in C++20 -> migrated to a replacement in RocksDB random.h API
* Add the ability to build with different std version though -DCMAKE_CXX_STANDARD=11/14/17/20 on the cmake command line
* Minimal rebuild flag of MSVC is deprecated and is forbidden with /std:c++latest (C++20)
* Added MSVC 2019 C++11 & MSVC 2019 C++20 in AppVeyor
* Added GCC 9 C++11 & GCC9 C++20 in Travis
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6697

Test Plan: make check and CI

Reviewed By: cheng-chang

Differential Revision: D21020318

Pulled By: pdillinger

fbshipit-source-id: 12311be5dbd8675a0e2c817f7ec50fa11c18ab91
2020-04-20 13:24:25 -07:00
Peter Dillinger
45d2b4efca Fix tabs and lint-ignores (#6734)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6734

Reviewed By: cheng-chang

Differential Revision: D21134556

Pulled By: pdillinger

fbshipit-source-id: 3636cc1d1333137b70031f8277458781c21631fb
2020-04-20 11:39:31 -07:00
anand76
3d6d7bcf17 Log CompactOnDeletionCollectorFactory parameters on DB open (#6686)
Summary:
Log it in the info log to help in troubleshooting. It is logged as follows -
```
2020/04/10-10:51:39.886662 7ffff7fef340                   Options.table_properties_collectors: CompactOnDeletionCollector (Sliding window size = 100 Deletion trigger = 90);
```

Tests:
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6686

Reviewed By: ltamasi

Differential Revision: D21002442

Pulled By: anand1976

fbshipit-source-id: 7adf0dbae7f1febcb00ce61fea5097118ede5c6a
2020-04-13 19:58:04 -07:00
sdong
1be3be5522 Auto-Format two recent diffs and add HISTORY.md (#6685)
Summary:
Two recent diffs can be autoformatted.
Also add HISTORY.md entry for https://github.com/facebook/rocksdb/pull/6214
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6685

Test Plan: Run all existing tests

Reviewed By: cheng-chang

Differential Revision: D20965780

fbshipit-source-id: 195b08d7849513d42fe14073112cd19fdda6af95
2020-04-10 11:32:44 -07:00
Cheng Chang
d648a0e17f Add unit test for TransactionLockMgr (#6599)
Summary:
Although there are tests related to locking in transaction_test, this new test directly tests against TransactionLockMgr.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6599

Test Plan: make transaction_lock_mgr_test && ./transaction_lock_mgr_test

Reviewed By: lth

Differential Revision: D20673749

Pulled By: cheng-chang

fbshipit-source-id: 1fa4a13218e68d785f5a99924556751a8c5c0f31
2020-04-08 13:51:51 -07:00
Levi Tamasi
e6f86cfb36 Revert the recent cache deleter change (#6620)
Summary:
Revert "Use function objects as deleters in the block cache (https://github.com/facebook/rocksdb/issues/6545)"

    This reverts commit 6301dbe7a7.

    Revert "Call out the cache deleter related interface change in HISTORY.md (https://github.com/facebook/rocksdb/issues/6606)"

    This reverts commit 3a35542f86.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6620

Test Plan: `make check`

Reviewed By: zhichao-cao

Differential Revision: D20773311

Pulled By: ltamasi

fbshipit-source-id: 7637a761f718f323ef0e7da959462e8fb06e7a2b
2020-03-31 16:11:06 -07:00
Cheng Chang
ee50b8d499 Be able to decrease background thread's CPU priority when creating database backup (#6602)
Summary:
When creating a database backup, the background threads will not only consume IO resources by copying files, but also consuming CPU such as by computing checksums. During peak times, the CPU consumption by the background threads might affect online queries.

This PR makes it possible to decrease CPU priority of these threads when creating a new backup.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6602

Test Plan: make check

Reviewed By: siying, zhichao-cao

Differential Revision: D20683216

Pulled By: cheng-chang

fbshipit-source-id: 9978b9ed9488e8ce135e90ca083e5b4b7221fd84
2020-03-28 19:07:25 -07:00
Cheng Chang
3881a678d5 Refactor IsLockExpired (#6586)
Summary:
1. If expiration_time is non-positive, no need to call NowMicros, save a syscall.
2. expire_time should only be set when expired is false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6586

Test Plan: make check

Reviewed By: lth

Differential Revision: D20673730

Pulled By: cheng-chang

fbshipit-source-id: a69e8d7b16dc6d0d00487bb1c19f0710d79482e2
2020-03-27 16:14:22 -07:00
Cheng Chang
2e276973e4 Compute cv_end_time with simpler logic (#6585)
Summary:
The refactored logic is easier to read.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6585

Test Plan: make check

Reviewed By: lth

Differential Revision: D20663225

Pulled By: cheng-chang

fbshipit-source-id: cfd28955cd03b0a71d9087085170875f6dd0be9e
2020-03-27 16:01:23 -07:00
Burton Li
8abd41a544 Fix write_unprepared_transaction_test crash on debug version. (#6574)
Summary:
The last key may hit index of out bound exception when id = 9.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6574

Reviewed By: riversand963

Differential Revision: D20699791

Pulled By: cheng-chang

fbshipit-source-id: 8e2c5be5ff0e53e9857cfd59cea97cff21446819
2020-03-27 11:12:23 -07:00
Peter Dillinger
e91d1a21a6 Streamline persistent_cache_test for testing efficiency (#6601)
Summary:
This test was written like a stress test, using up to 3x26GB
RSS memory during parallel 'make check'. Now, while this code is mostly
dormant, I've made the "for Travis" versions of the expensive tests the
canonical versions and disabled the expensive versions. This has the
side benefit of removing some arbitrary conditional compilation.

For unknown reason, the super expensive tests were gated on
Snappy_Supported, which appears to be irrelevant, so I removed it.

The tests can be fixed / improved / migrated to stress test if/when they
are deemed important again.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6601

Test Plan:
make check + CI

./persistent_cache_test Before:
...
[==========] 10 tests from 2 test cases ran. (114541 ms total)
[  PASSED  ] 10 tests.
YOU HAVE 1 DISABLED TEST

After:
...
[==========] 3 tests from 2 test cases ran. (1714 ms total)
[  PASSED  ] 3 tests.
YOU HAVE 10 DISABLED TESTS

Reviewed By: siying

Differential Revision: D20680983

Pulled By: pdillinger

fbshipit-source-id: 2be0fde13eeb0a71110ac7f5477cfe63996a509e
2020-03-26 19:36:32 -07:00
Levi Tamasi
6301dbe7a7 Use function objects as deleters in the block cache (#6545)
Summary:
As the first step of reintroducing eviction statistics for the block
cache, the patch switches from using simple function pointers as deleters
to function objects implementing an interface. This will enable using
deleters that have state, like a smart pointer to the statistics object
that is to be updated when an entry is removed from the cache. For now,
the patch adds a deleter template class `SimpleDeleter`, which simply
casts the `value` pointer to its original type and calls `delete` or
`delete[]` on it as appropriate. Note: to prevent object lifecycle
issues, deleters must outlive the cache entries referring to them;
`SimpleDeleter` ensures this by using the ("leaky") Meyers singleton
pattern.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6545

Test Plan: `make asan_check`

Reviewed By: siying

Differential Revision: D20475823

Pulled By: ltamasi

fbshipit-source-id: fe354c33dd96d9bafc094605462352305449a22a
2020-03-26 16:19:58 -07:00
Huisheng Liu
a6ce5c823b multiget support for timestamps (#6483)
Summary:
Add timestamp support for MultiGet().
timestamp from readoptions is honored, and timestamps can be returned along with values.

MultiReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within marge of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
base line (commit 17bef7d3a):
  multireadrandom :     104.173 micros/op 307167 ops/sec; (5462999 of 5462999 found)
This PR:
  multireadrandom :     104.199 micros/op 307095 ops/sec; (5307999 of 5307999 found)

.\db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=multireadrandom --use_existing_db=1 --num=25000000 --threads=32 --allow_concurrent_memtable_write=0
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6483

Reviewed By: anand1976

Differential Revision: D20498373

Pulled By: riversand963

fbshipit-source-id: 8505f22bc40fd791bc7dd05e48d7e67c91edb627
2020-03-24 11:24:09 -07:00
Cheng Chang
4fc216649d Support direct IO in RandomAccessFileReader::MultiRead (#6446)
Summary:
By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined.

In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API.

For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446

Test Plan: Added a new test `random_access_file_reader_test.cc`.

Reviewed By: anand1976

Differential Revision: D20097518

Pulled By: cheng-chang

fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1
2020-03-20 16:33:26 -07:00
Levi Tamasi
c15e85bdcb Move BlobDB related files under db/ to db/blob/ (#6519)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6519

Test Plan:
```
make all
make check
```

Differential Revision: D20400691

Pulled By: ltamasi

fbshipit-source-id: 20ef911cf1c2c92c7f71ef0b493f9be64f2eef94
2020-03-12 11:00:56 -07:00
Chao Zhao
4028eba67b Optional sequence number exporting during checkpoint creation (#5528)
Summary:
Add sequence_number_ptr to the checkpoint interface to expose the sequence number during taking the checkpoint. The number will be consistent with the seq # in rocksdb log.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5528

Test Plan: make check -j64

Reviewed By: Winger1994

Differential Revision: D16080209

fbshipit-source-id: 6dc3c7680287ee97d673c5e61f89aae1f43e33df
2020-03-10 13:40:18 -07:00
Yanqin Jin
d93812c9ae Iterator with timestamp (#6255)
Summary:
Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests.

Create an iterator with timestamp.
```
...
read_opts.timestamp = &ts;
auto* iter = db->NewIterator(read_opts);
// target is key without timestamp.
for (iter->Seek(target); iter->Valid(); iter->Next()) {}
for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
delete iter;
read_opts.timestamp = &ts1;
// lower_bound and upper_bound are without timestamp.
read_opts.iterate_lower_bound = &lower_bound;
read_opts.iterate_upper_bound = &upper_bound;
auto* iter1 = db->NewIterator(read_opts);
// Do Seek or SeekToFirst()
delete iter1;
```

Test plan (dev server)
```
$make check
```

Simple benchmarking (dev server)
1. The overhead introduced by this PR even when timestamp is disabled.
key size: 16 bytes
value size: 100 bytes
Entries: 1000000
Data reside in main memory, and try to stress iterator.
Repeated three times on master and this PR.
- Seek without next
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
```
master: 159047.0 ops/sec
this PR: 158922.3 ops/sec (2% drop in throughput)
- Seek and next 10 times
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
```
master: 109539.3 ops/sec
this PR: 107519.7 ops/sec (2% drop in throughput)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255

Differential Revision: D19438227

Pulled By: riversand963

fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
2020-03-06 16:24:27 -08:00
Cheng Chang
0a0151fb99 Remove memcpy from RandomAccessFileReader::Read in direct IO mode (#6455)
Summary:
In direct IO mode, RandomAccessFileReader::Read allocates an internal aligned buffer, and then copies the result into the scratch buffer. If the result is only temporarily used inside a function, there is no need to do the memcpy and just let the result Slice refer to the internally allocated buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6455

Test Plan: make check

Differential Revision: D20106753

Pulled By: cheng-chang

fbshipit-source-id: 44f505843837bba47a56e3fa2c4dd3bd76486b58
2020-03-06 14:05:12 -08:00
Otto Kekäläinen
f6c2777d95 Fix spelling: commited -> committed (#6481)
Summary:
In most places in the code the variable names are spelled correctly as
COMMITTED but in a couple places not. This fixes them and ensures the
variable is always called COMMITTED everywhere.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6481

Differential Revision: D20306776

Pulled By: pdillinger

fbshipit-source-id: b6c1bfe41db559b4bc6955c530934460c07f7022
2020-03-06 12:45:20 -08:00
Huisheng Liu
904a60ff63 return timestamp from get (#6409)
Summary:
Added new Get() methods that return timestamp. Dummy implementation is given so that classes derived from DB don't need to be touched to provide their implementation. MultiGet is not included.

ReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within marge of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
    base line (commit 72ee067b9):
        101.712 micros/op 314602 ops/sec;   36.0 MB/s (5658999 of 5658999 found)
    This PR:
        100.288 micros/op 319071 ops/sec;   36.5 MB/s (5674999 of 5674999 found)

./db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=readrandom --use_existing_db=1 --num=25000000 --threads=32
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6409

Differential Revision: D20200086

Pulled By: riversand963

fbshipit-source-id: 490edd74d924f62bd8ae9c29c2a6bbbb8410ca50
2020-03-02 16:01:00 -08:00
Michael R. Crusoe
051696bf98 fix some spelling typos (#6464)
Summary:
Found from Debian's "Lintian" program
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464

Differential Revision: D20162862

Pulled By: zhichao-cao

fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12
2020-02-28 14:14:03 -08:00
Manuel Ung
41535d0218 WriteUnPrepared: Pass in correct subbatch count during rollback (#6463)
Summary:
Today `WriteUnpreparedTxn::RollbackInternal` will write the rollback batch assuming that there is only a single subbatch. However, because untracked_keys_ are currently not deduplicated, it's possible for duplicate keys to exist, and thus split the batch. Also, tracked_keys_ also does not support compators outside of the bytewise comparators, so it's possible for duplicates to occur there as well.

To solve this, just pass in the correct subbatch count.

Also, removed `WriteUnpreparedRollbackPreReleaseCallback` to unify the Commit/Rollback codepaths some more.

Also, fixed a bug in `CommitInternal` where if 1. two_write_queue is true and 2. include_data is true, then `WriteUnpreparedCommitEntryPreReleaseCallback` ends up calling `AddCommitted` on the commit time write batch a second time on the second write. To fix, `WriteUnpreparedCommitEntryPreReleaseCallback` is re-initialized.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6463

Differential Revision: D20150153

Pulled By: lth

fbshipit-source-id: df0b42d39406c75af73df995aa1138f0db539cd1
2020-02-28 11:19:32 -08:00
sdong
fdf882ded2 Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433

Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.

Differential Revision: D19977691

fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
2020-02-20 12:09:57 -08:00
Andrew Kryczka
0f9dcb88b2 Return NotSupported from WriteBatchWithIndex::DeleteRange (#5393)
Summary:
As discovered in https://github.com/facebook/rocksdb/issues/5260 and https://github.com/facebook/rocksdb/issues/5392, reads on the indexed batch do not account for range tombstones. So, return `Status::NotSupported` from `WriteBatchWithIndex::DeleteRange` until we properly support it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5393

Test Plan: added unit test

Differential Revision: D19912360

Pulled By: ajkr

fbshipit-source-id: 0bbfc978ea015d64516ca708fce2429abba524cb
2020-02-18 11:18:25 -08:00
Manuel Ung
dc23c125c3 WriteUnPrepared: Untracked keys (#6404)
Summary:
For write unprepared, some applications may bypass the transaction api, and write keys directly into the write batch. However, since they are not tracked, rollbacks (both for savepoint and transaction) are not aware that these keys have to be rolled back.

The fix is to track them in `WriteUnpreparedTxn::untracked_keys_`. This is populated whenever we flush unprepared batches into the DB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6404

Differential Revision: D19842023

Pulled By: lth

fbshipit-source-id: a9edfc643d5c905fc89da9a9a9094d30c9b70108
2020-02-14 11:31:39 -08:00
wolfkdy
29e24434fe refine code (#6420)
Summary:
I create a new branch from the branch new upsteram/master and "git merge --squash".
Maybe it will fix everything.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6420

Differential Revision: D19897152

Pulled By: zhichao-cao

fbshipit-source-id: 6575d9e3b23e360f42ee1480b43028b5fcc20136
2020-02-13 18:55:02 -08:00
Manuel Ung
fb571509a7 WriteUnPrepared: Enable WAL during crash recovery (#6418)
Summary:
Unfortunately, it seems like mysqld reuses xids across machine restarts. When that happens, we could have something like the following happening:

```
BEGIN_PREPARE(unprepared) Put(a) END_PREPARE(xid = 1)
-- crash and recover with Put(a) rolled back as it was not prepared
BEGIN_PREPARE(prepared) Put(b) END_PREPARE(xid = 1)
COMMIT(xid = 1)
-- crash and recover with both a, b
```

To solve this, we will have to log the rollback batch into the WAL during recovery.

WritePrepared already logs the rollback batch into the WAL, if a rollback happens after prepare, so there is no problem there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6418

Differential Revision: D19896151

Pulled By: lth

fbshipit-source-id: 2ff65ddc5fe75efd57736fed4b7cd7a109d26609
2020-02-13 18:44:39 -08:00
sdong
ac8e89a443 Should flush and sync WAL when writing it in DB::Open() (#6417)
Summary:
A recent fix related to 2pc https://github.com/facebook/rocksdb/pull/6313/ writes something to WAL, but does not flush or sync. This causes assertion failure "impl->TEST_WALBufferIsEmpty()" if manual_wal_flush = true. We should fsync the entry to make sure a second power reset can recover.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6417

Test Plan: Add manual_wal_flush=true case in TransactionTest.DoubleCrashInRecovery and fix a bug in the test so that the bug can be reproduced. It passes with the fix.

Differential Revision: D19894537

fbshipit-source-id: f1e84e49e2269f583c6019743118292cd8b6598e
2020-02-13 18:41:04 -08:00
Kefu Chai
debc4ef18b utilities/env_librados: copy use bufferlist::iterator (#6395)
Summary:
to adapt the change in ceph upstream where the bufferlist::copy() method
was removed in
c724369010

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6395

Differential Revision: D19816815

Pulled By: zhichao-cao

fbshipit-source-id: 9210767b91af0ecdcf5dfaa3e70edcaeea55135f
2020-02-10 11:31:16 -08:00
sdong
876c2dbff4 Allow readahead when reading option files. (#6372)
Summary:
Right, when reading from option files, no readahead is used and 8KB buffer is used. It might introduce high latency if the file system provide high latency and doesn't do readahead. Instead, introduce a readahead to the file. When calling inside DB, infer the value from options.log_readahead. Otherwise, a default 512KB readahead size is used.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6372

Test Plan: Add --log_readahead_size in db_bench. Run it with several options and observe read size from option files using strace.

Differential Revision: D19727739

fbshipit-source-id: e6d8053b0a64259abc087f1f388b9cd66fa8a583
2020-02-07 15:18:26 -08:00
Levi Tamasi
1b4be4cac9 BlobDB: ignore trivially moved files when updating the SST<->blob file mapping (#6381)
Summary:
BlobDB keeps track of the mapping between SSTs and blob files using
the `OnFlushCompleted` and `OnCompactionCompleted` callbacks of
the `EventListener` interface: upon receiving a flush notification, a link
is added between the newly flushed SST and the corresponding blob file;
for compactions, links are removed for the inputs and added for the outputs.
The earlier code performed this link deletion and addition even for
trivially moved files; the new code walks through the two lists together
(in a fashion that's similar to merge sort) and skips such files.
This should mitigate https://github.com/facebook/rocksdb/issues/6338,
wherein an assertion is triggered with the earlier code when a compaction
notification for a trivial move precedes the flush notification for the
moved SST.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6381

Test Plan: make check

Differential Revision: D19773729

Pulled By: ltamasi

fbshipit-source-id: ae0f273ded061110dd9334e8fb99b0d7786650b0
2020-02-07 12:50:57 -08:00
Yanqin Jin
f2fbc5d668 Shorten certain test names to avoid infra failure (#6352)
Summary:
Unit test names, together with other components,  are used to create log files
during some internal testing. Overly long names cause infra failure due to file
names being too long.

Look for internal tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6352

Differential Revision: D19649307

Pulled By: riversand963

fbshipit-source-id: 6f29de096e33c0eaa87d9c8702f810eda50059e7
2020-01-30 23:10:24 -08:00
Levi Tamasi
9e3ace42a4 Add statistics for BlobDB GC (#6296)
Summary:
The patch adds statistics support to the new BlobDB garbage collection implementation;
namely, it adds support for the following (pre-existing) tickers:

`BLOB_DB_GC_NUM_FILES`: the number of blob files obsoleted by the GC logic.
`BLOB_DB_GC_NUM_NEW_FILES`: the number of new blob files generated by the GC logic.
`BLOB_DB_GC_FAILURES`: the number of failed GC passes (where a GC pass is
equivalent to a (sub)compaction).
`BLOB_DB_GC_NUM_KEYS_RELOCATED`: the number of blobs relocated to new blob
files by the GC logic.
`BLOB_DB_GC_BYTES_RELOCATED`: the total size of blobs relocated to new blob files.

The tickers `BLOB_DB_GC_NUM_KEYS_OVERWRITTEN`, `BLOB_DB_GC_NUM_KEYS_EXPIRED`,
`BLOB_DB_GC_BYTES_OVERWRITTEN`, `BLOB_DB_GC_BYTES_EXPIRED`, and
`BLOB_DB_GC_MICROS` are not relevant for the new GC logic, and are thus marked
deprecated.

The patch also adds a couple of log messages that log the number and total size of
blobs encountered and relocated during a GC pass, as well as the number of blob
files created and obsoleted.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6296

Test Plan: Extended unit tests and used the BlobDB mode of `db_bench`.

Differential Revision: D19402513

Pulled By: ltamasi

fbshipit-source-id: d53d2bfbf4928a1db1e9346c67ebb9007b8932ec
2020-01-29 16:46:16 -08:00
Maysam Yabandeh
2f973ca96e Double Crash in kPointInTimeRecovery with TransactionDB (#6313)
Summary:
In WritePrepared there could be gap in sequence numbers. This breaks the trick we use in kPointInTimeRecovery which assume the first seq in the log right after the corrupted log is one larger than the last seq we read from the logs. To let this trick keep working, we add a dummy entry with the expected sequence to the first log right after recovery.
Also in WriteCommitted, if the log right after the corrupted log is empty, since it has no sequence number to let the sequential trick work, it is assumed as unexpected behavior. This is however expected to happen if we close the db after recovering from a corruption and before writing anything new to it. To remedy that, we apply the same technique by writing a dummy entry to the log that is created after the corrupted log.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6313

Differential Revision: D19458291

Pulled By: maysamyabandeh

fbshipit-source-id: 09bc49e574690085df45b034ca863ff315937e2d
2020-01-29 11:40:55 -08:00
matthewvon
e6e8b9e871 Correct pragma once problem with Bazel on Windows (#6321)
Summary:
This is a simple edit to have two #include file paths be consistent within range_del_aggregator.{h,cc} with everywhere else.

The impact of this inconsistency is that it actual breaks a Bazel based build on the Windows platform. The same pragma once failure occurs with both Windows Visual C++ 2019 and clang for Windows 9.0. Bazel's "sandboxing" of the builds causes both compilers to not properly recognize "rocksdb/types.h" and "include/rocksdb/types.h" to be the same file (also comparator.h). My guess is that the backslash versus forward slash mixing within path names is the underlying issue.

But, everything builds fine once the include paths in these two source files are consistent with the rest of the repository.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6321

Differential Revision: D19506585

Pulled By: ltamasi

fbshipit-source-id: 294c346607edc433ab99eaabc9c880ee7426817a
2020-01-21 16:12:43 -08:00
Levi Tamasi
1dd7873e08 Remove earlier partial BlobDB GC implementation (#6278)
Summary:
In addition to removing the earlier partially implemented garbage collection
logic from the BlobDB codebase, the patch also removes the test cases (as well as
the related sync points, as appropriate) that were only relevant for the old
implementation, and reworks the remaining ones so they use the new GC logic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6278

Test Plan: `make check`

Differential Revision: D19335226

Pulled By: ltamasi

fbshipit-source-id: 0cc1794bc9892feda1426ed5522a318f3cb1b692
2020-01-14 15:08:44 -08:00
Maysam Yabandeh
eff5e076f5 unordered_write incompatible with max_successive_merges (#6284)
Summary:
unordered_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions.
The patch fixes that and also reverts the changes performed by https://github.com/facebook/rocksdb/pull/6254, in which max_successive_merges was mistakenly declared incompatible with unordered_write.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6284

Differential Revision: D19356115

Pulled By: maysamyabandeh

fbshipit-source-id: f06dadec777622bd75f267361c022735cf8cecb6
2020-01-10 16:53:19 -08:00
sdong
39410bcb3d Fix some shadow warning (#6242)
Summary:
Some shadow warning shows up when using gcc 4.8. An example:

./utilities/blob_db/blob_compaction_filter.h: In constructor ‘rocksdb::blob_db::BlobIndexCompactionFilterFactoryBase::BlobIndexCompactionFilterFactoryBase(rocksdb::blob_db::lobDBImpl*, rocksdb::Env*, rocksdb::Statistics*)’:
./utilities/blob_db/blob_compaction_filter.h:121:7: error: declaration of ‘blob_db_impl’ shadows a member of 'this' [-Werror=shadow]
       : blob_db_impl_(blob_db_impl), env_(_env), statistics_(_statistics) {}
       ^

Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6242

Test Plan: Build and see the warnings go away.

Differential Revision: D19217789

fbshipit-source-id: 8ef631941f23dab47a388e060adec24b72efd65e
2020-01-08 18:20:13 -08:00
Maysam Yabandeh
5709e97a74 Skip CancelAllBackgroundWork if DBImpl is already closed (#6268)
Summary:
WritePreparedTxnDB calls CancelAllBackgroundWork in its destructor to avoid dangling references to it from background job's SnapshotChecker callback. However, if the DBImpl is already closed, the info log might be closed with it, which causes memory leak when CancelAllBackgroundWork tries to print to the info log. The patch fixes that by calling CancelAllBackgroundWork only if the db is not closed already.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6268

Differential Revision: D19303439

Pulled By: maysamyabandeh

fbshipit-source-id: 4228a6be7e78d43c90630347baa89b008200bd15
2020-01-07 15:34:27 -08:00
wolfkdy
1ab1231acf parallel occ (#6240)
Summary:
This is a continuation of https://github.com/facebook/rocksdb/pull/5320/files
I open a new mr for these purposes, half a year has past since the old mr is posted so it's almost impossible to fulfill some points below on the old mr, especially 5)
1) add validation modes for optimistic txns
2) modify unittests to test both modes
3) make format
4) refine hash functor
5) push to master
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6240

Differential Revision: D19301296

fbshipit-source-id: 5b5b3cbd39558f43947f7d2dec6cd31a06386edb
2020-01-07 14:20:38 -08:00
Maysam Yabandeh
48a678b7c9 Prevent an incompatible combination of options (#6254)
Summary:
allow_concurrent_memtable_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254

Differential Revision: D19265819

Pulled By: maysamyabandeh

fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179
2020-01-02 16:15:06 -08:00
Levi Tamasi
1ebaa762e6 Log garbage_collection_cutoff alongside the other BlobDB options
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6229

Differential Revision: D19191195

Pulled By: ltamasi

fbshipit-source-id: 2a3c4785299641a46e022fc012460b759a689fce
2019-12-20 11:00:53 -08:00
Levi Tamasi
7a7ca8eb5b BlobDB: only compare CF IDs when checking whether an API call is for the default CF (#6226)
Summary:
BlobDB currently only supports using the default column family. The earlier
code enforces this by comparing the `ColumnFamilyHandle` passed to the
`Get`/`Put`/etc. call with the handle returned by `DefaultColumnFamily`
(which, at the end of the day, comes from `DBImpl::default_cf_handle_`).
Since other `ColumnFamilyHandle`s can also point to the default column
family, this can reject legitimate requests as well. (As an example,
with the earlier code, the handle returned by `BlobDB::Open` cannot
actually be used in API calls.) The patch fixes this by comparing only
the IDs of the column family handles instead of the pointers themselves.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6226

Test Plan: `make check`

Differential Revision: D19187461

Pulled By: ltamasi

fbshipit-source-id: 54ce2e12ebb1f07e6d1e70e3b1e0213dfa94bda2
2019-12-19 18:05:49 -08:00
anand1976
1be48cb895 Fix crash in Transaction::MultiGet() when num_keys > 32
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6192

Test Plan:
Add a unit test that fails without the fix and passes now
make check

Differential Revision: D19124781

Pulled By: anand1976

fbshipit-source-id: 8c8cb6fa16c3fc23ec011e168561a13f76bbd783
2019-12-16 20:39:35 -08:00
Levi Tamasi
0d2172f128 Make it possible to enable periodic compactions for BlobDB (#6172)
Summary:
Periodic compactions ensure that even SSTs that do not get picked up
otherwise eventually go through compaction; used in conjunction with
BlobDB's garbage collection, they enable BlobDB to reclaim space when
old blob files are used by such straggling SSTs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6172

Test Plan: Ran `make check` and used the BlobDB mode of `db_bench`.

Differential Revision: D19045045

Pulled By: ltamasi

fbshipit-source-id: 04636ecc4b6cfe8d495bf656faa65d54a5eb1a93
2019-12-13 16:13:25 -08:00
anand76
afa2420c2b Introduce a new storage specific Env API (#5761)
Summary:
The current Env API encompasses both storage/file operations, as well as OS related operations. Most of the APIs return a Status, which does not have enough metadata about an error, such as whether its retry-able or not, scope (i.e fault domain) of the error etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy etc.

This PR separates out the file/storage APIs from Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.

The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.

This PR also ports PosixEnv to the new API by splitting it into two - PosixEnv and PosixFileSystem. PosixEnv is defined as a sub-class of CompositeEnvWrapper, and threading/time functions are overridden with Posix specific implementations in order to avoid an extra level of indirection.

The ```CompositeEnvWrapper``` translates ```IOStatus``` return code to ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761

Differential Revision: D18868376

Pulled By: anand1976

fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
2019-12-13 14:48:41 -08:00
Levi Tamasi
583c6953d8 Move out valid blobs from the oldest blob files during compaction (#6121)
Summary:
The patch adds logic that relocates live blobs from the oldest N non-TTL
blob files as they are encountered during compaction (assuming the BlobDB
configuration option `enable_garbage_collection` is `true`), where N is defined
as the number of immutable non-TTL blob files multiplied by the value of
a new BlobDB configuration option called `garbage_collection_cutoff`.
(The default value of this parameter is 0.25, that is, by default the valid blobs
residing in the oldest 25% of immutable non-TTL blob files are relocated.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6121

Test Plan: Added unit test and tested using the BlobDB mode of `db_bench`.

Differential Revision: D18785357

Pulled By: ltamasi

fbshipit-source-id: 8c21c512a18fba777ec28765c88682bb1a5e694e
2019-12-13 10:13:05 -08:00
Levi Tamasi
3b607610df Do not update SST <-> blob file mapping if compaction failed
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6156

Test Plan: Extended unit tests.

Differential Revision: D18943867

Pulled By: ltamasi

fbshipit-source-id: b3669d2dd6af08e987ad1a59d6712ae2514da0b1
2019-12-12 11:30:45 -08:00