rocksdb

Author	SHA1	Message	Date
Jay Zhuang	dff9219df6	Fix checkpoint_test hang (#7849 ) Summary: `CheckpointTest.CurrentFileModifiedWhileCheckpointing` could hang because now create checkpoint triggers flush twice. The test should wait both flush done. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7849 Test Plan: `gtest-parallel ./checkpoint_test --gtest_filter=CheckpointTest.CurrentFileModifiedWhileCheckpointing -r 100` Reviewed By: ajkr Differential Revision: D25860713 Pulled By: jay-zhuang fbshipit-source-id: e1c2f23037dedc33e205519f4289a25e77816b41	2021-01-09 13:29:00 -08:00
Cheng Chang	eeea27a048	Get manifest size again after getting min_log_num during checkpoint (#7836 ) Summary: Currently, manifest size is determined before getting min_log_num. But between getting manifest size and getting min_log_num, concurrently, a flush might succeed, which will write new records to manifest to make some WALs become outdated, then min_log_num will be correspondingly increased, but the new records in manifest will not be copied into the checkpoint because the manifest's size is determined before them, then the newly outdated WALs will still exist in the checkpoint's manifest, but they are not linked/copied to the checkpoint because their log number is < min_log_num, so a corruption of missing WAL will be reported when restoring from the checkpoint. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7836 Test Plan: make crash_test Reviewed By: ajkr Differential Revision: D25788204 Pulled By: cheng-chang fbshipit-source-id: a4e5acf30f08270b3c0a95304ff559a9e655252f	2021-01-08 10:16:03 -08:00
cheng-chang	2eaad9e1b1	Skip WALs according to MinLogNumberToKeep when creating checkpoint (#7789 ) Summary: In a stress test failure, we observe that a WAL is skipped when creating checkpoint, although its log number >= MinLogNumberToKeep(). This might happen in the following case: 1. when creating the checkpoint, there are 2 column families: CF0 and CF1, and there are 2 WALs: 1, 2; 2. CF0's log number is 1, CF0's active memtable is empty, CF1's log number is 2, CF1's active memtable is not empty, WAL 2 is not empty, the sequence number points to WAL 2; 2. the checkpoint process flushes CF0, since CF0' active memtable is empty, there is no need to SwitchMemtable, thus no new WAL will be created, so CF0's log number is now 2, concurrently, some data is written to CF0 and WAL 2; 3. the checkpoint process flushes CF1, WAL 3 is created and CF1's log number is now 3, CF0's log number is still 2 because CF0 is not empty and WAL 2 contains its unflushed data concurrently written in step 2; 4. the checkpoint process determines that WAL 1 and 2 are no longer needed according to [live_wal_files[i]->StartSequence() >= *sequence_number](https://github.com/facebook/rocksdb/blob/master/utilities/checkpoint/checkpoint_impl.cc#L388), so it skips linking them to the checkpoint directory; 5. but according to `MinLogNumberToKeep()`, WAL 2 still needs to be kept because CF0's log number is 2. If the checkpoint is reopened in read-only mode, and only read from the snapshot with the initial sequence number, then there will be no data loss or data inconsistency. But if the checkpoint is reopened and read from the most recent sequence number, suppose in step 3, there are also data concurrently written to CF1 and WAL 3, then the most recent sequence number refers to the latest entry in WAL 3, so the data written in step 2 should also be visible, but since WAL 2 is discarded, those data are lost. When tracking WAL in MANIFEST is enabled, when reopening the checkpoint, since WAL 2 is still tracked in MANIFEST as alive, but it's missing from the checkpoint directory, a corruption will be reported. This PR makes the checkpoint process to only skip a WAL if its log number < `MinLogNumberToKeep`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7789 Test Plan: watch existing tests to pass. Reviewed By: ajkr Differential Revision: D25662346 Pulled By: cheng-chang fbshipit-source-id: 136471095baa01886cf44809455cf855f24857a0	2020-12-23 14:22:45 -08:00
Sergei Petrunia	2a2d4d0f37	Apply the changes from: PS-5501 : Re-license PerconaFT 'locktree' to Apache V2 (#7801 ) Summary: commit d5178f513c0b4144a5ac9358ec0f6a3b54a28e76 Author: George O. Lorch III <george.lorch@percona.com> Date: Tue Mar 19 12:18:40 2019 -0700 PS-5501 : Re-license PerconaFT 'locktree' to Apache V2 - Fixed some incomplete relicensed files from previous round. - Added missing license text to some. - Relicensed more files to Apache V2 that locktree depends on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7801 Reviewed By: jay-zhuang Differential Revision: D25682430 Pulled By: cheng-chang fbshipit-source-id: deb8a0de3e76f3638672997bfbd300e2fffbe5f5	2020-12-23 14:20:51 -08:00
Zhichao Cao	04b3524ad0	Inject the random write error to stress test (#7653 ) Summary: Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653 Test Plan: pass db_stress and python3 db_crashtest.py blackbox Reviewed By: ajkr Differential Revision: D25354132 Pulled By: zhichao-cao fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669	2020-12-17 11:52:28 -08:00
Akanksha Mahajan	99f5a800c3	Fix clang_analyze error (#7777 ) Summary: Fix clang_analyze error Pull Request resolved: https://github.com/facebook/rocksdb/pull/7777 Test Plan: USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j64 analyze Reviewed By: jay-zhuang Differential Revision: D25601675 Pulled By: akankshamahajan15 fbshipit-source-id: 30f58cf4d575a2d546c455fb43e856455eb72a07	2020-12-16 21:34:41 -08:00
Adam Retter	8ff6557e7f	Add further tests to ASSERT_STATUS_CHECKED (2) (#7698 ) Summary: Second batch of adding more tests to ASSERT_STATUS_CHECKED. * external_sst_file_basic_test * checkpoint_test * db_wal_test * db_block_cache_test * db_logical_block_size_cache_test * db_blob_index_test * optimistic_transaction_test * transaction_test * point_lock_manager_test * write_prepared_transaction_test * write_unprepared_transaction_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7698 Reviewed By: cheng-chang Differential Revision: D25441664 Pulled By: pdillinger fbshipit-source-id: 9e78867f32321db5d4833e95eb96c5734526ef00	2020-12-09 21:21:16 -08:00
Manuel Ung	71239908cf	Invalidate iterator on transaction clear (#7733 ) Summary: Some clients do not close their iterators until after the transaction finishes. To handle this case, we will invalidate any iterators on transaction clear. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7733 Reviewed By: cheng-chang Differential Revision: D25261158 Pulled By: lth fbshipit-source-id: b91320f00c54cbe0e6882b794b34f3bb5640dbc0	2020-12-09 19:13:22 -08:00
Sergei Petrunia	98236fb10e	LockTree library, originally from PerconaFT (#7753 ) Summary: To be used for implementing Range Locking. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7753 Reviewed By: zhichao-cao Differential Revision: D25378980 Pulled By: cheng-chang fbshipit-source-id: 801a9c5cd92a84654ca2586b73e8f69001e89320	2020-12-09 12:10:57 -08:00
Adam Retter	7b2216c906	Add further tests to ASSERT_STATUS_CHECKED (1) (#7679 ) Summary: First batch of adding more tests to ASSERT_STATUS_CHECKED. * db_iterator_test * db_memtable_test * db_merge_operator_test * db_merge_operand_test * write_batch_test * write_batch_with_index_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7679 Reviewed By: ajkr Differential Revision: D25399270 Pulled By: pdillinger fbshipit-source-id: 3017d0a686aec5cd2d743fc2acbbf75df239f3ba	2020-12-08 15:55:04 -08:00
Sergei Petrunia	d8bd9fc7b3	Range Locking: Allow different LockManagers, add Range Lock definitions (#7443 ) Summary: This PR has two commits: 1. Modify the code to allow different Lock Managers (of any kind) to be used. It is implied that a LockManager uses its own custom LockTracker. 2. Add definitions for Range Locking (class Endpoint and GetRangeLock() function. cheng-chang, is this what you've had in mind (should the PR have both item 1 and item 2?) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7443 Reviewed By: zhichao-cao Differential Revision: D24123172 Pulled By: cheng-chang fbshipit-source-id: c6548ad6d4cc3c25f68d13b29147bc6fdf357185	2020-12-07 20:18:07 -08:00
Akanksha Mahajan	20c7d7c58a	Handling misuse of snprintf return value (#7686 ) Summary: Handle misuse of snprintf return value to avoid Out of bound read/write. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7686 Test Plan: make check -j64 Reviewed By: riversand963 Differential Revision: D25030831 Pulled By: akankshamahajan15 fbshipit-source-id: 1a1d181c067c78b94d720323ae00b79566b57cfa	2020-12-07 13:43:55 -08:00
Levi Tamasi	61932cdf1d	Add blob support to DBIter (#7731 ) Summary: The patch adds iterator support to the integrated BlobDB implementation. Whenever a blob reference is encountered during iteration, the corresponding blob is retrieved by calling `Version::GetBlob`, assuming the `expose_blob_index` (formerly `allow_blob`) flag is not set. (Note: the flag is set by the old stacked BlobDB implementation, which has its own blob file handling/blob retrieval logic.) In addition, `DBIter` now uniformly returns `Status::NotSupported` with the error message `"BlobDB does not support merge operator."` when encountering a blob reference while performing a merge (instead of potentially returning a message that implies the database should be opened using the stacked BlobDB's `Open`.) TODO: We can implement support for lazily retrieving the blob value (or in other words, bypassing the retrieval of blob values based on key) by extending the `Iterator` API with a new `PrepareValue` method (similarly to `InternalIterator`, which already supports lazy values). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7731 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D25256293 Pulled By: ltamasi fbshipit-source-id: c39cd782011495a526cdff99c16f5fca400c4811	2020-12-04 21:29:38 -08:00
Andrew Kryczka	1c5f13f2a5	Fail early when `merge_operator` not configured (#7667 ) Summary: An application may accidentally write merge operands without properly configuring `merge_operator`. We should alert them as early as possible that there's an API misuse. Previously RocksDB only notified them when a query or background operation needed to merge but couldn't. With this PR, RocksDB notifies them of the problem before applying the merge operand to the memtable (although it may already be in WAL, which seems it'd cause a crash loop until they enable `merge_operator`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7667 Reviewed By: riversand963 Differential Revision: D24933360 Pulled By: ajkr fbshipit-source-id: 3a4a2ceb0b7aed184113dd03b8efd735a8332f7f	2020-11-16 20:39:01 -08:00
Cheng Chang	5e794b0841	Fix a recovery corner case (#7621 ) Summary: Consider the following sequence of events: 1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST. 2. Syncing MANIFEST failed and db crashed. 3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file. 4. Db crashed again. 5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed. It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5. After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621 Test Plan: 1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery. 2. a new unit test is added in db_basic_test to simulate step 3. Reviewed By: riversand963 Differential Revision: D24668144 Pulled By: cheng-chang fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b	2020-11-07 22:23:27 -08:00
Cheng Chang	da42eceabc	Skip fsync in txn tests (#7641 ) Summary: The tests often times out in internal infra, skipping fsync should reduce test time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7641 Test Plan: watch existing tests to pass Reviewed By: anand1976 Differential Revision: D24765098 Pulled By: cheng-chang fbshipit-source-id: c62bf8110361aee901918d632cf4772435d05e8d	2020-11-06 14:25:14 -08:00
mrambacher	30beecef8c	Return NotFound from TableFactory configuration errors during options loading (#7615 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7615 Reviewed By: riversand963 Differential Revision: D24637054 Pulled By: ajkr fbshipit-source-id: 7da20d44289eaa2387af4edf8c3c48057425cc1c	2020-10-29 18:44:24 -07:00
mrambacher	7eb2824e3f	Revert LoadLatestOptions handling of ignore_unknown_options if versions differ (#7612 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7612 Reviewed By: zhichao-cao Differential Revision: D24627054 Pulled By: riversand963 fbshipit-source-id: 451b4da742e3e84c7442bc7cc4959d39089b89d0	2020-10-29 13:46:26 -07:00
Yanqin Jin	394210f280	Remove unused includes (#7604 ) Summary: This is a PR generated semi-automatically by an internal tool to remove unused includes and `using` statements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604 Test Plan: make check Reviewed By: ajkr Differential Revision: D24579392 Pulled By: riversand963 fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd	2020-10-28 23:22:27 -07:00
Ramkumar Vadivelu	9a690a74e1	In ParseInternalKey(), include corrupt key info in Status (#7515 ) Summary: Fixes Issue https://github.com/facebook/rocksdb/issues/7497 When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()` Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later. Tests: - make check - some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test - some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test Example of new status returns: - Key too small - `Corrupted Key: Internal Key too small. Size=5` - Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3` - Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515 Reviewed By: ajkr Differential Revision: D24240264 Pulled By: ramvadiv fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7	2020-10-28 10:12:58 -07:00
mrambacher	f35f7f2704	Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566 ) Summary: This PR does a few things: 1. The MockFileSystem class was split out from the MockEnv. This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one). The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation. 2. Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set. To accomplish this, a few things were needed: - The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated - The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory). 3. Updated the test framework to have a ROCKSDB_GTEST_SKIP macro. This can be used to flag tests that are skipped. Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10). I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV, both, and neither under both MacOS and RedHat. A few tests were disabled/skipped for the MEM/ENCRYPTED cases. The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem. (I will also push a change to disable those tests soon). There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged. Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently. Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale. I do not know how to do that, so if someone could write that job, it would be appreciated :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566 Reviewed By: zhichao-cao Differential Revision: D24408980 Pulled By: jay-zhuang fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f	2020-10-27 10:33:09 -07:00
Zhichao Cao	d8ec0a760a	Make FileType Public and Replace kLogFile with kWalFile (#7580 ) Summary: As suggested by pdillinger ,The name of kLogFile is misleading, in some tests, kLogFile is defined as info log. Replace it with kWalFile and move it to public, which will be used in https://github.com/facebook/rocksdb/issues/7523 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7580 Test Plan: make check Reviewed By: riversand963 Differential Revision: D24485420 Pulled By: zhichao-cao fbshipit-source-id: 955e3dacc1021bb590fde93b0a568ffe9ad80799	2020-10-22 17:06:20 -07:00
Cheng Chang	5227b315ec	Fix unchecked statuses for transaction_test (#7572 ) Summary: When `ASSERT_STATUS_CHECKED` is enabled, `transaction_test` does not pass without this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7572 Test Plan: `ASSERT_STATUS_CHECKED=1 make -j32 transaction_test && ./transaction_test` Reviewed By: zhichao-cao Differential Revision: D24404319 Pulled By: cheng-chang fbshipit-source-id: 13689035995366ab06d8eada3ea404e45fef8bc5	2020-10-21 14:03:59 -07:00
Cheng Chang	73dbe10bbf	Fix write_batch_test when ASSERT_STATUS_CHECKED=1 (#7575 ) Summary: Without this PR, `ASSERT_STATUS_CHECKED=1 make -j32 write_batch_test && ./write_batch_test` fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7575 Test Plan: ASSERT_STATUS_CHECKED=1 make -j32 write_batch_test && ./write_batch_test Reviewed By: zhichao-cao Differential Revision: D24411442 Pulled By: cheng-chang fbshipit-source-id: f67dc43c44d6afcc6d7e5ff15c6ae9bbf4dfc943	2020-10-20 13:18:41 -07:00
mrambacher	1eda625eab	Revert `Status`es returned from pre-`Configurable` options functions (#7563 ) Summary: Further refinement of the earlier PR. Now the Status is NotFound with a subcode of PathNotFound. Also the existing functions for options parsing/loading are reverted to return InvalidArgument no matter in which way the user-provided arguments are deemed invalid. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7563 Reviewed By: zhichao-cao Differential Revision: D24422491 Pulled By: ajkr fbshipit-source-id: ba6b237cd0584d3f925c5ba0d349aeb8c250af67	2020-10-20 11:53:28 -07:00
Cheng Chang	0ea7db768e	Abstract out LockManager interface (#7532 ) Summary: In order to be able to introduce more locking protocols, we need to abstract out the locking subsystem in TransactionDB into a set of interfaces. PR https://github.com/facebook/rocksdb/pull/7013 introduces interface `LockTracker`. This PR is a follow up to take the first step to abstract out a `LockManager` interface. Further modifications to the interface may be needed when introducing the first implementation of range lock. But the idea here is to put the range lock implementation based on range tree under the `utilities/transactions/lock/range/range_tree`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7532 Test Plan: point_lock_manager_test Reviewed By: ajkr Differential Revision: D24238731 Pulled By: cheng-chang fbshipit-source-id: 2a9458cd8b3fb008d9529dbc4d3b28c24631f463	2020-10-19 10:14:42 -07:00
Levi Tamasi	e8cb32ed67	Introduce BlobFileCache and add support for blob files to Get() (#7540 ) Summary: The patch adds blob file support to the `Get` API by extending `Version` so that whenever a blob reference is read from a file, the blob is retrieved from the corresponding blob file and passed back to the caller. (This is assuming the blob reference is valid and the blob file is actually part of the given `Version`.) It also introduces a cache of `BlobFileReader`s called `BlobFileCache` that enables sharing `BlobFileReader`s between callers. `BlobFileCache` uses the same backing cache as `TableCache`, so `max_open_files` (if specified) limits the total number of open (table + blob) files. TODO: proactively open/cache blob files and pin the cache handles of the readers in the metadata objects similarly to what `VersionBuilder::LoadTableHandlers` does for table files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7540 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24260219 Pulled By: ltamasi fbshipit-source-id: a8a2a4f11d3d04d6082201b52184bc4d7b0857ba	2020-10-15 13:04:47 -07:00
mrambacher	a8c89cc969	Test for LoadLatestOptions (#7554 ) Summary: Make LoadLatestOptions return PathNotFound if the options file does not exist. Added tests for the LoadOptions related methods. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7554 Reviewed By: akankshamahajan15 Differential Revision: D24298985 Pulled By: zhichao-cao fbshipit-source-id: c9ae3cb12fc4a5bbef07743e1c1300f98a2441b3	2020-10-14 22:28:55 -07:00
Akanksha Mahajan	24498ab1ec	Add few unit test cases in ASSERT_STATUS_CHECKED (#7500 ) Summary: Add status enforcement for following tests: 1. import_column_family_test 2. memory_test 3. table_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7500 Reviewed By: zhichao-cao Differential Revision: D24095887 Pulled By: akankshamahajan15 fbshipit-source-id: db8e1ec595852df143fad78a0c07bfdd27dc3c84	2020-10-08 11:22:44 -07:00
Levi Tamasi	1f84611e5d	Clean up BlobLogReader and rename it to BlobLogSequentialReader (#7517 ) Summary: The patch does some cleanup in and around the legacy `BlobLogReader` class: * It renames the class to `BlobLogSequentialReader` to emphasize that it is for sequentially iterating through blobs in a blob file, as opposed to doing random point reads using `BlobIndex`es (which is `BlobFileReader`'s jurisdiction). * It removes some dead code from the old BlobDB implementation that references `BlobLogReader` (namely the method `BlobFile::OpenRandomAccessReader`). * It cleans up some `#include`s and forward declarations. * It fixes some incorrect/outdated comments related to the reader class. * It adds a few assertions to the `Read` methods of the class. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7517 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D24172611 Pulled By: ltamasi fbshipit-source-id: 43e2ae1eba5c3dd30c1070cb00f217edc45bd64f	2020-10-07 17:48:16 -07:00
Zhichao Cao	b7062f0b2c	Status check enforcement for error_handler_fs_test (#7342 ) Summary: Added status check enforcement for error_test_fs_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7342 Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 error_test_fs_test Reviewed By: akankshamahajan15 Differential Revision: D23972231 Pulled By: zhichao-cao fbshipit-source-id: fa41bfe440012e0c55f2c9507c1d0104e5e93f84	2020-10-02 16:41:13 -07:00
Ramkumar Vadivelu	e04a50923d	Change ParseInternalKey() to return Status instead of bool (#7457 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/7430 Change ParseInternalKey() to return Status instead of bool. db_bench (seekrandom) based before/after results with value size of 100 bytes and 16 bytes can be found at (tests ran on an udb server): https://www.dropbox.com/s/47bwamdy5ozngph/PIK_ret_Status_results.xlsx?dl=0 ![db_bench_results](https://user-images.githubusercontent.com/62277872/94642825-2a21a800-029a-11eb-88f2-124136c83fd3.png) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7457 Reviewed By: ajkr Differential Revision: D24002433 Pulled By: ramvadiv fbshipit-source-id: ac253ecf577a29044c47c3fe254a01e71404c44c	2020-09-30 19:16:47 -07:00
Jay Zhuang	e127fe18c3	Fix TSAN failure for backupable_db_test (#7478 ) Summary: It's a transient failure, but can be reproduce with running the test 100 times: https://app.circleci.com/pipelines/github/facebook/rocksdb/3760/workflows/de909685-f22b-45ba-a8f3-6ebb78a54e96/jobs/37039 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7478 Test Plan: re-run the test 100 times Reviewed By: ajkr Differential Revision: D24035758 Pulled By: jay-zhuang fbshipit-source-id: 6b31983d5c3f7faa8d5481306098513485d0d69d	2020-09-30 17:22:56 -07:00
sdong	d08a9005b7	Make db_basic_test pass assert status checked (#7452 ) Summary: Add db_basic_test status check list. Some of the warnings are suppressed. It is possible that some of them are due to real bugs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7452 Test Plan: See CI tests pass. Reviewed By: zhichao-cao Differential Revision: D23979764 fbshipit-source-id: 6151570c2a9b931b0fbb3fe939a94b2bd1583cbe	2020-09-29 09:49:04 -07:00
Levi Tamasi	5e221a98b5	Support injecting read errors for RandomAccessFile when using FaultInjectionTestEnv (#7447 ) Summary: The patch adds support for injecting errors when reading from `RandomAccessFile` using `FaultInjectionTestEnv`. (This functionality was curiously missing w/r/t `RandomAccessFile`, even though it was implemented for `RandomRWFile`.) The patch also fixes up a test case in `blob_db_test` which uses `FaultInjectionTestEnv` but has so far relied on reads from `RandomAccessFile`s succeeding even after deactivating the filesystem. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7447 Test Plan: `make check` Reviewed By: zhichao-cao Differential Revision: D23971740 Pulled By: ltamasi fbshipit-source-id: 8492736cb64b1ee138c658822535f3ff4fe560c6	2020-09-28 17:32:06 -07:00
Peter Dillinger	08552b19d3	Genericize and clean up FastRange (#7436 ) Summary: A generic algorithm in progress depends on a templatized version of fastrange, so this change generalizes it and renames it to fit our style guidelines, FastRange32, FastRange64, and now FastRangeGeneric. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7436 Test Plan: added a few more test cases Reviewed By: jay-zhuang Differential Revision: D23958153 Pulled By: pdillinger fbshipit-source-id: 8c3b76101653417804997e5f076623a25586f3e8	2020-09-28 11:35:00 -07:00
Zhichao Cao	0ce9b3a22d	Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419 ) Summary: Add new AppendWithVerify and PositionedAppendWithVerify APIs to Env and FileSystem to bring the data verification information (data checksum information) from upper layer (e.g., WritableFileWriter) to the storage layer. This PR only include the API definition, no functional codes are added to unblock other developers which depend on these APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7419 Test Plan: make -j32 Reviewed By: pdillinger Differential Revision: D23883196 Pulled By: zhichao-cao fbshipit-source-id: 94676c26bc56144cc32e3661f84f21eccd790411	2020-09-23 19:02:26 -07:00
Peter Dillinger	9d8eb77c4d	Less I/O for incremental backups, slightly better corruption detection (#7413 ) Summary: Two relatively simple functional changes to incremental backup behavior, integrated with a minor refactoring to reduce code redundancy and improve error/log message. There are nuances to the impact of these changes, but I believe they are fundamentally good and generally safe. Those functional changes: * Incremental backups no longer read DB table files that are already saved to a shared part of the backup directory, unless `share_files_with_checksum` is used with `kLegacyCrc32cAndFileSize` naming (discouraged) where crc32c full file checksums are needed to determine file naming. * Justification: incremental backups should not need to read the whole DB, especially without rate limiting. (Although other BackupEngine reads are not rate limited either, other non-trivial reads are generally limited by a corresponding write, as in copying files.) Also, the fact that this is not already fixed was arguably a bug/oversight in the implementation of https://github.com/facebook/rocksdb/issues/7110. * When considering whether a table file is already backed up in a shared part of backup directory, BackupEngine would already query the sizes of source (DB) and pre-existing destination (backup) files. BackupEngine now uses these file sizes to detect corruption, as at least one of (a) old backup, (b) backup in progress, or (c) current DB is corrupt if there's a size mismatch. * Justification: a random related fix that also helps to cover a small hole in corruption checking uncovered by the other functional change: * For `share_table_files` without "checksum" (not recommended), the other change regresses in detecting fundamentally unsafe use of this option combination: when you might generate different versions of same SST file number. As demonstrated by `BackupableDBTest.FailOverwritingBackups,` this regression is greatly mitigated by the new file size checking. Nevertheless, almost no reason to use `share_files_with_checksum=false` should remain, and comments are updated appropriately. Also, this change renames internal function `CalculateChecksum` to `ReadFileAndComputeChecksum` to make the performance impact of this function clear in code reviews. It is not clear what 'same_path' is for in backupable_db.cc, and I suspect it cannot be true for a DB with unique file names (like DBImpl). Nevertheless, I've tried to keep its functionality intact when `true` to minimize risk for now, despite having no unit tests for which it is true. Select impact details (much more in unit tests): For `share_files_with_checksum`, I am confident there is no regression (vs. pre-6.12) in detecting DB or backup corruption at backup creation time, mostly because the old design did not leverage this extra checksum computation for detecting inconsistencies at backup creation time. (With computed checksums in names, a recently corrupted file just looked like a different file vs. what was already backed up.) Even in the hypothetical case of DB session id collision (~100 bits entropy collision), file size in name and/or our file size check add an extra layer of protection against false success in creating an accurate new backup. (Unit test included.) `DB::VerifyChecksum` and `BackupEngine::VerifyBackup` with checksum checking are still able to catch corruptions that `CreateNewBackup` does not. Note that when custom file checksum support is added to BackupEngine, that will essentially give the same power as `DB::VerifyChecksum` into `CreateNewBackup`. We could add options for `CreateNewBackup` to cover some of what would be caught by `VerifyBackup` with checksum checking. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7413 Test Plan: Two new unit tests included, both of which fail without these changes. Although we don't test the I/O improvement directly, we test it indirectly in DB corruption detection power that was inadvertently unlocked with new backup file naming PLUS computing current content checksums (now removed). (I don't think that case of DB corruption detection justifies reading the whole DB on incremental backup.) Reviewed By: zhichao-cao Differential Revision: D23818480 Pulled By: pdillinger fbshipit-source-id: 148aff16f001af5b9fd4b22f155311c2461f1bac	2020-09-21 16:19:24 -07:00
Peter Dillinger	b475a83f9d	Postponing custom checksum support in BackupEngine (#7411 ) Summary: This change reverts BackupEngine to 6.12 state to accommodate a higher-priority fix that does not easily merge with this custom checksum support. We intend to reinstate this support soon, by merging a revert of this change. For backupable_db_test, I've removed the tests depending on this feature. I've also removed relevant HISTORY.md entry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7411 Test Plan: unit tests Reviewed By: ajkr Differential Revision: D23793835 Pulled By: pdillinger fbshipit-source-id: 7e861436539584799b13d1a8ae559b81b6d08052	2020-09-18 15:27:03 -07:00
Peter Dillinger	93719fc953	Restore file size in backup table file names (and other cleanup) (#7400 ) Summary: Prior to 6.12, backup files using share_files_with_checksum had the file size encoded in the file name, after the last '\_' and before the last '.'. We considered this an implementation detail subject to change, and indeed removed this information from the file name (with an option to use old behavior) because it was considered ineffective/inefficient for file name uniqueness. However, some downstream RocksDB users were relying on this information since the file size is not explicitly in the backup manifest file. This primary purpose of this change is "retrofitting" the 6.12 release (not yet a public release) to simultaneously support the benefits of the new naming scheme (I/O performance and data correctness at scale) and preserve the file size information, both as default behaviors. With this change, we are essentially making the file size information encoded in the file name an official, though obscure, extension of the backup meta file format. We preserve an option (kLegacyCrc32cAndFileSize) to use the original "legacy" naming scheme, with its caveats, and make it easy to omit the file size information (no kFlagIncludeFileSize), for more compact file names. But note that changing the naming scheme used on an existing db and backup directory can lead to transient space amplification, as some files will be stored under two names in the shared_checksum directory. Because some backups were saved using the original 6.12 naming scheme, we offer two ways of dealing with those files: SST files generated by older 6.12 versions can either use the default naming scheme in effect when the SST files were generated (kFlagMatchInterimNaming, default, no transient space amplification) or can use a new naming scheme (no kFlagMatchInterimNaming, potential space amplification because some already stored files getting a new name). We don't have a natural way to detect which files were generated by previous 6.12 versions, but this change hacks one in by changing DB session ids to now use a more concise encoding, reducing file name length, saving ~dozen bytes from SST files, and making them visually distinct from DB ids so that they are less likely to be mixed up. Two final auxiliary notes: Recognizing that the backup file names have become a de facto part of the backup meta schema, this change makes them easier to parse and extend by putting a distinct marker, 's', before DB session ids embedded in the name. When we extend this to allow custom checksums in the name, they can get their own marker to ensure safe parsing. For backward compatibility, file size does not get a marker but is assumed for `_[0-9]+[.]` Another change from initial 6.12 default behavior is never including file custom checksum in the file name. Looking ahead to 6.13, we do not want the default behavior to cause backup space amplification for someone turning on file custom checksum checking in BackupEngine; we want that to be an easy decision. When implemented, including file custom checksums in backup file names will be a non-default option. Actual file name patterns and priorities, as regexes: kLegacyCrc32cAndFileSize OR pre-6.12 SST file -> [0-9]+_[0-9]+_[0-9]+[.]sst kFlagMatchInterimNaming set (default) AND early 6.12 SST file -> [0-9]+_[0-9a-fA-F-]+[.]sst kUseDbSessionId AND NOT kFlagIncludeFileSize -> [0-9]+_s[0-9A-Z]{20}[.]sst kUseDbSessionId AND kFlagIncludeFileSize (default) -> [0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst We might add opt-in options for more '\_' separated data in the name, but embedded file size, if present, will always be after last '\_' and before '.sst'. This change was originally applied to version 6.12. (See https://github.com/facebook/rocksdb/issues/7390) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7400 Test Plan: unit tests included. Sync point callbacks are used to mimic previous version SST files. Reviewed By: ajkr Differential Revision: D23759587 Pulled By: pdillinger fbshipit-source-id: f62d8af4e0978de0a34f26288cfbe66049b70025	2020-09-17 10:24:22 -07:00
mrambacher	a08d6f18f0	Add more tests to ASSERT_STATUS_CHECKED (#7367 ) Summary: db_options_test options_file_test auto_roll_logger_test options_util_test persistent_cache_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7367 Reviewed By: jay-zhuang Differential Revision: D23712520 Pulled By: zhichao-cao fbshipit-source-id: 99b331e357f5d6a6aabee89d1bd933002cbb3908	2020-09-16 15:48:07 -07:00
mrambacher	7d472accdc	Bring the Configurable options together (#5753 ) Summary: This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753 Reviewed By: ajkr Differential Revision: D23385030 Pulled By: zhichao-cao fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a	2020-09-14 17:01:01 -07:00
Levi Tamasi	5ce246c716	Expose the start of the expiration range for TTL blob files through LiveFileMetaData (#7365 ) Summary: The patch adds support for exposing the start of the expiration range for TTL blob files through the `GetLiveFilesMetaData` API. This can be used for monitoring purposes, i.e. to make sure TTL blob files are deleted in a timely manner. The patch also fixes a couple of uninitialized variable issues. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7365 Test Plan: `make check` Reviewed By: pdillinger Differential Revision: D23605465 Pulled By: ltamasi fbshipit-source-id: 97a9612bf5f4b058423debdd3f28f576bb23a70f	2020-09-10 11:33:33 -07:00
Peter Dillinger	4e258d3e63	Fix backup/restore in stress/crash test (#7357 ) Summary: (1) Skip check on specific key if restoring an old backup (small minority of cases) because it can fail in those cases. (2) Remove an old assertion about number of column families and number of keys passed in, which is broken by atomic flush (cf_consistency) test. Like other code (for better or worse) assume a single key and iterate over column families. (3) Apply mock_direct_io to NewSequentialFile so that db_stress backup works on /dev/shm. Also add more context to output in case of backup/restore db_stress failure. Also a minor fix to BackupEngine to report first failure status in creating new backup, and drop another clue about the potential source of a "Backup failed" status. Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)" Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357 Test Plan: Using backup_one_in=10000, "USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes "USE_CLANG=1 make blackbox_crash_test" for 30+ minutes And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb Reviewed By: riversand963 Differential Revision: D23567244 Pulled By: pdillinger fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77	2020-09-08 10:50:19 -07:00
Jay Zhuang	55bf42a80c	Recompress blobs during GC if compression changed (#7331 ) Summary: Recompress blobs if compression type is changed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7331 Test Plan: `make check` Reviewed By: ltamasi Differential Revision: D23437102 Pulled By: jay-zhuang fbshipit-source-id: bb699ebdad137721d422e42e331d4de8a82a7c5f	2020-09-01 18:03:50 -07:00
Levi Tamasi	5043960623	Add a blob file builder class that can be used in background jobs (#7306 ) Summary: The patch adds a class called `BlobFileBuilder` that can be used to build and cut blob files in background jobs (flushes/compactions). The class enforces a value size threshold (`min_blob_size`; smaller blobs will be inlined in the LSM tree itself), and supports specifying a blob file size limit (`blob_file_size`), as well as compression (`blob_compression_type`) and checksums for blob files. It also keeps track of the generated blob files and their associated `BlobFileAddition` metadata, which can be applied as part of the background job's `VersionEdit`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7306 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D23298817 Pulled By: ltamasi fbshipit-source-id: 38f35d81dab1ba81f15236240612ec173d7f21b5	2020-08-27 11:55:54 -07:00
Peter Dillinger	9aad24da55	Real fix for race in backup custom checksum checking (#7309 ) Summary: This is a "real" fix for the issue worked around in https://github.com/facebook/rocksdb/issues/7294. To get DB checksum info for live files, we now read the manifest file that will become part of the checkpoint/backup. This requires a little extra handling in taking a custom checkpoint, including only reading the manifest file up to the size prescribed by the checkpoint. This moves GetFileChecksumsFromManifest from backup code to file_checksum_helper.{h,cc} and removes apparently unnecessary checking related to column families. Updated HISTORY.md and warned potential future users of DB::GetLiveFilesChecksumInfo() Pull Request resolved: https://github.com/facebook/rocksdb/pull/7309 Test Plan: updated unit test, before and after Reviewed By: ajkr Differential Revision: D23311994 Pulled By: pdillinger fbshipit-source-id: 741e30a2dc1830e8208f7648fcc8c5f000d4e2d5	2020-08-26 10:39:20 -07:00
mrambacher	b7e1c5213f	Add some simulator cache and block tracer tests to ASSERT_STATUS_CHECKED (#7305 ) Summary: More tests now pass. When in doubt, I added a TODO comment to check what should happen with an ignored error. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7305 Reviewed By: akankshamahajan15 Differential Revision: D23301262 Pulled By: ajkr fbshipit-source-id: 5f120edc7393560aefc0633250277bbc7e8de9e6	2020-08-24 16:43:31 -07:00
rockeet	e653af7164	DBWithTTL::Open() param ttls: vector<int32_t> to const vector<int32_t>& (#7196 ) Summary: fix DBWithTTL::Open() param ttls: vector<int32_t> to const vector<int32_t> Pull Request resolved: https://github.com/facebook/rocksdb/pull/7196 Reviewed By: akankshamahajan15 Differential Revision: D23277772 Pulled By: ajkr fbshipit-source-id: bf69834b5c2062c7e166dab21fbfd40416c7872d	2020-08-24 16:24:16 -07:00
sdong	5aacef9712	Disable fsync in SeqAdvanceConcurrentTest (#7302 ) Summary: SeqAdvanceConcurrentTest sometimes runs too long on some platforms. Disable fsync to speed it up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7302 Test Plan: Run the tests and watch CI. Reviewed By: ajkr Differential Revision: D23298192 fbshipit-source-id: 2185eed4e0958c3de5e8a3f94ceed5be5945ed37	2020-08-24 13:22:06 -07:00

1 2 3 4 5 ...

1199 Commits