rocksdb

Author	SHA1	Message	Date
Yanqin Jin	961c7590d6	Add timestamp to delete (#6253 ) Summary: Preliminary user-timestamp support for delete. If ["a", ts=100] exists, you can delete it by calling `DB::Delete(write_options, key)` in which `write_options.timestamp` points to a `ts` higher than 100. Implementation A new ValueType, i.e. `kTypeDeletionWithTimestamp` is added for deletion marker with timestamp. The reason for a separate `kTypeDeletionWithTimestamp`: RocksDB may drop tombstones (keys with kTypeDeletion) when compacting them to the bottom level. This is OK and useful if timestamp is disabled. When timestamp is enabled, should we still reuse `kTypeDeletion`, we may drop the tombstone with a more recent timestamp, causing deleted keys to re-appear. Test plan (dev server) ``` make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6253 Reviewed By: ltamasi Differential Revision: D20995328 Pulled By: riversand963 fbshipit-source-id: a9e5c22968ad76f98e3dc6ee0151265a3f0df619	2020-05-28 10:40:03 -07:00
Levi Tamasi	e3f953a863	Make it possible to look up files by number in VersionStorageInfo (#6862 ) Summary: Does what it says on the can: the patch adds a hash map to `VersionStorageInfo` that maps file numbers to file locations, i.e. (level, position in level) pairs. This will enable stricter consistency checks in `VersionBuilder`. The patch also fixes all the unit tests that used duplicate file numbers in a version (which would trigger an assertion with the new code). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6862 Test Plan: `make check` `make whitebox_crash_test` Reviewed By: riversand963 Differential Revision: D21670446 Pulled By: ltamasi fbshipit-source-id: 2eac249945cf33d8fb8597b26bfff5221e1a861a	2020-05-28 10:03:06 -07:00
Akanksha Mahajan	bcefc59e9f	Allow MultiGet users to limit cumulative value size (#6826 ) Summary: 1. Add a value_size in read options which limits the cumulative value size of keys read in batches. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Abort without further fetching any key. 2. Add a unit test case MultiGetBatchedValueSizeSimple the reads keys from memory and sst files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6826 Test Plan: 1. make check -j64 2. Add a new unit test case Reviewed By: anand1976 Differential Revision: D21471483 Pulled By: akankshamahajan15 fbshipit-source-id: dea51b8e76d5d1df38ece8cdb29933b1d798b900	2020-05-27 13:07:14 -07:00
Zhichao Cao	545e14b53b	Generate file checksum in SstFileWriter (#6859 ) Summary: If Option.file_checksum_gen_factory is set, rocksdb generates the file checksum during flush and compaction based on the checksum generator created by the factory and store the checksum and function name in vstorage and Manifest. This PR enable file checksum generation in SstFileWrite and store the checksum and checksum function name in the ExternalSstFileInfo, such that application can use them for other purpose, for example, ingest the file checksum with files in IngestExternalFile(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6859 Test Plan: add unit test and pass make asan_check. Reviewed By: ajkr Differential Revision: D21656247 Pulled By: zhichao-cao fbshipit-source-id: 78a3570c76031d8832e3d2de3d6c79cdf2b675d0	2020-05-20 11:55:31 -07:00
Levi Tamasi	1551f1011a	Refactor the blob file related logic in VersionBuilder (#6835 ) Summary: This patch is groundwork for an upcoming change to store the set of linked SSTs in `BlobFileMetaData`. With the current code, a new `BlobFileMetaData` object is created each time a `VersionEdit` touches a certain blob file. This is fine as long as these objects are lightweight and cheap to create; however, with the addition of the linked SST set, it would be very inefficient since the set would have to be copied over and over again. Note that this is the same kind of problem that `VersionBuilder` is solving w/r/t `Version`s and files, and we can apply the same solution; that is, we can accumulate the changes in a different mutable object, and apply the delta in one shot when the changes are committed. The patch does exactly that by adding a new `BlobFileMetaDataDelta` class to `VersionBuilder`. In addition, it turns the existing `GetBlobFileMetaData` helper into `IsBlobFileInVersion` (which is fine since that's the only thing the method's clients care about now), and adds a couple of helper methods that can create a `BlobFileMetaData` object from the `BlobFileMetaData` in the base (if applicable) and the delta when the `Version` is saved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6835 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D21505187 Pulled By: ltamasi fbshipit-source-id: d81a48c5f2ca7b79d7124c935332a6bcf3d5d988	2020-05-19 10:00:04 -07:00
Yanqin Jin	b11a8b1b9a	Fix valgrind error by init memory region (#6842 ) Summary: As title. After allocating a memory buffer, initialize its content to 0s. This fixes valgrind issue introduced in https://github.com/facebook/rocksdb/issues/6709. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6842 Test Plan: ``` $make valgrind_test $valgrind --tool=memcheck --track-origins=yes ./db_test2 --gtest_filter=DBTest2.CompressionFailures ``` Reviewed By: pdillinger Differential Revision: D21551848 Pulled By: riversand963 fbshipit-source-id: e87a6f413e3f3d92d8e23d8ecc4cf93479c6674c	2020-05-14 18:50:03 -07:00
anand76	50d63a2af0	Fix LITE build failure in compaction_picker_test (#6839 ) Summary: Fix LITE build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6839 Test Plan: LITE=1 make check Reviewed By: riversand963 Differential Revision: D21535808 Pulled By: anand1976 fbshipit-source-id: fcad961eca08e13fb0c256c92d18c3c1f1165f22	2020-05-13 10:47:15 -07:00
sdong	4a4b8a1344	sst_dump to reduce number of file reads (#6836 ) Summary: sst_dump can issue many file reads from the file system. This doesn't work well with file systems without a OS cache, especially remote file systems. In order to mitigate this problem, several improvements are done: 1. --readahead_size is added, so that users can specify readahead size when scanning the data. 2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too. 3. Consoldiate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836 Test Plan: Add a test that covers this new feature. Reviewed By: pdillinger Differential Revision: D21516607 fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e	2020-05-12 18:23:33 -07:00
Stanislav Tkach	70aaa9ceeb	Expose CancellAllBackgroundWork to C api (#6832 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6832 Reviewed By: cheng-chang Differential Revision: D21498186 Pulled By: riversand963 fbshipit-source-id: 66bb0d7c06af2bf0df3c6a09b61bca2fb81f2dd3	2020-05-12 14:50:52 -07:00
Ziyue Yang	c384c08a4f	Add tests for compression failure in BlockBasedTableBuilder (#6709 ) Summary: Currently there is no check for whether BlockBasedTableBuilder will expose compression error status if compression fails during the table building. This commit adds fake faulting compressors and a unit test to test such cases. This check finds 5 bugs, and this commit also fixes them: 1. Not handling compression failure well in BlockBasedTableBuilder::BGWorkWriteRawBlock. 2. verify_compression failing in BlockBasedTableBuilder when used with ZSTD. 3. Wrongly passing the same reference of block contents to BlockBasedTableBuilder::CompressAndVerifyBlock in parallel compression. 4. Wrongly setting block_rep->first_key_in_next_block to nullptr in BlockBasedTableBuilder::EnterUnbuffered when there are still incoming data blocks. 5. Not maintaining variables for compression ratio estimation and first_block in BlockBasedTableBuilder::EnterUnbuffered. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6709 Reviewed By: ajkr Differential Revision: D21236254 fbshipit-source-id: 101f6e62b2bac2b7be72be198adf93cd32a1ff46	2020-05-12 09:27:35 -07:00
sdong	a50ea71c00	Improve ldb consistency checks (#6802 ) Summary: When using ldb, users cannot turn on force consistency check in most commands, while they cannot use checksonsistnecy with --try_load_options. The change fixes both by: 1. checkconsistency now calls OpenDB() so that it gets all the options loading and sanitized options logic 2. use options.check_consistency_checks = true by default, and add a --disable_consistency_checks to turn it off. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6802 Test Plan: Add a new unit test. Some manual tests with corrupted DBs. Reviewed By: pdillinger Differential Revision: D21388051 fbshipit-source-id: 8d122732d391b426e3982a1c3232a8e3763ffad0	2020-05-08 14:17:47 -07:00
Yanqin Jin	e72e2167fd	Fix a few bugs in best-efforts recovery (#6824 ) Summary: 1. Update column_family_memtables_ to point to latest column_family_set in version_set after recovery. 2. Normalize file paths passed by application so that directories end with '/' or '\\'. 3. In addition to missing files, corrupted files are also ignored in best-efforts recovery. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6824 Test Plan: COMPILE_WITH_ASAN=1 make check Reviewed By: anand1976 Differential Revision: D21463905 Pulled By: riversand963 fbshipit-source-id: c48db8843cc93c8c1c7139c474b64e6f775307d2	2020-05-08 13:01:42 -07:00
anand76	94265234de	Fix race due to delete triggered compaction in Universal compaction mode (#6799 ) Summary: Delete triggered compaction in universal compaction mode was causing a corruption when scheduled in parallel with other compactions. 1. When num_levels = 1, a file marked for compaction may be picked along with all older files in L0, without checking if any of them are already being compaction. This can cause unpredictable results like resurrection of older versions of keys or deleted keys. 2. When num_levels > 1, a delete triggered compaction would not get scheduled if it overlaps with a running regular compaction. However, the reverse is not true. This is due to the fact that in ```UniversalCompactionBuilder::CalculateSortedRuns```, it assumes that entire sorted runs are picked for compaction and only checks the first file in a sorted run to determine conflicts. This is violated by a delete triggered compaction as it works on a subset of a sorted run. Fix the bug for num_levels > 1, and disable the feature for now when num_levels = 1. After disabling this feature, files would still get marked for compaction, but no compaction would get scheduled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6799 Reviewed By: pdillinger Differential Revision: D21431286 Pulled By: anand1976 fbshipit-source-id: ae9f0bdb1d6ae2f10284847db731c23f43af164a	2020-05-07 17:32:17 -07:00
Peter Dillinger	b27a1448b6	Fix false NotFound from batched MultiGet with kHashSearch (#6821 ) Summary: The error is assigning KeyContext::s to NotFound status in a table reader for a "not found in this table" case, which skips searching in later tables, like only a delete should. (The hash search index iterator is the only one that can return status NotFound even if Valid() == false.) This was detected by intermittent failure in MultiThreadedDBTest.MultiThreaded/5, a kHashSearch configuration. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6821 Test Plan: modified existing unit test to reproduce problem Reviewed By: anand1976 Differential Revision: D21450469 Pulled By: pdillinger fbshipit-source-id: 7478003684d637dbd491cdac81468041a791be2c	2020-05-07 15:41:37 -07:00
Andrew Kryczka	e9ba4ba348	validate range tombstone covers positive range (#6788 ) Summary: We found some files containing nothing but negative range tombstones, and unsurprisingly their metadata specified a negative range, which made things crash. Time to add a bit of user input validation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6788 Reviewed By: zhichao-cao Differential Revision: D21343719 Pulled By: ajkr fbshipit-source-id: f1c16e4c3e9fa150958c8c866176632a3206fb74	2020-05-07 11:55:30 -07:00
Levi Tamasi	ac3ae1df0b	Find/purge obsolete blob files (#6807 ) Summary: The patch extends `FindObsoleteFiles` and `PurgeObsoleteFiles` with support for blob files. The behavior is analogous to SST files: obsolete blob files are put on the "candidates for deletion" list, while live (and pending) files are preserved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6807 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D21406249 Pulled By: ltamasi fbshipit-source-id: 1948f71c31927564b61e8af394f50ca3964880d9	2020-05-07 09:32:51 -07:00
sdong	c21c459771	Slightly expand converage to file consistency check failure (#6800 ) Summary: Current DBCompactionTest.ConsistencyFailTest checks DB fails after L0 inconsitency is found. Add slightly more coverage by introducing DBCompactionTest.ConsistencyFailTest2 which checks non-L0 files too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6800 Test Plan: Run the new test. Reviewed By: riversand963 Differential Revision: D21384806 fbshipit-source-id: 36db7b657eed42115283fe2f6afa4c3a31a3b510	2020-05-05 18:31:53 -07:00
mrambacher	394f2bbd13	Add OptionTypeInfo::Enum and related methods (#6423 ) Summary: Add methods and constructors for handling enums to the OptionTypeInfo. This change allows enums to be converted/compared without adding a special "type" to the OptionType. This change addresses a couple of issues: - It allows new enumerated types to be added to the options without editing the OptionType base class (and related methods) - It standardizes the procedure for adding enumerated types to the options, reducing potential mistakes - It moves the enum maps to the location where they are used, allowing them to be static file members rather than global values - It reduces the number of types and cases that need to be handled in the various OptionType methods Pull Request resolved: https://github.com/facebook/rocksdb/pull/6423 Reviewed By: siying Differential Revision: D21408713 Pulled By: zhichao-cao fbshipit-source-id: fc492af285d011822578b95d186a0fce25d35626	2020-05-05 15:04:04 -07:00
Andrew Kryczka	a96461d169	fix swallowed error for file deletion consistency check (#6809 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6809 Reviewed By: pdillinger Differential Revision: D21411971 Pulled By: ajkr fbshipit-source-id: 900b6b0370b76e9a3e5e03f968e2ac1bbaab73b8	2020-05-05 14:54:21 -07:00
Peter Dillinger	2f1700c8c5	Fix failure to write output in SpecialEnv::GetCurrentTime (#6803 ) Summary: This very old test code bug was causing a new valgrind failure in MultiGetDeadlineExceeded Also fix hang in MultiGetDeadlineExceeded by unifying with some logic from another test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6803 Test Plan: run that unit test under valgrind, make check Reviewed By: siying Differential Revision: D21388470 Pulled By: pdillinger fbshipit-source-id: 0ce99d6d5eb8cd3195b17406892c8c5cff5fa5dd	2020-05-05 13:11:29 -07:00
Cheng Chang	91bc0130fa	Refactor level compaction picker (#6804 ) Summary: 1. refactor out PickFileToCompact to remove duplicated logic. 2. remove redundant checks of `start_level_inputs_.empty()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6804 Test Plan: make check Reviewed By: siying Differential Revision: D21390053 Pulled By: cheng-chang fbshipit-source-id: 185d5987a08bfdaf63f0f245310c6da69878d415	2020-05-05 11:09:29 -07:00
Yanqin Jin	5584595f80	Do not swallow error returned from SaveTo() (#6801 ) Summary: With consistency check enabled, VersionBuilder::SaveTo() may return error once corruption is detected while building versions. We should handle these errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6801 Test Plan: make check Reviewed By: siying Differential Revision: D21385045 Pulled By: riversand963 fbshipit-source-id: 98f6424e2a4699b62befa21e9fe00e70a771118e	2020-05-05 10:46:20 -07:00
Yanqin Jin	5a61e7864d	Fix db_stress when GetLiveFiles() flushes dropped CF (#6805 ) Summary: Current impl. of db_stress will abort verification and report failure if GetLiveFiles() causes a dropped column family to be flushed. This is not desired. To fix, this PR makes the following change: In GetLiveFiles, if flush is triggered and returns Status::IsColumnFamilyDropped(), then set status to Status::OK(). This is OK because dropped column families will be skipped during the rest of this function, and valid column families will have their live files returned to caller. Test plan (dev server): make check ./db_stress -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100 ./db_stress -disable_wal=1 -reopen=0 -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6805 Reviewed By: ltamasi Differential Revision: D21390044 Pulled By: riversand963 fbshipit-source-id: de67846b95a4f1b88aa0a30c3d70c43cc68625b9	2020-05-04 17:45:49 -07:00
Levi Tamasi	a00ddf1574	Expose the set of live blob files from Version/VersionSet (#6785 ) Summary: The patch adds logic that returns the set of live blob files from `Version::AddLiveFiles` and `VersionSet::AddLiveFiles` (in addition to live table files), and also cleans up the code a bit, for example, by exposing only the numbers of table files as opposed to the earlier `FileDescriptor`s that no clients used. Moreover, the patch extends the `GetLiveFiles` API so that it also exposes blob files in the current version. Similarly to https://github.com/facebook/rocksdb/pull/6755, this is a building block for identifying and purging obsolete blob files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6785 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D21336210 Pulled By: ltamasi fbshipit-source-id: fc1aede8a49eacd03caafbc5f6f9ce43b6270821	2020-05-04 15:08:13 -07:00
sdong	680c416348	Avoid Swallowing Some File Consistency Checking Bugs (#6793 ) Summary: We are swallowing some file consistency checking failures. This is not expected. We are fixing two cases: DB reopen and manifest dump. More places are not fixed and need follow-up. Error from CheckConsistencyForDeletes() is also swallowed, which is not fixed in this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6793 Test Plan: Add a unit test to cover the reopen case. Reviewed By: riversand963 Differential Revision: D21366525 fbshipit-source-id: eb438a322237814e8d5125f916a3c6de97f39ded	2020-05-04 14:18:11 -07:00
sdong	6504ae0c4e	Remove the support of setting CompressionOptions.parallel_threads from string for now (#6782 ) Summary: The current way of implementing CompressionOptions.parallel_threads introduces a format change. We plan to change CompressionOptions's serailization format to a new JSON-like format, which would be another format change. We would like to consolidate the two format changes into one, rather than making some users to change twice. Hold CompressionOptions.parallel_threads from being supported by option string for now. Will add it back after the general CompressionOptions's format change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6782 Test Plan: Run all existing tests. Reviewed By: zhichao-cao Differential Revision: D21338614 fbshipit-source-id: bca2dac3cb37d4e6e64b52cbbe8ea749cd848685	2020-04-30 17:01:17 -07:00
anand76	ab13d43e1d	Pass a timeout to FileSystem for random reads (#6751 ) Summary: Calculate ```IOOptions::timeout``` using ```ReadOptions::deadline``` and pass it to ```FileSystem::Read/FileSystem::MultiRead```. This allows us to impose a tighter bound on the time taken by Get/MultiGet on FileSystem/Envs that support IO timeouts. Even on those that don't support, check in ```RandomAccessFileReader::Read``` and ```MultiRead``` and return ```Status::TimedOut()``` if the deadline is exceeded. For now, TableReader creation, which might do file opens and reads, are not covered. It will be implemented in another PR. Tests: Update existing unit tests to verify the correct timeout value is being passed Pull Request resolved: https://github.com/facebook/rocksdb/pull/6751 Reviewed By: riversand963 Differential Revision: D21285631 Pulled By: anand1976 fbshipit-source-id: d89af843e5a91ece866e87aa29438b52a65a8567	2020-04-30 14:50:39 -07:00
Levi Tamasi	fe238e5438	Keep track of obsolete blob files in VersionSet (#6755 ) Summary: The patch adds logic to keep track of obsolete blob files. A blob file becomes obsolete when the last `shared_ptr` that points to the corresponding `SharedBlobFileMetaData` object goes away, which, in turn, happens when the last `Version` that contains the blob file is destroyed. No longer needed blob files are added to the obsolete list in `VersionSet` using a custom deleter to avoid unnecessary coupling between `SharedBlobFileMetaData` and `VersionSet`. Obsolete blob files are returned by `VersionSet::GetObsoleteFiles` and stored in `JobContext`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6755 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D21233155 Pulled By: ltamasi fbshipit-source-id: 47757e06fdc0127f27ed57f51abd27893d9a7b7a	2020-04-30 11:25:51 -07:00
Zhichao Cao	8c694025e9	Fix potential size_t overflow in import_column_family (#6762 ) Summary: The issue is reported in https://github.com/facebook/rocksdb/issues/6753 . size_t is unsigned and if sorted_file.size() is 0, the end condition of i will be extremely large, cause segment fault in sorted_files[i] and sorted_files[i+1]. Added condition to fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6762 Test Plan: make asan_check Reviewed By: pdillinger Differential Revision: D21323063 Pulled By: zhichao-cao fbshipit-source-id: 56ce59201949ed319448228553202b8642c2cc3a	2020-04-30 08:40:42 -07:00
Derrick Pallas	5272305437	Fix FilterBench when RTTI=0 (#6732 ) Summary: The dynamic_cast in the filter benchmark causes release mode to fail due to no-rtti. Replace with static_cast_with_check. Signed-off-by: Derrick Pallas <derrick@pallas.us> Addition by peterd: Remove unnecessary 2nd template arg on all static_cast_with_check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6732 Reviewed By: ltamasi Differential Revision: D21304260 Pulled By: pdillinger fbshipit-source-id: 6e8eb437c4ca5a16dbbfa4053d67c4ad55f1608c	2020-04-29 13:09:23 -07:00
Peter Dillinger	8086e5e294	Fix LITE build (#6770 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6770 Test Plan: make LITE=1 check Reviewed By: ajkr Differential Revision: D21296261 Pulled By: pdillinger fbshipit-source-id: b6075cc13a6d6db48617b7e0e9ebeea9364dfd9f	2020-04-28 21:37:20 -07:00
anand76	335ea73e49	Fix a valgrind failure due to DBBasicTestMultiGetDeadline (#6756 ) Summary: Fix a valgrind failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6756 Test Plan: valgrind_test Reviewed By: pdillinger Differential Revision: D21284660 Pulled By: anand1976 fbshipit-source-id: 39bf1bd130b6adb585ddbf2f9aa2f53dbf666f80	2020-04-28 20:06:06 -07:00
mrambacher	618bf638aa	Add Functions to OptionTypeInfo (#6422 ) Summary: Added functions for parsing, serializing, and comparing elements to OptionTypeInfo. These functions allow all of the special cases that could not be handled directly in the map of OptionTypeInfo to be moved into the map. Using these functions, every type can be handled via the map rather than special cased. By adding these functions, the code for handling options can become more standardized (fewer special cases) and (eventually) handled completely by common classes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6422 Test Plan: pass make check Reviewed By: siying Differential Revision: D21269005 Pulled By: zhichao-cao fbshipit-source-id: 9ba71c721a38ebf9ee88259d60bd81b3282b9077	2020-04-28 18:04:26 -07:00
Peter Dillinger	b810e62b39	Clarifying comments in db.h (#6768 ) Summary: And fix a confusingly worded log message Pull Request resolved: https://github.com/facebook/rocksdb/pull/6768 Reviewed By: anand1976 Differential Revision: D21284527 Pulled By: pdillinger fbshipit-source-id: f03c1422c229a901c3a65e524740452349626164	2020-04-28 15:26:03 -07:00
Peter Dillinger	bae6f58696	Basic MultiGet support for partitioned filters (#6757 ) Summary: In MultiGet, access each applicable filter partition only once per batch, rather than for each applicable key. Also, * Fix Bloom stats for MultiGet * Fix/refactor MultiGetContext::Range::KeysLeft, including * Add efficient BitsSetToOne implementation * Assert that MultiGetContext::Range does not go beyond shift range Performance test: Generate db: $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 -partition_index_and_filters=true ... Before (middle performing run of three; note some missing Bloom stats): $ ./db_bench --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 \| egrep 'micros/op\|block.cache.filter.hit\|bloom.filter.(full\|use)\|number.multiget' multireadrandom : 26.403 micros/op 597517 ops/sec; (548427 of 671968 found) rocksdb.block.cache.filter.hit COUNT : 83443275 rocksdb.bloom.filter.useful COUNT : 0 rocksdb.bloom.filter.full.positive COUNT : 0 rocksdb.bloom.filter.full.true.positive COUNT : 7931450 rocksdb.number.multiget.get COUNT : 385984 rocksdb.number.multiget.keys.read COUNT : 12351488 rocksdb.number.multiget.bytes.read COUNT : 793145000 rocksdb.number.multiget.keys.found COUNT : 7931450 After (middle performing run of three): $ ./db_bench_new --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 \| egrep 'micros/op\|block.cache.filter.hit\|bloom.filter.(full\|use)\|number.multiget' multireadrandom : 21.024 micros/op 752963 ops/sec; (705188 of 863968 found) rocksdb.block.cache.filter.hit COUNT : 49856682 rocksdb.bloom.filter.useful COUNT : 45684579 rocksdb.bloom.filter.full.positive COUNT : 10395458 rocksdb.bloom.filter.full.true.positive COUNT : 9908456 rocksdb.number.multiget.get COUNT : 481984 rocksdb.number.multiget.keys.read COUNT : 15423488 rocksdb.number.multiget.bytes.read COUNT : 990845600 rocksdb.number.multiget.keys.found COUNT : 9908456 So that's about 25% higher throughput even for random keys Pull Request resolved: https://github.com/facebook/rocksdb/pull/6757 Test Plan: unit test included Reviewed By: anand1976 Differential Revision: D21243256 Pulled By: pdillinger fbshipit-source-id: 5644a1468d9e8c8575be02f4e04bc5d62dbbb57f	2020-04-28 14:49:34 -07:00
Yanqin Jin	d4398e08fc	Fix timestamp support for MultiGet (#6748 ) Summary: 1. Avoid nullptr dereference when passing timestamp to KeyContext creation. 2. Construct LookupKey correctly with timestamp when creating MultiGetContext. 3. Compare without timestamp when sorting KeyContexts. Fixes https://github.com/facebook/rocksdb/issues/6745 Test plan (dev server): make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6748 Reviewed By: pdillinger Differential Revision: D21258691 Pulled By: riversand963 fbshipit-source-id: 44e65b759c18b9986947783edf03be4f890bb004	2020-04-27 22:49:56 -07:00
Peter Dillinger	249eff0f30	Stats for redundant insertions into block cache (#6681 ) Summary: Since read threads do not coordinate on loading data into block cache, two threads between Lookup and Insert can end up loading and inserting the same data. This is particularly concerning with cache_index_and_filter_blocks since those are hot and more likely to be race targets if ejected from (or not pre-populated in) the cache. Particularly with moves toward disaggregated / network storage, the cost of redundant retrieval might be high, and we should at least have some hard statistics from which we can estimate impact. Example with full filter thrashing "cliff": $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 ... $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((130 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 \| egrep '^rocksdb.block.cache.(.add\|.redundant)' \| grep -v compress \| sort rocksdb.block.cache.add COUNT : 14181 rocksdb.block.cache.add.failures COUNT : 0 rocksdb.block.cache.add.redundant COUNT : 476 rocksdb.block.cache.data.add COUNT : 12749 rocksdb.block.cache.data.add.redundant COUNT : 18 rocksdb.block.cache.filter.add COUNT : 1003 rocksdb.block.cache.filter.add.redundant COUNT : 217 rocksdb.block.cache.index.add COUNT : 429 rocksdb.block.cache.index.add.redundant COUNT : 241 $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((120 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 \| egrep '^rocksdb.block.cache.(.add\|.redundant)' \| grep -v compress \| sort rocksdb.block.cache.add COUNT : 1182223 rocksdb.block.cache.add.failures COUNT : 0 rocksdb.block.cache.add.redundant COUNT : 302728 rocksdb.block.cache.data.add COUNT : 31425 rocksdb.block.cache.data.add.redundant COUNT : 12 rocksdb.block.cache.filter.add COUNT : 795455 rocksdb.block.cache.filter.add.redundant COUNT : 130238 rocksdb.block.cache.index.add COUNT : 355343 rocksdb.block.cache.index.add.redundant COUNT : 172478 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6681 Test Plan: Some manual testing (above) and unit test covering key metrics is included Reviewed By: ltamasi Differential Revision: D21134113 Pulled By: pdillinger fbshipit-source-id: c11497b5f00f4ffdfe919823904e52d0a1a91d87	2020-04-27 13:20:27 -07:00
Cheng Chang	0a77617820	Disable O_DIRECT in stress test when db directory does not support direct IO (#6727 ) Summary: In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test. This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727 Test Plan: export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0" make -j24 crash_test Reviewed By: zhichao-cao Differential Revision: D21139250 Pulled By: cheng-chang fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee	2020-04-25 00:01:03 -07:00
Yanqin Jin	e04f3bce4f	Update CURRENT file after best-efforts recovery (#6746 ) Summary: After a successful recovery, the CURRENT file should be updated to point to the valid MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6746 Test Plan: make check Reviewed By: anand1976 Differential Revision: D21189876 Pulled By: riversand963 fbshipit-source-id: 7537b49988c5c425ebe9505a5cc260de351ad79b	2020-04-23 16:21:09 -07:00
Levi Tamasi	bc51e33d9c	Make sure (Shared)BlobFileMetaData are owned by shared_ptrs (#6749 ) Summary: The patch makes a couple of small cleanups to `SharedBlobFileMetaData` and `BlobFileMetaData`: * It makes the constructors private and introduces factory methods to ensure these objects are always owned by `shared_ptr`s. Note that `SharedBlobFileMetaData` has an additional factory that takes a deleter object; we can utilize this to e.g. notify `VersionSet` when a blob file becomes obsolete (which is exactly when `SharedBlobFileMetaData` is destroyed). * It disables move operations explicitly instead of relying on them being suppressed because of a user-declared destructor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6749 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D21206947 Pulled By: ltamasi fbshipit-source-id: 9094c14cc335b3e226f883e5a0df4f87a5cdeb95	2020-04-23 13:44:29 -07:00
mrambacher	4cbc19d2a1	Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389 ) Summary: The methods in convenience.h are used to compare/convert objects to/from strings. There is a mishmash of parameters in use here with more needed in the future. This PR replaces those parameters with a single structure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389 Reviewed By: siying Differential Revision: D21163707 Pulled By: zhichao-cao fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd	2020-04-21 17:38:17 -07:00
anand76	c1ccd6b6af	Implement deadline support for MultiGet (#6710 ) Summary: Initial implementation of ReadOptions.deadline for MultiGet. If the request takes longer than the deadline, the keys not yet found will be returned with Status::TimedOut(). This implementation enforces the deadline in DBImpl, which is fairly high level. Its best effort and may not check the deadline after every key lookup, but may do so after a batch of keys. In subsequent stages, we will extend this to passing a timeout down to the FileSystem. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6710 Test Plan: Add new unit tests Reviewed By: riversand963 Differential Revision: D21149158 Pulled By: anand1976 fbshipit-source-id: 9f44eecffeb40873f5034ed59a66d21f9f88879e	2020-04-21 14:51:51 -07:00
Tomas Kolda	6ee66cf8f0	Prevents Table Cache to open same files more times (#6707 ) Summary: In highly concurrent requests table cache opens same file more times which lowers purpose of max_open_files. Fixes (https://github.com/facebook/rocksdb/issues/6699) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6707 Reviewed By: ltamasi Differential Revision: D21044965 fbshipit-source-id: f6e91d90b60dad86e518b5147021da42460ee1d2	2020-04-21 13:16:31 -07:00
Akanksha Mahajan	03a1d95db0	Set max_background_flushes dynamically (#6701 ) Summary: 1. Add changes so that max_background_flushes can be set dynamically. 2. Add a testcase DBOptionsTest.SetBackgroundFlushThreads which set the max_background_flushes dynamically using SetDBOptions. TestPlan: 1. make -j64 check 2. Using new testcase DBOptionsTest.SetBackgroundFlushThreads Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701 Reviewed By: ajkr Differential Revision: D21028010 Pulled By: akankshamahajan15 fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1	2020-04-20 16:19:02 -07:00
Peter Dillinger	31da5e34c1	C++20 compatibility (#6697 ) Summary: Based on https://github.com/facebook/rocksdb/issues/6648 (CLA Signed), but heavily modified / extended: * Implicit capture of this via [=] deprecated in C++20, and [=,this] not standard before C++20 -> now using explicit capture lists * Implicit copy operator deprecated in gcc 9 -> add explicit '= default' definition * std::random_shuffle deprecated in C++17 and removed in C++20 -> migrated to a replacement in RocksDB random.h API * Add the ability to build with different std version though -DCMAKE_CXX_STANDARD=11/14/17/20 on the cmake command line * Minimal rebuild flag of MSVC is deprecated and is forbidden with /std:c++latest (C++20) * Added MSVC 2019 C++11 & MSVC 2019 C++20 in AppVeyor * Added GCC 9 C++11 & GCC9 C++20 in Travis Pull Request resolved: https://github.com/facebook/rocksdb/pull/6697 Test Plan: make check and CI Reviewed By: cheng-chang Differential Revision: D21020318 Pulled By: pdillinger fbshipit-source-id: 12311be5dbd8675a0e2c817f7ec50fa11c18ab91	2020-04-20 13:24:25 -07:00
Peter Dillinger	45d2b4efca	Fix tabs and lint-ignores (#6734 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6734 Reviewed By: cheng-chang Differential Revision: D21134556 Pulled By: pdillinger fbshipit-source-id: 3636cc1d1333137b70031f8277458781c21631fb	2020-04-20 11:39:31 -07:00
Andrew Kryczka	6717ada899	Fix CF import with overlapping SST files (#6663 ) Summary: Invariant checking should use internal key comparator rather than `sstableKeyCompare()`. The latter was intended for checking whether a compaction input file's neighboring files need to be included in the same compaction. Using it for invariant checking was leading to false positives for files with overlapping endpoints. Fixes https://github.com/facebook/rocksdb/issues/6647. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6663 Test Plan: regression test Reviewed By: zhichao-cao Differential Revision: D20910466 Pulled By: ajkr fbshipit-source-id: f0b70dad7c4096fce635cab7a36f16e14f74ae3f	2020-04-16 13:16:06 -07:00
Mike Kolupaev	e45673dece	Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621 ) Summary: Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype. Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling. It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas. Note that the deferred value loading only happens for internal iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621 Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats. Reviewed By: siying Differential Revision: D20786930 Pulled By: al13n321 fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee	2020-04-15 17:40:44 -07:00
Yanqin Jin	eeb3cf3f58	Fix release build (#6690 ) Summary: Fix release build caused by variable defined but unused. Test plan (devserver) ``` make release ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6690 Reviewed By: cheng-chang Differential Revision: D20980571 Pulled By: riversand963 fbshipit-source-id: c3f3b13f81dce4bdb19876dc2e710d5902ff8a02	2020-04-11 22:04:04 -07:00
Yanqin Jin	0c05624d50	Compaction with timestamp: input boundaries (#6645 ) Summary: Towards making compaction logic compatible with user timestamp. When computing boundaries and overlapping ranges for inputs of compaction, We need to compare SSTs by user key without timestamp. Test plan (devserver): ``` make check ``` Several individual tests: ``` ./version_set_test --gtest_filter=VersionStorageInfoTimestampTest.GetOverlappingInputs ./db_with_timestamp_compaction_test ./db_with_timestamp_basic_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6645 Reviewed By: ltamasi Differential Revision: D20960012 Pulled By: riversand963 fbshipit-source-id: ad377fa9eb481bf7a8a3e1824aaade48cdc653a4	2020-04-10 16:05:49 -07:00

1 2 3 4 5 ...

3998 Commits