rocksdb

Author	SHA1	Message	Date
Akanksha Mahajan	30a5ed9c53	Update "num_data_read" stat in RetrieveMultipleBlocks (#7770 ) Summary: RetrieveMultipleBlocks which is used by MultiGet to read data blocks is not updating num_data_read stat in GetContextStats. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7770 Test Plan: make check -j64 Reviewed By: anand1976 Differential Revision: D25538982 Pulled By: akankshamahajan15 fbshipit-source-id: e3daedb035b1be8ab6af6f115cb3793ccc7b1ec6	2020-12-23 15:16:46 -08:00
cheng-chang	bdb7e544bd	Skip WALs according to MinLogNumberToKeep when creating checkpoint (#7789 ) Summary: In a stress test failure, we observe that a WAL is skipped when creating checkpoint, although its log number >= MinLogNumberToKeep(). This might happen in the following case: 1. when creating the checkpoint, there are 2 column families: CF0 and CF1, and there are 2 WALs: 1, 2; 2. CF0's log number is 1, CF0's active memtable is empty, CF1's log number is 2, CF1's active memtable is not empty, WAL 2 is not empty, the sequence number points to WAL 2; 2. the checkpoint process flushes CF0, since CF0' active memtable is empty, there is no need to SwitchMemtable, thus no new WAL will be created, so CF0's log number is now 2, concurrently, some data is written to CF0 and WAL 2; 3. the checkpoint process flushes CF1, WAL 3 is created and CF1's log number is now 3, CF0's log number is still 2 because CF0 is not empty and WAL 2 contains its unflushed data concurrently written in step 2; 4. the checkpoint process determines that WAL 1 and 2 are no longer needed according to [live_wal_files[i]->StartSequence() >= *sequence_number](https://github.com/facebook/rocksdb/blob/master/utilities/checkpoint/checkpoint_impl.cc#L388), so it skips linking them to the checkpoint directory; 5. but according to `MinLogNumberToKeep()`, WAL 2 still needs to be kept because CF0's log number is 2. If the checkpoint is reopened in read-only mode, and only read from the snapshot with the initial sequence number, then there will be no data loss or data inconsistency. But if the checkpoint is reopened and read from the most recent sequence number, suppose in step 3, there are also data concurrently written to CF1 and WAL 3, then the most recent sequence number refers to the latest entry in WAL 3, so the data written in step 2 should also be visible, but since WAL 2 is discarded, those data are lost. When tracking WAL in MANIFEST is enabled, when reopening the checkpoint, since WAL 2 is still tracked in MANIFEST as alive, but it's missing from the checkpoint directory, a corruption will be reported. This PR makes the checkpoint process to only skip a WAL if its log number < `MinLogNumberToKeep`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7789 Test Plan: watch existing tests to pass. Reviewed By: ajkr Differential Revision: D25662346 Pulled By: cheng-chang fbshipit-source-id: 136471095baa01886cf44809455cf855f24857a0	2020-12-23 11:33:26 -08:00
anand76	bd2645bc34	Update regression_test.sh to run multireadrandom benchmark (#7802 ) Summary: Update the regression_test.sh script to run the multireadrandom benchmark. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7802 Reviewed By: zhichao-cao Differential Revision: D25685482 Pulled By: anand1976 fbshipit-source-id: ef2973b551a1bbdbce198a0adf29fc277f3e65e2	2020-12-23 11:26:12 -08:00
Peter Dillinger	a727efca99	Remove flaky, redundant, and dubious DBTest.SparseMerge (#7800 ) Summary: This test would occasionally fail like this: WARNING: c:\users\circleci\project\db\db_test.cc(1343): error: Expected: (dbfull()->TEST_MaxNextLevelOverlappingBytes(handles_[1])) <= (20 * 1048576), actual: 33501540 vs 20971520 And being a super old test, it's not structured in a sound way. And it appears that DBTest2.MaxCompactionBytesTest is a better test of what SparseMerge was intended to test. In fact, SparseMerge fails if I set options.max_compaction_bytes = options.target_file_size_base * 1000; Thus, we are removing this negative-value test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7800 Test Plan: Q.E.D. Reviewed By: ajkr Differential Revision: D25693366 Pulled By: pdillinger fbshipit-source-id: 9da07d4dce0559547fc938b2163a2015e956c548	2020-12-23 11:08:12 -08:00
mrambacher	02418194d7	Add more tests for assert status checked (#7524 ) Summary: Added 10 more tests that pass the ASSERT_STATUS_CHECKED test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7524 Reviewed By: akankshamahajan15 Differential Revision: D24323093 Pulled By: ajkr fbshipit-source-id: 28d4106d0ca1740c3b896c755edf82d504b74801	2020-12-22 23:45:58 -08:00
Sergei Petrunia	daab7603f6	Range Locking: Implementation of range locking (#7506 ) Summary: Range Locking - an implementation based on the locktree library - Add a RangeTreeLockManager and RangeTreeLockTracker which implement range locking using the locktree library. - Point locks are handled as locks on single-point ranges. - Add a unit test: range_locking_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7506 Reviewed By: akankshamahajan15 Differential Revision: D25320703 Pulled By: cheng-chang fbshipit-source-id: f86347384b42ba2b0257d67eca0f45f806b69da7	2020-12-22 19:12:36 -08:00
sdong	f4db3e4119	Avoid to force PORTABLE mode in tools/regression_test.sh (#7806 ) Summary: Right now tools/regression_test.sh always builds RocksDB with PORTABLE=1. There isn't a reason for that. Remove it. Users can always specify PORTABLE through envirionement variable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7806 Test Plan: Run tools/regression_test.sh and see it still builds. Reviewed By: ajkr Differential Revision: D25687911 fbshipit-source-id: 1c0b03e5df890babc8b7d8af48b48774d9a4600c	2020-12-22 16:54:07 -08:00
Adam Retter	81592d9ffa	Add more tests to ASSERT_STATUS_CHECKED (4) (#7718 ) Summary: Fourth batch of adding more tests to ASSERT_STATUS_CHECKED. * db_range_del_test * db_write_test * random_access_file_reader_test * merge_test * external_sst_file_test * write_buffer_manager_test * stringappend_test * deletefile_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7718 Reviewed By: pdillinger Differential Revision: D25671608 fbshipit-source-id: 687a794e98a9e0cd5428ead9898ef05ced987c31	2020-12-22 15:09:39 -08:00
cheng-chang	41ff125a8a	SyncWAL shouldn't be supported in compacted db (#7788 ) Summary: `CompactedDB` is a kind of read-only DB, so it shouldn't support `SyncWAL`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7788 Test Plan: watch existing tests to pass. Reviewed By: akankshamahajan15 Differential Revision: D25661209 Pulled By: cheng-chang fbshipit-source-id: 9eb2cc3f73736dcc205c8410e5944aa203f002d3	2020-12-22 14:53:43 -08:00
Sergei Petrunia	1022090981	Apply the changes from: PS-5501 : Re-license PerconaFT 'locktree' to Apache V2 (#7801 ) Summary: commit d5178f513c0b4144a5ac9358ec0f6a3b54a28e76 Author: George O. Lorch III <george.lorch@percona.com> Date: Tue Mar 19 12:18:40 2019 -0700 PS-5501 : Re-license PerconaFT 'locktree' to Apache V2 - Fixed some incomplete relicensed files from previous round. - Added missing license text to some. - Relicensed more files to Apache V2 that locktree depends on. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7801 Reviewed By: jay-zhuang Differential Revision: D25682430 Pulled By: cheng-chang fbshipit-source-id: deb8a0de3e76f3638672997bfbd300e2fffbe5f5	2020-12-22 14:47:41 -08:00
sdong	9057d0a079	Minimize Timing Issue in test WALTrashCleanupOnOpen (#7796 ) Summary: We saw DBWALTestWithParam/DBWALTestWithParam.WALTrashCleanupOnOpen sometimes fail with: db/db_sst_test.cc:575: Failure Expected: (trash_log_count) >= (1), actual: 0 vs 1 The suspicious is that delete scheduling actually deleted all trash files based on rate, but it is not expected. This can be reproduced if we manually add sleep after DB is closed for serveral seconds. Minimize its chance by setting the delete rate to be lowest possible. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7796 Test Plan: The test doesn't fail with the manual sleeping anymore Reviewed By: anand1976 Differential Revision: D25675000 fbshipit-source-id: a39fd05e1a83719c41014e48843792e752368e22	2020-12-22 14:44:08 -08:00
Akanksha Mahajan	fbac1b3f80	Add tests in ASSERT_STATUS_CHECKED (#7793 ) Summary: add io_tracer_parser_test and prefetch_test under ASSERT_STATUS_CHECKED Pull Request resolved: https://github.com/facebook/rocksdb/pull/7793 Test Plan: ASSERT_STATUS_CHECKED=1 make check -j64 Reviewed By: jay-zhuang Differential Revision: D25673464 Pulled By: akankshamahajan15 fbshipit-source-id: 50e0b6f17160ddda206a521a7b47ee33e699a2d4	2020-12-22 10:31:13 -08:00
Peter Dillinger	4d897e51df	Migrate away from Travis+Linux+amd64 (#7791 ) Summary: This disables Linux/amd64 builds in Travis for PRs, and adds a gcc-10+c++20 build in CircleCI, which should fill out sufficient coverage vs. what we had in Travis Fixed a use of std::is_pod, which is deprecated in c++20 Fixed ++ on a volatile in db_repl_stress.cc, with bigger refactoring. Although ++ on this volatile was probably ok with one thread writer and one thread reader, the code was still overly complex. There was a deadcode check for error `if (replThread.no_read < dataPump.no_records)` which can be proven never to happen based on the structure of the code. It infinite loops instead for the case intended to be checked. I just simplified the code for what should be the same checking power. Also most configurations seem to be using make parallelism = 2 * vcores, so fixing / using that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7791 Test Plan: CI and `while ./db_repl_stress; do echo again; done` for a while Reviewed By: siying Differential Revision: D25669834 Pulled By: pdillinger fbshipit-source-id: b2c688053d0b1d52c989903449d3cd27a04130d6	2020-12-22 00:20:57 -08:00
Jay Zhuang	861b0d1a49	Fix Windows build in block_cache_tracer_test (#7795 ) Summary: The test was added to cmake in https://github.com/facebook/rocksdb/issues/7783 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7795 Reviewed By: akankshamahajan15 Differential Revision: D25671010 Pulled By: jay-zhuang fbshipit-source-id: 2146ff9559cdd7266c4d78476672488c62654a6d	2020-12-21 15:22:16 -08:00
Jay Zhuang	fd0d35d390	Fix block_cache_test failure (#7783 ) Summary: `block_cache_tracer_test` and `block_cache_trace_analyzer_test` are using the same test directory. Which causes build failure if these 2 tests are running in parallel, for example: https://app.circleci.com/pipelines/github/facebook/rocksdb/5211/workflows/8639afbe-9fec-43e2-a6a4-6d47ea9cbcbe/jobs/74598/tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/7783 Reviewed By: akankshamahajan15 Differential Revision: D25656762 Pulled By: jay-zhuang fbshipit-source-id: 68aa020aa5b4b3bce324315edecb4e1a60cc18e6	2020-12-21 08:47:08 -08:00
Jay Zhuang	a8aeefd0fd	Update release version to 6.16 (#7782 ) Summary: Update release version to 6.8 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7782 Reviewed By: siying Differential Revision: D25648579 Pulled By: jay-zhuang fbshipit-source-id: c536d606868b95c5fb2ae8f19c17eb259d67bc51	2020-12-19 12:39:21 -08:00
Peter Dillinger	4d1ac19e3d	aggregated-table-properties with GetMapProperty (#7779 ) Summary: So that we can more easily get aggregate live table data such as total filter, index, and data sizes. Also adds ldb support for getting properties Also fixed some missing/inaccurate related comments in db.h For example: $ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties rocksdb.aggregated-table-properties.data_size: 102871 rocksdb.aggregated-table-properties.filter_size: 0 rocksdb.aggregated-table-properties.index_partitions: 0 rocksdb.aggregated-table-properties.index_size: 2232 rocksdb.aggregated-table-properties.num_data_blocks: 100 rocksdb.aggregated-table-properties.num_deletions: 0 rocksdb.aggregated-table-properties.num_entries: 15000 rocksdb.aggregated-table-properties.num_merge_operands: 0 rocksdb.aggregated-table-properties.num_range_deletions: 0 rocksdb.aggregated-table-properties.raw_key_size: 288890 rocksdb.aggregated-table-properties.raw_value_size: 198890 rocksdb.aggregated-table-properties.top_level_index_size: 0 $ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties-at-level1 rocksdb.aggregated-table-properties-at-level1.data_size: 80909 rocksdb.aggregated-table-properties-at-level1.filter_size: 0 rocksdb.aggregated-table-properties-at-level1.index_partitions: 0 rocksdb.aggregated-table-properties-at-level1.index_size: 1787 rocksdb.aggregated-table-properties-at-level1.num_data_blocks: 81 rocksdb.aggregated-table-properties-at-level1.num_deletions: 0 rocksdb.aggregated-table-properties-at-level1.num_entries: 12466 rocksdb.aggregated-table-properties-at-level1.num_merge_operands: 0 rocksdb.aggregated-table-properties-at-level1.num_range_deletions: 0 rocksdb.aggregated-table-properties-at-level1.raw_key_size: 238210 rocksdb.aggregated-table-properties-at-level1.raw_value_size: 163414 rocksdb.aggregated-table-properties-at-level1.top_level_index_size: 0 $ Pull Request resolved: https://github.com/facebook/rocksdb/pull/7779 Test Plan: Added a test to ldb_test.py Reviewed By: jay-zhuang Differential Revision: D25653103 Pulled By: pdillinger fbshipit-source-id: 2905469a08a64dd6b5510cbd7be2e64d3234d6d3	2020-12-19 08:00:14 -08:00
Cheng Chang	fbce7a3808	Track WAL obsoletion when updating empty CF's log number (#7781 ) Summary: In the write path, there is an optimization: when a new WAL is created during SwitchMemtable, we update the internal log number of the empty column families to the new WAL. `FindObsoleteFiles` marks a WAL as obsolete if the WAL's log number is less than `VersionSet::MinLogNumberWithUnflushedData`. After updating the empty column families' internal log number, `VersionSet::MinLogNumberWithUnflushedData` might change, so some WALs might become obsolete to be purged from disk. For example, consider there are 3 column families: 0, 1, 2: 1. initially, all the column families' log number is 1; 2. write some data to cf0, and flush cf0, but the flush is pending; 3. now a new WAL 2 is created; 4. write data to cf1 and WAL 2, now cf0's log number is 1, cf1's log number is 2, cf2's log number is 2 (because cf1 and cf2 are empty, so their log numbers will be set to the highest log number); 5. now cf0's flush hasn't finished, flush cf1, a new WAL 3 is created, and cf1's flush finishes, now cf0's log number is 1, cf1's log number is 3, cf2's log number is 3, since WAL 1 still contains data for the unflushed cf0, no WAL can be deleted from disk; 6. now cf0's flush finishes, cf0's log number is 2 (because when cf0 was switching memtable, WAL 3 does not exist yet), cf1's log number is 3, cf2's log number is 3, so WAL 1 can be purged from disk now, but WAL 2 still cannot because `MinLogNumberToKeep()` is 2; 7. write data to cf2 and WAL 3, because cf0 is empty, its log number is updated to 3, so now cf0's log number is 3, cf1's log number is 3, cf2's log number is 3; 8. now if the background threads want to purge obsolete files from disk, WAL 2 can be purged because `MinLogNumberToKeep()` is 3. But there are only two flush results written to MANIFEST: the first is for flushing cf1, and the `MinLogNumberToKeep` is 1, the second is for flushing cf0, and the `MinLogNumberToKeep` is 2. So without this PR, if the DB crashes at this point and try to recover, `WalSet` will still expect WAL 2 to exist. When WAL tracking is enabled, we assume WALs will only become obsolete after a flush result is written to MANIFEST in `MemtableList::TryInstallMemtableFlushResults` (or its atomic flush counterpart). The above situation breaks this assumption. This PR tracks WAL obsoletion if necessary before updating the empty column families' log numbers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7781 Test Plan: watch existing tests and stress tests to pass. `make -j48 blackbox_crash_test` on devserver Reviewed By: ltamasi Differential Revision: D25631695 Pulled By: cheng-chang fbshipit-source-id: ca7fff967bdb42204b84226063d909893bc0a4ec	2020-12-18 21:34:36 -08:00
Adam Retter	62afa968c2	Fix various small build issues, Java API naming (#7776 ) Summary: * Compatibility with older GCC. * Compatibility with older jemalloc libraries. * Remove Docker warning when building i686 binaries. * Fix case inconsistency in Java API naming (potential update to HISTORY.md deferred) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7776 Reviewed By: akankshamahajan15 Differential Revision: D25607235 Pulled By: pdillinger fbshipit-source-id: 7ab0fb7fa7a34e97ed0bec991f5081acb095777d	2020-12-18 16:12:26 -08:00
sdong	75e4af14e0	Update code comment for options.ttl (#7775 ) Summary: The behavior of options.ttl has been updated long ago but we didn't update the code comments. Also update the periodic compaction's comment. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7775 Test Plan: See it can still build through CI. Reviewed By: ajkr Differential Revision: D25592015 fbshipit-source-id: b1db18b6787e7048ce6aedcbc3bb44493c9fc49b	2020-12-18 15:29:03 -08:00
Peter Dillinger	239d17a19c	Support optimize_filters_for_memory for Ribbon filter (#7774 ) Summary: Primarily this change refactors the optimize_filters_for_memory code for Bloom filters, based on malloc_usable_size, to also work for Ribbon filters. This change also replaces the somewhat slow but general BuiltinFilterBitsBuilder::ApproximateNumEntries with implementation-specific versions for Ribbon (new) and Legacy Bloom (based on a recently deleted version). The reason is to emphasize speed in ApproximateNumEntries rather than 100% accuracy. Justification: ApproximateNumEntries (formerly CalculateNumEntry) is only used by RocksDB for range-partitioned filters, called each time we start to construct one. (In theory, it should be possible to reuse the estimate, but the abstractions provided by FilterPolicy don't really make that workable.) But this is only used as a heuristic estimate for hitting a desired partitioned filter size because of alignment to data blocks, which have various numbers of unique keys or prefixes. The two factors lead us to prioritize reasonable speed over 100% accuracy. optimize_filters_for_memory adds extra complication, because precisely calculating num_entries for some allowed number of bytes depends on state with optimize_filters_for_memory enabled. And the allocator-agnostic implementation of optimize_filters_for_memory, using malloc_usable_size, means we would have to actually allocate memory, many times, just to precisely determine how many entries (keys) could be added and stay below some size budget, for the current state. (In a draft, I got this working, and then realized the balance of speed vs. accuracy was all wrong.) So related to that, I have made CalculateSpace, an internal-only API only used for testing, non-authoritative also if optimize_filters_for_memory is enabled. This simplifies some code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7774 Test Plan: unit test updated, and for FilterSize test, range of tested values is greatly expanded (still super fast) Also tested `db_bench -benchmarks=fillrandom,stats -bloom_bits=10 -num=1000000 -partition_index_and_filters -format_version=5 [-optimize_filters_for_memory] [-use_ribbon_filter]` with temporary debug output of generated filter sizes. Bloom+optimize_filters_for_memory: 1 Filter size: 197 (224 in memory) 134 Filter size: 3525 (3584 in memory) 107 Filter size: 4037 (4096 in memory) Total on disk: 904,506 Total in memory: 918,752 Ribbon+optimize_filters_for_memory: 1 Filter size: 3061 (3072 in memory) 110 Filter size: 3573 (3584 in memory) 58 Filter size: 4085 (4096 in memory) Total on disk: 633,021 (-30.0%) Total in memory: 634,880 (-30.9%) Bloom (no offm): 1 Filter size: 261 (320 in memory) 1 Filter size: 3333 (3584 in memory) 240 Filter size: 3717 (4096 in memory) Total on disk: 895,674 (-1% on disk vs. +offm; known tolerable overhead of offm) Total in memory: 986,944 (+7.4% vs. +offm) Ribbon (no offm): 1 Filter size: 2949 (3072 in memory) 1 Filter size: 3381 (3584 in memory) 167 Filter size: 3701 (4096 in memory) Total on disk: 624,397 (-30.3% vs. Bloom) Total in memory: 690,688 (-30.0% vs. Bloom) Note that optimize_filters_for_memory is even more effective for Ribbon filter than for cache-local Bloom, because it can close the unused memory gap even tighter than Bloom filter, because of 16 byte increments for Ribbon vs. 64 byte increments for Bloom. Reviewed By: jay-zhuang Differential Revision: D25592970 Pulled By: pdillinger fbshipit-source-id: 606fdaa025bb790d7e9c21601e8ea86e10541912	2020-12-18 14:31:03 -08:00
Zhichao Cao	04b3524ad0	Inject the random write error to stress test (#7653 ) Summary: Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653 Test Plan: pass db_stress and python3 db_crashtest.py blackbox Reviewed By: ajkr Differential Revision: D25354132 Pulled By: zhichao-cao fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669	2020-12-17 11:52:28 -08:00
Akanksha Mahajan	99f5a800c3	Fix clang_analyze error (#7777 ) Summary: Fix clang_analyze error Pull Request resolved: https://github.com/facebook/rocksdb/pull/7777 Test Plan: USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j64 analyze Reviewed By: jay-zhuang Differential Revision: D25601675 Pulled By: akankshamahajan15 fbshipit-source-id: 30f58cf4d575a2d546c455fb43e856455eb72a07	2020-12-16 21:34:41 -08:00
Adam Retter	29d12748b0	Fix failing RocksJava test compilation and add CI (#7769 ) Summary: * Fixes a Java test compilation issue on macOS * Cleans up CircleCI RocksDBJava build config * Adds CircleCI for RocksDBJava on MacOS * Ensures backwards compatibility with older macOS via CircleCI * Fixes RocksJava static builds ordering * Adds missing RocksJava static builds to CircleCI for Mac and Linux * Improves parallelism in RocksJava builds * Reduces the size of the machines used for RocksJava CircleCI as they don't need to be so large (Saves credits) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7769 Reviewed By: akankshamahajan15 Differential Revision: D25601293 Pulled By: pdillinger fbshipit-source-id: 0a0bb9906f65438fe143487d78e37e1947364d08	2020-12-16 16:00:02 -08:00
Burton Li	2021392e25	Do not full scan obsolete files on compaction busy (#7739 ) Summary: When ConcurrentTaskLimiter is enabled and there are too many outstanding compactions, BackgroundCompaction returns Status::Busy(), which shouldn't be treat as compaction failure. This caused performance issue when outstanding compactions reached the limit. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7739 Reviewed By: cheng-chang Differential Revision: D25508319 Pulled By: ltamasi fbshipit-source-id: 3b181b16ada0ca3393cfa3a7412985764e79c719	2020-12-15 13:51:10 -08:00
Jay Zhuang	a0e4421e81	Log sst number in Corruption status (#7767 ) Summary: sst file number in corruption error would be very useful for debugging Pull Request resolved: https://github.com/facebook/rocksdb/pull/7767 Reviewed By: zhichao-cao Differential Revision: D25485872 Pulled By: jay-zhuang fbshipit-source-id: 67315b582cedeefbce6676015303ebe5bf6526a3	2020-12-14 14:07:52 -08:00
Levi Tamasi	1afbd1948c	Add initial blob support to batched MultiGet (#7766 ) Summary: The patch adds initial support for reading blobs to the batched `MultiGet` API. The current implementation simply retrieves the blob values as the blob indexes are encountered; that is, reads from blob files are currently not batched. (This will be optimized in a separate phase.) In addition, the patch removes some dead code related to BlobDB from the batched `MultiGet` implementation, namely the `is_blob` / `is_blob_index` flags that are passed around in `DBImpl` and `MemTable` / `MemTableListVersion`. These were never hooked up to anything and wouldn't work anyways, since a single flag is not sufficient to communicate the "blobness" of multiple key-values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7766 Test Plan: `make check` Reviewed By: jay-zhuang Differential Revision: D25479290 Pulled By: ltamasi fbshipit-source-id: 7aba2d290e31876ee592bcf1adfd1018713a8000	2020-12-14 13:48:22 -08:00
Peter Dillinger	003e72b201	Use size_t for filter APIs, protect against overflow (#7726 ) Summary: Deprecate CalculateNumEntry and replace with ApproximateNumEntries (better name) using size_t instead of int and uint32_t, to minimize confusing casts and bad overflow behavior (possible though probably not realistic). Bloom sizes are now explicitly capped at max size supported by implementations: just under 4GiB for fv=5 Bloom, and just under 512MiB for fv<5 Legacy Bloom. This hardening could help to set up for fuzzing. Also, since RocksDB only uses this information as an approximation for trying to hit certain sizes for partitioned filters, it's more important that the function be reasonably fast than for it to be completely accurate. It's hard enough to be 100% accurate for Ribbon (currently reversing CalculateSpace) that adding optimize_filters_for_memory into the mix is just not worth trying to be 100% accurate for num entries for bytes. Also: - Cleaned up filter_policy.h to remove MSVC warning handling and potentially unsafe use of exception for "not implemented" - Correct the number of entries limit beyond which current Ribbon implementation falls back on Bloom instead. - Consistently use "num_entries" rather than "num_entry" - Remove LegacyBloomBitsBuilder::CalculateNumEntry as it's essentially obsolete from general implementation BuiltinFilterBitsBuilder::CalculateNumEntries. - Fix filter_bench to skip some tests that don't make sense when only one or a small number of filters has been generated. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7726 Test Plan: expanded existing unit tests for CalculateSpace / ApproximateNumEntries. Also manually used filter_bench to verify Legacy and fv=5 Bloom size caps work (much too expensive for unit test). Note that the actual bits per key is below requested due to space cap. $ ./filter_bench -impl=0 -bits_per_key=20 -average_keys_per_filter=256000000 -vary_key_count_ratio=0 -m_keys_total_max=256 -allow_bad_fp_rate ... Total size (MB): 511.992 Bits/key stored: 16.777 ... $ ./filter_bench -impl=2 -bits_per_key=20 -average_keys_per_filter=2000000000 -vary_key_count_ratio=0 -m_keys_total_max=2000 ... Total size (MB): 4096 Bits/key stored: 17.1799 ... $ Reviewed By: jay-zhuang Differential Revision: D25239800 Pulled By: pdillinger fbshipit-source-id: f94e6d065efd31e05ec630ae1a82e6400d8390c4	2020-12-11 22:18:12 -08:00
Cheng Chang	491779514e	Update SstFileWriter fuzzer to iterate and check all key-value pairs (#7761 ) Summary: as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/7761 Test Plan: cd fuzz && make sst_file_writer_fuzzer && ./sst_file_writer_fuzzer Reviewed By: pdillinger Differential Revision: D25430802 Pulled By: cheng-chang fbshipit-source-id: 01436307df6f4c434bb608f44e1c8e4a1119f94f	2020-12-11 16:09:10 -08:00
Peter Dillinger	b1ee191405	Fix memory leak for ColumnFamily drop with live iterator (#7749 ) Summary: Uncommon bug seen by ASAN with ColumnFamilyTest.LiveIteratorWithDroppedColumnFamily, if the last two references to a ColumnFamilyData are both SuperVersions (during InstallSuperVersion). The fix is to use UnrefAndTryDelete even in SuperVersion::Cleanup but with a parameter to avoid re-entering Cleanup on the same SuperVersion being cleaned up. ColumnFamilyData::Unref is considered unsafe so removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7749 Test Plan: ./column_family_test --gtest_filter=LiveIter --gtest_repeat=100 Reviewed By: jay-zhuang Differential Revision: D25354304 Pulled By: pdillinger fbshipit-source-id: e78f3a3f67c40013b8432f31d0da8bec55c5321c	2020-12-11 11:18:21 -08:00
Jay Zhuang	07c0fc002a	Fix pyenv-version-name not found issue (#7768 ) Summary: Remove the workaround for that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7768 Reviewed By: cheng-chang Differential Revision: D25487150 Pulled By: jay-zhuang fbshipit-source-id: 4514dfc64eb56d2dad48f9a24176c21af8107def	2020-12-10 20:37:51 -08:00
Cheng Chang	fd7d8dc56e	Do not log unnecessary WAL obsoletion events (#7765 ) Summary: min_wal_number_to_keep should not be decreasing, if it does not increase, then there is no need to log the WAL obsoletions in MANIFEST since a previous one has been logged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7765 Test Plan: watch existing tests and stress tests to pass Reviewed By: pdillinger Differential Revision: D25462542 Pulled By: cheng-chang fbshipit-source-id: 0085fcb6edf5cf2b0fc32f9932a7566f508768ff	2020-12-10 12:55:49 -08:00
Azat Khuzhin	40f2b65008	Eliminate possible race between LockFile() vs UnlockFile() (#7721 ) Summary: LockFile() accessing LockHoldingInfo (element of locked_files) by reference after mutex_locked_files had been released. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7721 Reviewed By: pdillinger Differential Revision: D25431839 Pulled By: jay-zhuang fbshipit-source-id: eefee93f12a8016a98e2466e442af2605b3e2a5e	2020-12-10 09:35:11 -08:00
Adam Retter	8ff6557e7f	Add further tests to ASSERT_STATUS_CHECKED (2) (#7698 ) Summary: Second batch of adding more tests to ASSERT_STATUS_CHECKED. * external_sst_file_basic_test * checkpoint_test * db_wal_test * db_block_cache_test * db_logical_block_size_cache_test * db_blob_index_test * optimistic_transaction_test * transaction_test * point_lock_manager_test * write_prepared_transaction_test * write_unprepared_transaction_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7698 Reviewed By: cheng-chang Differential Revision: D25441664 Pulled By: pdillinger fbshipit-source-id: 9e78867f32321db5d4833e95eb96c5734526ef00	2020-12-09 21:21:16 -08:00
Michael Lee	8e2749fd3a	Fix use of positional args in BUCK rules (#7760 ) Summary: Prefer to use keyword args rather than positional args for Buck rules. This appears to be the only remaining instance for `custom_unittest` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7760 Test Plan: Search for other instances of `custom_unittest` without `name` Reviewed By: cheng-chang Differential Revision: D25439887 Pulled By: mzlee fbshipit-source-id: 518c541a5c01207c7b0c1f7322addf5cc4f09f92	2020-12-09 19:25:31 -08:00
Manuel Ung	71239908cf	Invalidate iterator on transaction clear (#7733 ) Summary: Some clients do not close their iterators until after the transaction finishes. To handle this case, we will invalidate any iterators on transaction clear. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7733 Reviewed By: cheng-chang Differential Revision: D25261158 Pulled By: lth fbshipit-source-id: b91320f00c54cbe0e6882b794b34f3bb5640dbc0	2020-12-09 19:13:22 -08:00
Cheng Chang	80159f6e0b	Carry over min_log_number_to_keep_2pc in new MANIFEST (#7747 ) Summary: When two phase commit is enabled, `VersionSet::min_log_number_to_keep_2pc` is set during flush. But when a new MANIFEST is created, the `min_log_number_to_keep_2pc` is not carried over to the new MANIFEST. So if a new MANIFEST is created and then DB is reopened, the `min_log_number_to_keep_2pc` will be lost. This may cause DB recovery errors. The bug is reproduced in a new unit test in `version_set_test.cc`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7747 Test Plan: The new unit test in `version_set_test.cc` should pass. Reviewed By: jay-zhuang Differential Revision: D25350661 Pulled By: cheng-chang fbshipit-source-id: eee890d5b19f15769069670692e270ae31044ece	2020-12-09 19:07:25 -08:00
anand76	8a1488efbf	Ensure that MultiGet works properly with compressed cache (#7756 ) Summary: Ensure that when direct IO is enabled and a compressed block cache is configured, MultiGet inserts compressed data blocks into the compressed block cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7756 Test Plan: Add unit test to db_basic_test Reviewed By: cheng-chang Differential Revision: D25416240 Pulled By: anand1976 fbshipit-source-id: 75d57526370c9c0a45ff72651f3278dbd8a9086f	2020-12-09 17:01:13 -08:00
Cheng Chang	3c2a448856	Add a test for disabling tracking WAL (#7757 ) Summary: If WAL tracking was enabled, then disabled during reopen, the previously tracked WALs should be removed from MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7757 Test Plan: a new unit test `DBBasicTest.DisableTrackWal` is added. Reviewed By: jay-zhuang Differential Revision: D25410508 Pulled By: cheng-chang fbshipit-source-id: 9d8d9e665066135930a7c1035bb8c2f68bded6a0	2020-12-09 16:58:26 -08:00
Cheng Chang	89cc06b3e7	Add a new db_map_fuzzer (#7762 ) Summary: Execute randomly generated operations on both a DB and a std::map, then reopen the DB and make sure that iterating the DB produces the same key-value pairs as iterating through the std::map. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7762 Test Plan: cd fuzz && make db_map_fuzzer && ./db_map_fuzzer Reviewed By: pdillinger Differential Revision: D25437485 Pulled By: cheng-chang fbshipit-source-id: 3a93f7efd046b194193e45d2ab1ad81565510781	2020-12-09 16:26:35 -08:00
Cheng Chang	efe827baf0	Always track WAL obsoletion (#7759 ) Summary: Currently, when a WAL becomes obsolete after flushing, if VersionSet::WalSet does not contain the WAL, we do not track the WAL obsoletion event in MANIFEST. But consider this case: * WAL 10 is synced, a VersionEdit is LogAndApplied to MANIFEST to log this WAL addition event, but the VersionEdit is not applied to WalSet yet since its corresponding ManifestWriter is still pending in the write queue; * Since the above ManifestWriter is blocking, the LogAndApply will block on a conditional variable and release the db mutex, so another LogAndApply can proceed to enqueue other VersionEdits concurrently; * Now flush happens, and WAL 10 becomes obsolete, although WalSet does not contain WAL 10 yet, we should call LogAndApply to enqueue a VersionEdit to indicate the obsoletion of WAL 10; * otherwise, when the queued edit indicating WAL 10 addition is logged to MANIFEST, and DB crashes and reopens, the WAL 10 might have been removed from disk, but it still exists in MANIFEST. This PR changes the behavior to: always `LogAndApply` any WAL addition or obsoletion event, without considering the order issues caused by concurrency, but when applying the edits to `WalSet`, do not add the WALs if they are already obsolete. In this approach, the logical events of WAL addition and obsoletion are always tracked in MANIFEST, so we can inspect the MANIFEST and know all the previous WAL events, but we choose to ignore certain events due to the concurrency issues such as the case above, or the case in https://github.com/facebook/rocksdb/pull/7725. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7759 Test Plan: make check Reviewed By: pdillinger Differential Revision: D25423089 Pulled By: cheng-chang fbshipit-source-id: 9cb9a7fbc1875bf954f2a42f9b6cfd6d49a7b21c	2020-12-09 16:02:12 -08:00
Sergei Petrunia	98236fb10e	LockTree library, originally from PerconaFT (#7753 ) Summary: To be used for implementing Range Locking. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7753 Reviewed By: zhichao-cao Differential Revision: D25378980 Pulled By: cheng-chang fbshipit-source-id: 801a9c5cd92a84654ca2586b73e8f69001e89320	2020-12-09 12:10:57 -08:00
Adam Retter	7b2216c906	Add further tests to ASSERT_STATUS_CHECKED (1) (#7679 ) Summary: First batch of adding more tests to ASSERT_STATUS_CHECKED. * db_iterator_test * db_memtable_test * db_merge_operator_test * db_merge_operand_test * write_batch_test * write_batch_with_index_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7679 Reviewed By: ajkr Differential Revision: D25399270 Pulled By: pdillinger fbshipit-source-id: 3017d0a686aec5cd2d743fc2acbbf75df239f3ba	2020-12-08 15:55:04 -08:00
pkubaj	66e54c5984	Fix build on FreeBSD/powerpc64(le) (#7732 ) Summary: To build on FreeBSD, arch_ppc_probe needs to be adapted to FreeBSD. Since FreeBSD uses elf_aux_info as an getauxval equivalent, use it and include necessary headers: - machine/cpu.h for PPC_FEATURE2_HAS_VEC_CRYPTO, - sys/auxv.h for elf_aux_info, - sys/elf_common.h for AT_HWCAP2. elf_aux_info isn't checked for being available, because it's available since FreeBSD 12.0. rocksdb assumes using Clang on FreeBSD, but powerpc* platforms switch to Clang only since 13.0. This patch makes rocksdb build on FreeBSD on powerpc64 and powerpc64le platforms. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7732 Reviewed By: ltamasi Differential Revision: D25399194 Pulled By: pdillinger fbshipit-source-id: 9c905147d75f98cd2557dd2f86a940b8e6c5afcd	2020-12-08 15:31:56 -08:00
Vincent Milum Jr	93c6c18cf9	Adding ARM AT_HWCAP support for FreeBSD (#7750 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7750 Reviewed By: ltamasi Differential Revision: D25400609 Pulled By: pdillinger fbshipit-source-id: 13b15e2f490acc011b648fbd9615ea8e580cccc7	2020-12-08 13:33:21 -08:00
Cheng Chang	07030c6f4a	Do not track obsolete WALs in MANIFEST even if they are synced (#7725 ) Summary: Consider the case: 1. All column families are flushed, so all WALs become obsolete, but no WAL is removed from disk yet because the removal is asynchronous, a VersionEdit is written to MANIFEST indicating that WALs before a certain WAL number are obsolete, let's say this number is 3; 2. `SyncWAL` is called, so all the on-disk WALs are synced, and if track_and_verify_wal_in_manifest=true, the WALs will be tracked in MANIFEST, let's say the WAL numbers are 1 and 2; 3. DB crashes; 4. During DB recovery, when replaying MANIFEST, we first see that WAL with number < 3 are obsolete, then we see that WAL 1 and 2 are synced, so according to current implementation of `WalSet`, the `WalSet` will be recovered to include WAL 1 and 2; 5. WAL 1 and 2 are asynchronously deleted from disk, then the WAL verification algorithm fails with `Corruption: missing WAL`. The above case is reproduced in a new unit test `DBBasicTestTrackWal::DoNotTrackObsoleteWal`. The fix is to maintain the upper bound of the obsolete WAL numbers, any WAL with number less than the maintained number is considered to be obsolete, so shouldn't be tracked even if they are later synced. The number is maintained in `WalSet`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7725 Test Plan: 1. a new unit test `DBBasicTestTrackWal::DoNotTrackObsoleteWal` is added. 2. run `make crash_test` on devserver. Reviewed By: riversand963 Differential Revision: D25238914 Pulled By: cheng-chang fbshipit-source-id: f5dccd57c3d89f19565ec5731f2d42f06d272b72	2020-12-08 10:58:04 -08:00
Yanqin Jin	11c4be2222	Refactor ProcessManifestWrites a little bit (#7751 ) Summary: This PR removes a nested loop inside ProcessManifestWrites. The new implementation has the same behavior as the old code with simpler logic and lower complexity. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7751 Test Plan: make check Run make crash_test on devserver and succeeds 3 times. Reviewed By: ltamasi Differential Revision: D25363526 Pulled By: riversand963 fbshipit-source-id: 27e681949dacd7501a752e5e517b9e85b54ccb2e	2020-12-08 02:37:38 -08:00
Sergei Petrunia	d8bd9fc7b3	Range Locking: Allow different LockManagers, add Range Lock definitions (#7443 ) Summary: This PR has two commits: 1. Modify the code to allow different Lock Managers (of any kind) to be used. It is implied that a LockManager uses its own custom LockTracker. 2. Add definitions for Range Locking (class Endpoint and GetRangeLock() function. cheng-chang, is this what you've had in mind (should the PR have both item 1 and item 2?) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7443 Reviewed By: zhichao-cao Differential Revision: D24123172 Pulled By: cheng-chang fbshipit-source-id: c6548ad6d4cc3c25f68d13b29147bc6fdf357185	2020-12-07 20:18:07 -08:00
mrambacher	db03172d08	Change ErrorHandler methods to return const Status& (#7539 ) Summary: This change eliminates the need for a lot of the PermitUncheckedError calls on return from ErrorHandler methods. The calls are no longer needed as the status is returned as a reference rather than a copy. Additionally, this means that the originating status (recovery_error_, bg_error_) is not cleared implicitly as a result of calling one of these methods. For this class, I do not know if the proper behavior should be to call PermitUncheckedError in the destructor or if the checked state should be cleared when the status is cleared. I did tests both ways. Without the code in the destructor, the status will need to be cleared in at least some of the places where it is set to OK. When running tests, I found no instances where this class was destructed with a non-OK, non-checked Status. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7539 Reviewed By: anand1976 Differential Revision: D25340565 Pulled By: pdillinger fbshipit-source-id: 1730c035c81a475875ea745226112030ec25136c	2020-12-07 20:11:35 -08:00
Levi Tamasi	8a06fe278f	Do not use ASSERT_OK in child threads in ExternalSstFileTest.PickedLevelBug (#7754 ) Summary: `googletest` uses exceptions to communicate assertion failures when `GTEST_THROW_ON_FAILURE` is set, which does not go well with `std::thread`s, since an exception escaping the top-level function of an `std::thread` object or an `std::thread` getting destroyed without having been `join`ed or `detach`ed first results in a call to `std::terminate`. The patch fixes this by moving the `Status` assertions of background operations in `ExternalSstFileTest.PickedLevelBug` to the main thread. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7754 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D25383808 Pulled By: ltamasi fbshipit-source-id: 32fb2721e5169ec898d218900bc0d83eead45d03	2020-12-07 17:37:17 -08:00

1 2 3 4 5 ...

9667 Commits