41fe221a06
11060 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Akanksha Mahajan
|
41fe221a06 |
Update History.md for #9922 (#10092)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10092 Reviewed By: riversand963 Differential Revision: D36832311 Pulled By: akankshamahajan15 fbshipit-source-id: 8fb1cf90b1d4dddebbfbeebeddb15f6905968e9b |
||
Akanksha Mahajan
|
405a35f8c3 |
Persist the new MANIFEST after successfully syncing the new WAL during recovery (#9922)
Summary: In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't flush the data from WAL to L0 for all column families if possible. As a result, not all column families can increase their log_numbers, and min_log_number_to_keep won't change. For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change. If we persist a new MANIFEST with advanced log_numbers for some column families, then during a second crash after persisting the MANIFEST, RocksDB will see some column families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail. As a solution, RocksDB will persist the new MANIFEST after successfully syncing the new WAL. If a future recovery starts from the new MANIFEST, then it means the new WAL is successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point. If future recovery starts from the old MANIFEST, it means the writing the new MANIFEST failed. We won't have the "SST ahead of WAL" error. Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922 Test Plan: 1. Update unit tests to fail without this change 2. make crast_test -j Branch with unit test and no fix https://github.com/facebook/rocksdb/pull/9942 to keep track of unit test (without fix) Reviewed By: riversand963 Differential Revision: D36043701 Pulled By: akankshamahajan15 fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9 |
||
Yanqin Jin
|
8244f13448 |
Fix a bug in WAL tracking (#10087)
Summary: Closing https://github.com/facebook/rocksdb/issues/10080 When `SyncWAL()` calls `MarkLogsSynced()`, even if there is only one active WAL file, this event should still be added to the MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10087 Test Plan: make check Reviewed By: ajkr Differential Revision: D36797580 Pulled By: riversand963 fbshipit-source-id: 24184c9dd606b3939a454ed41de6e868d1519999 |
||
Akanksha Mahajan
|
c8bae6e29c |
Provide support for IOTracing for ReadAsync API (#9833)
Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/9833 Test Plan: Add unit test and manually check the output of tracing logs For fixed readahead_size it logs as: ``` Access Time : 193352113447923 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 15075 , IO Status: OK, Length: 12288, Offset: 659456 Access Time : 193352113465232 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 14425 , IO Status: OK, Length: 12288, Offset: 671744 Access Time : 193352113481539 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13062 , IO Status: OK, Length: 12288, Offset: 684032 Access Time : 193352113497692 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13649 , IO Status: OK, Length: 12288, Offset: 696320 Access Time : 193352113520043 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 19384 , IO Status: OK, Length: 12288, Offset: 708608 Access Time : 193352113538401 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 15406 , IO Status: OK, Length: 12288, Offset: 720896 Access Time : 193352113554855 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13670 , IO Status: OK, Length: 12288, Offset: 733184 Access Time : 193352113571624 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13855 , IO Status: OK, Length: 12288, Offset: 745472 Access Time : 193352113587924 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 13953 , IO Status: OK, Length: 12288, Offset: 757760 Access Time : 193352113603285 , File Name: 000026.sst , File Operation: Prefetch , Latency: 59 , IO Status: Not implemented: Prefetch not supported, Length: 8868, Offset: 898349 ``` For implicit readahead: ``` Access Time : 193351865156587 , File Name: 000026.sst , File Operation: Prefetch , Latency: 48 , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 391174 Access Time : 193351865160354 , File Name: 000026.sst , File Operation: Prefetch , Latency: 51 , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 395248 Access Time : 193351865164253 , File Name: 000026.sst , File Operation: Prefetch , Latency: 49 , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 399322 Access Time : 193351865165461 , File Name: 000026.sst , File Operation: ReadAsync , Latency: 222871 , IO Status: OK, Length: 135168, Offset: 401408 ``` Reviewed By: anand1976 Differential Revision: D35601634 Pulled By: akankshamahajan15 fbshipit-source-id: 5a4f32a850af878efa0767bd5706380152a1f26e |
||
Levi Tamasi
|
07a00828af |
Fix potential ambiguities in/around port/sys_time.h (#10045)
Summary: There are some time-related POSIX APIs that are not available on Windows (e.g. `localtime_r`), which we have worked around by providing our own implementations in `port/sys_time.h`. This workaround actually relies on some ambiguity: on Windows, a call to `localtime_r` calls `ROCKSDB_NAMESPACE::port::localtime_r` (which is pulled into `ROCKSDB_NAMESPACE` by a using-declaration), while on other platforms it calls the global `localtime_r`. This works fine as long as there is only one candidate function; however, it breaks down when there is more than one `localtime_r` visible in a scope. The patch fixes this by introducing `ROCKSDB_NAMESPACE::port::{TimeVal, GetTimeOfDay, LocalTimeR}` to eliminate any ambiguity. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10045 Test Plan: `make check` Reviewed By: riversand963 Differential Revision: D36639372 Pulled By: ltamasi fbshipit-source-id: fc13dbfa421b7c8918111a6d9e24ce77e91a7c50 |
||
anand76
|
f80bac500e |
Fix fbcode internal build failure (#10041)
Summary: The build failed due to different namespaces for coroutines (std::experimental vs std) based on compiler version. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10041 Reviewed By: ltamasi Differential Revision: D36617212 Pulled By: anand1976 fbshipit-source-id: dfb25320788d32969317d5651173059e2cbd8bd5 |
||
Akanksha Mahajan
|
a479c2c2b2 |
Fix stress test failure "Corruption: checksum mismatch" or "Iterator Diverged" with async_io enabled (#10032)
Summary: In case of non sequential reads with `async_io`, `FilePRefetchBuffer::TryReadFromCacheAsync` can be called for previous blocks with `offset < bufs_[curr_].offset_` which wasn't handled correctly resulting wrong data being returned from buffer. Since `FilePRefetchBuffer::PrefetchAsync` can be called for any data block, it sets `prev_len_` to 0 indicating `FilePRefetchBuffer::TryReadFromCacheAsync` to go for the prefetching even though offset < bufs_[curr_].offset_ This is because async prefetching is always done in second buffer (to avoid mutex) even though curr_ is empty leading to offset < bufs_[curr_].offset_ in some cases. If prev_len_ is non zero then `TryReadFromCacheAsync` returns false if `offset < bufs_[curr_].offset_ && prev_len != 0` indicating reads are not sequential and previous call wasn't PrefetchAsync. - This PR also simplifies `FilePRefetchBuffer::TryReadFromCacheAsync` as it was getting complicated covering different scenarios based on `async_io` enabled/disabled. If `for_compaction` is set true, it now calls `FilePRefetchBufferTryReadFromCache` following synchronous flow as before. Its decided in BlockFetcher.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/10032 Test Plan: 1. export CRASH_TEST_EXT_ARGS=" --async_io=1" make crash_test -j completed successfully locally 2. make crash_test -j completed successfully locally 3. Reran CircleCi mini crashtest job 4 - 5 times. 4. Updated prefetch_test for more coverage. Reviewed By: anand1976 Differential Revision: D36579858 Pulled By: akankshamahajan15 fbshipit-source-id: 0c428d62b45e12e082a83acf533a5e37a584bedf |
||
sdong
|
bea5831bff |
Move three info logging within DB Mutex to use log buffer (#10029)
Summary: info logging with DB Mutex could potentially invoke I/O and cause performance issues. Move three of the cases to use log buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10029 Test Plan: Run existing tests. Reviewed By: jay-zhuang Differential Revision: D36561694 fbshipit-source-id: cabb93fea299001a6b4c2802fcba3fde27fa062c |
||
Peter Dillinger
|
1e4850f626 |
Java build: finish compiling before testing (etc) (#10034)
Summary: Lack of ordering dependencies could lead to random build-linux-java failures with "Truncated class file" because tests started before compilation was finished. (Fix to java/Makefile) Also: * export SHA256_CMD to save copy-paste * Actually fail if Java sample build fails--which it was in CircleCI * Don't require Snappy for Java sample build (for more compatibility) * Remove check_all_python from jtest because it's running in `make check` builds in CircleCI Pull Request resolved: https://github.com/facebook/rocksdb/pull/10034 Test Plan: CI, some manual Reviewed By: jay-zhuang Differential Revision: D36596541 Pulled By: pdillinger fbshipit-source-id: 230d79db4b7ae93a366871ff09d0a88e8e1c8af3 |
||
Tom Blamer
|
cb8586003d |
Add plugin header install in CMakeLists.txt (#10025)
Summary: Fixes https://github.com/facebook/rocksdb/issues/9987. - Plugin specific headers can be specified by setting ${PLUGIN_NAME}_HEADERS in ${PLUGIN_NAME}.mk in the plugin directory. - This is supported by the Makefile based build, but was missing from CMakeLists.txt. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10025 Test Plan: - Add a plugin with ${PLUGIN_NAME}_HEADERS set in both ${PLUGIN_NAME}.mk and CMakeLists.txt - Run Makefile based install and CMake based install and verify installed headers match Reviewed By: riversand963 Differential Revision: D36584908 Pulled By: ajkr fbshipit-source-id: 5ea0137205ccbf0d36faacf45d712c5604065bb5 |
||
Adam Retter
|
56ce3aef33 |
Minimum macOS version needed to build v7.2.2 and up is 10.13 (#9976)
Summary: Some C++ code changes between version 7.1.2 and 7.2.2 now seem to require at least macOS 10.13 (2017) to build successfully, previously we needed 10.12 (2016). I haven't been able to identify the exact commit. **NOTE**: This needs to be merged to both `main` and `7.2.fb` branches. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9976 Reviewed By: jay-zhuang Differential Revision: D36303226 Pulled By: ajkr fbshipit-source-id: 589ce3ecf821db3402b0876e76d37b407896c945 |
||
Levi Tamasi
|
bed40e7266 |
Update HISTORY for 7.3 release (#10031)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10031 Reviewed By: riversand963 Differential Revision: D36567741 Pulled By: ltamasi fbshipit-source-id: 058f8cc856d276db6e1aed07a89ac0b7118c4435 |
||
Yanqin Jin
|
899db56a76 |
Point libprotobuf-mutator to the latest verified commit hash (#10028)
Summary: Recent updates to https://github.com/google/libprotobuf-mutator has caused link errors for RocksDB CircleCI job 'build-fuzzers'. This PR points the CI to a specific, most recent verified commit hash. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10028 Test Plan: watch for CI to finish. Reviewed By: pdillinger, jay-zhuang Differential Revision: D36562517 Pulled By: riversand963 fbshipit-source-id: ba5ef0f9ed6ea6a75aa5dd2768bd5f389ac14f46 |
||
Yanqin Jin
|
f648915b0d |
Fix a bug of not setting enforce_single_del_contracts (#10027)
Summary: Before this PR, BuildDBOptions() does not set a newly-added option, i.e. enforce_single_del_contracts, causing OPTIONS files to contain incorrect information. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10027 Test Plan: make check Manually check OPTIONS file. Reviewed By: ltamasi Differential Revision: D36556125 Pulled By: riversand963 fbshipit-source-id: e1074715b22c328b68c19e9ad89aa5d67d864bb5 |
||
Akanksha Mahajan
|
2db6a4a1d6 |
Seek parallelization (#9994)
Summary: The RocksDB iterator is a hierarchy of iterators. MergingIterator maintains a heap of LevelIterators, one for each L0 file and for each non-zero level. The Seek() operation naturally lends itself to parallelization, as it involves positioning every LevelIterator on the correct data block in the correct SST file. It lookups a level for a target key, to find the first key that's >= the target key. This typically involves reading one data block that is likely to contain the target key, and scan forward to find the first valid key. The forward scan may read more data blocks. In order to find the right data block, the iterator may read some metadata blocks (required for opening a file and searching the index). This flow can be parallelized. Design: Seek will be called two times under async_io option. First seek will send asynchronous request to prefetch the data blocks at each level and second seek will follow the normal flow and in FilePrefetchBuffer::TryReadFromCacheAsync it will wait for the Poll() to get the results and add the iterator to min_heap. - Status::TryAgain is passed down from FilePrefetchBuffer::PrefetchAsync to block_iter_.Status indicating asynchronous request has been submitted. - If for some reason asynchronous request returns error in submitting the request, it will fallback to sequential reading of blocks in one pass. - If the data already exists in prefetch_buffer, it will return the data without prefetching further and it will be treated as single pass of seek. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9994 Test Plan: - **Run Regressions.** ``` ./db_bench -db=/tmp/prefix_scan_prefetch_main -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000000 -use_direct_io_for_flush_and_compaction=true -target_file_size_base=16777216 ``` i) Previous release 7.0 run for normal prefetching with async_io disabled: ``` ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 7.0 Date: Thu Mar 17 13:11:34 2022 CPU: 24 * Intel Core Processor (Broadwell) CPUCache: 16384 KB Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 512 bytes each (256 bytes after compression) Entries: 5000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 2594.0 MB (estimated) FileSize: 1373.3 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 ------------------------------------------------ DB path: [/tmp/prefix_scan_prefetch_main] seekrandom : 483618.390 micros/op 2 ops/sec; 338.9 MB/s (249 of 249 found) ``` ii) normal prefetching after changes with async_io disable: ``` ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 Set seed to 1652922591315307 because --seed was 0 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 7.3 Date: Wed May 18 18:09:51 2022 CPU: 32 * Intel Xeon Processor (Skylake) CPUCache: 16384 KB Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 512 bytes each (256 bytes after compression) Entries: 5000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 2594.0 MB (estimated) FileSize: 1373.3 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 ------------------------------------------------ DB path: [/tmp/prefix_scan_prefetch_main] seekrandom : 483080.466 micros/op 2 ops/sec 120.287 seconds 249 operations; 340.8 MB/s (249 of 249 found) ``` iii) db_bench with async_io enabled completed succesfully ``` ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1 -adaptive_readahead=1 Set seed to 1652924062021732 because --seed was 0 Initializing RocksDB Options from the specified file Initializing RocksDB Options from command-line flags RocksDB: version 7.3 Date: Wed May 18 18:34:22 2022 CPU: 32 * Intel Xeon Processor (Skylake) CPUCache: 16384 KB Keys: 32 bytes each (+ 0 bytes user-defined timestamp) Values: 512 bytes each (256 bytes after compression) Entries: 5000000 Prefix: 0 bytes Keys per prefix: 0 RawSize: 2594.0 MB (estimated) FileSize: 1373.3 MB (estimated) Write rate: 0 bytes/second Read rate: 0 ops/second Compression: Snappy Compression sampling rate: 0 Memtablerep: SkipListFactory Perf Level: 1 ------------------------------------------------ DB path: [/tmp/prefix_scan_prefetch_main] seekrandom : 553913.576 micros/op 1 ops/sec 120.199 seconds 217 operations; 293.6 MB/s (217 of 217 found) ``` - db_stress with async_io disabled completed succesfully ``` export CRASH_TEST_EXT_ARGS=" --async_io=0" make crash_test -j ``` I**n Progress**: db_stress with async_io is failing and working on debugging/fixing it. Reviewed By: anand1976 Differential Revision: D36459323 Pulled By: akankshamahajan15 fbshipit-source-id: abb1cd944abe712bae3986ae5b16704b3338917c |
||
anand76
|
e015206dd6 |
Fix crash due to MultiGet async IO and direct IO (#10024)
Summary: MultiGet with async IO is not officially supported with Posix yet. Avoid a crash by using synchronous MultiRead when direct IO is enabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10024 Test Plan: Run db_crashtest.py manually Reviewed By: akankshamahajan15 Differential Revision: D36551053 Pulled By: anand1976 fbshipit-source-id: 72190418fa92dd0397e87825df618b12c9bdecda |
||
Changyu Bi
|
cc23b46da1 |
Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857)
Summary:
An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how dictionary is created by calling the ZSTD ZDICT_finalizeDictionary() API instead of creating raw content dictionary (when max_dict_buffer_bytes > 0), and pass in all buffered uncompressed data blocks as samples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857
Test Plan:
#### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data:
Set up: change the parameter [here](
|
||
dependabot[bot]
|
6255ac7223 |
Bump nokogiri from 1.13.4 to 1.13.6 in /docs (#10019)
Summary: Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.13.4 to 1.13.6. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p> <blockquote> <h2>1.13.6 / 2022-05-08</h2> <h3>Security</h3> <ul> <li>[CRuby] Address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29181">CVE-2022-29181</a>, improper handling of unexpected data types, related to untrusted inputs to the SAX parsers. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-xh29-r2w5-wx8m">GHSA-xh29-r2w5-wx8m</a> for more information.</li> </ul> <h3>Improvements</h3> <ul> <li><code>{HTML4,XML}::SAX::{Parser,ParserContext}</code> constructor methods now raise <code>TypeError</code> instead of segfaulting when an incorrect type is passed.</li> </ul> <hr /> <p>sha256:</p> <pre><code>58417c7c10f78cd1c0e1984f81538300d4ea98962cfd3f46f725efee48f9757a nokogiri-1.13.6-aarch64-linux.gem a2b04ec3b1b73ecc6fac619b41e9fdc70808b7a653b96ec97d04b7a23f158dbc nokogiri-1.13.6-arm64-darwin.gem 4437f2d03bc7da8854f4aaae89e24a98cf5c8b0212ae2bc003af7e65c7ee8e27 nokogiri-1.13.6-java.gem 99d3e212bbd5e80aa602a1f52d583e4f6e917ec594e6aa580f6aacc253eff984 nokogiri-1.13.6-x64-mingw-ucrt.gem a04f6154a75b6ed4fe2d0d0ff3ac02f094b54e150b50330448f834fa5726fbba nokogiri-1.13.6-x64-mingw32.gem a13f30c2863ef9e5e11240dd6d69ef114229d471018b44f2ff60bab28327de4d nokogiri-1.13.6-x86-linux.gem 63a2ca2f7a4f6bd9126e1695037f66c8eb72ed1e1740ef162b4480c57cc17dc6 nokogiri-1.13.6-x86-mingw32.gem 2b266e0eb18030763277b30dc3d64337f440191e2bd157027441ac56a59d9dfe nokogiri-1.13.6-x86_64-darwin.gem 3fa37b0c3b5744af45f9da3e4ae9cbd89480b35e12ae36b5e87a0452e0b38335 nokogiri-1.13.6-x86_64-linux.gem b1512fdc0aba446e1ee30de3e0671518eb363e75fab53486e99e8891d44b8587 nokogiri-1.13.6.gem </code></pre> <h2>1.13.5 / 2022-05-04</h2> <h3>Security</h3> <ul> <li>[CRuby] Vendored libxml2 is updated to address <a href="https://nvd.nist.gov/vuln/detail/CVE-2022-29824">CVE-2022-29824</a>. See <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-cgx6-hpwq-fhv5">GHSA-cgx6-hpwq-fhv5</a> for more information.</li> </ul> <h3>Dependencies</h3> <ul> <li>[CRuby] Vendored libxml2 is updated from v2.9.13 to <a href="https://gitlab.gnome.org/GNOME/libxml2/-/releases/v2.9.14">v2.9.14</a>.</li> </ul> <h3>Improvements</h3> <ul> <li>[CRuby] The libxml2 HTML4 parser no longer exhibits quadratic behavior when recovering some broken markup related to start-of-tag and bare <code><</code> characters.</li> </ul> <h3>Changed</h3> <ul> <li>[CRuby] The libxml2 HTML4 parser in v2.9.14 recovers from some broken markup differently. Notably, the XML CDATA escape sequence <code><![CDATA[</code> and incorrectly-opened comments will result in HTML text nodes starting with <code>&lt;!</code> instead of skipping the invalid tag. This behavior is a direct result of the <a href="https://gitlab.gnome.org/GNOME/libxml2/-/commit/798bdf1">quadratic-behavior fix</a> noted above. The behavior of downstream sanitizers relying on this behavior will also change. Some tests describing the changed behavior are in <a href=" |
||
Yu Zhang
|
16bdb1f999 |
Add timestamp support to DBImplReadOnly (#10004)
Summary: This PR adds timestamp support to a read only DB instance opened as `DBImplReadOnly`. A follow up PR will add the same support to `CompactedDBImpl`. With this, read only database has these timestamp related APIs: `ReadOptions.timestamp` : read should return the latest data visible to this specified timestamp `Iterator::timestamp()` : returns the timestamp associated with the key, value `DB:Get(..., std::string* timestamp)` : returns the timestamp associated with the key, value in `timestamp` Test plan (on devserver): ``` $COMPILE_WITH_ASAN=1 make -j24 all $./db_with_timestamp_basic_test --gtest_filter=DBBasicTestWithTimestamp.ReadOnlyDB* ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/10004 Reviewed By: riversand963 Differential Revision: D36434422 Pulled By: jowlyzhang fbshipit-source-id: 5d949e65b1ffb845758000e2b310fdd4aae71cfb |
||
anand76
|
57997ddaaf |
Multi file concurrency in MultiGet using coroutines and async IO (#9968)
Summary: This PR implements a coroutine version of batched MultiGet in order to concurrently read from multiple SST files in a level using async IO, thus reducing the latency of the MultiGet. The API from the user perspective is still synchronous and single threaded, with the RocksDB part of the processing happening in the context of the caller's thread. In Version::MultiGet, the decision is made whether to call synchronous or coroutine code. A good way to review this PR is to review the first 4 commits in order - de773b3, 70c2f70, 10b50e1, and 377a597 - before reviewing the rest. TODO: 1. Figure out how to build it in CircleCI (requires some dependencies to be installed) 2. Do some stress testing with coroutines enabled No regression in synchronous MultiGet between this branch and main - ``` ./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,multireadrandom" -key_size=32 -value_size=512 -num=5000000 -batch_size=64 -multiread_batched=true -use_direct_reads=false -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=true -threads=16 -cache_size=10485760000 -async_io=false -multiread_stride=40000 -statistics ``` Branch - ```multireadrandom : 4.025 micros/op 3975111 ops/sec 60.001 seconds 238509056 operations; 2062.3 MB/s (14767808 of 14767808 found)``` Main - ```multireadrandom : 3.987 micros/op 4013216 ops/sec 60.001 seconds 240795392 operations; 2082.1 MB/s (15231040 of 15231040 found)``` More benchmarks in various scenarios are given below. The measurements were taken with ```async_io=false``` (no coroutines) and ```async_io=true``` (use coroutines). For an IO bound workload (with every key requiring an IO), the coroutines version shows a clear benefit, being ~2.6X faster. For CPU bound workloads, the coroutines version has ~6-15% higher CPU utilization, depending on how many keys overlap an SST file. 1. Single thread IO bound workload on remote storage with sparse MultiGet batch keys (~1 key overlap/file) - No coroutines - ```multireadrandom : 831.774 micros/op 1202 ops/sec 60.001 seconds 72136 operations; 0.6 MB/s (72136 of 72136 found)``` Using coroutines - ```multireadrandom : 318.742 micros/op 3137 ops/sec 60.003 seconds 188248 operations; 1.6 MB/s (188248 of 188248 found)``` 2. Single thread CPU bound workload (all data cached) with ~1 key overlap/file - No coroutines - ```multireadrandom : 4.127 micros/op 242322 ops/sec 60.000 seconds 14539384 operations; 125.7 MB/s (14539384 of 14539384 found)``` Using coroutines - ```multireadrandom : 4.741 micros/op 210935 ops/sec 60.000 seconds 12656176 operations; 109.4 MB/s (12656176 of 12656176 found)``` 3. Single thread CPU bound workload with ~2 key overlap/file - No coroutines - ```multireadrandom : 3.717 micros/op 269000 ops/sec 60.000 seconds 16140024 operations; 139.6 MB/s (16140024 of 16140024 found)``` Using coroutines - ```multireadrandom : 4.146 micros/op 241204 ops/sec 60.000 seconds 14472296 operations; 125.1 MB/s (14472296 of 14472296 found)``` 4. CPU bound multi-threaded (16 threads) with ~4 key overlap/file - No coroutines - ```multireadrandom : 4.534 micros/op 3528792 ops/sec 60.000 seconds 211728728 operations; 1830.7 MB/s (12737024 of 12737024 found) ``` Using coroutines - ```multireadrandom : 4.872 micros/op 3283812 ops/sec 60.000 seconds 197030096 operations; 1703.6 MB/s (12548032 of 12548032 found) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9968 Reviewed By: akankshamahajan15 Differential Revision: D36348563 Pulled By: anand1976 fbshipit-source-id: c0ce85a505fd26ebfbb09786cbd7f25202038696 |
||
Bo Wang
|
5be1579ead |
Address comments for PR #9988 and #9996 (#10020)
Summary: 1. The latest change of DecideRateLimiterPriority in https://github.com/facebook/rocksdb/pull/9988 is reverted. 2. For https://github.com/facebook/rocksdb/blob/main/db/builder.cc#L345-L349 2.1. Remove `we will regrad this verification as user reads` from the comments. 2.2. `Do not set` the read_options.rate_limiter_priority to Env::IO_USER . Flush should be a background job. 2.3. Update db_rate_limiter_test.cc. 3. In IOOptions, mark `prio` as deprecated for future removal. 4. In `file_system.h`, mark `IOPriority` as deprecated for future removal. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10020 Test Plan: Unit tests. Reviewed By: ajkr Differential Revision: D36525317 Pulled By: gitbw95 fbshipit-source-id: 011ba421822f8a124e6d25a2661c4e242df6ad36 |
||
Peter Dillinger
|
280b9f371a |
Fix auto_prefix_mode performance with partitioned filters (#10012)
Summary: Essentially refactored the RangeMayExist implementation in FullFilterBlockReader to FilterBlockReaderCommon so that it applies to partitioned filters as well. (The function is not called for the block-based filter case.) RangeMayExist is essentially a series of checks around a possible PrefixMayExist, and I'm confident those checks should be the same for partitioned as for full filters. (I think it's likely that bugs remain in those checks, but this change is overall a simplifying one.) Added auto_prefix_mode support to db_bench Other small fixes as well Fixes https://github.com/facebook/rocksdb/issues/10003 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10012 Test Plan: Expanded unit test that uses statistics to check for filter optimization, fails without the production code changes here Performance: populate two DBs with ``` TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters ``` Observe no measurable change in non-partitioned performance ``` TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20 ``` Before: seekrandom [AVG 15 runs] : 11798 (± 331) ops/sec After: seekrandom [AVG 15 runs] : 11724 (± 315) ops/sec Observe big improvement with partitioned (also supported by bloom use statistics) ``` TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20 ``` Before: seekrandom [AVG 12 runs] : 2942 (± 57) ops/sec After: seekrandom [AVG 12 runs] : 7489 (± 184) ops/sec Reviewed By: siying Differential Revision: D36469796 Pulled By: pdillinger fbshipit-source-id: bcf1e2a68d347b32adb2b27384f945434e7a266d |
||
Jay Zhuang
|
c6d326d3d7 |
Track SST unique id in MANIFEST and verify (#9990)
Summary: Start tracking SST unique id in MANIFEST, which is used to verify with SST properties to make sure the SST file is not overwritten or misplaced. A DB option `try_verify_sst_unique_id` is introduced to enable/disable the verification, if enabled, it opens all SST files during DB-open to read the unique_id from table properties (default is false), so it's recommended to use it with `max_open_files = -1` to pre-open the files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990 Test Plan: unittests, format-compatible test, mini-crash Reviewed By: anand1976 Differential Revision: D36381863 Pulled By: jay-zhuang fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f |
||
Hui Xiao
|
dde774db64 |
Mark old reserve* option deprecated (#10016)
Summary: **Context/Summary:** https://github.com/facebook/rocksdb/pull/9926 removed inefficient `reserve*` option API but forgot to mark them deprecated in `block_based_table_type_info` for compatible table format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10016 Test Plan: build-format-compatible Reviewed By: pdillinger Differential Revision: D36484247 Pulled By: hx235 fbshipit-source-id: c41b90cc99fb7ab7098934052f0af7290b221f98 |
||
gitbw95
|
4da34b97ee |
Set Read rate limiter priority dynamically and pass it to FS (#9996)
Summary: ### Context: Background compactions and flush generate large reads and writes, and can be long running, especially for universal compaction. In some cases, this can impact foreground reads and writes by users. ### Solution User, Flush, and Compaction reads share some code path. For this task, we update the rate_limiter_priority in ReadOptions for code paths (e.g. FindTable (mainly in BlockBasedTable::Open()) and various iterators), and eventually update the rate_limiter_priority in IOOptions for FSRandomAccessFile. **This PR is for the Read path.** The **Read:** dynamic priority for different state are listed as follows: | State | Normal | Delayed | Stalled | | ----- | ------ | ------- | ------- | | Flush (verification read in BuildTable()) | IO_USER | IO_USER | IO_USER | | Compaction | IO_LOW | IO_USER | IO_USER | | User | User provided | User provided | User provided | We will respect the read_options that the user provided and will not set it. The only sst read for Flush is the verification read in BuildTable(). It claims to be "regard as user read". **Details** 1. Set read_options.rate_limiter_priority dynamically: - User: Do not update the read_options. Use the read_options that the user provided. - Compaction: Update read_options in CompactionJob::ProcessKeyValueCompaction(). - Flush: Update read_options in BuildTable(). 2. Pass the rate limiter priority to FSRandomAccessFile functions: - After calling the FindTable(), read_options is passed through GetTableReader(table_cache.cc), BlockBasedTableFactory::NewTableReader(block_based_table_factory.cc), and BlockBasedTable::Open(). The Open() needs some updates for the ReadOptions variable and the updates are also needed for the called functions, including PrefetchTail(), PrepareIOOptions(), ReadFooterFromFile(), ReadMetaIndexblock(), ReadPropertiesBlock(), PrefetchIndexAndFilterBlocks(), and ReadRangeDelBlock(). - In RandomAccessFileReader, the functions to be updated include Read(), MultiRead(), ReadAsync(), and Prefetch(). - Update the downstream functions of NewIndexIterator(), NewDataBlockIterator(), and BlockBasedTableIterator(). ### Test Plans Add unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9996 Reviewed By: anand1976 Differential Revision: D36452483 Pulled By: gitbw95 fbshipit-source-id: 60978204a4f849bb9261cb78d9bc1cb56d6008cf |
||
sdong
|
f1303bf8d8 |
Remove two tests from platform dependent tests (#10017)
Summary: Platform dependent tests sometimes run too long and causes timeout in Travis. Remove two tests that are less likely to be platform dependent. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10017 Test Plan: Watch Travis tests. Reviewed By: pdillinger Differential Revision: D36486734 fbshipit-source-id: 2a3ad1746791c893a790c2a69a3b70f81e7de260 |
||
Yaroslav Stepanchuk
|
0a43061f8d |
Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015)
Summary: ROCKSDB_SUPPORT_THREAD_LOCAL definition has been removed. `__thread`(#define) has been replaced with `thread_local`(C++ keyword) across the code base. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10015 Reviewed By: siying Differential Revision: D36485491 Pulled By: pdillinger fbshipit-source-id: 6522d212514ee190b90b4e2750c80c7e34013c78 |
||
Yanqin Jin
|
e3a3dbf2be |
Avoid overwriting options loaded from OPTIONS (#9943)
Summary: This is similar to https://github.com/facebook/rocksdb/issues/9862, including the following fixes/refactoring: 1. If OPTIONS file is specified via `-options_file`, majority of options will be loaded from the file. We should not overwrite options that have been loaded from the file. Instead, we configure only fields of options which are shared objects and not set by the OPTIONS file. We also configure a few fields, e.g. `create_if_missing` necessary for stress test to run. 2. Refactor options initialization into three functions, `InitializeOptionsFromFile()`, `InitializeOptionsFromFlags()` and `InitializeOptionsGeneral()` similar to db_bench. I hope they can be shared in the future. The high-level logic is as follows: ```cpp if (!InitializeOptionsFromFile(...)) { InitializeOptionsFromFlags(...); } InitializeOptionsGeneral(...); ``` 3. Currently, the setting for `block_cache_compressed` does not seem correct because it by default specifies a size of `numeric_limits<size_t>::max()` ((size_t)-1). According to code comments, `-1` indicates default value, which should be referring to `num_shard_bits` argument. 4. Clarify `fail_if_options_file_error`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9943 Test Plan: 1. make check 2. Run stress tests, and manually check generated OPTIONS file and compare them with input OPTIONS files Reviewed By: jay-zhuang Differential Revision: D36133769 Pulled By: riversand963 fbshipit-source-id: 35dacdc090a0a72c922907170cd132b9ecaa073e |
||
sdong
|
a74f14b550 |
Log error message when LinkFile() is not supported when ingesting files (#10010)
Summary: Right now, whether moving file is skipped due to LinkFile() is not supported is opaque to users. Add a log message to help users debug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10010 Test Plan: Run existing test. Manual test verify the log message printed out. Reviewed By: riversand963 Differential Revision: D36463237 fbshipit-source-id: b00bd5041bd5c11afa4e326819c8461ee2c98a91 |
||
gitbw95
|
05c678e135 |
Set Write rate limiter priority dynamically and pass it to FS (#9988)
Summary: ### Context: Background compactions and flush generate large reads and writes, and can be long running, especially for universal compaction. In some cases, this can impact foreground reads and writes by users. From the RocksDB perspective, there can be two kinds of rate limiters, the internal (native) one and the external one. - The internal (native) rate limiter is introduced in [the wiki](https://github.com/facebook/rocksdb/wiki/Rate-Limiter). Currently, only IO_LOW and IO_HIGH are used and they are set statically. - For the external rate limiter, in FSWritableFile functions, IOOptions is open for end users to set and get rate_limiter_priority for their own rate limiter. Currently, RocksDB doesn’t pass the rate_limiter_priority through IOOptions to the file system. ### Solution During the User Read, Flush write, Compaction read/write, the WriteController is used to determine whether DB writes are stalled or slowed down. The rate limiter priority (Env::IOPriority) can be determined accordingly. We decided to always pass the priority in IOOptions. What the file system does with it should be a contract between the user and the file system. We would like to set the rate limiter priority at file level, since the Flush/Compaction job level may be too coarse with multiple files and block IO level is too granular. **This PR is for the Write path.** The **Write:** dynamic priority for different state are listed as follows: | State | Normal | Delayed | Stalled | | ----- | ------ | ------- | ------- | | Flush | IO_HIGH | IO_USER | IO_USER | | Compaction | IO_LOW | IO_USER | IO_USER | Flush and Compaction writes share the same call path through BlockBaseTableWriter, WritableFileWriter, and FSWritableFile. When a new FSWritableFile object is created, its io_priority_ can be set dynamically based on the state of the WriteController. In WritableFileWriter, before the call sites of FSWritableFile functions, WritableFileWriter::DecideRateLimiterPriority() determines the rate_limiter_priority. The options (IOOptions) argument of FSWritableFile functions will be updated with the rate_limiter_priority. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9988 Test Plan: Add unit tests. Reviewed By: anand1976 Differential Revision: D36395159 Pulled By: gitbw95 fbshipit-source-id: a7c82fc29759139a1a07ec46c37dbf7e753474cf |
||
Jay Zhuang
|
b84e3363f5 |
Add table_properties_collector_factories override (#9995)
Summary: Add table_properties_collector_factories override on the remote side. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9995 Test Plan: unittest added Reviewed By: ajkr Differential Revision: D36392623 Pulled By: jay-zhuang fbshipit-source-id: 3ba031294d90247ca063d7de7b43178d38e3f66a |
||
Peter Dillinger
|
0070680cfd |
Adjust public APIs to prefer 128-bit SST unique ID (#10009)
Summary: 128 bits should suffice almost always and for tracking in manifest. Note that this changes the output of sst_dump --show_properties to only show 128 bits. Also introduces InternalUniqueIdToHumanString for presenting internal IDs for debugging purposes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10009 Test Plan: unit tests updated Reviewed By: jay-zhuang Differential Revision: D36458189 Pulled By: pdillinger fbshipit-source-id: 93ebc4a3b6f9c73ee154383a1f8b291a5d6bbef5 |
||
XieJiSS
|
8b1df101da |
fix: build on risc-v (#9215)
Summary:
Patch is modified from ~~https://reviews.llvm.org/file/data/du5ol5zctyqw53ma7dwz/PHID-FILE-knherxziu4tl4erti5ab/file~~
Tested on Arch Linux riscv64gc (qemu)
UPDATE: Seems like the above link is broken, so I tried to search for a link pointing to the original merge request. It turned out to me that the LLVM guys are cherry-picking from `google/benchmark`, and the upstream should be this:
|
||
Hui Xiao
|
3573558ec5 |
Rewrite memory-charging feature's option API (#9926)
Summary: **Context:** Previous PR https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, https://github.com/facebook/rocksdb/pull/8428 added separate flag for each charged memory area. Such API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`. Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags. **Summary:** - Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that - Added missing db bench/stress test for some memory charging features - Renamed related test suite to indicate they are under the same theme of memory charging - Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication - Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926 Test Plan: - New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)` - New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)` - CI - db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_compression_dictionary_building_buffer=1(remove this for comparison) -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'` #-run | (pre-PR) avg micros/op | std micros/op | (post-PR) micros/op | std micros/op | change (%) -- | -- | -- | -- | -- | -- 10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721 20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465** 40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078** - db_stress: `python3 tools/db_crashtest.py blackbox -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal Reviewed By: ajkr Differential Revision: D36054712 Pulled By: hx235 fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af |
||
Hui Xiao
|
f6339de0d2 |
Clarify some SequentialFileReader::Read logic (#10002)
Summary: **Context/Summary:** The logic related to PositionedRead in SequentialFileReader::Read confused me a bit as discussed here https://github.com/facebook/rocksdb/pull/9973#discussion_r872869256. Therefore I added a drawing with help from cbi42. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10002 Test Plan: - no code change Reviewed By: anand1976, cbi42 Differential Revision: D36422632 Pulled By: hx235 fbshipit-source-id: 9a8311d2365564f90d216c430f542fc11b2d9cde |
||
mrambacher
|
b11ff347b4 |
Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958)
Summary: Changed the static objects that had non-trivial destructors to use the STATIC_AVOID_DESTRUCTION construct. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9958 Reviewed By: pdillinger Differential Revision: D36442982 Pulled By: mrambacher fbshipit-source-id: 029d47b1374d30d198bfede369a4c0ae7a4eb519 |
||
Yanqin Jin
|
3f263ef536 |
Add a temporary option for user to opt-out enforcement of SingleDelete contract (#9983)
Summary: PR https://github.com/facebook/rocksdb/issues/9888 started to enforce the contract of single delete described in https://github.com/facebook/rocksdb/wiki/Single-Delete. For some of existing use cases, it is desirable to have a transition during which compaction will not fail if the contract is violated. Therefore, we add a temporary option `enforce_single_del_contracts` to allow application to opt out from this new strict behavior. Once transition completes, the flag can be set to `true` again. In a future release, the option will be removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9983 Test Plan: make check Reviewed By: ajkr Differential Revision: D36333672 Pulled By: riversand963 fbshipit-source-id: dcb703ea0ed08076a1422f1bfb9914afe3c2caa2 |
||
Hui Xiao
|
e66e6d2faa |
Use SpecialEnv to speed up some slow BackupEngineRateLimitingTestWithParam (#9974)
Summary: **Context:** `BackupEngineRateLimitingTestWithParam.RateLimiting` and `BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup` involve creating backup and restoring of a big database with rate-limiting. Using the normal env with a normal clock requires real elapse of time (13702 - 19848 ms/per test). As suggested in https://github.com/facebook/rocksdb/pull/8722#discussion_r703698603, this PR is to speed it up with SpecialEnv (`time_elapse_only_sleep=true`) where its clock accepts fake elapse of time during rate-limiting (100 - 600 ms/per test) **Summary:** - Added TEST_ function to set clock of the default rate limiters in backup engine - Shrunk testdb by 10 times while keeping it big enough for testing - Renamed some test variables and reorganized some if-else branch for clarity without changing the test Pull Request resolved: https://github.com/facebook/rocksdb/pull/9974 Test Plan: - Run tests pre/post PR the same time to verify the tests are sped up by 90 - 95% `BackupEngineRateLimitingTestWithParam.RateLimiting` Pre: ``` [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 (11123 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 (9441 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 (11096 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 (9339 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 (11121 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 (9413 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 (11185 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 (9511 ms) [----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (82230 ms total) ``` Post: ``` [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 (395 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 (564 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 (358 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 (567 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 (173 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 (176 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 (191 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 (177 ms) [----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (2601 ms total) ``` `BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup` Pre: ``` [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 (7275 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 (3961 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 (7117 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 (3921 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 (19862 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 (10231 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 (19848 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 (10372 ms) [----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (82587 ms total) ``` Post: ``` [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 (157 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 (152 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 (160 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 (158 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 (155 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 (151 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 (146 ms) [ RUN ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 [ OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 (153 ms) [----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (1232 ms total) ``` Reviewed By: pdillinger Differential Revision: D36336345 Pulled By: hx235 fbshipit-source-id: 724c6ba745f95f56d4440a6d2f1e4512a2987589 |
||
mrambacher
|
204a42ca97 |
Added GetFactoryCount/Names/Types to ObjectRegistry (#9358)
Summary: These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner. With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion. With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes. Note that there were a few other minor changes. There was a "posix://*" register with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that. The nickname and default names for the FileSystem classes was also inverted. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358 Reviewed By: pdillinger Differential Revision: D33433542 Pulled By: mrambacher fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e |
||
sdong
|
c4cd8e1acc |
Fix a bug handling multiget index I/O error. (#9993)
Summary: In one path of BlockBasedTable::MultiGet(), Next() is directly called after calling Seek() against the index iterator. This might cause crash if an I/O error happens in Seek(). The bug is discovered in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9993 Test Plan: See existing CI tests pass. Reviewed By: anand1976 Differential Revision: D36381758 fbshipit-source-id: a11e0aa48dcee168c2554c33b532646ffdb68877 |
||
Yanqin Jin
|
b58a1a035b |
Revert "Bugfix/fix manual flush blocking bug (#9893)" (#9992)
Summary:
This reverts commit
|
||
Yanqin Jin
|
f6d9730ea1 |
Fix stress test with best-efforts-recovery (#9986)
Summary: This PR - since we are testing with disable_wal = true and best_efforts_recovery, we should set column family count to 1, due to the requirement of `ExpectedState` tracking and replaying logic. - during backup and checkpoint restore, disable best-efforts-recovery. This does not matter now because db_crashtest.py always disables wal when testing best-efforts-recovery. In the future, if we enable wal, then not setting `restore_opitions.best_efforts_recovery` will cause backup db not to recover the WALs, and differ from db (that enables WAL). - during verification of backup and checkpoint restore, print the key where inconsistency exists between expected state and db. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9986 Test Plan: TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery Reviewed By: siying Differential Revision: D36353105 Pulled By: riversand963 fbshipit-source-id: a484da161273e6216a1f7e245bac15a349693917 |
||
mrambacher
|
bfc6a8ee4a |
Option type info functions (#9411)
Summary: Add methods to set the various functions (Parse, Serialize, Equals) to the OptionTypeInfo. These methods simplify the number of constructors required for OptionTypeInfo and make the code a little clearer. Add functions to the OptionTypeInfo for Prepare and Validate. These methods allow types other than Configurable and Customizable to have Prepare and Validate logic. These methods could be used by an option to guarantee that its settings were in a range or that a value was initialized. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9411 Reviewed By: pdillinger Differential Revision: D36174849 Pulled By: mrambacher fbshipit-source-id: 72517d8c6bab4723788a4c1a9e16590bff870125 |
||
Peter Dillinger
|
cdaa9576bb |
Put build size checking logic in Makefile (#9989)
Summary: ... for better maintainability, in case of Makefile changes / refactoring. This is lightly modified from rocksd-lego-determinator, and will be used by Meta-internal CI with custom REPORT_BUILD_STATISTIC Pull Request resolved: https://github.com/facebook/rocksdb/pull/9989 Test Plan: some manual stuff Reviewed By: jay-zhuang Differential Revision: D36362362 Pulled By: pdillinger fbshipit-source-id: 52b65b6282fe839dc6d906ff95a3ed66ca1574ba |
||
Chen Xiao
|
07c6807113 |
Add pmem-rocksdb-plugin link in PLUGINs.md (#9934)
Summary: This change adds pmem-rocksdb-plugin link in PLUGINS.md. The link is: https://github.com/pmem/pmem-rocksdb-plugin. It provides a collection plugins to enable Persistent Memory (PMEM) on RocksDB. The pmem-rocksdb-plugin repo contains RocksDB’s plugins for LSM-tree based KV store to fit it on the PMEM by effectively utilize its characteristics. The first two basic plugins are: 1) Providing a filesystem API wrapper to write RocksDB's WAL (Write Ahead Log) files on PMEM to optimize write performance. 2) Using PMEM as secondary cache to optimize read performance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9934 Reviewed By: jay-zhuang Differential Revision: D36366893 Pulled By: riversand963 fbshipit-source-id: d58a39365e9b5d6a3249d4e9b377c7fb2c79badb |
||
Yueh-Hsuan Chiang
|
bcb1287235 |
Port the batched version of MultiGet() to RocksDB's C API (#9952)
Summary: The batched version of MultiGet() is not available in RocksDB's C API. This PR implements rocksdb_batched_multi_get_cf which is a C wrapper function that invokes the batched version of MultiGet() which takes one single column family. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9952 Test Plan: Added a new test case under "columnfamilies" test case in c_test.cc Reviewed By: riversand963 Differential Revision: D36302888 Pulled By: ajkr fbshipit-source-id: fa134c4a1c8e7d72dd4ae8649a74e3797b5cf4e6 |
||
Akanksha Mahajan
|
6442a62e46 |
Update WAL corruption test so that it fails without fix (#9942)
Summary: In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't flush the data from WAL to L0 for all column families if possible. As a result, not all column families can increase their log_numbers, and min_log_number_to_keep won't change. For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change. If we persist a new MANIFEST with advanced log_numbers for some column families, then during a second crash after persisting the MANIFEST, RocksDB will see some column families' log_numbers larger than the corrupted WAL, and the "column family inconsistency" error will be hit, causing recovery to fail. This PR update unit tests to emulate the errors and tests are failing without a fix. Error: ``` [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/0 db/corruption_test.cc:1190: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF test_cf [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/0, where GetParam() = (true, false) (91 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/1 db/corruption_test.cc:1190: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF test_cf [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/1, where GetParam() = (false, false) (92 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/2 db/corruption_test.cc:1190: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF test_cf [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/2, where GetParam() = (true, true) (95 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/3 db/corruption_test.cc:1190: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF test_cf [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/3, where GetParam() = (false, true) (92 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/0 db/corruption_test.cc:1354: Failure TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/0, where GetParam() = (true, false) (94 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/1 db/corruption_test.cc:1354: Failure TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/1, where GetParam() = (false, false) (97 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/2 db/corruption_test.cc:1354: Failure TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/2, where GetParam() = (true, true) (94 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/3 db/corruption_test.cc:1354: Failure TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/3, where GetParam() = (false, true) (91 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/0 db/corruption_test.cc:1483: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/0, where GetParam() = (true, false) (93 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/1 db/corruption_test.cc:1483: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/1, where GetParam() = (false, false) (94 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/2 db/corruption_test.cc:1483: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/2, where GetParam() = (true, true) (90 ms) [ RUN ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/3 db/corruption_test.cc:1483: Failure DB::Open(options, dbname_, cf_descs, &handles, &db_) Corruption: SST file is ahead of WALs in CF default [ FAILED ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/3, where GetParam() = (false, true) (93 ms) [----------] 12 tests from CorruptionTest/CrashDuringRecoveryWithCorruptionTest (1116 ms total) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9942 Test Plan: Not needed Reviewed By: riversand963 Differential Revision: D36324112 Pulled By: akankshamahajan15 fbshipit-source-id: cab2075ac4ebe48f5ef93a6ea162558aa4fc334d |
||
Peter Dillinger
|
e96e8e2d05 |
Remove slack CircleCI hook (#9982)
Summary: Our Slack site is deprecated Pull Request resolved: https://github.com/facebook/rocksdb/pull/9982 Test Plan: CircleCI Reviewed By: siying Differential Revision: D36322050 Pulled By: pdillinger fbshipit-source-id: 678202404d307e1547e4203d7e6bd467803ccd5e |
||
Andrew Kryczka
|
e943bbdd2f |
Temporarily disable sync_fault_injection (#9979)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9979 Reviewed By: siying Differential Revision: D36301555 Pulled By: ajkr fbshipit-source-id: ed298d3484b6aad3ef19746e984bf4c52be33a9f |
||
Peter Dillinger
|
e8d604cf85 |
Reorganize CircleCI workflows (#9981)
Summary: Condense down to 8 groups rather than 20+ for ease of browsing pages like https://app.circleci.com/pipelines/github/facebook/rocksdb?branch=main&filter=all Also, run nightly builds at 1AM or 2AM Pacific (depending on daylight time) rather than 4PM or 5PM Pacific, so that they actually use each day's landed changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9981 Test Plan: CI And manually inspected ``` grep -Eo 'build-[^: ]*' .circleci/config.yml | sort | uniq -c | less ``` to ensure I didn't orphan anything Reviewed By: jay-zhuang Differential Revision: D36317634 Pulled By: pdillinger fbshipit-source-id: 1c10d29d6b5d60ce3dd1364cd91f175380075ff3 |