Commit Graph

11041 Commits

Author SHA1 Message Date
anand76
57997ddaaf Multi file concurrency in MultiGet using coroutines and async IO (#9968)
Summary:
This PR implements a coroutine version of batched MultiGet in order to concurrently read from multiple SST files in a level using async IO, thus reducing the latency of the MultiGet. The API from the user perspective is still synchronous and single threaded, with the RocksDB part of the processing happening in the context of the caller's thread. In Version::MultiGet, the decision is made whether to call synchronous or coroutine code.
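
For orientation, here is a minimal caller-side sketch (assuming the existing batched MultiGet overload and the `ReadOptions::async_io` flag; the keys and column family handle are illustrative, not taken from this PR). The API remains synchronous; the flag only lets the internal read path use coroutines and async IO.

```cpp
#include <rocksdb/db.h>

// Hedged sketch: the API stays synchronous from the caller's point of view;
// setting ReadOptions::async_io lets Version::MultiGet pick the coroutine
// path internally. The keys and column family handle here are illustrative.
void BatchedMultiGetExample(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cf) {
  constexpr size_t kNumKeys = 3;
  rocksdb::Slice keys[kNumKeys] = {"k1", "k2", "k3"};
  rocksdb::PinnableSlice values[kNumKeys];
  rocksdb::Status statuses[kNumKeys];

  rocksdb::ReadOptions read_options;
  read_options.async_io = true;  // opt into async IO / coroutines when available

  db->MultiGet(read_options, cf, kNumKeys, keys, values, statuses,
               /*sorted_input=*/false);
}
```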

A good way to review this PR is to review the first 4 commits in order - de773b3, 70c2f70, 10b50e1, and 377a597 - before reviewing the rest.

TODO:
1. Figure out how to build it in CircleCI (requires some dependencies to be installed)
2. Do some stress testing with coroutines enabled

No regression in synchronous MultiGet between this branch and main -
```
./db_bench -use_existing_db=true --db=/data/mysql/rocksdb/prefix_scan -benchmarks="readseq,multireadrandom" -key_size=32 -value_size=512 -num=5000000 -batch_size=64 -multiread_batched=true -use_direct_reads=false -duration=60 -ops_between_duration_checks=1 -readonly=true -adaptive_readahead=true -threads=16 -cache_size=10485760000 -async_io=false -multiread_stride=40000 -statistics
```
Branch - ```multireadrandom :       4.025 micros/op 3975111 ops/sec 60.001 seconds 238509056 operations; 2062.3 MB/s (14767808 of 14767808 found)```

Main - ```multireadrandom :       3.987 micros/op 4013216 ops/sec 60.001 seconds 240795392 operations; 2082.1 MB/s (15231040 of 15231040 found)```

More benchmarks in various scenarios are given below. The measurements were taken with ```async_io=false``` (no coroutines) and ```async_io=true``` (use coroutines). For an IO bound workload (with every key requiring an IO), the coroutines version shows a clear benefit, being ~2.6X faster. For CPU bound workloads, the coroutines version has ~6-15% higher CPU utilization, depending on how many keys overlap an SST file.

1. Single thread IO bound workload on remote storage with sparse MultiGet batch keys (~1 key overlap/file) -
No coroutines - ```multireadrandom :     831.774 micros/op 1202 ops/sec 60.001 seconds 72136 operations;    0.6 MB/s (72136 of 72136 found)```
Using coroutines - ```multireadrandom :     318.742 micros/op 3137 ops/sec 60.003 seconds 188248 operations;    1.6 MB/s (188248 of 188248 found)```

2. Single thread CPU bound workload (all data cached) with ~1 key overlap/file -
No coroutines - ```multireadrandom :       4.127 micros/op 242322 ops/sec 60.000 seconds 14539384 operations;  125.7 MB/s (14539384 of 14539384 found)```
Using coroutines - ```multireadrandom :       4.741 micros/op 210935 ops/sec 60.000 seconds 12656176 operations;  109.4 MB/s (12656176 of 12656176 found)```

3. Single thread CPU bound workload with ~2 key overlap/file -
No coroutines - ```multireadrandom :       3.717 micros/op 269000 ops/sec 60.000 seconds 16140024 operations;  139.6 MB/s (16140024 of 16140024 found)```
Using coroutines - ```multireadrandom :       4.146 micros/op 241204 ops/sec 60.000 seconds 14472296 operations;  125.1 MB/s (14472296 of 14472296 found)```

4. CPU bound multi-threaded (16 threads) with ~4 key overlap/file -
No coroutines - ```multireadrandom :       4.534 micros/op 3528792 ops/sec 60.000 seconds 211728728 operations; 1830.7 MB/s (12737024 of 12737024 found) ```
Using coroutines - ```multireadrandom :       4.872 micros/op 3283812 ops/sec 60.000 seconds 197030096 operations; 1703.6 MB/s (12548032 of 12548032 found) ```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9968

Reviewed By: akankshamahajan15

Differential Revision: D36348563

Pulled By: anand1976

fbshipit-source-id: c0ce85a505fd26ebfbb09786cbd7f25202038696
2022-05-19 15:36:27 -07:00
Bo Wang
5be1579ead Address comments for PR #9988 and #9996 (#10020)
Summary:
1. The latest change of DecideRateLimiterPriority in https://github.com/facebook/rocksdb/pull/9988 is reverted.
2. For https://github.com/facebook/rocksdb/blob/main/db/builder.cc#L345-L349
  2.1. Remove `we will regrad this verification as user reads` from the comments.
  2.2. `Do not set` the read_options.rate_limiter_priority to Env::IO_USER. Flush should be a background job.
  2.3. Update db_rate_limiter_test.cc.
3. In IOOptions, mark `prio` as deprecated for future removal.
4. In `file_system.h`, mark `IOPriority` as deprecated for future removal.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10020

Test Plan: Unit tests.

Reviewed By: ajkr

Differential Revision: D36525317

Pulled By: gitbw95

fbshipit-source-id: 011ba421822f8a124e6d25a2661c4e242df6ad36
2022-05-19 15:23:53 -07:00
Peter Dillinger
280b9f371a Fix auto_prefix_mode performance with partitioned filters (#10012)
Summary:
Essentially refactored the RangeMayExist implementation in
FullFilterBlockReader to FilterBlockReaderCommon so that it applies to
partitioned filters as well. (The function is not called for the
block-based filter case.) RangeMayExist is essentially a series of checks
around a possible PrefixMayExist, and I'm confident those checks should
be the same for partitioned as for full filters. (I think it's likely
that bugs remain in those checks, but this change is overall a simplifying
one.)

Added auto_prefix_mode support to db_bench

Other small fixes as well

Fixes https://github.com/facebook/rocksdb/issues/10003
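
For context, a hedged sketch of the read-side setting this optimization targets, assuming a DB opened with a prefix extractor (e.g. matching the `-prefix_size=8` used in the Test Plan below); this is illustrative, not code from the PR:

```cpp
#include <memory>
#include <rocksdb/db.h>

// Hedged sketch: with auto_prefix_mode, RocksDB may consult prefix (Bloom)
// filters during Seek() when doing so cannot change the result of a
// total-order seek. The DB is assumed to have been opened with
// options.prefix_extractor set (e.g. NewFixedPrefixTransform(8)).
void AutoPrefixSeek(rocksdb::DB* db) {
  rocksdb::ReadOptions read_options;
  read_options.auto_prefix_mode = true;  // enable the filter optimization

  std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(read_options));
  for (it->Seek("someprefix"); it->Valid(); it->Next()) {  // illustrative key
    // ... consume keys as usual
  }
}
```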

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10012

Test Plan:
Expanded unit test that uses statistics to check for filter
optimization, fails without the production code changes here

Performance: populate two DBs with
```
TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8
TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters
```

Observe no measurable change in non-partitioned performance
```
TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20
```
Before: seekrandom [AVG 15 runs] : 11798 (± 331) ops/sec
After: seekrandom [AVG 15 runs] : 11724 (± 315) ops/sec

Observe big improvement with partitioned (also supported by bloom use statistics)
```
TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20
```
Before: seekrandom [AVG 12 runs] : 2942 (± 57) ops/sec
After: seekrandom [AVG 12 runs] : 7489 (± 184) ops/sec

Reviewed By: siying

Differential Revision: D36469796

Pulled By: pdillinger

fbshipit-source-id: bcf1e2a68d347b32adb2b27384f945434e7a266d
2022-05-19 13:09:03 -07:00
Jay Zhuang
c6d326d3d7 Track SST unique id in MANIFEST and verify (#9990)
Summary:
Start tracking the SST unique id in the MANIFEST, which is verified against
SST properties to make sure the SST file is not overwritten or
misplaced. A DB option `try_verify_sst_unique_id` is introduced to
enable/disable the verification (default is false). If enabled, it opens all
SST files during DB open to read the unique_id from table properties, so it's
recommended to use it with `max_open_files = -1` to pre-open the files.
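
A hedged sketch of enabling the verification at DB open (the option name is the one quoted above; treat the exact spelling as subject to change before release):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

// Hedged sketch: turn on SST unique-id verification. With the option enabled,
// table files are opened during DB::Open so their unique_id can be checked
// against what the MANIFEST has recorded.
rocksdb::Status OpenWithUniqueIdVerification(const std::string& dbname,
                                             rocksdb::DB** db) {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.try_verify_sst_unique_id = true;  // option name as described above
  options.max_open_files = -1;              // pre-open files, as recommended
  return rocksdb::DB::Open(options, dbname, db);
}
```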

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990

Test Plan: unittests, format-compatible test, mini-crash

Reviewed By: anand1976

Differential Revision: D36381863

Pulled By: jay-zhuang

fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f
2022-05-19 11:04:21 -07:00
Hui Xiao
dde774db64 Mark old reserve* option deprecated (#10016)
Summary:
**Context/Summary:**
https://github.com/facebook/rocksdb/pull/9926 removed the inefficient `reserve*` option APIs but forgot to mark them deprecated in `block_based_table_type_info` for a compatible table format.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10016

Test Plan: build-format-compatible

Reviewed By: pdillinger

Differential Revision: D36484247

Pulled By: hx235

fbshipit-source-id: c41b90cc99fb7ab7098934052f0af7290b221f98
2022-05-18 22:25:54 -07:00
gitbw95
4da34b97ee Set Read rate limiter priority dynamically and pass it to FS (#9996)
Summary:
### Context:
Background compactions and flushes generate large reads and writes and can be long running, especially for universal compaction. In some cases, this can impact foreground reads and writes by users.

### Solution
User, Flush, and Compaction reads share some code paths. For this task, we update the rate_limiter_priority in ReadOptions for code paths (e.g. FindTable (mainly in BlockBasedTable::Open()) and various iterators), and eventually update the rate_limiter_priority in IOOptions for FSRandomAccessFile.

**This PR is for the Read path.** The dynamic **Read** priorities for the different states are listed as follows:

| State | Normal | Delayed | Stalled |
| ----- | ------ | ------- | ------- |
|  Flush (verification read in BuildTable()) | IO_USER | IO_USER | IO_USER |
|  Compaction | IO_LOW  | IO_USER | IO_USER |
|  User | User provided | User provided | User provided |

We respect the read_options that the user provides and do not override them.
The only SST read during Flush is the verification read in BuildTable(), which is regarded as a user read.
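
To make the table concrete, here is a hedged sketch of the read-path priority selection; it is not the literal code from this PR, and the `delayed_or_stalled` flag is a stand-in for the WriteController state that the real code consults:

```cpp
#include <rocksdb/env.h>
#include <rocksdb/options.h>

// Hedged sketch of the read-priority table above; not the literal code from
// this PR. `delayed_or_stalled` is a stand-in for the WriteController state.
rocksdb::Env::IOPriority DecideReadPriority(bool is_flush_verification_read,
                                            bool is_compaction_read,
                                            bool delayed_or_stalled,
                                            const rocksdb::ReadOptions& ro) {
  if (is_flush_verification_read) {
    return rocksdb::Env::IO_USER;  // verification read in BuildTable()
  }
  if (is_compaction_read) {
    // Compaction reads are normally low priority, but get promoted when user
    // writes are delayed or stalled so compaction can catch up.
    return delayed_or_stalled ? rocksdb::Env::IO_USER : rocksdb::Env::IO_LOW;
  }
  return ro.rate_limiter_priority;  // user reads: respect the caller's setting
}
```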

**Details**
1. Set read_options.rate_limiter_priority dynamically:
- User: Do not update the read_options. Use the read_options that the user provided.
- Compaction: Update read_options in CompactionJob::ProcessKeyValueCompaction().
- Flush: Update read_options in BuildTable().

2. Pass the rate limiter priority to FSRandomAccessFile functions:
- After calling FindTable(), read_options is passed through GetTableReader (table_cache.cc), BlockBasedTableFactory::NewTableReader (block_based_table_factory.cc), and BlockBasedTable::Open(). Open() needs some updates for the ReadOptions variable, and updates are also needed in the functions it calls, including PrefetchTail(), PrepareIOOptions(), ReadFooterFromFile(), ReadMetaIndexBlock(), ReadPropertiesBlock(), PrefetchIndexAndFilterBlocks(), and ReadRangeDelBlock().
- In RandomAccessFileReader, the functions to be updated include Read(), MultiRead(), ReadAsync(), and Prefetch().
- Update the downstream functions of NewIndexIterator(), NewDataBlockIterator(), and BlockBasedTableIterator().

### Test Plans
Add unit tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9996

Reviewed By: anand1976

Differential Revision: D36452483

Pulled By: gitbw95

fbshipit-source-id: 60978204a4f849bb9261cb78d9bc1cb56d6008cf
2022-05-18 19:41:44 -07:00
sdong
f1303bf8d8 Remove two tests from platform dependent tests (#10017)
Summary:
Platform dependent tests sometimes run too long and cause timeouts in Travis. Remove two tests that are less likely to be platform dependent.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10017

Test Plan: Watch Travis tests.

Reviewed By: pdillinger

Differential Revision: D36486734

fbshipit-source-id: 2a3ad1746791c893a790c2a69a3b70f81e7de260
2022-05-18 16:18:12 -07:00
Yaroslav Stepanchuk
0a43061f8d Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015)
Summary:
ROCKSDB_SUPPORT_THREAD_LOCAL definition has been removed.
`__thread` (#define) has been replaced with `thread_local` (C++ keyword) across the code base.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10015

Reviewed By: siying

Differential Revision: D36485491

Pulled By: pdillinger

fbshipit-source-id: 6522d212514ee190b90b4e2750c80c7e34013c78
2022-05-18 15:25:19 -07:00
Yanqin Jin
e3a3dbf2be Avoid overwriting options loaded from OPTIONS (#9943)
Summary:
This is similar to https://github.com/facebook/rocksdb/issues/9862, including the following fixes/refactoring:

1. If an OPTIONS file is specified via `-options_file`, the majority of options will be loaded from the file. We should not
overwrite options that have been loaded from the file. Instead, we configure only fields of options which are
shared objects and not set by the OPTIONS file. We also configure a few fields, e.g. `create_if_missing`, that are necessary
for the stress test to run.

2. Refactor options initialization into three functions, `InitializeOptionsFromFile()`, `InitializeOptionsFromFlags()`
and `InitializeOptionsGeneral()` similar to db_bench. I hope they can be shared in the future. The high-level logic is
as follows:
```cpp
if (!InitializeOptionsFromFile(...)) {
  InitializeOptionsFromFlags(...);
}
InitializeOptionsGeneral(...);
```

3. Currently, the setting for `block_cache_compressed` does not seem correct because it by default specifies a
size of `numeric_limits<size_t>::max()` ((size_t)-1). According to code comments, `-1` indicates the default value,
which should refer to the `num_shard_bits` argument.

4. Clarify `fail_if_options_file_error`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9943

Test Plan:
1. make check
2. Run stress tests, and manually check generated OPTIONS file and compare them with input OPTIONS files

Reviewed By: jay-zhuang

Differential Revision: D36133769

Pulled By: riversand963

fbshipit-source-id: 35dacdc090a0a72c922907170cd132b9ecaa073e
2022-05-18 12:43:50 -07:00
sdong
a74f14b550 Log error message when LinkFile() is not supported when ingesting files (#10010)
Summary:
Right now, it is opaque to users whether moving a file is skipped because LinkFile() is not supported. Add a log message to help users debug.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10010

Test Plan: Run existing tests. Manually verified that the log message is printed out.

Reviewed By: riversand963

Differential Revision: D36463237

fbshipit-source-id: b00bd5041bd5c11afa4e326819c8461ee2c98a91
2022-05-18 11:23:12 -07:00
gitbw95
05c678e135 Set Write rate limiter priority dynamically and pass it to FS (#9988)
Summary:
### Context:
Background compactions and flushes generate large reads and writes and can be long running, especially for universal compaction. In some cases, this can impact foreground reads and writes by users.

From the RocksDB perspective, there can be two kinds of rate limiters, the internal (native) one and the external one.
- The internal (native) rate limiter is introduced in [the wiki](https://github.com/facebook/rocksdb/wiki/Rate-Limiter). Currently, only IO_LOW and IO_HIGH are used and they are set statically.
- For the external rate limiter, in FSWritableFile functions, IOOptions is open for end users to set and get rate_limiter_priority for their own rate limiter. Currently, RocksDB doesn't pass the rate_limiter_priority through IOOptions to the file system.

### Solution
During user reads, flush writes, and compaction reads/writes, the WriteController is used to determine whether DB writes are stalled or slowed down. The rate limiter priority (Env::IOPriority) can be determined accordingly. We decided to always pass the priority in IOOptions. What the file system does with it should be a contract between the user and the file system. We would like to set the rate limiter priority at the file level, since the Flush/Compaction job level may be too coarse with multiple files and the block IO level is too granular.

**This PR is for the Write path.** The dynamic **Write** priorities for the different states are listed as follows:

| State | Normal | Delayed | Stalled |
| ----- | ------ | ------- | ------- |
|  Flush | IO_HIGH | IO_USER | IO_USER |
|  Compaction | IO_LOW | IO_USER | IO_USER |

Flush and Compaction writes share the same call path through BlockBasedTableWriter, WritableFileWriter, and FSWritableFile. When a new FSWritableFile object is created, its io_priority_ can be set dynamically based on the state of the WriteController. In WritableFileWriter, before the call sites of FSWritableFile functions, WritableFileWriter::DecideRateLimiterPriority() determines the rate_limiter_priority. The options (IOOptions) argument of the FSWritableFile functions is then updated with the rate_limiter_priority.
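
A hedged sketch of the decision described above; it mirrors the table rather than the literal `WritableFileWriter::DecideRateLimiterPriority()` signature, and the boolean state flag is a stand-in for the WriteController state:

```cpp
#include <rocksdb/env.h>

// Hedged sketch of the write-priority table above; a stand-in for the logic
// in WritableFileWriter::DecideRateLimiterPriority(), not the literal code.
rocksdb::Env::IOPriority DecideWritePriority(bool is_flush,
                                             bool delayed_or_stalled) {
  if (delayed_or_stalled) {
    // When user writes are delayed or stalled, promote background writes so
    // the backlog clears faster.
    return rocksdb::Env::IO_USER;
  }
  return is_flush ? rocksdb::Env::IO_HIGH  // flush under normal conditions
                  : rocksdb::Env::IO_LOW;  // compaction under normal conditions
}
```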

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9988

Test Plan: Add unit tests.

Reviewed By: anand1976

Differential Revision: D36395159

Pulled By: gitbw95

fbshipit-source-id: a7c82fc29759139a1a07ec46c37dbf7e753474cf
2022-05-18 00:41:41 -07:00
Jay Zhuang
b84e3363f5 Add table_properties_collector_factories override (#9995)
Summary:
Add table_properties_collector_factories override on the remote
side.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9995

Test Plan: unittest added

Reviewed By: ajkr

Differential Revision: D36392623

Pulled By: jay-zhuang

fbshipit-source-id: 3ba031294d90247ca063d7de7b43178d38e3f66a
2022-05-17 20:57:51 -07:00
Peter Dillinger
0070680cfd Adjust public APIs to prefer 128-bit SST unique ID (#10009)
Summary:
128 bits should suffice almost always and for tracking in manifest.

Note that this changes the output of sst_dump --show_properties to only show 128 bits.

Also introduces InternalUniqueIdToHumanString for presenting internal IDs for debugging purposes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10009

Test Plan: unit tests updated

Reviewed By: jay-zhuang

Differential Revision: D36458189

Pulled By: pdillinger

fbshipit-source-id: 93ebc4a3b6f9c73ee154383a1f8b291a5d6bbef5
2022-05-17 18:43:48 -07:00
XieJiSS
8b1df101da fix: build on risc-v (#9215)
Summary:
Patch is modified from ~~https://reviews.llvm.org/file/data/du5ol5zctyqw53ma7dwz/PHID-FILE-knherxziu4tl4erti5ab/file~~

Tested on Arch Linux riscv64gc (qemu)

UPDATE: It seems like the above link is broken, so I tried to search for a link pointing to the original merge request. It turns out that the LLVM guys are cherry-picking from `google/benchmark`, and the upstream should be this:

808571a52f/src/cycleclock.h (L190)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9215

Reviewed By: siying, jay-zhuang

Differential Revision: D34170586

Pulled By: riversand963

fbshipit-source-id: 41b16b9f7f3bb0f3e7b26bb078eb575499c0f0f4
2022-05-17 17:33:01 -07:00
Hui Xiao
3573558ec5 Rewrite memory-charging feature's option API (#9926)
Summary:
**Context:**
Previous PRs https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, and https://github.com/facebook/rocksdb/pull/8428 added a separate flag for each charged memory area. Such an API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`.

Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags.
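
A hedged sketch of what the consolidated configuration might look like; the field names (`cache_usage_options`, `options_overrides`, `charged`) reflect my reading of the new API and should be treated as assumptions rather than the exact interface:

```cpp
#include <rocksdb/cache.h>
#include <rocksdb/table.h>

// Hedged sketch: charge filter-construction memory to the block cache via the
// consolidated CacheUsageOptions rather than a dedicated charge_* flag.
// Member names here are assumptions based on this description.
rocksdb::BlockBasedTableOptions MakeChargedTableOptions(
    std::shared_ptr<rocksdb::Cache> block_cache) {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.block_cache = block_cache;
  table_options.cache_usage_options.options_overrides.insert(
      {rocksdb::CacheEntryRole::kFilterConstruction,
       {/*charged=*/rocksdb::CacheEntryRoleOptions::Decision::kEnabled}});
  return table_options;
}
```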

**Summary:**
- Replaced old API references with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out, and added a unit test for that
- Added missing db bench/stress test for some memory charging features
- Renamed related test suite to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926

Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR  -charge_compression_dictionary_building_buffer=1(remove this for comparison)  -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR) avg micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox  -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D36054712

Pulled By: hx235

fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
2022-05-17 15:01:51 -07:00
Hui Xiao
f6339de0d2 Clarify some SequentialFileReader::Read logic (#10002)
Summary:
**Context/Summary:**
The logic related to PositionedRead in SequentialFileReader::Read confused me a bit as discussed here https://github.com/facebook/rocksdb/pull/9973#discussion_r872869256. Therefore I added a drawing with help from cbi42.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10002

Test Plan: - no code change

Reviewed By: anand1976, cbi42

Differential Revision: D36422632

Pulled By: hx235

fbshipit-source-id: 9a8311d2365564f90d216c430f542fc11b2d9cde
2022-05-17 10:24:04 -07:00
mrambacher
b11ff347b4 Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958)
Summary:
Changed the static objects that had non-trivial destructors to use the STATIC_AVOID_DESTRUCTION construct.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9958

Reviewed By: pdillinger

Differential Revision: D36442982

Pulled By: mrambacher

fbshipit-source-id: 029d47b1374d30d198bfede369a4c0ae7a4eb519
2022-05-17 09:39:22 -07:00
Yanqin Jin
3f263ef536 Add a temporary option for user to opt-out enforcement of SingleDelete contract (#9983)
Summary:
PR https://github.com/facebook/rocksdb/issues/9888 started to enforce the contract of single delete described in https://github.com/facebook/rocksdb/wiki/Single-Delete.

For some existing use cases, it is desirable to have a transition during which compaction will not fail
if the contract is violated. Therefore, we add a temporary option `enforce_single_del_contracts` to allow
applications to opt out of this new strict behavior. Once the transition completes, the flag can be set to `true` again.

In a future release, the option will be removed.
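
A minimal sketch of the opt-out during such a transition (assuming the option is exposed on DBOptions/Options as described):

```cpp
#include <rocksdb/options.h>

// Hedged sketch: temporarily opt out of strict SingleDelete contract
// enforcement while an existing use case is being migrated.
rocksdb::Options MakeTransitionOptions() {
  rocksdb::Options options;
  options.enforce_single_del_contracts = false;  // opt out during transition
  // Once the transition completes, set this back to true (the default).
  return options;
}
```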

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9983

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36333672

Pulled By: riversand963

fbshipit-source-id: dcb703ea0ed08076a1422f1bfb9914afe3c2caa2
2022-05-16 15:44:59 -07:00
Hui Xiao
e66e6d2faa Use SpecialEnv to speed up some slow BackupEngineRateLimitingTestWithParam (#9974)
Summary:
**Context:**
`BackupEngineRateLimitingTestWithParam.RateLimiting` and `BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup` involve creating a backup of, and restoring, a big database with rate limiting. Using the normal env with a normal clock requires real elapsed time (13702 - 19848 ms per test). As suggested in https://github.com/facebook/rocksdb/pull/8722#discussion_r703698603, this PR speeds it up with SpecialEnv (`time_elapse_only_sleep=true`), whose clock accepts fake elapsed time during rate limiting (100 - 600 ms per test).

**Summary:**
- Added TEST_ function to set clock of the default rate limiters in backup engine
- Shrunk testdb by 10 times while keeping it big enough for testing
- Renamed some test variables and reorganized some if-else branches for clarity without changing the test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9974

Test Plan:
- Run tests pre/post PR the same time to verify the tests are sped up by 90 - 95%
`BackupEngineRateLimitingTestWithParam.RateLimiting`
Pre:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 (11123 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 (9441 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 (11096 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 (9339 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 (11121 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 (9413 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 (11185 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 (9511 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (82230 ms total)
```
Post:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/0 (395 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/1 (564 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/2 (358 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/3 (567 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/4 (173 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/5 (176 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/6 (191 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimiting/7 (177 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (2601 ms total)
```
`BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup`
Pre:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 (7275 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 (3961 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 (7117 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 (3921 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 (19862 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 (10231 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 (19848 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 (10372 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (82587 ms total)
```
Post:
```
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/0 (157 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/1 (152 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/2 (160 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/3 (158 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/4 (155 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/5 (151 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/6 (146 ms)
[ RUN      ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7
[       OK ] RateLimiting/BackupEngineRateLimitingTestWithParam.RateLimitingVerifyBackup/7 (153 ms)
[----------] 8 tests from RateLimiting/BackupEngineRateLimitingTestWithParam (1232 ms total)
```

Reviewed By: pdillinger

Differential Revision: D36336345

Pulled By: hx235

fbshipit-source-id: 724c6ba745f95f56d4440a6d2f1e4512a2987589
2022-05-16 10:54:02 -07:00
mrambacher
204a42ca97 Added GetFactoryCount/Names/Types to ObjectRegistry (#9358)
Summary:
These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner.  With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion.

With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes.

Note that there were a few other minor changes. There was a "posix://*" registration with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that. The nickname and default names for the FileSystem classes were also inverted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358

Reviewed By: pdillinger

Differential Revision: D33433542

Pulled By: mrambacher

fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e
2022-05-16 09:44:43 -07:00
sdong
c4cd8e1acc Fix a bug handling multiget index I/O error. (#9993)
Summary:
In one path of BlockBasedTable::MultiGet(), Next() is called directly after calling Seek() against the index iterator. This might cause a crash if an I/O error happens in Seek().
The bug is discovered in crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9993

Test Plan: See existing CI tests pass.

Reviewed By: anand1976

Differential Revision: D36381758

fbshipit-source-id: a11e0aa48dcee168c2554c33b532646ffdb68877
2022-05-13 13:15:10 -07:00
Yanqin Jin
b58a1a035b Revert "Bugfix/fix manual flush blocking bug (#9893)" (#9992)
Summary:
This reverts commit 6d2577e567.

A proposal for resolving our current internal test failures. A fix is
being planned.

More context can be found: https://github.com/facebook/rocksdb/pull/9893#issuecomment-1126230634
TSAN error: https://github.com/facebook/rocksdb/pull/9893#issuecomment-1126233132

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9992

Reviewed By: siying

Differential Revision: D36379154

Pulled By: riversand963

fbshipit-source-id: b240261e766eff099513799cf5631832093f4cd2
2022-05-13 12:31:30 -07:00
Yanqin Jin
f6d9730ea1 Fix stress test with best-efforts-recovery (#9986)
Summary:
This PR

- Since we are testing with disable_wal = true and best_efforts_recovery, we should set the column family count to 1, due to the requirements of the `ExpectedState` tracking and replaying logic.
- During backup and checkpoint restore, disable best-efforts-recovery. This does not matter now because db_crashtest.py always disables the WAL when testing best-efforts-recovery. In the future, if we enable the WAL, then not setting `restore_options.best_efforts_recovery` will cause the backup DB not to recover the WALs and to differ from the DB (which enables the WAL).
- During verification of backup and checkpoint restore, print the key where an inconsistency exists between the expected state and the DB.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9986

Test Plan: TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery

Reviewed By: siying

Differential Revision: D36353105

Pulled By: riversand963

fbshipit-source-id: a484da161273e6216a1f7e245bac15a349693917
2022-05-13 12:29:20 -07:00
mrambacher
bfc6a8ee4a Option type info functions (#9411)
Summary:
Add methods to set the various functions (Parse, Serialize, Equals) on the OptionTypeInfo. These methods reduce the number of constructors required for OptionTypeInfo and make the code a little clearer.

Add functions to the OptionTypeInfo for Prepare and Validate.  These methods allow types other than Configurable and Customizable to have Prepare and Validate logic.  These methods could be used by an option to guarantee that its settings were in a range or that a value was initialized.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9411

Reviewed By: pdillinger

Differential Revision: D36174849

Pulled By: mrambacher

fbshipit-source-id: 72517d8c6bab4723788a4c1a9e16590bff870125
2022-05-13 04:57:08 -07:00
Peter Dillinger
cdaa9576bb Put build size checking logic in Makefile (#9989)
Summary:
... for better maintainability, in case of Makefile changes /
refactoring. This is lightly modified from rocksdb-lego-determinator, and
will be used by Meta-internal CI with a custom REPORT_BUILD_STATISTIC.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9989

Test Plan: some manual stuff

Reviewed By: jay-zhuang

Differential Revision: D36362362

Pulled By: pdillinger

fbshipit-source-id: 52b65b6282fe839dc6d906ff95a3ed66ca1574ba
2022-05-12 22:54:18 -07:00
Chen Xiao
07c6807113 Add pmem-rocksdb-plugin link in PLUGINs.md (#9934)
Summary:
This change adds a pmem-rocksdb-plugin link in PLUGINS.md. The link is: https://github.com/pmem/pmem-rocksdb-plugin. It provides a collection of plugins to enable Persistent Memory (PMEM) on RocksDB.

The pmem-rocksdb-plugin repo contains RocksDB's plugins for an LSM-tree based KV store, fitting it onto PMEM by effectively utilizing its characteristics. The first two basic plugins are:
1) Providing a filesystem API wrapper to write RocksDB's WAL (Write Ahead Log) files on PMEM to optimize write performance.
2) Using PMEM as a secondary cache to optimize read performance.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9934

Reviewed By: jay-zhuang

Differential Revision: D36366893

Pulled By: riversand963

fbshipit-source-id: d58a39365e9b5d6a3249d4e9b377c7fb2c79badb
2022-05-12 22:02:28 -07:00
Yueh-Hsuan Chiang
bcb1287235 Port the batched version of MultiGet() to RocksDB's C API (#9952)
Summary:
The batched version of MultiGet() is not available in RocksDB's C API.
This PR implements rocksdb_batched_multi_get_cf, a C wrapper function
that invokes the batched version of MultiGet() which takes a single column family.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9952

Test Plan: Added a new test case under "columnfamilies" test case in c_test.cc

Reviewed By: riversand963

Differential Revision: D36302888

Pulled By: ajkr

fbshipit-source-id: fa134c4a1c8e7d72dd4ae8649a74e3797b5cf4e6
2022-05-12 18:17:36 -07:00
Akanksha Mahajan
6442a62e46 Update WAL corruption test so that it fails without fix (#9942)
Summary:
In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
flush the data from WAL to L0 for all column families if possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
For a transaction DB (allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions, and min_log_number_to_keep won't change.
If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted WAL, and the "column family inconsistency" error will be hit, causing recovery to fail.

This PR updates unit tests to emulate the errors; the tests fail without a fix.

Error:
```
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/0
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/0, where GetParam() = (true, false) (91 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/1
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/1, where GetParam() = (false, false) (92 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/2
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/2, where GetParam() = (true, true) (95 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/3
db/corruption_test.cc:1190: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF test_cf
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecovery/3, where GetParam() = (false, true) (92 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/0
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/0, where GetParam() = (true, false) (94 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/1
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/1, where GetParam() = (false, false) (97 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/2
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/2, where GetParam() = (true, true) (94 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/3
db/corruption_test.cc:1354: Failure
TransactionDB::Open(options, txn_db_opts, dbname_, cf_descs, &handles, &txn_db)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.TxnDbCrashDuringRecovery/3, where GetParam() = (false, true) (91 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/0
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/0, where GetParam() = (true, false) (93 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/1
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/1, where GetParam() = (false, false) (94 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/2
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/2, where GetParam() = (true, true) (90 ms)
[ RUN      ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/3
db/corruption_test.cc:1483: Failure
DB::Open(options, dbname_, cf_descs, &handles, &db_)
Corruption: SST file is ahead of WALs in CF default
[  FAILED  ] CorruptionTest/CrashDuringRecoveryWithCorruptionTest.CrashDuringRecoveryWithFlush/3, where GetParam() = (false, true) (93 ms)
[----------] 12 tests from CorruptionTest/CrashDuringRecoveryWithCorruptionTest (1116 ms total)

```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9942

Test Plan: Not needed

Reviewed By: riversand963

Differential Revision: D36324112

Pulled By: akankshamahajan15

fbshipit-source-id: cab2075ac4ebe48f5ef93a6ea162558aa4fc334d
2022-05-11 16:12:55 -07:00
Peter Dillinger
e96e8e2d05 Remove slack CircleCI hook (#9982)
Summary:
Our Slack site is deprecated

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9982

Test Plan: CircleCI

Reviewed By: siying

Differential Revision: D36322050

Pulled By: pdillinger

fbshipit-source-id: 678202404d307e1547e4203d7e6bd467803ccd5e
2022-05-11 13:17:21 -07:00
Andrew Kryczka
e943bbdd2f Temporarily disable sync_fault_injection (#9979)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9979

Reviewed By: siying

Differential Revision: D36301555

Pulled By: ajkr

fbshipit-source-id: ed298d3484b6aad3ef19746e984bf4c52be33a9f
2022-05-11 12:19:07 -07:00
Peter Dillinger
e8d604cf85 Reorganize CircleCI workflows (#9981)
Summary:
Condense down to 8 groups rather than 20+ for ease of browsing
pages like
https://app.circleci.com/pipelines/github/facebook/rocksdb?branch=main&filter=all

Also, run nightly builds at 1AM or 2AM Pacific (depending on daylight
time) rather than 4PM or 5PM Pacific, so that they actually use each
day's landed changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9981

Test Plan:
CI
And manually inspected
```
grep -Eo 'build-[^: ]*' .circleci/config.yml | sort | uniq -c | less
```
to ensure I didn't orphan anything

Reviewed By: jay-zhuang

Differential Revision: D36317634

Pulled By: pdillinger

fbshipit-source-id: 1c10d29d6b5d60ce3dd1364cd91f175380075ff3
2022-05-11 11:16:09 -07:00
yaphet
26768edb65 Support single delete in ldb (#9469)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9469

Reviewed By: riversand963

Differential Revision: D33953484

fbshipit-source-id: f4e84a2d9865957d744c7e84ff02ffbb0a62b0a8
2022-05-10 16:37:19 -07:00
Peter Dillinger
0d1613aad6 Avoid some warnings-as-error in CircleCI+unity+AVX512F (#9978)
Summary:
Example failure when compiling on sufficiently new hardware with the built-in headers:

```
In file included from /usr/local/lib/gcc/x86_64-linux-gnu/12.1.0/include/immintrin.h:49,
                 from ./util/bloom_impl.h:21,
                 from table/block_based/filter_policy.cc:31,
                 from unity.cc:167:
In function '__m512i _mm512_shuffle_epi32(__m512i, _MM_PERM_ENUM)',
    inlined from 'void XXH3_accumulate_512_avx512(void*, const void*, const void*)' at util/xxhash.h:3605:58,
    inlined from 'void XXH3_accumulate(xxh_u64*, const xxh_u8*, const xxh_u8*, size_t, XXH3_f_accumulate_512)' at util/xxhash.h:4229:17,
    inlined from 'void XXH3_hashLong_internal_loop(xxh_u64*, const xxh_u8*, size_t, const xxh_u8*, size_t, XXH3_f_accumulate_512, XXH3_f_scrambleAcc)' at util/xxhash.h:4251:24,
    inlined from 'XXH128_hash_t XXH3_hashLong_128b_internal(const void*, size_t, const xxh_u8*, size_t, XXH3_f_accumulate_512, XXH3_f_scrambleAcc)' at util/xxhash.h:5065:32,
    inlined from 'XXH128_hash_t XXH3_hashLong_128b_withSecret(const void*, size_t, XXH64_hash_t, const void*, size_t)' at util/xxhash.h:5104:39:
/usr/local/lib/gcc/x86_64-linux-gnu/12.1.0/include/avx512fintrin.h:4459:50: error: '__Y' may be used uninitialized [-Werror=maybe-uninitialized]
```

https://app.circleci.com/pipelines/github/facebook/rocksdb/13295/workflows/1695fb5c-40c1-423b-96b4-45107dc3012d/jobs/360416

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9978

Test Plan:
I was able to re-run in CircleCI with ssh, see the failure, ssh in and
verify that adding -fno-avx512f fixed the failure. Will watch build-linux-unity-and-headers

Reviewed By: riversand963

Differential Revision: D36296028

Pulled By: pdillinger

fbshipit-source-id: ba5955cf2ac730f57d1d18c2f517e92f34be77a3
2022-05-10 15:24:40 -07:00
Peter Dillinger
e78451f3f6 Increase soft open file limit for mini-crashtest on Linux (#9972)
Summary:
CircleCI was using a soft open file limit of 1024 which would
frequently be exceeded during test runs. Now using
```
ulimit -S -n `ulimit -H -n`
```
to set the soft limit up to the hard limit (524288 in my test). I've also
applied this same idiom to applicable existing MacOS configurations to
reduce hard-coded numbers.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9972

Test Plan: CI

Reviewed By: riversand963

Differential Revision: D36262943

Pulled By: pdillinger

fbshipit-source-id: 86320cdf9b68a97fdb73531a7b4a59b4c2d2f73f
2022-05-10 09:51:03 -07:00
Andrew Kryczka
7b7a37c069 Add microbenchmarks for DB::GetMergeOperands() (#9971)
Summary:
The new microbenchmarks, DBGetMergeOperandsInMemtable and DBGetMergeOperandsInSstFile, correspond to the two different LSMs tested: all data in one memtable and all data in one SST file, respectively. Both cases are parameterized by thread count (1 or 8) and merge operands per key (1, 32, or 1024). The SST file case is additionally parameterized by whether data is in block cache or mmap'd memory.
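
For reference, a hedged sketch of a single lookup with the API being benchmarked (the overload and option names are my best understanding of `DB::GetMergeOperands()`, not code from this PR):

```cpp
#include <vector>
#include <rocksdb/db.h>

// Hedged sketch: fetch all merge operands for one key without merging them.
// The operand buffer is sized from expected_max_number_of_operands.
rocksdb::Status ReadMergeOperands(rocksdb::DB* db,
                                  rocksdb::ColumnFamilyHandle* cf,
                                  const rocksdb::Slice& key,
                                  std::vector<rocksdb::PinnableSlice>* out) {
  rocksdb::GetMergeOperandsOptions opts;
  opts.expected_max_number_of_operands = 32;  // tune per workload
  out->resize(opts.expected_max_number_of_operands);
  int num_operands = 0;
  rocksdb::Status s = db->GetMergeOperands(rocksdb::ReadOptions(), cf, key,
                                           out->data(), &opts, &num_operands);
  if (s.ok()) {
    out->resize(num_operands);  // shrink to the number actually returned
  }
  return s;
}
```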

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9971

Test Plan:
```
$ TEST_TMPDIR=/dev/shm/db_basic_bench/ ./db_basic_bench --benchmark_filter=DBGetMergeOperands
The number of inputs is very large. DBGet will be repeated at least 192 times.
The number of inputs is very large. DBGet will be repeated at least 192 times.
2022-05-09T13:15:40-07:00
Running ./db_basic_bench
Run on (36 X 2570.91 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x18)
  L1 Instruction 32 KiB (x18)
  L2 Unified 1024 KiB (x18)
  L3 Unified 25344 KiB (x1)
Load Average: 4.50, 4.33, 4.37
----------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                  Time             CPU   Iterations UserCounters...
----------------------------------------------------------------------------------------------------------------------------
DBGetMergeOperandsInMemtable/entries_per_key:1/threads:1                 846 ns          846 ns       849893 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:32/threads:1               2436 ns         2436 ns       305779 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:1024/threads:1            77226 ns        77224 ns         8152 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:1/threads:8                 116 ns          929 ns       779368 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:32/threads:8                330 ns         2644 ns       280824 db_size=0
DBGetMergeOperandsInMemtable/entries_per_key:1024/threads:8            12466 ns        99718 ns         7200 db_size=0
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:0/threads:1          1640 ns         1640 ns       461262 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:1/threads:1          1693 ns         1693 ns       439936 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:0/threads:1         3999 ns         3999 ns       172881 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:1/threads:1         5544 ns         5543 ns       135657 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:0/threads:1      78767 ns        78761 ns         8395 db_size=19.6389M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:1/threads:1     157242 ns       157238 ns         4495 db_size=19.6389M
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:0/threads:8           231 ns         1848 ns       347768 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:1/mmap:1/threads:8           214 ns         1715 ns       393312 db_size=21.7826M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:0/threads:8          596 ns         4767 ns       142088 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:32/mmap:1/threads:8          720 ns         5757 ns       118200 db_size=19.6981M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:0/threads:8      11613 ns        92460 ns         7344 db_size=19.6389M
DBGetMergeOperandsInSstFile/entries_per_key:1024/mmap:1/threads:8      19989 ns       159908 ns         4440 db_size=19.6389M
```

Reviewed By: jay-zhuang

Differential Revision: D36258861

Pulled By: ajkr

fbshipit-source-id: 04b733e1cc3a4a70ed9baa894c50fdf96c0d6064
2022-05-09 15:17:19 -07:00
Peter Dillinger
c5c58708db Fix format_compatible blowing away its TEST_TMPDIR (#9970)
Summary:
https://github.com/facebook/rocksdb/issues/9961 broke the format_compatible check because of `make clean`
referencing TEST_TMPDIR. The Makefile behavior seems reasonable to me,
so here's a fix in check_format_compatible.sh.

Along the way, this also removes a redundant part of our CircleCI config.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9970

Test Plan: manual run: SHORT_TEST=1 ./tools/check_format_compatible.sh

Reviewed By: riversand963

Differential Revision: D36258172

Pulled By: pdillinger

fbshipit-source-id: d46507f04614e888b414ff23b88d040ae2b5c294
2022-05-09 13:38:46 -07:00
Davide Angelocola
4527bb2fed Fix conversion issues in MutableOptions (#9194)
Summary:
Removing unnecessary checks around conversion from int/long to double as it does not lose information (see https://docs.oracle.com/javase/specs/jls/se9/html/jls-5.html#jls-5.1.2).

For example, `value > Double.MAX_VALUE` is always false when value is long or int.

Can you please have a look adamretter? Also fixed some other minor issues (do you prefer a separate PR?)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9194

Reviewed By: ajkr

Differential Revision: D36221694

fbshipit-source-id: bf327c07386560b87ddc0c98039e8d6e8f2f1e82
2022-05-09 12:34:26 -07:00
Wang Yuan
89571b30e5 Improve the precision of row entry charge in row_cache (#9337)
Summary:
- For the entry charge, we should only count the value size instead of also including the key size in LRUCache
- The string's capacity reflects the memory usage more precisely

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9337

Reviewed By: ajkr

Differential Revision: D36219855

fbshipit-source-id: 393e48ca419d230dc552ae62dd0eb1cc9f45961d
2022-05-09 12:27:38 -07:00
Luca Giacchino
39b6c5791a Improve memkind library detection (#9134)
Summary:
Improve memkind library detection in build_detect_platform:

- The current position of -lmemkind does not work with all versions of gcc
- LDFLAGS allows specifying non-standard library path through EXTRA_LDFLAGS

After the change, the options match TBB detection.
This is a follow-up to https://github.com/facebook/rocksdb/issues/6214.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9134

Reviewed By: ajkr, mrambacher

Differential Revision: D32192028

fbshipit-source-id: 115fafe8d93f1fe6aaf80afb32b2cb67aad074c7
2022-05-09 12:26:09 -07:00
leipeng
9f7968b2ed arena.h: fix Arena::IsInInlineBlock() (#9317)
Summary:
When I enable hugepages on my box, a unit test fails; this PR fixes the issue:

[  FAILED  ] ArenaTest.ApproximateMemoryUsage (1 ms)

memory/arena_test.cc:127: Failure
Value of: arena.IsInInlineBlock()
  Actual: true
Expected: false
arena.IsInInlineBlock() = 1
memory/arena_test.cc:127: Failure
Value of: arena.IsInInlineBlock()
  Actual: true
Expected: false

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9317

Reviewed By: ajkr

Differential Revision: D36219813

fbshipit-source-id: 08d040d9f37ec4c16987e4150c2db876180d163d
2022-05-09 12:21:21 -07:00
Qingyou Meng
7b55b50839 util/ribbon_alg.h: removed duplicate word "vector" (#9216)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9216

Reviewed By: riversand963

Differential Revision: D36219934

fbshipit-source-id: 8253b4e3eacceb8b040eeaa45cd5a50570a4eba6
2022-05-06 18:38:13 -07:00
aierui
d1cc91c142 typo fix: delete duplicate comment word (#9249)
Summary:
typo fix: delete duplicate comment word

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9249

Reviewed By: riversand963

Differential Revision: D36219911

fbshipit-source-id: 01e2fda65590f18fe46eefb56e049e6f2d028ae8
2022-05-06 18:29:33 -07:00
♚ PH⑦ de Soria™♛
9381436bf3 Fixed typo (#9331)
Summary:
Just fixing a very minor typo in the doc block :) Hope it will help anyway 😊

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9331

Reviewed By: riversand963

Differential Revision: D34339823

fbshipit-source-id: b76104bc3efbc9d1f38cbf5c6dd7648dc909ced3
2022-05-06 17:41:07 -07:00
Peter Dillinger
e03d958b91 Clean up variables for temporary directory (#9961)
Summary:
Having all of TMPD, TMPDIR and TEST_TMPDIR as configuration
parameters is confusing. This change simplifies a number of things by
standardizing on TEST_TMPDIR, while still recognizing the old names
also. In detail:
* crash_test.mk also needs to use TEST_TMPDIR for crash test, so put in
shared common.mk (an upgrade of python.mk)
* Always exporting TEST_TMPDIR eliminates the need to propagate TMPD or
export TEST_TMPDIR in selective places.
* Use --tmpdir option to gnu_parallel so that it doesn't need TMPDIR
environment variable
* Remove obsolete parloop and parallel_check Makefile targets
* Remove undefined, unused function ResetTmpDirForDirectIO()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9961

Test Plan: manual + CI

Reviewed By: riversand963

Differential Revision: D36212178

Pulled By: pdillinger

fbshipit-source-id: b76c1876c4f4d38b37789c2779eaa7c3026824dd
2022-05-06 16:38:06 -07:00
Roman Puchkovskiy
00889cf8f2 Never use String#getBytes() in the production code (#9487)
Summary:
There are encodings that are not ASCII-compatible (like cp1140), so it is possible that a JVM is run with a default encoding in which String#getBytes() would return unexpected values even for ASCII strings.

A little bit of context: https://stackoverflow.com/questions/70913929/can-an-encoding-incompatible-with-ascii-encoding-be-set-as-a-default-encoding-in/70914154

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9487

Reviewed By: riversand963

Differential Revision: D34097728

fbshipit-source-id: afd654ecaf20f6d30d9fc20c6a090398de2585eb
2022-05-06 16:22:15 -07:00
sdong
736a7b5433 Remove own ToString() (#9955)
Summary:
ToString() was created because some platforms didn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just removes ToString().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955

Test Plan: Watch CI tests

Reviewed By: riversand963

Differential Revision: D36176799

fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
2022-05-06 13:03:58 -07:00
Andrew Kryczka
62d84e2a2b db_stress fault injection in release mode (#9957)
Summary:
Previously all fault injection was ignored in release mode. This PR adds it back except for read fault injection (`--read_fault_one_in > 0`) since its dependency (`IGNORE_STATUS_IF_ERROR`) is unavailable in release mode.

Other notable changes include:

- Moved `EnableWriteErrorInjection()` for `--write_fault_one_in > 0` so it's after `DB::Open()` without depending on `SyncPoint`
- Made `--read_fault_one_in > 0` return an error in release mode
- Updated `db_crashtest.py` to always set `--read_fault_one_in=0` in release mode

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9957

Test Plan:
```
$ DEBUG_LEVEL=0 make -j24 db_stress
$ DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox
```

Reviewed By: anand1976

Differential Revision: D36193830

Pulled By: ajkr

fbshipit-source-id: 0b97946b4e3f06e3e0f6e7833c2763da08ec5321
2022-05-06 11:17:08 -07:00
Otto Kekäläinen
b7aaa98762 Fix various spelling errors still found in code (#9653)
Summary:
dont -> don't
refered -> referred

This is a re-run of PR#7785 and acc9679 since these typos keep coming back.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9653

Reviewed By: jay-zhuang

Differential Revision: D34879593

fbshipit-source-id: d7631fb779ea0129beae92abfb838038e60790f8
2022-05-05 19:45:32 -07:00
Andrew Kryczka
a62506aee2 Enable unsynced data loss in crash test (#9947)
Summary:
`db_stress` already tracks expected state history to verify prefix-recoverability when `sync_fault_injection` is enabled. This PR enables `sync_fault_injection` in `db_crashtest.py`.

Previously enabling `sync_fault_injection` would cause whole unsynced files to be dropped. This PR adds a more interesting case of losing only the tail of unsynced data by implementing `TestFSWritableFile::RangeSync()` and enabling `{wal_,}bytes_per_sync`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9947

Test Plan:
- regular blackbox, blackbox --simple
- various commands to stress this new case, such as `TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=100000 --write_buffer_size=2097152 --avoid_flush_during_recovery=1 --disable_wal=0 --interval=10 --db_write_buffer_size=0 --sync_fault_injection=1 --wal_compression=none --delpercent=0 --delrangepercent=0 --prefixpercent=0 --iterpercent=0 --writepercent=100 --readpercent=0 --wal_bytes_per_sync=131072 --duration=36000 --sync=0 --open_write_fault_one_in=16`

Reviewed By: riversand963

Differential Revision: D36152775

Pulled By: ajkr

fbshipit-source-id: 44b68a7fad0a4cf74af9fe1f39be01baab8141d8
2022-05-05 13:21:03 -07:00
sdong
49628c9a83 Use std::numeric_limits<> (#9954)
Summary:
Right now we still don't fully use std::numeric_limits but use a macro instead, mainly for supporting VS 2013. We now only support VS 2017 and up, so it is not a problem. The code comment claims that MinGW still needs it; we don't have CI running MinGW, so it's hard to validate. Since we now require C++17, it's hard to imagine MinGW would still build RocksDB but not support std::numeric_limits<>.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9954

Test Plan: See CI Runs.

Reviewed By: riversand963

Differential Revision: D36173954

fbshipit-source-id: a35a73af17cdcae20e258cdef57fcf29a50b49e0
2022-05-05 13:08:21 -07:00