Summary:
Right now, PosixRandomAccessFile::MultiRead() executes read requests in parallel. In this PR, it leverages I/O Uring library to run it in parallel, even when page cache is enabled. This function will fall back if the kernel version doesn't support it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881
Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run a unit test on kernel version supporting it and see it pass. Before merging, will also run stress test and see it passes.
Differential Revision: D17742266
fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a
Summary:
Had complications with LITE build and valgrind test.
Reverts/fixes small parts of PR https://github.com/facebook/rocksdb/issues/6007
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6036
Test Plan:
make LITE=1 all check
and
ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 make -j24 db_bloom_filter_test && ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 ./db_bloom_filter_test
Differential Revision: D18512238
Pulled By: pdillinger
fbshipit-source-id: 37213cf0d309edf11c483fb4b2fb6c02c2cf2b28
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.
Speed
The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.
Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):
$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
Single filter net ns/op: 26.2825
Random filter net ns/op: 150.459
Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
Single filter net ns/op: 63.2978
Random filter net ns/op: 188.038
Average FP rate %: 1.13823
Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.
The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.
Accuracy
The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.
Accuracy data (generalizes, except old impl gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)
Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120
Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.
Compatibility
Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007
Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).
Differential Revision: D18294749
Pulled By: pdillinger
fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f
Summary:
include db_stress_tool in rocksdb tools lib
Test Plan (on devserver):
```
$make db_stress
$./db_stress
$make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950
Differential Revision: D18044399
Pulled By: riversand963
fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1
Summary:
Users may desire to specify extra dependencies via buck. This PR allows users to pass additional dependencies as a JSON object so that the buckifier script can generate TARGETS file with desired extra dependencies.
Test plan (on dev server)
```
$python buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}'
Generating TARGETS
Extra dependencies:
{'': {'extra_compiler_flags': [], 'extra_deps': []}, 'test_dep1': {'extra_compiler_flags': ['-O2', '-DROCKSDB_LITE'], 'extra_deps': [':fake', '//dep1/mock']}}
Generated TARGETS Summary:
- 5 libs
- 0 binarys
- 296 tests
```
Verify the TARGETS file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5648
Differential Revision: D16565043
Pulled By: riversand963
fbshipit-source-id: a6ef02274174fcf159692d7b846e828454d01e89
Summary:
Update buckifier templates in the scripts.
Test plan (on devserver)
```
$python buckifier/buckify_rocksdb.py
```
Then
```
$git diff
```
Verify that generated TARGETS file is the same (except for indentation).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5647
Differential Revision: D16555647
Pulled By: riversand963
fbshipit-source-id: 32574a4d0e820858eab2391304dd731141719bcd
Summary:
This change adds a Dynamic Library class to the RocksDB Env. Dynamic libraries are populated via the Env::LoadLibrary method.
The addition of dynamic library support allows for a few different features to be developed:
1. The compression code can be changed to use dynamic library support. This would allow RocksDB to determine at run-time what compression packages were installed. This change would eliminate the need to make sure the build-time and run-time environment had the same library set. It would also simplify some of the Java build issues (where it attempts to build and include various packages inside the RocksDB jars).
2. Along with other features (to be provided in a subsequent PR), this change would allow code/configurations to be added to RocksDB at run-time. For example, the build system includes code for building an "rados" environment and adding "Cassandra" features. Instead of these extensions being built into the base RocksDB code, these extensions could be loaded at run-time as required/appropriate, either by configuration or explicitly.
We intend to push out other changes in support of the extending RocksDB at run-time via configurations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5281
Differential Revision: D15447613
Pulled By: riversand963
fbshipit-source-id: 452cd4f54511c0bceee18f6d9d919aae9fd25fef
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
Summary:
Declare Jemalloc non-standard APIs as weak symbols, so that if Jemalloc is linked with the binary, these symbols will be replaced by Jemalloc's, otherwise they will be nullptr. This is similar to how folly detect jemalloc, but we assume the main program use jemalloc as long as jemalloc is linked: https://github.com/facebook/folly/blob/master/folly/memory/Malloc.h#L147
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4844
Differential Revision: D13574934
Pulled By: yiwu-arbug
fbshipit-source-id: 7ea871beb1be7d5a1259cc38f9b78078793db2db
Summary:
Don't enable ROCKSDB_JEMALLOC unless the build mode is opt and default
allocator is jemalloc. In dev mode, this is causing compile/link errors such as -
```
stderr: buck-out/dev/gen/rocksdb/src/rocksdb_lib#compile-pic-malloc_stats.cc.o4768b59e,gcc-5-glibc-2.23-clang/db/malloc_stats.cc.o:malloc_stats.cc:function rocksdb::DumpMallocStats(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*): error: undefined reference to 'malloc_stats_print'
clang-7.0: error: linker command failed with exit code 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4747
Differential Revision: D13324840
Pulled By: anand1976
fbshipit-source-id: 45ffbd4f63fe4d9e8a0473d8f066155e4ef64a14
Summary:
Set the macro if default allocator is jemalloc. It doesn't handle the case when allocator is specified, e.g.
```
cpp_binary(
name="xxx"
allocator="jemalloc", # or "malloc" or something else
deps=["//rocksdb:rocksdb"],
)
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4489
Differential Revision: D10363683
Pulled By: yiwu-arbug
fbshipit-source-id: 5da490336a8e78e0feb0900c29e8036e7ec6f12b
Summary: There were a few files that were missed when AutoHeaders were moved to their own file. Add explicit loads
Reviewed By: yfeldblum
Differential Revision: D9499942
fbshipit-source-id: 942bf3a683b8961e1b6244136f6337477dcc45af
Summary:
Two CI tests never pass because of the environment problem. Delete them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4110
Differential Revision: D8805713
Pulled By: siying
fbshipit-source-id: 6eb4813dc2094ee2045ec8ede7fe8967d546d6e8
Summary:
-Wshorten-64-to-32 is invalid flag in fbcode. Changing it to -Warrowing.
Closes https://github.com/facebook/rocksdb/pull/4028
Differential Revision: D8553694
Pulled By: yiwu-arbug
fbshipit-source-id: 1523cbcb4c76cf1d2b10a4d28b5f58c78e6cb876
Summary:
Some flags used via make were not applied in the buckifier/targets file, causing some failures to be missed by testing infra ( ie the one fixed by #3434 )
Closes https://github.com/facebook/rocksdb/pull/3452
Differential Revision: D7457419
Pulled By: gfosco
fbshipit-source-id: e4aed2915ca3038c1485bbdeebedfc33d5704a49
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.
Reviewed By: yiwu-arbug
Differential Revision: D6821806
fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
Summary:
We're moving away from `import`. The equivalent internal construct that
gets the directory from `fbcode/` is `package_name()`. This is a
Skylark friendly wrapper around [`get_base_path`].
The additional whitespace change is from running `python ./buckifier/buckify_rocksdb.py`.
[`get_base_path`]: https://buckbuild.com/function/get_base_path.html
Closes https://github.com/facebook/rocksdb/pull/3210
Reviewed By: yiwu-arbug
Differential Revision: D6451242
Pulled By: zertosh
fbshipit-source-id: 445757261de0ec89d5d332c1ba9af097086326dc
Summary:
Do not build the tests in opt mode, since SyncPoint and other test code will not be included.
Closes https://github.com/facebook/rocksdb/pull/3204
Differential Revision: D6431154
Pulled By: yiwu-arbug
fbshipit-source-id: c404ef042c1a6f679e5c1dc57600b3d8cb52fc28
Summary:
We don't need to set them explicitly.
Closes https://github.com/facebook/rocksdb/pull/2660
Differential Revision: D5514141
Pulled By: yiwu-arbug
fbshipit-source-id: 10edebfc3cfe0afc00a34519f87fcea4d65069ae
Summary:
simply enable the macro in internal build, it wont hurt other sanitizers and will fix UBSAN issues
Closes https://github.com/facebook/rocksdb/pull/2625
Differential Revision: D5475897
Pulled By: IslamAbdelRahman
fbshipit-source-id: 262c6fd5de3c1906f4b29e55b39110f125f41057
Summary:
Instead of hard coding the path of the internal repo.
Make TARGETS file work anywhere in fbcode
Closes https://github.com/facebook/rocksdb/pull/2586
Differential Revision: D5428122
Pulled By: IslamAbdelRahman
fbshipit-source-id: 21adec82bfbff14ea93532bee789b5f5bbee5b01
Summary:
1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
2. add blob_db_test
Closes https://github.com/facebook/rocksdb/pull/2506
Differential Revision: D5331517
Pulled By: yiwu-arbug
fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
Summary: Build Java and RocksDB LITE as a customized unit test under internal_repo_rocksdb. One thing I'm not sure is that whether these two tests are triggered in every flavor.
Reviewed By: IslamAbdelRahman
Differential Revision: D4855868
fbshipit-source-id: 82a1628b458744d7692bbd29ef7424cca1294031