Summary:
Was seeing errors like the following when using COMPILE_WITH_ASAN=1 without USE_CLANG=1:
./cache_test: error while loading shared libraries: libasan.so.5: cannot open shared object file: No such file or directory
Now including the compiler's libs in the runtime ld path.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8402
Test Plan: reproduced with local builds
Reviewed By: akankshamahajan15
Differential Revision: D29107729
Pulled By: pdillinger
fbshipit-source-id: 13805b87b846b39522c9dd6a231ca245c58f1c71
Summary:
Internal builds were failing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8399
Test Plan:
I can reproduce a failure by putting a bad version of `as` in
my PATH. This indicates that before this change, the custom compiler was
wrongly relying on the host `as`. This change fixes that, so the bad
`as` on PATH is ignored.
Reviewed By: akankshamahajan15
Differential Revision: D29094159
Pulled By: pdillinger
fbshipit-source-id: c432e90404ea4d39d885a685eebbb08be9eda1c8
Summary:
platform007 is being phased out and is sometimes broken.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8389
Test Plan: `make V=1` to see which compiler is being used
Reviewed By: jay-zhuang
Differential Revision: D29067183
Pulled By: pdillinger
fbshipit-source-id: d1b07267cbc55baa9395f2f4fe3967cc6dad52f7
Summary:
By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat it as 1, which means RocksDB will try to build with liburing. For cmake, add a WITH_LIBURING option to control it, on by default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322
Test Plan: Build using cmake and make.
Reviewed By: anand1976
Differential Revision: D28586498
fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5
Summary:
Forgot to re-test the crash test after adding read-only filesystem
enforcement to https://github.com/facebook/rocksdb/issues/8142. The problem is that ReadOnlyFileSystem would reject
CreateDirIfMissing whenever DBOptions::create_if_missing=true. The fix
that is better for users is to allow CreateDirIfMissing in
ReadOnlyFileSystem if the directory exists, so that using create_if_missing
when opening backups as read-only DBs does not fail. Added this option
combination to the unit test (in addition to it being in the crash test).
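The policy described above (allow CreateDirIfMissing only when there is nothing to create) can be sketched with std::filesystem. This is a hypothetical stand-alone illustration, not RocksDB's ReadOnlyFileSystem: the function name `ReadOnlyCreateDirIfMissing` and the `IoResult` enum are assumptions standing in for the real FileSystem method and `IOStatus`.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical stand-in for RocksDB's IOStatus in this sketch.
enum class IoResult { kOk, kNotSupported };

// Read-only policy sketch: succeed when the directory already exists (no
// write is needed), reject only when the call would have to create it.
IoResult ReadOnlyCreateDirIfMissing(const std::string& dir) {
  std::error_code ec;  // non-throwing overload; errors count as "missing"
  if (std::filesystem::is_directory(dir, ec)) {
    return IoResult::kOk;  // nothing to do, so allowed in read-only mode
  }
  return IoResult::kNotSupported;  // creating the directory would be a write
}
```

With this shape, opening an existing backup directory with create_if_missing=true succeeds, while a genuinely missing directory is still reported as an error instead of being created.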
Also fixed a couple of lint warnings,
and improved the messaging from 'make format' so that when you run it
with uncommitted changes, it's clear that it's only checking the
uncommitted changes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8161
Test Plan: local blackbox_crash_test with amplified backup_one_in
Reviewed By: ajkr
Differential Revision: D27614409
Pulled By: pdillinger
fbshipit-source-id: 63ccb626c7e34c200d61c6bca2a8f60da9015179
Summary:
At least under MacOS, some things were excluded from the build (like Snappy) because the compilation flags were not passed in correctly. This PR does a few things:
- Passes EXTRA_CXXFLAGS/EXTRA_LDFLAGS into build_detect_platform. This means that if some tool (like TBB, for example) is not installed in a standard place, it can still be detected by build_detect_platform. In this case, the developer would invoke "EXTRA_CXXFLAGS=-I<path to TBB include> EXTRA_LDFLAGS=-L<path to TBB library> make", and the build script would find the tools in the extra location.
- Changes the compilation tests to use PLATFORM_CXXFLAGS. This causes the EXTRA_*FLAGS passed in to the script to be included in the compilation checks. Additionally, flags set by the script itself (like --std=c++11) will be used during the checks.
Validated that the make_platform.mk file generated on Linux is unchanged by this PR. On my MacOS machine, the Snappy libraries are now available (they were not before, as they required --std=c++11 to build).
I also verified that I can build against TBB installed on my Mac by passing EXTRA_CXXFLAGS and EXTRA_LDFLAGS that point to the location where TBB is installed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8111
Reviewed By: jay-zhuang
Differential Revision: D27353516
Pulled By: mrambacher
fbshipit-source-id: b6b378c96dbf678bab1479556dcbcb49c47e807d
Summary:
If the platform is ppc64 and the libc is not GNU libc, then we exclude the range_tree from compilation.
See https://jira.percona.com/browse/PS-7559
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8070
Reviewed By: jay-zhuang
Differential Revision: D27246004
Pulled By: mrambacher
fbshipit-source-id: 59d8433242ce7ce608988341becb4f83312445f5
Summary:
Extract test cases correctly in the run_ci_db_test.ps1 script.
Some new test groups end with # comments. Previously, the regex the script uses to extract test groups and test cases did not handle this case, so concatenating some test groups and test cases failed (see examples in comments).
Also removed unnecessary trailing whitespace in the script.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7989
Reviewed By: jay-zhuang
Differential Revision: D26615909
Pulled By: ajkr
fbshipit-source-id: 8e68d599994f17d6fefde0daa925c3018179521a
Summary:
Following an offline discussion, we use the actual URL of clang-format-diff.py and add a note.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7950
Reviewed By: pdillinger
Differential Revision: D26370822
Pulled By: riversand963
fbshipit-source-id: 7508e23c002d56d5c1649090438ef5f8ff2cdbe7
Summary:
Recent GitHub Actions format-checking runs fail because of an invalid location
from which clang-format-diff.py is downloaded. Update the path to point
to a stable, archived location.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7944
Test Plan: manually checked the result of the GitHub Action.
Reviewed By: ltamasi
Differential Revision: D26345066
Pulled By: riversand963
fbshipit-source-id: 2b1a58c2e59c2f1eb11202d321d2ea002cb0917e
Summary:
This disables Linux/amd64 builds in Travis for PRs and adds a
gcc-10 + C++20 build in CircleCI, which should provide coverage comparable
to what we had in Travis.
Fixed a use of std::is_pod, which is deprecated in C++20.
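The commit does not say which replacement was used for std::is_pod. One standard-conforming option, sketched here with hypothetical type names (`IsPodLike`, `PlainRecord`, `HasUserCtor`), is to spell out the two traits that POD combines: trivial and standard-layout.

```cpp
#include <type_traits>

// "POD" means trivial + standard-layout, so this combination replaces the
// deprecated std::is_pod<T> without changing which types qualify.
template <typename T>
struct IsPodLike
    : std::integral_constant<bool, std::is_trivial<T>::value &&
                                       std::is_standard_layout<T>::value> {};

struct PlainRecord {  // hypothetical example: plain aggregate
  int key;
  char tag;
};

struct HasUserCtor {  // hypothetical example: user-provided ctor
  HasUserCtor() : key(0) {}  // makes the type non-trivial
  int key;
};

static_assert(IsPodLike<PlainRecord>::value, "plain struct is POD-like");
static_assert(!IsPodLike<HasUserCtor>::value, "user ctor => not trivial");
```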
Fixed ++ on a volatile in db_repl_stress.cc, with a bigger refactoring.
Although ++ on this volatile was probably OK with one writer thread and
one reader thread, the code was still overly complex. There was a
dead-code error check,
`if (replThread.no_read < dataPump.no_records)`, which can be proven
never to trigger given the structure of the code; in the case it was
meant to catch, the code loops forever instead. I simplified the code so
that it should have the same checking power.
Also, most configurations seem to use make parallelism = 2 * vcores,
so this change uses that setting as well.
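The refactored db_repl_stress code is not reproduced here. As a general illustration of the one-writer/one-reader pattern, and of why `++` on a `volatile` (deprecated in C++20) is not a synchronization primitive, here is a minimal sketch using `std::atomic`; all names (`records_written`, `Writer`, `Reader`, `RunPump`) are hypothetical.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// std::atomic gives well-defined cross-thread visibility; volatile does not.
std::atomic<uint64_t> records_written{0};

void Writer(uint64_t num_records) {
  for (uint64_t i = 0; i < num_records; ++i) {
    // ... write record i ...
    records_written.fetch_add(1, std::memory_order_release);
  }
}

// Consumes records as they become visible; returns how many were consumed.
uint64_t Reader(uint64_t num_records) {
  uint64_t consumed = 0;
  while (consumed < num_records) {
    uint64_t available = records_written.load(std::memory_order_acquire);
    while (consumed < available) {
      // ... read record `consumed` ...
      ++consumed;
    }
    if (consumed < num_records) std::this_thread::yield();
  }
  return consumed;
}

// Runs one writer and one reader to completion.
uint64_t RunPump(uint64_t num_records) {
  records_written.store(0);
  uint64_t consumed = 0;
  std::thread reader([&] { consumed = Reader(num_records); });
  std::thread writer(Writer, num_records);
  writer.join();
  reader.join();
  return consumed;
}
```

Because the reader only waits for a count the writer is guaranteed to reach, the "reader saw fewer records than were written" condition cannot arise here; a lost update would show up as a hang, mirroring the commit's observation about the original check.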
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7791
Test Plan:
CI
and `while ./db_repl_stress; do echo again; done` for a while
Reviewed By: siying
Differential Revision: D25669834
Pulled By: pdillinger
fbshipit-source-id: b2c688053d0b1d52c989903449d3cd27a04130d6
Summary:
Expands on https://github.com/facebook/rocksdb/pull/7016 so that when `PORTABLE=1` is set the dependencies for RocksJava static target will also be built with backwards compatibility for MacOS as far back as 10.12 (i.e. 2016).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7683
Reviewed By: ajkr
Differential Revision: D25034164
Pulled By: pdillinger
fbshipit-source-id: dc9e51828869ed9ec336a8a86683e4d0bfe04f27
Summary:
`llvm-mirror/clang` is archived. Get the `clang-format-diff.py` file from the active source.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7609
Reviewed By: ajkr
Differential Revision: D24711608
Pulled By: pdillinger
fbshipit-source-id: b115d8765ff23fbb8190290a170de21565daba84
Summary:
My previous change, which used lib2to3 to migrate clang-format-diff.py
for Python 2, only works if there's nothing to reformat. Instead, give
instructions to download it to REPO_ROOT.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7603
Test Plan: Try the instructions on a fresh CentOS 8 devserver
Reviewed By: riversand963
Differential Revision: D24569608
Pulled By: pdillinger
fbshipit-source-id: 1410ba163e016b226e883dec93fae3df9ed0eab2
Summary:
These new functions and 128-bit value bit operations are
expected to be used in a forthcoming Bloom filter alternative.
No functional changes to production code: just new code that is only called
by unit tests, cosmetic changes to existing headers, and a fix to an existing
function for a yet-unused template instantiation (BitsSetToOne on
a signed type smaller than 32 bits).
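The hazard with signed types narrower than 32 bits is integer promotion: `int8_t{-1}` promotes to a 32-bit `-1`, so a naive popcount reports 32 set bits instead of 8. The actual RocksDB fix is not shown in this log; the following portable sketch (hypothetical name `BitsSetToOneSketch`) avoids the problem by converting to the unsigned type of the same width first.

```cpp
#include <cstdint>
#include <type_traits>

// Counts set bits in the value's own width. Converting to the same-width
// unsigned type before any arithmetic prevents sign extension of negative
// int8_t/int16_t values.
template <typename T>
int BitsSetToOneSketch(T v) {
  static_assert(std::is_integral<T>::value, "integral types only");
  using U = typename std::make_unsigned<T>::type;
  U u = static_cast<U>(v);
  int count = 0;
  while (u != 0) {
    u = static_cast<U>(u & (u - 1));  // clear the lowest set bit
    ++count;
  }
  return count;
}
```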
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7338
Test Plan:
Unit tests included. Works with and without
TEST_UINT128_COMPAT=1 to check compatibility with and without
__uint128_t. Also added that parameter to the CircleCI build
build-linux-shared_lib-alt_namespace-status_checked.
Reviewed By: jay-zhuang
Differential Revision: D23494945
Pulled By: pdillinger
fbshipit-source-id: 5c0dc419100d9df5d4d9abb153b2855d5aea39e8
Summary:
RocksDB regression commands were exiting with an error:
/usr/bin/ar: creating librocksdb.a
/usr/bin/ld: ./cache/cache.o: relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a shared object; recompile with -fPIC
Bug: the build tries to link objects compiled for the static library (without -fPIC) into a shared library.
Fix: Added make clean before building shared_lib.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7300
Test Plan:
make clean
make -j$(nproc) static_lib
make -j$(nproc) shared_lib
Reviewed By: pdillinger
Differential Revision: D23276842
Pulled By: akankshamahajan15
fbshipit-source-id: c2e69fa505893ad414786794fc486f3f22f059d5
Summary:
Some hosts failed to build platform009 with gcc. Revert the default to platform007 if USE_CLANG is not specified.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7253
Test Plan: Build with both of USE_CLANG=1 set and not set and observe it builds successfully, and see the tool chain used.
Reviewed By: jay-zhuang
Differential Revision: D23110550
fbshipit-source-id: 25cb47923f7174b24debdad0cc8d90b07c4d5d09
Summary:
Upgrade tool chain to the latest. It is done mostly manually as build_tools/build_detect_platform fails to update many of them.
Try to fix a new clang analyze warning with the new tool chain.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7251
Test Plan: "make all", "USE_CLANG=1 make all"
Reviewed By: riversand963
Differential Revision: D23091090
fbshipit-source-id: 732e5a30137837431438f85f36296406b641f975
Summary:
`USE_LTO=1` in `make` commands now enables LTO. The archiver (`ar`) needed
to change in this PR to use a wrapper that enables the LTO plugin.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7181
Test Plan:
build a few ways
```
$ make clean && USE_LTO=1 make -j48 db_bench
$ make clean && USE_CLANG=1 USE_LTO=1 make -j48 db_bench
$ make clean && ROCKSDB_NO_FBCODE=1 USE_LTO=1 make -j48 db_bench
```
Reviewed By: cheng-chang
Differential Revision: D22784994
Pulled By: ajkr
fbshipit-source-id: 9c45333bd49bf4615aa04c85b7c6fd3925421152
Summary:
Change the linking of tests/tools to be against a library rather than a list of objects. This change substantially reduces the size of the objects produced.
peterd clean repo size: 264M
Before this change, with make all: 40G
After this change, with make all: 28G
With make LIB_MODE=shared all: 7.0G
The list of TESTS was changed from being hard-coded to generated from the test sources variable. Note that there are some test sources that are not built as tests (though the set of tests is identical to the previous version).
Added OBJ_DIR option to Makefile to allow objects to be placed in an alternative location. By default, OBJ_DIR is the same as before ("./").
This change is a precursor to being able to build/run the tests/tools linked against static libraries. Additionally, it should be possible to clean up and merge some of the rules for building tests and the like if so desired.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6660
Reviewed By: riversand963
Differential Revision: D22244463
Pulled By: pdillinger
fbshipit-source-id: db9c6341d81ed62c2270374f4ede02fb9604c754
Summary:
When `PORTABLE=1` is set, RocksDB will now be built with backwards compatibility for MacOS as far back as 10.12 (i.e. 2016).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7016
Reviewed By: ajkr
Differential Revision: D22211312
Pulled By: pdillinger
fbshipit-source-id: 7b0858d9b55d6265d3ea27bf5ea1673639b6538c
Summary:
The RocksDB Makefile was assuming the existence of a 'python' command,
which is not present in CentOS 8. We now avoid using 'python' if 'python3' is available.
Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even when only Python3 is installed (as on some CentOS 8 FB machines).
Also, PYTHON now defaults to just 'python3' if neither is found, so that an informative
"command not found" error results rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883
Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.
Reviewed By: gg814
Differential Revision: D21767029
Pulled By: pdillinger
fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
Summary:
* Add missing unit test for schema stability of FileChecksumGenCrc32c
(previously was only comparing to itself)
* A lot of clarifying comments
* Add some assertions for preconditions
* Rename WritableFileWriter::CalculateFileChecksum -> UpdateFileChecksum
* Simplify FileChecksumGenCrc32c with shared functions
* Implement EndianSwapValue to replace unused EndianTransform
And incidentally, since I had trouble with the 'make check-format' GitHub Action disagreeing with my local run,
* Output full diagnostic information when 'make check-format' fails in CI
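RocksDB's EndianSwapValue itself is not reproduced in this log. As a minimal sketch of the same idea, here is a generic byte-order reversal for integral types; the name `EndianSwapValueSketch` is hypothetical, and the loop is a portable illustration rather than the production implementation.

```cpp
#include <cstddef>
#include <type_traits>

// Reverses the byte order of an integral value, e.g. 0x12345678 -> 0x78563412.
template <typename T>
T EndianSwapValueSketch(T v) {
  static_assert(std::is_integral<T>::value, "integral types only");
  using U = typename std::make_unsigned<T>::type;
  U in = static_cast<U>(v);
  U out = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    out = static_cast<U>((out << 8) | (in & 0xFF));  // append lowest byte
    in = static_cast<U>(in >> 8);                    // move to next byte
  }
  return static_cast<T>(out);
}
```

Applying the swap twice returns the original value, which makes it a natural fit for converting checksums between host order and a fixed on-disk byte order.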
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6861
Test Plan: new unit test passes before & after other changes
Reviewed By: zhichao-cao
Differential Revision: D21667115
Pulled By: pdillinger
fbshipit-source-id: 6a99970f87605aa024fa540c78cd519ff322c3e6
Summary:
Add a GitHub Action to perform some basic sanity checks for PRs, including the
following.
1) Buck TARGETS file.
On the one hand, the TARGETS file is used for internal buck, and we do not
manually update it. On the other hand, we need to run the buckifier scripts to
update TARGETS whenever new files are added, etc. With this GitHub Action, we
make sure that no PR forgets this step. The GH Action uses
a Makefile target called check-buck-targets. Users can manually run
`make check-buck-targets` on a local machine.
2) Code format
We use clang-format-diff.py to format our code. The GH Action in this PR makes
sure this step is not skipped. The checking script build_tools/format-diff.sh assumes that `clang-format-diff.py` is executable.
On the host running the GH Action, it is difficult to download `clang-format-diff.py` and make it
executable. Therefore, we modified build_tools/format-diff.sh to handle the case in which there is a non-executable clang-format-diff.py file in the top-level rocksdb repo directory.
Test Plan (Github and devserver):
Watch for Github Action result in the `Checks` tab.
On dev server
```
make check-format
make check-buck-targets
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6761
Reviewed By: pdillinger
Differential Revision: D21260209
Pulled By: riversand963
fbshipit-source-id: c646e2f37c6faf9f0614b68aa0efc818cff96787
Summary:
Nasty bug in which more or different changes would be applied than
those shown to the user.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6772
Test Plan: manual
Reviewed By: siying
Differential Revision: D21304604
Pulled By: pdillinger
fbshipit-source-id: 7e20740e513c9c300d1522511290a025b35abedc
Summary:
Based on https://github.com/facebook/rocksdb/issues/6648 (CLA Signed), but heavily modified / extended:
* Implicit capture of this via [=] deprecated in C++20, and [=,this] not standard before C++20 -> now using explicit capture lists
* Implicit copy operator deprecated in gcc 9 -> add explicit '= default' definition
* std::random_shuffle deprecated in C++17 and removed in C++20 -> migrated to a replacement in RocksDB random.h API
* Add the ability to build with a different std version through -DCMAKE_CXX_STANDARD=11/14/17/20 on the cmake command line
* MSVC's minimal rebuild flag is deprecated and is forbidden with /std:c++latest (C++20)
* Added MSVC 2019 C++11 & MSVC 2019 C++20 in AppVeyor
* Added GCC 9 C++11 & GCC 9 C++20 in Travis
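The first three migrations in the list above can be illustrated in a few lines. This is a hypothetical sketch (`ScaledSummer` and `ShuffleSketch` are not RocksDB code), and it uses `std::shuffle` directly, whereas the commit migrated to a replacement in RocksDB's random.h API.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

class ScaledSummer {
 public:
  explicit ScaledSummer(int scale) : scale_(scale) {}
  // User-provided copy constructor: the implicit copy assignment is then
  // deprecated (GCC 9 warns via -Wdeprecated-copy), so default it explicitly.
  ScaledSummer(const ScaledSummer& other) : scale_(other.scale_) {}
  ScaledSummer& operator=(const ScaledSummer&) = default;

  int Sum(const std::vector<int>& xs) const {
    int total = 0;
    // Explicit capture list instead of [=], whose implicit capture of
    // `this` is deprecated in C++20.
    auto add = [this, &total](int x) { total += x * scale_; };
    std::for_each(xs.begin(), xs.end(), add);
    return total;
  }

 private:
  int scale_;
};

// std::random_shuffle was removed in C++17; std::shuffle takes an explicit
// random engine instead.
void ShuffleSketch(std::vector<int>* v, uint32_t seed) {
  std::mt19937 rng(seed);
  std::shuffle(v->begin(), v->end(), rng);
}
```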
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6697
Test Plan: make check and CI
Reviewed By: cheng-chang
Differential Revision: D21020318
Pulled By: pdillinger
fbshipit-source-id: 12311be5dbd8675a0e2c817f7ec50fa11c18ab91
Summary:
Improve it in two ways:
1. tools/check_format_compatible.sh is not friendly to run outside the FB environment. Remove the hard-coded HTTP proxy setting and move it to the Legocastle configuration instead.
2. Always disable warnings-as-errors, so that older builds are more likely to pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6702
Test Plan: Run the test and make sure at least it doesn't break.
Reviewed By: riversand963
Differential Revision: D21033329
fbshipit-source-id: 88b4ec1ec49547b772790050a165466bdc4a62a0
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread-local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterators as well once it's proven stable.
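The thread-local tracking idea can be sketched as follows. The struct and function names (`ThreadErrorState`, `EnableReadErrors`, `ShouldInjectReadError`, `InjectedCount`) are hypothetical and are not FaultInjectionTestFS's API; the sketch only shows why per-thread state lets each stress thread reconcile its own observed errors without coordinating with other threads.

```cpp
#include <cstdint>
#include <random>
#include <thread>

// Hypothetical per-thread injection state.
struct ThreadErrorState {
  bool enabled = false;
  uint32_t one_in = 0;          // inject roughly 1 of every `one_in` reads
  uint64_t injected_count = 0;  // errors injected by this thread so far
  std::minstd_rand rng{42};
};

thread_local ThreadErrorState tls_state;

void EnableReadErrors(uint32_t one_in, uint32_t seed) {
  tls_state.enabled = true;
  tls_state.one_in = one_in;
  tls_state.rng.seed(seed);
  tls_state.injected_count = 0;
}

// Called from a (hypothetical) read path; true means "fail this read".
bool ShouldInjectReadError() {
  if (!tls_state.enabled || tls_state.one_in == 0) return false;
  if (tls_state.rng() % tls_state.one_in != 0) return false;
  ++tls_state.injected_count;
  return true;
}

// A Get that fails is "expected" only if this count went up during the call.
uint64_t InjectedCount() { return tls_state.injected_count; }

// Runs `reads` simulated reads on a new thread; returns that thread's count.
uint64_t CountInjectedOnNewThread(bool enable, uint32_t one_in, int reads) {
  uint64_t result = 0;
  std::thread t([&] {
    if (enable) EnableReadErrors(one_in, /*seed=*/7);
    for (int i = 0; i < reads; ++i) ShouldInjectReadError();
    result = InjectedCount();
  });
  t.join();
  return result;
}
```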
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538
Test Plan:
crash_test
make check
Reviewed By: riversand963
Differential Revision: D20714347
Pulled By: anand1976
fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
**Performance**
We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.
The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.
![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)
**Changes**
- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator
**Minimum Requirements**
- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind
**Memory Configuration**
The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on [memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.
Notes on memory allocation with NVDIMM memory exposed as system memory:
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.
**Usage**
When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):
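```cpp
#include <memory>

#include "memory/memkind_kmem_allocator.h"
#include "rocksdb/cache.h"

// `capacity` is a placeholder for the desired cache size in bytes.
std::shared_ptr<ROCKSDB_NAMESPACE::Cache> cache =
    ROCKSDB_NAMESPACE::NewLRUCache(
        capacity /*size_t*/,
        6 /*num_shard_bits*/,
        false /*strict_capacity_limit*/,
        0.0 /*high_pri_pool_ratio (a double, not a bool)*/,
        std::make_shared<ROCKSDB_NAMESPACE::MemkindKmemAllocator>());
```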
Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214
Reviewed By: cheng-chang
Differential Revision: D19292435
fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
Summary:
In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11.
To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512
Differential Revision: D20388296
Pulled By: pdillinger
fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1
Summary:
Check for sys/auxv.h and getauxval before using them as they are not
always available (for example on uclibc)
Signed-off-by: Fabrice Fontaine <fontaine.fabrice@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6359
Differential Revision: D20239797
fbshipit-source-id: 175a098094d81545628c2372e7c388e70a32fd48
Summary:
We found bugs related to IO Uring. Turn it off by default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6405
Test Plan: Manually run build_tools/build_detect_platform and observe outputs.
Differential Revision: D19862792
fbshipit-source-id: 5d5e8e2762997b72a145ae59389ef3d7e4ccd060
Summary:
I set up a mirror of our Java deps on github so we can download
them through github URLs rather than maven.org, which is proving
terribly unreliable from Travis builds.
Also sanitized calls to curl, so they are easier to read and
appropriately fail on download failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348
Test Plan: CI
Differential Revision: D19633621
Pulled By: pdillinger
fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3
Summary:
It is difficult to root-cause crash test failures without archiving the
db dir. Now all crash test configurations should save the db dir.
Also exit with an error code on a bad command.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6344
Test Plan:
Hmm, how about this:
for TARGET in stress_crash asan_crash ubsan_crash tsan_crash; do EMAIL=email ONCALL=oncall TRIGGER=all SUBSCRIBER=sub build_tools/rocksdb-lego-determinator $TARGET > tmp && node -c tmp && grep -q Upload tmp || echo Bad; done
Differential Revision: D19625605
Pulled By: pdillinger
fbshipit-source-id: cb84aa93ee80b4534f4c61b90f0e0f99a41155d5
Summary:
While the instructions for installing "make format" dependencies work on some platforms, they are hard to use on others. Improve them a little.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6162
Test Plan: Run "make format" in an environment missing the dependencies and see the instructions printed out
Differential Revision: D18970773
fbshipit-source-id: fd21b31053407cc171a6675f781a556a1c3e8945
Summary:
Right now, PosixRandomAccessFile::MultiRead() executes read requests serially. In this PR, it leverages the I/O Uring library to run them in parallel, even when the page cache is enabled. This function falls back to the previous behavior if the kernel version doesn't support I/O Uring.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5881
Test Plan: Run the unit test on a kernel version supporting it and make sure all tests pass, and run the unit test on a kernel version not supporting it and see it pass. Before merging, will also run the stress test and see it pass.
Differential Revision: D17742266
fbshipit-source-id: e05699c925ac04fdb42379456a4e23e4ebcb803a
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.
Speed
The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.
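The fastrange technique mentioned in the first bullet replaces `hash % range` with a multiply and a shift, mapping a 32-bit hash into `[0, range)` using the hash's high bits. A minimal sketch, with the hypothetical name `FastRange32Sketch` (RocksDB's own helper may differ in name and signature):

```cpp
#include <cstdint>

// Maps a 32-bit hash onto [0, range) without a division: the 64-bit product
// hash * range lies in [0, range * 2^32), so its top 32 bits lie in [0, range).
inline uint32_t FastRange32Sketch(uint32_t hash, uint32_t range) {
  return static_cast<uint32_t>((uint64_t{hash} * range) >> 32);
}
```

For a well-mixed hash the result is as evenly spread as `hash % range`, but it avoids the comparatively slow integer division.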
Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):
```
$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
  Single filter net ns/op: 26.2825
  Random filter net ns/op: 150.459
    Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
  Single filter net ns/op: 63.2978
  Random filter net ns/op: 188.038
    Average FP rate %: 1.13823
```
Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.
The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.
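To make the short-circuit point concrete, here is a scalar (non-SIMD) cache-local Bloom filter sketch. It is NOT FastLocalBloom's actual layout or probing scheme: the class name, the use of the low 32 hash bits for probes, and the multiplicative remix constant are all assumptions for illustration. The query loop returns at the first unset bit, which is exactly the data-dependent branch described above.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Every key's probes land in one 64-byte (512-bit) block.
class CacheLocalBloomSketch {
 public:
  CacheLocalBloomSketch(uint32_t num_blocks, int num_probes)
      : num_blocks_(num_blocks),
        num_probes_(num_probes),
        words_(static_cast<std::size_t>(num_blocks) * 8, uint64_t{0}) {}

  void Add(uint64_t hash) {
    uint64_t* block = &words_[BlockIndex(hash) * 8];
    uint32_t h = static_cast<uint32_t>(hash);
    for (int i = 0; i < num_probes_; ++i) {
      uint32_t bit = h >> 23;  // top 9 bits: a bit index in 0..511
      block[bit >> 6] |= uint64_t{1} << (bit & 63);
      h *= 0x9E3779B9u;  // remix for the next probe
    }
  }

  // "Outside" keys usually exit after a probe or two; "inside" keys always
  // test all num_probes_ bits, so the exit branch is hard to predict.
  bool MayContain(uint64_t hash) const {
    const uint64_t* block = &words_[BlockIndex(hash) * 8];
    uint32_t h = static_cast<uint32_t>(hash);
    for (int i = 0; i < num_probes_; ++i) {
      uint32_t bit = h >> 23;
      if ((block[bit >> 6] & (uint64_t{1} << (bit & 63))) == 0) return false;
      h *= 0x9E3779B9u;
    }
    return true;
  }

 private:
  // Upper 32 hash bits pick the block via fastrange.
  std::size_t BlockIndex(uint64_t hash) const {
    return static_cast<std::size_t>(
        (uint64_t{static_cast<uint32_t>(hash >> 32)} * num_blocks_) >> 32);
  }

  uint32_t num_blocks_;
  int num_probes_;
  std::vector<uint64_t> words_;  // 8 words per block; not cache-line aligned
};
```

A SIMD version would instead compute all probe positions at once and combine the bit tests without branching, which is why its running time barely depends on the query outcome.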
Accuracy
The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.
Accuracy data (these results generalize, except that the old implementation gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)
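As a point of reference (not part of the original measurements), the textbook approximation for the false-positive rate of an idealized, non-cache-local Bloom filter with b bits per key and k probes is:

$$
\varepsilon \approx \left(1 - e^{-k/b}\right)^{k}
$$

For example, with b = 10 and the 6 probes used in the filter_bench run above, this gives (1 - e^{-0.6})^6 ≈ (0.4512)^6 ≈ 0.0084, or about 0.84%. A cache-local filter confines each key's probes to one 64-byte block, so some accuracy loss relative to this idealized figure is expected; the new implementation's measured 0.95% to 0.96% at 10 bits/key is consistent with that.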
Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120
Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.
Compatibility
Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007
Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).
Differential Revision: D18294749
Pulled By: pdillinger
fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f