2018-04-03 08:45:39 +02:00
|
|
|
#!/usr/bin/env bash
|
2012-03-21 18:28:03 +01:00
|
|
|
#
|
2012-04-17 17:36:46 +02:00
|
|
|
# Detects OS we're compiling on and outputs a file specified by the first
|
|
|
|
# argument, which in turn gets read while processing Makefile.
|
2012-03-21 18:28:03 +01:00
|
|
|
#
|
2012-04-17 17:36:46 +02:00
|
|
|
# The output will set the following variables:
|
2012-08-27 08:45:35 +02:00
|
|
|
# CC C Compiler path
|
|
|
|
# CXX C++ Compiler path
|
2012-03-21 18:28:03 +01:00
|
|
|
# PLATFORM_LDFLAGS Linker flags
|
2014-07-22 07:41:54 +02:00
|
|
|
# JAVA_LDFLAGS Linker flags for RocksDBJava
|
2015-10-09 20:41:40 +02:00
|
|
|
# JAVA_STATIC_LDFLAGS Linker flags for RocksDBJava static build
|
2020-03-12 20:22:03 +01:00
|
|
|
# JAVAC_ARGS Arguments for javac
|
2012-03-30 22:15:49 +02:00
|
|
|
# PLATFORM_SHARED_EXT Extension for shared libraries
|
|
|
|
# PLATFORM_SHARED_LDFLAGS Flags for building shared library
|
|
|
|
# PLATFORM_SHARED_CFLAGS Flags for compiling objects for shared library
|
2012-03-21 18:28:03 +01:00
|
|
|
# PLATFORM_CCFLAGS C compiler flags
|
|
|
|
# PLATFORM_CXXFLAGS C++ compiler flags. Will contain:
|
2012-08-27 08:45:35 +02:00
|
|
|
# PLATFORM_SHARED_VERSIONED Set to 'true' if platform supports versioned
|
|
|
|
# shared libraries, empty otherwise.
|
2018-03-19 20:11:58 +01:00
|
|
|
# FIND Command for the find utility
|
|
|
|
# WATCH Command for the watch utility
|
2012-08-27 08:45:35 +02:00
|
|
|
#
|
|
|
|
# The PLATFORM_CCFLAGS and PLATFORM_CXXFLAGS might include the following:
|
|
|
|
#
|
2016-08-03 20:07:53 +02:00
|
|
|
# -DROCKSDB_PLATFORM_POSIX if posix-platform based
|
2014-02-08 03:12:30 +01:00
|
|
|
# -DSNAPPY if the Snappy library is present
|
|
|
|
# -DLZ4 if the LZ4 library is present
|
2015-08-28 00:40:42 +02:00
|
|
|
# -DZSTD if the ZSTD library is present
|
2014-07-07 19:53:31 +02:00
|
|
|
# -DNUMA if the NUMA library is present
|
2016-08-18 19:44:29 +02:00
|
|
|
# -DTBB if the TBB library is present
|
Provide an allocator for new memory type to be used with RocksDB block cache (#6214)
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
**Performance**
We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.
The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.
![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)
**Changes**
- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator
**Minimum Requirements**
- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind
**Memory Configuration**
The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.
Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.
**Usage**
When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):
```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"
NewLRUCache(
capacity /*size_t*/,
6 /*cache_numshardbits*/,
false /*strict_capacity_limit*/,
false /*cache_high_pri_pool_ratio*/,
std::make_shared<MemkindKmemAllocator>());
```
Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214
Reviewed By: cheng-chang
Differential Revision: D19292435
fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
2020-04-10 05:45:17 +02:00
|
|
|
# -DMEMKIND if the memkind library is present
|
2012-08-27 08:45:35 +02:00
|
|
|
#
|
2013-11-17 08:44:39 +01:00
|
|
|
# Using gflags in rocksdb:
|
|
|
|
# Our project depends on gflags, which requires users to take some extra steps
|
|
|
|
# before they can compile the whole repository:
|
|
|
|
# 1. Install gflags. You may download it from here:
|
2016-08-25 19:40:38 +02:00
|
|
|
# https://gflags.github.io/gflags/ (Mac users can `brew install gflags`)
|
|
|
|
# 2. Once installed, add the include path for gflags to your CPATH env var and
|
|
|
|
# the lib path to LIBRARY_PATH. If installed with default settings, the lib
|
|
|
|
# will be /usr/local/lib and the include path will be /usr/local/include
|
2012-03-21 18:28:03 +01:00
|
|
|
|
2012-04-17 17:36:46 +02:00
|
|
|
OUTPUT=$1
|
|
|
|
if test -z "$OUTPUT"; then
|
2012-08-27 08:45:35 +02:00
|
|
|
echo "usage: $0 <output-filename>" >&2
|
2012-04-17 17:36:46 +02:00
|
|
|
exit 1
|
|
|
|
fi
|
2011-06-29 02:30:50 +02:00
|
|
|
|
2022-02-05 02:12:03 +01:00
|
|
|
# we depend on C++17, but should be compatible with newer standards
|
2020-12-22 09:19:44 +01:00
|
|
|
if [ "$ROCKSDB_CXX_STANDARD" ]; then
|
|
|
|
PLATFORM_CXXFLAGS="-std=$ROCKSDB_CXX_STANDARD"
|
|
|
|
else
|
2022-02-05 02:12:03 +01:00
|
|
|
PLATFORM_CXXFLAGS="-std=c++17"
|
2020-12-22 09:19:44 +01:00
|
|
|
fi
|
|
|
|
|
2013-11-18 20:40:16 +01:00
|
|
|
# we currently depend on POSIX platform
|
2015-10-14 10:14:53 +02:00
|
|
|
COMMON_FLAGS="-DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX"
|
2013-11-18 20:40:16 +01:00
|
|
|
|
2013-01-14 21:39:24 +01:00
|
|
|
# Default to fbcode gcc on internal fb machines
|
2014-09-19 18:27:16 +02:00
|
|
|
if [ -z "$ROCKSDB_NO_FBCODE" -a -d /mnt/gvfs/third-party ]; then
|
2013-11-18 07:05:00 +01:00
|
|
|
FBCODE_BUILD="true"
|
2015-01-23 20:22:20 +01:00
|
|
|
# If we're compiling with TSAN we need pic build
|
|
|
|
PIC_BUILD=$COMPILE_WITH_TSAN
|
2019-01-28 19:57:57 +01:00
|
|
|
if [ -n "$ROCKSDB_FBCODE_BUILD_WITH_481" ]; then
|
2015-01-23 23:51:27 +01:00
|
|
|
# we need this to build with MySQL. Don't use for other purposes.
|
|
|
|
source "$PWD/build_tools/fbcode_config4.8.1.sh"
|
2019-10-21 21:07:58 +02:00
|
|
|
elif [ -n "$ROCKSDB_FBCODE_BUILD_WITH_5xx" ]; then
|
2019-01-28 19:57:57 +01:00
|
|
|
source "$PWD/build_tools/fbcode_config.sh"
|
2020-08-13 04:28:28 +02:00
|
|
|
elif [ -n "$ROCKSDB_FBCODE_BUILD_WITH_PLATFORM007" ]; then
|
2019-10-21 21:07:58 +02:00
|
|
|
source "$PWD/build_tools/fbcode_config_platform007.sh"
|
2020-08-13 23:48:08 +02:00
|
|
|
elif [ -n "$ROCKSDB_FBCODE_BUILD_WITH_PLATFORM009" ]; then
|
|
|
|
source "$PWD/build_tools/fbcode_config_platform009.sh"
|
2020-08-13 04:28:28 +02:00
|
|
|
else
|
|
|
|
source "$PWD/build_tools/fbcode_config_platform009.sh"
|
2015-01-23 23:51:27 +01:00
|
|
|
fi
|
2013-01-14 21:39:24 +01:00
|
|
|
fi
|
|
|
|
|
2012-04-17 17:36:46 +02:00
|
|
|
# Delete existing output, if it exists
|
2014-05-11 06:01:25 +02:00
|
|
|
rm -f "$OUTPUT"
|
|
|
|
touch "$OUTPUT"
|
2011-06-29 02:30:50 +02:00
|
|
|
|
2012-08-27 08:45:35 +02:00
|
|
|
if test -z "$CC"; then
|
2018-04-03 08:45:39 +02:00
|
|
|
if [ -x "$(command -v cc)" ]; then
|
|
|
|
CC=cc
|
|
|
|
elif [ -x "$(command -v clang)" ]; then
|
|
|
|
CC=clang
|
|
|
|
else
|
|
|
|
CC=cc
|
|
|
|
fi
|
2012-08-27 08:45:35 +02:00
|
|
|
fi
|
|
|
|
|
2011-11-30 11:59:40 +01:00
|
|
|
if test -z "$CXX"; then
|
2018-04-03 08:45:39 +02:00
|
|
|
if [ -x "$(command -v g++)" ]; then
|
|
|
|
CXX=g++
|
|
|
|
elif [ -x "$(command -v clang++)" ]; then
|
|
|
|
CXX=clang++
|
|
|
|
else
|
|
|
|
CXX=g++
|
|
|
|
fi
|
2011-11-30 11:59:40 +01:00
|
|
|
fi
|
|
|
|
|
2020-07-28 22:09:12 +02:00
|
|
|
if test -z "$AR"; then
|
|
|
|
if [ -x "$(command -v gcc-ar)" ]; then
|
|
|
|
AR=gcc-ar
|
|
|
|
elif [ -x "$(command -v llvm-ar)" ]; then
|
|
|
|
AR=llvm-ar
|
|
|
|
else
|
|
|
|
AR=ar
|
|
|
|
fi
|
|
|
|
fi
|
|
|
|
|
2011-06-29 02:30:50 +02:00
|
|
|
# Detect OS
|
2012-03-21 18:28:03 +01:00
|
|
|
if test -z "$TARGET_OS"; then
|
|
|
|
TARGET_OS=`uname -s`
|
|
|
|
fi
|
|
|
|
|
2015-02-27 00:19:17 +01:00
|
|
|
if test -z "$TARGET_ARCHITECTURE"; then
|
|
|
|
TARGET_ARCHITECTURE=`uname -m`
|
|
|
|
fi
|
|
|
|
|
2015-02-04 06:43:06 +01:00
|
|
|
if test -z "$CLANG_SCAN_BUILD"; then
|
|
|
|
CLANG_SCAN_BUILD=scan-build
|
|
|
|
fi
|
|
|
|
|
build: do not relink every single binary just for a timestamp
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.
This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version. Its functionality is
now in the Makefile.
* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration. No longer used.
Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:
rocksdb_build_git_date:2015-02-19
rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33591
2015-02-19 22:11:10 +01:00
|
|
|
if test -z "$CLANG_ANALYZER"; then
|
2018-04-03 08:45:39 +02:00
|
|
|
CLANG_ANALYZER=$(command -v clang++ 2> /dev/null)
|
2015-02-04 06:43:06 +01:00
|
|
|
fi
|
|
|
|
|
2018-03-19 20:11:58 +01:00
|
|
|
if test -z "$FIND"; then
|
|
|
|
FIND=find
|
|
|
|
fi
|
|
|
|
|
|
|
|
if test -z "$WATCH"; then
|
|
|
|
WATCH=watch
|
|
|
|
fi
|
|
|
|
|
fPIC in x64 environment
Summary:
Check https://github.com/facebook/rocksdb/pull/15 for context.
Apparently [1], we need -fPIC in x64 environments (this is added only in non-fbcode).
In fbcode, I removed -fPIC per @dhruba's suggestion, since it introduces perf regression. I'm not sure what would are the implications of doing that, but looks like it works, and when releasing to the third-party, we're disabling -fPIC either way [2].
Would love a suggestion from someone who knows more about this
[1] http://eli.thegreenplace.net/2011/11/11/position-independent-code-pic-in-shared-libraries-on-x64/
[2] https://our.intern.facebook.com/intern/wiki/index.php/Database/RocksDB/Third_Party
Test Plan: make check works
Reviewers: dhruba, emayanke, kailiu
Reviewed By: dhruba
CC: leveldb, dhruba, reconnect.grayhat
Differential Revision: https://reviews.facebook.net/D14337
2013-11-26 06:21:01 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS ${CFLAGS}"
|
2012-08-27 08:45:35 +02:00
|
|
|
CROSS_COMPILE=
|
2012-03-21 18:28:03 +01:00
|
|
|
PLATFORM_CCFLAGS=
|
2013-01-15 03:37:01 +01:00
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS"
|
2012-03-30 22:15:49 +02:00
|
|
|
PLATFORM_SHARED_EXT="so"
|
2015-06-24 01:32:59 +02:00
|
|
|
PLATFORM_SHARED_LDFLAGS="-Wl,--no-as-needed -shared -Wl,-soname -Wl,"
|
2012-03-30 22:15:49 +02:00
|
|
|
PLATFORM_SHARED_CFLAGS="-fPIC"
|
2015-04-07 22:22:22 +02:00
|
|
|
PLATFORM_SHARED_VERSIONED=true
|
2012-03-21 18:28:03 +01:00
|
|
|
|
2013-04-11 19:54:35 +02:00
|
|
|
# generic port files (working on all platform by #ifdef) go directly in /port
|
2014-05-11 06:01:25 +02:00
|
|
|
GENERIC_PORT_FILES=`cd "$ROCKSDB_ROOT"; find port -name '*.cc' | tr "\n" " "`
|
2013-04-11 19:54:35 +02:00
|
|
|
|
2012-03-21 18:28:03 +01:00
|
|
|
# On GCC, we pick libc's memcmp over GCC's memcmp via -fno-builtin-memcmp
|
|
|
|
case "$TARGET_OS" in
|
2011-06-29 02:30:50 +02:00
|
|
|
Darwin)
|
|
|
|
PLATFORM=OS_MACOSX
|
2013-11-13 05:05:28 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DOS_MACOSX"
|
2012-03-30 22:15:49 +02:00
|
|
|
PLATFORM_SHARED_EXT=dylib
|
|
|
|
PLATFORM_SHARED_LDFLAGS="-dynamiclib -install_name "
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/darwin/darwin_specific.cc
|
2011-06-29 02:30:50 +02:00
|
|
|
;;
|
2014-04-04 22:11:44 +02:00
|
|
|
IOS)
|
|
|
|
PLATFORM=IOS
|
2014-04-15 22:39:26 +02:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DOS_MACOSX -DIOS_CROSS_COMPILE -DROCKSDB_LITE"
|
2014-04-04 22:11:44 +02:00
|
|
|
PLATFORM_SHARED_EXT=dylib
|
|
|
|
PLATFORM_SHARED_LDFLAGS="-dynamiclib -install_name "
|
|
|
|
CROSS_COMPILE=true
|
2015-04-07 22:22:22 +02:00
|
|
|
PLATFORM_SHARED_VERSIONED=
|
2014-04-04 22:11:44 +02:00
|
|
|
;;
|
2011-06-29 02:30:50 +02:00
|
|
|
Linux)
|
|
|
|
PLATFORM=OS_LINUX
|
2013-08-16 05:53:21 +02:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DOS_LINUX"
|
2013-01-15 03:37:01 +01:00
|
|
|
if [ -z "$USE_CLANG" ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp"
|
2018-04-25 21:08:24 +02:00
|
|
|
else
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic"
|
2013-01-15 03:37:01 +01:00
|
|
|
fi
|
2020-07-01 04:31:57 +02:00
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt -ldl"
|
2021-05-21 19:20:58 +02:00
|
|
|
if test -z "$ROCKSDB_USE_IO_URING"; then
|
|
|
|
ROCKSDB_USE_IO_URING=1
|
|
|
|
fi
|
|
|
|
if test "$ROCKSDB_USE_IO_URING" -ne 0; then
|
2020-02-13 03:00:24 +01:00
|
|
|
# check for liburing
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -luring -o test.o 2>/dev/null <<EOF
|
2020-02-13 03:00:24 +01:00
|
|
|
#include <liburing.h>
|
|
|
|
int main() {
|
|
|
|
struct io_uring ring;
|
|
|
|
io_uring_queue_init(1, &ring, 0);
|
|
|
|
return 0;
|
|
|
|
}
|
2019-12-08 05:54:27 +01:00
|
|
|
EOF
|
2020-02-13 03:00:24 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -luring"
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_IOURING_PRESENT"
|
|
|
|
fi
|
2019-12-08 05:54:27 +01:00
|
|
|
fi
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/linux/linux_specific.cc
|
2011-06-29 02:30:50 +02:00
|
|
|
;;
|
|
|
|
SunOS)
|
|
|
|
PLATFORM=OS_SOLARIS
|
2017-04-22 05:41:37 +02:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_SOLARIS -m64"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt -static-libstdc++ -static-libgcc -m64"
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/sunos/sunos_specific.cc
|
2011-06-29 02:30:50 +02:00
|
|
|
;;
|
2017-04-22 05:41:37 +02:00
|
|
|
AIX)
|
|
|
|
PLATFORM=OS_AIX
|
|
|
|
CC=gcc
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -maix64 -pthread -fno-builtin-memcmp -D_REENTRANT -DOS_AIX -D__STDC_FORMAT_MACROS"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -pthread -lpthread -lrt -maix64 -static-libstdc++ -static-libgcc"
|
|
|
|
# PORT_FILES=port/aix/aix_specific.cc
|
|
|
|
;;
|
2011-07-27 03:46:25 +02:00
|
|
|
FreeBSD)
|
|
|
|
PLATFORM=OS_FREEBSD
|
2018-01-11 22:21:35 +01:00
|
|
|
CXX=clang++
|
2013-01-14 21:39:24 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_FREEBSD"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread"
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/freebsd/freebsd_specific.cc
|
2011-07-27 03:46:25 +02:00
|
|
|
;;
|
2020-06-18 18:50:05 +02:00
|
|
|
GNU/kFreeBSD)
|
|
|
|
PLATFORM=OS_GNU_KFREEBSD
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DOS_GNU_KFREEBSD"
|
|
|
|
if [ -z "$USE_CLANG" ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp"
|
|
|
|
else
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic"
|
|
|
|
fi
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt"
|
|
|
|
# PORT_FILES=port/gnu_kfreebsd/gnu_kfreebsd_specific.cc
|
|
|
|
;;
|
2012-03-05 19:35:46 +01:00
|
|
|
NetBSD)
|
|
|
|
PLATFORM=OS_NETBSD
|
2013-01-14 21:39:24 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_NETBSD"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lgcc_s"
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/netbsd/netbsd_specific.cc
|
2012-03-05 19:35:46 +01:00
|
|
|
;;
|
|
|
|
OpenBSD)
|
|
|
|
PLATFORM=OS_OPENBSD
|
2018-03-19 20:11:58 +01:00
|
|
|
CXX=clang++
|
2013-01-14 21:39:24 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_OPENBSD"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -pthread"
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/openbsd/openbsd_specific.cc
|
2018-03-19 20:11:58 +01:00
|
|
|
FIND=gfind
|
|
|
|
WATCH=gnuwatch
|
2012-03-05 19:35:46 +01:00
|
|
|
;;
|
|
|
|
DragonFly)
|
|
|
|
PLATFORM=OS_DRAGONFLYBSD
|
2013-01-14 21:39:24 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_DRAGONFLYBSD"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread"
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/dragonfly/dragonfly_specific.cc
|
2012-03-21 18:28:03 +01:00
|
|
|
;;
|
2015-04-24 04:17:57 +02:00
|
|
|
Cygwin)
|
|
|
|
PLATFORM=CYGWIN
|
2015-06-12 22:54:29 +02:00
|
|
|
PLATFORM_SHARED_CFLAGS=""
|
2015-04-24 04:17:57 +02:00
|
|
|
PLATFORM_CXXFLAGS="-std=gnu++11"
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DCYGWIN"
|
|
|
|
if [ -z "$USE_CLANG" ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp"
|
2018-04-25 21:08:24 +02:00
|
|
|
else
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic"
|
2015-04-24 04:17:57 +02:00
|
|
|
fi
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt"
|
|
|
|
# PORT_FILES=port/linux/linux_specific.cc
|
|
|
|
;;
|
2012-03-21 18:28:03 +01:00
|
|
|
OS_ANDROID_CROSSCOMPILE)
|
2012-08-27 08:45:35 +02:00
|
|
|
PLATFORM=OS_ANDROID
|
2016-08-03 20:07:53 +02:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_ANDROID -DROCKSDB_PLATFORM_POSIX"
|
2013-01-14 21:39:24 +01:00
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS " # All pthread features are in the Android C library
|
2013-04-11 19:54:35 +02:00
|
|
|
# PORT_FILES=port/android/android.cc
|
2012-08-27 08:45:35 +02:00
|
|
|
CROSS_COMPILE=true
|
2012-03-05 19:35:46 +01:00
|
|
|
;;
|
2011-06-29 02:30:50 +02:00
|
|
|
*)
|
2012-08-27 08:45:35 +02:00
|
|
|
echo "Unknown platform!" >&2
|
2011-06-29 02:30:50 +02:00
|
|
|
exit 1
|
|
|
|
esac
|
|
|
|
|
2015-04-24 04:17:57 +02:00
|
|
|
PLATFORM_CXXFLAGS="$PLATFORM_CXXFLAGS ${CXXFLAGS}"
|
2014-07-22 07:41:54 +02:00
|
|
|
JAVA_LDFLAGS="$PLATFORM_LDFLAGS"
|
2015-10-09 20:41:40 +02:00
|
|
|
JAVA_STATIC_LDFLAGS="$PLATFORM_LDFLAGS"
|
2020-03-12 20:22:03 +01:00
|
|
|
JAVAC_ARGS="-source 7"
|
2014-07-22 07:41:54 +02:00
|
|
|
|
2013-11-18 07:05:00 +01:00
|
|
|
if [ "$CROSS_COMPILE" = "true" -o "$FBCODE_BUILD" = "true" ]; then
|
2012-03-21 18:28:03 +01:00
|
|
|
# Cross-compiling; do not try any compilation tests.
|
2013-11-18 07:05:00 +01:00
|
|
|
# Also don't need any compilation tests if compiling on fbcode
|
2020-04-11 02:18:56 +02:00
|
|
|
if [ "$FBCODE_BUILD" = "true" ]; then
|
|
|
|
# Enable backtrace on fbcode since the necessary libraries are present
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_BACKTRACE"
|
|
|
|
fi
|
2014-01-06 20:53:19 +01:00
|
|
|
true
|
2011-06-29 02:30:50 +02:00
|
|
|
else
|
2016-02-12 02:00:01 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_FALLOCATE; then
|
|
|
|
# Test whether fallocate is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2016-02-12 02:00:01 +01:00
|
|
|
#include <fcntl.h>
|
|
|
|
#include <linux/falloc.h>
|
|
|
|
int main() {
|
|
|
|
int fd = open("/dev/null", 0);
|
2019-03-05 00:40:26 +01:00
|
|
|
fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, 1024);
|
2016-02-12 02:00:01 +01:00
|
|
|
}
|
2013-12-11 07:34:19 +01:00
|
|
|
EOF
|
2016-02-12 02:00:01 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_FALLOCATE_PRESENT"
|
|
|
|
fi
|
2013-12-11 07:34:19 +01:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_SNAPPY; then
|
|
|
|
# Test whether Snappy library is installed
|
|
|
|
# http://code.google.com/p/snappy/
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <snappy.h>
|
|
|
|
int main() {}
|
2011-06-29 02:30:50 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DSNAPPY"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lsnappy"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lsnappy"
|
|
|
|
fi
|
2012-03-21 18:28:03 +01:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_GFLAGS; then
|
|
|
|
# Test whether gflags library is installed
|
|
|
|
# http://gflags.github.io/gflags/
|
|
|
|
# check if the namespace is gflags
|
2022-01-14 23:08:05 +01:00
|
|
|
if $CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null << EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <gflags/gflags.h>
|
2020-07-01 04:31:57 +02:00
|
|
|
using namespace GFLAGS_NAMESPACE;
|
2017-12-06 00:05:41 +01:00
|
|
|
int main() {}
|
2015-04-22 21:47:50 +02:00
|
|
|
EOF
|
2020-07-01 04:31:57 +02:00
|
|
|
then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DGFLAGS=1"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lgflags"
|
|
|
|
# check if namespace is gflags
|
2022-01-14 23:08:05 +01:00
|
|
|
elif $CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null << EOF
|
2020-07-01 04:31:57 +02:00
|
|
|
#include <gflags/gflags.h>
|
|
|
|
using namespace gflags;
|
|
|
|
int main() {}
|
|
|
|
EOF
|
|
|
|
then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DGFLAGS=1 -DGFLAGS_NAMESPACE=gflags"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lgflags"
|
|
|
|
# check if namespace is google
|
2022-01-14 23:08:05 +01:00
|
|
|
elif $CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null << EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <gflags/gflags.h>
|
|
|
|
using namespace google;
|
|
|
|
int main() {}
|
|
|
|
EOF
|
2020-07-01 04:31:57 +02:00
|
|
|
then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DGFLAGS=1 -DGFLAGS_NAMESPACE=google"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lgflags"
|
2017-12-06 00:05:41 +01:00
|
|
|
fi
|
2013-10-24 16:43:14 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_ZLIB; then
|
|
|
|
# Test whether zlib library is installed
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <zlib.h>
|
|
|
|
int main() {}
|
2012-06-28 08:41:33 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DZLIB"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lz"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lz"
|
|
|
|
fi
|
2012-06-28 08:41:33 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_BZIP; then
|
|
|
|
# Test whether bzip library is installed
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <bzlib.h>
|
|
|
|
int main() {}
|
2012-06-29 04:26:43 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DBZIP2"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lbz2"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lbz2"
|
|
|
|
fi
|
2012-06-29 04:26:43 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_LZ4; then
|
|
|
|
# Test whether lz4 library is installed
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <lz4.h>
|
|
|
|
#include <lz4hc.h>
|
|
|
|
int main() {}
|
2014-02-08 03:12:30 +01:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DLZ4"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -llz4"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -llz4"
|
|
|
|
fi
|
2014-02-08 03:12:30 +01:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_ZSTD; then
|
|
|
|
# Test whether zstd library is installed
|
2021-03-31 16:39:41 +02:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <zstd.h>
|
|
|
|
int main() {}
|
2015-08-28 00:40:42 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DZSTD"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lzstd"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lzstd"
|
|
|
|
fi
|
2015-08-28 00:40:42 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_NUMA; then
|
|
|
|
# Test whether numa is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o -lnuma 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <numa.h>
|
|
|
|
#include <numaif.h>
|
|
|
|
int main() {}
|
2014-07-07 19:53:31 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DNUMA"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lnuma"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lnuma"
|
|
|
|
fi
|
2014-07-07 19:53:31 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_TBB; then
|
|
|
|
# Test whether tbb is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $LDFLAGS -x c++ - -o test.o -ltbb 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <tbb/tbb.h>
|
|
|
|
int main() {}
|
2016-08-18 19:44:29 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DTBB"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -ltbb"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -ltbb"
|
|
|
|
fi
|
2016-08-18 19:44:29 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_JEMALLOC; then
|
|
|
|
# Test whether jemalloc is available
|
2022-01-14 23:08:05 +01:00
|
|
|
if echo 'int main() {}' | $CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o -ljemalloc \
|
2017-12-06 00:05:41 +01:00
|
|
|
2>/dev/null; then
|
|
|
|
# This will enable some preprocessor identifiers in the Makefile
|
|
|
|
JEMALLOC=1
|
|
|
|
# JEMALLOC can be enabled either using the flag (like here) or by
|
|
|
|
# providing direct link to the jemalloc library
|
|
|
|
WITH_JEMALLOC_FLAG=1
|
2018-12-18 01:27:00 +01:00
|
|
|
# check for JEMALLOC installed with HomeBrew
|
|
|
|
if [ "$PLATFORM" == "OS_MACOSX" ]; then
|
|
|
|
if hash brew 2>/dev/null && brew ls --versions jemalloc > /dev/null; then
|
|
|
|
JEMALLOC_VER=$(brew ls --versions jemalloc | tail -n 1 | cut -f 2 -d ' ')
|
|
|
|
JEMALLOC_INCLUDE="-I/usr/local/Cellar/jemalloc/${JEMALLOC_VER}/include"
|
|
|
|
JEMALLOC_LIB="/usr/local/Cellar/jemalloc/${JEMALLOC_VER}/lib/libjemalloc_pic.a"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS $JEMALLOC_LIB"
|
|
|
|
JAVA_STATIC_LDFLAGS="$JAVA_STATIC_LDFLAGS $JEMALLOC_LIB"
|
|
|
|
fi
|
|
|
|
fi
|
2017-12-06 00:05:41 +01:00
|
|
|
fi
|
|
|
|
fi
|
|
|
|
if ! test $JEMALLOC && ! test $ROCKSDB_DISABLE_TCMALLOC; then
|
2015-04-24 02:48:18 +02:00
|
|
|
# jemalloc is not available. Let's try tcmalloc
|
2022-01-14 23:08:05 +01:00
|
|
|
if echo 'int main() {}' | $CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o \
|
2016-04-28 01:23:33 +02:00
|
|
|
-ltcmalloc 2>/dev/null; then
|
2015-04-24 02:48:18 +02:00
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -ltcmalloc"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -ltcmalloc"
|
|
|
|
fi
|
2012-03-21 18:28:03 +01:00
|
|
|
fi
|
Use malloc_usable_size() for accounting block cache size
Summary:
Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, size of the block might be less than the actual memory used by this object. For example, 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actually memory usage of block cache will be 20GB!
This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
I'm using non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover majority of our use-cases.
Test Plan:
1. fill up mongo instance with 80GB of data
2. restart mongo with block cache size configured to 10GB
3. do a table scan in mongo
4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
Reviewers: sdong, MarkCallaghan, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D40635
2015-06-26 20:48:09 +02:00
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_MALLOC_USABLE_SIZE; then
|
|
|
|
# Test whether malloc_usable_size is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <malloc.h>
|
|
|
|
int main() {
|
|
|
|
size_t res = malloc_usable_size(0);
|
2019-05-16 00:57:04 +02:00
|
|
|
(void)res;
|
2017-12-06 00:05:41 +01:00
|
|
|
return 0;
|
|
|
|
}
|
Use malloc_usable_size() for accounting block cache size
Summary:
Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, size of the block might be less than the actual memory used by this object. For example, 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actually memory usage of block cache will be 20GB!
This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
I'm using non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover majority of our use-cases.
Test Plan:
1. fill up mongo instance with 80GB of data
2. restart mongo with block cache size configured to 10GB
3. do a table scan in mongo
4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
Reviewers: sdong, MarkCallaghan, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D40635
2015-06-26 20:48:09 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_MALLOC_USABLE_SIZE"
|
|
|
|
fi
|
Use malloc_usable_size() for accounting block cache size
Summary:
Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, size of the block might be less than the actual memory used by this object. For example, 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actually memory usage of block cache will be 20GB!
This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
I'm using non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover majority of our use-cases.
Test Plan:
1. fill up mongo instance with 80GB of data
2. restart mongo with block cache size configured to 10GB
3. do a table scan in mongo
4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
Reviewers: sdong, MarkCallaghan, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D40635
2015-06-26 20:48:09 +02:00
|
|
|
fi
|
2020-07-01 04:31:57 +02:00
|
|
|
|
Provide an allocator for new memory type to be used with RocksDB block cache (#6214)
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
**Performance**
We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.
The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.
![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)
**Changes**
- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator
**Minimum Requirements**
- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind
**Memory Configuration**
The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.
Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.
**Usage**
When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):
```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"
NewLRUCache(
capacity /*size_t*/,
6 /*cache_numshardbits*/,
false /*strict_capacity_limit*/,
false /*cache_high_pri_pool_ratio*/,
std::make_shared<MemkindKmemAllocator>());
```
Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214
Reviewed By: cheng-chang
Differential Revision: D19292435
fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
2020-04-10 05:45:17 +02:00
|
|
|
if ! test $ROCKSDB_DISABLE_MEMKIND; then
|
|
|
|
# Test whether memkind library is installed
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -lmemkind -x c++ - -o test.o 2>/dev/null <<EOF
|
Provide an allocator for new memory type to be used with RocksDB block cache (#6214)
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.
**Performance**
We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.
The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.
![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)
**Changes**
- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator
**Minimum Requirements**
- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind
**Memory Configuration**
The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.
Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.
**Usage**
When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):
```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"
NewLRUCache(
capacity /*size_t*/,
6 /*cache_numshardbits*/,
false /*strict_capacity_limit*/,
false /*cache_high_pri_pool_ratio*/,
std::make_shared<MemkindKmemAllocator>());
```
Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214
Reviewed By: cheng-chang
Differential Revision: D19292435
fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
2020-04-10 05:45:17 +02:00
|
|
|
#include <memkind.h>
|
|
|
|
int main() {
|
|
|
|
memkind_malloc(MEMKIND_DAX_KMEM, 1024);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DMEMKIND"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lmemkind"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lmemkind"
|
|
|
|
fi
|
|
|
|
fi
|
2020-07-01 04:31:57 +02:00
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_PTHREAD_MUTEX_ADAPTIVE_NP; then
|
|
|
|
# Test whether PTHREAD_MUTEX_ADAPTIVE_NP mutex type is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <pthread.h>
|
|
|
|
int main() {
|
|
|
|
int x = PTHREAD_MUTEX_ADAPTIVE_NP;
|
2019-05-16 00:57:04 +02:00
|
|
|
(void)x;
|
2017-12-06 00:05:41 +01:00
|
|
|
return 0;
|
|
|
|
}
|
2016-04-23 01:49:12 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX"
|
|
|
|
fi
|
2016-04-23 01:49:12 +02:00
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_BACKTRACE; then
|
|
|
|
# Test whether backtrace is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2019-05-16 00:57:04 +02:00
|
|
|
#include <execinfo.h>
|
2016-04-23 01:49:12 +02:00
|
|
|
int main() {
|
|
|
|
void* frames[1];
|
|
|
|
backtrace_symbols(frames, backtrace(frames, 1));
|
2017-12-06 00:05:41 +01:00
|
|
|
return 0;
|
2016-04-23 01:49:12 +02:00
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_BACKTRACE"
|
2017-12-06 00:05:41 +01:00
|
|
|
else
|
|
|
|
# Test whether execinfo library is installed
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -lexecinfo -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <execinfo.h>
|
|
|
|
int main() {
|
|
|
|
void* frames[1];
|
|
|
|
backtrace_symbols(frames, backtrace(frames, 1));
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_BACKTRACE"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lexecinfo"
|
|
|
|
JAVA_LDFLAGS="$JAVA_LDFLAGS -lexecinfo"
|
|
|
|
fi
|
2016-04-23 01:49:12 +02:00
|
|
|
fi
|
|
|
|
fi
|
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_PG; then
|
|
|
|
# Test if -pg is supported
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -pg -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
int main() {
|
|
|
|
return 0;
|
|
|
|
}
|
2016-04-23 01:49:12 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
PROFILING_FLAGS=-pg
|
|
|
|
fi
|
2016-04-23 01:49:12 +02:00
|
|
|
fi
|
2017-04-22 05:41:37 +02:00
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_SYNC_FILE_RANGE; then
|
|
|
|
# Test whether sync_file_range is supported for compatibility with an old glibc
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <fcntl.h>
|
|
|
|
int main() {
|
|
|
|
int fd = open("/dev/null", 0);
|
|
|
|
sync_file_range(fd, 0, 1024, SYNC_FILE_RANGE_WRITE);
|
|
|
|
}
|
2017-04-22 05:41:37 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_RANGESYNC_PRESENT"
|
|
|
|
fi
|
2017-04-22 05:41:37 +02:00
|
|
|
fi
|
2017-05-10 20:50:10 +02:00
|
|
|
|
2017-12-06 00:05:41 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_SCHED_GETCPU; then
|
|
|
|
# Test whether sched_getcpu is supported
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
#include <sched.h>
|
|
|
|
int main() {
|
|
|
|
int cpuid = sched_getcpu();
|
2019-05-16 00:57:04 +02:00
|
|
|
(void)cpuid;
|
2017-12-06 00:05:41 +01:00
|
|
|
}
|
2017-05-10 20:50:10 +02:00
|
|
|
EOF
|
2017-12-06 00:05:41 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_SCHED_GETCPU_PRESENT"
|
|
|
|
fi
|
2017-05-10 20:50:10 +02:00
|
|
|
fi
|
2018-10-23 19:53:38 +02:00
|
|
|
|
2020-03-04 03:06:56 +01:00
|
|
|
if ! test $ROCKSDB_DISABLE_AUXV_GETAUXVAL; then
|
|
|
|
# Test whether getauxval is supported
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2020-03-04 03:06:56 +01:00
|
|
|
#include <sys/auxv.h>
|
|
|
|
int main() {
|
|
|
|
uint64_t auxv = getauxval(AT_HWCAP);
|
|
|
|
(void)auxv;
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_AUXV_GETAUXVAL_PRESENT"
|
|
|
|
fi
|
|
|
|
fi
|
|
|
|
|
2018-10-23 19:53:38 +02:00
|
|
|
if ! test $ROCKSDB_DISABLE_ALIGNED_NEW; then
|
|
|
|
# Test whether c++17 aligned-new is supported
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -faligned-new -x c++ - -o test.o 2>/dev/null <<EOF
|
2018-10-23 19:53:38 +02:00
|
|
|
struct alignas(1024) t {int a;};
|
|
|
|
int main() {}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
PLATFORM_CXXFLAGS="$PLATFORM_CXXFLAGS -faligned-new -DHAVE_ALIGNED_NEW"
|
|
|
|
fi
|
|
|
|
fi
|
2021-07-09 02:50:55 +02:00
|
|
|
if ! test $ROCKSDB_DISABLE_BENCHMARK; then
|
|
|
|
# Test whether google benchmark is available
|
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o /dev/null -lbenchmark 2>/dev/null <<EOF
|
|
|
|
#include <benchmark/benchmark.h>
|
|
|
|
int main() {}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lbenchmark"
|
|
|
|
fi
|
|
|
|
fi
|
2011-06-29 02:30:50 +02:00
|
|
|
fi
|
|
|
|
|
2015-02-27 00:19:17 +01:00
|
|
|
# TODO(tec): Fix -Wshorten-64-to-32 errors on FreeBSD and enable the warning.
|
2021-10-14 23:37:37 +02:00
|
|
|
# -Wshorten-64-to-32 breaks compilation on FreeBSD aarch64 and i386
|
|
|
|
if ! { [ "$TARGET_OS" = FreeBSD ] && [ "$TARGET_ARCHITECTURE" = arm64 -o "$TARGET_ARCHITECTURE" = i386 ]; }; then
|
2015-02-27 00:19:17 +01:00
|
|
|
# Test whether -Wshorten-64-to-32 is available
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS -x c++ - -o test.o -Wshorten-64-to-32 2>/dev/null <<EOF
|
2015-02-27 00:19:17 +01:00
|
|
|
int main() {}
|
2014-11-11 22:47:22 +01:00
|
|
|
EOF
|
2015-02-27 00:19:17 +01:00
|
|
|
if [ "$?" = 0 ]; then
|
2014-11-11 22:47:22 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -Wshorten-64-to-32"
|
2015-02-27 00:19:17 +01:00
|
|
|
fi
|
2014-11-11 22:47:22 +01:00
|
|
|
fi
|
|
|
|
|
2019-09-13 20:04:52 +02:00
|
|
|
if test "0$PORTABLE" -eq 0; then
|
2016-01-19 16:08:19 +01:00
|
|
|
if test -n "`echo $TARGET_ARCHITECTURE | grep ^ppc64`"; then
|
|
|
|
# Tune for this POWER processor, treating '+' models as base models
|
|
|
|
POWER=`LD_SHOW_AUXV=1 /bin/true | grep AT_PLATFORM | grep -E -o power[0-9]+`
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -mcpu=$POWER -mtune=$POWER "
|
2019-06-18 20:20:52 +02:00
|
|
|
elif test -n "`echo $TARGET_ARCHITECTURE | grep -e^arm -e^aarch64`"; then
|
2018-01-10 03:12:32 +01:00
|
|
|
# TODO: Handle this with approprite options.
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS"
|
2019-06-13 20:43:35 +02:00
|
|
|
elif test -n "`echo $TARGET_ARCHITECTURE | grep ^aarch64`"; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS"
|
2021-10-22 19:12:09 +02:00
|
|
|
elif test -n "`echo $TARGET_ARCHITECTURE | grep ^s390x`"; then
|
|
|
|
if echo 'int main() {}' | $CXX $PLATFORM_CXXFLAGS -x c++ \
|
|
|
|
-fsyntax-only -march=native - -o /dev/null 2>/dev/null; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -march=native "
|
|
|
|
else
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -march=z196 "
|
|
|
|
fi
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS"
|
2018-04-03 08:45:39 +02:00
|
|
|
elif [ "$TARGET_OS" == "IOS" ]; then
|
2018-03-06 21:27:07 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS"
|
2019-09-13 20:04:52 +02:00
|
|
|
elif [ "$TARGET_OS" == "AIX" ] || [ "$TARGET_OS" == "SunOS" ]; then
|
|
|
|
# TODO: Not sure why we don't use -march=native on these OSes
|
|
|
|
if test "$USE_SSE"; then
|
|
|
|
TRY_SSE_ETC="1"
|
|
|
|
fi
|
|
|
|
else
|
2016-01-19 16:08:19 +01:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -march=native "
|
|
|
|
fi
|
2019-09-13 20:04:52 +02:00
|
|
|
else
|
|
|
|
# PORTABLE=1
|
|
|
|
if test "$USE_SSE"; then
|
|
|
|
TRY_SSE_ETC="1"
|
|
|
|
fi
|
2020-06-25 22:54:07 +02:00
|
|
|
|
2021-10-22 19:12:09 +02:00
|
|
|
if test -n "`echo $TARGET_ARCHITECTURE | grep ^s390x`"; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -march=z196 "
|
|
|
|
fi
|
|
|
|
|
2020-06-25 22:54:07 +02:00
|
|
|
if [[ "${PLATFORM}" == "OS_MACOSX" ]]; then
|
|
|
|
# For portability compile for macOS 10.12 (2016) or newer
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -mmacosx-version-min=10.12"
|
|
|
|
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -mmacosx-version-min=10.12"
|
2021-11-16 21:16:11 +01:00
|
|
|
# -mmacosx-version-min must come first here.
|
|
|
|
PLATFORM_SHARED_LDFLAGS="-mmacosx-version-min=10.12 $PLATFORM_SHARED_LDFLAGS"
|
2020-11-18 00:28:56 +01:00
|
|
|
PLATFORM_CMAKE_FLAGS="-DCMAKE_OSX_DEPLOYMENT_TARGET=10.12"
|
|
|
|
JAVA_STATIC_DEPS_COMMON_FLAGS="-mmacosx-version-min=10.12"
|
|
|
|
JAVA_STATIC_DEPS_LDFLAGS="$JAVA_STATIC_DEPS_COMMON_FLAGS"
|
|
|
|
JAVA_STATIC_DEPS_CCFLAGS="$JAVA_STATIC_DEPS_COMMON_FLAGS"
|
|
|
|
JAVA_STATIC_DEPS_CXXFLAGS="$JAVA_STATIC_DEPS_COMMON_FLAGS"
|
2020-06-25 22:54:07 +02:00
|
|
|
fi
|
2014-12-15 11:29:41 +01:00
|
|
|
fi
|
2012-10-17 20:59:44 +02:00
|
|
|
|
2021-03-30 01:31:26 +02:00
|
|
|
if test -n "`echo $TARGET_ARCHITECTURE | grep ^ppc64`"; then
|
|
|
|
# check for GNU libc on ppc64
|
|
|
|
$CXX -x c++ - -o /dev/null 2>/dev/null <<EOF
|
|
|
|
#include <stdio.h>
|
|
|
|
#include <stdlib.h>
|
|
|
|
#include <gnu/libc-version.h>
|
|
|
|
|
|
|
|
int main(int argc, char *argv[]) {
|
|
|
|
printf("GNU libc version: %s\n", gnu_get_libc_version());
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" != 0 ]; then
|
|
|
|
PPC_LIBC_IS_GNU=0
|
|
|
|
fi
|
|
|
|
fi
|
|
|
|
|
2019-09-13 20:04:52 +02:00
|
|
|
if test "$TRY_SSE_ETC"; then
|
|
|
|
# The USE_SSE flag now means "attempt to compile with widely-available
|
|
|
|
# Intel architecture extensions utilized by specific optimizations in the
|
|
|
|
# source code." It's a qualifier on PORTABLE=1 that means "mostly portable."
|
|
|
|
# It doesn't even really check that your current CPU is compatible.
|
|
|
|
#
|
|
|
|
# SSE4.2 available since nehalem, ca. 2008-2010
|
2020-09-03 18:31:28 +02:00
|
|
|
# Includes POPCNT for BitsSetToOne, BitParity
|
2019-09-13 20:04:52 +02:00
|
|
|
TRY_SSE42="-msse4.2"
|
|
|
|
# PCLMUL available since westmere, ca. 2010-2011
|
|
|
|
TRY_PCLMUL="-mpclmul"
|
New Bloom filter implementation for full and partitioned filters (#6007)
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.
Speed
The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.
Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):
$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
Single filter net ns/op: 26.2825
Random filter net ns/op: 150.459
Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
Single filter net ns/op: 63.2978
Random filter net ns/op: 188.038
Average FP rate %: 1.13823
Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.
The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.
Accuracy
The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.
Accuracy data (generalizes, except old impl gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)
Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120
Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.
Compatibility
Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007
Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).
Differential Revision: D18294749
Pulled By: pdillinger
fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f
2019-11-14 01:31:26 +01:00
|
|
|
# AVX2 available since haswell, ca. 2013-2015
|
|
|
|
TRY_AVX2="-mavx2"
|
2020-09-03 18:31:28 +02:00
|
|
|
# BMI available since haswell, ca. 2013-2015
|
|
|
|
# Primarily for TZCNT for CountTrailingZeroBits
|
|
|
|
TRY_BMI="-mbmi"
|
|
|
|
# LZCNT available since haswell, ca. 2013-2015
|
|
|
|
# For FloorLog2
|
|
|
|
TRY_LZCNT="-mlzcnt"
|
2019-09-13 20:04:52 +02:00
|
|
|
fi
|
|
|
|
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS $TRY_SSE42 -x c++ - -o test.o 2>/dev/null <<EOF
|
cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.
See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.
I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199
Differential Revision: D5054042
Pulled By: yiwu-arbug
fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 23:42:32 +02:00
|
|
|
#include <cstdint>
|
|
|
|
#include <nmmintrin.h>
|
|
|
|
int main() {
|
|
|
|
volatile uint32_t x = _mm_crc32_u32(0, 0);
|
2019-05-16 00:57:04 +02:00
|
|
|
(void)x;
|
cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.
See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.
I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199
Differential Revision: D5054042
Pulled By: yiwu-arbug
fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 23:42:32 +02:00
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
2019-09-13 20:04:52 +02:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS $TRY_SSE42 -DHAVE_SSE42"
|
cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.
See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.
I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199
Differential Revision: D5054042
Pulled By: yiwu-arbug
fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 23:42:32 +02:00
|
|
|
elif test "$USE_SSE"; then
|
2019-05-16 00:57:04 +02:00
|
|
|
echo "warning: USE_SSE specified but compiler could not use SSE intrinsics, disabling" >&2
|
Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**
RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.
This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.
**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
1) ReadRandom in db_bench overall metrics
PER RUN
Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
3-way | 1 | 4.143 | 241387 | 26.7
3-way | 2 | 3.775 | 264872 | 29.3
3-way | 3 | 4.116 | 242929 | 26.9
FastCrc32c|1 | 4.037 | 247727 | 27.4
FastCrc32c|2 | 4.648 | 215166 | 23.8
FastCrc32c|3 | 4.352 | 229799 | 25.4
AVG
Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
3-way | 4.01 | 249,729 | 27.63
FastCrc32c | 4.35 | 230,897 | 25.53
2) Crc32c computation CPU cost (inclusive samples percentage)
PER RUN
Implementation | run | TotalSamples | Crc32c percentage
3-way | 1 | 4,572,250,000 | 4.37%
3-way | 2 | 3,779,250,000 | 4.62%
3-way | 3 | 4,129,500,000 | 4.48%
FastCrc32c | 1 | 4,663,500,000 | 11.24%
FastCrc32c | 2 | 4,047,500,000 | 12.34%
FastCrc32c | 3 | 4,366,750,000 | 11.68%
**# Test Plan**
make -j64 corruption_test && ./corruption_test
By default it uses 3-way SSE algorithm
NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test
make clean && DEBUG_LEVEL=0 make -j64 db_bench
make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173
Differential Revision: D6330882
Pulled By: yingsu00
fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-20 03:20:50 +01:00
|
|
|
fi
|
|
|
|
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS $TRY_PCLMUL -x c++ - -o test.o 2>/dev/null <<EOF
|
Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**
RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.
This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.
**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
1) ReadRandom in db_bench overall metrics
PER RUN
Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
3-way | 1 | 4.143 | 241387 | 26.7
3-way | 2 | 3.775 | 264872 | 29.3
3-way | 3 | 4.116 | 242929 | 26.9
FastCrc32c|1 | 4.037 | 247727 | 27.4
FastCrc32c|2 | 4.648 | 215166 | 23.8
FastCrc32c|3 | 4.352 | 229799 | 25.4
AVG
Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
3-way | 4.01 | 249,729 | 27.63
FastCrc32c | 4.35 | 230,897 | 25.53
2) Crc32c computation CPU cost (inclusive samples percentage)
PER RUN
Implementation | run | TotalSamples | Crc32c percentage
3-way | 1 | 4,572,250,000 | 4.37%
3-way | 2 | 3,779,250,000 | 4.62%
3-way | 3 | 4,129,500,000 | 4.48%
FastCrc32c | 1 | 4,663,500,000 | 11.24%
FastCrc32c | 2 | 4,047,500,000 | 12.34%
FastCrc32c | 3 | 4,366,750,000 | 11.68%
**# Test Plan**
make -j64 corruption_test && ./corruption_test
By default it uses 3-way SSE algorithm
NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test
make clean && DEBUG_LEVEL=0 make -j64 db_bench
make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173
Differential Revision: D6330882
Pulled By: yingsu00
fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-20 03:20:50 +01:00
|
|
|
#include <cstdint>
|
|
|
|
#include <wmmintrin.h>
|
|
|
|
int main() {
|
|
|
|
const auto a = _mm_set_epi64x(0, 0);
|
|
|
|
const auto b = _mm_set_epi64x(0, 0);
|
|
|
|
const auto c = _mm_clmulepi64_si128(a, b, 0x00);
|
|
|
|
auto d = _mm_cvtsi128_si64(c);
|
2019-05-16 00:57:04 +02:00
|
|
|
(void)d;
|
Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**
RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.
This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.
**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
1) ReadRandom in db_bench overall metrics
PER RUN
Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
3-way | 1 | 4.143 | 241387 | 26.7
3-way | 2 | 3.775 | 264872 | 29.3
3-way | 3 | 4.116 | 242929 | 26.9
FastCrc32c|1 | 4.037 | 247727 | 27.4
FastCrc32c|2 | 4.648 | 215166 | 23.8
FastCrc32c|3 | 4.352 | 229799 | 25.4
AVG
Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
3-way | 4.01 | 249,729 | 27.63
FastCrc32c | 4.35 | 230,897 | 25.53
2) Crc32c computation CPU cost (inclusive samples percentage)
PER RUN
Implementation | run | TotalSamples | Crc32c percentage
3-way | 1 | 4,572,250,000 | 4.37%
3-way | 2 | 3,779,250,000 | 4.62%
3-way | 3 | 4,129,500,000 | 4.48%
FastCrc32c | 1 | 4,663,500,000 | 11.24%
FastCrc32c | 2 | 4,047,500,000 | 12.34%
FastCrc32c | 3 | 4,366,750,000 | 11.68%
**# Test Plan**
make -j64 corruption_test && ./corruption_test
By default it uses 3-way SSE algorithm
NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test
make clean && DEBUG_LEVEL=0 make -j64 db_bench
make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173
Differential Revision: D6330882
Pulled By: yingsu00
fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-20 03:20:50 +01:00
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
2019-09-13 20:04:52 +02:00
|
|
|
COMMON_FLAGS="$COMMON_FLAGS $TRY_PCLMUL -DHAVE_PCLMUL"
|
Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**
RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.
This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.
**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.
Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.
1) ReadRandom in db_bench overall metrics
PER RUN
Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
3-way | 1 | 4.143 | 241387 | 26.7
3-way | 2 | 3.775 | 264872 | 29.3
3-way | 3 | 4.116 | 242929 | 26.9
FastCrc32c|1 | 4.037 | 247727 | 27.4
FastCrc32c|2 | 4.648 | 215166 | 23.8
FastCrc32c|3 | 4.352 | 229799 | 25.4
AVG
Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s)
3-way | 4.01 | 249,729 | 27.63
FastCrc32c | 4.35 | 230,897 | 25.53
2) Crc32c computation CPU cost (inclusive samples percentage)
PER RUN
Implementation | run | TotalSamples | Crc32c percentage
3-way | 1 | 4,572,250,000 | 4.37%
3-way | 2 | 3,779,250,000 | 4.62%
3-way | 3 | 4,129,500,000 | 4.48%
FastCrc32c | 1 | 4,663,500,000 | 11.24%
FastCrc32c | 2 | 4,047,500,000 | 12.34%
FastCrc32c | 3 | 4,366,750,000 | 11.68%
**# Test Plan**
make -j64 corruption_test && ./corruption_test
By default it uses 3-way SSE algorithm
NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test
make clean && DEBUG_LEVEL=0 make -j64 db_bench
make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173
Differential Revision: D6330882
Pulled By: yingsu00
fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-20 03:20:50 +01:00
|
|
|
elif test "$USE_SSE"; then
|
2019-05-16 00:57:04 +02:00
|
|
|
echo "warning: USE_SSE specified but compiler could not use PCLMUL intrinsics, disabling" >&2
|
cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.
See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.
I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199
Differential Revision: D5054042
Pulled By: yiwu-arbug
fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 23:42:32 +02:00
|
|
|
fi
|
|
|
|
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS $TRY_AVX2 -x c++ - -o test.o 2>/dev/null <<EOF
|
New Bloom filter implementation for full and partitioned filters (#6007)
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.
Speed
The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.
Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):
$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
Single filter net ns/op: 26.2825
Random filter net ns/op: 150.459
Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
Single filter net ns/op: 63.2978
Random filter net ns/op: 188.038
Average FP rate %: 1.13823
Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.
The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.
Accuracy
The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.
Accuracy data (generalizes, except old impl gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)
Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120
Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.
Compatibility
Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007
Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).
Differential Revision: D18294749
Pulled By: pdillinger
fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f
2019-11-14 01:31:26 +01:00
|
|
|
#include <cstdint>
|
|
|
|
#include <immintrin.h>
|
|
|
|
int main() {
|
|
|
|
const auto a = _mm256_setr_epi32(0, 1, 2, 3, 4, 7, 6, 5);
|
|
|
|
const auto b = _mm256_permutevar8x32_epi32(a, a);
|
|
|
|
(void)b;
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS $TRY_AVX2 -DHAVE_AVX2"
|
|
|
|
elif test "$USE_SSE"; then
|
|
|
|
echo "warning: USE_SSE specified but compiler could not use AVX2 intrinsics, disabling" >&2
|
|
|
|
fi
|
|
|
|
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS $TRY_BMI -x c++ - -o test.o 2>/dev/null <<EOF
|
2020-09-03 18:31:28 +02:00
|
|
|
#include <cstdint>
|
|
|
|
#include <immintrin.h>
|
|
|
|
int main(int argc, char *argv[]) {
|
|
|
|
(void)argv;
|
|
|
|
return (int)_tzcnt_u64((uint64_t)argc);
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS $TRY_BMI -DHAVE_BMI"
|
|
|
|
elif test "$USE_SSE"; then
|
|
|
|
echo "warning: USE_SSE specified but compiler could not use BMI intrinsics, disabling" >&2
|
|
|
|
fi
|
|
|
|
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS $TRY_LZCNT -x c++ - -o test.o 2>/dev/null <<EOF
|
2020-09-03 18:31:28 +02:00
|
|
|
#include <cstdint>
|
|
|
|
#include <immintrin.h>
|
|
|
|
int main(int argc, char *argv[]) {
|
|
|
|
(void)argv;
|
|
|
|
return (int)_lzcnt_u64((uint64_t)argc);
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS $TRY_LZCNT -DHAVE_LZCNT"
|
|
|
|
elif test "$USE_SSE"; then
|
|
|
|
echo "warning: USE_SSE specified but compiler could not use LZCNT intrinsics, disabling" >&2
|
|
|
|
fi
|
|
|
|
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o test.o 2>/dev/null <<EOF
|
2019-10-25 02:14:27 +02:00
|
|
|
#include <cstdint>
|
|
|
|
int main() {
|
|
|
|
uint64_t a = 0xffffFFFFffffFFFF;
|
|
|
|
__uint128_t b = __uint128_t(a) * a;
|
|
|
|
a = static_cast<uint64_t>(b >> 64);
|
|
|
|
(void)a;
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DHAVE_UINT128_EXTENSION"
|
|
|
|
fi
|
|
|
|
|
2022-02-05 02:12:03 +01:00
|
|
|
# thread_local is part of C++11 and later (TODO: clean up this define)
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_SUPPORT_THREAD_LOCAL"
|
2020-04-20 22:21:34 +02:00
|
|
|
|
2019-06-04 07:59:54 +02:00
|
|
|
if [ "$FBCODE_BUILD" != "true" -a "$PLATFORM" = OS_LINUX ]; then
|
|
|
|
$CXX $COMMON_FLAGS $PLATFORM_SHARED_CFLAGS -x c++ -c - -o test_dl.o 2>/dev/null <<EOF
|
|
|
|
void dummy_func() {}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
2022-01-14 23:08:05 +01:00
|
|
|
$CXX $COMMON_FLAGS $PLATFORM_SHARED_LDFLAGS test_dl.o -o test.o 2>/dev/null
|
2019-06-04 07:59:54 +02:00
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
EXEC_LDFLAGS+="-ldl"
|
|
|
|
rm -f test_dl.o
|
|
|
|
fi
|
|
|
|
fi
|
|
|
|
fi
|
|
|
|
|
2022-01-19 05:21:37 +01:00
|
|
|
# check for F_FULLFSYNC
|
|
|
|
$CXX $PLATFORM_CXXFALGS -x c++ - -o test.o 2>/dev/null <<EOF
|
|
|
|
#include <fcntl.h>
|
|
|
|
int main() {
|
|
|
|
fcntl(0, F_FULLFSYNC);
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
EOF
|
|
|
|
if [ "$?" = 0 ]; then
|
|
|
|
COMMON_FLAGS="$COMMON_FLAGS -DHAVE_FULLFSYNC"
|
|
|
|
fi
|
|
|
|
|
|
|
|
rm -f test.o test_dl.o
|
2022-01-14 23:08:05 +01:00
|
|
|
|
2012-03-21 18:28:03 +01:00
|
|
|
PLATFORM_CCFLAGS="$PLATFORM_CCFLAGS $COMMON_FLAGS"
|
|
|
|
PLATFORM_CXXFLAGS="$PLATFORM_CXXFLAGS $COMMON_FLAGS"
|
|
|
|
|
2013-03-07 20:11:30 +01:00
|
|
|
VALGRIND_VER="$VALGRIND_VER"
|
|
|
|
|
2014-10-02 20:59:22 +02:00
|
|
|
ROCKSDB_MAJOR=`build_tools/version.sh major`
|
|
|
|
ROCKSDB_MINOR=`build_tools/version.sh minor`
|
|
|
|
ROCKSDB_PATCH=`build_tools/version.sh patch`
|
|
|
|
|
2014-05-11 06:01:25 +02:00
|
|
|
echo "CC=$CC" >> "$OUTPUT"
|
|
|
|
echo "CXX=$CXX" >> "$OUTPUT"
|
2020-07-28 22:09:12 +02:00
|
|
|
echo "AR=$AR" >> "$OUTPUT"
|
2014-05-11 06:01:25 +02:00
|
|
|
echo "PLATFORM=$PLATFORM" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_LDFLAGS=$PLATFORM_LDFLAGS" >> "$OUTPUT"
|
2020-11-18 00:28:56 +01:00
|
|
|
echo "PLATFORM_CMAKE_FLAGS=$PLATFORM_CMAKE_FLAGS" >> "$OUTPUT"
|
2014-07-22 07:41:54 +02:00
|
|
|
echo "JAVA_LDFLAGS=$JAVA_LDFLAGS" >> "$OUTPUT"
|
2015-10-09 20:41:40 +02:00
|
|
|
echo "JAVA_STATIC_LDFLAGS=$JAVA_STATIC_LDFLAGS" >> "$OUTPUT"
|
2020-11-18 00:28:56 +01:00
|
|
|
echo "JAVA_STATIC_DEPS_CCFLAGS=$JAVA_STATIC_DEPS_CCFLAGS" >> "$OUTPUT"
|
|
|
|
echo "JAVA_STATIC_DEPS_CXXFLAGS=$JAVA_STATIC_DEPS_CXXFLAGS" >> "$OUTPUT"
|
|
|
|
echo "JAVA_STATIC_DEPS_LDFLAGS=$JAVA_STATIC_DEPS_LDFLAGS" >> "$OUTPUT"
|
2020-03-12 20:22:03 +01:00
|
|
|
echo "JAVAC_ARGS=$JAVAC_ARGS" >> "$OUTPUT"
|
2014-05-11 06:01:25 +02:00
|
|
|
echo "VALGRIND_VER=$VALGRIND_VER" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_CCFLAGS=$PLATFORM_CCFLAGS" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_CXXFLAGS=$PLATFORM_CXXFLAGS" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_SHARED_CFLAGS=$PLATFORM_SHARED_CFLAGS" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_SHARED_EXT=$PLATFORM_SHARED_EXT" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_SHARED_LDFLAGS=$PLATFORM_SHARED_LDFLAGS" >> "$OUTPUT"
|
|
|
|
echo "PLATFORM_SHARED_VERSIONED=$PLATFORM_SHARED_VERSIONED" >> "$OUTPUT"
|
|
|
|
echo "EXEC_LDFLAGS=$EXEC_LDFLAGS" >> "$OUTPUT"
|
|
|
|
echo "JEMALLOC_INCLUDE=$JEMALLOC_INCLUDE" >> "$OUTPUT"
|
|
|
|
echo "JEMALLOC_LIB=$JEMALLOC_LIB" >> "$OUTPUT"
|
2014-10-02 20:59:22 +02:00
|
|
|
echo "ROCKSDB_MAJOR=$ROCKSDB_MAJOR" >> "$OUTPUT"
|
|
|
|
echo "ROCKSDB_MINOR=$ROCKSDB_MINOR" >> "$OUTPUT"
|
|
|
|
echo "ROCKSDB_PATCH=$ROCKSDB_PATCH" >> "$OUTPUT"
|
2015-02-04 06:43:06 +01:00
|
|
|
echo "CLANG_SCAN_BUILD=$CLANG_SCAN_BUILD" >> "$OUTPUT"
|
|
|
|
echo "CLANG_ANALYZER=$CLANG_ANALYZER" >> "$OUTPUT"
|
2016-04-23 01:49:12 +02:00
|
|
|
echo "PROFILING_FLAGS=$PROFILING_FLAGS" >> "$OUTPUT"
|
2018-03-19 20:11:58 +01:00
|
|
|
echo "FIND=$FIND" >> "$OUTPUT"
|
|
|
|
echo "WATCH=$WATCH" >> "$OUTPUT"
|
2017-08-04 19:27:39 +02:00
|
|
|
# This will enable some related identifiers for the preprocessor
|
2016-04-28 03:25:19 +02:00
|
|
|
if test -n "$JEMALLOC"; then
|
2016-04-28 01:23:33 +02:00
|
|
|
echo "JEMALLOC=1" >> "$OUTPUT"
|
|
|
|
fi
|
2017-08-04 19:27:39 +02:00
|
|
|
# Indicates that jemalloc should be enabled using -ljemalloc flag
|
|
|
|
# The alternative is to porvide a direct link to the library via JEMALLOC_LIB
|
|
|
|
# and JEMALLOC_INCLUDE
|
|
|
|
if test -n "$WITH_JEMALLOC_FLAG"; then
|
|
|
|
echo "WITH_JEMALLOC_FLAG=$WITH_JEMALLOC_FLAG" >> "$OUTPUT"
|
|
|
|
fi
|
2016-11-17 00:27:02 +01:00
|
|
|
echo "LUA_PATH=$LUA_PATH" >> "$OUTPUT"
|
2019-08-07 23:29:35 +02:00
|
|
|
if test -n "$USE_FOLLY_DISTRIBUTED_MUTEX"; then
|
|
|
|
echo "USE_FOLLY_DISTRIBUTED_MUTEX=$USE_FOLLY_DISTRIBUTED_MUTEX" >> "$OUTPUT"
|
|
|
|
fi
|
2021-03-30 01:31:26 +02:00
|
|
|
if test -n "$PPC_LIBC_IS_GNU"; then
|
|
|
|
echo "PPC_LIBC_IS_GNU=$PPC_LIBC_IS_GNU" >> "$OUTPUT"
|
|
|
|
fi
|