Commit Graph

688 Commits

Author SHA1 Message Date
Cheng Chang
91b7553293 Enable IO Uring in MultiGet in direct IO mode (#6815)
Summary:
Currently, in direct IO mode, `MultiGet` retrieves the data blocks one by one instead of in parallel, see `BlockBasedTable::RetrieveMultipleBlocks`.

Since direct IO is supported in `RandomAccessFileReader::MultiRead` in https://github.com/facebook/rocksdb/pull/6446, this PR applies `MultiRead` to `MultiGet` so that the data blocks can be retrieved in parallel.

Also, in direct IO mode and when data blocks are compressed and need to uncompressed, this PR only allocates one continuous aligned buffer to hold the data blocks, and then directly uncompress the blocks to insert into block cache, there is no longer intermediate copies to scratch buffers.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6815

Test Plan:
1. added a new unit test `BlockBasedTableReaderTest::MultiGet`.
2. existing unit tests and stress tests  contain tests against `MultiGet` in direct IO mode.

Reviewed By: anand1976

Differential Revision: D21426347

Pulled By: cheng-chang

fbshipit-source-id: b8446ae0e74152444ef9111e97f8e402ac31b24f
2020-05-14 23:26:26 -07:00
Andrew Kryczka
1c84660457 prototype status check enforcement (#6798)
Summary:
Tried making Status object enforce that it is checked in some way. In cases it is not checked, `PermitUncheckedError()` must be called explicitly.

Added a way to run tests (`ASSERT_STATUS_CHECKED=1 make -j48 check`) on a
whitelist. The effort appears significant to get each test to pass with
this assertion, so I only fixed up enough to get one test (`options_test`)
working and added it to the whitelist.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6798

Reviewed By: pdillinger

Differential Revision: D21377404

Pulled By: ajkr

fbshipit-source-id: 73236f9c8df38f01cf24ecac4a6d1661b72d077e
2020-05-08 12:40:43 -07:00
Yanqin Jin
6acbbbf9fc Add Github Action for some basic sanity test of PR (#6761)
Summary:
Add Github Action to perform some basic sanity check for PR, inclding the
following.
1) Buck TARGETS file.
On the one hand, The TARGETS file is used for internal buck, and we do not
manually update it. On the other hand, we need to run the buckifier scripts to
update TARGETS whenever new files are added, etc. With this Github Action, we
make sure that every PR does not forget this step. The GH Action uses
a Makefile target called check-buck-targets. Users can manually run `make
check-buck-targets` on local machine.

2) Code format
We use clang-format-diff.py to format our code. The GH Action in this PR makes
sure this step is not skipped. The checking script build_tools/format-diff.sh assumes that `clang-format-diff.py` is executable.
On host running GH Action, it is difficult to download `clang-format-diff.py` and make it
executable. Therefore, we modified build_tools/format-diff.sh to handle the case in which there is a non-executable clang-format-diff.py file in the top-level rocksdb repo directory.

Test Plan (Github and devserver):
Watch for Github Action result in the `Checks` tab.
On dev server
```
make check-format
make check-buck-targets
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6761

Test Plan: Watch for Github Action result in the `Checks` tab.

Reviewed By: pdillinger

Differential Revision: D21260209

Pulled By: riversand963

fbshipit-source-id: c646e2f37c6faf9f0614b68aa0efc818cff96787
2020-04-30 19:22:45 -07:00
Peter Dillinger
791e5714a5 Understand common build variables passed as make variables (#6740)
Summary:
Some common build variables like USE_CLANG and
COMPILE_WITH_UBSAN did not work if specified as make variables, as in
`make USE_CLANG=1 check` etc. rather than (in theory less hygienic)
`USE_CLANG=1 make check`. This patches Makefile to export some commonly
used ones to build_detect_platform so that they work. (I'm skeptical of
a broad `export` in Makefile because it's hard to predict how random
make variables might affect various invoked tools.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6740

Test Plan: manual / CI

Reviewed By: siying

Differential Revision: D21229011

Pulled By: pdillinger

fbshipit-source-id: b00c69b23eb2a13105bc8d860ce2d1e61ac5a355
2020-04-27 10:48:49 -07:00
Cheng Chang
40497a875a Reduce memory copies when fetching and uncompressing blocks from SST files (#6689)
Summary:
In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode.
This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode.

Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read` depending on the block size. then during uncompressing, it will copy the uncompressed block into `BlockFetcher::heap_buf_`.

In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689

Test Plan: A new unit test `block_fetcher_test` is added.

Reviewed By: anand1976

Differential Revision: D21006729

Pulled By: cheng-chang

fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980
2020-04-24 15:32:56 -07:00
Adam Retter
5fef0ffd66 Update RocksJava static version of bzip2 (#6714)
Summary:
Updates the version of bzip2 used for RocksJava static builds.

Please, can we also get this cherry-picked to:

1. 6.7.fb
2. 6.8.fb
3. 6.9.fb
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6714

Reviewed By: cheng-chang

Differential Revision: D21067233

Pulled By: pdillinger

fbshipit-source-id: 8164b7eb99c5ca7b2021ab8c371ba9ded4cb4f7e
2020-04-16 15:35:24 -07:00
anand76
5c19a441c4 Fault injection in db_stress (#6538)
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538

Test Plan:
crash_test
make check

Reviewed By: riversand963

Differential Revision: D20714347

Pulled By: anand1976

fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
2020-04-10 17:21:26 -07:00
Yanqin Jin
0c05624d50 Compaction with timestamp: input boundaries (#6645)
Summary:
Towards making compaction logic compatible with user timestamp.
When computing boundaries and overlapping ranges for inputs of compaction, We need to compare SSTs by user key without timestamp.

Test plan (devserver):
```
make check
```
Several individual tests:
```
./version_set_test --gtest_filter=VersionStorageInfoTimestampTest.GetOverlappingInputs
./db_with_timestamp_compaction_test
./db_with_timestamp_basic_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6645

Reviewed By: ltamasi

Differential Revision: D20960012

Pulled By: riversand963

fbshipit-source-id: ad377fa9eb481bf7a8a3e1824aaade48cdc653a4
2020-04-10 16:05:49 -07:00
Luca Giacchino
66a95f0fac Provide an allocator for new memory type to be used with RocksDB block cache (#6214)
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.

**Performance**

We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.

The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.

![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)

**Changes**

- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator

**Minimum Requirements**

- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind

**Memory Configuration**

The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.

Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.

**Usage**

When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):

```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"

NewLRUCache(
    capacity /*size_t*/,
    6 /*cache_numshardbits*/,
    false /*strict_capacity_limit*/,
    false /*cache_high_pri_pool_ratio*/,
    std::make_shared<MemkindKmemAllocator>());
```

Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214

Reviewed By: cheng-chang

Differential Revision: D19292435

fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
2020-04-09 20:47:23 -07:00
Cheng Chang
d648a0e17f Add unit test for TransactionLockMgr (#6599)
Summary:
Although there are tests related to locking in transaction_test, this new test directly tests against TransactionLockMgr.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6599

Test Plan: make transaction_lock_mgr_test && ./transaction_lock_mgr_test

Reviewed By: lth

Differential Revision: D20673749

Pulled By: cheng-chang

fbshipit-source-id: 1fa4a13218e68d785f5a99924556751a8c5c0f31
2020-04-08 13:51:51 -07:00
Sagar Vemuri
0355d14dd9 Add a simple timer support to schedule work at fixed times/intervals (#6543)
Summary:
Adding a simple timer support to schedule work at a fixed time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6543

Test Plan: TODO: clean up the unit tests, and make them better.

Reviewed By: siying

Differential Revision: D20465390

Pulled By: sagar0

fbshipit-source-id: cba143f70b6339863e1d0f8b8bf92e51c2b3d678
2020-04-07 11:55:27 -07:00
Peter Dillinger
a67fb4c9bd Add some timestamps in CI build+test output (#6643)
Summary:
When Travis times out, it's hard to determine whether
the last executing thing took an excessively long time or the
sum of all the work just exceeded the time limit. This
change inserts some timestamps in the output that should
make this easier to determine.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6643

Test Plan: CI (Travis mostly)

Reviewed By: anand1976

Differential Revision: D20843901

Pulled By: pdillinger

fbshipit-source-id: e7aae5434b0c609931feddf238ce4355964488b7
2020-04-04 10:02:07 -07:00
Ziyue Yang
03a781a90c Add pipelined & parallel compression optimization (#6262)
Summary:
This PR adds support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6262

Reviewed By: ajkr

Differential Revision: D20651306

fbshipit-source-id: 62125590a9c15b6d9071def9dc72589c1696a4cb
2020-04-01 16:40:18 -07:00
Yanqin Jin
ccf7676455 Update a few scripts to be python3 compatible (#6525)
Summary:
There are a few scripts with python3 compatibility issues that were not
detected by automated tool before. Update them now.

Test Plan (devserver):
python2 tools/ldb_test.py
python3 tools/ldb_test.py

python2 tools/write_stress_runner.py --runtime_sec=30
python3 tools/write_stress_runner.py --runtime_sec=30

python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox
python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox

python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525

Reviewed By: cheng-chang

Differential Revision: D20627820

Pulled By: riversand963

fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3
2020-03-24 21:00:27 -07:00
Cheng Chang
4fc216649d Support direct IO in RandomAccessFileReader::MultiRead (#6446)
Summary:
By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined.

In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API.

For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446

Test Plan: Added a new test `random_access_file_reader_test.cc`.

Reviewed By: anand1976

Differential Revision: D20097518

Pulled By: cheng-chang

fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1
2020-03-20 16:33:26 -07:00
Zhichao Cao
5c30e6c088 Separate timestamp related test from db_basic_test (#6516)
Summary:
In some of the test, db_basic_test may cause time out due to its long running time. Separate the timestamp related test from db_basic_test to avoid the potential issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6516

Test Plan: pass make asan_check

Differential Revision: D20423922

Pulled By: zhichao-cao

fbshipit-source-id: d6306f89a8de55b07bf57233e4554c09ef1fe23a
2020-03-13 11:37:15 -07:00
Adam Retter
0772768d07 Force Java version on Travis CI (#6512)
Summary:
In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11.

To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512

Differential Revision: D20388296

Pulled By: pdillinger

fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1
2020-03-12 12:24:51 -07:00
Levi Tamasi
c15e85bdcb Move BlobDB related files under db/ to db/blob/ (#6519)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6519

Test Plan:
```
make all
make check
```

Differential Revision: D20400691

Pulled By: ltamasi

fbshipit-source-id: 20ef911cf1c2c92c7f71ef0b493f9be64f2eef94
2020-03-12 11:00:56 -07:00
Cheng Chang
2d9efc9ab2 Cache result of GetLogicalBufferSize in Linux (#6457)
Summary:
In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down.

This PR introduces two new APIs:
1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.

Other modifications:
1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457

Test Plan:
1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.

`make check`

Differential Revision: D20131243

Pulled By: cheng-chang

fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
2020-03-11 18:40:05 -07:00
Adam Retter
65b60db9e1 Update to latest Snappy to fix compilation issue on latest MacOS XCode (#6496)
Summary:
* **macOS version:** 10.15.2 (Catalina)
* **XCode/Clang version:** Apple clang version 11.0.0 (clang-1100.0.33.16)

Before this bugfix the error generated is:

```
In file included from ./util/compression.h:23:
./snappy-1.1.7/snappy.h:76:59: error: unknown type name 'string'; did you mean 'std::string'?
  size_t Compress(const char* input, size_t input_length, string* output);
                                                          ^~~~~~
                                                          std::string
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here
typedef basic_string<char, char_traits<char>, allocator<char> > string;
                                                                ^
In file included from db/builder.cc:10:
In file included from ./db/builder.h:12:
In file included from ./db/range_tombstone_fragmenter.h:15:
In file included from ./db/pinned_iterators_manager.h:12:
In file included from ./table/internal_iterator.h:13:
In file included from ./table/format.h:25:
In file included from ./options/cf_options.h:14:
In file included from ./util/compression.h:23:
./snappy-1.1.7/snappy.h:85:19: error: unknown type name 'string'; did you mean 'std::string'?
                  string* uncompressed);
                  ^~~~~~
                  std::string
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here
typedef basic_string<char, char_traits<char>, allocator<char> > string;
                                                                ^
2 errors generated.
make: *** [jls/db/builder.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6496

Differential Revision: D20389254

Pulled By: pdillinger

fbshipit-source-id: 2864245c8d0dba7b2ab81294241a62f2adf02e20
2020-03-11 11:46:13 -07:00
Levi Tamasi
f5bc3b99d5 Split BlobFileState into an immutable and a mutable part (#6502)
Summary:
It's never too soon to refactor something. The patch splits the recently
introduced (`VersionEdit` related) `BlobFileState` into two classes
`BlobFileAddition` and `BlobFileGarbage`. The idea is that once blob files
are closed, they are immutable, and the only thing that changes is the
amount of garbage in them. In the new design, `BlobFileAddition` contains
the immutable attributes (currently, the count and total size of all blobs, checksum
method, and checksum value), while `BlobFileGarbage` contains the mutable
GC-related information elements (count and total size of garbage blobs). This is a
better fit for the GC logic and is more consistent with how SST files are handled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6502

Test Plan: `make check`

Differential Revision: D20348352

Pulled By: ltamasi

fbshipit-source-id: ff93f0121e80ab15e0e0a6525ba0d6af16a0e008
2020-03-10 17:27:26 -07:00
Zhichao Cao
e62fe50634 Introduce FaultInjectionTestFS to test fault File system instead of Env (#6414)
Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
2020-03-04 12:35:05 -08:00
Michael R. Crusoe
051696bf98 fix some spelling typos (#6464)
Summary:
Found from Debian's "Lintian" program
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464

Differential Revision: D20162862

Pulled By: zhichao-cao

fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12
2020-02-28 14:14:03 -08:00
Levi Tamasi
d87c10c6ab Add blob file state to VersionEdit (#6416)
Summary:
BlobDB currently does not keep track of blob files: no records are written to
the manifest when a blob file is added or removed, and upon opening a database,
the list of blob files is populated simply based on the contents of the blob directory.
This means that lost blob files cannot be detected at the moment. We plan to solve
this issue by making blob files a part of `Version`; as a first step, this patch makes
it possible to store information about blob files in `VersionEdit`. Currently, this information
includes blob file number, total number and size of all blobs, and total number and size
of garbage blobs. However, the format is extensible: new fields can be added in
both a forward compatible and a forward incompatible manner if needed (similarly
to `kNewFile4`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6416

Test Plan: `make check`

Differential Revision: D19894234

Pulled By: ltamasi

fbshipit-source-id: f9753e1f2aedf6dadb70c09b345207cb9c58c329
2020-02-24 18:39:53 -08:00
Cheng Chang
dafb568052 Add utility class Defer (#6382)
Summary:
Add a utility class `Defer` to defer the execution of a function until the Defer object goes out of scope.
Used in VersionSet:: ProcessManifestWrites as an example.
The inline comments for class `Defer` have more details.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6382

Test Plan: `make defer_test version_set_test && ./defer_test && ./version_set_test`

Differential Revision: D19797538

Pulled By: cheng-chang

fbshipit-source-id: b1a9b7306e4fd4f48ec2ab55783caa561a315f0f
2020-02-10 17:59:47 -08:00
Cheng Chang
b42fa1497f Support move semantics for PinnableSlice (#6374)
Summary:
It's logically correct for PinnableSlice to support move semantics to transfer ownership of the pinned memory region. This PR adds both move constructor and move assignment to PinnableSlice.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6374

Test Plan:
A set of unit tests for the move semantics are added in slice_test.
So `make slice_test && ./slice_test`.

Differential Revision: D19739254

Pulled By: cheng-chang

fbshipit-source-id: f898bd811bb05b2d87384ec58b645e9915e8e0b1
2020-02-07 14:26:26 -08:00
Peter Dillinger
90c71aa5d9 Don't download from (unreliable) maven.org (#6348)
Summary:
I set up a mirror of our Java deps on github so we can download
them through github URLs rather than maven.org, which is proving
terribly unreliable from Travis builds.

Also sanitized calls to curl, so they are easier to read and
appropriately fail on download failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348

Test Plan: CI

Differential Revision: D19633621

Pulled By: pdillinger

fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3
2020-01-30 11:02:08 -08:00
Adam Retter
a07a9dc904 Reduce the need to re-download dependencies (#6318)
Summary:
Both changes are related to RocksJava:

1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds.

2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318

Differential Revision: D19608742

Pulled By: pdillinger

fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7
2020-01-29 08:01:56 -08:00
Peter Dillinger
95d226d8f5 Fix a clang analyzer report, and 'analyze' make rule (#6244)
Summary:
Clang analyzer was falsely reporting on use of txn=nullptr.
Added a new const variable so that it can properly prune impossible
control flows.

Also, 'make analyze' previously required setting USE_CLANG=1 as an
environment variable, not a make variable, or else compilation errors
like

g++: error: unrecognized command line option ‘-Wshorten-64-to-32’

Now USE_CLANG is not required for 'make analyze' (it's implied) and you
can do an incremental analysis (recompile what has changed) with
'USE_CLANG=1 make analyze_incremental'
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6244

Test Plan: 'make -j24 analyze', 'make crash_test'

Differential Revision: D19225950

Pulled By: pdillinger

fbshipit-source-id: 14f4039aa552228826a2de62b2671450e0fed3cb
2019-12-24 18:46:40 -08:00
Yanqin Jin
670a916d01 Add more verification to db_stress (#6173)
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.

Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173

Differential Revision: D19047367

Pulled By: riversand963

fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
2019-12-20 08:49:29 -08:00
Peter Dillinger
5b18729d7d Syntax check python files on testing (#6209)
Summary:
Adds a python script to syntax check all python files in the
repository and report any errors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6209

Test Plan:
'make check' with and without seeded syntax errors. Also look
for "No syntax errors in 34 .py files" on success, and in java_test CI output

Differential Revision: D19166756

Pulled By: pdillinger

fbshipit-source-id: 537df464b767260d66810b4cf4c9808a026c58a4
2019-12-19 08:31:11 -08:00
Peter Dillinger
58d46d1915 Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154)
Summary:
And clean up related code, especially in stress test.

(More clean up of db_stress_test_base.cc coming after this.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154

Test Plan: make check, make blackbox_crash_test for a bit

Differential Revision: D18938180

Pulled By: pdillinger

fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e
2019-12-13 14:30:14 -08:00
Maysam Yabandeh
1ad6fa9cc7 Enable txn in crash tests (#6155)
Summary:
Start daily crash tests with use_txn flag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155

Differential Revision: D18943630

Pulled By: maysamyabandeh

fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87
2019-12-11 16:01:55 -08:00
Jermy Li
1dd3194f56 Fix compile error "folly/xx.h file not found" on Mac OS (#6145)
Summary:
Error message when running `make` on Mac OS with master branch (v6.6.0):
```
$ make
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
#include <folly/synchronization/WaitOptions.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
#include <folly/synchronization/ParkingLot.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
#include <folly/synchronization/DistributedMutex.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
#include <folly/synchronization/AtomicNotification.h>
         ^
1 error generated.
third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
#include <folly/detail/Futex.h>
         ^
1 error generated.
  GEN      util/build_version.cc
$DEBUG_LEVEL is 1
Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production
third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found
#include <folly/synchronization/WaitOptions.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found
#include <folly/synchronization/ParkingLot.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found
#include <folly/synchronization/DistributedMutex.h>
         ^
1 error generated.
third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found
#include <folly/synchronization/AtomicNotification.h>
         ^
1 error generated.
third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found
#include <folly/detail/Futex.h>
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6145

Differential Revision: D18910812

fbshipit-source-id: 5a4475466c2d0601657831a0b48d34316b2f0816
2019-12-10 11:24:11 -08:00
sdong
7d79b32618 Break db_stress_tool.cc to a list of source files (#6134)
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134

Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1

Differential Revision: D18876064

fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d
2019-12-08 23:51:01 -08:00
Yanqin Jin
4edb4284e7 Make folly-related targets comply with verbosity (#6120)
Summary:
Before this fix, `make all` will emit full compilation command when building
object files in the third-party/folly directory even if default verbosity is
0 (AM_DEFAULT_VERBOSITY).

Test Plan (devserver):
```
$make all | tee build.log
$make check
```
Check build.log to verify.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6120

Differential Revision: D18795621

Pulled By: riversand963

fbshipit-source-id: 04641a8359cd4fd55034e6e797ed85de29ee2fe2
2019-12-03 16:04:44 -08:00
Patrick Double
8ae149eba1 Add shared library for musl-libc (#3143)
Summary:
Add the jni library for musl-libc, specifically for incorporating into Alpine based docker images. The classifier is `musl64`.

I have signed the CLA electronically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3143

Differential Revision: D18719372

fbshipit-source-id: 6189d149310b6436d6def7d808566b0234b23313
2019-11-26 18:24:09 -08:00
Adam Retter
1bf316e5b6 Fix naming of library on PPC64LE (#6080)
Summary:
**NOTE**: This also needs to be back-ported to be 6.4.6

Fix a regression introduced in f2bf0b2 by https://github.com/facebook/rocksdb/pull/5674 whereby the compiled library would get the wrong name on PPC64LE platforms.

On PPC64LE, the regression caused the library to be named `librocksdbjni-linux64.so` instead of `librocksdbjni-linux-ppc64le.so`.

This PR corrects the name back to `librocksdbjni-linux-ppc64le.so` and also corrects the ordering of conditional arguments in the Makefile to match the expected order as defined in the documentation for Make.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6080

Differential Revision: D18710351

fbshipit-source-id: d4db87ef378263b57de7f9edce1b7d15644cf9de
2019-11-26 13:28:32 -08:00
Adam Retter
7f14519577 Small improvements to Docker build for RocksJava (#6079)
Summary:
* We can reuse downloaded 3rd-party libraries
* We can isolate the build to a Docker volume. This is useful for investigating failed builds, as we can examine the volume by assigning it a name during the build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6079

Differential Revision: D18710263

fbshipit-source-id: 93f456ba44b49e48941c43b0c4d53995ecc1f404
2019-11-26 13:28:31 -08:00
Adam Retter
382b154be6 Update 3rd-party libraries used by RocksJava (#6084)
Summary:
* LZ4 1.8.3 -> 1.9.2
* ZSTD 1.4.0 -> 1.4.4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6084

Differential Revision: D18710224

fbshipit-source-id: a461ef19a473d3480acdc027f627ec3048730692
2019-11-26 13:28:31 -08:00
Peter Dillinger
00d58a370e Abandon use of folly::Optional (#6036)
Summary:
Had complications with LITE build and valgrind test.
Reverts/fixes small parts of PR https://github.com/facebook/rocksdb/issues/6007
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6036

Test Plan:
make LITE=1 all check
and
ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 make -j24 db_bloom_filter_test && ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 ./db_bloom_filter_test

Differential Revision: D18512238

Pulled By: pdillinger

fbshipit-source-id: 37213cf0d309edf11c483fb4b2fb6c02c2cf2b28
2019-11-14 14:04:15 -08:00
Peter Dillinger
f059c7d9b9 New Bloom filter implementation for full and partitioned filters (#6007)
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.

Speed

The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.

Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):

$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
  Single filter net ns/op: 26.2825
  Random filter net ns/op: 150.459
    Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
  Single filter net ns/op: 63.2978
  Random filter net ns/op: 188.038
    Average FP rate %: 1.13823

Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.

The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.

Accuracy

The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.

Accuracy data (generalizes, except old impl gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)

Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120

Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.

Compatibility

Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007

Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).

Differential Revision: D18294749

Pulled By: pdillinger

fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f
2019-11-13 16:44:01 -08:00
Yun Tang
07a0ad3c29 Download bzip2 packages from sourceforge (#5995)
Summary:
From bzip2's official [download page](http://www.bzip.org/downloads.html), we could download it from sourceforge. This source would be more credible than previous web archive.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5995

Differential Revision: D18377662

fbshipit-source-id: e8353f83d5d6ea6067f78208b7bfb7f0d5b49c05
2019-11-07 12:51:06 -08:00
Yanqin Jin
925250f42f Include db_stress_tool in rocksdb tools lib (#5950)
Summary:
include db_stress_tool in rocksdb tools lib

Test Plan (on devserver):
```
$make db_stress
$./db_stress
$make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950

Differential Revision: D18044399

Pulled By: riversand963

fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1
2019-10-21 19:40:35 -07:00
Yanqin Jin
e60cc0925c Expose db stress tests (#5937)
Summary:
expose db stress test by providing db_stress_tool.h in public header.
This PR does the following:
- adds a new header, db_stress_tool.h, in include/rocksdb/
- renames db_stress.cc to db_stress_tool.cc
- adds a db_stress.cc which simply invokes a test function.
- update Makefile accordingly.

Test Plan (dev server):
```
make db_stress
./db_stress
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937

Differential Revision: D17997647

Pulled By: riversand963

fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410
2019-10-18 09:46:44 -07:00
Peter Dillinger
46ca51d430 filter_bench - a prelim tool for SST filter benchmarking (#5825)
Summary:
Example: using the tool before and after PR https://github.com/facebook/rocksdb/issues/5784 shows that
the refactoring, presumed performance-neutral, actually sped up SST
filters by about 3% to 8% (repeatable result):

Before:
-  Dry run ns/op: 22.4725
-  Single filter ns/op: 51.1078
-  Random filter ns/op: 120.133

After:
+  Dry run ns/op: 22.2301
+  Single filter run ns/op: 47.4313
+  Random filter ns/op: 115.9

Only tests filters for the block-based table (full filters and
partitioned filters - same implementation; not block-based filters),
which seems to be the recommended format/implementation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5825

Differential Revision: D17804987

Pulled By: pdillinger

fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5
2019-10-07 20:10:53 -07:00
Yanqin Jin
a9c5e8e944 Refactor deletefile_test.cc (#5822)
Summary:
Make DeleteFileTest inherit DBTestBase to avoid code duplication.

Test Plan (on devserver)
```
$make deletefile_test
$./deletefile_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5822

Differential Revision: D17456750

Pulled By: riversand963

fbshipit-source-id: 224e97967da7b98838a98981cd5095d3230a814f
2019-09-18 16:58:21 -07:00
Yanqin Jin
6a279037cf Refactor ObsoleteFilesTest to inherit from DBTestBase (#5820)
Summary:
Make class ObsoleteFilesTest inherit from DBTestBase.

Test plan (on devserver):
```
$COMPILE_WITH_ASAN=1 make obsolete_files_test
$./obsolete_files_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5820

Differential Revision: D17452348

Pulled By: riversand963

fbshipit-source-id: b09f4581a18022ca2bfd79f2836c0bf7083f5f25
2019-09-18 11:52:17 -07:00
Peter Dillinger
68626249c3 Refactor/consolidate legacy Bloom implementation details (#5784)
Summary:
Refactoring to consolidate implementation details of legacy
Bloom filters. This helps to organize and document some related,
obscure code.

Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to
compile and run unit tests for non-native cache line size. (Fixed a
related test failure in db_properties_test.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784

Test Plan:
make check, including Recently added Bloom schema unit tests
(in ./plain_table_db_test && ./bloom_test), and including with
TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the
schema tests with temporary fault injection into new implementations.

Some performance testing with modified unit tests suggest a small to moderate
improvement in speed.

Differential Revision: D17381384

Pulled By: pdillinger

fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df
2019-09-16 16:17:09 -07:00
Peter Dillinger
d3a6726f02 Revert changes from PR#5784 accidentally in PR#5780 (#5810)
Summary:
This will allow us to fix history by having the code changes for PR#5784 properly attributed to it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810

Differential Revision: D17400231

Pulled By: pdillinger

fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09
2019-09-16 11:38:53 -07:00