rocksdb

Author	SHA1	Message	Date
Peter Dillinger	aaafcb80ab	Use in-repo gtest in buck build (#6858 ) Summary: ... so that we have freedom to upgrade it (see https://github.com/facebook/rocksdb/issues/6808). As a side benefit, gtest will no longer be linked into main library in buck build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6858 Test Plan: fb internal build & link Reviewed By: riversand963 Differential Revision: D21652061 Pulled By: pdillinger fbshipit-source-id: 6018104af944debde576b5beda6c134e737acedb	2020-05-20 11:37:45 -07:00
Akanksha Mahajan	a1523efcdf	Status check enforcement for io_posix_test and options_settable_test (#6857 ) Summary: Added status check enforcement for io_posix_test and options_settable_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/6857 Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 check Reviewed By: ajkr Differential Revision: D21647904 Pulled By: akankshamahajan15 fbshipit-source-id: b7f2321eb6c141a88cd5e1270ecb7d58f00341af	2020-05-19 19:22:28 -07:00
Cheng Chang	91b7553293	Enable IO Uring in MultiGet in direct IO mode (#6815 ) Summary: Currently, in direct IO mode, `MultiGet` retrieves the data blocks one by one instead of in parallel, see `BlockBasedTable::RetrieveMultipleBlocks`. Since direct IO is supported in `RandomAccessFileReader::MultiRead` in https://github.com/facebook/rocksdb/pull/6446, this PR applies `MultiRead` to `MultiGet` so that the data blocks can be retrieved in parallel. Also, in direct IO mode and when data blocks are compressed and need to uncompressed, this PR only allocates one continuous aligned buffer to hold the data blocks, and then directly uncompress the blocks to insert into block cache, there is no longer intermediate copies to scratch buffers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6815 Test Plan: 1. added a new unit test `BlockBasedTableReaderTest::MultiGet`. 2. existing unit tests and stress tests contain tests against `MultiGet` in direct IO mode. Reviewed By: anand1976 Differential Revision: D21426347 Pulled By: cheng-chang fbshipit-source-id: b8446ae0e74152444ef9111e97f8e402ac31b24f	2020-05-14 23:26:26 -07:00
Andrew Kryczka	1c84660457	prototype status check enforcement (#6798 ) Summary: Tried making Status object enforce that it is checked in some way. In cases it is not checked, `PermitUncheckedError()` must be called explicitly. Added a way to run tests (`ASSERT_STATUS_CHECKED=1 make -j48 check`) on a whitelist. The effort appears significant to get each test to pass with this assertion, so I only fixed up enough to get one test (`options_test`) working and added it to the whitelist. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6798 Reviewed By: pdillinger Differential Revision: D21377404 Pulled By: ajkr fbshipit-source-id: 73236f9c8df38f01cf24ecac4a6d1661b72d077e	2020-05-08 12:40:43 -07:00
Yanqin Jin	6acbbbf9fc	Add Github Action for some basic sanity test of PR (#6761 ) Summary: Add Github Action to perform some basic sanity check for PR, inclding the following. 1) Buck TARGETS file. On the one hand, The TARGETS file is used for internal buck, and we do not manually update it. On the other hand, we need to run the buckifier scripts to update TARGETS whenever new files are added, etc. With this Github Action, we make sure that every PR does not forget this step. The GH Action uses a Makefile target called check-buck-targets. Users can manually run `make check-buck-targets` on local machine. 2) Code format We use clang-format-diff.py to format our code. The GH Action in this PR makes sure this step is not skipped. The checking script build_tools/format-diff.sh assumes that `clang-format-diff.py` is executable. On host running GH Action, it is difficult to download `clang-format-diff.py` and make it executable. Therefore, we modified build_tools/format-diff.sh to handle the case in which there is a non-executable clang-format-diff.py file in the top-level rocksdb repo directory. Test Plan (Github and devserver): Watch for Github Action result in the `Checks` tab. On dev server ``` make check-format make check-buck-targets make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6761 Test Plan: Watch for Github Action result in the `Checks` tab. Reviewed By: pdillinger Differential Revision: D21260209 Pulled By: riversand963 fbshipit-source-id: c646e2f37c6faf9f0614b68aa0efc818cff96787	2020-04-30 19:22:45 -07:00
Peter Dillinger	791e5714a5	Understand common build variables passed as make variables (#6740 ) Summary: Some common build variables like USE_CLANG and COMPILE_WITH_UBSAN did not work if specified as make variables, as in `make USE_CLANG=1 check` etc. rather than (in theory less hygienic) `USE_CLANG=1 make check`. This patches Makefile to export some commonly used ones to build_detect_platform so that they work. (I'm skeptical of a broad `export` in Makefile because it's hard to predict how random make variables might affect various invoked tools.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6740 Test Plan: manual / CI Reviewed By: siying Differential Revision: D21229011 Pulled By: pdillinger fbshipit-source-id: b00c69b23eb2a13105bc8d860ce2d1e61ac5a355	2020-04-27 10:48:49 -07:00
Cheng Chang	40497a875a	Reduce memory copies when fetching and uncompressing blocks from SST files (#6689 ) Summary: In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode. This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode. Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read` depending on the block size. then during uncompressing, it will copy the uncompressed block into `BlockFetcher::heap_buf_`. In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689 Test Plan: A new unit test `block_fetcher_test` is added. Reviewed By: anand1976 Differential Revision: D21006729 Pulled By: cheng-chang fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980	2020-04-24 15:32:56 -07:00
Adam Retter	5fef0ffd66	Update RocksJava static version of bzip2 (#6714 ) Summary: Updates the version of bzip2 used for RocksJava static builds. Please, can we also get this cherry-picked to: 1. 6.7.fb 2. 6.8.fb 3. 6.9.fb Pull Request resolved: https://github.com/facebook/rocksdb/pull/6714 Reviewed By: cheng-chang Differential Revision: D21067233 Pulled By: pdillinger fbshipit-source-id: 8164b7eb99c5ca7b2021ab8c371ba9ded4cb4f7e	2020-04-16 15:35:24 -07:00
anand76	5c19a441c4	Fault injection in db_stress (#6538 ) Summary: This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538 Test Plan: crash_test make check Reviewed By: riversand963 Differential Revision: D20714347 Pulled By: anand1976 fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7	2020-04-10 17:21:26 -07:00
Yanqin Jin	0c05624d50	Compaction with timestamp: input boundaries (#6645 ) Summary: Towards making compaction logic compatible with user timestamp. When computing boundaries and overlapping ranges for inputs of compaction, We need to compare SSTs by user key without timestamp. Test plan (devserver): ``` make check ``` Several individual tests: ``` ./version_set_test --gtest_filter=VersionStorageInfoTimestampTest.GetOverlappingInputs ./db_with_timestamp_compaction_test ./db_with_timestamp_basic_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6645 Reviewed By: ltamasi Differential Revision: D20960012 Pulled By: riversand963 fbshipit-source-id: ad377fa9eb481bf7a8a3e1824aaade48cdc653a4	2020-04-10 16:05:49 -07:00
Luca Giacchino	66a95f0fac	Provide an allocator for new memory type to be used with RocksDB block cache (#6214 ) Summary: New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM. The new allocator provided in this PR uses the memkind library to allocate memory on different media. Performance We tested the new allocator using db_bench. - For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database). - The database is filled sequentially. Throughput is then measured with a readrandom benchmark. - We use a uniform distribution as a worst-case scenario. The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator. For all tests, p99 latency is below 500 us. ![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png) Changes - Add MemkindKmemAllocator - Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator) - Add detection of memkind library with KMEM DAX support - Add test for MemkindKmemAllocator Minimum Requirements - kernel 5.3.12 - ndctl v67 - https://github.com/pmem/ndctl - memkind v1.10.0 - https://github.com/memkind/memkind Memory Configuration The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly. Note on memory allocation with NVDIMM memory exposed as system memory. - The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind). - The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node. Usage When creating an LRU cache, pass a MemkindKmemAllocator object as argument. For example (replace capacity with the desired value in bytes): ``` #include "rocksdb/cache.h" #include "memory/memkind_kmem_allocator.h" NewLRUCache( capacity /size_t/, 6 /cache_numshardbits/, false /strict_capacity_limit/, false /cache_high_pri_pool_ratio/, std::make_shared<MemkindKmemAllocator>()); ``` Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214 Reviewed By: cheng-chang Differential Revision: D19292435 fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1	2020-04-09 20:47:23 -07:00
Cheng Chang	d648a0e17f	Add unit test for TransactionLockMgr (#6599 ) Summary: Although there are tests related to locking in transaction_test, this new test directly tests against TransactionLockMgr. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6599 Test Plan: make transaction_lock_mgr_test && ./transaction_lock_mgr_test Reviewed By: lth Differential Revision: D20673749 Pulled By: cheng-chang fbshipit-source-id: 1fa4a13218e68d785f5a99924556751a8c5c0f31	2020-04-08 13:51:51 -07:00
Sagar Vemuri	0355d14dd9	Add a simple timer support to schedule work at fixed times/intervals (#6543 ) Summary: Adding a simple timer support to schedule work at a fixed time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6543 Test Plan: TODO: clean up the unit tests, and make them better. Reviewed By: siying Differential Revision: D20465390 Pulled By: sagar0 fbshipit-source-id: cba143f70b6339863e1d0f8b8bf92e51c2b3d678	2020-04-07 11:55:27 -07:00
Peter Dillinger	a67fb4c9bd	Add some timestamps in CI build+test output (#6643 ) Summary: When Travis times out, it's hard to determine whether the last executing thing took an excessively long time or the sum of all the work just exceeded the time limit. This change inserts some timestamps in the output that should make this easier to determine. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6643 Test Plan: CI (Travis mostly) Reviewed By: anand1976 Differential Revision: D20843901 Pulled By: pdillinger fbshipit-source-id: e7aae5434b0c609931feddf238ce4355964488b7	2020-04-04 10:02:07 -07:00
Ziyue Yang	03a781a90c	Add pipelined & parallel compression optimization (#6262 ) Summary: This PR adds support for pipelined & parallel compression optimization for `BlockBasedTableBuilder`. This optimization makes block building, block compression and block appending a pipeline, and uses multiple threads to accelerate block compression. Users can set `CompressionOptions::parallel_threads` greater than 1 to enable compression parallelism. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6262 Reviewed By: ajkr Differential Revision: D20651306 fbshipit-source-id: 62125590a9c15b6d9071def9dc72589c1696a4cb	2020-04-01 16:40:18 -07:00
Yanqin Jin	ccf7676455	Update a few scripts to be python3 compatible (#6525 ) Summary: There are a few scripts with python3 compatibility issues that were not detected by automated tool before. Update them now. Test Plan (devserver): python2 tools/ldb_test.py python3 tools/ldb_test.py python2 tools/write_stress_runner.py --runtime_sec=30 python3 tools/write_stress_runner.py --runtime_sec=30 python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525 Reviewed By: cheng-chang Differential Revision: D20627820 Pulled By: riversand963 fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3	2020-03-24 21:00:27 -07:00
Cheng Chang	4fc216649d	Support direct IO in RandomAccessFileReader::MultiRead (#6446 ) Summary: By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined. In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API. For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446 Test Plan: Added a new test `random_access_file_reader_test.cc`. Reviewed By: anand1976 Differential Revision: D20097518 Pulled By: cheng-chang fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1	2020-03-20 16:33:26 -07:00
Zhichao Cao	5c30e6c088	Separate timestamp related test from db_basic_test (#6516 ) Summary: In some of the test, db_basic_test may cause time out due to its long running time. Separate the timestamp related test from db_basic_test to avoid the potential issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6516 Test Plan: pass make asan_check Differential Revision: D20423922 Pulled By: zhichao-cao fbshipit-source-id: d6306f89a8de55b07bf57233e4554c09ef1fe23a	2020-03-13 11:37:15 -07:00
Adam Retter	0772768d07	Force Java version on Travis CI (#6512 ) Summary: In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11. To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512 Differential Revision: D20388296 Pulled By: pdillinger fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1	2020-03-12 12:24:51 -07:00
Levi Tamasi	c15e85bdcb	Move BlobDB related files under db/ to db/blob/ (#6519 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6519 Test Plan: ``` make all make check ``` Differential Revision: D20400691 Pulled By: ltamasi fbshipit-source-id: 20ef911cf1c2c92c7f71ef0b493f9be64f2eef94	2020-03-12 11:00:56 -07:00
Cheng Chang	2d9efc9ab2	Cache result of GetLogicalBufferSize in Linux (#6457 ) Summary: In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down. This PR introduces two new APIs: 1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes. 2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes. Other modifications: 1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms. 2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`. 3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457 Test Plan: 1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`. 2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`. `make check` Differential Revision: D20131243 Pulled By: cheng-chang fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04	2020-03-11 18:40:05 -07:00
Adam Retter	65b60db9e1	Update to latest Snappy to fix compilation issue on latest MacOS XCode (#6496 ) Summary: * macOS version: 10.15.2 (Catalina) * XCode/Clang version: Apple clang version 11.0.0 (clang-1100.0.33.16) Before this bugfix the error generated is: ``` In file included from ./util/compression.h:23: ./snappy-1.1.7/snappy.h:76:59: error: unknown type name 'string'; did you mean 'std::string'? size_t Compress(const char* input, size_t input_length, string* output); ^~~~~~ std::string /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here typedef basic_string<char, char_traits<char>, allocator<char> > string; ^ In file included from db/builder.cc:10: In file included from ./db/builder.h:12: In file included from ./db/range_tombstone_fragmenter.h:15: In file included from ./db/pinned_iterators_manager.h:12: In file included from ./table/internal_iterator.h:13: In file included from ./table/format.h:25: In file included from ./options/cf_options.h:14: In file included from ./util/compression.h:23: ./snappy-1.1.7/snappy.h:85:19: error: unknown type name 'string'; did you mean 'std::string'? string* uncompressed); ^~~~~~ std::string /Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here typedef basic_string<char, char_traits<char>, allocator<char> > string; ^ 2 errors generated. make: *** [jls/db/builder.o] Error 1 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6496 Differential Revision: D20389254 Pulled By: pdillinger fbshipit-source-id: 2864245c8d0dba7b2ab81294241a62f2adf02e20	2020-03-11 11:46:13 -07:00
Levi Tamasi	f5bc3b99d5	Split BlobFileState into an immutable and a mutable part (#6502 ) Summary: It's never too soon to refactor something. The patch splits the recently introduced (`VersionEdit` related) `BlobFileState` into two classes `BlobFileAddition` and `BlobFileGarbage`. The idea is that once blob files are closed, they are immutable, and the only thing that changes is the amount of garbage in them. In the new design, `BlobFileAddition` contains the immutable attributes (currently, the count and total size of all blobs, checksum method, and checksum value), while `BlobFileGarbage` contains the mutable GC-related information elements (count and total size of garbage blobs). This is a better fit for the GC logic and is more consistent with how SST files are handled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6502 Test Plan: `make check` Differential Revision: D20348352 Pulled By: ltamasi fbshipit-source-id: ff93f0121e80ab15e0e0a6525ba0d6af16a0e008	2020-03-10 17:27:26 -07:00
Zhichao Cao	e62fe50634	Introduce FaultInjectionTestFS to test fault File system instead of Env (#6414 ) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21	2020-03-04 12:35:05 -08:00
Michael R. Crusoe	051696bf98	fix some spelling typos (#6464 ) Summary: Found from Debian's "Lintian" program Pull Request resolved: https://github.com/facebook/rocksdb/pull/6464 Differential Revision: D20162862 Pulled By: zhichao-cao fbshipit-source-id: 06941ee2437b038b2b8045becbe9d2c6fbff3e12	2020-02-28 14:14:03 -08:00
Levi Tamasi	d87c10c6ab	Add blob file state to VersionEdit (#6416 ) Summary: BlobDB currently does not keep track of blob files: no records are written to the manifest when a blob file is added or removed, and upon opening a database, the list of blob files is populated simply based on the contents of the blob directory. This means that lost blob files cannot be detected at the moment. We plan to solve this issue by making blob files a part of `Version`; as a first step, this patch makes it possible to store information about blob files in `VersionEdit`. Currently, this information includes blob file number, total number and size of all blobs, and total number and size of garbage blobs. However, the format is extensible: new fields can be added in both a forward compatible and a forward incompatible manner if needed (similarly to `kNewFile4`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6416 Test Plan: `make check` Differential Revision: D19894234 Pulled By: ltamasi fbshipit-source-id: f9753e1f2aedf6dadb70c09b345207cb9c58c329	2020-02-24 18:39:53 -08:00
Cheng Chang	dafb568052	Add utility class Defer (#6382 ) Summary: Add a utility class `Defer` to defer the execution of a function until the Defer object goes out of scope. Used in VersionSet:: ProcessManifestWrites as an example. The inline comments for class `Defer` have more details. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6382 Test Plan: `make defer_test version_set_test && ./defer_test && ./version_set_test` Differential Revision: D19797538 Pulled By: cheng-chang fbshipit-source-id: b1a9b7306e4fd4f48ec2ab55783caa561a315f0f	2020-02-10 17:59:47 -08:00
Cheng Chang	b42fa1497f	Support move semantics for PinnableSlice (#6374 ) Summary: It's logically correct for PinnableSlice to support move semantics to transfer ownership of the pinned memory region. This PR adds both move constructor and move assignment to PinnableSlice. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6374 Test Plan: A set of unit tests for the move semantics are added in slice_test. So `make slice_test && ./slice_test`. Differential Revision: D19739254 Pulled By: cheng-chang fbshipit-source-id: f898bd811bb05b2d87384ec58b645e9915e8e0b1	2020-02-07 14:26:26 -08:00
Peter Dillinger	90c71aa5d9	Don't download from (unreliable) maven.org (#6348 ) Summary: I set up a mirror of our Java deps on github so we can download them through github URLs rather than maven.org, which is proving terribly unreliable from Travis builds. Also sanitized calls to curl, so they are easier to read and appropriately fail on download failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348 Test Plan: CI Differential Revision: D19633621 Pulled By: pdillinger fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3	2020-01-30 11:02:08 -08:00
Adam Retter	a07a9dc904	Reduce the need to re-download dependencies (#6318 ) Summary: Both changes are related to RocksJava: 1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds. 2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318 Differential Revision: D19608742 Pulled By: pdillinger fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7	2020-01-29 08:01:56 -08:00
Peter Dillinger	95d226d8f5	Fix a clang analyzer report, and 'analyze' make rule (#6244 ) Summary: Clang analyzer was falsely reporting on use of txn=nullptr. Added a new const variable so that it can properly prune impossible control flows. Also, 'make analyze' previously required setting USE_CLANG=1 as an environment variable, not a make variable, or else compilation errors like g++: error: unrecognized command line option ‘-Wshorten-64-to-32’ Now USE_CLANG is not required for 'make analyze' (it's implied) and you can do an incremental analysis (recompile what has changed) with 'USE_CLANG=1 make analyze_incremental' Pull Request resolved: https://github.com/facebook/rocksdb/pull/6244 Test Plan: 'make -j24 analyze', 'make crash_test' Differential Revision: D19225950 Pulled By: pdillinger fbshipit-source-id: 14f4039aa552228826a2de62b2671450e0fed3cb	2019-12-24 18:46:40 -08:00
Yanqin Jin	670a916d01	Add more verification to db_stress (#6173 ) Summary: Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways. 1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests. 2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests. Test plan (devserver) ``` $./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100 $./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0 $make crash_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173 Differential Revision: D19047367 Pulled By: riversand963 fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615	2019-12-20 08:49:29 -08:00
Peter Dillinger	5b18729d7d	Syntax check python files on testing (#6209 ) Summary: Adds a python script to syntax check all python files in the repository and report any errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6209 Test Plan: 'make check' with and without seeded syntax errors. Also look for "No syntax errors in 34 .py files" on success, and in java_test CI output Differential Revision: D19166756 Pulled By: pdillinger fbshipit-source-id: 537df464b767260d66810b4cf4c9808a026c58a4	2019-12-19 08:31:11 -08:00
Peter Dillinger	58d46d1915	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154 ) Summary: And clean up related code, especially in stress test. (More clean up of db_stress_test_base.cc coming after this.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154 Test Plan: make check, make blackbox_crash_test for a bit Differential Revision: D18938180 Pulled By: pdillinger fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e	2019-12-13 14:30:14 -08:00
Maysam Yabandeh	1ad6fa9cc7	Enable txn in crash tests (#6155 ) Summary: Start daily crash tests with use_txn flag. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155 Differential Revision: D18943630 Pulled By: maysamyabandeh fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87	2019-12-11 16:01:55 -08:00
Jermy Li	1dd3194f56	Fix compile error "folly/xx.h file not found" on Mac OS (#6145 ) Summary: Error message when running `make` on Mac OS with master branch (v6.6.0): ``` $ make $DEBUG_LEVEL is 1 Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found #include <folly/synchronization/WaitOptions.h> ^ 1 error generated. third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found #include <folly/synchronization/ParkingLot.h> ^ 1 error generated. third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found #include <folly/synchronization/DistributedMutex.h> ^ 1 error generated. third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found #include <folly/synchronization/AtomicNotification.h> ^ 1 error generated. third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found #include <folly/detail/Futex.h> ^ 1 error generated. GEN util/build_version.cc $DEBUG_LEVEL is 1 Makefile:168: Warning: Compiling in debug mode. Don't use the resulting binary in production third-party/folly/folly/synchronization/WaitOptions.cpp:6:10: fatal error: 'folly/synchronization/WaitOptions.h' file not found #include <folly/synchronization/WaitOptions.h> ^ 1 error generated. third-party/folly/folly/synchronization/ParkingLot.cpp:6:10: fatal error: 'folly/synchronization/ParkingLot.h' file not found #include <folly/synchronization/ParkingLot.h> ^ 1 error generated. third-party/folly/folly/synchronization/DistributedMutex.cpp:6:10: fatal error: 'folly/synchronization/DistributedMutex.h' file not found #include <folly/synchronization/DistributedMutex.h> ^ 1 error generated. third-party/folly/folly/synchronization/AtomicNotification.cpp:6:10: fatal error: 'folly/synchronization/AtomicNotification.h' file not found #include <folly/synchronization/AtomicNotification.h> ^ 1 error generated. third-party/folly/folly/detail/Futex.cpp:6:10: fatal error: 'folly/detail/Futex.h' file not found #include <folly/detail/Futex.h> ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6145 Differential Revision: D18910812 fbshipit-source-id: 5a4475466c2d0601657831a0b48d34316b2f0816	2019-12-10 11:24:11 -08:00
sdong	7d79b32618	Break db_stress_tool.cc to a list of source files (#6134 ) Summary: db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files. Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134 Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1 Differential Revision: D18876064 fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d	2019-12-08 23:51:01 -08:00
Yanqin Jin	4edb4284e7	Make folly-related targets comply with verbosity (#6120 ) Summary: Before this fix, `make all` will emit full compilation command when building object files in the third-party/folly directory even if default verbosity is 0 (AM_DEFAULT_VERBOSITY). Test Plan (devserver): ``` $make all \| tee build.log $make check ``` Check build.log to verify. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6120 Differential Revision: D18795621 Pulled By: riversand963 fbshipit-source-id: 04641a8359cd4fd55034e6e797ed85de29ee2fe2	2019-12-03 16:04:44 -08:00
Patrick Double	8ae149eba1	Add shared library for musl-libc (#3143 ) Summary: Add the jni library for musl-libc, specifically for incorporating into Alpine based docker images. The classifier is `musl64`. I have signed the CLA electronically. Pull Request resolved: https://github.com/facebook/rocksdb/pull/3143 Differential Revision: D18719372 fbshipit-source-id: 6189d149310b6436d6def7d808566b0234b23313	2019-11-26 18:24:09 -08:00
Adam Retter	1bf316e5b6	Fix naming of library on PPC64LE (#6080 ) Summary: NOTE: This also needs to be back-ported to be 6.4.6 Fix a regression introduced in `f2bf0b2` by https://github.com/facebook/rocksdb/pull/5674 whereby the compiled library would get the wrong name on PPC64LE platforms. On PPC64LE, the regression caused the library to be named `librocksdbjni-linux64.so` instead of `librocksdbjni-linux-ppc64le.so`. This PR corrects the name back to `librocksdbjni-linux-ppc64le.so` and also corrects the ordering of conditional arguments in the Makefile to match the expected order as defined in the documentation for Make. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6080 Differential Revision: D18710351 fbshipit-source-id: d4db87ef378263b57de7f9edce1b7d15644cf9de	2019-11-26 13:28:32 -08:00
Adam Retter	7f14519577	Small improvements to Docker build for RocksJava (#6079 ) Summary: * We can reuse downloaded 3rd-party libraries * We can isolate the build to a Docker volume. This is useful for investigating failed builds, as we can examine the volume by assigning it a name during the build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6079 Differential Revision: D18710263 fbshipit-source-id: 93f456ba44b49e48941c43b0c4d53995ecc1f404	2019-11-26 13:28:31 -08:00
Adam Retter	382b154be6	Update 3rd-party libraries used by RocksJava (#6084 ) Summary: * LZ4 1.8.3 -> 1.9.2 * ZSTD 1.4.0 -> 1.4.4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6084 Differential Revision: D18710224 fbshipit-source-id: a461ef19a473d3480acdc027f627ec3048730692	2019-11-26 13:28:31 -08:00
Peter Dillinger	00d58a370e	Abandon use of folly::Optional (#6036 ) Summary: Had complications with LITE build and valgrind test. Reverts/fixes small parts of PR https://github.com/facebook/rocksdb/issues/6007 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6036 Test Plan: make LITE=1 all check and ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 make -j24 db_bloom_filter_test && ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 ./db_bloom_filter_test Differential Revision: D18512238 Pulled By: pdillinger fbshipit-source-id: 37213cf0d309edf11c483fb4b2fb6c02c2cf2b28	2019-11-14 14:04:15 -08:00
Peter Dillinger	f059c7d9b9	New Bloom filter implementation for full and partitioned filters (#6007 ) Summary: Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter. Speed The improved speed, at least on recent x86_64, comes from * Using fastrange instead of modulo (%) * Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only slightly slower on keys around 12 bytes if hashing the same size many thousands of times in a row. * Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc. * Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes. Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed): $ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter Build avg ns/key: 47.7135 Mixed inside/outside queries... Single filter net ns/op: 26.2825 Random filter net ns/op: 150.459 Average FP rate %: 0.954651 $ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter Build avg ns/key: 47.2245 Mixed inside/outside queries... Single filter net ns/op: 63.2978 Random filter net ns/op: 188.038 Average FP rate %: 1.13823 Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected. The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome. Accuracy The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments. Accuracy data (generalizes, except old impl gets worse with millions of keys): Memory bits per key: FP rate percent old impl -> FP rate percent new impl 6: 5.70953 -> 5.69888 8: 2.45766 -> 2.29709 10: 1.13977 -> 0.959254 12: 0.662498 -> 0.411593 16: 0.353023 -> 0.0873754 24: 0.261552 -> 0.0060971 50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP) Fixes https://github.com/facebook/rocksdb/issues/5857 Fixes https://github.com/facebook/rocksdb/issues/4120 Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized. Compatibility Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007 Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version). Differential Revision: D18294749 Pulled By: pdillinger fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f	2019-11-13 16:44:01 -08:00
Yun Tang	07a0ad3c29	Download bzip2 packages from sourceforge (#5995 ) Summary: From bzip2's official [download page](http://www.bzip.org/downloads.html), we could download it from sourceforge. This source would be more credible than previous web archive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5995 Differential Revision: D18377662 fbshipit-source-id: e8353f83d5d6ea6067f78208b7bfb7f0d5b49c05	2019-11-07 12:51:06 -08:00
Yanqin Jin	925250f42f	Include db_stress_tool in rocksdb tools lib (#5950 ) Summary: include db_stress_tool in rocksdb tools lib Test Plan (on devserver): ``` $make db_stress $./db_stress $make all && make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950 Differential Revision: D18044399 Pulled By: riversand963 fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1	2019-10-21 19:40:35 -07:00
Yanqin Jin	e60cc0925c	Expose db stress tests (#5937 ) Summary: expose db stress test by providing db_stress_tool.h in public header. This PR does the following: - adds a new header, db_stress_tool.h, in include/rocksdb/ - renames db_stress.cc to db_stress_tool.cc - adds a db_stress.cc which simply invokes a test function. - update Makefile accordingly. Test Plan (dev server): ``` make db_stress ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937 Differential Revision: D17997647 Pulled By: riversand963 fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410	2019-10-18 09:46:44 -07:00
Peter Dillinger	46ca51d430	filter_bench - a prelim tool for SST filter benchmarking (#5825 ) Summary: Example: using the tool before and after PR https://github.com/facebook/rocksdb/issues/5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): Before: - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 After: + Dry run ns/op: 22.2301 + Single filter run ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5825 Differential Revision: D17804987 Pulled By: pdillinger fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5	2019-10-07 20:10:53 -07:00
Yanqin Jin	a9c5e8e944	Refactor deletefile_test.cc (#5822 ) Summary: Make DeleteFileTest inherit DBTestBase to avoid code duplication. Test Plan (on devserver) ``` $make deletefile_test $./deletefile_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5822 Differential Revision: D17456750 Pulled By: riversand963 fbshipit-source-id: 224e97967da7b98838a98981cd5095d3230a814f	2019-09-18 16:58:21 -07:00
Yanqin Jin	6a279037cf	Refactor ObsoleteFilesTest to inherit from DBTestBase (#5820 ) Summary: Make class ObsoleteFilesTest inherit from DBTestBase. Test plan (on devserver): ``` $COMPILE_WITH_ASAN=1 make obsolete_files_test $./obsolete_files_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5820 Differential Revision: D17452348 Pulled By: riversand963 fbshipit-source-id: b09f4581a18022ca2bfd79f2836c0bf7083f5f25	2019-09-18 11:52:17 -07:00
Peter Dillinger	68626249c3	Refactor/consolidate legacy Bloom implementation details (#5784 ) Summary: Refactoring to consolidate implementation details of legacy Bloom filters. This helps to organize and document some related, obscure code. Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to compile and run unit tests for non-native cache line size. (Fixed a related test failure in db_properties_test.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784 Test Plan: make check, including Recently added Bloom schema unit tests (in ./plain_table_db_test && ./bloom_test), and including with TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the schema tests with temporary fault injection into new implementations. Some performance testing with modified unit tests suggest a small to moderate improvement in speed. Differential Revision: D17381384 Pulled By: pdillinger fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df	2019-09-16 16:17:09 -07:00
Peter Dillinger	d3a6726f02	Revert changes from PR#5784 accidentally in PR#5780 (#5810 ) Summary: This will allow us to fix history by having the code changes for PR#5784 properly attributed to it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810 Differential Revision: D17400231 Pulled By: pdillinger fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09	2019-09-16 11:38:53 -07:00
Peter Dillinger	aa2486b23c	Refactor some confusing logic in PlainTableReader Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5780 Test Plan: existing plain table unit test Differential Revision: D17368629 Pulled By: pdillinger fbshipit-source-id: f25409cdc2f39ebe8d5cbb599cf820270e6b5d26	2019-09-13 10:26:36 -07:00
Wilfried Goesgens	fbab9913e2	upgrade gtest 1.7.0 => 1.8.1 for json result writing Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332 Differential Revision: D17242232 fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64	2019-09-09 11:24:11 -07:00
ENDOH takanao	3f2723a81b	fix checking the '-march' flag (#5766 ) Summary: Hi! guys, I got errors on the ARM machine. before: ```console $ make static_lib ... g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto' g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native ``` Thanks! Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766 Differential Revision: D17191117 fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868	2019-09-04 14:34:28 -07:00
sdong	d8a27d9331	Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729 ) Summary: AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same. Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729 Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed. Differential Revision: D16969791 fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163	2019-08-22 16:32:55 -07:00
sdong	e1c468d16f	Do readahead in VerifyChecksum() (#5713 ) Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413	2019-08-16 16:42:56 -07:00
Adam Retter	f2bf0b2d1e	Fixes for building RocksJava releases on arm64v8 Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5674 Differential Revision: D16870338 fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7	2019-08-16 16:27:50 -07:00
Aaryaman Sagar	77273d4137	Fix TSAN failures in DistributedMutex tests (#5684 ) Summary: TSAN was not able to correctly instrument atomic bts and btr instructions, so when TSAN is enabled implement those with std::atomic::fetch_or and std::atomic::fetch_and. Also disable tests that fail on TSAN with false negatives (we know these are false negatives because this other verifiably correct program fails with the same TSAN error <link>) ``` make clean TEST_TMPDIR=/dev/shm/rocksdb OPT=-g COMPILE_WITH_TSAN=1 make J=1 -j56 folly_synchronization_distributed_mutex_test ``` This is the code that fails with the same false-negative with TSAN ``` namespace { class ExceptionWithConstructionTrack : public std::exception { public: explicit ExceptionWithConstructionTrack(int id) : id_{folly::to<std::string>(id)}, constructionTrack_{id} {} const char* what() const noexcept override { return id_.c_str(); } private: std::string id_; TestConstruction constructionTrack_; }; template <typename Storage, typename Atomic> void transferCurrentException(Storage& storage, Atomic& produced) { assert(std::current_exception()); new (&storage) std::exception_ptr(std::current_exception()); produced->store(true, std::memory_order_release); } void concurrentExceptionPropagationStress( int numThreads, std::chrono::milliseconds milliseconds) { auto&& stop = std::atomic<bool>{false}; auto&& exceptions = std::vector<std::aligned_storage<48, 8>::type>{}; auto&& produced = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumed = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumers = std::vector<std::thread>{}; for (auto i = 0; i < numThreads; ++i) { produced.emplace_back(new std::atomic<bool>{false}); consumed.emplace_back(new std::atomic<bool>{false}); exceptions.push_back({}); } auto producer = std::thread{[&]() { auto counter = std::vector<int>(numThreads, 0); for (auto i = 0; true; i = ((i + 1) % numThreads)) { try { throw ExceptionWithConstructionTrack{counter.at(i)++}; } catch (...) { transferCurrentException(exceptions.at(i), produced.at(i)); } while (!consumed.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } consumed.at(i)->store(false, std::memory_order_release); } }}; for (auto i = 0; i < numThreads; ++i) { consumers.emplace_back([&, i]() { auto counter = 0; while (true) { while (!produced.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } produced.at(i)->store(false, std::memory_order_release); try { auto storage = &exceptions.at(i); auto exc = folly::launder( reinterpret_cast<std::exception_ptr>(storage)); auto copy = std::move(exc); exc->std::exception_ptr::~exception_ptr(); std::rethrow_exception(std::move(copy)); } catch (std::exception& exc) { auto value = std::stoi(exc.what()); EXPECT_EQ(value, counter++); } consumed.at(i)->store(true, std::memory_order_release); } }); } std::this_thread::sleep_for(milliseconds); stop.store(true); producer.join(); for (auto& thread : consumers) { thread.join(); } } } // namespace ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5684 Differential Revision: D16746077 Pulled By: miasantreble fbshipit-source-id: 8af88dcf9161c05daec1a76290f577918638f79d	2019-08-14 17:01:31 -07:00
Aaryaman Sagar	38b03c840e	Port folly/synchronization/DistributedMutex to rocksdb (#5642 ) Summary: This ports `folly::DistributedMutex` into RocksDB. The PR includes everything else needed to compile and use DistributedMutex as a component within folly. Most files are unchanged except for some portability stuff and includes. For now, I've put this under `rocksdb/third-party`, but if there is a better folder to put this under, let me know. I also am not sure how or where to put unit tests for third-party stuff like this. It seems like gtest is included already, but I need to link with it from another third-party folder. This also includes some other common components from folly - folly/Optional - folly/ScopeGuard (In particular `SCOPE_EXIT`) - folly/synchronization/ParkingLot (A portable futex-like interface) - folly/synchronization/AtomicNotification (The standard C++ interface for futexes) - folly/Indestructible (For singletons that don't get destroyed without allocations) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5642 Differential Revision: D16544439 fbshipit-source-id: 179b98b5dcddc3075926d31a30f92fd064245731	2019-08-07 14:34:19 -07:00
Vijay Nadimpalli	d150e01474	New API to get all merge operands for a Key (#5604 ) Summary: This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases: 1. Update subset of columns and read subset of columns - Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU. 2. Updating very few attributes in a value which is a JSON-like document - Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge. ---------------------------------------------------------------------------------------------------- API : Status GetMergeOperands( const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* merge_operands, GetMergeOperandsOptions* get_merge_operands_options, int* number_of_operands) Example usage : int size = 100; int number_of_operands = 0; std::vector<PinnableSlice> values(size); GetMergeOperandsOptions merge_operands_info; db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands); Description : Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion. merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604 Test Plan: Added unit test and perf test in db_bench that can be run using the command: ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist Differential Revision: D16657366 Pulled By: vjnadimpalli fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf	2019-08-06 14:26:44 -07:00
Yanqin Jin	b1a02ffeab	Fix make target 'all' and 'check' (#5672 ) Summary: If a test is one of parallel tests, then it should also be one of the 'tests'. Otherwise, `make all` won't build the binaries. For examle, ``` $COMPILE_WITH_ASAN=1 make -j32 all ``` Then if you do ``` $make check ``` The second command will invoke the compilation and building for db_bloom_test and file_reader_writer_test without the `COMPILE_WITH_ASAN=1`, causing the command to fail. Test plan (on devserver): ``` $make -j32 all ``` Verify all binaries are built so that `make check` won't have to compile any thing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5672 Differential Revision: D16655834 Pulled By: riversand963 fbshipit-source-id: 050131412b5313496f85ae3deeeeb8d28af75746	2019-08-05 15:45:56 -07:00
haoyuhuang	70c7302fb5	Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610 ) Summary: This PR implements cache eviction using reinforcement learning. It includes two implementations: 1. An implementation of Thompson Sampling for the Bernoulli Bandit [1]. 2. An implementation of LinUCB with disjoint linear models [2]. The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss. Thompson Sampling is contextless and does not include any features. LinUCB includes features such as level, block type, caller, column family id to decide which eviction policy to use. [1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070 [2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610 Differential Revision: D16435067 Pulled By: HaoyuHuang fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb	2019-07-26 14:41:13 -07:00
Levi Tamasi	3617287e0e	Parallelize db_bloom_filter_test (#5632 ) Summary: This test frequently times out under TSAN; parallelizing it should fix this issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5632 Test Plan: make check buck test mode/dev-tsan internal_repo_rocksdb/repo:db_bloom_filter_test Differential Revision: D16519399 Pulled By: ltamasi fbshipit-source-id: 66e05a644d6f79c6d544255ffcf6de195d2d62fe	2019-07-26 11:48:17 -07:00
Yanqin Jin	74782cec32	Fix target 'clean' to include parallel test binaries (#5629 ) Summary: current `clean` target in Makefile does not remove parallel test binaries. Fix this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5629 Test Plan: (on devserver) Take file_reader_writer_test for instance. ``` $make -j32 file_reader_writer_test $make clean ``` Verify that binary file 'file_reader_writer_test' is delete by `make clean`. Differential Revision: D16513176 Pulled By: riversand963 fbshipit-source-id: 70acb9f56c928a494964121b86aacc0090f31ff6	2019-07-26 09:56:09 -07:00
anand76	112702ac6c	Parallelize file_reader_writer_test in order to reduce timeouts Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5608 Test Plan: make check buck test mode/dev-tsan internal_repo_rocksdb/repo:file_reader_writer_test -- --run-disabled Differential Revision: D16441796 Pulled By: anand1976 fbshipit-source-id: afbb88a9fcb1c0ba22215118767e8eab3d1d6a4a	2019-07-23 11:50:10 -07:00
Venki Pallipadi	22ce462450	Export Import sst files (#5495 ) Summary: Refresh of the earlier change here - https://github.com/facebook/rocksdb/issues/5135 This is a review request for code change needed for - https://github.com/facebook/rocksdb/issues/3469 "Add support for taking snapshot of a column family and creating column family from a given CF snapshot" We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality. (1) ExportColumnFamily() - This API is modelled after CreateCheckpoint() as below. // Exports all live SST files of a specified Column Family onto export_dir, // returning SST files information in metadata. // - SST files will be created as hard links when the directory specified // is in the same partition as the db directory, copied otherwise. // - export_dir should not already exist and will be created by this API. // - Always triggers a flush. virtual Status ExportColumnFamily(ColumnFamilyHandle* handle, const std::string& export_dir, ExportImportFilesMetaData metadata); Internally, the API will DisableFileDeletions(), GetColumnFamilyMetaData(), Parse through metadata, creating links/copies of all the sst files, EnableFileDeletions() and complete the call by returning the list of file metadata. (2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during a CF creation as below. // CreateColumnFamilyWithImport() will create a new column family with // column_family_name and import external SST files specified in metadata into // this column family. // (1) External SST files can be created using SstFileWriter. // (2) External SST files can be exported from a particular column family in // an existing DB. // Option in import_options specifies whether the external files are copied or // moved (default is copy). When option specifies copy, managing files at // external_file_path is caller's responsibility. When option specifies a // move, the call ensures that the specified files at external_file_path are // deleted on successful return and files are not modified on any error // return. // On error return, column family handle returned will be nullptr. // ColumnFamily will be present on successful return and will not be present // on error return. ColumnFamily may be present on any crash during this call. virtual Status CreateColumnFamilyWithImport( const ColumnFamilyOptions& options, const std::string& column_family_name, const ImportColumnFamilyOptions& import_options, const ExportImportFilesMetaData& metadata, ColumnFamilyHandle handle); Internally, this API creates a new CF, parses all the sst files and adds it to the specified column family, at the same level and with same sequence number as in the metadata. Also performs safety checks with respect to overlaps between the sst files being imported. If incoming sequence number is higher than current local sequence number, local sequence number is updated to reflect this. Note, as the sst files is are being moved across Column Families, Column Family name in sst file will no longer match the actual column family on destination DB. The API does not modify Column Family name or id in the sst files being imported. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5495 Differential Revision: D16018881 fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9	2019-07-17 12:27:14 -07:00
Yuqi Gu	a3c1832e86	Arm64 CRC32 parallel computation optimization for RocksDB (#5494 ) Summary: Crc32c Parallel computation optimization: Algorithm comes from Intel whitepaper: [crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf) Input data is divided into three equal-sized blocks Three parallel blocks (crc0, crc1, crc2) for 1024 Bytes One Block: 42(BLK_LENGTH) * 8(step length: crc32c_u64) bytes 1. crc32c_test: ``` [==========] Running 4 tests from 1 test case. [----------] Global test environment set-up. [----------] 4 tests from CRC [ RUN ] CRC.StandardResults [ OK ] CRC.StandardResults (1 ms) [ RUN ] CRC.Values [ OK ] CRC.Values (0 ms) [ RUN ] CRC.Extend [ OK ] CRC.Extend (0 ms) [ RUN ] CRC.Mask [ OK ] CRC.Mask (0 ms) [----------] 4 tests from CRC (1 ms total) [----------] Global test environment tear-down [==========] 4 tests from 1 test case ran. (1 ms total) [ PASSED ] 4 tests. ``` 2. RocksDB benchmark: db_bench --benchmarks="crc32c" ``` Linear Arm crc32c: crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op) ``` ``` Parallel optimization with Armv8 crypto extension: crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op) ``` It gets ~2.4x speedup compared to linear Arm crc32c instructions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494 Differential Revision: D16340806 fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945	2019-07-17 11:22:38 -07:00
haoyuhuang	1a59b6e2a9	Cache simulator: Add a ghost cache for admission control and a hybrid row-block cache. (#5534 ) Summary: This PR adds a ghost cache for admission control. Specifically, it admits an entry on its second access. It also adds a hybrid row-block cache that caches the referenced key-value pairs of a Get/MultiGet request instead of its blocks. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5534 Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32 Differential Revision: D16101124 Pulled By: HaoyuHuang fbshipit-source-id: b99edda6418a888e94eb40f71ece45d375e234b1	2019-07-11 12:43:29 -07:00
ggaurav28	60d8b19836	Implemented a file logger that uses WritableFileWriter (#5491 ) Summary: Current PosixLogger performs IO operations using posix calls. Thus the current implementation will not work for non-posix env. Created a new logger class EnvLogger that uses env specific WritableFileWriter for IO operations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5491 Test Plan: make check Differential Revision: D15909002 Pulled By: ggaurav28 fbshipit-source-id: 13a8105176e8e42db0c59798d48cb6a0dbccc965	2019-07-09 16:27:22 -07:00
Adam Retter	5dc9fbd117	Update the version of ZStd for the Rocks Java static build Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5228 Differential Revision: D15880451 Pulled By: sagar0 fbshipit-source-id: 84da6f42cac15367d95bffa5336ebd002e7c3308	2019-06-18 11:57:01 -07:00
haoyuhuang	2d1dd5bce7	Support computing miss ratio curves using sim_cache. (#5449 ) Summary: This PR adds a BlockCacheTraceSimulator that reports the miss ratios given different cache configurations. A cache configuration contains "cache_name,num_shard_bits,cache_capacities". For example, "lru, 1, 1K, 2K, 4M, 4G". When we replay the trace, we also perform lookups and inserts on the simulated caches. In the end, it reports the miss ratio for each tuple <cache_name, num_shard_bits, cache_capacity> in a output file. This PR also adds a main source block_cache_trace_analyzer so that we can run the analyzer in command line. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5449 Test Plan: Added tests for block_cache_trace_analyzer. COMPILE_WITH_ASAN=1 make check -j32. Differential Revision: D15797073 Pulled By: HaoyuHuang fbshipit-source-id: aef0c5c2e7938f3e8b6a10d4a6a50e6928ecf408	2019-06-17 16:41:12 -07:00
Zhongyi Xie	671d15cbdd	Persistent Stats: persist stats history to disk (#5046 ) Summary: This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046 Differential Revision: D15863138 Pulled By: miasantreble fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b	2019-06-17 15:21:50 -07:00
Patrick Zhang	5c76ba9dc4	Support rocksdbjava aarch64 build and test (#5258 ) Summary: Verified with an Ampere Computing eMAG aarch64 system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5258 Differential Revision: D15807309 Pulled By: maysamyabandeh fbshipit-source-id: ab85d2fd3fe40e6094430ab0eba557b1e979510d	2019-06-13 11:48:10 -07:00
haoyuhuang	9bbccda01e	First commit for block cache trace analyzer (#5425 ) Summary: This PR contains the first commit for block cache trace analyzer. It reads a block cache trace file and prints statistics of the traces. We will extend this class to provide more functionalities. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5425 Differential Revision: D15709580 Pulled By: HaoyuHuang fbshipit-source-id: 2f43bd2311f460ab569880819d95eeae217c20bb	2019-06-11 12:22:44 -07:00
haoyuhuang	aa71718ac3	Add block cache tracer. (#5410 ) Summary: This PR adds a help class block cache tracer to read/write block cache accesses. It uses the trace reader/writer to perform this task. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5410 Differential Revision: D15612843 Pulled By: HaoyuHuang fbshipit-source-id: f30fd1e1524355ca87db5d533a5c086728b141ea	2019-06-06 11:24:39 -07:00
Siying Dong	000b9ec217	Move some logging related files to logging/ (#5387 ) Summary: Many logging related source files are under util/. It will be more structured if they are together. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387 Differential Revision: D15579036 Pulled By: siying fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f	2019-05-31 17:23:59 -07:00
Vijay Nadimpalli	49c5a12dbe	Organizing rocksdb/db directory Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390 Differential Revision: D15579388 Pulled By: vjnadimpalli fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366	2019-05-31 11:57:01 -07:00
Siying Dong	8843129ece	Move some memory related files from util/ to memory/ (#5382 ) Summary: Move arena, allocator, and memory tools under util to a separate memory/ directory. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382 Differential Revision: D15564655 Pulled By: siying fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5	2019-05-30 17:44:09 -07:00
Vijay Nadimpalli	50e470791d	Organizing rocksdb/table directory by format Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373 Differential Revision: D15559425 Pulled By: vjnadimpalli fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55	2019-05-30 14:51:11 -07:00
Siying Dong	e9e0101ca4	Move test related files under util/ to test_util/ (#5377 ) Summary: There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo ve them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c	2019-05-30 11:25:51 -07:00
Siying Dong	545d206040	Move some file related files outside util/ (#5375 ) Summary: util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375 Differential Revision: D15550935 Pulled By: siying fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2	2019-05-29 20:47:06 -07:00
Adam Retter	5882e847aa	Allow builds of RocksJava debug releases (#5274 ) Summary: This allows debug releases of RocksJava to be build with the Docker release targets. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5274 Differential Revision: D15185067 Pulled By: sagar0 fbshipit-source-id: f3988e472f281f5844d9a07098344a827b1e7eb1	2019-05-02 14:27:20 -07:00
Yuqi Gu	03c7ae24c2	RocksDB CRC32c optimization with ARMv8 Intrinsic (#5221 ) Summary: 1. Add Arm linear crc32c implemtation for RocksDB. 2. Arm runtime check for crc32 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5221 Differential Revision: D15013685 Pulled By: siying fbshipit-source-id: 2c2983743d26656d93f212dc7c1a3cf66a1acf12	2019-04-30 10:59:05 -07:00
Yanqin Jin	9358178edc	Support for single-primary, multi-secondary instances (#4899 ) Summary: This PR allows RocksDB to run in single-primary, multi-secondary process mode. The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary. Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`. This PR has several components: 1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary. 2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue. 3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`. 3.1 Tail the primary's MANIFEST during recovery. 3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`. 3.3 Tailing WAL will be in a future PR. 4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899 Differential Revision: D14510945 Pulled By: riversand963 fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886	2019-03-26 16:45:31 -07:00
Adam Retter	bb474e9a02	Add missing functionality to RocksJava (#4833 ) Summary: This is my latest round of changes to add missing items to RocksJava. More to come in future PRs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4833 Differential Revision: D14152266 Pulled By: sagar0 fbshipit-source-id: d6cff67e26da06c131491b5cf6911a8cd0db0775	2019-02-22 14:46:46 -08:00
Yanqin Jin	7d23210226	Separate crash test with atomic flush (#4945 ) Summary: Currently crash test covers cases with and without atomic flush, but takes too long to finish. Therefore it may be a better idea to put crash test with atomic flush in a separate set of tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4945 Differential Revision: D13947548 Pulled By: riversand963 fbshipit-source-id: 177c6de865290fd650b0103408339eaa3f801d8c	2019-02-19 14:08:39 -08:00
Yanqin Jin	5af9446ee6	Remove Lua compaction filter from RocksDB main repo (#4971 ) Summary: as title. For people who continue to need Lua compaction filter, you can copy the include/rocksdb/utilities/rocks_lua/lua_compaction_filter.h and utilities/lua/rocks_lua_compaction_filter.cc to your own codebase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4971 Differential Revision: D14047468 Pulled By: riversand963 fbshipit-source-id: 9ad1a6484a7c94e478f1e108127a3184e4069f70	2019-02-13 12:42:44 -08:00
Yanqin Jin	95604d13e9	Change the command to invoke parallel tests (#4922 ) Summary: We used to call `printf $(t_run)` and later feed the result to GNU parallel in the recipe of target `check_0`. However, this approach is problematic when the length of $(t_run) exceeds the maximum length of a command and the `printf` command cannot be executed. Instead we use 'find -print' to avoid generating an overly long command. This PR is actually the last commit of #4916. Prefer to merge this PR separately. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4922 Differential Revision: D13845883 Pulled By: riversand963 fbshipit-source-id: b56de7f7af43337c6ec89b931de843c9667cb679	2019-01-28 15:02:26 -08:00
Yanqin Jin	e1de88c8c7	Escape '.' by adding a '\' to avoid matching any char Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4912 Differential Revision: D13789449 Pulled By: riversand963 fbshipit-source-id: 0639dae82049b7ac977c8f81851f1c9fdc346705	2019-01-24 11:25:27 -08:00
Remington Brasga	1eded07f00	Bug in Regular Expression in Makefile (#4682 ) Summary: False-negative about path not existing. The regex is ignoring the "." in front of a path. Example: "./path/to/file" Pull Request resolved: https://github.com/facebook/rocksdb/pull/4682 Differential Revision: D13777110 Pulled By: sagar0 fbshipit-source-id: 9f8173b7581407555fdc055580732aeab37d4ade	2019-01-23 10:24:10 -08:00
Varadharajan	349c7cceff	Fix downloaded filename of snappy (#4870 ) Summary: Build failing due to incorrect filename. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4870 Differential Revision: D13637205 Pulled By: sagar0 fbshipit-source-id: 72da45d51b49bce32f696532ba0656ee0dc2b89f	2019-01-11 10:29:40 -08:00
Siying Dong	1fb2e274c5	Remove some components (#4101 ) Summary: Remove some components that we never heard people using them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4101 Differential Revision: D8825431 Pulled By: siying fbshipit-source-id: 97a12ad3cad4ab12c82741a5ba49669aaa854180	2019-01-10 13:30:09 -08:00
Jakub Tomanik	71a69d9b68	Fix building RocksDB for iOS (#4687 ) Summary: This PR contains the following fixes: 1. Fixing Makefile to support non-default locations of developer tools 2. Fixing compile error using a patch from https://github.com/facebook/rocksdb/pull/4007 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4687 Differential Revision: D13287263 Pulled By: riversand963 fbshipit-source-id: 4525eb42ba7b6f82af5f9bfb8e52fa4024e27ccc	2018-12-19 14:13:55 -08:00
Adam Retter	257b458121	Update the version of the dependencies used by the RocksJava static build (#4761 ) Summary: Note that Snappy now requires CMake to build it, so I added a note about RocksJava to the README.md file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4761 Differential Revision: D13403811 Pulled By: ajkr fbshipit-source-id: 8fcd7e3dc7f7152080364a374d3065472f417eff	2018-12-18 20:25:43 -08:00
Abhishek Madan	81b6b09f6b	Remove v1 RangeDelAggregator (#4778 ) Summary: Now that v2 is fully functional, the v1 aggregator is removed. The v2 aggregator has been renamed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4778 Differential Revision: D13495930 Pulled By: abhimadan fbshipit-source-id: 9d69500a60a283e79b6c4fa938fc68a8aa4d40d6	2018-12-17 17:33:46 -08:00
Adam Retter	84001cfa96	Cache dependencies for static build of RocksJava (#4769 ) Summary: Avoids re-downloading the .tar.gz files for the static build of RocksJava if they are already present. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4769 Differential Revision: D13491919 Pulled By: ajkr fbshipit-source-id: 9265f577e049838dc40335d54f1ff2b4f972c38c	2018-12-17 13:32:24 -08:00
Huachao Huang	5e72bc113a	Add SstFileReader to read sst files (#4717 ) Summary: A user friendly sst file reader is useful when we want to access sst files outside of RocksDB. For example, we can generate an sst file with SstFileWriter and send it to other places, then use SstFileReader to read the file and process the entries in other ways. Also rename the original SstFileReader to SstFileDumper because of name conflict, and seems SstFileDumper is more appropriate for tools. TODO: there is only a very simple test now, because I want to get some feedback first. If the changes look good, I will add more tests soon. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717 Differential Revision: D13212686 Pulled By: ajkr fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56	2018-11-27 13:02:23 -08:00
Abhishek Madan	457f77b9ff	Introduce RangeDelAggregatorV2 (#4649 ) Summary: The old RangeDelAggregator did expensive pre-processing work to create a collapsed, binary-searchable representation of range tombstones. With FragmentedRangeTombstoneIterator, much of this work is now unnecessary. RangeDelAggregatorV2 takes advantage of this by seeking in each iterator to find a covering tombstone in ShouldDelete, while doing minimal work in AddTombstones. The old RangeDelAggregator is still used during flush/compaction for now, though RangeDelAggregatorV2 will support those uses in a future PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4649 Differential Revision: D13146964 Pulled By: abhimadan fbshipit-source-id: be29a4c020fc440500c137216fcc1cf529571eb3	2018-11-21 10:56:45 -08:00
Sagar Vemuri	dd742e2416	Automatically set LITE=1 on passing OPT="-DROCKSDB_LITE" (#4671 ) Summary: In #4652 we are setting -Os for lite builds only when LITE=1 is specified. But currently almost all the users invoke lite build via OPT="-DROCKSDB_LITE=1". So this diff tries to set LITE=1 when users already pass in -DROCKSDB_LITE=1 via the command line. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4671 Differential Revision: D13033801 Pulled By: sagar0 fbshipit-source-id: e7b506cee574f9e3f42221ee6647915011c78d78	2018-11-12 16:58:54 -08:00
Yi Wu	7a2f98a0fc	Fix liblua link error when building shared lib under fbcode (#4651 ) Summary: When running `make shared_lib` under fbcode, there's liblua link error: https://gist.github.com/yiwu-arbug/b796bff6b3d46d90c1ed878d983de50d This is because we link liblua.a when building shared lib. If we want to link with liblua, we need to link with liblua_pic.a instead. Fixing by simply not link with lua. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4651 Differential Revision: D12964798 Pulled By: yiwu-arbug fbshipit-source-id: 18d6cee94afe20748068822b76e29ef255cdb04d	2018-11-09 14:13:40 -08:00
Yi Wu	8c2a48742a	Use -Os for lite release build (#4652 ) Summary: Set `-Os` for lite release build to minimize binary size. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4652 Differential Revision: D12965427 Pulled By: yiwu-arbug fbshipit-source-id: c8b647642c24b3e5df6a2cd13112e452a08e8398	2018-11-07 22:10:28 -08:00
Yanqin Jin	912bbbbc72	Enable crash-recovery stress test for atomic flush (#4605 ) Summary: This PR adds test of atomic flush to our continuous stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605 Differential Revision: D12840607 Pulled By: riversand963 fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9	2018-10-30 14:03:36 -07:00
Abhishek Madan	8c78348c77	Use only "local" range tombstones during Get (#4449 ) Summary: Previously, range tombstones were accumulated from every level, which was necessary if a range tombstone in a higher level covered a key in a lower level. However, RangeDelAggregator::AddTombstones's complexity is based on the number of tombstones that are currently stored in it, which is wasteful in the Get case, where we only need to know the highest sequence number of range tombstones that cover the key from higher levels, and compute the highest covering sequence number at the current level. This change introduces this optimization, and removes the use of RangeDelAggregator from the Get path. In the benchmark results, the following command was used to initialize the database: ``` ./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8 ``` ...and the following command was used to measure read throughput: ``` ./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32 ``` The filluniquerandom command was only run once, and the resulting database was used to measure read performance before and after the PR. Both binaries were compiled with `DEBUG_LEVEL=0`. Readrandom results before PR: ``` readrandom : 4.544 micros/op 220090 ops/sec; 16.9 MB/s (63103 of 100000 found) ``` Readrandom results after PR: ``` readrandom : 11.147 micros/op 89707 ops/sec; 6.9 MB/s (63103 of 100000 found) ``` So it's actually slower right now, but this PR paves the way for future optimizations (see #4493). ---- Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449 Differential Revision: D10370575 Pulled By: abhimadan fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d	2018-10-24 12:31:12 -07:00
Yi Wu	742302a1a3	Fix compile error with aligned-new (#4576 ) Summary: In fbcode when we build with clang7++, although -faligned-new is available in compile phase, we link with an older version of libstdc++.a and it doesn't come with aligned-new support (e.g. `nm libstdc++.a \| grep align_val_t` return empty). In this case the previous -faligned-new detection can pass but will end up with link error. Fixing it by only have the detection for non-fbcode build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4576 Differential Revision: D10500008 Pulled By: yiwu-arbug fbshipit-source-id: b375de4fbb61d2a08e54ab709441aa8e7b4b08cf	2018-10-23 10:55:41 -07:00
Yi Wu	d6f2ecf49c	Utility to run task periodically in a thread (#4423 ) Summary: Introduce `RepeatableThread` utility to run task periodically in a separate thread. It is basically the same as the the same class in fbcode, and in addition provide a helper method to let tests mock time and trigger execution one at a time. We can use this class to replace `TimerQueue` in #4382 and `BlobDB`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4423 Differential Revision: D10020932 Pulled By: yiwu-arbug fbshipit-source-id: 3616bef108c39a33c92eedb1256de424b7c04087	2018-09-27 15:28:00 -07:00
Abhishek Madan	1626f6ab6b	Add RangeDelAggregator microbenchmarks (#4363 ) Summary: To measure the results of upcoming DeleteRange v2 work, this commit adds simple benchmarks for RangeDelAggregator. It measures the average time for AddTombstones and ShouldDelete calls. Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results: Before #4014: ``` ======================= Results: ======================= AddTombstones: 1356.28 us ShouldDelete: 0.401732 us ``` Latest master: ``` ======================= Results: ======================= AddTombstones: 740.82 us ShouldDelete: 0.383271 us ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363 Differential Revision: D9881676 Pulled By: abhimadan fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442	2018-09-17 14:58:31 -07:00
Yanqin Jin	3ba3b153ef	Fix Makefile target 'jtest' on PowerPC (#4357 ) Summary: Before the fix: On a PowerPC machine, run the following ``` $ make jtest ``` The command will fail due to "undefined symbol: crc32c_ppc". It was caused by 'rocksdbjava' Makefile target not including crc32c_ppc object files when generating the shared lib. The fix is simple. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357 Differential Revision: D9779474 Pulled By: riversand963 fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51	2018-09-11 16:37:23 -07:00
Zhongyi Xie	1cf17ba53b	Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323 ) Summary: Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc defined `DecodeCFAndKey` under anonymous namespace. It is supposed to be fine except unity test will dump all source files together and now we have a conflict. Another issue with trace_analyzer_tool.cc is that it is using some utility functions from ldb_cmd which is not included in Makefile for unity_test, I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323 Differential Revision: D9599170 Pulled By: miasantreble fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029	2018-08-30 18:42:51 -07:00
Sagar Vemuri	2f871bc85e	Download bzip2 packages from Internet Archive (#4306 ) Summary: Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the internet archive until we figure out a more credible source. Fixes issue: #4305 Pull Request resolved: https://github.com/facebook/rocksdb/pull/4306 Differential Revision: D9514868 Pulled By: sagar0 fbshipit-source-id: 57c6a141a62e652f94377efc7ca9916b458e68d5	2018-08-27 09:58:24 -07:00
Zhichao Cao	9e2d5ab6bf	Adjusted the Makefile of trace_analyzer to isolate the Gflags from other (#4290 ) Summary: Previously, the trace_analyzer_tool will be complied with other libobjects, which let the GFLAGS of trace_analyzer appear in other tools (e.g., db_bench, rocksdb_dump, and etc.). When using '--help', the help information of trace_analyzer will appear in other tool help information, which will cause confusion issues. Currently, trace_analyzer_tool is built and used only by trace_analyzer and trace_analyzer_test to avoid the issues. Tested with make asan_check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4290 Differential Revision: D9413163 Pulled By: zhichao-cao fbshipit-source-id: ed5d20c4575a53ca15ff62a2ffe601d5cf278cc4	2018-08-21 10:47:24 -07:00
Zhichao Cao	999d955e4f	RocksDB Trace Analyzer (#4091 ) Summary: A framework of trace analyzing for RocksDB After collecting the trace by using the tool of [PR #3837](https://github.com/facebook/rocksdb/pull/3837). User can use the Trace Analyzer to interpret, analyze, and characterize the collected workload. Input: 1. trace file 2. Whole keys space file Statistics: 1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family. 2. Key hotness (access count) of each one 3. Key space separation based on given prefix 4. Key size distribution 5. Value size distribution if appliable 6. Top K accessed keys 7. QPS statistics including the average QPS and peak QPS 8. Top K accessed prefix 9. The query correlation analyzing, output the number of X after Y and the corresponding average time intervals Output: 1. key access heat map (either in the accessed key space or whole key space) 2. trace sequence file (interpret the raw trace file to line base text file for future use) 3. Time serial (The key space ID and its access time) 4. Key access count distritbution 5. Key size distribution 6. Value size distribution (in each intervals) 7. whole key space separation by the prefix 8. Accessed key space separation by the prefix 9. QPS of each operation and each column family 10. Top K QPS and their accessed prefix range Test: 1. Added the unit test of analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge 2. Generated the trace and analyze the trace Implemented but not tested (due to the limitation of trace_replay): 1. Analyzing Iterator, supporting Seek() and SeekForPrev() analyzing 2. Analyzing the number of Key found by Get Future Work: 1. Support execution time analyzing of each requests 2. Support cache hit situation and block read situation of Get Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091 Differential Revision: D9256157 Pulled By: zhichao-cao fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6	2018-08-13 11:44:02 -07:00
Yanqin Jin	bdc6abd0b4	Enable cscope to exclude test source files (#4190 ) Summary: Usually when using cscope, the query results contain a lot of function calls in test, making it hard to browse. So this PR aims to provide an option to exclude test source files. Add a new PHONY target, tags0, to exclude test source files while using cscope. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4190 Differential Revision: D9015901 Pulled By: riversand963 fbshipit-source-id: ea9a45756ccff5b26344d37e9ff1c02c5d9736d6	2018-07-26 11:12:29 -07:00
Fenggang Wu	8805ec2f49	DataBlockHashIndex: Standalone Implementation with Unit Test (#4139 ) Summary: The first step of the `DataBlockHashIndex` implementation. A string based hash table is implemented and unit-tested. `DataBlockHashIndexBuilder`: `Add()` takes pairs of `<key, restart_index>`, and formats it into a string when `Finish()` is called. `DataBlockHashIndex`: initialized by the formatted string, and can interpret it as a hash table. Lookup for a key is supported by iterator operation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139 Reviewed By: sagar0 Differential Revision: D8866764 Pulled By: fgwu fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10	2018-07-24 11:43:37 -07:00
Adam Retter	c6d2a7f821	Build improvements: Split docker targets and parallelize java builds Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4165 Differential Revision: D8955531 Pulled By: sagar0 fbshipit-source-id: 97d5a1375e200bde3c6414f94703504a4ed7536a	2018-07-23 13:28:37 -07:00
Maysam Yabandeh	537a233941	Exclude StackableDB from transaction stress tests (#4132 ) Summary: The transactions are currently tested with and without using StackableDB. This is mostly to check that the code path is consistent with stackable db as well. Slow, stress tests however do not benefit from being run again with StackableDB. The patch excludes StackableDB from such tests. On a single core it reduced the runtime of transaction_test from 199s to 135s. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132 Differential Revision: D8841655 Pulled By: maysamyabandeh fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19	2018-07-13 13:59:11 -07:00
Anand Ananthabhotla	52d4c9b7f6	Allow DB resume after background errors (#3997 ) Summary: Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts - 1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not 2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place 3. Provide an API for the user to clear the error and resume the DB instance This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors. Closes https://github.com/facebook/rocksdb/pull/3997 Differential Revision: D8653831 Pulled By: anand1976 fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd	2018-06-28 12:34:40 -07:00
Daniel Black	346d1069c3	Align StatisticsImpl / StatisticsData (#4036 ) Summary: Pinned the alignment of StatisticsData to the cacheline size rather than just extending its size (which could go over two cache lines)if unaligned in allocation. Avoid compile errors in the process as per individual commit messages. strengthen static_assert to CACHELINE rather than the highest common multiple. Closes https://github.com/facebook/rocksdb/pull/4036 Differential Revision: D8582844 Pulled By: yiwu-arbug fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71	2018-06-25 22:58:19 -07:00
Adam Retter	64c85d0d97	Set DEBUG_LEVEL=0 for RocksJava Mac Release (#4040 ) Summary: Closes https://github.com/facebook/rocksdb/issues/2717 Closes https://github.com/facebook/rocksdb/pull/4040 Differential Revision: D8592058 Pulled By: sagar0 fbshipit-source-id: d01099a1067aa32659abb0b4bed641d919a3927e	2018-06-22 10:57:48 -07:00
Manuel Ung	01e3c30def	Extend existing unit tests to run with WriteUnprepared as well Summary: As titled. I have not extended the Compatibility tests because the new WAL markers are still unimplemented. Closes https://github.com/facebook/rocksdb/pull/3941 Differential Revision: D8238394 Pulled By: lth fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d	2018-06-01 14:58:41 -07:00
Mike Kolupaev	8bf555f487	Change and clarify the relationship between Valid(), status() and Seek() for all iterators. Also fix some bugs Summary: Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are: If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted. * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not. However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours). This PR changes the convention to: * If status() is not ok, Valid() always returns false. * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.) This does sacrifice the two use cases listed above, but siying said it's ok. Overview of the changes: * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario. * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed. * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator. To find the iterator types that needed changes I searched for "public .Iterator" in the code. Here's an overview of all 27 iterator types: Iterators that didn't need changes: status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator. * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator. Iterators with changes (see inline comments for details): * DBIter - an overhaul: - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back. - It had a few code paths silently discarding subiterator's status. The stress test caught a few. - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example. - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption. - It used to not reset status on seek for some types of errors. - Some simplifications and better comments. - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer. * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status. * ForwardIterator - changed to the new convention, also slightly simplified. * ForwardLevelIterator - fixed some bugs and simplified. * LevelIterator - simplified. * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_. * BlockBasedTableIterator - minor changes. * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid. * PlainTableIterator - some seeks used to not reset status. * CuckooTableIterator - tiny code cleanup. * ManagedIterator - fixed some bugs. * BaseDeltaIterator - changed to the new convention and fixed a bug. * BlobDBIterator - seeks used to not reset status. * KeyConvertingIterator - some small change. Closes https://github.com/facebook/rocksdb/pull/3810 Differential Revision: D7888019 Pulled By: al13n321 fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d	2018-05-17 02:56:56 -07:00
Andrew Kryczka	6fc1bccef5	Fix crash test allocation error under TSAN Summary: We were seeing the following error: "ThreadSanitizer: DenseSlabAllocator overflow. Dying." It is fixable by mmap'ing a smaller region for keys' expected values, which this PR achieves by reducing the number of keys. Closes https://github.com/facebook/rocksdb/pull/3803 Differential Revision: D7874478 Pulled By: ajkr fbshipit-source-id: 433939f5cb92410ab4777d540cb0cc2ee0fe6c2e	2018-05-04 13:44:04 -07:00
David Lai	3be9b36453	comment unused parameters to turn on -Wunused-parameter flag Summary: This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557. Closes https://github.com/facebook/rocksdb/pull/3662 Differential Revision: D7426121 Pulled By: Dayvedde fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352	2018-04-12 17:59:16 -07:00
zhsj	6571770030	fix shared libary compile on ppc Summary: shared-ppc-objects is missed in $(SHARED4) target Closes https://github.com/facebook/rocksdb/pull/3619 Differential Revision: D7475767 Pulled By: ajkr fbshipit-source-id: d957ac7290bab3cd542af504405fb5ff912bfbf1	2018-04-05 19:58:20 -07:00
Adam Retter	3cb591954e	Allow rocksdbjavastatic to also be built as debug build Summary: Closes https://github.com/facebook/rocksdb/pull/3654 Differential Revision: D7417948 Pulled By: sagar0 fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f	2018-03-28 16:30:36 -07:00
Yanqin Jin	1f5def1653	Fix race condition causing double deletion of ssts Summary: Possible interleaved execution of background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to race condition on which RocksDB attempts to delete a file twice. The second attempt will fail and return `IO error`. This may occur to other files, but this PR targets sst. Also add a unit test to verify that this PR fixes the issue. The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving the `user_thread` and background compaction thread. They execute as follows ``` timeline user_thread background_compaction thread t1 \| FindObsoleteFiles(full_scan=false) t2 \| FindObsoleteFiles(full_scan=true) t3 \| PurgeObsoleteFiles t4 \| PurgeObsoleteFiles V ``` When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because background compaction thread has already deleted the file in `PurgeObsoleteFiles`. To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files. ajkr could you take a look and comment? Thanks! Closes https://github.com/facebook/rocksdb/pull/3638 Differential Revision: D7384372 Pulled By: riversand963 fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3	2018-03-28 10:29:59 -07:00
Sagar Vemuri	23f9d93f47	Exclude MySQLStyleTransactionTest.TransactionStressTest* from valgrind Summary: I found that each instance of MySQLStyleTransactionTest.TransactionStressTest/x is taking more than 10 hours to complete on our continuous testing environment, causing the whole valgrind run to timeout after a day. So excluding these tests. Closes https://github.com/facebook/rocksdb/pull/3652 Differential Revision: D7400332 Pulled By: sagar0 fbshipit-source-id: 987810574506d01487adf7c2de84d4817ec3d22d	2018-03-26 10:27:47 -07:00
Tobias Tschinkowitz	ccb761364d	Enable compilation on OpenBSD Summary: I modified the Makefile so that we can compile rocksdb on OpenBSD. The instructions for building have been added to INSTALL.md. The whole compilation process works fine like this on OpenBSD-current Closes https://github.com/facebook/rocksdb/pull/3617 Differential Revision: D7323754 Pulled By: siying fbshipit-source-id: 990037d1cc69138d22f85bd77ef4dc8c1ba9edea	2018-03-19 12:30:05 -07:00
Yanqin Jin	1139422dfb	Fix the command used to generate ctags Summary: In original $ROCKSDB_HOME/Makefile, the command used to generate ctags is ``` ctags * -R ``` However, this failed to generate tags for me. I did some search on the usage of ctags command and found that it should be ``` ctags -R . ``` or ``` ctags -R * ``` After the change, I can find the tags in vim using `:ts <identifier>`. Closes https://github.com/facebook/rocksdb/pull/3626 Reviewed By: ajkr Differential Revision: D7320217 Pulled By: riversand963 fbshipit-source-id: e4cd8f8a67842370a2343f0213df3cbd07754111	2018-03-18 22:43:18 -07:00
Zhongyi Xie	09e5d7af8c	add 4th test_group in travis Summary: to overcome the space limitation Closes https://github.com/facebook/rocksdb/pull/3605 Differential Revision: D7262607 Pulled By: miasantreble fbshipit-source-id: 1b1148026f17a7ee4b9f3a17ddc6b4ba9cf7af7f	2018-03-13 18:57:29 -07:00
Pengchao Wang	1ccdc2c337	Fix vagrant build process Summary: https://blog.github.com/2018-02-23-weak-cryptographic-standards-removed/ Github dropped supporting some weak cryptographic protocols from their website couple of weeks ago which cause our vagrant build process to fail on curl downloading step. This diff force curl use tls v1.2 protocol if it is supported so that it does not rely on the default protocol on different systems. Closes https://github.com/facebook/rocksdb/pull/3561 Differential Revision: D7148575 Pulled By: wpc fbshipit-source-id: b8cecfdfeb2bc8236de2d0d14f044532befec98c	2018-03-05 11:57:41 -08:00
Siying Dong	74748611a8	Suppress UBSAN error in finer guanularity Summary: Now we suppress alignment UBSAN error as a whole. Suppressing 3-way CRC and murmurhash feels a better idea than turning off alignment check as a whole. Closes https://github.com/facebook/rocksdb/pull/3495 Differential Revision: D6971273 Pulled By: siying fbshipit-source-id: 080b59fed6df494b9f622ef7cb5d42d39e6a8cdf	2018-02-13 12:18:07 -08:00
Siying Dong	821e0b1683	Disable options_settable_test in UBSAN and fix UBSAN failure in blob_… Summary: …db_test options_settable_test won't pass UBSAN so disable it. blob_db_test fails in UBSAN as SnapshotList doesn't initialize all the fields in dummy snapshot. Fix it. I don't understand why only blob_db_test fails though. Closes https://github.com/facebook/rocksdb/pull/3477 Differential Revision: D6928681 Pulled By: siying fbshipit-source-id: e31dd300fcdecdfd4f6af279a0987fd0cdec5122	2018-02-07 14:42:26 -08:00
Siying Dong	1336a7742d	Disable alignment check in UBSAN Summary: Disable alignment check in UBSAN for now. Now we can't get signals to meaningful failures. We can reenable it after we figure out how we can suppress failures in finer grain manner. Closes https://github.com/facebook/rocksdb/pull/3473 Differential Revision: D6925971 Pulled By: siying fbshipit-source-id: a0f1a242cde866abbc5c1eeee9ff8d1d7d582ac4	2018-02-07 10:58:01 -08:00
Zhongyi Xie	5eccf0b9d5	add -fno-sanitize-recover option to force exit on errors Summary: By default if ubsan detects any problem, it outputs a “runtime error:” message, and in most cases continues executing the program. In order to make test abort on errors, option `-fno-sanitize-recover` is needed. [link](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html) Closes https://github.com/facebook/rocksdb/pull/3447 Differential Revision: D6854654 Pulled By: miasantreble fbshipit-source-id: c48e892b0b38307029df38a67adda0e24257e481	2018-01-31 12:13:00 -08:00
Sagar Vemuri	68829ed89c	Revert Snappy version upgrade Summary: Java static builds are again broken, this time due Snappy version upgrade introduced in `90c1d81975` (#3331). This is due to two reasons: 1. The new Snappy packages should now be downloaded from https://github.com/google/snappy/archive/<pkg.tar.gz> instead of https://github.com/google/snappy/releases/download/<pkg.tar.gz> which we are using now. 1. In addition to the the above URL change, Snappy changed its build from using autotools to CMake based : `e69d9f8806/README.md (L65-L72)` So more changes are needed if we are going to upgrade to 1.1.7. Hence reverting the version upgrade until we figure them out. Closes https://github.com/facebook/rocksdb/pull/3363 Differential Revision: D6716983 Pulled By: sagar0 fbshipit-source-id: f451a1bc5eb0bb090f4da07bc3e5ba72cf89aefa	2018-01-12 23:41:43 -08:00
Sagar Vemuri	e446d14093	Fix PowerPC dynamic java build Summary: Java build on PPC64le has been broken since a few months, due to #2716. Fixing it with the least amount of changes. (We should cleanup a little around this code when time permits). This should fix the build failures seen in http://140.211.168.68:8080/job/Rocksdb/ . Closes https://github.com/facebook/rocksdb/pull/3359 Differential Revision: D6712938 Pulled By: sagar0 fbshipit-source-id: 3046e8f072180693de2af4762934ec1ace309ca4	2018-01-12 10:57:14 -08:00
Adam Retter	a53c571d2d	FreeBSD build support for RocksDB and RocksJava Summary: Tested on a clean FreeBSD 11.01 x64. Closes https://github.com/facebook/rocksdb/pull/1423 Closes https://github.com/facebook/rocksdb/pull/3357 Differential Revision: D6705868 Pulled By: sagar0 fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd	2018-01-11 13:29:55 -08:00
Sagar Vemuri	3e955fad09	Fix zstd/zdict include path for java static build Summary: With the ZSTD dictionary generator support added in #3057 `PORTABLE=1 ROCKSDB_NO_FBCODE=1 make rocksdbjavastatic` fails as it can't find zdict.h. Specifically due to: `e3a06f12d2/util/compression.h (L39)` In java static builds zstd code gets directly downloaded from https://github.com/facebook/zstd , and in there zdict.h is under dictBuilder directory. So, I modified libzstd.a target to use `make install` to collect all the header files into a single location and used that as the zstd's include path. Closes https://github.com/facebook/rocksdb/pull/3260 Differential Revision: D6669850 Pulled By: sagar0 fbshipit-source-id: f8a7562a670e5aed4c4fb6034a921697590d7285	2018-01-05 15:41:46 -08:00
Adam Retter	90c1d81975	Update javastatic dependencies Summary: 1. Snappy 1.1.7 2. LZ4 1.8.0 3. ZSTD 1.3.3 Closes https://github.com/facebook/rocksdb/pull/3331 Differential Revision: D6667933 Pulled By: ajkr fbshipit-source-id: 21c526609df7580481195a389d31f733e2695e65	2018-01-05 12:11:44 -08:00
Andrew Kryczka	ea8ccd2267	fix powerpc java static build Summary: added support for C and asm files as required for `e612e31740`. Closes https://github.com/facebook/rocksdb/pull/3299 Differential Revision: D6612479 Pulled By: ajkr fbshipit-source-id: 6263ed7c1602f249460421825c76b5721f396163	2018-01-03 12:41:37 -08:00
yingsu00	f54d7f5fea	Port 3 way SSE4.2 crc32c implementation from Folly Summary: # Summary RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain. This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used. # Performance Test Results We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm. Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s. 1) ReadRandom in db_bench overall metrics PER RUN Algorithm \| run \| micros/op \| ops/sec \|Throughput (MB/s) 3-way \| 1 \| 4.143 \| 241387 \| 26.7 3-way \| 2 \| 3.775 \| 264872 \| 29.3 3-way \| 3 \| 4.116 \| 242929 \| 26.9 FastCrc32c\|1 \| 4.037 \| 247727 \| 27.4 FastCrc32c\|2 \| 4.648 \| 215166 \| 23.8 FastCrc32c\|3 \| 4.352 \| 229799 \| 25.4 AVG Algorithm \| Average of micros/op \| Average of ops/sec \| Average of Throughput (MB/s) 3-way \| 4.01 \| 249,729 \| 27.63 FastCrc32c \| 4.35 \| 230,897 \| 25.53 2) Crc32c computation CPU cost (inclusive samples percentage) PER RUN Implementation \| run \| TotalSamples \| Crc32c percentage 3-way \| 1 \| 4,572,250,000 \| 4.37% 3-way \| 2 \| 3,779,250,000 \| 4.62% 3-way \| 3 \| 4,129,500,000 \| 4.48% FastCrc32c \| 1 \| 4,663,500,000 \| 11.24% FastCrc32c \| 2 \| 4,047,500,000 \| 12.34% FastCrc32c \| 3 \| 4,366,750,000 \| 11.68% # Test Plan make -j64 corruption_test && ./corruption_test By default it uses 3-way SSE algorithm NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test make clean && DEBUG_LEVEL=0 make -j64 db_bench make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench Closes https://github.com/facebook/rocksdb/pull/3173 Differential Revision: D6330882 Pulled By: yingsu00 fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9	2017-12-19 18:26:49 -08:00
Maysam Yabandeh	0faa026db6	WritePrepared Txn: make buck tests parallel Summary: The TSAN version of tests could take quite long. Make the buck tests parallel to avoid timeouts. Closes https://github.com/facebook/rocksdb/pull/3280 Differential Revision: D6581594 Pulled By: maysamyabandeh fbshipit-source-id: 3f8476d8c69f0183e394fa8a2089dd8d4e90c90c	2017-12-18 14:42:09 -08:00
Yi Wu	bbcd3b0bd2	Suppress valgrind "unimplemented functionality" error Summary: Add ROCKSDB_VALGRIND_RUN macro and suppress false-positive "unimplemented functionality" throw by valgrind for steam hints. Another approach would be add a valgrind suppress file. Valgrind is suppose to print the suppression when given "--gen-suppressions=all" param, which is suppose to be the content for the suppression file. But it doesn't print. Closes https://github.com/facebook/rocksdb/pull/3174 Differential Revision: D6338786 Pulled By: yiwu-arbug fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa	2017-11-15 14:28:34 -08:00
Yi Wu	42564ada53	Blob DB: not using PinnableSlice move assignment Summary: The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky. Closes https://github.com/facebook/rocksdb/pull/3164 Differential Revision: D6319201 Pulled By: yiwu-arbug fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48	2017-11-13 18:12:20 -08:00
Yi Wu	31d3e41810	PinnableSlice move assignment Summary: Allow `std::move(pinnable_slice)`. Closes https://github.com/facebook/rocksdb/pull/2997 Differential Revision: D6036782 Pulled By: yiwu-arbug fbshipit-source-id: 583fb0419a97e437ff530f4305822341cd3381fa	2017-10-12 18:28:24 -07:00
Andrew Kryczka	5a6ad9d52a	release build treat warnings as errors Summary: fixing warnings is important, especially for release code. Closes https://github.com/facebook/rocksdb/pull/2971 Differential Revision: D5980596 Pulled By: ajkr fbshipit-source-id: 04f4ea3fb005dcda33d60342e4361e380bc4dfb1	2017-10-05 12:41:52 -07:00
Andrew Kryczka	1026e794a3	rate limit auto-tuning Summary: Dynamic adjustment of rate limit according to demand for background I/O. It increases by a factor when limiter is drained too frequently, and decreases by the same factor when limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes: - make rate limiter's `Env*` configurable for testing - track num drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter. Closes https://github.com/facebook/rocksdb/pull/2899 Differential Revision: D5858704 Pulled By: ajkr fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1	2017-10-04 19:15:01 -07:00
Yi Wu	92ccae7123	speedup 'make check' Summary: Make SnapshotConcurrentAccessTest run in the beginning of the queue. Test Plan `make all check -j64` on devserver Closes https://github.com/facebook/rocksdb/pull/2962 Differential Revision: D5965871 Pulled By: yiwu-arbug fbshipit-source-id: 8cb5a47c2468be0fbbb929226a143ec5848bfaa9	2017-10-03 12:11:49 -07:00
Yi Wu	d1cab2b64e	Add ValueType::kTypeBlobIndex Summary: Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to 1. Make it possible to open existing rocksdb instance as blob db. Existing value will be of kTypeIndex type, while value inserted by blob db will be of kTypeBlobIndex. 2. Make rocksdb able to detect if the db contains value written by blob db, if so return error. 3. Make it possible to have blob db optionally store value in SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type). The root db (DBImpl) basically pretended kTypeBlobIndex are normal value on write. On Get if is_blob is provided, return whether the value read is of kTypeBlobIndex type, or return Status::NotSupported() status if is_blob is not provided. On scan allow_blob flag is pass and if the flag is true, return wether the value is of kTypeBlobIndex type via iter->IsBlob(). Changes on blob db side will be in a separate patch. Closes https://github.com/facebook/rocksdb/pull/2886 Differential Revision: D5838431 Pulled By: yiwu-arbug fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca	2017-10-03 09:11:23 -07:00

1 2 3 4 5 ...

790 Commits