rocksdb

Author	SHA1	Message	Date
Levi Tamasi	6301dbe7a7	Use function objects as deleters in the block cache (#6545 ) Summary: As the first step of reintroducing eviction statistics for the block cache, the patch switches from using simple function pointers as deleters to function objects implementing an interface. This will enable using deleters that have state, like a smart pointer to the statistics object that is to be updated when an entry is removed from the cache. For now, the patch adds a deleter template class `SimpleDeleter`, which simply casts the `value` pointer to its original type and calls `delete` or `delete[]` on it as appropriate. Note: to prevent object lifecycle issues, deleters must outlive the cache entries referring to them; `SimpleDeleter` ensures this by using the ("leaky") Meyers singleton pattern. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6545 Test Plan: `make asan_check` Reviewed By: siying Differential Revision: D20475823 Pulled By: ltamasi fbshipit-source-id: fe354c33dd96d9bafc094605462352305449a22a	2020-03-26 16:19:58 -07:00
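The deleter design described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the actual RocksDB code: the names `DeleterSketch`, `SimpleDeleterSketch`, and `TrackedValue` are hypothetical, and only the single-object `delete` case is shown (the commit also mentions a `delete[]` variant).

```cpp
#include <string>

// Hypothetical interface for a stateful cache deleter (an assumption for
// illustration; not the actual RocksDB class).
class DeleterSketch {
 public:
  virtual ~DeleterSketch() = default;
  virtual void operator()(const std::string& key, void* value) = 0;
};

// Casts `value` back to its original type and deletes it.
template <typename T>
class SimpleDeleterSketch : public DeleterSketch {
 public:
  // "Leaky" Meyers singleton: the instance is heap-allocated on first use
  // and intentionally never freed, so it outlives every cache entry that
  // refers to it.
  static SimpleDeleterSketch* GetInstance() {
    static SimpleDeleterSketch* instance = new SimpleDeleterSketch;
    return instance;
  }

  void operator()(const std::string& /*key*/, void* value) override {
    delete static_cast<T*>(value);
  }

 private:
  SimpleDeleterSketch() = default;
};

// Helper type that counts destructor calls, used only to observe deletion.
struct TrackedValue {
  static int destroyed;
  ~TrackedValue() { ++destroyed; }
};
int TrackedValue::destroyed = 0;
```

A cache entry would store a `DeleterSketch*` next to its value and invoke it on eviction, for example `(*deleter)(key, value)`. Because the deleter is an object rather than a plain function pointer, a later variant could carry state such as a pointer to a statistics object.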
Huisheng Liu	a6ce5c823b	multiget support for timestamps (#6483 ) Summary: Add timestamp support for MultiGet(). The timestamp from ReadOptions is honored, and timestamps can be returned along with values. MultiReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within margin of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks. base line (commit `17bef7d3a`): multireadrandom : 104.173 micros/op 307167 ops/sec; (5462999 of 5462999 found) This PR: multireadrandom : 104.199 micros/op 307095 ops/sec; (5307999 of 5307999 found) .\db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=multireadrandom --use_existing_db=1 --num=25000000 --threads=32 --allow_concurrent_memtable_write=0 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6483 Reviewed By: anand1976 Differential Revision: D20498373 Pulled By: riversand963 fbshipit-source-id: 8505f22bc40fd791bc7dd05e48d7e67c91edb627	2020-03-24 11:24:09 -07:00
Zhichao Cao	d300d10962	Fix the MultiGet testing failure in Circleci (#6578 ) Summary: The MultiGet test in db_basic_test fails in CircleCI vs2019. The reason is that even when Snappy compression is enabled, the first compression type is still kNoCompression. This PR checks the list and ensures that compression is used only when it is enabled and the compression type is valid, so that the combined read test in MultiGet does not fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6578 Test Plan: make check, db_basic_test. Reviewed By: anand1976 Differential Revision: D20607529 Pulled By: zhichao-cao fbshipit-source-id: dcead264d5c2da105912c18caad34b8510bb04b0	2020-03-23 18:51:09 -07:00
Yanqin Jin	617f479266	Fix LITE build (#6575 ) Summary: Fix LITE build by excluding some unit tests that use features not supported in LITE. ``` db/db_basic_test.cc:1778:8: error: ‘void rocksdb::{anonymous}::TableFileListener::OnTableFileCreated(const rocksdb::TableFileCreationInfo&)’ marked ‘override’, but does not override void OnTableFileCreated(const TableFileCreationInfo& info) override { ^~~~~~~~~~~~~~~~~~ make: *** [db/db_basic_test.o] Error 1 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6575 Reviewed By: ltamasi Differential Revision: D20598598 Pulled By: riversand963 fbshipit-source-id: 367f7cb2500360ad57030b138a94c0f731a04339	2020-03-23 13:05:36 -07:00
Yanqin Jin	fb09ef05dc	Attempt to recover from db with missing table files (#6334 ) Summary: There are situations when RocksDB tries to recover, but the db is in an inconsistent state due to SST files referenced in the MANIFEST being missing. In this case, previous versions of RocksDB simply fail the recovery and return a non-ok status. This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files and tries to recover to the most recent state that has no missing table files. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version. `DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will not be performed. To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush. Test plan (on devserver): ``` $make check $COMPILE_WITH_ASAN=1 make all && make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334 Reviewed By: anand1976 Differential Revision: D19778960 Pulled By: riversand963 fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8	2020-03-20 19:30:48 -07:00
Cheng Chang	402da454cb	Migrate AppVeyor to CircleCI (#6518 ) Summary: CircleCI is the new recommended CI system internally. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6518 Test Plan: Watch https://app.circleci.com/pipelines/github/facebook/rocksdb Differential Revision: D20454743 Pulled By: cheng-chang fbshipit-source-id: 39031568d6c1d3d25b7fbd78fa9a0e6067ddc47c	2020-03-13 21:58:51 -07:00
Zhichao Cao	5c30e6c088	Separate timestamp related test from db_basic_test (#6516 ) Summary: In some test runs, db_basic_test may time out because of its long running time. Separate the timestamp-related tests from db_basic_test to avoid this potential issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6516 Test Plan: pass make asan_check Differential Revision: D20423922 Pulled By: zhichao-cao fbshipit-source-id: d6306f89a8de55b07bf57233e4554c09ef1fe23a	2020-03-13 11:37:15 -07:00
sdong	331e6199df	Include more information in file lock failure (#6507 ) Summary: When users fail to open a DB with file lock failure, it is sometimes hard for users to debug. We now include the time the lock is acquired and the thread ID that acquired the lock, to help users debug problems like this. Default Env's thread ID is used. Since type of lockedFiles is changed, rename it to follow naming convention too. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6507 Test Plan: Add a unit test and improve an existing test to validate the case. Differential Revision: D20378333 fbshipit-source-id: 312fe0e9733fd1d1e9969c321b90ce523cf4708a	2020-03-11 16:23:08 -07:00
Yanqin Jin	d93812c9ae	Iterator with timestamp (#6255 ) Summary: Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests. Create an iterator with timestamp. ``` ... read_opts.timestamp = &ts; auto* iter = db->NewIterator(read_opts); // target is key without timestamp. for (iter->Seek(target); iter->Valid(); iter->Next()) {} for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {} delete iter; read_opts.timestamp = &ts1; // lower_bound and upper_bound are without timestamp. read_opts.iterate_lower_bound = &lower_bound; read_opts.iterate_upper_bound = &upper_bound; auto* iter1 = db->NewIterator(read_opts); // Do Seek or SeekToFirst() delete iter1; ``` Test plan (dev server) ``` $make check ``` Simple benchmarking (dev server) 1. The overhead introduced by this PR even when timestamp is disabled. key size: 16 bytes value size: 100 bytes Entries: 1000000 Data reside in main memory, and try to stress iterator. Repeated three times on master and this PR. - Seek without next ``` ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 ``` master: 159047.0 ops/sec this PR: 158922.3 ops/sec (2% drop in throughput) - Seek and next 10 times ``` ./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10 ``` master: 109539.3 ops/sec this PR: 107519.7 ops/sec (2% drop in throughput) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255 Differential Revision: D19438227 Pulled By: riversand963 fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746	2020-03-06 16:24:27 -08:00
Huisheng Liu	904a60ff63	return timestamp from get (#6409 ) Summary: Added new Get() methods that return timestamp. Dummy implementation is given so that classes derived from DB don't need to be touched to provide their implementation. MultiGet is not included. ReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within margin of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks. base line (commit `72ee067b9`): 101.712 micros/op 314602 ops/sec; 36.0 MB/s (5658999 of 5658999 found) This PR: 100.288 micros/op 319071 ops/sec; 36.5 MB/s (5674999 of 5674999 found) ./db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=readrandom --use_existing_db=1 --num=25000000 --threads=32 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6409 Differential Revision: D20200086 Pulled By: riversand963 fbshipit-source-id: 490edd74d924f62bd8ae9c29c2a6bbbb8410ca50	2020-03-02 16:01:00 -08:00
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To give users a tool to solve the problem, the RocksDB namespace is changed to a flag which can be overridden at build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	2020-02-20 12:09:57 -08:00
anand76	3e49249d30	Ensure all MultiGet IO errors are propagated to user (#6403 ) Summary: Unrevert the previous fix to propagate error status, and an additional fix to not treat a memtable lookup MergeInProgress status as an error. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6403 Test Plan: Unit tests Tried running stress tests but couldn't repro the stress failure Differential Revision: D19846721 Pulled By: anand1976 fbshipit-source-id: 7db10cccbdc863d9b559497f0a46b608d2488ca4	2020-02-11 17:27:22 -08:00
anand76	35ed530d2c	Revert "Check KeyContext status in MultiGet (#6387 )" (#6401 ) Summary: This reverts commit `d70011bccc`. The commit is causing some stress test failure due to unexpected Status::MergeInProgress() return for some keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6401 Differential Revision: D19826623 Pulled By: anand1976 fbshipit-source-id: edd634cede9cb7bdd2cb8f46e662ea709b16d2f1	2020-02-10 22:23:36 -08:00
anand76	d70011bccc	Check KeyContext status in MultiGet (#6387 ) Summary: Currently, any IO errors and checksum mismatches while reading data blocks are being ignored by the batched MultiGet. It's only looking at the GetContext state. Fix that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6387 Test Plan: Add unit tests Differential Revision: D19799819 Pulled By: anand1976 fbshipit-source-id: 46133dccbb04e64067b9fe6cda73e282203db969	2020-02-07 16:48:16 -08:00
Yanqin Jin	6a9989381f	Fix compilation under LITE (#6277 ) Summary: Fix compilation under LITE by putting `#ifndef ROCKSDB_LITE` around a code block. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6277 Differential Revision: D19334157 Pulled By: riversand963 fbshipit-source-id: 947111ed68aa550f5ea424b216c1442a8af9e32b	2020-01-09 15:57:39 -08:00
Huisheng Liu	e5b476f551	Update file indexer to take timestamp into consideration (#6205 ) Summary: Exclude timestamp in key comparison during boundary calculation to avoid key versions being excluded. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6205 Differential Revision: D19166765 Pulled By: riversand963 fbshipit-source-id: bbe08816fef8de349a83ebd59a595ad844021f24	2020-01-08 16:31:23 -08:00
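As an illustration of comparing keys with the timestamp excluded, the sketch below assumes that the timestamp is a fixed-length suffix of the user key and that both keys are at least `ts_sz` bytes long. The function name `CompareWithoutTimestampSketch` is hypothetical and is not the RocksDB comparator API.

```cpp
#include <cstddef>
#include <string>

// Compares two keys bytewise after dropping the trailing `ts_sz` timestamp
// bytes from each. Returns <0, 0, or >0 like std::string::compare.
// Assumption: both keys are at least `ts_sz` bytes long.
int CompareWithoutTimestampSketch(const std::string& a, const std::string& b,
                                  std::size_t ts_sz) {
  return a.compare(0, a.size() - ts_sz, b, 0, b.size() - ts_sz);
}
```

With such a comparison, two versions of the same key that differ only in timestamp compare as equal, so neither version falls outside a file boundary computed from the other.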
Connor1996	3e26a94ba1	Add oldest snapshot sequence property (#6228 ) Summary: Add oldest snapshot sequence property, so we can use `db.GetProperty("rocksdb.oldest-snapshot-sequence")` to get the sequence number of the oldest snapshot. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6228 Differential Revision: D19264145 Pulled By: maysamyabandeh fbshipit-source-id: 67fbe5304d89cbc475bd404e30d1299f7b11c010	2020-01-07 08:36:44 -08:00
Zhichao Cao	cddd637997	Merge adjacent file block reads in RocksDB MultiGet() and Add uncompressed block to cache (#6089 ) Summary: In the current MultiGet, if the KV-pairs do not belong to the data blocks in the block cache, multiple blocks are read from a SST. It will trigger one block read for each block request and read them in parallel. In some cases, if some data blocks are adjacent in the SST, the reads for these blocks can be combined into a single large read, which can reduce the number of system calls and the read latency. When filling the block cache, if multiple data blocks are in the same memory buffer, we need to copy them to the heap separately. Therefore, only in the case that 1) data block compression is enabled, and 2) compressed block cache is null, we can do combined read. Otherwise, extra memory copy is needed, which may cause extra overhead. In the current case, data blocks will be uncompressed to a new memory space. Also, in the case that 1) data block compression is enabled, and 2) compressed block cache is null, it is possible the data block is actually not compressed. In the current logic, these data blocks will not be added to the uncompressed_cache. So if the memory buffer is shared and the data block is not compressed, the data block is copied to the heap and added to the cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6089 Test Plan: Added test case to ParallelIO.MultiGet. Pass make asan_check Differential Revision: D18734668 Pulled By: zhichao-cao fbshipit-source-id: 67c5615ed373e51e42635fd74b36f8f3a66d5da4	2019-12-16 16:26:03 -08:00
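The idea of combining reads for adjacent blocks can be sketched as a simple coalescing pass. This is only an illustration under stated assumptions: `ReadRequestSketch` and `CoalesceAdjacentReads` are hypothetical names, the requests are assumed to be sorted by offset, and the compression and cache conditions described in the commit are not modeled.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one block read: a byte range in the SST file.
struct ReadRequestSketch {
  uint64_t offset;
  std::size_t len;
};

// Merges requests whose byte ranges are exactly adjacent in the file.
// Assumption: `sorted_reqs` is sorted by offset and ranges do not overlap.
std::vector<ReadRequestSketch> CoalesceAdjacentReads(
    const std::vector<ReadRequestSketch>& sorted_reqs) {
  std::vector<ReadRequestSketch> merged;
  for (const auto& req : sorted_reqs) {
    if (!merged.empty() &&
        merged.back().offset + merged.back().len == req.offset) {
      // The new request starts where the previous one ends: extend it.
      merged.back().len += req.len;
    } else {
      merged.push_back(req);
    }
  }
  return merged;
}
```

For requests covering bytes [0, 100), [100, 150), and [300, 320), the first two are merged into one read of 150 bytes and the third stays separate, so two reads are issued instead of three.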
sdong	a68dff5c35	Apply formatter to some recent commits (#6138 ) Summary: The formatter somehow complains about some recently changed lines. Apply the formatter to them to make it happy. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6138 Test Plan: See CI passes. Differential Revision: D18895950 fbshipit-source-id: 7d1696cf3e3a682bc10a30cdca748a23c6565255	2019-12-09 15:49:49 -08:00
Ziyue Yang	7e2f831924	Fix wrong ExtractUserKey usage in BlockBasedTableBuilder::EnterUnbuff… (#6100 ) Summary: BlockBasedTableBuilder uses ExtractUserKey in EnterUnbuffered. This would cause index filter building error, since user-provided timestamp is supported by ExtractUserKeyAndStripTimestamp, and it's used in Add. This commit changes ExtractUserKey to ExtractUserKeyAndStripTimestamp. A test case is also added by modifying DBBasicTestWithTimestampWithParam_ PutAndGet test in db_basic_test to cover ExtractUserKeyAndStripTimestamp usage in both kBuffered and kUnbuffered state of BlockBasedTableBuilder. Before the ExtractUserKeyAndStripTimstamp fix: ``` $ ./db_basic_test --gtest_filter="PutAndGet" Note: Google Test filter = PutAndGet [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam [ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0 db/db_basic_test.cc:2109: Failure db_->Get(ropts, cfh, "key" + std::to_string(j), &value) NotFound: db/db_basic_test.cc:2109: Failure db_->Get(ropts, cfh, "key" + std::to_string(j), &value) NotFound: db/db_basic_test.cc:2109: Failure db_->Get(ropts, cfh, "key" + std::to_string(j), &value) NotFound: db/db_basic_test.cc:2109: Failure db_->Get(ropts, cfh, "key" + std::to_string(j), &value) NotFound: db/db_basic_test.cc:2109: Failure db_->Get(ropts, cfh, "key" + std::to_string(j), &value) NotFound: [ FAILED ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0, where GetParam() = false (1177 ms) [ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 [ OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 (1056 ms) [----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam (2233 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (2233 ms total) [ PASSED ] 1 test. 
[ FAILED ] 1 test, listed below: [ FAILED ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0, where GetParam() = false 1 FAILED TEST ``` After the ExtractUserKeyAndStripTimstamp fix: ``` $ ./db_basic_test --gtest_filter="PutAndGet" Note: Google Test filter = PutAndGet [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam [ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0 [ OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/0 (1417 ms) [ RUN ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 [ OK ] Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/1 (1041 ms) [----------] 2 tests from Timestamp/DBBasicTestWithTimestampWithParam (2458 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (2458 ms total) [ PASSED ] 2 tests. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6100 Differential Revision: D18769654 Pulled By: riversand963 fbshipit-source-id: 76c2cf2c9a5e0d85db95d98e812e6af0c2a15c6b	2019-12-09 10:57:02 -08:00
anand76	38cc611297	Fix test failure in LITE mode (#6050 ) Summary: GetSupportedCompressions() is not available in LITE build, so check and use Snappy compression in db_basic_test.cc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6050 Test Plan: make LITE=1 check make check Differential Revision: D18588114 Pulled By: anand1976 fbshipit-source-id: a193de58c44f91bcc237107f25dbc1b9458eef3d	2019-11-19 10:13:24 -08:00
anand76	5b9233bfe8	Fix a test failure on systems that don't have Snappy compression libraries (#6038 ) Summary: The ParallelIO/DBBasicTestWithParallelIO.MultiGet/11 test fails if Snappy compression library is not installed, since RocksDB defaults to Snappy if none is specified. So dynamically determine the supported compression types and pick the first one. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6038 Differential Revision: D18532370 Pulled By: anand1976 fbshipit-source-id: a0a735114d1f8892ea09f7c4af8688d7bcc5b075	2019-11-18 09:37:18 -08:00
anand76	6c7b1a0cc7	Batched MultiGet API for multiple column families (#5816 ) Summary: Add a new API that allows a user to call MultiGet specifying multiple keys belonging to different column families. This is mainly useful for users who want to do a consistent read of keys across column families, with the added performance benefits of batching and returning values using PinnableSlice. As part of this change, the code in the original multi-column family MultiGet for acquiring the super versions has been refactored into a separate function that can be used by both, the batching and the non-batching versions of MultiGet. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5816 Test Plan: make check make asan_check asan_crash_test Differential Revision: D18408676 Pulled By: anand1976 fbshipit-source-id: 933e7bec91dd70e7b633be4ff623a1116cc28c8d	2019-11-12 13:52:55 -08:00
anand76	03ce7fb292	Fix a buffer overrun problem in BlockBasedTable::MultiGet (#6014 ) Summary: The calculation in BlockBasedTable::MultiGet for the required buffer length for reading in compressed blocks is incorrect. It needs to take the 5-byte block trailer into account. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6014 Test Plan: Add a unit test DBBasicTest.MultiGetBufferOverrun that fails in asan_check before the fix, and passes after. Differential Revision: D18412753 Pulled By: anand1976 fbshipit-source-id: 754dfb66be1d5f161a7efdf87be872198c7e3b72	2019-11-11 16:59:15 -08:00
anand76	9836a1fa33	Fix MultiGet crash when no_block_cache is set (#5991 ) Summary: This PR fixes https://github.com/facebook/rocksdb/issues/5975. In ```BlockBasedTable::RetrieveMultipleBlocks()```, we were calling ```MaybeReadBlocksAndLoadToCache()```, which is a no-op if neither uncompressed nor compressed block cache are configured. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5991 Test Plan: 1. Add unit tests that fail with the old code and pass with the new 2. make check and asan_check Cc spetrunia Differential Revision: D18272744 Pulled By: anand1976 fbshipit-source-id: e62fa6090d1a6adf84fcd51dfd6859b03c6aebfe	2019-11-07 12:02:21 -08:00
Peter Dillinger	ca7ccbe2ea	Misc hashing updates / upgrades (#5909 ) Summary: - Updated our included xxhash implementation to version 0.7.2 (== the latest dev version as of 2019-10-09). - Using XXH_NAMESPACE (like other fb projects) to avoid potential name collisions. - Added fastrange64, and unit tests for it and fastrange32. These are faster alternatives to hash % range. - Use preview version of XXH3 instead of MurmurHash64A for NPHash64 -- Had to update cache_test to increase probability of passing for any given hash function. - Use fastrange64 instead of % with uses of NPHash64 -- Had to fix WritePreparedTransactionTest.CommitOfDelayedPrepared to avoid deadlock apparently caused by new hash collision. - Set default seed for NPHash64 because specifying a seed rarely makes sense for it. - Removed unnecessary include xxhash.h in a popular .h file - Rename preview version of XXH3 to XXH3p for clarity and to ease backward compatibility in case final version of XXH3 is integrated. Relying on existing unit tests for NPHash64-related changes. Each new implementation of fastrange64 passed unit tests when manipulating my local build to select it. I haven't done any integration performance tests, but I consider the improved performance of the pieces being swapped in to be well established. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5909 Differential Revision: D18125196 Pulled By: pdillinger fbshipit-source-id: f6bf83d49d20cbb2549926adf454fd035f0ecc0d	2019-10-24 17:16:46 -07:00
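The "fastrange" technique mentioned above maps a hash into `[0, range)` with a multiply and a shift instead of a modulo. The sketch below shows the 32-bit case only, using a hypothetical name `FastRange32Sketch` rather than the RocksDB function. The 64-bit variant follows the same pattern but needs a 128-bit product, which is not shown here.

```cpp
#include <cstdint>

// Maps a 32-bit hash into [0, range) without using `%`.
// The 64-bit product hash * range lies in [0, range * 2^32), so its upper
// 32 bits are always smaller than `range`.
inline uint32_t FastRange32Sketch(uint32_t hash, uint32_t range) {
  uint64_t product = static_cast<uint64_t>(hash) * range;
  return static_cast<uint32_t>(product >> 32);
}
```

Unlike `hash % range`, the result depends mostly on the high bits of the hash, so this mapping is only suitable when the hash is well distributed across all 32 bits.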
Vijay Nadimpalli	4c49e38f15	MultiGet batching in memtable (#5818 ) Summary: RocksDB has a MultiGet() API that implements batched key lookup for higher performance (https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L468). Currently, batching is implemented in BlockBasedTableReader::MultiGet() for SST file lookups. One of the ways it improves performance is by pipelining bloom filter lookups (by prefetching required cachelines for all the keys in the batch, and then doing the probe) and thus hiding the cache miss latency. The same concept can be extended to the memtable as well. This PR involves implementing a pipelined bloom filter lookup in DynamicBloom, and implementing MemTable::MultiGet() that can leverage it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5818 Test Plan: Existing tests Performance Test: Ran the below command, which fills up the memtable, makes sure there are no flushes, and then calls multiget. Ran it on master and on the new change and saw at least a 1% performance improvement across all the test runs I did. Sometimes the improvement was up to 5%. TEST_TMPDIR=/data/users/$USER/benchmarks/feature/ numactl -C 10 ./db_bench -benchmarks="fillseq,multireadrandom" -num=600000 -compression_type="none" -level_compaction_dynamic_level_bytes -write_buffer_size=200000000 -target_file_size_base=200000000 -max_bytes_for_level_base=16777216 -reads=90000 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 -statistics -memtable_whole_key_filtering=true -memtable_bloom_size_ratio=10 Differential Revision: D17578869 Pulled By: vjnadimpalli fbshipit-source-id: 23dc651d9bf49db11d22375bf435708875a1f192	2019-10-10 09:39:39 -07:00
Yanqin Jin	457bcfde02	Let TestEnv and FaultInjectEnv use Env of choice (#5886 ) Summary: Instead of hard coding Env::Default in TestEnv and a few other places, use the DBTestBase::env_ that has been deduced from the constructor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5886 Test Plan: ``` make check ``` Differential Revision: D17773029 Pulled By: riversand963 fbshipit-source-id: 7ce4e5175a487e9d281ea2c3aae3c41bffd44629	2019-10-07 17:48:50 -07:00
sdong	e8263dbdaa	Apply formatter to recent 200+ commits. (#5830 ) Summary: Further apply formatter to more recent commits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830 Test Plan: Run all existing tests. Differential Revision: D17488031 fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab	2019-09-20 12:04:26 -07:00
Vijay Nadimpalli	979fbdc696	Persistent globally unique DB ID in manifest (#5725 ) Summary: Each DB has a globally unique ID. A DB can be physically copied around, or backed-up and restored, and users should be able to identify the same DB. This unique ID right now is stored as plain text in file IDENTITY under the DB directory. This approach introduces at least two problems: (1) the file is not checksummed; (2) the source of truth of a DB is the manifest file, which can be copied separately from IDENTITY file, causing the DB ID to be wrong. The goal of this PR is to solve this problem by moving the DB ID to the manifest. To begin with, we will write to both the identity file and the manifest. Writing to the manifest is controlled via the flag write_dbid_to_manifest in Options, and the default is false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725 Test Plan: Added unit tests. Differential Revision: D16963840 Pulled By: vjnadimpalli fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9	2019-09-03 08:52:24 -07:00
anand76	1729779b85	Disable MultiGet row cache test in LITE mode (#5756 ) Summary: Row cache is not supported in LITE mode. So disable the test in that mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5756 Test Plan: make LITE=1 all check Differential Revision: D17115684 Pulled By: anand1976 fbshipit-source-id: e6433c2e528674645cea76cdfc80ddc473708fc2	2019-08-29 12:13:28 -07:00
anand76	e10570331d	Support row cache with batched MultiGet (#5706 ) Summary: This PR adds support for row cache in ```rocksdb::TableCache::MultiGet```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5706 Test Plan: 1. Unit tests in db_basic_test 2. db_bench results with batch size of 2 (```Get``` is faster than ```MultiGet``` for single key) - Get - readrandom : 3.935 micros/op 254116 ops/sec; 28.1 MB/s (22870998 of 22870999 found) MultiGet - multireadrandom : 3.743 micros/op 267190 ops/sec; (24047998 of 24047998 found) Command used - TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench -use_existing_db=true -use_existing_keys=false -benchmarks="readtorowcache,[read\|multiread]random" -write_buffer_size=16777216 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -row_cache_size=4194304000 -batch_size=2 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=131072 Differential Revision: D17086297 Pulled By: anand1976 fbshipit-source-id: 85784378da913e05f1baf31ec1b4e7c9345e7f57	2019-08-28 16:11:56 -07:00
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually work as both an upper bound and a lower bound. History trimming will start if the number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order to mimic this behavior with the new option, history trimming will stop if dropping the next immutable memtable would cause the total memory usage to go below the size limit. For example, assume the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	2019-08-23 13:55:34 -07:00
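The trimming rule and the 20/30/30 example from the commit above can be sketched as follows. This is a simplified model, not the RocksDB implementation: `TrimHistorySketch` is a hypothetical name, sizes are plain integers (for example MB), the mutable memtable is not modeled separately, and a drop that leaves usage exactly at the limit is assumed to be allowed.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>

// Drops memtables from the front (oldest first) while the total usage
// exceeds `limit` and dropping the oldest one would not push the total
// below `limit`. Returns the number of memtables dropped.
std::size_t TrimHistorySketch(std::deque<uint64_t>* sizes_oldest_first,
                              uint64_t limit) {
  uint64_t total = 0;
  for (uint64_t s : *sizes_oldest_first) total += s;

  std::size_t dropped = 0;
  while (!sizes_oldest_first->empty() && total > limit &&
         total - sizes_oldest_first->front() >= limit) {
    total -= sizes_oldest_first->front();
    sizes_oldest_first->pop_front();
    ++dropped;
  }
  return dropped;
}
```

With sizes {20, 30, 30} and a limit of 64, the total of 80 exceeds the limit, but dropping the oldest memtable would leave 60, which is below 64, so nothing is dropped. With a limit of 50 instead, the oldest memtable is dropped (leaving 60) and trimming then stops, because dropping the next one would leave only 30.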
anand76	9046bdc5d3	Fix MultiGet() bug when whole_key_filtering is disabled (#5665 ) Summary: The batched MultiGet() implementation was not correctly handling bloom filter lookups when whole_key_filtering is disabled. It was incorrectly skipping keys not in the prefix_extractor domain, and not calling transform for keys in domain. This PR fixes both problems by moving the domain check and transformation to the FilterBlockReader. Tests: Unit test (confirmed failed before the fix) make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/5665 Differential Revision: D16902380 Pulled By: anand1976 fbshipit-source-id: a6be81ad68a6e37134a65246aec7a2c590eccf00	2019-08-21 10:23:23 -07:00
Yanqin Jin	5d9a67e718	Support loading custom objects in unit tests (#5676 ) Summary: Most existing RocksDB unit tests run on `Env::Default()`. It will be useful to port the unit tests to non-default environments, e.g. `HdfsEnv`, etc. This pull request is one step towards this goal. If RocksDB unit tests are built with a static library exposing a function `RegisterCustomObjects()`, then it is possible to implement custom object registrar logic in the library. RocksDB unit test can call `RegisterCustomObjects()` at the beginning. By default, `ROCKSDB_UNITTESTS_WITH_CUSTOM_OBJECTS_FROM_STATIC_LIBS` is not defined, thus this PR has no impact on existing RocksDB because `RegisterCustomObjects()` is a noop. Test plan (on devserver): ``` $make clean && COMPILE_WITH_ASAN=1 make -j32 all $make check ``` All unit tests must pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5676 Differential Revision: D16679157 Pulled By: riversand963 fbshipit-source-id: aca571af3fd0525277cdc674248d0fe06e060f9d	2019-08-09 15:12:08 -07:00
Yanqin Jin	7c76a7fba2	Support GetAllKeyVersions() for non-default cf (#5544 ) Summary: Previously `GetAllKeyVersions()` supported the default column family only. This PR adds support for other column families. Test plan (devserver): ``` $make clean && COMPILE_WITH_ASAN=1 make -j32 db_basic_test $./db_basic_test --gtest_filter=DBBasicTest.GetAllKeyVersions ``` All other unit tests must pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5544 Differential Revision: D16147551 Pulled By: riversand963 fbshipit-source-id: 5a61aece2a32d789e150226a9b8d53f4a5760168	2019-07-07 22:43:52 -07:00
anand76	7259e28d91	MultiGet parallel IO (#5464 ) Summary: Enhancement to MultiGet batching to read data blocks required for keys in a batch in parallel from disk. It uses Env::MultiRead() API to read multiple blocks and reduce latency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5464 Test Plan: 1. make check 2. make asan_check 3. make asan_crash Differential Revision: D15911771 Pulled By: anand1976 fbshipit-source-id: 605036b9af0f90ca0020dc87c3a86b4da6e83394	2019-06-30 20:56:04 -07:00
Yanqin Jin	340ed4fac7	Add support for timestamp in Get/Put (#5079 ) Summary: It's useful to be able to (optionally) associate key-value pairs with user-provided timestamps. This PR is an early effort towards this goal and continues the work of facebook#4942. A suite of new unit tests exists in DBBasicTestWithTimestampWithParam. Support for timestamp requires the user to provide the timestamp as a slice in `ReadOptions` and `WriteOptions`. All timestamps of the same database must share the same length, format, etc. The format of the timestamp is the same throughout the same database, and the user is responsible for providing a comparator function (Comparator) to order the <key, timestamp> tuples. Once created, the format and length of the timestamp cannot change (at least for now). Test plan (on devserver): ``` $COMPILE_WITH_ASAN=1 make -j32 all $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/* $make check ``` All tests must pass. We also run the following db_bench tests to verify whether there is a regression on Get/Put while timestamp is not enabled. ``` $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000 $TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 ``` Repeat 6 times for both versions. Results are as follows: ``` \| \| readrandom \| fillrandom \| \| master \| 16.77 MB/s \| 47.05 MB/s \| \| PR5079 \| 16.44 MB/s \| 47.03 MB/s \| ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5079 Differential Revision: D15132946 Pulled By: riversand963 fbshipit-source-id: 833a0d657eac21182f0f206c910a6438154c742c	2019-06-05 23:10:47 -07:00
Siying Dong	e9e0101ca4	Move test related files under util/ to test_util/ (#5377 ) Summary: There are too many types of files under util/. Some test related files don't belong there or are only loosely related. Move them to a new directory test_util/, so that util/ is cleaner. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377 Differential Revision: D15551366 Pulled By: siying fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c	2019-05-30 11:25:51 -07:00
Yi Zhang	3e63e553b4	Fix MultiGet ASSERT bug when passing unsorted result (#5195 ) Summary: Found this when test driving the new MultiGet. If you pass an unsorted result with sorted_result = false, you'll incorrectly trigger the ASSERT even though we sort further down. I've also added a simple test covering the sorted_result=true/false scenarios, copied from MultiGetSimple. anand1976 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5195 Differential Revision: D14935475 Pulled By: yizhang82 fbshipit-source-id: 1d2af5e3a003847d965066a16e3b19da68acf170	2019-04-15 11:35:21 -07:00
Yanqin Jin	3189398c00	Fix bugs detected by clang analyzer (#5185 ) Summary: as titled. False positive included, fixed anyway to make the check pass. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5185 Differential Revision: D14909384 Pulled By: riversand963 fbshipit-source-id: dc5177e72b1929ccfd6175a60e2cd7bdb9bd80f3	2019-04-12 10:45:56 -07:00
anand76	fefd4b98c5	Introduce a new MultiGet batching implementation (#5011 ) Summary: This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching. Batching is useful when there is some spatial locality to the keys being queried, as well as larger batch sizes. The main benefits are due to - 1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch() 2. Bloom filter cachelines can be prefetched, hiding the cache miss latency The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress. Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32). 
Batch Sizes
1 \| 2 \| 4 \| 8 \| 16 \| 32
Random pattern (Stride length 0)
4.158 \| 4.109 \| 4.026 \| 4.05 \| 4.1 \| 4.074 - Get
4.438 \| 4.302 \| 4.165 \| 4.122 \| 4.096 \| 4.075 - MultiGet (no batching)
4.461 \| 4.256 \| 4.277 \| 4.11 \| 4.182 \| 4.14 - MultiGet (w/ batching)
Good locality (Stride length 16)
4.048 \| 3.659 \| 3.248 \| 2.99 \| 2.84 \| 2.753
4.429 \| 3.728 \| 3.406 \| 3.053 \| 2.911 \| 2.781
4.452 \| 3.45 \| 2.833 \| 2.451 \| 2.233 \| 2.135
Good locality (Stride length 256)
4.066 \| 3.786 \| 3.581 \| 3.447 \| 3.415 \| 3.232
4.406 \| 4.005 \| 3.644 \| 3.49 \| 3.381 \| 3.268
4.393 \| 3.649 \| 3.186 \| 2.882 \| 2.676 \| 2.62
Medium locality (Stride length 4096)
4.012 \| 3.922 \| 3.768 \| 3.61 \| 3.582 \| 3.555
4.364 \| 4.057 \| 3.791 \| 3.65 \| 3.57 \| 3.465
4.479 \| 3.758 \| 3.316 \| 3.077 \| 2.959 \| 2.891
dbbench command used (on a DB with 4 levels, 12 million keys)- TEST_TMPDIR=/dev/shm numactl -C 10 ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011 Differential Revision: D14348703 Pulled By: anand1976 fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b	2019-04-11 14:28:26 -07:00
Michael Liu	ca89ac2ba9	Apply modernize-use-override (2nd iteration) Summary: Use C++11’s override and remove virtual where applicable. Changes are automatically generated. Reviewed By: Orvid Differential Revision: D14090024 fbshipit-source-id: 1e9432e87d2657e1ff0028e15370a85d1739ba2a	2019-02-14 14:41:36 -08:00
Siying Dong	cf3a671733	Remove cuckoo hash memtable (#4953 ) Summary: Cuckoo Hash is less useful than we initially expected. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4953 Differential Revision: D13979264 Pulled By: siying fbshipit-source-id: 2a60afdaa989f045357398b43a1cc5d46f4492ed	2019-02-07 16:15:27 -08:00
Anand Ananthabhotla	b9d6eccac1	Lock free MultiGet (#4754 ) Summary: Avoid locking the DB mutex in order to reference SuperVersions. Instead, we get the thread local cached SuperVersion for each column family in the list. It depends on finding a sequence number that overlaps with all the open memtables. We start with the latest published sequence number, and if any of the memtables is sealed before we can get all the SuperVersions, the process is repeated. After a few times, give up and lock the DB mutex. Tests: 1. Unit tests 2. make check 3. db_bench - TEST_TMPDIR=/dev/shm ./db_bench -use_existing_db=true -benchmarks=readrandom -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=5000000 -reads=1000000 -threads=32 -compression_type=none -cache_size=1048576000 -batch_size=1 -bloom_bits=1 readrandom : 0.167 micros/op 5983920 ops/sec; 426.2 MB/s (1000000 of 1000000 found) Multireadrandom with batch size 1: multireadrandom : 0.176 micros/op 5684033 ops/sec; (1000000 of 1000000 found) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4754 Differential Revision: D13363550 Pulled By: anand1976 fbshipit-source-id: 6243e8de7dbd9c8bb490a8eca385da0c855b1dd4	2019-01-02 11:42:54 -08:00
Sagar Vemuri	dc3528077a	Update all unique/shared_ptr instances to be qualified with namespace std (#4638 ) Summary: Ran the following commands to recursively change all the files under RocksDB: ``` find . -type f -name ".cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} + find . -type f -name ".cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} + ``` Running `make format` updated some formatting on the files touched. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638 Differential Revision: D12934992 Pulled By: sagar0 fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8	2018-11-09 11:19:58 -08:00
Bo Hou	cd9404bb77	xxhash 64 support Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4607 Reviewed By: siying Differential Revision: D12836696 Pulled By: jsjhoubo fbshipit-source-id: 7122ccb712d0b0f1cd998aa4477e0da1401bd870	2018-11-01 15:44:06 -07:00
Maysam Yabandeh	8581a93a6b	Per-thread unique test db names (#4135 ) Summary: The patch makes sure that two parallel test threads will operate on different db paths. This enables using open source tools such as gtest-parallel to run the tests of a file in parallel. Example: ``` ~/gtest-parallel/gtest-parallel ./table_test``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/4135 Differential Revision: D8846653 Pulled By: maysamyabandeh fbshipit-source-id: 799bad1abb260e3d346bcb680d2ae207a852ba84	2018-07-13 17:27:39 -07:00
amytai	0a3db28d98	Disallow compactions if there isn't enough free space Summary: This diff handles cases where compaction causes an ENOSPC error. This does not handle corner cases where another background job is started while compaction is running, and the other background job triggers ENOSPC, although we do allow the user to provision for these background jobs with SstFileManager::SetCompactionBufferSize. It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC. Usage: Functionality is inside SstFileManager. In particular, users should set SstFileManager::SetMaxAllowedSpaceUsage, which is the reference highwatermark for determining whether to cancel compactions. Closes https://github.com/facebook/rocksdb/pull/3449 Differential Revision: D7016941 Pulled By: amytai fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445	2018-03-06 16:27:54 -08:00
Andrew Kryczka	5d68243e61	Comment out unused variables Summary: Submitting on behalf of another employee. Closes https://github.com/facebook/rocksdb/pull/3557 Differential Revision: D7146025 Pulled By: ajkr fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e	2018-03-05 13:13:41 -08:00
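
Note on commit 2f41ecfe75 (history trimming): the stopping rule described in that message can be illustrated with a small standalone sketch. This is not RocksDB code. `TrimHistory` is a hypothetical helper, sizes are plain integers in MB listed oldest first, and the mutable memtable is left out for brevity. It only models the rule as stated in the commit message: drop the oldest memtable while usage exceeds the limit, but stop if the next drop would take usage below the limit.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <numeric>

// Hypothetical helper (not part of RocksDB): trims the oldest entries of
// `sizes_mb` (oldest first) and returns how many were dropped.
std::size_t TrimHistory(std::deque<std::size_t>& sizes_mb,
                        std::size_t limit_mb) {
  std::size_t total =
      std::accumulate(sizes_mb.begin(), sizes_mb.end(), std::size_t{0});
  std::size_t dropped = 0;
  while (!sizes_mb.empty() && total > limit_mb) {
    const std::size_t oldest = sizes_mb.front();
    assert(total >= oldest);  // total is the sum of all remaining entries
    // Stop if dropping the oldest entry would take usage below the limit.
    if (total - oldest < limit_mb) {
      break;
    }
    total -= oldest;
    sizes_mb.pop_front();
    ++dropped;
  }
  return dropped;
}
```

With the numbers from the commit message (sizes 20, 30, 30 and a 64MB limit) the sketch drops nothing, because removing the oldest memtable would leave 60MB, which is below the limit. With a 50MB limit it would drop only the 20MB memtable.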

66 Commits