rocksdb

Author	SHA1	Message	Date
Peter Dillinger	685e895652	Prepare filter tests for more implementations (#5967 ) Summary: This change sets up for alternate implementations underlying BloomFilterPolicy: * Refactor BloomFilterPolicy and expose in internal .h file so that it's easy to iterate over / select implementations for testing, regardless of what the best public interface will look like. Most notably updated db_bloom_filter_test to use this. * Hide FullFilterBitsBuilder from unit tests (alternate derived classes planned); expose the part important for testing (CalculateSpace), as abstract class BuiltinFilterBitsBuilder. (Also cleaned up internally exposed interface to CalculateSpace.) * Rename BloomTest -> BlockBasedBloomTest for clarity (despite ongoing confusion between block-based table and block-based filter) * Assert that block-based filter construction interface is only used on BloomFilterPolicy appropriately constructed. (A couple of tests updated to add ", true".) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5967 Test Plan: make check Differential Revision: D18138704 Pulled By: pdillinger fbshipit-source-id: 55ef9273423b0696309e251f50b8c1b5e9ec7597	2019-10-31 14:12:33 -07:00
Peter Dillinger	013babc685	Clean up some filter tests and comments (#5960 ) Summary: Some filtering tests were unfriendly to new implementations of FilterBitsBuilder because of dynamic_cast to FullFilterBitsBuilder. Most of those have now been cleaned up, worked around, or at least changed from crash on dynamic_cast failure to individual test failure. Also put some clarifying comments on filter-related APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5960 Test Plan: make check Differential Revision: D18121223 Pulled By: pdillinger fbshipit-source-id: e83827d9d5d96315d96f8e25a99cd70f497d802c	2019-10-24 18:48:16 -07:00
Peter Dillinger	ca7ccbe2ea	Misc hashing updates / upgrades (#5909 ) Summary: - Updated our included xxhash implementation to version 0.7.2 (== the latest dev version as of 2019-10-09). - Using XXH_NAMESPACE (like other fb projects) to avoid potential name collisions. - Added fastrange64, and unit tests for it and fastrange32. These are faster alternatives to hash % range. - Use preview version of XXH3 instead of MurmurHash64A for NPHash64 -- Had to update cache_test to increase probability of passing for any given hash function. - Use fastrange64 instead of % with uses of NPHash64 -- Had to fix WritePreparedTransactionTest.CommitOfDelayedPrepared to avoid deadlock apparently caused by new hash collision. - Set default seed for NPHash64 because specifying a seed rarely makes sense for it. - Removed unnecessary include xxhash.h in a popular .h file - Rename preview version of XXH3 to XXH3p for clarity and to ease backward compatibility in case final version of XXH3 is integrated. Relying on existing unit tests for NPHash64-related changes. Each new implementation of fastrange64 passed unit tests when manipulating my local build to select it. I haven't done any integration performance tests, but I consider the improved performance of the pieces being swapped in to be well established. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5909 Differential Revision: D18125196 Pulled By: pdillinger fbshipit-source-id: f6bf83d49d20cbb2549926adf454fd035f0ecc0d	2019-10-24 17:16:46 -07:00
Peter Dillinger	ec11eff3bc	FilterPolicy consolidation, part 2/2 (#5966 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. This change is step 2 of 2: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters remain in util/bloom_impl.h, and util/bloom_test.cc remains where it is for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5966 Test Plan: make check Differential Revision: D18124930 Pulled By: pdillinger fbshipit-source-id: 823bc09025b3395f092ef46a46aa5ba92a914d84	2019-10-24 15:44:51 -07:00
Peter Dillinger	dd19014a7a	FilterPolicy consolidation, part 1/2 (#5963 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. I don't foresee sharing these APIs with e.g. the Plain Table because they don't expose hashes for reuse in indexing. This change is step 1 of 2: (a) mv table/full_filter_bits_builder.h to table/block_based/filter_policy_internal.h which I expect to expand soon to internally reveal more implementation details for testing. (b) consolidate eventual contents of table/block_based/filter_policy.cc in util/bloom.cc, which has the most elaborate revision history (see step 2 ...) Step 2 soon to follow: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters are in util/bloom_impl.h. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5963 Test Plan: make check Differential Revision: D18121199 Pulled By: pdillinger fbshipit-source-id: 8f21732c3d8909777e3240e4ac3123d73140326a	2019-10-24 13:20:35 -07:00
Peter Dillinger	27a124571f	Fix memory leak on error opening PlainTable (#5951 ) Summary: Several error paths in opening of a plain table would leak memory. PR https://github.com/facebook/rocksdb/issues/5940 opened the leak to one more error path, which happens to have been (mistakenly) exercised by CuckooTableDBTest.AdaptiveTable. That test has been fixed, and the exercising of plain table error cases (more than before) has been added as BadOptions1 and BadOptions2 to PlainTableDBTest. This effectively moved the memory leak to plain_table_db_test. Also here is a cheap fix for the memory leak, without (yet?) changing the signature of ReadTableProperties. This fixes ASAN on unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5951 Test Plan: make COMPILE_WITH_ASAN=1 check Differential Revision: D18051940 Pulled By: pdillinger fbshipit-source-id: e2952930c09a2b46c4f1ff09818c5090426929de	2019-10-21 16:53:06 -07:00
sdong	30e2dc02f0	Fix VerifyChecksum readahead with mmap mode (#5945 ) Summary: A recent change introduced readahead inside VerifyChecksum(). However it is not compatible with mmap mode and generated wrong checksum verification failure. Fix it by not enabling readahead in mmap mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5945 Test Plan: Add a unit test that used to fail. Differential Revision: D18021443 fbshipit-source-id: 6f2eb600f81b26edb02222563a4006869d576bff	2019-10-21 11:38:30 -07:00
Levi Tamasi	29ccf2075c	Store the filter bits reader alongside the filter block contents (#5936 ) Summary: Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that only the filter block contents are stored in the block cache (as opposed to the earlier design where the cache stored the filter block reader itself, leading to potentially dangling pointers and concurrency bugs). However, this change introduced a performance hit since with the new code, the metadata fields are re-parsed upon every access. This patch reunites the block contents with the filter bits reader to eliminate this overhead; since this is still a self-contained pure data object, it is safe to store it in the cache. (Note: this is similar to how the zstd digest is handled.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936 Test Plan: make asan_check filter_bench results for the old code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.7153 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.4258 Single filter ns/op: 42.5974 Random filter ns/op: 217.861 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.4217 Single filter ns/op: 50.9855 Random filter ns/op: 219.167 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5172 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 32.3556 Single filter ns/op: 83.2239 Random filter ns/op: 370.676 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.2265 Single filter ns/op: 93.5651 Random filter ns/op: 408.393 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` With the new code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 25.4285 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 31.0594 Single filter ns/op: 43.8974 Random filter ns/op: 226.075 ---------------------------- Outside queries... Dry run (25d) ns/op: 31.0295 Single filter ns/op: 50.3824 Random filter ns/op: 226.805 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5308 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.2968 Single filter ns/op: 58.6163 Random filter ns/op: 291.434 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.1839 Single filter ns/op: 66.9039 Random filter ns/op: 292.828 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` Differential Revision: D17991712 Pulled By: ltamasi fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29	2019-10-18 19:32:59 -07:00
Peter Dillinger	fe464bca5c	Fix PlainTableReader not to crash sst_dump (#5940 ) Summary: Plain table SSTs could crash sst_dump because of a bug in PlainTableReader that can leave table_properties_ as null. Even if it was intended not to keep the table properties in some cases, they were leaked on the offending code path. Steps to reproduce: $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12 $ sst_dump --file=0000xx.sst --show_properties from [] to [] Process /dev/shm/dbbench/000014.sst Sst file format: plain table Raw user collected properties ------------------------------ Segmentation fault (core dumped) Also added missing unit testing of plain table full_scan_mode, and an assertion in NewIterator to check for regression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940 Test Plan: new unit test, manual, make check Differential Revision: D18018145 Pulled By: pdillinger fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07	2019-10-18 14:44:42 -07:00
Maysam Yabandeh	4e729f9095	Fix SeekForPrev bug with Partitioned Filters and Prefix (#5907 ) Summary: Partition Filters make use of a top-level index to find the partition that might have the bloom hash of the key. The index is with internal key format (before format version 3). Each partition contains the i) blooms of the keys in that range ii) bloom of prefixes of keys in that range, iii) the bloom of the prefix of the last key in the previous partition. When ::SeekForPrev(key), we first perform a prefix bloom test on the SST file. The partition however is identified using the full internal key, rather than the prefix key. The reason is to be compatible with the internal key format of the top-level index. This creates a corner case. Example: - SST k, Partition N: P1K1, P1K2 - SST k, top-level index: P1K2 - SST k+1, Partition 1: P2K1, P3K1 - SST k+1 top-level index: P3K1 When SeekForPrev(P1K3), it should point us to P1K2. However SST k top-level index would reject P1K3 since it is out of range. One possible fix would be to search with the prefix P1 (instead of full internal key P1K3) however the details of properly comparing prefix with full internal key might get complicated. The fix we apply in this PR is to look into the last partition anyway even if the key is out of range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5907 Differential Revision: D17889918 Pulled By: maysamyabandeh fbshipit-source-id: 169fd7b3c71dbc08808eae5a8340611ebe5bdc1e	2019-10-11 20:30:00 -07:00
Andrew Kryczka	b00761eea6	Fix block cache ID uniqueness for Windows builds (#5844 ) Summary: Since we do not evict a file's blocks from block cache before that file is deleted, we require a file's cache ID prefix is both unique and non-reusable. However, the Windows functionality we were relying on only guaranteed uniqueness. That meant a newly created file could be assigned the same cache ID prefix as a deleted file. If the newly created file had block offsets matching the deleted file, full cache keys could be exactly the same, resulting in obsolete data blocks returned from cache when trying to read from the new file. We noticed this when running on FAT32 where compaction was writing out of order keys due to reading obsolete blocks from its input files. The functionality is documented as behaving the same on NTFS, although I wasn't able to repro it there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844 Test Plan: we had a reliable repro of out-of-order keys on FAT32 that was fixed by this change Differential Revision: D17752442 fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00	2019-10-11 18:19:31 -07:00
Vijay Nadimpalli	4c49e38f15	MultiGet batching in memtable (#5818 ) Summary: RocksDB has a MultiGet() API that implements batched key lookup for higher performance (https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L468). Currently, batching is implemented in BlockBasedTableReader::MultiGet() for SST file lookups. One of the ways it improves performance is by pipelining bloom filter lookups (by prefetching required cachelines for all the keys in the batch, and then doing the probe) and thus hiding the cache miss latency. The same concept can be extended to the memtable as well. This PR involves implementing a pipelined bloom filter lookup in DynamicBloom, and implementing MemTable::MultiGet() that can leverage it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5818 Test Plan: Existing tests Performance Test: Ran the below command which fills up the memtable and makes sure there are no flushes and then call multiget. Ran it on master and on the new change and see atleast 1% performance improvement across all the test runs I did. Sometimes the improvement was upto 5%. TEST_TMPDIR=/data/users/$USER/benchmarks/feature/ numactl -C 10 ./db_bench -benchmarks="fillseq,multireadrandom" -num=600000 -compression_type="none" -level_compaction_dynamic_level_bytes -write_buffer_size=200000000 -target_file_size_base=200000000 -max_bytes_for_level_base=16777216 -reads=90000 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 -statistics -memtable_whole_key_filtering=true -memtable_bloom_size_ratio=10 Differential Revision: D17578869 Pulled By: vjnadimpalli fbshipit-source-id: 23dc651d9bf49db11d22375bf435708875a1f192	2019-10-10 09:39:39 -07:00
Peter Dillinger	46ca51d430	filter_bench - a prelim tool for SST filter benchmarking (#5825 ) Summary: Example: using the tool before and after PR https://github.com/facebook/rocksdb/issues/5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): Before: - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 After: + Dry run ns/op: 22.2301 + Single filter run ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5825 Differential Revision: D17804987 Pulled By: pdillinger fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5	2019-10-07 20:10:53 -07:00
anand76	19a97dd139	Fix data block upper bound checking for iterator reseek case (#5883 ) Summary: When an iterator reseek happens with the user specifying a new iterate_upper_bound in ReadOptions, and the new seek position is at the end of the same data block, the Seek() ends up using a stale value of data_block_within_upper_bound_ and may return incorrect results. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5883 Test Plan: Added a new test case DBIteratorTest.IterReseekNewUpperBound. Verified that it failed due to the assertion failure without the fix, and passes with the fix. Differential Revision: D17752740 Pulled By: anand1976 fbshipit-source-id: f9b635ff5d6aeb0e1bef102cf8b2f900efd378e3	2019-10-03 20:53:29 -07:00
sdong	d783af1857	Fix a timer bug in MergingIterator::Seek() caused by #5871 (#5874 ) Summary: Conflict resolving in 846e05005d78dfd4276cce6753967cb16930aabb ("Revert "Merging iterator to avoid child iterator reseek for some cases") caused some timer misplaced. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5874 Test Plan: See it build. Differential Revision: D17705073 fbshipit-source-id: 9bd3a8dc4901ac33c2c6fc5b1091ffbc56a8529f	2019-10-01 19:26:18 -07:00
Yanqin Jin	9f31df8679	Fix compilation error (#5872 ) Summary: Without this fix, compiler complains. ``` $ROCKSDB_NO_FBCODE=1 USE_CLANG=1 make ldb table/block_based/full_filter_block.cc: In constructor ‘rocksdb::FullFilterBlockBuilder::FullFilterBlockBuilder(const rocksdb::SliceTransform, bool, rocksdb::FilterBitsBuilder)’: table/block_based/full_filter_block.cc:20:43: error: declaration of ‘prefix_extractor’ shadows a member of 'this' [-Werror=shadow] FilterBitsBuilder* filter_bits_builder) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5872 Test Plan: ``` $ROCKSDB_NO_FBCODE=1 make all ``` Differential Revision: D17690058 Pulled By: riversand963 fbshipit-source-id: 19e3d9bd86e1123847095240e73d30da5d66240e	2019-10-01 14:07:13 -07:00
sdong	846e05005d	Revert "Merging iterator to avoid child iterator reseek for some cases (#5286 )" (#5871 ) Summary: This reverts commit 9fad3e21eb90d215b6719097baba417bc1eeca3c. Iterator verification in stress tests sometimes fail for assertion table/block_based/block_based_table_reader.cc:2973: void rocksdb::BlockBasedTableIterator<TBlockIter, TValue>::FindBlockForward() [with TBlockIter = rocksdb::DataBlockIter; TValue = rocksdb::Slice]: Assertion `!next_block_is_out_of_bound \|\| user_comparator_.Compare(*read_options_.iterate_upper_bound, index_iter_->user_key()) <= 0' failed. It is likely to be linked to https://github.com/facebook/rocksdb/pull/5286 together with https://github.com/facebook/rocksdb/pull/5468 as the former PR makes some child iterator's seek being avoided, so that upper bound condition fails to be updated there. Strictly speaking, the former PR was merged before the latter one, but the latter one feels a more important improvement so I choose to revert the former one for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5871 Differential Revision: D17689196 fbshipit-source-id: 4ded5be68f67bee2782d31a29cb72ea68f59dd8c	2019-10-01 11:22:41 -07:00
Maysam Yabandeh	6652c94f59	Fix a bug in format_version 3 + partition filters + prefix search (#5835 ) Summary: Partitioned filters make use of a top-level index to find the partition in which the filter resides. The top-level index has a key per partition. The key is guaranteed to be larger or equal than any key in that partition. When used with format_version 3, which excludes the sequence number form index keys, the separator key in the index could be equal to the prefix of the keys in the next partition. In this way, when searching for the key, the top-level index will lead us to the previous partition, which has no key with that prefix. The prefix bloom test thus returns false, although the prefix exists in the bloom of the next partition. The patch fixes that by a hack: It always adds the prefix of the first key of the next partition to the bloom of the current partition. In this way, in the corner cases that the index will lead us to the previous partition, we still can find the bloom filter there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5835 Differential Revision: D17513585 Pulled By: maysamyabandeh fbshipit-source-id: e2d1ff26c759e6e03875c4d57f4228316ecf50e9	2019-09-24 14:00:11 -07:00
Levi Tamasi	c9932d18cc	Add class comment for Block Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5832 Differential Revision: D17550773 Pulled By: ltamasi fbshipit-source-id: 66972bb008516e55b6fbba58ddd10234346d5d11	2019-09-24 11:02:11 -07:00
sdong	e8263dbdaa	Apply formatter to recent 200+ commits. (#5830 ) Summary: Further apply formatter to more recent commits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830 Test Plan: Run all existing tests. Differential Revision: D17488031 fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab	2019-09-20 12:04:26 -07:00
Maysam Yabandeh	6ec6a4a9a4	Remove snap_refresh_nanos option (#5826 ) Summary: The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826 Differential Revision: D17467147 Pulled By: maysamyabandeh fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0	2019-09-18 20:26:04 -07:00
sdong	43a340bebe	Merging iterator to disble reseek optimization in prefix seek (#5815 ) Summary: We are seeing a bug of wrong results with merging iterator's reseek avoidence feature and prefix extractor. Disable this optimization for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5815 Test Plan: Validated the same MyRocks case was fixed; run all existing tests. Differential Revision: D17430776 fbshipit-source-id: aef664277ba0ab8a2e68331ff0db6ae682535371	2019-09-17 17:10:29 -07:00
Peter Dillinger	68626249c3	Refactor/consolidate legacy Bloom implementation details (#5784 ) Summary: Refactoring to consolidate implementation details of legacy Bloom filters. This helps to organize and document some related, obscure code. Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to compile and run unit tests for non-native cache line size. (Fixed a related test failure in db_properties_test.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784 Test Plan: make check, including Recently added Bloom schema unit tests (in ./plain_table_db_test && ./bloom_test), and including with TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the schema tests with temporary fault injection into new implementations. Some performance testing with modified unit tests suggest a small to moderate improvement in speed. Differential Revision: D17381384 Pulled By: pdillinger fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df	2019-09-16 16:17:09 -07:00
Maysam Yabandeh	638d239507	Charge block cache for cache internal usage (#5797 ) Summary: For our default block cache, each additional entry has extra memory overhead. It include LRUHandle (72 bytes currently) and the cache key (two varint64, file id and offset). The usage is not negligible. For example for block_size=4k, the overhead accounts for an extra 2% memory usage for the cache. The patch charging the cache for the extra usage, reducing untracked memory usage outside block cache. The feature is enabled by default and can be disabled by passing kDontChargeCacheMetadata to the cache constructor. This PR builds up on https://github.com/facebook/rocksdb/issues/4258 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5797 Test Plan: - Existing tests are updated to either disable the feature when the test has too much dependency on the old way of accounting the usage or increasing the cache capacity to account for the additional charge of metadata. - The Usage tests in cache_test.cc are augmented to test the cache usage under kFullChargeCacheMetadata. Differential Revision: D17396833 Pulled By: maysamyabandeh fbshipit-source-id: 7684ccb9f8a40ca595e4f5efcdb03623afea0c6f	2019-09-16 15:26:21 -07:00
Peter Dillinger	d3a6726f02	Revert changes from PR#5784 accidentally in PR#5780 (#5810 ) Summary: This will allow us to fix history by having the code changes for PR#5784 properly attributed to it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810 Differential Revision: D17400231 Pulled By: pdillinger fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09	2019-09-16 11:38:53 -07:00
sdong	b931f84e56	Divide file_reader_writer.h and .cc (#5803 ) Summary: file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/ Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803 Test Plan: Build whole project using make and cmake. Differential Revision: D17374550 fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987	2019-09-16 10:33:51 -07:00
sdong	bf5dbc17e3	merging_iterator.cc: Small refactoring (#5793 ) Summary: 1. Put the similar logic of adding valid iterator to heap and check invalid iterator's status code to the same helper functions. 2. Because of 1, in the changing direction case, move around the places where we check status a little bit so that we can call the helper function there too. The logic would only divert in the case where the iterator is valid but status is not OK, which is not expected to happen. Add an assertion for that. 3. Put the logic of changing direction from forward to backward to a separate function so the unlikely code path is not in Prev(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5793 Test Plan: run all existing tests. Differential Revision: D17374397 fbshipit-source-id: d595ffcf156095c4bd0f5532bacba854482a2332	2019-09-13 16:01:13 -07:00
Peter Dillinger	aa2486b23c	Refactor some confusing logic in PlainTableReader Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5780 Test Plan: existing plain table unit test Differential Revision: D17368629 Pulled By: pdillinger fbshipit-source-id: f25409cdc2f39ebe8d5cbb599cf820270e6b5d26	2019-09-13 10:26:36 -07:00
Shylock Hg	9eb3e1f77d	Use delete to disable automatic generated methods. (#5009 ) Summary: Use delete to disable automatic generated methods instead of private, and put the constructor together for more clear.This modification cause the unused field warning, so add unused attribute to disable this warning. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009 Differential Revision: D17288733 fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3	2019-09-11 18:09:00 -07:00
Wilfried Goesgens	fbab9913e2	upgrade gtest 1.7.0 => 1.8.1 for json result writing Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332 Differential Revision: D17242232 fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64	2019-09-09 11:24:11 -07:00
Peter Dillinger	b55b2f45d0	Faster new DynamicBloom implementation (for memtable) (#5762 ) Summary: Since DynamicBloom is now only used in-memory, we're free to change it without schema compatibility issues. The new implementation is drawn from (with manifest permission) `303542a767/bloom_simulation_tests/foo.cc (L613)` This has several speed advantages over the prior implementation: * Uses fastrange instead of % * Minimum logic to determine first (and all) probed memory addresses * (Major) Two probes per 64-bit memory fetch/write. * Very fast and effective (murmur-like) hash expansion/re-mixing. (At least on recent CPUs, integer multiplication is very cheap.) While a Bloom filter with 512-bit cache locality has about a 1.15x FP rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to 1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples, unlike the old implementation with more erratic FP rates. Especially for the memtable, we expect speed to outweigh somewhat higher FP rates. For example, a negative table query would have to be 1000x slower than a BF query to justify doubling BF query time to shave 10% off FP rate (working assumption around 1% FP rate). While that seems likely for SSTs, my data suggests a speed factor of roughly 50x for the memtable (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom filter, after this change). Thus, it's probably not worth even 5% more time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1% in absolute terms, and it's probably at least 20% slower to recoup that much FP rate from this new implementation. Because of this, we do not see a need for a 'locality' option that affects the MemTable Bloom filter and have decoupled the MemTable Bloom filter from Options::bloom_locality. Note that just 3% more memory to the Bloom filter (10.3 bits per key vs. just 10) is able to make up for the ~12% FP rate drop in the new implementation: [] # Nearly "ideal" FP-wise but reasonably fast cache-local implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ... [] # Close match to this new implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ... [] # Old locality=1 implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ... Also note the dramatic speed improvement vs. alternatives. -- Performance unit test: DynamicBloomTest.concurrent_with_perf is updated to report more precise timing data. (Measure running time of each thread, not just longest running thread, etc.) Results averaged over various sizes enabled with --enable_perf and 20 runs each; old dynamic bloom refers to locality=1, the faster of the old: old dynamic bloom, avg add latency = 65.6468 new dynamic bloom, avg add latency = 44.3809 old dynamic bloom, avg query latency = 50.6485 new dynamic bloom, avg query latency = 43.2186 old avg parallel add latency = 41.678 new avg parallel add latency = 24.5238 old avg parallel hit latency = 14.6322 new avg parallel hit latency = 12.3939 old avg parallel miss latency = 16.7289 new avg parallel miss latency = 12.2134 Tested on a dedicated 64-bit production machine at Facebook. Significant improvement all around. Despite now using std::atomic<uint64_t>, quick before-and-after test on a 32-bit machine (Intel Atom N270, released 2008) shows no regression in performance, in some cases modest improvement. -- Performance integration test (synthetic): with DEBUG_LEVEL=0, used TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000 and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01 300 runs each configuration. Write throughput change by enabling memtable bloom: Old locality=0: -3.06% Old locality=1: -2.37% New: -1.50% conclusion -> seems to substantially close the gap Readmissing throughput change by enabling memtable bloom: Old locality=0: +34.47% Old locality=1: +34.80% New: +33.25% conclusion -> maybe a small new penalty from FP rate Readrandom throughput change by enabling memtable bloom: Old locality=0: +31.54% Old locality=1: +31.13% New: +30.60% conclusion -> maybe also from FP rate (after memtable flush) -- Another conclusion we can draw from this new implementation is that the existing 32-bit hash function is not inherently crippling the Bloom filter speed or accuracy, below about 5 million keys. For speed, the implementation is essentially the same whether starting with 32-bits or 64-bits of hash; it just determines whether the first multiplication after fastrange is a pseudorandom expansion or needed re-mix. Note that this multiplication can occur while memory is fetching. For accuracy, in a standard configuration, you need about 5 million keys before you have about a 1.1x FP penalty due to using a 32-bit hash vs. 64-bit: [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ... [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762 Differential Revision: D17214194 Pulled By: pdillinger fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595	2019-09-05 14:59:25 -07:00
Peter Dillinger	20dec1401f	Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767 ) Summary: DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767 Differential Revision: D17206963 Pulled By: pdillinger fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553	2019-09-05 10:05:20 -07:00
Shafreeck Sea	ab0645a596	Fix comment of function NotifyCollectTableCollectorsOnFinish (#5738 ) Summary: Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5738 Differential Revision: D17097075 Pulled By: riversand963 fbshipit-source-id: ed01b5f59e8eed262a49abe1f96552842d364af1	2019-08-29 10:57:01 -07:00
anand76	e10570331d	Support row cache with batched MultiGet (#5706 ) Summary: This PR adds support for row cache in ```rocksdb::TableCache::MultiGet```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5706 Test Plan: 1. Unit tests in db_basic_test 2. db_bench results with batch size of 2 (```Get``` is faster than ```MultiGet``` for single key) - Get - readrandom : 3.935 micros/op 254116 ops/sec; 28.1 MB/s (22870998 of 22870999 found) MultiGet - multireadrandom : 3.743 micros/op 267190 ops/sec; (24047998 of 24047998 found) Command used - TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench -use_existing_db=true -use_existing_keys=false -benchmarks="readtorowcache,[read\|multiread]random" -write_buffer_size=16777216 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -row_cache_size=4194304000 -batch_size=2 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=131072 Differential Revision: D17086297 Pulled By: anand1976 fbshipit-source-id: 85784378da913e05f1baf31ec1b4e7c9345e7f57	2019-08-28 16:11:56 -07:00
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	2019-08-23 13:55:34 -07:00
Levi Tamasi	df8c307d63	Revert to storing UncompressionDicts in the cache (#5645 ) Summary: PR https://github.com/facebook/rocksdb/issues/5584 decoupled the uncompression dictionary object from the underlying block data; however, this defeats the purpose of the digested ZSTD dictionary, since the whole point of the digest is to create it once and reuse it over and over again. This patch goes back to storing the uncompression dictionary itself in the cache (which should be now safe to do, since it no longer includes a Statistics pointer), while preserving the rest of the refactoring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5645 Test Plan: make asan_check Differential Revision: D16551864 Pulled By: ltamasi fbshipit-source-id: 2a7e2d34bb16e70e3c816506d5afe1d842057800	2019-08-23 08:27:30 -07:00
Patrick Pei	202942b20c	Fix local includes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5722 Differential Revision: D16908380 fbshipit-source-id: 6a0e3cb2730b08d6012d3d7f31c937f01c399846	2019-08-22 16:21:47 -07:00
Maysam Yabandeh	244e6f2002	Refactor MultiGet names in BlockBasedTable (#5726 ) Summary: To improve code readability, since RetrieveBlock already calls MaybeReadBlockAndLoadToCache, we avoid name similarity of the functions that call RetrieveBlock with MaybeReadBlockAndLoadToCache. The patch thus renames MaybeLoadBlocksToCache to RetrieveMultipleBlock and deletes GetDataBlockFromCache, which contains only two lines. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5726 Differential Revision: D16962535 Pulled By: maysamyabandeh fbshipit-source-id: 99e8946808ce4eb7857592b9003812e3004f92d6	2019-08-22 08:49:00 -07:00
anand76	9046bdc5d3	Fix MultiGet() bug when whole_key_filtering is disabled (#5665 ) Summary: The batched MultiGet() implementation was not correctly handling bloom filter lookups when whole_key_filtering is disabled. It was incorrectly skipping keys not in the prefix_extractor domain, and not calling transform for keys in domain. This PR fixes both problems by moving the domain check and transformation to the FilterBlockReader. Tests: Unit test (confirmed failed before the fix) make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/5665 Differential Revision: D16902380 Pulled By: anand1976 fbshipit-source-id: a6be81ad68a6e37134a65246aec7a2c590eccf00	2019-08-21 10:23:23 -07:00
sdong	e1c468d16f	Do readahead in VerifyChecksum() (#5713 ) Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413	2019-08-16 16:42:56 -07:00
Zhongyi Xie	e89b1c9c6e	add missing check for hash index when calling BlockBasedTableIterator (#5712 ) Summary: Previous PR https://github.com/facebook/rocksdb/pull/3601 added support for making prefix_extractor dynamically mutable. However, there was a missing check for hash index when creating new BlockBasedTableIterator. While the check may be redundant because no other types of IndexReader makes uses of the flag, it is less error-prone to add the missing check so that future index reader implementation will not worry about violating the contract. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5712 Differential Revision: D16842052 Pulled By: miasantreble fbshipit-source-id: aef11c0ff7a690ed248f5b8fe23481cac486b381	2019-08-16 16:39:49 -07:00
Eli Pozniansky	c2404d9928	Optimizing ApproximateSize to create index iterator just once (#5693 ) Summary: VersionSet::ApproximateSize doesn't need to create two separate index iterators and do binary search for each in BlockBasedTable. So BlockBasedTable::ApproximateSize was added that creates the iterator once and uses it to calculate the data size between start and end keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5693 Differential Revision: D16774056 Pulled By: elipoz fbshipit-source-id: 53ce262e1a057788243bf30cd9b8aa6581df1a18	2019-08-16 14:18:28 -07:00
Levi Tamasi	d92a59b6f2	Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705 ) Summary: PR https://github.com/facebook/rocksdb/issues/5298 (and subsequent related patches) unintentionally changed the semantics of cache_index_and_filter_blocks: historically, this option only affected the main index/filter block; with the changes, it affects index/filter partitions as well. This can cause performance issues when cache_index_and_filter_blocks is false since in this case, partitions are neither cached nor preloaded (i.e. they are loaded on demand upon each access). The patch reverts to the earlier behavior, that is, partitions are cached similarly to data blocks regardless of the value of the above option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5705 Test Plan: make check ./db_bench -benchmarks=fillrandom --statistics --stats_interval_seconds=1 --duration=30 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false ./db_bench -benchmarks=readrandom --use_existing_db --statistics --stats_interval_seconds=1 --duration=10 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false --cache_size=8000000000 Relevant statistics from the readrandom benchmark with the old code: rocksdb.block.cache.index.miss COUNT : 0 rocksdb.block.cache.index.hit COUNT : 0 rocksdb.block.cache.index.add COUNT : 0 rocksdb.block.cache.index.bytes.insert COUNT : 0 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 0 rocksdb.block.cache.filter.hit COUNT : 0 rocksdb.block.cache.filter.add COUNT : 0 rocksdb.block.cache.filter.bytes.insert COUNT : 0 rocksdb.block.cache.filter.bytes.evict COUNT : 0 With the new code: rocksdb.block.cache.index.miss COUNT : 2500 rocksdb.block.cache.index.hit COUNT : 42696 rocksdb.block.cache.index.add COUNT : 2500 rocksdb.block.cache.index.bytes.insert COUNT : 4050048 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 2500 rocksdb.block.cache.filter.hit COUNT : 4550493 rocksdb.block.cache.filter.add COUNT : 2500 rocksdb.block.cache.filter.bytes.insert COUNT : 10331040 rocksdb.block.cache.filter.bytes.evict COUNT : 0 Differential Revision: D16817382 Pulled By: ltamasi fbshipit-source-id: 28a516b0da1f041a03313e0b70b28cf5cf205d00	2019-08-14 18:16:06 -07:00
Yi Zhang	04a849b7b4	Fix compiler error by deleting GetContext default ctor (#5685 ) Summary: When updating compiler version for MyRocks I'm seeing this error with rocksdb: ``` ome/yzha/mysql/mysql-fork2/rocksdb/table/get_context.h:91:3: error: explicitly defaulted default constructor is implicitly deleted [-Werror,-Wdefaulted-function-deleted] GetContext() = default; ^ /home/yzha/mysql/mysql-fork2/rocksdb/table/get_context.h:166:18: note: default constructor of 'GetContext' is implicitly deleted because field 'tracing_get_id_' of const-qualified type 'const uint64_t' (aka 'const unsigned long') would not be initialized const uint64_t tracing_get_id_; ^ ``` The error itself is rather self explanatory and makes sense. Given that no one seems to be using the default ctor (they shouldn't, anyway), I'm deleting it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5685 Differential Revision: D16747712 Pulled By: yizhang82 fbshipit-source-id: 95c0acb958a1ed41154c0047d2e6fce7644de53f	2019-08-12 16:42:10 -07:00
Vijay Nadimpalli	d150e01474	New API to get all merge operands for a Key (#5604 ) Summary: This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases: 1. Update subset of columns and read subset of columns - Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU. 2. Updating very few attributes in a value which is a JSON-like document - Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge. ---------------------------------------------------------------------------------------------------- API : Status GetMergeOperands( const ReadOptions& options, ColumnFamilyHandle* column_family, const Slice& key, PinnableSlice* merge_operands, GetMergeOperandsOptions* get_merge_operands_options, int* number_of_operands) Example usage : int size = 100; int number_of_operands = 0; std::vector<PinnableSlice> values(size); GetMergeOperandsOptions merge_operands_info; db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands); Description : Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion. merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604 Test Plan: Added unit test and perf test in db_bench that can be run using the command: ./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist Differential Revision: D16657366 Pulled By: vjnadimpalli fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf	2019-08-06 14:26:44 -07:00
Levi Tamasi	092f417037	Move the uncompression dictionary object out of the block cache (#5584 ) Summary: RocksDB has historically stored uncompression dictionary objects in the block cache as opposed to storing just the block contents. This neccesitated evicting the object upon table close. With the new code, only the raw blocks are stored in the cache, eliminating the need for eviction. In addition, the patch makes the following improvements: 1) Compression dictionary blocks are now prefetched/pinned similarly to index/filter blocks. 2) A copy operation got eliminated when the uncompression dictionary is retrieved. 3) Errors related to retrieving the uncompression dictionary are propagated as opposed to silently ignored. Note: the patch temporarily breaks the compression dictionary evicition stats. They will be fixed in a separate phase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5584 Test Plan: make asan_check Differential Revision: D16344151 Pulled By: ltamasi fbshipit-source-id: 2962b295f5b19628f9da88a3fcebbce5a5017a7b	2019-07-23 16:01:44 -07:00
Eli Pozniansky	6b7fcc0d5f	Improve CPU Efficiency of ApproximateSize (part 1) (#5613 ) Summary: 1. Avoid creating the iterator in order to call BlockBasedTable::ApproximateOffsetOf(). Instead, directly call into it. 2. Optimize BlockBasedTable::ApproximateOffsetOf() keeps the index block iterator in stack. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5613 Differential Revision: D16442660 Pulled By: elipoz fbshipit-source-id: 9320be3e918c139b10e758cbbb684706d172e516	2019-07-23 15:34:33 -07:00
sdong	66b5613d0c	row_cache to share entry for recent snapshots (#5600 ) Summary: Right now, users cannot take advantage of row cache, unless no snapshot is used, or Get() is repeated for the same snapshots. This limits the usage of row cache. This change eliminate this restriction in some cases. If the snapshot used is newer than the largest sequence number in the file, and write callback function is not registered, the same row cache key is used as no snapshot is given. We still need the callback function restriction for now because the callback function may filter out different keys for different snapshots even if the snapshots are new. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5600 Test Plan: Add a unit test. Differential Revision: D16386616 fbshipit-source-id: 6b7d214bd215d191b03ccf55926ad4b703ec2e53	2019-07-22 18:56:19 -07:00
haoyuhuang	8a008d4170	Block access tracing: Trace referenced key for Get on non-data blocks. (#5548 ) Summary: This PR traces the referenced key for Get for all types of blocks. This is useful when evaluating hybrid row-block caches. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5548 Test Plan: make clean && USE_CLANG=1 make check -j32 Differential Revision: D16157979 Pulled By: HaoyuHuang fbshipit-source-id: f6327411c9deb74e35e22a35f66cdbae09ab9d87	2019-07-17 13:05:58 -07:00
Levi Tamasi	3bde41b5a3	Move the filter readers out of the block cache (#5504 ) Summary: Currently, when the block cache is used for the filter block, it is not really the block itself that is stored in the cache but a FilterBlockReader object. Since this object is not pure data (it has, for instance, pointers that might dangle, including in one case a back pointer to the TableReader), it's not really sharable. To avoid the issues around this, the current code erases the cache entries when the TableReader is closed (which, BTW, is not sufficient since a concurrent TableReader might have picked up the object in the meantime). Instead of doing this, the patch moves the FilterBlockReader out of the cache altogether, and decouples the filter reader object from the filter block. In particular, instead of the TableReader owning, or caching/pinning the FilterBlockReader (based on the customer's settings), with the change the TableReader unconditionally owns the FilterBlockReader, which in turn owns/caches/pins the filter block. This change also enables us to reuse the code paths historically used for data blocks for filters as well. Note: Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a separate phase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504 Test Plan: make asan_check Differential Revision: D16036974 Pulled By: ltamasi fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091	2019-07-16 13:14:58 -07:00

1 2 3 4 5 ...

968 Commits