rocksdb

Author	SHA1	Message	Date
Peter Dillinger	f059c7d9b9	New Bloom filter implementation for full and partitioned filters (#6007 ) Summary: Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter. Speed The improved speed, at least on recent x86_64, comes from * Using fastrange instead of modulo (%) * Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only slightly slower on keys around 12 bytes if hashing the same size many thousands of times in a row. * Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc. * Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes. Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed): $ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter Build avg ns/key: 47.7135 Mixed inside/outside queries... Single filter net ns/op: 26.2825 Random filter net ns/op: 150.459 Average FP rate %: 0.954651 $ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter Build avg ns/key: 47.2245 Mixed inside/outside queries... Single filter net ns/op: 63.2978 Random filter net ns/op: 188.038 Average FP rate %: 1.13823 Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected. The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome. Accuracy The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments. Accuracy data (generalizes, except old impl gets worse with millions of keys): Memory bits per key: FP rate percent old impl -> FP rate percent new impl 6: 5.70953 -> 5.69888 8: 2.45766 -> 2.29709 10: 1.13977 -> 0.959254 12: 0.662498 -> 0.411593 16: 0.353023 -> 0.0873754 24: 0.261552 -> 0.0060971 50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP) Fixes https://github.com/facebook/rocksdb/issues/5857 Fixes https://github.com/facebook/rocksdb/issues/4120 Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized. Compatibility Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007 Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version). Differential Revision: D18294749 Pulled By: pdillinger fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f	2019-11-13 16:44:01 -08:00
Peter Dillinger	18f57f5ef8	Add new persistent 64-bit hash (#5984 ) Summary: For upcoming new SST filter implementations, we will use a new 64-bit hash function (XXH3 preview, slightly modified). This change updates hash.{h,cc} for that change, adds unit tests, and out-of-lines the implementations to keep hash.h as clean/small as possible. In developing the unit tests, I discovered that the XXH3 preview always returns zero for the empty string. Zero is problematic for some algorithms (including an upcoming SST filter implementation) if it occurs more often than at the "natural" rate, so it should not be returned from trivial values using trivial seeds. I modified our fork of XXH3 to return a modest hash of the seed for the empty string. With hash function details out-of-lines in hash.h, it makes sense to enable XXH_INLINE_ALL, so that direct calls to XXH64/XXH32/XXH3p are inlined. To fix array-bounds warnings on some inline calls, I injected some casts to uintptr_t in xxhash.cc. (Issue reported to Yann.) Revised: Reverted using XXH_INLINE_ALL for now. Some Facebook checks are unhappy about #include on xxhash.cc file. I would fix that by rename to xxhash_cc.h, but to best preserve history I want to do that in a separate commit (PR) from the uintptr casts. Also updated filter_bench for this change, improving the performance predictability of dry run hashing and adding support for 64-bit hash (for upcoming new SST filter implementations, minor dead code in the tool for now). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5984 Differential Revision: D18246567 Pulled By: pdillinger fbshipit-source-id: 6162fbf6381d63c8cc611dd7ec70e1ddc883fbb8	2019-10-31 16:36:35 -07:00
Peter Dillinger	685e895652	Prepare filter tests for more implementations (#5967 ) Summary: This change sets up for alternate implementations underlying BloomFilterPolicy: * Refactor BloomFilterPolicy and expose in internal .h file so that it's easy to iterate over / select implementations for testing, regardless of what the best public interface will look like. Most notably updated db_bloom_filter_test to use this. * Hide FullFilterBitsBuilder from unit tests (alternate derived classes planned); expose the part important for testing (CalculateSpace), as abstract class BuiltinFilterBitsBuilder. (Also cleaned up internally exposed interface to CalculateSpace.) * Rename BloomTest -> BlockBasedBloomTest for clarity (despite ongoing confusion between block-based table and block-based filter) * Assert that block-based filter construction interface is only used on BloomFilterPolicy appropriately constructed. (A couple of tests updated to add ", true".) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5967 Test Plan: make check Differential Revision: D18138704 Pulled By: pdillinger fbshipit-source-id: 55ef9273423b0696309e251f50b8c1b5e9ec7597	2019-10-31 14:12:33 -07:00
Peter Dillinger	26dc29633e	filter_bench not needed for ROCKSDB_LITE (#5978 ) Summary: filter_bench is a specialized micro-benchmarking tool that should not be needed with ROCKSDB_LITE. This should fix the LITE build. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5978 Test Plan: make LITE=1 check Differential Revision: D18177941 Pulled By: pdillinger fbshipit-source-id: b73a171404661e09e018bc99afcf8d4bf1e2949c	2019-10-28 14:12:36 -07:00
Peter Dillinger	3f891c40a0	More improvements to filter_bench (#5968 ) Summary: * Adds support for plain table filter. This is not critical right now, but does add a -impl flag that will be useful for new filter implementations initially targeted at block-based table (and maybe later ported to plain table) * Better mixing of inside vs. outside queries, for more realism * A -best_case option handy for implementation tuning inner loop * Option for whether to include hashing time in dry run / net timings No modifications to production code, just filter_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5968 Differential Revision: D18139872 Pulled By: pdillinger fbshipit-source-id: 5b09eba963111b48f9e0525a706e9921070990e8	2019-10-25 13:27:07 -07:00
Peter Dillinger	b3dc2f3691	Update xxhash.cc to allow combined compilation (#5969 ) Summary: To fix unity_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/5969 Test Plan: make unity_test Differential Revision: D18140426 Pulled By: pdillinger fbshipit-source-id: d5516e6d665f57e3706b9f9b965b0c458e58ccef	2019-10-25 12:54:41 -07:00
Peter Dillinger	013babc685	Clean up some filter tests and comments (#5960 ) Summary: Some filtering tests were unfriendly to new implementations of FilterBitsBuilder because of dynamic_cast to FullFilterBitsBuilder. Most of those have now been cleaned up, worked around, or at least changed from crash on dynamic_cast failure to individual test failure. Also put some clarifying comments on filter-related APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5960 Test Plan: make check Differential Revision: D18121223 Pulled By: pdillinger fbshipit-source-id: e83827d9d5d96315d96f8e25a99cd70f497d802c	2019-10-24 18:48:16 -07:00
Peter Dillinger	ca7ccbe2ea	Misc hashing updates / upgrades (#5909 ) Summary: - Updated our included xxhash implementation to version 0.7.2 (== the latest dev version as of 2019-10-09). - Using XXH_NAMESPACE (like other fb projects) to avoid potential name collisions. - Added fastrange64, and unit tests for it and fastrange32. These are faster alternatives to hash % range. - Use preview version of XXH3 instead of MurmurHash64A for NPHash64 -- Had to update cache_test to increase probability of passing for any given hash function. - Use fastrange64 instead of % with uses of NPHash64 -- Had to fix WritePreparedTransactionTest.CommitOfDelayedPrepared to avoid deadlock apparently caused by new hash collision. - Set default seed for NPHash64 because specifying a seed rarely makes sense for it. - Removed unnecessary include xxhash.h in a popular .h file - Rename preview version of XXH3 to XXH3p for clarity and to ease backward compatibility in case final version of XXH3 is integrated. Relying on existing unit tests for NPHash64-related changes. Each new implementation of fastrange64 passed unit tests when manipulating my local build to select it. I haven't done any integration performance tests, but I consider the improved performance of the pieces being swapped in to be well established. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5909 Differential Revision: D18125196 Pulled By: pdillinger fbshipit-source-id: f6bf83d49d20cbb2549926adf454fd035f0ecc0d	2019-10-24 17:16:46 -07:00
Peter Dillinger	ec11eff3bc	FilterPolicy consolidation, part 2/2 (#5966 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. This change is step 2 of 2: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters remain in util/bloom_impl.h, and util/bloom_test.cc remains where it is for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5966 Test Plan: make check Differential Revision: D18124930 Pulled By: pdillinger fbshipit-source-id: 823bc09025b3395f092ef46a46aa5ba92a914d84	2019-10-24 15:44:51 -07:00
Peter Dillinger	dd19014a7a	FilterPolicy consolidation, part 1/2 (#5963 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. I don't foresee sharing these APIs with e.g. the Plain Table because they don't expose hashes for reuse in indexing. This change is step 1 of 2: (a) mv table/full_filter_bits_builder.h to table/block_based/filter_policy_internal.h which I expect to expand soon to internally reveal more implementation details for testing. (b) consolidate eventual contents of table/block_based/filter_policy.cc in util/bloom.cc, which has the most elaborate revision history (see step 2 ...) Step 2 soon to follow: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters are in util/bloom_impl.h. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5963 Test Plan: make check Differential Revision: D18121199 Pulled By: pdillinger fbshipit-source-id: 8f21732c3d8909777e3240e4ac3123d73140326a	2019-10-24 13:20:35 -07:00
Peter Dillinger	2837008525	Vary key size and alignment in filter_bench (#5933 ) Summary: The first version of filter_bench has selectable key size but that size does not vary throughout a test run. This artificially favors "branchy" hash functions like the existing BloomHash, MurmurHash1, probably because of optimal return for branch prediction. This change primarily varies those key sizes from -2 to +2 bytes vs. the average selected size. We also set the default key size at 24 to better reflect our best guess of typical key size. But steadily random key sizes may not be realistic either. So this change introduces a new filter_bench option: -vary_key_size_log2_interval=n where the same key size is used 2^n times and then changes to another size. I've set the default at 5 (32 times same size) as a compromise between deployments with rather consistent vs. rather variable key sizes. On my Skylake system, the performance boost to MurmurHash1 largely lies between n=10 and n=15. Also added -vary_key_alignment (bool, now default=true), though this doesn't currently seem to matter in hash functions under consideration. This change also does a "dry run" for each testing scenario, to improve the accuracy of those numbers, as there was more difference between scenarios than expected. Subtracting gross test run times from dry run times is now also embedded in the output, because these "net" times are generally the most useful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5933 Differential Revision: D18121683 Pulled By: pdillinger fbshipit-source-id: 3c7efee1c5661a5fe43de555e786754ddf80dc1e	2019-10-24 13:08:30 -07:00
Peter Dillinger	6a32e3b562	Remove unused BloomFilterPolicy::hash_func_ (#5961 ) Summary: This is an internal, file-local "feature" that is not used and potentially confusing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5961 Test Plan: make check Differential Revision: D18099018 Pulled By: pdillinger fbshipit-source-id: 7870627eeed09941d12538ec55d10d2e164fc716	2019-10-23 15:47:17 -07:00
Levi Tamasi	29ccf2075c	Store the filter bits reader alongside the filter block contents (#5936 ) Summary: Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that only the filter block contents are stored in the block cache (as opposed to the earlier design where the cache stored the filter block reader itself, leading to potentially dangling pointers and concurrency bugs). However, this change introduced a performance hit since with the new code, the metadata fields are re-parsed upon every access. This patch reunites the block contents with the filter bits reader to eliminate this overhead; since this is still a self-contained pure data object, it is safe to store it in the cache. (Note: this is similar to how the zstd digest is handled.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936 Test Plan: make asan_check filter_bench results for the old code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.7153 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.4258 Single filter ns/op: 42.5974 Random filter ns/op: 217.861 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.4217 Single filter ns/op: 50.9855 Random filter ns/op: 219.167 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5172 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 32.3556 Single filter ns/op: 83.2239 Random filter ns/op: 370.676 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.2265 Single filter ns/op: 93.5651 Random filter ns/op: 408.393 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` With the new code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 25.4285 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 31.0594 Single filter ns/op: 43.8974 Random filter ns/op: 226.075 ---------------------------- Outside queries... Dry run (25d) ns/op: 31.0295 Single filter ns/op: 50.3824 Random filter ns/op: 226.805 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5308 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.2968 Single filter ns/op: 58.6163 Random filter ns/op: 291.434 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.1839 Single filter ns/op: 66.9039 Random filter ns/op: 292.828 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` Differential Revision: D17991712 Pulled By: ltamasi fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29	2019-10-18 19:32:59 -07:00
Peter Dillinger	5f8f2fda0e	Refactor / clean up / optimize FullFilterBitsReader (#5941 ) Summary: FullFilterBitsReader, after creating in BloomFilterPolicy, was responsible for decoding metadata bits. This meant that FullFilterBitsReader::MayMatch had some metadata checks in order to implement "always true" or "always false" functionality in the case of inconsistent or trivial metadata. This made for ugly mixing-of-concerns code and probably had some runtime cost. It also didn't really support plugging in alternative filter implementations with extensions to the existing metadata schema. BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible for decoding filter metadata bits and constructing appropriate instances deriving from FilterBitsReader. "Always false" and "always true" derived classes allow FullFilterBitsReader not to be concerned with handling of trivial or inconsistent metadata. This also makes for easy expansion to alternative filter implementations in new, alternative derived classes. This change makes calls to FilterBitsReader::MayMatch necessarily virtual because there's now more than one built-in implementation. Compared with the previous implementation's extra 'if' checks in MayMatch, there's no consistent performance difference, measured by (an older revision of) filter_bench (differences here seem to be within noise): Inside queries... - Dry run (407) ns/op: 35.9996 + Dry run (407) ns/op: 35.2034 - Single filter ns/op: 47.5483 + Single filter ns/op: 47.4034 - Batched, prepared ns/op: 43.1559 + Batched, prepared ns/op: 42.2923 ... - Random filter ns/op: 150.697 + Random filter ns/op: 149.403 ---------------------------- Outside queries... - Dry run (980) ns/op: 34.6114 + Dry run (980) ns/op: 34.0405 - Single filter ns/op: 56.8326 + Single filter ns/op: 55.8414 - Batched, prepared ns/op: 48.2346 + Batched, prepared ns/op: 47.5667 - Random filter ns/op: 155.377 + Random filter ns/op: 153.942 Average FP rate %: 1.1386 Also, the FullFilterBitsReader ctor was responsible for a surprising amount of CPU in production, due in part to inefficient determination of the CACHE_LINE_SIZE used to construct the filter being read. The overwhelming common case (same as my CACHE_LINE_SIZE) is now substantially optimized, as shown with filter_bench with -new_reader_every=1 (old option - see below) (repeatable result): Inside queries... - Dry run (453) ns/op: 118.799 + Dry run (453) ns/op: 105.869 - Single filter ns/op: 82.5831 + Single filter ns/op: 74.2509 ... - Random filter ns/op: 224.936 + Random filter ns/op: 194.833 ---------------------------- Outside queries... - Dry run (aa1) ns/op: 118.503 + Dry run (aa1) ns/op: 104.925 - Single filter ns/op: 90.3023 + Single filter ns/op: 83.425 ... - Random filter ns/op: 220.455 + Random filter ns/op: 175.7 Average FP rate %: 1.13886 However PR#5936 has/will reclaim most of this cost. After that PR, the optimization of this code path is likely negligible, but nonetheless it's clear we aren't making performance any worse. Also fixed inadequate check of consistency between filter data size and num_lines. (Unit test updated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941 Test Plan: previously added unit tests FullBloomTest.CorruptFilters and FullBloomTest.RawSchema Differential Revision: D18018353 Pulled By: pdillinger fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c	2019-10-18 14:50:52 -07:00
Peter Dillinger	93edd51c4a	bloom_test.cc: include <array> (#5920 ) Summary: Fix build failure on some platforms, reported in issue https://github.com/facebook/rocksdb/issues/5914 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5920 Test Plan: make bloom_test && ./bloom_test Differential Revision: D17918328 Pulled By: pdillinger fbshipit-source-id: b822004d4442de0171db2aeff433677783f7b94e	2019-10-14 15:38:31 -07:00
Vijay Nadimpalli	4c49e38f15	MultiGet batching in memtable (#5818 ) Summary: RocksDB has a MultiGet() API that implements batched key lookup for higher performance (https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L468). Currently, batching is implemented in BlockBasedTableReader::MultiGet() for SST file lookups. One of the ways it improves performance is by pipelining bloom filter lookups (by prefetching required cachelines for all the keys in the batch, and then doing the probe) and thus hiding the cache miss latency. The same concept can be extended to the memtable as well. This PR involves implementing a pipelined bloom filter lookup in DynamicBloom, and implementing MemTable::MultiGet() that can leverage it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5818 Test Plan: Existing tests Performance Test: Ran the below command which fills up the memtable and makes sure there are no flushes and then call multiget. Ran it on master and on the new change and see atleast 1% performance improvement across all the test runs I did. Sometimes the improvement was upto 5%. TEST_TMPDIR=/data/users/$USER/benchmarks/feature/ numactl -C 10 ./db_bench -benchmarks="fillseq,multireadrandom" -num=600000 -compression_type="none" -level_compaction_dynamic_level_bytes -write_buffer_size=200000000 -target_file_size_base=200000000 -max_bytes_for_level_base=16777216 -reads=90000 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 -statistics -memtable_whole_key_filtering=true -memtable_bloom_size_ratio=10 Differential Revision: D17578869 Pulled By: vjnadimpalli fbshipit-source-id: 23dc651d9bf49db11d22375bf435708875a1f192	2019-10-10 09:39:39 -07:00
Peter Dillinger	90e285efde	Fix some implicit conversions in filter_bench (#5894 ) Summary: Fixed some spots where converting size_t or uint_fast32_t to uint32_t. Wrapped mt19937 in a new Random32 class to avoid future such traps. NB: I tried using Random32::Uniform (std::uniform_int_distribution) in filter_bench instead of fastrange, but that more than doubled the dry run time! So I added fastrange as Random32::Uniformish. ;) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5894 Test Plan: USE_CLANG=1 build, and manual re-run filter_bench Differential Revision: D17825131 Pulled By: pdillinger fbshipit-source-id: 68feee333b5f8193c084ded760e3d6679b405ecd	2019-10-08 19:22:07 -07:00
Peter Dillinger	46ca51d430	filter_bench - a prelim tool for SST filter benchmarking (#5825 ) Summary: Example: using the tool before and after PR https://github.com/facebook/rocksdb/issues/5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): Before: - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 After: + Dry run ns/op: 22.2301 + Single filter run ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5825 Differential Revision: D17804987 Pulled By: pdillinger fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5	2019-10-07 20:10:53 -07:00
Peter Dillinger	9f54446525	Fix type in shift operation in bloom_test (#5882 ) Summary: Broken type for shift in PR#5834. Fixing code means fixing expected values in test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5882 Test Plan: thisisthetest Differential Revision: D17746136 Pulled By: pdillinger fbshipit-source-id: d3c456ed30b433d55fcab6fc7d836940fe3b46b8	2019-10-03 13:19:20 -07:00
Peter Dillinger	9e4913ce9d	Add FullBloomTest.CorruptFilters,RawSchema (#5834 ) Summary: There was significant untested logic in FullFilterBitsReader in the handling of serialized Bloom filter bits that cannot be generated by FullFilterBitsBuilder in the current compilation. These now test many of those corner-case behaviors, including bad metadata or filters created with different cache line size than the current compiled-in value. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5834 Test Plan: thisisthetest Differential Revision: D17726372 Pulled By: pdillinger fbshipit-source-id: fb7b8003b5a8e6fb4666fe95206128f3d5835fc7	2019-10-02 15:33:48 -07:00
sdong	e8263dbdaa	Apply formatter to recent 200+ commits. (#5830 ) Summary: Further apply formatter to more recent commits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830 Test Plan: Run all existing tests. Differential Revision: D17488031 fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab	2019-09-20 12:04:26 -07:00
sdong	c06b54d0c6	Apply formatter on recent 45 commits. (#5827 ) Summary: Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827 Test Plan: Run all existing tests. Differential Revision: D17483727 fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f	2019-09-19 12:34:17 -07:00
Levi Tamasi	2cbb61eadb	Make clang-analyzer happy (#5821 ) Summary: clang-analyzer has uncovered a bunch of places where the code is relying on pointers being valid and one case (in VectorIterator) where a moved-from object is being used: In file included from db/range_tombstone_fragmenter.cc:17: ./util/vector_iterator.h:23:18: warning: Method called on moved-from object 'keys' of type 'std::vector' current_(keys.size()) { ^~~~~~~~~~~ 1 warning generated. utilities/persistent_cache/block_cache_tier_file.cc:39:14: warning: Called C++ object pointer is null Status s = env->NewRandomAccessFile(filepath, file, opt); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ utilities/persistent_cache/block_cache_tier_file.cc:47:19: warning: Called C++ object pointer is null Status status = env_->GetFileSize(Path(), size); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ utilities/persistent_cache/block_cache_tier_file.cc:290:14: warning: Called C++ object pointer is null Status s = env_->FileExists(Path()); ^~~~~~~~~~~~~~~~~~~~~~~~ utilities/persistent_cache/block_cache_tier_file.cc:363:35: warning: Called C++ object pointer is null CacheWriteBuffer* const buf = alloc_->Allocate(); ^~~~~~~~~~~~~~~~~~ utilities/persistent_cache/block_cache_tier_file.cc:399:41: warning: Called C++ object pointer is null const uint64_t file_off = buf_doff_ * alloc_->BufferSize(); ^~~~~~~~~~~~~~~~~~~~ utilities/persistent_cache/block_cache_tier_file.cc:463:33: warning: Called C++ object pointer is null size_t start_idx = lba.off_ / alloc_->BufferSize(); ^~~~~~~~~~~~~~~~~~~~ utilities/persistent_cache/block_cache_tier_file.cc:515:5: warning: Called C++ object pointer is null alloc_->Deallocate(bufs_[i]); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~ 7 warnings generated. ar: creating librocksdb_debug.a utilities/memory/memory_test.cc:68:25: warning: Called C++ object pointer is null cache_set->insert(db->GetDBOptions().row_cache.get()); ^~~~~~~~~~~~~~~~~~ 1 warning generated. The patch fixes these by adding assertions and explicitly passing in zero when initializing VectorIterator::current_ (which preserves the existing behavior). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5821 Test Plan: Ran make check and make analyze to make sure the warnings have disappeared. Differential Revision: D17455949 Pulled By: ltamasi fbshipit-source-id: 363619618ea649a0674287f9f3b3393e390571ee	2019-09-18 15:25:48 -07:00
风	2389aa2da9	Remove unneeded unlock statement (#5809 ) Summary: The dtor will automatically do unlock Pull Request resolved: https://github.com/facebook/rocksdb/pull/5809 Differential Revision: D17453694 Pulled By: ltamasi fbshipit-source-id: 5348bff8e6a620a05ff639a5454e8d82ae98a22d	2019-09-18 14:26:37 -07:00
andrew	622683000c	Allow users to stop manual compactions (#3971 ) Summary: Manual compaction may bring in very high load because sometime the amount of data involved in a compaction could be large, which may affect online service. So it would be good if the running compaction making the server busy can be stopped immediately. In this implementation, stopping manual compaction condition is only checked in slow process. We let deletion compaction and trivial move go through. Pull Request resolved: https://github.com/facebook/rocksdb/pull/3971 Test Plan: add tests at more spots. Differential Revision: D17369043 fbshipit-source-id: 575a624fb992ce0bb07d9443eb209e547740043c	2019-09-16 21:01:47 -07:00
Peter Dillinger	68626249c3	Refactor/consolidate legacy Bloom implementation details (#5784 ) Summary: Refactoring to consolidate implementation details of legacy Bloom filters. This helps to organize and document some related, obscure code. Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to compile and run unit tests for non-native cache line size. (Fixed a related test failure in db_properties_test.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784 Test Plan: make check, including Recently added Bloom schema unit tests (in ./plain_table_db_test && ./bloom_test), and including with TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the schema tests with temporary fault injection into new implementations. Some performance testing with modified unit tests suggest a small to moderate improvement in speed. Differential Revision: D17381384 Pulled By: pdillinger fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df	2019-09-16 16:17:09 -07:00
Peter Dillinger	d3a6726f02	Revert changes from PR#5784 accidentally in PR#5780 (#5810 ) Summary: This will allow us to fix history by having the code changes for PR#5784 properly attributed to it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810 Differential Revision: D17400231 Pulled By: pdillinger fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09	2019-09-16 11:38:53 -07:00
sdong	b931f84e56	Divide file_reader_writer.h and .cc (#5803 ) Summary: file_reader_writer.h and .cc contain several files and helper function, and it's hard to navigate. Separate it to multiple files and put them under file/ Pull Request resolved: https://github.com/facebook/rocksdb/pull/5803 Test Plan: Build whole project using make and cmake. Differential Revision: D17374550 fbshipit-source-id: 10efca907721e7a78ed25bbf74dc5410dea05987	2019-09-16 10:33:51 -07:00
Peter Dillinger	915d72d849	Improve accuracy testing for DynamicBloom (#5805 ) Summary: DynamicBloom unit test now tests non-sequential as well as sequential keys in testing FP rates. Also now verifies larger structures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5805 Test Plan: thisisthetest Differential Revision: D17398109 Pulled By: pdillinger fbshipit-source-id: 374074206c76d242efa378afc27830448a0e892a	2019-09-16 09:37:42 -07:00
Peter Dillinger	aa2486b23c	Refactor some confusing logic in PlainTableReader Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5780 Test Plan: existing plain table unit test Differential Revision: D17368629 Pulled By: pdillinger fbshipit-source-id: f25409cdc2f39ebe8d5cbb599cf820270e6b5d26	2019-09-13 10:26:36 -07:00
HouBingjian	a378a4c2ac	arm64 crc prefetch optimise (#5773 ) Summary: prefetch data for following block，avoid cache miss when doing crc caculate I do performance test at kunpeng-920 server(arm-v8, 64core@2.6GHz) ./db_bench --benchmarks=crc32c --block_size=500000000 before optimise : 587313.500 micros/op 1 ops/sec; 811.9 MB/s (500000000 per op) after optimise : 289248.500 micros/op 3 ops/sec; 1648.5 MB/s (500000000 per op) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5773 Differential Revision: D17347339 fbshipit-source-id: bfcd74f0f0eb4b322b959be68019ddcaae1e3341	2019-09-12 16:59:44 -07:00
Shylock Hg	9eb3e1f77d	Use delete to disable automatic generated methods. (#5009 ) Summary: Use delete to disable automatic generated methods instead of private, and put the constructor together for more clear.This modification cause the unused field warning, so add unused attribute to disable this warning. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009 Differential Revision: D17288733 fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3	2019-09-11 18:09:00 -07:00
Peter Dillinger	7af6ced14b	Fix block allocation bug in new DynamicBloom (#5783 ) Summary: Bug found by valgrind. New DynamicBloom wasn't allocating in block sizes. New assertion added that probes starting in final word would be in bounds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5783 Test Plan: ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 valgrind --leak-check=full ./dynamic_bloom_test Differential Revision: D17270623 Pulled By: pdillinger fbshipit-source-id: 1e0407504b875133a771383cd488c70f91be2b87	2019-09-09 15:26:43 -07:00
Peter Dillinger	108c619acb	Add regression test for serialized Bloom filters (#5778 ) Summary: Check that we don't accidentally change the on-disk format of existing Bloom filter implementations, including for various CACHE_LINE_SIZE (by changing temporarily). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5778 Test Plan: thisisthetest Differential Revision: D17269630 Pulled By: pdillinger fbshipit-source-id: c77017662f010a77603b7d475892b1f0d5563d8b	2019-09-09 14:51:30 -07:00
Wilfried Goesgens	fbab9913e2	upgrade gtest 1.7.0 => 1.8.1 for json result writing Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332 Differential Revision: D17242232 fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64	2019-09-09 11:24:11 -07:00
HouBingjian	ac97e6930f	bloom test check fail on arm (#5745 ) Summary: FullFilterBitsBuilder::CalculateSpace use CACHE_LINE_SIZE which is 64@X86 but 128@ARM64 when it run bloom_test.FullVaryingLengths it failed on ARM64 server, the assert can be fixed by change 128->CACHE_LINE_SIZE2 as merged ASSERT_LE(FilterSize(), (size_t)((length 10 / 8) + CACHE_LINE_SIZE * 2 + 5)) << length; run bloom_test before fix: /root/rocksdb-master/util/bloom_test.cc:281: Failure Expected: (FilterSize()) <= ((size_t)((length * 10 / 8) + 128 + 5)), actual: 389 vs 383 200 [ FAILED ] FullBloomTest.FullVaryingLengths (32 ms) [----------] 4 tests from FullBloomTest (32 ms total) [----------] Global test environment tear-down [==========] 7 tests from 2 test cases ran. (116 ms total) [ PASSED ] 6 tests. [ FAILED ] 1 test, listed below: [ FAILED ] FullBloomTest.FullVaryingLengths after fix: Filters: 37 good, 0 mediocre [ OK ] FullBloomTest.FullVaryingLengths (90 ms) [----------] 4 tests from FullBloomTest (90 ms total) [----------] Global test environment tear-down [==========] 7 tests from 2 test cases ran. (174 ms total) [ PASSED ] 7 tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5745 Differential Revision: D17076047 fbshipit-source-id: e7beb5d55d4855fceb2b84bc8119a6b0759de635	2019-09-05 17:03:24 -07:00
Peter Dillinger	b55b2f45d0	Faster new DynamicBloom implementation (for memtable) (#5762 ) Summary: Since DynamicBloom is now only used in-memory, we're free to change it without schema compatibility issues. The new implementation is drawn from (with manifest permission) `303542a767/bloom_simulation_tests/foo.cc (L613)` This has several speed advantages over the prior implementation: * Uses fastrange instead of % * Minimum logic to determine first (and all) probed memory addresses * (Major) Two probes per 64-bit memory fetch/write. * Very fast and effective (murmur-like) hash expansion/re-mixing. (At least on recent CPUs, integer multiplication is very cheap.) While a Bloom filter with 512-bit cache locality has about a 1.15x FP rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to 1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples, unlike the old implementation with more erratic FP rates. Especially for the memtable, we expect speed to outweigh somewhat higher FP rates. For example, a negative table query would have to be 1000x slower than a BF query to justify doubling BF query time to shave 10% off FP rate (working assumption around 1% FP rate). While that seems likely for SSTs, my data suggests a speed factor of roughly 50x for the memtable (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom filter, after this change). Thus, it's probably not worth even 5% more time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1% in absolute terms, and it's probably at least 20% slower to recoup that much FP rate from this new implementation. Because of this, we do not see a need for a 'locality' option that affects the MemTable Bloom filter and have decoupled the MemTable Bloom filter from Options::bloom_locality. Note that just 3% more memory to the Bloom filter (10.3 bits per key vs. just 10) is able to make up for the ~12% FP rate drop in the new implementation: [] # Nearly "ideal" FP-wise but reasonably fast cache-local implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ... [] # Close match to this new implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ... [] # Old locality=1 implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ... Also note the dramatic speed improvement vs. alternatives. -- Performance unit test: DynamicBloomTest.concurrent_with_perf is updated to report more precise timing data. (Measure running time of each thread, not just longest running thread, etc.) Results averaged over various sizes enabled with --enable_perf and 20 runs each; old dynamic bloom refers to locality=1, the faster of the old: old dynamic bloom, avg add latency = 65.6468 new dynamic bloom, avg add latency = 44.3809 old dynamic bloom, avg query latency = 50.6485 new dynamic bloom, avg query latency = 43.2186 old avg parallel add latency = 41.678 new avg parallel add latency = 24.5238 old avg parallel hit latency = 14.6322 new avg parallel hit latency = 12.3939 old avg parallel miss latency = 16.7289 new avg parallel miss latency = 12.2134 Tested on a dedicated 64-bit production machine at Facebook. Significant improvement all around. Despite now using std::atomic<uint64_t>, quick before-and-after test on a 32-bit machine (Intel Atom N270, released 2008) shows no regression in performance, in some cases modest improvement. -- Performance integration test (synthetic): with DEBUG_LEVEL=0, used TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000 and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01 300 runs each configuration. Write throughput change by enabling memtable bloom: Old locality=0: -3.06% Old locality=1: -2.37% New: -1.50% conclusion -> seems to substantially close the gap Readmissing throughput change by enabling memtable bloom: Old locality=0: +34.47% Old locality=1: +34.80% New: +33.25% conclusion -> maybe a small new penalty from FP rate Readrandom throughput change by enabling memtable bloom: Old locality=0: +31.54% Old locality=1: +31.13% New: +30.60% conclusion -> maybe also from FP rate (after memtable flush) -- Another conclusion we can draw from this new implementation is that the existing 32-bit hash function is not inherently crippling the Bloom filter speed or accuracy, below about 5 million keys. For speed, the implementation is essentially the same whether starting with 32-bits or 64-bits of hash; it just determines whether the first multiplication after fastrange is a pseudorandom expansion or needed re-mix. Note that this multiplication can occur while memory is fetching. For accuracy, in a standard configuration, you need about 5 million keys before you have about a 1.1x FP penalty due to using a 32-bit hash vs. 64-bit: [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ... [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762 Differential Revision: D17214194 Pulled By: pdillinger fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595	2019-09-05 14:59:25 -07:00
Peter Dillinger	20dec1401f	Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767 ) Summary: DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767 Differential Revision: D17206963 Pulled By: pdillinger fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553	2019-09-05 10:05:20 -07:00
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	2019-08-23 13:55:34 -07:00
DaiZhiwei	26293c89a6	crc32c_arm64 performance optimization (#5675 ) Summary: Crc32c Parallel computation coding optimization: Macro unfolding removes the "for" loop and is good to decrease branch-miss in arm64 micro architecture 1024 Bytes is divided into 8(head) + 1008( 6 * 7 * 3 * 8 ) + 8(tail) three parts Macro unfolding 42 loops to 6 CRC32C7X24BYTESs 1 CRC32C7X24BYTES containing 7 CRC32C24BYTESs 1, crc32c_test [==========] Running 4 tests from 1 test case. [----------] Global test environment set-up. [----------] 4 tests from CRC [ RUN ] CRC.StandardResults [ OK ] CRC.StandardResults (1 ms) [ RUN ] CRC.Values [ OK ] CRC.Values (0 ms) [ RUN ] CRC.Extend [ OK ] CRC.Extend (0 ms) [ RUN ] CRC.Mask [ OK ] CRC.Mask (0 ms) [----------] 4 tests from CRC (1 ms total) [----------] Global test environment tear-down [==========] 4 tests from 1 test case ran. (1 ms total) [ PASSED ] 4 tests. 2, db_bench --benchmarks="crc32c" crc32c : 0.218 micros/op 4595390 ops/sec; 17950.7 MB/s (4096 per op) 3, repeated crc32c_test case 60000 times perf stat -e branch-miss -- ./crc32c_test before optimization: 739,426,504 branch-miss after optimization: 1,128,572 branch-miss Pull Request resolved: https://github.com/facebook/rocksdb/pull/5675 Differential Revision: D16989210 fbshipit-source-id: 7204e6069bb6ed066d49c2d1b3ac385065a98557	2019-08-23 11:04:08 -07:00
Levi Tamasi	df8c307d63	Revert to storing UncompressionDicts in the cache (#5645 ) Summary: PR https://github.com/facebook/rocksdb/issues/5584 decoupled the uncompression dictionary object from the underlying block data; however, this defeats the purpose of the digested ZSTD dictionary, since the whole point of the digest is to create it once and reuse it over and over again. This patch goes back to storing the uncompression dictionary itself in the cache (which should be now safe to do, since it no longer includes a Statistics pointer), while preserving the rest of the refactoring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5645 Test Plan: make asan_check Differential Revision: D16551864 Pulled By: ltamasi fbshipit-source-id: 2a7e2d34bb16e70e3c816506d5afe1d842057800	2019-08-23 08:27:30 -07:00
Kefu Chai	40712df9ab	ThreadPoolImpl::Impl::BGThreadWrapper() returns void (#5709 ) Summary: there is no need to return void*, as std:🧵:thread(Func&& f, Args&&... args ) only requires `Func` to be callable. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5709 Differential Revision: D16832894 fbshipit-source-id: a1e1b876fa8d55589ef5feb5b27f3a435068b747	2019-08-16 13:55:41 -07:00
Yanqin Jin	ae152ee666	Avoid user key copying for Get/Put/Write with user-timestamp (#5502 ) Summary: In previous https://github.com/facebook/rocksdb/issues/5079, we added user-specified timestamp to `DB::Get()` and `DB::Put()`. Limitation is that these two functions may cause extra memory allocation and key copy. The reason is that `WriteBatch` does not allocate extra memory for timestamps because it is not aware of timestamp size, and we did not provide an API to assign/update timestamp of each key within a `WriteBatch`. We address these issues in this PR by doing the following. 1. Add a `timestamp_size_` to `WriteBatch` so that `WriteBatch` can take timestamps into account when calling `WriteBatch::Put`, `WriteBatch::Delete`, etc. 2. Add APIs `WriteBatch::AssignTimestamp` and `WriteBatch::AssignTimestamps` so that application can assign/update timestamps for each key in a `WriteBatch`. 3. Avoid key copy in `GetImpl` by adding new constructor to `LookupKey`. Test plan (on devserver): ``` $make clean && COMPILE_WITH_ASAN=1 make -j32 all $./db_basic_test --gtest_filter=Timestamp/DBBasicTestWithTimestampWithParam.PutAndGet/* $make check ``` If the API extension looks good, I will add more unit tests. Some simple benchmark using db_bench. ``` $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillseq,readrandom -num=1000000 $rm -rf /dev/shm/dbbench/* && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=1000000 -disable_wal=true ``` Master is at `a78503bd6c`. ``` \| \| readrandom \| fillrandom \| \| master \| 15.53 MB/s \| 25.97 MB/s \| \| PR5502 \| 16.70 MB/s \| 25.80 MB/s \| ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5502 Differential Revision: D16340894 Pulled By: riversand963 fbshipit-source-id: 51132cf792be07d1efc3ac33f5768c4ee2608bb8	2019-07-25 15:27:39 -07:00
Levi Tamasi	092f417037	Move the uncompression dictionary object out of the block cache (#5584 ) Summary: RocksDB has historically stored uncompression dictionary objects in the block cache as opposed to storing just the block contents. This neccesitated evicting the object upon table close. With the new code, only the raw blocks are stored in the cache, eliminating the need for eviction. In addition, the patch makes the following improvements: 1) Compression dictionary blocks are now prefetched/pinned similarly to index/filter blocks. 2) A copy operation got eliminated when the uncompression dictionary is retrieved. 3) Errors related to retrieving the uncompression dictionary are propagated as opposed to silently ignored. Note: the patch temporarily breaks the compression dictionary evicition stats. They will be fixed in a separate phase. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5584 Test Plan: make asan_check Differential Revision: D16344151 Pulled By: ltamasi fbshipit-source-id: 2962b295f5b19628f9da88a3fcebbce5a5017a7b	2019-07-23 16:01:44 -07:00
Eli Pozniansky	9f5cfb8e71	Fix for ReadaheadSequentialFile crash in ldb_cmd_test (#5586 ) Summary: Fixing a corner case crash when there was no data read from file, but status is still OK Pull Request resolved: https://github.com/facebook/rocksdb/pull/5586 Differential Revision: D16348117 Pulled By: elipoz fbshipit-source-id: f97973308024f020d8be79ca3c56466b84d80656	2019-07-17 17:04:39 -07:00
Yuqi Gu	a3c1832e86	Arm64 CRC32 parallel computation optimization for RocksDB (#5494 ) Summary: Crc32c Parallel computation optimization: Algorithm comes from Intel whitepaper: [crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf) Input data is divided into three equal-sized blocks Three parallel blocks (crc0, crc1, crc2) for 1024 Bytes One Block: 42(BLK_LENGTH) * 8(step length: crc32c_u64) bytes 1. crc32c_test: ``` [==========] Running 4 tests from 1 test case. [----------] Global test environment set-up. [----------] 4 tests from CRC [ RUN ] CRC.StandardResults [ OK ] CRC.StandardResults (1 ms) [ RUN ] CRC.Values [ OK ] CRC.Values (0 ms) [ RUN ] CRC.Extend [ OK ] CRC.Extend (0 ms) [ RUN ] CRC.Mask [ OK ] CRC.Mask (0 ms) [----------] 4 tests from CRC (1 ms total) [----------] Global test environment tear-down [==========] 4 tests from 1 test case ran. (1 ms total) [ PASSED ] 4 tests. ``` 2. RocksDB benchmark: db_bench --benchmarks="crc32c" ``` Linear Arm crc32c: crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op) ``` ``` Parallel optimization with Armv8 crypto extension: crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op) ``` It gets ~2.4x speedup compared to linear Arm crc32c instructions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494 Differential Revision: D16340806 fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945	2019-07-17 11:22:38 -07:00
Eli Pozniansky	0f4d90e6e4	Added support for sequential read-ahead file (#5580 ) Summary: Added support for sequential read-ahead file that can prefetch the read data and later serve it from internal cache buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5580 Differential Revision: D16287082 Pulled By: elipoz fbshipit-source-id: a3e7ad9643d377d39352ff63058ce050ec31dcf3	2019-07-16 18:21:18 -07:00
sdong	699a569c52	Remove RandomAccessFileReader.for_compaction_ (#5572 ) Summary: RandomAccessFileReader.for_compaction_ doesn't seem to be used anymore. Remove it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5572 Test Plan: USE_CLANG=1 make all check -j Differential Revision: D16286178 fbshipit-source-id: aa338049761033dfbe5e8b1707bbb0be2df5be7e	2019-07-16 16:32:18 -07:00
Yikun Jiang	f064d74e45	Cleanup the Arm64 CRC32 unused warning (#5565 ) Summary: When 'HAVE_ARM64_CRC' is set, the blew methods: - bool rocksdb::crc32c::isSSE42() - bool rocksdb::crc32c::isPCLMULQDQ() are defined but not used, the unused-function is raised when do rocksdb build. This patch try to cleanup these warnings by add ifndef, if it build under the HAVE_ARM64_CRC, we will not define `isSSE42` and `isPCLMULQDQ`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5565 Differential Revision: D16233654 fbshipit-source-id: c32a9dda7465dbf65f9ccafef159124db92cdffd	2019-07-15 11:20:26 -07:00
ggaurav28	60d8b19836	Implemented a file logger that uses WritableFileWriter (#5491 ) Summary: Current PosixLogger performs IO operations using posix calls. Thus the current implementation will not work for non-posix env. Created a new logger class EnvLogger that uses env specific WritableFileWriter for IO operations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5491 Test Plan: make check Differential Revision: D15909002 Pulled By: ggaurav28 fbshipit-source-id: 13a8105176e8e42db0c59798d48cb6a0dbccc965	2019-07-09 16:27:22 -07:00

1 2 3 4 5 ...

1787 Commits