rocksdb

Author	SHA1	Message	Date
Peter Dillinger	ec11eff3bc	FilterPolicy consolidation, part 2/2 (#5966 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. This change is step 2 of 2: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters remain in util/bloom_impl.h, and util/bloom_test.cc remains where it is for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5966 Test Plan: make check Differential Revision: D18124930 Pulled By: pdillinger fbshipit-source-id: 823bc09025b3395f092ef46a46aa5ba92a914d84	2019-10-24 15:44:51 -07:00
Levi Tamasi	f7e7b34ebe	Propagate SST and blob file numbers through the EventListener interface (#5962 ) Summary: This patch adds a number of new information elements to the FlushJobInfo and CompactionJobInfo structures that are passed to EventListeners via the OnFlush{Begin, Completed} and OnCompaction{Begin, Completed} callbacks. Namely, for flushes, the file numbers of the new SST and the oldest blob file it references are propagated. For compactions, the new pieces of information are the file number, level, and the oldest blob file referenced by each compaction input and output file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5962 Test Plan: Extended the EventListener unit tests with logic that checks that these information elements are correctly propagated from the corresponding FileMetaData. Differential Revision: D18095568 Pulled By: ltamasi fbshipit-source-id: 6874359a6aadb53366b5fe87adcb2f9bd27a0a56	2019-10-24 14:44:15 -07:00
Peter Dillinger	dd19014a7a	FilterPolicy consolidation, part 1/2 (#5963 ) Summary: The parts that are used to implement FilterPolicy / NewBloomFilterPolicy and not used other than for the block-based table should be consolidated under table/block_based/filter_policy*. I don't foresee sharing these APIs with e.g. the Plain Table because they don't expose hashes for reuse in indexing. This change is step 1 of 2: (a) mv table/full_filter_bits_builder.h to table/block_based/filter_policy_internal.h which I expect to expand soon to internally reveal more implementation details for testing. (b) consolidate eventual contents of table/block_based/filter_policy.cc in util/bloom.cc, which has the most elaborate revision history (see step 2 ...) Step 2 soon to follow: mv util/bloom.cc table/block_based/filter_policy.cc This gets its own PR so that git has the best chance of following the rename for blame purposes. Note that low-level shared implementation details of Bloom filters are in util/bloom_impl.h. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5963 Test Plan: make check Differential Revision: D18121199 Pulled By: pdillinger fbshipit-source-id: 8f21732c3d8909777e3240e4ac3123d73140326a	2019-10-24 13:20:35 -07:00
Peter Dillinger	2837008525	Vary key size and alignment in filter_bench (#5933 ) Summary: The first version of filter_bench has selectable key size but that size does not vary throughout a test run. This artificially favors "branchy" hash functions like the existing BloomHash, MurmurHash1, probably because of optimal return for branch prediction. This change primarily varies those key sizes from -2 to +2 bytes vs. the average selected size. We also set the default key size at 24 to better reflect our best guess of typical key size. But steadily random key sizes may not be realistic either. So this change introduces a new filter_bench option: -vary_key_size_log2_interval=n where the same key size is used 2^n times and then changes to another size. I've set the default at 5 (32 times same size) as a compromise between deployments with rather consistent vs. rather variable key sizes. On my Skylake system, the performance boost to MurmurHash1 largely lies between n=10 and n=15. Also added -vary_key_alignment (bool, now default=true), though this doesn't currently seem to matter in hash functions under consideration. This change also does a "dry run" for each testing scenario, to improve the accuracy of those numbers, as there was more difference between scenarios than expected. Subtracting gross test run times from dry run times is now also embedded in the output, because these "net" times are generally the most useful. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5933 Differential Revision: D18121683 Pulled By: pdillinger fbshipit-source-id: 3c7efee1c5661a5fe43de555e786754ddf80dc1e	2019-10-24 13:08:30 -07:00
Dan Lambright	2509531123	Add test showing range tombstones can create excessively large compactions (#5956 ) Summary: For more information on the original problem see this [link](https://github.com/facebook/rocksdb/issues/3977). This change adds two new tests. They are identical other than one uses range tombstones and the other does not. Each test generates sub files at L2 which overlap with keys L3. The test that uses range tombstones generates a single file at L2. This single file will generate a very large range overlap that will in turn create excessively large compaction. 1: T001 - T005 2: 000 - 005 In contrast, the test that uses key ranges generates 3 files at L2. As a single file is compacted at a time, those 3 files will generate less work per compaction iteration. 1: 001 - 002 1: 003 - 004 1: 005 2: 000 - 005 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5956 Differential Revision: D18071631 Pulled By: dlambrig fbshipit-source-id: 12abae75fb3e0b022d228c6371698aa5e53385df	2019-10-24 11:08:44 -07:00
sdong	9f1e5a0b87	CfConsistencyStressTest to validate key consistent across CFs in TestGet() (#5863 ) Summary: Right now in CF consitency stres test's TestGet(), keys are just fetched without validation. With this change, in 1/2 the time, compare all the CFs share the same value with the same key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5863 Test Plan: Run "make crash_test_with_atomic_flush" and see tests pass. Hack the code to generate some inconsistency and observe the test fails as expected. Differential Revision: D17934206 fbshipit-source-id: 00ba1a130391f28785737b677f80f366fb83cced	2019-10-23 16:57:16 -07:00
Peter Dillinger	6a32e3b562	Remove unused BloomFilterPolicy::hash_func_ (#5961 ) Summary: This is an internal, file-local "feature" that is not used and potentially confusing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5961 Test Plan: make check Differential Revision: D18099018 Pulled By: pdillinger fbshipit-source-id: 7870627eeed09941d12538ec55d10d2e164fc716	2019-10-23 15:47:17 -07:00
Yanqin Jin	b4ebda7a39	Make buckifier python3 compatible (#5922 ) Summary: Make buckifier/buckify_rocksdb.py run on both Python 3 and 2 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5922 Test Plan: ``` $python3 buckifier/buckify_rocksdb.py $python3 buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}' $python2 buckifier/buckify_rocksdb.py $python2 buckifier/buckify_rocksdb.py '{"fake": {"extra_deps": [":test_dep", "//fakes/module:mock1"], "extra_compiler_flags": ["-DROCKSDB_LITE", "-Os"]}}' ``` Differential Revision: D17920611 Pulled By: riversand963 fbshipit-source-id: cc6e2f36013a88a710d96098f6ca18cbe85e3f62	2019-10-23 13:52:27 -07:00
Zhichao Cao	0933360644	Fix the potential memory leak in trace_replay (#5955 ) Summary: In the previous PR https://github.com/facebook/rocksdb/issues/5934 , in the while loop, if/else if is used without ending with else to free the object referenced by ra, it might cause potential memory leak (warning during compiling). Fix it by changing the last "else if" to "else". Pull Request resolved: https://github.com/facebook/rocksdb/pull/5955 Test Plan: pass make asan check, pass the USE_CLANG=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j64 analyze. Differential Revision: D18071612 Pulled By: zhichao-cao fbshipit-source-id: 51c00023d0c97c2921507254329aed55d56e1786	2019-10-22 16:39:46 -07:00
Yanqin Jin	c0abc6bbc1	Use FLAGS_env for certain operations in db_bench (#5943 ) Summary: Since we already parse env_uri from command line and creates custom Env accordingly, we should invoke the methods of such Envs instead of using Env::Default(). Test Plan (on devserver): ``` $make db_bench db_stress $./db_bench -benchmarks=fillseq ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5943 Differential Revision: D18018550 Pulled By: riversand963 fbshipit-source-id: 03b61329aaae0dfd914a0b902cc677f570f102e3	2019-10-22 11:43:21 -07:00
Yanqin Jin	925250f42f	Include db_stress_tool in rocksdb tools lib (#5950 ) Summary: include db_stress_tool in rocksdb tools lib Test Plan (on devserver): ``` $make db_stress $./db_stress $make all && make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950 Differential Revision: D18044399 Pulled By: riversand963 fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1	2019-10-21 19:40:35 -07:00
Vijay Nadimpalli	5677f4f775	Using clang for internal ubsan tests (#5952 ) Summary: Using clang for internal ubsan tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5952 Differential Revision: D18048810 Pulled By: vjnadimpalli fbshipit-source-id: ae55677a1928397b067e972d0ecb4ac1b7e2c8dc	2019-10-21 19:37:00 -07:00
Peter Dillinger	27a124571f	Fix memory leak on error opening PlainTable (#5951 ) Summary: Several error paths in opening of a plain table would leak memory. PR https://github.com/facebook/rocksdb/issues/5940 opened the leak to one more error path, which happens to have been (mistakenly) exercised by CuckooTableDBTest.AdaptiveTable. That test has been fixed, and the exercising of plain table error cases (more than before) has been added as BadOptions1 and BadOptions2 to PlainTableDBTest. This effectively moved the memory leak to plain_table_db_test. Also here is a cheap fix for the memory leak, without (yet?) changing the signature of ReadTableProperties. This fixes ASAN on unit tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5951 Test Plan: make COMPILE_WITH_ASAN=1 check Differential Revision: D18051940 Pulled By: pdillinger fbshipit-source-id: e2952930c09a2b46c4f1ff09818c5090426929de	2019-10-21 16:53:06 -07:00
Zhichao Cao	7245fb5f63	Fix the potential memory leak of ReplayMultiThread (#5949 ) Summary: The pointer ra needs to be freed the status s returns not OK. In the previous PR https://github.com/facebook/rocksdb/issues/5934 , the ra is not freed which might cause potential memory leak. Fix this issue by moving the clarification of ra inside the while loop and freeing it as desired. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5949 Test Plan: pass make asan check. Differential Revision: D18045726 Pulled By: zhichao-cao fbshipit-source-id: d5445b7b832c8bb1dafe008bafea7bfe9eb0b1ce	2019-10-21 15:05:01 -07:00
Vijay Nadimpalli	2ce6aa5f39	Making platform 007 (gcc 7) default in build_detect_platform.sh (#5947 ) Summary: Making platform 007 (gcc 7) default in build_detect_platform.sh. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5947 Differential Revision: D18038837 Pulled By: vjnadimpalli fbshipit-source-id: 9ac2ddaa93bf328a416faec028970e039886378e	2019-10-21 12:09:29 -07:00
sdong	a0cd920026	LevelIterator to avoid gap after prefix bloom filters out a file (#5861 ) Summary: Right now, when LevelIterator::Seek() is called, when a file is filtered out by prefix bloom filter, the position is put to the beginning of the next file. This is a confusing internal interface because many keys in the levels are skipped. Avoid this behavior by checking the key of the next file against the seek key, and invalidate the whole iterator if the prefix doesn't match. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5861 Test Plan: Add a new unit test to validate the behavior; run all exsiting tests; run crash_test Differential Revision: D17918213 fbshipit-source-id: f06b47d937c7cc8919001f18dcc3af5b28c9cdac	2019-10-21 11:40:57 -07:00
sdong	30e2dc02f0	Fix VerifyChecksum readahead with mmap mode (#5945 ) Summary: A recent change introduced readahead inside VerifyChecksum(). However it is not compatible with mmap mode and generated wrong checksum verification failure. Fix it by not enabling readahead in mmap mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5945 Test Plan: Add a unit test that used to fail. Differential Revision: D18021443 fbshipit-source-id: 6f2eb600f81b26edb02222563a4006869d576bff	2019-10-21 11:38:30 -07:00
sdong	1a21afa789	Fix some dependency paths (#5946 ) Summary: Some dependency path is not correct so that ASAN cannot run with CLANG. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5946 Test Plan: Run ASAN with CLANG Differential Revision: D18040933 fbshipit-source-id: 1d82be9d350485cf1df1c792dad765188958641f	2019-10-21 10:41:47 -07:00
Levi Tamasi	29ccf2075c	Store the filter bits reader alongside the filter block contents (#5936 ) Summary: Amongst other things, PR https://github.com/facebook/rocksdb/issues/5504 refactored the filter block readers so that only the filter block contents are stored in the block cache (as opposed to the earlier design where the cache stored the filter block reader itself, leading to potentially dangling pointers and concurrency bugs). However, this change introduced a performance hit since with the new code, the metadata fields are re-parsed upon every access. This patch reunites the block contents with the filter bits reader to eliminate this overhead; since this is still a self-contained pure data object, it is safe to store it in the cache. (Note: this is similar to how the zstd digest is handled.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5936 Test Plan: make asan_check filter_bench results for the old code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.7153 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.4258 Single filter ns/op: 42.5974 Random filter ns/op: 217.861 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.4217 Single filter ns/op: 50.9855 Random filter ns/op: 219.167 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5172 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 32.3556 Single filter ns/op: 83.2239 Random filter ns/op: 370.676 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.2265 Single filter ns/op: 93.5651 Random filter ns/op: 408.393 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` With the new code: ``` $ ./filter_bench -quick WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 25.4285 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 31.0594 Single filter ns/op: 43.8974 Random filter ns/op: 226.075 ---------------------------- Outside queries... Dry run (25d) ns/op: 31.0295 Single filter ns/op: 50.3824 Random filter ns/op: 226.805 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -quick -use_full_block_reader WARNING: Assertions are enabled; benchmarks unnecessarily slow Building... Build avg ns/key: 26.5308 Number of filters: 16669 Total memory (MB): 200.009 Bits/key actual: 10.0647 ---------------------------- Inside queries... Dry run (46b) ns/op: 33.2968 Single filter ns/op: 58.6163 Random filter ns/op: 291.434 ---------------------------- Outside queries... Dry run (25d) ns/op: 32.1839 Single filter ns/op: 66.9039 Random filter ns/op: 292.828 Average FP rate %: 1.13993 ---------------------------- Done. (For more info, run with -legend or -help.) ``` Differential Revision: D17991712 Pulled By: ltamasi fbshipit-source-id: 7ea205550217bfaaa1d5158ebd658e5832e60f29	2019-10-18 19:32:59 -07:00
Yanqin Jin	c53db172a1	Fix TestIterate for HashSkipList in db_stress (#5942 ) Summary: Since SeekForPrev (used by Prev) is not supported by HashSkipList when prefix is used, we disable it when stress testing HashSkipList. - Change the default memtablerep to skip list. - Avoid Prev() when memtablerep is HashSkipList and prefix is used. Test Plan (on devserver): ``` $make db_stress $./db_stress -ops_per_thread=10000 -reopen=1 -destroy_db_initially=true -column_families=1 -threads=1 -column_families=1 -memtablerep=prefix_hash $# or simply $./db_stress $./db_stress -memtablerep=prefix_hash ``` Results must print "Verification successful". Pull Request resolved: https://github.com/facebook/rocksdb/pull/5942 Differential Revision: D18017062 Pulled By: riversand963 fbshipit-source-id: af867e59aa9e6f533143c984d7d529febf232fd7	2019-10-18 15:49:12 -07:00
Peter Dillinger	5f8f2fda0e	Refactor / clean up / optimize FullFilterBitsReader (#5941 ) Summary: FullFilterBitsReader, after creating in BloomFilterPolicy, was responsible for decoding metadata bits. This meant that FullFilterBitsReader::MayMatch had some metadata checks in order to implement "always true" or "always false" functionality in the case of inconsistent or trivial metadata. This made for ugly mixing-of-concerns code and probably had some runtime cost. It also didn't really support plugging in alternative filter implementations with extensions to the existing metadata schema. BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible for decoding filter metadata bits and constructing appropriate instances deriving from FilterBitsReader. "Always false" and "always true" derived classes allow FullFilterBitsReader not to be concerned with handling of trivial or inconsistent metadata. This also makes for easy expansion to alternative filter implementations in new, alternative derived classes. This change makes calls to FilterBitsReader::MayMatch necessarily virtual because there's now more than one built-in implementation. Compared with the previous implementation's extra 'if' checks in MayMatch, there's no consistent performance difference, measured by (an older revision of) filter_bench (differences here seem to be within noise): Inside queries... - Dry run (407) ns/op: 35.9996 + Dry run (407) ns/op: 35.2034 - Single filter ns/op: 47.5483 + Single filter ns/op: 47.4034 - Batched, prepared ns/op: 43.1559 + Batched, prepared ns/op: 42.2923 ... - Random filter ns/op: 150.697 + Random filter ns/op: 149.403 ---------------------------- Outside queries... - Dry run (980) ns/op: 34.6114 + Dry run (980) ns/op: 34.0405 - Single filter ns/op: 56.8326 + Single filter ns/op: 55.8414 - Batched, prepared ns/op: 48.2346 + Batched, prepared ns/op: 47.5667 - Random filter ns/op: 155.377 + Random filter ns/op: 153.942 Average FP rate %: 1.1386 Also, the FullFilterBitsReader ctor was responsible for a surprising amount of CPU in production, due in part to inefficient determination of the CACHE_LINE_SIZE used to construct the filter being read. The overwhelming common case (same as my CACHE_LINE_SIZE) is now substantially optimized, as shown with filter_bench with -new_reader_every=1 (old option - see below) (repeatable result): Inside queries... - Dry run (453) ns/op: 118.799 + Dry run (453) ns/op: 105.869 - Single filter ns/op: 82.5831 + Single filter ns/op: 74.2509 ... - Random filter ns/op: 224.936 + Random filter ns/op: 194.833 ---------------------------- Outside queries... - Dry run (aa1) ns/op: 118.503 + Dry run (aa1) ns/op: 104.925 - Single filter ns/op: 90.3023 + Single filter ns/op: 83.425 ... - Random filter ns/op: 220.455 + Random filter ns/op: 175.7 Average FP rate %: 1.13886 However PR#5936 has/will reclaim most of this cost. After that PR, the optimization of this code path is likely negligible, but nonetheless it's clear we aren't making performance any worse. Also fixed inadequate check of consistency between filter data size and num_lines. (Unit test updated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941 Test Plan: previously added unit tests FullBloomTest.CorruptFilters and FullBloomTest.RawSchema Differential Revision: D18018353 Pulled By: pdillinger fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c	2019-10-18 14:50:52 -07:00
Peter Dillinger	fe464bca5c	Fix PlainTableReader not to crash sst_dump (#5940 ) Summary: Plain table SSTs could crash sst_dump because of a bug in PlainTableReader that can leave table_properties_ as null. Even if it was intended not to keep the table properties in some cases, they were leaked on the offending code path. Steps to reproduce: $ db_bench --benchmarks=fillrandom --num=2000000 --use_plain_table --prefix-size=12 $ sst_dump --file=0000xx.sst --show_properties from [] to [] Process /dev/shm/dbbench/000014.sst Sst file format: plain table Raw user collected properties ------------------------------ Segmentation fault (core dumped) Also added missing unit testing of plain table full_scan_mode, and an assertion in NewIterator to check for regression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5940 Test Plan: new unit test, manual, make check Differential Revision: D18018145 Pulled By: pdillinger fbshipit-source-id: 4310c755e824c4cd6f3f86a3abc20dfa417c5e07	2019-10-18 14:44:42 -07:00
Zhichao Cao	526e3b9763	Enable trace_replay with multi-threads (#5934 ) Summary: In the current trace replay, all the queries are serialized and called by single threads. It may not simulate the original application query situations closely. The multi-threads replay is implemented in this PR. Users can set the number of threads to replay the trace. The queries generated according to the trace records are scheduled in the thread pool job queue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5934 Test Plan: test with make check and real trace replay. Differential Revision: D17998098 Pulled By: zhichao-cao fbshipit-source-id: 87eecf6f7c17a9dc9d7ab29dd2af74f6f60212c8	2019-10-18 14:13:50 -07:00
Levi Tamasi	69bd8a2859	Update HISTORY.md with recent BlobDB adjacent changes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5939 Differential Revision: D18009096 Pulled By: ltamasi fbshipit-source-id: 032a48a302f9da38aecf4055b5a8d4e1dffd9dc7	2019-10-18 10:24:23 -07:00
Yanqin Jin	e60cc0925c	Expose db stress tests (#5937 ) Summary: expose db stress test by providing db_stress_tool.h in public header. This PR does the following: - adds a new header, db_stress_tool.h, in include/rocksdb/ - renames db_stress.cc to db_stress_tool.cc - adds a db_stress.cc which simply invokes a test function. - update Makefile accordingly. Test Plan (dev server): ``` make db_stress ./db_stress ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937 Differential Revision: D17997647 Pulled By: riversand963 fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410	2019-10-18 09:46:44 -07:00
Levi Tamasi	fdc1cb43a6	Support decoding blob indexes in sst_dump (#5926 ) Summary: The patch adds a new command line parameter --decode_blob_index to sst_dump. If this switch is specified, sst_dump prints blob indexes in a human readable format, printing the blob file number, offset, size, and expiration (if applicable) for blob references, and the blob value (and expiration) for inlined blobs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5926 Test Plan: Used db_bench's BlobDB mode to generate SST files containing blob references with and without expiration, as well as inlined blobs with and without expiration (note: the latter are stored as plain values), and confirmed sst_dump correctly prints all four types of records. Differential Revision: D17939077 Pulled By: ltamasi fbshipit-source-id: edc5f58fee94ba35f6699c6a042d5758f5b3963d	2019-10-17 19:36:54 -07:00
Yi Wu	1f9d7c0f54	Fix OnFlushCompleted fired before flush result write to MANIFEST (#5908 ) Summary: When there are concurrent flush job on the same CF, `OnFlushCompleted` can be called before the flush result being install to LSM. Fixing the issue by passing `FlushJobInfo` through `MemTable`, and the thread who commit the flush result can fetch the `FlushJobInfo` and fire `OnFlushCompleted` on behave of the thread actually writing the SST. Fix https://github.com/facebook/rocksdb/issues/5892 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5908 Test Plan: Add new test. The test will fail without the fix. Differential Revision: D17916144 Pulled By: riversand963 fbshipit-source-id: e18df67d9533b5baee52ae3605026cdeb05cbe10	2019-10-16 10:40:23 -07:00
Maysam Yabandeh	2c9e9f2a59	Update HISTORY for SeekForPrev bug fix (#5925 ) Summary: Update history for the bug fix in https://github.com/facebook/rocksdb/pull/5907 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5925 Differential Revision: D17952605 Pulled By: maysamyabandeh fbshipit-source-id: 609afcbb2e4087f9153822c4d11193a75a7b0e7a	2019-10-16 07:59:26 -07:00
Yanqin Jin	5ef27dea33	Fix clang analyzer error (#5924 ) Summary: Without this PR, clang analyzer complains. ``` $USE_CLANG=1 make analyze db/compaction/compaction_job_test.cc:161:20: warning: The left operand of '==' is a garbage value if (key.type == kTypeBlobIndex) { ~~~~~~~~ ^ 1 warning generated. ``` Test Plan (on devserver) ``` $USE_CLANG=1 make analyze ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5924 Differential Revision: D17923226 Pulled By: riversand963 fbshipit-source-id: 9d1eb769b5e0de7cb3d89dc90d1cfa895db7fdc8	2019-10-14 22:14:24 -07:00
Levi Tamasi	78b28d80b0	Support non-TTL Puts for BlobDB in db_bench (#5921 ) Summary: Currently, db_bench only supports PutWithTTL operations for BlobDB but not regular Puts. The patch adds support for regular (non-TTL) Puts and also changes the default for blob_db_max_ttl_range to zero, which corresponds to no TTL. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5921 Test Plan: make check ./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1 -duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 (issues Put operations with no TTL) ./db_bench -benchmarks=fillrandom -statistics -stats_interval_seconds=1 -duration=90 -num=500000 -use_blob_db=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_max_ttl_range=86400 (issues PutWithTTL operations with random TTLs in the [0, blob_db_max_ttl_range) interval, as before) Differential Revision: D17919798 Pulled By: ltamasi fbshipit-source-id: b946c3522b836b92b4c157ffbad24f92ba2b0a16	2019-10-14 17:49:20 -07:00
Peter Dillinger	93edd51c4a	bloom_test.cc: include <array> (#5920 ) Summary: Fix build failure on some platforms, reported in issue https://github.com/facebook/rocksdb/issues/5914 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5920 Test Plan: make bloom_test && ./bloom_test Differential Revision: D17918328 Pulled By: pdillinger fbshipit-source-id: b822004d4442de0171db2aeff433677783f7b94e	2019-10-14 15:38:31 -07:00
Levi Tamasi	5f025ea832	BlobDB GC: add SST <-> oldest blob file referenced mapping (#5903 ) Summary: This is groundwork for adding garbage collection support to BlobDB. The patch adds logic that keeps track of the oldest blob file referred to by each SST file. The oldest blob file is identified during flush/ compaction (similarly to how the range of keys covered by the SST is identified), and persisted in the manifest as a custom field of the new file edit record. Blob indexes with TTL are ignored for the purposes of identifying the oldest blob file (since such blob files are cleaned up by the TTL logic in BlobDB). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5903 Test Plan: Added new unit tests; also ran db_bench in BlobDB mode, inspected the manifest using ldb, and confirmed (by scanning the SST files using sst_dump) that the value of the oldest blob file number field matches the contents of the file for each SST. Differential Revision: D17859997 Pulled By: ltamasi fbshipit-source-id: 21662c137c6259a6af70446faaf3a9912c550e90	2019-10-14 15:21:01 -07:00
Levi Tamasi	a59dc843a4	Move blob_index.h to db/ (#5919 ) Summary: Extracted from PR https://github.com/facebook/rocksdb/issues/5903 for technical reasons. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5919 Test Plan: make check Differential Revision: D17910132 Pulled By: ltamasi fbshipit-source-id: 6ecbb8d6e84b2a1d1f28575ad48ac3cc65833eb5	2019-10-14 12:54:05 -07:00
Yanqin Jin	231fffd07c	Add Env::SanitizeEnvOptions (#5885 ) Summary: Add Env::SanitizeEnvOptions to allow underlying environments properly configure env options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5885 Test Plan: ``` make check ``` Differential Revision: D17910327 Pulled By: riversand963 fbshipit-source-id: 86a1ac616e485742c35c4a9cc9f1227c529fc00f	2019-10-14 12:25:00 -07:00
Maysam Yabandeh	a6e615a7ba	Enable partitioned index/filter in stress tests (#5918 ) Summary: This is the 3rd attempt after the revert of https://github.com/facebook/rocksdb/issues/4020 and https://github.com/facebook/rocksdb/issues/5895 The last bug is fixed https://github.com/facebook/rocksdb/pull/5907 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5918 Test Plan: ``` make -j32 crash_test ``` Differential Revision: D17909489 Pulled By: maysamyabandeh fbshipit-source-id: 7dfb8cf998c2d295c86465dd21734593d277887e	2019-10-14 10:35:18 -07:00
Yanqin Jin	6febfd8451	OnTableFileCreationCompleted use "(nil)" for empty file during flush (#5905 ) Summary: Compaction can call OnTableFileCreationCompleted(). If file is empty, "(nil)" is used as the file name. Do the same for flush. Test plan (dev server): ``` make all make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5905 Differential Revision: D17883285 Pulled By: riversand963 fbshipit-source-id: 6565884adbb00e8023d88b17dfb3b6eb92220b59	2019-10-14 09:54:10 -07:00
Maysam Yabandeh	4e729f9095	Fix SeekForPrev bug with Partitioned Filters and Prefix (#5907 ) Summary: Partition Filters make use of a top-level index to find the partition that might have the bloom hash of the key. The index is with internal key format (before format version 3). Each partition contains the i) blooms of the keys in that range ii) bloom of prefixes of keys in that range, iii) the bloom of the prefix of the last key in the previous partition. When ::SeekForPrev(key), we first perform a prefix bloom test on the SST file. The partition however is identified using the full internal key, rather than the prefix key. The reason is to be compatible with the internal key format of the top-level index. This creates a corner case. Example: - SST k, Partition N: P1K1, P1K2 - SST k, top-level index: P1K2 - SST k+1, Partition 1: P2K1, P3K1 - SST k+1 top-level index: P3K1 When SeekForPrev(P1K3), it should point us to P1K2. However SST k top-level index would reject P1K3 since it is out of range. One possible fix would be to search with the prefix P1 (instead of full internal key P1K3) however the details of properly comparing prefix with full internal key might get complicated. The fix we apply in this PR is to look into the last partition anyway even if the key is out of range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5907 Differential Revision: D17889918 Pulled By: maysamyabandeh fbshipit-source-id: 169fd7b3c71dbc08808eae5a8340611ebe5bdc1e	2019-10-11 20:30:00 -07:00
Andrew Kryczka	b00761eea6	Fix block cache ID uniqueness for Windows builds (#5844 ) Summary: Since we do not evict a file's blocks from block cache before that file is deleted, we require a file's cache ID prefix is both unique and non-reusable. However, the Windows functionality we were relying on only guaranteed uniqueness. That meant a newly created file could be assigned the same cache ID prefix as a deleted file. If the newly created file had block offsets matching the deleted file, full cache keys could be exactly the same, resulting in obsolete data blocks returned from cache when trying to read from the new file. We noticed this when running on FAT32 where compaction was writing out of order keys due to reading obsolete blocks from its input files. The functionality is documented as behaving the same on NTFS, although I wasn't able to repro it there. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844 Test Plan: we had a reliable repro of out-of-order keys on FAT32 that was fixed by this change Differential Revision: D17752442 fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00	2019-10-11 18:19:31 -07:00
Yanqin Jin	bc8b05cb77	Revert "Enable partitioned index/filter in stress tests (#5895 )" (#5904 ) Summary: This reverts commit `2f4e288143`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904 Differential Revision: D17871282 Pulled By: riversand963 fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e	2019-10-10 19:19:39 -07:00
Yanqin Jin	ddb62d1f29	Remove a webhook due to potential security concern (#5902 ) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5902 Differential Revision: D17858150 Pulled By: riversand963 fbshipit-source-id: db2cd8a756faf7b9751b2651a22e1b29ca9fecec	2019-10-10 18:05:16 -07:00
Adam Retter	1e9c8d42a0	Fix the rocksjava release Vagrant build on CentOS (#5901 ) Summary: Closes https://github.com/facebook/rocksdb/issues/5873 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5901 Differential Revision: D17869585 fbshipit-source-id: 559472486f1d3ac80c0c7df6c421c4b612b9b7f9	2019-10-10 17:21:18 -07:00
Vijay Nadimpalli	4c49e38f15	MultiGet batching in memtable (#5818 ) Summary: RocksDB has a MultiGet() API that implements batched key lookup for higher performance (https://github.com/facebook/rocksdb/blob/master/include/rocksdb/db.h#L468). Currently, batching is implemented in BlockBasedTableReader::MultiGet() for SST file lookups. One of the ways it improves performance is by pipelining bloom filter lookups (by prefetching required cachelines for all the keys in the batch, and then doing the probe) and thus hiding the cache miss latency. The same concept can be extended to the memtable as well. This PR involves implementing a pipelined bloom filter lookup in DynamicBloom, and implementing MemTable::MultiGet() that can leverage it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5818 Test Plan: Existing tests Performance Test: Ran the below command which fills up the memtable and makes sure there are no flushes and then call multiget. Ran it on master and on the new change and see atleast 1% performance improvement across all the test runs I did. Sometimes the improvement was upto 5%. TEST_TMPDIR=/data/users/$USER/benchmarks/feature/ numactl -C 10 ./db_bench -benchmarks="fillseq,multireadrandom" -num=600000 -compression_type="none" -level_compaction_dynamic_level_bytes -write_buffer_size=200000000 -target_file_size_base=200000000 -max_bytes_for_level_base=16777216 -reads=90000 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4 -statistics -memtable_whole_key_filtering=true -memtable_bloom_size_ratio=10 Differential Revision: D17578869 Pulled By: vjnadimpalli fbshipit-source-id: 23dc651d9bf49db11d22375bf435708875a1f192	2019-10-10 09:39:39 -07:00
anand76	80ad996b35	Make the db_stress reopen loop in OperateDb() more robust (#5893 ) Summary: The loop in OperateDb() is getting quite complicated with the introduction of multiple key operations such as MultiGet and Reseeks. This is resulting in a number of corner cases that hangs db_stress due to synchronization problems during reopen (i.e when -reopen=<> option is specified). This PR makes it more robust by ensuring all db_stress threads vote to reopen the DB the exact same number of times. Most of the changes in this diff are due to indentation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5893 Test Plan: Run crash test Differential Revision: D17823827 Pulled By: anand1976 fbshipit-source-id: ec893829f611ac7cac4057c0d3d99f9ffb6a6dd9	2019-10-09 09:27:10 -07:00
katherine	5b123813f8	Remove deprecated RocksDBCommonHelper and cont_integration.sh (#5889 ) Summary: As titled. RocksDBCommonHelper contains references to legacy APIs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5889 Differential Revision: D17783179 fbshipit-source-id: dcde82a73a311bfa3300ad69189b3a32727134d1	2019-10-09 07:40:35 -07:00
Peter Dillinger	90e285efde	Fix some implicit conversions in filter_bench (#5894 ) Summary: Fixed some spots where converting size_t or uint_fast32_t to uint32_t. Wrapped mt19937 in a new Random32 class to avoid future such traps. NB: I tried using Random32::Uniform (std::uniform_int_distribution) in filter_bench instead of fastrange, but that more than doubled the dry run time! So I added fastrange as Random32::Uniformish. ;) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5894 Test Plan: USE_CLANG=1 build, and manual re-run filter_bench Differential Revision: D17825131 Pulled By: pdillinger fbshipit-source-id: 68feee333b5f8193c084ded760e3d6679b405ecd	2019-10-08 19:22:07 -07:00
Yanqin Jin	167cdc9f17	Support custom env in sst_dump (#5845 ) Summary: This PR allows for the creation of custom env when using sst_dump. If the user does not set options.env or set options.env to nullptr, then sst_dump will automatically try to create a custom env depending on the path to the sst file or db directory. In order to use this feature, the user must call ObjectRegistry::Register() beforehand. Test Plan (on devserver): ``` $make all && make check ``` All tests must pass to ensure this change does not break anything. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5845 Differential Revision: D17678038 Pulled By: riversand963 fbshipit-source-id: 58ecb4b3f75246d52b07c4c924a63ee61c1ee626	2019-10-08 19:19:12 -07:00
Maysam Yabandeh	2f4e288143	Enable partitioned index/filter in stress tests (#5895 ) Summary: This is the 2nd attempt after the revert of https://github.com/facebook/rocksdb/pull/4020 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5895 Test Plan: ``` ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ``` Differential Revision: D17822137 Pulled By: maysamyabandeh fbshipit-source-id: 3d148c0d8cc129080410ff859c04b544223c8ea3	2019-10-08 16:50:21 -07:00
Tomas Kolda	e3a93c9ee1	Fix crash when background task fails (#5879 ) Summary: Fixing crash. Full story in issue: https://github.com/facebook/rocksdb/issues/5878 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5879 Differential Revision: D17812299 Pulled By: anand1976 fbshipit-source-id: 14e5a4fc502ade974583da9692d0ed6e5014613a	2019-10-08 14:20:01 -07:00
Peter Dillinger	46ca51d430	filter_bench - a prelim tool for SST filter benchmarking (#5825 ) Summary: Example: using the tool before and after PR https://github.com/facebook/rocksdb/issues/5784 shows that the refactoring, presumed performance-neutral, actually sped up SST filters by about 3% to 8% (repeatable result): Before: - Dry run ns/op: 22.4725 - Single filter ns/op: 51.1078 - Random filter ns/op: 120.133 After: + Dry run ns/op: 22.2301 + Single filter run ns/op: 47.4313 + Random filter ns/op: 115.9 Only tests filters for the block-based table (full filters and partitioned filters - same implementation; not block-based filters), which seems to be the recommended format/implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5825 Differential Revision: D17804987 Pulled By: pdillinger fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5	2019-10-07 20:10:53 -07:00
Yanqin Jin	457bcfde02	Let TestEnv and FaultInjectEnv use Env of choice (#5886 ) Summary: Instead of hard coding Env::Default in TestEnv and a few other places, use the DBTestBase::env_ that has been deduced from the constructor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5886 Test Plan: ``` make check ``` Differential Revision: D17773029 Pulled By: riversand963 fbshipit-source-id: 7ce4e5175a487e9d281ea2c3aae3c41bffd44629	2019-10-07 17:48:50 -07:00

1 2 3 4 5 ...

8473 Commits