rocksdb

Author	SHA1	Message	Date
Peter Dillinger	8a72bb14bd	More fixes to auto-GarbageCollect in BackupEngine (#6023 ) Summary: Production: * Fixes GarbageCollect (and auto-GC triggered by PurgeOldBackups, DeleteBackup, or CreateNewBackup) to clean up backup directory independent of current settings (except max_valid_backups_to_open; see issue https://github.com/facebook/rocksdb/issues/4997) and prior settings used with same backup directory. * Fixes GarbageCollect (and auto-GC) not to attempt to remove "." and ".." entries from directories. * Clarifies contract with users in modifying BackupEngine operations. In short, leftovers from any incomplete operation are cleaned up by any subsequent call to that same kind of operation (PurgeOldBackups and DeleteBackup considered the same kind of operation). GarbageCollect is available to clean up after all kinds. (NB: right now PurgeOldBackups and DeleteBackup will clean up after incomplete CreateNewBackup, but we aren't promising to continue that behavior.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6023 Test Plan: * Refactors open parameters to use an option enum, for readability, etc. (Also fixes an unused parameter bug in the redundant OpenDBAndBackupEngineShareWithChecksum.) * Fixes an apparent bug in ShareTableFilesWithChecksumsTransition in which old backup data was destroyed in the transition to be tested. That test is now augmented to ensure GarbageCollect (or auto-GC) does not remove shared files when BackupEngine is opened with share_table_files=false. * Augments DeleteTmpFiles test to ensure that CreateNewBackup does auto-GC when an incompletely created backup is detected. Differential Revision: D18453559 Pulled By: pdillinger fbshipit-source-id: 5e54e7b08d711b161bc9c656181012b69a8feac4	2019-11-15 12:15:12 -08:00
Peter Dillinger	a6d418384d	Auto-GarbageCollect on PurgeOldBackups and DeleteBackup (#6015 ) Summary: Only if there is a crash, power failure, or I/O error in DeleteBackup, shared or private files from the backup might be left behind that are not cleaned up by PurgeOldBackups or DeleteBackup-- only by GarbageCollect. This makes the BackupEngine API "leaky by default." Even if it means a modest performance hit, I think we should make Delete and Purge do as they say, with ongoing best effort: i.e. future calls will attempt to finish any incomplete work from earlier calls. This change does that by having DeleteBackup and PurgeOldBackups do a GarbageCollect, unless (to minimize performance hit) this BackupEngine has already done a GarbageCollect and there have been no deletion-related I/O errors in that GarbageCollect or since then. Rejected alternative 1: remove meta file last instead of first. This would in theory turn partially deleted backups into corrupted backups, but code changes would be needed to allow the missing files and consider it acceptably corrupt, rather than failing to open the BackupEngine. This might be a reasonable choice, but I mostly rejected it because it doesn't solve the legacy problem of cleaning up existing lingering files. Rejected alternative 2: use a deletion marker file. If deletion started with creating a file that marks a backup as flagged for deletion, then we could reliably detect partially deleted backups and efficiently finish removing them. In addition to not solving the legacy problem, this could be precarious if there's a disk full situation, and we try to create a new file in order to delete some files. Ugh. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6015 Test Plan: Updated unit tests Differential Revision: D18401333 Pulled By: pdillinger fbshipit-source-id: 12944e372ce6809f3f5a4c416c3b321a8927d925	2019-11-15 12:15:12 -08:00
anand76	cb1dc29655	Fix a buffer overrun problem in BlockBasedTable::MultiGet (#6014 ) Summary: The calculation in BlockBasedTable::MultiGet for the required buffer length for reading in compressed blocks is incorrect. It needs to take the 5-byte block trailer into account. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6014 Test Plan: Add a unit test DBBasicTest.MultiGetBufferOverrun that fails in asan_check before the fix, and passes after. Differential Revision: D18412753 Pulled By: anand1976 fbshipit-source-id: 754dfb66be1d5f161a7efdf87be872198c7e3b72	2019-11-12 10:57:32 -08:00
anand76	98e5189fb0	Fix MultiGet crash when no_block_cache is set (#5991 ) Summary: This PR fixes https://github.com/facebook/rocksdb/issues/5975. In ```BlockBasedTable::RetrieveMultipleBlocks()```, we were calling ```MaybeReadBlocksAndLoadToCache()```, which is a no-op if neither uncompressed nor compressed block cache are configured. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5991 Test Plan: 1. Add unit tests that fail with the old code and pass with the new 2. make check and asan_check Cc spetrunia Differential Revision: D18272744 Pulled By: anand1976 fbshipit-source-id: e62fa6090d1a6adf84fcd51dfd6859b03c6aebfe	2019-11-12 10:56:03 -08:00
Vijay Nadimpalli	3353b7141d	Making platform 007 (gcc 7) default in build_detect_platform.sh (#5947 ) Summary: Making platform 007 (gcc 7) default in build_detect_platform.sh. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5947 Differential Revision: D18038837 Pulled By: vjnadimpalli fbshipit-source-id: 9ac2ddaa93bf328a416faec028970e039886378e	2019-10-30 10:32:53 -07:00
sdong	d72cceb443	Fix VerifyChecksum readahead with mmap mode (#5945 ) Summary: A recent change introduced readahead inside VerifyChecksum(). However it is not compatible with mmap mode and generated wrong checksum verification failure. Fix it by not enabling readahead in mmap mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5945 Test Plan: Add a unit test that used to fail. Differential Revision: D18021443 fbshipit-source-id: 6f2eb600f81b26edb02222563a4006869d576bff	2019-10-22 11:42:36 -07:00
myabandeh	1d5083a007	Bump up the version to 6.5.1	2019-10-16 10:55:02 -07:00
Maysam Yabandeh	6ea6aa77cd	Update HISTORY for SeekForPrev bug fix (#5925 ) Summary: Update history for the bug fix in https://github.com/facebook/rocksdb/pull/5907 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5925 Differential Revision: D17952605 Pulled By: maysamyabandeh fbshipit-source-id: 609afcbb2e4087f9153822c4d11193a75a7b0e7a	2019-10-16 10:52:59 -07:00
Maysam Yabandeh	4229f6df50	Fix SeekForPrev bug with Partitioned Filters and Prefix (#5907 ) Summary: Partition Filters make use of a top-level index to find the partition that might have the bloom hash of the key. The index is with internal key format (before format version 3). Each partition contains the i) blooms of the keys in that range ii) bloom of prefixes of keys in that range, iii) the bloom of the prefix of the last key in the previous partition. When ::SeekForPrev(key), we first perform a prefix bloom test on the SST file. The partition however is identified using the full internal key, rather than the prefix key. The reason is to be compatible with the internal key format of the top-level index. This creates a corner case. Example: - SST k, Partition N: P1K1, P1K2 - SST k, top-level index: P1K2 - SST k+1, Partition 1: P2K1, P3K1 - SST k+1 top-level index: P3K1 When SeekForPrev(P1K3), it should point us to P1K2. However SST k top-level index would reject P1K3 since it is out of range. One possible fix would be to search with the prefix P1 (instead of full internal key P1K3) however the details of properly comparing prefix with full internal key might get complicated. The fix we apply in this PR is to look into the last partition anyway even if the key is out of range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5907 Differential Revision: D17889918 Pulled By: maysamyabandeh fbshipit-source-id: 169fd7b3c71dbc08808eae5a8340611ebe5bdc1e	2019-10-16 10:51:46 -07:00
anand76	73a35c6e17	Update HISTORY.md with a bug fix	2019-10-07 16:54:33 -07:00
anand76	fc53ac86f6	Fix data block upper bound checking for iterator reseek case (#5883 ) Summary: When an iterator reseek happens with the user specifying a new iterate_upper_bound in ReadOptions, and the new seek position is at the end of the same data block, the Seek() ends up using a stale value of data_block_within_upper_bound_ and may return incorrect results. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5883 Test Plan: Added a new test case DBIteratorTest.IterReseekNewUpperBound. Verified that it failed due to the assertion failure without the fix, and passes with the fix. Differential Revision: D17752740 Pulled By: anand1976 fbshipit-source-id: f9b635ff5d6aeb0e1bef102cf8b2f900efd378e3	2019-10-07 16:39:18 -07:00
sdong	2060a008b0	Fix a previous revert	2019-10-01 16:58:47 -07:00
sdong	89865776b7	Revert "Merging iterator to avoid child iterator reseek for some cases (#5286 )" (#5871 ) Summary: This reverts commit 9fad3e21eb90d215b6719097baba417bc1eeca3c. Iterator verification in stress tests sometimes fails with the assertion table/block_based/block_based_table_reader.cc:2973: void rocksdb::BlockBasedTableIterator<TBlockIter, TValue>::FindBlockForward() [with TBlockIter = rocksdb::DataBlockIter; TValue = rocksdb::Slice]: Assertion `!next_block_is_out_of_bound || user_comparator_.Compare(*read_options_.iterate_upper_bound, index_iter_->user_key()) <= 0' failed. It is likely to be linked to https://github.com/facebook/rocksdb/pull/5286 together with https://github.com/facebook/rocksdb/pull/5468, as the former PR causes some child iterators' seeks to be avoided, so the upper bound condition fails to be updated there. Strictly speaking, the former PR was merged before the latter one, but the latter one feels like a more important improvement, so I chose to revert the former one for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5871 Differential Revision: D17689196 fbshipit-source-id: 4ded5be68f67bee2782d31a29cb72ea68f59dd8c	2019-10-01 14:41:58 -07:00
Fosco Marotto	749b35d019	Update history and version for 6.5 branch	2019-09-13 11:53:36 -07:00
Peter Dillinger	6a171724b7	Clean up + fix build scripts re: USE_SSE= and PORTABLE= (#5800 ) Summary: In preparing to utilize a new Intel instruction extension, I noticed problems with the existing build script in regard to the existing utilized extensions, either with USE_SSE or PORTABLE flags. * PORTABLE=0 was interpreted the same as PORTABLE=1. Now empty and 0 mean the same. (I guess you were not supposed to set PORTABLE= if you wanted non-portable--except that...) * The Facebook build script extensions would set PORTABLE=1 even if it's already set in a make var or environment. Now it does not override a non-empty setting, so use PORTABLE=0 for fully optimized build, overriding Facebook environment default. * Put in an explanation of the USE_SSE flag where it's used by build_detect_platform, and cleaned up some confusing/redundant associated logic. * If USE_SSE was set and expected intrinsics were not available, build_detect_platform would exit early but build would proceed with broken, incomplete configuration. Now warning is gracefully recovered. * If USE_SSE was set and expected intrinsics were not available, build would still try to use flags like -msse4.2 etc. which could lead to unexpected compilation failure or binary incompatibility. Now those flags are not used if the warning is issued. This should not break or change existing, valid build scripts. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5800 Test Plan: manual case testing Differential Revision: D17369543 Pulled By: pdillinger fbshipit-source-id: 4ee244911680ae71144d272c40aceea548e3ce88	2019-09-13 11:07:13 -07:00
Lingjing You	9ba88a1e5d	Update history.md for option memtable_insert_hint_per_batch (#5799 ) Summary: Update history.md for option memtable_insert_hint_per_batch Pull Request resolved: https://github.com/facebook/rocksdb/pull/5799 Differential Revision: D17369186 fbshipit-source-id: 71d82f9d99d9a52d1475d1b0153670957b6111e9	2019-09-13 10:51:32 -07:00
Ronak Sisodia	27f516acc8	Update HISTORY.md for option to make write group size configurable (#5798 ) Summary: Update HISTORY.md for option to make write group size configurable . Pull Request resolved: https://github.com/facebook/rocksdb/pull/5798 Differential Revision: D17369062 fbshipit-source-id: 390a3fa0b01675e91879486a729cf2cc7624d106	2019-09-13 10:43:09 -07:00
Peter Dillinger	aa2486b23c	Refactor some confusing logic in PlainTableReader Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5780 Test Plan: existing plain table unit test Differential Revision: D17368629 Pulled By: pdillinger fbshipit-source-id: f25409cdc2f39ebe8d5cbb599cf820270e6b5d26	2019-09-13 10:26:36 -07:00
Lingjing You	1a928c22a0	Add insert hints for each writebatch (#5728 ) Summary: Add insert hints for each writebatch so that they can be used in concurrent write, and add write option to enable it. Bench result (qps): `./db_bench --benchmarks=fillseq -allow_concurrent_memtable_write=true -num=4000000 -batch-size=1 -threads=1 -db=/data3/ylj/tmp -write_buffer_size=536870912 -num_column_families=4` master: | batch size \ thread num | 1 | 2 | 4 | 8 | | ----------------------- | ------- | ------- | ------- | ------- | | 1 | 387883 | 220790 | 308294 | 490998 | | 10 | 1397208 | 978911 | 1275684 | 1733395 | | 100 | 2045414 | 1589927 | 1798782 | 2681039 | | 1000 | 2228038 | 1698252 | 1839877 | 2863490 | fillseq with writebatch hint: | batch size \ thread num | 1 | 2 | 4 | 8 | | ----------------------- | ------- | ------- | ------- | ------- | | 1 | 286005 | 223570 | 300024 | 466981 | | 10 | 970374 | 813308 | 1399299 | 1753588 | | 100 | 1962768 | 1983023 | 2676577 | 3086426 | | 1000 | 2195853 | 2676782 | 3231048 | 3638143 | Pull Request resolved: https://github.com/facebook/rocksdb/pull/5728 Differential Revision: D17297240 fbshipit-source-id: b053590a6d77871f1ef2f911a7bd013b3899b26c	2019-09-12 17:15:18 -07:00
HouBingjian	a378a4c2ac	arm64 crc prefetch optimise (#5773 ) Summary: Prefetch data for the following block to avoid cache misses when calculating the CRC. I ran a performance test on a kunpeng-920 server (arm-v8, 64core@2.6GHz): ./db_bench --benchmarks=crc32c --block_size=500000000 before optimise : 587313.500 micros/op 1 ops/sec; 811.9 MB/s (500000000 per op) after optimise : 289248.500 micros/op 3 ops/sec; 1648.5 MB/s (500000000 per op) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5773 Differential Revision: D17347339 fbshipit-source-id: bfcd74f0f0eb4b322b959be68019ddcaae1e3341	2019-09-12 16:59:44 -07:00
Levi Tamasi	d35ffd569c	Temporarily disable hash index in stress tests (#5792 ) Summary: PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash tests, resulting in assertion failures in Block. This patch disables the hash index until we can pinpoint the root cause of these issues. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792 Test Plan: Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2 (binary search and partitioned index). Differential Revision: D17346777 Pulled By: ltamasi fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551	2019-09-12 12:11:34 -07:00
Adam Retter	e8c2e68b4e	Fix RocksDB bug in block_cache_trace_analyzer.cc on Windows (#5786 ) Summary: This is required to compile on Windows with Visual Studio 2015. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5786 Differential Revision: D17335994 fbshipit-source-id: 8f9568310bc6f697e312b5e24ad465e9084f0011	2019-09-11 18:36:41 -07:00
Ronak Sisodia	d05c0fe4d1	Option to make write group size configurable (#5759 ) Summary: The max batch size that we can write to the WAL is controlled statically. If the leader write is less than 128 KB, the batch size limit is the leader write size + 128 KB; otherwise the limit is 1 MB. Both values are statically defined. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5759 Differential Revision: D17329298 fbshipit-source-id: a3d910629d8d8ca84ea39ad89c2b2d284571ded5	2019-09-11 18:28:33 -07:00
Shylock Hg	9eb3e1f77d	Use delete to disable automatic generated methods. (#5009 ) Summary: Use delete to disable automatically generated methods instead of declaring them private, and group the constructors together for clarity. This modification causes an unused-field warning, so add the unused attribute to suppress it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5009 Differential Revision: D17288733 fbshipit-source-id: 8a767ce096f185f1db01bd28fc88fef1cdd921f3	2019-09-11 18:09:00 -07:00
Wilfried Goesgens	fcda80fc33	record the timestamp on first configure (#4799 ) Summary: Record the timestamp on first configure so that cmake does not regenerate it on subsequent builds, which caused rebuilds of the lib. This improves compile-time turnaround if you include rocksdb as a compilable library: in its current state, cmake regenerates the timestamp .cc file each time you build, and thus recompiles and relinks the rocksdb library even though nothing in the source actually changed. The original timestamp is recorded into `CMakeCache.txt` and will remain there until you flush this cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4799 Differential Revision: D17290040 fbshipit-source-id: 28357fef3422693c9c19e88fa2873c8db0f662ed	2019-09-11 18:00:02 -07:00
Andrew Kryczka	dd2a35f13f	Support partitioned index and filters in stress/crash tests (#4020 ) Summary: - In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test - When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020 Test Plan: currently this is blocked on fixing the bug that crash test caught: ``` $ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ... Verification failed for column family 0 key 937501: Value not found: NotFound: Crash-recovery verification failed :( ``` Differential Revision: D8508683 Pulled By: maysamyabandeh fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629	2019-09-11 14:13:38 -07:00
Andrew Kryczka	20dd828c01	Avoid clock_gettime on pre-10.12 macOS versions (#5570 ) Summary: On older macOS like 10.10 we saw the following compiler error: ``` /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/env/env_posix.cc:845:19: error: use of undeclared identifier 'CLOCK_THREAD_CPUTIME_ID' clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts); ^ ``` According to mac's `man clock_gettime`: "These functions first appeared in Mac OSX 10.12". So we should not try to compile it on earlier versions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5570 Test Plan: verified it compiles now on 10.10. Also did some investigation to ensure it does not cause regression on macOS 10.12+, although I do not have access to such an environment to really test. Differential Revision: D17322629 Pulled By: riversand963 fbshipit-source-id: e0a412223854f826b4d83e6d15c3739ff4620d7d	2019-09-11 14:07:25 -07:00
tongyingrui	c85c87a718	test size was wrong in 'fillbatch' benchmark (#5198 ) Summary: For the fillbatch benchmark, numEntries should be [num_] rather than [num_ / 1000], because numEntries is just the total number of entries we want to test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5198 Differential Revision: D17274664 Pulled By: anand1976 fbshipit-source-id: f96e952babdbac63fb99d14e1254d478a10437be	2019-09-11 12:04:44 -07:00
anand76	2becafdb43	Fix Appveyor build due to signed/unsigned comparison Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5788 Test Plan: Travis CI and Appveyor should complete successfully. Differential Revision: D17287422 Pulled By: anand1976 fbshipit-source-id: d9408b692f78be95d0088b29b33f6a8ff40ec97b	2019-09-10 14:34:37 -07:00
anand76	eb9026f09b	Add a db_bench benchmark to warm up the row cache Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5707 Differential Revision: D17242698 Pulled By: anand1976 fbshipit-source-id: 5d1bfda3c9e8f56176ae391cae6c91e6262016b8	2019-09-10 11:06:36 -07:00
jsteemann	4d945c57ac	do a bit less work in the normal case (#5695 ) Summary: i.e. if alive logfile is not being moved to archive while we are in GetSortedWalsOfType() Pull Request resolved: https://github.com/facebook/rocksdb/pull/5695 Differential Revision: D17279489 Pulled By: vjnadimpalli fbshipit-source-id: 02bcf920a75b812edba8b87c6079b4e6fd5e683c	2019-09-10 09:41:45 -07:00
Richard He	699e1b5ede	Added support for SstFileReader JNI interface (#5556 ) Summary: Feature request as per https://github.com/facebook/rocksdb/issues/5538 issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5556 Differential Revision: D17219008 fbshipit-source-id: e31f18dec318416eac9dea8213bab31da96e1f3a	2019-09-09 18:12:53 -07:00
Peter Dillinger	7af6ced14b	Fix block allocation bug in new DynamicBloom (#5783 ) Summary: Bug found by valgrind. New DynamicBloom wasn't allocating in block sizes. New assertion added that probes starting in final word would be in bounds. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5783 Test Plan: ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 valgrind --leak-check=full ./dynamic_bloom_test Differential Revision: D17270623 Pulled By: pdillinger fbshipit-source-id: 1e0407504b875133a771383cd488c70f91be2b87	2019-09-09 15:26:43 -07:00
Peter Dillinger	108c619acb	Add regression test for serialized Bloom filters (#5778 ) Summary: Check that we don't accidentally change the on-disk format of existing Bloom filter implementations, including for various CACHE_LINE_SIZE (by changing temporarily). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5778 Test Plan: thisisthetest Differential Revision: D17269630 Pulled By: pdillinger fbshipit-source-id: c77017662f010a77603b7d475892b1f0d5563d8b	2019-09-09 14:51:30 -07:00
Wilfried Goesgens	fbab9913e2	upgrade gtest 1.7.0 => 1.8.1 for json result writing Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332 Differential Revision: D17242232 fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64	2019-09-09 11:24:11 -07:00
sdong	adbc25a4c8	Rename InternalDBStatsType enum names (#5779 ) Summary: When building with clang 9, warning is reported for InternalDBStatsType type names shadowed the one for statistics. Rename them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5779 Test Plan: Build with clang 9 and see it passes. Differential Revision: D17239378 fbshipit-source-id: af28fb42066c738cd1b841f9fe21ab4671dafd18	2019-09-06 17:31:10 -07:00
houbingjian	cbfa729d37	cmakelist fix, add +crypto flag when use arm crc (#5750 ) Summary: The cmake list now adds the +crypto flag when building for an armv8 cpu. The function crc32c_arm64 uses HAVE_ARM64_CRYPTO to check whether it can enable arm-neon instructions: #ifdef HAVE_ARM64_CRYPTO /* Crc32c Parallel computation * Algorithm comes from Intel whitepaper: * crc-iscsi-polynomial-crc32-instruction-paper * * Input data is divided into three equal-sized blocks * Three parallel blocks (crc0, crc1, crc2) for 1024 Bytes * One Block: 42(BLK_LENGTH) * 8(step length: crc32c_u64) bytes */ However, the cmakelist does not currently check for or pass the crypto flag. I checked that the default Makefile has it: ifeq (,$(shell $(CXX) -fsyntax-only -march=armv8-a+crc -xc /dev/null 2>&1)) CXXFLAGS += -march=armv8-a+crc+crypto CFLAGS += -march=armv8-a+crc+crypto ARMCRC_SOURCE=1 endif Pull Request resolved: https://github.com/facebook/rocksdb/pull/5750 Differential Revision: D17242027 fbshipit-source-id: 443c9b89755b4bc34e265205ab922db1b2e14bde	2019-09-06 17:03:21 -07:00
Maysam Yabandeh	78b8cfc7ec	WriteUnPrepared: Split ReadYourOwnWriteStress to three (#5776 ) Summary: ReadYourOwnWriteStress occasionally times out on some platforms. The patch splits it to three. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5776 Differential Revision: D17231743 Pulled By: maysamyabandeh fbshipit-source-id: d42eeaf22f61a48d50f9c404d98b1081ae8dac94	2019-09-06 15:25:26 -07:00
Manuel Ung	2208cc0196	Fix build break in TransactionBaseImpl::TrackKey (#5771 ) Summary: Fix build broken in https://github.com/facebook/rocksdb/pull/5696. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5771 Differential Revision: D17217665 Pulled By: lth fbshipit-source-id: 7aa84a2a9b4feb7a3ab1cab174e09276430fe042	2019-09-06 10:18:04 -07:00
奏之章	533e47709c	Fix WriteBatchWithIndex with MergeOperator bug (#5577 ) Summary: ``` TEST_F(WriteBatchWithIndexTest, TestGetFromBatchAndDBMerge3) { DB* db; Options options; options.create_if_missing = true; std::string dbname = test::PerThreadDBPath("write_batch_with_index_test"); options.merge_operator = MergeOperators::CreateFromStringId("stringappend"); DestroyDB(dbname, options); Status s = DB::Open(options, dbname, &db); assert(s.ok()); ReadOptions read_options; WriteOptions write_options; FlushOptions flush_options; std::string value; WriteBatchWithIndex batch; ASSERT_OK(db->Put(write_options, "A", "1")); ASSERT_OK(db->Flush(flush_options, db->DefaultColumnFamily())); ASSERT_OK(batch.Merge("A", "2")); ASSERT_OK(batch.GetFromBatchAndDB(db, read_options, "A", &value)); ASSERT_EQ(value, "1,2"); delete db; DestroyDB(dbname, options); } ``` Fix ASSERT in batch.GetFromBatchAndDB() Pull Request resolved: https://github.com/facebook/rocksdb/pull/5577 Differential Revision: D16379847 fbshipit-source-id: b1320e24ec8e71350c525083cc0a16180a63f752	2019-09-05 17:52:14 -07:00
Richard He	cfc20019d1	Fixed FALLOC_FL_KEEP_SIZE undefined (#5614 ) Summary: Fix `error: ‘FALLOC_FL_KEEP_SIZE’` undeclared error in `io_posix.cc` during Vagrant build in CentOS as per issue https://github.com/facebook/rocksdb/issues/5599 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5614 Differential Revision: D17217960 fbshipit-source-id: ef736c51b16833107fd9ccc7917ed1def2a8d02c	2019-09-05 17:37:21 -07:00
Jeffrey Xiao	eae9f040eb	Initialized pinned_pos_ and pinned_seq_pos_ in FragmentedRangeTombstoneIterator (#5720 ) Summary: These uninitialized member variables can cause a key to not be pinned when it should be, causing erroneous behavior. For example ingesting a file with range deletion tombstones will yield an "external file have corrupted keys" on a Mac. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5720 Differential Revision: D17217673 fbshipit-source-id: cd7df7ce3ad9cf69c841c4d3dc6fd144eff9e212	2019-09-05 17:30:29 -07:00
Yi Wu	83b991922e	Fix EncryptedEnv assert (#5735 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/5734. From reading the code, the assert doesn't quite make sense to me, since `dataSize` and `fileOffset` have no correlation. But my knowledge of `EncryptedEnv` is very limited. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5735 Test Plan: run `ENCRYPTED_ENV=1 ./db_encryption_test` Signed-off-by: Yi Wu <yiwu@pingcap.com> Differential Revision: D17133849 fbshipit-source-id: bb7262d308e5b2503c400b180edc252668df0ef0	2019-09-05 17:21:42 -07:00
Andrew Kryczka	43a5cdb58c	remove unused #include to fix musl libc build (#5583 ) Summary: The `#include "core_local.h"` was pulling in libgcc's `posix_memalign()` declaration. That declaration specifies `throw()` whereas musl libc's declaration does not. This was leading to the following compiler error when using musl libc: ``` In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/port/jemalloc_helper.h:26:0, from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.h:11, from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb/util/jemalloc_nodump_allocator.cc:6: /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: error: declaration of 'int posix_memalign(void**, size_t, size_t) throw ()' has a different exception specifier # define je_posix_memalign posix_memalign ^ /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:63:29: note: from previous declaration 'int posix_memalign(void**, size_t, size_t)' # define je_posix_memalign posix_memalign ^ /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:202:38: note: in expansion of macro 'je_posix_memalign' JEMALLOC_EXPORT int JEMALLOC_NOTHROW je_posix_memalign(void **memptr, ^~~~~~~~~~~~~~~~~ make[4]: *** [CMakeFiles/rocksdb.dir/util/jemalloc_nodump_allocator.cc.o] Error 1 ``` Since `#include "core_local.h"` is not actually used, we can just remove it. I verified that fixes the build. There was a related PR here (https://github.com/facebook/rocksdb/issues/2188), although the problem description is slightly different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5583 Differential Revision: D16343227 fbshipit-source-id: 0386bc2b5fd55b2c3b5fba19382014efa52e44f8	2019-09-05 17:18:49 -07:00
HouBingjian	ac97e6930f	bloom test check fail on arm (#5745 ) Summary: FullFilterBitsBuilder::CalculateSpace uses CACHE_LINE_SIZE, which is 64 on x86 but 128 on ARM64, so bloom_test.FullVaryingLengths failed when run on an ARM64 server. The assert can be fixed by changing 128 -> CACHE_LINE_SIZE * 2, as merged: ASSERT_LE(FilterSize(), (size_t)((length * 10 / 8) + CACHE_LINE_SIZE * 2 + 5)) << length; run bloom_test before fix: /root/rocksdb-master/util/bloom_test.cc:281: Failure Expected: (FilterSize()) <= ((size_t)((length * 10 / 8) + 128 + 5)), actual: 389 vs 383 200 [ FAILED ] FullBloomTest.FullVaryingLengths (32 ms) [----------] 4 tests from FullBloomTest (32 ms total) [----------] Global test environment tear-down [==========] 7 tests from 2 test cases ran. (116 ms total) [ PASSED ] 6 tests. [ FAILED ] 1 test, listed below: [ FAILED ] FullBloomTest.FullVaryingLengths after fix: Filters: 37 good, 0 mediocre [ OK ] FullBloomTest.FullVaryingLengths (90 ms) [----------] 4 tests from FullBloomTest (90 ms total) [----------] Global test environment tear-down [==========] 7 tests from 2 test cases ran. (174 ms total) [ PASSED ] 7 tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5745 Differential Revision: D17076047 fbshipit-source-id: e7beb5d55d4855fceb2b84bc8119a6b0759de635	2019-09-05 17:03:24 -07:00
Peter Dillinger	b55b2f45d0	Faster new DynamicBloom implementation (for memtable) (#5762 ) Summary: Since DynamicBloom is now only used in-memory, we're free to change it without schema compatibility issues. The new implementation is drawn from (with manifest permission) `303542a767/bloom_simulation_tests/foo.cc (L613)` This has several speed advantages over the prior implementation: * Uses fastrange instead of % * Minimum logic to determine first (and all) probed memory addresses * (Major) Two probes per 64-bit memory fetch/write. * Very fast and effective (murmur-like) hash expansion/re-mixing. (At least on recent CPUs, integer multiplication is very cheap.) While a Bloom filter with 512-bit cache locality has about a 1.15x FP rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to 1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples, unlike the old implementation with more erratic FP rates. Especially for the memtable, we expect speed to outweigh somewhat higher FP rates. For example, a negative table query would have to be 1000x slower than a BF query to justify doubling BF query time to shave 10% off FP rate (working assumption around 1% FP rate). While that seems likely for SSTs, my data suggests a speed factor of roughly 50x for the memtable (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom filter, after this change). Thus, it's probably not worth even 5% more time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1% in absolute terms, and it's probably at least 20% slower to recoup that much FP rate from this new implementation. Because of this, we do not see a need for a 'locality' option that affects the MemTable Bloom filter and have decoupled the MemTable Bloom filter from Options::bloom_locality. Note that just 3% more memory to the Bloom filter (10.3 bits per key vs. 
just 10) is able to make up for the ~12% FP rate drop in the new implementation: [] # Nearly "ideal" FP-wise but reasonably fast cache-local implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ... [] # Close match to this new implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ... [] # Old locality=1 implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ... Also note the dramatic speed improvement vs. alternatives. -- Performance unit test: DynamicBloomTest.concurrent_with_perf is updated to report more precise timing data. (Measure running time of each thread, not just longest running thread, etc.) Results averaged over various sizes enabled with --enable_perf and 20 runs each; old dynamic bloom refers to locality=1, the faster of the old: old dynamic bloom, avg add latency = 65.6468 new dynamic bloom, avg add latency = 44.3809 old dynamic bloom, avg query latency = 50.6485 new dynamic bloom, avg query latency = 43.2186 old avg parallel add latency = 41.678 new avg parallel add latency = 24.5238 old avg parallel hit latency = 14.6322 new avg parallel hit latency = 12.3939 old avg parallel miss latency = 16.7289 new avg parallel miss latency = 12.2134 Tested on a dedicated 64-bit production machine at Facebook. Significant improvement all around. Despite now using std::atomic<uint64_t>, quick before-and-after test on a 32-bit machine (Intel Atom N270, released 2008) shows no regression in performance, in some cases modest improvement. 
-- Performance integration test (synthetic): with DEBUG_LEVEL=0, used TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000 and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01 300 runs each configuration. Write throughput change by enabling memtable bloom: Old locality=0: -3.06% Old locality=1: -2.37% New: -1.50% conclusion -> seems to substantially close the gap Readmissing throughput change by enabling memtable bloom: Old locality=0: +34.47% Old locality=1: +34.80% New: +33.25% conclusion -> maybe a small new penalty from FP rate Readrandom throughput change by enabling memtable bloom: Old locality=0: +31.54% Old locality=1: +31.13% New: +30.60% conclusion -> maybe also from FP rate (after memtable flush) -- Another conclusion we can draw from this new implementation is that the existing 32-bit hash function is not inherently crippling the Bloom filter speed or accuracy, below about 5 million keys. For speed, the implementation is essentially the same whether starting with 32-bits or 64-bits of hash; it just determines whether the first multiplication after fastrange is a pseudorandom expansion or needed re-mix. Note that this multiplication can occur while memory is fetching. For accuracy, in a standard configuration, you need about 5 million keys before you have about a 1.1x FP penalty due to using a 32-bit hash vs. 64-bit: [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ... 
[~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762 Differential Revision: D17214194 Pulled By: pdillinger fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595	2019-09-05 14:59:25 -07:00
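The techniques listed in the commit above (fastrange instead of %, a multiplicative re-mix of the hash, and two bit probes per 64-bit word) can be illustrated with a minimal single-threaded sketch. This is not RocksDB's DynamicBloom: the class name `TwoProbeBloom`, the helper `FastRange32`, the multiplier constant, and the rounding of the word count to a multiple of 8 are all assumptions made for illustration, and the sketch omits the atomics and prefetching a production filter would need.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map a 32-bit hash onto [0, range) with a multiply and shift instead of %.
inline uint32_t FastRange32(uint32_t hash, uint32_t range) {
  return static_cast<uint32_t>((static_cast<uint64_t>(hash) * range) >> 32);
}

// Hypothetical illustration only; not RocksDB's DynamicBloom.
class TwoProbeBloom {
 public:
  // The word count is rounded up to a multiple of 8 (512 bits) so that
  // `base ^ i` with i < 8 stays inside one group of 8 consecutive words.
  TwoProbeBloom(uint32_t num_words, uint32_t num_double_probes)
      : words_(((static_cast<std::size_t>(num_words) + 7) / 8) * 8, 0),
        num_double_probes_(num_double_probes) {
    assert(num_words >= 1);
    assert(num_double_probes >= 1 && num_double_probes <= 8);
  }

  void Add(uint32_t h32) {
    const uint32_t base =
        FastRange32(h32, static_cast<uint32_t>(words_.size()));
    uint64_t h = Remix(h32);
    for (uint32_t i = 0; i < num_double_probes_; ++i) {
      words_[base ^ i] |= Mask(h);  // sets two bits with one memory write
      h = (h >> 12) | (h << 52);    // rotate to expose 12 fresh hash bits
    }
  }

  bool MayContain(uint32_t h32) const {
    const uint32_t base =
        FastRange32(h32, static_cast<uint32_t>(words_.size()));
    uint64_t h = Remix(h32);
    for (uint32_t i = 0; i < num_double_probes_; ++i) {
      const uint64_t mask = Mask(h);
      if ((words_[base ^ i] & mask) != mask) return false;
      h = (h >> 12) | (h << 52);
    }
    return true;
  }

 private:
  // Expand 32 bits of hash to 64 with one multiplication by an odd
  // constant (chosen here for illustration).
  static uint64_t Remix(uint32_t h32) { return 0x9e3779b97f4a7c13ULL * h32; }

  // Two probe bits within a single 64-bit word, 6 hash bits each.
  static uint64_t Mask(uint64_t h) {
    return (uint64_t{1} << (h & 63)) | (uint64_t{1} << ((h >> 6) & 63));
  }

  std::vector<uint64_t> words_;
  uint32_t num_double_probes_;
};
```

In this sketch every word touched by one key sits within the same 8-word group, which is one way to obtain the 512-bit locality the commit message refers to; a key that was added is always reported as possibly present, while absent keys may occasionally yield false positives.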
jsteemann	19e8c9b64f	use c++17's try_emplace if available (#5696 ) Summary: This avoids rehashing the key in TrackKey() in case the key is not already in the map of tracked keys, which will happen at least once per key used in a transaction. Additionally fix two typos. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5696 Differential Revision: D17210178 Pulled By: lth fbshipit-source-id: 7e2c28e9e505c1d1c1535d435250cf2b191a6fdf	2019-09-05 13:59:40 -07:00
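The saving described in the commit above comes from replacing a find-then-insert sequence, which hashes the key twice on a miss, with a single `try_emplace` call. The sketch below is a simplified illustration under stated assumptions: `KeyInfo`, `TrackedKeys`, and the two function names are hypothetical stand-ins rather than RocksDB's actual TrackKey() code, and the `__cplusplus` check is only one possible way to express "if available".

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical stand-ins for illustration; not RocksDB's actual types.
struct KeyInfo {
  uint64_t seq;
  uint32_t num_reads;
  explicit KeyInfo(uint64_t s) : seq(s), num_reads(0) {}
};
using TrackedKeys = std::unordered_map<std::string, KeyInfo>;

// Two-step pattern: on a miss, find() hashes the key and emplace()
// hashes it again.
inline void TrackKeyTwoLookups(TrackedKeys& keys, const std::string& key,
                               uint64_t seq) {
  auto it = keys.find(key);
  if (it == keys.end()) {
    it = keys.emplace(key, KeyInfo(seq)).first;
  }
  ++it->second.num_reads;
}

// Single-lookup pattern: try_emplace constructs KeyInfo only when the key
// is absent and returns an iterator to the entry either way.
inline void TrackKeyTryEmplace(TrackedKeys& keys, const std::string& key,
                               uint64_t seq) {
#if __cplusplus >= 201703L
  auto result = keys.try_emplace(key, seq);
  ++result.first->second.num_reads;
#else
  TrackKeyTwoLookups(keys, key, seq);  // fallback before C++17
#endif
}
```

Both functions leave the map in the same state; the difference is only in how many times the key is hashed when it is first tracked.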
Peter Dillinger	20dec1401f	Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767 ) Summary: DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767 Differential Revision: D17206963 Pulled By: pdillinger fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553	2019-09-05 10:05:20 -07:00
ENDOH takanao	3f2723a81b	fix checking the '-march' flag (#5766 ) Summary: Hi! guys, I got errors on the ARM machine. before: ```console $ make static_lib ... g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto' g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native ``` Thanks! Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766 Differential Revision: D17191117 fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868	2019-09-04 14:34:28 -07:00
Maysam Yabandeh	f9fb9f1421	Add a unit test to detect infinite loops with reseek optimizations (#5727 ) Summary: Iterators reseek to the target key after iterating over max_sequential_skip_in_iterations invalid values. The logic is susceptible to an infinite loop bug, which has been present with WritePrepared Transactions up until 6.2 release. Although the bug is not present on master, the patch adds a unit test to prevent it from resurfacing again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5727 Differential Revision: D16952759 Pulled By: maysamyabandeh fbshipit-source-id: d0d973dddc8dfabd5a794931232aa4c862c74f51	2019-09-04 14:31:10 -07:00

8334 Commits