rocksdb

Author	SHA1	Message	Date
Peter Dillinger	b55b2f45d0	Faster new DynamicBloom implementation (for memtable) (#5762 ) Summary: Since DynamicBloom is now only used in-memory, we're free to change it without schema compatibility issues. The new implementation is drawn from (with manifest permission) `303542a767/bloom_simulation_tests/foo.cc (L613)` This has several speed advantages over the prior implementation: * Uses fastrange instead of % * Minimum logic to determine first (and all) probed memory addresses * (Major) Two probes per 64-bit memory fetch/write. * Very fast and effective (murmur-like) hash expansion/re-mixing. (At least on recent CPUs, integer multiplication is very cheap.) While a Bloom filter with 512-bit cache locality has about a 1.15x FP rate penalty (e.g. 0.84% to 0.97%), further restricting to two probes per 64 bits incurs an additional 1.12x FP rate penalty (e.g. 0.97% to 1.09%). Nevertheless, the unit tests show no "mediocre" FP rate samples, unlike the old implementation with more erratic FP rates. Especially for the memtable, we expect speed to outweigh somewhat higher FP rates. For example, a negative table query would have to be 1000x slower than a BF query to justify doubling BF query time to shave 10% off FP rate (working assumption around 1% FP rate). While that seems likely for SSTs, my data suggests a speed factor of roughly 50x for the memtable (vs. BF; ~1.5% lower write throughput when enabling memtable Bloom filter, after this change). Thus, it's probably not worth even 5% more time in the Bloom filter to shave off 1/10th of the Bloom FP rate, or 0.1% in absolute terms, and it's probably at least 20% slower to recoup that much FP rate from this new implementation. Because of this, we do not see a need for a 'locality' option that affects the MemTable Bloom filter and have decoupled the MemTable Bloom filter from Options::bloom_locality. Note that just 3% more memory to the Bloom filter (10.3 bits per key vs. just 10) is able to make up for the ~12% FP rate drop in the new implementation: [] # Nearly "ideal" FP-wise but reasonably fast cache-local implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_WORM64_FROM32_any.out time: 3.29372 sampled_fp_rate: 0.00985956 ... [] # Close match to this new implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out 10000000 6 10.3 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.10072 sampled_fp_rate: 0.00985655 ... [] # Old locality=1 implementation [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out 10000000 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_ROCKSDB_DYNAMIC_any.out time: 3.95472 sampled_fp_rate: 0.00988943 ... Also note the dramatic speed improvement vs. alternatives. -- Performance unit test: DynamicBloomTest.concurrent_with_perf is updated to report more precise timing data. (Measure running time of each thread, not just longest running thread, etc.) Results averaged over various sizes enabled with --enable_perf and 20 runs each; old dynamic bloom refers to locality=1, the faster of the old: old dynamic bloom, avg add latency = 65.6468 new dynamic bloom, avg add latency = 44.3809 old dynamic bloom, avg query latency = 50.6485 new dynamic bloom, avg query latency = 43.2186 old avg parallel add latency = 41.678 new avg parallel add latency = 24.5238 old avg parallel hit latency = 14.6322 new avg parallel hit latency = 12.3939 old avg parallel miss latency = 16.7289 new avg parallel miss latency = 12.2134 Tested on a dedicated 64-bit production machine at Facebook. Significant improvement all around. Despite now using std::atomic<uint64_t>, quick before-and-after test on a 32-bit machine (Intel Atom N270, released 2008) shows no regression in performance, in some cases modest improvement. -- Performance integration test (synthetic): with DEBUG_LEVEL=0, used TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom,readmissing,readrandom,stats --num=2000000 and optionally with -memtable_whole_key_filtering -memtable_bloom_size_ratio=0.01 300 runs each configuration. Write throughput change by enabling memtable bloom: Old locality=0: -3.06% Old locality=1: -2.37% New: -1.50% conclusion -> seems to substantially close the gap Readmissing throughput change by enabling memtable bloom: Old locality=0: +34.47% Old locality=1: +34.80% New: +33.25% conclusion -> maybe a small new penalty from FP rate Readrandom throughput change by enabling memtable bloom: Old locality=0: +31.54% Old locality=1: +31.13% New: +30.60% conclusion -> maybe also from FP rate (after memtable flush) -- Another conclusion we can draw from this new implementation is that the existing 32-bit hash function is not inherently crippling the Bloom filter speed or accuracy, below about 5 million keys. For speed, the implementation is essentially the same whether starting with 32-bits or 64-bits of hash; it just determines whether the first multiplication after fastrange is a pseudorandom expansion or needed re-mix. Note that this multiplication can occur while memory is fetching. For accuracy, in a standard configuration, you need about 5 million keys before you have about a 1.1x FP penalty due to using a 32-bit hash vs. 64-bit: [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_FROM32_any.out time: 2.52069 sampled_fp_rate: 0.0118267 ... [~/wormhashing/bloom_simulation_tests] ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out $((5 * 1000 * 1000 * 10)) 6 10 $RANDOM 100000000 ./foo_gcc_IMPL_CACHE_MUL64_BLOCK_any.out time: 2.43871 sampled_fp_rate: 0.0109059 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5762 Differential Revision: D17214194 Pulled By: pdillinger fbshipit-source-id: ad9da031772e985fd6b62a0e1db8e81892520595	2019-09-05 14:59:25 -07:00
jsteemann	19e8c9b64f	use c++17's try_emplace if available (#5696 ) Summary: This avoids rehashing the key in TrackKey() in case the key is not already in the map of tracked keys, which will happen at least once per key used in a transaction. Additionally fix two typos. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5696 Differential Revision: D17210178 Pulled By: lth fbshipit-source-id: 7e2c28e9e505c1d1c1535d435250cf2b191a6fdf	2019-09-05 13:59:40 -07:00
Peter Dillinger	20dec1401f	Copy/split PlainTableBloomV1 from DynamicBloom (refactor) (#5767 ) Summary: DynamicBloom was being used both for memory-only and for on-disk filters, as part of the PlainTable format. To set up enhancements to the memtable Bloom filter, this splits the code into two copies and removes unused features from each copy. Adds test PlainTableDBTest.BloomSchema to ensure no accidental change to that format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5767 Differential Revision: D17206963 Pulled By: pdillinger fbshipit-source-id: 6cce8d55305ed0df051b4c58bdc98c8ad81d0553	2019-09-05 10:05:20 -07:00
ENDOH takanao	3f2723a81b	fix checking the '-march' flag (#5766 ) Summary: Hi! guys, I got errors on the ARM machine. before: ```console $ make static_lib ... g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto' g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native ``` Thanks! Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766 Differential Revision: D17191117 fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868	2019-09-04 14:34:28 -07:00
Maysam Yabandeh	f9fb9f1421	Add a unit test to detect infinite loops with reseek optimizations (#5727 ) Summary: Iterators reseek to the target key after iterating over max_sequential_skip_in_iterations invalid values. The logic is susceptible to an infinite loop bug, which has been present with WritePrepared Transactions up until 6.2 release. Although the bug is not present on master, the patch adds a unit test to prevent it from resurfacing again. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5727 Differential Revision: D16952759 Pulled By: maysamyabandeh fbshipit-source-id: d0d973dddc8dfabd5a794931232aa4c862c74f51	2019-09-04 14:31:10 -07:00
Affan Dar	229e6fbe0e	Adding DB::GetCurrentWalFile() API as a repliction/backup helper (#5765 ) Summary: Adding a light weight API to get last live WAL file name and size. Meant to be used as a helper for backup/restore tooling in a larger ecosystem such as MySQL with a MyRocks storage engine. Specifically within MySQL's backup/restore mechanism, this call can be made with a write lock on the mysql db to get a transactionally consistent snapshot of the current WAL file position along with other non-rocksdb log/data files. Without this, the alternative would be to take the aforementioned lock, scan the WAL dir for all files, find the last file and note its exact size as the rocksdb 'checkpoint'. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5765 Differential Revision: D17172717 Pulled By: affandar fbshipit-source-id: f2fabafd4c0e6fc45f126670c8c88a9f84cb8a37	2019-09-04 12:10:17 -07:00
Yanqin Jin	38b17ecd0e	Replace named comparator struct with lambda (#5768 ) Summary: Tiny code mod: replace a named comparator struct with anonymous lambda. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5768 Differential Revision: D17185141 Pulled By: riversand963 fbshipit-source-id: fabe367649931c33a39ad035dc707d2efc3ad5fc	2019-09-04 11:38:34 -07:00
git-hulk	cdb6334e68	MOD: trim last space and comma in perf context and iostat context ToString() Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5755 Differential Revision: D17165190 Pulled By: riversand963 fbshipit-source-id: a3a4633961bfe019bf360f97a4c4d36464e7fa0b	2019-09-03 12:27:17 -07:00
Vijay Nadimpalli	979fbdc696	Persistent globally unique DB ID in manifest (#5725 ) Summary: Each DB has a globally unique ID. A DB can be physically copied around, or backed-up and restored, and the users should be identify the same DB. This unique ID right now is stored as plain text in file IDENTITY under the DB directory. This approach introduces at least two problems: (1) the file is not checksumed; (2) the source of truth of a DB is the manifest file, which can be copied separately from IDENTITY file, causing the DB ID to be wrong. The goal of this PR is solve this problem by moving the DB ID to manifest. To begin with we will write to both identity file and manifest. Write to Manifest is controlled via the flag write_dbid_to_manifest in Options and default is false. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5725 Test Plan: Added unit tests. Differential Revision: D16963840 Pulled By: vjnadimpalli fbshipit-source-id: 8a86a4c8c82c716003c40fd6b9d2d758030d92e9	2019-09-03 08:52:24 -07:00
Yanqin Jin	44eca41add	Fix a bug in file ingestion (#5760 ) Summary: Before this PR, when the number of column families involved in a file ingestion exceeds 2, a bug in the looping logic prevents correct file number being assigned to each ingestion job. Also skip deleting non-existing hard links during cleanup-after-failure. Test plan (devserver) ``` $COMPILE_WITH_ASAN=1 make all $./external_sst_file_test --gtest_filter=ExternalSSTFileTest/ExternalSSTFileTest.IngestFilesIntoMultipleColumnFamilies_/ $makke check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5760 Differential Revision: D17142982 Pulled By: riversand963 fbshipit-source-id: 06c1847a4e7a402647bcf28d124e70f2a0f9daf6	2019-08-30 18:29:07 -07:00
Yanqin Jin	672befea2a	Fix assertion failure in FIFO compaction with TTL (#5754 ) Summary: Before this PR, the following sequence of events can cause assertion failure as shown below. Stack trace (partial): ``` (gdb) bt 2 0x00007f59b350ad15 in __assert_fail_base (fmt=<optimized out>, assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:92 3 0x00007f59b350adc3 in __GI___assert_fail (assertion=assertion@entry=0x9f8390 "mark_as_compacted ? !inputs_[i][j]->being_compacted : inputs_[i][j]->being_compacted", file=file@entry=0x9e347c "db/compaction/compaction.cc", line=line@entry=395, function=function@entry=0xa21ec0 <rocksdb::Compaction::MarkFilesBeingCompacted(bool)::__PRETTY_FUNCTION__> "void rocksdb::Compaction::MarkFilesBeingCompacted(bool)") at assert.c:101 4 0x0000000000492ccd in rocksdb::Compaction::MarkFilesBeingCompacted (this=<optimized out>, mark_as_compacted=<optimized out>) at db/compaction/compaction.cc:394 5 0x000000000049467a in rocksdb::Compaction::Compaction (this=0x7f59af013000, vstorage=0x7f581af53030, _immutable_cf_options=..., _mutable_cf_options=..., _inputs=..., _output_level=<optimized out>, _target_file_size=0, _max_compaction_bytes=0, _output_path_id=0, _compression=<incomplete type>, _compression_opts=..., _max_subcompactions=0, _grandparents=..., _manual_compaction=false, _score=4, _deletion_compaction=true, _compaction_reason=rocksdb::CompactionReason::kFIFOTtl) at db/compaction/compaction.cc:241 6 0x00000000004af9bc in rocksdb::FIFOCompactionPicker::PickTTLCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:101 7 0x00000000004b0771 in rocksdb::FIFOCompactionPicker::PickCompaction (this=0x7f59b31a6900, cf_name=..., mutable_cf_options=..., vstorage=0x7f581af53030, log_buffer=0x7f59b1bfa930) at db/compaction/compaction_picker_fifo.cc:201 8 0x00000000004838cc in rocksdb::ColumnFamilyData::PickCompaction (this=this@entry=0x7f59b31b3700, mutable_options=..., log_buffer=log_buffer@entry=0x7f59b1bfa930) at db/column_family.cc:933 9 0x00000000004f3645 in rocksdb::DBImpl::BackgroundCompaction (this=this@entry=0x7f59b3176000, made_progress=made_progress@entry=0x7f59b1bfa6bf, job_context=job_context@entry=0x7f59b1bfa760, log_buffer=log_buffer@entry=0x7f59b1bfa930, prepicked_compaction=prepicked_compaction@entry=0x0, thread_pri=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2541 10 0x00000000004f5e2a in rocksdb::DBImpl::BackgroundCallCompaction (this=this@entry=0x7f59b3176000, prepicked_compaction=prepicked_compaction@entry=0x0, bg_thread_pri=bg_thread_pri@entry=rocksdb::Env::LOW) at db/db_impl/db_impl_compaction_flush.cc:2312 11 0x00000000004f648e in rocksdb::DBImpl::BGWorkCompaction (arg=<optimized out>) at db/db_impl/db_impl_compaction_flush.cc:2087 ``` This can be caused by the following sequence of events. ``` Time \| thr bg_compact_thr1 bg_compact_thr2 \| write \| flush \| mark all l0 as being compacted \| write \| flush \| add cf to queue again \| mark all l0 as being \| compacted, fail the \| assertion V ``` Test plan (on devserver) Since bg_compact_thr1 and bg_compact_thr2 are two threads executing the same code, it is difficult to use sync point dependency to coordinate their execution. Therefore, I choose to use db_stress. ``` $TEST_TMPDIR=/dev/shm/rocksdb ./db_stress --periodic_compaction_seconds=1 --max_background_compactions=20 --format_version=2 --memtablerep=skip_list --max_write_buffer_number=3 --cache_index_and_filter_blocks=1 --reopen=20 --recycle_log_file_num=0 --acquire_snapshot_one_in=10000 --delpercent=4 --log2_keys_per_lock=22 --compaction_ttl=1 --block_size=16384 --use_multiget=1 --compact_files_one_in=1000000 --target_file_size_multiplier=2 --clear_column_family_one_in=0 --max_bytes_for_level_base=10485760 --use_full_merge_v1=1 --target_file_size_base=2097152 --checkpoint_one_in=1000000 --mmap_read=0 --compression_type=zstd --writepercent=35 --readpercent=45 --subcompactions=4 --use_merge=0 --write_buffer_size=4194304 --test_batches_snapshots=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --use_direct_reads=0 --compact_range_one_in=1000000 --open_files=-1 --destroy_db_initially=0 --progress_reports=0 --compression_zstd_max_train_bytes=0 --snapshot_hold_ops=100000 --enable_pipelined_write=0 --nooverwritepercent=1 --compression_max_dict_bytes=0 --max_key=1000000 --prefixpercent=5 --flush_one_in=1000000 --ops_per_thread=40000 --index_block_restart_interval=7 --cache_size=1048576 --compaction_style=2 --verify_checksum=1 --delrangepercent=1 --use_direct_io_for_flush_and_compaction=0 ``` This should see no assertion failure. Last but not least, ``` $COMPILE_WITH_ASAN=1 make -j32 all $make check ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5754 Differential Revision: D17109791 Pulled By: riversand963 fbshipit-source-id: 25fc46101235add158554e096540b72c324be078	2019-08-30 12:42:01 -07:00
Paul O'Shannessy	9a449865e3	Adopt Contributor Covenant Summary: In order to foster healthy open source communities, we're adopting the [Contributor Covenant](https://www.contributor-covenant.org/). It has been built by open source community members and represents a shared understanding of what is expected from a healthy community. Reviewed By: josephsavona, danobi, rdzhabarov Differential Revision: D17104640 fbshipit-source-id: d210000de686c5f0d97d602b50472d5869bc6a49	2019-08-29 23:21:01 -07:00
Pratik Dhandharia	a281822331	Lower the risk for users to run options.force_consistency_checks = true (#5744 ) Summary: Open-source users recently reported two occurrences of LSM-tree corruption (https://github.com/facebook/rocksdb/issues/5558 is one), which would be caught by options.force_consistency_checks = true. options.force_consistency_checks has a usability limitation because it crashes the service once inconsistency is detected. This makes the feature hard to use. Most users serve from multiple RocksDB shards per server and the impacts of crashing the service is higher than it should be. Instead, we just pass the error back to users without killing the service, and ask them to deal with the problem accordingly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5744 Differential Revision: D17096940 Pulled By: pdhandharia fbshipit-source-id: b6780039044e265f26ed2ad03c51f4abbe8b603c	2019-08-29 14:07:37 -07:00
anand76	1729779b85	Disable MultiGet row cache test in LITE mode (#5756 ) Summary: Row cache is not supported in LITE mode. So disable the test in that mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5756 Test Plan: make LITE=1 all check Differential Revision: D17115684 Pulled By: anand1976 fbshipit-source-id: e6433c2e528674645cea76cdfc80ddc473708fc2	2019-08-29 12:13:28 -07:00
Jeremy Taylor	c5e12ebfd2	Add Crux to USERS.md Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5718 Differential Revision: D17096939 Pulled By: riversand963 fbshipit-source-id: 4301078d3ca3d54a1c7e841eccad95379cd1570d	2019-08-29 11:29:48 -07:00
Shafreeck Sea	ab0645a596	Fix comment of function NotifyCollectTableCollectorsOnFinish (#5738 ) Summary: Signed-off-by: Shafreeck Sea <shafreeck@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5738 Differential Revision: D17097075 Pulled By: riversand963 fbshipit-source-id: ed01b5f59e8eed262a49abe1f96552842d364af1	2019-08-29 10:57:01 -07:00
anand76	e10570331d	Support row cache with batched MultiGet (#5706 ) Summary: This PR adds support for row cache in ```rocksdb::TableCache::MultiGet```. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5706 Test Plan: 1. Unit tests in db_basic_test 2. db_bench results with batch size of 2 (```Get``` is faster than ```MultiGet``` for single key) - Get - readrandom : 3.935 micros/op 254116 ops/sec; 28.1 MB/s (22870998 of 22870999 found) MultiGet - multireadrandom : 3.743 micros/op 267190 ops/sec; (24047998 of 24047998 found) Command used - TEST_TMPDIR=/dev/shm/multiget numactl -C 10 ./db_bench -use_existing_db=true -use_existing_keys=false -benchmarks="readtorowcache,[read\|multiread]random" -write_buffer_size=16777216 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -row_cache_size=4194304000 -batch_size=2 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=131072 Differential Revision: D17086297 Pulled By: anand1976 fbshipit-source-id: 85784378da913e05f1baf31ec1b4e7c9345e7f57	2019-08-28 16:11:56 -07:00
sdong	1daff8f85a	crash_test to skip compaction TTL for FIFO compaction. (#5749 ) Summary: https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749 Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0. Differential Revision: D17078292 fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d	2019-08-27 17:55:37 -07:00
Pratik Dhandharia	1b4c104a67	replace some reinterpret_cast with static_cast_with_check (#5740 ) Summary: This PR focuses on replacing some of the reinterpret_cast<DBImpl*> to static_cast_with_check<DBImpl, DB>. Files impacted: ./db/db_impl/db_impl_compaction_flush.cc ./db/write_batch.cc ./utilities/blob_db/blob_db_impl.cc ./utilities/transactions/pessimistic_transaction_db.cc ./utilities/transactions/transaction_base.cc ./utilities/transactions/write_prepared_txn_db.cc ./utilities/transactions/write_unprepared_txn_db.cc Pull Request resolved: https://github.com/facebook/rocksdb/pull/5740 Differential Revision: D17055691 Pulled By: pdhandharia fbshipit-source-id: 0f8034d1b32eade56e37d59c04b7bf236a81d8e8	2019-08-27 10:59:11 -07:00
sdong	1d6a10f52d	Extend stress test to cover periodic compaction and compaction TTL (#5741 ) Summary: Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there. Randomly select value for these two options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741 Test Plan: Run crash_test and see the perameters generated. Differential Revision: D17059515 fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097	2019-08-26 15:03:25 -07:00
Andrew Kryczka	ba0967b567	Reduce severity of too many levels log message (#5742 ) Summary: This condition is now a normal occurrence during write burst so there is no need to warn the user about it. Here is a scenario where it happens under completely normal conditions. * Initially we have a DB of three levels (L0, L1, and L2) that is stable, i.e., compaction scores are all less than one. * Now a write burst comes along. At first L0 blows up a bit in size as compaction hasn't had a chance to catch up. * As a result of the above, `base_bytes_min` also increases since it is based on L0 size as of https://github.com/facebook/rocksdb/issues/4338 * If `base_bytes_min` increased enough (i.e., to be larger than L1), then we are shown the warning that the DB has more levels than necessary. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5742 Differential Revision: D17059221 fbshipit-source-id: e4a31d6eea42089a8d273095f19653991bd91bea	2019-08-26 15:00:43 -07:00
jsteemann	62829ff751	reuse scratch buffer in transaction_log_reader (#5702 ) Summary: in order to avoid reallocations for a scratch std::string on every call to Next(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/5702 Differential Revision: D16867803 fbshipit-source-id: 1391220a1b172b23336bbc71dc0c79ccf3b1c701	2019-08-26 11:26:29 -07:00
Zhongyi Xie	2f41ecfe75	Refactor trimming logic for immutable memtables (#5022 ) Summary: MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory. We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one. The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming. In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022 Differential Revision: D14394062 Pulled By: miasantreble fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5	2019-08-23 13:55:34 -07:00
DaiZhiwei	26293c89a6	crc32c_arm64 performance optimization (#5675 ) Summary: Crc32c Parallel computation coding optimization: Macro unfolding removes the "for" loop and is good to decrease branch-miss in arm64 micro architecture 1024 Bytes is divided into 8(head) + 1008( 6 * 7 * 3 * 8 ) + 8(tail) three parts Macro unfolding 42 loops to 6 CRC32C7X24BYTESs 1 CRC32C7X24BYTES containing 7 CRC32C24BYTESs 1, crc32c_test [==========] Running 4 tests from 1 test case. [----------] Global test environment set-up. [----------] 4 tests from CRC [ RUN ] CRC.StandardResults [ OK ] CRC.StandardResults (1 ms) [ RUN ] CRC.Values [ OK ] CRC.Values (0 ms) [ RUN ] CRC.Extend [ OK ] CRC.Extend (0 ms) [ RUN ] CRC.Mask [ OK ] CRC.Mask (0 ms) [----------] 4 tests from CRC (1 ms total) [----------] Global test environment tear-down [==========] 4 tests from 1 test case ran. (1 ms total) [ PASSED ] 4 tests. 2, db_bench --benchmarks="crc32c" crc32c : 0.218 micros/op 4595390 ops/sec; 17950.7 MB/s (4096 per op) 3, repeated crc32c_test case 60000 times perf stat -e branch-miss -- ./crc32c_test before optimization: 739,426,504 branch-miss after optimization: 1,128,572 branch-miss Pull Request resolved: https://github.com/facebook/rocksdb/pull/5675 Differential Revision: D16989210 fbshipit-source-id: 7204e6069bb6ed066d49c2d1b3ac385065a98557	2019-08-23 11:04:08 -07:00
Levi Tamasi	df8c307d63	Revert to storing UncompressionDicts in the cache (#5645 ) Summary: PR https://github.com/facebook/rocksdb/issues/5584 decoupled the uncompression dictionary object from the underlying block data; however, this defeats the purpose of the digested ZSTD dictionary, since the whole point of the digest is to create it once and reuse it over and over again. This patch goes back to storing the uncompression dictionary itself in the cache (which should be now safe to do, since it no longer includes a Statistics pointer), while preserving the rest of the refactoring. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5645 Test Plan: make asan_check Differential Revision: D16551864 Pulled By: ltamasi fbshipit-source-id: 2a7e2d34bb16e70e3c816506d5afe1d842057800	2019-08-23 08:27:30 -07:00
sdong	d8a27d9331	Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729 ) Summary: AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same. Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729 Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed. Differential Revision: D16969791 fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163	2019-08-22 16:32:55 -07:00
Patrick Pei	202942b20c	Fix local includes Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5722 Differential Revision: D16908380 fbshipit-source-id: 6a0e3cb2730b08d6012d3d7f31c937f01c399846	2019-08-22 16:21:47 -07:00
Maysam Yabandeh	244e6f2002	Refactor MultiGet names in BlockBasedTable (#5726 ) Summary: To improve code readability, since RetrieveBlock already calls MaybeReadBlockAndLoadToCache, we avoid name similarity of the functions that call RetrieveBlock with MaybeReadBlockAndLoadToCache. The patch thus renames MaybeLoadBlocksToCache to RetrieveMultipleBlock and deletes GetDataBlockFromCache, which contains only two lines. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5726 Differential Revision: D16962535 Pulled By: maysamyabandeh fbshipit-source-id: 99e8946808ce4eb7857592b9003812e3004f92d6	2019-08-22 08:49:00 -07:00
anand76	9046bdc5d3	Fix MultiGet() bug when whole_key_filtering is disabled (#5665 ) Summary: The batched MultiGet() implementation was not correctly handling bloom filter lookups when whole_key_filtering is disabled. It was incorrectly skipping keys not in the prefix_extractor domain, and not calling transform for keys in domain. This PR fixes both problems by moving the domain check and transformation to the FilterBlockReader. Tests: Unit test (confirmed failed before the fix) make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/5665 Differential Revision: D16902380 Pulled By: anand1976 fbshipit-source-id: a6be81ad68a6e37134a65246aec7a2c590eccf00	2019-08-21 10:23:23 -07:00
Maysam Yabandeh	7bc18e2727	Disable snapshot refresh feature when snap_refresh_nanos is 0 (#5724 ) Summary: The comments of snap_refresh_nanos advertise that the snapshot refresh feature will be disabled when the option is set to 0. This contract is however not honored in the code: https://github.com/facebook/rocksdb/pull/5278 The patch fixes that and also adds an assert to ensure that the feature is not used when the option is zero. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5724 Differential Revision: D16918185 Pulled By: maysamyabandeh fbshipit-source-id: fec167287df7d85093e087fc39c0eb243e3bbd7e	2019-08-20 11:40:07 -07:00
sdong	3552473668	Introduce IngestExternalFileOptions.verify_checksums_readahead_size (#5721 ) Summary: Recently readahead is introduced for checksum verifying. However, users cannot override the setting for the checksum verifying before external SST file ingestion. Introduce a new option for the purpose. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5721 Test Plan: Add a new unit test for it. Differential Revision: D16906896 fbshipit-source-id: 218ec37001ddcc05411cefddbe233d15ab308476	2019-08-20 10:43:39 -07:00
sdong	4c74dba5fa	Bump up memory order of ref counting of ColumnFamilyData (#5723 ) Summary: We see this TSAN warning: WARNING: ThreadSanitizer: data race (pid=282806) Write of size 8 at 0x7b6c00000e38 by thread T16 (mutexes: write M1023578822185846136): #0 operator delete(void) <null> (libtsan.so.0+0x0000000795f8) https://github.com/facebook/rocksdb/issues/1 rocksdb::DBImpl::BackgroundFlush(bool, rocksdb::JobContext, rocksdb::LogBuffer, rocksdb::FlushReason, rocksdb::Env::Priority) db/db_impl/db_impl_compaction_flush.cc:2202 (db_flush_test+0x00000060b462) https://github.com/facebook/rocksdb/issues/2 rocksdb::DBImpl::BackgroundCallFlush(rocksdb::Env::Priority) db/db_impl/db_impl_compaction_flush.cc:2226 (db_flush_test+0x00000060cbd8) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::BGWorkFlush(void) db/db_impl/db_impl_compaction_flush.cc:2073 (db_flush_test+0x00000060d5ac) ...... Previous atomic write of size 4 at 0x7b6c00000e38 by main thread: #0 __tsan_atomic32_fetch_sub <null> (libtsan.so.0+0x00000006d721) https://github.com/facebook/rocksdb/issues/1 std::__atomic_base<int>::fetch_sub(int, std::memory_order) /mnt/gvfs/third-party2/libgcc/c67031f0f739ac61575a061518d6ef5038f99f90/7.x/platform007/5620abc/include/c++/7.3.0/bits/atomic_base.h:524 (db_flush_test+0x0000005f9e38) https://github.com/facebook/rocksdb/issues/2 rocksdb::ColumnFamilyData::Unref() db/column_family.h:286 (db_flush_test+0x0000005f9e38) https://github.com/facebook/rocksdb/issues/3 rocksdb::DBImpl::FlushMemTable(rocksdb::ColumnFamilyData, rocksdb::FlushOptions const&, rocksdb::FlushReason, bool) db/db_impl/db_impl_compaction_flush.cc:1624 (db_flush_test+0x0000005f9e38) https://github.com/facebook/rocksdb/issues/4 rocksdb::DBImpl::TEST_FlushMemTable(rocksdb::ColumnFamilyData, rocksdb::FlushOptions const&) db/db_impl/db_impl_debug.cc:127 (db_flush_test+0x00000061ace9) https://github.com/facebook/rocksdb/issues/5 rocksdb::DBFlushTest_CFDropRaceWithWaitForFlushMemTables_Test::TestBody() db/db_flush_test.cc:320 (db_flush_test+0x0000004b44e5) https://github.com/facebook/rocksdb/issues/6 void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test, void (testing::Test::)(), char const*) third-party/gtest-1.7.0/fused-src/gtest/gtest-all.cc:3824 (db_flush_test+0x000000be2988) ...... It's still very clear the cause of the warning is because that TSAN treats results from relaxed atomic::fetch_sub() as non-atomic with the operation itself. We can make it more explicit by bumping up the order to CS. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5723 Test Plan: Run all existing test. Differential Revision: D16908250 fbshipit-source-id: bf17d39ed19058372bdf97f6440a743f88153021	2019-08-20 10:34:33 -07:00
sdong	8e12638f3d	Slightly adjust atomic white box test's kill odd (#5717 ) Summary: Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout. If WAL is disabled, make the odds 5x likely to trigger. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717 Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out. Differential Revision: D16897237 fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c	2019-08-19 10:51:59 -07:00
sdong	e1c468d16f	Do readahead in VerifyChecksum() (#5713 ) Summary: Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713 Test Plan: Add a new unit test. Differential Revision: D16860874 fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413	2019-08-16 16:42:56 -07:00
Zhongyi Xie	e89b1c9c6e	add missing check for hash index when calling BlockBasedTableIterator (#5712 ) Summary: Previous PR https://github.com/facebook/rocksdb/pull/3601 added support for making prefix_extractor dynamically mutable. However, there was a missing check for hash index when creating new BlockBasedTableIterator. While the check may be redundant because no other types of IndexReader makes uses of the flag, it is less error-prone to add the missing check so that future index reader implementation will not worry about violating the contract. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5712 Differential Revision: D16842052 Pulled By: miasantreble fbshipit-source-id: aef11c0ff7a690ed248f5b8fe23481cac486b381	2019-08-16 16:39:49 -07:00
Adam Retter	f2bf0b2d1e	Fixes for building RocksJava releases on arm64v8 Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5674 Differential Revision: D16870338 fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7	2019-08-16 16:27:50 -07:00
Kefu Chai	35fe685402	cmake: s/SNAPPY_LIBRARIES/snappy_LIBRARIES/ (#5687 ) Summary: fix the regression introduced by `cc9fa7fc` Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5687 Differential Revision: D16870212 fbshipit-source-id: 78b5519e1d2b03262d102ca530491254ddffdc38	2019-08-16 15:49:23 -07:00
sdong	e0515607bc	Blacklist TransactionTest.GetWithoutSnapshot from valgrind_test (#5715 ) Summary: In valgrind_test, TransactionTest.GetWithoutSnapshot ran 2 hours and still didn't finish. Black list from valgrind_test to prevent timeout. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5715 Test Plan: run "make valgrind_test" and see whether the test is still generated. Differential Revision: D16866009 fbshipit-source-id: 92c78049b0bc1c2b9a0dfc1b7c8a9206b36f02f0	2019-08-16 15:36:49 -07:00
Yanqin Jin	353a68d550	Update HISTORY.md for 6.4.0 (#5714 ) Summary: Update HISTORY.md by removing a feature from "Unreleased" to 6.4.0 after cherry-picking related commits to 6.4.fb branch. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5714 Differential Revision: D16865334 Pulled By: riversand963 fbshipit-source-id: f17ede905a1dfbbcdf98806ca398c618cf54748a	2019-08-16 15:09:20 -07:00
jsteemann	a2e46eae46	fix compiling with `-DNPERF_CONTEXT` (#5704 ) Summary: This was previously broken, as the performance context-related macro signatures in file monitoring/perf_context_imp.h deviated for the case when NPERF_CONTEXT was defined and when it was not. Update the macros for the `-DNPERF_CONTEXT` case, so it compiles. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5704 Differential Revision: D16867746 fbshipit-source-id: 05539724cb1f7955ecc42828365836a677759ad9	2019-08-16 14:38:08 -07:00
Eli Pozniansky	c2404d9928	Optimizing ApproximateSize to create index iterator just once (#5693 ) Summary: VersionSet::ApproximateSize doesn't need to create two separate index iterators and do binary search for each in BlockBasedTable. So BlockBasedTable::ApproximateSize was added that creates the iterator once and uses it to calculate the data size between start and end keys. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5693 Differential Revision: D16774056 Pulled By: elipoz fbshipit-source-id: 53ce262e1a057788243bf30cd9b8aa6581df1a18	2019-08-16 14:18:28 -07:00
sheng qiu	c762efc4a9	fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared (#5708 ) Summary: add "linux/falloc.h" in env/io_posix.cc to fix compile error: ‘FALLOC_FL_KEEP_SIZE’ undeclared Signed-off-by: sheng qiu <herbert1984106@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5708 Differential Revision: D16832922 fbshipit-source-id: 30e787c4a1b5a9724a8acfd68962ff5ec5f27d3e	2019-08-16 13:58:05 -07:00
Kefu Chai	40712df9ab	ThreadPoolImpl::Impl::BGThreadWrapper() returns void (#5709 ) Summary: there is no need to return void*, as std:🧵:thread(Func&& f, Args&&... args ) only requires `Func` to be callable. Signed-off-by: Kefu Chai <tchaikov@gmail.com> Pull Request resolved: https://github.com/facebook/rocksdb/pull/5709 Differential Revision: D16832894 fbshipit-source-id: a1e1b876fa8d55589ef5feb5b27f3a435068b747	2019-08-16 13:55:41 -07:00
Levi Tamasi	3a3dc29437	Update HISTORY.md for 6.3.2/6.4.0 and add a not-yet-released change Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5710 Test Plan: HISTORY.md-only change, no testing required. Differential Revision: D16836869 Pulled By: ltamasi fbshipit-source-id: 978148f1d14b0c46839a94d7ada8a5e8ecf73965	2019-08-16 11:17:03 -07:00
sdong	bd2c753dd0	Add command "list_file_range_deletes" in ldb (#5615 ) Summary: Add a command in ldb so that users can print out tombstones in SST files. In order to test the code, change the interface of LDBCommandRunner::RunCommand() so that it doesn't return from the program, but return the status code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5615 Test Plan: Add a new unit test Differential Revision: D16550326 fbshipit-source-id: 88ddfe6984bdcbb3a528abdd115089df09eba52e	2019-08-15 17:01:03 -07:00
Maysam Yabandeh	6ec2bf3fce	Blog post for write_unprepared (#5711 ) Summary: Introducing write_unprepared feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5711 Differential Revision: D16838307 Pulled By: maysamyabandeh fbshipit-source-id: d9a4daf63dd0f855bea49c14ce84e6299f1401c7	2019-08-15 14:41:13 -07:00
Jeffrey Xiao	d61d4507c0	Fix IngestExternalFile overlapping check (#5649 ) Summary: Previously, the end key of a range deletion tombstone was considered exclusive for the purposes of deletion, but considered inclusive when checking if two SSTables overlap. For example, an SSTable with a range deletion tombstone [a, b) would be considered overlapping with an SSTable with a range deletion tombstone [b, c). This commit fixes this check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5649 Differential Revision: D16808765 Pulled By: anand1976 fbshipit-source-id: 5c7ad1c027e4f778d35070e5dae1b8e6037e0d68	2019-08-14 21:02:28 -07:00
Levi Tamasi	d92a59b6f2	Fix regression affecting partitioned indexes/filters when cache_index_and_filter_blocks is false (#5705 ) Summary: PR https://github.com/facebook/rocksdb/issues/5298 (and subsequent related patches) unintentionally changed the semantics of cache_index_and_filter_blocks: historically, this option only affected the main index/filter block; with the changes, it affects index/filter partitions as well. This can cause performance issues when cache_index_and_filter_blocks is false since in this case, partitions are neither cached nor preloaded (i.e. they are loaded on demand upon each access). The patch reverts to the earlier behavior, that is, partitions are cached similarly to data blocks regardless of the value of the above option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5705 Test Plan: make check ./db_bench -benchmarks=fillrandom --statistics --stats_interval_seconds=1 --duration=30 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false ./db_bench -benchmarks=readrandom --use_existing_db --statistics --stats_interval_seconds=1 --duration=10 --num=500000000 --bloom_bits=20 --partition_index_and_filters=true --cache_index_and_filter_blocks=false --cache_size=8000000000 Relevant statistics from the readrandom benchmark with the old code: rocksdb.block.cache.index.miss COUNT : 0 rocksdb.block.cache.index.hit COUNT : 0 rocksdb.block.cache.index.add COUNT : 0 rocksdb.block.cache.index.bytes.insert COUNT : 0 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 0 rocksdb.block.cache.filter.hit COUNT : 0 rocksdb.block.cache.filter.add COUNT : 0 rocksdb.block.cache.filter.bytes.insert COUNT : 0 rocksdb.block.cache.filter.bytes.evict COUNT : 0 With the new code: rocksdb.block.cache.index.miss COUNT : 2500 rocksdb.block.cache.index.hit COUNT : 42696 rocksdb.block.cache.index.add COUNT : 2500 rocksdb.block.cache.index.bytes.insert COUNT : 4050048 rocksdb.block.cache.index.bytes.evict COUNT : 0 rocksdb.block.cache.filter.miss COUNT : 2500 rocksdb.block.cache.filter.hit COUNT : 4550493 rocksdb.block.cache.filter.add COUNT : 2500 rocksdb.block.cache.filter.bytes.insert COUNT : 10331040 rocksdb.block.cache.filter.bytes.evict COUNT : 0 Differential Revision: D16817382 Pulled By: ltamasi fbshipit-source-id: 28a516b0da1f041a03313e0b70b28cf5cf205d00	2019-08-14 18:16:06 -07:00
Aaryaman Sagar	77273d4137	Fix TSAN failures in DistributedMutex tests (#5684 ) Summary: TSAN was not able to correctly instrument atomic bts and btr instructions, so when TSAN is enabled implement those with std::atomic::fetch_or and std::atomic::fetch_and. Also disable tests that fail on TSAN with false negatives (we know these are false negatives because this other verifiably correct program fails with the same TSAN error <link>) ``` make clean TEST_TMPDIR=/dev/shm/rocksdb OPT=-g COMPILE_WITH_TSAN=1 make J=1 -j56 folly_synchronization_distributed_mutex_test ``` This is the code that fails with the same false-negative with TSAN ``` namespace { class ExceptionWithConstructionTrack : public std::exception { public: explicit ExceptionWithConstructionTrack(int id) : id_{folly::to<std::string>(id)}, constructionTrack_{id} {} const char* what() const noexcept override { return id_.c_str(); } private: std::string id_; TestConstruction constructionTrack_; }; template <typename Storage, typename Atomic> void transferCurrentException(Storage& storage, Atomic& produced) { assert(std::current_exception()); new (&storage) std::exception_ptr(std::current_exception()); produced->store(true, std::memory_order_release); } void concurrentExceptionPropagationStress( int numThreads, std::chrono::milliseconds milliseconds) { auto&& stop = std::atomic<bool>{false}; auto&& exceptions = std::vector<std::aligned_storage<48, 8>::type>{}; auto&& produced = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumed = std::vector<std::unique_ptr<std::atomic<bool>>>{}; auto&& consumers = std::vector<std::thread>{}; for (auto i = 0; i < numThreads; ++i) { produced.emplace_back(new std::atomic<bool>{false}); consumed.emplace_back(new std::atomic<bool>{false}); exceptions.push_back({}); } auto producer = std::thread{[&]() { auto counter = std::vector<int>(numThreads, 0); for (auto i = 0; true; i = ((i + 1) % numThreads)) { try { throw ExceptionWithConstructionTrack{counter.at(i)++}; } catch (...) { transferCurrentException(exceptions.at(i), produced.at(i)); } while (!consumed.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } consumed.at(i)->store(false, std::memory_order_release); } }}; for (auto i = 0; i < numThreads; ++i) { consumers.emplace_back([&, i]() { auto counter = 0; while (true) { while (!produced.at(i)->load(std::memory_order_acquire)) { if (stop.load(std::memory_order_acquire)) { return; } } produced.at(i)->store(false, std::memory_order_release); try { auto storage = &exceptions.at(i); auto exc = folly::launder( reinterpret_cast<std::exception_ptr>(storage)); auto copy = std::move(exc); exc->std::exception_ptr::~exception_ptr(); std::rethrow_exception(std::move(copy)); } catch (std::exception& exc) { auto value = std::stoi(exc.what()); EXPECT_EQ(value, counter++); } consumed.at(i)->store(true, std::memory_order_release); } }); } std::this_thread::sleep_for(milliseconds); stop.store(true); producer.join(); for (auto& thread : consumers) { thread.join(); } } } // namespace ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/5684 Differential Revision: D16746077 Pulled By: miasantreble fbshipit-source-id: 8af88dcf9161c05daec1a76290f577918638f79d	2019-08-14 17:01:31 -07:00
Manuel Ung	7785f61132	WriteUnPrepared: Fix bug in savepoints (#5703 ) Summary: Fix a bug in write unprepared savepoints. When flushing the write batch according to savepoint boundaries, we were forgetting to flush the last write batch after the last savepoint, meaning that some data was not written to DB. Also, add a small optimization where we avoid flushing empty batches. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5703 Differential Revision: D16811996 Pulled By: lth fbshipit-source-id: 600c7e0e520ad7a8fad32d77e11d932453e68e3f	2019-08-14 16:15:46 -07:00

... 4 5 6 7 8 ...

8539 Commits