rocksdb/db
Maysam Yabandeh f383641a1d Unordered Writes (#5218)
Summary:
Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees.
The patch is prepared based on an original by siying
Existing unit tests are extended to include unordered_write option.

Benchmark Results:
```
TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions  --unordered_write=1
```
With WAL
- Vanilla RocksDB: 78.6 MB/s
- WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x)
- unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees)

Without WAL
- Vanilla RocksDB: 111.3 MB/s
- WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x)
- unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees)

- WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x)

Limitations:
- The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218

Differential Revision: D15219029

Pulled By: maysamyabandeh

fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759
2019-05-13 17:47:21 -07:00
..
builder.cc Periodic Compactions (#5166) 2019-04-10 19:31:18 -07:00
builder.h Periodic Compactions (#5166) 2019-04-10 19:31:18 -07:00
c_test.c add missing rocksdb_flush_cf in c (#5243) 2019-04-25 11:25:43 -07:00
c.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
column_family_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
column_family.cc Don't call FindObsoleteFiles() in ~ColumnFamilyHandleImpl() if CF is not dropped (#5238) 2019-04-24 17:11:36 -07:00
column_family.h Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
compact_files_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compacted_db_impl.cc move dump stats to a separate thread (#4382) 2018-10-08 22:54:43 -07:00
compacted_db_impl.h Enable checkpoint of read-only db (#4681) 2018-12-07 17:06:02 -08:00
compaction_iteration_stats.h add counter for deletion dropping optimization 2017-08-19 14:10:08 -07:00
compaction_iterator_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compaction_iterator.cc Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
compaction_iterator.h Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
compaction_job_stats_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compaction_job_test.cc Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
compaction_job.cc Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
compaction_job.h Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
compaction_picker_fifo.cc Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
compaction_picker_fifo.h Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
compaction_picker_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compaction_picker_universal.cc Add two more StatsLevel (#5027) 2019-02-28 10:27:59 -08:00
compaction_picker_universal.h Delete triggered compaction for universal style 2018-05-29 15:44:34 -07:00
compaction_picker.cc Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
compaction_picker.h Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
compaction.cc Dictionary compression for files written by SstFileWriter (#4978) 2019-02-14 11:23:55 -08:00
compaction.h Truncate range tombstones by leveraging InternalKeys (#4432) 2018-10-09 15:19:38 -07:00
comparator_db_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
convenience.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
corruption_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
cuckoo_table_db_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
db_basic_test.cc Fix MultiGet ASSERT bug when passing unsorted result (#5195) 2019-04-15 11:35:21 -07:00
db_blob_index_test.cc fix lite build 2017-10-17 08:57:09 -07:00
db_block_cache_test.cc Add BlockBasedTableOptions::index_shortening (#5174) 2019-04-22 08:20:35 -07:00
db_bloom_filter_test.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_compaction_filter_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
db_compaction_test.cc Use creation_time or mtime when file_creation_time=0 (#5184) 2019-04-18 22:39:34 -07:00
db_dynamic_level_test.cc Fix flaky DBDynamicLevelTest.DynamicLevelMaxBytesBase2 (#4668) 2018-11-12 16:42:16 -08:00
db_encryption_test.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
db_filesnapshot.cc Remove redundant member var and set options (#4631) 2018-11-12 12:24:26 -08:00
db_flush_test.cc Fix a race condition caused by unlocking db mutex (#5294) 2019-05-10 17:56:48 -07:00
db_impl_compaction_flush.cc Fix a race condition caused by unlocking db mutex (#5294) 2019-05-10 17:56:48 -07:00
db_impl_debug.cc Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
db_impl_experimental.cc Update JobContext. (#3949) 2018-08-03 17:42:34 -07:00
db_impl_files.cc Close WAL files before deletion (#5233) 2019-04-25 10:11:41 -07:00
db_impl_open.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_impl_readonly.cc Add two more StatsLevel (#5027) 2019-02-28 10:27:59 -08:00
db_impl_readonly.h Apply automatic formatting to some files (#5114) 2019-03-27 16:24:45 -07:00
db_impl_secondary.cc add WAL replay in TryCatchUpWithPrimary (#5282) 2019-05-08 10:59:37 -07:00
db_impl_secondary.h add WAL replay in TryCatchUpWithPrimary (#5282) 2019-05-08 10:59:37 -07:00
db_impl_write.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_impl.cc DB::Close() to fail when there are unreleased snapshots (#5272) 2019-05-01 10:17:30 -07:00
db_impl.h Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_info_dumper.cc avoid copying when iterating using range-based for (#4459) 2018-10-09 17:15:51 -07:00
db_info_dumper.h Change RocksDB License 2017-07-15 16:11:23 -07:00
db_inplace_update_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
db_io_failure_test.cc Disable DBIOFailureTest.NoSpaceCompactRange in LITE (#4596) 2018-10-29 14:36:31 -07:00
db_iter_stress_test.cc Move prefix_extractor to MutableCFOptions 2018-05-21 14:43:11 -07:00
db_iter_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
db_iter.cc DBIter::Next() can skip user key checking if previous entry's seqnum is 0 (#5244) 2019-05-09 12:24:04 -07:00
db_iter.h Remove v1 RangeDelAggregator (#4778) 2018-12-17 17:33:46 -08:00
db_iterator_test.cc Merging iterator to avoid child iterator reseek for some cases (#5286) 2019-05-09 14:20:04 -07:00
db_log_iter_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
db_memtable_test.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_merge_operator_test.cc WriteUnPrepared: less virtual in iterator callback (#5049) 2019-04-02 14:47:16 -07:00
db_options_test.cc Update RepeatableThreadTest with MockTimeEnv (#5107) 2019-03-29 10:08:50 -07:00
db_properties_test.cc Deprecate ttl option from CompactionOptionsFIFO (#4965) 2019-02-15 09:51:41 -08:00
db_range_del_test.cc Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
db_secondary_test.cc add WAL replay in TryCatchUpWithPrimary (#5282) 2019-05-08 10:59:37 -07:00
db_sst_test.cc Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
db_statistics_test.cc Make statistics's stats_level change thread-safe (#5030) 2019-03-01 10:42:09 -08:00
db_table_properties_test.cc Allow dynamic modification of window size and deletion trigger (#4403) 2018-09-20 15:15:28 -07:00
db_tailing_iter_test.cc Remove managed iterator 2018-07-17 14:43:18 -07:00
db_test2.cc DB::Close() to fail when there are unreleased snapshots (#5272) 2019-05-01 10:17:30 -07:00
db_test_util.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_test_util.h Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
db_test.cc Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
db_universal_compaction_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
db_wal_test.cc Preload some files even if options.max_open_files (#3340) 2018-12-28 18:02:28 -08:00
db_write_test.cc Expose DB methods to lock and unlock the WAL (#5146) 2019-04-06 06:40:36 -07:00
dbformat_test.cc Relax VersionStorageInfo::GetOverlappingInputs check (#4050) 2018-07-13 17:42:38 -07:00
dbformat.cc Apply automatic formatting to some files (#5114) 2019-03-27 16:24:45 -07:00
dbformat.h Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
deletefile_test.cc add assert to silence clang analyzer and fix variable shadowing (#5140) 2019-04-02 21:15:44 -07:00
error_handler_test.cc Fix regression test failures introduced by PR #4164 (#4375) 2018-09-17 13:14:07 -07:00
error_handler.cc Fix typos in comments (#4456) 2018-10-04 20:46:50 -07:00
error_handler.h Fix typos in comments (#4456) 2018-10-04 20:46:50 -07:00
event_helpers.cc Log file_creation_time table property (#5232) 2019-04-22 15:30:07 -07:00
event_helpers.h Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
experimental.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
external_sst_file_basic_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
external_sst_file_ingestion_job.cc Collect compaction stats by priority and dump to info LOG (#5050) 2019-03-19 17:28:19 -07:00
external_sst_file_ingestion_job.h Atomic ingest (#4895) 2019-02-12 19:16:17 -08:00
external_sst_file_test.cc Refactor ExternalSSTFileTest (#5129) 2019-04-08 11:16:34 -07:00
fault_injection_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
file_indexer_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
file_indexer.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
file_indexer.h Change RocksDB License 2017-07-15 16:11:23 -07:00
filename_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
flush_job_test.cc Collect compaction stats by priority and dump to info LOG (#5050) 2019-03-19 17:28:19 -07:00
flush_job.cc Periodic Compactions (#5166) 2019-04-10 19:31:18 -07:00
flush_job.h Collect compaction stats by priority and dump to info LOG (#5050) 2019-03-19 17:28:19 -07:00
flush_scheduler.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
flush_scheduler.h Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
forward_iterator_bench.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
forward_iterator.cc Avoid per-key upper bound check in BlockBasedTableIterator (#5142) 2019-04-16 11:37:47 -07:00
forward_iterator.h Comment out unused variables 2018-03-05 13:13:41 -08:00
in_memory_stats_history.cc add GetStatsHistory to retrieve stats snapshots (#4748) 2019-02-20 15:52:54 -08:00
in_memory_stats_history.h add GetStatsHistory to retrieve stats snapshots (#4748) 2019-02-20 15:52:54 -08:00
internal_stats.cc Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
internal_stats.h Collect compaction stats by priority and dump to info LOG (#5050) 2019-03-19 17:28:19 -07:00
job_context.h WritePrepared: Fix visible key compacted out by compaction (#4883) 2019-01-15 21:34:38 -08:00
listener_test.cc Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
log_format.h Fix an inaccurate comment (#4315) 2018-08-24 18:13:20 -07:00
log_reader.cc Support for single-primary, multi-secondary instances (#4899) 2019-03-26 16:45:31 -07:00
log_reader.h secondary instance: add support for WAL tailing on OpenAsSecondary 2019-04-24 12:08:44 -07:00
log_test.cc Support for single-primary, multi-secondary instances (#4899) 2019-03-26 16:45:31 -07:00
log_writer.cc Close WAL files before deletion (#5233) 2019-04-25 10:11:41 -07:00
log_writer.h Close WAL files before deletion (#5233) 2019-04-25 10:11:41 -07:00
logs_with_prep_tracker.cc Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
logs_with_prep_tracker.h Skip deleted WALs during recovery 2018-05-03 15:43:09 -07:00
lookup_key.h Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
malloc_stats.cc Detect if Jemalloc is linked with the binary (#4844) 2019-01-03 16:30:12 -08:00
malloc_stats.h Change RocksDB License 2017-07-15 16:11:23 -07:00
manual_compaction_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
memtable_list_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
memtable_list.cc Fix a race condition caused by unlocking db mutex (#5294) 2019-05-10 17:56:48 -07:00
memtable_list.h Use correct FileMeta for atomic flush result install (#4932) 2019-01-31 14:49:51 -08:00
memtable.cc rename variable to avoid shadowing (#5204) 2019-04-17 10:15:05 -07:00
memtable.h add whole key bloom filter support in memtables (#4985) 2019-02-19 12:15:39 -08:00
merge_context.h Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
merge_helper_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
merge_helper.cc Add two more StatsLevel (#5027) 2019-02-28 10:27:59 -08:00
merge_helper.h Remove v1 RangeDelAggregator (#4778) 2018-12-17 17:33:46 -08:00
merge_operator.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
merge_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
obsolete_files_test.cc Improve obsolete_files_test (#5125) 2019-03-28 13:16:02 -07:00
options_file_test.cc Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
perf_context_test.cc Fix some variable naming in db/transaction_log_impl.* (#5112) 2019-03-27 12:27:54 -07:00
pinned_iterators_manager.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_db_test.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
pre_release_callback.h Mark logs with prepare in PreReleaseCallback (#5121) 2019-04-02 15:17:47 -07:00
prefix_test.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
range_del_aggregator_bench.cc Fix Windows broken build error due to non-const override (#4798) 2018-12-19 13:29:51 -08:00
range_del_aggregator_test.cc Remove v1 RangeDelAggregator (#4778) 2018-12-17 17:33:46 -08:00
range_del_aggregator.cc Make ReadRangeDelAggregator::ShouldDelete() more inline friendly (#5202) 2019-04-18 12:27:25 -07:00
range_del_aggregator.h Make ReadRangeDelAggregator::ShouldDelete() more inline friendly (#5202) 2019-04-18 12:27:25 -07:00
range_tombstone_fragmenter_test.cc Prepare FragmentedRangeTombstoneIterator for use in compaction (#4740) 2018-12-11 12:10:48 -08:00
range_tombstone_fragmenter.cc Avoid per-key upper bound check in BlockBasedTableIterator (#5142) 2019-04-16 11:37:47 -07:00
range_tombstone_fragmenter.h Add compaction logic to RangeDelAggregatorV2 (#4758) 2018-12-17 13:20:51 -08:00
read_callback.h WritePrepared: fix race condition in reading batch with duplicate keys (#5147) 2019-04-12 14:40:41 -07:00
repair_test.cc Acquire lock on DB LOCK file before starting repair. (#4435) 2018-10-12 10:41:54 -07:00
repair.cc Force read existing data during db repair (#5209) 2019-04-19 11:55:13 -07:00
snapshot_checker.h WritePrepared: fix issue with snapshot released during compaction (#4858) 2019-01-16 09:55:32 -08:00
snapshot_impl.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
snapshot_impl.h Refresh snapshot list during long compactions (2nd attempt) (#5278) 2019-05-03 17:30:22 -07:00
table_cache.cc Improve explicit user readahead performance (#5246) 2019-04-26 21:24:10 -07:00
table_cache.h Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
table_properties_collector_test.cc Feature for sampling and reporting compressibility (#4842) 2019-03-18 12:15:34 -07:00
table_properties_collector.cc Feature for sampling and reporting compressibility (#4842) 2019-03-18 12:15:34 -07:00
table_properties_collector.h Feature for sampling and reporting compressibility (#4842) 2019-03-18 12:15:34 -07:00
transaction_log_impl.cc Fix some variable naming in db/transaction_log_impl.* (#5112) 2019-03-27 12:27:54 -07:00
transaction_log_impl.h Fix some variable naming in db/transaction_log_impl.* (#5112) 2019-03-27 12:27:54 -07:00
version_builder_test.cc Apply modernize-use-override (3) 2019-02-19 13:39:49 -08:00
version_builder.cc Support for single-primary, multi-secondary instances (#4899) 2019-03-26 16:45:31 -07:00
version_builder.h Support for single-primary, multi-secondary instances (#4899) 2019-03-26 16:45:31 -07:00
version_edit_test.cc Add a unit test to Ignorable manfiest record (#4964) 2019-02-11 11:20:24 -08:00
version_edit.cc Add a placeholder in manifest indicating ignorable record (#4960) 2019-02-08 11:33:11 -08:00
version_edit.h Support for single-primary, multi-secondary instances (#4899) 2019-03-26 16:45:31 -07:00
version_set_test.cc Apply modernize-use-override (3) 2019-02-19 13:39:49 -08:00
version_set.cc Merging iterator to avoid child iterator reseek for some cases (#5286) 2019-05-09 14:20:04 -07:00
version_set.h VersionSet: optmize GetOverlappingInputsRangeBinarySearch (#4987) 2019-04-17 18:15:20 -07:00
wal_manager_test.cc Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
wal_manager.cc Smooth the deletion of WAL files (#5116) 2019-03-28 15:17:13 -07:00
wal_manager.h Fix memleak when DB::DeleteFile() 2018-01-11 18:57:33 -08:00
write_batch_base.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_batch_internal.h WriteUnPrepared: Add new WAL marker kTypeBeginUnprepareXID (#4069) 2018-06-28 18:58:29 -07:00
write_batch_test.cc Apply modernize-use-override (3) 2019-02-19 13:39:49 -08:00
write_batch.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
write_callback_test.cc Unordered Writes (#5218) 2019-05-13 17:47:21 -07:00
write_callback.h Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller_test.cc Apply modernize-use-override (3) 2019-02-19 13:39:49 -08:00
write_controller.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
write_controller.h Change RocksDB License 2017-07-15 16:11:23 -07:00
write_thread.cc Add more sync point to fix flaky test GroupCommitTest 2018-11-07 14:07:53 -08:00
write_thread.h Fix skip WAL for whole write_group when leader's callback fail (#4838) 2019-01-03 12:40:42 -08:00