rocksdb/util
lovro b6655a679d Replace std::priority_queue in MergingIterator with custom heap
Summary:
While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison.  Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge.

Keys in our dataset include sequence numbers that increase with time.  Adjacent keys in an L0 file are very likely to be adjacent in the full database.  Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one.  It would be great to avoid the O(log K) heap operation per row while compacting, where K is the number of inputs being merged.

This diff replaces std::priority_queue with a custom binary heap implementation.  It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top).
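A minimal sketch of the idea, not the code in util/heap.h: the names SketchHeap, replace_top and MergeRuns are hypothetical and chosen only for illustration. When the replaced top still belongs on top, the sift-down compares it against its (at most two) children and stops without swapping, whereas a pop followed by a push would walk the height of the heap twice.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical illustration of a binary heap with a "replace top" operation.
// With Compare = std::less<T> the largest element is on top, matching the
// convention of std::priority_queue.
template <typename T, typename Compare = std::less<T>>
class SketchHeap {
 public:
  explicit SketchHeap(Compare cmp = Compare()) : cmp_(std::move(cmp)) {}

  bool empty() const { return data_.empty(); }
  const T& top() const { assert(!data_.empty()); return data_.front(); }

  void push(T value) {
    data_.push_back(std::move(value));
    sift_up(data_.size() - 1);
  }

  void pop() {
    assert(!data_.empty());
    if (data_.size() > 1) data_.front() = std::move(data_.back());
    data_.pop_back();
    if (!data_.empty()) sift_down(0);
  }

  // Overwrite the top entry in place and restore heap order. If the new
  // value still outranks both children, the loop exits after one pass.
  void replace_top(T value) {
    assert(!data_.empty());
    data_.front() = std::move(value);
    sift_down(0);
  }

 private:
  void sift_up(std::size_t i) {
    while (i > 0) {
      std::size_t parent = (i - 1) / 2;
      if (!cmp_(data_[parent], data_[i])) break;
      std::swap(data_[parent], data_[i]);
      i = parent;
    }
  }

  void sift_down(std::size_t i) {
    const std::size_t n = data_.size();
    for (;;) {
      std::size_t best = i;
      const std::size_t left = 2 * i + 1, right = left + 1;
      if (left < n && cmp_(data_[best], data_[left])) best = left;
      if (right < n && cmp_(data_[best], data_[right])) best = right;
      if (best == i) return;  // heap property holds again
      std::swap(data_[i], data_[best]);
      i = best;
    }
  }

  std::vector<T> data_;
  Compare cmp_;
};

// Multiway merge of sorted runs, standing in for the child iterators of a
// merging iterator. Advancing within the same run uses replace_top.
std::vector<int> MergeRuns(const std::vector<std::vector<int>>& runs) {
  struct Cursor { int value; std::size_t run; std::size_t pos; };
  auto greater = [](const Cursor& a, const Cursor& b) {
    return a.value > b.value;  // inverted comparison: smallest value on top
  };
  SketchHeap<Cursor, decltype(greater)> heap(greater);
  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.push(Cursor{runs[r][0], r, 0});
  }
  std::vector<int> out;
  while (!heap.empty()) {
    Cursor c = heap.top();
    out.push_back(c.value);
    if (++c.pos < runs[c.run].size()) {
      c.value = runs[c.run][c.pos];
      heap.replace_top(c);  // cheap when this run keeps supplying the minimum
    } else {
      heap.pop();  // run exhausted
    }
  }
  return out;
}
```

In MergeRuns, a run that supplies several consecutive smallest values keeps its cursor on top, so each of those steps costs a constant number of comparisons rather than O(log K).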

Test Plan:
make check

To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number).  I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs).  The exact improvement depends on the number of L0 files and the amount of locality.  Performance on randomly distributed keys seems on par with the old code.

Reviewers: kailiu, sdong, igor

Reviewed By: igor

Subscribers: yoshinorim, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D29133
2015-07-06 04:24:09 -07:00
allocator.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
arena_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
arena.cc Make arena use hugepage if possible 2014-11-21 14:11:40 -08:00
arena.h Removing unnecessary kInlineSize 2015-03-12 21:13:53 +03:00
auto_roll_logger_test.cc Introduce InfoLogLevel::HEADER_LEVEL 2015-07-02 17:14:39 -07:00
auto_roll_logger.cc Add Header to logging to capture application level information 2015-02-06 10:37:45 -08:00
auto_roll_logger.h rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
autovector_test.cc Make autovector_test runnable in ROCKSDB_LITE 2015-06-18 15:58:00 -07:00
autovector.h Fix possible SIGSEGV in CompactRange (github issue #596) 2015-04-29 10:52:31 -07:00
bloom_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
bloom.cc fix typos 2015-04-25 18:14:27 +09:00
build_version.h build: do not relink every single binary just for a timestamp 2015-02-19 13:11:10 -08:00
cache_bench.cc Fix -Wshadow for tools 2014-11-07 15:04:30 -08:00
cache_test.cc Fix memory leaks in PinnedUsageTest 2015-06-19 09:43:08 -07:00
cache.cc Add Cache.GetPinnedUsageUsage() 2015-06-18 13:56:31 -07:00
channel.h Multithreaded backup and restore in BackupEngineImpl 2015-07-02 11:35:51 -07:00
coding_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
coding.cc Removing BitStream* functions 2014-08-19 06:48:21 -07:00
coding.h Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
compaction_job_stats_impl.cc Fixed the tsan failure in util/compaction_job_stats_impl.cc 2015-06-05 11:05:35 -07:00
comparator.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
compression.h Fail DB::Open() when the requested compression is not available 2015-06-18 14:55:05 -07:00
crc32c_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
crc32c.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
crc32c.h Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
db_info_dumper.cc Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL 2015-05-22 11:54:59 -07:00
db_info_dumper.h Fix iOS compile with -Wshorten-64-to-32 2014-11-13 14:39:30 -05:00
dynamic_bloom_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
dynamic_bloom.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
dynamic_bloom.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
env_hdfs.cc fix typos 2015-04-25 18:14:27 +09:00
env_posix.cc Add read_nanos to IOStatsContext. 2015-06-22 11:09:35 -07:00
env_test.cc add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper 2015-06-01 11:22:36 -07:00
env.cc Introduce InfoLogLevel::HEADER_LEVEL 2015-07-02 17:14:39 -07:00
event_logger_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
event_logger.cc Allow EventLogger to directly log from a JSONWriter. 2015-05-21 15:39:30 -07:00
event_logger.h Allow EventLogger to directly log from a JSONWriter. 2015-05-21 15:39:30 -07:00
file_util.cc Provide openable snapshots 2014-11-14 11:38:26 -08:00
file_util.h Provide openable snapshots 2014-11-14 11:38:26 -08:00
filelock_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
filter_policy.cc Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
hash_cuckoo_rep.cc Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
hash_cuckoo_rep.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
hash_linklist_rep.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
hash_linklist_rep.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
hash_skiplist_rep.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
hash_skiplist_rep.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
hash.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
hash.h Introduce GetThreadList API 2014-11-20 10:49:32 -08:00
heap.h Replace std::priority_queue in MergingIterator with custom heap 2015-07-06 04:24:09 -07:00
histogram_test.cc fix typos 2015-04-25 18:14:27 +09:00
histogram.cc fix typos 2015-04-25 18:14:27 +09:00
histogram.h Fix iOS compile with -Wshorten-64-to-32 2014-11-13 14:39:30 -05:00
instrumented_mutex.cc Perf Context to report DB mutex waiting time 2015-02-09 17:55:12 -08:00
instrumented_mutex.h Add a counter for collecting the wait time on db mutex. 2015-02-04 21:39:45 -08:00
iostats_context_imp.h Removed two unused macros in iostats_context 2015-06-12 10:45:02 -07:00
iostats_context.cc Add read_nanos to IOStatsContext. 2015-06-22 11:09:35 -07:00
ldb_cmd_execute_result.h rocksdb: Small refactoring before migrating to gtest 2015-03-16 18:08:59 -07:00
ldb_cmd.cc Use CompactRangeOptions for CompactRange 2015-06-17 14:36:14 -07:00
ldb_cmd.h rocksdb: Small refactoring before migrating to gtest 2015-03-16 18:08:59 -07:00
ldb_tool.cc Added 'dump_live_files' command to ldb tool. 2014-12-12 17:50:36 -08:00
log_buffer.cc Enlarge log size cap when printing file summary 2014-09-23 16:56:34 -07:00
log_buffer.h RocksDB on FreeBSD support 2015-02-26 15:19:17 -08:00
log_write_bench.cc Fix more gflag namespace issues 2014-05-09 08:41:02 -07:00
logging.cc Make the benchmark scripts configurable and add tests 2015-03-30 11:28:25 -07:00
logging.h Make the benchmark scripts configurable and add tests 2015-03-30 11:28:25 -07:00
manual_compaction_test.cc Use CompactRangeOptions for CompactRange 2015-06-17 14:36:14 -07:00
memenv_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
memenv.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
mock_env_test.cc Fix flakiness of WalManagerTest 2015-04-13 16:15:05 -07:00
mock_env.cc Fix flakiness of WalManagerTest 2015-04-13 16:15:05 -07:00
mock_env.h Fix flakiness of WalManagerTest 2015-04-13 16:15:05 -07:00
murmurhash.cc Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
murmurhash.h Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
mutable_cf_options.cc options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. 2015-03-02 22:40:41 -08:00
mutable_cf_options.h options.paranoid_file_checks to read all rows after writing to a file. 2015-04-23 11:34:35 -07:00
mutexlock.h Add separate Read/WriteUnlock methods in MutexRW. 2014-06-16 15:41:46 -07:00
options_builder.cc Remove the compability check on log2 OS_ANDROID as it's already blocked by ROCKSDB_LITE 2014-12-04 13:56:14 -08:00
options_helper.cc Support saving history in memtable_list 2015-05-28 16:34:24 -07:00
options_helper.h Missing header in build on CentOS 2014-11-18 22:21:02 +01:00
options_test.cc Support saving history in memtable_list 2015-05-28 16:34:24 -07:00
options.cc Implement a table-level row cache 2015-06-23 10:25:45 -07:00
perf_context_imp.h more times in perf_context and iostats_context 2015-06-02 02:07:58 -07:00
perf_context.cc more times in perf_context and iostats_context 2015-06-02 02:07:58 -07:00
perf_level_imp.h more times in perf_context and iostats_context 2015-06-02 02:07:58 -07:00
perf_level.cc more times in perf_context and iostats_context 2015-06-02 02:07:58 -07:00
perf_step_timer.h more times in perf_context and iostats_context 2015-06-02 02:07:58 -07:00
posix_logger.h more times in perf_context and iostats_context 2015-06-02 02:07:58 -07:00
random.h Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
rate_limiter_test.cc Enable dynamic changing of rate limiter's bytes_per_second 2015-03-18 15:35:55 -07:00
rate_limiter.cc Enable dynamic changing of rate limiter's bytes_per_second 2015-03-18 15:35:55 -07:00
rate_limiter.h Enable dynamic changing of rate limiter's bytes_per_second 2015-03-18 15:35:55 -07:00
scoped_arena_iterator.h Remove path with arena==nullptr from NewInternalIterator 2014-09-04 17:40:41 -07:00
skiplistrep.cc Allow GetApproximateSize() to include mem table size if it is skip list memtable 2015-06-16 18:13:23 -07:00
slice_transform_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
slice.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
sst_dump_test.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
sst_dump_tool_imp.h Disable pre-fetching of index and filter blocks for sst_dump_tool. 2015-02-25 16:34:26 -08:00
sst_dump_tool.cc Abstract out SetMaxPossibleForUserKey() and SetMinPossibleForUserKey 2015-04-23 18:08:37 -07:00
statistics.cc Fix assert in histogramData 2015-01-23 18:10:52 -08:00
statistics.h make statistics forward-able 2014-07-28 12:10:49 -07:00
status.cc Optimistic Transactions 2015-05-29 14:36:35 -07:00
stl_wrappers.h Killing Transform Rep 2013-12-03 12:42:15 -08:00
stop_watch.h Change StopWatch interface 2014-07-28 12:22:37 -07:00
string_util.cc Clean up StringSplit 2014-11-21 11:05:28 -05:00
string_util.h Build for CYGWIN 2015-04-23 21:33:44 -07:00
sync_point.cc SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward 2015-04-14 16:18:50 -07:00
sync_point.h SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward 2015-04-14 16:18:50 -07:00
testharness.cc rocksdb: print status error message when (ASSERT|EXPECT)_OK fails 2015-03-19 17:32:43 -07:00
testharness.h rocksdb: print status error message when (ASSERT|EXPECT)_OK fails 2015-03-19 17:32:43 -07:00
testutil.cc Merger test 2014-09-08 22:24:40 -07:00
testutil.h rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
thread_list_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
thread_local_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
thread_local.cc Use ustricter consistency in thread local operations 2015-01-27 13:56:03 -08:00
thread_local.h Improve the comment of util/thread_local.h 2014-10-21 17:28:31 -07:00
thread_operation.h update an import path to fit in with the rest of the kids 2015-05-22 22:56:32 -07:00
thread_status_impl.cc Fixed compile errors due to some gcc does not have std::map::emplace 2015-05-18 13:48:56 -07:00
thread_status_updater_debug.cc Allow GetThreadList() to indicate a thread is doing Compaction. 2015-01-13 00:04:08 -08:00
thread_status_updater.cc Only initialize the ThreadStatusData when necessary. 2015-06-17 11:21:18 -07:00
thread_status_updater.h Only initialize the ThreadStatusData when necessary. 2015-06-17 11:21:18 -07:00
thread_status_util_debug.cc Fix bad performance in debug mode 2015-04-13 15:58:45 -07:00
thread_status_util.cc Only initialize the ThreadStatusData when necessary. 2015-06-17 11:21:18 -07:00
thread_status_util.h Only initialize the ThreadStatusData when necessary. 2015-06-17 11:21:18 -07:00
vectorrep.cc assert(sorted) in vector rep 2015-04-13 17:33:24 -07:00
xfunc.cc Optimistic Transactions 2015-05-29 14:36:35 -07:00
xfunc.h Optimistic Transactions 2015-05-29 14:36:35 -07:00
xxhash.cc Prevent xxhash symbols from polluting global namespace 2015-03-12 12:07:10 -07:00
xxhash.h Prevent xxhash symbols from polluting global namespace 2015-03-12 12:07:10 -07:00