rocksdb/table
lovro b6655a679d Replace std::priority_queue in MergingIterator with custom heap
Summary:
While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison.  Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge.

Keys in our dataset include sequence numbers that increase with time.  Adjacent keys in an L0 file are very likely to be adjacent in the full database.  Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one.  It would be great to avoid the O(log K) heap operation per row while compacting, where K is the number of child iterators being merged.

This diff replaces std::priority_queue with a custom binary heap implementation.  It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top).
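
The "replace top" idea can be illustrated with a minimal sketch. This is not the code in iter_heap.h or merger.cc: the names `SketchHeap`, `replace_top` and `merge_runs` are hypothetical, and plain `int` runs stand in for child iterators. The point is that overwriting the root and sifting down costs at most two comparisons and no swaps when the new value still belongs on top, whereas a pop followed by a push always walks the heap.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical min-heap sketch: top() is the smallest element under Less.
template <typename T, typename Less = std::less<T>>
class SketchHeap {
 public:
  explicit SketchHeap(Less less = Less()) : less_(std::move(less)) {}

  bool empty() const { return data_.empty(); }
  std::size_t size() const { return data_.size(); }
  const T& top() const {
    assert(!empty());
    return data_.front();
  }

  void push(T value) {
    data_.push_back(std::move(value));
    sift_up(data_.size() - 1);
  }

  void pop() {
    assert(!empty());
    if (data_.size() > 1) data_.front() = std::move(data_.back());
    data_.pop_back();
    if (!data_.empty()) sift_down(0);
  }

  // Overwrite the root and restore heap order. If the new value is still
  // the smallest, sift_down returns after comparing against the two children.
  void replace_top(T value) {
    assert(!empty());
    data_.front() = std::move(value);
    sift_down(0);
  }

 private:
  void sift_up(std::size_t i) {
    while (i > 0) {
      std::size_t parent = (i - 1) / 2;
      if (!less_(data_[i], data_[parent])) break;
      std::swap(data_[i], data_[parent]);
      i = parent;
    }
  }

  void sift_down(std::size_t i) {
    const std::size_t n = data_.size();
    for (;;) {
      std::size_t smallest = i;
      const std::size_t left = 2 * i + 1;
      const std::size_t right = 2 * i + 2;
      if (left < n && less_(data_[left], data_[smallest])) smallest = left;
      if (right < n && less_(data_[right], data_[smallest])) smallest = right;
      if (smallest == i) return;  // already in place: no swaps performed
      std::swap(data_[i], data_[smallest]);
      i = smallest;
    }
  }

  std::vector<T> data_;
  Less less_;
};

// Multiway merge of sorted runs (a stand-in for merging child iterators).
std::vector<int> merge_runs(const std::vector<std::vector<int>>& runs) {
  struct Cursor {
    int value;
    std::size_t run;
    std::size_t pos;
  };
  auto less = [](const Cursor& a, const Cursor& b) { return a.value < b.value; };
  SketchHeap<Cursor, decltype(less)> heap(less);
  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.push({runs[r][0], r, 0});
  }
  std::vector<int> out;
  while (!heap.empty()) {
    Cursor c = heap.top();
    out.push_back(c.value);
    if (c.pos + 1 < runs[c.run].size()) {
      // Advance the same run in place instead of pop() followed by push().
      heap.replace_top({runs[c.run][c.pos + 1], c.run, c.pos + 1});
    } else {
      heap.pop();  // this run is exhausted
    }
  }
  return out;
}
```

When consecutive output rows come from the same run, as in the data pattern described above, each `replace_top` call in `merge_runs` finishes after comparing the new value with the root's children.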

Test Plan:
make check

To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number).  I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs).  The exact improvement depends on the number of L0 files and the amount of locality.  Performance on randomly distributed keys seems on par with the old code.

Reviewers: kailiu, sdong, igor

Reviewed By: igor

Subscribers: yoshinorim, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D29133
2015-07-06 04:24:09 -07:00
adaptive_table_factory.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
adaptive_table_factory.h A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
block_based_filter_block_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
block_based_filter_block.cc Build for CYGWIN 2015-04-23 21:33:44 -07:00
block_based_filter_block.h Remember whole key/prefix filtering on/off in SST file 2015-02-11 11:20:04 -08:00
block_based_table_builder.cc Use malloc_usable_size() for accounting block cache size 2015-06-26 11:48:09 -07:00
block_based_table_builder.h Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files 2015-06-05 20:18:21 -07:00
block_based_table_factory.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
block_based_table_factory.h A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
block_based_table_reader.cc Use malloc_usable_size() for accounting block cache size 2015-06-26 11:48:09 -07:00
block_based_table_reader.h Add functionality to pre-fetch blocks specified by a key range to BlockBasedTable implementation. 2015-03-02 17:07:03 -08:00
block_builder.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
block_builder.h delete unused Comparator 2014-09-04 09:10:13 +08:00
block_hash_index_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
block_hash_index.cc fix typos 2015-04-25 18:14:27 +09:00
block_hash_index.h Turn on -Wshadow 2014-10-31 11:59:54 -07:00
block_prefix_index.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
block_prefix_index.h fix a few compile warnings 2014-09-04 23:06:23 +08:00
block_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
block.cc Use malloc_usable_size() for accounting block cache size 2015-06-26 11:48:09 -07:00
block.h Use malloc_usable_size() for accounting block cache size 2015-06-26 11:48:09 -07:00
bloom_block.cc table/bloom_block.*: pass func parameter by reference 2014-09-30 23:30:31 +02:00
bloom_block.h Enforce write buffer memory limit across column families 2014-12-02 12:09:20 -08:00
cuckoo_table_builder_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
cuckoo_table_builder.cc Support footer versions bigger than 1 2015-01-13 14:33:04 -08:00
cuckoo_table_builder.h Add more table properties to EventLogger 2015-05-12 15:53:55 -07:00
cuckoo_table_factory.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
cuckoo_table_factory.h A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
cuckoo_table_reader_test.cc Fixing build issue 2015-03-24 16:27:24 -07:00
cuckoo_table_reader.cc Turn on -Wshorten-64-to-32 and fix all the errors 2014-11-11 16:47:22 -05:00
cuckoo_table_reader.h use GetContext to replace callback function pointer 2014-09-29 11:09:09 -07:00
filter_block.h Dump routine to BlockBasedTableReader 2014-12-23 13:24:07 -08:00
flush_block_policy.cc move block based table related options BlockBasedTableOptions 2014-08-25 14:22:05 -07:00
format.cc Build for CYGWIN 2015-04-23 21:33:44 -07:00
format.h New BlockBasedTable version -- better compressed block format 2015-01-14 16:24:24 -08:00
full_filter_block_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
full_filter_block.cc Remember whole key/prefix filtering on/off in SST file 2015-02-11 11:20:04 -08:00
full_filter_block.h Remember whole key/prefix filtering on/off in SST file 2015-02-11 11:20:04 -08:00
get_context.cc Implement a table-level row cache 2015-06-23 10:25:45 -07:00
get_context.h Implement a table-level row cache 2015-06-23 10:25:45 -07:00
iter_heap.h Replace std::priority_queue in MergingIterator with custom heap 2015-07-06 04:24:09 -07:00
iterator_wrapper.h Turn -Wshadow back on 2014-11-06 11:14:28 -08:00
iterator.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
merger_test.cc rocksdb: switch to gtest 2015-03-17 14:08:00 -07:00
merger.cc Replace std::priority_queue in MergingIterator with custom heap 2015-07-06 04:24:09 -07:00
merger.h In DB::NewIterator(), try to allocate the whole iterator tree in an arena 2014-06-02 17:44:57 -07:00
meta_blocks.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
meta_blocks.h A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
mock_table.cc SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward 2015-04-14 16:18:50 -07:00
mock_table.h Add more table properties to EventLogger 2015-05-12 15:53:55 -07:00
plain_table_builder.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
plain_table_builder.h Add more table properties to EventLogger 2015-05-12 15:53:55 -07:00
plain_table_factory.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
plain_table_factory.h A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
plain_table_index.cc Block plain_table_index.cc in ROCKSDB_LITE 2014-11-24 20:47:27 -08:00
plain_table_index.h Block plain_table_index.cc in ROCKSDB_LITE 2014-11-24 20:47:27 -08:00
plain_table_key_coding.cc Avoid naming conflict of EntryType 2015-04-06 11:49:13 -07:00
plain_table_key_coding.h typo improvement 2014-09-06 23:21:26 +08:00
plain_table_reader.cc rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
plain_table_reader.h rocksdb: Add missing override 2015-02-26 11:28:41 -08:00
table_builder.h Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files 2015-06-05 20:18:21 -07:00
table_properties_internal.h Fix iOS compile with -Wshorten-64-to-32 2014-11-13 14:39:30 -05:00
table_properties.cc Add rocksdb::ToString() to address cases where std::to_string is not available. 2014-11-24 20:44:49 -08:00
table_reader_bench.cc A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge 2015-04-06 10:27:21 -07:00
table_reader.h Add functionality to pre-fetch blocks specified by a key range to BlockBasedTable implementation. 2015-03-02 17:07:03 -08:00
table_test.cc Optimistic Transactions 2015-05-29 14:36:35 -07:00
two_level_iterator.cc Allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena 2015-06-30 17:30:38 -07:00
two_level_iterator.h Allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena 2015-06-30 17:30:38 -07:00