rocksdb/table
Sagar Vemuri d938226af4 Improve performance of long range scans with readahead
Summary:
This change improves the performance of iterators doing long range scans (e.g. big/full table scans in MyRocks) by using readahead and prefetching additional data on each disk IO. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan.

Constraints:
- The prefetched data is stored by the OS in page cache. So this currently works only for non direct-reads use-cases i.e applications which use page cache. (Direct-I/O support will be enabled in a later PR).
- This gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).

Thanks to siying for the original idea and implementation.

**Benchmarks:**
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan: Seekrandom with large number of nexts
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```

Page cache was cleared before each experiment with the command:
```
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
```
```
Before:
seekrandom   :   34020.945 micros/op 29 ops/sec;   32.5 MB/s (1636 of 1999 found)
With this change:
seekrandom   :    8726.912 micros/op 114 ops/sec;  126.8 MB/s (5702 of 6999 found)
```
~3.9X performance improvement.

Also verified with strace and gdb that the readahead size is increasing as expected.
```
strace -e readahead -f -T -t -p <db_bench process pid>
```
Closes https://github.com/facebook/rocksdb/pull/3282

Differential Revision: D6586477

Pulled By: sagar0

fbshipit-source-id: 8a118a0ed4594fbb7f5b1cafb242d7a4033cb58c
2018-01-25 21:41:53 -08:00
..
adaptive_table_factory.cc Support prefetch last 512KB with direct I/O in block based file reader 2017-08-11 12:16:45 -07:00
adaptive_table_factory.h Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
block_based_filter_block_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
block_based_filter_block.cc Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
block_based_filter_block.h Change RocksDB License 2017-07-15 16:11:23 -07:00
block_based_table_builder.cc Add a BlockBasedTableOption to turn off index block compression. 2018-01-10 15:11:59 -08:00
block_based_table_builder.h TableProperty::oldest_key_time defaults to 0 2017-10-27 15:00:05 -07:00
block_based_table_factory.cc Add a BlockBasedTableOption to turn off index block compression. 2018-01-10 15:11:59 -08:00
block_based_table_factory.h Add a BlockBasedTableOption to turn off index block compression. 2018-01-10 15:11:59 -08:00
block_based_table_reader.cc Improve performance of long range scans with readahead 2018-01-25 21:41:53 -08:00
block_based_table_reader.h Improve performance of long range scans with readahead 2018-01-25 21:41:53 -08:00
block_builder.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
block_builder.h Change RocksDB License 2017-07-15 16:11:23 -07:00
block_fetcher.cc Refactor ReadBlockContents() 2017-12-11 15:27:32 -08:00
block_fetcher.h Fix BlockFetcher ASAN error 2017-12-12 12:12:38 -08:00
block_prefix_index.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
block_prefix_index.h Change RocksDB License 2017-07-15 16:11:23 -07:00
block_test.cc Speed up BlockTest.BlockReadAmpBitmap 2018-01-02 10:41:28 -08:00
block.cc BlockBasedTable::NewDataBlockIterator to always return BlockIter 2018-01-25 14:57:18 -08:00
block.h BlockBasedTable::NewDataBlockIterator to always return BlockIter 2018-01-25 14:57:18 -08:00
bloom_block.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
bloom_block.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cleanable_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
cuckoo_table_builder_test.cc allow nullptr Slice only as sentinel 2017-08-23 10:56:06 -07:00
cuckoo_table_builder.cc Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
cuckoo_table_builder.h Change RocksDB License 2017-07-15 16:11:23 -07:00
cuckoo_table_factory.cc Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
cuckoo_table_factory.h Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
cuckoo_table_reader_test.cc fix gflags namespace 2017-12-01 10:42:05 -08:00
cuckoo_table_reader.cc table: Fix coverity issues 2017-12-07 11:57:36 -08:00
cuckoo_table_reader.h remove unnecessary internal_comparator param in newIterator 2017-07-27 14:30:42 -07:00
filter_block.h Extend pin_l0 to filter partitions 2017-08-23 07:56:08 -07:00
flush_block_policy.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
format.cc NUMBER_BLOCK_COMPRESSED, etc, shouldn't be treated as timer counter 2017-12-14 10:27:43 -08:00
format.h Refactor ReadBlockContents() 2017-12-11 15:27:32 -08:00
full_filter_bits_builder.h Change RocksDB License 2017-07-15 16:11:23 -07:00
full_filter_block_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
full_filter_block.cc Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
full_filter_block.h Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
get_context.cc Reduce heavy hitter for Get operation 2017-12-12 21:11:33 -08:00
get_context.h Reduce heavy hitter for Get operation 2017-12-12 21:11:33 -08:00
index_builder.cc Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
index_builder.h Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
internal_iterator.h Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
iter_heap.h Make InternalKeyComparator final and directly use it in merging iterator 2017-09-11 12:04:21 -07:00
iterator_wrapper.h Change RocksDB License 2017-07-15 16:11:23 -07:00
iterator.cc PinnableSlice move assignment 2017-10-12 18:28:24 -07:00
merger_test.cc Make InternalKeyComparator final and directly use it in merging iterator 2017-09-11 12:04:21 -07:00
merging_iterator.cc fix DBImpl::NewInternalIterator super-version leak on failure 2017-10-11 14:57:43 -07:00
merging_iterator.h fix DBImpl::NewInternalIterator super-version leak on failure 2017-10-11 14:57:43 -07:00
meta_blocks.cc Refactor ReadBlockContents() 2017-12-11 15:27:32 -08:00
meta_blocks.h Support prefetch last 512KB with direct I/O in block based file reader 2017-08-11 12:16:45 -07:00
mock_table.cc remove unnecessary internal_comparator param in newIterator 2017-07-27 14:30:42 -07:00
mock_table.h remove unnecessary internal_comparator param in newIterator 2017-07-27 14:30:42 -07:00
partitioned_filter_block_test.cc Reduce heavy hitter for Get operation 2017-12-12 21:11:33 -08:00
partitioned_filter_block.cc Reduce heavy hitter for Get operation 2017-12-12 21:11:33 -08:00
partitioned_filter_block.h Extend pin_l0 to filter partitions 2017-08-23 07:56:08 -07:00
persistent_cache_helper.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
persistent_cache_helper.h Change RocksDB License 2017-07-15 16:11:23 -07:00
persistent_cache_options.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_builder.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_builder.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_factory.cc Allow upgrades from nullptr to some merge operator 2017-10-04 09:57:23 -07:00
plain_table_factory.h Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
plain_table_index.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_index.h table: Fix coverity issues 2017-12-07 11:57:36 -08:00
plain_table_key_coding.cc Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
plain_table_key_coding.h Change RocksDB License 2017-07-15 16:11:23 -07:00
plain_table_reader.cc Support prefetch last 512KB with direct I/O in block based file reader 2017-08-11 12:16:45 -07:00
plain_table_reader.h remove unnecessary internal_comparator param in newIterator 2017-07-27 14:30:42 -07:00
scoped_arena_iterator.h Change RocksDB License 2017-07-15 16:11:23 -07:00
sst_file_writer_collectors.h Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
sst_file_writer.cc Support skipping bloom filters for SstFileWriter 2018-01-22 14:42:18 -08:00
table_builder.h TableProperty::oldest_key_time defaults to 0 2017-10-27 15:00:05 -07:00
table_properties_internal.h Eliminate some redundant block reads. 2018-01-10 17:11:58 -08:00
table_properties.cc Eliminate some redundant block reads. 2018-01-10 17:11:58 -08:00
table_reader_bench.cc fix gflags namespace 2017-12-01 10:42:05 -08:00
table_reader.h add VerifyChecksum() to db.h 2017-08-09 15:58:13 -07:00
table_test.cc Fix multiple build failures 2018-01-16 17:30:39 -08:00
two_level_iterator.cc Change RocksDB License 2017-07-15 16:11:23 -07:00
two_level_iterator.h Change RocksDB License 2017-07-15 16:11:23 -07:00