rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.

Sagar Vemuri d938226af4	Improve performance of long range scans with readahead	2018-01-25 21:41:53 -08:00

Summary:
This change improves the performance of iterators doing long range scans (e.g. big/full table scans in MyRocks) by using readahead and prefetching additional data on each disk IO. Prefetching is enabled automatically once more than 2 IOs are seen for the same table file during iteration. The readahead size starts at 8KB and is increased exponentially on each additional sequential IO, up to a maximum of 256KB. This cuts down the number of IOs needed to complete the range scan.

Constraints:
- The prefetched data is stored by the OS in the page cache, so this currently works only for non-direct-read use cases, i.e. applications that use the page cache. (Direct I/O support will be enabled in a later PR.)
- This is currently enabled only when ReadOptions.readahead_size = 0 (the default value).

Thanks to siying for the original idea and implementation.

Benchmarks:
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan (seekrandom with a large number of nexts):
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```
The page cache was cleared before each experiment with the command:
```
sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
```
```
Before:
seekrandom : 34020.945 micros/op 29 ops/sec; 32.5 MB/s (1636 of 1999 found)
With this change:
seekrandom : 8726.912 micros/op 114 ops/sec; 126.8 MB/s (5702 of 6999 found)
```
~3.9X performance improvement.

Also verified with strace and gdb that the readahead size increases as expected:
```
strace -e readahead -f -T -t -p <db_bench process pid>
```
Closes https://github.com/facebook/rocksdb/pull/3282
Differential Revision: D6586477
Pulled By: sagar0
fbshipit-source-id: 8a118a0ed4594fbb7f5b1cafb242d7a4033cb58c
buckifier	Remove `import` use from TARGETS	2017-11-30 15:27:34 -08:00
build_tools	FreeBSD build support for RocksDB and RocksJava	2018-01-11 13:29:55 -08:00
cache	fix gflags namespace	2017-12-01 10:42:05 -08:00
cmake	add missing config checks to CMakeLists.txt	2017-11-30 22:57:00 -08:00
coverage	Fix /bin/bash shebangs	2017-08-03 15:56:46 -07:00
db	WritePrepared Txn: Fix DBIterator and add test	2018-01-23 16:57:11 -08:00
docs	fix Gemfile.lock nokogiri dependencies	2018-01-11 20:11:32 -08:00
env	Add a Close() method to DB to return status when closing a db	2018-01-16 11:08:57 -08:00
examples	Pinnableslice examples and blog post	2017-08-24 12:26:07 -07:00
hdfs	Revert "comment out unused parameters"	2017-07-21 18:26:26 -07:00
include/rocksdb	Update comments about default WALRecoveryMode	2018-01-25 18:12:08 -08:00
java	FreeBSD build support for RocksDB and RocksJava	2018-01-11 13:29:55 -08:00
memtable	fix gflags namespace	2017-12-01 10:42:05 -08:00
monitoring	fix ThreadStatus for bottom-pri compaction threads	2017-12-14 14:57:49 -08:00
options	DB::DumpSupportInfo should log all supported compression types	2018-01-23 14:44:12 -08:00
port	FreeBSD build support for RocksDB and RocksJava	2018-01-11 13:29:55 -08:00
table	Improve performance of long range scans with readahead	2018-01-25 21:41:53 -08:00
third-party	Enable MSVC W4 with a few exceptions. Fix warnings and bugs	2017-10-19 10:57:12 -07:00
tools	Add 5.10.fb to tools/check_format_compatible.sh	2018-01-19 12:42:07 -08:00
util	Add a Close() method to DB to return status when closing a db	2018-01-16 11:08:57 -08:00
utilities	Blob DB: dump blob_db_options.min_blob_size	2018-01-22 22:41:27 -08:00
.clang-format	A script that automatically reformat affected lines	2014-01-14 12:21:24 -08:00
.gitignore	Remove leftover references to phutil_module_cache	2017-08-23 12:12:21 -07:00
.travis.yml	CMake cross platform Java support and add JNI to travis	2017-11-28 12:27:53 -08:00
appveyor.yml	Make Windows dep switches compatible with other builds	2018-01-05 14:56:54 -08:00
AUTHORS	Update RocksDB Authors File	2017-10-18 14:42:10 -07:00
CMakeLists.txt	CMake changes for CRC32 Optimization on PowerPC	2018-01-23 16:57:11 -08:00
CODE_OF_CONDUCT.md	Add Code of Conduct	2017-12-05 18:42:35 -08:00
CONTRIBUTING.md	Add Code of Conduct	2017-12-05 18:42:35 -08:00
COPYING	Add GPLv2 as an alternative license.	2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md	options.delayed_write_rate use the rate of rate_limiter by default.	2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md	First version of rocksdb_dump and rocksdb_undump.	2015-06-19 16:24:36 -07:00
HISTORY.md	Improve performance of long range scans with readahead	2018-01-25 21:41:53 -08:00
INSTALL.md	FreeBSD build support for RocksDB and RocksJava	2018-01-11 13:29:55 -08:00
issue_template.md	Add a template for issues	2017-09-29 11:41:28 -07:00
LANGUAGE-BINDINGS.md	Add Elixir to the list of language bindings	2017-11-21 10:13:14 -08:00
LICENSE.Apache	Change RocksDB License	2017-07-15 16:11:23 -07:00
LICENSE.leveldb	Add back the LevelDB license file	2017-07-16 18:42:18 -07:00
Makefile	Revert Snappy version upgrade	2018-01-12 23:41:43 -08:00
README.md	Add Jenkins for PPC64le build status badge	2018-01-11 14:57:45 -08:00
ROCKSDB_LITE.md	Optimistic Transactions	2015-05-29 14:36:35 -07:00
src.mk	Refactor ReadBlockContents()	2017-12-11 15:27:32 -08:00
TARGETS	WritePrepared Txn: make buck tests parallel	2017-12-18 14:42:09 -08:00
thirdparty.inc	Make Windows dep switches compatible with other builds	2018-01-05 14:56:54 -08:00
USERS.md	Added ProfaneDB	2017-11-19 10:11:44 -08:00
Vagrantfile	Update Vagrant file (test internal phabricator workflow)	2016-10-28 15:39:19 -07:00
WINDOWS_PORT.md	Commit both PR and internal code review changes	2015-07-07 16:58:20 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/