A library that provides an embeddable, persistent key-value store for fast storage.
lovro b6655a679d Replace std::priority_queue in MergingIterator with custom heap
Summary:
While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison.  Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge.

Keys in our dataset include sequence numbers that increase with time.  Adjacent keys in an L0 file are very likely to be adjacent in the full database.  Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one.  It would be great to avoid the O(log K) operation per row while compacting.

This diff replaces std::priority_queue with a custom binary heap implementation.  It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top).
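
To illustrate the idea, here is a minimal sketch of a binary min-heap with a "replace top" operation, used for a multiway merge of sorted runs. It is not the code from this diff: `SketchHeap`, `Cursor`, `CountingLess` and `MergeRuns` are hypothetical names invented for the example, and the runs are plain `std::vector<int>` rather than iterators over files. The comparator counts calls so the cost of each operation can be observed.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical sketch of a binary min-heap; not the class added by the diff.
template <typename T, typename Less>
class SketchHeap {
 public:
  explicit SketchHeap(Less less) : less_(std::move(less)) {}

  bool empty() const { return data_.empty(); }
  const T& top() const { return data_.front(); }

  void push(T value) {
    data_.push_back(std::move(value));
    SiftUp(data_.size() - 1);
  }

  void pop() {
    if (data_.size() > 1) data_.front() = std::move(data_.back());
    data_.pop_back();
    if (!data_.empty()) SiftDown(0);
  }

  // Overwrite the top entry and restore heap order starting at the root.
  // If the new value still belongs on top, this stops after comparing it
  // with the root's children instead of walking the full height.
  void replace_top(T value) {
    data_.front() = std::move(value);
    SiftDown(0);
  }

 private:
  void SiftUp(std::size_t i) {
    while (i > 0) {
      const std::size_t parent = (i - 1) / 2;
      if (!less_(data_[i], data_[parent])) return;
      std::swap(data_[i], data_[parent]);
      i = parent;
    }
  }

  void SiftDown(std::size_t i) {
    const std::size_t n = data_.size();
    while (true) {
      std::size_t smallest = i;
      const std::size_t l = 2 * i + 1;
      const std::size_t r = l + 1;
      if (l < n && less_(data_[l], data_[smallest])) smallest = l;
      if (r < n && less_(data_[r], data_[smallest])) smallest = r;
      if (smallest == i) return;
      std::swap(data_[i], data_[smallest]);
      i = smallest;
    }
  }

  std::vector<T> data_;
  Less less_;
};

// One heap entry per input run: the current key and where it came from.
struct Cursor {
  int key;
  std::size_t run;
  std::size_t pos;
};

// Orders cursors by key and counts how often it is called.
struct CountingLess {
  std::size_t* count;
  bool operator()(const Cursor& a, const Cursor& b) const {
    ++*count;
    return a.key < b.key;
  }
};

// Merge sorted runs into one sorted vector, reporting the comparison count.
std::vector<int> MergeRuns(const std::vector<std::vector<int>>& runs,
                           std::size_t* comparisons) {
  *comparisons = 0;
  SketchHeap<Cursor, CountingLess> heap(CountingLess{comparisons});
  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.push(Cursor{runs[r][0], r, 0});
  }
  std::vector<int> out;
  while (!heap.empty()) {
    Cursor c = heap.top();
    out.push_back(c.key);
    if (++c.pos < runs[c.run].size()) {
      c.key = runs[c.run][c.pos];
      heap.replace_top(c);  // advance the same run in place
    } else {
      heap.pop();  // this run is exhausted
    }
  }
  return out;
}
```

In this sketch, a `replace_top` whose new value stays on top costs at most two comparisons regardless of heap size, whereas a pop followed by a push would sift through the whole height of the heap.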

Test Plan:
make check

To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number).  I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs).  The exact improvement depends on the number of L0 files and the amount of locality.  Performance on randomly distributed keys seems on par with the old code.

Reviewers: kailiu, sdong, igor

Reviewed By: igor

Subscribers: yoshinorim, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D29133
2015-07-06 04:24:09 -07:00
arcanist_util Integrate Jenkins with Phabricator 2015-04-07 11:56:29 -07:00
build_tools Add rpath option to production builds for 4.8.1 toolchain 2015-06-30 13:30:54 -07:00
coverage Fix coverage script 2014-11-03 14:53:00 -08:00
db [wal changes 1/3] fixed unbounded wal growth in some workloads 2015-07-02 14:27:00 -07:00
doc Remove seek compaction 2014-06-20 10:23:02 +02:00
examples [API Change] Improve EventListener::OnFlushCompleted interface 2015-06-05 12:28:51 -07:00
hdfs Add Env::GetThreadID(), which returns the ID of the current thread. 2015-06-11 14:18:02 -07:00
include Introduce InfoLogLevel::HEADER_LEVEL 2015-07-02 17:14:39 -07:00
java [RocksJava] Fixed test failures 2015-07-01 23:22:03 -07:00
port Build for CYGWIN 2015-04-23 21:33:44 -07:00
table Replace std::priority_queue in MergingIterator with custom heap 2015-07-06 04:24:09 -07:00
third-party Update COMMIT.md 2015-03-30 17:48:16 -07:00
tools Fix mac compile 2015-06-26 10:29:24 -07:00
util Replace std::priority_queue in MergingIterator with custom heap 2015-07-06 04:24:09 -07:00
utilities Fix unity build by removing anonymous namespace 2015-07-02 12:27:35 -07:00
.arcconfig Integrate Jenkins with Phabricator 2015-04-07 11:56:29 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
.travis.yml Don't preinstall jemalloc in Travis 2015-04-24 18:43:07 -07:00
AUTHORS Add AUTHORS file. Fix #203 2014-09-29 10:52:18 -07:00
CONTRIBUTING.md facebook accounts are not required for CLA signers 2014-07-08 05:57:54 -04:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Prepare 3.12 2015-07-02 12:20:36 -07:00
INSTALL.md Fix broken gflags link 2015-06-22 09:31:52 -07:00
LICENSE Fix copyright year 2014-03-12 12:06:58 -07:00
Makefile Remove -Wl,--no-as-needed flag when making shared_lib in OSX and IOS 2015-06-23 16:32:59 -07:00
PATENTS Update Patent Grant. 2015-04-13 10:33:43 +01:00
README.md Replaced "built on on earlier work" by "built on earlier work" in README.md 2014-09-17 01:16:17 -07:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
src.mk Add wal files to Checkpoint for multiple column families. 2015-06-19 16:08:31 -07:00
USERS.md Add Yahoo's blog post about Sherpa to USERS.md 2015-06-09 12:55:58 -07:00
Vagrantfile RocksDB on FreeBSD support 2015-02-26 15:19:17 -08:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted at https://www.facebook.com/groups/rocksdb.dev/