A library that provides an embeddable, persistent key-value store for fast storage.
554c06dd18
Summary: There is a new option called hybrid_mode which, when switched on, causes HBase-style compactions. Files from L0 are compacted back into L0. The meat of this compaction algorithm is in PickCompactionHybrid(). All files reside in L0, which means all files have overlapping keys. Each file has a time bound, i.e. each file contains a range of keys that were inserted around the same time. The start-seqno and the end-seqno refer to the time frame in which these keys were inserted. Files that have contiguous seqnos are compacted together into a larger file. All files are ordered from the most recent to the oldest. The current compaction algorithm looks for candidate files starting from the most recent file. It continues to add files to the same compaction run as long as the total size of the files chosen so far is smaller than the size of the next candidate file. This logic needs to be debated and validated. The above logic should reduce write amplification to a large extent... will publish numbers shortly. Test Plan: dbstress runs for 6 hours with no data corruption (tested so far). Differential Revision: https://reviews.facebook.net/D11289 |
||
---|---|---|
db | ||
doc | ||
hdfs | ||
helpers/memenv | ||
include | ||
java | ||
linters/src | ||
port | ||
scribe | ||
snappy | ||
table | ||
thrift | ||
tools | ||
util | ||
utilities | ||
VALGRIND_LOGS | ||
.arcconfig | ||
.gitignore | ||
build_detect_platform | ||
build_detect_version | ||
build_java.sh | ||
e | ||
fbcode.clang31.sh | ||
fbcode.gcc471.sh | ||
LICENSE | ||
Makefile | ||
README | ||
README.fb | ||
regression_build_test.sh | ||
valgrind_test.sh |
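The file-picking rule described in the commit summary above can be sketched as follows. This is a minimal illustration of a literal reading of that rule, not the code in PickCompactionHybrid(): the `FileInfo` struct and the `PickRunLength` function are hypothetical names introduced here, and the input is assumed to be already ordered from the most recent file to the oldest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for per-file metadata; not the library's real type.
struct FileInfo {
  uint64_t start_seqno;  // first sequence number stored in the file
  uint64_t end_seqno;    // last sequence number stored in the file
  uint64_t size_bytes;   // file size on disk
};

// Returns how many files, counted from the newest, would join one
// compaction run. `files` must be ordered newest first.
std::size_t PickRunLength(const std::vector<FileInfo>& files) {
  if (files.empty()) return 0;

  // Debug-only check of the assumed ordering: each file must be strictly
  // newer than the one after it.
  for (std::size_t i = 0; i + 1 < files.size(); ++i) {
    assert(files[i].start_seqno > files[i + 1].end_seqno);
  }

  std::size_t count = 1;                    // always start with the newest file
  uint64_t total = files[0].size_bytes;     // size of everything chosen so far

  // Keep adding the next (older) file while the total chosen so far is
  // smaller than that file, as the summary words the rule.
  while (count < files.size() && total < files[count].size_bytes) {
    total += files[count].size_bytes;
    ++count;
  }
  return count;
}
```

Because the run is always a prefix of the newest-first list, the chosen files cover a contiguous seqno range. The summary itself notes that the stopping condition is still open to debate, so the comparison in the loop is the part most likely to change.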
rocksdb: A persistent key-value store for flash storage
Authors: The Facebook Database Engineering Team

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a persistent key/value store.

See doc/index.html for more explanation.
See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.h. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Guide to header files:

include/db.h
    Main interface to the DB: Start here

include/options.h
    Control over the behavior of an entire database, and also control over the behavior of individual reads and writes.

include/comparator.h
    Abstraction for user-specified comparison function. If you want just bytewise comparison of keys, you can use the default comparator, but clients can write their own comparator implementations if they want custom ordering (e.g. to handle different character encodings, etc.)

include/iterator.h
    Interface for iterating over data. You can get an iterator from a DB object.

include/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/slice.h
    A simple module for maintaining a pointer and a length into some other byte array.

include/status.h
    Status is returned from many of the public interfaces and is used to report success and various kinds of errors.

include/env.h
    Abstraction of the OS environment. A POSIX implementation of this interface is in util/env_posix.cc

include/table.h
include/table_builder.h
    Lower-level modules that most clients probably won't use directly
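
To make the include/slice.h and include/comparator.h entries in the header guide more concrete, here is a small self-contained sketch of the two ideas: a non-owning pointer-plus-length view over bytes, and a plain bytewise ordering over such views. The names `ByteView`, `MakeView` and `BytewiseCompare` are hypothetical and exist only for this illustration; the library's actual types and signatures are the ones declared in those headers.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical illustration type: a pointer and a length into a byte array
// owned by someone else. The view must not outlive that storage.
struct ByteView {
  const char* data;
  std::size_t size;
};

// Builds a view over a std::string without copying its bytes.
inline ByteView MakeView(const std::string& s) {
  return ByteView{s.data(), s.size()};
}

// Bytewise three-way comparison: negative if a < b, zero if equal,
// positive if a > b. A shorter key that is a prefix of a longer key
// sorts first.
inline int BytewiseCompare(ByteView a, ByteView b) {
  const std::size_t common = std::min(a.size, b.size);
  int r = (common == 0) ? 0 : std::memcmp(a.data, b.data, common);
  if (r == 0) {
    if (a.size < b.size) {
      r = -1;
    } else if (a.size > b.size) {
      r = 1;
    }
  }
  return r;
}
```

A custom ordering, such as one that handles a particular character encoding, would replace the body of the comparison function while keeping the same three-way contract.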