A library that provides an embeddable, persistent key-value store for fast storage.
Lei Jin 0f2d768191 hints for narrowing down FindFile range and avoiding checking unrelevant L0 files
Summary:
The file tree structure in Version is prebuilt and the range of each file is known.
On the Get() code path, we do binary search in FindFile() by comparing
target key with each file's largest key and also check the range for each L0 file.
With some pre-calculated knowledge, each key comparison that has been done can serve
as a hint to narrow down further searches:
(1) If a key falls within a L0 file's range, we can safely skip the next
file if its range does not overlap with the current one.
(2) If a key falls within a file's range in level L0 - Ln-1, we should only
need to binary search in the next level for files that overlap with the current one.

(1) will be able to skip some files depending on the key distribution.
(2) can greatly reduce the range of binary search, especially for bottom
levels, given that one file most likely only overlaps with N files from
the level below (where N is max_bytes_for_level_multiplier). So on level
L, we will only look at ~N files instead of N^L files.
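
As an illustration of hint (2), the following is a minimal standalone sketch, not the code from this change. FileRange, OverlapHint, ComputeHint and FindFileInRange are hypothetical names, and plain std::string comparison is assumed in place of the real key comparator. It precomputes, for one file, the index range of overlapping files in the next level, then binary searches only that range.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a table file's key range; not the RocksDB type.
struct FileRange {
  std::string smallest;  // smallest key stored in the file
  std::string largest;   // largest key stored in the file
};

// Hypothetical precomputed hint: half-open index range [begin, end) of the
// files in the next level whose key ranges overlap one file in this level.
struct OverlapHint {
  std::size_t begin;
  std::size_t end;
};

// Computes the hint once, when the file tree is built. next_level must be
// sorted by key and its files must not overlap each other.
OverlapHint ComputeHint(const FileRange& parent,
                        const std::vector<FileRange>& next_level) {
  std::size_t begin = 0;
  while (begin < next_level.size() &&
         next_level[begin].largest < parent.smallest) {
    ++begin;
  }
  std::size_t end = begin;
  while (end < next_level.size() &&
         next_level[end].smallest <= parent.largest) {
    ++end;
  }
  return {begin, end};
}

// Binary search over files[lo, hi) for the first file whose largest key is
// >= target. Returns hi when no such file exists in the range.
std::size_t FindFileInRange(const std::vector<FileRange>& files,
                            const std::string& target,
                            std::size_t lo, std::size_t hi) {
  assert(lo <= hi && hi <= files.size());
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    if (files[mid].largest < target) {
      lo = mid + 1;  // target lies after this file
    } else {
      hi = mid;      // this file or an earlier one may hold target
    }
  }
  return lo;
}
```

The narrowed search is only valid when the target key is known to fall within the parent file's range. In that case FindFileInRange(next_level, key, hint.begin, hint.end) returns the same index as a search over the whole level, with fewer comparisons.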

Some initial results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this
improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s =
9.6M/s), it gives us ~13% improvement.

Test Plan: make all check

Reviewers: haobo, igor, dhruba, sdong, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17205
2014-04-21 09:10:12 -07:00
build_tools RocksDBLite 2014-04-15 13:39:26 -07:00
coverage Disable the html-based coverage report by default 2014-02-06 12:58:13 -08:00
db hints for narrowing down FindFile range and avoiding checking unrelevant L0 files 2014-04-21 09:10:12 -07:00
doc doc: table_stats_collectors -> table_properties_collectors. 2014-02-07 12:19:25 -08:00
hdfs Env to add a function to allow users to query waiting queue length 2014-03-11 10:19:02 -07:00
helpers/memenv Expose in memory Env to the world 2014-04-14 12:28:15 -07:00
include RocksDBLite 2014-04-15 13:39:26 -07:00
java Fix formatting issues 2014-04-18 10:48:48 -07:00
linters allow lambda function syntax in cpplint 2014-02-20 12:47:05 -08:00
port db_bench cleanup 2014-04-08 11:21:09 -07:00
table Use a different approach to make sure BlockBasedTableReader can use hash index on older files 2014-04-18 14:09:21 -07:00
tools hints for narrowing down FindFile range and avoiding checking unrelevant L0 files 2014-04-21 09:10:12 -07:00
util Fix ifdef NDEBUG 2014-04-17 14:29:28 -07:00
utilities RocksDBLite 2014-04-15 13:39:26 -07:00
.arcconfig Improve/fix bugs for the cpp linter 2014-02-13 17:48:11 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Make RocksDB compile for iOS 2014-04-04 13:11:44 -07:00
CONTRIBUTING.md Update to CONTRIBUTING.md 2014-02-20 10:55:54 -08:00
HISTORY.md Added period 2014-04-18 09:33:27 -07:00
INSTALL.md Make RocksDB compile for iOS 2014-04-04 13:11:44 -07:00
LICENSE Fix copyright year 2014-03-12 12:06:58 -07:00
Makefile hints for narrowing down FindFile range and avoiding checking unrelevant L0 files 2014-04-21 09:10:12 -07:00
PATENTS Fix the patent format 2013-10-16 15:37:32 -07:00
README Add a pointer to the engineering design discussion forum. 2013-12-23 12:19:18 -08:00
README.fb update the latest version in README.fb to 2.7 2013-12-30 16:16:24 -08:00
ROCKSDB_LITE.md RocksDBLite 2014-04-15 13:39:26 -07:00

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Built on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html and github wiki (https://github.com/facebook/rocksdb/wiki)
for more explanation.

The public interface is in include/*.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/rocksdb/db.h
    Main interface to the DB: Start here

include/rocksdb/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/rocksdb/comparator.h
    Abstraction for user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)
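
To make the difference between bytewise and custom ordering concrete, here is a minimal standalone sketch. It does not use the interface declared in comparator.h. BytewiseCompare, CaseInsensitiveCompare, CompareFn and SortKeys are hypothetical names, and the three-way return convention (negative, zero, positive) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Three-way bytewise comparison: negative if a < b, zero if equal,
// positive if a > b. A shorter key sorts before a longer key it prefixes.
int BytewiseCompare(const std::string& a, const std::string& b) {
  const std::size_t n = std::min(a.size(), b.size());
  const int r = std::memcmp(a.data(), b.data(), n);
  if (r != 0) return r;
  if (a.size() == b.size()) return 0;
  return a.size() < b.size() ? -1 : 1;
}

// Custom ordering for illustration: ASCII case-insensitive comparison.
int CaseInsensitiveCompare(const std::string& a, const std::string& b) {
  const std::size_t n = std::min(a.size(), b.size());
  for (std::size_t i = 0; i < n; ++i) {
    const int ca = std::tolower(static_cast<unsigned char>(a[i]));
    const int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  if (a.size() == b.size()) return 0;
  return a.size() < b.size() ? -1 : 1;
}

// Hypothetical function-pointer type standing in for a pluggable comparator.
using CompareFn = int (*)(const std::string&, const std::string&);

// Orders keys under the supplied comparison, the way a store would order
// the keys it holds.
void SortKeys(std::vector<std::string>& keys, CompareFn cmp) {
  assert(cmp != nullptr);
  std::sort(keys.begin(), keys.end(),
            [cmp](const std::string& a, const std::string& b) {
              return cmp(a, b) < 0;
            });
}
```

Under bytewise ordering "Banana" sorts before "apple" because uppercase ASCII letters have smaller byte values. Under the case-insensitive ordering "apple" comes first.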

include/rocksdb/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/rocksdb/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/rocksdb/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.
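
The pointer-plus-length idea can be sketched as follows. ByteView is a hypothetical type written for this illustration, not the class declared in slice.h. The point it shows is that the view does not own or copy the bytes, so the backing array must outlive it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning view: a pointer and a length into bytes that are
// owned elsewhere. Copying the view never copies the bytes.
struct ByteView {
  const char* data;
  std::size_t size;

  ByteView(const char* d, std::size_t n) : data(d), size(n) {}
  explicit ByteView(const std::string& s) : data(s.data()), size(s.size()) {}

  // Drops the first n bytes from the view without touching the backing data.
  void RemovePrefix(std::size_t n) {
    assert(n <= size);
    data += n;
    size -= n;
  }

  // True if the viewed bytes begin with the bytes of prefix.
  bool StartsWith(const ByteView& prefix) const {
    return size >= prefix.size &&
           std::memcmp(data, prefix.data, prefix.size) == 0;
  }

  // Copies the viewed bytes into an owning string.
  std::string ToString() const { return std::string(data, size); }
};
```

For example, a view over the string "user:42" can drop the five-byte prefix "user:" and then refer to "42" inside the original buffer, with no allocation until ToString() is called.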

include/rocksdb/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/rocksdb/env.h
    Abstraction of the OS environment.  A posix implementation of
    this interface is in util/env_posix.cc

include/rocksdb/table_builder.h
    Lower-level modules that most clients probably won't use directly

include/rocksdb/cache.h
    An API for the block cache.

include/rocksdb/compaction_filter.h
    An API for an application filter invoked on every compaction.

include/rocksdb/filter_policy.h
    An API for configuring a bloom filter.

include/rocksdb/memtablerep.h
    An API for implementing a memtable.

include/rocksdb/statistics.h
    An API to retrieve various database statistics.

include/rocksdb/transaction_log.h
    An API to retrieve transaction logs from a database.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/