A library that provides an embeddable, persistent key-value store for fast storage.
Deon Nicholas c2d7826ced [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences.
Summary:
Here are the major changes to the Merge Interface. It has been expanded
to handle cases where the MergeOperator is not associative. It does so by stacking
up merge operations while scanning through the key history (i.e., during Get() or
Compaction), until a valid Put/Delete/end-of-history is encountered; it then
applies all of the merge operations in the correct sequence starting with the
base/sentinel value.
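
As a rough illustration of the stacking described above, here is a minimal,
self-contained sketch. It does not use the RocksDB headers: Entry, EntryType,
ApplyMerges, and this Get are hypothetical names invented for the example, and
the comma-append operator is only a stand-in for a user-defined merge.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical record types for this sketch; not the on-disk format.
enum class EntryType { kPut, kDelete, kMerge };

struct Entry {
  EntryType type;
  std::string value;  // Put value or merge operand; ignored for kDelete.
};

// Example merge operator: comma-separated append. `base` is nullptr when
// there is no existing value (after a Delete or at end-of-history).
std::string ApplyMerges(const std::string* base,
                        const std::deque<std::string>& operands) {
  std::string result = base ? *base : std::string();
  for (const std::string& operand : operands) {
    if (!result.empty()) result += ',';
    result += operand;
  }
  return result;
}

// `history` holds the entries for one key, newest first. Returns false if
// the key has no visible value.
bool Get(const std::vector<Entry>& history, std::string* value) {
  assert(value != nullptr);
  std::deque<std::string> operands;  // kept oldest-first
  for (const Entry& entry : history) {
    if (entry.type == EntryType::kMerge) {
      operands.push_front(entry.value);  // stack it and keep scanning
      continue;
    }
    // A Put or Delete ends the scan; older entries are shadowed.
    const bool has_base = (entry.type == EntryType::kPut);
    if (!has_base && operands.empty()) return false;
    *value = ApplyMerges(has_base ? &entry.value : nullptr, operands);
    return true;
  }
  if (operands.empty()) return false;  // end-of-history, nothing to merge
  *value = ApplyMerges(nullptr, operands);
  return true;
}
```

Operands are pushed to the front of the deque because the scan runs newest
to oldest, while the merge has to be applied oldest to newest on top of the
base value.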

I have also introduced an "AssociativeMerge" function which allows the user to
take advantage of associative merge operations (such as in the case of counters).
The implementation will always attempt to merge the operations/operands themselves
together when they are encountered, and will resort to the "stacking" method if
and only if the "associative-merge" fails.
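
The "merge eagerly, stack on failure" behaviour can be sketched for a counter
as follows. This is an assumption-laden illustration, not the code in this
diff: ParseInt, AddOperand, and the signature given to AssociativeMerge here
are hypothetical, and operands are fed in application order (oldest first)
to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: parses a whole string as a decimal int64.
bool ParseInt(const std::string& s, int64_t* out) {
  if (s.empty()) return false;
  std::size_t pos = 0;
  try {
    *out = std::stoll(s, &pos);
  } catch (const std::exception&) {
    return false;  // not a number, or out of range
  }
  return pos == s.size();
}

// Combines two counter operands into one. Fails if either operand is not
// an integer. Overflow of the sum is not handled in this sketch.
bool AssociativeMerge(const std::string& older, const std::string& newer,
                      std::string* combined) {
  int64_t a = 0;
  int64_t b = 0;
  if (!ParseInt(older, &a) || !ParseInt(newer, &b)) return false;
  *combined = std::to_string(a + b);
  return true;
}

// Adds `operand` to the pending operands (oldest first). It is folded into
// the most recent pending operand when the associative merge succeeds, and
// stacked as a separate operand otherwise.
void AddOperand(std::vector<std::string>* pending, const std::string& operand) {
  assert(pending != nullptr);
  std::string combined;
  if (!pending->empty() &&
      AssociativeMerge(pending->back(), operand, &combined)) {
    pending->back() = combined;
  } else {
    pending->push_back(operand);
  }
}
```

With well-formed counter operands the pending list never grows beyond one
element, which is the efficiency an associative operator is meant to buy.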

This implementation is conjectured to allow MergeOperator to handle the general
case, while still providing the user with the ability to take advantage of certain
efficiencies in their own merge-operator / data-structure.

NOTE: This is a preliminary diff. This must still go through a lot of review,
revision, and testing. Feedback welcome!

Test Plan:
  -This is a preliminary diff. I have only just begun testing/debugging it.
  -I will be testing this with the existing MergeOperator use-cases and unit-tests
(counters, string-append, and redis-lists)
  -I will be "desk-checking" and walking through the code with the help of gdb.
  -I will find a way of stress-testing the new interface / implementation using
db_bench, db_test, merge_test, and/or db_stress.
  -I will ensure that my tests cover all cases: Get-Memtable,
Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0,
Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found,
end-of-history, end-of-file, etc.
  -A lot of feedback from the reviewers.

Reviewers: haobo, dhruba, zshao, emayanke

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D11499
2013-08-05 20:14:32 -07:00
db [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. 2013-08-05 20:14:32 -07:00
doc merge 1.5 2012-08-28 11:43:33 -07:00
hdfs Ability to configure bufferedio-reads, filesystem-readaheads and mmap-read-write per database. 2013-03-20 23:14:03 -07:00
helpers/memenv [RocksDB] cleanup EnvOptions 2013-06-12 11:17:19 -07:00
include [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. 2013-08-05 20:14:32 -07:00
java Pom changes to make relase 1.5.7 for java. 2013-01-10 10:43:43 -08:00
linters/src fixing linters. 2012-12-14 14:05:27 -08:00
port Fix Zlib_Compress and Zlib_Uncompress 2013-06-18 16:57:42 -07:00
scribe fix db_test error with scribe logger turned on 2012-08-28 11:22:58 -07:00
snappy Build with gcc-4.7.1-glibc-2.14.1. 2012-09-17 10:56:26 -07:00
table [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. 2013-08-05 20:14:32 -07:00
thrift Implement RowLocks for assoc schema 2012-10-03 23:19:01 -07:00
tools Expose base db object from ttl wrapper 2013-08-05 18:44:14 -07:00
util Expose base db object from ttl wrapper 2013-08-05 18:44:14 -07:00
utilities [RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences. 2013-08-05 20:14:32 -07:00
VALGRIND_LOGS Use version 3.8.1 for valgrind in third_party and do away with log files 2013-03-06 17:47:31 -08:00
.arcconfig Enable linting in arc. 2013-02-01 11:34:25 -08:00
.gitignore Various build cleanups/improvements 2013-01-14 18:40:22 -08:00
build_detect_platform Modify build_detect_platform to run fbcode.*.* irrespective of $PATH 2013-05-14 22:09:01 -07:00
build_detect_version Make the build-time show up in the leveldb library. 2013-03-11 10:33:15 -07:00
build_java.sh Release 1.5.6 for Java code + Script to automate it. 2012-12-17 12:11:11 -08:00
e Enhance db_bench 2013-03-14 16:00:23 -07:00
fbcode.clang31.sh Cleanup TODO/NEWS/AUTHORS files 2013-01-25 09:11:26 -08:00
fbcode.gcc471.sh Updating fbcode.gcc471.sh to use jemalloc 3.3.1 2013-03-13 15:34:50 -07:00
LICENSE reverting disastrous MOE commit, returning to r21 2011-04-19 23:11:15 +00:00
Makefile Changing Makefile to have rocksdb instead of leveldb in binary-names 2013-08-05 11:14:01 -07:00
README Fix README contents. 2013-07-30 08:30:13 -07:00
README.fb Release 1.5.9.fb to third party 2013-04-10 17:23:58 -07:00
regression_build_test.sh Minor improvements to the regression testing 2013-01-16 14:47:20 -08:00
valgrind_test.sh make clean in valgrind_test.sh first 2013-04-23 14:25:19 -07:00

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Built on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF),
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html for more explanation.
See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.h.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/db.h
    Main interface to the DB: Start here

include/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/comparator.h
    Abstraction for user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)
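
    To make the idea of a custom ordering concrete, here is a minimal
    sketch. It deliberately does not use the library header: KeyComparator,
    BytewiseOrder, CaseInsensitiveOrder, and SortKeys are hypothetical
    names, and the real interface in include/comparator.h may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified interface: a three-way comparison over keys.
class KeyComparator {
 public:
  virtual ~KeyComparator() {}
  // Returns <0 if a < b, 0 if a == b, >0 if a > b.
  virtual int Compare(const std::string& a, const std::string& b) const = 0;
};

// Default-style ordering: plain byte-by-byte comparison.
class BytewiseOrder : public KeyComparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    return a.compare(b);
  }
};

// Custom ordering: ASCII case-insensitive comparison.
class CaseInsensitiveOrder : public KeyComparator {
 public:
  int Compare(const std::string& a, const std::string& b) const override {
    const std::size_t n = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < n; ++i) {
      const int ca = std::tolower(static_cast<unsigned char>(a[i]));
      const int cb = std::tolower(static_cast<unsigned char>(b[i]));
      if (ca != cb) return ca < cb ? -1 : 1;
    }
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;  // shorter key sorts first
  }
};

// Sorts keys in the order defined by `cmp`.
void SortKeys(std::vector<std::string>* keys, const KeyComparator& cmp) {
  assert(keys != nullptr);
  std::sort(keys->begin(), keys->end(),
            [&cmp](const std::string& a, const std::string& b) {
              return cmp.Compare(a, b) < 0;
            });
}
```

    Under BytewiseOrder the key "B" sorts before "a"; under
    CaseInsensitiveOrder it sorts after.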

include/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/write_batch.h
    Interface for atomically applying multiple updates to a database.
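
    The all-or-nothing idea can be sketched with an in-memory map. This
    is only an illustration under stated assumptions: UpdateBatch, BatchOp,
    and ApplyBatch are hypothetical names, and copy-and-swap stands in for
    whatever mechanism the library actually uses.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// One buffered update: either a put of (key, value) or a delete of key.
struct BatchOp {
  bool is_delete;
  std::string key;
  std::string value;
};

// Hypothetical batch type: records updates without touching the store.
class UpdateBatch {
 public:
  void Put(const std::string& key, const std::string& value) {
    ops_.push_back(BatchOp{false, key, value});
  }
  void Delete(const std::string& key) {
    ops_.push_back(BatchOp{true, key, std::string()});
  }
  const std::vector<BatchOp>& ops() const { return ops_; }

 private:
  std::vector<BatchOp> ops_;
};

// Applies every update to a copy and then swaps it in, so the store moves
// from the old state to the new state in a single step.
void ApplyBatch(std::map<std::string, std::string>* store,
                const UpdateBatch& batch) {
  assert(store != nullptr);
  std::map<std::string, std::string> next = *store;
  for (const BatchOp& op : batch.ops()) {
    if (op.is_delete) {
      next.erase(op.key);
    } else {
      next[op.key] = op.value;
    }
  }
  store->swap(next);
}
```

    A typical use is moving a value from one key to another: a Delete of
    the old key and a Put of the new key go into the same batch, so no
    state exists in which the value is under both keys or neither.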

include/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.
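
    A pointer-plus-length view can be sketched in a few lines. ByteView
    below is a hypothetical stand-in written for this example, not the
    class declared in include/slice.h.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Non-owning view over bytes stored elsewhere. The viewed storage must
// outlive the view.
class ByteView {
 public:
  ByteView(const char* data, std::size_t size) : data_(data), size_(size) {}
  explicit ByteView(const std::string& s) : data_(s.data()), size_(s.size()) {}

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

  // Copies the viewed bytes into an owning string.
  std::string ToString() const { return std::string(data_, size_); }

  bool StartsWith(const ByteView& prefix) const {
    return size_ >= prefix.size_ &&
           std::memcmp(data_, prefix.data_, prefix.size_) == 0;
  }

  // Drops the first n bytes from the view; the underlying storage is
  // not modified.
  void RemovePrefix(std::size_t n) {
    assert(n <= size_);
    data_ += n;
    size_ -= n;
  }

 private:
  const char* data_;
  std::size_t size_;
};
```

    Because the view owns nothing, copying it is cheap, but using it after
    the underlying string has been destroyed or reallocated is an error.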

include/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/env.h
    Abstraction of the OS environment.  A POSIX implementation of
    this interface is in util/env_posix.cc

include/table_builder.h
    Lower-level modules that most clients probably won't use directly

include/cache.h
    An API for the block cache.

include/compaction_filter.h
    An API for an application filter invoked on every compaction.

include/filter_policy.h
    An API for configuring a bloom filter.

include/memtablerep.h
    An API for implementing a memtable.

include/statistics.h
    An API to retrieve various database statistics.

include/transaction_log_iterator.h
    An API to retrieve transaction logs from a database.