A library that provides an embeddable, persistent key-value store for fast storage.
Igor Canadi 6fe9b57748 Refactor Recover() code
Summary:
This diff does two things:
* Rethinks how we call Recover() with the read_only option. Before, we called it with a pointer to the memtable we'd like to apply those changes to. This memtable is set in db_impl_readonly.cc and it's actually DBImpl::mem_. Why don't we just apply updates to mem_ right away? It seems more intuitive.
* Changes when we apply updates to the manifest. Before, the process was to recover all the logs, flush them to sst files and then do one giant commit that atomically adds all recovered sst files and sets the next log number. This works well enough, but causes some small trouble for my column family approach, since I can't have one VersionEdit apply to more than a single column family[1]. The change here is to commit the files recovered from logs right away. Here is the state of the world before the change:
1. Recover log 5, add new sst files to edit
2. Recover log 7, add new sst files to edit
3. Recover log 8, add new sst files to edit
4. Commit all added sst files to manifest and mark log files 5, 7 and 8 as recovered (via SetLogNumber(9) function)
After the change, we'll do:
1. Recover log 5, commit the new sst files and set log 5 as recovered
2. Recover log 7, commit the new sst files and set log 7 as recovered
3. Recover log 8, commit the new sst files and set log 8 as recovered

The added (small) benefit is that if we fail after (2), the new recovery will only have to recover log 8. Previously, we would have had to restart the recovery from the beginning. The bigger benefit will be to enable easier integration of multiple column families in the Recovery code path.
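
The per-log commit described above can be sketched as a small standalone simulation. This is only an illustration under stated assumptions, not the code in this diff: ToyManifest, RecoverPerLog and the fail_before parameter are hypothetical names, and marking a log as recovered is modeled by setting next_log_number to log + 1.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the manifest: the committed sst files plus the
// number of the first log that still needs recovery.
struct ToyManifest {
  std::vector<std::string> sst_files;
  uint64_t next_log_number = 0;
};

// Recovers `logs` (ascending log numbers) one at a time. Each log's sst file
// is committed together with the advance of next_log_number, so a crash
// between logs keeps the work already done. `fail_before` simulates a crash
// just before that log is processed (0 means never fail).
// Returns the logs that were actually replayed in this attempt.
std::vector<uint64_t> RecoverPerLog(ToyManifest* manifest,
                                    const std::vector<uint64_t>& logs,
                                    uint64_t fail_before = 0) {
  std::vector<uint64_t> replayed;
  for (uint64_t log : logs) {
    if (log < manifest->next_log_number) continue;  // already recovered
    if (log == fail_before) break;                  // simulated crash
    // One commit per log: the new sst file and "this log is recovered".
    manifest->sst_files.push_back("from_log_" + std::to_string(log) + ".sst");
    manifest->next_log_number = log + 1;
    replayed.push_back(log);
  }
  return replayed;
}
```

Calling RecoverPerLog with logs {5, 7, 8} and fail_before = 8 replays logs 5 and 7. A second call on the same manifest skips them and replays only log 8, which is the restart behavior described in the paragraph above.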

[1] I'm happy to discuss this decision, but I believe this is the cleanest way to go. It also makes backward compatibility much easier. We don't have a requirement of adding multiple column families atomically.

Test Plan: make check

Reviewers: dhruba, haobo, kailiu, sdong

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15237
2014-01-22 10:45:26 -08:00
build_tools Fix some "make format" issue 2014-01-16 14:26:51 -08:00
coverage Fix the gcov/lcov related issues 2013-08-22 17:01:06 -07:00
db Refactor Recover() code 2014-01-22 10:45:26 -08:00
doc Fix typo. 2013-11-28 03:57:16 +09:00
hdfs Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
helpers/memenv Change Function names from Compaction->Flush When they really mean Flush 2013-10-14 15:12:15 -07:00
include Statistics code cleanup 2014-01-17 12:46:06 -08:00
linters/src fixing linters. 2012-12-14 14:05:27 -08:00
port Print stack trace on assertion failure 2013-12-06 17:11:09 -08:00
table Statistics code cleanup 2014-01-17 12:46:06 -08:00
tools Statistics code cleanup 2014-01-17 12:46:06 -08:00
util Fix a Statistics-related unit test failure 2014-01-21 18:02:55 -08:00
utilities Fix share_table_files condition in BackupEngine constructor. 2014-01-11 05:12:07 +09:00
.arcconfig Enable linting in arc. 2013-02-01 11:34:25 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Refactor build_tools/build_detect_version 2014-01-06 08:44:43 +02:00
CONTRIBUTING.md Reformat CONTRIBUTING.md with less than 80 characters. 2013-11-16 22:51:09 -08:00
INSTALL.md docs for shared library builds 2013-12-30 21:34:45 +08:00
LICENSE Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
Makefile Move the compilation of the shared libraries to "make release" 2014-01-14 13:54:33 -08:00
PATENTS Fix the patent format 2013-10-16 15:37:32 -07:00
README Add a pointer to the engineering design discussion forum. 2013-12-23 12:19:18 -08:00
README.fb update the latest version in README.fb to 2.7 2013-12-30 16:16:24 -08:00

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Built on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html and the GitHub wiki (https://github.com/facebook/rocksdb/wiki)
for more explanation.

The public interface is in include/*.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/rocksdb/db.h
    Main interface to the DB: Start here

include/rocksdb/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/rocksdb/comparator.h
    Abstraction for user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)

include/rocksdb/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/rocksdb/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/rocksdb/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.

include/rocksdb/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/rocksdb/env.h
    Abstraction of the OS environment.  A posix implementation of
    this interface is in util/env_posix.cc

include/rocksdb/table_builder.h
    Lower-level modules that most clients probably won't use directly

include/rocksdb/cache.h
    An API for the block cache.

include/rocksdb/compaction_filter.h
    An API for an application filter invoked on every compaction.

include/rocksdb/filter_policy.h
    An API for configuring a bloom filter.

include/rocksdb/memtablerep.h
    An API for implementing a memtable.

include/rocksdb/statistics.h
    An API to retrieve various database statistics.

include/rocksdb/transaction_log.h
    An API to retrieve transaction logs from a database.
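
To make two of the ideas in this guide concrete (a slice as a pointer plus a
length, and a user-specified key ordering), here is a minimal standalone
sketch. It deliberately does not include or use the library's own classes:
ToySlice, BytewiseCompare and CaseInsensitiveCompare are hypothetical names
chosen for illustration, and the real interfaces in include/rocksdb/slice.h
and include/rocksdb/comparator.h should be consulted for actual use.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified stand-in for the idea behind slice.h: a
// non-owning (pointer, length) view into bytes that live elsewhere.
// The referenced bytes must outlive the ToySlice.
struct ToySlice {
  const char* data;
  std::size_t size;
  ToySlice(const std::string& s) : data(s.data()), size(s.size()) {}
  ToySlice(const char* d, std::size_t n) : data(d), size(n) {}
  std::string ToString() const { return std::string(data, size); }
};

// Bytewise ordering: compare raw bytes; on a common prefix the shorter key
// sorts first. Returns <0, 0 or >0.
int BytewiseCompare(const ToySlice& a, const ToySlice& b) {
  const std::size_t n = std::min(a.size, b.size);
  const int r = std::memcmp(a.data, b.data, n);
  if (r != 0) return r;
  if (a.size < b.size) return -1;
  if (a.size > b.size) return 1;
  return 0;
}

// Custom ordering example: ASCII case-insensitive comparison.
int CaseInsensitiveCompare(const ToySlice& a, const ToySlice& b) {
  const std::size_t n = std::min(a.size, b.size);
  for (std::size_t i = 0; i < n; ++i) {
    const int ca = std::tolower(static_cast<unsigned char>(a.data[i]));
    const int cb = std::tolower(static_cast<unsigned char>(b.data[i]));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  if (a.size < b.size) return -1;
  if (a.size > b.size) return 1;
  return 0;
}
```

With these definitions, "apple" sorts after "Banana" bytewise (lowercase
ASCII letters have larger byte values than uppercase ones) but before it
under the case-insensitive ordering, which is the kind of difference a
custom comparator is meant to express.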

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/