A library that provides an embeddable, persistent key-value store for fast storage.
Schalk-Willem Kruger 3d33da75ef Fix UnmarkEOF for partial blocks
Summary:
Blocks in the transaction log are a fixed size, but the last block in the transaction log file is usually a partial block. When a new record is added after the reader has hit the end of the file, a new physical record is appended to that last block. ReadPhysicalRecord can only read full blocks and assumes that the file position indicator is aligned to the start of a block. If the reader is forced to read further by simply clearing the EOF flag, ReadPhysicalRecord reads a full block starting from somewhere in the middle of a real block, so it loses alignment and ends up with a partial physical record at the end of the read buffer. This results in length mismatches and checksum failures. When the log file is tailed for replication, this invalidates the log iterator, necessitating the creation of a new iterator, which has to read the log file from scratch.

This diff fixes this issue by reading the remaining portion of the last block we read from. This is done when the reader is forced to read further (UnmarkEOF is called).
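
The alignment problem described above comes down to offset arithmetic, which can be sketched in isolation. This is not the code from this diff: the names kBlockSize, OffsetInBlock, RemainingInBlock and ResumeReadSize are hypothetical, and the 32 KiB block size is an assumption made for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed fixed block size for this illustration (32 KiB).
const std::size_t kBlockSize = 32768;

// Number of bytes already consumed in the block that contains `offset`.
// A result of 0 means `offset` sits exactly on a block boundary.
inline std::size_t OffsetInBlock(std::uint64_t offset) {
  return static_cast<std::size_t>(offset % kBlockSize);
}

// Bytes left between `offset` and the next block boundary.
// Returns 0 when `offset` is already aligned.
inline std::size_t RemainingInBlock(std::uint64_t offset) {
  const std::size_t used = OffsetInBlock(offset);
  return used == 0 ? 0 : kBlockSize - used;
}

// Upper bound on how much to read when a reader that stopped at `offset`
// is asked to continue: finish the partial block if there is one,
// otherwise read a whole block. Reading a full block from a misaligned
// offset is the failure mode described in the summary.
inline std::size_t ResumeReadSize(std::uint64_t offset) {
  const std::size_t remaining = RemainingInBlock(offset);
  return remaining == 0 ? kBlockSize : remaining;
}
```

Under these assumptions, a reader that stopped 100 bytes into the second block would resume with a read of at most 32668 bytes, which brings the file position back to a block boundary before any further full-block reads.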

Test Plan:
- Added unit tests
- Stress test (with replication). Check dbdir/LOG file for corruptions.
- Test on test tier

Reviewers: emayanke, haobo, dhruba

Reviewed By: haobo

CC: vamsi, sheki, dhruba, kailiu, igor

Differential Revision: https://reviews.facebook.net/D15249
2014-01-27 14:49:10 -08:00
build_tools Revert "Moving to glibc-fb" 2014-01-24 11:50:38 -08:00
coverage Fix the gcov/lcov related issues 2013-08-22 17:01:06 -07:00
db Fix UnmarkEOF for partial blocks 2014-01-27 14:49:10 -08:00
doc Fix typo. 2013-11-28 03:57:16 +09:00
hdfs Fsync directory after we create a new file 2014-01-27 11:02:21 -08:00
helpers/memenv Fsync directory after we create a new file 2014-01-27 11:02:21 -08:00
include Fsync directory after we create a new file 2014-01-27 11:02:21 -08:00
linters Add google-style checker to "arc lint" 2014-01-23 15:04:12 -08:00
port Print stack trace on assertion failure 2013-12-06 17:11:09 -08:00
table Temporarily disable caching index/filter blocks 2014-01-24 10:57:15 -08:00
tools Moving Some includes from options.h to forward declaration 2014-01-24 17:16:22 -08:00
util Fsync directory after we create a new file 2014-01-27 11:02:21 -08:00
utilities Moving Some includes from options.h to forward declaration 2014-01-24 17:16:22 -08:00
.arcconfig Add google-style checker to "arc lint" 2014-01-23 15:04:12 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Add google-style checker to "arc lint" 2014-01-23 15:04:12 -08:00
CONTRIBUTING.md Reformat CONTRIBUTING.md with less than 80 characters. 2013-11-16 22:51:09 -08:00
INSTALL.md docs for shared library builds 2013-12-30 21:34:45 +08:00
LICENSE Add appropriate LICENSE and Copyright message. 2013-10-16 17:48:41 -07:00
Makefile Add a make target for shared library 2014-01-24 11:56:01 -08:00
PATENTS Fix the patent format 2013-10-16 15:37:32 -07:00
README Add a pointer to the engineering design discussion forum. 2013-12-23 12:19:18 -08:00
README.fb update the latest version in README.fb to 2.7 2013-12-30 16:16:24 -08:00

rocksdb: A persistent key-value store for flash storage
Authors: * The Facebook Database Engineering Team
         * Built on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key-value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html and the GitHub wiki (https://github.com/facebook/rocksdb/wiki)
for more explanation.

The public interface is in include/*.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/rocksdb/db.h
    Main interface to the DB: Start here

include/rocksdb/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/rocksdb/comparator.h
    Abstraction for user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)

include/rocksdb/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/rocksdb/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/rocksdb/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.

include/rocksdb/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/rocksdb/env.h
    Abstraction of the OS environment.  A POSIX implementation of
    this interface is in util/env_posix.cc.

include/rocksdb/table_builder.h
    Lower-level modules that most clients probably won't use directly.

include/rocksdb/cache.h
    An API for the block cache.

include/rocksdb/compaction_filter.h
    An API for an application filter invoked on every compaction.

include/rocksdb/filter_policy.h
    An API for configuring a bloom filter.

include/rocksdb/memtablerep.h
    An API for implementing a memtable.

include/rocksdb/statistics.h
    An API to retrieve various database statistics.

include/rocksdb/transaction_log.h
    An API to retrieve transaction logs from a database.
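
The difference between the default bytewise ordering and a custom ordering, as mentioned for include/rocksdb/comparator.h above, can be illustrated without the library. The sketch below uses only the C++ standard library; CaseInsensitiveLess and SortedKeys are hypothetical names and are not part of the rocksdb interface. A real comparator would be written against the interface declared in include/rocksdb/comparator.h.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <string>
#include <vector>

// Hypothetical custom ordering: compares keys case-insensitively
// (ASCII only). Keys that differ only by case compare as equivalent.
struct CaseInsensitiveLess {
  bool operator()(const std::string& a, const std::string& b) const {
    return std::lexicographical_compare(
        a.begin(), a.end(), b.begin(), b.end(),
        [](unsigned char x, unsigned char y) {
          return std::tolower(x) < std::tolower(y);
        });
  }
};

// Returns a copy of `keys` sorted by the ordering `cmp`.
template <typename Compare>
std::vector<std::string> SortedKeys(std::vector<std::string> keys,
                                    Compare cmp) {
  std::sort(keys.begin(), keys.end(), cmp);
  return keys;
}
```

With the keys "apple", "Banana" and "cherry", std::less<std::string> (bytewise for ASCII keys) puts "Banana" first because uppercase letters have smaller byte values, while CaseInsensitiveLess puts "apple" first. Since this ordering treats "apple" and "Apple" as equivalent, a store using it would typically treat them as the same key, which is worth keeping in mind when choosing a custom ordering.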

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/