A library that provides an embeddable, persistent key-value store for fast storage.
Vamsi Ponnekanti 465b9103f8 [Add a second kind of verification to db_stress
Summary:
Currently the test tracks all writes in memory and
uses them for verification at the end. This has 4 problems:
(a) It needs a mutex for each write to ensure that the in-memory update
and the leveldb update are done atomically. This slows down the
benchmark.
(b) The verification phase at the end is time consuming as well.
(c) It does not test batch writes or snapshots.
(d) We cannot kill the test and restart it multiple times in a
loop because the in-memory state will be lost.

I am adding FLAGS_multi, which uses MultiGet/MultiPut/MultiDelete
instead of get/put/delete to read, write, or delete a group of related
keys with the same value atomically. Every get retrieves the group
of keys and checks that their values are the same. This does not have
the above problems, but the downside is that it performs less
validation than the other approach.
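
The grouped-key check described in this summary can be sketched roughly as
follows. This is a minimal, self-contained illustration and not the db_stress
code: ToyStore, Update, ReadResult, GroupKeys, kGroupSize and the numeric key
prefix are hypothetical names and choices made up for the example, and a
mutex-guarded std::map stands in for the database's atomic batch write and
consistent read.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the database. Not the leveldb API.
struct Update {
  std::string key;
  std::string value;
  bool is_delete;
};

struct ReadResult {
  bool found;
  std::string value;
};

class ToyStore {
 public:
  // Applies every update under one lock, standing in for an atomic batch.
  void ApplyBatch(const std::vector<Update>& updates) {
    std::lock_guard<std::mutex> guard(mu_);
    for (const Update& u : updates) {
      if (u.is_delete) {
        data_.erase(u.key);
      } else {
        data_[u.key] = u.value;
      }
    }
  }

  // Reads every key under one lock, standing in for a consistent read.
  std::vector<ReadResult> ReadAll(const std::vector<std::string>& keys) const {
    std::lock_guard<std::mutex> guard(mu_);
    std::vector<ReadResult> results;
    for (const std::string& key : keys) {
      std::map<std::string, std::string>::const_iterator it = data_.find(key);
      if (it == data_.end()) {
        results.push_back(ReadResult{false, ""});
      } else {
        results.push_back(ReadResult{true, it->second});
      }
    }
    return results;
  }

 private:
  mutable std::mutex mu_;
  std::map<std::string, std::string> data_;
};

// Assumed group size, chosen only for this example.
const int kGroupSize = 4;

// Derives the related keys for one base key by adding a numeric prefix.
std::vector<std::string> GroupKeys(const std::string& base) {
  assert(!base.empty());
  std::vector<std::string> keys;
  for (int i = 0; i < kGroupSize; ++i) {
    keys.push_back(std::to_string(i) + "_" + base);
  }
  return keys;
}

// Writes the same value to every key of the group in one batch.
void MultiPut(ToyStore& store, const std::string& base,
              const std::string& value) {
  std::vector<Update> batch;
  for (const std::string& key : GroupKeys(base)) {
    batch.push_back(Update{key, value, false});
  }
  store.ApplyBatch(batch);
}

// Deletes every key of the group in one batch.
void MultiDelete(ToyStore& store, const std::string& base) {
  std::vector<Update> batch;
  for (const std::string& key : GroupKeys(base)) {
    batch.push_back(Update{key, "", true});
  }
  store.ApplyBatch(batch);
}

// Returns true when the group is consistent: either every key is absent,
// or every key is present and all values are equal.
bool MultiGetIsConsistent(const ToyStore& store, const std::string& base) {
  const std::vector<ReadResult> results = store.ReadAll(GroupKeys(base));
  for (std::size_t i = 1; i < results.size(); ++i) {
    if (results[i].found != results[0].found) return false;
    if (results[i].found && results[i].value != results[0].value) return false;
  }
  return true;
}
```

Because the check only compares the keys of a group against each other, it
needs no record of past writes, which is why it can survive a kill and
restart. It can only detect a group that was updated partially, not a group
whose keys all hold the same wrong value.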

Test Plan:
This whole thing is a test! Here is a small run. I am doing a larger run now.

[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=10000 --multi=1 --ops_per_key=25
LevelDB version     : 1.5
Number of threads   : 32
Ops per thread      : 10000
Read percentage     : 10
Delete percentage   : 30
Max key             : 2147483648
Num times DB reopens: 10
Num keys per lock   : 4
Compression         : snappy
------------------------------------------------
Creating 536870912 locks
2013/02/20-16:59:32  Starting database operations
Created bg thread 0x7f9ebcfff700
2013/02/20-16:59:37  Reopening database for the 1th time
2013/02/20-16:59:46  Reopening database for the 2th time
2013/02/20-16:59:57  Reopening database for the 3th time
2013/02/20-17:00:11  Reopening database for the 4th time
2013/02/20-17:00:25  Reopening database for the 5th time
2013/02/20-17:00:36  Reopening database for the 6th time
2013/02/20-17:00:47  Reopening database for the 7th time
2013/02/20-17:00:59  Reopening database for the 8th time
2013/02/20-17:01:10  Reopening database for the 9th time
2013/02/20-17:01:20  Reopening database for the 10th time
2013/02/20-17:01:31  Reopening database for the 11th time
2013/02/20-17:01:31  Starting verification
Stress Test : 109.125 micros/op 22191 ops/sec
            : Wrote 0.00 MB (0.23 MB/sec) (59% of 32 ops)
            : Deleted 10 times
2013/02/20-17:01:31  Verification successful

Revert Plan: OK

Task ID: #

Reviewers: dhruba, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D8733
2013-02-22 12:20:11 -08:00
db Measure compaction time. 2013-02-22 11:38:40 -08:00
doc merge 1.5 2012-08-28 11:43:33 -07:00
hdfs Fix a number of object lifetime/ownership issues 2013-01-23 16:54:11 -08:00
helpers/memenv Fix a number of object lifetime/ownership issues 2013-01-23 16:54:11 -08:00
include/leveldb Measure compaction time. 2013-02-22 11:38:40 -08:00
java Pom changes to make relase 1.5.7 for java. 2013-01-10 10:43:43 -08:00
linters/src fixing linters. 2012-12-14 14:05:27 -08:00
port Make compression options configurable. These include window-bits, level and strategy for ZlibCompression 2012-11-02 11:26:39 -07:00
scribe fix db_test error with scribe logger turned on 2012-08-28 11:22:58 -07:00
snappy Build with gcc-4.7.1-glibc-2.14.1. 2012-09-17 10:56:26 -07:00
table Fixed cache key for block cache 2013-01-31 15:20:24 -08:00
thrift Implement RowLocks for assoc schema 2012-10-03 23:19:01 -07:00
tools [Add a second kind of verification to db_stress 2013-02-22 12:20:11 -08:00
util ldb waldump to print the keys along with other stats + NULL to nullptr in ldb_cmd.cc 2013-02-20 11:01:37 -08:00
.arcconfig Enable linting in arc. 2013-02-01 11:34:25 -08:00
.gitignore Various build cleanups/improvements 2013-01-14 18:40:22 -08:00
build_detect_platform Add optional clang compile mode 2013-01-15 18:48:37 -08:00
build_detect_version Stop continually re-creating build_version.c 2013-01-24 17:51:39 -08:00
build_java.sh Release 1.5.6 for Java code + Script to automate it. 2012-12-17 12:11:11 -08:00
fbcode.clang31.sh Cleanup TODO/NEWS/AUTHORS files 2013-01-25 09:11:26 -08:00
fbcode.gcc471.sh Add zlib to our builds and tweak histogram output 2013-02-07 15:31:53 -08:00
LICENSE reverting disastrous MOE commit, returning to r21 2011-04-19 23:11:15 +00:00
Makefile Adding a rule in the Makefile to run valgrind on the rocksdb tests 2013-02-21 18:41:00 -08:00
README cleanup README. 2013-02-18 19:42:29 -08:00
README.fb Cleanup README.fb 2013-02-19 09:54:54 -08:00
regression_build_test.sh Minor improvements to the regression testing 2013-01-16 14:47:20 -08:00

rocksdb: A persistent key-value store for flash storage
Authors: The Facebook Database Engineering Team

This code is a library that forms the core building block for a fast
key-value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html for more explanation.
See doc/impl.html for a brief overview of the implementation.

The public interface is in include/*.h.  Callers should not include or
rely on the details of any other header files in this package.  Those
internal APIs may be changed without warning.

Guide to header files:

include/db.h
    Main interface to the DB: Start here

include/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/comparator.h
    Abstraction for a user-specified comparison function.  If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings).

include/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.

include/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/env.h
    Abstraction of the OS environment.  A POSIX implementation of
    this interface is in util/env_posix.cc.

include/table.h
include/table_builder.h
    Lower-level modules that most clients probably won't use directly.
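
To make two of the ideas in the guide above concrete (a pointer-plus-length
view as in include/slice.h, and bytewise versus custom key ordering as in
include/comparator.h), here is a small self-contained sketch. It does not use
the library's own classes: ByteView, MakeView, BytewiseCompare and
CaseInsensitiveCompare are hypothetical names invented for this example, and
ASCII case folding is only one assumed example of a custom ordering.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <cstring>
#include <string>

// Illustrative stand-in for the slice idea: a non-owning pointer and length
// into some other byte array. The referenced bytes must outlive the view.
struct ByteView {
  const char* data;
  std::size_t size;
};

// Builds a view over a string's bytes without copying them.
ByteView MakeView(const std::string& s) {
  return ByteView{s.data(), s.size()};
}

// Bytewise three-way comparison: negative, zero, or positive, like memcmp.
// A view that is a strict prefix of the other sorts first.
int BytewiseCompare(ByteView a, ByteView b) {
  const std::size_t min_len = std::min(a.size, b.size);
  int r = std::memcmp(a.data, b.data, min_len);
  if (r == 0) {
    if (a.size < b.size) {
      r = -1;
    } else if (a.size > b.size) {
      r = 1;
    }
  }
  return r;
}

// An assumed custom ordering: compare keys while ignoring ASCII case.
int CaseInsensitiveCompare(ByteView a, ByteView b) {
  const std::size_t min_len = std::min(a.size, b.size);
  for (std::size_t i = 0; i < min_len; ++i) {
    const int ca = std::tolower(static_cast<unsigned char>(a.data[i]));
    const int cb = std::tolower(static_cast<unsigned char>(b.data[i]));
    if (ca != cb) return ca < cb ? -1 : 1;
  }
  if (a.size < b.size) return -1;
  if (a.size > b.size) return 1;
  return 0;
}
```

With these two functions, the keys "Banana" and "apple" sort in opposite
orders: bytewise comparison puts "Banana" first because 'B' has a smaller
byte value than 'a', while the case-insensitive ordering puts "apple" first.
A database must be reopened with the same ordering it was created with,
since the stored keys are laid out according to it.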