Make it easier to start using RocksDB
Summary:
This diff is addressing multiple things with a single goal -- to make RocksDB easier to use:
* Add some functions to Options that make RocksDB easier to tune.
* Add example code for both simple RocksDB and RocksDB with Column Families.
* Rewrite our README.md

Regarding Options, I took a stab at something we talked about for a long time:
* https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/

I added functions:
* IncreaseParallelism() -- easy, increases the thread pool and max_background_compactions
* OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to optimize rocksdb for less stalls with level style compaction. This is very likely not ideal configuration. Feel free to suggest improvements. I used some of Mark's suggestions from here: https://github.com/facebook/rocksdb/issues/54
* OptimizeUniversalStyleCompaction(memtable_memory_budget) -- optimize for universal compaction.

Test Plan: compiled rocksdb. ran examples.

Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18621
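To make the effect of `memtable_memory_budget` concrete, the sketch below re-derives the values that OptimizeLevelStyleCompaction() assigns, using the same constants as the diff. It is a standalone model, not RocksDB code: `LevelStyleBudget`, `DeriveLevelStyleBudget` and `CheckDefaultBudget` are hypothetical names, and the level-0 figures assume that a flushed file is about as large as the memtables merged into it.

```cpp
#include <cassert>
#include <cstdint>

// Standalone model of the arithmetic in OptimizeLevelStyleCompaction().
// LevelStyleBudget and DeriveLevelStyleBudget are hypothetical names used
// for illustration; they are not part of the RocksDB API.
struct LevelStyleBudget {
  uint64_t write_buffer_size;          // size of a single memtable
  uint64_t worst_case_memtable_bytes;  // all allowed memtables are full
  uint64_t level0_file_bytes;          // one L0 file (assumed: no shrinkage)
  uint64_t level0_trigger_bytes;       // L0 size that starts L0->L1 work
  uint64_t target_file_size_base;
  uint64_t max_bytes_for_level_base;
};

inline LevelStyleBudget DeriveLevelStyleBudget(
    uint64_t memtable_memory_budget) {
  // Constants copied from the diff.
  const uint64_t min_write_buffer_number_to_merge = 2;
  const uint64_t max_write_buffer_number = 6;
  const uint64_t level0_file_num_compaction_trigger = 2;

  LevelStyleBudget b;
  b.write_buffer_size = memtable_memory_budget / 4;
  b.worst_case_memtable_bytes = max_write_buffer_number * b.write_buffer_size;
  b.level0_file_bytes = min_write_buffer_number_to_merge * b.write_buffer_size;
  b.level0_trigger_bytes =
      level0_file_num_compaction_trigger * b.level0_file_bytes;
  b.target_file_size_base = memtable_memory_budget / 8;
  b.max_bytes_for_level_base = memtable_memory_budget;
  return b;
}

// Usage: checks the derived values for the default 512 MB budget.
inline void CheckDefaultBudget() {
  const uint64_t kMB = 1024 * 1024;
  const LevelStyleBudget b = DeriveLevelStyleBudget(512 * kMB);
  assert(b.write_buffer_size == 128 * kMB);
  assert(b.worst_case_memtable_bytes == 768 * kMB);
  assert(b.level0_file_bytes == 256 * kMB);
  assert(b.level0_trigger_bytes == 512 * kMB);
  assert(b.target_file_size_base == 64 * kMB);
  assert(b.max_bytes_for_level_base == 512 * kMB);
}
```

With the default budget of 512 MB this gives 128 MB memtables, up to 768 MB of memtable memory in the worst case (the "50% extra memory" mentioned in the code comments), and level 0 reaching its compaction trigger at roughly 512 MB, which matches `max_bytes_for_level_base`.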
parent acd17fd002
commit 038a477b53
82
README
@@ -1,82 +0,0 @@
-rocksdb: A persistent key-value store for flash storage
-Authors: * The Facebook Database Engineering Team
-         * Build on earlier work on leveldb by Sanjay Ghemawat
-           (sanjay@google.com) and Jeff Dean (jeff@google.com)
-
-This code is a library that forms the core building block for a fast
-key value server, especially suited for storing data on flash drives.
-It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
-between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
-and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
-making it specially suitable for storing multiple terabytes of data in a
-single database.
-
-The core of this code has been derived from open-source leveldb.
-
-The code under this directory implements a system for maintaining a
-persistent key/value store.
-
-See doc/index.html and github wiki (https://github.com/facebook/rocksdb/wiki)
-for more explanation.
-
-The public interface is in include/*. Callers should not include or
-rely on the details of any other header files in this package. Those
-internal APIs may be changed without warning.
-
-Guide to header files:
-
-include/rocksdb/db.h
-    Main interface to the DB: Start here
-
-include/rocksdb/options.h
-    Control over the behavior of an entire database, and also
-    control over the behavior of individual reads and writes.
-
-include/rocksdb/comparator.h
-    Abstraction for user-specified comparison function. If you want
-    just bytewise comparison of keys, you can use the default comparator,
-    but clients can write their own comparator implementations if they
-    want custom ordering (e.g. to handle different character
-    encodings, etc.)
-
-include/rocksdb/iterator.h
-    Interface for iterating over data. You can get an iterator
-    from a DB object.
-
-include/rocksdb/write_batch.h
-    Interface for atomically applying multiple updates to a database.
-
-include/rocksdb/slice.h
-    A simple module for maintaining a pointer and a length into some
-    other byte array.
-
-include/rocksdb/status.h
-    Status is returned from many of the public interfaces and is used
-    to report success and various kinds of errors.
-
-include/rocksdb/env.h
-    Abstraction of the OS environment. A posix implementation of
-    this interface is in util/env_posix.cc
-
-include/rocksdb/table_builder.h
-    Lower-level modules that most clients probably won't use directly
-
-include/rocksdb/cache.h
-    An API for the block cache.
-
-include/rocksdb/compaction_filter.h
-    An API for a application filter invoked on every compaction.
-
-include/rocksdb/filter_policy.h
-    An API for configuring a bloom filter.
-
-include/rocksdb/memtablerep.h
-    An API for implementing a memtable.
-
-include/rocksdb/statistics.h
-    An API to retrieve various database statistics.
-
-include/rocksdb/transaction_log.h
-    An API to retrieve transaction logs from a database.
-
-Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
24
README.md
Normal file
@@ -0,0 +1,24 @@
+## RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
+
+RocksDB is developed and maintained by the Facebook Database Engineering Team.
+It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com)
+and Jeff Dean (jeff@google.com).
+
+This code is a library that forms the core building block for a fast
+key value server, especially suited for storing data on flash drives.
+It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
+between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
+and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
+making it especially suitable for storing multiple terabytes of data in a
+single database.
+
+Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples
+
+See [doc/index.html](https://github.com/facebook/rocksdb/blob/master/doc/index.html) and
+[github wiki](https://github.com/facebook/rocksdb/wiki) for more explanation.
+
+The public interface is in `include/`. Callers should not include or
+rely on the details of any other header files in this package. Those
+internal APIs may be changed without warning.
+
+Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
2
examples/.gitignore
vendored
Normal file
@@ -0,0 +1,2 @@
+column_families_example
+simple_example
9
examples/Makefile
Normal file
@@ -0,0 +1,9 @@
+include ../build_config.mk
+
+all: simple_example column_families_example
+
+simple_example: simple_example.cc
+	$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
+
+column_families_example: column_families_example.cc
+	$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
1
examples/README.md
Normal file
@@ -0,0 +1 @@
+Compile RocksDB first by executing `make static_lib` in parent dir
72
examples/column_families_example.cc
Normal file
@@ -0,0 +1,72 @@
+// Copyright (c) 2013, Facebook, Inc. All rights reserved.
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree. An additional grant
+// of patent rights can be found in the PATENTS file in the same directory.
+#include <cstdio>
+#include <string>
+#include <vector>
+
+#include "rocksdb/db.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/options.h"
+
+using namespace rocksdb;
+
+std::string kDBPath = "/tmp/rocksdb_column_families_example";
+
+int main() {
+  // open DB
+  Options options;
+  options.create_if_missing = true;
+  DB* db;
+  Status s = DB::Open(options, kDBPath, &db);
+  assert(s.ok());
+
+  // create column family
+  ColumnFamilyHandle* cf;
+  s = db->CreateColumnFamily(ColumnFamilyOptions(), "new_cf", &cf);
+  assert(s.ok());
+
+  // close DB
+  delete cf;
+  delete db;
+
+  // open DB with two column families
+  std::vector<ColumnFamilyDescriptor> column_families;
+  // have to open default column family
+  column_families.push_back(ColumnFamilyDescriptor(
+      kDefaultColumnFamilyName, ColumnFamilyOptions()));
+  // open the new one, too
+  column_families.push_back(ColumnFamilyDescriptor(
+      "new_cf", ColumnFamilyOptions()));
+  std::vector<ColumnFamilyHandle*> handles;
+  s = DB::Open(DBOptions(), kDBPath, column_families, &handles, &db);
+  assert(s.ok());
+
+  // put and get from non-default column family
+  s = db->Put(WriteOptions(), handles[1], Slice("key"), Slice("value"));
+  assert(s.ok());
+  std::string value;
+  s = db->Get(ReadOptions(), handles[1], Slice("key"), &value);
+  assert(s.ok());
+
+  // atomic write
+  WriteBatch batch;
+  batch.Put(handles[0], Slice("key2"), Slice("value2"));
+  batch.Put(handles[1], Slice("key3"), Slice("value3"));
+  batch.Delete(handles[0], Slice("key"));
+  s = db->Write(WriteOptions(), &batch);
+  assert(s.ok());
+
+  // drop column family
+  s = db->DropColumnFamily(handles[1]);
+  assert(s.ok());
+
+  // close db
+  for (auto handle : handles) {
+    delete handle;
+  }
+  delete db;
+
+  return 0;
+}
41
examples/simple_example.cc
Normal file
@@ -0,0 +1,41 @@
+// Copyright (c) 2013, Facebook, Inc. All rights reserved.
+// This source code is licensed under the BSD-style license found in the
+// LICENSE file in the root directory of this source tree. An additional grant
+// of patent rights can be found in the PATENTS file in the same directory.
+#include <cstdio>
+#include <string>
+
+#include "rocksdb/db.h"
+#include "rocksdb/slice.h"
+#include "rocksdb/options.h"
+
+using namespace rocksdb;
+
+std::string kDBPath = "/tmp/rocksdb_simple_example";
+
+int main() {
+  DB* db;
+  Options options;
+  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
+  options.IncreaseParallelism();
+  options.OptimizeLevelStyleCompaction();
+  // create the DB if it's not already present
+  options.create_if_missing = true;
+
+  // open DB
+  Status s = DB::Open(options, kDBPath, &db);
+  assert(s.ok());
+
+  // Put key-value
+  s = db->Put(WriteOptions(), "key", "value");
+  assert(s.ok());
+  std::string value;
+  // get value
+  s = db->Get(ReadOptions(), "key", &value);
+  assert(s.ok());
+  assert(value == "value");
+
+  delete db;
+
+  return 0;
+}
@@ -76,6 +76,29 @@ enum UpdateStatus { // Return status For inplace update callback
 struct Options;
 
 struct ColumnFamilyOptions {
+  // Some functions that make it easier to optimize RocksDB
+
+  // Use this if you don't need to keep the data sorted, i.e. you'll never use
+  // an iterator, only Put() and Get() API calls
+  ColumnFamilyOptions* OptimizeForPointLookup();
+
+  // Default values for some parameters in ColumnFamilyOptions are not
+  // optimized for heavy workloads and big datasets, which means you might
+  // observe write stalls under some conditions. As a starting point for tuning
+  // RocksDB options, use the following two functions:
+  // * OptimizeLevelStyleCompaction -- optimizes level style compaction
+  // * OptimizeUniversalStyleCompaction -- optimizes universal style compaction
+  // Universal style compaction is focused on reducing Write Amplification
+  // Factor for big data sets, but increases Space Amplification. You can learn
+  // more about the different styles here:
+  // https://github.com/facebook/rocksdb/wiki/Rocksdb-Architecture-Guide
+  // Note: we might use more memory than memtable_memory_budget during high
+  // write rate period
+  ColumnFamilyOptions* OptimizeLevelStyleCompaction(
+      uint64_t memtable_memory_budget = 512 * 1024 * 1024);
+  ColumnFamilyOptions* OptimizeUniversalStyleCompaction(
+      uint64_t memtable_memory_budget = 512 * 1024 * 1024);
+
   // -------------------
   // Parameters that affect behavior
 
@@ -336,6 +359,7 @@ struct ColumnFamilyOptions {
   // With bloomfilter and fast storage, a miss on one level
   // is very cheap if the file handle is cached in table cache
   // (which is true if max_open_files is large).
+  // Default: true
   bool disable_seek_compaction;
 
   // Puts are delayed 0-1 ms when any level has a compaction score that exceeds
@@ -546,6 +570,15 @@ struct ColumnFamilyOptions {
 };
 
 struct DBOptions {
+  // Some functions that make it easier to optimize RocksDB
+
+  // By default, RocksDB uses only one background thread for flush and
+  // compaction. Calling this function will set it up such that a total of
+  // `total_threads` is used. A good value for `total_threads` is the number of
+  // cores. You almost definitely want to call this function if your system is
+  // bottlenecked by RocksDB.
+  DBOptions* IncreaseParallelism(int total_threads = 16);
+
   // If true, the database will be created if it is missing.
   // Default: false
   bool create_if_missing;
@@ -480,4 +480,68 @@ Options::PrepareForBulkLoad()
   return this;
 }
 
+// Optimization functions
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeForPointLookup() {
+  prefix_extractor.reset(NewNoopTransform());
+  BlockBasedTableOptions block_based_options;
+  block_based_options.index_type = BlockBasedTableOptions::kBinarySearch;
+  table_factory.reset(new BlockBasedTableFactory(block_based_options));
+  memtable_factory.reset(NewHashLinkListRepFactory());
+  return this;
+}
+
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeLevelStyleCompaction(
+    uint64_t memtable_memory_budget) {
+  write_buffer_size = memtable_memory_budget / 4;
+  // merge two memtables when flushing to L0
+  min_write_buffer_number_to_merge = 2;
+  // this means we'll use 50% extra memory in the worst case, but will reduce
+  // write stalls.
+  max_write_buffer_number = 6;
+  // start flushing L0->L1 as soon as possible. each file on level0 is
+  // (memtable_memory_budget / 2). This will flush level 0 when it's bigger than
+  // memtable_memory_budget.
+  level0_file_num_compaction_trigger = 2;
+  // doesn't really matter much, but we don't want to create too many files
+  target_file_size_base = memtable_memory_budget / 8;
+  // make Level1 size equal to Level0 size, so that L0->L1 compactions are fast
+  max_bytes_for_level_base = memtable_memory_budget;
+
+  // level style compaction
+  compaction_style = kCompactionStyleLevel;
+
+  // only compress levels >= 2
+  compression_per_level.resize(num_levels);
+  for (int i = 0; i < num_levels; ++i) {
+    if (i < 2) {
+      compression_per_level[i] = kNoCompression;
+    } else {
+      compression_per_level[i] = kSnappyCompression;
+    }
+  }
+  return this;
+}
+
+ColumnFamilyOptions* ColumnFamilyOptions::OptimizeUniversalStyleCompaction(
+    uint64_t memtable_memory_budget) {
+  write_buffer_size = memtable_memory_budget / 4;
+  // merge two memtables when flushing to L0
+  min_write_buffer_number_to_merge = 2;
+  // this means we'll use 50% extra memory in the worst case, but will reduce
+  // write stalls.
+  max_write_buffer_number = 6;
+  // universal style compaction
+  compaction_style = kCompactionStyleUniversal;
+  compaction_options_universal.compression_size_percent = 80;
+  return this;
+}
+
+DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
+  max_background_compactions = total_threads - 1;
+  max_background_flushes = 1;
+  env->SetBackgroundThreads(total_threads, Env::LOW);
+  env->SetBackgroundThreads(1, Env::HIGH);
+  return this;
+}
+
 }  // namespace rocksdb
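The way IncreaseParallelism() divides `total_threads` between compactions and flushes can be written out as a small standalone model. This is only a sketch that mirrors the assignments in the diff: `ParallelismPlan`, `PlanParallelism` and `CheckDefaultPlan` are hypothetical names and are not part of RocksDB.

```cpp
#include <cassert>

// Standalone model of the values set by DBOptions::IncreaseParallelism().
// ParallelismPlan and PlanParallelism are hypothetical names used for
// illustration; they are not part of the RocksDB API.
struct ParallelismPlan {
  int max_background_compactions;
  int max_background_flushes;
  int low_priority_pool_threads;   // size passed for Env::LOW
  int high_priority_pool_threads;  // size passed for Env::HIGH
};

inline ParallelismPlan PlanParallelism(int total_threads = 16) {
  ParallelismPlan p;
  p.max_background_compactions = total_threads - 1;
  p.max_background_flushes = 1;
  p.low_priority_pool_threads = total_threads;
  p.high_priority_pool_threads = 1;
  return p;
}

// Usage: checks the plan produced by the default of 16 threads.
inline void CheckDefaultPlan() {
  const ParallelismPlan p = PlanParallelism();
  assert(p.max_background_compactions == 15);
  assert(p.max_background_flushes == 1);
  assert(p.low_priority_pool_threads == 16);
  assert(p.high_priority_pool_threads == 1);
}
```

Note that, as written in the diff, the two Env thread pools together hold `total_threads + 1` threads, while the compaction and flush limits add up to `total_threads`.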