Make it easier to start using RocksDB
Summary:
This diff is addressing multiple things with a single goal -- to make RocksDB easier to use:
* Add some functions to Options that make RocksDB easier to tune.
* Add example code for both simple RocksDB and RocksDB with Column Families.
* Rewrite our README.md

Regarding Options, I took a stab at something we talked about for a long time:
https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/

I added the following functions:
* IncreaseParallelism() -- easy; increases the thread pool and max_background_compactions.
* OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to tune RocksDB for fewer stalls with level style compaction. This is very likely not the ideal configuration; feel free to suggest improvements. I used some of Mark's suggestions from https://github.com/facebook/rocksdb/issues/54
* OptimizeUniversalStyleCompaction(memtable_memory_budget) -- optimize for universal compaction.

Test Plan: compiled rocksdb, ran examples.

Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18621
Parent: acd17fd002
Commit: 038a477b53
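In code, the call pattern the summary describes looks like this minimal sketch, which mirrors the examples/simple_example.cc added below (the DB path here is illustrative, and error handling is reduced to asserts):

    #include <cassert>
    #include "rocksdb/db.h"
    #include "rocksdb/options.h"

    int main() {
      rocksdb::Options options;
      options.IncreaseParallelism();           // more background threads for flush/compaction
      options.OptimizeLevelStyleCompaction();  // level-style tuning, default 512 MB memtable budget
      options.create_if_missing = true;

      rocksdb::DB* db;
      rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_options_demo", &db);
      assert(s.ok());
      delete db;
      return 0;
    }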
README (deleted, 82 lines)
@@ -1,82 +0,0 @@
rocksdb: A persistent key-value store for flash storage

Authors: * The Facebook Database Engineering Team
         * Build on earlier work on leveldb by Sanjay Ghemawat
           (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast
key value server, especially suited for storing data on flash drives.
It has an Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor(WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor(SAF). It has multi-threaded compactions,
making it specially suitable for storing multiple terabytes of data in a
single database.

The core of this code has been derived from open-source leveldb.

The code under this directory implements a system for maintaining a
persistent key/value store.

See doc/index.html and github wiki (https://github.com/facebook/rocksdb/wiki)
for more explanation.

The public interface is in include/*. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.

Guide to header files:

include/rocksdb/db.h
    Main interface to the DB: Start here

include/rocksdb/options.h
    Control over the behavior of an entire database, and also
    control over the behavior of individual reads and writes.

include/rocksdb/comparator.h
    Abstraction for user-specified comparison function. If you want
    just bytewise comparison of keys, you can use the default comparator,
    but clients can write their own comparator implementations if they
    want custom ordering (e.g. to handle different character
    encodings, etc.)

include/rocksdb/iterator.h
    Interface for iterating over data. You can get an iterator
    from a DB object.

include/rocksdb/write_batch.h
    Interface for atomically applying multiple updates to a database.

include/rocksdb/slice.h
    A simple module for maintaining a pointer and a length into some
    other byte array.

include/rocksdb/status.h
    Status is returned from many of the public interfaces and is used
    to report success and various kinds of errors.

include/rocksdb/env.h
    Abstraction of the OS environment. A posix implementation of
    this interface is in util/env_posix.cc

include/rocksdb/table_builder.h
    Lower-level modules that most clients probably won't use directly

include/rocksdb/cache.h
    An API for the block cache.

include/rocksdb/compaction_filter.h
    An API for a application filter invoked on every compaction.

include/rocksdb/filter_policy.h
    An API for configuring a bloom filter.

include/rocksdb/memtablerep.h
    An API for implementing a memtable.

include/rocksdb/statistics.h
    An API to retrieve various database statistics.

include/rocksdb/transaction_log.h
    An API to retrieve transaction logs from a database.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
README.md (new file, 24 lines)
@@ -0,0 +1,24 @@
## RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team.
It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com)
and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast
key-value server, especially suited for storing data on flash drives.
It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs
between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF)
and Space-Amplification-Factor (SAF). It has multi-threaded compactions,
making it especially suitable for storing multiple terabytes of data in a
single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See [doc/index.html](https://github.com/facebook/rocksdb/blob/master/doc/index.html) and
the [github wiki](https://github.com/facebook/rocksdb/wiki) for more explanation.

The public interface is in `include/`. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
examples/.gitignore (new file, 2 lines)
@@ -0,0 +1,2 @@
column_families_example
simple_example
examples/Makefile (new file, 9 lines)
@@ -0,0 +1,9 @@
include ../build_config.mk

all: simple_example column_families_example

simple_example: simple_example.cc
	$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)

column_families_example: column_families_example.cc
	$(CXX) $(CXXFLAGS) $@.cc -o$@ ../librocksdb.a -I../include -O2 -std=c++11 $(PLATFORM_LDFLAGS) $(PLATFORM_CXXFLAGS) $(EXEC_LDFLAGS)
examples/README.md (new file, 1 line)
@@ -0,0 +1 @@
Compile RocksDB first by executing `make static_lib` in parent dir
examples/column_families_example.cc (new file, 72 lines)
@@ -0,0 +1,72 @@
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace rocksdb;

std::string kDBPath = "/tmp/rocksdb_column_families_example";

int main() {
  // open DB
  Options options;
  options.create_if_missing = true;
  DB* db;
  Status s = DB::Open(options, kDBPath, &db);
  assert(s.ok());

  // create column family
  ColumnFamilyHandle* cf;
  s = db->CreateColumnFamily(ColumnFamilyOptions(), "new_cf", &cf);
  assert(s.ok());

  // close DB
  delete cf;
  delete db;

  // open DB with two column families
  std::vector<ColumnFamilyDescriptor> column_families;
  // have to open default column family
  column_families.push_back(ColumnFamilyDescriptor(
      kDefaultColumnFamilyName, ColumnFamilyOptions()));
  // open the new one, too
  column_families.push_back(ColumnFamilyDescriptor(
      "new_cf", ColumnFamilyOptions()));
  std::vector<ColumnFamilyHandle*> handles;
  s = DB::Open(DBOptions(), kDBPath, column_families, &handles, &db);
  assert(s.ok());

  // put and get from non-default column family
  s = db->Put(WriteOptions(), handles[1], Slice("key"), Slice("value"));
  assert(s.ok());
  std::string value;
  s = db->Get(ReadOptions(), handles[1], Slice("key"), &value);
  assert(s.ok());

  // atomic write across column families
  WriteBatch batch;
  batch.Put(handles[0], Slice("key2"), Slice("value2"));
  batch.Put(handles[1], Slice("key3"), Slice("value3"));
  batch.Delete(handles[0], Slice("key"));
  s = db->Write(WriteOptions(), &batch);
  assert(s.ok());

  // drop column family
  s = db->DropColumnFamily(handles[1]);
  assert(s.ok());

  // close db
  for (auto handle : handles) {
    delete handle;
  }
  delete db;

  return 0;
}
examples/simple_example.cc (new file, 41 lines)
@@ -0,0 +1,41 @@
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
#include <cassert>
#include <cstdio>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/slice.h"
#include "rocksdb/options.h"

using namespace rocksdb;

std::string kDBPath = "/tmp/rocksdb_simple_example";

int main() {
  DB* db;
  Options options;
  // Optimize RocksDB. This is the easiest way to get RocksDB to perform well
  options.IncreaseParallelism();
  options.OptimizeLevelStyleCompaction();
  // create the DB if it's not already present
  options.create_if_missing = true;

  // open DB
  Status s = DB::Open(options, kDBPath, &db);
  assert(s.ok());

  // Put key-value
  s = db->Put(WriteOptions(), "key", "value");
  assert(s.ok());
  std::string value;
  // get value
  s = db->Get(ReadOptions(), "key", &value);
  assert(s.ok());
  assert(value == "value");

  delete db;

  return 0;
}
include/rocksdb/options.h
@@ -76,6 +76,29 @@ enum UpdateStatus { // Return status For inplace update callback
struct Options;

struct ColumnFamilyOptions {
  // Some functions that make it easier to optimize RocksDB

  // Use this if you don't need to keep the data sorted, i.e. you'll never use
  // an iterator, only Put() and Get() API calls
  ColumnFamilyOptions* OptimizeForPointLookup();

  // Default values for some parameters in ColumnFamilyOptions are not
  // optimized for heavy workloads and big datasets, which means you might
  // observe write stalls under some conditions. As a starting point for tuning
  // RocksDB options, use the following two functions:
  // * OptimizeLevelStyleCompaction -- optimizes level style compaction
  // * OptimizeUniversalStyleCompaction -- optimizes universal style compaction
  // Universal style compaction is focused on reducing Write Amplification
  // Factor for big data sets, but increases Space Amplification. You can learn
  // more about the different styles here:
  // https://github.com/facebook/rocksdb/wiki/Rocksdb-Architecture-Guide
  // Note: we might use more memory than memtable_memory_budget during high
  // write rate periods
  ColumnFamilyOptions* OptimizeLevelStyleCompaction(
      uint64_t memtable_memory_budget = 512 * 1024 * 1024);
  ColumnFamilyOptions* OptimizeUniversalStyleCompaction(
      uint64_t memtable_memory_budget = 512 * 1024 * 1024);

  // -------------------
  // Parameters that affect behavior

@@ -336,6 +359,7 @@ struct ColumnFamilyOptions {
  // With bloomfilter and fast storage, a miss on one level
  // is very cheap if the file handle is cached in table cache
  // (which is true if max_open_files is large).
  // Default: true
  bool disable_seek_compaction;

  // Puts are delayed 0-1 ms when any level has a compaction score that exceeds
@@ -546,6 +570,15 @@ struct ColumnFamilyOptions {
};

struct DBOptions {
  // Some functions that make it easier to optimize RocksDB

  // By default, RocksDB uses only one background thread for flush and
  // compaction. Calling this function will set it up such that a total of
  // `total_threads` is used. A good value for `total_threads` is the number of
  // cores. You almost definitely want to call this function if your system is
  // bottlenecked by RocksDB.
  DBOptions* IncreaseParallelism(int total_threads = 16);

  // If true, the database will be created if it is missing.
  // Default: false
  bool create_if_missing;

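The column-family-level helpers declared above are not exercised by the new examples, so here is a minimal hedged sketch of how they are meant to be called. The column family names and the 256 MB budget are illustrative, not part of this diff:

    #include "rocksdb/options.h"

    int main() {
      // A column family used only for Put()/Get(), never iterated.
      rocksdb::ColumnFamilyOptions lookup_cf_options;
      lookup_cf_options.OptimizeForPointLookup();

      // A write-heavy column family tuned for universal compaction with a
      // smaller memtable budget than the 512 MB default.
      rocksdb::ColumnFamilyOptions stream_cf_options;
      stream_cf_options.OptimizeUniversalStyleCompaction(256 * 1024 * 1024);

      // DB-wide: one background thread per core, e.g. on an 8-core machine.
      rocksdb::DBOptions db_options;
      db_options.IncreaseParallelism(8);
      return 0;
    }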
util/options.cc
@@ -480,4 +480,68 @@ Options::PrepareForBulkLoad()
  return this;
}

// Optimization functions
ColumnFamilyOptions* ColumnFamilyOptions::OptimizeForPointLookup() {
  prefix_extractor.reset(NewNoopTransform());
  BlockBasedTableOptions block_based_options;
  block_based_options.index_type = BlockBasedTableOptions::kBinarySearch;
  table_factory.reset(new BlockBasedTableFactory(block_based_options));
  memtable_factory.reset(NewHashLinkListRepFactory());
  return this;
}

ColumnFamilyOptions* ColumnFamilyOptions::OptimizeLevelStyleCompaction(
    uint64_t memtable_memory_budget) {
  write_buffer_size = memtable_memory_budget / 4;
  // merge two memtables when flushing to L0
  min_write_buffer_number_to_merge = 2;
  // this means we'll use 50% extra memory in the worst case, but will reduce
  // write stalls.
  max_write_buffer_number = 6;
  // start flushing L0->L1 as soon as possible. each file on level0 is
  // (memtable_memory_budget / 2). This will flush level 0 when it's bigger than
  // memtable_memory_budget.
  level0_file_num_compaction_trigger = 2;
  // doesn't really matter much, but we don't want to create too many files
  target_file_size_base = memtable_memory_budget / 8;
  // make Level1 size equal to Level0 size, so that L0->L1 compactions are fast
  max_bytes_for_level_base = memtable_memory_budget;

  // level style compaction
  compaction_style = kCompactionStyleLevel;

  // only compress levels >= 2
  compression_per_level.resize(num_levels);
  for (int i = 0; i < num_levels; ++i) {
    if (i < 2) {
      compression_per_level[i] = kNoCompression;
    } else {
      compression_per_level[i] = kSnappyCompression;
    }
  }
  return this;
}
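// Worked example (illustrative, not part of this diff): with the default
// memtable_memory_budget of 512 MB, the assignments above come out to:
//   write_buffer_size                  = 128 MB  (budget / 4)
//   min_write_buffer_number_to_merge   = 2       (so each L0 file is ~256 MB)
//   max_write_buffer_number            = 6
//   level0_file_num_compaction_trigger = 2       (L0->L1 starts at ~512 MB of L0 data)
//   target_file_size_base              = 64 MB   (budget / 8)
//   max_bytes_for_level_base           = 512 MB  (L1 sized to match L0)
//   compression_per_level              = none on L0/L1, Snappy from L2 up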

ColumnFamilyOptions* ColumnFamilyOptions::OptimizeUniversalStyleCompaction(
    uint64_t memtable_memory_budget) {
  write_buffer_size = memtable_memory_budget / 4;
  // merge two memtables when flushing to L0
  min_write_buffer_number_to_merge = 2;
  // this means we'll use 50% extra memory in the worst case, but will reduce
  // write stalls.
  max_write_buffer_number = 6;
  // universal style compaction
  compaction_style = kCompactionStyleUniversal;
  compaction_options_universal.compression_size_percent = 80;
  return this;
}

DBOptions* DBOptions::IncreaseParallelism(int total_threads) {
  max_background_compactions = total_threads - 1;
  max_background_flushes = 1;
  env->SetBackgroundThreads(total_threads, Env::LOW);
  env->SetBackgroundThreads(1, Env::HIGH);
  return this;
}
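// Worked example (illustrative, not part of this diff): with the default
// total_threads = 16, the calls above configure the thread pools as:
//   max_background_compactions = 15  (total_threads - 1)
//   max_background_flushes     = 1
//   Env LOW-priority pool      = 16 threads (used for compactions)
//   Env HIGH-priority pool     = 1 thread   (used for flushes)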

}  // namespace rocksdb