rocksdb

Igor Canadi fdb6be4e24 Rewritten system for scheduling background work

Summary:
When scaling to a higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which looped over all column families while holding a mutex. This patch addresses the issue.

The approach is similar to our earlier efforts: instead of a pull-based model, where we do something for every column family, we use a push-based model -- when we detect that a column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction.
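
To illustrate the difference between the two models, here is a minimal sketch. It is not the code from this patch: `ColumnFamily`, `PullReady`, `FlushScheduler`, `MarkReady` and `PopNext` are hypothetical names invented for the example, and locking is left out on the assumption that the caller already holds the DB mutex.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <vector>

// Hypothetical stand-in for a column family (not RocksDB's real class).
struct ColumnFamily {
  std::string name;
  bool needs_flush = false;    // set when the column family has data to flush
  bool pending_flush = false;  // true while the column family sits in the queue
};

// Pull model: every scheduling call scans all column families, so the cost
// is O(number of column families) even when almost none of them are ready.
std::vector<ColumnFamily*> PullReady(std::vector<ColumnFamily>& cfs) {
  std::vector<ColumnFamily*> ready;
  for (auto& cf : cfs) {
    if (cf.needs_flush) ready.push_back(&cf);
  }
  return ready;
}

// Push model: a column family is enqueued at the moment it becomes ready,
// so the scheduler only ever touches column families that have work to do.
class FlushScheduler {
 public:
  // Called when a column family is detected to be ready for a flush.
  // The pending_flush flag prevents the same column family from being
  // queued twice.
  void MarkReady(ColumnFamily* cf) {
    cf->needs_flush = true;
    if (!cf->pending_flush) {
      cf->pending_flush = true;
      queue_.push_back(cf);
    }
  }

  // Returns the next column family to flush, or nullptr if none is queued.
  // Runs in O(1), independent of the total number of column families.
  ColumnFamily* PopNext() {
    if (queue_.empty()) return nullptr;
    ColumnFamily* cf = queue_.front();
    queue_.pop_front();
    assert(cf->pending_flush);  // invariant: queued entries are marked pending
    cf->pending_flush = false;
    return cf;
  }

  std::size_t size() const { return queue_.size(); }

 private:
  std::deque<ColumnFamily*> queue_;  // plays the role of flush_queue_
};
```

With 5000 column families and only a handful ready at any time, `PullReady` still visits all 5000 on every call, while `PopNext` does constant work. A compaction queue could follow the same pattern with its own pending flag.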

Here are the performance results:

Command:

    ./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000  --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333

Before the patch:

     fillrandom   :      26.950 micros/op 37105 ops/sec;    4.1 MB/s

After the patch:

      fillrandom   :      17.404 micros/op 57456 ops/sec;    6.4 MB/s

The next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. A fix is coming in the next patch, but when I removed that code, here's what I got:

      fillrandom   :       7.590 micros/op 131758 ops/sec;   14.6 MB/s

Test Plan:
make check

two stress tests:

Big number of compactions and flushes:

    ./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000

With max_background_flushes=0, to verify that this case also works correctly:

    ./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0  --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000

Reviewers: ljin, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30123

2014-12-19 20:38:12 +01:00

build_tools

Remove -mtune=native because it's redundant

2014-12-19 09:06:45 -08:00

coverage

Fix coverage script

2014-11-03 14:53:00 -08:00

Rewritten system for scheduling background work

2014-12-19 20:38:12 +01:00

doc

Remove seek compaction

2014-06-20 10:23:02 +02:00

examples

style fixes in c example

2014-12-18 06:48:46 -08:00

hdfs

Replace exception by abort() in dummy HdfsEnv implementation.

2014-12-05 13:30:57 -08:00

helpers/memenv

Turn -Wshadow back on

2014-11-06 11:14:28 -08:00

include

Fix calculation of max_total_wal_size in db_options_.max_total_wal_size == 0 case

2014-12-08 15:26:35 -08:00

java

[RocksJava] Incorporated changes D30081

2014-12-18 22:27:50 +01:00

linters

Fix linters

2014-12-02 13:53:39 -05:00

port

Add rocksdb::ToString() to address cases where std::to_string is not available.

2014-11-24 20:44:49 -08:00

table

Enforce write buffer memory limit across column families

2014-12-02 12:09:20 -08:00

third-party/rapidjson

Fix a rapidjson compile error in mac.

2014-06-23 17:09:24 -06:00

tools

Added 'dump_live_files' command to ldb tool.

2014-12-12 17:50:36 -08:00

util

Handle errors during pthread calls

2014-12-17 16:25:09 -08:00

utilities

Clean up StringSplit

2014-11-21 11:05:28 -05:00

.arcconfig

Improve/fix bugs for the cpp linter

2014-02-13 17:48:11 -08:00

.clang-format

A script that automatically reformat affected lines

2014-01-14 12:21:24 -08:00

.gitignore

Ignore IntelliJ idea project files and ignore java/out folder

2014-10-21 15:52:27 +01:00

.travis.yml

Don't parallelize the build in travis

2014-11-14 16:23:56 -08:00

AUTHORS

Add AUTHORS file. Fix #203

2014-09-29 10:52:18 -07:00

CONTRIBUTING.md

facebook accounts are not required for CLA signers

2014-07-08 05:57:54 -04:00

HISTORY.md

Move the file copy out of the mutex.

2014-12-16 16:57:22 -08:00

INSTALL.md

Optimize default compile to compilation platform by default

2014-12-15 11:29:41 +01:00

LICENSE

Fix copyright year

2014-03-12 12:06:58 -07:00

Makefile

Add -fno-exceptions flag to ROCKSDB_LITE.

2014-12-05 21:34:20 -08:00

PATENTS

Fix the patent format

2013-10-16 15:37:32 -07:00

README.md

Replaced "built on on earlier work" by "built on earlier work" in README.md

2014-09-17 01:16:17 -07:00

ROCKSDB_LITE.md

RocksDBLite

2014-04-15 13:39:26 -07:00

Vagrantfile

Package generation for Ubuntu and CentOS

2014-09-29 16:09:46 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

Languages

C++ 82.1%

Java 10.3%

C 2.5%

Python 1.7%

Perl 1.1%

Other 2.1%