A library that provides an embeddable, persistent key-value store for fast storage.
Andrew Kryczka cc01985db0 Introduce bottom-pri thread pool for large universal compactions
Summary:
When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Top-level compactions (e.g., a few L0 files) would then be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical for preventing compaction stalls because they can quickly reduce the number of L0 files / sorted runs.

This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.
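
The head-of-line blocking described above can be illustrated with a toy FIFO scheduling model. This is a minimal sketch, not RocksDB code: `FifoFinishTimes` is a hypothetical helper, and the job durations (in seconds) are made up for illustration.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical helper (not part of RocksDB): returns the finish time of each
// job when jobs are taken in FIFO order by `threads` identical workers that
// are all idle at time 0.
std::vector<long> FifoFinishTimes(const std::vector<long>& durations,
                                  int threads) {
  assert(threads > 0);
  // Min-heap holding the time at which each worker next becomes free.
  std::priority_queue<long, std::vector<long>, std::greater<long>> free_at;
  for (int i = 0; i < threads; ++i) free_at.push(0);

  std::vector<long> finish;
  finish.reserve(durations.size());
  for (long d : durations) {
    const long start = free_at.top();  // earliest available worker
    free_at.pop();
    const long end = start + d;
    free_at.push(end);
    finish.push_back(end);
  }
  return finish;
}
```

In this model, two 600-second bottom-level jobs queued ahead of a 5-second top-level job on a shared two-thread pool delay the short job until time 605. If the long jobs run in their own pool, the short job finishes at time 5.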

- Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if the user explicitly configures it to have a positive number of threads.
- Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
- Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
- Forward universal compactions involving the last level to the bottom pool (the worker thread's entry point is `BGWorkBottomCompaction`).
- Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background job limits, so users of this feature will get an extra compaction for free.
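
The routing rule in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual RocksDB implementation: `Pool`, `PickedCompaction`, and `ChoosePool` are hypothetical names introduced here.

```cpp
#include <cassert>

// Hypothetical names for illustration only; these are not RocksDB types.
enum class Pool { kLow, kBottom };

struct PickedCompaction {
  bool is_universal;         // picked by universal compaction
  bool includes_last_level;  // output reaches the bottom level
};

// A compaction is forwarded to the bottom pool only when that pool has been
// given a positive number of threads and the compaction is a universal
// compaction involving the last level. Everything else stays in the low-pri
// pool.
Pool ChoosePool(const PickedCompaction& c, int bottom_pool_threads) {
  const bool bottom_enabled = bottom_pool_threads > 0;
  if (bottom_enabled && c.is_universal && c.includes_last_level) {
    return Pool::kBottom;
  }
  return Pool::kLow;
}
```

With the default of zero bottom-pool threads, `ChoosePool` always returns `Pool::kLow`, which matches the statement that the feature is disabled unless explicitly configured. In RocksDB itself the pool size would presumably be set through the existing `Env::SetBackgroundThreads` call with `Env::Priority::BOTTOM`; that exact usage is an assumption here, not something the commit message spells out.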
Closes https://github.com/facebook/rocksdb/pull/2580

Differential Revision: D5422916

Pulled By: ajkr

fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
2017-08-03 15:43:29 -07:00
arcanist_util Remove arcanist_util directory 2017-07-19 16:49:55 -07:00
buckifier TARGETS file not setting sse explicitly 2017-07-27 17:41:36 -07:00
build_tools Fix use of RocksDBCommonHelper in cont_integration.sh 2017-07-26 19:31:36 -07:00
cache Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
cmake/modules CMake: more MinGW fixes 2017-04-06 14:09:13 -07:00
coverage Fix coverage script 2014-11-03 14:53:00 -08:00
db Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
docs 5.6.1 release blog post 2017-07-25 12:27:22 -07:00
env Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
examples Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
hdfs Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
include/rocksdb Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
java Fix statistics in RocksJava sample 2017-08-01 16:58:26 -07:00
memtable Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
monitoring Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
options Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
port LRUCacheShard cache line size alignment 2017-07-24 10:54:37 -07:00
table Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
third-party Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
tools Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
util Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
utilities Allow concurrent writes to blob db 2017-08-03 15:11:26 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Simple blob file dumper 2017-05-23 10:42:59 -07:00
.travis.yml Build fewer tests in Travis platform_dependent tests 2017-07-27 17:29:01 -07:00
appveyor.yml Rework test running script. 2017-04-05 11:39:20 -07:00
AUTHORS Add AUTHORS file. Fix #203 2014-09-29 10:52:18 -07:00
CMakeLists.txt Refactor TransactionImpl 2017-08-03 08:57:22 -07:00
CONTRIBUTING.md Remove the licensing description in CONTRIBUTING.md 2017-07-16 15:57:18 -07:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Introduce bottom-pri thread pool for large universal compactions 2017-08-03 15:43:29 -07:00
INSTALL.md add vcpkg as an windows option 2017-07-24 15:12:45 -07:00
LANGUAGE-BINDINGS.md Adding Dlang to the list 2017-02-16 17:24:10 -08:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Replace dynamic_cast<> 2017-07-28 16:27:16 -07:00
README.md Appveyor badge to show master branch 2016-07-26 13:54:08 -07:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
src.mk Refactor TransactionImpl 2017-08-03 08:57:22 -07:00
TARGETS Dump Blob DB options to info log 2017-08-01 13:01:47 -07:00
thirdparty.inc Introduce XPRESS compresssion on Windows. (#1081) 2016-04-19 22:54:24 -07:00
USERS.md fixed typo 2017-06-13 16:58:01 -07:00
Vagrantfile Update Vagrant file (test internal phabricator workflow) 2016-10-28 15:39:19 -07:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/