A library that provides an embeddable, persistent key-value store for fast storage.
Andrew Kryczka ccacadf51f single-file bottom-level compaction when snapshot released
Summary:
When snapshots are held for a long time, files may reach the bottom level containing overwritten/deleted keys. We previously had no mechanism to trigger compaction on such files. This particularly impacted DBs that write to different parts of the keyspace over time, as such files would never be naturally compacted due to second-last level files moving down. This PR introduces a mechanism for bottommost files to be recompacted upon releasing all snapshots that prevent them from dropping their deleted/overwritten keys.

- Changed `CompactionPicker` to compact files in `BottommostFilesMarkedForCompaction()`. These are the last choice when picking. Each file will be compacted alone and output to the same level in which it originated. The goal of this type of compaction is to rewrite the data excluding deleted/overwritten keys.
- Changed `ReleaseSnapshot()` to recompute the bottom files marked for compaction when the oldest existing snapshot changes, and to schedule a compaction if needed. We cache, in `bottommost_files_mark_threshold_`, the value that the oldest existing snapshot must exceed for another file to be marked, which allows us to avoid recomputing marked files for most snapshot releases.
- Changed `VersionStorageInfo` to track the list of bottommost files, which is recomputed every time the version changes by `UpdateBottommostFiles()`. The list of marked bottommost files is first computed in `ComputeBottommostFilesMarkedForCompaction()` when the version changes, but may also be recomputed when `ReleaseSnapshot()` is called.
- Extracted core logic of `Compaction::IsBottommostLevel()` into `VersionStorageInfo::RangeMightExistAfterSortedRun()` since logic to check whether a file is bottommost is now necessary outside of compaction.
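
The threshold caching described in the list above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code from this change: `ToyFile`, `ToyVersion`, `ComputeMarked`, `OnOldestSnapshotChanged`, and the `has_droppable_keys` flag are hypothetical names, and the rule "a file can be marked once its largest sequence number is below the oldest snapshot" is an assumed simplification.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical stand-in for a bottommost file's metadata (not a RocksDB type).
struct ToyFile {
  std::uint64_t number;
  std::uint64_t largest_seqno;  // newest sequence number stored in the file
  bool has_droppable_keys;      // file contains deleted/overwritten keys
};

// Sentinel meaning "no remaining file is waiting on a snapshot".
constexpr std::uint64_t kNoThreshold =
    std::numeric_limits<std::uint64_t>::max();

struct ToyVersion {
  std::vector<ToyFile> bottommost_files;
  std::vector<std::uint64_t> marked;  // file numbers marked for compaction
  // Value the oldest snapshot must exceed before another file can be marked.
  std::uint64_t mark_threshold = kNoThreshold;

  // Full recomputation: mark every file no longer protected by a snapshot,
  // and remember the smallest seqno that still blocks an unmarked file.
  void ComputeMarked(std::uint64_t oldest_snapshot_seqno) {
    marked.clear();
    mark_threshold = kNoThreshold;
    for (const ToyFile& f : bottommost_files) {
      if (!f.has_droppable_keys) continue;
      if (f.largest_seqno < oldest_snapshot_seqno) {
        marked.push_back(f.number);
      } else {
        mark_threshold = std::min(mark_threshold, f.largest_seqno);
      }
    }
  }
};

// Called when the oldest snapshot changes. Returns true only if the marked
// list had to be recomputed; most calls return early via the cached threshold.
bool OnOldestSnapshotChanged(ToyVersion& v, std::uint64_t new_oldest_seqno) {
  if (new_oldest_seqno <= v.mark_threshold) return false;
  v.ComputeMarked(new_oldest_seqno);
  return true;
}
```

In this model, releasing a snapshot costs a single comparison unless the new oldest snapshot has moved past the cached threshold.
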
Closes https://github.com/facebook/rocksdb/pull/3009

Differential Revision: D6062044

Pulled By: ajkr

fbshipit-source-id: 123d201cf140715a7d5928e8b3cb4f9cd9f7ad21
2017-10-26 18:11:35 -07:00
arcanist_util Fix arc setting for Facebook internal tools 2017-02-02 13:24:16 -08:00
buckifier Make TARGETS file portable 2017-07-14 15:45:36 -07:00
build_tools Change RocksDB License 2017-07-26 11:31:01 -07:00
cache Change RocksDB License 2017-07-26 11:31:01 -07:00
cmake/modules CMake: more MinGW fixes 2017-04-06 14:09:13 -07:00
coverage Fix coverage script 2014-11-03 14:53:00 -08:00
db single-file bottom-level compaction when snapshot released 2017-10-26 18:11:35 -07:00
docs rocksdb 5.5.1 release post 2017-07-05 16:41:30 -07:00
env Change RocksDB License 2017-07-26 11:31:01 -07:00
examples Change RocksDB License 2017-07-26 11:31:01 -07:00
hdfs Change RocksDB License 2017-07-26 11:31:01 -07:00
include/rocksdb single-file bottom-level compaction when snapshot released 2017-10-26 18:11:35 -07:00
java Update java/rocksjni.pom 2017-07-26 11:31:58 -07:00
memtable Change RocksDB License 2017-07-26 11:31:01 -07:00
monitoring Change RocksDB License 2017-07-26 11:31:01 -07:00
options fix missing manual_wal_flush for DBOptions ctor 2017-09-14 15:37:51 -07:00
port Change RocksDB License 2017-07-26 11:31:01 -07:00
table Remove some left-over BSD headers 2017-07-26 11:31:42 -07:00
third-party Change RocksDB License 2017-07-26 11:31:01 -07:00
tools Dump Blob DB options to info log 2017-08-31 14:03:03 -07:00
util Change RocksDB License 2017-07-26 11:31:01 -07:00
utilities make blob file close synchronous 2017-08-31 14:22:10 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.deprecated_arcconfig Update ShipIt to honor TARGETS updates 2017-04-13 16:12:03 -07:00
.gitignore Simple blob file dumper 2017-05-23 10:42:59 -07:00
.travis.yml Force travis to build with clang on MacOS 2017-06-05 15:41:57 -07:00
appveyor.yml Rework test running script. 2017-04-05 11:39:20 -07:00
AUTHORS Add AUTHORS file. Fix #203 2014-09-29 10:52:18 -07:00
CMakeLists.txt Dump Blob DB options to info log 2017-08-31 14:03:03 -07:00
CONTRIBUTING.md Remove the licensing description in CONTRIBUTING.md 2017-07-26 11:31:18 -07:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md single-file bottom-level compaction when snapshot released 2017-10-26 18:11:35 -07:00
INSTALL.md Added a note about LZ4 compression dependency 2017-07-10 12:12:22 -07:00
LANGUAGE-BINDINGS.md Adding Dlang to the list 2017-02-16 17:24:10 -08:00
LICENSE.Apache Change RocksDB License 2017-07-26 11:31:01 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-26 11:31:29 -07:00
Makefile Fix undefined behavior in Hash 2017-07-10 12:29:24 -07:00
README.md Appveyor badge to show master branch 2016-07-26 13:54:08 -07:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
src.mk Dump Blob DB options to info log 2017-08-31 14:03:03 -07:00
TARGETS Dump Blob DB options to info log 2017-08-31 14:03:03 -07:00
thirdparty.inc Introduce XPRESS compresssion on Windows. (#1081) 2016-04-19 22:54:24 -07:00
USERS.md fixed typo 2017-06-13 16:58:01 -07:00
Vagrantfile Update Vagrant file (test internal phabricator workflow) 2016-10-28 15:39:19 -07:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.
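
To make the read-amplification side of that tradeoff concrete, here is a toy, in-memory sketch of an LSM-style lookup. It is not RocksDB code and does not use the RocksDB API: `ToyRun`, `ToyLookup`, and `ToyGet` are hypothetical names, each sorted run is modeled as a `std::map`, and a deletion marker is modeled as `std::nullopt`. It assumes C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// A sorted run maps key -> value; std::nullopt stands for a deletion marker.
using ToyRun = std::map<std::string, std::optional<std::string>>;

struct ToyLookup {
  std::optional<std::string> value;  // empty if the key is absent or deleted
  std::size_t runs_probed = 0;       // crude proxy for read amplification
};

// `runs` is ordered newest first; the first run that contains the key wins,
// so a newer deletion marker hides an older value.
ToyLookup ToyGet(const std::vector<ToyRun>& runs, const std::string& key) {
  ToyLookup result;
  for (const ToyRun& run : runs) {
    ++result.runs_probed;
    auto it = run.find(key);
    if (it != run.end()) {
      result.value = it->second;
      return result;
    }
  }
  return result;
}
```

In this model, merging runs (compaction) lowers the number of runs a lookup probes and lets hidden values be discarded, at the cost of rewriting data.
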

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/