rocksdb

Andrew Kryczka 9b18cc2363 single-file bottom-level compaction when snapshot released

Summary:
When snapshots are held for a long time, files may reach the bottom level while still containing overwritten/deleted keys. We previously had no mechanism to trigger compaction on such files. This particularly impacted DBs that write to different parts of the keyspace over time, since no second-to-last-level files would ever move down into those key ranges and cause the files to be compacted naturally. This PR introduces a mechanism for bottommost files to be recompacted once all snapshots that prevent them from dropping their deleted/overwritten keys have been released.

- Changed `CompactionPicker` to compact files in `BottommostFilesMarkedForCompaction()`. These are the last choice when picking. Each file will be compacted alone and output to the same level in which it originated. The goal of this type of compaction is to rewrite the data excluding deleted/overwritten keys.
- Changed `ReleaseSnapshot()` to recompute the bottom files marked for compaction when the oldest existing snapshot changes, and to schedule a compaction if needed. We cache in `bottommost_files_mark_threshold_` the value that the oldest existing snapshot needs to exceed in order for another file to be marked, which allows us to avoid recomputing marked files for most snapshot releases.
- Changed `VersionStorageInfo` to track the list of bottommost files, which `UpdateBottommostFiles()` recomputes every time the version changes. The list of marked bottommost files is first computed in `ComputeBottommostFilesMarkedForCompaction()` when the version changes, but may also be recomputed when `ReleaseSnapshot()` is called.
- Extracted core logic of `Compaction::IsBottommostLevel()` into `VersionStorageInfo::RangeMightExistAfterSortedRun()` since logic to check whether a file is bottommost is now necessary outside of compaction.
Closes https://github.com/facebook/rocksdb/pull/3009
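
The threshold caching described in the summary can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the code from the PR: `BottomFile`, `BottomFileTracker`, and all of their members are hypothetical names, and the model assumes a file becomes eligible once the oldest snapshot's sequence number exceeds the file's largest sequence number.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using SequenceNumber = std::uint64_t;
constexpr SequenceNumber kMaxSeq = std::numeric_limits<SequenceNumber>::max();

// Hypothetical stand-in for the metadata of one bottommost file.
struct BottomFile {
  int id;
  SequenceNumber largest_seqno;  // newest entry stored in the file (assumed criterion)
};

// Hypothetical model of the marking logic; not RocksDB's VersionStorageInfo.
class BottomFileTracker {
 public:
  explicit BottomFileTracker(std::vector<BottomFile> files)
      : files_(std::move(files)) {}

  // Full recomputation: mark every file whose entries are all older than the
  // oldest snapshot, and cache the smallest value that the oldest snapshot
  // must exceed before any additional file can be marked.
  void ComputeMarked(SequenceNumber oldest_snapshot) {
    marked_.clear();
    threshold_ = kMaxSeq;
    for (const BottomFile& f : files_) {
      if (f.largest_seqno < oldest_snapshot) {
        marked_.push_back(f.id);
      } else {
        threshold_ = std::min(threshold_, f.largest_seqno);
      }
    }
    ++recompute_count_;
  }

  // Called after a snapshot release changes the oldest snapshot. Returns true
  // only when the cached threshold was exceeded and a recomputation ran.
  bool OnSnapshotReleased(SequenceNumber new_oldest_snapshot) {
    if (new_oldest_snapshot <= threshold_) {
      return false;  // no new file can qualify; skip the scan
    }
    ComputeMarked(new_oldest_snapshot);
    return true;
  }

  const std::vector<int>& marked() const { return marked_; }
  SequenceNumber threshold() const { return threshold_; }
  int recompute_count() const { return recompute_count_; }

 private:
  std::vector<BottomFile> files_;
  std::vector<int> marked_;
  SequenceNumber threshold_ = kMaxSeq;
  int recompute_count_ = 0;
};
```

In this model, a release that moves the oldest snapshot to a value at or below the cached threshold returns immediately without scanning the file list, which mirrors the stated goal of avoiding recomputation for most snapshot releases.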

Differential Revision: D6062044

Pulled By: ajkr

fbshipit-source-id: 123d201cf140715a7d5928e8b3cb4f9cd9f7ad21

2017-10-25 16:30:37 -07:00

buckifier

rocksdb: make buildable on aarch64

2017-08-13 17:13:54 -07:00

build_tools

fix lite build

2017-10-17 08:57:09 -07:00

cache

Fix unstable floating point exception

2017-10-20 10:12:49 -07:00

cmake

CMake: Add support for CMake packages

2017-08-28 17:14:37 -07:00

coverage

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

single-file bottom-level compaction when snapshot released

2017-10-25 16:30:37 -07:00

docs

Blog post for 5.8 release

2017-09-28 10:14:09 -07:00

env

Fix build on OpenBSD

2017-10-24 13:27:38 -07:00

examples

Pinnableslice examples and blog post

2017-08-24 12:26:07 -07:00

hdfs

Revert "comment out unused parameters"

2017-07-21 18:26:26 -07:00

include/rocksdb

single-file bottom-level compaction when snapshot released

2017-10-25 16:30:37 -07:00

java

added missing subcodes and improved error message for missing enum values

2017-10-23 16:42:07 -07:00

memtable

Enable MSVC W4 with a few exceptions. Fix warnings and bugs

2017-10-19 10:57:12 -07:00

monitoring

Fix unused var warnings in Release mode

2017-10-23 14:27:04 -07:00

options

Make FIFO compaction options dynamically configurable

2017-10-19 15:26:36 -07:00

port

Fix unused var warnings in Release mode

2017-10-23 14:27:04 -07:00

table

Add DB::Properties::kEstimateOldestKeyTime

2017-10-23 15:27:27 -07:00

third-party

Enable MSVC W4 with a few exceptions. Fix warnings and bugs

2017-10-19 10:57:12 -07:00

tools

db_stress support long-held snapshots

2017-10-20 15:26:59 -07:00

util

Make FIFO compaction options dynamically configurable

2017-10-19 15:26:36 -07:00

utilities

Return write error on reaching blob dir size limit

2017-10-25 16:30:37 -07:00

.clang-format

A script that automatically reformat affected lines

2014-01-14 12:21:24 -08:00

.gitignore

Remove leftover references to phutil_module_cache

2017-08-23 12:12:21 -07:00

.travis.yml

fix lite build

2017-10-17 08:57:09 -07:00

appveyor.yml

Add -DPORTABLE=1 to MSVC CI build

2017-08-31 16:42:48 -07:00

AUTHORS

Update RocksDB Authors File

2017-10-18 14:42:10 -07:00

CMakeLists.txt

Enable MSVC W4 with a few exceptions. Fix warnings and bugs

2017-10-19 10:57:12 -07:00

CONTRIBUTING.md

Remove the licensing description in CONTRIBUTING.md

2017-07-16 15:57:18 -07:00

COPYING

Add GPLv2 as an alternative license.

2017-04-27 18:06:12 -07:00

DEFAULT_OPTIONS_HISTORY.md

options.delayed_write_rate use the rate of rate_limiter by default.

2017-05-24 09:58:24 -07:00

DUMP_FORMAT.md

First version of rocksdb_dump and rocksdb_undump.

2015-06-19 16:24:36 -07:00

HISTORY.md

single-file bottom-level compaction when snapshot released

2017-10-25 16:30:37 -07:00

INSTALL.md

Default one to rocksdb:x64-windows

2017-09-28 16:12:24 -07:00

issue_template.md

Add a template for issues

2017-09-29 11:41:28 -07:00

LANGUAGE-BINDINGS.md

add Erlang to the list of language bindings

2017-08-28 16:43:16 -07:00

LICENSE.Apache

Change RocksDB License

2017-07-15 16:11:23 -07:00

LICENSE.leveldb

Add back the LevelDB license file

2017-07-16 18:42:18 -07:00

Makefile

PinnableSlice move assignment

2017-10-12 18:28:24 -07:00

README.md

Appveyor badge to show master branch

2016-07-26 13:54:08 -07:00

ROCKSDB_LITE.md

Optimistic Transactions

2015-05-29 14:36:35 -07:00

src.mk

PinnableSlice move assignment

2017-10-12 18:28:24 -07:00

TARGETS

PinnableSlice move assignment

2017-10-12 18:28:24 -07:00

thirdparty.inc

Introduce XPRESS compression on Windows. (#1081)

2016-04-19 22:54:24 -07:00

USERS.md

Add LogDevice to USERS.md

2017-09-25 15:56:40 -07:00

Vagrantfile

Update Vagrant file (test internal phabricator workflow)

2016-10-28 15:39:19 -07:00

WINDOWS_PORT.md

Commit both PR and internal code review changes

2015-07-07 16:58:20 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

Languages

C++ 82.1%

Java 10.3%

C 2.5%

Python 1.7%

Perl 1.1%

Other 2.1%