A library that provides an embeddable, persistent key-value store for fast storage.
Yanqin Jin dd63f04c83 First step towards handling MANIFEST write error (#6949)
Summary:
This PR provides preliminary support for handling IO error during MANIFEST write.
File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly-generated SST files referenced by the pending version edit(s), but the version edit(s) have actually been persisted in the MANIFEST, then the next recovery attempt will process the version edit(s) and then fail since the SST files have already been deleted.
One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach.
If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (set by the underlying file system), either RocksDB or the application can call `DB::Resume()`, or simply shut down and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage to the new file. If this succeeds, then RocksDB may proceed. If all recovery is completed, then file deletions will be re-enabled.
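The recovery flow above can be illustrated with a small, self-contained toy model. This is only a sketch and not RocksDB code: `ToyDb`, `PurgeObsoleteFiles()` and the boolean "write succeeded" parameters are hypothetical names invented for illustration, and the model assumes that a version edit whose MANIFEST write failed is not applied to the in-memory version.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these names are RocksDB classes or APIs.
class ToyDb {
 public:
  // Tries to record a new SST file in the current MANIFEST.
  // On a write error the MANIFEST state is unknown, so the file is kept
  // on disk and file deletions are disabled until recovery succeeds.
  bool LogAndApply(const std::string& sst_file, bool manifest_write_ok) {
    if (manifest_write_ok) {
      version_files_.push_back(sst_file);
      return true;
    }
    unconfirmed_files_.push_back(sst_file);
    deletions_enabled_ = false;
    bg_error_ = true;
    return false;
  }

  // Deletes files not referenced by the in-memory version.
  // Returns the number of files deleted (always 0 while disabled).
  std::size_t PurgeObsoleteFiles() {
    if (!deletions_enabled_) return 0;
    std::size_t deleted = unconfirmed_files_.size();
    unconfirmed_files_.clear();
    return deleted;
  }

  // Switches to a new MANIFEST holding the in-memory state. Only when
  // that write succeeds is the error cleared and deletion re-enabled.
  bool Resume(bool new_manifest_write_ok) {
    if (!bg_error_) return true;
    if (!new_manifest_write_ok) return false;
    ++manifest_number_;
    bg_error_ = false;
    deletions_enabled_ = true;
    return true;
  }

  bool deletions_enabled() const { return deletions_enabled_; }
  int manifest_number() const { return manifest_number_; }

 private:
  std::vector<std::string> version_files_;      // in-memory version
  std::vector<std::string> unconfirmed_files_;  // edit may or may not be in MANIFEST
  int manifest_number_ = 1;
  bool deletions_enabled_ = true;
  bool bg_error_ = false;
};
```

In this model, a failed `LogAndApply()` leaves the new file on disk no matter how often cleanup runs, and the file only becomes deletable after a successful `Resume()` has moved the database to a fresh MANIFEST that cannot contain the uncertain edit.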
Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will go through the MANIFEST write, possibly batching the version edits of other threads. When the leading MANIFEST writer finishes, all of the MANIFEST writing threads in this batch will have the same IOError. They will all call `ErrorHandler::SetBGError()`, in which file deletion will be disabled.
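The batching behaviour can be sketched in a single-threaded toy form. Again, this is not the RocksDB implementation: `ToyWriter`, `ToyErrorHandler` and `WriteBatchAsLeader()` are hypothetical names, and real threads are replaced by a plain loop so the example stays deterministic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for one thread waiting inside LogAndApply().
struct ToyWriter {
  std::string edit;
  bool done = false;
  bool io_error = false;
};

// Toy stand-in for the error handler; only the method name mirrors
// the text above.
class ToyErrorHandler {
 public:
  void SetBGError() {
    ++calls_;
    deletions_disabled_ = true;  // idempotent: repeated calls are harmless
  }
  int calls() const { return calls_; }
  bool deletions_disabled() const { return deletions_disabled_; }

 private:
  int calls_ = 0;
  bool deletions_disabled_ = false;
};

// The leader performs one MANIFEST write for the whole batch and the
// single outcome is shared by every writer in that batch.
void WriteBatchAsLeader(std::vector<ToyWriter>& batch, bool write_ok,
                        ToyErrorHandler& handler) {
  for (ToyWriter& w : batch) {
    w.done = true;
    w.io_error = !write_ok;
  }
  // Each writer then reacts to its own (identical) status.
  for (const ToyWriter& w : batch) {
    if (w.io_error) handler.SetBGError();
  }
}
```

Because every writer in the batch reports the same error, disabling file deletion has to tolerate being requested several times for one underlying failure, which the toy handler models with a simple flag.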

Possible future directions:
- Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as in this example, a new `BackgroundErrorReason` has to be added.

Test plan (dev server):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949

Reviewed By: anand1976

Differential Revision: D22026020

Pulled By: riversand963

fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
2020-07-09 15:50:33 -07:00
.circleci Introduce some Linux build to CircleCI (#6937) 2020-06-08 19:34:31 -07:00
.github/workflows Clean up some code related to file checksums (#6861) 2020-05-21 08:12:51 -07:00
buckifier Exclude c_test from buck build opt mode 2020-07-07 09:31:44 -07:00
build_tools Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
cache Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
db_stress_tool Fix potential overflow of unsigned type in for loop (#6902) 2020-06-02 15:05:07 -07:00
docs Log warning for high bits/key in legacy Bloom filter (#6312) 2020-01-17 19:37:35 -08:00
env Close file to avoid file-descriptor leakage (#6936) 2020-06-04 14:21:15 -07:00
examples Add missing my_pid to fprintf in multi_process_example (#6731) 2020-05-08 20:49:33 -07:00
file Ingest SST files with checksum information (#6891) 2020-06-11 14:27:36 -07:00
hdfs prototype status check enforcement (#6798) 2020-05-08 12:40:43 -07:00
include/rocksdb First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
java Add logs and stats in DeleteScheduler (#6927) 2020-06-05 09:43:04 -07:00
logging Fix info log source file display length (#5824) 2020-04-08 20:18:08 -07:00
memory C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
memtable Fix more defects reported by Coverity Scan (#6935) 2020-06-04 15:35:08 -07:00
monitoring Add logs and stats in DeleteScheduler (#6927) 2020-06-05 09:43:04 -07:00
options make L0 index/filter pinned memory usage predictable (#6911) 2020-06-09 16:51:23 -07:00
port Fix more defects reported by Coverity Scan (#6935) 2020-06-04 15:35:08 -07:00
table Update Flush policy in PartitionedIndexBuilder on switching from user-key to internal-key mode (#7096) 2020-07-09 15:48:54 -07:00
test_util Check iterator status BlockBasedTableReader::VerifyChecksumInBlocks() (#6909) 2020-06-05 11:08:25 -07:00
third-party Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
tools Add --version and --help to ldb and sst_dump (#6951) 2020-06-09 10:04:01 -07:00
trace_replay Fix more defects reported by Coverity Scan (#6935) 2020-06-04 15:35:08 -07:00
util Fix ThreadLocalTest.SequentialReadWriteTest failure when running individually (#6929) 2020-06-04 11:44:09 -07:00
utilities Move kNoExpiration to blob_db.h (#7018) 2020-06-23 14:13:48 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Make sure core components not depend on gtest (#6921) 2020-06-03 18:22:14 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Move blob_log_{format,reader,writer}.{cc,h} to db/blob/ (#6960) 2020-06-09 15:16:05 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md First step towards handling MANIFEST write error (#6949) 2020-07-09 15:50:33 -07:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md LANGUAGE-BINDINGS.md: mention python-rocksdb 2019-03-20 11:10:48 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Introduce some Linux build to CircleCI (#6937) 2020-06-08 19:34:31 -07:00
README.md Add Slack forum to README (#6773) 2020-04-30 11:00:28 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Move blob_log_{format,reader,writer}.{cc,h} to db/blob/ (#6960) 2020-06-09 15:16:05 -07:00
TARGETS Exclude c_test from buck build opt mode 2020-07-07 09:31:44 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add YugabyteDB to USERS (#6786) 2020-05-06 10:28:29 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.