405a35f8c3
Summary: In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't flush the data from WAL to L0 for all column families if possible. As a result, not all column families can increase their log_numbers, and min_log_number_to_keep won't change. For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change. If we persist a new MANIFEST with advanced log_numbers for some column families, then during a second crash after persisting the MANIFEST, RocksDB will see some column families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail. As a solution, RocksDB will persist the new MANIFEST after successfully syncing the new WAL. If a future recovery starts from the new MANIFEST, then it means the new WAL is successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point. If future recovery starts from the old MANIFEST, it means the writing the new MANIFEST failed. We won't have the "SST ahead of WAL" error. Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922 Test Plan: 1. Update unit tests to fail without this change 2. make crast_test -j Branch with unit test and no fix https://github.com/facebook/rocksdb/pull/9942 to keep track of unit test (without fix) Reviewed By: riversand963 Differential Revision: D36043701 Pulled By: akankshamahajan15 fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9 |
||
---|---|---|
.circleci | ||
.github/workflows | ||
buckifier | ||
build_tools | ||
cache | ||
cmake | ||
coverage | ||
db | ||
db_stress_tool | ||
docs | ||
env | ||
examples | ||
file | ||
fuzz | ||
include/rocksdb | ||
java | ||
logging | ||
memory | ||
memtable | ||
microbench | ||
monitoring | ||
options | ||
plugin | ||
port | ||
table | ||
test_util | ||
third-party | ||
tools | ||
trace_replay | ||
util | ||
utilities | ||
.clang-format | ||
.gitignore | ||
.lgtm.yml | ||
.travis.yml | ||
.watchmanconfig | ||
AUTHORS | ||
CMakeLists.txt | ||
CODE_OF_CONDUCT.md | ||
common.mk | ||
CONTRIBUTING.md | ||
COPYING | ||
crash_test.mk | ||
DEFAULT_OPTIONS_HISTORY.md | ||
DUMP_FORMAT.md | ||
HISTORY.md | ||
INSTALL.md | ||
issue_template.md | ||
LANGUAGE-BINDINGS.md | ||
LICENSE.Apache | ||
LICENSE.leveldb | ||
Makefile | ||
PLUGINS.md | ||
README.md | ||
ROCKSDB_LITE.md | ||
rocksdb.pc.in | ||
src.mk | ||
TARGETS | ||
thirdparty.inc | ||
USERS.md | ||
Vagrantfile | ||
WINDOWS_PORT.md |
RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)
This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.
Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples
See the github wiki for more explanation.
The public interface is in include/
. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.
Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.
License
RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.