e626016545
Summary: Previous code may call `~ColumnFamilyData` in `DBImpl::AtomicFlushMemTablesToOutputFiles` if the column family is dropped or `cfd->IsFlushPending() == false`. In `~ColumnFamilyData`, the db mutex is released briefly and re-acquired. This can cause correctness issue. The reason is as follows. Assume there are more bg flush threads. After bg_flush_thr1 releases the db mutex, bg_flush_thr2 can grab it and pop an element from the flush queue. This will cause bg_flush_thr2 to accidentally pick some memtables which should have been picked by bg_flush_thr1. To make the matter worse, bg_flush_thr2 can clear `flush_requested_` flag for the memtable list, causing a subsequent call to `MemTableList::IsFlushPending()` by bg_flush_thr1 to return false, which is wrong. The fix is to delay `ColumnFamilyData::Unref` and `~ColumnFamilyData` for column families not selected for flush until `AtomicFlushMemTablesToOutputFiles` returns. Furthermore, a bg flush thread should not clear `MemTableList::flush_requested_` in `MemTableList::PickMemtablesToFlush` unless atomic flush is not used **or** the memtable list does not have unpicked memtables. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5294 Differential Revision: D15295297 Pulled By: riversand963 fbshipit-source-id: 03b101205ca22c242647cbf488bcf0ed80b2ecbd |
||
---|---|---|
buckifier | ||
build_tools | ||
cache | ||
cmake | ||
coverage | ||
db | ||
docs | ||
env | ||
examples | ||
hdfs | ||
include/rocksdb | ||
java | ||
memtable | ||
monitoring | ||
options | ||
port | ||
table | ||
third-party/gtest-1.7.0/fused-src/gtest | ||
tools | ||
util | ||
utilities | ||
.clang-format | ||
.gitignore | ||
.lgtm.yml | ||
.travis.yml | ||
appveyor.yml | ||
AUTHORS | ||
CMakeLists.txt | ||
CODE_OF_CONDUCT.md | ||
CONTRIBUTING.md | ||
COPYING | ||
DEFAULT_OPTIONS_HISTORY.md | ||
defs.bzl | ||
DUMP_FORMAT.md | ||
HISTORY.md | ||
INSTALL.md | ||
issue_template.md | ||
LANGUAGE-BINDINGS.md | ||
LICENSE.Apache | ||
LICENSE.leveldb | ||
Makefile | ||
README.md | ||
ROCKSDB_LITE.md | ||
src.mk | ||
TARGETS | ||
thirdparty.inc | ||
USERS.md | ||
Vagrantfile | ||
WINDOWS_PORT.md |
RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)
This code is a library that forms the core building block for a fast key value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it specially suitable for storing multiple terabytes of data in a single database.
Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples
See the github wiki for more explanation.
The public interface is in include/
. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.
Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/
License
RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.