A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Levi Tamasi 3f7e929865 Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605)
Summary:
The `ColumnFamilyData::UnrefAndTryDelete` code currently on the trunk
unlocks the DB mutex before destroying the `ThreadLocalPtr` holding
the per-thread `SuperVersion` pointers when the only remaining reference
is the back reference from `super_version_`. The idea behind this was to
break the circular dependency between `ColumnFamilyData` and `SuperVersion`:
when the penultimate reference goes away, `ColumnFamilyData` can clean up
the `SuperVersion`, which can in turn clean up `ColumnFamilyData`. (Assuming there
is a `SuperVersion` and it is not referenced by anything else.) However,
unlocking the mutex throws a wrench in this plan by making it possible for another thread
to jump in and take another reference to the `ColumnFamilyData`, keeping the
object alive in a zombie `ThreadLocalPtr`-less state. This can cause issues like
https://github.com/facebook/rocksdb/issues/8440 ,
https://github.com/facebook/rocksdb/issues/8382 ,
and might also explain the `was_last_ref` assertion failures from the `ColumnFamilySet`
destructor we sometimes observe during close in our stress tests.

Digging through the archives, this unlocking goes way back to 2014 (or earlier). The original
rationale was that `SuperVersionUnrefHandle` used to lock the mutex so it can call
`SuperVersion::Cleanup`; however, this logic turned out to be deadlock-prone.
https://github.com/facebook/rocksdb/pull/3510 fixed the deadlock but left the
unlocking in place. https://github.com/facebook/rocksdb/pull/6147 then introduced
the circular dependency and associated cleanup logic described above (in order
to enable iterators to keep the `ColumnFamilyData` for dropped column families alive),
and moved the unlocking-relocking snippet to its present location in `UnrefAndTryDelete`.
Finally, https://github.com/facebook/rocksdb/pull/7749 fixed a memory leak but
apparently exacerbated the race by (otherwise correctly) switching to `UnrefAndTryDelete`
in `SuperVersion::Cleanup`.

The patch simply eliminates the unlocking and relocking, which has been unnecessary
ever since https://github.com/facebook/rocksdb/issues/3510 made `SuperVersionUnrefHandle` lock-free.
This closes the window during which another thread could increase the reference count,
and hopefully fixes the issues above.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8605

Test Plan: Ran `make check` and stress tests locally.

Reviewed By: pdillinger

Differential Revision: D30051035

Pulled By: ltamasi

fbshipit-source-id: 8fe559e4b4ad69fc142579f8bc393ef525918528
2021-08-02 18:12:11 -07:00
.circleci Add missing steps for cmake build (#8524) 2021-07-15 13:37:49 -07:00
.github/workflows Update clang-format-diff.py path (#7944) 2021-02-09 12:49:38 -08:00
buckifier Modify script which generates TARGETS (#8366) 2021-06-04 16:28:59 -07:00
build_tools Pass extra db_stress args to fbcode crash tests (#8587) 2021-07-27 12:46:47 -07:00
cache Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605) 2021-08-02 18:12:11 -07:00
db_stress_tool Add experimental mempurge policy flag to db_stress. (#8588) 2021-07-28 13:27:58 -07:00
docs Bump addressable from 2.7.0 to 2.8.0 in /docs (#8515) 2021-07-12 17:06:07 -07:00
env Make EncryptionProvider and BlockCipher into Customizable objects (#8354) 2021-07-16 07:58:51 -07:00
examples make:Fix c header prototypes (#7994) 2021-03-09 20:44:23 -08:00
file Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
fuzz Make EventListener into a Customizable Class (#8473) 2021-07-27 07:47:02 -07:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb Make EventListener into a Customizable Class (#8473) 2021-07-27 07:47:02 -07:00
java Allow to use a string as a delimiter in StringAppendOperator (#8536) 2021-08-02 16:50:41 -07:00
logging Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
memory Use thread-safe strerror_r() to get error message (#8087) 2021-03-24 23:07:27 -07:00
memtable Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) 2021-07-07 11:14:05 -07:00
microbench Add micro-benchmark support (#8493) 2021-07-08 18:22:45 -07:00
monitoring Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
options Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
plugin Makefile support to statically link external plugin code (#7918) 2021-02-10 08:35:34 -08:00
port Standardize on GCC for TSAN conditional compilation (#8543) 2021-07-15 23:50:00 -07:00
table Fix Get() return status when block cache is disabled (#8485) 2021-07-13 18:13:24 -07:00
test_util Add CreateFrom methods to Env/FileSystem (#8174) 2021-06-15 03:43:48 -07:00
third-party Fix a compilation error in CircleCI vs2019 CXX20 (#8090) 2021-03-23 10:28:04 -07:00
tools Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
trace_replay Add MultiGet to replay (#8577) 2021-07-27 13:56:15 -07:00
util Fix insecure internal API for GetImpl (#8590) 2021-07-29 17:23:01 -07:00
utilities Allow to use a string as a delimiter in StringAppendOperator (#8536) 2021-08-02 16:50:41 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Move arm build from travis to circleci (#8203) 2021-04-19 20:07:02 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Add micro-benchmark support (#8493) 2021-07-08 18:22:45 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Allow to use a string as a delimiter in StringAppendOperator (#8536) 2021-08-02 16:50:41 -07:00
INSTALL.md Update installation instructions (#8158) 2021-04-06 16:02:04 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Add RestoreDBFromLatestBackup to C API, add new C# package (#7092) 2020-07-08 11:56:41 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Minor Makefile update to exclude microbench as dependency (#8523) 2021-07-16 15:36:51 -07:00
PLUGINS.md Add ZenFS to plugin list (#8218) 2021-04-22 11:12:40 -07:00
README.md Fix the CI badge for ppc64le Jenkins (#7561) 2020-10-16 09:00:56 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Add micro-benchmark support (#8493) 2021-07-08 18:22:45 -07:00
TARGETS Add an internal iterator that can measure the inflow of blobs (#8443) 2021-06-23 10:25:47 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add Apache Doris to USERS (#7865) 2021-01-19 15:31:56 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.