A library that provides an embeddable, persistent key-value store for fast storage.
Mike Kolupaev 97307d888f Fix deadlock in ColumnFamilyData::InstallSuperVersion()
Summary:
Deadlock: a memtable flush holds DB::mutex_ and calls ThreadLocalPtr::Scrape(), which locks ThreadLocalPtr mutex; meanwhile, a thread exit handler locks ThreadLocalPtr mutex and calls SuperVersionUnrefHandle, which tries to lock DB::mutex_.

This deadlock is hit all the time on our workload. It blocks our release.

In general, the problem is that ThreadLocalPtr takes an arbitrary callback and calls it while holding a lock on a global mutex. The same global mutex is (at least in some cases) locked by almost all ThreadLocalPtr methods, on any instance of ThreadLocalPtr. So, there'll be a deadlock if the callback tries to do anything to any instance of ThreadLocalPtr, or waits for another thread to do so.

So, probably the only safe way to use ThreadLocalPtr callbacks is to do only simple, lock-free things in them.

This PR fixes the deadlock by making sure that local_sv_ never holds the last reference to a SuperVersion, and therefore SuperVersionUnrefHandle never has to do any nontrivial cleanup.

I also searched for other uses of ThreadLocalPtr to see if they may have similar bugs. There's only one other use, in transaction_lock_mgr.cc, and it looks fine.
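
The lock-ordering rule described above can be illustrated with a minimal, self-contained sketch. This is not RocksDB's code: `Version`, `SlotRegistry`, `UnrefHandle` and `RetireVersion` are hypothetical stand-ins for SuperVersion, the global state behind ThreadLocalPtr, SuperVersionUnrefHandle and the flush path, and the whole thing assumes a plain atomic reference count. It only shows the idea of the fix: the cached slots never hold the last reference, so the callback that runs under the registry mutex is a single atomic decrement and never needs the DB mutex.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a reference-counted SuperVersion.
struct Version {
  std::atomic<int> refs{0};
  void Ref() { refs.fetch_add(1, std::memory_order_relaxed); }
  // Returns true when the caller has just dropped the last reference.
  bool Unref() { return refs.fetch_sub(1, std::memory_order_acq_rel) == 1; }
};

// Hypothetical stand-in for the global state behind ThreadLocalPtr.
class SlotRegistry {
 public:
  // Caches v in a slot; the slot takes its own reference.
  void Cache(Version* v) {
    std::lock_guard<std::mutex> guard(mu_);
    v->Ref();
    slots_.push_back(v);
  }

  // Runs handler on every cached pointer WHILE HOLDING mu_, then clears
  // the slots. Any handler that blocks on another mutex here can deadlock.
  template <typename Handler>
  void Scrape(Handler handler) {
    std::lock_guard<std::mutex> guard(mu_);
    for (Version* v : slots_) handler(v);
    slots_.clear();
  }

 private:
  std::mutex mu_;
  std::vector<Version*> slots_;
};

// Safe handler: one atomic decrement, no locks, no cleanup. The owner's
// reference guarantees that a slot never drops the last reference.
inline void UnrefHandle(Version* v) {
  bool was_last = v->Unref();
  assert(!was_last);
  (void)was_last;  // silence unused-variable warnings when NDEBUG is set
}

// Retires old_version: takes db_mutex, scrapes the cached slots, then drops
// the owner's reference and does the real cleanup outside any callback.
// Returns the number of versions destroyed (0 or 1).
inline int RetireVersion(std::mutex& db_mutex, SlotRegistry& registry,
                         Version* old_version) {
  std::lock_guard<std::mutex> guard(db_mutex);
  registry.Scrape(UnrefHandle);  // lock order: db_mutex, then registry mutex
  if (old_version->Unref()) {
    delete old_version;
    return 1;
  }
  return 0;
}
```

In this sketch the only lock order is db_mutex followed by the registry mutex. Because `UnrefHandle` never tries to take db_mutex, a caller that runs it under the registry mutex alone (for example a thread exit handler) cannot create the inverse order.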
Closes https://github.com/facebook/rocksdb/pull/3510

Reviewed By: sagar0

Differential Revision: D7005346

Pulled By: al13n321

fbshipit-source-id: 37575591b84f07a891d6659e87e784660fde815f
2018-02-16 08:13:34 -08:00
buckifier Suppress lint in old files 2018-01-29 12:56:42 -08:00
build_tools Legocastle job to report lite build binary size to scuba 2018-02-15 17:27:24 -08:00
cache Minor typo in comment (s/pro/pri) 2018-02-03 18:27:14 -08:00
cmake add missing config checks to CMakeLists.txt 2017-11-30 22:57:00 -08:00
coverage Suppress lint in old files 2018-01-29 12:56:42 -08:00
db Fix deadlock in ColumnFamilyData::InstallSuperVersion() 2018-02-16 08:13:34 -08:00
docs Adding blog post for 5.10.2 release 2018-02-13 11:56:59 -08:00
env Several small "fixes" 2018-02-15 16:57:37 -08:00
examples Pinnableslice examples and blog post 2017-08-24 12:26:07 -07:00
hdfs Suppress lint in old files 2018-01-29 12:56:42 -08:00
include/rocksdb Unbreak MemTableRep API change 2018-02-15 17:27:24 -08:00
java Java: Add copy constructors for various option classes 2018-02-02 10:57:28 -08:00
memtable Unbreak MemTableRep API change 2018-02-15 17:27:24 -08:00
monitoring Compilation fixes for powerpc build, -Wparentheses-equality error and missing header guards 2018-02-09 14:12:43 -08:00
options options: Fix coverity issues 2018-02-01 14:27:42 -08:00
port Explictly fail writes if key or value is not smaller than 4GB 2018-02-09 14:57:54 -08:00
table Several small "fixes" 2018-02-15 16:57:37 -08:00
third-party Enable MSVC W4 with a few exceptions. Fix warnings and bugs 2017-10-19 10:57:12 -07:00
tools Legocastle job to report lite build binary size to scuba 2018-02-15 17:27:24 -08:00
util Fix deadlock in ColumnFamilyData::InstallSuperVersion() 2018-02-16 08:13:34 -08:00
utilities Several small "fixes" 2018-02-15 16:57:37 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Remove leftover references to phutil_module_cache 2017-08-23 12:12:21 -07:00
.travis.yml CMake cross platform Java support and add JNI to travis 2017-11-28 12:27:53 -08:00
appveyor.yml Upgrade Appveyor to VS2017 2018-02-01 13:57:01 -08:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt CMake changes for CRC32 Optimization on PowerPC 2018-01-23 16:57:11 -08:00
CODE_OF_CONDUCT.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Add delay before flush in CompactRange to avoid write stalling 2018-02-12 15:42:47 -08:00
INSTALL.md FreeBSD build support for RocksDB and RocksJava 2018-01-11 13:29:55 -08:00
issue_template.md Add a template for issues 2017-09-29 11:41:28 -07:00
LANGUAGE-BINDINGS.md Add Nim to the list of language bindings 2018-01-29 09:57:46 -08:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Suppress UBSAN error in finer guanularity 2018-02-13 12:18:07 -08:00
README.md Add Jenkins for PPC64le build status badge 2018-01-11 14:57:45 -08:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
src.mk Refactor ReadBlockContents() 2017-12-11 15:27:32 -08:00
TARGETS WritePrepared Txn: make buck tests parallel 2017-12-18 14:42:09 -08:00
thirdparty.inc Make Windows dep switches compatible with other builds 2018-01-05 14:56:54 -08:00
USERS.md Added ProfaneDB 2017-11-19 10:11:44 -08:00
Vagrantfile Update Vagrant file (test internal phabricator workflow) 2016-10-28 15:39:19 -07:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/