A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Yanqin Jin bbdaf63d0f Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9686

According to https://www.cplusplus.com/reference/deque/deque/back/,
"
The container is accessed (neither the const nor the non-const versions modify the container).
The last element is potentially accessed or modified by the caller. Concurrently accessing or modifying other elements is safe.
"

Also according to https://www.cplusplus.com/reference/deque/deque/pop_front/,
"
The container is modified.
The first element is modified. Concurrently accessing or modifying other elements is safe (although see iterator validity above).
"
In RocksDB, we never pop the last element of `DBImpl::alive_log_files_`. We have been
exploiting this fact and the above two properties when ensuring correctness when
`DBImpl::alive_log_files_` may be accessed concurrently. Specifically, it can be accessed
in the write path when db mutex is released. Sometimes, the log_mute_ is held. It can also be accessed in `FindObsoleteFiles()`
when db mutex is always held. It can also be accessed
during recovery when db mutex is also held.
Given the fact that we never pop the last element of alive_log_files_, we currently do not
acquire additional locks when accessing it in `WriteToWAL()` as follows
```
alive_log_files_.back().AddSize(log_entry.size());
```

This is problematic.

Check source code of deque.h
```
  back() _GLIBCXX_NOEXCEPT
  {
__glibcxx_requires_nonempty();
...
  }

  pop_front() _GLIBCXX_NOEXCEPT
  {
...
  if (this->_M_impl._M_start._M_cur
      != this->_M_impl._M_start._M_last - 1)
    {
      ...
      ++this->_M_impl._M_start._M_cur;
    }
  ...
  }
```

`back()` will actually call `__glibcxx_requires_nonempty()` first.
If `__glibcxx_requires_nonempty()` is enabled and not an empty macro,
it will call `empty()`
```
bool empty() {
return this->_M_impl._M_finish == this->_M_impl._M_start;
}
```
You can see that it will access `this->_M_impl._M_start`, racing with `pop_front()`.
Therefore, TSAN will actually catch the bug in this case.

To be able to use TSAN on our library and unit tests, we should always coordinate
concurrent accesses to STL containers properly.

We need to pass information about db mutex and log mutex into `WriteToWAL()`, otherwise
it's impossible to know which mutex to acquire inside the function.

To fix this, we can catch the tail of `alive_log_files_` by reference, so that we do not have to call `back()` in `WriteToWAL()`.

Reviewed By: pdillinger

Differential Revision: D34780309

fbshipit-source-id: 1def9821f0c437f2736c6a26445d75890377889b
2022-03-14 18:49:55 -07:00
.circleci Reduce Windows build parallelism number (#9687) 2022-03-12 11:45:10 -08:00
.github/workflows Use released clang-format instead of the one from dev branch (#9646) 2022-03-01 10:51:38 -08:00
buckifier regenerate config jsons, reduce noise (#9644) 2022-03-01 15:09:45 -08:00
build_tools Reenable s390x platform_dependent travis job (#9631) 2022-03-01 13:50:41 -08:00
cache typo(clock_cache) fix incomplete message typo (#9638) 2022-03-01 10:57:09 -08:00
cmake gcc-11 and cmake related cleanup (#9286) 2021-12-17 17:04:35 -08:00
coverage Remove asan_symbolize.py for internal asan build (#8737) 2021-09-07 15:39:11 -07:00
db Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686) 2022-03-14 18:49:55 -07:00
db_stress_tool Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
docs Update Githubpages version (#9670) 2022-03-07 14:48:06 -08:00
env Fix build for io_uring (#9690) 2022-03-12 22:12:18 -08:00
examples Remove BlockBasedTableOptions.hash_index_allow_collision (#9454) 2022-03-01 13:58:02 -08:00
file Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
fuzz Fix compilation errors and add fuzzers to CircleCI (#9420) 2022-02-01 10:32:15 -08:00
include/rocksdb Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
java NPE in Java_org_rocksdb_ColumnFamilyOptions_setSstPartitionerFactory (#9622) 2022-03-14 14:12:30 -07:00
logging Use system-wide thread ID in info log lines (#9164) 2021-11-12 19:46:06 -08:00
memory Return different Status based on ObjectRegistry::NewObject calls (#9333) 2022-02-11 05:11:24 -08:00
memtable Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
microbench regenerate config jsons, reduce noise (#9644) 2022-03-01 15:09:45 -08:00
monitoring Dedicate cacheline for DB mutex (#9637) 2022-02-27 11:36:54 -08:00
options remove redundant assignment code for member state (#9665) 2022-03-08 11:03:56 -08:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port #include <winioctl.h> as MSDN prescribes (#9612) 2022-03-13 17:01:21 -07:00
table compression_per_level should be used for flush and changeable (#9658) 2022-03-07 18:06:19 -08:00
test_util Remove BlockBasedTableOptions.hash_index_allow_collision (#9454) 2022-03-01 13:58:02 -08:00
third-party Work around some new clang-analyze failures (#9515) 2022-02-07 18:24:36 -08:00
tools Reactivate Mempurge feature in crash test. (#9684) 2022-03-11 15:47:30 -08:00
trace_replay Added TraceOptions::preserve_write_order (#9334) 2021-12-28 15:04:26 -08:00
util Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
utilities Fix clang-analyze by adding assertion (#9682) 2022-03-09 10:13:02 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Reenable s390x platform_dependent travis job (#9631) 2022-03-01 13:50:41 -08:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Avoid popcnt on Windows when unavailable and in portable builds (#9680) 2022-03-09 21:07:31 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363) 2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Fix a TSAN-reported bug caused by concurrent accesss to std::deque (#9686) 2022-03-14 18:49:55 -07:00
INSTALL.md Clarify Google benchmark < 1.6.0 in INSTALL.md (#9505) 2022-02-07 10:00:46 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727) 2021-09-01 15:26:34 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
PLUGINS.md Move RADOS support to separate repo (#9206) 2022-01-24 22:50:07 -08:00
README.md README: De-list slack channel, list Google group (#9387) 2022-01-18 08:19:48 -08:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
TARGETS Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add Solana's RocksDB use case in USERS.md (#9558) 2022-02-16 09:23:01 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.