A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Yanqin Jin 6134ce6444 Perform post-flush updates of memtable list in a callback (#6069)
Summary:
Currently, the following interleaving of events can lead to SuperVersion containing both immutable memtables as well as the resulting L0. This can cause Get to return incorrect result if there are merge operands. This may also affect other operations such as single deletes.

```
  time  main_thr  bg_flush_thr  bg_compact_thr  compact_thr  set_opts_thr
0  |                                                         WriteManifest:0
1  |                                           issue compact
2  |                                 wait
3  |   Merge(counter)
4  |   issue flush
5  |                   wait
6  |                                                         WriteManifest:1
7  |                                 wake up
8  |                                 write manifest
9  |                  wake up
10 |  Get(counter)
11 |                  remove imm
   V
```

The reason behind is that: one bg flush thread's installing new `Version` can be batched and performed by another thread that is the "leader" MANIFEST writer. This bg thread removes the memtables from current super version only after `LogAndApply` returns. After the leader MANIFEST writer signals (releasing mutex) this bg flush thread, it is possible that another thread sees this cf with both memtables (whose data have been flushed to the newest L0) and the L0 before this bg flush thread removes the memtables.

To address this issue, each bg flush thread can pass a callback function to `LogAndApply`. The callback is responsible for removing the memtables. Therefore, the leader MANIFEST writer can call this callback and remove the memtables before releasing the mutex.

Test plan (devserver)
```
$make merge_test
$./merge_test --gtest_filter=MergeTest.MergeWithCompactionAndFlush
$make check
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6069

Reviewed By: cheng-chang

Differential Revision: D18790894

Pulled By: riversand963

fbshipit-source-id: e41bd600c0448b4f4b2deb3f7677f95e3076b4ed
2020-10-26 18:23:01 -07:00
.circleci Add macos tests to circleci (#7536) 2020-10-12 10:46:40 -07:00
.github/workflows Clean up some code related to file checksums (#6861) 2020-05-21 08:12:51 -07:00
buckifier Add a rocksdb lib target with link_whole=True (#7466) 2020-09-30 22:50:32 -07:00
build_tools New bit manipulation functions and 128-bit value library (#7338) 2020-09-03 09:32:59 -07:00
cache Introduce BlobFileCache and add support for blob files to Get() (#7540) 2020-10-15 13:04:47 -07:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db Perform post-flush updates of memtable list in a callback (#6069) 2020-10-26 18:23:01 -07:00
db_stress_tool db_stress prints key in Hex (#7533) 2020-10-13 12:38:59 -07:00
docs Update github-pages to v207 (#7235) 2020-08-12 09:26:24 -07:00
env Add a host location property to TableProperties (#7479) 2020-10-19 11:38:48 -07:00
examples Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
file Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb add StartTrace and EndTrace to stackable_db (#7585) 2020-10-22 17:31:54 -07:00
java Small JNI improvements (#7371) 2020-10-14 22:23:56 -07:00
logging Add more tests to ASSERT_STATUS_CHECKED (#7367) 2020-09-16 15:48:07 -07:00
memory C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
memtable Test for LoadLatestOptions (#7554) 2020-10-14 22:28:55 -07:00
monitoring Add Stats for MultiGet (#7366) 2020-10-07 13:28:48 -07:00
options Revert Statuses returned from pre-Configurable options functions (#7563) 2020-10-20 11:53:28 -07:00
port Fix MSVC-related build issues (#7439) 2020-10-01 09:23:04 -07:00
table Make parallel compression optimization code tidier (#6888) 2020-10-22 11:05:25 -07:00
test_util Allow compaction iterator to perform garbage collection (#7556) 2020-10-23 22:59:46 -07:00
third-party Fix MSVC-related build issues (#7439) 2020-10-01 09:23:04 -07:00
tools Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
trace_replay Genericize and clean up FastRange (#7436) 2020-09-28 11:35:00 -07:00
util Ribbon: initial (general) algorithms and basic unit test (#7491) 2020-10-25 20:44:49 -07:00
utilities Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Re-add extra compiler flags when building unittests (#7437) 2020-09-25 16:44:43 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Update Travis config for broken snapd on ppc (#7381) 2020-09-14 14:23:13 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Ribbon: initial (general) algorithms and basic unit test (#7491) 2020-10-25 20:44:49 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Perform post-flush updates of memtable list in a callback (#6069) 2020-10-26 18:23:01 -07:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Add RestoreDBFromLatestBackup to C API, add new C# package (#7092) 2020-07-08 11:56:41 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Ribbon: initial (general) algorithms and basic unit test (#7491) 2020-10-25 20:44:49 -07:00
README.md Fix the CI badge for ppc64le Jenkins (#7561) 2020-10-16 09:00:56 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Ribbon: initial (general) algorithms and basic unit test (#7491) 2020-10-25 20:44:49 -07:00
TARGETS Ribbon: initial (general) algorithms and basic unit test (#7491) 2020-10-25 20:44:49 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add YugabyteDB to USERS (#6786) 2020-05-06 10:28:29 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.