A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
fanrui03 67d72fb5dc Fix checkpoint stuck (#7921)
Summary:
## 1. Bug description:

When RocksDB Checkpoint, it may be stuck in `WaitUntilFlushWouldNotStallWrites` method.

## 2. Simple analysis of the reasons:

### 2.1 Configuration parameters:

```yaml
Compaction Style : Universal

max_write_buffer_number : 4
min_write_buffer_number_to_merge : 3
```

Checkpoint is usually very fast. When the Checkpoint is executed, `WaitUntilFlushWouldNotStallWrites` is called. If there are 2 Immutable MemTables, which are less than `min_write_buffer_number_to_merge`, they will not be flushed. But will enter this code.

```c++
// method: GetWriteStallConditionAndCause
if (mutable_cf_options.max_write_buffer_number> 3 &&
              num_unflushed_memtables >=
                  mutable_cf_options.max_write_buffer_number-1) {
     return {WriteStallCondition::kDelayed, WriteStallCause::kMemtableLimit};
}
```

code link: fbed72f03c/db/column_family.cc (L847)

Checkpoint thought there was a FlushJob, but it didn't. So will always wait.

### 2.2 solution:

Increase the restriction: the `number of Immutable MemTable` >= `min_write_buffer_number_to_merge will wait`.

If there are other better solutions, you can correct me.

### 2.3 Code that can reproduce the problem:

https://github.com/1996fanrui/fanrui-learning/blob/flink-1.12/module-java/src/main/java/com/dream/rocksdb/RocksDBCheckpointStuck.java

## 3. Interesting point

This bug will be triggered only when `the number of sorted runs >= level0_file_num_compaction_trigger`.

Because there is a break in WaitUntilFlushWouldNotStallWrites.

```c++
if (cfd->imm()->NumNotFlushed() <
        cfd->ioptions()->min_write_buffer_number_to_merge &&
    vstorage->l0_delay_trigger_count() <
        mutable_cf_options.level0_file_num_compaction_trigger) {
  break;
}
```

code link: fbed72f03c/db/db_impl/db_impl_compaction_flush.cc (L1974)

Universal may have `l0_delay_trigger_count() >= level0_file_num_compaction_trigger`, so this bug is triggered.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7921

Reviewed By: jay-zhuang

Differential Revision: D26900559

Pulled By: ajkr

fbshipit-source-id: 133c1252dad7393753f04a47590b68c7d8e670df
2021-03-09 02:21:25 -08:00
.circleci Add circleci format_compatible nightly build (#7926) 2021-02-09 20:48:53 -08:00
.github/workflows Update clang-format-diff.py path (#7944) 2021-02-09 12:49:38 -08:00
buckifier Update zstd in buck build (#7923) 2021-02-08 14:46:01 -08:00
build_tools Extract test cases correctly in run_ci_db_test.ps1 script (#7989) 2021-02-23 14:25:42 -08:00
cache Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db Fix checkpoint stuck (#7921) 2021-03-09 02:21:25 -08:00
db_stress_tool Enable compact filter for blob in dbstress and dbbench (#8011) 2021-03-01 17:24:47 -08:00
docs Update github-pages and dependencies (#7850) 2021-01-11 12:48:01 -08:00
env Wal recovery failure with encryption due to zero bytes WAL size. (#7924) 2021-02-05 12:40:52 -08:00
examples Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
file Make BlockBasedTable::kMaxAutoReadAheadSize configurable (#7951) 2021-02-23 16:54:08 -08:00
fuzz Remove Legacy and Custom FileWrapper classes from header files (#7851) 2021-01-28 22:10:32 -08:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb Fix mis-spelling (#8001) 2021-03-09 01:19:18 -08:00
java Limit buffering for collecting samples for compression dictionary (#7970) 2021-02-19 14:09:54 -08:00
logging Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
memory slightly improve jemalloc allocator API header (#7592) 2020-10-28 13:47:12 -07:00
memtable Feature: add SetBufferSize() so that managed size can be dynamic (#7961) 2021-03-03 14:22:11 -08:00
monitoring make PerfStepTimer struct smaller by reordering members (#7931) 2021-03-08 21:33:15 -08:00
options Make BlockBasedTable::kMaxAutoReadAheadSize configurable (#7951) 2021-02-23 16:54:08 -08:00
plugin Makefile support to statically link external plugin code (#7918) 2021-02-10 08:35:34 -08:00
port Update win_logger.cc : assert failed when return value not checked. (-DROCKSDB_ASSERT_STATUS_CHECKED) (#7955) 2021-02-18 16:34:10 -08:00
table Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
test_util Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
third-party Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
tools Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
trace_replay Introduce a new trace file format (v 0.2) for better extension (#7977) 2021-02-18 23:05:35 -08:00
util Update compaction statistics to include the amount of data read from blob files (#8022) 2021-03-04 00:43:48 -08:00
utilities Support retrieving checksums for blob files from the MANIFEST when checkpointing (#8003) 2021-03-01 20:07:07 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml use LIB_MODE=shared on Travis make commands (#8043) 2021-03-08 17:21:24 -08:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Clarifying comments for Read() APIs (#8029) 2021-03-05 14:42:19 -08:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Add RestoreDBFromLatestBackup to C API, add new C# package (#7092) 2020-07-08 11:56:41 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Compaction filter support for (new) BlobDB (#7974) 2021-02-25 16:32:35 -08:00
PLUGINS.md Makefile support to statically link external plugin code (#7918) 2021-02-10 08:35:34 -08:00
README.md Fix the CI badge for ppc64le Jenkins (#7561) 2020-10-16 09:00:56 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
TARGETS Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add Apache Doris to USERS (#7865) 2021-01-19 15:31:56 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.