A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Baptiste Lemaire 7bed6595f3 Fix mempurge crash reported in #8958 (#9671)
Summary:
Change the `MemPurge` code to address a failure during a crash test reported in https://github.com/facebook/rocksdb/issues/8958.

### Details and results of the crash investigation:
These failures happened in a specific scenario where the list of immutable tables was composed of 2 or more memtables, and the last memtable was the output of a previous `Mempurge` operation. Because the `PickMemtablesToFlush` function included a sorting of the memtables (previous PR related to the Mempurge project), and because the `VersionEdit` of the flush class is piggybacked onto a single one of these memtables, the `VersionEdit` was not properly selected and applied to the `VersionSet` of the DB. Since the `VersionSet` was not edited properly, the database was losing track of the SST file created during the flush process, which was subsequently deleted (and as you can expect, caused the tests to crash).
The following command consistently failed, which was quite convenient to investigate the issue:
`$ while rm -rf /dev/shm/single_stress && ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/single_stress --experimental_mempurge_threshold=5.493146827397074 --flush_one_in=10000 --reopen=0 --write_buffer_size=262144 --value_size_mult=33 --max_write_buffer_number=3 -ops_per_thread=10000; do : ; done`

### Solution proposed
The memtables are no longer sorted based on their `memtableID` in the `PickMemtablesToFlush` function. Additionally, the `next_log_number` of the memtable created as an output of the `Mempurge` function now takes in the correct value (the log number of the first memtable being mempurged). Finally, the VersionEdit object of the flush class now takes the maximum `next_log_number` of the stack of memtables being flushed, which doesnt change anything when Mempurge is `off` but becomes necessary when Mempurge is `on`.

### Testing of the solution
The following command no longer fails:
``$ while rm -rf /dev/shm/single_stress && ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/single_stress --experimental_mempurge_threshold=5.493146827397074 --flush_one_in=10000 --reopen=0 --write_buffer_size=262144 --value_size_mult=33 --max_write_buffer_number=3 -ops_per_thread=10000; do : ; done``
Additionally, I ran `db_crashtest` (`whitebox` and `blackbox`) for 2.5 hours with MemPurge on and did not observe any crash.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9671

Reviewed By: pdillinger

Differential Revision: D34697424

Pulled By: bjlemaire

fbshipit-source-id: d1ab675b361904351ac81a35c184030e52222874
2022-03-10 15:16:55 -08:00
.circleci Remove remaining SKIP_LINK=1 in circleci config (#9669) 2022-03-07 15:23:25 -08:00
.github/workflows Use released clang-format instead of the one from dev branch (#9646) 2022-03-01 10:51:38 -08:00
buckifier regenerate config jsons, reduce noise (#9644) 2022-03-01 15:09:45 -08:00
build_tools Reenable s390x platform_dependent travis job (#9631) 2022-03-01 13:50:41 -08:00
cache typo(clock_cache) fix incomplete message typo (#9638) 2022-03-01 10:57:09 -08:00
cmake gcc-11 and cmake related cleanup (#9286) 2021-12-17 17:04:35 -08:00
coverage Remove asan_symbolize.py for internal asan build (#8737) 2021-09-07 15:39:11 -07:00
db Fix mempurge crash reported in #8958 (#9671) 2022-03-10 15:16:55 -08:00
db_stress_tool Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
docs Update Githubpages version (#9670) 2022-03-07 14:48:06 -08:00
env Fix issue #9627 (#9657) 2022-03-07 11:39:31 -08:00
examples Remove BlockBasedTableOptions.hash_index_allow_collision (#9454) 2022-03-01 13:58:02 -08:00
file Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
fuzz Fix compilation errors and add fuzzers to CircleCI (#9420) 2022-02-01 10:32:15 -08:00
include/rocksdb Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
java Fix RocksJava releases for macOS (#9662) 2022-03-07 10:50:52 -08:00
logging Use system-wide thread ID in info log lines (#9164) 2021-11-12 19:46:06 -08:00
memory Return different Status based on ObjectRegistry::NewObject calls (#9333) 2022-02-11 05:11:24 -08:00
memtable Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
microbench regenerate config jsons, reduce noise (#9644) 2022-03-01 15:09:45 -08:00
monitoring Dedicate cacheline for DB mutex (#9637) 2022-02-27 11:36:54 -08:00
options remove redundant assignment code for member state (#9665) 2022-03-08 11:03:56 -08:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Require C++17 (#9481) 2022-02-04 17:13:10 -08:00
table compression_per_level should be used for flush and changeable (#9658) 2022-03-07 18:06:19 -08:00
test_util Remove BlockBasedTableOptions.hash_index_allow_collision (#9454) 2022-03-01 13:58:02 -08:00
third-party Work around some new clang-analyze failures (#9515) 2022-02-07 18:24:36 -08:00
tools Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
trace_replay Added TraceOptions::preserve_write_order (#9334) 2021-12-28 15:04:26 -08:00
util Avoid popcnt on Windows when unavailable and in portable builds (#9680) 2022-03-09 21:07:31 -08:00
utilities Fix clang-analyze by adding assertion (#9682) 2022-03-09 10:13:02 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Reenable s390x platform_dependent travis job (#9631) 2022-03-01 13:50:41 -08:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Avoid popcnt on Windows when unavailable and in portable builds (#9680) 2022-03-09 21:07:31 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363) 2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
INSTALL.md Clarify Google benchmark < 1.6.0 in INSTALL.md (#9505) 2022-02-07 10:00:46 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727) 2021-09-01 15:26:34 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
PLUGINS.md Move RADOS support to separate repo (#9206) 2022-01-24 22:50:07 -08:00
README.md README: De-list slack channel, list Google group (#9387) 2022-01-18 08:19:48 -08:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
TARGETS Support user-defined timestamps in write-committed txns (#9629) 2022-03-08 16:20:59 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add Solana's RocksDB use case in USERS.md (#9558) 2022-02-16 09:23:01 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.