A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Yanqin Jin 08721293ea Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236)
Summary:
`db_stress` is a user of `FaultInjectionTestFS`. After injecting a write error, `db_stress` probabilistically determins
data drop (https://github.com/facebook/rocksdb/blob/6.27.fb/db_stress_tool/db_stress_test_base.cc#L2615:L2619).

In some of our recent runs of `db_stress`, we found duplicate trailing entries corresponding to file trivial move in
the MANIFEST, causing the recovery to fail, because the file move operation is not idempotent: you cannot delete a
file from a given level twice.

Investigation suggests that data buffering in both `WritableFileWriter` and `FaultInjectionTestFS` may be the root cause.

WritableFileWriter buffers data to write in a memory buffer, `WritableFileWriter::buf_`. After each
`WriteBuffered()`/`WriteBufferedWithChecksum()` succeeds, the `buf_` is cleared.

If the underlying file `WritableFileWriter::writable_file_` is opened in buffered IO mode, then `FaultInjectionTestFS`
buffers data written for each file until next file sync. After an injected error, user of `FaultInjectionFS` can
choose to drop some or none of previously buffered data. If `db_stress` does not drop any unsynced data, then
such data will still exist in the `FaultInjectionTestFS`'s buffer.

Existing implementation of `WritableileWriter::WriteBuffered()` does not clear `buf_` if there is an error. This may lead
to the data being buffered two copies: one in `WritableFileWriter`, and another in `FaultInjectionTestFS`.
We also know that the `WritableFileWriter` of MANIFEST file will close upon an error.  During `Close()`, it will flush the
content in `buf_`. If no write error is injected to `FaultInjectionTestFS` this time, then we end up with two copies of the
data appended to the file.

To fix, we clear the `WritableFileWriter::buf_` upon failure as well. We focus this PR on files opened in non-direct mode.

This PR includes a unit test to reproduce a case when write error injection
to `WritableFile` can cause duplicate trailing entries.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9236

Test Plan: make check

Reviewed By: zhichao-cao

Differential Revision: D33033984

Pulled By: riversand963

fbshipit-source-id: ebfa5a0db8cbf1ed73100528b34fcba543c5db31
2021-12-13 09:00:36 -08:00
.circleci Fix some CI output (#9193) 2021-11-20 10:21:58 -08:00
.github/workflows Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
buckifier Update buckify scripts (#9104) 2021-11-01 10:11:18 -07:00
build_tools More improvements to output for CircleCI (#9201) 2021-11-23 22:10:27 -08:00
cache Fix build for msvc (#9230) 2021-11-29 14:27:48 -08:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Remove asan_symbolize.py for internal asan build (#8737) 2021-09-07 15:39:11 -07:00
db Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
db_stress_tool Fix segmentation fault in table_options.prepopulate_block_cache when used with partition_filters (#9263) 2021-12-08 12:44:38 -08:00
docs Misc doc fixes (#8983) 2021-10-07 11:22:17 -07:00
env Fix fstatfs call for compilation on 32 bit systems (#9251) 2021-12-08 22:01:23 -08:00
examples Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
file Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
fuzz Make EventListener into a Customizable Class (#8473) 2021-07-27 07:47:02 -07:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb Fix link error reported in issue 9272 (#9278) 2021-12-10 20:33:46 -08:00
java Fix copy constructors of Options and ColumnFamilyOptions (#9166) 2021-12-13 07:22:56 -08:00
logging Use system-wide thread ID in info log lines (#9164) 2021-11-12 19:46:06 -08:00
memory Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
memtable Minor improvement to CacheReservationManager/WriteBufferManager/CompressionDictBuilding (#9139) 2021-11-05 16:13:47 -07:00
microbench Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
monitoring Add tiered storage related read bytes stats to Statistic (#9123) 2021-11-16 15:17:17 -08:00
options Fix an issue with MemTableRepFactory::CreateFromString (#9273) 2021-12-09 12:36:18 -08:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Fix/improve 'must free heap allocations' code (#9209) 2021-11-29 10:53:52 -08:00
table More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
test_util More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
third-party Add support for building on s390x platform (#8962) 2021-10-22 10:13:15 -07:00
tools Add commit marker with timestamp (#9266) 2021-12-10 11:05:35 -08:00
trace_replay Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
util More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
utilities Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Re-enable 390x+cmake* Travis jobs (#9110) 2021-11-03 20:30:15 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
INSTALL.md Update installation instructions (#8158) 2021-04-06 16:02:04 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727) 2021-09-01 15:26:34 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Get DBTest passing Assert Status Checked (#7737) 2021-12-09 11:00:17 -08:00
PLUGINS.md Add ZenFS to plugin list (#8218) 2021-04-22 11:12:40 -07:00
README.md Update branch name to "main" in README/LANGUAGE_BINDINGS (#8727) 2021-09-01 15:26:34 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Fix Statistics in db_stress (#9260) 2021-12-07 16:24:22 -08:00
TARGETS Fix Statistics in db_stress (#9260) 2021-12-07 16:24:22 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Update USERS.md (#8923) 2021-10-01 16:10:35 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.