A library that provides an embeddable, persistent key-value store for fast storage.
Go to file
Cheng Chang cd48ecaa1a Define WAL related classes to be used in VersionEdit and VersionSet (#7164)
Summary:
`WalAddition`, `WalDeletion` are defined in `wal_version.h` and used in `VersionEdit`.
`WalAddition` is used to represent events of creating a new WAL (no size, just log number), or closing a WAL (with size).
`WalDeletion` is used to represent events of deleting or archiving a WAL, it means the WAL is no longer alive (won't be replayed during recovery).

`WalSet` is the set of alive WALs kept in `VersionSet`.

1. Why use `WalDeletion` instead of relying on `MinLogNumber` to identify outdated WALs

On recovery, we can compute `MinLogNumber()` based on the log numbers kept in MANIFEST, any log with number < MinLogNumber can be ignored. So it seems that we don't need to persist `WalDeletion` to MANIFEST, since we can ignore the WALs based on MinLogNumber.

But the `MinLogNumber()` is actually a lower bound, it does not exactly mean that logs starting from MinLogNumber must exist. This is because in a corner case, when a column family is empty and never flushed, its log number is set to the largest log number, but not persisted in MANIFEST. So let's say there are 2 column families, when creating the DB, the first WAL has log number 1, so it's persisted to MANIFEST for both column families. Then CF 0 is empty and never flushed, CF 1 is updated and flushed, so a new WAL with log number 2 is created and persisted to MANIFEST for CF 1. But CF 0's log number in MANIFEST is still 1. So on recovery, MinLogNumber is 1, but since log 1 only contains data for CF 1, and CF 1 is flushed, log 1 might have already been deleted from disk.

We can make `MinLogNumber()` be the exactly minimum log number that must exist, by persisting the most recent log number for empty column families that are not flushed. But if there are N such column families, then every time a new WAL is created, we need to add N records to MANIFEST.

In current design, a record is persisted to MANIFEST only when WAL is created, closed, or deleted/archived, so the number of WAL related records are bounded to 3x number of WALs.

2. Why keep `WalSet` in `VersionSet` instead of applying the `VersionEdit`s to `VersionStorageInfo`

`VersionEdit`s are originally designed to track the addition and deletion of SST files. The SST files are related to column families, each column family has a list of `Version`s, and each `Version` keeps the set of active SST files in `VersionStorageInfo`.

But WALs are a concept of DB, they are not bounded to specific column families. So logically it does not make sense to store WALs in a column family's `Version`s.
Also, `Version`'s purpose is to keep reference to SST / blob files, so that they are not deleted until there is no version referencing them. But a WAL is deleted regardless of version references.
So we keep the WALs in `VersionSet`  for the purpose of writing out the DB state's snapshot when creating new MANIFESTs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7164

Test Plan:
make version_edit_test && ./version_edit_test
make wal_edit_test && ./wal_edit_test

Reviewed By: ltamasi

Differential Revision: D22677936

Pulled By: cheng-chang

fbshipit-source-id: 5a3b6890140e572ffd79eb37e6e4c3c32361a859
2020-08-05 16:34:38 -07:00
.circleci Exclude two tests in CircleCI TSAN tests (#7152) 2020-07-20 15:01:17 -07:00
.github/workflows Clean up some code related to file checksums (#6861) 2020-05-21 08:12:51 -07:00
buckifier Exclude c_test from buck build opt mode (#7093) 2020-07-07 11:28:22 -07:00
build_tools Makefile support for link-time optimization (#7181) 2020-07-28 13:10:44 -07:00
cache Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
cmake Add find_dependency() in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db Define WAL related classes to be used in VersionEdit and VersionSet (#7164) 2020-08-05 16:34:38 -07:00
db_stress_tool Remove redundant ROCKSDB_LITE check (#7172) 2020-07-24 14:14:14 -07:00
docs remediation of S205607 2020-07-17 17:20:49 -07:00
env Add support to start and end IOTracing through DB APIs (#7203) 2020-08-04 18:41:45 -07:00
examples add WITH_EXAMPLES options to cmake and cleanups. (#6580) 2020-06-18 18:00:04 -07:00
file Make env*_test work with ASSERT_STATUS_CHECKED (#7176) 2020-07-28 22:59:48 -07:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb Add support to start and end IOTracing through DB APIs (#7203) 2020-08-04 18:41:45 -07:00
java Compaction Read/Write Stats by Compaction Type (#7165) 2020-07-29 13:39:29 -07:00
logging Fix info log source file display length (#5824) 2020-04-08 20:18:08 -07:00
memory C++20 compatibility (#6697) 2020-04-20 13:24:25 -07:00
memtable More Makefile Cleanup (#7097) 2020-07-09 14:35:17 -07:00
monitoring Compaction Read/Write Stats by Compaction Type (#7165) 2020-07-29 13:39:29 -07:00
options SST Partitioner interface that allows to split SST files (#6957) 2020-07-24 13:44:49 -07:00
port build fixes for GNU/kFreeBSD (#6992) 2020-06-18 09:51:28 -07:00
table Clean up InternalIterator upper bound logic a little bit (#7200) 2020-08-05 10:44:57 -07:00
test_util Make env*_test work with ASSERT_STATUS_CHECKED (#7176) 2020-07-28 22:59:48 -07:00
third-party Revert "Update googletest from 1.8.1 to 1.10.0 (#6808)" (#6923) 2020-06-03 15:55:03 -07:00
tools Compaction Read/Write Stats by Compaction Type (#7165) 2020-07-29 13:39:29 -07:00
trace_replay Add support to start and end IOTracing through DB APIs (#7203) 2020-08-04 18:41:45 -07:00
util Fix Timer unable to schedule new added job (#7216) 2020-08-04 09:20:28 -07:00
utilities Remove assertion from FaultInjectionTestFS::NewDirectory (#7220) 2020-08-05 16:25:14 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Allow missing "unversioned" python, as in CentOS 8 (#6883) 2020-05-29 11:29:23 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml More Makefile clean-up (#7066) 2020-07-06 11:19:48 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Define WAL related classes to be used in VersionEdit and VersionSet (#7164) 2020-08-05 16:34:38 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Fix a perf regression that caused every key to go through upper bound check (#7209) 2020-08-04 11:30:09 -07:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Add RestoreDBFromLatestBackup to C API, add new C# package (#7092) 2020-07-08 11:56:41 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Define WAL related classes to be used in VersionEdit and VersionSet (#7164) 2020-08-05 16:34:38 -07:00
README.md Add CircleCI gadget (#7028) 2020-06-25 10:30:33 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Define WAL related classes to be used in VersionEdit and VersionSet (#7164) 2020-08-05 16:34:38 -07:00
TARGETS Define WAL related classes to be used in VersionEdit and VersionSet (#7164) 2020-08-05 16:34:38 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add YugabyteDB to USERS (#6786) 2020-05-06 10:28:29 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.