A library that provides an embeddable, persistent key-value store for fast storage.
Maysam Yabandeh fe642cbee6 WritePrepared: fix race condition in reading batch with duplicate keys (#5147)
Summary:
When ReadOptions doesn't specify a snapshot, WritePrepared::Get used kMaxSequenceNumber to avoid the cost of creating a new snapshot object (which requires synchronization over db_mutex). This creates a race condition when reading from the writes of a transaction that had duplicate keys: each instance of a duplicate key is inserted with a different sequence number, and depending on the ordering, ::Get might skip the newer one and read the older, obsolete one.
The patch fixes this by using the last published sequence number as the snapshot sequence number. It also adds a check after the read completes to ensure that max_evicted_seq has not advanced past that sequence number, which is very unlikely. If it has, the read is not valid, since the sequence number is not backed by an actual snapshot that would let IsInSnapshot handle the case properly when an overlapping commit is evicted from the commit cache.
A unit test is added to reproduce the race condition with duplicate keys.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5147

Differential Revision: D14758815

Pulled By: maysamyabandeh

fbshipit-source-id: a56915657132cf6ba5e3f5ea1b5d78c803407719
2019-04-12 14:40:41 -07:00
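The read pattern described in the commit summary can be sketched as a toy, single-threaded C++ model. Everything here is hypothetical: `VersionedStore`, `ReadAt`, and the plain `last_published_seq`/`max_evicted_seq` counters are illustrative stand-ins, not RocksDB's WritePrepared code, and the validity condition used (`max_evicted_seq <= snapshot`) is a simplifying assumption. The sketch only shows the shape of the fix: pin the read to the last published sequence number, return the newest version visible at that number, then re-check `max_evicted_seq` after the read.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified model; names are illustrative, not RocksDB's API.
using SequenceNumber = uint64_t;

struct VersionedStore {
  // Each key maps seq -> value, so duplicate writes of one key coexist
  // with distinct sequence numbers.
  std::map<std::string, std::map<SequenceNumber, std::string>> data;
  std::atomic<SequenceNumber> last_published_seq{0};
  std::atomic<SequenceNumber> max_evicted_seq{0};

  // Not thread-safe: a real engine coordinates these steps across writers.
  void Put(const std::string& key, const std::string& value) {
    SequenceNumber seq = last_published_seq.load() + 1;
    data[key][seq] = value;
    last_published_seq.store(seq);  // publish only after the write is in place
  }

  // Returns the newest value whose seq is <= snapshot, if any.
  std::optional<std::string> ReadAt(const std::string& key,
                                    SequenceNumber snapshot) const {
    auto it = data.find(key);
    if (it == data.end()) return std::nullopt;
    const auto& versions = it->second;
    auto v = versions.upper_bound(snapshot);  // first seq > snapshot
    if (v == versions.begin()) return std::nullopt;
    --v;  // newest version visible at the snapshot
    return v->second;
  }

  // Pin the read to the last published seq (instead of an unbounded max seq),
  // then validate afterwards. *valid is false if max_evicted_seq advanced past
  // the pinned seq during the read (assumed condition for this sketch).
  std::optional<std::string> Get(const std::string& key, bool* valid) const {
    assert(valid != nullptr);
    SequenceNumber snapshot = last_published_seq.load();
    auto result = ReadAt(key, snapshot);
    *valid = max_evicted_seq.load() <= snapshot;
    return result;
  }
};
```

Because every read is bounded by one fixed sequence number, two versions of a duplicate key can no longer be seen in an inconsistent order; the post-read check covers the rare case where that number is not protected by a real snapshot.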

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it well suited for storing multiple terabytes of data in a single database.
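To make the amplification tradeoffs above concrete, here is a toy LSM sketch in C++. It is a deliberately simplified model built on assumptions (a hypothetical `ToyLsm` class with no write-ahead log, no on-disk files, no deletes and no levels), not how RocksDB is implemented. Writes go to an in-memory memtable that is flushed into immutable sorted runs, reads consult the memtable and then runs from newest to oldest, and compaction merges the runs.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy LSM sketch (hypothetical): a memtable plus a stack of immutable sorted runs.
class ToyLsm {
 public:
  explicit ToyLsm(std::size_t memtable_limit) : memtable_limit_(memtable_limit) {}

  // Writes only touch the in-memory memtable, so they are cheap.
  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = value;
    if (memtable_.size() >= memtable_limit_) Flush();
  }

  // Reads check the memtable, then runs from newest to oldest.
  // Each additional run consulted adds read amplification.
  std::optional<std::string> Get(const std::string& key) const {
    if (auto it = memtable_.find(key); it != memtable_.end()) return it->second;
    for (auto run = runs_.rbegin(); run != runs_.rend(); ++run) {
      if (auto it = run->find(key); it != run->end()) return it->second;
    }
    return std::nullopt;
  }

  // Compaction merges all runs into one, keeping the newest value per key.
  // Rewriting data costs write amplification, but it cuts the number of runs
  // a read must search and drops overwritten values (space amplification).
  void Compact() {
    std::map<std::string, std::string> merged;
    for (const auto& run : runs_) {  // oldest to newest, so newer values win
      for (const auto& [k, v] : run) merged[k] = v;
    }
    runs_.clear();
    if (!merged.empty()) runs_.push_back(std::move(merged));
  }

  std::size_t NumRuns() const { return runs_.size(); }

 private:
  // Freeze the current memtable as the newest immutable run.
  void Flush() {
    runs_.push_back(std::move(memtable_));
    memtable_.clear();  // moved-from map is valid; clear() makes it empty
  }

  std::size_t memtable_limit_;
  std::map<std::string, std::string> memtable_;
  std::vector<std::map<std::string, std::string>> runs_;
};
```

A small memtable limit produces many runs and makes `Get` slower until `Compact` runs; compacting more often lowers that read cost at the price of rewriting more data, which is the kind of tradeoff the paragraph above describes.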

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.