A library that provides an embeddable, persistent key-value store for fast storage.
Andrew Kryczka fea2b1dfb2 Copy Get() result when file reads use mmap
Summary:
For iterator reads, a `SuperVersion` is pinned to preserve a snapshot of SST files, and `Block`s are pinned to allow `key()` and `value()` to return pointers directly into a RocksDB memory region. This works for both non-mmap reads, where the block owns the memory region, and mmap reads, where the file owns the memory region.
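The pinning argument can be made concrete with a self-contained C++17 model. This is not RocksDB code: reference counting stands in for pinning, and `MappedFile`, `FileSnapshot`, `MiniDB`, and `PinningIterator` are hypothetical names. A `std::string` plays the role of the mmap'd region, and `FileSnapshot` plays the role of the `SuperVersion`. The iterator holds a reference to the snapshot. As a result, the files it can read, and the memory its `value()` views point into, survive a compaction that installs a new file set.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical stand-in for an mmap'd SST file: the file object owns the
// memory region that reads point into (modeled as an ordinary std::string).
struct MappedFile {
  std::string region;
};

// Hypothetical stand-in for a SuperVersion: a snapshot of the live files.
struct FileSnapshot {
  std::vector<std::shared_ptr<MappedFile>> files;
};

// Hypothetical DB that publishes the current snapshot.
struct MiniDB {
  std::shared_ptr<FileSnapshot> current = std::make_shared<FileSnapshot>();

  // Models compaction installing a new file set. The old snapshot, and any
  // files only it references, are freed once nobody else holds them.
  void InstallFiles(std::vector<std::shared_ptr<MappedFile>> files) {
    auto next = std::make_shared<FileSnapshot>();
    next->files = std::move(files);
    current = std::move(next);
  }
};

// Iterator that pins the snapshot it was created from, so the files (and
// their memory regions) outlive any compaction that happens meanwhile.
class PinningIterator {
 public:
  explicit PinningIterator(std::shared_ptr<FileSnapshot> snap)
      : pinned_(std::move(snap)) {}

  // Zero-copy: the returned view points straight into the file's region.
  std::string_view value(std::size_t file_idx) const {
    return pinned_->files.at(file_idx)->region;
  }

 private:
  std::shared_ptr<FileSnapshot> pinned_;
};
```

In this model the replaced file is released only when the last iterator over its snapshot goes away. The summary relies on this property for both mmap and non-mmap iterator reads.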

For point reads with `PinnableSlice`, only the `Block` object is pinned. This works for non-mmap reads because the block owns the memory region, so even if the file is deleted after compaction, the memory region survives. However, for mmap reads, file deletion causes the memory region to which the `PinnableSlice` refers to be unmapped. The result is usually a segfault upon accessing the `PinnableSlice`, although sometimes it returns wrong results instead (I repro'd this a bunch of times with `db_stress`).

This PR copies the value into the `PinnableSlice` when it comes from mmap'd memory. We can tell whether the `Block` owns its memory using `Block::cachable()`, which is unset when reads do not use the provided buffer, as is the case with mmap file reads. When it is false, we ensure the result of `Get()` is copied.
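The decision can be modeled in a few lines. This is a hypothetical sketch, not the patch itself. `ModelBlock::owns_memory` stands in for `Block::cachable()`, and `ModelPinnableSlice` only imitates the two ways a `PinnableSlice` can hold a result: referencing pinned memory, or keeping its own copy. `FillGetResult` is an invented name for the rule the PR describes.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical model of a data block. `owns_memory` plays the role the PR
// gives to Block::cachable(): true when the block holds its own buffer,
// false when its bytes live in memory owned by the mmap'd file.
struct ModelBlock {
  std::string_view contents;
  bool owns_memory;
};

// Hypothetical model of a PinnableSlice: it either references externally
// pinned memory (zero-copy) or holds a private copy in its own buffer.
class ModelPinnableSlice {
 public:
  ModelPinnableSlice() = default;
  // Copying would leave data_ pointing into another object's buffer.
  ModelPinnableSlice(const ModelPinnableSlice&) = delete;
  ModelPinnableSlice& operator=(const ModelPinnableSlice&) = delete;

  // Zero-copy: valid only while whoever owns `s` keeps it alive.
  void PinSlice(std::string_view s) {
    data_ = s;
    pinned_ = true;
  }
  // Copy into self-owned storage: safe even if the source goes away.
  void PinSelf(std::string_view s) {
    self_.assign(s.data(), s.size());
    data_ = self_;
    pinned_ = false;
  }
  std::string_view data() const { return data_; }
  bool IsPinned() const { return pinned_; }

 private:
  std::string self_;
  std::string_view data_;
  bool pinned_ = false;
};

// Pin only when the (pinned) block owns the bytes; otherwise copy, because
// deleting the file could unmap the region the value points into.
void FillGetResult(const ModelBlock& block, std::string_view value,
                   ModelPinnableSlice* out) {
  if (block.owns_memory) {
    out->PinSlice(value);
  } else {
    out->PinSelf(value);
  }
}
```

In the non-owning case, the result stays readable after the backing region is freed, at the price of one extra copy per mmap point read.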

This feels like a short-term solution, as ideally we'd have the `PinnableSlice` pin the mmap'd memory so we can do zero-copy reads. That seemed hard, so I chose this approach to fix correctness in the meantime.
Closes https://github.com/facebook/rocksdb/pull/3881

Differential Revision: D8076288

Pulled By: ajkr

fbshipit-source-id: 31d78ec010198723522323dbc6ea325122a46b08
2018-06-01 16:57:58 -07:00
buckifier Update buckifier and TARGETS 2018-03-30 14:26:53 -07:00
build_tools Pass -latomic to linker when using clang 2018-04-25 12:13:41 -07:00
cache Fix LRUCache missing null check on destruct 2018-05-29 15:13:09 -07:00
cmake Search paths provided by intel's "tbbvars.sh". 2018-05-07 14:28:36 -07:00
coverage Suppress lint in old files 2018-01-29 12:56:42 -08:00
db Copy Get() result when file reads use mmap 2018-06-01 16:57:58 -07:00
docs Adding blog post for 5.10.2 release 2018-02-13 11:56:59 -08:00
env Fix Fadvise on closed file when reads use mmap 2018-05-25 10:57:57 -07:00
examples Pinnableslice examples and blog post 2017-08-24 12:26:07 -07:00
hdfs Comment out unused variables 2018-03-05 13:13:41 -08:00
include/rocksdb add c api rocksdb_sstfilewriter_file_size 2018-06-01 09:43:59 -07:00
java Fix an issue with unnecessary capture in lambda expressions 2018-05-25 15:12:44 -07:00
memtable Remove tests from ROCKSDB_VALGRIND_RUN 2018-05-30 16:15:16 -07:00
monitoring Print histogram count and sum in statistics string 2018-05-21 11:12:47 -07:00
options PersistRocksDBOptions() to use WritableFileWriter 2018-05-21 16:42:22 -07:00
port Catchup with posix features 2018-05-24 15:13:04 -07:00
table Copy Get() result when file reads use mmap 2018-06-01 16:57:58 -07:00
third-party fix some text in comments. 2018-04-10 15:59:24 -07:00
tools Configure direct I/O statically in db_stress 2018-06-01 16:42:34 -07:00
util Exclude seq from index keys 2018-05-25 18:42:43 -07:00
utilities Extend existing unit tests to run with WriteUnprepared as well 2018-06-01 14:58:41 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Remove leftover references to phutil_module_cache 2017-08-23 12:12:21 -07:00
.travis.yml add 4th test_group in travis 2018-03-13 18:57:29 -07:00
appveyor.yml Upgrade Appveyor to VS2017 2018-02-01 13:57:01 -08:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Extend existing unit tests to run with WriteUnprepared as well 2018-06-01 14:58:41 -07:00
CODE_OF_CONDUCT.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Copy Get() result when file reads use mmap 2018-06-01 16:57:58 -07:00
INSTALL.md Enable compilation on OpenBSD 2018-03-19 12:30:05 -07:00
issue_template.md Add a template for issues 2017-09-29 11:41:28 -07:00
LANGUAGE-BINDINGS.md Add Nim to the list of language bindings 2018-01-29 09:57:46 -08:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Extend existing unit tests to run with WriteUnprepared as well 2018-06-01 14:58:41 -07:00
README.md Add dual-license info to README.md 2018-03-06 12:43:51 -08:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
src.mk Extend existing unit tests to run with WriteUnprepared as well 2018-06-01 14:58:41 -07:00
TARGETS Extend existing unit tests to run with WriteUnprepared as well 2018-06-01 14:58:41 -07:00
thirdparty.inc Make Windows dep switches compatible with other builds 2018-01-05 14:56:54 -08:00
USERS.md Added ProfaneDB 2017-11-19 10:11:44 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it well suited for storing multiple terabytes of data in a single database.
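The LSM design and its tradeoffs can be illustrated with a toy LSM tree in C++17. This is not how RocksDB is implemented: there is no WAL, no levels, and no SST format, and `ToyLSM` and its methods are hypothetical.

- Writes go to a sorted in-memory memtable, which is flushed as an immutable sorted run when full.
- Reads check the newest data first.
- Compaction merges all runs into one.

The sketch computes write amplification as bytes written to storage divided by bytes written by the user. It counts runs probed by `Get()` as a rough proxy for read amplification.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy LSM tree (hypothetical; not RocksDB's design in detail).
class ToyLSM {
 public:
  explicit ToyLSM(std::size_t memtable_limit) : limit_(memtable_limit) {}

  void Put(const std::string& k, const std::string& v) {
    user_bytes_ += k.size() + v.size();
    memtable_[k] = v;
    if (memtable_.size() >= limit_) Flush();
  }

  // Newest data wins: memtable first, then runs from newest to oldest.
  // Every run probed counts toward (proxy) read amplification.
  std::optional<std::string> Get(const std::string& k) {
    if (auto it = memtable_.find(k); it != memtable_.end()) return it->second;
    for (auto r = runs_.rbegin(); r != runs_.rend(); ++r) {
      ++run_probes_;
      if (auto it = r->find(k); it != r->end()) return it->second;
    }
    return std::nullopt;
  }

  // Write the memtable to "storage" as a new immutable sorted run.
  void Flush() {
    if (memtable_.empty()) return;
    disk_bytes_written_ += Bytes(memtable_);
    runs_.push_back(std::move(memtable_));
    memtable_.clear();
  }

  // Merge all runs into one; newer runs overwrite older values. The merged
  // run is rewritten to storage, which adds to write amplification.
  void Compact() {
    Run merged;
    for (const Run& r : runs_)
      for (const auto& [k, v] : r) merged[k] = v;
    disk_bytes_written_ += Bytes(merged);
    runs_.clear();
    runs_.push_back(std::move(merged));
  }

  // Write amplification = bytes written to storage / bytes written by user.
  double WriteAmp() const {
    return user_bytes_ ? double(disk_bytes_written_) / user_bytes_ : 0.0;
  }
  std::size_t NumRuns() const { return runs_.size(); }
  std::size_t RunProbes() const { return run_probes_; }

 private:
  using Run = std::map<std::string, std::string>;
  static std::size_t Bytes(const Run& r) {
    std::size_t n = 0;
    for (const auto& [k, v] : r) n += k.size() + v.size();
    return n;
  }

  std::size_t limit_;
  Run memtable_;
  std::vector<Run> runs_;  // oldest first
  std::size_t user_bytes_ = 0, disk_bytes_written_ = 0, run_probes_ = 0;
};
```

With a memtable limit of two entries, four 2-byte puts produce two flushed runs and a write amplification of 1.0. A lookup for an old key then has to probe both runs. One full compaction rewrites the 6 bytes of live data, raising write amplification to 1.75 while leaving a single run to search. In miniature, that is the WAF-versus-RAF tradeoff the paragraph describes.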

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and the Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.