Nikhil Benesch 7891af8b53 expose a hook to skip tables during iteration
Summary:
As discussed on the mailing list (["Skipping entire SSTs while iterating"](https://groups.google.com/forum/#!topic/rocksdb/ujHCJVLrHlU)), this patch adds a `table_filter` to `ReadOptions` that allows specifying a callback to be executed during iteration before each table in the database is scanned. The callback is passed the table's properties; the table is scanned iff the callback returns true.

This can be used in conjunction with a `TablePropertiesCollector` to dramatically speed up scans by skipping tables that are known to contain irrelevant data for the scan at hand.

We're using this [downstream in CockroachDB](https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/engine/db.cc#L2009-L2022) already. With this feature, under ideal conditions, we can reduce the time of an incremental backup in  from hours to seconds.

FYI, the first commit in this PR fixes a segfault that I unfortunately have not figured out how to reproduce outside of CockroachDB. I'm hoping you accept it on the grounds that it is not correct to return 8-byte aligned memory from a call to `malloc` on some 64-bit platforms; one correct approach is to infer the necessary alignment from `std::max_align_t`, as done here. As noted in the first commit message, the bug is tickled by having a`std::function` in `struct ReadOptions`. That is, the following patch alone is enough to cause RocksDB to segfault when run from CockroachDB on Darwin.

```diff
 --- a/include/rocksdb/options.h
+++ b/include/rocksdb/options.h
@@ -1546,6 +1546,13 @@ struct ReadOptions {
   // Default: false
   bool ignore_range_deletions;

+  // A callback to determine whether relevant keys for this scan exist in a
+  // given table based on the table's properties. The callback is passed the
+  // properties of each table during iteration. If the callback returns false,
+  // the table will not be scanned.
+  // Default: empty (every table will be scanned)
+  std::function<bool(const TableProperties&)> table_filter;
+
   ReadOptions();
   ReadOptions(bool cksum, bool cache);
 };
```

/cc danhhz
Closes https://github.com/facebook/rocksdb/pull/2265

Differential Revision: D5054262

Pulled By: yiwu-arbug

fbshipit-source-id: dd6b28f2bba6cb8466250d8c5c542d3c92785476
2017-10-17 22:12:00 -07:00
2017-10-17 08:57:09 -07:00
2017-08-31 16:42:48 -07:00
2017-08-03 15:56:46 -07:00
2017-09-28 10:14:09 -07:00
2017-10-04 18:12:52 -07:00
2017-10-17 08:57:09 -07:00
2017-09-19 10:28:26 -07:00
2017-10-17 08:57:09 -07:00
2017-08-31 16:42:48 -07:00
2014-09-29 10:52:18 -07:00
2017-10-12 18:28:24 -07:00
2017-04-27 18:06:12 -07:00
2017-10-04 19:15:01 -07:00
2017-09-28 16:12:24 -07:00
2017-09-29 11:41:28 -07:00
2017-07-15 16:11:23 -07:00
2017-10-12 18:28:24 -07:00
2016-07-26 13:54:08 -07:00
2015-05-29 14:36:35 -07:00
2017-10-12 18:28:24 -07:00
2017-10-12 18:28:24 -07:00
2017-09-25 15:56:40 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

Build Status Build status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it specially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

Description
A library that provides an embeddable, persistent key-value store for fast storage.
Readme 271 MiB
Languages
C++ 82.1%
Java 10.3%
C 2.5%
Python 1.7%
Perl 1.1%
Other 2.1%