Commit Graph

4 Commits

Author SHA1 Message Date
Peter Dillinger
746909ceda Ribbon: InterleavedSolutionStorage (#7598)
Summary:
The core algorithms for InterleavedSolutionStorage and the
implementation SerializableInterleavedSolution make Ribbon fast for
filter queries. Example output from new unit test:

    Simple      outside query, hot, incl hashing, ns/key: 117.796
    Interleaved outside query, hot, incl hashing, ns/key: 42.2655
    Bloom       outside query, hot, incl hashing, ns/key: 24.0071

Also includes misc cleanup of previous Ribbon code and comments.

Some TODOs and FIXMEs remain for futher work / investigation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7598

Test Plan: unit tests included (integration work and tests coming later)

Reviewed By: jay-zhuang

Differential Revision: D24559209

Pulled By: pdillinger

fbshipit-source-id: fea483cd354ba782aea3e806f2bc96e183d59441
2020-11-03 12:46:36 -08:00
Peter Dillinger
25d54c799c Ribbon: initial (general) algorithms and basic unit test (#7491)
Summary:
This is intended as the first commit toward a near-optimal alternative to static Bloom filters for SSTs. Stephan Walzer and I have agreed upon the name "Ribbon" for a PHSF based on his linear system construction in "Efficient Gauss Elimination for Near-Quadratic Matrices with One Short Random Block per Row, with Applications" ("SGauss") and my much faster "on the fly" algorithm for gaussian elimination (or for this linear system, "banding"), which can be faster than peeling while also more compact and flexible. See util/ribbon_alg.h for more detailed introduction and background. RIBBON = Rapid Incremental Boolean Banding ON-the-fly

This commit just adds generic (templatized) core algorithms and a basic unit test showing some features, including the ability to construct structures within 2.5% space overhead vs. information theoretic lower bound. (Compare to cache-local Bloom filter's ~50% space overhead -> ~30% reduction anticipated.) This commit does not include the storage scheme necessary to make queries fast, especially for filter queries, nor fractional "result bits", but there is some description already and those implementations will come soon. Nor does this commit add FilterPolicy support, for use in SST files, but that will also come soon.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7491

Reviewed By: jay-zhuang

Differential Revision: D24517954

Pulled By: pdillinger

fbshipit-source-id: 0119ee597e250d7e0edd38ada2ba50d755606fa7
2020-10-25 20:44:49 -07:00
Peter Dillinger
a16d1b2fd3 Add Encode/DecodeFixedGeneric, coding_lean.h (#7587)
Summary:
To minimize dependencies for Ribbon filter code in progress,
core part of coding.h for fixed sizes has been moved to coding_lean.h.
Also, generic versions of these functions have been added to math128.h
(since the generic versions are likely only to be used along with
Unsigned128).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7587

Test Plan: Unit tests added for new functions

Reviewed By: jay-zhuang

Differential Revision: D24486718

Pulled By: pdillinger

fbshipit-source-id: a69768f742379689442135fa52237c01dfe2647e
2020-10-23 14:11:15 -07:00
Peter Dillinger
c4d8838a2b New bit manipulation functions and 128-bit value library (#7338)
Summary:
These new functions and 128-bit value bit operations are
expected to be used in a forthcoming Bloom filter alternative.

No functional changes to production code, just new code only called by
unit tests, cosmetic changes to existing headers, and fix an existing
function for a yet-unused template instantiation (BitsSetToOne on
something signed and smaller than 32 bits).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7338

Test Plan:
Unit tests included. Works with and without
TEST_UINT128_COMPAT=1 to check compatibility with and without
__uint128_t. Also added that parameter to the CircleCI build
build-linux-shared_lib-alt_namespace-status_checked.

Reviewed By: jay-zhuang

Differential Revision: D23494945

Pulled By: pdillinger

fbshipit-source-id: 5c0dc419100d9df5d4d9abb153b2855d5aea39e8
2020-09-03 09:32:59 -07:00