Peter Dillinger 8b8a2e9f05 Ribbon: major re-work of hashing, seeds, and more (#7635)
Summary:
* Fully optimized StandardHasher so that it efficiently generates Start, CoeffRow, and ResultRow from a stock hash value, with enough independence among them that no degraded behavior is measurable. (Degraded behavior would be an FP rate higher than explained by 2^-b plus, when using a 32-bit stock hash function, the expected stock hash collisions.) Details are in code comments.
* Our standard 64-bit and 32-bit hash functions do not exhibit sufficient independence on sequential seeds (that is, not enough for one Ribbon construction attempt to have a probability independent of the next). I have worked around this in the Ribbon code by "pre-mixing" "ordinal seeds" into "raw seeds." Ordinal seeds are tried sequentially and are appropriate for storage in persisted metadata; raw seeds are ready for application and appropriate for in-memory storage. This way the pre-mixing step (though fast) is applied only when loading or configuring the structure, not on each query or banding add.
* Fixed a subtle flaw in which backtracking did not clear ResultRow data, which could lead to an elevated FP rate on keys that were backtracked on. Such keys should (for generality) exhibit the same FP rate as novel keys.
* Added a basic test for PhsfQuery and construction algorithms (map or "retrieval structure" rather than set or filter), and made a few trivial related fixes.
* Better random configuration generation in unit tests.
* Some other minor cleanup / clarification / etc.
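
The ordinal-seed to raw-seed pre-mixing described above can be sketched as follows. This is a minimal illustration only, not the code in this PR: the names `PreMixSeed`, `SeededHasher`, `SetOrdinalSeed`, and `Rehash` are hypothetical, and the mixer is the well-known SplitMix64 finalizer, swapped in as an assumption rather than the mixing function Ribbon actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch; names and mixing constants are assumptions,
// not the RocksDB Ribbon implementation.

// Turns a small sequential "ordinal" seed (0, 1, 2, ...) into a
// well-spread 64-bit "raw" seed using the SplitMix64 finalizer.
// Every step is a bijection on 64-bit values, so distinct ordinal
// seeds always produce distinct raw seeds.
inline uint64_t PreMixSeed(uint32_t ordinal_seed) {
  uint64_t z = static_cast<uint64_t>(ordinal_seed) + 0x9E3779B97F4A7C15ULL;
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

struct SeededHasher {
  // Kept in memory only; the ordinal seed is what would be persisted.
  uint64_t raw_seed = 0;

  // Called once, when loading or configuring the structure.
  void SetOrdinalSeed(uint32_t ordinal_seed) {
    raw_seed = PreMixSeed(ordinal_seed);
  }

  // Called on every query or banding add: a cheap combination of the
  // stock hash with the already-mixed raw seed. Multiplying by an odd
  // constant is a bijection, so for a fixed seed no two stock hashes
  // collide here.
  uint64_t Rehash(uint64_t stock_hash) const {
    assert(raw_seed != 0);  // SetOrdinalSeed should have been called
    return (stock_hash ^ raw_seed) * 0x9E3779B97F4A7C15ULL;
  }
};
```

The point of the split is that the comparatively expensive mixing runs once per seed change, while the per-key path only pays for one XOR and one multiply.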

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7635

Test Plan: unit tests included

Reviewed By: jay-zhuang

Differential Revision: D24738978

Pulled By: pdillinger

fbshipit-source-id: f9d03599d9e2ca3e30e9d3e7d81cd936b56f76f0
2020-11-07 17:22:54 -08:00