Summary:
1. Make filter_block.h define a base class, and derive block_based_filter_block and full_filter_block from it. The former is the traditional filter block. The latter is new: it generates a single filter block that contains all the keys in the SST file.
2. When querying a key, the table first checks whether a full filter is available. If not, it goes to the exact data block and checks using the block-based filter.
3. Users can choose between the full filter and the traditional block-based filter. The two are stored in the SST file under different meta index names, "filter.filter_policy" or "full_filter.filter_policy", so the table reader can tell the filter block type.
4. Some optimizations have been made for full_filter_block, so it requires a different interface from the original one in filter_policy.h.
5. The actual implementation of filter bits encoding/decoding is placed in util/bloom_impl.cc.
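The dispatch described in items 2 and 3 can be sketched roughly as follows. This is a simplified, self-contained illustration and not the code from this diff: FilterKind, KindFromMetaName, ToyFilter and ToyTable are hypothetical names, the name prefixes are taken from item 3 as written, and a std::set stands in for real Bloom filter bits.

```cpp
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical tag for the two filter block layouts described above.
enum class FilterKind { kNone, kBlockBased, kFull };

// Classify a meta index entry by its name prefix. The prefixes follow the
// summary text ("filter.<policy>" vs "full_filter.<policy>"); the exact
// strings are an assumption.
FilterKind KindFromMetaName(const std::string& name) {
  const std::string full = "full_filter.";
  const std::string block = "filter.";
  if (name.compare(0, full.size(), full) == 0) return FilterKind::kFull;
  if (name.compare(0, block.size(), block) == 0) return FilterKind::kBlockBased;
  return FilterKind::kNone;
}

// Stand-in for a filter: an exact set instead of Bloom bits, so this toy has
// no false positives (a real Bloom filter may answer true for absent keys).
struct ToyFilter {
  std::set<std::string> keys;
  bool KeyMayMatch(const std::string& key) const { return keys.count(key) > 0; }
};

// Toy table: either one full filter over all keys, or one filter per block.
struct ToyTable {
  FilterKind kind = FilterKind::kNone;
  ToyFilter full_filter;                        // used when kind == kFull
  std::map<uint64_t, ToyFilter> block_filters;  // keyed by data block offset

  // Returns false only when the key is definitely absent.
  // block_offset is the data block that the index lookup pointed at.
  bool MayContain(const std::string& key, uint64_t block_offset) const {
    if (kind == FilterKind::kFull) {
      return full_filter.KeyMayMatch(key);  // one check for the whole file
    }
    if (kind == FilterKind::kBlockBased) {
      auto it = block_filters.find(block_offset);
      // No filter recorded for this block: the key cannot be ruled out.
      return it == block_filters.end() || it->second.KeyMayMatch(key);
    }
    return true;  // no filter at all: the data block must be read
  }
};
```

In this sketch the full-filter path answers before any particular data block is consulted, while the block-based path needs the block offset first. That ordering matches the lookup described in item 2.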
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom --disable_auto_compactions=1
Read QPS increased by about 34%, from 2230002 to 2991411.
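As a quick check, the relative change can be recomputed from the two QPS numbers reported above. PercentIncrease is a hypothetical helper written for this note and is not part of db_bench.

```cpp
#include <cstdio>

// Relative change from `before` to `after`, expressed in percent.
double PercentIncrease(double before, double after) {
  return (after - before) / before * 100.0;
}

// Prints the gain for the readrandom numbers quoted in this commit message.
void PrintReadQpsGain() {
  const double base_qps = 2230002.0;  // base commit
  const double new_qps = 2991411.0;   // with this change
  std::printf("read QPS gain: %.1f%%\n", PercentIncrease(base_qps, new_qps));
}
```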
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter=0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
103 lines
4.1 KiB
C++
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
//
// Copyright (c) 2012 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
//
// A filter block is stored near the end of a Table file. It contains
// filters (e.g., bloom filters) for all data blocks in the table combined
// into a single filter block.

#pragma once

#include <stddef.h>
#include <stdint.h>
#include <string>
#include <memory>
#include <vector>
#include "rocksdb/options.h"
#include "rocksdb/slice.h"
#include "rocksdb/slice_transform.h"
#include "table/filter_block.h"
#include "util/hash.h"

namespace rocksdb {


// A BlockBasedFilterBlockBuilder is used to construct all of the filters for a
// particular Table. It generates a single string which is stored as
// a special block in the Table.
//
// The sequence of calls to BlockBasedFilterBlockBuilder must match the regexp:
// (StartBlock Add*)* Finish
class BlockBasedFilterBlockBuilder : public FilterBlockBuilder {
 public:
  BlockBasedFilterBlockBuilder(const SliceTransform* prefix_extractor,
                               const BlockBasedTableOptions& table_opt);

  virtual bool IsBlockBased() override { return true; }
  virtual void StartBlock(uint64_t block_offset) override;
  virtual void Add(const Slice& key) override;
  virtual Slice Finish() override;

 private:
  void AddKey(const Slice& key);
  void AddPrefix(const Slice& key);
  void GenerateFilter();

  // important: all of these might point to invalid addresses
  // at the time of destruction of this filter block. destructor
  // should NOT dereference them.
  const FilterPolicy* policy_;
  const SliceTransform* prefix_extractor_;
  bool whole_key_filtering_;

  std::string entries_;             // Flattened entry contents
  std::vector<size_t> start_;       // Starting index in entries_ of each entry
  uint32_t added_to_start_;         // To indicate if key is added
  std::string result_;              // Filter data computed so far
  std::vector<Slice> tmp_entries_;  // policy_->CreateFilter() argument
  std::vector<uint32_t> filter_offsets_;

  // No copying allowed
  BlockBasedFilterBlockBuilder(const BlockBasedFilterBlockBuilder&);
  void operator=(const BlockBasedFilterBlockBuilder&);
};

// A FilterBlockReader is used to parse filter from SST table.
// KeyMayMatch and PrefixMayMatch would trigger filter checking
class BlockBasedFilterBlockReader : public FilterBlockReader {
 public:
  // REQUIRES: "contents" and *policy must stay live while *this is live.
  BlockBasedFilterBlockReader(const SliceTransform* prefix_extractor,
                              const BlockBasedTableOptions& table_opt,
                              const Slice& contents,
                              bool delete_contents_after_use = false);
  virtual bool IsBlockBased() override { return true; }
  virtual bool KeyMayMatch(const Slice& key,
                           uint64_t block_offset = kNotValid) override;
  virtual bool PrefixMayMatch(const Slice& prefix,
                              uint64_t block_offset = kNotValid) override;
  virtual size_t ApproximateMemoryUsage() const override;

 private:
  const FilterPolicy* policy_;
  const SliceTransform* prefix_extractor_;
  bool whole_key_filtering_;
  const char* data_;    // Pointer to filter data (at block-start)
  const char* offset_;  // Pointer to beginning of offset array (at block-end)
  size_t num_;          // Number of entries in offset array
  size_t base_lg_;      // Encoding parameter (see kFilterBaseLg in .cc file)
  std::unique_ptr<const char[]> filter_data;

  bool MayMatch(const Slice& entry, uint64_t block_offset);

  // No copying allowed
  BlockBasedFilterBlockReader(const BlockBasedFilterBlockReader&);
  void operator=(const BlockBasedFilterBlockReader&);
};
} // namespace rocksdb
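The call-order contract documented on BlockBasedFilterBlockBuilder, (StartBlock Add*)* Finish, can be illustrated with a small stand-in. ToyFilterBuilder and BuildExample below are hypothetical and mirror only the call order. The toy records keys per block rather than producing filter bits, and uses std::string where the real interface takes Slice.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in that enforces the documented call sequence:
// (StartBlock Add*)* Finish
class ToyFilterBuilder {
 public:
  // Begins a new data block; later Add() calls belong to this block.
  void StartBlock(uint64_t block_offset) {
    assert(!finished_);
    offsets_.push_back(block_offset);
    keys_.emplace_back();
  }

  // Adds a key to the current block. Must follow a StartBlock() call.
  void Add(const std::string& key) {
    assert(!finished_ && !keys_.empty());
    keys_.back().push_back(key);
  }

  // Ends the sequence. Returns the number of blocks seen; a real builder
  // would return the encoded filter block instead.
  size_t Finish() {
    assert(!finished_);
    finished_ = true;
    return keys_.size();
  }

  size_t KeysInBlock(size_t i) const { return keys_.at(i).size(); }

 private:
  bool finished_ = false;
  std::vector<uint64_t> offsets_;               // offset of each started block
  std::vector<std::vector<std::string>> keys_;  // keys grouped per block
};

// One valid sequence: two blocks, then Finish.
size_t BuildExample() {
  ToyFilterBuilder builder;
  builder.StartBlock(0);
  builder.Add("apple");
  builder.Add("banana");
  builder.StartBlock(4096);
  builder.Add("cherry");
  return builder.Finish();
}
```

Calling Add() before any StartBlock(), or calling anything after Finish(), trips an assertion in this toy. The header itself only states the contract and does not say how violations are handled.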