rocksdb/db/merge_context.h
anand76 fefd4b98c5 Introduce a new MultiGet batching implementation (#5011)
Summary:
This PR introduces a new MultiGet() API, with the underlying implementation grouping keys based on SST file and batching lookups in a file. The reason for the new API is twofold - the definition allows callers to allocate storage for status and values on stack instead of std::vector, as well as return values as PinnableSlices in order to avoid copying, and it keeps the original MultiGet() implementation intact while we experiment with batching.

Batching is useful when there is some spatial locality to the keys being queries, as well as larger batch sizes. The main benefits are due to -
1. Fewer function calls, especially to BlockBasedTableReader::MultiGet() and FullFilterBlockReader::KeysMayMatch()
2. Bloom filter cachelines can be prefetched, hiding the cache miss latency

The next step is to optimize the binary searches in the level_storage_info, index blocks and data blocks, since we could reduce the number of key comparisons if the keys are relatively close to each other. The batching optimizations also need to be extended to other formats, such as PlainTable and filter formats. This also needs to be added to db_stress.

Benchmark results from db_bench for various batch size/locality of reference combinations are given below. Locality was simulated by offsetting the keys in a batch by a stride length. Each SST file is about 8.6MB uncompressed and key/value size is 16/100 uncompressed. To focus on the cpu benefit of batching, the runs were single threaded and bound to the same cpu to eliminate interference from other system events. The results show a 10-25% improvement in micros/op from smaller to larger batch sizes (4 - 32).

Batch   Sizes

1        | 2        | 4         | 8      | 16  | 32

Random pattern (Stride length 0)
4.158 | 4.109 | 4.026 | 4.05 | 4.1 | 4.074        - Get
4.438 | 4.302 | 4.165 | 4.122 | 4.096 | 4.075 - MultiGet (no batching)
4.461 | 4.256 | 4.277 | 4.11 | 4.182 | 4.14        - MultiGet (w/ batching)

Good locality (Stride length 16)
4.048 | 3.659 | 3.248 | 2.99 | 2.84 | 2.753
4.429 | 3.728 | 3.406 | 3.053 | 2.911 | 2.781
4.452 | 3.45 | 2.833 | 2.451 | 2.233 | 2.135

Good locality (Stride length 256)
4.066 | 3.786 | 3.581 | 3.447 | 3.415 | 3.232
4.406 | 4.005 | 3.644 | 3.49 | 3.381 | 3.268
4.393 | 3.649 | 3.186 | 2.882 | 2.676 | 2.62

Medium locality (Stride length 4096)
4.012 | 3.922 | 3.768 | 3.61 | 3.582 | 3.555
4.364 | 4.057 | 3.791 | 3.65 | 3.57 | 3.465
4.479 | 3.758 | 3.316 | 3.077 | 2.959 | 2.891

dbbench command used (on a DB with 4 levels, 12 million keys)-
TEST_TMPDIR=/dev/shm numactl -C 10  ./db_bench.tmp -use_existing_db=true -benchmarks="readseq,multireadrandom" -write_buffer_size=4194304 -target_file_size_base=4194304 -max_bytes_for_level_base=16777216 -num=12000000 -reads=12000000 -duration=90 -threads=1 -compression_type=none -cache_size=4194304000 -batch_size=32 -disable_auto_compactions=true -bloom_bits=10 -cache_index_and_filter_blocks=true -pin_l0_filter_and_index_blocks_in_cache=true -multiread_batched=true -multiread_stride=4
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5011

Differential Revision: D14348703

Pulled By: anand1976

fbshipit-source-id: 774406dab3776d979c809522a67bedac6c17f84b
2019-04-11 14:28:26 -07:00

135 lines
3.7 KiB
C++

// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
#pragma once
#include <algorithm>
#include <memory>
#include <string>
#include <vector>
#include "rocksdb/slice.h"
namespace rocksdb {
const std::vector<Slice> empty_operand_list;
// The merge context for merging a user key.
// When doing a Get(), DB will create such a class and pass it when
// issuing Get() operation to memtables and version_set. The operands
// will be fetched from the context when issuing partial of full merge.
class MergeContext {
public:
// Clear all the operands
void Clear() {
if (operand_list_) {
operand_list_->clear();
copied_operands_->clear();
}
}
// Push a merge operand
void PushOperand(const Slice& operand_slice, bool operand_pinned = false) {
Initialize();
SetDirectionBackward();
if (operand_pinned) {
operand_list_->push_back(operand_slice);
} else {
// We need to have our own copy of the operand since it's not pinned
copied_operands_->emplace_back(
new std::string(operand_slice.data(), operand_slice.size()));
operand_list_->push_back(*copied_operands_->back());
}
}
// Push back a merge operand
void PushOperandBack(const Slice& operand_slice,
bool operand_pinned = false) {
Initialize();
SetDirectionForward();
if (operand_pinned) {
operand_list_->push_back(operand_slice);
} else {
// We need to have our own copy of the operand since it's not pinned
copied_operands_->emplace_back(
new std::string(operand_slice.data(), operand_slice.size()));
operand_list_->push_back(*copied_operands_->back());
}
}
// return total number of operands in the list
size_t GetNumOperands() const {
if (!operand_list_) {
return 0;
}
return operand_list_->size();
}
// Get the operand at the index.
Slice GetOperand(int index) {
assert(operand_list_);
SetDirectionForward();
return (*operand_list_)[index];
}
// Same as GetOperandsDirectionForward
const std::vector<Slice>& GetOperands() {
return GetOperandsDirectionForward();
}
// Return all the operands in the order as they were merged (passed to
// FullMerge or FullMergeV2)
const std::vector<Slice>& GetOperandsDirectionForward() {
if (!operand_list_) {
return empty_operand_list;
}
SetDirectionForward();
return *operand_list_;
}
// Return all the operands in the reversed order relative to how they were
// merged (passed to FullMerge or FullMergeV2)
const std::vector<Slice>& GetOperandsDirectionBackward() {
if (!operand_list_) {
return empty_operand_list;
}
SetDirectionBackward();
return *operand_list_;
}
private:
void Initialize() {
if (!operand_list_) {
operand_list_.reset(new std::vector<Slice>());
copied_operands_.reset(new std::vector<std::unique_ptr<std::string>>());
}
}
void SetDirectionForward() {
if (operands_reversed_ == true) {
std::reverse(operand_list_->begin(), operand_list_->end());
operands_reversed_ = false;
}
}
void SetDirectionBackward() {
if (operands_reversed_ == false) {
std::reverse(operand_list_->begin(), operand_list_->end());
operands_reversed_ = true;
}
}
// List of operands
std::unique_ptr<std::vector<Slice>> operand_list_;
// Copy of operands that are not pinned.
std::unique_ptr<std::vector<std::unique_ptr<std::string>>> copied_operands_;
bool operands_reversed_ = true;
};
} // namespace rocksdb