fc9d4071f0
Summary:
Fixes a major performance regression in 6.26, where extra CPU is spent in SliceTransform::AsString when reads involve a prefix_extractor (Get, MultiGet, Seek). Common case performance is now better than 6.25.

This change creates a "fast path" for verifying that the current prefix extractor is unchanged and compatible with what was used to generate a table file. This fast path detects the common case by pointer comparison on the current prefix_extractor and a "known good" prefix extractor (if applicable) that is saved at the time the table reader is opened. The "known good" prefix extractor is saved as another shared_ptr copy (in an existing field, however) to ensure the pointer is not recycled.

When the prefix_extractor has changed to a different instance but same compatible configuration (rare, odd), performance is still a regression compared to 6.25, but this is likely acceptable because of the oddity of such a case. The performance of incompatible prefix_extractor is essentially unchanged.

Also fixed a minor case (ForwardIterator) where a prefix_extractor could be used via a raw pointer after being freed as a shared_ptr, if replaced via SetOptions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9407

Test Plan:
## Performance
Populate DB with
`TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12`

Running head-to-head comparisons simultaneously with
`TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=seekrandom -num=10000000 -duration=20 -disable_wal=1 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12`

Below each is compared by ops/sec vs. baseline which is version 6.25 (multiple baseline runs because of variable machine load)
v6.26: 4833 vs. 6698 (<- major regression!)
v6.27: 4737 vs. 6397 (still)
New: 6704 vs. 6461 (better than baseline in common case)
Disabled fastpath: 4843 vs. 6389 (e.g. if prefix extractor instance changes but is still compatible)
Changed prefix size (no usable filter) in new: 787 vs. 5927
Changed prefix size (no usable filter) in new & baseline: 773 vs. 784

Reviewed By: mrambacher
Differential Revision: D33677812
Pulled By: pdillinger
fbshipit-source-id: 571d9711c461fb97f957378a061b7e7dbc4d6a76
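The fast path described in the summary can be illustrated with a small, self-contained sketch. This is not the RocksDB implementation: `ToyExtractor`, `ToyTableReader`, `built_with`, `known_good`, and the `slow_checks` counter are all hypothetical names invented for illustration. The sketch only assumes what the summary states: a pointer comparison against a saved "known good" extractor handles the common case, a `shared_ptr` copy keeps that address from being recycled, and a string comparison remains as the slower fallback.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a prefix extractor (not the RocksDB class).
struct ToyExtractor {
  std::string config;  // e.g. "fixed:12"
  // Stands in for the comparatively expensive string rendering.
  std::string AsString() const { return config; }
};

// Hypothetical per-table state captured when the table reader is opened.
struct ToyTableReader {
  std::string built_with;  // configuration the table file was generated with
  // Holding a shared_ptr copy keeps the object alive, so its address cannot
  // be reused by a different extractor while this reader exists.
  std::shared_ptr<const ToyExtractor> known_good;

  ToyTableReader(std::string cfg, std::shared_ptr<const ToyExtractor> current)
      : built_with(std::move(cfg)) {
    // Save the extractor only if it is compatible at open time.
    if (current && current->AsString() == built_with) {
      known_good = std::move(current);
    }
  }

  // Returns true if `current` is compatible with the table file.
  // `slow_checks` counts how often the string comparison was needed.
  bool IsCompatible(const ToyExtractor* current, int* slow_checks) const {
    if (current == nullptr) return false;
    if (current == known_good.get()) return true;  // fast path: same instance
    ++*slow_checks;                                // slow path: compare config
    return current->AsString() == built_with;
  }
};
```

In this sketch, a reader opened with extractor `a` answers `IsCompatible(a.get(), ...)` without touching `AsString()`. A second instance with the same configuration is still reported compatible but pays for the string comparison, which corresponds to the "different instance but same compatible configuration" case in the summary.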
165 lines
5.7 KiB
C++
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#pragma once

#ifndef ROCKSDB_LITE

#include <string>
#include <vector>
#include <queue>

#include "memory/arena.h"
#include "rocksdb/db.h"
#include "rocksdb/iterator.h"
#include "rocksdb/options.h"
#include "table/internal_iterator.h"

namespace ROCKSDB_NAMESPACE {

class DBImpl;
class Env;
struct SuperVersion;
class ColumnFamilyData;
class ForwardLevelIterator;
class VersionStorageInfo;
struct FileMetaData;

class MinIterComparator {
 public:
  explicit MinIterComparator(const Comparator* comparator) :
    comparator_(comparator) {}

  bool operator()(InternalIterator* a, InternalIterator* b) {
    return comparator_->Compare(a->key(), b->key()) > 0;
  }
 private:
  const Comparator* comparator_;
};

using MinIterHeap =
    std::priority_queue<InternalIterator*, std::vector<InternalIterator*>,
                        MinIterComparator>;

/**
 * ForwardIterator is a special type of iterator that only supports Seek()
 * and Next(). It is expected to perform better than TailingIterator by
 * removing the encapsulation and making all information accessible within
 * the iterator. In the current implementation, a snapshot is taken at the
 * time Seek() is called. Subsequent Next() calls do not see values written
 * after that point.
 */
class ForwardIterator : public InternalIterator {
 public:
  ForwardIterator(DBImpl* db, const ReadOptions& read_options,
                  ColumnFamilyData* cfd, SuperVersion* current_sv = nullptr,
                  bool allow_unprepared_value = false);
  virtual ~ForwardIterator();

  void SeekForPrev(const Slice& /*target*/) override {
    status_ = Status::NotSupported("ForwardIterator::SeekForPrev()");
    valid_ = false;
  }
  void SeekToLast() override {
    status_ = Status::NotSupported("ForwardIterator::SeekToLast()");
    valid_ = false;
  }
  void Prev() override {
    status_ = Status::NotSupported("ForwardIterator::Prev");
    valid_ = false;
  }

  virtual bool Valid() const override;
  void SeekToFirst() override;
  virtual void Seek(const Slice& target) override;
  virtual void Next() override;
  virtual Slice key() const override;
  virtual Slice value() const override;
  virtual Status status() const override;
  virtual bool PrepareValue() override;
  virtual Status GetProperty(std::string prop_name, std::string* prop) override;
  virtual void SetPinnedItersMgr(
      PinnedIteratorsManager* pinned_iters_mgr) override;
  virtual bool IsKeyPinned() const override;
  virtual bool IsValuePinned() const override;

  bool TEST_CheckDeletedIters(int* deleted_iters, int* num_iters);

 private:
  void Cleanup(bool release_sv);
  // Unreference and, if needed, clean up the current SuperVersion. This is
  // either done immediately or deferred until this iterator is unpinned by
  // PinnedIteratorsManager.
  void SVCleanup();
  static void SVCleanup(
      DBImpl* db, SuperVersion* sv, bool background_purge_on_iterator_cleanup);
  static void DeferredSVCleanup(void* arg);

  void RebuildIterators(bool refresh_sv);
  void RenewIterators();
  void BuildLevelIterators(const VersionStorageInfo* vstorage,
                           SuperVersion* sv);
  void ResetIncompleteIterators();
  void SeekInternal(const Slice& internal_key, bool seek_to_first);
  void UpdateCurrent();
  bool NeedToSeekImmutable(const Slice& internal_key);
  void DeleteCurrentIter();
  uint32_t FindFileInRange(
      const std::vector<FileMetaData*>& files, const Slice& internal_key,
      uint32_t left, uint32_t right);

  bool IsOverUpperBound(const Slice& internal_key) const;

  // Set PinnedIteratorsManager for all child iterators. This function should
  // be called whenever we update the child iterators or pinned_iters_mgr_.
  void UpdateChildrenPinnedItersMgr();

  // A helper function that will release iter in the proper manner, or pass it
  // to pinned_iters_mgr_ to release it later if pinning is enabled.
  void DeleteIterator(InternalIterator* iter, bool is_arena = false);

  DBImpl* const db_;
  const ReadOptions read_options_;
  ColumnFamilyData* const cfd_;
  const SliceTransform* const prefix_extractor_;
  const Comparator* user_comparator_;
  const bool allow_unprepared_value_;
  MinIterHeap immutable_min_heap_;

  SuperVersion* sv_;
  InternalIterator* mutable_iter_;
  std::vector<InternalIterator*> imm_iters_;
  std::vector<InternalIterator*> l0_iters_;
  std::vector<ForwardLevelIterator*> level_iters_;
  InternalIterator* current_;
  bool valid_;

  // Internal iterator status; set only by one of the unsupported methods.
  Status status_;
  // Status of immutable iterators, maintained here to avoid iterating over
  // all of them in status().
  Status immutable_status_;
  // Indicates that at least one of the immutable iterators pointed to a key
  // larger than iterate_upper_bound and was therefore destroyed. Seek() may
  // need to rebuild such iterators.
  bool has_iter_trimmed_for_upper_bound_;
  // Is current key larger than iterate_upper_bound? If so, makes Valid()
  // return false.
  bool current_over_upper_bound_;

  // Left endpoint of the range of keys that immutable iterators currently
  // cover. When Seek() is called with a key that's within that range, immutable
  // iterators don't need to be moved; see NeedToSeekImmutable(). This key is
  // included in the range after a Seek(), but excluded when advancing the
  // iterator using Next().
  IterKey prev_key_;
  bool is_prev_set_;
  bool is_prev_inclusive_;

  PinnedIteratorsManager* pinned_iters_mgr_;
  Arena arena_;
};

}  // namespace ROCKSDB_NAMESPACE

#endif  // ROCKSDB_LITE
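
The `MinIterComparator` / `MinIterHeap` pair in the header relies on a standard trick: `std::priority_queue` is a max-heap under its comparator, so a comparator that returns "a is greater than b" makes the smallest key surface at `top()`. The following is a minimal standalone sketch of that idea using only the standard library. `ToyIter`, `ToyMinComparator`, `ToyMinHeap`, and `DrainInOrder` are hypothetical names for illustration, and plain `std::string` comparison stands in for the RocksDB `Comparator`.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for an iterator positioned on a key.
struct ToyIter {
  std::string current_key;
  const std::string& key() const { return current_key; }
};

// Same shape as MinIterComparator: returning "a > b" inverts the default
// max-heap behavior of std::priority_queue, so top() is the smallest key.
struct ToyMinComparator {
  bool operator()(const ToyIter* a, const ToyIter* b) const {
    return a->key().compare(b->key()) > 0;
  }
};

using ToyMinHeap =
    std::priority_queue<ToyIter*, std::vector<ToyIter*>, ToyMinComparator>;

// Pushes every iterator onto the heap, then pops them all, returning the
// keys in pop order.
std::vector<std::string> DrainInOrder(std::vector<ToyIter>& iters) {
  ToyMinHeap heap;
  for (ToyIter& it : iters) heap.push(&it);
  std::vector<std::string> out;
  while (!heap.empty()) {
    out.push_back(heap.top()->key());
    heap.pop();
  }
  return out;
}
```

With iterators positioned on "m", "a", "z", and "c", `DrainInOrder` yields the keys in ascending order. This is the property a merging iterator needs when choosing which child iterator holds the next smallest key.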