2016-02-09 15:12:00 -08:00
|
|
|
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
|
2017-07-15 16:03:42 -07:00
|
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
|
|
// (found in the LICENSE.Apache file in the root directory).
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
//
|
|
|
|
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
|
|
|
|
// Use of this source code is governed by a BSD-style license that can be
|
|
|
|
// found in the LICENSE file. See the AUTHORS file for names of contributors.
|
|
|
|
#pragma once
|
2017-01-11 16:42:07 -08:00
|
|
|
#include <atomic>
|
2015-10-16 14:33:47 -07:00
|
|
|
#include <string>
|
2017-01-11 16:42:07 -08:00
|
|
|
#include "port/port.h"
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
#include "rocksdb/env.h"
|
2017-06-13 14:51:22 -07:00
|
|
|
#include "rocksdb/rate_limiter.h"
|
2015-09-11 09:57:02 -07:00
|
|
|
#include "util/aligned_buffer.h"
|
2017-10-31 13:49:25 -07:00
|
|
|
#include "util/sync_point.h"
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
|
|
|
|
namespace rocksdb {
|
2015-08-05 12:11:30 -07:00
|
|
|
|
|
|
|
class Statistics;
|
Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44193
2015-08-13 14:35:54 -07:00
|
|
|
class HistogramImpl;
|
2015-08-05 12:11:30 -07:00
|
|
|
|
2015-08-26 15:25:59 -07:00
|
|
|
std::unique_ptr<RandomAccessFile> NewReadaheadRandomAccessFile(
|
2015-09-11 09:57:02 -07:00
|
|
|
std::unique_ptr<RandomAccessFile>&& file, size_t readahead_size);
|
2015-08-26 15:25:59 -07:00
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
class SequentialFileReader {
|
|
|
|
private:
|
|
|
|
std::unique_ptr<SequentialFile> file_;
|
2018-06-21 08:34:24 -07:00
|
|
|
std::string file_name_;
|
2017-01-11 16:42:07 -08:00
|
|
|
std::atomic<size_t> offset_; // read offset
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
|
|
|
|
public:
|
2018-06-21 08:34:24 -07:00
|
|
|
explicit SequentialFileReader(std::unique_ptr<SequentialFile>&& _file,
|
|
|
|
const std::string& _file_name)
|
|
|
|
: file_(std::move(_file)), file_name_(_file_name), offset_(0) {}
|
2015-09-11 09:57:02 -07:00
|
|
|
|
|
|
|
SequentialFileReader(SequentialFileReader&& o) ROCKSDB_NOEXCEPT {
|
|
|
|
*this = std::move(o);
|
|
|
|
}
|
|
|
|
|
|
|
|
SequentialFileReader& operator=(SequentialFileReader&& o) ROCKSDB_NOEXCEPT {
|
|
|
|
file_ = std::move(o.file_);
|
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
2015-10-16 14:33:47 -07:00
|
|
|
SequentialFileReader(const SequentialFileReader&) = delete;
|
|
|
|
SequentialFileReader& operator=(const SequentialFileReader&) = delete;
|
2015-09-11 09:57:02 -07:00
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
Status Read(size_t n, Slice* result, char* scratch);
|
|
|
|
|
|
|
|
Status Skip(uint64_t n);
|
|
|
|
|
2017-05-10 14:54:35 -07:00
|
|
|
void Rewind();
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
SequentialFile* file() { return file_.get(); }
|
2017-01-11 16:42:07 -08:00
|
|
|
|
2018-06-21 08:34:24 -07:00
|
|
|
std::string file_name() { return file_name_; }
|
|
|
|
|
2017-01-13 12:01:08 -08:00
|
|
|
bool use_direct_io() const { return file_->use_direct_io(); }
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
};
|
|
|
|
|
2015-09-22 18:21:10 -07:00
|
|
|
class RandomAccessFileReader {
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
private:
|
|
|
|
std::unique_ptr<RandomAccessFile> file_;
|
2017-06-28 21:26:03 -07:00
|
|
|
std::string file_name_;
|
2015-09-11 09:57:02 -07:00
|
|
|
Env* env_;
|
|
|
|
Statistics* stats_;
|
|
|
|
uint32_t hist_type_;
|
|
|
|
HistogramImpl* file_read_hist_;
|
2017-06-13 14:51:22 -07:00
|
|
|
RateLimiter* rate_limiter_;
|
|
|
|
bool for_compaction_;
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
|
|
|
|
public:
|
2015-08-05 12:11:30 -07:00
|
|
|
explicit RandomAccessFileReader(std::unique_ptr<RandomAccessFile>&& raf,
|
2017-06-28 21:26:03 -07:00
|
|
|
std::string _file_name,
|
2015-08-05 12:11:30 -07:00
|
|
|
Env* env = nullptr,
|
|
|
|
Statistics* stats = nullptr,
|
Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44193
2015-08-13 14:35:54 -07:00
|
|
|
uint32_t hist_type = 0,
|
2017-06-13 14:51:22 -07:00
|
|
|
HistogramImpl* file_read_hist = nullptr,
|
|
|
|
RateLimiter* rate_limiter = nullptr,
|
|
|
|
bool for_compaction = false)
|
2015-08-05 12:11:30 -07:00
|
|
|
: file_(std::move(raf)),
|
2017-06-28 21:26:03 -07:00
|
|
|
file_name_(std::move(_file_name)),
|
2015-08-05 12:11:30 -07:00
|
|
|
env_(env),
|
|
|
|
stats_(stats),
|
Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44193
2015-08-13 14:35:54 -07:00
|
|
|
hist_type_(hist_type),
|
2017-06-13 14:51:22 -07:00
|
|
|
file_read_hist_(file_read_hist),
|
|
|
|
rate_limiter_(rate_limiter),
|
|
|
|
for_compaction_(for_compaction) {}
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
|
2015-09-11 09:57:02 -07:00
|
|
|
RandomAccessFileReader(RandomAccessFileReader&& o) ROCKSDB_NOEXCEPT {
|
|
|
|
*this = std::move(o);
|
|
|
|
}
|
|
|
|
|
2017-06-13 14:51:22 -07:00
|
|
|
RandomAccessFileReader& operator=(RandomAccessFileReader&& o)
|
|
|
|
ROCKSDB_NOEXCEPT {
|
2015-09-11 09:57:02 -07:00
|
|
|
file_ = std::move(o.file_);
|
|
|
|
env_ = std::move(o.env_);
|
|
|
|
stats_ = std::move(o.stats_);
|
|
|
|
hist_type_ = std::move(o.hist_type_);
|
|
|
|
file_read_hist_ = std::move(o.file_read_hist_);
|
2017-06-13 14:51:22 -07:00
|
|
|
rate_limiter_ = std::move(o.rate_limiter_);
|
|
|
|
for_compaction_ = std::move(o.for_compaction_);
|
2015-09-11 09:57:02 -07:00
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
|
|
|
RandomAccessFileReader(const RandomAccessFileReader&) = delete;
|
|
|
|
RandomAccessFileReader& operator=(const RandomAccessFileReader&) = delete;
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
Status Read(uint64_t offset, size_t n, Slice* result, char* scratch) const;
|
|
|
|
|
2017-04-14 18:43:32 -07:00
|
|
|
Status Prefetch(uint64_t offset, size_t n) const {
|
|
|
|
return file_->Prefetch(offset, n);
|
|
|
|
}
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
RandomAccessFile* file() { return file_.get(); }
|
2017-01-11 16:42:07 -08:00
|
|
|
|
2017-06-28 21:26:03 -07:00
|
|
|
std::string file_name() const { return file_name_; }
|
|
|
|
|
2017-01-13 12:01:08 -08:00
|
|
|
bool use_direct_io() const { return file_->use_direct_io(); }
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
};
|
|
|
|
|
|
|
|
// Use posix write to write data to a file.
|
|
|
|
class WritableFileWriter {
|
|
|
|
private:
|
|
|
|
std::unique_ptr<WritableFile> writable_file_;
|
2015-09-11 09:57:02 -07:00
|
|
|
AlignedBuffer buf_;
|
2015-10-29 22:10:25 -07:00
|
|
|
size_t max_buffer_size_;
|
2015-09-11 09:57:02 -07:00
|
|
|
// Actually written data size can be used for truncate
|
|
|
|
// not counting padding data
|
|
|
|
uint64_t filesize_;
|
2017-06-12 06:32:01 -07:00
|
|
|
#ifndef ROCKSDB_LITE
|
2015-09-11 09:57:02 -07:00
|
|
|
// This is necessary when we use unbuffered access
|
|
|
|
// and writes must happen on aligned offsets
|
|
|
|
// so we need to go back and write that page again
|
|
|
|
uint64_t next_write_offset_;
|
2017-06-12 06:32:01 -07:00
|
|
|
#endif // ROCKSDB_LITE
|
2015-09-11 09:57:02 -07:00
|
|
|
bool pending_sync_;
|
|
|
|
uint64_t last_sync_size_;
|
|
|
|
uint64_t bytes_per_sync_;
|
|
|
|
RateLimiter* rate_limiter_;
|
2017-03-02 17:40:24 -08:00
|
|
|
Statistics* stats_;
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
|
|
|
|
public:
|
2015-09-11 09:57:02 -07:00
|
|
|
WritableFileWriter(std::unique_ptr<WritableFile>&& file,
|
2017-03-02 17:40:24 -08:00
|
|
|
const EnvOptions& options, Statistics* stats = nullptr)
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
: writable_file_(std::move(file)),
|
2015-09-11 09:57:02 -07:00
|
|
|
buf_(),
|
2015-10-29 22:10:25 -07:00
|
|
|
max_buffer_size_(options.writable_file_max_buffer_size),
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
filesize_(0),
|
2017-06-12 06:32:01 -07:00
|
|
|
#ifndef ROCKSDB_LITE
|
2015-09-11 09:57:02 -07:00
|
|
|
next_write_offset_(0),
|
2017-06-12 06:32:01 -07:00
|
|
|
#endif // ROCKSDB_LITE
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
pending_sync_(false),
|
|
|
|
last_sync_size_(0),
|
|
|
|
bytes_per_sync_(options.bytes_per_sync),
|
2017-03-02 17:40:24 -08:00
|
|
|
rate_limiter_(options.rate_limiter),
|
|
|
|
stats_(stats) {
|
2017-10-31 13:49:25 -07:00
|
|
|
TEST_SYNC_POINT_CALLBACK("WritableFileWriter::WritableFileWriter:0",
|
|
|
|
reinterpret_cast<void*>(max_buffer_size_));
|
2015-09-11 17:36:48 -07:00
|
|
|
buf_.Alignment(writable_file_->GetRequiredBufferAlignment());
|
2017-06-13 04:34:51 -07:00
|
|
|
buf_.AllocateNewBuffer(std::min((size_t)65536, max_buffer_size_));
|
2015-09-11 09:57:02 -07:00
|
|
|
}
|
|
|
|
|
|
|
|
WritableFileWriter(const WritableFileWriter&) = delete;
|
|
|
|
|
|
|
|
WritableFileWriter& operator=(const WritableFileWriter&) = delete;
|
|
|
|
|
|
|
|
~WritableFileWriter() { Close(); }
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
|
|
|
|
Status Append(const Slice& data);
|
|
|
|
|
2018-03-26 20:14:24 -07:00
|
|
|
Status Pad(const size_t pad_bytes);
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
Status Flush();
|
|
|
|
|
|
|
|
Status Close();
|
|
|
|
|
|
|
|
Status Sync(bool use_fsync);
|
|
|
|
|
[wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 06:06:39 -07:00
|
|
|
// Sync only the data that was already Flush()ed. Safe to call concurrently
|
|
|
|
// with Append() and Flush(). If !writable_file_->IsSyncThreadSafe(),
|
|
|
|
// returns NotSupported status.
|
|
|
|
Status SyncWithoutFlush(bool use_fsync);
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
uint64_t GetFileSize() { return filesize_; }
|
|
|
|
|
|
|
|
Status InvalidateCache(size_t offset, size_t length) {
|
|
|
|
return writable_file_->InvalidateCache(offset, length);
|
|
|
|
}
|
|
|
|
|
|
|
|
WritableFile* writable_file() const { return writable_file_.get(); }
|
|
|
|
|
2017-01-13 12:01:08 -08:00
|
|
|
bool use_direct_io() { return writable_file_->use_direct_io(); }
|
2017-01-11 16:42:07 -08:00
|
|
|
|
2018-05-14 10:53:32 -07:00
|
|
|
bool TEST_BufferIsEmpty() { return buf_.CurrentSize() == 0; }
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
private:
|
2015-09-11 09:57:02 -07:00
|
|
|
// Used when os buffering is OFF and we are writing
|
2016-12-22 12:51:29 -08:00
|
|
|
// DMA such as in Direct I/O mode
|
2017-02-16 10:25:06 -08:00
|
|
|
#ifndef ROCKSDB_LITE
|
2016-12-22 12:51:29 -08:00
|
|
|
Status WriteDirect();
|
2017-02-16 10:25:06 -08:00
|
|
|
#endif // !ROCKSDB_LITE
|
2015-09-11 09:57:02 -07:00
|
|
|
// Normal write
|
|
|
|
Status WriteBuffered(const char* data, size_t size);
|
2015-11-10 17:03:42 -08:00
|
|
|
Status RangeSync(uint64_t offset, uint64_t nbytes);
|
[wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 06:06:39 -07:00
|
|
|
Status SyncInternal(bool use_fsync);
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
};
|
2015-10-16 14:33:47 -07:00
|
|
|
|
Improve direct IO range scan performance with readahead (#3884)
Summary:
This PR extends the improvements in #3282 to also work when using Direct IO.
We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash.
**Description:**
This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead and prefetching additional data on each disk IO, and storing in a local buffer. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan.
**Implementation Details:**
- Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead.
- `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled.
- `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer.
- Made sure not to re-read partial chunks of data that were already available in the buffer, from device again.
- Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date.
**Constraints:**
- Similar to #3282, this gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).
- Since the prefetched data is stored in a temporary buffer allocated on heap, this could increase the memory usage if you have many iterators doing long range scans simultaneously.
- Enabled only for user reads, and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them.
**Benchmarks:**
I used the same benchmark as used in #3282.
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan: Seekrandom with large number of nexts
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```
```
Before:
seekrandom : 37939.906 micros/op 26 ops/sec; 29.2 MB/s (1636 of 1999 found)
With this change:
seekrandom : 8527.720 micros/op 117 ops/sec; 129.7 MB/s (6530 of 7999 found)
```
~4.5X perf improvement. Taken on an average of 3 runs.
Closes https://github.com/facebook/rocksdb/pull/3884
Differential Revision: D8082143
Pulled By: sagar0
fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb
2018-06-21 11:02:49 -07:00
|
|
|
// FilePrefetchBuffer can automatically do the readahead if file_reader,
|
|
|
|
// readahead_size, and max_readahead_size are passed in.
|
|
|
|
// max_readahead_size should be greater than or equal to readahead_size.
|
|
|
|
// readahead_size will be doubled on every IO, until max_readahead_size.
|
2017-08-11 11:59:13 -07:00
|
|
|
class FilePrefetchBuffer {
|
|
|
|
public:
|
Improve direct IO range scan performance with readahead (#3884)
Summary:
This PR extends the improvements in #3282 to also work when using Direct IO.
We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash.
**Description:**
This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead and prefetching additional data on each disk IO, and storing in a local buffer. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan.
**Implementation Details:**
- Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead.
- `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled.
- `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer.
- Made sure not to re-read partial chunks of data that were already available in the buffer, from device again.
- Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date.
**Constraints:**
- Similar to #3282, this gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).
- Since the prefetched data is stored in a temporary buffer allocated on heap, this could increase the memory usage if you have many iterators doing long range scans simultaneously.
- Enabled only for user reads, and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them.
**Benchmarks:**
I used the same benchmark as used in #3282.
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan: Seekrandom with large number of nexts
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```
```
Before:
seekrandom : 37939.906 micros/op 26 ops/sec; 29.2 MB/s (1636 of 1999 found)
With this change:
seekrandom : 8527.720 micros/op 117 ops/sec; 129.7 MB/s (6530 of 7999 found)
```
~4.5X perf improvement. Taken on an average of 3 runs.
Closes https://github.com/facebook/rocksdb/pull/3884
Differential Revision: D8082143
Pulled By: sagar0
fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb
2018-06-21 11:02:49 -07:00
|
|
|
FilePrefetchBuffer(RandomAccessFileReader* file_reader = nullptr,
|
|
|
|
size_t readadhead_size = 0, size_t max_readahead_size = 0)
|
|
|
|
: buffer_offset_(0),
|
|
|
|
file_reader_(file_reader),
|
|
|
|
readahead_size_(readadhead_size),
|
|
|
|
max_readahead_size_(max_readahead_size) {}
|
2017-08-11 11:59:13 -07:00
|
|
|
Status Prefetch(RandomAccessFileReader* reader, uint64_t offset, size_t n);
|
Improve direct IO range scan performance with readahead (#3884)
Summary:
This PR extends the improvements in #3282 to also work when using Direct IO.
We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash.
**Description:**
This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead and prefetching additional data on each disk IO, and storing in a local buffer. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan.
**Implementation Details:**
- Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead.
- `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled.
- `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer.
- Made sure not to re-read partial chunks of data that were already available in the buffer, from device again.
- Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date.
**Constraints:**
- Similar to #3282, this gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).
- Since the prefetched data is stored in a temporary buffer allocated on heap, this could increase the memory usage if you have many iterators doing long range scans simultaneously.
- Enabled only for user reads, and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them.
**Benchmarks:**
I used the same benchmark as used in #3282.
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan: Seekrandom with large number of nexts
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```
```
Before:
seekrandom : 37939.906 micros/op 26 ops/sec; 29.2 MB/s (1636 of 1999 found)
With this change:
seekrandom : 8527.720 micros/op 117 ops/sec; 129.7 MB/s (6530 of 7999 found)
```
~4.5X perf improvement. Taken on an average of 3 runs.
Closes https://github.com/facebook/rocksdb/pull/3884
Differential Revision: D8082143
Pulled By: sagar0
fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb
2018-06-21 11:02:49 -07:00
|
|
|
bool TryReadFromCache(uint64_t offset, size_t n, Slice* result);
|
2017-08-11 11:59:13 -07:00
|
|
|
|
|
|
|
private:
|
|
|
|
AlignedBuffer buffer_;
|
|
|
|
uint64_t buffer_offset_;
|
Improve direct IO range scan performance with readahead (#3884)
Summary:
This PR extends the improvements in #3282 to also work when using Direct IO.
We see **4.5X performance improvement** in seekrandom benchmark doing long range scans, when using direct reads, on flash.
**Description:**
This change improves the performance of iterators doing long range scans (e.g. big/full index or table scans in MyRocks) by using readahead and prefetching additional data on each disk IO, and storing in a local buffer. This prefetching is automatically enabled on noticing more than 2 IOs for the same table file during iteration. The readahead size starts with 8KB and is exponentially increased on each additional sequential IO, up to a max of 256 KB. This helps in cutting down the number of IOs needed to complete the range scan.
**Implementation Details:**
- Used `FilePrefetchBuffer` as the underlying buffer to store the readahead data. `FilePrefetchBuffer` can now take file_reader, readahead_size and max_readahead_size as input to the constructor, and automatically do readahead.
- `FilePrefetchBuffer::TryReadFromCache` can now call `FilePrefetchBuffer::Prefetch` if readahead is enabled.
- `AlignedBuffer` (which is the underlying store for `FilePrefetchBuffer`) now takes a few additional args in `AlignedBuffer::AllocateNewBuffer` to allow copying data from the old buffer.
- Made sure not to re-read partial chunks of data that were already available in the buffer, from device again.
- Fixed a couple of cases where `AlignedBuffer::cursize_` was not being properly kept up-to-date.
**Constraints:**
- Similar to #3282, this gets currently enabled only when ReadOptions.readahead_size = 0 (which is the default value).
- Since the prefetched data is stored in a temporary buffer allocated on heap, this could increase the memory usage if you have many iterators doing long range scans simultaneously.
- Enabled only for user reads, and disabled for compactions. Compaction reads are controlled by the options `use_direct_io_for_flush_and_compaction` and `compaction_readahead_size`, and the current feature takes precautions not to mess with them.
**Benchmarks:**
I used the same benchmark as used in #3282.
Data fill:
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=fillrandom -num=1000000000 -compression_type="none" -level_compaction_dynamic_level_bytes
```
Do a long range scan: Seekrandom with large number of nexts
```
TEST_TMPDIR=/data/users/$USER/benchmarks/iter ./db_bench -benchmarks=seekrandom -use_direct_reads -duration=60 -num=1000000000 -use_existing_db -seek_nexts=10000 -statistics -histogram
```
```
Before:
seekrandom : 37939.906 micros/op 26 ops/sec; 29.2 MB/s (1636 of 1999 found)
With this change:
seekrandom : 8527.720 micros/op 117 ops/sec; 129.7 MB/s (6530 of 7999 found)
```
~4.5X perf improvement. Taken on an average of 3 runs.
Closes https://github.com/facebook/rocksdb/pull/3884
Differential Revision: D8082143
Pulled By: sagar0
fbshipit-source-id: 4d7a8561cbac03478663713df4d31ad2620253bb
2018-06-21 11:02:49 -07:00
|
|
|
RandomAccessFileReader* file_reader_;
|
|
|
|
size_t readahead_size_;
|
|
|
|
size_t max_readahead_size_;
|
2017-08-11 11:59:13 -07:00
|
|
|
};
|
|
|
|
|
2015-10-16 14:33:47 -07:00
|
|
|
extern Status NewWritableFile(Env* env, const std::string& fname,
|
|
|
|
unique_ptr<WritableFile>* result,
|
|
|
|
const EnvOptions& options);
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:16:11 -07:00
|
|
|
} // namespace rocksdb
|