Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
|
|
|
|
// This source code is licensed under the BSD-style license found in the
|
|
|
|
// LICENSE file in the root directory of this source tree. An additional grant
|
|
|
|
// of patent rights can be found in the PATENTS file in the same directory.
|
|
|
|
//
|
|
|
|
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
|
|
|
|
// Use of this source code is governed by a BSD-style license that can be
|
|
|
|
// found in the LICENSE file. See the AUTHORS file for names of contributors.
|
|
|
|
|
|
|
|
#include "util/file_reader_writer.h"
|
|
|
|
|
|
|
|
#include <algorithm>
|
2015-08-27 00:25:59 +02:00
|
|
|
#include <mutex>
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
#include "port/port.h"
|
Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44193
2015-08-13 23:35:54 +02:00
|
|
|
#include "util/histogram.h"
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
#include "util/iostats_context_imp.h"
|
|
|
|
#include "util/random.h"
|
|
|
|
#include "util/rate_limiter.h"
|
|
|
|
#include "util/sync_point.h"
|
|
|
|
|
|
|
|
namespace rocksdb {
|
|
|
|
Status SequentialFileReader::Read(size_t n, Slice* result, char* scratch) {
|
|
|
|
Status s = file_->Read(n, result, scratch);
|
|
|
|
IOSTATS_ADD(bytes_read, result->size());
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
Status SequentialFileReader::Skip(uint64_t n) { return file_->Skip(n); }
|
|
|
|
|
|
|
|
Status RandomAccessFileReader::Read(uint64_t offset, size_t n, Slice* result,
|
|
|
|
char* scratch) const {
|
Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44193
2015-08-13 23:35:54 +02:00
|
|
|
Status s;
|
|
|
|
uint64_t elapsed = 0;
|
|
|
|
{
|
|
|
|
StopWatch sw(env_, stats_, hist_type_,
|
|
|
|
(stats_ != nullptr) ? &elapsed : nullptr);
|
|
|
|
IOSTATS_TIMER_GUARD(read_nanos);
|
|
|
|
s = file_->Read(offset, n, result, scratch);
|
|
|
|
IOSTATS_ADD_IF_POSITIVE(bytes_read, result->size());
|
|
|
|
}
|
|
|
|
if (stats_ != nullptr && file_read_hist_ != nullptr) {
|
|
|
|
file_read_hist_->Add(elapsed);
|
|
|
|
}
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
Status WritableFileWriter::Append(const Slice& data) {
|
|
|
|
const char* src = data.data();
|
|
|
|
size_t left = data.size();
|
|
|
|
Status s;
|
|
|
|
pending_sync_ = true;
|
|
|
|
pending_fsync_ = true;
|
|
|
|
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds * REDUCE_ODDS2);
|
|
|
|
|
Add options.compaction_measure_io_stats to print write I/O stats in compactions
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
Add two more counters in iostats_context.
Also add a parameter of db_bench.
Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44115
2015-08-13 02:24:45 +02:00
|
|
|
{
|
|
|
|
IOSTATS_TIMER_GUARD(prepare_write_nanos);
|
|
|
|
TEST_SYNC_POINT("WritableFileWriter::Append:BeforePrepareWrite");
|
|
|
|
writable_file_->PrepareWrite(static_cast<size_t>(GetFileSize()), left);
|
|
|
|
}
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
// if there is no space in the cache, then flush
|
|
|
|
if (cursize_ + left > capacity_) {
|
|
|
|
s = Flush();
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
// Increase the buffer size, but capped at 1MB
|
|
|
|
if (capacity_ < (1 << 20)) {
|
|
|
|
capacity_ *= 2;
|
|
|
|
buf_.reset(new char[capacity_]);
|
|
|
|
}
|
|
|
|
assert(cursize_ == 0);
|
|
|
|
}
|
|
|
|
|
|
|
|
// if the write fits into the cache, then write to cache
|
|
|
|
// otherwise do a write() syscall to write to OS buffers.
|
|
|
|
if (cursize_ + left <= capacity_) {
|
|
|
|
memcpy(buf_.get() + cursize_, src, left);
|
|
|
|
cursize_ += left;
|
|
|
|
} else {
|
|
|
|
while (left != 0) {
|
|
|
|
size_t size = RequestToken(left);
|
|
|
|
{
|
|
|
|
IOSTATS_TIMER_GUARD(write_nanos);
|
|
|
|
s = writable_file_->Append(Slice(src, size));
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
IOSTATS_ADD(bytes_written, size);
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds);
|
|
|
|
|
|
|
|
left -= size;
|
|
|
|
src += size;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds);
|
|
|
|
filesize_ += data.size();
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
|
|
|
|
Status WritableFileWriter::Close() {
|
|
|
|
Status s;
|
|
|
|
s = Flush(); // flush cache to OS
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds);
|
|
|
|
return writable_file_->Close();
|
|
|
|
}
|
|
|
|
|
|
|
|
// write out the cached data to the OS cache
|
|
|
|
Status WritableFileWriter::Flush() {
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds * REDUCE_ODDS2);
|
|
|
|
size_t left = cursize_;
|
|
|
|
char* src = buf_.get();
|
|
|
|
while (left != 0) {
|
|
|
|
size_t size = RequestToken(left);
|
|
|
|
{
|
|
|
|
IOSTATS_TIMER_GUARD(write_nanos);
|
Add options.compaction_measure_io_stats to print write I/O stats in compactions
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
Add two more counters in iostats_context.
Also add a parameter of db_bench.
Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44115
2015-08-13 02:24:45 +02:00
|
|
|
TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend");
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
Status s = writable_file_->Append(Slice(src, size));
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
IOSTATS_ADD(bytes_written, size);
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds * REDUCE_ODDS2);
|
|
|
|
left -= size;
|
|
|
|
src += size;
|
|
|
|
}
|
|
|
|
cursize_ = 0;
|
|
|
|
|
|
|
|
writable_file_->Flush();
|
|
|
|
|
|
|
|
// sync OS cache to disk for every bytes_per_sync_
|
|
|
|
// TODO: give log file and sst file different options (log
|
|
|
|
// files could be potentially cached in OS for their whole
|
|
|
|
// life time, thus we might not want to flush at all).
|
RangeSync not to sync last 1MB of the file
Summary:
From other ones' investigation:
"sync_file_range() behavior highly depends on kernel version and filesystem.
xfs does neighbor page flushing outside of the specified ranges. For example, sync_file_range(fd, 8192, 16384) does not only trigger flushing page #3 to #4, but also flushing many more dirty pages (i.e. up to page#16)... Ranges of the sync_file_range() should be far enough from write() offset (at least 1MB)."
Test Plan: make all check
Reviewers: igor, rven, kradhakrishnan, yhchiang, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: yoshinorim, MarkCallaghan, sumeet, domas, dhruba, leveldb, ljin
Differential Revision: https://reviews.facebook.net/D15807
2015-07-20 23:46:15 +02:00
|
|
|
|
|
|
|
// We try to avoid sync to the last 1MB of data. For two reasons:
|
|
|
|
// (1) avoid rewrite the same page that is modified later.
|
|
|
|
// (2) for older version of OS, write can block while writing out
|
|
|
|
// the page.
|
|
|
|
// Xfs does neighbor page flushing outside of the specified ranges. We
|
|
|
|
// need to make sure sync range is far from the write offset.
|
2015-08-26 15:28:09 +02:00
|
|
|
if (!direct_io_ && bytes_per_sync_) {
|
RangeSync not to sync last 1MB of the file
Summary:
From other ones' investigation:
"sync_file_range() behavior highly depends on kernel version and filesystem.
xfs does neighbor page flushing outside of the specified ranges. For example, sync_file_range(fd, 8192, 16384) does not only trigger flushing page #3 to #4, but also flushing many more dirty pages (i.e. up to page#16)... Ranges of the sync_file_range() should be far enough from write() offset (at least 1MB)."
Test Plan: make all check
Reviewers: igor, rven, kradhakrishnan, yhchiang, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: yoshinorim, MarkCallaghan, sumeet, domas, dhruba, leveldb, ljin
Differential Revision: https://reviews.facebook.net/D15807
2015-07-20 23:46:15 +02:00
|
|
|
uint64_t kBytesNotSyncRange = 1024 * 1024; // recent 1MB is not synced.
|
|
|
|
uint64_t kBytesAlignWhenSync = 4 * 1024; // Align 4KB.
|
|
|
|
if (filesize_ > kBytesNotSyncRange) {
|
|
|
|
uint64_t offset_sync_to = filesize_ - kBytesNotSyncRange;
|
|
|
|
offset_sync_to -= offset_sync_to % kBytesAlignWhenSync;
|
|
|
|
assert(offset_sync_to >= last_sync_size_);
|
|
|
|
if (offset_sync_to > 0 &&
|
|
|
|
offset_sync_to - last_sync_size_ >= bytes_per_sync_) {
|
|
|
|
RangeSync(last_sync_size_, offset_sync_to - last_sync_size_);
|
|
|
|
last_sync_size_ = offset_sync_to;
|
|
|
|
}
|
|
|
|
}
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
|
|
|
|
Status WritableFileWriter::Sync(bool use_fsync) {
|
|
|
|
Status s = Flush();
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds);
|
2015-08-26 15:28:09 +02:00
|
|
|
if (!direct_io_ && pending_sync_) {
|
[wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 15:06:39 +02:00
|
|
|
s = SyncInternal(use_fsync);
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
TEST_KILL_RANDOM(rocksdb_kill_odds);
|
|
|
|
pending_sync_ = false;
|
|
|
|
if (use_fsync) {
|
|
|
|
pending_fsync_ = false;
|
|
|
|
}
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
|
[wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 15:06:39 +02:00
|
|
|
Status WritableFileWriter::SyncWithoutFlush(bool use_fsync) {
|
|
|
|
if (!writable_file_->IsSyncThreadSafe()) {
|
|
|
|
return Status::NotSupported(
|
|
|
|
"Can't WritableFileWriter::SyncWithoutFlush() because "
|
|
|
|
"WritableFile::IsSyncThreadSafe() is false");
|
|
|
|
}
|
2015-08-05 20:56:19 +02:00
|
|
|
TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:1");
|
|
|
|
Status s = SyncInternal(use_fsync);
|
|
|
|
TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:2");
|
|
|
|
return s;
|
[wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 15:06:39 +02:00
|
|
|
}
|
|
|
|
|
|
|
|
Status WritableFileWriter::SyncInternal(bool use_fsync) {
|
|
|
|
Status s;
|
Add options.compaction_measure_io_stats to print write I/O stats in compactions
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
Add two more counters in iostats_context.
Also add a parameter of db_bench.
Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44115
2015-08-13 02:24:45 +02:00
|
|
|
IOSTATS_TIMER_GUARD(fsync_nanos);
|
|
|
|
TEST_SYNC_POINT("WritableFileWriter::SyncInternal:0");
|
[wal changes 3/3] method in DB to sync WAL without blocking writers
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
2015-08-05 15:06:39 +02:00
|
|
|
if (use_fsync) {
|
|
|
|
s = writable_file_->Fsync();
|
|
|
|
} else {
|
|
|
|
s = writable_file_->Sync();
|
|
|
|
}
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
Status WritableFileWriter::RangeSync(off_t offset, off_t nbytes) {
|
|
|
|
IOSTATS_TIMER_GUARD(range_sync_nanos);
|
Add options.compaction_measure_io_stats to print write I/O stats in compactions
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
Add two more counters in iostats_context.
Also add a parameter of db_bench.
Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44115
2015-08-13 02:24:45 +02:00
|
|
|
TEST_SYNC_POINT("WritableFileWriter::RangeSync:0");
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
return writable_file_->RangeSync(offset, nbytes);
|
|
|
|
}
|
|
|
|
|
|
|
|
size_t WritableFileWriter::RequestToken(size_t bytes) {
|
|
|
|
Env::IOPriority io_priority;
|
|
|
|
if (rate_limiter_&&(io_priority = writable_file_->GetIOPriority()) <
|
|
|
|
Env::IO_TOTAL) {
|
|
|
|
bytes = std::min(bytes,
|
|
|
|
static_cast<size_t>(rate_limiter_->GetSingleBurstBytes()));
|
|
|
|
rate_limiter_->Request(bytes, io_priority);
|
|
|
|
}
|
|
|
|
return bytes;
|
|
|
|
}
|
2015-08-27 00:25:59 +02:00
|
|
|
|
|
|
|
namespace {
|
|
|
|
class ReadaheadRandomAccessFile : public RandomAccessFile {
|
|
|
|
public:
|
|
|
|
ReadaheadRandomAccessFile(std::unique_ptr<RandomAccessFile> file,
|
|
|
|
size_t readahead_size)
|
|
|
|
: file_(std::move(file)),
|
|
|
|
readahead_size_(readahead_size),
|
|
|
|
buffer_(new char[readahead_size_]),
|
|
|
|
buffer_offset_(0),
|
|
|
|
buffer_len_(0) {}
|
|
|
|
|
|
|
|
virtual Status Read(uint64_t offset, size_t n, Slice* result,
|
|
|
|
char* scratch) const override {
|
|
|
|
if (n >= readahead_size_) {
|
|
|
|
return file_->Read(offset, n, result, scratch);
|
|
|
|
}
|
|
|
|
|
|
|
|
std::unique_lock<std::mutex> lk(lock_);
|
|
|
|
|
|
|
|
size_t copied = 0;
|
|
|
|
// if offset between [buffer_offset_, buffer_offset_ + buffer_len>
|
|
|
|
if (offset >= buffer_offset_ && offset < buffer_len_ + buffer_offset_) {
|
|
|
|
uint64_t offset_in_buffer = offset - buffer_offset_;
|
|
|
|
copied = std::min(static_cast<uint64_t>(buffer_len_) - offset_in_buffer,
|
|
|
|
static_cast<uint64_t>(n));
|
|
|
|
memcpy(scratch, buffer_.get() + offset_in_buffer, copied);
|
|
|
|
if (copied == n) {
|
|
|
|
// fully cached
|
|
|
|
*result = Slice(scratch, n);
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
Slice readahead_result;
|
|
|
|
Status s = file_->Read(offset + copied, readahead_size_, &readahead_result,
|
|
|
|
buffer_.get());
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
|
|
|
auto left_to_copy = std::min(readahead_result.size(), n - copied);
|
|
|
|
memcpy(scratch + copied, readahead_result.data(), left_to_copy);
|
|
|
|
*result = Slice(scratch, copied + left_to_copy);
|
|
|
|
|
|
|
|
if (readahead_result.data() == buffer_.get()) {
|
|
|
|
buffer_offset_ = offset + copied;
|
|
|
|
buffer_len_ = readahead_result.size();
|
|
|
|
} else {
|
|
|
|
buffer_len_ = 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
|
|
|
|
virtual size_t GetUniqueId(char* id, size_t max_size) const override {
|
|
|
|
return file_->GetUniqueId(id, max_size);
|
|
|
|
}
|
|
|
|
|
|
|
|
virtual void Hint(AccessPattern pattern) override { file_->Hint(pattern); }
|
|
|
|
|
|
|
|
virtual Status InvalidateCache(size_t offset, size_t length) override {
|
|
|
|
return file_->InvalidateCache(offset, length);
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
std::unique_ptr<RandomAccessFile> file_;
|
|
|
|
size_t readahead_size_;
|
|
|
|
|
|
|
|
mutable std::mutex lock_;
|
|
|
|
mutable std::unique_ptr<char[]> buffer_;
|
|
|
|
mutable uint64_t buffer_offset_;
|
|
|
|
mutable size_t buffer_len_;
|
|
|
|
};
|
|
|
|
} // namespace
|
|
|
|
|
|
|
|
std::unique_ptr<RandomAccessFile> NewReadaheadRandomAccessFile(
|
|
|
|
std::unique_ptr<RandomAccessFile> file, size_t readahead_size) {
|
|
|
|
std::unique_ptr<ReadaheadRandomAccessFile> wrapped_file(
|
|
|
|
new ReadaheadRandomAccessFile(std::move(file), readahead_size));
|
|
|
|
return std::move(wrapped_file);
|
|
|
|
}
|
|
|
|
|
Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.
Test Plan: Run all existing unit tests.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D42321
2015-07-18 01:16:11 +02:00
|
|
|
} // namespace rocksdb
|