// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "rocksdb/write_buffer_manager.h"

#include <memory>

Use deleters to label cache entries and collect stats (#8297)
Summary:
This change gathers and publishes statistics about the
kinds of items in block cache. This is especially important for
profiling relative usage of cache by index vs. filter vs. data blocks.
It works by iterating over the cache during periodic stats dump
(InternalStats, stats_dump_period_sec) or on demand when
DB::Get(Map)Property(kBlockCacheEntryStats), except that for
efficiency and sharing among column families, saved data from
the last scan is used when the data is not considered too old.
The new information can be seen in info LOG, for example:
Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0
Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%)
And also through DB::GetProperty and GetMapProperty (here using
ldb just for demonstration):
$ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats
rocksdb.block-cache-entry-stats.bytes.data-block: 0
rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0
rocksdb.block-cache-entry-stats.bytes.filter-block: 0
rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0
rocksdb.block-cache-entry-stats.bytes.index-block: 178992
rocksdb.block-cache-entry-stats.bytes.misc: 0
rocksdb.block-cache-entry-stats.bytes.other-block: 0
rocksdb.block-cache-entry-stats.bytes.write-buffer: 0
rocksdb.block-cache-entry-stats.capacity: 8388608
rocksdb.block-cache-entry-stats.count.data-block: 0
rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0
rocksdb.block-cache-entry-stats.count.filter-block: 0
rocksdb.block-cache-entry-stats.count.filter-meta-block: 0
rocksdb.block-cache-entry-stats.count.index-block: 215
rocksdb.block-cache-entry-stats.count.misc: 1
rocksdb.block-cache-entry-stats.count.other-block: 0
rocksdb.block-cache-entry-stats.count.write-buffer: 0
rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290
rocksdb.block-cache-entry-stats.percent.data-block: 0.000000
rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000
rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000
rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000
rocksdb.block-cache-entry-stats.percent.index-block: 2.133751
rocksdb.block-cache-entry-stats.percent.misc: 0.000000
rocksdb.block-cache-entry-stats.percent.other-block: 0.000000
rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000
rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052
rocksdb.block-cache-entry-stats.secs_since_last_collection: 0
Solution detail - We need some way to flag what kind of blocks each
entry belongs to, preferably without changing the Cache API.
One of the complications is that Cache is a general interface that could
have other users that don't adhere to whichever convention we decide
on for keys and values. Or we would pay for an extra field in the Handle
that would only be used for this purpose.
This change uses a back-door approach, the deleter, to indicate the
"role" of a Cache entry (in addition to the value type, implicitly).
This has the added benefit of ensuring proper code origin whenever we
recognize a particular role for a cache entry; if the entry came from
some other part of the code, it will use an unrecognized deleter, which
we simply attribute to the "Misc" role.
An internal API makes for simple instantiation and automatic
registration of Cache deleters for a given value type and "role".
Another internal API, CacheEntryStatsCollector, solves the problem of
caching the results of a scan and sharing them, to ensure scans are
neither excessive nor redundant so as not to harm Cache performance.
Because code is added to BlocklikeTraits, it is pulled out of
block_based_table_reader.cc into its own file.
This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option
(could still be added), and with actual stat gathering.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297
Test Plan: manual testing with db_bench, and a couple of basic unit tests
Reviewed By: ltamasi
Differential Revision: D28488721
Pulled By: pdillinger
fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb
2021-05-20 01:45:51 +02:00

#include "cache/cache_entry_roles.h"
#include "cache/cache_reservation_manager.h"
#include "db/db_impl/db_impl.h"
#include "rocksdb/status.h"
#include "util/coding.h"

namespace ROCKSDB_NAMESPACE {
WriteBufferManager::WriteBufferManager(size_t _buffer_size,
                                       std::shared_ptr<Cache> cache,
                                       bool allow_stall)
    : buffer_size_(_buffer_size),
      mutable_limit_(buffer_size_ * 7 / 8),
      memory_used_(0),
      memory_active_(0),
      cache_res_mgr_(nullptr),
      allow_stall_(allow_stall),
      stall_active_(false) {
#ifndef ROCKSDB_LITE
  if (cache) {
    // Memtable's memory usage tends to fluctuate frequently
    // therefore we set delayed_decrease = true to save some dummy entry
    // insertion on memory increase right after memory decrease
    cache_res_mgr_ = std::make_shared<
        CacheReservationManagerImpl<CacheEntryRole::kWriteBuffer>>(
        cache, true /* delayed_decrease */);
  }
#else
  (void)cache;
#endif  // ROCKSDB_LITE
}

Fix race in WriteBufferManager (#9009)
Summary:
EndWriteStall has a data race: `queue_.empty()` is checked outside of the
mutex, so once we enter the critical section another thread may already have
cleared the list, and accessing the `front()` is undefined behavior (and causes
interesting crashes under high concurrency).
This PR fixes the bug, and also rewrites the logic to make it easier to reason
about. It also fixes another subtle bug: if some writers are stalled and
`SetBufferSize(0)` is called, which disables the WBM, the writers are not
unblocked because of an early `enabled()` check in `EndWriteStall()`.
It doesn't significantly change the locking behavior: as before, writers won't
lock unless entering a stall condition, and `FreeMem` almost always locks if
stalling is allowed, but that is inevitable with the current design. Liveness is
guaranteed by the fact that if some writes are blocked, eventually all writes
will be blocked due to `stall_active_`, and eventually all memory is freed.
While at it, do a couple of optimizations:
- In `WBMStallInterface::Signal()` signal the CV only after releasing the
lock. Signaling under the lock is a common pitfall, as it causes the woken-up
thread to immediately go back to sleep because the mutex is still locked by
the awaker.
- Move all allocations and deallocations outside of the lock.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9009
Test Plan:
```
USE_CLANG=1 make -j64 all check
```
Reviewed By: akankshamahajan15
Differential Revision: D31550668
Pulled By: ot
fbshipit-source-id: 5125387c3dc7ecaaa2b8bbc736e58c4156698580
2021-10-12 09:14:41 +02:00
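
The first optimization listed in the commit message above (signal the CV only
after releasing the lock) can be sketched roughly as follows. This is a minimal
illustration under stated assumptions, not RocksDB's `WBMStallInterface`: the
class `ToyStall` and its members `Block()`, `Signal()` and `signaled()` are
hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a stall interface; not the RocksDB class.
class ToyStall {
 public:
  // Blocks the caller until Signal() has been called at least once.
  void Block() {
    std::unique_lock<std::mutex> lock(mu_);
    cv_.wait(lock, [this] { return signaled_; });
  }

  // Sets the flag while holding the mutex, then notifies after the mutex
  // has been released, so a woken thread can take the mutex right away.
  void Signal() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      signaled_ = true;
    }
    cv_.notify_one();
  }

  // Reads the flag under the mutex.
  bool signaled() {
    std::lock_guard<std::mutex> lock(mu_);
    return signaled_;
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool signaled_ = false;
};
```

Because the flag is still written under the mutex and the waiter uses a
predicate, a notification sent after unlocking should not be lost: a thread
that starts waiting late simply sees the flag already set.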

WriteBufferManager::~WriteBufferManager() {
#ifndef NDEBUG
  std::unique_lock<std::mutex> lock(mu_);
  assert(queue_.empty());
#endif
}

std::size_t WriteBufferManager::dummy_entries_in_cache_usage() const {
  if (cache_res_mgr_ != nullptr) {
    return cache_res_mgr_->GetTotalReservedCacheSize();
  } else {
    return 0;
  }
}

void WriteBufferManager::ReserveMem(size_t mem) {
  if (cache_res_mgr_ != nullptr) {
    ReserveMemWithCache(mem);
  } else if (enabled()) {
    memory_used_.fetch_add(mem, std::memory_order_relaxed);
  }
  if (enabled()) {
    memory_active_.fetch_add(mem, std::memory_order_relaxed);
  }
}
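
The two counters touched by `ReserveMem`, `ScheduleFreeMem` and `FreeMem` can
be modelled with a small standalone sketch. It is not the RocksDB class:
`ToyBudget` and its method names are hypothetical, the cache-backed path is
left out, and the reading of `ScheduleFreeMem` as "memory that is about to be
released" is an assumption drawn from how this file updates the counters.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical toy model of the two counters; not the RocksDB class.
class ToyBudget {
 public:
  explicit ToyBudget(std::size_t buffer_size)
      : buffer_size_(buffer_size), mutable_limit_(buffer_size * 7 / 8) {}

  // A buffer size of zero means accounting is switched off.
  bool enabled() const { return buffer_size_ > 0; }

  // New memory counts as both used and active.
  void Reserve(std::size_t mem) {
    if (enabled()) {
      used_.fetch_add(mem, std::memory_order_relaxed);
      active_.fetch_add(mem, std::memory_order_relaxed);
    }
  }

  // Memory that is about to be released stops being active but stays used.
  void ScheduleFree(std::size_t mem) {
    if (enabled()) {
      active_.fetch_sub(mem, std::memory_order_relaxed);
    }
  }

  // Memory that has actually been released no longer counts as used.
  void Free(std::size_t mem) {
    if (enabled()) {
      used_.fetch_sub(mem, std::memory_order_relaxed);
    }
  }

  std::size_t used() const { return used_.load(std::memory_order_relaxed); }
  std::size_t active() const { return active_.load(std::memory_order_relaxed); }
  std::size_t mutable_limit() const { return mutable_limit_; }

 private:
  std::size_t buffer_size_;
  std::size_t mutable_limit_;  // 7/8 of the buffer size, as in the constructor
  std::atomic<std::size_t> used_{0};
  std::atomic<std::size_t> active_{0};
};
```

With a budget of 800 bytes the mutable limit comes out at 700; reserving 100
bytes and then scheduling them for release leaves 100 bytes used and 0 active
until `Free` is called.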

// Should only be called from write thread
void WriteBufferManager::ReserveMemWithCache(size_t mem) {
#ifndef ROCKSDB_LITE
  assert(cache_res_mgr_ != nullptr);
  // Use a mutex to protect various data structures. Can be optimized to a
  // lock-free solution if it ends up with a performance bottleneck.
  std::lock_guard<std::mutex> lock(cache_res_mgr_mu_);

  size_t new_mem_used = memory_used_.load(std::memory_order_relaxed) + mem;
  memory_used_.store(new_mem_used, std::memory_order_relaxed);
  Status s = cache_res_mgr_->UpdateCacheReservation(new_mem_used);

  // We absorb the error since WriteBufferManager is not able to handle
  // this failure properly. Ideally we should prevent this allocation
  // from happening if this cache reservation fails.
  // [TODO] We'll need to improve it in the future and figure out what to do on
  // error
  s.PermitUncheckedError();
#else
  (void)mem;
#endif  // ROCKSDB_LITE
}

void WriteBufferManager::ScheduleFreeMem(size_t mem) {
  if (enabled()) {
    memory_active_.fetch_sub(mem, std::memory_order_relaxed);
  }
}

void WriteBufferManager::FreeMem(size_t mem) {
  if (cache_res_mgr_ != nullptr) {
    FreeMemWithCache(mem);
  } else if (enabled()) {
    memory_used_.fetch_sub(mem, std::memory_order_relaxed);
  }
  // Check if stall is active and can be ended.
  MaybeEndWriteStall();
}

void WriteBufferManager::FreeMemWithCache(size_t mem) {
#ifndef ROCKSDB_LITE
  assert(cache_res_mgr_ != nullptr);
  // Use a mutex to protect various data structures. Can be optimized to a
  // lock-free solution if it ends up with a performance bottleneck.
  std::lock_guard<std::mutex> lock(cache_res_mgr_mu_);
  size_t new_mem_used = memory_used_.load(std::memory_order_relaxed) - mem;
  memory_used_.store(new_mem_used, std::memory_order_relaxed);
  Status s = cache_res_mgr_->UpdateCacheReservation(new_mem_used);

  // We absorb the error since WriteBufferManager is not able to handle
  // this failure properly.
  // [TODO] We'll need to improve it in the future and figure out what to do on
  // error
  s.PermitUncheckedError();
#else
  (void)mem;
#endif  // ROCKSDB_LITE
}

void WriteBufferManager::BeginWriteStall(StallInterface* wbm_stall) {
  assert(wbm_stall != nullptr);
  assert(allow_stall_);

  // Allocate outside of the lock.
  std::list<StallInterface*> new_node = {wbm_stall};

  {
    std::unique_lock<std::mutex> lock(mu_);
    // Verify if the stall conditions are still active.
    if (ShouldStall()) {
      stall_active_.store(true, std::memory_order_relaxed);
      queue_.splice(queue_.end(), std::move(new_node));
    }
  }

  // If the node was not consumed, the stall has ended already and we can signal
  // the caller.
  if (!new_node.empty()) {
    new_node.front()->Signal();
  }
}
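
The "allocate outside of the lock" pattern used by `BeginWriteStall` can be
shown in isolation. The sketch below is an assumption-laden illustration, not
RocksDB code: `ToyQueue`, `TryEnqueue` and the `accept` flag (standing in for
the `ShouldStall()` check) are hypothetical. The list node is created in a
one-element list before the mutex is taken, and `std::list::splice` then
relinks that node into the shared queue without allocating.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <mutex>
#include <utility>

// Hypothetical queue of ints; stands in for the list of stalled writers.
class ToyQueue {
 public:
  // Returns true if `value` was enqueued, false if `accept` was false.
  bool TryEnqueue(int value, bool accept) {
    // The list node is allocated here, before the mutex is taken.
    std::list<int> new_node = {value};
    {
      std::lock_guard<std::mutex> lock(mu_);
      if (accept) {
        // splice relinks the existing node; nothing is allocated here.
        queue_.splice(queue_.end(), std::move(new_node));
      }
    }
    // An empty new_node means the node now belongs to queue_. Otherwise the
    // node is destroyed when new_node goes out of scope, outside the lock.
    return new_node.empty();
  }

  // Number of queued elements, read under the mutex.
  std::size_t size() {
    std::lock_guard<std::mutex> lock(mu_);
    return queue_.size();
  }

 private:
  std::mutex mu_;
  std::list<int> queue_;
};
```

Checking whether the temporary list is empty after the critical section tells
the caller whether the node was consumed, which mirrors the
`!new_node.empty()` test in the function above.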

// Called when memory is freed in FreeMem or the buffer size has changed.
void WriteBufferManager::MaybeEndWriteStall() {
  // Cannot early-exit on !enabled() because SetBufferSize(0) needs to unblock
  // the writers.
  if (!allow_stall_) {
    return;
  }

  if (IsStallThresholdExceeded()) {
    return;  // Stall conditions have not resolved.
  }

  // Perform all deallocations outside of the lock.
  std::list<StallInterface*> cleanup;

  std::unique_lock<std::mutex> lock(mu_);
  if (!stall_active_.load(std::memory_order_relaxed)) {
    return;  // Nothing to do.
  }

  // Unblock new writers.
  stall_active_.store(false, std::memory_order_relaxed);

  // Unblock the writers in the queue.
  for (StallInterface* wbm_stall : queue_) {
    wbm_stall->Signal();
  }
  cleanup = std::move(queue_);
}

void WriteBufferManager::RemoveDBFromQueue(StallInterface* wbm_stall) {
  assert(wbm_stall != nullptr);

  // Deallocate the removed nodes outside of the lock.
  std::list<StallInterface*> cleanup;

  if (enabled() && allow_stall_) {
    std::unique_lock<std::mutex> lock(mu_);
    for (auto it = queue_.begin(); it != queue_.end();) {
      auto next = std::next(it);
      if (*it == wbm_stall) {
        cleanup.splice(cleanup.end(), queue_, std::move(it));
      }
      it = next;
    }
  }
  wbm_stall->Signal();
}
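
The removal loop in `RemoveDBFromQueue` relies on saving the next iterator
before splicing the current node away. A standalone sketch of that idiom,
under the assumption of a plain `std::list<int>` and a hypothetical free
function `RemoveAll` (neither is part of RocksDB), might look like this:

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <mutex>

// Moves every element equal to `target` out of `queue` while holding `mu`.
// Returns how many elements were removed. The removed nodes live in a local
// list, so they are freed only after the lock has been released.
std::size_t RemoveAll(std::list<int>& queue, std::mutex& mu, int target) {
  std::list<int> cleanup;
  {
    std::lock_guard<std::mutex> lock(mu);
    for (auto it = queue.begin(); it != queue.end();) {
      // Remember the successor first: after the splice, `it` refers to a
      // node inside `cleanup`, so advancing it would walk the wrong list.
      auto next = std::next(it);
      if (*it == target) {
        cleanup.splice(cleanup.end(), queue, it);
      }
      it = next;
    }
  }
  return cleanup.size();
}
```

Splicing a single element does not invalidate iterators to the other
elements of the source list, which is why `next` can still be used after the
move.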

}  // namespace ROCKSDB_NAMESPACE