74544d582f
Summary: Note: This PR is the 4th part of a bigger PR stack (https://github.com/facebook/rocksdb/pull/9073) and will rebase/merge only after the first three PRs (https://github.com/facebook/rocksdb/pull/9070, https://github.com/facebook/rocksdb/pull/9071, https://github.com/facebook/rocksdb/pull/9130) merge. **Context:** Similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track memory usage during (new) Bloom Filter (i.e,FastLocalBloom) and Ribbon Filter (i.e, Ribbon128) construction, moving toward the goal of [single global memory limit using block cache capacity](https://github.com/facebook/rocksdb/wiki/Projects-Being-Developed#improving-memory-efficiency). It also constrains the size of the banding portion of Ribbon Filter during construction by falling back to Bloom Filter if that banding is, at some point, larger than the available space in the cache under `LRUCacheOptions::strict_capacity_limit=true`. The option to turn on this feature is `BlockBasedTableOptions::reserve_table_builder_memory = true` which by default is set to `false`. We [decided](https://github.com/facebook/rocksdb/pull/9073#discussion_r741548409) not to have separate option for separate memory user in table building therefore their memory accounting are all bundled under one general option. **Summary:** - Reserved/released cache for creation/destruction of three main memory users with the passed-in `FilterBuildingContext::cache_res_mgr` during filter construction: - hash entries (i.e`hash_entries`.size(), we bucket-charge hash entries during insertion for performance), - banding (Ribbon Filter only, `bytes_coeff_rows` +`bytes_result_rows` + `bytes_backtrack`), - final filter (i.e, `mutable_buf`'s size). - Implementation details: in order to use `CacheReservationManager::CacheReservationHandle` to account final filter's memory, we have to store the `CacheReservationManager` object and `CacheReservationHandle` for final filter in `XXPH3BitsFilterBuilder` as well as explicitly delete the filter bits builder when done with the final filter in block based table. - Added option fo run `filter_bench` with this memory reservation feature Pull Request resolved: https://github.com/facebook/rocksdb/pull/9073 Test Plan: - Added new tests in `db_bloom_filter_test` to verify filter construction peak cache reservation under combination of `BlockBasedTable::Rep::FilterType` (e.g, `kFullFilter`, `kPartitionedFilter`), `BloomFilterPolicy::Mode`(e.g, `kFastLocalBloom`, `kStandard128Ribbon`, `kDeprecatedBlock`) and `BlockBasedTableOptions::reserve_table_builder_memory` - To address the concern for slow test: tests with memory reservation under `kFullFilter` + `kStandard128Ribbon` and `kPartitionedFilter` take around **3000 - 6000 ms** and others take around **1500 - 2000 ms**, in total adding **20000 - 25000 ms** to the test suit running locally - Added new test in `bloom_test` to verify Ribbon Filter fallback on large banding in FullFilter - Added test in `filter_bench` to verify that this feature does not significantly slow down Bloom/Ribbon Filter construction speed. Local result averaged over **20** run as below: - FastLocalBloom - baseline `./filter_bench -impl=2 -quick -runs 20 | grep 'Build avg'`: - **Build avg ns/key: 29.56295** (DEBUG_LEVEL=1), **29.98153** (DEBUG_LEVEL=0) - new feature (expected to be similar as above)`./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg'`: - **Build avg ns/key: 30.99046** (DEBUG_LEVEL=1), **30.48867** (DEBUG_LEVEL=0) - new feature of RibbonFilter with fallback (expected to be similar as above) `./filter_bench -impl=2 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` : - **Build avg ns/key: 31.146975** (DEBUG_LEVEL=1), **30.08165** (DEBUG_LEVEL=0) - Ribbon128 - baseline `./filter_bench -impl=3 -quick -runs 20 | grep 'Build avg'`: - **Build avg ns/key: 129.17585** (DEBUG_LEVEL=1), **130.5225** (DEBUG_LEVEL=0) - new feature (expected to be similar as above) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true | grep 'Build avg' `: - **Build avg ns/key: 131.61645** (DEBUG_LEVEL=1), **132.98075** (DEBUG_LEVEL=0) - new feature of RibbonFilter with fallback (expected to be a lot faster than above due to fallback) `./filter_bench -impl=3 -quick -runs 20 -reserve_table_builder_memory=true -strict_capacity_limit=true | grep 'Build avg'` : - **Build avg ns/key: 52.032965** (DEBUG_LEVEL=1), **52.597825** (DEBUG_LEVEL=0) - And the warning message of `"Cache reservation for Ribbon filter banding failed due to cache full"` is indeed logged to console. Reviewed By: pdillinger Differential Revision: D31991348 Pulled By: hx235 fbshipit-source-id: 9336b2c60f44d530063da518ceaf56dac5f9df8e
135 lines
5.2 KiB
C++
135 lines
5.2 KiB
C++
// Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
// (found in the LICENSE.Apache file in the root directory).
|
|
|
|
#pragma once
|
|
|
|
#include <array>
|
|
#include <cstdint>
|
|
#include <memory>
|
|
#include <type_traits>
|
|
#include <unordered_map>
|
|
|
|
#include "rocksdb/cache.h"
|
|
|
|
namespace ROCKSDB_NAMESPACE {
|
|
|
|
// Classifications of block cache entries, for reporting statistics
|
|
// Adding new enum to this class requires corresponding updates to
|
|
// kCacheEntryRoleToCamelString and kCacheEntryRoleToHyphenString
|
|
enum class CacheEntryRole {
|
|
// Block-based table data block
|
|
kDataBlock,
|
|
// Block-based table filter block (full or partitioned)
|
|
kFilterBlock,
|
|
// Block-based table metadata block for partitioned filter
|
|
kFilterMetaBlock,
|
|
// Block-based table deprecated filter block (old "block-based" filter)
|
|
kDeprecatedFilterBlock,
|
|
// Block-based table index block
|
|
kIndexBlock,
|
|
// Other kinds of block-based table block
|
|
kOtherBlock,
|
|
// WriteBufferManager reservations to account for memtable usage
|
|
kWriteBuffer,
|
|
// BlockBasedTableBuilder reservations to account for
|
|
// compression dictionary building buffer's memory usage
|
|
kCompressionDictionaryBuildingBuffer,
|
|
// Filter reservations to account for
|
|
// (new) bloom and ribbon filter construction's memory usage
|
|
kFilterConstruction,
|
|
// Default bucket, for miscellaneous cache entries. Do not use for
|
|
// entries that could potentially add up to large usage.
|
|
kMisc,
|
|
};
|
|
constexpr uint32_t kNumCacheEntryRoles =
|
|
static_cast<uint32_t>(CacheEntryRole::kMisc) + 1;
|
|
|
|
extern std::array<const char*, kNumCacheEntryRoles>
|
|
kCacheEntryRoleToCamelString;
|
|
extern std::array<const char*, kNumCacheEntryRoles>
|
|
kCacheEntryRoleToHyphenString;
|
|
|
|
// To associate cache entries with their role, we use a hack on the
|
|
// existing Cache interface. Because the deleter of an entry can authenticate
|
|
// the code origin of an entry, we can elaborate the choice of deleter to
|
|
// also encode role information, without inferring false role information
|
|
// from entries not choosing to encode a role.
|
|
//
|
|
// The rest of this file is for handling mappings between deleters and
|
|
// roles.
|
|
|
|
// To infer a role from a deleter, the deleter must be registered. This
|
|
// can be done "manually" with this function. This function is thread-safe,
|
|
// and the registration mappings go into private but static storage. (Note
|
|
// that DeleterFn is a function pointer, not std::function. Registrations
|
|
// should not be too many.)
|
|
void RegisterCacheDeleterRole(Cache::DeleterFn fn, CacheEntryRole role);
|
|
|
|
// Gets a copy of the registered deleter -> role mappings. This is the only
|
|
// function for reading the mappings made with RegisterCacheDeleterRole.
|
|
// Why only this interface for reading?
|
|
// * This function has to be thread safe, which could incur substantial
|
|
// overhead. We should not pay this overhead for every deleter look-up.
|
|
// * This is suitable for preparing for batch operations, like with
|
|
// CacheEntryStatsCollector.
|
|
// * The number of mappings should be sufficiently small (dozens).
|
|
std::unordered_map<Cache::DeleterFn, CacheEntryRole> CopyCacheDeleterRoleMap();
|
|
|
|
// ************************************************************** //
|
|
// An automatic registration infrastructure. This enables code
|
|
// to simply ask for a deleter associated with a particular type
|
|
// and role, and registration is automatic. In a sense, this is
|
|
// a small dependency injection infrastructure, because linking
|
|
// in new deleter instantiations is essentially sufficient for
|
|
// making stats collection (using CopyCacheDeleterRoleMap) aware
|
|
// of them.
|
|
|
|
namespace cache_entry_roles_detail {
|
|
|
|
template <typename T, CacheEntryRole R>
|
|
struct RegisteredDeleter {
|
|
RegisteredDeleter() { RegisterCacheDeleterRole(Delete, R); }
|
|
|
|
// These have global linkage to help ensure compiler optimizations do not
|
|
// break uniqueness for each <T,R>
|
|
static void Delete(const Slice& /* key */, void* value) {
|
|
// Supports T == Something[], unlike delete operator
|
|
std::default_delete<T>()(
|
|
static_cast<typename std::remove_extent<T>::type*>(value));
|
|
}
|
|
};
|
|
|
|
template <CacheEntryRole R>
|
|
struct RegisteredNoopDeleter {
|
|
RegisteredNoopDeleter() { RegisterCacheDeleterRole(Delete, R); }
|
|
|
|
static void Delete(const Slice& /* key */, void* /* value */) {
|
|
// Here was `assert(value == nullptr);` but we can also put pointers
|
|
// to static data in Cache, for testing at least.
|
|
}
|
|
};
|
|
|
|
} // namespace cache_entry_roles_detail
|
|
|
|
// Get an automatically registered deleter for value type T and role R.
|
|
// Based on C++ semantics, registration is invoked exactly once in a
|
|
// thread-safe way on first call to this function, for each <T, R>.
|
|
template <typename T, CacheEntryRole R>
|
|
Cache::DeleterFn GetCacheEntryDeleterForRole() {
|
|
static cache_entry_roles_detail::RegisteredDeleter<T, R> reg;
|
|
return reg.Delete;
|
|
}
|
|
|
|
// Get an automatically registered no-op deleter (value should be nullptr)
|
|
// and associated with role R. This is used for Cache "reservation" entries
|
|
// such as for WriteBufferManager.
|
|
template <CacheEntryRole R>
|
|
Cache::DeleterFn GetNoopDeleterForRole() {
|
|
static cache_entry_roles_detail::RegisteredNoopDeleter<R> reg;
|
|
return reg.Delete;
|
|
}
|
|
|
|
} // namespace ROCKSDB_NAMESPACE
|