stagger first DumpMallocStats after opening DB (#7145)

Summary:
Previously when running `db_bench` with large value for `num_multi_dbs` and enabled `Options::dump_malloc_stats`, we would see most CPU spent in jemalloc locking. After this PR that no longer shows up at the top of the profile.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7145

Reviewed By: riversand963

Differential Revision: D22593031

Pulled By: ajkr

fbshipit-source-id: 3b3fc91f93249c6afee53f59f34c487c3fc5add6
This commit is contained in:
Andrew Kryczka 2020-07-17 16:12:09 -07:00 committed by Facebook GitHub Bot
parent ec711b2315
commit 9a83fd21e6
2 changed files with 21 additions and 1 deletions

View File

@ -8,6 +8,7 @@
* Best-efforts recovery ignores CURRENT file completely. If CURRENT file is missing during recovery, best-efforts recovery still proceeds with MANIFEST file(s).
* In best-efforts recovery, an error that is not Corruption or IOError::kNotFound or IOError::kPathNotFound will be overwritten silently. Fix this by checking all non-ok cases and return early.
* When `file_checksum_gen_factory` is set to `GetFileChecksumGenCrc32cFactory()`, BackupEngine will compare the crc32c checksums of table files computed when creating a backup to the expected checksums stored in the DB manifest, and will fail `CreateNewBackup()` on mismatch (corruption). If the `file_checksum_gen_factory` is not set or set to any other customized factory, there is no checksum verification to detect if SST files in a DB are corrupt when read, copied, and independently checksummed by BackupEngine.
* When a DB sets `stats_dump_period_sec > 0`, either as the initial value for DB open or as a dynamic option change, the first stats dump is staggered in the following X seconds, where X is an integer in `[0, stats_dump_period_sec)`. Subsequent stats dumps are still spaced `stats_dump_period_sec` seconds apart.
### Bug fixes
* Compressed block cache was automatically disabled with read-only DBs by mistake. Now it is fixed: compressed block cache will be in effective with read-only DB too.

View File

@ -685,9 +685,18 @@ void DBImpl::StartTimedTasks() {
stats_dump_period_sec = mutable_db_options_.stats_dump_period_sec;
if (stats_dump_period_sec > 0) {
if (!thread_dump_stats_) {
// In case of many `DB::Open()` in rapid succession we can have all
// threads dumping at once, which causes severe lock contention in
// jemalloc. Ensure successive `DB::Open()`s are staggered by at least
// one second in the common case.
static uint64_t stats_dump_threads_started = 0;
thread_dump_stats_.reset(new ROCKSDB_NAMESPACE::RepeatableThread(
[this]() { DBImpl::DumpStats(); }, "dump_st", env_,
static_cast<uint64_t>(stats_dump_period_sec) * kMicrosInSecond));
static_cast<uint64_t>(stats_dump_period_sec) * kMicrosInSecond,
stats_dump_threads_started %
static_cast<uint64_t>(stats_dump_period_sec) *
kMicrosInSecond));
++stats_dump_threads_started;
}
}
stats_persist_period_sec = mutable_db_options_.stats_persist_period_sec;
@ -1083,10 +1092,20 @@ Status DBImpl::SetDBOptions(
mutex_.Lock();
}
if (new_options.stats_dump_period_sec > 0) {
// In case many DBs have `stats_dump_period_sec` enabled in rapid
// succession, we can have all threads dumping at once, which causes
// severe lock contention in jemalloc. Ensure successive enabling of
// `stats_dump_period_sec` are staggered by at least one second in the
// common case.
static uint64_t stats_dump_threads_started = 0;
thread_dump_stats_.reset(new ROCKSDB_NAMESPACE::RepeatableThread(
[this]() { DBImpl::DumpStats(); }, "dump_st", env_,
static_cast<uint64_t>(new_options.stats_dump_period_sec) *
kMicrosInSecond,
stats_dump_threads_started %
static_cast<uint64_t>(new_options.stats_dump_period_sec) *
kMicrosInSecond));
++stats_dump_threads_started;
} else {
thread_dump_stats_.reset();
}