stagger first DumpMallocStats after opening DB (#7145)

Summary: Previously, when running `db_bench` with a large value for `num_multi_dbs` and `Options::dump_malloc_stats` enabled, most CPU time was spent in jemalloc locking, because the stats dump threads of all the DBs fired at once and contended on jemalloc's locks. After this PR, jemalloc locking no longer shows up at the top of the profile.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7145

Reviewed By: riversand963

Differential Revision: D22593031

Pulled By: ajkr

fbshipit-source-id: 3b3fc91f93249c6afee53f59f34c487c3fc5add6
2020-07-17 16:12:09 -07:00
commit 9a83fd21e6
parent ec711b2315
2 changed files with 21 additions and 1 deletion
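The core of the change is the new last argument passed to `RepeatableThread` in both hunks: the n-th stats dump thread started in the process waits `n % stats_dump_period_sec` whole seconds before its first dump. The sketch below isolates that arithmetic. `StaggeredInitialDelayMicros` and `NextInitialDelayMicros` are hypothetical helpers, not RocksDB functions. One deliberate difference: the diff uses a plain `static uint64_t` counter, which is shared by every DB in the process, whereas `mutex_` (locked in the `SetDBOptions` hunk) belongs to a single DB. This sketch assumes DBs may be opened from several threads at once and therefore uses `std::atomic`, so every caller receives a distinct sequence number.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Same value as the kMicrosInSecond constant used in the diff.
constexpr uint64_t kMicrosInSecond = 1000 * 1000;

// Pure part of the computation: the n-th stats dump thread waits
// (n mod period) whole seconds before its first dump.
// In C++, `%` and `*` share a precedence level and associate left to right,
// so the diff's `n % period * kMicrosInSecond` already means
// `(n % period) * kMicrosInSecond`; the parentheses below only make it explicit.
uint64_t StaggeredInitialDelayMicros(uint64_t n, uint64_t period_sec) {
  assert(period_sec > 0);  // the thread is only created when the period is > 0
  return (n % period_sec) * kMicrosInSecond;
}

// Hypothetical wrapper that hands out sequence numbers. The atomic
// fetch_add keeps the numbers distinct even when called concurrently
// (an assumption of this sketch; the diff uses a plain static counter).
uint64_t NextInitialDelayMicros(uint64_t period_sec) {
  static std::atomic<uint64_t> threads_started{0};
  return StaggeredInitialDelayMicros(
      threads_started.fetch_add(1, std::memory_order_relaxed), period_sec);
}
```

For example, with a period of 3 seconds, five DBs opened back to back get first-dump delays of 0, 1, 2, 0, and 1 seconds; with a period of 600 seconds, up to 600 DBs opened in succession each get a distinct one-second offset.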
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -8,6 +8,7 @@
 * Best-efforts recovery ignores CURRENT file completely. If CURRENT file is missing during recovery, best-efforts recovery still proceeds with MANIFEST file(s).
 * In best-efforts recovery, an error that is not Corruption or IOError::kNotFound or IOError::kPathNotFound will be overwritten silently. Fix this by checking all non-ok cases and return early.
 * When `file_checksum_gen_factory` is set to `GetFileChecksumGenCrc32cFactory()`, BackupEngine will compare the crc32c checksums of table files computed when creating a backup to the expected checksums stored in the DB manifest, and will fail `CreateNewBackup()` on mismatch (corruption). If the `file_checksum_gen_factory` is not set or set to any other customized factory, there is no checksum verification to detect if SST files in a DB are corrupt when read, copied, and independently checksummed by BackupEngine.
+* When a DB sets `stats_dump_period_sec > 0`, either as the initial value for DB open or as a dynamic option change, the first stats dump is delayed by X seconds, where X is an integer in `[0, stats_dump_period_sec)`, so that DBs enabling stats dumps in rapid succession do not all dump at once. Subsequent stats dumps are still spaced `stats_dump_period_sec` seconds apart.

 ### Bug fixes
 * Compressed block cache was automatically disabled with read-only DBs by mistake. Now it is fixed: compressed block cache will be in effective with read-only DB too.
--- a/db/db_impl/db_impl.cc
+++ b/db/db_impl/db_impl.cc
@@ -685,9 +685,18 @@ void DBImpl::StartTimedTasks() {
    stats_dump_period_sec = mutable_db_options_.stats_dump_period_sec;
    if (stats_dump_period_sec > 0) {
      if (!thread_dump_stats_) {
+        // In case of many `DB::Open()` in rapid succession we can have all
+        // threads dumping at once, which causes severe lock contention in
+        // jemalloc. Ensure successive `DB::Open()`s are staggered by at least
+        // one second in the common case.
+        static uint64_t stats_dump_threads_started = 0;
        thread_dump_stats_.reset(new ROCKSDB_NAMESPACE::RepeatableThread(
            [this]() { DBImpl::DumpStats(); }, "dump_st", env_,
-            static_cast<uint64_t>(stats_dump_period_sec) * kMicrosInSecond));
+            static_cast<uint64_t>(stats_dump_period_sec) * kMicrosInSecond,
+            stats_dump_threads_started %
+                static_cast<uint64_t>(stats_dump_period_sec) *
+                kMicrosInSecond));
+        ++stats_dump_threads_started;
      }
    }
    stats_persist_period_sec = mutable_db_options_.stats_persist_period_sec;
@@ -1083,10 +1092,20 @@ Status DBImpl::SetDBOptions(
          mutex_.Lock();
        }
        if (new_options.stats_dump_period_sec > 0) {
+          // In case many DBs have `stats_dump_period_sec` enabled in rapid
+          // succession, we can have all threads dumping at once, which causes
+          // severe lock contention in jemalloc. Ensure that successive
+          // enablings of `stats_dump_period_sec` are staggered by at least one
+          // second in the common case.
+          static uint64_t stats_dump_threads_started = 0;
          thread_dump_stats_.reset(new ROCKSDB_NAMESPACE::RepeatableThread(
              [this]() { DBImpl::DumpStats(); }, "dump_st", env_,
              static_cast<uint64_t>(new_options.stats_dump_period_sec) *
+                  kMicrosInSecond,
+              stats_dump_threads_started %
+                  static_cast<uint64_t>(new_options.stats_dump_period_sec) *
                  kMicrosInSecond));
+          ++stats_dump_threads_started;
        } else {
          thread_dump_stats_.reset();
        }