Fix double-dumping CF stats to log (#8380)

Summary:
DBImpl::DumpStats is supposed to do this:
Dump DB stats to LOG
For each CF, dump CFStatsNoFileHistogram to LOG
For each CF, dump CFFileHistogram to LOG

Instead, due to a longstanding bug from 2017 (https://github.com/facebook/rocksdb/issues/2126), it would dump
CFStats, which includes both CFStatsNoFileHistogram and CFFileHistogram,
in both loops, resulting in near-duplicate output.

This fixes the bug.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8380

Test Plan: Manual inspection of LOG after db_bench

Reviewed By: jay-zhuang

Differential Revision: D29017535

Pulled By: pdillinger

fbshipit-source-id: 3010604c4a629a80347f129cd746ce9b0d0cbda6
This commit is contained in:
Peter Dillinger 2021-06-11 17:04:43 -07:00 committed by Facebook GitHub Bot
parent 58162835d1
commit b3dbeadc34
2 changed files with 18 additions and 13 deletions

View File

@ -16,6 +16,7 @@
* Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe(). * Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe().
* In the IngestExternalFile() API, only try to sync the ingested file if the file is linked and the FileSystem/Env supports reopening a writable file. * In the IngestExternalFile() API, only try to sync the ingested file if the file is linked and the FileSystem/Env supports reopening a writable file.
* Fixed a bug that `AdvancedColumnFamilyOptions.max_compaction_bytes` is under-calculated for manual compaction (`CompactRange()`). Manual compaction is split to multiple compactions if the compaction size exceed the `max_compaction_bytes`. The bug creates much larger compaction which size exceed the user setting. On the other hand, larger manual compaction size can increase the subcompaction parallelism, you can tune that by setting `max_compaction_bytes`. * Fixed a bug that `AdvancedColumnFamilyOptions.max_compaction_bytes` is under-calculated for manual compaction (`CompactRange()`). Manual compaction is split to multiple compactions if the compaction size exceed the `max_compaction_bytes`. The bug creates much larger compaction which size exceed the user setting. On the other hand, larger manual compaction size can increase the subcompaction parallelism, you can tune that by setting `max_compaction_bytes`.
* Fixed confusingly duplicated output in LOG for periodic stats ("DUMPING STATS"), including "Compaction Stats" and "File Read Latency Histogram By Level".
### Behavior Changes ### Behavior Changes
* Due to the fix of false-postive alert of "SST file is ahead of WAL", all the CFs with no SST file (CF empty) will bypass the consistency check. We fixed a false-positive, but introduced a very rare true-negative which will be triggered in the following conditions: A CF with some delete operations in the last a few queries which will result in an empty CF (those are flushed to SST file and a compaction triggered which combines this file and all other SST files and generates an empty CF, or there is another reason to write a manifest entry for this CF after a flush that generates no SST file from an empty CF). The deletion entries are logged in a WAL and this WAL was corrupted, while the CF's log number points to the next WAL (due to the flush). Therefore, the DB can only recover to the point without these trailing deletions and cause the inconsistent DB status. * Due to the fix of false-postive alert of "SST file is ahead of WAL", all the CFs with no SST file (CF empty) will bypass the consistency check. We fixed a false-positive, but introduced a very rare true-negative which will be triggered in the following conditions: A CF with some delete operations in the last a few queries which will result in an empty CF (those are flushed to SST file and a compaction triggered which combines this file and all other SST files and generates an empty CF, or there is another reason to write a manifest entry for this CF after a flush that generates no SST file from an empty CF). The deletion entries are logged in a WAL and this WAL was corrupted, while the CF's log number points to the next WAL (due to the flush). Therefore, the DB can only recover to the point without these trailing deletions and cause the inconsistent DB status.

View File

@ -907,13 +907,6 @@ Status DBImpl::GetStatsHistory(
void DBImpl::DumpStats() { void DBImpl::DumpStats() {
TEST_SYNC_POINT("DBImpl::DumpStats:1"); TEST_SYNC_POINT("DBImpl::DumpStats:1");
#ifndef ROCKSDB_LITE #ifndef ROCKSDB_LITE
const DBPropertyInfo* cf_property_info =
GetPropertyInfo(DB::Properties::kCFStats);
assert(cf_property_info != nullptr);
const DBPropertyInfo* db_property_info =
GetPropertyInfo(DB::Properties::kDBStats);
assert(db_property_info != nullptr);
std::string stats; std::string stats;
if (shutdown_initiated_) { if (shutdown_initiated_) {
return; return;
@ -921,18 +914,29 @@ void DBImpl::DumpStats() {
TEST_SYNC_POINT("DBImpl::DumpStats:StartRunning"); TEST_SYNC_POINT("DBImpl::DumpStats:StartRunning");
{ {
InstrumentedMutexLock l(&mutex_); InstrumentedMutexLock l(&mutex_);
default_cf_internal_stats_->GetStringProperty( const std::string* property = &DB::Properties::kDBStats;
*db_property_info, DB::Properties::kDBStats, &stats); const DBPropertyInfo* property_info = GetPropertyInfo(*property);
assert(property_info != nullptr);
default_cf_internal_stats_->GetStringProperty(*property_info, *property,
&stats);
property = &DB::Properties::kCFStatsNoFileHistogram;
property_info = GetPropertyInfo(*property);
assert(property_info != nullptr);
for (auto cfd : *versions_->GetColumnFamilySet()) { for (auto cfd : *versions_->GetColumnFamilySet()) {
if (cfd->initialized()) { if (cfd->initialized()) {
cfd->internal_stats()->GetStringProperty( cfd->internal_stats()->GetStringProperty(*property_info, *property,
*cf_property_info, DB::Properties::kCFStatsNoFileHistogram, &stats); &stats);
} }
} }
property = &DB::Properties::kCFFileHistogram;
property_info = GetPropertyInfo(*property);
assert(property_info != nullptr);
for (auto cfd : *versions_->GetColumnFamilySet()) { for (auto cfd : *versions_->GetColumnFamilySet()) {
if (cfd->initialized()) { if (cfd->initialized()) {
cfd->internal_stats()->GetStringProperty( cfd->internal_stats()->GetStringProperty(*property_info, *property,
*cf_property_info, DB::Properties::kCFFileHistogram, &stats); &stats);
} }
} }
} }