rocksdb/db/db_impl
Zhichao Cao fe352574b4 Fix the false positive alert of CF consistency check in WAL recovery (#8207)
Summary:
In current RocksDB, when recovering from the WAL with PointInTimeRecovery set, we run a consistency check on each column family if one WAL file is corrupted. However, this check reports a false positive "SST file is ahead of WALs" when a CF's current log number is greater than the corrupted WAL number (i.e., the CF appears to contain data beyond the corrupted WAL) because a new column family was created during a flush. In this case, a new, empty WAL is created during the flush. The old WAL may also be corrupted for some reason, e.g., a storage issue or a crash before SyncCloseLog is called. The new CF has no data, so it cannot have a consistency issue.

Fix: when checking cfd->GetLogNumber() > corrupted_wal_number, also check cfd->GetLiveSstFilesSize() > 0. CFs with no SST file data therefore skip this check.

Note a potential inconsistency that this fix ignores: an empty CF can also result from a write followed by a delete. In that case, the flush generates no SST files, but the CF still has its log in the WAL. If that WAL is corrupted, the DB might be inconsistent.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8207

Test Plan: added unit test, make crash_test

Reviewed By: riversand963

Differential Revision: D27898839

Pulled By: zhichao-cao

fbshipit-source-id: 931fc2d8b92dd00b4169bf84b94e712fd688a83e
2021-04-23 15:39:35 -07:00
db_impl_compaction_flush.cc Fix the false positive alert of CF consistency check in WAL recovery (#8207) 2021-04-23 15:39:35 -07:00
db_impl_debug.cc Do not set bg error for compaction in retryable IO Error case (#7899) 2021-01-27 17:58:12 -08:00
db_impl_experimental.cc Replace reinterpret_cast with static_cast_with_check (#7067) 2020-07-02 19:25:41 -07:00
db_impl_files.cc Handle rename() failure in non-local FS (#8192) 2021-04-19 19:56:59 -07:00
db_impl_open.cc Fix the false positive alert of CF consistency check in WAL recovery (#8207) 2021-04-23 15:39:35 -07:00
db_impl_readonly.cc Bug fix for status overridden by Status::NotFound in db_impl_readonly (#7972) 2021-02-17 19:35:57 -08:00
db_impl_readonly.h RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) 2020-09-17 15:41:25 -07:00
db_impl_secondary.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
db_impl_secondary.h RocksJava - Add errorIfLogFileExists parameter to RocksDB.openReadOnly (#7046) 2020-09-17 15:41:25 -07:00
db_impl_write.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
db_impl.cc Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
db_impl.h Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00