rocksdb/utilities
Peter Dillinger 9d8eb77c4d Less I/O for incremental backups, slightly better corruption detection (#7413)
Summary:
Two relatively simple functional changes to incremental backup
behavior, integrated with a minor refactoring to reduce code redundancy and
improve error/log message. There are nuances to the impact of these changes,
but I believe they are fundamentally good and generally safe. Those functional
changes:

* Incremental backups no longer read DB table files that are already saved to a
shared part of the backup directory, unless `share_files_with_checksum` is used
with `kLegacyCrc32cAndFileSize` naming (discouraged) where crc32c full file
checksums are needed to determine file naming.
  * Justification: incremental backups should not need to read the whole DB,
especially without rate limiting. (Although other BackupEngine reads are not
rate limited either, other non-trivial reads are generally limited by a
corresponding write, as in copying files.) Also, the fact that this is not
already fixed was arguably a bug/oversight in the implementation of https://github.com/facebook/rocksdb/issues/7110.

* When considering whether a table file is already backed up in a shared part
of backup directory, BackupEngine would already query the sizes of source (DB)
and pre-existing destination (backup) files. BackupEngine now uses these file
sizes to detect corruption, as at least one of (a) old backup, (b) backup in
progress, or (c) current DB is corrupt if there's a size mismatch.
  * Justification: a random related fix that also helps to cover a small hole
in corruption checking uncovered by the other functional change:
  * For `share_table_files` without "checksum" (not recommended), the other
change regresses in detecting fundamentally unsafe use of this option
combination: when you might generate different versions of same SST file
number. As demonstrated by `BackupableDBTest.FailOverwritingBackups,` this
regression is greatly mitigated by the new file size checking. Nevertheless,
almost no reason to use `share_files_with_checksum=false` should remain, and
comments are updated appropriately.

Also, this change renames internal function `CalculateChecksum` to
`ReadFileAndComputeChecksum` to make the performance impact of this function
clear in code reviews.

It is not clear what 'same_path' is for in backupable_db.cc, and I suspect it
cannot be true for a DB with unique file names (like DBImpl). Nevertheless,
I've tried to keep its functionality intact when `true` to minimize risk for
now, despite having no unit tests for which it is true.

Select impact details (much more in unit tests): For
`share_files_with_checksum`, I am confident there is no regression (vs.
pre-6.12) in detecting DB or backup corruption at backup creation time, mostly
because the old design did not leverage this extra checksum computation for
detecting inconsistencies at backup creation time. (With computed checksums in
names, a recently corrupted file just looked like a different file vs. what was
already backed up.)

Even in the hypothetical case of DB session id collision (~100 bits entropy
collision), file size in name and/or our file size check add an extra layer of
protection against false success in creating an accurate new backup. (Unit test
included.)

`DB::VerifyChecksum` and `BackupEngine::VerifyBackup` with checksum checking
are still able to catch corruptions that `CreateNewBackup` does not. Note that
when custom file checksum support is added to BackupEngine, that will
essentially give the same power as `DB::VerifyChecksum` into `CreateNewBackup`.
We could add options for `CreateNewBackup` to cover some of what would be
caught by `VerifyBackup` with checksum checking.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7413

Test Plan:
Two new unit tests included, both of which fail without these
changes. Although we don't test the I/O improvement directly, we test it
indirectly in DB corruption detection power that was inadvertently unlocked
with new backup file naming PLUS computing current content checksums (now
removed). (I don't think that case of DB corruption detection justifies reading
the whole DB on incremental backup.)

Reviewed By: zhichao-cao

Differential Revision: D23818480

Pulled By: pdillinger

fbshipit-source-id: 148aff16f001af5b9fd4b22f155311c2461f1bac
2020-09-21 16:19:24 -07:00
..
backupable Less I/O for incremental backups, slightly better corruption detection (#7413) 2020-09-21 16:19:24 -07:00
blob_db Expose the start of the expiration range for TTL blob files through LiveFileMetaData (#7365) 2020-09-10 11:33:33 -07:00
cassandra Replace reinterpret_cast with static_cast_with_check (#7067) 2020-07-02 19:25:41 -07:00
checkpoint Real fix for race in backup custom checksum checking (#7309) 2020-08-26 10:39:20 -07:00
compaction_filters Compaction filter support for BlobDB (#6850) 2020-06-29 17:32:14 -07:00
convenience Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
leveldb_options Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
memory Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
merge_operators Make StringAppendOperatorTest a parameterized test (#6930) 2020-06-04 14:17:11 -07:00
option_change_migration Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
options Add more tests to ASSERT_STATUS_CHECKED (#7367) 2020-09-16 15:48:07 -07:00
persistent_cache Add more tests to ASSERT_STATUS_CHECKED (#7367) 2020-09-16 15:48:07 -07:00
simulator_cache Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
table_properties_collectors Trigger compaction in CompactOnDeletionCollector based on deletion ratio (#6806) 2020-05-18 08:42:05 -07:00
trace Add some simulator cache and block tracer tests to ASSERT_STATUS_CHECKED (#7305) 2020-08-24 16:43:31 -07:00
transactions Disable fsync in SeqAdvanceConcurrentTest (#7302) 2020-08-24 13:22:06 -07:00
ttl DBWithTTL::Open() param ttls: vector<int32_t> to const vector<int32_t>& (#7196) 2020-08-24 16:24:16 -07:00
write_batch_with_index Replace reinterpret_cast with static_cast_with_check (#7067) 2020-07-02 19:25:41 -07:00
debug.cc dedup ReadOptions in iterator hierarchy (#7210) 2020-08-03 15:23:04 -07:00
env_librados_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
env_librados.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
env_librados.md Add EnvLibrados - RocksDB Env of RADOS (#1222) 2016-07-21 11:16:34 -07:00
env_mirror_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
env_mirror.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
env_timed_test.cc Make env*_test work with ASSERT_STATUS_CHECKED (#7176) 2020-07-28 22:59:48 -07:00
env_timed.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
fault_injection_env.cc More Makefile Cleanup (#7097) 2020-07-09 14:35:17 -07:00
fault_injection_env.h Add EnvTestWithParam::OptionsTest to the ASSERT_STATUS_CHECKED passes (#7283) 2020-08-20 19:18:35 -07:00
fault_injection_fs.cc Add EnvTestWithParam::OptionsTest to the ASSERT_STATUS_CHECKED passes (#7283) 2020-08-20 19:18:35 -07:00
fault_injection_fs.h Make env*_test work with ASSERT_STATUS_CHECKED (#7176) 2020-07-28 22:59:48 -07:00
merge_operators.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
object_registry_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
object_registry.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
util_merge_operators_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00