rocksdb/utilities
Yanqin Jin e5451b30db Fix a silent data loss for write-committed txn (#9571)
Summary:
The following sequence of events can cause silent data loss for write-committed
transactions.
```
Time    thread 1                                       bg flush
 |   db->Put("a")
 |   txn = NewTxn()
 |   txn->Put("b", "v")
 |   txn->Prepare()       // writes only to 5.log
 |   db->SwitchMemtable() // memtable 1 has "a"
 |                        // close 5.log,
 |                        // creates 8.log
 |   trigger flush
 |                                                  pick memtable 1
 |                                                  unlock db mutex
 |                                                  write new sst
 |   txn->ctwb->Put("gtid", "1") // writes 8.log
 |   txn->Commit() // writes to 8.log
 |                 // writes to memtable 2
 |                                               compute min_log_number_to_keep_2pc, this
 |                                               will be 8 (incorrect).
 |
 |                                             Purge obsolete wals, including 5.log
 |
 V
```

At this point, writes of txn exists only in memtable. Close db without flush because db thinks the data in
memtable are backed by log. Then reopen, the writes are lost except key-value pair {"gtid"->"1"},
only the commit marker of txn is in 8.log

The reason lies in `PrecomputeMinLogNumberToKeep2PC()` which calls `FindMinPrepLogReferencedByMemTable()`.
In the above example, when bg flush thread tries to find obsolete wals, it uses the information
computed by `PrecomputeMinLogNumberToKeep2PC()`. The return value of `PrecomputeMinLogNumberToKeep2PC()`
depends on three components
- `PrecomputeMinLogNumberToKeepNon2PC()`. This represents the WAL that has unflushed data. As the name of this method suggests, it does not account for 2PC. Although the keys reside in the prepare section of a previous WAL, the column family references the current WAL when they are actually inserted into the memtable during txn commit.
- `prep_tracker->FindMinLogContainingOutstandingPrep()`. This represents the WAL with a prepare section but the txn hasn't committed.
- `FindMinPrepLogReferencedByMemTable()`. This represents the WAL on which some memtables (mutable and immutable) depend for their unflushed data.

The bug lies in `FindMinPrepLogReferencedByMemTable()`. Originally, this function skips checking the column families
that are being flushed, but the unit test added in this PR shows that they should not be. In this unit test, there is
only the default column family, and one of its memtables has unflushed data backed by a prepare section in 5.log.
We should return this information via `FindMinPrepLogReferencedByMemTable()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9571

Test Plan:
```
./transaction_test --gtest_filter=*/TransactionTest.SwitchMemtableDuringPrepareAndCommit_WC/*
make check
```

Reviewed By: siying

Differential Revision: D34235236

Pulled By: riversand963

fbshipit-source-id: 120eb21a666728a38dda77b96276c6af72b008b1
2022-02-17 15:37:59 -08:00
..
backupable Make the Env class Customizable (#9293) 2022-01-04 16:45:49 -08:00
blob_db Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
cassandra Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
checkpoint Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
compaction_filters Make MergeOperator+CompactionFilter/Factory into Customizable Classes (#8481) 2021-08-06 08:27:25 -07:00
convenience Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
leveldb_options Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
memory Make types of Immutable/Mutable Options fields match that of the underlying Option (#8176) 2021-04-22 20:43:54 -07:00
merge_operators Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
option_change_migration Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
options Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
persistent_cache Improve support for using regexes (#8740) 2021-09-07 13:05:23 -07:00
simulator_cache Fix flaky SimCacheTest.SimCacheLogging (#9373) 2022-01-10 22:03:36 -08:00
table_properties_collectors Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
trace Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
transactions Fix a silent data loss for write-committed txn (#9571) 2022-02-17 15:37:59 -08:00
ttl Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
write_batch_with_index Check that newIteratorWithBase regardless of WBWI Overwrite Mode (#8134) 2021-11-18 11:53:09 -08:00
cache_dump_load_impl.cc New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
cache_dump_load_impl.h Initialize cache dumper DumpUnit in constructor (#9014) 2021-10-11 13:05:35 -07:00
cache_dump_load.cc Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912) 2021-10-07 11:42:31 -07:00
compaction_filters.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
debug.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
env_librados_test.cc Fix EnvLibrados and add to CI (#9088) 2021-10-29 08:19:03 -07:00
env_librados.cc Fix EnvLibrados and add to CI (#9088) 2021-10-29 08:19:03 -07:00
env_librados.md Update branch name to main in env_librados.md (#8738) 2021-09-01 14:28:58 -07:00
env_mirror_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
env_mirror.cc Fix clang13 build error (#9374) 2022-01-11 10:36:22 -08:00
env_timed_test.cc Make env*_test work with ASSERT_STATUS_CHECKED (#7176) 2020-07-28 22:59:48 -07:00
env_timed.cc Make FileSystem a Customizable Class (#8649) 2021-11-02 09:07:11 -07:00
env_timed.h Make FileSystem a Customizable Class (#8649) 2021-11-02 09:07:11 -07:00
fault_injection_env.cc Protect existing files in FaultInjectionTest{Env,FS}::ReopenWritableFile() (#8995) 2021-10-11 16:23:18 -07:00
fault_injection_env.h Make the Env class Customizable (#9293) 2022-01-04 16:45:49 -08:00
fault_injection_fs.cc Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
fault_injection_fs.h Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
fault_injection_secondary_cache.cc Secondary cache error injection (#9002) 2021-11-08 10:27:27 -08:00
fault_injection_secondary_cache.h Secondary cache error injection (#9002) 2021-11-08 10:27:27 -08:00
memory_allocators.h Make MemoryAllocator into a Customizable class (#8980) 2021-12-17 04:20:47 -08:00
merge_operators.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
merge_operators.h Make MergeOperator+CompactionFilter/Factory into Customizable Classes (#8481) 2021-08-06 08:27:25 -07:00
object_registry_test.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
object_registry.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
util_merge_operators_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
wal_filter.cc Make WalFilter, SstPartitionerFactory, FileChecksumGenFactory, and TableProperties Customizable (#8638) 2021-09-28 05:32:02 -07:00