Fix assertion error during compaction with write-prepared txn enabled (#9105)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9105 The user contract of SingleDelete is that: a SingleDelete can only be issued to a key that exists and has NOT been updated. For example, application can insert one key `key`, and uses a SingleDelete to delete it in the future. The `key` cannot be updated or removed using Delete. In reality, especially when write-prepared transaction is being used, things can get tricky. For example, a prepared transaction already writes `key` to the memtable after a successful Prepare(). Afterwards, should the transaction rollback, it will insert a Delete into the memtable to cancel out the prior Put. Consider the following sequence of operations. ``` // operation sequence 1 Begin txn Put(key) Prepare() Flush() Rollback txn Flush() ``` There will be two SSTs resulting from above. One of the contains a PUT, while the second one contains a Delete. It is also known that releasing a snapshot can lead to an L0 containing only a SD for a particular key. Consider the following operations following the above block. ``` // operation sequence 2 db->Put(key) db->SingleDelete(key) Flush() ``` The operation sequence 2 can result in an L0 with only the SD. Should there be a snapshot for conflict checking created before operation sequence 1, then an attempt to compact the db may hit the assertion failure below, because ikey_.type is Delete (from a rollback). ``` else if (clear_and_output_next_key_) { assert(ikey_.type == kTypeValue || ikey_.type == kTypeBlobIndex); } ``` To fix the assertion failure, we can skip the SingleDelete if we detect an earlier Delete in the same snapshot interval. Reviewed By: ltamasi Differential Revision: D32056848 fbshipit-source-id: 23620a91e28562d91c45cf7e95f414b54b729748
This commit is contained in:
parent
7d74d3471c
commit
5237b39d2e
@ -11,6 +11,7 @@
|
|||||||
* EventListener::OnTableFileCreated was previously called with OK status and file_size==0 in cases of no SST file contents written (because there was no content to add) and the empty file deleted before calling the listener. Now the status is Aborted.
|
* EventListener::OnTableFileCreated was previously called with OK status and file_size==0 in cases of no SST file contents written (because there was no content to add) and the empty file deleted before calling the listener. Now the status is Aborted.
|
||||||
* Fixed a bug in CompactionIterator when write-preared transaction is used. Releasing earliest_snapshot during compaction may cause a SingleDelete to be output after a PUT of the same user key whose seq has been zeroed.
|
* Fixed a bug in CompactionIterator when write-preared transaction is used. Releasing earliest_snapshot during compaction may cause a SingleDelete to be output after a PUT of the same user key whose seq has been zeroed.
|
||||||
* Added input sanitization on negative bytes passed into `GenericRateLimiter::Request`.
|
* Added input sanitization on negative bytes passed into `GenericRateLimiter::Request`.
|
||||||
|
* Fixed an assertion failure in CompactionIterator when write-prepared transaction is used. We prove that certain operations can lead to a Delete being followed by a SingleDelete (same user key). We can drop the SingleDelete.
|
||||||
|
|
||||||
### Behavior Changes
|
### Behavior Changes
|
||||||
* `NUM_FILES_IN_SINGLE_COMPACTION` was only counting the first input level files, now it's including all input files.
|
* `NUM_FILES_IN_SINGLE_COMPACTION` was only counting the first input level files, now it's including all input files.
|
||||||
|
@ -608,11 +608,19 @@ void CompactionIterator::NextFromInput() {
|
|||||||
|
|
||||||
TEST_SYNC_POINT_CALLBACK(
|
TEST_SYNC_POINT_CALLBACK(
|
||||||
"CompactionIterator::NextFromInput:SingleDelete:2", nullptr);
|
"CompactionIterator::NextFromInput:SingleDelete:2", nullptr);
|
||||||
if (next_ikey.type == kTypeSingleDeletion) {
|
if (next_ikey.type == kTypeSingleDeletion ||
|
||||||
|
next_ikey.type == kTypeDeletion) {
|
||||||
// We encountered two SingleDeletes for same key in a row. This
|
// We encountered two SingleDeletes for same key in a row. This
|
||||||
// could be due to unexpected user input. Skip the first
|
// could be due to unexpected user input. If write-(un)prepared
|
||||||
// SingleDelete and let the next iteration decide how to handle the
|
// transaction is used, this could also be due to releasing an old
|
||||||
// second SingleDelete
|
// snapshot between a Put and its matching SingleDelete.
|
||||||
|
// Furthermore, if write-(un)prepared transaction is rolled back
|
||||||
|
// after prepare, we will write a Delete to cancel a prior Put. If
|
||||||
|
// old snapshot is released between a later Put and its matching
|
||||||
|
// SingleDelete, we will end up with a Delete followed by
|
||||||
|
// SingleDelete.
|
||||||
|
// Skip the first SingleDelete and let the next iteration decide
|
||||||
|
// how to handle the second SingleDelete or Delete.
|
||||||
|
|
||||||
// First SingleDelete has been skipped since we already called
|
// First SingleDelete has been skipped since we already called
|
||||||
// input_.Next().
|
// input_.Next().
|
||||||
|
@ -3199,6 +3199,82 @@ TEST_P(WritePreparedTransactionTest, ReleaseEarliestSnapshotAfterSeqZeroing2) {
|
|||||||
SyncPoint::GetInstance()->ClearAllCallBacks();
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
||||||
}
|
}
|
||||||
|
|
||||||
|
// Although the user-contract indicates that a SD can only be issued for a key
|
||||||
|
// that exists and has not been overwritten, it is still possible for a Delete
|
||||||
|
// to be present when write-prepared transaction is rolled back.
|
||||||
|
TEST_P(WritePreparedTransactionTest, SingleDeleteAfterRollback) {
|
||||||
|
constexpr size_t kSnapshotCacheBits = 7; // same as default
|
||||||
|
constexpr size_t kCommitCacheBits = 0; // minimum commit cache
|
||||||
|
UpdateTransactionDBOptions(kSnapshotCacheBits, kCommitCacheBits);
|
||||||
|
options.disable_auto_compactions = true;
|
||||||
|
ASSERT_OK(ReOpen());
|
||||||
|
|
||||||
|
// Get a write conflict snapshot by creating a transaction with
|
||||||
|
// set_snapshot=true.
|
||||||
|
TransactionOptions txn_opts;
|
||||||
|
txn_opts.set_snapshot = true;
|
||||||
|
std::unique_ptr<Transaction> dummy_txn(
|
||||||
|
db->BeginTransaction(WriteOptions(), txn_opts));
|
||||||
|
|
||||||
|
std::unique_ptr<Transaction> txn0(
|
||||||
|
db->BeginTransaction(WriteOptions(), TransactionOptions()));
|
||||||
|
ASSERT_OK(txn0->Put("foo", "value"));
|
||||||
|
ASSERT_OK(txn0->SetName("xid0"));
|
||||||
|
ASSERT_OK(txn0->Prepare());
|
||||||
|
|
||||||
|
// Create an SST with only {"foo": "value"}.
|
||||||
|
ASSERT_OK(db->Flush(FlushOptions()));
|
||||||
|
|
||||||
|
// Insert a Delete to cancel out the prior Put by txn0.
|
||||||
|
ASSERT_OK(txn0->Rollback());
|
||||||
|
txn0.reset();
|
||||||
|
|
||||||
|
// Create a second SST.
|
||||||
|
ASSERT_OK(db->Flush(FlushOptions()));
|
||||||
|
|
||||||
|
ASSERT_OK(db->Put(WriteOptions(), "foo", "value1"));
|
||||||
|
|
||||||
|
auto* snapshot = db->GetSnapshot();
|
||||||
|
|
||||||
|
ASSERT_OK(db->SingleDelete(WriteOptions(), "foo"));
|
||||||
|
|
||||||
|
int count = 0;
|
||||||
|
SyncPoint::GetInstance()->DisableProcessing();
|
||||||
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
||||||
|
SyncPoint::GetInstance()->SetCallBack(
|
||||||
|
"CompactionIterator::NextFromInput:SingleDelete:1", [&](void* arg) {
|
||||||
|
const auto* const c = reinterpret_cast<const Compaction*>(arg);
|
||||||
|
assert(!c);
|
||||||
|
// Trigger once only for SingleDelete during flush.
|
||||||
|
if (0 == count) {
|
||||||
|
++count;
|
||||||
|
db->ReleaseSnapshot(snapshot);
|
||||||
|
// Bump max_evicted_seq
|
||||||
|
ASSERT_OK(db->Put(WriteOptions(), "x", "dontcare"));
|
||||||
|
}
|
||||||
|
});
|
||||||
|
SyncPoint::GetInstance()->EnableProcessing();
|
||||||
|
|
||||||
|
// Create a third SST containing a SD without its matching PUT.
|
||||||
|
ASSERT_OK(db->Flush(FlushOptions()));
|
||||||
|
|
||||||
|
SyncPoint::GetInstance()->DisableProcessing();
|
||||||
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
||||||
|
SyncPoint::GetInstance()->EnableProcessing();
|
||||||
|
|
||||||
|
DBImpl* dbimpl = static_cast_with_check<DBImpl>(db->GetRootDB());
|
||||||
|
assert(dbimpl);
|
||||||
|
ASSERT_OK(dbimpl->TEST_CompactRange(
|
||||||
|
/*level=*/0, /*begin=*/nullptr, /*end=*/nullptr,
|
||||||
|
/*column_family=*/nullptr, /*disallow_trivial_mode=*/true));
|
||||||
|
|
||||||
|
SyncPoint::GetInstance()->DisableProcessing();
|
||||||
|
SyncPoint::GetInstance()->ClearAllCallBacks();
|
||||||
|
|
||||||
|
// Release the conflict-checking snapshot.
|
||||||
|
ASSERT_OK(dummy_txn->Rollback());
|
||||||
|
}
|
||||||
|
|
||||||
// A more complex test to verify compaction/flush should keep keys visible
|
// A more complex test to verify compaction/flush should keep keys visible
|
||||||
// to snapshots.
|
// to snapshots.
|
||||||
TEST_P(WritePreparedTransactionTest,
|
TEST_P(WritePreparedTransactionTest,
|
||||||
|
Loading…
Reference in New Issue
Block a user