Yanqin Jin 5237b39d2e Fix assertion error during compaction with write-prepared txn enabled (#9105)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9105

The user contract of SingleDelete is that: a SingleDelete can only be issued to
a key that exists and has NOT been updated. For example, application can insert
one key `key`, and uses a SingleDelete to delete it in the future. The `key`
cannot be updated or removed using Delete.
In reality, especially when write-prepared transaction is being used, things
can get tricky. For example, a prepared transaction already writes `key` to the
memtable after a successful Prepare(). Afterwards, should the transaction
rollback, it will insert a Delete into the memtable to cancel out the prior
Put. Consider the following sequence of operations.

```
// operation sequence 1
Begin txn
Put(key)
Prepare()
Flush()

Rollback txn
Flush()
```

There will be two SSTs resulting from above. One of the contains a PUT, while
the second one contains a Delete. It is also known that releasing a snapshot
can lead to an L0 containing only a SD for a particular key. Consider the
following operations following the above block.

```
// operation sequence 2
db->Put(key)
db->SingleDelete(key)
Flush()
```

The operation sequence 2 can result in an L0 with only the SD.

Should there be a snapshot for conflict checking created before operation
sequence 1, then an attempt to compact the db may hit the assertion failure
below, because ikey_.type is Delete (from a rollback).

```
else if (clear_and_output_next_key_) {
  assert(ikey_.type == kTypeValue || ikey_.type == kTypeBlobIndex);
}
```

To fix the assertion failure, we can skip the SingleDelete if we detect an
earlier Delete in the same snapshot interval.

Reviewed By: ltamasi

Differential Revision: D32056848

fbshipit-source-id: 23620a91e28562d91c45cf7e95f414b54b729748
2021-11-05 15:29:18 -07:00
2021-11-01 10:11:18 -07:00
2021-10-07 11:22:17 -07:00
2020-06-29 14:31:41 -07:00
2017-10-18 14:42:10 -07:00
2019-08-29 23:21:01 -07:00
2017-12-05 18:42:35 -08:00
2017-04-27 18:06:12 -07:00
2017-07-15 16:11:23 -07:00
2021-04-22 11:12:40 -07:00
2021-10-29 19:34:39 -07:00
2019-06-24 17:40:32 -07:00
2021-10-01 16:10:35 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.

Description
A library that provides an embeddable, persistent key-value store for fast storage.
Readme 271 MiB
Languages
C++ 82.1%
Java 10.3%
C 2.5%
Python 1.7%
Perl 1.1%
Other 2.1%