Summary:
GetCommitTimeWriteBatch is currently used to store some state as part of commit in 2PC. In MyRocks it is specifically used to store some data that would be needed only during recovery. So it is not need to be stored in memtable right after each commit.
This patch enables an optimization to write the GetCommitTimeWriteBatch only to the WAL. The batch will be written to memtable during recovery when the WAL is replayed. To cover the case when WAL is deleted after memtable flush, the batch is also buffered and written to memtable right before each memtable flush.
Closes https://github.com/facebook/rocksdb/pull/3071
Differential Revision: D6148023
Pulled By: maysamyabandeh
fbshipit-source-id: 2d09bae5565abe2017c0327421010d5c0d55eaa7
Summary:
Enable concurrent_prepare flag for WritePrepared transactions and extend the existing transaction tests with this config.
Closes https://github.com/facebook/rocksdb/pull/3046
Differential Revision: D6106534
Pulled By: maysamyabandeh
fbshipit-source-id: 88c8d21d45bc492beb0a131caea84a2ac5e7d38c
Summary:
On iterator create, take a snapshot, create a ReadCallback and pass the ReadCallback to the underlying DBIter to check if key is committed.
Closes https://github.com/facebook/rocksdb/pull/2981
Differential Revision: D6001471
Pulled By: yiwu-arbug
fbshipit-source-id: 3565c4cdaf25370ba47008b0e0cb65b31dfe79fe
Summary:
With WriteCommitted, when the write batch has duplicate keys, the txn db simply inserts them to the db with different seq numbers and let the db ignore/merge the duplicate values at the read time. With WritePrepared all the entries of the batch are inserted with the same seq number which prevents us from benefiting from this simple solution.
This patch applies a hackish solution to unblock the end-to-end testing. The hack is to be replaced with a proper solution soon. The patch simply detects the duplicate key insertions, and mark the previous one as obsolete. Then before writing to the db it rewrites the batch eliminating the obsolete keys. This would incur a memcpy cost. Furthermore handing duplicate merge would require to do FullMerge instead of simply ignoring the previous value, which is not handled by this patch.
Closes https://github.com/facebook/rocksdb/pull/2969
Differential Revision: D5976337
Pulled By: maysamyabandeh
fbshipit-source-id: 114e65b66f137d8454ff2d1d782b8c05da95f989
Summary:
Implement the rollback of WritePrepared txns. For each modified value, it reads the value before the txn and write it back. This would cancel out the effect of transaction. It also remove the rolled back txn from prepared heap.
Closes https://github.com/facebook/rocksdb/pull/2946
Differential Revision: D5937575
Pulled By: maysamyabandeh
fbshipit-source-id: a6d3c47f44db3729f44b287a80f97d08dc4e888d
Summary:
Recover txns from the WAL. Also added some unit tests.
Closes https://github.com/facebook/rocksdb/pull/2901
Differential Revision: D5859596
Pulled By: maysamyabandeh
fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0
Summary:
By default the seq number in DB is increased once per written key. WritePrepared txns requires the seq to be increased once per the entire batch so that the seq would be used as the prepare timestamp by which the transaction is identified. Also we need to increase seq for the commit marker since it would give a unique id to the commit timestamp of transactions.
Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch.
Closes https://github.com/facebook/rocksdb/pull/2885
Differential Revision: D5837843
Pulled By: maysamyabandeh
fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c
Summary:
We had two proposals for lock-free commit maps. This patch implements the latter one that was simpler. We can later experiment with both proposals.
In this impl each entry is an std::atomic of uint64_t, which are accessed via memory_order_acquire/release. In x86_64 arch this is compiled to simple reads and writes from memory.
Closes https://github.com/facebook/rocksdb/pull/2861
Differential Revision: D5800724
Pulled By: maysamyabandeh
fbshipit-source-id: 41abae9a4a5df050a8eb696c43de11c2770afdda
Summary:
This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
Closes https://github.com/facebook/rocksdb/pull/2850
Differential Revision: D5787375
Pulled By: maysamyabandeh
fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
Summary:
Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides a IsInSnapshot method that could be later called form the read path to decide if a version is in the read snapshot or it should other be skipped.
This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is that to have the API specified so that we can work on related tasks in parallel.
Closes https://github.com/facebook/rocksdb/pull/2713
Differential Revision: D5640021
Pulled By: maysamyabandeh
fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c
Summary:
This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves lock-related logic to PessimisticTransaction to be reused by all children classes and movies the existing impl of db-write-related to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl.
Closes https://github.com/facebook/rocksdb/pull/2691
Differential Revision: D5569464
Pulled By: maysamyabandeh
fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325