Commit Graph

377 Commits

Author SHA1 Message Date
Manuel Ung
9300ef5455 Fix shared lock upgrades
Summary:
Upgrading a shared lock was silently succeeding because the actual locking code was skipped. This is because if the keys are tracked, it is assumed that they are already locked and do not require locking. Fix this by recording in tracked keys whether the key was locked exclusively or not.

Note that lock downgrades are impossible, which is the behaviour we expect.

This fixes facebook/mysql-5.6#587.
Closes https://github.com/facebook/rocksdb/pull/2122

Differential Revision: D4861489

Pulled By: IslamAbdelRahman

fbshipit-source-id: 58c7ebe7af098bf01b9774b666d3e9867747d8fd
2017-04-10 16:06:00 -07:00
Manuel Ung
1f8b119ed6 Limit maximum memory used in the WriteBatch representation
Summary:
Extend TransactionOptions to include max_write_batch_size which determines the maximum size of the writebatch representation. If memory limit is exceeded, the operation will abort with subcode kMemoryLimit.
Closes https://github.com/facebook/rocksdb/pull/2124

Differential Revision: D4861842

Pulled By: lth

fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161
2017-04-10 15:42:26 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Yi Wu
9e44531803 Refactor WriteImpl (pipeline write part 1)
Summary:
Refactor WriteImpl() so when I plug-in the pipeline write code (which is
an alternative approach for WriteThread), some of the logic can be
reuse. I split out the following methods from WriteImpl():

* PreprocessWrite()
* HandleWALFull() (previous MaybeFlushColumnFamilies())
* HandleWriteBufferFull()
* WriteToWAL()

Also adding a constructor to WriteThread::Writer, and move WriteContext into db_impl.h.
No real logic change in this patch.
Closes https://github.com/facebook/rocksdb/pull/2042

Differential Revision: D4781014

Pulled By: yiwu-arbug

fbshipit-source-id: d45ca18
2017-04-04 10:24:32 -07:00
Dmitri Smirnov
0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread.  On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Reid Horuff
5cf176ca15 Fix for 2PC causing WAL to grow too large
Summary:
Consider the following single column family scenario:
prepare in log A
commit in log B
*WAL is too large, flush all CFs to releast log A*
*CFA is on log B so we do not see CFA is depending on log A so no flush is requested*

To fix this we must also consider the log containing the prepare section when determining what log a CF is dependent on.
Closes https://github.com/facebook/rocksdb/pull/1768

Differential Revision: D4403265

Pulled By: reidHoruff

fbshipit-source-id: ce800ff
2017-01-19 15:39:12 -08:00
Dmitri Smirnov
3c233ca4ea Fix Windows environment issues
Summary:
Enable directIO on WritableFileImpl::Append
     with offset being current length of the file.
     Enable UniqueID tests on Windows, disable others but
     leeting them to compile. Unique tests are valuable to
     detect failures on different filesystems and upcoming
     ReFS.
     Clear output in WinEnv Getchildren.This is different from
     previous strategy, do not touch output on failure.
     Make sure DBTest.OpenWhenOpen works with windows error message
Closes https://github.com/facebook/rocksdb/pull/1746

Differential Revision: D4385681

Pulled By: IslamAbdelRahman

fbshipit-source-id: c07b702
2017-01-09 15:54:12 -08:00
Maysam Yabandeh
7631734563 Fix the error in ColumnFamiliesTest
Summary:
In the test the last change to AAAZZZ in handles[1] is deleting it. The
result of the get must be NotFound then. Previosuly the test did not
check for the return value of Get and assumed that the status is ok. It
then move ahead asserting the returned value. The passed-by-reference
string value however was not changed (since the key was not found) and
the asserted value is what it contained before doing the Get.
Closes https://github.com/facebook/rocksdb/pull/1753

Differential Revision: D4390982

Pulled By: maysamyabandeh

fbshipit-source-id: dd55a34
2017-01-09 14:09:13 -08:00
Daniel Black
816c1e30ca gcc-7 requires include <functional> for std::function
Summary:
Fixes compile error:

In file included from ./util/statistics.h:17:0,
                 from ./util/stop_watch.h:8,
                 from ./util/perf_step_timer.h:9,
                 from ./util/iostats_context_imp.h:8,
                 from ./util/posix_logger.h:27,
                 from ./port/util_logger.h:18,
                 from ./db/auto_roll_logger.h:15,
                 from db/auto_roll_logger.cc:6:
./util/thread_local.h:65:16: error: 'function' in namespace 'std' does not name a template type
   typedef std::function<void(void*, void*)> FoldFunc;
Closes https://github.com/facebook/rocksdb/pull/1656

Differential Revision: D4318702

Pulled By: yiwu-arbug

fbshipit-source-id: 8c5d17a
2016-12-16 11:24:18 -08:00
Daniel Black
0ab6fc167f Gcc-7 buffer size insufficient
Summary:
Bunch of commits related to insufficient buffer size. Errors in individual commits.
Closes https://github.com/facebook/rocksdb/pull/1673

Differential Revision: D4332127

Pulled By: IslamAbdelRahman

fbshipit-source-id: 878f73c
2016-12-14 19:24:26 -08:00
Yi Wu
c26a4d8e8a Fix compile error in trasaction_lock_mgr.cc
Summary:
Fix error on mac/windows build since they don't recognize `uint`.
Closes https://github.com/facebook/rocksdb/pull/1624

Differential Revision: D4287139

Pulled By: yiwu-arbug

fbshipit-source-id: b7cc88f
2016-12-06 14:39:16 -08:00
Manuel Ung
2005c88a75 Implement non-exclusive locks
Summary:
This is an implementation of non-exclusive locks for pessimistic transactions. It is relatively simple and does not prevent starvation (ie. it's possible that request for exclusive access will never be granted if there are always threads holding shared access). It is done by changing `KeyLockInfo` to hold an set a transaction ids, instead of just one, and adding a flag specifying whether this lock is currently held with exclusive access or not.

Some implementation notes:
- Some lock diagnostic functions had to be updated to return a set of transaction ids for a given lock, eg. `GetWaitingTxn` and `GetLockStatusData`.
- Deadlock detection is a bit more complicated since a transaction can now wait on multiple other transactions. A BFS is done in this case, and deadlock detection depth is now just a limit on the number of transactions we visit.
- Expirable transactions do not work efficiently with shared locks at the moment, but that's okay for now.
Closes https://github.com/facebook/rocksdb/pull/1573

Differential Revision: D4239097

Pulled By: lth

fbshipit-source-id: da7c074
2016-12-05 17:39:17 -08:00
Manuel Ung
e63350e726 Use more efficient hash map for deadlock detection
Summary:
Currently, deadlock cycles are held in std::unordered_map. The problem with it is that it allocates/deallocates memory on every insertion/deletion. This limits throughput since we're doing this expensive operation while holding a global mutex. Fix this by using a vector which caches memory instead.

Running the deadlock stress test, this change increased throughput from 39k txns/s -> 49k txns/s. The effect is more noticeable in MyRocks.
Closes https://github.com/facebook/rocksdb/pull/1545

Differential Revision: D4205662

Pulled By: lth

fbshipit-source-id: ff990e4
2016-11-19 11:39:15 -08:00
Reid Horuff
1ca5f6d132 Fix 2PC Recovery SeqId Miscount
Summary:
Originally sequence ids were calculated, in recovery, based off of the first seqid found if the first log recovered. The working seqid was then incremented from that value based on every insertion that took place. This was faulty because of the potential for missing log files or inserts that skipped the WAL. The current recovery scheme grabs sequence from current recovering batch and increments using memtableinserter to track how many actual inserts take place. This works for 2PC batches as well scenarios where some logs are missing or inserts that skip the WAL.
Closes https://github.com/facebook/rocksdb/pull/1486

Differential Revision: D4156064

Pulled By: reidHoruff

fbshipit-source-id: a6da8d9
2016-11-10 11:09:22 -08:00
Reid Horuff
d133b08f68 Use correct sequence number when creating memtable
Summary:
copied from: 5ebfd2623a

Opening existing RocksDB attempts recovery from log files, which uses
wrong sequence number to create the memtable. This is a regression
introduced in change a400336.

This change includes a test demonstrating the problem, without the fix
the test fails with "Operation failed. Try again.: Transaction could not
check for conflicts for operation at SequenceNumber 1 as the MemTable
only contains changes newer than SequenceNumber 2.  Increasing the value
of the max_write_buffer_number_to_maintain option could reduce the
frequency of this error"

This change is a joint effort by Peter 'Stig' Edwards thatsafunnyname
and me.
Closes https://github.com/facebook/rocksdb/pull/1458

Differential Revision: D4143791

Pulled By: reidHoruff

fbshipit-source-id: 5a25033
2016-11-09 12:24:17 -08:00
Reid Horuff
4dfaa6610a Make IsDeadlockDetect() virtual member of Transaction
Summary: Make `IsDeadlockDetect()` virtual member of base class `Transaction` for ease of use in MyRocks

Test Plan: compiles. compiles into MyRocks call-site.

Reviewers: mung

Reviewed By: mung

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65385
2016-10-21 14:47:59 -07:00
Manuel Ung
4edd39fda2 Implement deadlock detection
Summary: Implement deadlock detection. This is done by maintaining a TxnID -> TxnID map which represents the edges in the wait for graph (this is named `wait_txn_map_`).

Test Plan: transaction_test

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64491
2016-10-19 19:45:57 -07:00
Reid Horuff
8c55bb87c8 Make Lock Info test multiple column families
Summary: Modifies the lock info export test to test multiple column families after I was experiencing a bug while developing the MyRocks front-end for this.

Test Plan: is test.

Reviewers: mung

Reviewed By: mung

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64725
2016-10-07 15:04:05 -07:00
Reid Horuff
37737c3a6b Expose Transaction State Publicly
Summary:
This exposes a transactions state through a public api rather than through a public member variable. I also do some name refactoring.
ExecutionStatus => TransactionState
exec_status_ => trx_state_

Test Plan: It compiles and transaction_test passes.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, mung, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D64689
2016-10-07 11:58:53 -07:00
Reid Horuff
2c1f95291d Add facility to write only a portion of WriteBatch to WAL
Summary:
When constructing a write batch a client may now call MarkWalTerminationPoint() on that batch. No batch operations after this call will be added written to the WAL but will still be inserted into the Memtable. This facility is used to remove one of the three WriteImpl calls in 2PC transactions. This produces a ~1% perf improvement.

```
RocksDB - unoptimized 2pc, sync_binlog=1, disable_2pc=off
INFO 2016-08-31 14:30:38,814 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2619 seconds. Requests/second = 28628

RocksDB - optimized 2pc , sync_binlog=1, disable_2pc=off
INFO 2016-08-31 16:26:59,442 [main]: REQUEST PHASE COMPLETED. 75000000 requests done in 2581 seconds. Requests/second = 29054
```

Test Plan: Two unit tests added.

Reviewers: sdong, yiwu, IslamAbdelRahman

Reviewed By: yiwu

Subscribers: hermanlee4, dhruba, andrewkr

Differential Revision: https://reviews.facebook.net/D64599
2016-10-07 11:32:10 -07:00
Manuel Ung
be1f1092c9 Expose transaction id, lock state information and transaction wait information
Summary:
This diff does 3 things:

Expose TransactionID so that we can identify transactions when we retrieve locking and lock wait information. This is exposed as `Transaction::GetID`.

Expose lock state information by locking all stripes in all column families and copying their contents to a data structure. This is exposed as `TransactionDB::GetLockStatusData`.

Adds support for tracking the transaction and the key being waited on, and exposes this as `Transaction::GetWaitingTxn`.

Test Plan: unit tests

Reviewers: horuff, sdong

Reviewed By: sdong

Subscribers: vasilep, hermanlee4, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64413
2016-09-30 11:41:21 -07:00
Wanning Jiang
78837f5d61 TableBuilder / TableReader support for range deletion
Summary: 1. Range Deletion Tombstone structure 2. Modify Add() in table_builder to make it usable for adding range del tombstones 3. Expose NewTombstoneIterator() API in table_reader

Test Plan: table_test.cc (now BlockBasedTableBuilder::Add() only accepts InternalKey. I make table_test only pass InternalKey to BlockBasedTableBuidler. Also test writing/reading range deletion tombstones in table_test )

Reviewers: sdong, IslamAbdelRahman, lightmark, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61473
2016-08-19 15:10:31 -07:00
Aaron Gao
76a67cf741 support stackableDB as the baseDB of transactionDB
Summary: make transactionDB working with StackableDB

Test Plan: make all check -j64

Reviewers: andrewkr, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60705
2016-08-11 14:19:33 -07:00
Aaron Gao
7c190070b4 delete unnessary pointer cast in beginInternalTransaction() function
Summary: use of dynamic_cast<TransactionImpl*> is unnecessary and also introduce difficulty for fbrocksdb support of TransactionDB

Test Plan: ./transaction_test

Reviewers: sdong, IslamAbdelRahman, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60501
2016-07-07 16:34:41 -07:00
Reid Horuff
892e9d3047 make transaction WriteOptions modifiable 2016-06-27 12:53:30 -07:00
Islam AbdelRahman
05c5c39a7c Fix build 2016-05-18 00:41:14 -07:00
Reid Horuff
a6254f2bd4 Long outstanding prepare test
Summary: This tests that a prepared transaction is not lost after several crashes, restarts, and memtable flushes.

Test Plan: TwoPhaseLongPrepareTest

Reviewers: sdong

Subscribers: hermanlee4, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58185
2016-05-17 18:57:06 -07:00
Islam AbdelRahman
2ead115116 Fix TransactionTest.TwoPhaseMultiThreadTest under TSAN
Summary:
TransactionTest.TwoPhaseMultiThreadTest runs forever under TSAN and our CI builds time out
looks like the reason is that some threads keep running and other threads dont get a chance to increment the counter

Test Plan: run the test under TSAN

Reviewers: sdong, horuff

Reviewed By: horuff

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58359
2016-05-17 18:54:27 -07:00
Islam AbdelRahman
f6aedb62c0 Fix Transaction memory leak
Summary:
- Make sure we clean up recovered_transactions_ on DBImpl destructor
- delete leaked txns and env in TransactionTest

Test Plan: Run transaction_test under valgrind

Reviewers: sdong, andrewkr, yhchiang, horuff

Reviewed By: horuff

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58263
2016-05-16 16:32:55 -07:00
Reid Horuff
40123b3805 signed vs unsigned comparison fix 2016-05-11 14:22:43 -07:00
Reid Horuff
c27061dae7 [rocksdb] 2PC double recovery bug fix
Summary:
1. prepare()
2. crash
3. recover
4. commit()
5. crash
6. data is lost

This is due to the transaction data still only residing in the WAL but because the logs were flushed on the first recovery the data is ignored on the second recovery. We must scan all logs found on recovery and only ignore redundant data at the time of replay. It is not possible to know which logs still contain relevant data at time of recovery. We cannot simply ignore a log because all of the non-2pc data it contains has already been written to L0.

The changes made to MemTableInserter are to ensure that prepared sections are still recovered even if all of the non-2pc data in that log has already been flushed to L0.

Test Plan: Provided test.

Reviewers: sdong

Subscribers: andrewkr, hermanlee4, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57729
2016-05-10 14:06:07 -07:00
Reid Horuff
a657ee9a9c [rocksdb] Recovery path sequence miscount fix
Summary:
Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert.
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)]
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)]
[4: COMMIT(a)]
[7: COMMIT(b)]

The first two batches do not consume any sequence numbers so are both prefixed with seq=1.
For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL.
We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries.
This is a valid WAL.

Because with 2PC markers one WriteBatch points to another batch containing its inserts a writebatch can consume more or less sequence numbers than the number of sequence consuming entries that it contains.

We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b))

So, now upon recovery we must track the actual consumption of sequence numbers.
In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though. correct?

Test Plan: provided test.

Reviewers: sdong

Subscribers: andrewkr, leveldb, dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D57645
2016-05-10 14:06:07 -07:00
Reid Horuff
8a66c85e90 [rocksdb] Two Phase Transaction
Summary:
Two Phase Commit addition to RocksDB.

See wiki: https://github.com/facebook/rocksdb/wiki/Two-Phase-Commit-Implementation
Quip: https://fb.quip.com/pxZrAyrx53r3

Depends on:
WriteBatch modification: https://reviews.facebook.net/D54093
Memtable Log Referencing and Prepared Batch Recovery: https://reviews.facebook.net/D56919

Test Plan:
- SimpleTwoPhaseTransactionTest
- PersistentTwoPhaseTransactionTest.
- TwoPhaseRollbackTest
- TwoPhaseMultiThreadTest
- TwoPhaseLogRollingTest
- TwoPhaseEmptyWriteTest
- TwoPhaseExpirationTest

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: leveldb, hermanlee4, andrewkr, vasilep, dhruba, santoshb

Differential Revision: https://reviews.facebook.net/D56925
2016-05-10 14:06:07 -07:00
Li Peng
6d4832a998 Merge pull request #1101 from flyd1005/wip-fix-typo
fix typos and remove duplicated words
2016-04-28 02:30:44 -07:00
SherlockNoMad
f11b0df121 Fix AppVeyor build error 2016-03-15 10:57:33 -07:00
agiardullo
790252805d Add multithreaded transaction test
Summary: Refactored db_bench transaction stress tests so that they can be called from unit tests as well.

Test Plan: run new unit test as well as db_bench

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55203
2016-03-11 15:16:52 -08:00
agiardullo
2200295ee1 optimistic transactions support for reinitialization
Summary: Extend optimization in D53835 to optimistic transactions for completeness.

Test Plan: added test

Reviewers: sdong, IslamAbdelRahman, horuff, jkedgar

Reviewed By: horuff

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55059
2016-03-07 19:03:09 -08:00
agiardullo
200080ed72 Improve snapshot handling for Transaction reinitialization
Summary: Previously, reusing a transaction (by passing it as an argument to BeginTransaction) would not clear the transaction's snapshot.  This is not a clear, well-definited behavior.

Test Plan: improved test

Reviewers: sdong, IslamAbdelRahman, horuff, jkedgar

Reviewed By: jkedgar

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55053
2016-03-07 13:28:11 -08:00
agiardullo
5ea9aa3c14 TransactionDB:ReinitializeTransaction
Summary: Add function to reinitialize a transaction object so that it can be reused.  This is an optimization so users can potentially avoid reallocating transaction objects.

Test Plan: added tests

Reviewers: yhchiang, kradhakrishnan, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: jkedgar, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53835
2016-02-29 16:27:32 -08:00
agiardullo
d08d50295c Fix transaction locking
Summary: Broke transaction locking in 4.4 in D52197.  Will cherry-pick this change into 4.4 (which hasn't yet been fully released).  Repro'd using db_bench.

Test Plan: unit tests and db_Bench

Reviewers: sdong, yhchiang, kradhakrishnan, ngbronson

Reviewed By: ngbronson

Subscribers: ngbronson, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54021
2016-02-16 17:15:05 -08:00
reid horuff
5bcf952a87 Fix WriteImpl empty batch hanging issue
Summary: There is an issue in DBImpl::WriteImpl where if an empty writebatch comes in and sync=true then the logs will be marked as being synced yet the sync never actually happens because there is no data in the writebatch. This causes the next incoming batch to hang while waiting for the logs to complete syncing. This fix syncs logs even if the writebatch is empty.

Test Plan: DoubleEmptyBatch unit test in transaction_test.

Reviewers: yoshinorim, hermanlee4, sdong, ngbronson, anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54057
2016-02-16 12:21:33 -08:00
Igor Canadi
c90d63a23d can_unlock set but not used
Test Plan: I couldn't repro, but I hope this fixes it. See the error here: https://evergreen.mongodb.com/task_log_raw/mongodb_mongo_master_ubuntu1404_rocksdb_compile_6e9fd902d5cb25aef992363efa128640affd5196_16_02_11_04_33_37/0?type=T

Reviewers: yhchiang, andrewkr, sdong, anthony

Reviewed By: anthony

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54123
2016-02-16 11:24:40 -08:00
Yueh-Hsuan Chiang
3a67bffaa8 Fix an ASAN error in transaction_test.cc
Summary:
One test in transaction_test.cc forgets to call SyncPoint::DisableProcessing().
As a result, a program might to access the SyncPoint singleton after it
already goes out of scope.

This patch fix this error by calling SyncPoint::DisableProcessing().

Test Plan: transaction_test

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54033
2016-02-10 12:06:59 -08:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
agiardullo
fe93bf9b5d Transaction::UndoGetForUpdate
Summary: MyRocks wants to be able to un-lock a key that was just locked by GetForUpdate().  To do this safely, I am now keeping track of the number of reads(for update) and writes for each key in a transaction.  UndoGetForUpdate() will only unlock a key if it hasn't been written and the read count reaches 0.

Test Plan: more unit tests

Reviewers: igor, rven, yhchiang, spetrunia, sdong

Reviewed By: spetrunia, sdong

Subscribers: spetrunia, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47043
2016-02-09 10:46:11 -08:00
reid horuff
6f71d3b68b Improve perf of Pessimistic Transaction expirations (and optimistic transactions)
Summary:
copy from task 8196669:

1) Optimistic transactions do not support batching writes from different threads.
2) Pessimistic transactions do not support batching writes if an expiration time is set.

In these 2 cases, we currently do not do any write batching in DBImpl::WriteImpl() because there is a WriteCallback that could decide at the last minute to abort the write.  But we could support batching write operations with callbacks if we make sure to process the callbacks correctly.

To do this, we would first need to modify write_thread.cc to stop preventing writes with callbacks from being batched together.  Then we would need to change DBImpl::WriteImpl() to call all WriteCallback's in a batch, only write the batches that succeed, and correctly set the state of each batch's WriteThread::Writer.

Test Plan: Added test WriteWithCallbackTest to write_callback_test.cc which creates multiple client threads and verifies that writes are batched and executed properly.

Reviewers: hermanlee4, anthony, ngbronson

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52863
2016-02-05 10:44:13 -08:00
Gabriela Jacques da Silva
0c2bd5cb4b Removing data race from expirable transactions
Summary:
Doing inline checking of transaction expiration instead of
using a callback.

Test Plan: To be added

Reviewers: anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53673
2016-02-02 18:37:44 -08:00
agiardullo
45768ade4f transaction allocation perf improvements
Summary: Removed a couple of memory allocations

Test Plan: changes covered by existing tests

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D53523
2016-01-28 17:32:28 -08:00
agiardullo
eff309867e Do not use timed_mutex in TransactionDB
Summary: Stopped using std::timed_mutex as it has known issues in older versiong of gcc.  Ran into these problems when testing MongoRocks.

Test Plan: unit tests.  Manual mongo testing on gcc 4.8.

Reviewers: igor, yhchiang, rven, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52197
2015-12-18 17:26:02 -08:00
Dmitri Smirnov
b6d19adcf7 Use port size_t formatting 2015-12-15 11:34:22 -08:00
agiardullo
84f98792d6 Transaction::SetWriteOptions()
Summary: Add support to change write options after creating a transaction.  This is needed for MongoRocks.

Test Plan: added test

Reviewers: sdong, rven, kradhakrishnan, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51867
2015-12-11 16:08:25 -08:00
agiardullo
3bfd3d39a3 Use SST files for Transaction conflict detection
Summary:
Currently, transactions can fail even if there is no actual write conflict.  This is due to relying on only the memtables to check for write-conflicts.  Users have to tune memtable settings to try to avoid this, but it's hard to figure out exactly how to tune these settings.

With this diff, TransactionDB will use both memtables and SST files to determine if there are any write conflicts.  This relies on the fact that BlockBasedTable stores sequence numbers for all writes that happen after any open snapshot.  Also, D50295 is needed to prevent SingleDelete from disappearing writes (the TODOs in this test code will be fixed once the other diff is approved and merged).

Note that Optimistic transactions will still rely on tuning memtable settings as we do not want to read from SST while on the write thread.  Also, memtable settings can still be used to reduce how often TransactionDB needs to read SST files.

Test Plan: unit tests, db bench

Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb, yoshinorim

Differential Revision: https://reviews.facebook.net/D50475
2015-12-11 12:34:11 -08:00
agiardullo
e5c5f23814 Support marking snapshots for write-conflict checking - Take 2
Summary:
D51183 was reverted due to breaking the LITE build.

This diff is the same as D51183 but with a fix for the LITE BUILD(D51693)

Test Plan: run all unit tests

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51711
2015-12-08 16:47:31 -08:00
sdong
1d63c3d610 Revert "Support marking snapshots for write-conflict checking"
This reverts commit ec704aafdc for it broke RocksDB LITE build.
2015-12-08 09:27:17 -08:00
agiardullo
ec704aafdc Support marking snapshots for write-conflict checking
Summary:
D50475 enables using SST files for transaction write-conflict checking.  In order for this to work, we need to make sure not to compact out SingleDeletes when there is an earlier transaction snapshot(D50295).  If there is a long-held snapshot, this could reduce the benefit of the SingleDelete optimization.

This diff allows Transactions to mark snapshots as being used for write-conflict checking.  Then, during compaction, we will be able to optimize SingleDeletes better in the future.

This diff adds a flag to SnapshotImpl which is used by Transactions.  This diff also passes the earliest write-conflict snapshot's sequence number to CompactionIterator.  This diff does not actually change Compaction (after this diff is pushed, D50295 will be able to use this information).

Test Plan: no behavior change, ran existing tests

Reviewers: rven, kradhakrishnan, yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D51183
2015-12-07 19:40:51 -08:00
Jay Edgar
b28b7c6dd9 Added callback notification when a snapshot is created
Summary: When SetSnapshot() is used the caller immediately knows a snapshot has been created, but when SetSnapshotOnNextOperation() is used the caller needs a way to get notified when that snapshot has been generated.  This creates an interface that the client can implement that will be called at the time the snapshot is created.

Test Plan: Added a new SetSnapshotOnNextOperationWithNotification test into the transaction_test.

Reviewers: sdong, anthony

Reviewed By: anthony

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51177
2015-12-04 10:20:36 -08:00
Alex Yang
e8180f9901 added public api to schedule flush/compaction, code to prevent race with db::open
Summary:
Fixes T8781168.

Added a new function EnableAutoCompactions in db.h to be publicly
avialable.  This allows compaction to be re-enabled after disabling it via
SetOptions

Refactored code to set the dbptr earlier on in TransactionDB::Open and DB::Open
Temporarily disable auto_compaction in TransactionDB::Open until dbptr is set to
prevent race condition.

Test Plan:
Ran make all check

verified fix on myrocks side:
was able to reproduce the seg fault with
../tools/mysqltest.sh --mem --force rocksdb.drop_table

method was to manually sleep the thread after DB::Open but before TransactionDB ptr was
assigned in transaction_db_impl.cc:
  DB::Open(db_options, dbname, column_families_copy, handles, &db);
  clock_t goal = (60000 * 10) + clock();
  while (goal > clock());
  ...dbptr(aka rdb) gets assigned below

verified my changes fixed the issue.

Also added unit test 'ToggleAutoCompaction' in transaction_test.cc

Reviewers: hermanlee4, anthony

Reviewed By: anthony

Subscribers: alex, dhruba

Differential Revision: https://reviews.facebook.net/D51147
2015-12-03 22:59:44 -08:00
Jay Edgar
8f143e03fb Add ClearSnapshot()
Summary:
MyRocks needs the ability to clear a snapshot for Read Committed support

Test Plan: transaction_test

Reviewers: anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48861
2015-10-16 11:53:30 -07:00
agiardullo
def74f8763 Deferred snapshot creation in transactions
Summary: Support for Transaction::CreateSnapshotOnNextOperation().  This is to fix a write-conflict race-condition that Yoshinori was running into when testing MyRocks with LinkBench.

Test Plan: New tests

Reviewers: yhchiang, spetrunia, rven, igor, yoshinorim, sdong

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48099
2015-10-09 15:46:16 -07:00
agiardullo
c5f3707d42 DisableIndexing() for Transactions
Summary:
MyRocks reported some perfomance issues when inserting many keys into a transaction due to the cost of inserting new keys into WriteBatchWithIndex.  Frequently, they don't even need the keys to be indexed as they don't need to read them back.  DisableIndexing() can be used to avoid the cost of indexing.

I also plan on eventually investigating if we can improve WriteBatchWithIndex performance.  But even if we improved the perf here, it is still beneficial to be able to disable the indexing all together for large transactions.

Test Plan: unit test

Reviewers: igor, rven, yoshinorim, spetrunia, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48471
2015-10-09 15:36:09 -07:00
agiardullo
03b08ba9a9 Return MergeInProgress when fetching from transactions or WBWI with overwrite_key
Summary:
WriteBatchWithIndex::GetFromBatchAndDB only works correctly for overwrite_key=false.  Transactions use overwrite_key=true (since WriteBatchWithIndex::GetIteratorWithBase only works when overwrite_key=true).  So currently, Transactions could return incorrectly merged results when calling Get/GetForUpdate().

Until a permanent fix can be put in place, Transaction::Get[ForUpdate] and WriteBatchWithIndex::GetFromBatch[AndDB] will now return MergeInProgress if the most recent write to a key in the batch is a Merge.

Test Plan: more tests

Reviewers: sdong, yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47817
2015-09-30 11:14:42 -07:00
agiardullo
5a51fa907b Fix accidental object copy in transactions
Summary: Should have used auto& instead of auto.  Also needed to change the code a bit due to const correctness.

Test Plan: unit tests

Reviewers: sdong, igor, yoshinorim, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47787
2015-09-29 17:36:56 -07:00
agiardullo
afe0dc539b SingleDelete support for Transactions
Summary: Transactional SingleDelete is needed for MyRocks.  Note: This diff requires D47529.

Test Plan: Added some new tests in this diff as well as more tests added in D47529

Reviewers: rven, sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47535
2015-09-28 12:14:26 -07:00
Islam AbdelRahman
7cb314b9e6 Skip some tests in ROCKSD_LITE
Summary:
Skip these tests under ROCKSDB_LITE

compaction_job_stats_test
corruption_test
transactions/transaction_test

Test Plan: compile using ROCKSDB_LITE

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46923
2015-09-14 16:44:35 -07:00
agiardullo
a3fc49bfdd Transactions: Release Locks when rolling back to a savepoint
Summary: Transaction::RollbackToSavePoint() will now release any locks that were taken since the previous SavePoint.  To do this cleanly, I moved tracked_keys_ management into TransactionBase.

Test Plan: New Transaction test.

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, spetrunia, leveldb

Differential Revision: https://reviews.facebook.net/D46761
2015-09-11 18:10:50 -07:00
agiardullo
aa6eed0c1e Transaction stats
Summary: Added funtions to fetch the number of locked keys in a transaction, the number of pending puts/merge/deletes, and the elapsed time

Test Plan: unit tests

Reviewers: yoshinorim, jkedgar, rven, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45417
2015-09-09 13:35:53 -07:00
agiardullo
5e94f68f35 TransactionDB Custom Locking API
Summary:
Prototype of API to allow MyRocks to override default Mutex/CondVar used by transactions with their own implementations.  They would simply need to pass their own implementations of Mutex/CondVar to the templated TransactionDB::Open().

Default implementation of TransactionDBMutex/TransactionDBCondVar provided (but the code is not currently changed to use this).

Let me know if this API makes sense or if it should be changed

Test Plan: n/a

Reviewers: yhchiang, rven, igor, sdong, spetrunia

Reviewed By: spetrunia

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43761
2015-09-08 17:03:57 -07:00
agiardullo
0f1aab6c12 Add SetLockTimeout for Transactions
Summary: MyRocks wants to be able to change the lock timeout of a transaction that has already started.  Expose existing SetLockTimeout function to users.

Test Plan: unit test

Reviewers: spetrunia, rven, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45987
2015-09-02 20:07:19 -07:00
agiardullo
77a28615ec Support static Status messages
Summary: Provide a way to specify a detailed static error message for a Status without incurring a memcpy.  Let me know what people think of this approach.

Test Plan: added simple test

Reviewers: igor, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D44259
2015-08-31 16:13:29 -07:00
agiardullo
20d1e547d1 Common base class for transactions
Summary:
As I keep adding new features to transactions, I keep creating more duplicate code.  This diff cleans this up by creating a base implementation class for Transaction and OptimisticTransaction to inherit from.

The code in TransactionBase.h/.cc is all just copied from elsewhere.  The only entertaining part of this class worth looking at is the virtual TryLock method which allows OptimisticTransactions and Transactions to share the same common code for Put/Get/etc.

The rest of this diff is mostly red and easy on the eyes.

Test Plan: No functionality change.  existing tests pass.

Reviewers: sdong, jkedgar, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45135
2015-08-24 19:09:43 -07:00
agiardullo
c3466eab07 Have Transactions use WriteBatch::RollbackToSavePoint
Summary:
Clean up transactions to use the new RollbackToSavePoint api in WriteBatchWithIndex.

Note, this diff depends on Pessimistic Transactions diff and ManagedSnapshot diff (D40869 and D43293).

Test Plan: unit tests

Reviewers: rven, yhchiang, kradhakrishnan, spetrunia, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43371
2015-08-11 17:53:30 -07:00
agiardullo
0db807ec28 Transaction error statuses
Summary:
Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures.

https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954

Test Plan: unit tests

Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43323
2015-08-11 17:52:56 -07:00
agiardullo
c2f2cb0214 Pessimistic Transactions
Summary:
Initial implementation of Pessimistic Transactions.  This diff contains the api changes discussed in D38913.  This diff is pretty large, so let me know if people would prefer to meet up to discuss it.

MyRocks folks:  please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.

Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint().  After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex.  We can then decide which route is preferable.

Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.

Test Plan: Unit tests, db_bench parallel testing.

Reviewers: igor, rven, sdong, yhchiang, yoshinorim

Reviewed By: sdong

Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40869
2015-08-11 17:52:23 -07:00
Islam AbdelRahman
e8e8c90499 fix compile for optimistic_transaction_test under ROCKSDB_LITE
Summary: Adding a main for optimistic_transaction_test that report that it was skipped when using ROCKSDB_LITE

Test Plan: optimistic_transaction_test

Reviewers: yhchiang, sdong, igor, anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D41979
2015-07-13 16:42:40 -07:00
agiardullo
ca8b85ac04 better document max_write_buffer_number_to_maintain
Test Plan: compile

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39285
2015-06-02 21:24:19 -07:00
Igor Canadi
4c181f08bc Fix compile on darwin
Summary: As title

Test Plan: make check

Reviewers: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39243
2015-05-30 12:25:45 -04:00
agiardullo
dc9d70de65 Optimistic Transactions
Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.

Test Plan: Added a new test, but still need to look into stress testing.

Reviewers: yhchiang, igor, rven, sdong

Reviewed By: sdong

Subscribers: adamretter, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33435
2015-05-29 14:36:35 -07:00