Summary:
MyRocks needs the ability to clear a snapshot for Read Committed support
Test Plan: transaction_test
Reviewers: anthony
Reviewed By: anthony
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D48861
Summary: Support for Transaction::CreateSnapshotOnNextOperation(). This is to fix a write-conflict race-condition that Yoshinori was running into when testing MyRocks with LinkBench.
Test Plan: New tests
Reviewers: yhchiang, spetrunia, rven, igor, yoshinorim, sdong
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D48099
Summary:
MyRocks reported some perfomance issues when inserting many keys into a transaction due to the cost of inserting new keys into WriteBatchWithIndex. Frequently, they don't even need the keys to be indexed as they don't need to read them back. DisableIndexing() can be used to avoid the cost of indexing.
I also plan on eventually investigating if we can improve WriteBatchWithIndex performance. But even if we improved the perf here, it is still beneficial to be able to disable the indexing all together for large transactions.
Test Plan: unit test
Reviewers: igor, rven, yoshinorim, spetrunia, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D48471
Summary:
To support a new MongoDB capability, we need to make sure that we don't do any IO for a short period of time. For background, see:
* https://jira.mongodb.org/browse/SERVER-20704
* https://jira.mongodb.org/browse/SERVER-18899
To implement that, I add a new API calls PauseBackgroundWork() and ContinueBackgroundWork() which reuse the capability we already have in place for RefitLevel() function.
Test Plan: Added a new test in db_test. Made sure that test fails when PauseBackgroundWork() is commented out.
Reviewers: IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47901
Summary:
WriteBatchWithIndex::GetFromBatchAndDB only works correctly for overwrite_key=false. Transactions use overwrite_key=true (since WriteBatchWithIndex::GetIteratorWithBase only works when overwrite_key=true). So currently, Transactions could return incorrectly merged results when calling Get/GetForUpdate().
Until a permanent fix can be put in place, Transaction::Get[ForUpdate] and WriteBatchWithIndex::GetFromBatch[AndDB] will now return MergeInProgress if the most recent write to a key in the batch is a Merge.
Test Plan: more tests
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47817
Summary: Transactional SingleDelete is needed for MyRocks. Note: This diff requires D47529.
Test Plan: Added some new tests in this diff as well as more tests added in D47529
Reviewers: rven, sdong, igor, yhchiang
Reviewed By: yhchiang
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47535
Summary: Fixed some bugs in using SingleDelete on a WriteBatchWithIndex and added some tests.
Test Plan: new tests
Reviewers: sdong, yhchiang, rven, kradhakrishnan, IslamAbdelRahman, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47529
Summary:
This is an initial version of bulk load feature
This diff allow us to create sst files, and then bulk load them later, right now the restrictions for loading an sst file are
(1) Memtables are empty
(2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0
(3) Added sst files values are not overlapping
Test Plan: unit testing
Reviewers: igor, ott, sdong
Reviewed By: sdong
Subscribers: leveldb, ott, dhruba
Differential Revision: https://reviews.facebook.net/D39081
Summary:
This patch fixes#7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).
In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.
Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.
Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
this patch supported this so it could be resurrected if needed)
Test Plan: make all check
Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor
Reviewed By: igor
Subscribers: maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43179
Summary:
In case of huge db backup infromation about progress of downloading would help.
New callback parameter in CreateNewBackup() function will trigger whenever a some amount of data downloaded.
Task: 8057631
Test Plan:
ProgressCallbackDuringBackup test that cover new functionality added to BackupableDBTest tests.
other test succeed as well.
Reviewers: Guenena, benj, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46575
Summary: Added funtions to fetch the number of locked keys in a transaction, the number of pending puts/merge/deletes, and the elapsed time
Test Plan: unit tests
Reviewers: yoshinorim, jkedgar, rven, sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45417
Summary:
Prototype of API to allow MyRocks to override default Mutex/CondVar used by transactions with their own implementations. They would simply need to pass their own implementations of Mutex/CondVar to the templated TransactionDB::Open().
Default implementation of TransactionDBMutex/TransactionDBCondVar provided (but the code is not currently changed to use this).
Let me know if this API makes sense or if it should be changed
Test Plan: n/a
Reviewers: yhchiang, rven, igor, sdong, spetrunia
Reviewed By: spetrunia
Subscribers: maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43761
Summary: This diff adds a verifyBackup method to BackupEngine. The method verifies the name and size of each file in the backup.
Test Plan: Unit test cases created and passing.
Reviewers: igor, benj
Subscribers: zelaine.fong, yhchiang, sdong, lgalanis, dhruba, AaronFeldman
Differential Revision: https://reviews.facebook.net/D46029
Summary: MyRocks wants to be able to change the lock timeout of a transaction that has already started. Expose existing SetLockTimeout function to users.
Test Plan: unit test
Reviewers: spetrunia, rven, sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45987
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).
Test Plan: make clean check all
Reviewers: rven, anthony, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45993
Summary:
Clean up transactions to use the new RollbackToSavePoint api in WriteBatchWithIndex.
Note, this diff depends on Pessimistic Transactions diff and ManagedSnapshot diff (D40869 and D43293).
Test Plan: unit tests
Reviewers: rven, yhchiang, kradhakrishnan, spetrunia, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43371
Summary:
Initial implementation of Pessimistic Transactions. This diff contains the api changes discussed in D38913. This diff is pretty large, so let me know if people would prefer to meet up to discuss it.
MyRocks folks: please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.
Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint(). After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex. We can then decide which route is preferable.
Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.
Test Plan: Unit tests, db_bench parallel testing.
Reviewers: igor, rven, sdong, yhchiang, yoshinorim
Reviewed By: sdong
Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D40869
Summary: The list of info log files of a db can be obtained using the new function.
Test Plan: New test in db_test.cc passed.
Reviewers: yhchiang, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: IslamAbdelRahman, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41715
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
Summary:
This diff adds CompactOnDeletionCollector in utilities/table_properties_collectors,
which applies a sliding window to a sst file and mark this file as need-compaction
when it observe enough deletion entries within the consecutive keys covered by
the sliding window.
Test Plan: compact_on_deletion_collector_test
Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yoshinorim, sdong
Reviewed By: sdong
Subscribers: maykov, dhruba
Differential Revision: https://reviews.facebook.net/D41175
Summary:
Support RollbackToSavePoint() in WriteBatch and WriteBatchWithIndex. Support for partial transaction rollback is needed for MyRocks.
An alternate implementation of Transaction::RollbackToSavePoint() exists in D40869. However, the other implementation is messier because it is implemented outside of WriteBatch. This implementation is much cleaner and also exposes a potentially useful feature to WriteBatch.
Test Plan: Added unit tests
Reviewers: IslamAbdelRahman, kradhakrishnan, maykov, yoshinorim, hermanlee4, spetrunia, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42723
Summary:
My latest fix to pragma_error.h caused compilation errors for another internal project. I am now unable to figure out how to get pragma_error working on all platforms and build environments (nor am I able to test any other options).
Seems like the best option is to get rid of this macro. include/utilities has been deprecated for a year now, so lets just deal with a breaking change in 3.13 to remove these files. And I guess we'll have to live with having an extra convenience.h.
Thoughts?
Test Plan: build
Reviewers: igor, yhchiang, kradhakrishnan, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42597
Summary:
Skipping these tests in ROCKSDB_LITE since they are not supported
json_document_test
wal_manager_test
ttl_test
sst_dump_test
deletefile_test
compact_files_test
prefix_test
checkpoint_test
Test Plan:
json_document_test
wal_manager_test
ttl_test
sst_dump_test
deletefile_test
compact_files_test
prefix_test
checkpoint_test
Reviewers: igor, sdong, yhchiang, kradhakrishnan, anthony
Reviewed By: anthony
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D42573
Summary:
ROCKSDB_WARNING is only defined if either ROCKSDB_PLATFORM_POSIX or OS_WIN is defined. This works well for building rocksdb with its own build scripts. But this won't work when an outside project(like mongodb) doesn't define ROCKSDB_PLATFORM_POSIX.
This fix defines ROCKSDB_WARNING for all platforms. No idea if its defined correctly on non-posix,non-windows platforms but this is no worse that the current situation where this macro is missing on unexpected platforms.
This fix should hopefully fix anyone whose build broke now that we've switched from using #warning to Pragma (to support windows). Unfortunately, while mongo-rocks compiles, it ignores the Pragma and doesn't print a warning. I have not been able to figure out a way to implement this portably on all platforms.
Of course, an alternate solution would be to just get rid of ROCKSDB_WARNING and live with include file redirects indefinitely. Thoughts?
Test Plan: build rocks, build mongorocks
Reviewers: igor, kradhakrishnan, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42477
- Remove make file defines from public headers and use _WIN32 because it is compiler defined
- use __GNUC__ and __clang__ to guard non-portable attributes
- add #include "port/port.h" to some new .cc files.
- minor changes in CMakeLists to reflect recent changes
Summary: quick fix for now. will figure out a better fix soon
Test Plan: build
Reviewers: sdong, igor, kradhakrishnan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42381
Summary: Unbreak build for mongo. later, we'll have to figure out if there is a platform independent way to implement ROCKSDB_WARNING
Test Plan: build
Reviewers: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42369
Summary: Moved convenience.h out of utilities to remove a dependency on utilities in db.
Test Plan: unit tests. Also compiled a link to the old location to verify the _Pragma works.
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42201
Summary:
Couple of changes here:
* NewBackupEngine() and NewReadOnlyBackupEngine() are now removed. They were deprecated since RocksDB 3.8. Changing these to new functions should be pretty straight-forward. As a followup, I'll fix all fbcode callsights
* Instead of initializing backup engine in the constructor, we initialize it in a separate function now. That way, we can catch all errors and return appropriate status code.
* We catch all errors during initializations and return them to the client properly.
* Added new tests to backupable_db_test, to make sure that we can't open BackupEngine when there are Env errors.
* Transitioned backupable_db_test to use BackupEngine rather than BackupableDB. From the two available APIs, judging by the current use-cases, it looks like BackupEngine API won. It's much more flexible since it doesn't require StackableDB.
Test Plan: Added a new unit test to backupable_db_test
Reviewers: yhchiang, sdong, AaronFeldman
Reviewed By: AaronFeldman
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41925
Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more.
Test Plan: Build it.
Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41961
Summary:
Add a new field: BackupableDBOptions.max_background_copies.
CreateNewBackup() and RestoreDBFromBackup() will use this number of threads to perform copies.
If there is a backup rate limit, then max_background_copies must be 1.
Update backupable_db_test.cc to test multi-threaded backup and restore.
Update backupable_db_test.cc to test backups when the backup environment is not the same as the database environment.
Test Plan:
Run ./backupable_db_test
Run valgrind ./backupable_db_test
Run with TSAN and ASAN
Reviewers: yhchiang, rven, anthony, sdong, igor
Reviewed By: igor
Subscribers: yhchiang, anthony, sdong, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D40725
invalid suffix on literal
no return statement in function returning non-void CuckooStep::operator=
extra qualification ‘rocksdb::spatial::Variant::
dereferencing type-punned pointer will break strict-aliasing rules
Summary: Make RocksDb build and run on Windows to be functionally
complete and performant. All existing test cases run with no
regressions. Performance numbers are in the pull-request.
Test plan: make all of the existing unit tests pass, obtain perf numbers.
Co-authored-by: Praveen Rao praveensinghrao@outlook.com
Co-authored-by: Sherlock Huang baihan.huang@gmail.com
Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
Summary:
This diff update DB::CompactRange to use RangeCompactionOptions instead of using multiple parameters
Old CompactRange is still available but deprecated
Test Plan:
make all check
make rocksdbjava
USE_CLANG=1 make all
OPT=-DROCKSDB_LITE make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D40209
Summary:
Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables.
To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>.
Test Plan: Add a test case
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D40119
Summary:
[This is the resubmit of D39813. Tests were failing, so I reverted the diff. I found the bug and I'm now resubmitting]
If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating).
Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D39897
This reverts commit 51440f83ec82b4b3fa54c1963f58d07a8b2c2810.
Revert "Re-generate WriteEntry on WBWIIterator::Entry()"
This reverts commit 4949ef08db89bdc63028679d1cc11312094e860e.
Summary: If we don't do this, any calls to Entry() after WBWI mutation will result in undefined behavior. We need to re-fetch the offset from the skip list and regenerate the new pointer (because string's base pointer can change while mutating).
Test Plan: COMPILE_WITH_ASAN=1 make write_batch_with_index_test && ./write_batch_with_index_test
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39813
Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true.
Test Plan: Add a unit test for it.
Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D39099
Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported.
Test Plan: Added a new test, but still need to look into stress testing.
Reviewers: yhchiang, igor, rven, sdong
Reviewed By: sdong
Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33435
Summary:
Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch. (This is needed for transactions).
I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep).
Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc. WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)).
Test Plan: Added new unit tests.
Reviewers: rven, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38037