Summary:
Allow customized merge operator to be loaded from option file/map/string
by allowing users to pre-regiester merge operators to object registry.
Also update HISTORY.md and header files for the same feature for comparator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5123
Differential Revision: D14658488
Pulled By: siying
fbshipit-source-id: 86ea2fbd2a0a04632d8ea9fceaffefd041f6ae61
Summary:
Automatically format public headers so it looks more consistent.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5115
Differential Revision: D14632854
Pulled By: siying
fbshipit-source-id: ce9929ea62f9dcd65c69660b23eed1931cb0ae84
Summary:
Right now ldb command doesn't allow cases where option values contain equals sign. For example,
```
ldb --db=/tmp/test scan --from='q=3' --max_keys=1
```
after parsing, ldb will have one option 'db', 'max_keys' and one flag 'from'.
This PR updates the parsing logic so that it now supports the above mentioned cases
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5088
Differential Revision: D14600869
Pulled By: miasantreble
fbshipit-source-id: c6ef518c74a98d7b6675ea5954ae08b1bda5554e
Summary:
[RocksDB] Make it easier for users to load options from option file and set shared block cache.
Right now, it requires several dynamic casting for users to set the shared block cache to their option struct cast from the option file.
If people don't do that, every CF of every DB will generate its own 8MB block cache. It's not a usable setting. So we are dragging every user who loads options from the file into such a mess.
Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063
Differential Revision: D14518584
Pulled By: rashmishrm
fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa
Summary:
BackupEngine relies on write-ahead logs to back up the memtable. Disabling write-ahead logs
can result in backups failing to preserve unflushed keys. This PR updates the documentation to specify this behavior, and suggest always flushing the memtable when write-ahead logs are disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5071
Differential Revision: D14524124
Pulled By: miasantreble
fbshipit-source-id: 635f855f8a42ad60273b5efd226139b511e3e5d5
Summary:
The transaction stress tests, stress a high concurrency scenario. In WritePrepared/WriteUnPrepared we need to also stress the scenarios where an inserting/reading transaction is very slow. This would stress the corner cases that the caching is not sufficient and other slower data structures are engaged. To emulate such cases we make use of slow inserter/verifier threads and also reduce the size of cache data structures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4974
Differential Revision: D14143070
Pulled By: maysamyabandeh
fbshipit-source-id: 81eb674678faf9fae0f654cd60ebcc74e26aeee7
Summary:
as title. For people who continue to need Lua compaction filter, you
can copy the include/rocksdb/utilities/rocks_lua/lua_compaction_filter.h and
utilities/lua/rocks_lua_compaction_filter.cc to your own codebase.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4971
Differential Revision: D14047468
Pulled By: riversand963
fbshipit-source-id: 9ad1a6484a7c94e478f1e108127a3184e4069f70
Summary:
Make file ingestion atomic.
as title.
Ingesting external SST files into multiple column families should be atomic. If
a crash occurs and db reopens, either all column families have successfully
ingested the files before the crash, or non of the ingestions have any effect
on the state of the db.
Also add unit tests for atomic ingestion.
Note that the unit test here does not cover the case of incomplete atomic group
in the MANIFEST, which is covered in VersionSetTest already.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4895
Differential Revision: D13718245
Pulled By: riversand963
fbshipit-source-id: 7df97cc483af73ad44dd6993008f99b083852198
Summary:
WritePreparedTransactionDB operates with more options which should not be configurable to avoid complicating it for the users. For testing purposes however we need to change the default value of this parameters. This patch makes these parameters private fields in TransactionDBOptions so that the existing ::Open API could use them seamlessly without however exposing them to the users.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4966
Differential Revision: D14015986
Pulled By: maysamyabandeh
fbshipit-source-id: 13037efa7dfdd6f73ec7a19414b66571e044c633
Summary:
We found that the behavior of CompactionFilter::IgnoreSnapshots() = false isn't
what we have expected. We thought that snapshot will always be preserved.
However, we just realized that, if no snapshot is created while compaction
starts, and a snapshot is created after that, the data seen from the snapshot
can successfully be dropped by the compaction. This creates a strange behavior
to the feature, which is hard to explain. Like what is documented in code
comment, this feature is not very useful with snapshot anyway. The decision
is to deprecate the feature.
We keep the function to avoid to break users code. However, we will fail
compactions if false is returned.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4954
Differential Revision: D13981900
Pulled By: siying
fbshipit-source-id: 2db8c2c3865acd86a28dca625945d1481b1d1e36
Summary:
Remove some components that we never heard people using them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4101
Differential Revision: D8825431
Pulled By: siying
fbshipit-source-id: 97a12ad3cad4ab12c82741a5ba49669aaa854180
Summary:
1) `transaction_base.h` overrides from `transaction.h` with a `const boolean do_validate`.
The non-const base declaration, which I cannot see the need for, causes a compilation error on Microsoft Windows.
2) Implicit cast from `double` to `uint64_t` causes a compilation error on Microsoft Windows.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4798
Differential Revision: D13519734
Pulled By: sagar0
fbshipit-source-id: 6e8cb80e9a589b1122e1500c21b8e3a3a472b459
Summary:
Fix the definition of assumed_tracked in Transaction that was introduced in #4680
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4762
Differential Revision: D13399150
Pulled By: maysamyabandeh
fbshipit-source-id: 2a30fe49e3c44adacd7e45cd48eae95023ca9dca
Summary:
Transaction::GetForUpdate is extended with a do_validate parameter with default value of true. If false it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with default value of false. It true it indicates that call is assumed to be after a ::GetForUpdate(do_validate=false).
The Java APIs are accordingly updated.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680
Differential Revision: D13068508
Pulled By: maysamyabandeh
fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
Summary:
Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors.
This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702
Differential Revision: D13146643
Pulled By: miasantreble
fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
Summary:
Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656
Differential Revision: D13039257
Pulled By: miasantreble
fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
Differential Revision: D12934992
Pulled By: sagar0
fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
Summary:
Couple of very minor improvements (typos in comments, full qualification of class name, reordering members of a struct to make it smaller)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4564
Differential Revision: D10510183
Pulled By: maysamyabandeh
fbshipit-source-id: c7ddf9bfbf2db08cd31896c3fd93789d3fa68c8b
Summary:
Make the CompactOnDeletionCollectorFactory class public, and provide
methods to update the window size and deletion trigger params. These
will take effect on subsequent created SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4403
Differential Revision: D9976857
Pulled By: anand1976
fbshipit-source-id: 31dbf0511c12fa2bb9b2a7ba620079e0ee09cf48
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This could be as an optimization if the application knows that the transaction would not have any conflict with concurrent transactions. It is currently used during recovery assuming (i) application guarantees no conflict between prepared transactions in the WAL (ii) application guarantees that recovered transactions will be rolled back/commit before new transactions start.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346
Differential Revision: D9759149
Pulled By: maysamyabandeh
fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.
Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339
Differential Revision: D9654990
Pulled By: ajkr
fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
Summary:
Transaction has had methods to deal with SavePoints already, but
was missing the PopSavePoint method provided by WriteBatch and
WriteBatchWithIndex.
This PR adds PopSavePoint to Transaction as well. Having the method
on Transaction-level too is useful for applications that repeatedly
execute a sequence of operations that normally succeed, but infrequently
need to get rolled back. Using SavePoints here is sensible, but as
operations normally succeed the application may pile up a lot of
useless SavePoints inside a Transaction, leading to slightly increased
memory usage for managing the unneeded SavePoints.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4256
Differential Revision: D9326932
Pulled By: yiwu-arbug
fbshipit-source-id: 53a0af18a6c7e87feff8a56f1f3eab9df7f371d6
Summary:
Right now, `ldb idump` may have memory out of control if there is a big range of tombstones. Add an option to cut maxinum number of keys in GetAllKeyVersions(), and push down --max_num_ikeys from ldb.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4271
Differential Revision: D9369149
Pulled By: siying
fbshipit-source-id: 7cbb797b7d2fa16573495a7e84937456d3ff25bf
Summary:
We add two subcommands `write_extern_sst` and `ingest_extern_sst` to ldb. This PR avoids changing existing code because we hope to cherry-pick to earlier releases to support compatibility check for external SST file ingestion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4205
Differential Revision: D9112711
Pulled By: riversand963
fbshipit-source-id: 7cae88380d4de86da8440230e87eca66755648e4
Summary:
This adds support for writing unprepared batches based on size defined in `TransactionOptions::max_write_batch_size`. This is done by overriding methods that modify data (Put/Delete/SingleDelete/Merge) and checking first if write batch size has exceeded threshold. If so, the write batch is written to DB as an unprepared batch.
Support for Commit/Rollback for unprepared batch is added as well. This has been done by simply extending the WritePrepared Commit/Rollback logic to take care of all unprep_seq numbers either when updating prepare heap, or adding to commit map. For updating the commit map, this logic exists inside `WriteUnpreparedCommitEntryPreReleaseCallback`.
A test change was also made to have transactions unregister themselves when committing without prepare. This is because with write unprepared, there may be unprepared entries (which act similarly to prepared entries) already when a commit is done without prepare.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4104
Differential Revision: D8785717
Pulled By: lth
fbshipit-source-id: c02006e281ec1ce00f628e2a7beec0ee73096a91
Summary:
This adds support for recovering WriteUnprepared transactions through the following changes:
- The information in `RecoveredTransaction` is extended so that it can reference multiple batches.
- `MarkBeginPrepare` is extended with a bool indicating whether it is an unprepared begin, and this is passed down to `InsertRecoveredTransaction` to indicate whether the current transaction is prepared or not.
- `WriteUnpreparedTxnDB::Initialize` is overridden so that it will rollback unprepared transactions from the recovered transactions. This can be done without updating the prepare heap/commit map, because this is before the DB has finished initializing, and after writing the rollback batch, those data structures should not contain information about the rolled back transaction anyway.
Commit/Rollback of live transactions is still unimplemented and will come later.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4078
Differential Revision: D8703382
Pulled By: lth
fbshipit-source-id: 7e0aada6c23bd39299f1f20d6c060492e0e6b60a
Summary:
This PR supports the group commit of multiple version edit entries corresponding to different column families. Column family drop/creation still cannot be grouped. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
Closes https://github.com/facebook/rocksdb/pull/3944
Differential Revision: D8432536
Pulled By: riversand963
fbshipit-source-id: 8f11bd05193b6c0d9272d82e44b676abfac113cb
Summary:
- Summary
Add timestamp into the DeadlockInfo to store the timestamp when deadlock detected on the rocksdb side.
- Testplan:
`make check -j64`
Closes https://github.com/facebook/rocksdb/pull/4060
Differential Revision: D8655380
Pulled By: chouxi
fbshipit-source-id: f58e1aa5e09eb1d1eed0a181d4e2304aaf01efe8
Summary:
This is implemented by extending ReadCallback with another function `MaxUnpreparedSequenceNumber` which returns the largest visible sequence number for the current transaction, if there is uncommitted data written to DB. Otherwise, it returns zero, indicating no uncommitted data.
There are the places where reads had to be modified.
- Get and Seek/Next was just updated to seek to max(snapshot_seq, MaxUnpreparedSequenceNumber()) instead, and iterate until a key was visible.
- Prev did not need need updates since it did not use the Seek to sequence number optimization. Assuming that locks were held when writing unprepared keys, and ValidateSnapshot runs, there should only be committed keys and unprepared keys of the current transaction, all of which are visible. Prev will simply iterate to get the last visible key.
- Reseeking to skip keys optimization was also disabled for write unprepared, since it's possible to hit the max_skip condition even while reseeking. There needs to be some way to resolve infinite looping in this case.
Closes https://github.com/facebook/rocksdb/pull/3955
Differential Revision: D8286688
Pulled By: lth
fbshipit-source-id: 25e42f47fdeb5f7accea0f4fd350ef35198caafe
Summary:
The patch clarifies the ownership of the root db after TransactionDB::Open. If it is a success the ownership if with the TransactionDB, and the root db will be deleted when the destructor of the base class, StackableDB, is called. If it is failure, the temporarily created root db will also be deleted properly.
The patch also includes lots of useful formatting changes.
Closes https://github.com/facebook/rocksdb/pull/3714 upon which this patch is built.
Closes https://github.com/facebook/rocksdb/pull/3806
Differential Revision: D7878010
Pulled By: maysamyabandeh
fbshipit-source-id: f54f3942e29434143ae5a2423ceec9c7072cd4c2
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662
Differential Revision: D7426121
Pulled By: Dayvedde
fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
Summary:
This is a hack as temporary fix of MyRocks with rollbacking the merge operands. The way MyRocks uses merge operands is without protection of locks, which violates the assumption behind the rollback algorithm. They are ok with not being rolled back as it would just create a gap in the autoincrement column. The hack add an option to disable the rollback of merge operands by default and only enables it to let the unit test pass.
Closes https://github.com/facebook/rocksdb/pull/3711
Differential Revision: D7597177
Pulled By: maysamyabandeh
fbshipit-source-id: 544be0f666c7e7abb7f651ec8b23124e05056728
Summary:
Adding some stats that would be helpful to monitor if the DB has gone to unlikely stats that would hurt the performance. These are mostly when we end up needing to acquire a mutex.
Closes https://github.com/facebook/rocksdb/pull/3683
Differential Revision: D7529393
Pulled By: maysamyabandeh
fbshipit-source-id: f7d36279a8f39bd84d8ddbf64b5c97f670c5d6d9
Summary:
Currently dump_wal cannot print the prepared records from the WAL that is generated by WRITE_PREPARED write policy since the default reaction of the handler is to return NotSupported if markers of WRITE_PREPARED are encountered. This patch enables the admin to pass --write_committed=false option, which will be accordingly passed to the handler. Note that DBFileDumperCommand and DBDumperCommand are still not updated by this patch but firstly they are not urgent and secondly we need to revise this approach later when we also add WRITE_UNPREPARED markers so I leave it for future work.
Tested by running it on a WAL generated by WRITE_PREPARED:
$ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log | grep BEGIN_PREARE | head -1
1,2,70,0,BEGIN_PREARE
$ ./ldb dump_wal --walfile=/dev/shm/dbbench/000003.log --write_committed=false | grep BEGIN_PREARE | head -1
1,2,70,0,BEGIN_PREARE PUT(0) : 0x30303031313330313938 PUT(0) : 0x30303032353732313935 END_PREPARE(0x74786E31313535383434323738303738363938313335312D30)
Closes https://github.com/facebook/rocksdb/pull/3682
Differential Revision: D7522090
Pulled By: maysamyabandeh
fbshipit-source-id: a0332207261c61e18b2f9dfbe9feecd9a1339aca
Summary:
When `max_valid_backups_to_open` is set, the `BackupEngine` doesn't know about the files referenced by existing backups. This PR prevents us from deleting valid files when that option is set, in cases where we are unable to accurately determine refcount. There are warnings logged when we may miss deleting unreferenced files, and a recommendation in the header for users to periodically unset this option and run a full `GarbageCollect`.
Closes https://github.com/facebook/rocksdb/pull/3518
Differential Revision: D7008331
Pulled By: ajkr
fbshipit-source-id: 87907f964dc9716e229d08636a895d2fc7b72305
Summary:
This change models Optimistic Tx db after Pessimistic TX db. The motivation for this change is to make the ptr polymorphic so it can be held by the same raw or smart ptr.
Currently, due to the inheritance of the Opt Tx db not being rooted in the manner of Pess Tx from a single DB root it is more difficult to write clean code and have clear ownership of the database in cases when options dictate instantiate of plan DB, Pess Tx DB or Opt tx db.
Closes https://github.com/facebook/rocksdb/pull/3566
Differential Revision: D7184502
Pulled By: yiwu-arbug
fbshipit-source-id: 31d06efafd79497bb0c230e971857dba3bd962c3
Summary:
This is a small API extension to allow the CompactFiles method to return the names of files that were created during the compaction.
Closes https://github.com/facebook/rocksdb/pull/3608
Differential Revision: D7275789
Pulled By: siying
fbshipit-source-id: 1ec0c3954a0f10cd877efb5f29f9be6c7b59e9ba
Summary:
Fix the following bugs:
- During recovery a duplicate key was inserted twice into the write batch of the recovery transaction,
once when the memtable returns false (because it was duplicates) and once for the 2nd attempt. This would result into different SubBatch count measured when the recovered transactions is committing.
- If a cf is flushed during recovery the memtable is not available to assist in detecting the duplicate key. This could result into not advancing the sequence number when iterating over duplicate keys of a flushed cf and hence inserting the next key with the wrong sequence number.
- SubBacthCounter would reset the comparator to default comparator after the first duplicate key. The 2nd duplicate key hence would have gone through a wrong comparator and not being detected.
Closes https://github.com/facebook/rocksdb/pull/3562
Differential Revision: D7149440
Pulled By: maysamyabandeh
fbshipit-source-id: 91ec317b165f363f5d11ff8b8c47c81cebb8ed77
Summary:
Make use of the index in WriteBatchWithIndex to also count the number of sub-batches. This eliminates the need to separately scan the batch to count the number of sub-batches once a duplicate key is detected.
Closes https://github.com/facebook/rocksdb/pull/3529
Differential Revision: D7049947
Pulled By: maysamyabandeh
fbshipit-source-id: 81cbf12c4e662541c772c7265a8f91631e25c7cd