Commit Graph

849 Commits

Author SHA1 Message Date
Siying Dong
d59549298f Skip deleted WALs during recovery
Summary:
This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.

Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction)

This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2.
Closes https://github.com/facebook/rocksdb/pull/3765

Differential Revision: D7747618

Pulled By: siying

fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
2018-05-03 15:43:09 -07:00
Maysam Yabandeh
cfb86659bf WritePrepared Txn: enable rollback in stress test
Summary:
Rollback was disabled in stress test since there was a concurrency issue in WritePrepared rollback algorithm. The issue is fixed by caching the column family handles in WritePrepared to skip getting them from the db when needed for rollback.

Tested by running transaction stress test under tsan.
Closes https://github.com/facebook/rocksdb/pull/3785

Differential Revision: D7793727

Pulled By: maysamyabandeh

fbshipit-source-id: d81ab6fda0e53186ca69944cfe0712ce4869451e
2018-05-02 18:13:05 -07:00
Maysam Yabandeh
5bed8a0065 WritePrepared Txn: split SeqAdvanceConcurrentTest
Summary:
The tsan flavor of SeqAdvanceConcurrentTest times out in our test infra. The patch splits it into 10 tests.
On my vm before:
[       OK ] WritePreparedTransactionTest/WritePreparedTransactionTest.SeqAdvanceConcurrentTest/0 (5194 ms)
after:
[       OK ] OneWriteQueue/SeqAdvanceConcurrentTest.SeqAdvanceConcurrentTest/0 (1906 ms)
Closes https://github.com/facebook/rocksdb/pull/3799

Differential Revision: D7854515

Pulled By: maysamyabandeh

fbshipit-source-id: 4fbac42a1f974326cbc237f8cb9d6232d379c431
2018-05-02 18:13:05 -07:00
Andrew Kryczka
a8a28da215 Avoid directory renames in BackupEngine
Summary:
We used to name private directories like "1.tmp" while BackupEngine populated them, and then rename without the ".tmp" suffix (i.e., rename "1.tmp" to "1") after all files were copied. On glusterfs, directory renames like this require operations across many hosts, and partial failures have caused operational problems.

Fortunately we don't need to rename private directories. We already have a meta-file that uses the tempfile-rename pattern to commit a backup atomically after all its files have been successfully copied. So we can copy private files directly to their final location, so now there's no directory rename.
Closes https://github.com/facebook/rocksdb/pull/3749

Differential Revision: D7705610

Pulled By: ajkr

fbshipit-source-id: fd724a28dd2bf993ce323a5f2cb7e7d6980cc346
2018-04-20 17:28:33 -07:00
Maysam Yabandeh
bb2a2ec731 WritePrepared Txn: rollback via commit
Summary:
Currently WritePrepared rolls back a transaction with prepare sequence number prepare_seq by i) write a single rollback batch with rollback_seq, ii) add <rollback_seq, rollback_seq> to commit cache, iii) remove prepare_seq from PrepareHeap.
This is correct assuming that there is no snapshot taken when a transaction is rolled back. This is the case the way MySQL does rollback which is after recovery. Otherwise if max_evicted_seq advances the prepare_seq, the live snapshot might assume data as committed since it does not find them in CommitCache.
The change is to simply add <prepare_seq. rollback_seq> to commit cache before removing prepare_seq from PrepareHeap. In this way if max_evicted_seq advances prpeare_seq, the existing mechanism that we have to check evicted entries against live snapshots will make sure that the live snapshot will not see the data of rolled back transaction.
Closes https://github.com/facebook/rocksdb/pull/3745

Differential Revision: D7696193

Pulled By: maysamyabandeh

fbshipit-source-id: c9a2d46341ddc03554dded1303520a1cab74ef9c
2018-04-20 15:28:19 -07:00
Maysam Yabandeh
c3d1e36cce WritePrepared Txn: enable TryAgain for duplicates at the end of the batch
Summary:
The WriteBatch::Iterate will try with a larger sequence number if the memtable reports a duplicate. This status is specified with TryAgain status. So far the assumption was that the last entry in the batch will never return TryAgain, which is correct when WAL is created via WritePrepared since it always appends a batch separator if a natural one does not exist. However when reading a WAL generated by WriteCommitted this batch separator might  not exist. Although WritePrepared is not supposed to be able to read the WAL generated by WriteCommitted we should avoid confusing scenarios in which the behavior becomes unpredictable. The path fixes that by allowing TryAgain even for the last entry of the write batch.
Closes https://github.com/facebook/rocksdb/pull/3747

Differential Revision: D7708391

Pulled By: maysamyabandeh

fbshipit-source-id: bfaddaa9b14a4cdaff6977f6f63c789a6ab1ee0d
2018-04-20 15:28:19 -07:00
Zhongyi Xie
954b496b3f fix memory leak in two_level_iterator
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
2018-04-15 17:26:26 -07:00
David Lai
3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00
Maysam Yabandeh
d15397ba10 WritePrepared Txn: rollback_merge_operands hack
Summary:
This is a hack as temporary fix of MyRocks with rollbacking  the merge operands. The way MyRocks uses merge operands is without protection of locks, which violates the assumption behind the rollback algorithm. They are ok with not being rolled back as it would just create a gap in the autoincrement column. The hack add an option to disable the rollback of merge operands by default and only enables it to let the unit test pass.
Closes https://github.com/facebook/rocksdb/pull/3711

Differential Revision: D7597177

Pulled By: maysamyabandeh

fbshipit-source-id: 544be0f666c7e7abb7f651ec8b23124e05056728
2018-04-12 11:58:11 -07:00
Maysam Yabandeh
6f5e6445d9 WritePrepared Txn: fix smallest_prep atomicity issue
Summary:
We introduced smallest_prep optimization in this commit b225de7e10, which enables storing the smallest uncommitted sequence number along with the snapshot. This enables the readers that read from the snapshot to skip further checks and safely assumed the data is committed if its sequence number is less than smallest uncommitted when the snapshot was taken. The problem was that smallest uncommitted and the snapshot must be taken atomically, and the lack of atomicity had led to readers using a smallest uncommitted after the snapshot was taken and hence mistakenly skipping some data.
This patch fixes the problem by i) separating the process of removing of prepare entries from the AddCommitted function, ii) removing the prepare entires AFTER the committed sequence number is published, iii) getting smallest uncommitted (from the prepare list) BEFORE taking a snapshot. This guarantees that the smallest uncommitted that is accompanied with a snapshot is less than or equal of such number if it was obtained atomically.

Tested by running MySQLStyleTransactionTest/MySQLStyleTransactionTest.TransactionStressTest that was failing sporadically.
Closes https://github.com/facebook/rocksdb/pull/3703

Differential Revision: D7581934

Pulled By: maysamyabandeh

fbshipit-source-id: dc9d6f4fb477eba75d4d5927326905b548a96a32
2018-04-11 20:11:51 -07:00
Dmitri Smirnov
5ec382b918 Fix up backupable_db stack corruption.
Summary:
Fix up OACR(Lint) warnings.
Closes https://github.com/facebook/rocksdb/pull/3674

Differential Revision: D7563869

Pulled By: ajkr

fbshipit-source-id: 8c1e5045c8a6a2d85b2933fdbc60fde93bf0c9de
2018-04-09 19:27:24 -07:00
Maysam Yabandeh
bde1c1a72a WritePrepared Txn: add stats
Summary:
Adding some stats that would be helpful to monitor if the DB has gone to unlikely stats that would hurt the performance. These are mostly when we end up needing to acquire a mutex.
Closes https://github.com/facebook/rocksdb/pull/3683

Differential Revision: D7529393

Pulled By: maysamyabandeh

fbshipit-source-id: f7d36279a8f39bd84d8ddbf64b5c97f670c5d6d9
2018-04-07 21:56:42 -07:00
Andrew Kryczka
faba3fb53d protect valid backup files when max_valid_backups_to_open is set
Summary:
When `max_valid_backups_to_open` is set, the `BackupEngine` doesn't know about the files referenced by existing backups. This PR prevents us from deleting valid files when that option is set, in cases where we are unable to accurately determine refcount. There are warnings logged when we may miss deleting unreferenced files, and a recommendation in the header for users to periodically unset this option and run a full `GarbageCollect`.
Closes https://github.com/facebook/rocksdb/pull/3518

Differential Revision: D7008331

Pulled By: ajkr

fbshipit-source-id: 87907f964dc9716e229d08636a895d2fc7b72305
2018-04-05 21:13:21 -07:00
Yi Wu
36a9f22931 Blob DB: blob_dump to show uncompressed values
Summary:
Make blob_dump tool able to show uncompressed values if the blob file is compressed. Also show total compressed vs. raw size at the end if --show_summary is provided.
Closes https://github.com/facebook/rocksdb/pull/3633

Differential Revision: D7348926

Pulled By: yiwu-arbug

fbshipit-source-id: ca709cb4ed5cf6a550ff2987df8033df81516f8e
2018-04-05 11:12:16 -07:00
Dmitri Smirnov
2a62ca1750 Make Optimistic Tx database stackable
Summary:
This change models Optimistic Tx db after Pessimistic TX db. The motivation for this change is to make the ptr polymorphic so it can be held by the same raw or smart ptr.

Currently, due to the inheritance of the Opt Tx db not being rooted in the manner of Pess Tx from a single DB root it is more difficult to write clean code and have clear ownership of the database in cases when options dictate instantiate of plan DB, Pess Tx DB or Opt tx db.
Closes https://github.com/facebook/rocksdb/pull/3566

Differential Revision: D7184502

Pulled By: yiwu-arbug

fbshipit-source-id: 31d06efafd79497bb0c230e971857dba3bd962c3
2018-04-03 15:28:40 -07:00
Maysam Yabandeh
b225de7e10 WritePrepared Txn: smallest_prepare optimization
Summary:
The is an optimization to reduce lookup in the CommitCache when querying IsInSnapshot. The optimization takes the smallest uncommitted data at the time that the snapshot was taken and if the sequence number of the read data is lower than that number it assumes the data as committed.
To implement this optimization two changes are required: i) The AddPrepared function must be called sequentially to avoid out of order insertion in the PrepareHeap (otherwise the top of the heap does not indicate the smallest prepare in future too), ii) non-2PC transactions also call AddPrepared if they do not commit in one step.
Closes https://github.com/facebook/rocksdb/pull/3649

Differential Revision: D7388630

Pulled By: maysamyabandeh

fbshipit-source-id: b79506238c17467d590763582960d4d90181c600
2018-04-02 20:27:41 -07:00
Maysam Yabandeh
0377ff9dea WritePrepared Txn: make recoverable state visible after flush
Summary:
Currently if the CommitTimeWriteBatch is set to be used only as a state that is required only for recovery , the user cannot see that in DB until it is restarted. This while the state is already inserted into the DB after the memtable flush. It would be useful for debugging if make this state visible to the user after the flush by committing it. The patch does it by a invoking a callback that does the commit on the recoverable state.
Closes https://github.com/facebook/rocksdb/pull/3661

Differential Revision: D7424577

Pulled By: maysamyabandeh

fbshipit-source-id: 137f9408662f0853938b33fa440f27f04c1bbf5c
2018-03-28 12:12:08 -07:00
Maysam Yabandeh
0999e9b79a WritePrepared Txn: Increase commit cache size to 2^23
Summary:
Current commit cache size is 2^21. This was due to a type. With 2^23 commit entries we can have transactions as long as 64s without incurring the cost of having them evicted from the commit cache before their commit. Here is the math:
2^23 / 2 (one out of two seq numbers are for commit) / 2^16 TPS = 2^6 = 64s
Closes https://github.com/facebook/rocksdb/pull/3657

Differential Revision: D7411211

Pulled By: maysamyabandeh

fbshipit-source-id: e7cacf40579f3acf940643d8a1cfe5dd201caa35
2018-03-26 19:45:17 -07:00
Maysam Yabandeh
35a4469bbf Fix race condition via concurrent FlushWAL
Summary:
Currently log_writer->AddRecord in WriteImpl is protected from concurrent calls via FlushWAL only if two_write_queues_ option is set. The patch fixes the problem by i) skip log_writer->AddRecord in FlushWAL if manual_wal_flush is not set, ii) protects log_writer->AddRecord in WriteImpl via log_write_mutex_ if manual_wal_flush_ is set but two_write_queues_ is not.

Fixes #3599
Closes https://github.com/facebook/rocksdb/pull/3656

Differential Revision: D7405608

Pulled By: maysamyabandeh

fbshipit-source-id: d6cc265051c77ae49c7c6df4f427350baaf46934
2018-03-26 16:29:56 -07:00
Maysam Yabandeh
3e417a6607 WritePrepared Txn: AddPrepared for all sub-batches
Summary:
Currently AddPrepared is performed only on the first sub-batch if there are duplicate keys in the write batch. This could cause a problem if the transaction takes too long to commit and the seq number of the first sub-patch moved to old_prepared_ but not the seq of the later ones. The patch fixes this by calling AddPrepared for all sub-patches.
Closes https://github.com/facebook/rocksdb/pull/3651

Differential Revision: D7388635

Pulled By: maysamyabandeh

fbshipit-source-id: 0ccd80c150d9bc42fe955e49ddb9d7ca353067b4
2018-03-23 17:30:04 -07:00
Andrew Kryczka
82137f0ce8 Add unit test for WAL corruption
Summary: Closes https://github.com/facebook/rocksdb/pull/3618

Differential Revision: D7301053

Pulled By: ajkr

fbshipit-source-id: a9dde90caa548c294d03d6386f78428c8536ca14
2018-03-22 18:28:01 -07:00
Maysam Yabandeh
7429b20e39 WritePrepared Txn: fix race condition on publishing seq
Summary:
This commit fixes a race condition on calling SetLastPublishedSequence. The function must be called only from the 2nd write queue when two_write_queues is enabled. However there was a bug that would also call it from the main write queue if CommitTimeWriteBatch is provided to the commit request and yet use_only_the_last_commit_time_batch_for_recovery optimization is not enabled. To fix that we penalize the commit request in such cases by doing an additional write solely to publish the seq number from the 2nd queue.
Closes https://github.com/facebook/rocksdb/pull/3641

Differential Revision: D7361508

Pulled By: maysamyabandeh

fbshipit-source-id: bf8f7a27e5cccf5425dccbce25eb0032e8e5a4d7
2018-03-22 14:43:36 -07:00
Rohan Rathi
fa8c050e9f Fixed buffer overrun in BackupEngineImpl::BackupMeta::StoreToFile
Summary:
The 10MB buffer in BackupEngineImpl::BackupMeta::StoreToFile can be corrupted with a large number of files. Added a check to determine current buffer length and append data to file if buffer becomes full.

Resolves https://github.com/facebook/rocksdb/issues/3228
Closes https://github.com/facebook/rocksdb/pull/3636

Differential Revision: D7354160

Pulled By: ajkr

fbshipit-source-id: eec12d38095a0d17551a4aaee52b99d30a555722
2018-03-22 14:08:10 -07:00
Yi Wu
f1b7b790c9 BlobDB: Fix BlobDBImpl::GCFileAndUpdateLSM issues
Summary:
* Fix BlobDBImpl::GCFileAndUpdateLSM doesn't close the new file, and the new file will not be able to be garbage collected later.
* Fix BlobDBImpl::GCFileAndUpdateLSM doesn't copy over metadata from old file to new file.
Closes https://github.com/facebook/rocksdb/pull/3639

Differential Revision: D7355092

Pulled By: yiwu-arbug

fbshipit-source-id: 4fa3594ac5ce376bed1af04a545c532cfc0088c4
2018-03-21 16:30:09 -07:00
Andrew Kryczka
0cdaa1a804 Fix WAL corruption from checkpoint/backup race condition
Summary:
`Writer::WriteBuffer` was always called at the beginning of checkpoint/backup. But that log writer has no internal synchronization, which meant the same buffer could be flushed twice in a race condition case, causing a WAL entry to be duplicated. Then subsequent WAL entries would be at unexpected offsets, causing the 32KB block boundaries to be overlapped and manifesting as a corruption.

This PR fixes the behavior to only use `WriteBuffer` (via `FlushWAL`) in checkpoint/backup when manual WAL flush is enabled. In that case, users are responsible for providing synchronization between WAL flushes. We can also consider removing the call entirely.
Closes https://github.com/facebook/rocksdb/pull/3603

Differential Revision: D7277447

Pulled By: ajkr

fbshipit-source-id: 1b15bd7fd930511222b075418c10de0aaa70a35a
2018-03-14 16:12:50 -07:00
Yi Wu
449627f0ea Blob DB: remove unreacheable code
Summary:
Fixing #3604.
Closes https://github.com/facebook/rocksdb/pull/3606

Reviewed By: siying

Differential Revision: D7276604

Pulled By: yiwu-arbug

fbshipit-source-id: 915c5897b010d28956f369989e49e64785d1161f
2018-03-14 14:27:28 -07:00
Javeme Lee
f6156fb558 Support StringAppendOperator(delimiter_char) constructor in java-api
Summary:
Fixes #3336
Closes https://github.com/facebook/rocksdb/pull/3337

Differential Revision: D7196585

Pulled By: sagar0

fbshipit-source-id: a854f3fc906862ecba685b31946e4ef7c0b421c5
2018-03-08 16:17:47 -08:00
Bruce Mitchener
a3a3f5497c Fix some typos in comments and docs.
Summary: Closes https://github.com/facebook/rocksdb/pull/3568

Differential Revision: D7170953

Pulled By: siying

fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c
2018-03-08 10:27:25 -08:00
Bruce Mitchener
0de710f5b8 Use nullptr instead of NULL / 0 more consistently.
Summary: Closes https://github.com/facebook/rocksdb/pull/3569

Differential Revision: D7170968

Pulled By: yiwu-arbug

fbshipit-source-id: 308a6b7dd358a04fd9a7de3d927bfd8abd57d348
2018-03-07 12:42:12 -08:00
amytai
0a3db28d98 Disallow compactions if there isn't enough free space
Summary:
This diff handles cases where compaction causes an ENOSPC error.
This does not handle corner cases where another background job is started while compaction is running, and the other background job triggers ENOSPC, although we do allow the user to provision for these background jobs with SstFileManager::SetCompactionBufferSize.
It also does not handle the case where compaction has finished and some other background job independently triggers ENOSPC.

Usage: Functionality is inside SstFileManager. In particular, users should set SstFileManager::SetMaxAllowedSpaceUsage, which is the reference highwatermark for determining whether to cancel compactions.
Closes https://github.com/facebook/rocksdb/pull/3449

Differential Revision: D7016941

Pulled By: amytai

fbshipit-source-id: 8965ab8dd8b00972e771637a41b4e6c645450445
2018-03-06 16:27:54 -08:00
Dmitri Smirnov
c364eb42b5 Windows cumulative patch
Summary:
This patch addressed several issues.
  Portability including db_test std::thread -> port::Thread Cc: @
  and %z to ROCKSDB portable macro. Cc: maysamyabandeh

  Implement Env::AreFilesSame

  Make the implementation of file unique number more robust

  Get rid of C-runtime and go directly to Windows API when dealing
  with file primitives.

  Implement GetSectorSize() and aling unbuffered read on the value if
  available.

  Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976

  Fix test running script issue where $status var was of incorrect scope
  so the failures were swallowed and not reported.

  DestroyDB() creates a logger and opens a LOG file in the directory
  being cleaned up. This holds a lock on the folder and the cleanup is
  prevented. This fails one of the checkpoin tests. We observe the same in production.
  We close the log file in this change.

 Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test
 attempts to open a directory with NewRandomAccessFile which does not
 work on Windows.
  Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug
Closes https://github.com/facebook/rocksdb/pull/3552

Differential Revision: D7156304

Pulled By: siying

fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
2018-03-06 11:57:43 -08:00
Yi Wu
b864bc9b5b Blob DB: Improve FIFO eviction
Summary:
Improving blob db FIFO eviction with the following changes,
* Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size.
* FIFO now only take into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion.
* FIFO eviction now also evict TTL blob files that's still open. It doesn't evict non-TTL blob files.
* If FIFO is triggered, it will pass an expiration and the current sequence number to compaction filter. Compaction filter will then filter inlined keys to evict those with an earlier expiration and smaller sequence number. So call LSM FIFO.
* Compaction filter also filter those blob indexes where corresponding blob file is gone.
* Add an event listener to listen compaction/flush event and update sst file size.
* Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db.
* More blob db statistics around FIFO.
* Fix some locking issue when accessing a blob file.
Closes https://github.com/facebook/rocksdb/pull/3556

Differential Revision: D7139328

Pulled By: yiwu-arbug

fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa
2018-03-06 11:57:42 -08:00
Pooya Shareghi
0a2354ca8f Added bytes XOR merge operator
Summary:
Closes https://github.com/facebook/rocksdb/pull/575

I fixed the merge conflicts etc.
Closes https://github.com/facebook/rocksdb/pull/3065

Differential Revision: D7128233

Pulled By: sagar0

fbshipit-source-id: 2c23a48c9f0432c290b0cd16a12fb691bb37820c
2018-03-06 10:27:36 -08:00
Maysam Yabandeh
62277e15c3 WritePrepared Txn: Move DuplicateDetector to util
Summary:
Move DuplicateDetector and SetComparator to its own header file in util. It would also address a complaint in the unity test.
Closes https://github.com/facebook/rocksdb/pull/3567

Differential Revision: D7163268

Pulled By: maysamyabandeh

fbshipit-source-id: 6ddf82773473646dbbc1284ae601a78c4907c778
2018-03-05 23:57:12 -08:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Maysam Yabandeh
680864ae54 WritePrepared Txn: Fix bug with duplicate keys during recovery
Summary:
Fix the following bugs:
- During recovery a duplicate key was inserted twice into the write batch of the recovery transaction,
once when the memtable returns false (because it was duplicates) and once for the 2nd attempt. This would result into different SubBatch count measured when the recovered transactions is committing.
- If a cf is flushed during recovery the memtable is not available to assist in detecting the duplicate key. This could result into not advancing the sequence number when iterating over duplicate keys of a flushed cf and hence inserting the next key with the wrong sequence number.
- SubBacthCounter would reset the comparator to default comparator after the first duplicate key. The 2nd duplicate key hence would have gone through a wrong comparator and not being detected.
Closes https://github.com/facebook/rocksdb/pull/3562

Differential Revision: D7149440

Pulled By: maysamyabandeh

fbshipit-source-id: 91ec317b165f363f5d11ff8b8c47c81cebb8ed77
2018-03-05 10:57:59 -08:00
Yi Wu
1209b6db5c Blob DB: remove existing garbage collection implementation
Summary:
Red diff to remove existing implementation of garbage collection. The current approach is reference counting kind of approach and require a lot of effort to get the size counter right on compaction and deletion. I'm going to go with a simple mark-sweep kind of approach and will send another PR for that.

CompactionEventListener was added solely for blob db and it adds complexity and overhead to compaction iterator. Removing it as well.
Closes https://github.com/facebook/rocksdb/pull/3551

Differential Revision: D7130190

Pulled By: yiwu-arbug

fbshipit-source-id: c3a375ad2639a3f6ed179df6eda602372cc5b8df
2018-03-02 12:57:23 -08:00
Maysam Yabandeh
d060421c77 Fix a leak in prepared_section_completed_
Summary:
The zeroed entries were not removed from prepared_section_completed_ map. This patch adds a unit test to show the problem and fixes that by refactoring the code. The new code is more efficient since i) it uses two separate mutex to avoid contention between commit and prepare threads, ii) it uses a sorted vector for maintaining uniq log entires with prepare which avoids a very large heap with many duplicate entries.
Closes https://github.com/facebook/rocksdb/pull/3545

Differential Revision: D7106071

Pulled By: maysamyabandeh

fbshipit-source-id: b3ae17cb6cd37ef10b6b35e0086c15c758768a48
2018-03-01 20:41:56 -08:00
Maysam Yabandeh
90eca1e616 WritePrepared Txn: optimize SubBatchCnt
Summary:
Make use of the index in WriteBatchWithIndex to also count the number of sub-batches. This eliminates the need to separately scan the batch to count the number of sub-batches once a duplicate key is detected.
Closes https://github.com/facebook/rocksdb/pull/3529

Differential Revision: D7049947

Pulled By: maysamyabandeh

fbshipit-source-id: 81cbf12c4e662541c772c7265a8f91631e25c7cd
2018-02-22 18:12:26 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Andrew Kryczka
b092977643 BackupEngine gluster-friendly file naming convention
Summary:
Use the rsync tempfile naming convention in our `BackupEngine`. The temp file follows the format, `.<filename>.<suffix>`, which is later renamed to `<filename>`. We fix `tmp` as the `<suffix>` as we don't need to use random bytes for now. The benefit is gluster treats this tempfile naming convention specially and applies hashing only to `<filename>`, so the file won't need to be linked or moved when it's renamed. Our gluster team suggested this will make things operationally easier.
Closes https://github.com/facebook/rocksdb/pull/3463

Differential Revision: D6893333

Pulled By: ajkr

fbshipit-source-id: fd7622978f4b2487fce33cde40dd3124f16bcaa8
2018-02-21 17:42:07 -08:00
Maysam Yabandeh
828211e901 WritePrepared Txn: fix non-emptied PreparedHeap bug
Summary:
Under a certain sequence of accessing PreparedHeap, there was a bug that would not successfully empty the heap. This would result in performance issues when the heap content is moved to old_prepared_ after max_evicted_seq_ advances the orphan prepared sequence numbers. The patch fixed the bug and add more unit tests. It also does more logging when the unlikely scenarios are faced
Closes https://github.com/facebook/rocksdb/pull/3526

Differential Revision: D7038486

Pulled By: maysamyabandeh

fbshipit-source-id: f1e40bea558f67b03d2a29131fcb8734c65fce97
2018-02-21 13:42:23 -08:00
jsteemann
e9c31ab159 save redundant key lookup in map of locked keys
Summary:
In case it is found that a key is already marked as locked in a
stripe's map of locked keys, it is not necessary to look it up
again using `std::unordered_map<std::string, ...>::at(size_t)`.

Instead, we can use the already found position using the iterator
produced by the previous `find` operation. Reusing the iterator
will avoid having to hash the key again and do additional "random"
memory lookups in the map of keys (though the data will very
likely sit available in caches here already due to the previous
find operation)
Closes https://github.com/facebook/rocksdb/pull/3505

Differential Revision: D7036446

Pulled By: sagar0

fbshipit-source-id: cced51547b2bd2d49394f6bc8c5896f09fa80f68
2018-02-20 17:44:44 -08:00
Andrew Kryczka
1960e73e21 fix handling of empty string as checkpoint directory
Summary:
- made `CreateCheckpoint` properly return `InvalidArgument` when called with an empty directory. Previously it triggered an assertion failure due to a bug in the logic.
- made `ldb` set empty `checkpoint_dir` if that's what the user specifies, so that we can use it to properly test `CreateCheckpoint` in the future.

Differential Revision: D6874562

fbshipit-source-id: dcc1bd41768261d9338987fa7711444289707ed7
2018-02-20 16:44:00 -08:00
Igor Sugak
5263da6396 fix shift UBSAN error in col_buf_encoder.cc
Summary:
Add a static cast to perform the left shift as with an unsigned type.

make ubsan_check
Closes https://github.com/facebook/rocksdb/pull/3517

Reviewed By: sagar0

Differential Revision: D7016044

Pulled By: igorsugak

fbshipit-source-id: baf72f6197edd8f7220d010b15a23d6de6a72c49
2018-02-20 16:44:00 -08:00
Po-Chuan Hsieh
ab446dc22d Fix build with USE_RTTI=0
Summary:
utilities/column_aware_encoding_util.cc:61:23: error: cannot use dynamic_cast with -fno-rtti
  table_reader_.reset(dynamic_cast<BlockBasedTable*>(table_reader.release()));
                      ^
1 error generated.

It was added as a [local patch](https://svnweb.freebsd.org/ports/head/databases/rocksdb/files/patch-utilities-column_aware_encoding_util.cc) on FreeBSD since RocksDB 5.8.
It also fixes #2707.
Closes https://github.com/facebook/rocksdb/pull/3514

Differential Revision: D7005571

Pulled By: siying

fbshipit-source-id: 351a9055d21d0accdd7a932e8e7bfcd3c8e22068
2018-02-16 10:41:49 -08:00
Maysam Yabandeh
c178da053b WritePrepared Txn: optimizations for sysbench update_noindex
Summary:
These are optimization that we applied to improve sysbech's update_noindex performance.
1. Make use of LIKELY compiler hint
2. Move std::atomic so the subclass
3. Make use of skip_prepared in non-2pc transactions.
Closes https://github.com/facebook/rocksdb/pull/3512

Differential Revision: D7000075

Pulled By: maysamyabandeh

fbshipit-source-id: 1ab8292584df1f6305a4992973fb1b7933632181
2018-02-16 08:42:31 -08:00
jsteemann
4e7a182d09 Several small "fixes"
Summary:
- removed a few unneeded variables
- fused some variable declarations and their assignments
- fixed right-trimming code in string_util.cc to not underflow
- simplifed an assertion
- move non-nullptr check assertion before dereferencing of that pointer
- pass an std::string function parameter by const reference instead of by value (avoiding potential copy)
Closes https://github.com/facebook/rocksdb/pull/3507

Differential Revision: D7004679

Pulled By: sagar0

fbshipit-source-id: 52944952d9b56dfcac3bea3cd7878e315bb563c4
2018-02-15 16:57:37 -08:00
jsteemann
6a30b98fdc fix wrong indentation
Summary:
Somehow the indentation was incorrect in this file.
The only change in this PR is to get it right again in order to make the code more readable.
Please reject if you think it's not worth it.
Closes https://github.com/facebook/rocksdb/pull/3504

Differential Revision: D6996011

Pulled By: miasantreble

fbshipit-source-id: 060514a3a8c910d34bad795b36eb4d278512b154
2018-02-15 11:13:37 -08:00
Maysam Yabandeh
8a04ee4fd1 WritePrepared Txn: use TransactionDBWriteOptimizations (2nd attempt)
Summary:
TransactionDB::Write can receive some optimization hints from the user. One is to skip the concurrency control mechanism. WritePreparedTxnDB is currently ignoring such hints. This patch optimizes WritePreparedTxnDB::Write for skip_concurrency_control and skip_duplicate_key_check hints.
Closes https://github.com/facebook/rocksdb/pull/3496

Differential Revision: D6971784

Pulled By: maysamyabandeh

fbshipit-source-id: cbab10ad538fa2b8bcb47e37c77724afe6e30f03
2018-02-12 16:43:40 -08:00
Siying Dong
a0931b3185 Fix UBSAN Error in WritePreparedTransactionTest
Summary:
WritePreparedTransactionTest has the UBSAN error because the wrong order of its parent class construction. Fix it.
Closes https://github.com/facebook/rocksdb/pull/3478

Differential Revision: D6928975

Pulled By: siying

fbshipit-source-id: 13edfd5cb9cf73f1ac5ae3b6f53061d32783733d
2018-02-07 14:57:35 -08:00
Maysam Yabandeh
8feee28020 Add skip_cc option to TransactionDB::Write
Summary:
Compared to DB::Write, TransactionDB::Write has the additional overhead of creating and initializing an internal transaction object, as well as the overhead of locking/unlocking the keys. This patch extends the TransactionDB::Write with an skip_cc option to allow the users to indicate that the write batch do not conflict with others and the concurrency control and its overhead can be skipped. TransactionDB::Write by default calls DB::Write when skip_cc is set, which works for WriteCommitted WritePolicy. Any other flavor of TransactionDB that is not compatible with this default behavior (such as WritePreparedTxnDB) can extend ::Write and implement their own approach for taking into account the skip_cc optimization.
Closes https://github.com/facebook/rocksdb/pull/3457

Differential Revision: D6877318

Pulled By: maysamyabandeh

fbshipit-source-id: 56f4e21db87ff71492db4e376fb7c2b03dfeab6b
2018-02-06 15:28:24 -08:00
Maysam Yabandeh
8f8eb4f1c0 Fix leak report by asan on DuplicateKeys test
Summary:
Deletes the transaction object at the end of the test.
Verified by:
- COMPILE_WITH_ASAN=1 make -j32 transaction_test
- ./transaction_test --gtest_filter="DBA**Duplicate*"
Closes https://github.com/facebook/rocksdb/pull/3470

Differential Revision: D6916473

Pulled By: maysamyabandeh

fbshipit-source-id: 8303df25408635d5d3ac2b25f309a3d15957c937
2018-02-06 14:26:35 -08:00
Maysam Yabandeh
88d8b2a2f5 WritePrepared Txn: Duplicate Keys, Txn Part
Summary:
This patch takes advantage of memtable being able to detect duplicate <key,seq> and returning TryAgain to handle duplicate keys in WritePrepared Txns. Through WriteBatchWithIndex's index it detects existence of at least a duplicate key in the write batch. If duplicate key was reported, it then pays the cost of counting the number of sub-patches by iterating over the write batch and pass it to DBImpl::Write. DB will make use of the provided batch_count to assign proper sequence numbers before sending them to the WAL. When later inserting the batch to the memtable, it increases the seq each time memtbale reports a duplicate (a sub-patch in our counting) and tries again.
Closes https://github.com/facebook/rocksdb/pull/3455

Differential Revision: D6873699

Pulled By: maysamyabandeh

fbshipit-source-id: db8487526c3a5dc1ddda0ea49f0f979b26ae648d
2018-02-05 18:43:24 -08:00
Tamir Duberstein
cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Yi Wu
e62a763752 Blob DB: miscellaneous changes
Summary:
* Expose garbage collection related options
* Minor logging and counter name update
* Remove unused constants.
Closes https://github.com/facebook/rocksdb/pull/3451

Differential Revision: D6867077

Pulled By: yiwu-arbug

fbshipit-source-id: 6c3272a9c9d78b125a0bd6b2e56d00d087cdd6c8
2018-01-31 18:13:23 -08:00
Maysam Yabandeh
ec225d2e97 Make WithParamInterface virtual in transaction_test
Summary:
Without this patch, ubsan_check is currently failing with this error:
```
utilities/transactions/write_prepared_transaction_test.cc:369:63: runtime error: member call on address 0x0000051649f8 which does not point to an object of type 'WithParamInterface'
0x0000051649f8: note: object has invalid vptr
```
Tested by `COMPILE_WITH_UBSAN=1 make -j32 transaction_test` and running `./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SnapshotConcurrentAccessTest.SnapshotConcurrentAccessTest1/0`
Closes https://github.com/facebook/rocksdb/pull/3444

Differential Revision: D6850087

Pulled By: maysamyabandeh

fbshipit-source-id: 5b254da8504b8757f7aec8a820ad464154da1a1d
2018-01-30 16:26:56 -08:00
Andrew Kryczka
f3fe6f883b fix for checkpoint directory with trailing slash(es)
Summary:
previously if `checkpoint_dir` contained a trailing slash, we'd attempt to create the `.tmp` directory under `checkpoint_dir` due to simply concatenating `checkpoint_dir + ".tmp"`. This failed because `checkpoint_dir` hadn't been created yet and our directory creation is non-recursive. This PR fixes the issue by always creating the `.tmp` directory in the same parent as `checkpoint_dir` by stripping trailing slashes before concatenating.
Closes https://github.com/facebook/rocksdb/pull/3275

Differential Revision: D6574952

Pulled By: ajkr

fbshipit-source-id: a6daa6777a901eac2460cd0140c9515f7241aefc
2018-01-29 21:11:42 -08:00
Maysam Yabandeh
3073b1c573 Split SnapshotConcurrentAccessTest into 20 sub tests
Summary:
SnapshotConcurrentAccessTest sometimes times out when running on the test infra. This patch splits the test into smaller sub-tests to avoid the timeout. It also benefits from lower run-time of each sub-test and increases the coverage of the test. The overall run-time of each final sub-test is at most half of the original test so we should no longer see a timeout.
Closes https://github.com/facebook/rocksdb/pull/3435

Differential Revision: D6839427

Pulled By: maysamyabandeh

fbshipit-source-id: d53fdb157109e2438ca7fe447d0cf4b71f304bd8
2018-01-29 17:12:55 -08:00
Mark Isaacson
b8eb32f8cf Suppress lint in old files
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.

Reviewed By: yiwu-arbug

Differential Revision: D6821806

fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
2018-01-29 12:56:42 -08:00
Yi Wu
7e3d3326ce Blob DB: dump blob_db_options.min_blob_size
Summary:
min_blob_size was missing from BlobDBOptions::Dump.
Closes https://github.com/facebook/rocksdb/pull/3400

Differential Revision: D6781525

Pulled By: yiwu-arbug

fbshipit-source-id: 40d9b391578d7f8c91bd89f4ce2eda5064864c25
2018-01-22 22:41:27 -08:00
Yi Wu
f2f034ef3b Blob DB: fix crash when DB full but no candidate file to evict
Summary:
When blob_files is empty, std::min_element will return blobfiles.end(), which cannot be dereference. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3387

Differential Revision: D6764927

Pulled By: yiwu-arbug

fbshipit-source-id: 86f78700132be95760d35ac63480dfd3a8bbe17a
2018-01-19 16:26:50 -08:00
jonasf
4decff6fa8 Add possibility to change ttl on open DB
Summary:
We have seen cases where it could be good to change TTL on already open DB.
Change ttl in TtlCompactionFilterFactory on open db.
Next time a filter is created, it will filter accroding to the set TTL.

Is this something that could be useful for others?
Any downsides?
Closes https://github.com/facebook/rocksdb/pull/3292

Differential Revision: D6731993

Pulled By: miasantreble

fbshipit-source-id: 73b94d69237b11e8730734389052429d621a6b1e
2018-01-18 10:42:15 -08:00
Andrew Kryczka
bafec6bb30 Fix checkpoint_test directory setup/cleanup
Summary:
- Change directory name from "db_test" to "checkpoint_test". Previously it used the same directory as `db_test`
- Systematically cleanup snapshot and snapshot staging directories before each test. Previously a failed test run caused subsequent runs to fail, particularly when the first failure caused "snapshot.tmp" to not be cleaned up.
Closes https://github.com/facebook/rocksdb/pull/3351

Differential Revision: D6691015

Pulled By: ajkr

fbshipit-source-id: 4fc2ac2e21ff2617ea0e96297c5132b5f2eefd79
2018-01-10 12:26:49 -08:00
Maysam Yabandeh
00b33c2474 WritePrepared Txn: address some pending TODOs
Summary:
This patch addresses a couple of minor TODOs for WritePrepared Txn such as double checking some assert statements at runtime as well, skip extra AddPrepared in non-2pc transactions, and safety check for infinite loops.
Closes https://github.com/facebook/rocksdb/pull/3302

Differential Revision: D6617002

Pulled By: maysamyabandeh

fbshipit-source-id: ef6673c139cb49f64c0879508d2f573b78609aca
2018-01-09 08:57:20 -08:00
Yi Wu
30a017feca Blob DB: avoid having a separate read of checksum
Summary:
Previously on a blob db read, we are making a read of the blob value, and then make another read to get CRC checksum. I'm combining the two read into one.

readrandom db_bench with 1G database with base db size of 13M, value size 1k:
`./db_bench --db=/home/yiwu/tmp/db_bench --use_blob_db --value_size=1024 --num=1000000 --benchmarks=readrandom --use_existing_db --cache_size=32000000`
master: throughput 234MB/s, get micros p50 5.984 p95 9.998 p99 20.817 p100 787
this PR: throughput 261MB/s, get micros p50 5.157 p95 9.928 p99 20.724 p100 190
Closes https://github.com/facebook/rocksdb/pull/3301

Differential Revision: D6615950

Pulled By: yiwu-arbug

fbshipit-source-id: 052410c6d8539ec0cc305d53793bbc8f3616baa3
2018-01-05 16:41:58 -08:00
Maysam Yabandeh
1c9ada59cc Remove assert(s.ok()) from ::DeleteFile
Summary:
DestroyDB that is used in tests loops over the files returned by ::GetChildren and delete them one by one. Such files might be already deleted in the file system (during DeleteObsoleteFileImpl for example) but will get actually deleted with a delay sometimes before ::DeleteFile is called on the file name. We have some test failures where FaultInjectionTestEnv::DeleteFile fails on assert(s.ok()) during DestroyDB. This patch removes the assert statement to fix that.
Closes https://github.com/facebook/rocksdb/pull/3324

Differential Revision: D6659545

Pulled By: maysamyabandeh

fbshipit-source-id: 4c9552fbcd494dcf3e61d475c11fc965c4388b2c
2018-01-04 11:11:45 -08:00
Yi Wu
e763e1b623 BlobDB: dump blob db options on open
Summary:
We dump blob db options on blob db open, but it was removed by mistake in #3246. Adding it back.
Closes https://github.com/facebook/rocksdb/pull/3298

Differential Revision: D6607177

Pulled By: yiwu-arbug

fbshipit-source-id: 2a4aacbfa52fd8f1878dc9e1fbb95fe48faf80c0
2017-12-19 16:57:12 -08:00
Yi Wu
48cf8da2bb BlobDB: update blob_db_options.bytes_per_sync behavior
Summary:
Previously, if blob_db_options.bytes_per_sync, there is a background job to call fsync() for every bytes_per_sync bytes written to a blob file. With the change we simply pass bytes_per_sync as env_options_ to blob files so that sync_file_range() will be used instead.
Closes https://github.com/facebook/rocksdb/pull/3297

Differential Revision: D6606994

Pulled By: yiwu-arbug

fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47
2017-12-19 16:41:41 -08:00
Yi Wu
06149429d9 WritePrepared Txn: Return NotSupported on iterator refresh
Summary:
A proper implementation of Iterator::Refresh() for WritePreparedTxnDB would require release and acquire another snapshot. Since MyRocks don't make use of Iterator::Refresh(), we just simply mark it as not supported.
Closes https://github.com/facebook/rocksdb/pull/3290

Differential Revision: D6599931

Pulled By: yiwu-arbug

fbshipit-source-id: 4e1632d967316431424f6e458254ecf9a97567cf
2017-12-18 22:29:30 -08:00
Guo Xiao
aa6509d8e4 Fix build for linux
Summary:
* Include `unistd.h` for `sleep(3)`
* Include `sys/time.h` for `gettimeofday(3)`
* Include `utils/random.h` for `Random64`

Error messages:

utilities/persistent_cache/hash_table_bench.cc: In constructor ‘rocksdb::HashTableBenchmark::HashTableBenchmark(rocksdb::HashTableImpl<long unsigned int, std::__cxx11::basic_string<char> >*, size_t, size_t, size_t, size_t)’:
utilities/persistent_cache/hash_table_bench.cc:76:28: error: ‘sleep’ was not declared in this scope
       /* sleep override */ sleep(1);
                            ^~~~~
utilities/persistent_cache/hash_table_bench.cc:76:28: note: suggested alternative: ‘strsep’
       /* sleep override */ sleep(1);
                            ^~~~~
                            strsep
utilities/persistent_cache/hash_table_bench.cc: In member function ‘void rocksdb::HashTableBenchmark::RunRead()’:
utilities/persistent_cache/hash_table_bench.cc:107:5: error: ‘Random64’ was not declared in this scope
     Random64 rgen(time(nullptr));
     ^~~~~~~~
utilities/persistent_cache/hash_table_bench.cc:107:5: note: suggested alternative: ‘random_r’
     Random64 rgen(time(nullptr));
     ^~~~~~~~
     random_r
utilities/persistent_cache/hash_table_bench.cc:110:18: error: ‘rgen’ was not declared in this scope
       size_t k = rgen.Next() % max_prepop_key;
                  ^~~~
utilities/persistent_cache/hash_table_bench.cc: In static member function ‘static uint64_t rocksdb::HashTableBenchmark::NowInMillSec()’:
utilities/persistent_cache/hash_table_bench.cc:153:5: error: ‘gettimeofday’ was not declared in this scope
     gettimeofday(&tv, /*tz=*/nullptr);
     ^~~~~~~~~~~~
make[2]: *** [CMakeFiles/hash_table_bench.dir/build.make:63: CMakeFiles/hash_table_bench.dir/utilities/persistent_cache/hash_table_bench.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3346: CMakeFiles/hash_table_bench.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Closes https://github.com/facebook/rocksdb/pull/3283

Differential Revision: D6594850

Pulled By: ajkr

fbshipit-source-id: fd83957338c210cdfd253763347aafd39476824f
2017-12-18 12:28:03 -08:00
Maysam Yabandeh
a6d3c762df WritePrepared Txn: non-2pc write in one round
Summary:
Currently non-2pc writes do the 2nd dummy write to actually commit the transaction. This was necessary to ensure that publishing the commit sequence number will be done only from one queue (the queue that does not write to memtable). This is however not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues are disabled by updating the commit map in the 1st write.
Closes https://github.com/facebook/rocksdb/pull/3277

Differential Revision: D6575392

Pulled By: maysamyabandeh

fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
2017-12-18 08:19:43 -08:00
Yi Wu
237b292515 BlobDB: Remove the need to get sequence number per write
Summary:
Previously we store sequence number range of each blob files, and use the sequence number range to check if the file can be possibly visible by a snapshot. But it adds complexity to the code, since the sequence number is only available after a write. (The current implementation get sequence number by calling GetLatestSequenceNumber(), which is wrong.) With the patch, we are not storing sequence number range, and check if snapshot_sequence < obsolete_sequence to decide if the file is visible by a snapshot (previously we check if first_sequence <= snapshot_sequence < obsolete_sequence).
Closes https://github.com/facebook/rocksdb/pull/3274

Differential Revision: D6571497

Pulled By: yiwu-arbug

fbshipit-source-id: ca06479dc1fcd8782f6525b62b7762cd47d61909
2017-12-15 13:27:30 -08:00
Andrew Kryczka
a79c7c05e8 fix backup meta-file buffer overrun
Summary:
- check most times after calling snprintf that the buffer didn't fill up. Previously we'd proceed and use `buf_size - len` as the length in subsequent calls, which underflowed as those are unsigned size_t.
- replace some memcpys with snprintf for consistency
Closes https://github.com/facebook/rocksdb/pull/3255

Differential Revision: D6541464

Pulled By: ajkr

fbshipit-source-id: 8610ea6a24f38e0a37c6d17bc65b7c712da6d932
2017-12-15 12:29:16 -08:00
Orvid King
b4d88d7128 Fix the build with MSVC 2017
Summary:
There were a few places where MSVC's implicit truncation warnings were getting triggered, which was causing the MSVC build to fail due to warnings being treated as errors. This resolves the issues by making the truncations in some places explicit, and by making it so there are no truncations of literals.

Fixes #3239
Supersedes #3259
Closes https://github.com/facebook/rocksdb/pull/3273

Reviewed By: yiwu-arbug

Differential Revision: D6569204

Pulled By: Orvid

fbshipit-source-id: c188cf1cf98d9acb6d94b71875041cc81f8ff088
2017-12-14 12:02:22 -08:00
Maysam Yabandeh
35dfbd58dd WritePrepared Txn: GC old_commit_map_
Summary:
Garbage collect entries from old_commit_map_ when the corresponding snapshots are released.
Closes https://github.com/facebook/rocksdb/pull/3247

Differential Revision: D6528478

Pulled By: maysamyabandeh

fbshipit-source-id: 15d1566d85d4ac07036bc0dc47418f6c3228d4bf
2017-12-13 07:57:43 -08:00
Yi Wu
250a51a3f9 BlobDB: refactor DB open logic
Summary:
Refactor BlobDB open logic. List of changes:

Major:
* On reopen, mark blob files found as immutable, do not use them for writing new keys.
* Not to scan the whole file to find file footer. Instead just seek to the end of the file and try to read footer.

Minor:
* Move most of the real logic from blob_db.cc to blob_db_impl.cc.
* Not to hold shared_ptr of event listeners in global maps in blob_db.cc
* Some changes to BlobFile interface.
* Improve logging and error handling.
Closes https://github.com/facebook/rocksdb/pull/3246

Differential Revision: D6526147

Pulled By: yiwu-arbug

fbshipit-source-id: 9dc4cdd63359a2f9b696af817086949da8d06952
2017-12-11 12:12:38 -08:00
Maysam Yabandeh
36911f55dd WritePrepared Txn: stress test
Summary:
Augment the existing MySQLStyleTransactionTest to check for more core case scenarios. The changes showed effective in revealing the bugs reported in https://github.com/facebook/rocksdb/pull/3205 and https://github.com/facebook/rocksdb/pull/3101
Closes https://github.com/facebook/rocksdb/pull/3222

Differential Revision: D6476862

Pulled By: maysamyabandeh

fbshipit-source-id: 5068497702d67ffc206a58ed96f8578fbb510137
2017-12-06 09:42:28 -08:00
Andrew Kryczka
63f1c0a57d fix gflags namespace
Summary:
I started adding gflags support for cmake on linux and got frustrated that I'd need to duplicate the build_detect_platform logic, which determines namespace based on attempting compilation. We can do it differently -- use the GFLAGS_NAMESPACE macro if available, and if not, that indicates it's an old gflags version without configurable namespace so we can simply hardcode "google".
Closes https://github.com/facebook/rocksdb/pull/3212

Differential Revision: D6456973

Pulled By: ajkr

fbshipit-source-id: 3e6d5bde3ca00d4496a120a7caf4687399f5d656
2017-12-01 10:42:05 -08:00
Maysam Yabandeh
18dcf7f98d WritePrepared Txn: PreReleaseCallback
Summary:
Add PreReleaseCallback to be called at the end of WriteImpl but before publishing the sequence number. The callback is used in WritePrepareTxn to i) update the commit map, ii) update the last published sequence number in the 2nd write queue. It also ensures that all the commits will go to the 2nd queue.
These changes will ensure that the commit map is updated before the sequence number is published and used by reading snapshots. If we use two write queues, the snapshots will use the seq number published by the 2nd queue. If we use one write queue (the default, the snapshots will use the last seq number in the memtable, which also indicates the last published seq number.
Closes https://github.com/facebook/rocksdb/pull/3205

Differential Revision: D6438959

Pulled By: maysamyabandeh

fbshipit-source-id: f8b6c434e94bc5f5ab9cb696879d4c23e2577ab9
2017-11-30 23:50:45 -08:00
Prashant D
81cf262ff5 utilities/backupable : Fix coverity issues
Summary:
1. Class BackupMeta
```
52      : timestamp_(0), size_(0), meta_filename_(meta_filename),

CID 1168103 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member sequence_number_ is not initialized in this constructor nor in any functions that it calls.
153        file_infos_(file_infos), env_(env) {}
```
2. class BackupEngineImpl
```
513  }
        7. uninit_member: Non-static class member latest_backup_id_ is not initialized in this constructor nor in any functions that it calls.

CID 1322803 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
9. uninit_member: Non-static class member latest_valid_backup_id_ is not initialized in this constructor nor in any functions that it calls.
514}
```
3. struct BackupAfterCopyOrCreateWorkItem
```
368  struct BackupAfterCopyOrCreateWorkItem {
369    std::future<CopyOrCreateResult> result;
        1. member_decl: Class member declaration for shared.
370    bool shared;
        3. member_decl: Class member declaration for needed_to_copy.
371    bool needed_to_copy;
        5. member_decl: Class member declaration for backup_env.
372    Env* backup_env;
373    std::string dst_path_tmp;
374    std::string dst_path;
375    std::string dst_relative;
        2. uninit_member: Non-static class member shared is not initialized in this constructor nor in any functions that it calls.
        4. uninit_member: Non-static class member needed_to_copy is not initialized in this constructor nor in any functions that it calls.

CID 1396122 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
6. uninit_member: Non-static class member backup_env is not initialized in this constructor nor in any functions that it calls.
376    BackupAfterCopyOrCreateWorkItem() {}
```
4. struct CopyOrCreateWorkItem
```
318  struct CopyOrCreateWorkItem {
319    std::string src_path;
320    std::string dst_path;
321    std::string contents;
        1. member_decl: Class member declaration for src_env.
322    Env* src_env;
        3. member_decl: Class member declaration for dst_env.
323    Env* dst_env;
        5. member_decl: Class member declaration for sync.
324    bool sync;
        7. member_decl: Class member declaration for rate_limiter.
325    RateLimiter* rate_limiter;
        9. member_decl: Class member declaration for size_limit.
326    uint64_t size_limit;
327    std::promise<CopyOrCreateResult> result;
328    std::function<void()> progress_callback;
329
        2. uninit_member: Non-static class member src_env is not initialized in this constructor nor in any functions that it calls.
        4. uninit_member: Non-static class member dst_env is not initialized in this constructor nor in any functions that it calls.
        6. uninit_member: Non-static class member sync is not initialized in this constructor nor in any functions that it calls.
        8. uninit_member: Non-static class member rate_limiter is not initialized in this constructor nor in any functions that it calls.

CID 1396123 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
10. uninit_member: Non-static class member size_limit is not initialized in this constructor nor in any functions that it calls.
330    CopyOrCreateWorkItem() {}
```
5. struct RestoreAfterCopyOrCreateWorkItem
```
struct RestoreAfterCopyOrCreateWorkItem {
410    std::future<CopyOrCreateResult> result;
        1. member_decl: Class member declaration for checksum_value.
411    uint32_t checksum_value;

CID 1396153 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member checksum_value is not initialized in this constructor nor in any functions that it calls.
412    RestoreAfterCopyOrCreateWorkItem() {}
```
Closes https://github.com/facebook/rocksdb/pull/3131

Differential Revision: D6428556

Pulled By: sagar0

fbshipit-source-id: a86675444543eff028e3cae6942197a143a112c4
2017-11-28 14:43:28 -08:00
Prashant D
b45fbc1175 utilities: Fix coverity issues
Summary:
```
utilities/persistent_cache/block_cache_tier_file.cc
64struct CacheRecordHeader {
   	2. uninit_member: Non-static class member magic_ is not initialized in this constructor nor in any functions that it calls.
   	4. uninit_member: Non-static class member crc_ is not initialized in this constructor nor in any functions that it calls.
   	6. uninit_member: Non-static class member key_size_ is not initialized in this constructor nor in any functions that it calls.

CID 1396161 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
8. uninit_member: Non-static class member val_size_ is not initialized in this constructor nor in any functions that it calls.
 65  CacheRecordHeader() {}
 66  CacheRecordHeader(const uint32_t magic, const uint32_t key_size,
 67                    const uint32_t val_size)
 68      : magic_(magic), crc_(0), key_size_(key_size), val_size_(val_size) {}
 69
   	1. member_decl: Class member declaration for magic_.
 70  uint32_t magic_;
   	3. member_decl: Class member declaration for crc_.
 71  uint32_t crc_;
   	5. member_decl: Class member declaration for key_size_.
 72  uint32_t key_size_;
   	7. member_decl: Class member declaration for val_size_.
 73  uint32_t val_size_;
 74};

utilities/simulator_cache/sim_cache.cc:
157        miss_times_(0),

CID 1396124 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member stats_ is not initialized in this constructor nor in any functions that it calls.
158        hit_times_(0) {}
159
```
Closes https://github.com/facebook/rocksdb/pull/3155

Differential Revision: D6427237

Pulled By: sagar0

fbshipit-source-id: 97e493da5fc043c5b9a3e0d33103442cffb75aad
2017-11-28 13:27:08 -08:00
Prashant D
7b57510a17 utilities: Fix coverity issues in blob_db and col_buf_decoder
Summary:
utilities/blob_db/blob_db_impl.cc
265                    : bdb_options_.blob_dir;
   	3. uninit_member: Non-static class member env_ is not initialized in this constructor nor in any functions that it calls.
   	5. uninit_member: Non-static class member ttl_extractor_ is not initialized in this constructor nor in any functions that it calls.
   	7. uninit_member: Non-static class member open_p1_done_ is not initialized in this constructor nor in any functions that it calls.

CID 1418245 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
9. uninit_member: Non-static class member debug_level_ is not initialized in this constructor nor in any functions that it calls.
266}

   	4. past_the_end: Function end creates an iterator.

CID 1418258 (#1 of 1): Using invalid iterator (INVALIDATE_ITERATOR)
5. deref_iterator: Dereferencing iterator file_nums.end() though it is already past the end of its container.

utilities/col_buf_decoder.h:
     nullable_(nullable),
   	2. uninit_member: Non-static class member remain_runs_ is not initialized in this constructor nor in any functions that it calls.
   	4. uninit_member: Non-static class member run_val_ is not initialized in this constructor nor in any functions that it calls.

CID 1396134 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
6. uninit_member: Non-static class member last_val_ is not initialized in this constructor nor in any functions that it calls.
 46        big_endian_(big_endian) {}
Closes https://github.com/facebook/rocksdb/pull/3134

Differential Revision: D6340607

Pulled By: sagar0

fbshipit-source-id: 25c52566e2ff979fe6c7abb0f40c27fc16597054
2017-11-28 12:27:57 -08:00
Yi Wu
78279350aa Blob DB: Add statistics
Summary:
Adding a list of blob db counters.

Also remove WaStats() which doesn't expose the stats and can be substitute by (BLOB_DB_BYTES_WRITTEN / BLOB_DB_BLOB_FILE_BYTES_WRITTEN).
Closes https://github.com/facebook/rocksdb/pull/3193

Differential Revision: D6394216

Pulled By: yiwu-arbug

fbshipit-source-id: 017508c8ff3fcd7ea7403c64d0f9834b24816803
2017-11-28 11:58:49 -08:00
Maysam Yabandeh
b72b3c6f51 WritePrepared Txn: Add MultiGet to DB
Summary:
This patch implements MultiGet API for WritePreparedTxnDB and update the existing unit tests.
Closes https://github.com/facebook/rocksdb/pull/3196

Differential Revision: D6401493

Pulled By: maysamyabandeh

fbshipit-source-id: 51501a1e32645fc2da8680e77a50035f6530f2cc
2017-11-27 08:56:21 -08:00
Yi Wu
f0dde49cda Blob DB: Fix GC handling for inlined blob
Summary:
Garbage collection checks if the offset in blob index matches the offset of the blob value in the file. If it is a mismatch, the value is the current version. However it failed to check if the blob index is an inlined type, which don't even have an offset. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3194

Differential Revision: D6394270

Pulled By: yiwu-arbug

fbshipit-source-id: 7c2b9d795f1116f55f4d728086980f9b6e88ea78
2017-11-24 11:56:47 -08:00
Sagar Vemuri
578f36e431 Blob DB: Remove some redundant log lines
Summary:
Saw some redundant log lines when trying to benchmark blob db. So, removed the lines from blob_file.cc, and let the lines in blob_db_impl.cc take the lead.
Closes https://github.com/facebook/rocksdb/pull/3189

Differential Revision: D6381726

Pulled By: sagar0

fbshipit-source-id: 5f0b1e56fe4bc3b715d89ea9b5749bd935cd0606
2017-11-20 21:11:34 -08:00
Maysam Yabandeh
53863b76f9 WritePrepared Txn: fix bug with Rollback seq
Summary:
The sequence number was not properly advanced after a rollback marker. The patch extends the existing unit tests to detect the bug and also fixes it.
Closes https://github.com/facebook/rocksdb/pull/3157

Differential Revision: D6304291

Pulled By: maysamyabandeh

fbshipit-source-id: 1b519c44a5371b802da49c9e32bd00087a8da401
2017-11-15 08:27:06 -08:00
Yi Wu
42564ada53 Blob DB: not using PinnableSlice move assignment
Summary:
The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky.
Closes https://github.com/facebook/rocksdb/pull/3164

Differential Revision: D6319201

Pulled By: yiwu-arbug

fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48
2017-11-13 18:12:20 -08:00
Maysam Yabandeh
2515266725 WritePrepared Txn: Refactoring TrackKeys
Summary:
This patch clarifies and refactors the logic around tracked keys in transactions.
Closes https://github.com/facebook/rocksdb/pull/3140

Differential Revision: D6290258

Pulled By: maysamyabandeh

fbshipit-source-id: 03b50646264cbcc550813c060b180fc7451a55c1
2017-11-11 13:14:20 -08:00
Maysam Yabandeh
2edc92bc28 WritePrepared Txn: cross-compatibility test
Summary:
Add tests to ensure that WritePrepared and WriteCommitted policies are cross compatible when the db WAL is empty. This is important when the admin want to switch between the policies. In such case, before the switch the admin needs to empty the WAL by i) committing/rollbacking all the pending transactions, ii) FlushMemTables
Closes https://github.com/facebook/rocksdb/pull/3118

Differential Revision: D6227247

Pulled By: maysamyabandeh

fbshipit-source-id: bcde3d92c1e89cda3b9cfa69f6a20af5d8993db7
2017-11-11 11:28:37 -08:00
Maysam Yabandeh
857adf388f WritePrepared Txn: Refactor conf params
Summary:
Summary of changes:
- Move seq_per_batch out of Options
- Rename concurrent_prepare to two_write_queues
- Add allocate_seq_only_for_data_
Closes https://github.com/facebook/rocksdb/pull/3136

Differential Revision: D6304458

Pulled By: maysamyabandeh

fbshipit-source-id: 08e685bfa82bbc41b5b1c5eb7040a8ca6e05e58c
2017-11-10 17:28:12 -08:00
Dmitri Smirnov
f8e2db0717 Fix crashes, address test issues and adjust windows test script
Summary:
Add per-exe execution capability
  Add fix parsing of groups/tests
  Add timer test exclusion

 Fix unit tests
  Ifdef threadpool specific tests that do not pass on Vista threadpool.
  Remove spurious outout from prefix_test so test case listing works
  properly.
  Fix not using standard test directories results in file creation errors
  in sst_dump_test.

  BlobDb fixes:
    In C++ end() iterators can not be dereferenced. They are not valid.
	When deleting blob_db_ set it to nullptr before any other code executes.
	Not fixed:. On Windows you can not delete a file while it is open.
	[ RUN      ] BlobDBTest.ReadWhileGC
	d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options)
	IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied
	d:\dev\rocksdb\rocksdb\utilities\blob_db\blob_db_test.cc(75): error: DestroyBlobDB(dbname_, options, bdb_options)
	IO error: Failed to delete: d:/mnt/db\testrocksdb-17444/blob_db_test/blob_dir/000001.blob: Permission denied

  write_batch
    Should not call front() if there is a chance the container is empty
Closes https://github.com/facebook/rocksdb/pull/3152

Differential Revision: D6293274

Pulled By: sagar0

fbshipit-source-id: 318c3717c22087fae13b18715dffb24565dbd956
2017-11-10 10:41:57 -08:00
Yi Wu
5e9e5a4702 Blob DB: Fix race condition between flush and write
Summary:
A race condition will happen when:
* a user thread writes a value, but it hits the write stop condition because there are too many un-flushed memtables, while holding blob_db_impl.write_mutex_.
* Flush is triggered and call flush begin listener and try to acquire blob_db_impl.write_mutex_.

Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3149

Differential Revision: D6279805

Pulled By: yiwu-arbug

fbshipit-source-id: 0e3c58afb78795ebe3360a2c69e05651e3908c40
2017-11-08 19:42:22 -08:00
Yi Wu
ca75f0a64a Blob DB: Fix release build
Summary:
`compression` shadow the method name in `BlobFile`. Rename it.
Closes https://github.com/facebook/rocksdb/pull/3148

Differential Revision: D6274498

Pulled By: yiwu-arbug

fbshipit-source-id: 7d293596530998b23b6b8a8940f983f9b6343a98
2017-11-08 13:14:20 -08:00
Yi Wu
4f9f124347 Blob DB: use compression in file header instead of global options
Summary:
To fix the issue of failing to decompress existing value after reopen DB with a different compression settings.
Closes https://github.com/facebook/rocksdb/pull/3142

Differential Revision: D6267260

Pulled By: yiwu-arbug

fbshipit-source-id: c7cf7f3e33b0cd25520abf4771cdf9180cc02a5f
2017-11-07 17:42:17 -08:00
Manuel Ung
e03377c7fd Add lock wait time as a perf context counter
Summary:
Adds two new counters:

`key_lock_wait_count` counts how many times a lock was blocked by another transaction and had to wait, instead of being granted the lock immediately.
`key_lock_wait_time` counts the time spent acquiring locks.
Closes https://github.com/facebook/rocksdb/pull/3107

Differential Revision: D6217332

Pulled By: lth

fbshipit-source-id: 55d4f46da5550c333e523263422fd61d6a46deb9
2017-11-06 10:57:19 -08:00
Yi Wu
be410dede8 Fix PinnableSlice move assignment
Summary:
After move assignment, we need to re-initialized the moved PinnableSlice.

Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move.
Closes https://github.com/facebook/rocksdb/pull/3127

Differential Revision: D6238585

Pulled By: yiwu-arbug

fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e
2017-11-03 18:13:21 -07:00
Yi Wu
2581c0a5a1 Blob DB: Fix BlobDBTest::SnapshotAndGarbageCollection asan failure
Summary:
Fix unreleased snapshot at the end of the test.
Closes https://github.com/facebook/rocksdb/pull/3126

Differential Revision: D6232867

Pulled By: yiwu-arbug

fbshipit-source-id: 651ca3144fc573ea2ab0ab20f0a752fb4a101d26
2017-11-03 10:26:59 -07:00