Commit Graph

702 Commits

Author SHA1 Message Date
Maysam Yabandeh
f46464d383 write-prepared txn: call IsInSnapshot
Summary:
This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
Closes https://github.com/facebook/rocksdb/pull/2850

Differential Revision: D5787375

Pulled By: maysamyabandeh

fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
2017-09-11 09:14:48 -07:00
Maysam Yabandeh
9a4df72994 WritePrepared Txn: CommitBatch
Summary:
Implements CommitBatch and CommitWithoutPrepare for WritePreparedTxn
Closes https://github.com/facebook/rocksdb/pull/2854

Differential Revision: D5793999

Pulled By: maysamyabandeh

fbshipit-source-id: d8b9858221162c6ac7a1f6912cbd3481d0d8a503
2017-09-08 15:56:39 -07:00
Maysam Yabandeh
fce6c892ab Advance max evicted seq in coarser granularity
Summary:
This patch advances the max_evicted_seq_ is larger granularities to reduce the overhead of updating the relevant data structures.

It also refactor the related code and adds testing to that. As part of this patch some of the TODOs for removing usage of non-static const members are also addressed.
Closes https://github.com/facebook/rocksdb/pull/2844

Differential Revision: D5772928

Pulled By: maysamyabandeh

fbshipit-source-id: f4fcc2948be69c034f10812cf922ce5ab82ef98c
2017-09-08 14:41:22 -07:00
Yi Wu
dcd36a6aee Make it explicit blob db doesn't support CF
Summary:
Blob db doesn't currently support column families. Return NotSupported status explicitly.
Closes https://github.com/facebook/rocksdb/pull/2825

Differential Revision: D5757438

Pulled By: yiwu-arbug

fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa
2017-09-08 11:11:04 -07:00
Siying Dong
0e99323ac2 Fix CLANG Analyze
Summary:
clang analyze shows warnings after we upgrade the CLANG version. Fix them.
Closes https://github.com/facebook/rocksdb/pull/2839

Differential Revision: D5769060

Pulled By: siying

fbshipit-source-id: 3f8e4df715590d8984f6564b608fa08cfdfa5f14
2017-09-07 14:28:06 -07:00
Maysam Yabandeh
7e19a571e9 Remove unused TransactionCallback
Summary:
TransactionCallback was never used. Remove it to avoid confusion.
Closes https://github.com/facebook/rocksdb/pull/2853

Differential Revision: D5787219

Pulled By: maysamyabandeh

fbshipit-source-id: e2b6a89537e3770a269ad38be71c4b0b160a88ac
2017-09-07 12:17:18 -07:00
Maysam Yabandeh
79810e2d49 Skip write_prepared_transaction_test in travis
Summary:
The patch skips write_prepared_transaction_test from travis as they time out there. They are still covered in daily runs of tests.
Closes https://github.com/facebook/rocksdb/pull/2836

Differential Revision: D5767203

Pulled By: maysamyabandeh

fbshipit-source-id: 51045ef98a745197136e14b2ec02fc6f38081b75
2017-09-05 15:29:52 -07:00
Yi Wu
ab95e293d2 Fix memory leak on blob db open
Summary:
Fixes #2820
Closes https://github.com/facebook/rocksdb/pull/2826

Differential Revision: D5757527

Pulled By: yiwu-arbug

fbshipit-source-id: f495b63700495aeaade30a1da5e3675848f3d72f
2017-09-01 14:13:51 -07:00
Maysam Yabandeh
37ae8cc60f Signal progress of the test to avoid timeout
Summary: Closes https://github.com/facebook/rocksdb/pull/2824

Differential Revision: D5756457

Pulled By: maysamyabandeh

fbshipit-source-id: dff53e945d8ac4ffe6775a2176424fd1a27fc189
2017-09-01 11:28:52 -07:00
Dmitri Smirnov
0ec90a7cc2 Add -DPORTABLE=1 to MSVC CI build
Summary:
Add -DPORTABLE=1
  port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running.
Closes https://github.com/facebook/rocksdb/pull/2815

Differential Revision: D5751661

Pulled By: siying

fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0
2017-08-31 16:42:48 -07:00
Andrew Kryczka
b97685aef6 fix backup engine when latest backup corrupt
Summary:
Backup engine is intentionally openable even when some backups are corrupt. Previously the engine could write new backups as long as the most recent backup wasn't corrupt. This PR makes the backup engine able to create new backups even when the most recent one is corrupt.

We now maintain two ID instance variables:

- `latest_backup_id_` is used when creating backup to choose the new ID
- `latest_valid_backup_id_` is used when restoring latest backup since we want most recent valid one
Closes https://github.com/facebook/rocksdb/pull/2804

Differential Revision: D5734148

Pulled By: ajkr

fbshipit-source-id: db440707b31df2c7b084188aa5f6368449e10bcf
2017-08-31 15:41:49 -07:00
Pengchao Wang
825a22c00c garbage collect tombstones in merge operator
Summary:
Remove cassandra tombstone when reaching the max compaction level (full merge). if all columns collected key will be removed in next compaction via compaction filter
Closes https://github.com/facebook/rocksdb/pull/2791

Reviewed By: sagar0

Differential Revision: D5722465

Pulled By: wpc

fbshipit-source-id: 61e9898a5686551653a16383255aeaab3197e65e
2017-08-31 10:11:54 -07:00
Maysam Yabandeh
26ac24f199 Add more unit test to write_prepared txns
Summary: Closes https://github.com/facebook/rocksdb/pull/2798

Differential Revision: D5724173

Pulled By: maysamyabandeh

fbshipit-source-id: fb6b782d933fb4be315b1a231a6a67a66fdc9c96
2017-08-31 09:41:27 -07:00
Maysam Yabandeh
fbfa3e7a43 WriteAtPrepare: Efficient read from snapshot list
Summary:
Divide the old snapshots to two lists: a few that fit into a cached array and the rest in a vector, which is expected to be empty in normal cases. The former is to optimize concurrent reads from snapshots without requiring locks. It is done by an array of std::atomic, from which std::memory_order_acquire reads are compiled to simple read instructions in most of the x86_64 architectures.
Closes https://github.com/facebook/rocksdb/pull/2758

Differential Revision: D5660504

Pulled By: maysamyabandeh

fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c
2017-08-26 01:00:38 -07:00
Yi Wu
503db684f7 make blob file close synchronous
Summary:
Fixing flaky blob_db_test.

To close a blob file, blob db used to add a CloseSeqWrite job to the background thread to close it. Changing file close to be synchronous in order to simplify logic, and fix flaky blob_db_test.
Closes https://github.com/facebook/rocksdb/pull/2787

Differential Revision: D5699387

Pulled By: yiwu-arbug

fbshipit-source-id: dd07a945cd435cd3808fce7ee4ea57817409474a
2017-08-25 10:41:49 -07:00
Maysam Yabandeh
cd26af3476 Add unit test for WritePrepared skeleton
Summary: Closes https://github.com/facebook/rocksdb/pull/2756

Differential Revision: D5660516

Pulled By: maysamyabandeh

fbshipit-source-id: f3f3d3b5f544007a7fbdd78e49e4738b4437c7ee
2017-08-23 13:56:03 -07:00
Andrew Kryczka
234f33a3f9 allow nullptr Slice only as sentinel
Summary:
Allow `Slice` holding nullptr as a sentinel value but not in comparisons. This new restriction eliminates the need for the manual checks in 39ef900551, while still conforming to glibc's `memcmp` API. Thanks siying for the idea. Users may need to migrate, so mentioned it in HISTORY.md.
Closes https://github.com/facebook/rocksdb/pull/2777

Differential Revision: D5686016

Pulled By: ajkr

fbshipit-source-id: 03a2ca3fd9a0ebade9d0d5686c81d59a9534f563
2017-08-23 10:56:06 -07:00
Maysam Yabandeh
ccf7f833e3 Use PinnableSlice in Transactions
Summary:
The ::Get from DB is not augmented with an overload method that takes a PinnableSlice instead of a string. Transactions however are not yet upgraded to use the new API. As a result, transaction users such as MyRocks cannot benefit from it. This patch updates the transactional API with a PinnableSlice overload.
Closes https://github.com/facebook/rocksdb/pull/2736

Differential Revision: D5645770

Pulled By: maysamyabandeh

fbshipit-source-id: f6af520df902f842de1bcf99bed3e8dfc43ad96d
2017-08-23 10:11:45 -07:00
yiwu-arbug
5b68b114f1 Blob db create a snapshot before every read
Summary:
If GC kicks in between

* A Get() reads index entry from base db.
* The Get() read from a blob file

The GC can delete the corresponding blob file, making the key not found. Fortunately we have existing logic to avoid deleting a blob file if it is referenced by a snapshot. So the fix is to explicitly create a snapshot before reading index entry from base db.
Closes https://github.com/facebook/rocksdb/pull/2754

Differential Revision: D5655956

Pulled By: yiwu-arbug

fbshipit-source-id: e4ccbc51331362542e7343175bbcbdea5830f544
2017-08-20 18:26:19 -07:00
yiwu-arbug
4624ae52c9 GC the oldest file when out of space
Summary:
When out of space, blob db should GC the oldest file. The current implementation GC the newest one instead. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/2757

Differential Revision: D5657611

Pulled By: yiwu-arbug

fbshipit-source-id: 56c30a4c52e6ab04551dda8c5c46006d4070b28d
2017-08-20 17:11:06 -07:00
Archit Mishra
bddd5d3630 Added mechanism to track deadlock chain
Summary:
Changes:
* extended the wait_txn_map to track additional information
* designed circular buffer to store n latest deadlocks' information
* added test coverage to verify the additional information tracked is accurately stored in the buffer
Closes https://github.com/facebook/rocksdb/pull/2630

Differential Revision: D5478025

Pulled By: armishra

fbshipit-source-id: 2b138de7b5a73f5ca554fc3ff8220a3be49f39e7
2017-08-17 18:56:21 -07:00
yiwu-arbug
29877ec7b4 Fix blob db crash during calculating write amp
Summary:
On initial call to BlobDBImpl::WaStats() `all_periods_write_` would be empty, so it will crash when we call pop_front() at line 1627. Apparently it is mean to pop only when `all_periods_write_.size() > kWriteAmplificationStatsPeriods`.

The whole write amp calculation doesn't seems to be correct and it is not being exposed. Will work on it later.

Test Plan
Change kWriteAmplificationStatsPeriodMillisecs to 1000 (1 second) and run db_bench --use_blob_db for 5 minutes.
Closes https://github.com/facebook/rocksdb/pull/2751

Differential Revision: D5648269

Pulled By: yiwu-arbug

fbshipit-source-id: b843d9a09bb5f9e1b713d101ec7b87e54b5115a4
2017-08-17 15:01:09 -07:00
Sagar Vemuri
8f2598ac9d Enable Cassandra merge operator to be called with a single merge operand
Summary:
Updating Cassandra merge operator to make use of a single merge operand when needed. Single merge operand support has been introduced in #2721.
Closes https://github.com/facebook/rocksdb/pull/2753

Differential Revision: D5652867

Pulled By: sagar0

fbshipit-source-id: b9fbd3196d3ebd0b752626dbf9bec9aa53e3e26a
2017-08-17 15:01:09 -07:00
follitude
ac8fb77afd fix some misspellings
Summary:
PTAL ajkr
Closes https://github.com/facebook/rocksdb/pull/2750

Differential Revision: D5648052

Pulled By: ajkr

fbshipit-source-id: 7cd1ddd61364d5a55a10fdd293fa74b2bf89dd98
2017-08-16 21:57:20 -07:00
Maysam Yabandeh
eb6425303e Update WritePrepared with the pseudo code
Summary:
Implement the main body of WritePrepared pseudo code. This includes PrepareInternal and CommitInternal, as well as AddCommitted which updates the commit map. It also provides a IsInSnapshot method that could be later called form the read path to decide if a version is in the read snapshot or it should other be skipped.

This patch lacks unit tests and does not attempt to offer an efficient implementation. The idea is that to have the API specified so that we can work on related tasks in parallel.
Closes https://github.com/facebook/rocksdb/pull/2713

Differential Revision: D5640021

Pulled By: maysamyabandeh

fbshipit-source-id: bfa7a05e8d8498811fab714ce4b9c21530514e1c
2017-08-16 16:57:47 -07:00
Sagar Vemuri
132306fbf0 Remove PartialMerge implementation from Cassandra merge operator
Summary:
`PartialMergeMulti` implementation is enough for Cassandra, and `PartialMerge` is not required. Implementing both will just duplicate the code.
As per https://github.com/facebook/rocksdb/blob/master/include/rocksdb/merge_operator.h#L130-L135 :

```
  // The default implementation of PartialMergeMulti will use this function
  // as a helper, for backward compatibility.  Any successor class of
  // MergeOperator should either implement PartialMerge or PartialMergeMulti,
  // although implementing PartialMergeMulti is suggested as it is in general
  // more effective to merge multiple operands at a time instead of two
  // operands at a time.
```
Closes https://github.com/facebook/rocksdb/pull/2737

Reviewed By: scv119

Differential Revision: D5633073

Pulled By: sagar0

fbshipit-source-id: ef4fa102c22fec6a0175ed12f5c44c15afe3c8ca
2017-08-15 14:59:34 -07:00
yiwu-arbug
e5a1b727c0 Fix blob DB transaction usage while GC
Summary:
While GC, blob DB use optimistic transaction to delete or replace the index entry in LSM, to guarantee correctness if there's a normal write writing to the same key. However, the previous implementation doesn't call SetSnapshot() nor use GetForUpdate() of transaction API, instead it do its own sequence number checking before beginning the transaction. A normal write can sneak in after the sequence number check and overwrite the key, and the GC will delete or relocate the old version of the key by mistake. Update the code to property use GetForUpdate() to check the existing index entry.

After the patch the sequence number store with each blob record is useless, So I'm considering remove the sequence number from blob record, in another patch.
Closes https://github.com/facebook/rocksdb/pull/2703

Differential Revision: D5589178

Pulled By: yiwu-arbug

fbshipit-source-id: 8dc960cd5f4e61b36024ba7c32d05584ce149c24
2017-08-11 12:43:17 -07:00
Daniel Black
64f8484356 block_cache_tier: fix gcc-7 warnings
Summary:
Error was:

utilities/persistent_cache/block_cache_tier.cc: In instantiation of ‘void rocksdb::Add(std::map<std::__cxx11::basic_string<char>, double>*, const string&, const T&) [with T = double; std::__cxx11::string = std::__cxx11::basic_string<char>]’:
utilities/persistent_cache/block_cache_tier.cc:147:40:   required from here
utilities/persistent_cache/block_cache_tier.cc:141:23: error: type qualifiers ignored on cast result type [-Werror=ignored-qualifiers]

   stats->insert({key, static_cast<const double>(t)});

fixing like #2562
Closes https://github.com/facebook/rocksdb/pull/2603

Differential Revision: D5600910

Pulled By: yiwu-arbug

fbshipit-source-id: 891a5ec7e451d2dec6ad1b6b7fac545657f87363
2017-08-10 11:58:53 -07:00
Maysam Yabandeh
bdc056f8aa Refactor PessimisticTransaction
Summary:
This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves lock-related logic to PessimisticTransaction to be reused by all children classes and movies the existing impl of db-write-related to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl.
Closes https://github.com/facebook/rocksdb/pull/2691

Differential Revision: D5569464

Pulled By: maysamyabandeh

fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325
2017-08-07 16:12:29 -07:00
Maysam Yabandeh
c9804e007a Refactor TransactionDBImpl
Summary:
This opens space for the new implementations of TransactionDBImpl such as WritePreparedTxnDBImpl that has a different policy of how to write to DB.
Closes https://github.com/facebook/rocksdb/pull/2689

Differential Revision: D5568918

Pulled By: maysamyabandeh

fbshipit-source-id: f7eac866e175daf3793ae79da108f65cc7dc7b25
2017-08-05 17:26:15 -07:00
Yi Wu
0d4a2b7330 Avoid blob db call Sync() while writing
Summary:
The FsyncFiles background job call Fsync() periodically for blob files. However it can access WritableFileWriter concurrently with a Put() or Write(). And WritableFileWriter does not support concurrent access. It will lead to WritableFileWriter buffer being flush with same content twice, and blob file end up corrupted. Fixing by simply let FsyncFiles hold write_mutex_.
Closes https://github.com/facebook/rocksdb/pull/2685

Differential Revision: D5561908

Pulled By: yiwu-arbug

fbshipit-source-id: f0bb5bcab0e05694e053b8c49eab43640721e872
2017-08-04 13:12:07 -07:00
Yi Wu
92afe830f9 Update all blob db TTL and timestamps to uint64_t
Summary:
The current blob db implementation use mix of int32_t, uint32_t and uint64_t for TTL and expiration. Update all timestamps to uint64_t for consistency.
Closes https://github.com/facebook/rocksdb/pull/2683

Differential Revision: D5557103

Pulled By: yiwu-arbug

fbshipit-source-id: e4eab2691629a755e614e8cf1eed9c3a681d0c42
2017-08-03 17:57:30 -07:00
Yi Wu
0b814ba92d Allow concurrent writes to blob db
Summary:
I'm going with brute-force solution, just letting Put() and Write() holding a mutex before writing. May improve concurrent writing with finer granularity locking later.
Closes https://github.com/facebook/rocksdb/pull/2682

Differential Revision: D5552690

Pulled By: yiwu-arbug

fbshipit-source-id: 039abd675b5d274a7af6428198d1733cafecef4c
2017-08-03 15:11:26 -07:00
Yi Wu
2c45ada4c4 Blob DB garbage collection should keep keys with newer version
Summary:
Fix the bug where if blob db garbage collection revmoe keys with newer version. It shouldn't delete the key from base db when sequence number in base db is not equal to the one in blob log.
Closes https://github.com/facebook/rocksdb/pull/2678

Differential Revision: D5549752

Pulled By: yiwu-arbug

fbshipit-source-id: abb8649260963b5c389748023970fd746279d227
2017-08-03 13:12:12 -07:00
Maysam Yabandeh
c3d5c4d38a Refactor TransactionImpl
Summary:
This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added which will be later completed to provide a an alternative implementation.
Closes https://github.com/facebook/rocksdb/pull/2676

Differential Revision: D5549998

Pulled By: maysamyabandeh

fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec
2017-08-03 08:57:22 -07:00
Yi Wu
1900771bd2 Dump Blob DB options to info log
Summary:
* Dump blob db options to info log
* Remove BlobDBOptionsImpl to disallow dynamic cast *BlobDBOptions into *BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645
* Change some of the default options
* Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon.
Closes https://github.com/facebook/rocksdb/pull/2671

Differential Revision: D5529912

Pulled By: yiwu-arbug

fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7
2017-08-01 13:01:47 -07:00
Siying Dong
21696ba502 Replace dynamic_cast<>
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645

Differential Revision: D5502723

Pulled By: siying

fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
2017-07-28 16:27:16 -07:00
Yi Wu
aaf42fe775 Move blob_db/ttl_extractor.h into blob_db/blob_db.h
Summary:
Move blob_db/ttl_extractor.h into blob_db/blob_db.h
Also exclude TTLExtractor from LITE build.
Closes https://github.com/facebook/rocksdb/pull/2665

Differential Revision: D5520009

Pulled By: yiwu-arbug

fbshipit-source-id: 4813dcc272c7cc4bf2cdac285256d9a17d78c7b7
2017-07-28 14:28:21 -07:00
Sagar Vemuri
aace46516b Fix license headers in Cassandra related files
Summary:
I might have missed these while doing some recent cassandra code reviews.
Closes https://github.com/facebook/rocksdb/pull/2663

Differential Revision: D5520138

Pulled By: sagar0

fbshipit-source-id: 340930afe9efe03c75f535a1da1f89bd3e53c1f9
2017-07-28 13:56:56 -07:00
Islam AbdelRahman
50a969131f CacheActivityLogger, component to log cache activity into a file
Summary:
Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache.
API:
```
SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>)
SimCache::StopActivityLogging()
```

Sending for review, Still need to add more comments.

I was thinking about a better approach, but I ended up deciding I will use a mutex to sync the writes to the file, since this feature should not be heavily used and only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we lookup/add to the SimCache.
Closes https://github.com/facebook/rocksdb/pull/2295

Differential Revision: D5063826

Pulled By: IslamAbdelRahman

fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c
2017-07-28 12:36:48 -07:00
Yi Wu
6083bc79f8 Blob DB TTL extractor
Summary:
Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL
extractor can be use to extract TTL from keys insert with Put or
WriteBatch. Change over existing extract_ttl_fn are:
* If value is changed, it will be return via std::string* (rather than Slice*). With Slice* the new value has to be part of the existing value. With std::string* the limitation is removed.
* It can optionally return TTL or expiration.

Other changes in this PR:
* replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests.
* add several TTL tests.
* other minor naming change.
Closes https://github.com/facebook/rocksdb/pull/2659

Differential Revision: D5512627

Pulled By: yiwu-arbug

fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440
2017-07-27 23:26:04 -07:00
Maysam Yabandeh
2b259c9d49 Lower num of iterations in DeadlockCycle test
Summary:
Currently this test times out with tsan. This is likely due to decreased speed with tsan. By lowering the number of iterations we can still catch a bug as the test is run regularly and multiple runs of the test is equivalent with running the test with more iterations.
Closes https://github.com/facebook/rocksdb/pull/2639

Differential Revision: D5490549

Pulled By: maysamyabandeh

fbshipit-source-id: bd69c42a9728d337ac95a06a401088384e51731a
2017-07-25 11:42:26 -07:00
Siying Dong
e67b35c076 Add Iterator::Refresh()
Summary:
Add and implement Iterator::Refresh(). When this function is called, if the super version doesn't change, update the sequence number of the iterator to the latest one and invalidate the iterator. If the super version changed, recreated the whole iterator. This can help users reuse the iterator more easily.
Closes https://github.com/facebook/rocksdb/pull/2621

Differential Revision: D5464500

Pulled By: siying

fbshipit-source-id: f548bd35e85c1efca2ea69273802f6704eba6ba9
2017-07-24 10:54:37 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Pengchao Wang
534c255c7a Cassandra compaction filter for purge expired columns and rows
Summary:
Major changes in this PR:
* Implement CassandraCompactionFilter to remove expired columns and rows (if all column expired)
* Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/*
* Switch to use shared_ptr<> from uniqu_ptr for Column membership management in RowValue. Since columns do have multiple owners in Merge and GC process, use shared_ptr helps make RowValue immutable.
* Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there.
Closes https://github.com/facebook/rocksdb/pull/2588

Differential Revision: D5430010

Pulled By: wpc

fbshipit-source-id: 9566c21e06de17491d486a68c70f52d501f27687
2017-07-21 14:57:44 -07:00
Yi Wu
0302da47a7 Reduce blob db noisy logging
Summary:
Remove some of the per-key logging by blob db to reduce noise.
Closes https://github.com/facebook/rocksdb/pull/2587

Differential Revision: D5429115

Pulled By: yiwu-arbug

fbshipit-source-id: b89328282fb8b3c64923ce48738c16017ce7feaf
2017-07-20 15:02:31 -07:00
Yedidya Feldblum
f1a056e005 CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:26:02 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Yi Wu
26ce69b195 Update blob db to use ROCKS_LOG_* macro
Summary:
Update blob db to use the newer ROCKS_LOG_* macro.
Closes https://github.com/facebook/rocksdb/pull/2574

Differential Revision: D5414526

Pulled By: yiwu-arbug

fbshipit-source-id: e428753aa5917e8b435cead2db26df586e5d1def
2017-07-13 10:14:04 -07:00
foolenough
21b17d7686 Fix BlobDB::Get which only get out the value offset
Summary:
Blob db use StackableDB::get which only get out the
value offset, but not the value.
Fix by making BlobDB::Get override the designated getter.
Closes https://github.com/facebook/rocksdb/pull/2553

Differential Revision: D5396823

Pulled By: yiwu-arbug

fbshipit-source-id: 5a7d1cf77ee44490f836a6537225955382296878
2017-07-12 17:57:40 -07:00
Andrew Kryczka
33042573db Fix GetCurrentTime() initialization for valgrind
Summary:
Valgrind had false positive complaints about the initialization pattern for `GetCurrentTime()`'s argument in #2480. We can instead have the client initialize the time variable before calling `GetCurrentTime()`, and have `GetCurrentTime()` promise to only overwrite it in success case.
Closes https://github.com/facebook/rocksdb/pull/2526

Differential Revision: D5358689

Pulled By: ajkr

fbshipit-source-id: 857b189f24c19196f6bb299216f3e23e7bc4be42
2017-07-05 12:12:00 -07:00
Mike Kolupaev
397ab11152 Improve Status message for block checksum mismatches
Summary:
We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages.

It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
Closes https://github.com/facebook/rocksdb/pull/2507

Differential Revision: D5345702

Pulled By: al13n321

fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
2017-06-28 21:27:01 -07:00
Siying Dong
18c63af6ef Make "make analyze" happy
Summary:
"make analyze" is reporting some errors. It's complicated to look but it seems to me that they are all false positive. Anyway, I think cleaning them up is a good idea. Some of the changes are hacky but I don't know a better way.
Closes https://github.com/facebook/rocksdb/pull/2508

Differential Revision: D5341710

Pulled By: siying

fbshipit-source-id: 6070e430e0e41a080ef441e05e8ec827d45efab6
2017-06-28 15:42:27 -07:00
Yi Wu
982cec22af Fix TARGETS file tests list
Summary:
1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
2. add blob_db_test
Closes https://github.com/facebook/rocksdb/pull/2506

Differential Revision: D5331517

Pulled By: yiwu-arbug

fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
2017-06-27 14:12:02 -07:00
Maysam Yabandeh
499ebb3ab5 Optimize for serial commits in 2PC
Summary:
Throughput: 46k tps in our sysbench settings (filling the details later)

The idea is to have the simplest change that gives us a reasonable boost
in 2PC throughput.

Major design changes:
1. The WAL file internal buffer is not flushed after each write. Instead
it is flushed before critical operations (WAL copy via fs) or when
FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
via mutex_.
2. Use two sequence numbers: last seq, and last seq for write. Last seq
is the last visible sequence number for reads. Last seq for write is the
next sequence number that should be used to write to WAL/memtable. This
allows to have a memtable write be in parallel to WAL writes.
3. BatchGroup is not used for writes. This means that we can have
parallel writers which changes a major assumption in the code base. To
accommodate for that i) allow only 1 WriteImpl that intends to write to
memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes
come via group commit phase which is serial anyway, ii) make all the
parts in the code base that assumed to be the only writer (via
EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are
protected via a stat_mutex_.

Note: the first commit has the approach figured out but is not clean.
Submitting the PR anyway to get the early feedback on the approach. If
we are ok with the approach I will go ahead with this updates:
0) Rebase with Yi's pipelining changes
1) Currently batching is disabled by default to make sure that it will be
consistent with all unit tests. Will make this optional via a config.
2) A couple of unit tests are disabled. They need to be updated with the
serial commit of 2PC taken into account.
3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
releasing mutex_ beforehand (the same way EnterUnbatched does). This
needs to be cleaned up.
Closes https://github.com/facebook/rocksdb/pull/2345

Differential Revision: D5210732

Pulled By: maysamyabandeh

fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
2017-06-24 14:11:29 -07:00
Siying Dong
88cd2d96e7 Downgrade option sanitiy check level for prefix_extractor
Summary:
With c7004840d2, it's safe to open a DB with different prefix extractor. So it's safe to skip prefix extractor check.
Closes https://github.com/facebook/rocksdb/pull/2474

Differential Revision: D5294700

Pulled By: siying

fbshipit-source-id: eeb500da795eecb29b8c9c56a14cfd4afda12ecc
2017-06-22 16:26:36 -07:00
Andrew Kryczka
048446fc74 Fix cassandra ASAN use-after-free
Summary:
When we create a column based on the `string::c_str()`, we need to make sure that char array doesn't get deleted when calls to `string::append()` cause the string to expand.
Closes https://github.com/facebook/rocksdb/pull/2470

Differential Revision: D5285049

Pulled By: ajkr

fbshipit-source-id: f918dd426ff3c024e7a293dcb10448f10b6c98e8
2017-06-20 13:27:16 -07:00
Dmitri Smirnov
a21db161c9 Implement ReopenWritibaleFile on Windows and other fixes
Summary:
Make default impl return NoSupported so the db_blob
  tests exist in a meaningful manner.
  Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465

Differential Revision: D5275563

Pulled By: yiwu-arbug

fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
2017-06-20 10:31:13 -07:00
Chen Shen
cbd825deea Create a MergeOperator for Cassandra Row Value
Summary:
This PR implements the MergeOperator for Cassandra Row Values.
Closes https://github.com/facebook/rocksdb/pull/2289

Differential Revision: D5055464

Pulled By: scv119

fbshipit-source-id: 45f276ef8cbc4704279202f6a20c64889bc1adef
2017-06-16 14:27:00 -07:00
Yi Wu
ae8571f5c2 Fix blob db compression bug
Summary:
`CompressBlock()` will return the uncompressed slice (i.e. `Slice(value_unc)`) if compression ratio is not good enough. This is undesired. We need to always assign the compressed slice to `value`.
Closes https://github.com/facebook/rocksdb/pull/2447

Differential Revision: D5244682

Pulled By: yiwu-arbug

fbshipit-source-id: 6989dd8852c9622822ba9acec9beea02007dff09
2017-06-14 13:56:42 -07:00
Yi Wu
7a380deff7 Update blob_db_test
Summary:
I'm trying to improve unit test of blob db. I'm rewriting blob db test. In this patch:
* Rewrite tests of basic put/write/delete operations.
* Add disable_background_tasks to BlobDBOptionsImpl to allow me not running any background job for basic unit tests.
* Move DestroyBlobDB out from BlobDBImpl to be a standalone function.
* Remove all garbage collection related tests. Will rewrite them in following patch.
* Disabled compression test since it is failing. Will fix in a followup patch.
Closes https://github.com/facebook/rocksdb/pull/2446

Differential Revision: D5243306

Pulled By: yiwu-arbug

fbshipit-source-id: 157c71ad3b699307cb88baa3830e9b6e74f8e939
2017-06-14 13:12:34 -07:00
Sagar Vemuri
89ad9f3adb Allow ignoring unknown options when loading options from a file
Summary:
Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version.

Also added `--ignore_unknown_options` flag to ldb

**Example Use case:**
In MyRocks, if copying from newer version to older version, it is often impossible to start because of new RocksDB options that don't exist in older version, even though data format is compatible.
MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)).

**Test Plan:**
Updated the unit tests.
`make check`

ldb:
$ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1
OK

Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option.

Try loading the options now, and it fails:
$ ./ldb --db=/tmp/test_db --try_load_options get a1
Failed: Invalid argument: Unrecognized option DBOptions:: abcd

Passes with the new --ignore_unknown_options flag
$ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1
b1
Closes https://github.com/facebook/rocksdb/pull/2423

Differential Revision: D5212091

Pulled By: sagar0

fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7
2017-06-13 16:58:01 -07:00
Andrew Kryczka
c217e0b9c7 Call RateLimiter for compaction reads
Summary:
Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed.
Closes https://github.com/facebook/rocksdb/pull/2433

Differential Revision: D5216946

Pulled By: ajkr

fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e
2017-06-13 14:56:46 -07:00
Yi Wu
91e2aa3ce2 write exact sequence number for each put in write batch
Summary:
At the beginning of write batch write, grab the latest sequence from base db and assume sequence number will increment by 1 for each put and delete, and write the exact sequence number with each put. This is assuming we are the only writer to increment sequence number (no external file ingestion, etc) and there should be no holes in the sequence number.

Also having some minor naming changes.
Closes https://github.com/facebook/rocksdb/pull/2402

Differential Revision: D5176134

Pulled By: yiwu-arbug

fbshipit-source-id: cb4712ee44478d5a2e5951213a10b72f08fe8c88
2017-06-13 12:42:36 -07:00
Maysam Yabandeh
550a1df72c Fix clang errors by asserting the precondition
Summary:
USE_CLANG=1 make -j32 analyze
The two errors would disappear after the assertion.
Closes https://github.com/facebook/rocksdb/pull/2416

Differential Revision: D5193526

Pulled By: maysamyabandeh

fbshipit-source-id: 16a21f18f68023f862764dd3ab9e00ca60b0eefa
2017-06-06 12:56:52 -07:00
hyunwoo
c7662a44a4 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2376

Differential Revision: D5183630

Pulled By: ajkr

fbshipit-source-id: 133cfd0445959e70aa2cd1a12151bf3c0c5c3ac5
2017-06-05 11:27:34 -07:00
Aaron Gao
7f6c02dda1 using ThreadLocalPtr to hide ROCKSDB_SUPPORT_THREAD_LOCAL from public…
Summary:
… headers

https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB.

We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term.

make check -j64
Closes https://github.com/facebook/rocksdb/pull/2380

Differential Revision: D5177896

Pulled By: lightmark

fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390
2017-06-02 17:26:19 -07:00
Yi Wu
ad19eb8686 Fixing blob db sequence number handling
Summary:
Blob db rely on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Writ()e no longer return sequence number in some cases. Fixing it by have WriteBatchInternal::InsertInto() always encode sequence number into write batch.

Stacking on #2375.
Closes https://github.com/facebook/rocksdb/pull/2385

Differential Revision: D5148358

Pulled By: yiwu-arbug

fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33
2017-05-31 10:56:45 -07:00
Yi Wu
345878a7fb update blob_db_test
Summary:
Re-enable blob_db_test with some update:
* Commented out delay at the end of GC tests. Will update the logic later with sync point to properly trigger GC.
* Added some helper functions.

Also update make files to include blob_dump tool.
Closes https://github.com/facebook/rocksdb/pull/2375

Differential Revision: D5133793

Pulled By: yiwu-arbug

fbshipit-source-id: 95470b26d0c1f9592ba4b7637e027fdd263f425c
2017-05-30 22:26:13 -07:00
Tamir Duberstein
103d0692ea Avoid unsupported attributes when not building with UBSAN
Summary:
yiwu-arbug see individual commits.
Closes https://github.com/facebook/rocksdb/pull/2318

Differential Revision: D5141520

Pulled By: yiwu-arbug

fbshipit-source-id: 7987c92ab4461eef36afce5a133d3a0ee0c96300
2017-05-30 11:13:01 -07:00
Sagar Vemuri
02594b5f11 Fix build errors in blob_dump_tool with GCC 4.8
Summary:
Fixing the build errors seen with GCC 4.8.1.
```
Makefile:105: Warning: Compiling in debug mode. Don't use the resulting binary in production
utilities/blob_db/blob_dump_tool.cc: In member function ‘rocksdb::Status rocksdb::blob_db::BlobDumpTool::DumpBlobLogFooter(uint64_t, uint64_t*)’:
utilities/blob_db/blob_dump_tool.cc:149:42: error: expected ‘)’ before ‘PRIu64’
   fprintf(stdout, "  Blob count     : %" PRIu64 "\n", footer.GetBlobCount());
                                          ^
utilities/blob_db/blob_dump_tool.cc:149:76: error: spurious trailing ‘%’ in format [-Werror=format=]
   fprintf(stdout, "  Blob count     : %" PRIu64 "\n", footer.GetBlobCount());
                                                                            ^
utilities/blob_db/blob_dump_tool.cc:149:76: error: too many arguments for format [-Werror=format-extra-args]
utilities/blob_db/blob_dump_tool.cc: In member function ‘rocksdb::Status rocksdb::blob_db::BlobDumpTool::DumpRecord(rocksdb::blob_db::BlobDumpTool::DisplayType, rocksdb::blob_db::BlobDumpTool::DisplayType, uint64_t*)’:
utilities/blob_db/blob_dump_tool.cc:161:49: error: expected ‘)’ before ‘PRIx64’
   fprintf(stdout, "Read record with offset 0x%" PRIx64 " (%" PRIu64 "):\n",
                                                 ^
utilities/blob_db/blob_dump_tool.cc:162:27: error: spurious trailing ‘%’ in format [-Werror=format=]
           *offset, *offset);
                           ^
utilities/blob_db/blob_dump_tool.cc:162:27: error: too many arguments for format [-Werror=format-extra-args]
utilities/blob_db/blob_dump_tool.cc:176:38: error: expected ‘)’ before ‘PRIu64’
   fprintf(stdout, "  blob size  : %" PRIu64 "\n", record.GetBlobSize());
                                      ^
utilities/blob_db/blob_dump_tool.cc:176:71: error: spurious trailing ‘%’ in format [-Werror=format=]
   fprintf(stdout, "  blob size  : %" PRIu64 "\n", record.GetBlobSize());
                                                                       ^
utilities/blob_db/blob_dump_tool.cc:176:71: error: too many arguments for format [-Werror=format-extra-args]
utilities/blob_db/blob_dump_tool.cc:178:38: error: expected ‘)’ before ‘PRIu64’
   fprintf(stdout, "  time       : %" PRIu64 "\n", record.GetTimeVal());
                                      ^
utilities/blob_db/blob_dump_tool.cc:178:70: error: spurious trailing ‘%’ in format [-Werror=format=]
   fprintf(stdout, "  time       : %" PRIu64 "\n", record.GetTimeVal());
                                                                      ^
utilities/blob_db/blob_dump_tool.cc:178:70: error: too many arguments for format [-Werror=format-extra-args]
utilities/blob_db/blob_dump_tool.cc:214:38: error: expected ‘)’ before ‘PRIu64’
   fprintf(stdout, "  sequence   : %" PRIu64 "\n", record.GetSN());
                                      ^
utilities/blob_db/blob_dump_tool.cc:214:65: error: spurious trailing ‘%’ in format [-Werror=format=]
   fprintf(stdout, "  sequence   : %" PRIu64 "\n", record.GetSN());
```
Closes https://github.com/facebook/rocksdb/pull/2359

Differential Revision: D5117684

Pulled By: sagar0

fbshipit-source-id: 7480346bcd96205fcae890927c5e68cf004e87be
2017-05-24 00:11:36 -07:00
Yi Wu
578fb0b1dc Simple blob file dumper
Summary:
A simple blob file dumper.
Closes https://github.com/facebook/rocksdb/pull/2242

Differential Revision: D5097553

Pulled By: yiwu-arbug

fbshipit-source-id: c6e00d949fcd3658f9f68da9352f06339fac418d
2017-05-23 10:42:59 -07:00
Aaron Gao
3e86c0f07c disable direct reads for log and manifest and add direct io to tests
Summary:
Disable direct reads for log and manifest. Direct reads should not affect sequential_file
Also add kDirectIO for option_config_ in db_test_util
Closes https://github.com/facebook/rocksdb/pull/2337

Differential Revision: D5100261

Pulled By: lightmark

fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b
2017-05-22 18:41:28 -07:00
Sagar Vemuri
228f49d20a Fix data races caught by tsan
Summary:
This fixes the tsan build failures in:
- write_callback_test
- persistent_cache_test.*
Closes https://github.com/facebook/rocksdb/pull/2339

Differential Revision: D5101190

Pulled By: sagar0

fbshipit-source-id: 537e19ed05272b1f34cfbf793aa822b2264a1643
2017-05-22 10:27:23 -07:00
Yi Wu
d746aead1a Suppress clang-analyzer false positive
Summary:
Fixing two types of clang-analyzer false positives:
* db is deleted and then reopen, and clang-analyzer thinks we are reusing the pointer after it has been deleted. Adding asserts to hint clang-analyzer the pointer is recreated.
* ParsedInternalKey is (intentionally) uninitialized. Initialize the struct only when clang-analyzer is running.
Closes https://github.com/facebook/rocksdb/pull/2334

Differential Revision: D5093801

Pulled By: yiwu-arbug

fbshipit-source-id: f51355382098eb3da5ab9f64e094c6d03e6bdf7d
2017-05-19 10:56:28 -07:00
yizhu.sun
f5ba131bf8 Fixed some spelling mistakes
Summary: Closes https://github.com/facebook/rocksdb/pull/2314

Differential Revision: D5079601

Pulled By: sagar0

fbshipit-source-id: ae5696fd735718f544435c64c3179c49b8c04349
2017-05-17 23:12:36 -07:00
hyunwoo
0ebdd70579 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2312

Differential Revision: D5079631

Pulled By: sagar0

fbshipit-source-id: e4c8d1d89b244ee69e9dea1dd013227cc5241026
2017-05-17 16:41:49 -07:00
Yi Wu
445f1235bf s/std::snprintf/snprintf
Summary:
Looks like std::snprintf is not available on all platforms (e.g. MSVC 2010). Change it back to snprintf, where we have a macro in port.h to workaround compatibility.
Closes https://github.com/facebook/rocksdb/pull/2308

Differential Revision: D5070988

Pulled By: yiwu-arbug

fbshipit-source-id: bedfc1660bab0431c583ad434b7e68265e1211b1
2017-05-16 12:01:04 -07:00
Yi Wu
86d5492530 Fix build error with blob DB.
Summary:
snprintf is in <stdio.h> and not in namespace std.
Closes https://github.com/facebook/rocksdb/pull/2287

Reviewed By: anirbanr-fb

Differential Revision: D5054752

Pulled By: yiwu-arbug

fbshipit-source-id: 356807ec38f3c7d95951cdb41f31a3d3ae0714d4
2017-05-15 14:05:46 -07:00
Andrew Kryczka
3fa9a39c68 Add GetAllKeyVersions API
Summary:
- Introduced an include/ file dedicated to db-related debug functions to avoid making db.h more complex
- Added debugging function, `GetAllKeyVersions()`, to return a listing of internal data for a range of user keys. The new `struct KeyVersion` exposes data similar to internal key without exposing any internal type.
- Migrated the "ldb idump" subcommand to use this function
- The API takes an inclusive-exclusive range to match behavior of "ldb idump". This will be quite annoying for users who want to query a single user key's versions :(.
Closes https://github.com/facebook/rocksdb/pull/2232

Differential Revision: D4976007

Pulled By: ajkr

fbshipit-source-id: cab375da53a7595d6575af2b7e3b776aa3ad793e
2017-05-12 15:54:06 -07:00
Anirban Rahut
d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
siddontang
b551104e04 support PopSavePoint for WriteBatch
Summary:
Try to fix https://github.com/facebook/rocksdb/issues/1969
Closes https://github.com/facebook/rocksdb/pull/2170

Differential Revision: D4907333

Pulled By: yiwu-arbug

fbshipit-source-id: 417b420ff668e6c2fd0dad42a94c57385012edc5
2017-05-03 10:57:45 -07:00
Yi Wu
da4b2070b3 Fix WriteBatchWithIndex address use after scope error
Summary:
Fix use after scope error caught by ASAN.
Closes https://github.com/facebook/rocksdb/pull/2228

Differential Revision: D4968028

Pulled By: yiwu-arbug

fbshipit-source-id: a2a266c98634237494ab4fb2d666bc938127aeb2
2017-04-28 13:12:10 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Dmitri Smirnov
cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Andrew Kryczka
e5e545a021 Reunite checkpoint and backup core logic
Summary:
These code paths forked when checkpoint was introduced by copy/pasting the core backup logic. Over time they diverged and bug fixes were sometimes applied to one but not the other (like fix to include all relevant WALs for 2PC), or it required extra effort to fix both (like fix to forge CURRENT file). This diff reunites the code paths by extracting the core logic into a function, CreateCustomCheckpoint(), that is customizable via callbacks to implement both checkpoint and backup.

Related changes:

- flush_before_backup is now forcibly enabled when 2PC is enabled
- Extracted CheckpointImpl class definition into a header file. This is so the function, CreateCustomCheckpoint(), can be called by internal rocksdb code but not exposed to users.
- Implemented more functions in DummyDB/DummyLogFile (in backupable_db_test.cc) that are used by CreateCustomCheckpoint().
Closes https://github.com/facebook/rocksdb/pull/1932

Differential Revision: D4622986

Pulled By: ajkr

fbshipit-source-id: 157723884236ee3999a682673b64f7457a7a0d87
2017-04-24 15:06:46 -07:00
Maysam Yabandeh
4c9447d889 Add erase option to release cache
Summary:
This is useful when we put the entries in the block cache for accounting
purposes and do not expect it to be used after it is released. If the cache does not
erase the item in such cases not only the performance of cache is
negatively affected but the item's destructor not being called at the
time of release might violate the assumptions about the lifetime of the
object.

The new change adds a force_erase option to the Release method and
returns a boolean to indicate whehter the item is successfully deleted.
Closes https://github.com/facebook/rocksdb/pull/2180

Differential Revision: D4916032

Pulled By: maysamyabandeh

fbshipit-source-id: 94409a346069923cac9de8e57adc313b4ed46f28
2017-04-24 11:28:36 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Siying Dong
7534ba7bde StackableDB should pass ResetStats()
Summary: Closes https://github.com/facebook/rocksdb/pull/2190

Differential Revision: D4922688

Pulled By: siying

fbshipit-source-id: eaa3d122f8d389ae0508ec8b61f7780fd8b0a7ef
2017-04-20 16:11:56 -07:00
Andrew Kryczka
df74b775e6 Limit backups opened
Summary:
This was requested by a customer who wants to proactively monitor whether any valid backups are available. The existing performance was poor because Open() serially reads every small meta-file (one per backup), which was slow on HDFS.

Now we only read the minimum number of meta-files to find `max_valid_backups_to_open` valid backups. The customer mentioned above can just set it to one.
Closes https://github.com/facebook/rocksdb/pull/2151

Differential Revision: D4882564

Pulled By: ajkr

fbshipit-source-id: cb0edf9e8ac693e4d5f24902e725a011ed8c0c2f
2017-04-19 13:26:47 -07:00
Siying Dong
ca96654d85 Change Build Env to gcc-5
Summary:
Default to build using gcc-5. Only apply to Facebook-only environments.
Closes https://github.com/facebook/rocksdb/pull/2158

Differential Revision: D4887568

Pulled By: siying

fbshipit-source-id: 53496c9af3273ccd44441bd0bef9d29beefbc00b
2017-04-14 11:12:56 -07:00
Manuel Ung
9300ef5455 Fix shared lock upgrades
Summary:
Upgrading a shared lock was silently succeeding because the actual locking code was skipped. This is because if the keys are tracked, it is assumed that they are already locked and do not require locking. Fix this by recording in tracked keys whether the key was locked exclusively or not.

Note that lock downgrades are impossible, which is the behaviour we expect.

This fixes facebook/mysql-5.6#587.
Closes https://github.com/facebook/rocksdb/pull/2122

Differential Revision: D4861489

Pulled By: IslamAbdelRahman

fbshipit-source-id: 58c7ebe7af098bf01b9774b666d3e9867747d8fd
2017-04-10 16:06:00 -07:00
Manuel Ung
1f8b119ed6 Limit maximum memory used in the WriteBatch representation
Summary:
Extend TransactionOptions to include max_write_batch_size which determines the maximum size of the writebatch representation. If memory limit is exceeded, the operation will abort with subcode kMemoryLimit.
Closes https://github.com/facebook/rocksdb/pull/2124

Differential Revision: D4861842

Pulled By: lth

fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161
2017-04-10 15:42:26 -07:00
Sagar Vemuri
7124268a09 Reduce the number of params needed to construct DBIter
Summary:
DBIter, and in-turn NewDBIterator and NewArenaWrappedDBIterator, take a  bunch of params. They can be reduced by passing in ReadOptions directly instead of passing in every new param separately. It also seems much cleaner as a bunch of the params towards the end seem to be optional.

(Recently I introduced max_skippable_internal_keys, which added one more to the already huge count).

Idea courtesy IslamAbdelRahman
Closes https://github.com/facebook/rocksdb/pull/2116

Differential Revision: D4857128

Pulled By: sagar0

fbshipit-source-id: 7d239df094b94bd9ea79d145cdf825478ac037a8
2017-04-10 11:14:14 -07:00
Sagar Vemuri
343b59d6ee Move various string utility functions into string_util
Summary:
This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.

Check the sub-commits for easier reviewing.
Closes https://github.com/facebook/rocksdb/pull/2094

Differential Revision: D4837730

Pulled By: sagar0

fbshipit-source-id: 344278a
2017-04-06 14:54:12 -07:00
Yi Wu
df6f5a3772 Move memtable related files into memtable directory
Summary:
Move memtable related files into memtable directory.
Closes https://github.com/facebook/rocksdb/pull/2087

Differential Revision: D4829242

Pulled By: yiwu-arbug

fbshipit-source-id: ca70ab6
2017-04-06 14:09:13 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Andrew Kryczka
d659faad54 Level-based L0->L0 compaction
Summary:
Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.

- L0->L0 is triggered when base level is unavailable due to pending compactions
- L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
- Subcompactions are disabled for L0->L0 since we want to output one file.
- Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0.
Closes https://github.com/facebook/rocksdb/pull/2027

Differential Revision: D4760318

Pulled By: ajkr

fbshipit-source-id: 9d07183
2017-04-04 18:09:11 -07:00
Andrew Kryczka
e2c6c06366 add TimedEnv
Summary:
I've needed Env timing measurements a few times now, so finally built something for it.
Closes https://github.com/facebook/rocksdb/pull/2073

Differential Revision: D4811231

Pulled By: ajkr

fbshipit-source-id: 218a249
2017-04-04 11:24:12 -07:00