6682 Commits

Author SHA1 Message Date
Quinn Jarrell
6a541afcc4 Make bytes_per_sync and wal_bytes_per_sync mutable
Summary:
Moves the bytes_per_sync and wal_bytes_per_sync options from immutable options to mutable options. Also, if wal_bytes_per_sync is changed, the WAL file and memtables are flushed.

Test Plan:
Ran make check; all tests passed.

Two new tests, SetBytesPerSync and SetWalBytesPerSync, check that after issuing setoptions with a new value for the option, the DB options have the new value.
Closes https://github.com/facebook/rocksdb/pull/2893

Reviewed By: yiwu-arbug

Differential Revision: D5845814

Pulled By: TheRushingWookie

fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd
2017-09-27 17:49:45 -07:00
Yi Wu
ec48e5c77f Add TransactionDB::SingleDelete()
Summary:
Looks like the API is simply missing. Adding it.
Closes https://github.com/facebook/rocksdb/pull/2937

Differential Revision: D5919955

Pulled By: yiwu-arbug

fbshipit-source-id: 6e2e9c96c29882b0bb4113d1f8efb72bffc57878
2017-09-27 10:27:26 -07:00
Sagar Vemuri
0806801dc8 DestroyDB API
Summary:
Expose DestroyDB API in RocksJava.
Closes https://github.com/facebook/rocksdb/pull/2934

Differential Revision: D5914775

Pulled By: sagar0

fbshipit-source-id: 84af6ea0d2bccdcfb9fe8c07b2f87373f0d5bab6
2017-09-26 16:42:11 -07:00
Maysam Yabandeh
aa67bae6cf Break down PinnedDataIteratorRandomized
Summary:
It's timing out under TSAN.
Closes https://github.com/facebook/rocksdb/pull/2928

Differential Revision: D5911766

Pulled By: maysamyabandeh

fbshipit-source-id: 2faacc07752ac8713a3a2abb5a4c4b7ae3bdf208
2017-09-26 14:27:30 -07:00
Siying Dong
4748911357 Add LogDevice to USERS.md
Summary: Closes https://github.com/facebook/rocksdb/pull/2927

Differential Revision: D5906613

Pulled By: siying

fbshipit-source-id: 607401e05b27508c816c700864fe81514606e4ef
2017-09-25 15:56:40 -07:00
Zhongyi Xie
1d6700f9e6 Add test kPointInTimeRecoveryCFConsistency
Summary:
Context/problem:

- CFs may be flushed at different times
- A WAL can only be deleted after all CFs have flushed beyond end of that WAL.
- Point-in-time recovery might stop upon reaching the first corruption.
- Some CFs may have already flushed beyond that point, while others haven't. We should fail the Open() instead of proceeding with inconsistent CFs.
Closes https://github.com/facebook/rocksdb/pull/2900
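
The consistency condition described above can be sketched as follows. This is a minimal illustration, not RocksDB's recovery code: `CfState`, `CheckPointInTimeConsistency`, and the sequence-number fields are hypothetical names introduced here.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-column-family state: the largest sequence number that
// has already been flushed to SST files for this CF.
struct CfState {
  std::string name;
  uint64_t flushed_seq;
};

// Returns true if stopping WAL replay at `recovered_seq` (the last sequence
// number read before the first corruption) leaves every CF consistent.
// A CF that flushed beyond the stop point already holds newer data than the
// other CFs can recover, so the caller should fail Open() in that case.
bool CheckPointInTimeConsistency(const std::vector<CfState>& cfs,
                                 uint64_t recovered_seq) {
  for (const CfState& cf : cfs) {
    if (cf.flushed_seq > recovered_seq) {
      return false;
    }
  }
  return true;
}
```

For example, with replay stopping at sequence 20, a CF flushed up to 30 makes the check fail, while CFs flushed up to 10 and 20 pass.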

Differential Revision: D5863281

Pulled By: miasantreble

fbshipit-source-id: 180dbaf83d96c804cff49b3c406312a4ae61313e
2017-09-22 17:26:36 -07:00
Yi Wu
be97dbb15c Fix WritePreparedTransactionTest::SeqAdvanceTest ASAN failure
Summary: Closes https://github.com/facebook/rocksdb/pull/2922

Differential Revision: D5895310

Pulled By: yiwu-arbug

fbshipit-source-id: 52c635a25d22478ec1eca49b6817551202babac2
2017-09-22 15:26:42 -07:00
Andrew Kryczka
4708a6875c Repair DBs with trailing slash in name
Summary:
Problem:

- `DB::SanitizeOptions` strips trailing slash from `wal_dir` but not `dbname`
- We check whether `wal_dir` and `dbname` refer to the same directory using string equality: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L258
- Providing `dbname` with trailing slash causes default `wal_dir` to be misidentified as a separate directory.
- Then the repair tries to add all SST files to the `VersionEdit` twice (once for `dbname` dir, once for `wal_dir`) and fails with coredump.

Solution:

- Add a new `Env` function, `AreFilesSame`, which uses device and inode number to check whether files are the same. It's currently only implemented in `PosixEnv`.
- Migrate repair to use `AreFilesSame` to check whether `dbname` and `wal_dir` are same. If unsupported, falls back to string comparison.
Closes https://github.com/facebook/rocksdb/pull/2827
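
A device-and-inode comparison of the kind described above might look like the sketch below on POSIX systems. It uses the standard `stat` call with the `st_dev` and `st_ino` fields; the function name `SameFileByInode` and its signature are hypothetical and are not the signature of `Env::AreFilesSame`.

```cpp
#include <sys/stat.h>

#include <string>

// Compares two paths by device and inode number instead of by string.
// Returns false if either path cannot be stat'ed; otherwise returns true
// and stores the comparison result in *same.
bool SameFileByInode(const std::string& first, const std::string& second,
                     bool* same) {
  struct stat first_stat;
  struct stat second_stat;
  if (stat(first.c_str(), &first_stat) != 0 ||
      stat(second.c_str(), &second_stat) != 0) {
    return false;
  }
  *same = first_stat.st_dev == second_stat.st_dev &&
          first_stat.st_ino == second_stat.st_ino;
  return true;
}
```

With this check, a path and the same path with a trailing slash compare as the same directory, which a plain string comparison would miss.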

Differential Revision: D5761349

Pulled By: ajkr

fbshipit-source-id: c839d548678b742af1166d60b09abd94e5476238
2017-09-22 12:42:22 -07:00
Andrew Kryczka
fc7476bec1 fix populating range deletions in forward iterator
Summary:
fixes #2902
Closes https://github.com/facebook/rocksdb/pull/2917

Differential Revision: D5887175

Pulled By: ajkr

fbshipit-source-id: 364e292c636a3238bfc53b0fb9a01ff2f82dcbb9
2017-09-21 17:56:38 -07:00
Sagar Vemuri
c8f3606731 Expose LoadLatestOptions, LoadOptionsFromFile and GetLatestOptionsFileName APIs in RocksJava
Summary:
JNI wrappers for LoadLatestOptions, LoadOptionsFromFile and GetLatestOptionsFileName APIs.
Closes https://github.com/facebook/rocksdb/pull/2898

Differential Revision: D5857934

Pulled By: sagar0

fbshipit-source-id: 68b79e83eab8de9416e3f1fef73e11cf7947e90a
2017-09-21 17:29:13 -07:00
Sagar Vemuri
96a13b4f4b Use jemalloc in rocksdbjni library built via vagrant
Summary:
Problem:
During RocksJava performance testing we found that the rocksdb jni library was not being built with jemalloc; instead it was built with the default glibc malloc. We saw quite a bit of memory bloat due to this.

Addressed this by installing the jemalloc-devel package in the VM that we use to build release jars.
Closes https://github.com/facebook/rocksdb/pull/2916

Differential Revision: D5887018

Pulled By: sagar0

fbshipit-source-id: ace0b5d60234b3a30dcd5d39633e7827a5982a50
2017-09-21 16:42:06 -07:00
PhaniShekhar
65a9cd6168 Use L1 size as estimate for L0 size in LevelCompactionBuilder::GetPathID
Summary:
Fix for [2461](https://github.com/facebook/rocksdb/issues/2461).

Problem: When using multiple db_paths setting with RocksDB, RocksDB incorrectly calculates the size of L1 in LevelCompactionBuilder::GetPathId.

max_bytes_for_level_base is used as L0 size and L1 size is calculated as (L0 size * max_bytes_for_level_multiplier). However, L1 size should be max_bytes_for_level_base.

Solution: Use max_bytes_for_level_base as L1 size. Also, use L1 size as the estimated size of L0.
Closes https://github.com/facebook/rocksdb/pull/2903
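
The corrected size estimate can be sketched as below. This is an illustration under stated assumptions rather than the code in `LevelCompactionBuilder`: the function name `EstimateLevelSizes` is hypothetical, and only the two option names come from the description above.

```cpp
#include <cstdint>
#include <vector>

// Returns estimated sizes for levels 0..num_levels-1.
// L1 is max_bytes_for_level_base, each deeper level is the previous level
// times the multiplier, and L0 is estimated to be the same size as L1.
std::vector<uint64_t> EstimateLevelSizes(uint64_t max_bytes_for_level_base,
                                         double max_bytes_for_level_multiplier,
                                         int num_levels) {
  std::vector<uint64_t> sizes;
  if (num_levels <= 0) {
    return sizes;
  }
  sizes.push_back(max_bytes_for_level_base);  // L0 estimate: reuse L1 size
  uint64_t current = max_bytes_for_level_base;  // L1 size
  for (int level = 1; level < num_levels; ++level) {
    sizes.push_back(current);
    current = static_cast<uint64_t>(current * max_bytes_for_level_multiplier);
  }
  return sizes;
}
```

With a base of 256 and a multiplier of 10, the estimates for L0 through L3 are 256, 256, 2560, and 25600.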

Differential Revision: D5885442

Pulled By: maysamyabandeh

fbshipit-source-id: 036da1c9298d173b9b80479cc6661ee4b7a951f6
2017-09-21 15:57:58 -07:00
Andrew Kryczka
8fc3de3c62 make rate limiter a general option
Summary:
it's unsupported in options file, so the flag should be respected by db_bench even when an options file is provided.
Closes https://github.com/facebook/rocksdb/pull/2910

Differential Revision: D5869836

Pulled By: ajkr

fbshipit-source-id: f67f591ae083e95e989f86b6fad50765d2e3d855
2017-09-21 11:11:00 -07:00
Yi Wu
1480e6f7cf Fix TransactionTest::SeqAdvanceTest ASAN failure
Summary:
The test didn't delete txn before creating a new one.
Closes https://github.com/facebook/rocksdb/pull/2913

Differential Revision: D5880236

Pulled By: yiwu-arbug

fbshipit-source-id: 7a4fcaada3d86332292754502cd8f4341143bf4f
2017-09-21 09:56:54 -07:00
Sagar Vemuri
3fc08fa88e Expose max_background_jobs option in RocksJava
Summary:
This option was introduced in the C++ API in RocksDB 5.6 in bb01c1880c. It is now exposed through the RocksJava API.
Closes https://github.com/facebook/rocksdb/pull/2908

Differential Revision: D5864224

Pulled By: sagar0

fbshipit-source-id: 140aa55dcf74b14e4d11219d996735c7fdddf513
2017-09-20 10:26:37 -07:00
Yao Zongyou
8ae81684e9 Update cmake_minimum_required to 2.8.12.
Summary:
Hello,

current master branch declares cmake_minimum_required (VERSION 2.8.11)
but cmake gives the following error:

[  6%] CMake Error at CMakeLists.txt:658 (install):
  install TARGETS given unknown argument "INCLUDES".

CMake Error at src/CMakeLists.txt:658 (install): install TARGETS given unknown argument "INCLUDES".

because this argument is not supported on CMake versions prior to 2.8.12
Closes https://github.com/facebook/rocksdb/pull/2904

Differential Revision: D5863430

Pulled By: yiwu-arbug

fbshipit-source-id: 0f7230e080add472ad4b87836b3104ea0b971a38
2017-09-19 12:01:09 -07:00
Yi Wu
b4596c6174 Fix Get does not return super version on error
Summary:
This is caught when I was testing #2886.
Closes https://github.com/facebook/rocksdb/pull/2907

Differential Revision: D5863153

Pulled By: yiwu-arbug

fbshipit-source-id: 8c54759ba1a0dc101f24ab50423e35731300612d
2017-09-19 12:01:09 -07:00
Orgad Shaneh
34ebadf930 Fix MinGW build
Summary:
snprintf is defined as _snprintf, which doesn't exist in the std
namespace.
Closes https://github.com/facebook/rocksdb/pull/2298

Differential Revision: D5070457

Pulled By: yiwu-arbug

fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
2017-09-19 10:28:26 -07:00
Pengchao Wang
e4234fbdcf collecting kValue type tombstone
Summary:
In our testing cluster, we found that a large number of tombstones had been promoted from kMerge to kValue type after reaching the top level of compaction. Since we used to collect tombstones only in the merge operator, those tombstones could never be collected.

This PR addresses the issue by adding a GC step in the compaction filter, which applies only to kValue type records. Since those records have already reached the top of compaction (no earlier data exists), we can safely remove them in the compaction filter without worrying about old data reappearing.

This PR also removes an old optimization in the Cassandra merge operator for single merge operands. We need to do GC even on a single operand, so the optimization no longer makes sense.
Closes https://github.com/facebook/rocksdb/pull/2855

Reviewed By: sagar0

Differential Revision: D5806445

Pulled By: wpc

fbshipit-source-id: 6eb25629d4ce917eb5e8b489f64a6aa78c7d270b
2017-09-18 16:27:12 -07:00
Maysam Yabandeh
60beefd6e0 WritePrepared Txn: Advance seq one per batch
Summary:
By default, the seq number in the DB is increased once per written key. WritePrepared txns require the seq to be increased once per entire batch, so that the seq can be used as the prepare timestamp by which the transaction is identified. We also need to increase the seq for the commit marker, since it gives a unique id to the commit timestamp of transactions.

Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch.
Closes https://github.com/facebook/rocksdb/pull/2885
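
The difference between the two sequence-advance policies can be illustrated with a small counting sketch. The function `LastSeqAfterWrites` and its parameters are hypothetical; the sketch only models how far the sequence number moves and leaves out the commit marker.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the last sequence number after writing the given batches,
// starting from `start_seq`. keys_per_batch[i] is the number of keys in
// batch i. When `one_seq_per_batch` is false, every key consumes one
// sequence number; when it is true, each batch consumes exactly one.
uint64_t LastSeqAfterWrites(uint64_t start_seq,
                            const std::vector<std::size_t>& keys_per_batch,
                            bool one_seq_per_batch) {
  uint64_t seq = start_seq;
  for (std::size_t keys : keys_per_batch) {
    seq += one_seq_per_batch ? 1 : static_cast<uint64_t>(keys);
  }
  return seq;
}
```

Starting at 100 with batches of 3 and 2 keys, the per-key policy ends at 105 and the per-batch policy ends at 102.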

Differential Revision: D5837843

Pulled By: maysamyabandeh

fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c
2017-09-18 14:45:08 -07:00
Maysam Yabandeh
c57050b770 Use the default copy constructor in Options
Summary:
Our current implementation of the (semi-)copy constructor of DBOptions and ColumnFamilyOptions seems to intend a value-by-value copy, which is what the default copy constructor does anyway. Moreover, not using the default copy constructor risks forgetting to copy newly added options.

As an example, allow_2pc seems to have been forgotten in the copy constructor, which caused one of the unit tests not to see its effect.
Closes https://github.com/facebook/rocksdb/pull/2888
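
The risk described above can be reproduced with two small structs. Both are hypothetical stand-ins, not the real `DBOptions`; only the `allow_2pc` field name is taken from the description.

```cpp
// Hand-written copy constructor: a member added later is easy to forget.
struct HandCopiedOptions {
  bool create_if_missing = false;
  bool allow_2pc = false;

  HandCopiedOptions() = default;
  // allow_2pc is not copied here, so the copy silently keeps the default.
  HandCopiedOptions(const HandCopiedOptions& other)
      : create_if_missing(other.create_if_missing) {}
};

// No user-declared copy constructor: the compiler-generated one copies
// every member, including any that are added in the future.
struct DefaultCopiedOptions {
  bool create_if_missing = false;
  bool allow_2pc = false;
};
```

Copying a `HandCopiedOptions` with `allow_2pc` set to true yields a copy where it is false, while the same copy of `DefaultCopiedOptions` preserves the value.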

Differential Revision: D5846368

Pulled By: maysamyabandeh

fbshipit-source-id: 1ee92a2aeae93886754b7bc039c3411ea2458683
2017-09-15 17:15:10 -07:00
Siying Dong
c319792059 Directly reference perf_context internally.
Summary:
After 7f6c02dda1, the same get_perf_context() is called both internally and externally. However, I found that internally it does not get inlined. I don't know why this is the case, but directly referencing perf_context is the logical thing to do.
Closes https://github.com/facebook/rocksdb/pull/2892

Differential Revision: D5843789

Pulled By: siying

fbshipit-source-id: b49777d8809f35847699291bb7f8ea2754c3af49
2017-09-15 17:15:10 -07:00
Yi Wu
6b3c71f6ed Fix DBImpl::NotifyOnCompactionCompleted data race
Summary:
Access of `cfd->current()` needs to hold db mutex. The data race is caught by TSAN but hard to reproduce: https://gist.github.com/yiwu-arbug/0fc6dc0de915297a1740aa9610be9373
Closes https://github.com/facebook/rocksdb/pull/2894

Differential Revision: D5843884

Pulled By: yiwu-arbug

fbshipit-source-id: 0a30a421bc96f51840821538ad6453dc0815a942
2017-09-15 11:56:31 -07:00
Yi Wu
f47b4eeb1e Fix memory leak in OptionsTest::OptionsComposeDecompose
Summary:
Fixing asan error.
Closes https://github.com/facebook/rocksdb/pull/2887

Differential Revision: D5838895

Pulled By: yiwu-arbug

fbshipit-source-id: 1662ce9856eb5e6877675347dc2240f2acb6fae8
2017-09-15 11:37:37 -07:00
Ben Clay
382277d0fe JNI support for ReadOptions::iterate_upper_bound
Summary:
Plumbed ReadOptions::iterate_upper_bound through JNI.

Made the following design choices:
* Used Slice instead of AbstractSlice due to the anticipated usecase (key / key prefix). Can change this if anyone disagrees.
* Used Slice instead of raw byte[] which seemed cleaner but necessitated the package-private handle-based Slice constructor. Followed WriteBatch as an example.
* We need a copy constructor for ReadOptions, as we create one base ReadOptions for a particular usecase and clone -> change the iterate_upper_bound on each slice operation. Shallow copy seemed cleanest.
* Hold a reference to the upper bound slice on ReadOptions, in contrast to Snapshot.

Signed a Facebook CLA this morning.
Closes https://github.com/facebook/rocksdb/pull/2872

Differential Revision: D5824446

Pulled By: sagar0

fbshipit-source-id: 74fc51313a10a81ecd348625e2a50ca5b7766888
2017-09-14 18:28:20 -07:00
Siying Dong
edcbb36944 Three code-level optimizations to Iterator::Next()
Summary:
Three small optimizations:
(1) iter_->IsKeyPinned() shouldn't be called if read_options.pin_data is not true. This may trigger function calls all the way down the iterator tree.
(2) Reuse the iterator key object in DBIter::FindNextUserEntryInternal(). The constructor of the class has some overhead.
(3) Move the switching direction logic in MergingIterator::Next() to a separate function.

These three in total improve readseq performance by about 3% in my benchmark setting.
Closes https://github.com/facebook/rocksdb/pull/2880

Differential Revision: D5829252

Pulled By: siying

fbshipit-source-id: 991aea10c6d6c3b43769cb4db168db62954ad1e3
2017-09-14 17:57:31 -07:00
Siying Dong
885b1c682e Two small refactorings for better inlining
Summary:
Move uncommon code paths in RangeDelAggregator::ShouldDelete() and IterKey::EnlargeBufferIfNeeded() to separate functions, so that the inlined structure can be better optimized.

These are optimized because they show up in CPU profiling, though only minimally. The performance is really hard to measure. I ran db_bench with the readseq benchmark against an in-memory DB many times. The variation is big, but it seems to show a 1% improvement.
Closes https://github.com/facebook/rocksdb/pull/2877

Differential Revision: D5828123

Pulled By: siying

fbshipit-source-id: 41a49e229f91e9f8409f85cc6f0dc70e31334e4b
2017-09-14 15:41:49 -07:00
Oleksandr Anyshchenko
ffac68367f Added save points for transactions C API
Summary:
Added the ability to set save points in transactions and then roll back to them.
Closes https://github.com/facebook/rocksdb/pull/2876

Differential Revision: D5825829

Pulled By: yiwu-arbug

fbshipit-source-id: 62168992340bbcddecdaea3baa2a678475d1429d
2017-09-14 14:18:59 -07:00
Yi Wu
9a970c81af Fix WriteBatchWithIndex::GetFromBatchAndDB not allowing StackableDB
Summary: Closes https://github.com/facebook/rocksdb/pull/2881

Differential Revision: D5829682

Pulled By: yiwu-arbug

fbshipit-source-id: abb8fa14b58cea7c416282f9be19e8b1a7961c6e
2017-09-13 17:26:35 -07:00
Yi Wu
a843df668b Fix use-after-free in c_test
Summary:
Fix ASAN error introduced by #2823
Closes https://github.com/facebook/rocksdb/pull/2879

Differential Revision: D5828454

Pulled By: yiwu-arbug

fbshipit-source-id: 50777855667f4e7b634279a654c3bfa01a1ac729
2017-09-13 16:12:02 -07:00
Sagar Vemuri
2d6e42122b Remove 'experimental' comment around level_compaction_dynamic_level_bytes option
Summary:
Remove misleading 'experimental' comment around `level_compaction_dynamic_level_bytes` option. This is not experimental anymore and is ready for wider adoption. MyRocks is already using it in production.
Closes https://github.com/facebook/rocksdb/pull/2878

Differential Revision: D5828890

Pulled By: sagar0

fbshipit-source-id: fffb45f4999f689b7eca326e4f4caf472d40c5a9
2017-09-13 15:56:24 -07:00
Andrew Kryczka
464fb36de9 fix hanging after CompactFiles with L0 overlap
Summary:
Bug report: https://www.facebook.com/groups/rocksdb.dev/permalink/1389452781153232/

Non-empty `level0_compactions_in_progress_` was aborting `CompactFiles` after incrementing `bg_compaction_scheduled_`, and in that case we never decremented it. This blocked future compactions and prevented DB close as we wait for scheduled compactions to finish/abort during close.

I eliminated `CompactFiles`'s dependency on `level0_compactions_in_progress_`. Since it takes a contiguous span of L0 files -- through the last L0 file if any L1+ files are included -- it's fine to run in parallel with other compactions involving L0. We make the same assumption in intra-L0 compaction.
Closes https://github.com/facebook/rocksdb/pull/2849

Differential Revision: D5780440

Pulled By: ajkr

fbshipit-source-id: 15b15d3faf5a699aed4b82a58352d4a7bb23e027
2017-09-13 15:41:38 -07:00
Maysam Yabandeh
09713a64b3 WritePrepared Txn: Lock-free CommitMap
Summary:
We had two proposals for lock-free commit maps. This patch implements the latter one that was simpler. We can later experiment with both proposals.

In this impl each entry is an std::atomic of uint64_t, which are accessed via memory_order_acquire/release. In x86_64 arch this is compiled to simple reads and writes from memory.
Closes https://github.com/facebook/rocksdb/pull/2861
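
The idea of entries that are each a `std::atomic<uint64_t>` accessed with acquire/release ordering can be sketched as below. This is not the CommitMap from the patch: the class `CommitCacheSketch`, the modulo indexing by prepare sequence, and the use of 0 as the empty value are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size commit cache. Slot (prepare_seq % N) holds the
// commit sequence most recently published for a prepare sequence mapping
// to that slot; 0 means the slot is empty. A real implementation would
// also need to record which prepare sequence a slot belongs to.
template <std::size_t N>
class CommitCacheSketch {
 public:
  CommitCacheSketch() {
    for (auto& slot : slots_) {
      slot.store(0, std::memory_order_relaxed);
    }
  }

  // Publishes commit_seq so that readers using acquire loads observe it.
  void Put(uint64_t prepare_seq, uint64_t commit_seq) {
    slots_[prepare_seq % N].store(commit_seq, std::memory_order_release);
  }

  // Reads the slot without taking any lock.
  uint64_t Get(uint64_t prepare_seq) const {
    return slots_[prepare_seq % N].load(std::memory_order_acquire);
  }

 private:
  std::atomic<uint64_t> slots_[N];
};
```

Because two prepare sequences can map to the same slot, a later `Put` overwrites an earlier one; in this sketch, `Put(3, 10)` followed by `Put(11, 20)` on an 8-slot cache leaves `Get(3)` returning 20.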

Differential Revision: D5800724

Pulled By: maysamyabandeh

fbshipit-source-id: 41abae9a4a5df050a8eb696c43de11c2770afdda
2017-09-13 12:12:11 -07:00
Oleksandr Anyshchenko
72e4190918 Additions for OptimisticTransactionDB in C API
Summary:
Added some bindings for `OptimisticTransactionDB` in C API
Closes https://github.com/facebook/rocksdb/pull/2823

Differential Revision: D5820672

Pulled By: yiwu-arbug

fbshipit-source-id: 7efd17f619cc0741feddd2050b8fc856f9288350
2017-09-13 12:12:11 -07:00
Andrew Kryczka
9d115d3689 regression test for missing init options
Summary:
test the `DBOptions(const Options&)` and `ColumnFamilyOptions(const Options&)` constructors. Actually this'll work better once we refactor `RandomInitDBOptions` / `RandomInitCFOptions` to use the authoritative sources of struct members: `db_options_type_info` / `cf_options_type_info` (internal task T21804189 for this).
Closes https://github.com/facebook/rocksdb/pull/2873

Differential Revision: D5817141

Pulled By: ajkr

fbshipit-source-id: 8567c20feced9d1751fdf1f4383e2af30f7e3591
2017-09-13 11:56:35 -07:00
gladiator
f615f5604b fix missing manual_wal_flush for DBOptions ctor
Summary:
Currently `ImmutableDBOptions::Dump` uses the default values for `concurrent_prepare` and `manual_wal_flush`, because the DBOptions ctor does not init those member variables.

So in the LOG file, it will be
```
             Options.concurrent_prepare: 0
             Options.manual_wal_flush: 0
```
Closes https://github.com/facebook/rocksdb/pull/2864

Differential Revision: D5816240

Pulled By: ajkr

fbshipit-source-id: 82335e8bcae3dceedc6a99224e7998de5fad1e50
2017-09-12 18:01:08 -07:00
Amy Xu
5785b1fcb8 Fix naming in InternalKey
Summary:
- Switched all instances of SetMinPossibleForUserKey and SetMaxPossibleForUserKey in accordance with InternalKeyComparator's comparison logic
Closes https://github.com/facebook/rocksdb/pull/2868

Differential Revision: D5804152

Pulled By: axxufb

fbshipit-source-id: 80be35e04f2e8abc35cc64abe1fecb03af24e183
2017-09-12 17:17:42 -07:00
Bernhard M. Wiedemann
82860bd55c Use cmake TIMESTAMP function
Summary:
because it is not only platform independent
but also allows overriding the build date.
This helps make Ceph builds reproducible (Ceph includes a fork of rocksdb in a submodule).

Also adds the UTC flag, to be independent of timezone.

Requires cmake-2.8.11+ from 2013
Closes https://github.com/facebook/rocksdb/pull/2848

Differential Revision: D5820189

Pulled By: yiwu-arbug

fbshipit-source-id: e3e8c1550e10e238c173f6c5d9ba15f71ad3ce28
2017-09-12 17:17:42 -07:00
Maysam Yabandeh
2d30aaae47 Exclude incompatible options in test
Summary:
options.enable_pipelined_write and options.concurrent_prepare are incompatible and should not be set together.
Closes https://github.com/facebook/rocksdb/pull/2875

Differential Revision: D5818358

Pulled By: maysamyabandeh

fbshipit-source-id: dad862508f00817ab302f8b61729accf38315fb8
2017-09-12 14:58:46 -07:00
Andrew Kryczka
f5148ade10 support opening zero backups during engine init
Summary:
There are internal users who open BackupEngine for writing new backups only, and they don't care whether old backups can be read or not. The condition `BackupableDBOptions::max_valid_backups_to_open == 0` should be supported (previously in df74b775e6 I made the mistake of choosing 0 as a special value to disable the limit).
Closes https://github.com/facebook/rocksdb/pull/2819

Differential Revision: D5751599

Pulled By: ajkr

fbshipit-source-id: e73ac19eb5d756d6b68601eae8e43407ee4f2752
2017-09-12 13:26:34 -07:00
Archit Mishra
3c42807794 do not call merge when checking to see if key exists
Summary:
Changes:
* added check for value before merge is called on code path that should check if key exists
Closes https://github.com/facebook/rocksdb/pull/2814

Reviewed By: IslamAbdelRahman

Differential Revision: D5743966

Pulled By: armishra

fbshipit-source-id: 6ac4283bc510c8ca50827d87ef0ba631f2b33b18
2017-09-12 12:02:53 -07:00
Andrew Kryczka
025b85b4ac speedup DBTest.EncodeDecompressedBlockSizeTest
Summary:
It sometimes takes more than 10 minutes (i.e., times out) on our internal CI, mainly because bzip is super slow, so I reduced the amount of work it tries to do.
Closes https://github.com/facebook/rocksdb/pull/2856

Differential Revision: D5795883

Pulled By: ajkr

fbshipit-source-id: e69f986ae60b44ecc26b6b024abd0f13bdf3a3c5
2017-09-12 11:26:47 -07:00
zawlazaw
044a71e27e Add iterator's SeekForPrev functionality to the java-api
Summary:
As discussed in #2742, this pull request brings the iterator's [SeekForPrev()](https://github.com/facebook/rocksdb/wiki/SeekForPrev) functionality to the java-api. It affects all locations in the code where previously only Seek() was supported.

All code changes are essentially a copy & paste of the already existing implementations for Seek().
**Please Note**: the changes to the C++ code were applied without fully understanding its effect, so please take a closer look. However, since Seek() and SeekForPrev() provide exactly the same signature, I do not expect any mistake here.

The java-tests are extended by new tests for the additional functionality.

Compilation (`make rocksdbjavastatic`) and test (`java/make test`) run without errors.
Closes https://github.com/facebook/rocksdb/pull/2747

Differential Revision: D5721011

Pulled By: sagar0

fbshipit-source-id: c1f951cddc321592c70dd2d32bc04892f3f119f8
2017-09-12 10:56:29 -07:00
Siying Dong
64b6452e0c Make InternalKeyComparator final and directly use it in merging iterator
Summary:
The merging iterator invokes InternalKeyComparator.Compare() frequently to perform the heap merge. By making InternalKeyComparator final and having the merging iterator use InternalKeyComparator directly rather than through the Iterator interface, we give the compiler a chance to avoid one more virtual function call if possible. I ran the readseq benchmark in a memory-only use case to make sure the performance at least doesn't regress.

I have to disable the final keyword in debug builds, as a hack test class depends on overriding the class.
Closes https://github.com/facebook/rocksdb/pull/2860

Differential Revision: D5800461

Pulled By: siying

fbshipit-source-id: ab876f22a09bb5c560740911412336e0e25ccb53
2017-09-11 12:04:21 -07:00
Siying Dong
2dd22e5449 Make DBIter class final
Summary:
DBIter is referenced in ArenaWrappedDBIter, which is a simple wrapper. If DBIter is final, some virtual function calls can be avoided. Some functions can even be inlined, like DBIter.value() to ArenaWrappedDBIter.value() and DBIter.key() to ArenaWrappedDBIter.key(). The performance gain is hard to measure. I just ran the memory-only benchmark for readseq and saw it didn't regress. There shouldn't be any harm in doing it; it just gives the compiler more choices.
Closes https://github.com/facebook/rocksdb/pull/2859
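
The effect of `final` on a thin wrapper can be sketched with stand-in classes. `IteratorSketch`, `DBIterSketch`, and `ArenaWrappedSketch` are hypothetical and much simpler than the real classes; whether a given compiler actually devirtualizes or inlines the call is not guaranteed.

```cpp
// Abstract interface with a virtual accessor.
class IteratorSketch {
 public:
  virtual ~IteratorSketch() = default;
  virtual int value() const = 0;
};

// Marked final: no class can override value() any further, so a call
// through a DBIterSketch pointer has exactly one possible target.
class DBIterSketch final : public IteratorSketch {
 public:
  explicit DBIterSketch(int v) : v_(v) {}
  int value() const override { return v_; }

 private:
  int v_;
};

// Thin wrapper that forwards to the concrete type rather than the
// interface, which permits the compiler to skip the virtual dispatch.
class ArenaWrappedSketch {
 public:
  explicit ArenaWrappedSketch(const DBIterSketch* db_iter)
      : db_iter_(db_iter) {}
  int value() const { return db_iter_->value(); }

 private:
  const DBIterSketch* db_iter_;
};
```

The observable behavior is unchanged: wrapping a `DBIterSketch` constructed with 7 still returns 7 from `value()`; only the dispatch cost may differ.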

Differential Revision: D5799888

Pulled By: siying

fbshipit-source-id: 829788f91310c40282dcfb7e412e6ef489931143
2017-09-11 12:04:21 -07:00
Huachao Huang
2a5915049e Fix missing BYTES_PER_WRITE for pipeline write
Summary: Closes https://github.com/facebook/rocksdb/pull/2862

Differential Revision: D5805638

Pulled By: yiwu-arbug

fbshipit-source-id: 72d38c74395690023a719f400daff01527645a17
2017-09-11 11:41:27 -07:00
Maysam Yabandeh
f46464d383 write-prepared txn: call IsInSnapshot
Summary:
This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
Closes https://github.com/facebook/rocksdb/pull/2850
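
The read-path check described above can be sketched as follows. The types `VersionSketch`, `ReadCallbackSketch`, and `MaxSeqCallback`, and the method name `IsVisible`, are hypothetical and are not claimed to match the ReadCallback class in the patch.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One stored version of a key, tagged with its sequence number.
struct VersionSketch {
  uint64_t seq;
  std::string value;
};

// Optional callback consulted for every candidate value on the read path.
class ReadCallbackSketch {
 public:
  virtual ~ReadCallbackSketch() = default;
  virtual bool IsVisible(uint64_t seq) const = 0;
};

// Example callback: accepts only sequence numbers up to a snapshot bound.
class MaxSeqCallback : public ReadCallbackSketch {
 public:
  explicit MaxSeqCallback(uint64_t max_seq) : max_seq_(max_seq) {}
  bool IsVisible(uint64_t seq) const override { return seq <= max_seq_; }

 private:
  uint64_t max_seq_;
};

// Scans versions from newest to oldest. If a callback is given and rejects
// a version, the reader moves on to the next one. Returns false if no
// version is accepted.
bool ReadNewestVisible(const std::vector<VersionSketch>& newest_first,
                       const ReadCallbackSketch* callback,
                       std::string* value) {
  for (const VersionSketch& version : newest_first) {
    if (callback != nullptr && !callback->IsVisible(version.seq)) {
      continue;
    }
    *value = version.value;
    return true;
  }
  return false;
}
```

With versions at sequences 30, 20, and 10, a callback bounded at 20 skips the newest version and returns the value written at 20, while a null callback returns the value written at 30.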

Differential Revision: D5787375

Pulled By: maysamyabandeh

fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
2017-09-11 09:14:48 -07:00
Maysam Yabandeh
9a4df72994 WritePrepared Txn: CommitBatch
Summary:
Implements CommitBatch and CommitWithoutPrepare for WritePreparedTxn
Closes https://github.com/facebook/rocksdb/pull/2854

Differential Revision: D5793999

Pulled By: maysamyabandeh

fbshipit-source-id: d8b9858221162c6ac7a1f6912cbd3481d0d8a503
2017-09-08 15:56:39 -07:00
Maysam Yabandeh
fce6c892ab Advance max evicted seq in coarser granularity
Summary:
This patch advances max_evicted_seq_ in larger granularities to reduce the overhead of updating the relevant data structures.

It also refactors the related code and adds tests for it. As part of this patch, some of the TODOs for removing usage of non-static const members are also addressed.
Closes https://github.com/facebook/rocksdb/pull/2844

Differential Revision: D5772928

Pulled By: maysamyabandeh

fbshipit-source-id: f4fcc2948be69c034f10812cf922ce5ab82ef98c
2017-09-08 14:41:22 -07:00
Yi Wu
dcd36a6aee Make it explicit blob db doesn't support CF
Summary:
Blob db doesn't currently support column families. Return NotSupported status explicitly.
Closes https://github.com/facebook/rocksdb/pull/2825

Differential Revision: D5757438

Pulled By: yiwu-arbug

fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa
2017-09-08 11:11:04 -07:00