6549 Commits

Author SHA1 Message Date
Yi Wu
d1cab2b64e Add ValueType::kTypeBlobIndex
Summary:
Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
1. Make it possible to open an existing rocksdb instance as blob db. Existing values will be of kTypeValue type, while values inserted by blob db will be of kTypeBlobIndex.
2. Make rocksdb able to detect if the db contains value written by blob db, if so return error.
3. Make it possible to have blob db optionally store value in SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).

The root db (DBImpl) basically pretends that kTypeBlobIndex entries are normal values on write. On Get, if is_blob is provided, it returns whether the value read is of kTypeBlobIndex type; if is_blob is not provided, it returns Status::NotSupported(). On scan, an allow_blob flag is passed; if the flag is true, whether the value is of kTypeBlobIndex type is returned via iter->IsBlob().

Changes on blob db side will be in a separate patch.
Closes https://github.com/facebook/rocksdb/pull/2886

Differential Revision: D5838431

Pulled By: yiwu-arbug

fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
2017-10-03 09:11:23 -07:00
Andrew Kryczka
880411f54c disable populating block cache for in-place updates
Summary:
There's no point populating the block cache during this read. The key we read is guaranteed to be overwritten with a new `kTypeValue` key immediately afterwards, so it can't be accessed again. A user was seeing high turnover of data blocks, at least partially due to this.
Closes https://github.com/facebook/rocksdb/pull/2959

Differential Revision: D5961672

Pulled By: ajkr

fbshipit-source-id: e7cb27c156c5db3b32af355c780efb99dbdf087c
2017-10-02 20:41:24 -07:00
Maysam Yabandeh
d27258d3a6 WritePrepared Txn: Rollback
Summary:
Implement the rollback of WritePrepared txns. For each modified value, it reads the value before the txn and writes it back. This cancels out the effect of the transaction. It also removes the rolled-back txn from the prepared heap.
Closes https://github.com/facebook/rocksdb/pull/2946

Differential Revision: D5937575

Pulled By: maysamyabandeh

fbshipit-source-id: a6d3c47f44db3729f44b287a80f97d08dc4e888d
2017-10-02 19:59:27 -07:00
Sagar Vemuri
bb38cd03a9 Limit number of merge operands in Cassandra merge operator
Summary:
Now that RocksDB supports conditional merging during point lookups (introduced in #2923), Cassandra value merge operator can be updated to pass in a limit. The limit needs to be passed in from the Cassandra code.
Closes https://github.com/facebook/rocksdb/pull/2947

Differential Revision: D5938454

Pulled By: sagar0

fbshipit-source-id: d64a72d53170d8cf202b53bd648475c3952f7d7f
2017-10-02 16:11:40 -07:00
Aliaksei Sandryhaila
cf51d3eb73 Remove an "unused" variable
Summary:
PR 2893 introduced a variable that is only used in TEST_SYNC_POINT_CALLBACK. When RocksDB is not built in debug mode, this method is not compiled in, and the variable is unused, which triggers a compiler error.

This patch reverts the corresponding part of #2893.
Closes https://github.com/facebook/rocksdb/pull/2956

Reviewed By: yiwu-arbug

Differential Revision: D5955679

Pulled By: asandryh

fbshipit-source-id: ac4a8e85b22da7f02efb117cd2e4a6e07ba73390
2017-10-02 15:26:29 -07:00
Adam Retter
983028f097 RocksJava build target for Docker on ppc64le
Summary:
This enables us to cross-build ppc64le RocksJava binaries with a suitably old version of glibc (2.17) on CentOS 7.
Closes https://github.com/facebook/rocksdb/pull/2491

Differential Revision: D5955301

Pulled By: sagar0

fbshipit-source-id: 69ef9746f1dc30ffde4063dc764583d8c7ae937e
2017-10-02 11:11:56 -07:00
Siying Dong
2a3363d52e ldb dump can print histogram of value size
Summary:
Make "ldb dump --count_only" print histogram of value size. Also, fix a bug that "ldb dump --path=<db_path>" doesn't work.
Closes https://github.com/facebook/rocksdb/pull/2944

Differential Revision: D5954527

Pulled By: siying

fbshipit-source-id: c620a444ec544258b8d113f5f663c375dd53d6be
2017-10-02 09:41:17 -07:00
Zhongyi Xie
593d3de371 No need for Restart Interval for meta blocks
Summary:
In SST files, restart interval helps us search in data blocks. However, some meta blocks will be read sequentially, so there's no need for restart points. Restart interval will introduce extra space in the block (https://github.com/facebook/rocksdb/blob/master/table/block_builder.cc#L80). We will see if we can remove this redundant space. (Maybe set restart interval to infinite.)
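To put a rough number on the extra space, here is a minimal sketch. It assumes a block ends with one 32-bit offset per restart point followed by a 32-bit restart count; that layout is an assumption made for illustration, and `RestartArrayBytes` is a hypothetical helper, not a RocksDB function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Bytes spent on the restart array of one block, under the assumed layout:
// one uint32 offset per restart point plus one uint32 holding the count.
// Hypothetical helper for illustration only.
std::size_t RestartArrayBytes(std::size_t num_entries,
                              std::size_t restart_interval) {
  assert(restart_interval >= 1);
  // The first entry always starts a restart point; after that a new one
  // begins every restart_interval entries. An empty block still has one.
  const std::size_t num_restarts =
      num_entries == 0
          ? 1
          : (num_entries + restart_interval - 1) / restart_interval;
  return (num_restarts + 1) * sizeof(uint32_t);
}
```

Under these assumptions, a 100-entry block spends 32 bytes on the restart array at an interval of 16, and 8 bytes when the interval is larger than the number of entries.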
Closes https://github.com/facebook/rocksdb/pull/2940

Differential Revision: D5930139

Pulled By: miasantreble

fbshipit-source-id: 92b1b23c15cffa90378343ac846b713623b19c21
2017-09-29 20:26:20 -07:00
Maysam Yabandeh
2b22baf304 Add a template for issues
Summary:
This template reminds the users to use issues only for bug reports. The template is written according to the github guidelines at https://help.github.com/articles/creating-an-issue-template-for-your-repository/
Closes https://github.com/facebook/rocksdb/pull/2948

Differential Revision: D5943558

Pulled By: maysamyabandeh

fbshipit-source-id: c83b5d211ea8e334107141967689b2f0c453bbc9
2017-09-29 11:41:28 -07:00
Maysam Yabandeh
ab0542f5ec Fix for when block.cache_handle is nullptr
Summary:
When using with compressed cache it is possible that the status is ok but the block is not actually added to the block cache. The patch takes this case into account.
Closes https://github.com/facebook/rocksdb/pull/2945

Differential Revision: D5937613

Pulled By: maysamyabandeh

fbshipit-source-id: 5428cf1115e5046b3d01ab78d26cb181122af4c6
2017-09-29 07:56:55 -07:00
Andrew Kryczka
5df172da2f fix deletion-triggered compaction in table builder
Summary:
It was broken when `NotifyCollectTableCollectorsOnFinish` was introduced. That function called `Finish` on each of the `TablePropertiesCollector`s, and `CompactOnDeletionCollector::Finish()` was resetting all its internal state. Then, when we checked whether compaction is necessary, the flag had already been cleared.

Fixed the above issue by no longer resetting internal state during `Finish()`. Multiple calls to `Finish()` are allowed, but callers cannot invoke `AddUserKey()` on the collector after any call to `Finish()`.
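The contract can be pictured with a simplified collector. This is a sketch only: `DeletionCountingCollector`, its members, and the trigger rule are hypothetical and are not the actual `CompactOnDeletionCollector`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified stand-in for a collector that requests a
// compaction once enough deletions have been seen. Names and the trigger
// rule are assumptions for illustration.
class DeletionCountingCollector {
 public:
  explicit DeletionCountingCollector(std::size_t deletion_trigger)
      : deletion_trigger_(deletion_trigger) {}

  // Must not be called once Finish() has been called.
  void AddUserKey(bool is_deletion) {
    assert(!finished_);
    if (is_deletion && ++num_deletions_ >= deletion_trigger_) {
      need_compaction_ = true;
    }
  }

  // Safe to call more than once: it only marks the collector as finished
  // and deliberately leaves need_compaction_ untouched.
  void Finish() { finished_ = true; }

  bool NeedCompact() const { return need_compaction_; }

 private:
  const std::size_t deletion_trigger_;
  std::size_t num_deletions_ = 0;
  bool need_compaction_ = false;
  bool finished_ = false;
};
```

Because `Finish()` does not clear the flag, a caller that checks `NeedCompact()` after one or several `Finish()` calls still sees the decision made while keys were being added.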
Closes https://github.com/facebook/rocksdb/pull/2936

Differential Revision: D5918659

Pulled By: ajkr

fbshipit-source-id: 4f05e9d80e50ee762ba1e611d8d22620029dca6b
2017-09-28 18:17:30 -07:00
Maysam Yabandeh
385049baf2 WritePrepared Txn: Recovery
Summary:
Recover txns from the WAL. Also added some unit tests.
Closes https://github.com/facebook/rocksdb/pull/2901

Differential Revision: D5859596

Pulled By: maysamyabandeh

fbshipit-source-id: 6424967b231388093b4effffe0a3b1b7ec8caeb0
2017-09-28 16:56:45 -07:00
Yu Shu
8c724f5c7f Default one to rocksdb:x64-windows
Summary:
The default one will try to install rocksdb:x86-windows, which makes the build fail at the last step (CMake Error, Rocksdb only supports x64), because it tries to install a series of x86 packages and those cannot proceed to the rocksdb:x86-windows build. By using rocksdb:x64-windows, we can make sure to install the x64 version.
Tested on Win10 x64.
Closes https://github.com/facebook/rocksdb/pull/2941

Differential Revision: D5937139

Pulled By: sagar0

fbshipit-source-id: 15637fe23df59326a0e607bd4d5c48733e20bae3
2017-09-28 16:12:24 -07:00
Sagar Vemuri
93c2b91740 Introduce conditional merge-operator invocation in point lookups
Summary:
For every merge operand encountered for a key in the read path we now have the ability to decide whether to look further (to retrieve more merge operands for the key) or stop and invoke the merge operator to return the value. To use this facility, the user needs to override the `ShouldMerge()` method with a condition that terminates the search when it returns true.

This has a couple of advantages:
1. It helps in limiting the number of merge operands that are looked at to compute a value as part of a user Get operation.
2. It allows peeking at a merge key-value to see if further merge operands need to be looked at.

Example: Limiting the number of merge operands that are looked at: Let's say you have 10 merge operands for a key spread over various levels. If you only want RocksDB to look at the latest two merge operands instead of all 10 to compute the value, it is now possible with this PR. You can set the condition in `ShouldMerge()` to return true when the size of the operand list is 2. Look at the example implementation in the unit test. Without this PR, a Get might look at all 10 merge operands in different levels before invoking the merge operator.
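The behaviour in this example can be modelled without RocksDB headers. The sketch below is a toy: `GetWithConditionalMerge`, `LookupResult`, and the summing merge are hypothetical names and choices, and the `should_merge` callback only plays the role of a user-overridden `ShouldMerge()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy model of the read path; it does not use RocksDB. Operands are
// visited newest first, the order a point lookup would encounter them in.
struct LookupResult {
  uint64_t value;             // result of merging the visited operands
  std::size_t operands_read;  // how many operands the lookup looked at
};

// should_merge sees the operands collected so far and returns true to stop
// the search and merge what has been collected.
LookupResult GetWithConditionalMerge(
    const std::vector<uint64_t>& operands_newest_first,
    const std::function<bool(const std::vector<uint64_t>&)>& should_merge) {
  std::vector<uint64_t> collected;
  for (uint64_t operand : operands_newest_first) {
    collected.push_back(operand);
    if (should_merge(collected)) {
      break;  // stop looking at older operands
    }
  }
  // Hypothetical merge operator: add up the collected operands.
  uint64_t sum = 0;
  for (uint64_t operand : collected) {
    sum += operand;
  }
  return {sum, collected.size()};
}
```

With ten operands and a condition that returns true once two operands have been collected, the toy lookup reads only two of them; with a condition that always returns false it reads all ten, which corresponds to the behaviour before this change.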

Added a new unit test.
Made sure that there is no perf regression by running benchmarks.

Command line to Load data:
```
TEST_TMPDIR=/dev/shm ./db_bench --benchmarks="mergerandom" --merge_operator="uint64add" --num=10000000
...
mergerandom  :      12.861 micros/op 77757 ops/sec;    8.6 MB/s ( updates:10000000)
```

**ReadRandomMergeRandom benchmark results:**
Command line:
```
TEST_TMPDIR=/dev/shm ./db_bench --benchmarks="readrandommergerandom" --merge_operator="uint64add" --num=10000000
```

Base -- Without this code change (on commit fc7476b):
```
readrandommergerandom :      38.586 micros/op 25916 ops/sec; (reads:3001599 merges:6998401 total:10000000 hits:842235 maxlength:8)
```

With this code change:
```
readrandommergerandom :      38.653 micros/op 25870 ops/sec; (reads:3001599 merges:6998401 total:10000000 hits:842235 maxlength:8)
```
Closes https://github.com/facebook/rocksdb/pull/2923

Differential Revision: D5898239

Pulled By: sagar0

fbshipit-source-id: daefa325019f77968639a75c851d46352c2303ef
2017-09-28 15:58:49 -07:00
Aliaksei Sandryhaila
a48a398e7c Use RAII instead of pointers in cf_info_map
Summary:
There is no need for smart pointers in cf_info_map, so use RAII. This should also placate valgrind.
Closes https://github.com/facebook/rocksdb/pull/2943

Differential Revision: D5932941

Pulled By: asandryh

fbshipit-source-id: 2c37df88573a9df2557880a31193926e4425e054
2017-09-28 14:26:47 -07:00
Maysam Yabandeh
c70586621c Blog post for 5.8 release
Summary: Closes https://github.com/facebook/rocksdb/pull/2942

Differential Revision: D5932858

Pulled By: maysamyabandeh

fbshipit-source-id: e11f52a0b08d65149bb49d99d1dbc82cb5a96fa0
2017-09-28 10:14:09 -07:00
Andrew Kryczka
c2f6e45aa3 prevent nullptr dereference in table reader error case
Summary:
A user encountered segfault on the call to `CacheDependencies()`, probably because `NewIndexIterator()` failed before populating `*index_entry`. Let's avoid the call in that case.
Closes https://github.com/facebook/rocksdb/pull/2939

Differential Revision: D5928611

Pulled By: ajkr

fbshipit-source-id: 484be453dbb00e5e160e9c6a1bc933df7d80f574
2017-09-28 00:12:34 -07:00
Quinn Jarrell
6a541afcc4 Make bytes_per_sync and wal_bytes_per_sync mutable
Summary:
SUMMARY
Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed.
TEST PLAN
ran make check
all passed

Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value.
Closes https://github.com/facebook/rocksdb/pull/2893

Reviewed By: yiwu-arbug

Differential Revision: D5845814

Pulled By: TheRushingWookie

fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd
2017-09-27 17:49:45 -07:00
Yi Wu
ec48e5c77f Add TransactionDB::SingleDelete()
Summary:
Looks like the API is simply missing. Adding it.
Closes https://github.com/facebook/rocksdb/pull/2937

Differential Revision: D5919955

Pulled By: yiwu-arbug

fbshipit-source-id: 6e2e9c96c29882b0bb4113d1f8efb72bffc57878
2017-09-27 10:27:26 -07:00
Sagar Vemuri
0806801dc8 DestroyDB API
Summary:
Expose DestroyDB API in RocksJava.
Closes https://github.com/facebook/rocksdb/pull/2934

Differential Revision: D5914775

Pulled By: sagar0

fbshipit-source-id: 84af6ea0d2bccdcfb9fe8c07b2f87373f0d5bab6
2017-09-26 16:42:11 -07:00
Maysam Yabandeh
aa67bae6cf Break down PinnedDataIteratorRandomized
Summary:
It's timing out under TSAN.
Closes https://github.com/facebook/rocksdb/pull/2928

Differential Revision: D5911766

Pulled By: maysamyabandeh

fbshipit-source-id: 2faacc07752ac8713a3a2abb5a4c4b7ae3bdf208
2017-09-26 14:27:30 -07:00
Siying Dong
4748911357 Add LogDevice to USERS.md
Summary: Closes https://github.com/facebook/rocksdb/pull/2927

Differential Revision: D5906613

Pulled By: siying

fbshipit-source-id: 607401e05b27508c816c700864fe81514606e4ef
2017-09-25 15:56:40 -07:00
Zhongyi Xie
1d6700f9e6 Add test kPointInTimeRecoveryCFConsistency
Summary:
Context/problem:

- CFs may be flushed at different times
- A WAL can only be deleted after all CFs have flushed beyond end of that WAL.
- Point-in-time recovery might stop upon reaching the first corruption.
- Some CFs may have already flushed beyond that point, while others haven't. We should fail the Open() instead of proceeding with inconsistent CFs.
Closes https://github.com/facebook/rocksdb/pull/2900

Differential Revision: D5863281

Pulled By: miasantreble

fbshipit-source-id: 180dbaf83d96c804cff49b3c406312a4ae61313e
2017-09-22 17:26:36 -07:00
Yi Wu
be97dbb15c Fix WritePreparedTransactionTest::SeqAdvanceTest ASAN failure
Summary: Closes https://github.com/facebook/rocksdb/pull/2922

Differential Revision: D5895310

Pulled By: yiwu-arbug

fbshipit-source-id: 52c635a25d22478ec1eca49b6817551202babac2
2017-09-22 15:26:42 -07:00
Andrew Kryczka
4708a6875c Repair DBs with trailing slash in name
Summary:
Problem:

- `DB::SanitizeOptions` strips trailing slash from `wal_dir` but not `dbname`
- We check whether `wal_dir` and `dbname` refer to the same directory using string equality: https://github.com/facebook/rocksdb/blob/master/db/repair.cc#L258
- Providing `dbname` with trailing slash causes default `wal_dir` to be misidentified as a separate directory.
- Then the repair tries to add all SST files to the `VersionEdit` twice (once for `dbname` dir, once for `wal_dir`) and fails with coredump.

Solution:

- Add a new `Env` function, `AreFilesSame`, which uses device and inode number to check whether files are the same. It's currently only implemented in `PosixEnv`.
- Migrate repair to use `AreFilesSame` to check whether `dbname` and `wal_dir` are same. If unsupported, falls back to string comparison.
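The device-and-inode comparison can be sketched with plain POSIX `stat`. `SamePosixFile` is a hypothetical helper written for illustration; it is not the `AreFilesSame` implementation from this patch.

```cpp
#include <sys/stat.h>

#include <cassert>
#include <string>

// Compares two paths by file identity rather than by spelling.
// Returns false if either path cannot be stat'ed.
bool SamePosixFile(const std::string& first, const std::string& second) {
  struct stat a;
  struct stat b;
  if (stat(first.c_str(), &a) != 0 || stat(second.c_str(), &b) != 0) {
    return false;
  }
  // Two paths name the same file when device and inode numbers both match.
  return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

A directory path with and without a trailing slash compares equal here, which is exactly the case that plain string equality gets wrong.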
Closes https://github.com/facebook/rocksdb/pull/2827

Differential Revision: D5761349

Pulled By: ajkr

fbshipit-source-id: c839d548678b742af1166d60b09abd94e5476238
2017-09-22 12:42:22 -07:00
Andrew Kryczka
fc7476bec1 fix populating range deletions in forward iterator
Summary:
fixes #2902
Closes https://github.com/facebook/rocksdb/pull/2917

Differential Revision: D5887175

Pulled By: ajkr

fbshipit-source-id: 364e292c636a3238bfc53b0fb9a01ff2f82dcbb9
2017-09-21 17:56:38 -07:00
Sagar Vemuri
c8f3606731 Expose LoadLatestOptions, LoadOptionsFromFile and GetLatestOptionsFileName APIs in RocksJava
Summary:
JNI wrappers for LoadLatestOptions, LoadOptionsFromFile and GetLatestOptionsFileName APIs.
Closes https://github.com/facebook/rocksdb/pull/2898

Differential Revision: D5857934

Pulled By: sagar0

fbshipit-source-id: 68b79e83eab8de9416e3f1fef73e11cf7947e90a
2017-09-21 17:29:13 -07:00
Sagar Vemuri
96a13b4f4b Use jemalloc in rocksdbjni library built via vagrant
Summary:
Problem:
During RocksJava performance testing we found that the rocksdb jni library is not built with jemalloc; instead it was getting built with the default glibc malloc. We saw quite a bit of memory bloat due to this.

Addressed this by installing jemalloc-devel package in the vm that we use to build release jars.
Closes https://github.com/facebook/rocksdb/pull/2916

Differential Revision: D5887018

Pulled By: sagar0

fbshipit-source-id: ace0b5d60234b3a30dcd5d39633e7827a5982a50
2017-09-21 16:42:06 -07:00
PhaniShekhar
65a9cd6168 Use L1 size as estimate for L0 size in LevelCompactionBuilder::GetPathID
Summary:
Fix for [2461](https://github.com/facebook/rocksdb/issues/2461).

Problem: When using multiple db_paths setting with RocksDB, RocksDB incorrectly calculates the size of L1 in LevelCompactionBuilder::GetPathId.

max_bytes_for_level_base is used as L0 size and L1 size is calculated as (L0 size * max_bytes_for_level_multiplier). However, L1 size should be max_bytes_for_level_base.

Solution: Use max_bytes_for_level_base as L1 size. Also, use L1 size as the estimated size of L0.
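As a small worked sketch of the corrected sizing rule: `EstimatedLevelSize` is a hypothetical helper, not the code in `LevelCompactionBuilder::GetPathID`, and it assumes an integer multiplier for simplicity.

```cpp
#include <cassert>
#include <cstdint>

// Estimated size of a level under the rule described above: L1 is
// max_bytes_for_level_base, every deeper level is the previous one times
// max_bytes_for_level_multiplier, and L0 is estimated to be as large as L1.
uint64_t EstimatedLevelSize(int level, uint64_t max_bytes_for_level_base,
                            uint64_t max_bytes_for_level_multiplier) {
  assert(level >= 0);
  uint64_t size = max_bytes_for_level_base;  // size of L1, reused for L0
  for (int l = 2; l <= level; ++l) {
    size *= max_bytes_for_level_multiplier;
  }
  return size;
}
```

With a base of 256 MB and a multiplier of 10, this gives 256 MB for both L0 and L1 and 2560 MB for L2, whereas the old calculation treated L1 as 2560 MB.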
Closes https://github.com/facebook/rocksdb/pull/2903

Differential Revision: D5885442

Pulled By: maysamyabandeh

fbshipit-source-id: 036da1c9298d173b9b80479cc6661ee4b7a951f6
2017-09-21 15:57:58 -07:00
Andrew Kryczka
8fc3de3c62 make rate limiter a general option
Summary:
it's unsupported in options file, so the flag should be respected by db_bench even when an options file is provided.
Closes https://github.com/facebook/rocksdb/pull/2910

Differential Revision: D5869836

Pulled By: ajkr

fbshipit-source-id: f67f591ae083e95e989f86b6fad50765d2e3d855
2017-09-21 11:11:00 -07:00
Yi Wu
1480e6f7cf Fix TransactionTest::SeqAdvanceTest ASAN failure
Summary:
The test didn't delete txn before creating a new one.
Closes https://github.com/facebook/rocksdb/pull/2913

Differential Revision: D5880236

Pulled By: yiwu-arbug

fbshipit-source-id: 7a4fcaada3d86332292754502cd8f4341143bf4f
2017-09-21 09:56:54 -07:00
Sagar Vemuri
3fc08fa88e Expose max_background_jobs option in RocksJava
Summary:
This option was introduced in the C++ API in RocksDB 5.6 in bb01c1880c . Now, exposing it through RocksJava API.
Closes https://github.com/facebook/rocksdb/pull/2908

Differential Revision: D5864224

Pulled By: sagar0

fbshipit-source-id: 140aa55dcf74b14e4d11219d996735c7fdddf513
2017-09-20 10:26:37 -07:00
Yao Zongyou
8ae81684e9 Update cmake_minimum_required to 2.8.12.
Summary:
Hello,

current master branch declares cmake_minimum_required (VERSION 2.8.11)
but cmake gives the following error:

[  6%] CMake Error at CMakeLists.txt:658 (install):
  install TARGETS given unknown argument "INCLUDES".

CMake Error at src/CMakeLists.txt:658 (install): install TARGETS given unknown argument "INCLUDES".

because this argument is not supported on CMake versions prior to 2.8.12
Closes https://github.com/facebook/rocksdb/pull/2904

Differential Revision: D5863430

Pulled By: yiwu-arbug

fbshipit-source-id: 0f7230e080add472ad4b87836b3104ea0b971a38
2017-09-19 12:01:09 -07:00
Yi Wu
b4596c6174 Fix Get does not return super version on error
Summary:
This is caught when I was testing #2886.
Closes https://github.com/facebook/rocksdb/pull/2907

Differential Revision: D5863153

Pulled By: yiwu-arbug

fbshipit-source-id: 8c54759ba1a0dc101f24ab50423e35731300612d
2017-09-19 12:01:09 -07:00
Orgad Shaneh
34ebadf930 Fix MinGW build
Summary:
snprintf is defined as _snprintf, which doesn't exist in the std namespace.
Closes https://github.com/facebook/rocksdb/pull/2298

Differential Revision: D5070457

Pulled By: yiwu-arbug

fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
2017-09-19 10:28:26 -07:00
Pengchao Wang
e4234fbdcf collecting kValue type tombstone
Summary:
In our testing cluster, we found that a large number of tombstones had been promoted to kValue type from kMerge after reaching the top level of compaction. Since we used to collect tombstones only in the merge operator, those tombstones could never be collected.

This PR addresses the issue by adding a GC step in the compaction filter, which applies only to kValue type records. Since those records have already reached the top of compaction (no earlier data exists), we can safely remove them in the compaction filter without worrying about old data reappearing.

This PR also removes an old optimization in the cassandra merge operator for single merge operands. We need to do GC even on a single operand, so the optimization does not make sense anymore.
Closes https://github.com/facebook/rocksdb/pull/2855

Reviewed By: sagar0

Differential Revision: D5806445

Pulled By: wpc

fbshipit-source-id: 6eb25629d4ce917eb5e8b489f64a6aa78c7d270b
2017-09-18 16:27:12 -07:00
Maysam Yabandeh
60beefd6e0 WritePrepared Txn: Advance seq one per batch
Summary:
By default the seq number in the DB is increased once per written key. WritePrepared txns require the seq to be increased once per entire batch, so that the seq can be used as the prepare timestamp by which the transaction is identified. We also need to increase the seq for the commit marker, since that gives a unique id to the commit timestamp of transactions.

Two unit tests are added to verify our understanding of how the seq should be increased. The recovery path requires much more work and is left to another patch.
Closes https://github.com/facebook/rocksdb/pull/2885

Differential Revision: D5837843

Pulled By: maysamyabandeh

fbshipit-source-id: a08960b93d727e1cf438c254d0c2636fb133cc1c
2017-09-18 14:45:08 -07:00
Maysam Yabandeh
c57050b770 Use the default copy constructor in Options
Summary:
Our current implementation of the (semi-)copy constructor of DBOptions and ColumnFamilyOptions seems to intend a value-by-value copy, which is what the default copy constructor does anyway. Moreover, not using the default copy constructor carries the risk of forgetting to copy newly added options.

As an example, allow_2pc seems to be forgotten in the copy constructor which was causing one of the unit tests not seeing its effect.
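The failure mode is easy to reproduce in miniature. The two structs below are hypothetical and far smaller than the real `DBOptions`; they only illustrate how a hand-written copy constructor can silently drop a member that was added later.

```cpp
#include <cassert>

// Hypothetical options struct with a hand-written copy constructor. It was
// complete until allow_2pc was added and nobody remembered to copy it.
struct HandCopiedOptions {
  int max_open_files = -1;
  bool allow_2pc = false;

  HandCopiedOptions() = default;
  HandCopiedOptions(const HandCopiedOptions& other)
      : max_open_files(other.max_open_files) {}  // allow_2pc is forgotten
};

// Relying on the compiler-generated copy constructor copies every member,
// including members added after the struct was first written.
struct DefaultCopiedOptions {
  int max_open_files = -1;
  bool allow_2pc = false;
};
```

Copying a `HandCopiedOptions` whose `allow_2pc` is true yields a copy with the default value false, while the same operation on `DefaultCopiedOptions` preserves the setting.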
Closes https://github.com/facebook/rocksdb/pull/2888

Differential Revision: D5846368

Pulled By: maysamyabandeh

fbshipit-source-id: 1ee92a2aeae93886754b7bc039c3411ea2458683
2017-09-15 17:15:10 -07:00
Siying Dong
c319792059 Directly reference perf_context internally.
Summary:
After 7f6c02dda1, the same get_perf_context() is called both internally and externally. However, I found that internally it does not get inlined. I don't know why this is the case, but directly referencing perf_context is the logical thing to do.
Closes https://github.com/facebook/rocksdb/pull/2892

Differential Revision: D5843789

Pulled By: siying

fbshipit-source-id: b49777d8809f35847699291bb7f8ea2754c3af49
2017-09-15 17:15:10 -07:00
Yi Wu
6b3c71f6ed Fix DBImpl::NotifyOnCompactionCompleted data race
Summary:
Access of `cfd->current()` needs to hold db mutex. The data race is caught by TSAN but hard to reproduce: https://gist.github.com/yiwu-arbug/0fc6dc0de915297a1740aa9610be9373
Closes https://github.com/facebook/rocksdb/pull/2894

Differential Revision: D5843884

Pulled By: yiwu-arbug

fbshipit-source-id: 0a30a421bc96f51840821538ad6453dc0815a942
2017-09-15 11:56:31 -07:00
Yi Wu
f47b4eeb1e Fix memory leak in OptionsTest::OptionsComposeDecompose
Summary:
Fixing asan error.
Closes https://github.com/facebook/rocksdb/pull/2887

Differential Revision: D5838895

Pulled By: yiwu-arbug

fbshipit-source-id: 1662ce9856eb5e6877675347dc2240f2acb6fae8
2017-09-15 11:37:37 -07:00
Ben Clay
382277d0fe JNI support for ReadOptions::iterate_upper_bound
Summary:
Plumbed ReadOptions::iterate_upper_bound through JNI.

Made the following design choices:
* Used Slice instead of AbstractSlice due to the anticipated usecase (key / key prefix). Can change this if anyone disagrees.
* Used Slice instead of raw byte[] which seemed cleaner but necessitated the package-private handle-based Slice constructor. Followed WriteBatch as an example.
* We need a copy constructor for ReadOptions, as we create one base ReadOptions for a particular usecase and clone -> change the iterate_upper_bound on each slice operation. Shallow copy seemed cleanest.
* Hold a reference to the upper bound slice on ReadOptions, in contrast to Snapshot.

Signed a Facebook CLA this morning.
Closes https://github.com/facebook/rocksdb/pull/2872

Differential Revision: D5824446

Pulled By: sagar0

fbshipit-source-id: 74fc51313a10a81ecd348625e2a50ca5b7766888
2017-09-14 18:28:20 -07:00
Siying Dong
edcbb36944 Three code-level optimizations to Iterator::Next()
Summary:
Three small optimizations:
(1) iter_->IsKeyPinned() shouldn't be called if read_options.pin_data is not true. This may trigger a function call all the way down the iterator tree.
(2) reuse the iterator key object in DBIter::FindNextUserEntryInternal(). The constructor of the class has some overheads.
(3) Move the switching direction logic in MergingIterator::Next() to a separate function.

These three in total improve readseq performance by about 3% in my benchmark setting.
Closes https://github.com/facebook/rocksdb/pull/2880

Differential Revision: D5829252

Pulled By: siying

fbshipit-source-id: 991aea10c6d6c3b43769cb4db168db62954ad1e3
2017-09-14 17:57:31 -07:00
Siying Dong
885b1c682e Two small refactorings for better inlining
Summary:
Move uncommon code paths in RangeDelAggregator::ShouldDelete() and IterKey::EnlargeBufferIfNeeded() to a separate function, so that the inlined structure can be better optimized.

Optimize it because these places show up in CPU profiling, though minimally. The performance is really hard to measure. I ran db_bench with the readseq benchmark against an in-memory DB many times. The variation is big, but it seems to show a 1% improvement.
Closes https://github.com/facebook/rocksdb/pull/2877

Differential Revision: D5828123

Pulled By: siying

fbshipit-source-id: 41a49e229f91e9f8409f85cc6f0dc70e31334e4b
2017-09-14 15:41:49 -07:00
Oleksandr Anyshchenko
ffac68367f Added save points for transactions C API
Summary:
Added the ability to set save points in transactions and then roll back to them.
Closes https://github.com/facebook/rocksdb/pull/2876

Differential Revision: D5825829

Pulled By: yiwu-arbug

fbshipit-source-id: 62168992340bbcddecdaea3baa2a678475d1429d
2017-09-14 14:18:59 -07:00
Yi Wu
9a970c81af Fix WriteBatchWithIndex::GetFromBatchAndDB not allowing StackableDB
Summary: Closes https://github.com/facebook/rocksdb/pull/2881

Differential Revision: D5829682

Pulled By: yiwu-arbug

fbshipit-source-id: abb8fa14b58cea7c416282f9be19e8b1a7961c6e
2017-09-13 17:26:35 -07:00
Yi Wu
a843df668b Fix use-after-free in c_test
Summary:
Fix asan error introduced by #2823
Closes https://github.com/facebook/rocksdb/pull/2879

Differential Revision: D5828454

Pulled By: yiwu-arbug

fbshipit-source-id: 50777855667f4e7b634279a654c3bfa01a1ac729
2017-09-13 16:12:02 -07:00
Sagar Vemuri
2d6e42122b Remove 'experimental' comment around level_compaction_dynamic_level_bytes option
Summary:
Remove misleading 'experimental' comment around `level_compaction_dynamic_level_bytes` option. This is not experimental anymore and is ready for wider adoption. MyRocks is already using it in production.
Closes https://github.com/facebook/rocksdb/pull/2878

Differential Revision: D5828890

Pulled By: sagar0

fbshipit-source-id: fffb45f4999f689b7eca326e4f4caf472d40c5a9
2017-09-13 15:56:24 -07:00
Andrew Kryczka
464fb36de9 fix hanging after CompactFiles with L0 overlap
Summary:
Bug report: https://www.facebook.com/groups/rocksdb.dev/permalink/1389452781153232/

Non-empty `level0_compactions_in_progress_` was aborting `CompactFiles` after incrementing `bg_compaction_scheduled_`, and in that case we never decremented it. This blocked future compactions and prevented DB close as we wait for scheduled compactions to finish/abort during close.

I eliminated `CompactFiles`'s dependency on `level0_compactions_in_progress_`. Since it takes a contiguous span of L0 files -- through the last L0 file if any L1+ files are included -- it's fine to run in parallel with other compactions involving L0. We make the same assumption in intra-L0 compaction.
Closes https://github.com/facebook/rocksdb/pull/2849

Differential Revision: D5780440

Pulled By: ajkr

fbshipit-source-id: 15b15d3faf5a699aed4b82a58352d4a7bb23e027
2017-09-13 15:41:38 -07:00
Maysam Yabandeh
09713a64b3 WritePrepared Txn: Lock-free CommitMap
Summary:
We had two proposals for lock-free commit maps. This patch implements the latter one that was simpler. We can later experiment with both proposals.

In this implementation each entry is a std::atomic of uint64_t, accessed via memory_order_acquire/release. On the x86_64 architecture this compiles to simple reads and writes from memory.
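A minimal sketch of the idea, one atomic 64-bit word per entry with a release store and an acquire load, could look like the following. `ToyCommitCache`, the slot count, and the 32/32-bit packing are assumptions made for illustration; they are not the CommitMap layout implemented in the patch.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size commit cache. Each slot is one atomic 64-bit word
// packing (prepare_seq << 32 | commit_seq) and is indexed by
// prepare_seq % kSlots. A zero word means the slot is empty.
class ToyCommitCache {
 public:
  static constexpr std::size_t kSlots = 8;

  ToyCommitCache() {
    for (auto& slot : slots_) {
      slot.store(0, std::memory_order_relaxed);
    }
  }

  // Publishes "prepare_seq committed at commit_seq", overwriting whatever
  // occupied the slot. Both values must be non-zero in this sketch.
  void AddCommitted(uint32_t prepare_seq, uint32_t commit_seq) {
    assert(prepare_seq != 0 && commit_seq != 0);
    const uint64_t entry =
        (static_cast<uint64_t>(prepare_seq) << 32) | commit_seq;
    slots_[prepare_seq % kSlots].store(entry, std::memory_order_release);
  }

  // Returns the commit sequence number recorded for prepare_seq, or 0 if the
  // slot is empty or has since been overwritten by another transaction.
  uint32_t GetCommitSeq(uint32_t prepare_seq) const {
    const uint64_t entry =
        slots_[prepare_seq % kSlots].load(std::memory_order_acquire);
    if ((entry >> 32) != prepare_seq) {
      return 0;
    }
    return static_cast<uint32_t>(entry & 0xffffffffu);
  }

 private:
  std::array<std::atomic<uint64_t>, kSlots> slots_;
};
```

Because the whole entry fits in a single atomic word, a reader sees either the old entry or the new one and never a mix of the two, so no lock is needed around the slot.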
Closes https://github.com/facebook/rocksdb/pull/2861

Differential Revision: D5800724

Pulled By: maysamyabandeh

fbshipit-source-id: 41abae9a4a5df050a8eb696c43de11c2770afdda
2017-09-13 12:12:11 -07:00