Summary:
This is a potential bug that is triggered if we ingest files before inserting the first key into an empty DB.
0 is a special value reserved to indicate non-existence, but it is not a good choice for seqno in this case because 0 is a valid seqno for ingestion (bulk loading).
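The pitfall can be illustrated with a minimal sketch. The types `SentinelTracker` and `FlaggedTracker` are hypothetical and are not RocksDB classes; the sketch only assumes that an ingested file may carry sequence number 0.

```cpp
#include <cassert>
#include <cstdint>

using SequenceNumber = uint64_t;

// Hypothetical tracker that uses 0 to mean "no sequence number seen yet".
struct SentinelTracker {
  SequenceNumber first_seqno = 0;
  void Observe(SequenceNumber s) {
    // A real seqno of 0 looks identical to "unset", so it gets overwritten.
    if (first_seqno == 0) first_seqno = s;
  }
};

// Hypothetical tracker that keeps "unset" separate from every valid value.
struct FlaggedTracker {
  bool has_first = false;
  SequenceNumber first_seqno = 0;
  void Observe(SequenceNumber s) {
    if (!has_first) {
      has_first = true;
      first_seqno = s;
    }
  }
};

// Ingest a file at seqno 0, then write a key at seqno 7.
void Example() {
  SentinelTracker a;
  a.Observe(0);
  a.Observe(7);
  assert(a.first_seqno == 7);  // the ingested seqno 0 was lost

  FlaggedTracker b;
  b.Observe(0);
  b.Observe(7);
  assert(b.has_first && b.first_seqno == 0);  // seqno 0 is preserved
}
```

Any representation that cannot collide with a valid sequence number would work equally well; the separate flag is just the simplest to show.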
Closes https://github.com/facebook/rocksdb/pull/2183
Differential Revision: D4919827
Pulled By: lightmark
fbshipit-source-id: 237eea40f88bd6487b66806109d90065dc02c362
Summary:
Workaround for the Solaris gcc binary. The program crashes because the TLS of the perf context gets damaged when it is used twice in the same frame, which results in a segmentation fault.
Issue: #2153
Closes https://github.com/facebook/rocksdb/pull/2187
Differential Revision: D4922274
Pulled By: siying
fbshipit-source-id: 549105ebce9a8ce08a737f4d6b9f2312ebcde9a8
Summary:
In a previous commit, I changed how release branches are checked out from "git checkout <branch_name>" to "git checkout origin/<branch_name>". However, this doesn't seem to work in our CI environment. Revert it.
Closes https://github.com/facebook/rocksdb/pull/2189
Differential Revision: D4922294
Pulled By: siying
fbshipit-source-id: 482c17f9b05e6ccb190876b050682fe5a458103d
Summary:
tools/check_format_compatible.sh will check that a newer version of RocksDB can open option files generated by older releases. To achieve that, a new parameter "--try_load_options" is added to ldb. With this parameter set, if an option file exists, we load it and use it to open the DB. With this option set, we can validate the option loading logic.
Closes https://github.com/facebook/rocksdb/pull/2178
Differential Revision: D4914989
Pulled By: siying
fbshipit-source-id: db114f7724fcb41e5e9483116d84d7c4b8389ca4
Summary:
Tested by running it on a remote machine.
I could not run it on the particular remote machine that has a different location for the time command, since that machine is busy and the script does not allow concurrent runs. So I tested it by hacking the script, replacing the command with "\$(hostname)", and confirming that the script prints out the host name of the remote machine.
Closes https://github.com/facebook/rocksdb/pull/2181
Differential Revision: D4921654
Pulled By: maysamyabandeh
fbshipit-source-id: 8abb5ea9f7234f3c50a749576ccbb47ff605beb9
Summary:
This was requested by a customer who wants to proactively monitor whether any valid backups are available. The existing performance was poor because Open() serially reads every small meta-file (one per backup), which was slow on HDFS.
Now we only read the minimum number of meta-files to find `max_valid_backups_to_open` valid backups. The customer mentioned above can just set it to one.
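The early-stop idea can be sketched as follows. This is a simplified model, not the BackupEngine code: `BackupMeta` and `OpenValidBackups` are hypothetical names, and the sketch assumes backups are scanned from newest to oldest.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one backup's meta-file.
struct BackupMeta {
  int id;
  bool valid;  // outcome of reading and validating the meta-file
};

// Scans backups (assumed newest first) and stops as soon as `max_valid`
// valid backups have been found. `*reads` reports how many meta-files
// had to be read, which is the expensive part on a remote file system.
std::vector<int> OpenValidBackups(const std::vector<BackupMeta>& newest_first,
                                  size_t max_valid, size_t* reads) {
  assert(reads != nullptr);
  *reads = 0;
  std::vector<int> opened;
  for (const BackupMeta& meta : newest_first) {
    if (opened.size() >= max_valid) break;  // enough valid backups found
    ++*reads;
    if (meta.valid) opened.push_back(meta.id);
  }
  return opened;
}
```

With five backups where only the newest is corrupt and `max_valid` set to one, this reads two meta-files instead of five.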
Closes https://github.com/facebook/rocksdb/pull/2151
Differential Revision: D4882564
Pulled By: ajkr
fbshipit-source-id: cb0edf9e8ac693e4d5f24902e725a011ed8c0c2f
Summary:
The goal is to avoid the problem of a small number of L0 files triggering compaction to the base level (which increased write-amp), while still allowing L0 compaction-by-size (so intra-L0 compactions cause the score to increase).
Closes https://github.com/facebook/rocksdb/pull/2172
Differential Revision: D4908552
Pulled By: ajkr
fbshipit-source-id: 4b170142b2b368e24bd7948b2a6f24c69fabf73d
Summary:
index_per_partition should have been deprecated instead of being removed. Its removal is causing backward compatibility issues.
Closes https://github.com/facebook/rocksdb/pull/2173
Differential Revision: D4910947
Pulled By: maysamyabandeh
fbshipit-source-id: 5c52939381847d232ede6866606f67f2b4b857ae
Summary:
Need to add more recent versions to tools/check_format_compatible.sh to make sure of backward and forward compatibility.
Closes https://github.com/facebook/rocksdb/pull/2175
Differential Revision: D4911585
Pulled By: siying
fbshipit-source-id: 943e6488757efb11bb6720d811c7ba949915c9de
Summary:
Add a function to allow users to reset internal stats without restarting the DB.
Closes https://github.com/facebook/rocksdb/pull/2167
Differential Revision: D4907939
Pulled By: siying
fbshipit-source-id: ab2dd85b88aabe9380da7485320a1d460d3e1f68
Summary:
Previously, the shared library (make shared_lib) was built with only one
compile line, compiling all .cc files and linking the shared library in
one step. That step would often take 10+ minutes on one machine, and
could not take advantage of multiple CPUs (it's only one invocation of
the compiler).
This commit changes the shared_lib build to compile .o files
individually (placing the resulting .o files in the directory
shared-objects) and then link them into the shared library at the end,
similarly to how the java static build (jls) does it.
Tested by making sure that both static and shared libraries work, and by
making sure that "make clean" cleans up the shared-objects directory.
Closes https://github.com/facebook/rocksdb/pull/2165
Differential Revision: D4897121
Pulled By: yiwu-arbug
fbshipit-source-id: 9811e043d1c01e10503593f3489d186c786ee7d7
Summary:
st_blocks shows 16 though the right value is 8. This happens occasionally, which seems to be a bug.
Closes https://github.com/facebook/rocksdb/pull/2160
Differential Revision: D4893542
Pulled By: lightmark
fbshipit-source-id: 68e832586b58bbc6162efbe83ce273f1570d5be3
Summary:
Prefetch some data from the end of the file for each compaction to reduce IO.
Closes https://github.com/facebook/rocksdb/pull/2149
Differential Revision: D4880576
Pulled By: lightmark
fbshipit-source-id: aa767cd1afc84c541837fbf1ad6c0d45b34d3932
Summary:
The concept of early exit in the write thread implementation is a confusing one. If early exit is allowed, the batch group leader is not responsible for exiting the batch group; the last finished writer is. If we need to mark the log synced, or we encounter a memtable insert error, early exit is disallowed.
This patch removes the concept by:
* In all cases, the last finished writer (not necessarily the leader) is responsible for exiting the batch group.
* In the case of parallel memtable writes, the leader also marks the log synced after its memtable insert and before signaling finish (calling `CompleteParallelWorker()`). The purpose is to let marking the log synced (which requires locking the mutex) run in parallel with memtable inserts in other writers.
* The last finished writer should handle memtable insert errors (update bg_error_) before exiting the batch group.
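The "last finished writer exits the group" rule can be sketched with an atomic countdown. This is an illustration only: `BatchGroup` and `CompleteWorker` are hypothetical names and do not reproduce the actual write thread code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical batch group: `running` counts writers that have not finished.
struct BatchGroup {
  std::atomic<int> running;

  explicit BatchGroup(int writers) : running(writers) {
    assert(writers > 0);
  }

  // Called by every writer (leader or follower) after its memtable insert.
  // Returns true for exactly one caller, the last one to finish, which is
  // then responsible for handling errors and exiting the batch group.
  bool CompleteWorker() {
    // fetch_sub returns the value held before the decrement.
    return running.fetch_sub(1) == 1;
  }
};
```

Because the decrement is atomic, exactly one writer observes the transition to zero regardless of the order in which writers finish, so no special case is needed for the leader.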
Closes https://github.com/facebook/rocksdb/pull/2134
Differential Revision: D4869667
Pulled By: yiwu-arbug
fbshipit-source-id: aec170847c85b90f4179d6a4608a4fe1361544e3
Summary:
When people are working off of a rocksdb fork, i.e. when their 'origin'
points to github.com/<username>/rocksdb, the script creates a new branch
and pushes to their origin. The new branch created by this script should
instead be pushed to github.com/facebook/rocksdb. Many people might
have named facebook/rocksdb remote as 'upstream' (or something else).
This fix provides an option to specify the remote to push the branch to.
The default is still 'origin'.
More context:
When I created 5.4 branch using this script, it got pushed to sagar0/rocksdb instead of facebook/rocksdb, as I was working off of a fork. My 'origin' was pointing to sagar0/rocksdb. My 'upstream' was set to 'facebook/rocksdb'. So, I had to manually push the branch to my 'upstream'.
Closes https://github.com/facebook/rocksdb/pull/2156
Differential Revision: D4885333
Pulled By: sagar0
fbshipit-source-id: 9410eab5bd9bbefc340059800bd6b8434406729d
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we enable direct IO for both reads and writes in flush and compaction jobs, whereas Options::use_direct_reads controls user reads such as iterators and Get().
Closes https://github.com/facebook/rocksdb/pull/2117
Differential Revision: D4860912
Pulled By: lightmark
fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
Summary:
BYTES_WRITTEN accounting doesn't work with disabled WAL. For example, this is what we
get in the LOG:
```
Cumulative writes: 9794K writes, 228M keys, 9794K commit groups, 1.0
writes per commit group, ingest: 0.00 GB, 0.00 MB/s
```
WAL bytes are tracked in a different statistic:
https://github.com/facebook/rocksdb/blob/master/db/internal_stats.h#L105.
BYTES_WRITTEN should count all the writes.
Closes https://github.com/facebook/rocksdb/pull/2133
Differential Revision: D4880615
Pulled By: yiwu-arbug
fbshipit-source-id: 8fd0b223099f3f5ad7df79d4e737d313687fec69
Summary:
To correct a build process where the JAVA_TEST_LIBDIR is a symlink to a cache directory.
Test -s (size 0) on symlinks returns true, so a mkdir is attempted over the top of the symlink, which fails.
As a solution -d checks if it is a directory (or the symlink refers to a directory), which works in the case of real directories and symlinks to directories.
Trivial I know but it was really easy for me to use a symlink here to prevent frequent downloads in a CI environment.
Thanks for your consideration.
Closes https://github.com/facebook/rocksdb/pull/1917
Differential Revision: D4612263
Pulled By: siying
fbshipit-source-id: 4d458f8e1760068cdd6b5eae4bce6e12c400df41
Summary: Build Java and RocksDB LITE as a customized unit test under internal_repo_rocksdb. One thing I'm not sure about is whether these two tests are triggered in every flavor.
Reviewed By: IslamAbdelRahman
Differential Revision: D4855868
fbshipit-source-id: 82a1628b458744d7692bbd29ef7424cca1294031
Summary:
Users usually set the readahead buffer to a multiple of 4k and, more than that, usually to a multiple of the block size.
Previously we set the real buffer size to 512 * n + 4k, which may introduce an additional block read.
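As a rough illustration of the sizing issue, the sketch below compares adding a spare alignment unit with rounding up to the alignment. It is an assumption that the fix amounts to rounding up; `RoundUp` and `PaddedSize` are hypothetical helpers, not the functions used in the readahead code.

```cpp
#include <cassert>
#include <cstddef>

// Rounds `x` up to the next multiple of `align`.
size_t RoundUp(size_t x, size_t align) {
  assert(align > 0);
  return ((x + align - 1) / align) * align;
}

// Always adds one extra alignment unit, even when `readahead` is
// already aligned, so the buffer spans one more block than requested.
size_t PaddedSize(size_t readahead, size_t align) {
  assert(align > 0);
  return readahead + align;
}
```

For a readahead of 8192 bytes and a 4096-byte alignment, `PaddedSize` yields 12288 while `RoundUp` keeps 8192, so an already aligned request does not grow by a block.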
Closes https://github.com/facebook/rocksdb/pull/2138
Differential Revision: D4871504
Pulled By: lightmark
fbshipit-source-id: b070faa51d92e976e8e8468c00692699e585e243
Summary:
Filter the warning out and only print it once.
Closes https://github.com/facebook/rocksdb/pull/2137
Differential Revision: D4870925
Pulled By: lightmark
fbshipit-source-id: 91b363ce7f70bce88b0780337f408fc4649139b8
Summary:
In some CI test environments, compression libraries can't be built successfully. It still helps to build RocksDB there. Provide an option to skip downloading and building compression libraries.
Closes https://github.com/facebook/rocksdb/pull/2135
Differential Revision: D4872617
Pulled By: siying
fbshipit-source-id: bb21ac373bc62a2528cdf1ca4547e05fcae86214
Summary:
Moved MergeOperatorPinning tests from db_test2.cc to db_merge_operator_test.cc.
[This is the same code as PR #2104 , which has already been reviewed, but I am creating a new PR as I cannot import from #2104 onto phabricator anymore even after rebasing. I'll close and discard #2104.]
Closes https://github.com/facebook/rocksdb/pull/2125
Differential Revision: D4863312
Pulled By: sagar0
fbshipit-source-id: 0f71a7690aa09c1d03ee85ce2bc1d2d89e4f4399
Summary:
Currently the level histogram is only printed for DB stats and for the default CF. This is confusing. Change to print it for every CF instead.
Closes https://github.com/facebook/rocksdb/pull/2126
Differential Revision: D4865373
Pulled By: siying
fbshipit-source-id: 1c853e0ac66e00120ee931cabc9daf69ccc2d577
Summary:
Upgrading a shared lock was silently succeeding because the actual locking code was skipped. This is because if the keys are tracked, it is assumed that they are already locked and do not require locking. Fix this by recording in tracked keys whether the key was locked exclusively or not.
Note that lock downgrades are impossible, which is the behaviour we expect.
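A minimal sketch of the idea, assuming a per-transaction map from key to an "exclusive" flag. `TrackedKeys`, `NeedsLock`, and `Track` are hypothetical names and do not mirror the actual transaction code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-transaction record of tracked keys.
// The mapped value says whether the key is held exclusively.
struct TrackedKeys {
  std::map<std::string, bool> exclusive_by_key;

  // Returns true if the lock manager must actually be called.
  bool NeedsLock(const std::string& key, bool want_exclusive) const {
    auto it = exclusive_by_key.find(key);
    if (it == exclusive_by_key.end()) return true;  // not tracked yet
    // Tracked: only a shared -> exclusive upgrade needs real locking.
    return want_exclusive && !it->second;
  }

  // Records a successful lock; an exclusive entry is never downgraded.
  void Track(const std::string& key, bool exclusive) {
    bool& held_exclusive = exclusive_by_key[key];  // defaults to false
    held_exclusive = held_exclusive || exclusive;
  }
};

void Example() {
  TrackedKeys t;
  t.Track("k", /*exclusive=*/false);
  assert(!t.NeedsLock("k", false));  // shared again: already covered
  assert(t.NeedsLock("k", true));    // upgrade: must not be skipped
  t.Track("k", /*exclusive=*/true);
  t.Track("k", /*exclusive=*/false);
  assert(!t.NeedsLock("k", true));   // still exclusive, no downgrade
}
```

Without the flag, the second check would also return false and the upgrade would silently appear to succeed.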
This fixes facebook/mysql-5.6#587.
Closes https://github.com/facebook/rocksdb/pull/2122
Differential Revision: D4861489
Pulled By: IslamAbdelRahman
fbshipit-source-id: 58c7ebe7af098bf01b9774b666d3e9867747d8fd
Summary:
Extend TransactionOptions to include max_write_batch_size, which determines the maximum size of the write batch representation. If the memory limit is exceeded, the operation will abort with subcode kMemoryLimit.
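The behaviour can be modelled with a small size-limited batch. This is a sketch under assumptions: `LimitedBatch` and the `Code` enum are hypothetical, the size accounting is simplified to key plus value bytes, and treating a limit of 0 as "no limit" is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class Code { kOk, kMemoryLimit };

// Hypothetical size-limited batch; a limit of 0 is treated as "no limit".
class LimitedBatch {
 public:
  explicit LimitedBatch(size_t max_bytes) : max_bytes_(max_bytes) {}

  // Appends the entry unless doing so would exceed the limit.
  // On rejection the batch is left unchanged.
  Code Put(const std::string& key, const std::string& value) {
    size_t new_size = rep_.size() + key.size() + value.size();
    if (max_bytes_ != 0 && new_size > max_bytes_) {
      return Code::kMemoryLimit;
    }
    rep_ += key;
    rep_ += value;
    return Code::kOk;
  }

  size_t Size() const { return rep_.size(); }

 private:
  size_t max_bytes_;
  std::string rep_;
};

void Example() {
  LimitedBatch batch(10);
  assert(batch.Put("abc", "def") == Code::kOk);            // 6 bytes used
  assert(batch.Put("ghij", "k") == Code::kMemoryLimit);    // 11 > 10
  assert(batch.Size() == 6);                               // unchanged
  assert(batch.Put("gh", "ij") == Code::kOk);              // exactly 10
}
```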
Closes https://github.com/facebook/rocksdb/pull/2124
Differential Revision: D4861842
Pulled By: lth
fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161
Summary:
Run the time command before regression tests, parse the output, and add the numbers to the report.
Closes https://github.com/facebook/rocksdb/pull/2101
Differential Revision: D4862781
Pulled By: maysamyabandeh
fbshipit-source-id: 4a81caa5d14187d67093aad154c8f0ad56aba901
Summary:
Also did minor refactoring.
Closes https://github.com/facebook/rocksdb/pull/2115
Differential Revision: D4855818
Pulled By: maysamyabandeh
fbshipit-source-id: fbca6ac57e5c6677fffe8354f7291e596a50cb77
Summary:
DBIter, and in-turn NewDBIterator and NewArenaWrappedDBIterator, take a bunch of params. They can be reduced by passing in ReadOptions directly instead of passing in every new param separately. It also seems much cleaner as a bunch of the params towards the end seem to be optional.
(Recently I introduced max_skippable_internal_keys, which added one more to the already huge count).
Idea courtesy IslamAbdelRahman
Closes https://github.com/facebook/rocksdb/pull/2116
Differential Revision: D4857128
Pulled By: sagar0
fbshipit-source-id: 7d239df094b94bd9ea79d145cdf825478ac037a8
Summary:
The compiler error:
```
/home/jenkins/workspace/ceph-master/src/rocksdb/db/db_impl.cc:20:10: fatal error: 'jemalloc/jemalloc.h' file not found
^
1 error generated.
```
But it does compile with `WITH_JEMALLOC` set.
So ignore all the other settings.
Closes https://github.com/facebook/rocksdb/pull/2118
Differential Revision: D4858387
Pulled By: yiwu-arbug
fbshipit-source-id: 05b982969dcab53669a73a903641e71641c714e7
Summary:
Check the result of the benchmark against a specified truth_db, which is
expected to be produced using the same benchmark but perhaps on a
different commit or with different configs.
The verification is simple and assumes that key/values are generated
deterministically. This assumption would break if db_bench used random
variables differently from the benchmark that produced truth_db.
It has currently been checked to work with fillrandom and readwhilewriting.
A param finish_after_writes is added to ensure that the background
writing thread will write the same number of entries between two
benchmarks.
Example:
$ TEST_TMPDIR=/dev/shm/truth_db ./db_bench \
    --benchmarks="fillrandom,readwhilewriting" --num=200000 \
    --finish_after_writes=true
$ TEST_TMPDIR=/dev/shm/tmpdb ./db_bench \
    --benchmarks="fillrandom,readwhilewriting,verify" --truth_db \
    /dev/shm/truth_db/dbbench --num=200000 --finish_after_writes=true
Verifying db <= truth_db...
Verifying db >= truth_db...
...Verified
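The two passes in the output above ("db <= truth_db" and "db >= truth_db") can be read as a subset check in each direction. The sketch below models both databases as in-memory maps; `KV`, `IsSubset`, and `Verify` are hypothetical names, not the db_bench implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

using KV = std::map<std::string, std::string>;

// True if every key/value pair in `a` is also present in `b` ("a <= b").
bool IsSubset(const KV& a, const KV& b) {
  for (const auto& kv : a) {
    auto it = b.find(kv.first);
    if (it == b.end() || it->second != kv.second) return false;
  }
  return true;
}

// Both directions must hold for the databases to match.
bool Verify(const KV& db, const KV& truth_db) {
  return IsSubset(db, truth_db) && IsSubset(truth_db, db);
}

void Example() {
  KV truth = {{"a", "1"}, {"b", "2"}};
  KV same = truth;
  KV missing = {{"a", "1"}};
  assert(Verify(same, truth));
  assert(IsSubset(missing, truth));  // db <= truth_db holds
  assert(!Verify(missing, truth));   // db >= truth_db fails
}
```

Such a check is only meaningful when both runs generate the same keys and values, which is why the deterministic-generation assumption matters.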
Closes https://github.com/facebook/rocksdb/pull/2098
Differential Revision: D4839233
Pulled By: maysamyabandeh
fbshipit-source-id: 2f4ed31