
7570 Commits

Author SHA1 Message Date
Andrew Kryczka
c94523ee56 Delete code for WAL reader to start at nonzero offset (#4362)
Summary:
The code is dead in RocksDB as `log::Reader::initial_offset_` is always zero. We should delete it so we don't have to maintain it like in #4359.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4362

Differential Revision: D9817829

Pulled By: ajkr

fbshipit-source-id: 474a2c679e5bd273b40608f3a5332931d9eefe6d
2018-09-13 17:13:03 -07:00
kckjn97
902261519e correct mistyped msg. (#4341)
Summary:
corrected the mistyped message.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4341

Differential Revision: D9816571

Pulled By: ajkr

fbshipit-source-id: 1df0424e981a01470a638a37b925c4133d59a48b
2018-09-13 14:57:38 -07:00
Vitaly Isaev
0bd2ede10e Memory usage stats in C API (#4340)
Summary:
Please consider this small PR, which exposes the `MemoryUsage::GetApproximateMemoryUsageByType` function in the plain C API. I'm working on a Go application and investigating the reasons for its high memory consumption (#4313). The Go [wrappers](https://github.com/tecbot/gorocksdb) are built on top of the RocksDB C API. According to #706, `MemoryUsage::GetApproximateMemoryUsageByType` is the best option for getting the database's internal memory usage stats, but it was not yet supported in the C API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4340

Differential Revision: D9655135

Pulled By: ajkr

fbshipit-source-id: a3d2f3f47c143ae75862fbcca2f571ea1b49e14a
2018-09-13 14:27:31 -07:00
Maysam Yabandeh
9ea9007b50 Reduce IndexBlockIter size (#4358)
Summary:
With #3983 the size of IndexBlockIter was increased. This resulted in a P50 latency regression in one of our benchmarks. The patch reduces IndexBlockIter size by eliminating the active_comparator_ field from the class.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4358

Differential Revision: D9781737

Pulled By: maysamyabandeh

fbshipit-source-id: 71e2b28d90ff0813db9e04b737ae73e185583c52
2018-09-12 10:03:35 -07:00
Dan Melnic
ca92fc71a4 Initialize uninitialized std::atomic variables
Summary: Initialize uninitialized std::atomic variables
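Below is a minimal sketch of the pattern. It uses a hypothetical `Stats` struct, not the RocksDB classes this commit touched. Before C++20, a default-constructed `std::atomic<T>` that is not in static storage holds an indeterminate value, so each member gets an explicit initializer.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical counters struct; not a RocksDB type.
struct Stats {
  // Brace initializers give every atomic a defined starting value, even
  // when a Stats object lives on the stack or the heap.
  std::atomic<uint64_t> hits{0};
  std::atomic<bool> closing{false};
};

// Increments the hit counter and returns the new value.
uint64_t RecordHit(Stats& s) {
  return s.hits.fetch_add(1, std::memory_order_relaxed) + 1;
}
```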

Reviewed By: yfeldblum

Differential Revision: D9758050

fbshipit-source-id: 865d89eddafc81f3cab6f11e2ebb669f7ff70d04
2018-09-12 08:58:05 -07:00
Yanqin Jin
3ba3b153ef Fix Makefile target 'jtest' on PowerPC (#4357)
Summary:
Before the fix:
On a PowerPC machine, run the following
```
$ make jtest
```
The command will fail due to "undefined symbol: crc32c_ppc". It was caused by
the 'rocksdbjava' Makefile target not including the crc32c_ppc object files when
generating the shared lib. The fix is simply to include those object files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357

Differential Revision: D9779474

Pulled By: riversand963

fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51
2018-09-11 16:37:23 -07:00
Philip Jameson
dbf44c314b Lint TARGETS files with buildifier
Summary: Build file formatting

Reviewed By: mzlee

Differential Revision: D9728238

fbshipit-source-id: 99a266d5d2260eabfd63a200b2994c6850b59cf4
2018-09-11 14:58:19 -07:00
Abhishek Madan
c86a22ac43 Restrict RangeDelAggregator's tombstone end-key truncation (#4356)
Summary:
`RangeDelAggregator::AddTombstones` contained an assertion which stated that, if a range tombstone extended past the largest key in the sstable, then `FileMetaData::largest` must have a sentinel sequence number of `kMaxSequenceNumber`, which implies that the tombstone's end key is safe to truncate. However, `largest` will not be a sentinel key when the smallest key of the next sstable in the level equals the current sstable's largest key, which caused the assertion to fail.

The assertion must hold for the truncation to be safe, so it has been moved to an additional check on end-key truncation.
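Below is a simplified sketch of the guarded truncation. It assumes user keys are plain strings and that `kMaxSequenceNumber` is the sentinel. `FileBoundary` and `TruncatedEndKey` are hypothetical names, not the `RangeDelAggregator` interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Sentinel sequence number; RocksDB uses the top of a 56-bit space.
constexpr uint64_t kMaxSequenceNumber = (uint64_t{1} << 56) - 1;

// Hypothetical stand-in for an sstable's largest internal key.
struct FileBoundary {
  std::string user_key;
  uint64_t seq;
};

// Returns the end key to use for a tombstone read from a file whose largest
// key is `largest`. The end key is truncated to the file boundary only when
// the boundary carries the sentinel sequence number. Otherwise the next file
// may start with the same user key, so the original end key is kept.
std::string TruncatedEndKey(const std::string& end,
                            const FileBoundary& largest) {
  const bool extends_past_file = end > largest.user_key;
  const bool safe_to_truncate = largest.seq == kMaxSequenceNumber;
  if (extends_past_file && safe_to_truncate) {
    return largest.user_key;
  }
  return end;
}
```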
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4356

Differential Revision: D9760891

Pulled By: abhimadan

fbshipit-source-id: 7c20c3885cd919dcd14f291f88fd27aa33defebc
2018-09-10 17:42:43 -07:00
Maysam Yabandeh
3f5282268f Skip concurrency control during recovery of pessimistic txn (#4346)
Summary:
TransactionOptions::skip_concurrency_control allows pessimistic transactions to skip the overhead of concurrency control. This can be used as an optimization if the application knows that the transaction will not conflict with any concurrent transactions. It is currently used during recovery, assuming that (i) the application guarantees no conflicts between prepared transactions in the WAL, and (ii) the application guarantees that recovered transactions will be rolled back or committed before new transactions start.
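Below is a minimal sketch of what skipping concurrency control means for a pessimistic transaction, built on a hypothetical point-lock table. Only the flag name mirrors `TransactionOptions::skip_concurrency_control`; the real lock manager is more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>

// Hypothetical options struct; only the flag name mirrors RocksDB's.
struct TxnOptions {
  bool skip_concurrency_control = false;
};

// Hypothetical point-lock table for pessimistic transactions.
class LockTable {
 public:
  // Returns true if the caller may write `key`.
  bool TryLock(const TxnOptions& opts, const std::string& key) {
    if (opts.skip_concurrency_control) {
      // The application guarantees there are no conflicts (e.g. while
      // recovering prepared transactions), so no lock is taken.
      return true;
    }
    std::lock_guard<std::mutex> guard(mu_);
    return locked_.insert(key).second;  // false if already held
  }

  size_t NumLocked() {
    std::lock_guard<std::mutex> guard(mu_);
    return locked_.size();
  }

 private:
  std::mutex mu_;
  std::set<std::string> locked_;
};
```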
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4346

Differential Revision: D9759149

Pulled By: maysamyabandeh

fbshipit-source-id: f896e84fa58b0b584be904c7fd3883a41ea3215b
2018-09-10 16:57:53 -07:00
Kefu Chai
faf529fd7c env_librados.h: drop redundant #endif (#4354)
Summary:
Without this change, rocksdb_env_librados_test fails to build.

It's a regression introduced by 64324e32.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4354

Differential Revision: D9702665

Pulled By: riversand963

fbshipit-source-id: 65134eaff0543733210edfc77f89c96709da7a3f
2018-09-07 11:12:44 -07:00
Maysam Yabandeh
655ef7d77f Inline doc for format_version 4 (#4350)
Summary:
Fixes #4337
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4350

Differential Revision: D9700871

Pulled By: maysamyabandeh

fbshipit-source-id: fe1e07803783f34588dc14aba66d51117ca4a180
2018-09-07 07:57:30 -07:00
Anand Ananthabhotla
ced618cf39 Fix a lint error due to unspecified move evaluation order (#4348)
Summary:
In C++11, the order of argument and move evaluation in a statement such
as the one below is unspecified:
  foo(a.b).bar(std::move(a))
The compiler is free to initialize bar's argument, moving from `a`, before
evaluating foo(a.b), in which case a.b is read from a moved-from object.

In C++17, this is safe: the refined evaluation order rules (proposal P0145)
sequence the object expression foo(a.b) before the arguments of bar.
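The following small example shows the hazard and a fix that works under any standard. The types `Record` and `Chain` and the function `foo` are hypothetical stand-ins for the real ones. Reading `a.b` in a separate statement before the move makes the order explicit.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical types used only to illustrate the pattern.
struct Record {
  std::string b;
};

struct Chain {
  std::string captured;  // copy of the value foo() received
  // bar() takes its argument by value, so calling it with std::move(a)
  // move-constructs a Record from `a` when the parameter is initialized.
  std::string bar(Record r) {
    (void)r;
    return captured;
  }
};

Chain foo(const std::string& s) { return Chain{s}; }

// Risky before C++17: bar's parameter may be initialized (moving out of
// `a`) before foo(a.b) runs, so foo() could see a moved-from string.
std::string Risky(Record a) { return foo(a.b).bar(std::move(a)); }

// Portable fix: finish reading a.b in its own statement, then move.
std::string Safe(Record a) {
  Chain c = foo(a.b);
  return c.bar(std::move(a));
}
```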
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4348

Differential Revision: D9688810

Pulled By: anand1976

fbshipit-source-id: e4651d0ca03dcf007e50371a0fc72c0d1e710fb4
2018-09-06 14:42:57 -07:00
Andrew Kryczka
2c14662213 Revert "Digest ZSTD compression dictionary once per SST file (#4251)" (#4347)
Summary:
Reverting is needed to unblock a user building against master, who has been blocked for several days due to a thread-safety issue in `GetEmptyDict`. We haven't been able to fix it quickly, so we are reverting.

Simply ran `git revert 6c40806e51a89386d2b066fddf73d3fd03a36f65`. There were no merge conflicts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4347

Differential Revision: D9668365

Pulled By: ajkr

fbshipit-source-id: 0c56334f0a23cf5ee0233d4e4679eae6709739cd
2018-09-06 09:58:34 -07:00
cngzhnp
64324e329e Support pragma once in all header files and cleanup some warnings (#4339)
Summary:
Almost all compilers support `#pragma once` as an alternative to include guards. To keep header files consistent, all header files have been updated to use it.

Besides this, the PR fixes some warnings about possible loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339

Differential Revision: D9654990

Pulled By: ajkr

fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
2018-09-05 18:13:31 -07:00
Yanqin Jin
90f5048207 Remove warnings caused by unused variables in jni (#4345)
Summary:
Test plan
```
$make clean jclean
$make -j32 rocksdbjavastatic
$make -j32 rocksdbjava
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4345

Differential Revision: D9661256

Pulled By: riversand963

fbshipit-source-id: aed316c53b29d02fbdd3fa1063a3e832b8a66469
2018-09-05 13:42:34 -07:00
Andrew Kryczka
1a88c43751 Reduce empty SST creation/deletion in compaction (#4336)
Summary:
This is a followup to #4311. Checking `!RangeDelAggregator::IsEmpty()` before opening a dedicated range tombstone SST did not properly prevent empty SSTs from being generated. That's because it relies on `CollapsedRangeDelMap::Size`, which had an underflow bug when the map was empty. This PR fixes that underflow bug.
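Below is a sketch of the underflow. For illustration only, it assumes a collapsed map whose interval count is one less than its number of boundary entries. `CollapsedMapSketch` is hypothetical and is not the real `CollapsedRangeDelMap`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical collapsed map: each entry marks where a new sequence number
// takes effect, so N covered intervals need N + 1 boundary entries.
class CollapsedMapSketch {
 public:
  void AddBoundary(const std::string& key, uint64_t seq) { rep_[key] = seq; }

  // Buggy: size_t arithmetic wraps to SIZE_MAX when rep_ is empty, so an
  // empty map looks non-empty to an IsEmpty()-style check.
  size_t SizeBuggy() const { return rep_.size() - 1; }

  // Fixed: guard the subtraction.
  size_t Size() const { return rep_.empty() ? 0 : rep_.size() - 1; }

  bool IsEmpty() const { return Size() == 0; }

 private:
  std::map<std::string, uint64_t> rep_;
};
```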

Also fixed an uninitialized variable in db_stress.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4336

Differential Revision: D9600080

Pulled By: ajkr

fbshipit-source-id: bc6980ca79d2cd01b825ebc9dbccd51c1a70cfc7
2018-08-31 12:28:52 -07:00
Yi Wu
462ed70d64 BlobDB: GetLiveFiles and GetLiveFilesMetadata return relative path (#4326)
Summary:
`GetLiveFiles` and `GetLiveFilesMetadata` should return paths relative to the DB path.

How to return a relative path when `path_relative` is false is a separate issue; `DBImpl::GetLiveFiles` doesn't handle that case well either when there are multiple `db_paths`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4326

Differential Revision: D9545904

Pulled By: yiwu-arbug

fbshipit-source-id: 6762d879fcb561df2b612e6fdfb4a6b51db03f5d
2018-08-31 12:12:49 -07:00
Zhongyi Xie
1cf17ba53b Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323)
Summary:
Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc define `DecodeCFAndKey` in an anonymous namespace. That would normally be fine, but the unity test concatenates all source files together, so the two definitions now conflict.
Another issue with trace_analyzer_tool.cc is that it uses some utility functions from ldb_cmd, which is not included in the Makefile for unity_test. I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323

Differential Revision: D9599170

Pulled By: miasantreble

fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
2018-08-30 18:42:51 -07:00
Yi Wu
3e801e5ed1 BlobDB: Improve info log (#4324)
Summary:
Improve BlobDB info logs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4324

Differential Revision: D9545074

Pulled By: yiwu-arbug

fbshipit-source-id: 678ab8820a78758fee451be3b123b0680c1081df
2018-08-30 11:57:46 -07:00
Sagar Vemuri
f46dd5cbeb Remove trace_analyzer_tool from LIB_SOURCES (#4331)
Summary:
trace_analyzer_tool should only be in ANALYZER_LIB_SOURCES and not in LIB_SOURCES.
This fixes java_test travis build failures seen in jtest.
Blame: a6d3de4e7a
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4331

Differential Revision: D9560377

Pulled By: sagar0

fbshipit-source-id: 6b9636201a920b56ee0f61e367fee5d3dca692b0
2018-08-29 21:28:40 -07:00
Wez Furlong
d00e5de7fc use atomic O_CLOEXEC when available (#4328)
Summary:
In our application we spawn helper child processes concurrently with
opening RocksDB. In one situation I observed that a child process had inherited
the RocksDB lock file as well as directory handles to the RocksDB storage location.

The code in env_posix takes care to set CLOEXEC but doesn't use `O_CLOEXEC` at the
time that the files are opened which means that there is a window of opportunity
to leak the descriptors across a fork/exec boundary.

This diff introduces a helper that can conditionally set the `O_CLOEXEC` bit for
the open call using the same logic as that in the existing helper for setting
that flag post-open.

I've preserved the post-open logic for systems that don't have `O_CLOEXEC`.

I've introduced setting `O_CLOEXEC` for what appears to be a number of temporary
or transient files and directory handles; I suspect that none of the files
opened by Rocks are intended to be inherited by a forked child process.

In one case, `fopen` is used to open a file. I've added the glibc-specific `e`
mode flag to turn on `O_CLOEXEC` for this case. While this doesn't cover all POSIX systems,
it is an improvement for our common deployment system.
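Below is a minimal POSIX sketch of the approach described above. The helper names are hypothetical, not the ones in env_posix. For `fopen`, the glibc-specific `e` mode flag (for example `fopen(path, "re")`) has the same effect as `O_CLOEXEC`.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Adds O_CLOEXEC to open(2) flags when the platform defines it, so the
// descriptor is never inheritable, even briefly, across fork/exec.
int CloexecFlags(int flags) {
#ifdef O_CLOEXEC
  flags |= O_CLOEXEC;
#endif
  return flags;
}

// Fallback for platforms without O_CLOEXEC: set FD_CLOEXEC after open(2).
// This leaves a window in which a concurrent fork/exec can leak the fd.
void SetCloexecPostOpen(int fd) {
#ifndef O_CLOEXEC
  if (fd >= 0) {
    int flags = fcntl(fd, F_GETFD);
    if (flags != -1) fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
  }
#else
  (void)fd;  // already handled atomically by open(2)
#endif
}

int OpenNoInherit(const char* path, int flags) {
  int fd = open(path, CloexecFlags(flags));
  SetCloexecPostOpen(fd);
  return fd;
}
```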
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4328

Reviewed By: ajkr

Differential Revision: D9553046

Pulled By: wez

fbshipit-source-id: acdb89f7a85ca649b22fe3c3bd76f82142bec2bf
2018-08-29 20:27:43 -07:00
Mikhail Antonov
927f274939 Avoiding write stall caused by manual flushes (#4297)
Summary:
Basically, at the moment it seems possible to cause a write stall by calling flush (either manually via DB::Flush(), or from Backup Engine directly calling FlushMemTable()) while a background flush may already be happening.

One way to fix it: DBImpl::CompactRange() already checks for a possible stall and delays the flush if needed before it actually calls FlushMemTable(). We can simply move this delay logic into a separate method and call it from FlushMemTable.

This is a draft patch for a first look; I still need to check tests, update SyncPoints, and most likely add an allow_write_stall option to FlushOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4297

Differential Revision: D9420705

Pulled By: mikhail-antonov

fbshipit-source-id: f81d206b55e1d7b39e4dc64242fdfbceeea03fcc
2018-08-29 12:12:55 -07:00
Fenggang Wu
5f63a89b35 data block hash index blog post
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4309

Differential Revision: D9557843

Pulled By: sagar0

fbshipit-source-id: 190e4ccedfaeaacd96d945610de843f97c307540
2018-08-29 10:58:10 -07:00
Philip Jameson
a876995ed4 Grab straggler files to explicitly import AutoHeaders
Summary: There were a few files that were missed when AutoHeaders were moved to their own file. Add explicit loads

Reviewed By: yfeldblum

Differential Revision: D9499942

fbshipit-source-id: 942bf3a683b8961e1b6244136f6337477dcc45af
2018-08-28 21:28:55 -07:00
Andrew Kryczka
42733637e1 Sync CURRENT file during checkpoint (#4322)
Summary: For the CURRENT file forged during checkpoint, we were forgetting to `fsync` or `fdatasync` it after its creation. This PR fixes it.
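Below is a minimal POSIX sketch of writing a small file such as CURRENT and syncing it before reporting success. `WriteFileSynced` is a hypothetical helper, not RocksDB's checkpoint code. Making a newly created directory entry durable may also require syncing the containing directory, which is not shown here.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: writes `contents` to `path` and syncs it.
// Returns true on success.
bool WriteFileSynced(const std::string& path, const std::string& contents) {
  int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  size_t off = 0;
  while (off < contents.size()) {
    ssize_t n = write(fd, contents.data() + off, contents.size() - off);
    if (n < 0) {
      if (errno == EINTR) continue;  // retry interrupted writes
      close(fd);
      return false;
    }
    off += static_cast<size_t>(n);
  }
  // Without this call the data may exist only in the page cache, and a
  // crash could leave the file empty or truncated.
  const bool synced = fsync(fd) == 0;
  const bool closed = close(fd) == 0;
  return synced && closed;
}
```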

Differential Revision: D9525939

Pulled By: ajkr

fbshipit-source-id: a505483644026ee3f501cfc0dcbe74832165b2e3
2018-08-28 12:43:18 -07:00
Yi Wu
38ad3c9f8a BlobDB: Avoid returning garbage value on key not found (#4321)
Summary:
When reading an expired key using the `Get(..., std::string* value)` API, BlobDB first reads the index entry and decodes the expiration from it. In this case, although BlobDB resets the PinnableSlice, the index entry is stored in the user-provided string `value`, so it is returned as a garbage value even though the status is NotFound. This PR fixes it by using a separate PinnableSlice to read the index entry.
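Below is a simplified model of the fix. It uses a hypothetical in-memory `Store` whose index entries encode `<expiration>:<value>`, and it decodes the index entry in a scratch buffer so the caller's string stays empty on NotFound. It does not use the real BlobDB or `PinnableSlice` types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

enum class Status { kOk, kNotFound };

// Hypothetical store; index entries look like "<expiration>:<value>".
struct Store {
  std::map<std::string, std::string> index;
  uint64_t now = 0;

  Status Get(const std::string& key, std::string* value) {
    value->clear();
    auto it = index.find(key);
    if (it == index.end()) return Status::kNotFound;
    // Decode the raw index entry in a scratch buffer, not in *value, so an
    // expired key never leaves index bytes in the caller's string.
    const std::string scratch = it->second;
    const size_t colon = scratch.find(':');
    const uint64_t expiration = std::stoull(scratch.substr(0, colon));
    if (expiration <= now) return Status::kNotFound;
    value->assign(scratch, colon + 1, std::string::npos);
    return Status::kOk;
  }
};
```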
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4321

Differential Revision: D9519042

Pulled By: yiwu-arbug

fbshipit-source-id: f054c951a1fa98265228be94f931904ed7056677
2018-08-27 16:28:39 -07:00
Jay Lee
6ed7f146c3 cmake: allow opting out debug runtime (#4317)
Summary:
Projects built in the debug profile don't always link to the debug runtime.
Allow opting out of the debug runtime so that RocksDB gets along well
with other projects.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4317

Differential Revision: D9518038

Pulled By: sagar0

fbshipit-source-id: 384901a0d12b8de20759756e8a19b4888a27c399
2018-08-27 15:58:59 -07:00
Yi Wu
a6d3de4e7a BlobDB: Implement DisableFileDeletions (#4314)
Summary:
`DB::DisableFileDeletions` and `DB::EnableFileDeletions` let applications stop RocksDB background jobs from deleting files while the applications are doing replication. Implement these methods for BlobDB. `DeleteObsoleteFiles` now needs to check `disable_file_deletions_` before starting, and holds `delete_file_mutex_` the whole time it is running. `DisableFileDeletions` needs to wait on `delete_file_mutex_` for any running `DeleteObsoleteFiles` job and then set the `disable_file_deletions_` flag.
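Below is a minimal model of the locking protocol described above. `ObsoleteFileDeleter` is hypothetical, and file numbers stand in for actual files.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical model; not the BlobDB implementation.
class ObsoleteFileDeleter {
 public:
  // Background job: holds the mutex for its whole run and does nothing if
  // deletions are disabled. Returns the number of files "deleted".
  size_t DeleteObsoleteFiles() {
    std::lock_guard<std::mutex> guard(delete_file_mutex_);
    if (disable_file_deletions_) return 0;
    size_t n = obsolete_.size();
    obsolete_.clear();  // stands in for unlinking the files
    return n;
  }

  // Blocks until any running DeleteObsoleteFiles() finishes, then sets the
  // flag so later runs skip deletion.
  void DisableFileDeletions() {
    std::lock_guard<std::mutex> guard(delete_file_mutex_);
    disable_file_deletions_ = true;
  }

  void EnableFileDeletions() {
    std::lock_guard<std::mutex> guard(delete_file_mutex_);
    disable_file_deletions_ = false;
  }

  void MarkObsolete(int file_number) {
    std::lock_guard<std::mutex> guard(delete_file_mutex_);
    obsolete_.push_back(file_number);
  }

 private:
  std::mutex delete_file_mutex_;
  bool disable_file_deletions_ = false;
  std::vector<int> obsolete_;
};
```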
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4314

Differential Revision: D9501373

Pulled By: yiwu-arbug

fbshipit-source-id: 81064c1228f1724eff46da22b50ff765b16292cd
2018-08-27 10:58:29 -07:00
Sagar Vemuri
2f871bc85e Download bzip2 packages from Internet Archive (#4306)
Summary:
Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the Internet Archive until we figure out a more credible source.

Fixes issue: #4305
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4306

Differential Revision: D9514868

Pulled By: sagar0

fbshipit-source-id: 57c6a141a62e652f94377efc7ca9916b458e68d5
2018-08-27 09:58:24 -07:00
Yanqin Jin
198459ce17 Fix an inaccurate comment (#4315)
Summary:
According to 4848bd0c4e/db/log_reader.cc (L355), the original text is misleading when describing the layout of RecyclableLogHeader.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4315

Differential Revision: D9505284

Pulled By: riversand963

fbshipit-source-id: 79994c37a69e7003f03453e7efc0186feeafa609
2018-08-24 18:13:20 -07:00
Shrikanth Shankar
4848bd0c4e Drop unnecessary deletion markers during compaction (issue - 3842) (#4289)
Summary:
This PR fixes issue 3842. We drop deletion markers iff
1. We are at the bottommost level AND
2. All other occurrences of the key are in the same snapshot range as the delete
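Below is a sketch of the two conditions. It assumes `snapshots` holds the sorted sequence numbers of live snapshots. `StripeOf` and `CanDropDeletion` are hypothetical helpers, not RocksDB's compaction iterator.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns the snapshot "stripe" of an entry with sequence number `seq`: the
// number of snapshots older than it. A snapshot with sequence number s sees
// entries with seq <= s. `snapshots` must be sorted ascending.
size_t StripeOf(uint64_t seq, const std::vector<uint64_t>& snapshots) {
  auto it = std::lower_bound(snapshots.begin(), snapshots.end(), seq);
  return static_cast<size_t>(it - snapshots.begin());
}

// A deletion marker can be dropped iff the compaction outputs to the
// bottommost level and no snapshot separates the marker from any other
// version of the same key.
bool CanDropDeletion(bool bottommost_level, uint64_t delete_seq,
                     const std::vector<uint64_t>& other_seqs,
                     const std::vector<uint64_t>& snapshots) {
  if (!bottommost_level) return false;
  const size_t stripe = StripeOf(delete_seq, snapshots);
  for (uint64_t seq : other_seqs) {
    if (StripeOf(seq, snapshots) != stripe) return false;
  }
  return true;
}
```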

I've also enhanced db_stress_test to add an option that does a full comparison of the keys. This is done by a single thread (thread #0). Tests I've run so far:

make check -j64
./db_stress
./db_stress --acquire_snapshot_one_in=1000 --ops_per_thread=100000  # verify that the new code doesn't break existing tests
./db_stress --compare_full_db_state_snapshot=true --acquire_snapshot_one_in=1000 --ops_per_thread=100000  # verify the new test code
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4289

Differential Revision: D9491165

Pulled By: shrikanthshankar

fbshipit-source-id: ce144834f31736c189aaca81bed356ba990331e2
2018-08-24 15:17:54 -07:00
Yanqin Jin
8022500ecc Add compatibility test of SST ingestion (#4310)
Summary:
Test plan
```
$cd rocksdb/
$./tools/check_format_compatible.sh
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4310

Differential Revision: D9498125

Pulled By: riversand963

fbshipit-source-id: 83cf6992949a52199e7812bb41bc9281ac271a24
2018-08-24 14:27:43 -07:00
Yanqin Jin
7daae512d2 Refactor flush request queueing and processing (#3952)
Summary:
RocksDB currently queues individual column families for flushing. This is not sufficient for applications that want to enforce order or dependencies between column families, given that multiple foreground and background activities can trigger flushing in RocksDB.

This PR aims to address this limitation. Each flush request is described as a `FlushRequest` that can contain multiple column families. A background flushing thread pops one flush request from the queue at a time and processes it.

This PR does not enable atomic_flush yet, but is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
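Below is a single-threaded sketch of the queueing model, with synchronization omitted. `FlushRequest` here is a hypothetical struct holding column family names, not the real type from the PR.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request: several column families that are flushed together.
struct FlushRequest {
  std::vector<std::string> column_families;
};

class FlushScheduler {
 public:
  void Schedule(FlushRequest req) { queue_.push_back(std::move(req)); }

  // Background thread body: pop exactly one request and flush all of its
  // column families before moving to the next request. Returns false if
  // the queue was empty.
  bool ProcessOne(std::vector<std::string>* flushed) {
    if (queue_.empty()) return false;
    FlushRequest req = std::move(queue_.front());
    queue_.pop_front();
    for (const auto& cf : req.column_families) flushed->push_back(cf);
    return true;
  }

 private:
  std::deque<FlushRequest> queue_;
};
```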
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3952

Differential Revision: D8529933

Pulled By: riversand963

fbshipit-source-id: 78908a21e389a3a3f7de2a79bae0cd13af5f3539
2018-08-24 13:27:35 -07:00
Andrew Kryczka
17f9a181d5 Reduce empty SST creation/deletion during compaction (#4311)
Summary:
I have a PR to start calling `OnTableFileCreated` for empty SSTs: #4307. However, it is a behavior change so should not go into a patch release.

This PR adds back a check to make sure range deletions at least exist before starting file creation. This PR should be safe to backport to earlier versions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4311

Differential Revision: D9493734

Pulled By: ajkr

fbshipit-source-id: f0d43cda4cfd904f133cfe3a6eb622f52a9ccbe8
2018-08-24 12:27:57 -07:00
Andrew Kryczka
e7bb8e9b92 Fix clang build of db_stress (#4312)
Summary:
Blame: #4307
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4312

Differential Revision: D9494093

Pulled By: ajkr

fbshipit-source-id: eb6be2675c08b9ab508378d45110eb0fcf260a42
2018-08-23 21:57:57 -07:00
Andrew Kryczka
6c40806e51 Digest ZSTD compression dictionary once per SST file (#4251)
Summary:
In RocksDB, for a given SST file, all data blocks are compressed with the same dictionary. When we compress a block using the dictionary's raw bytes, the compression library first has to digest the dictionary to get it into a usable form. This digestion work is redundant and ideally should be done once per file.

ZSTD offers APIs for the caller to create and reuse a digested dictionary object (`ZSTD_CDict`). In this PR, we call `ZSTD_createCDict` once per file to digest the raw bytes. Then we use `ZSTD_compress_usingCDict` to compress each data block using the pre-digested dictionary. Once the file is created, `ZSTD_freeCDict` releases the resources held by the digested dictionary.

There are a couple of other changes included in this PR:

- Changed the parameter object for (un)compression functions from `CompressionContext`/`UncompressionContext` to `CompressionInfo`/`UncompressionInfo`. This avoids the previous pattern, where `CompressionContext`/`UncompressionContext` had to be mutated before calling an (un)compression function depending on whether a dictionary should be used. I felt that mutation was error-prone so eliminated it.
- Added support for digested uncompression dictionaries (`ZSTD_DDict`) as well. However, this PR does not support reusing them across uncompression calls for the same file. That work is deferred to a later PR when we will store the `ZSTD_DDict` objects in block cache.
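This commit uses ZSTD's `ZSTD_createCDict`, `ZSTD_compress_usingCDict`, and `ZSTD_freeCDict`. The sketch below replaces them with hypothetical stand-ins that only count digestions, which shows the once-per-file structure without linking against ZSTD. Its placeholder "compression" does not compress anything.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a digested dictionary (ZSTD_CDict in the PR).
// Constructing it is the expensive step, so it bumps a counter.
struct DigestedDict {
  DigestedDict(const std::string& raw, int* digest_count) : raw_bytes(raw) {
    ++*digest_count;
  }
  std::string raw_bytes;
};

// Hypothetical per-file builder: digests the dictionary once in the
// constructor and reuses it for every data block. The unique_ptr releases
// it when the builder is destroyed (the ZSTD_freeCDict step).
class FileBuilderSketch {
 public:
  FileBuilderSketch(const std::string& dict, int* digest_count)
      : cdict_(new DigestedDict(dict, digest_count)) {}

  // Placeholder: tags the block with the dictionary size instead of
  // actually compressing it.
  std::string CompressBlock(const std::string& block) const {
    return std::to_string(cdict_->raw_bytes.size()) + ":" + block;
  }

 private:
  std::unique_ptr<DigestedDict> cdict_;
};
```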
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4251

Differential Revision: D9257078

Pulled By: ajkr

fbshipit-source-id: 21b8cb6bbdd48e459f1c62343780ab66c0a64438
2018-08-23 19:28:18 -07:00
Andrew Kryczka
ee234e83e3 Invoke OnTableFileCreated for empty SSTs (#4307)
Summary:
The API comment on `OnTableFileCreationStarted` (b6280d01f9/include/rocksdb/listener.h (L331-L333)) led users to believe a call to `OnTableFileCreationStarted` will always be matched with a call to `OnTableFileCreated`. However, we were skipping the `OnTableFileCreated` call in one case: no error happens but also no file is generated since there's no data.

This PR adds the call to `OnTableFileCreated` for that case. The filename will be "(nil)" and the size will be zero.
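Below is a sketch of the matched-callback guarantee using a hypothetical recording listener. The method names follow `EventListener`, but the simplified signatures (a name and a size instead of the real info structs) are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical listener that records events as strings.
struct RecordingListener {
  std::vector<std::string> events;
  void OnTableFileCreationStarted(const std::string& name) {
    events.push_back("started " + name);
  }
  void OnTableFileCreated(const std::string& name, uint64_t size) {
    events.push_back("created " + name + " " + std::to_string(size));
  }
};

// Hypothetical build step: every "started" event is matched by a "created"
// event, including the case where there is no data and no file is kept.
void BuildTable(RecordingListener* l, const std::string& planned_name,
                const std::vector<std::string>& entries) {
  l->OnTableFileCreationStarted(planned_name);
  if (entries.empty()) {
    l->OnTableFileCreated("(nil)", 0);  // no file was generated
    return;
  }
  uint64_t size = 0;
  for (const auto& e : entries) size += e.size();
  l->OnTableFileCreated(planned_name, size);
}
```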
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4307

Differential Revision: D9485201

Pulled By: ajkr

fbshipit-source-id: 2f077ec7913f128487aae2624c69a50762394df6
2018-08-23 18:27:30 -07:00
zhichao-cao
cf7150ac2e Add the unit test of Iterator to trace_analyzer_test (#4282)
Summary:
Add the unit test of Iterator (Seek and SeekForPrev) to trace_analyzer_test. The output files after analyzing the trace file are checked to make sure that analyzing results are correct.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4282

Differential Revision: D9436758

Pulled By: zhichao-cao

fbshipit-source-id: 88d471c9a69e07382d9c6a45eba72773b171e7c2
2018-08-23 17:28:32 -07:00
Gauresh Rane
ad789e4e0d Adding a method for memtable class for memtable getting flushed. (#4304)
Summary:
Memtables are selected for flushing by the flush job. Currently we
have a listener which is invoked when memtables for a column family are
flushed. That listener does not indicate which memtable was flushed in
the notification. If clients want to know whether particular data in the
memtable was retired, there is no straightforward way to know this.
This method will help users who implement a memtablerep factory and extend
the memtablerep interface to know whether the data in the memtable was
retired.
Another option that was tried was to depend on the memtable destructor
being called after flush to mark that the data was persisted. This always
works, but there can be huge delays between the actual flush
and the memtable getting destroyed, so anyone
waiting for data to persist has to wait that much longer.
Implementations of this method are expected to
return quickly, as it blocks RocksDB.
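Below is a minimal sketch of such a hook. It assumes a no-argument virtual method named `MarkFlushed()` with a no-op default. The name and the simplified base class are both illustrative assumptions, not the exact `MemTableRep` declaration.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical base class with an optional hook that defaults to a no-op.
class MemTableRepSketch {
 public:
  virtual ~MemTableRepSketch() = default;
  // Called once the memtable's contents have been flushed. It runs on the
  // flush path, so implementations must return quickly.
  virtual void MarkFlushed() {}
};

// A custom rep that only records that its data has been flushed.
class TrackingRep : public MemTableRepSketch {
 public:
  void MarkFlushed() override {
    flushed_.store(true, std::memory_order_release);
  }
  bool flushed() const { return flushed_.load(std::memory_order_acquire); }

 private:
  std::atomic<bool> flushed_{false};
};
```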
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4304

Reviewed By: riversand963

Differential Revision: D9472312

Pulled By: gdrane

fbshipit-source-id: 8e693308dee749586af3a4c5d4fcf1fa5276ea4d
2018-08-23 17:14:25 -07:00
Fenggang Wu
da40d45267 DataBlockHashIndex: avoiding expensive iiter->Next when handling hash kNoEntry (#4296)
Summary:
When returning `kNoEntry` from the HashIndex lookup, we previously invalidated
`biter` by setting `current_=restarts_`, so that the search could continue to the next
block, since the search result may reside in the next block.

There is one problem: when we are searching for a missing key, if the search
finds `kNoEntry` and continues to the next block, there is also a
non-trivial chance that the HashIndex returns `kNoEntry` there too, and the
expensive index iterator `Next()` will happen several times for nothing.

The solution is that if the hash table returns `kNoEntry`, `SeekForGetImpl()` just searches the last restart interval for the key. It stops at the first key that is larger than the seek_key, or at the end of the block, and each case is handled correctly.
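Below is a simplified model of the new `kNoEntry` handling. It assumes plain sorted user keys and that `kNoEntry` means the hash index has ruled the key out of this block. `HandleNoEntry` is hypothetical, not `SeekForGetImpl()` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class Outcome { kStopInBlock, kCheckNextBlock };

// `keys` is the block's sorted key list and `last_restart` is the index
// where its final restart interval begins. Only that interval is scanned.
// If some key there is larger than `seek_key`, later blocks hold only
// larger keys, so the search stops. If the scan runs off the end of the
// block, the key may still be in the next block.
Outcome HandleNoEntry(const std::vector<std::string>& keys,
                      size_t last_restart, const std::string& seek_key) {
  for (size_t i = last_restart; i < keys.size(); ++i) {
    if (keys[i] > seek_key) return Outcome::kStopInBlock;
  }
  return Outcome::kCheckNextBlock;
}
```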

Microbenchmark script:
```
TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq,readtocache,readmissing \
          --cache_size=20000000000  --use_data_block_hash_index={true|false}
```

`readmissing` performance (lower is better):
```
binary:                      3.6098 micros/op
hash (before applying diff): 4.1048 micros/op
hash (after  applying diff): 3.3502 micros/op
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4296

Differential Revision: D9419159

Pulled By: fgwu

fbshipit-source-id: 21e3eedcccbc47a249aa8eb4bf405c9def0b8a05
2018-08-23 10:12:58 -07:00
Yanqin Jin
bb5dcea98e Add path to WritableFileWriter. (#4039)
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039

Differential Revision: D8670178

Pulled By: riversand963

fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
2018-08-23 10:12:58 -07:00
Zhongyi Xie
f1f5ba085f add missing counters in readonly mode (#4260)
Summary:
User reported (https://github.com/facebook/rocksdb/issues/4168) that when opening RocksDB in read-only mode, some statistics are not correctly reported. After some investigation, we believe the following counters are indeed not reported during Get() call in a read-only DB:
rocksdb.memtable.hit
rocksdb.memtable.miss
rocksdb.number.keys.read
rocksdb.bytes.read
As well as histogram rocksdb.bytes.per.read
and perf context get_read_bytes
This PR adds the necessary counter reporting logic to the Get() call path.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4260

Differential Revision: D9476431

Pulled By: miasantreble

fbshipit-source-id: 7ab409d4e59df05d09ae8b69fe75554e5aa240d6
2018-08-22 22:43:13 -07:00
Andrew Kryczka
b6280d01f9 Require ZSTD 1.1.3+ to use dictionary trainer (#4295)
Summary:
ZSTD's dynamic library exports `ZDICT_trainFromBuffer` symbol since v1.1.3, and its static library exports it since v0.6.1. We don't know whether linkage is static or dynamic, so just require v1.1.3 to use dictionary trainer.

Fixes the issue reported here: https://jira.mariadb.org/browse/MDEV-16525.
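`zstd.h` defines `ZSTD_VERSION_NUMBER` as major * 10000 + minor * 100 + release. Version 1.1.3 therefore corresponds to 10103, and a preprocessor guard such as `#if ZSTD_VERSION_NUMBER >= 10103` expresses the requirement. The sketch below mirrors that encoding in `constexpr` helpers with hypothetical names, so it compiles without ZSTD installed.

```cpp
#include <cassert>

// Mirrors zstd.h's encoding: major*100*100 + minor*100 + release.
constexpr int ZstdVersionNumber(int major, int minor, int release) {
  return major * 100 * 100 + minor * 100 + release;
}

// The dynamic library exports ZDICT_trainFromBuffer only since v1.1.3, so
// the dictionary trainer is enabled only at or above that version.
// Real code would test the macro instead:
//   #if ZSTD_VERSION_NUMBER >= 10103
constexpr bool CanUseDictTrainer(int version_number) {
  return version_number >= ZstdVersionNumber(1, 1, 3);
}
```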
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4295

Differential Revision: D9417183

Pulled By: ajkr

fbshipit-source-id: 0e89d2f48d9e7f6eee73e7f4572660a9f7122db8
2018-08-22 18:27:52 -07:00
Fenggang Wu
640cfa7c33 DataBlockHashIndex: fix comment in NumRestarts() (#4286)
Summary:
Improve the description of the backward compatibility check in NumRestarts()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4286

Differential Revision: D9412490

Pulled By: fgwu

fbshipit-source-id: ea7dd5c61d8ff8eacef623b729d4e4fd53cca066
2018-08-21 17:12:45 -07:00
Yi Wu
4f12d49daf Suppress clang analyzer error (#4299)
Summary:
Suppress multiple clang-analyzer errors. All of them are clang false positives.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4299

Differential Revision: D9430740

Pulled By: yiwu-arbug

fbshipit-source-id: fbdd575bdc214d124826d61d35a117995c509279
2018-08-21 16:43:05 -07:00
Anand Ananthabhotla
c9a0419413 Release 5.16 (#4298)
Summary:
Update HISTORY.md for 5.16.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4298

Differential Revision: D9433868

Pulled By: anand1976

fbshipit-source-id: e7880a1c952210b1e9d7466eed72a6cb5018096b
2018-08-21 14:43:08 -07:00
Zhichao Cao
9e2d5ab6bf Adjusted the Makefile of trace_analyzer to isolate the Gflags from other (#4290)
Summary:
Previously, trace_analyzer_tool was compiled together with the other library objects, which made the GFLAGS of trace_analyzer appear in other tools (e.g., db_bench, rocksdb_dump, etc.). When using '--help', the help information of trace_analyzer appeared in the help output of other tools, which caused confusion.

Now, trace_analyzer_tool is built and used only by trace_analyzer and trace_analyzer_test to avoid these issues.

Tested with make asan_check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4290

Differential Revision: D9413163

Pulled By: zhichao-cao

fbshipit-source-id: ed5d20c4575a53ca15ff62a2ffe601d5cf278cc4
2018-08-21 10:47:24 -07:00
Fenggang Wu
6d37fdb365 DataBlockHashIndex: Remove the division from EstimateSize() (#4293)
Summary:
`BlockBasedTableBuilder::Add()` eventually calls
`DataBlockHashIndexBuilder::EstimateSize()`. The previous implementation
divided `num_keys` by `util_ratio_` to get an estimated `num_buckets`.
Such division is expensive because it happens in every
`BlockBasedTableBuilder::Add()`.

This diff estimates `num_buckets` by double addition instead of double
division. Specifically, in each `Add()`, we add `bucket_per_key_`
(the inverse of `util_ratio_`) to the current `estimated_num_buckets_`. The cost is
that `estimated_num_buckets_` has to be stored as one extra field
in the DataBlockHashIndexBuilder.
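Below is a minimal sketch of the idea using a hypothetical builder class. It performs one division up front to get `bucket_per_key_`, then one addition per key. It is not the real `DataBlockHashIndexBuilder`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical builder illustrating the running-sum estimate.
class HashIndexBuilderSketch {
 public:
  explicit HashIndexBuilderSketch(double util_ratio)
      : bucket_per_key_(1.0 / util_ratio) {}  // the only division

  void Add() {
    ++num_keys_;
    estimated_num_buckets_ += bucket_per_key_;  // addition per key
  }

  // Old approach: one division each time the estimate is needed.
  double EstimateBucketsByDivision(double util_ratio) const {
    return static_cast<double>(num_keys_) / util_ratio;
  }

  double EstimateBuckets() const { return estimated_num_buckets_; }

 private:
  double bucket_per_key_;
  double estimated_num_buckets_ = 0;
  size_t num_keys_ = 0;
};
```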
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4293

Differential Revision: D9412472

Pulled By: fgwu

fbshipit-source-id: 2925c8509a401e7bd3c1ab1d9e9c7244755c277a
2018-08-20 23:13:50 -07:00
Yi Wu
7188bd34f3 BlobDB: Fix expired file not being evicted (#4294)
Summary:
Fix expired files not being evicted from the DB. We have a background task (previously called `CheckSeqFiles`, now renamed `EvictExpiredFiles`) that scans for and removes expired files, but it only closed the files without marking them as expired.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4294

Differential Revision: D9415984

Pulled By: yiwu-arbug

fbshipit-source-id: eff7bf0331c52a7ccdb02318602bff7f64f3ef3d
2018-08-20 22:42:33 -07:00
Siying Dong
d5612b43de Two code changes to make "clang analyze" happy (#4292)
Summary:
Clang analyze reports a "Potential memory leak" in two pieces of code. The cause is unclear, but slightly changing the code makes clang happy.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4292

Differential Revision: D9413555

Pulled By: siying

fbshipit-source-id: 9428c9d3664530c72129feefd135ee63d8386137
2018-08-20 17:43:41 -07:00