Commit Graph

8980 Commits

Author SHA1 Message Date
Cheng Chang
91bc0130fa Refactor level compaction picker (#6804)
Summary:
1. refactor out PickFileToCompact to remove duplicated logic.
2. remove redundant checks of `start_level_inputs_.empty()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6804

Test Plan: make check

Reviewed By: siying

Differential Revision: D21390053

Pulled By: cheng-chang

fbshipit-source-id: 185d5987a08bfdaf63f0f245310c6da69878d415
2020-05-05 11:09:29 -07:00
Yanqin Jin
5584595f80 Do not swallow error returned from SaveTo() (#6801)
Summary:
With consistency check enabled, VersionBuilder::SaveTo() may return error once
corruption is detected while building versions. We should handle these errors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6801

Test Plan: make check

Reviewed By: siying

Differential Revision: D21385045

Pulled By: riversand963

fbshipit-source-id: 98f6424e2a4699b62befa21e9fe00e70a771118e
2020-05-05 10:46:20 -07:00
Yanqin Jin
5a61e7864d Fix db_stress when GetLiveFiles() flushes dropped CF (#6805)
Summary:
Current impl. of db_stress will abort verification and report failure if
GetLiveFiles() causes a dropped column family to be flushed. This is not
desired.
To fix, this PR makes the following change:
In GetLiveFiles, if flush is triggered and returns
Status::IsColumnFamilyDropped(), then set status to Status::OK().
This is OK because dropped column families will be skipped during the rest of
this function, and valid column families will have their live files returned to
caller.

Test plan (dev server):
make check
./db_stress -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100
./db_stress -disable_wal=1 -reopen=0 -ops_per_thread=1000 -get_live_files_one_in=100 -clear_column_family_one_in=100
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6805

Reviewed By: ltamasi

Differential Revision: D21390044

Pulled By: riversand963

fbshipit-source-id: de67846b95a4f1b88aa0a30c3d70c43cc68625b9
2020-05-04 17:45:49 -07:00
Levi Tamasi
a00ddf1574 Expose the set of live blob files from Version/VersionSet (#6785)
Summary:
The patch adds logic that returns the set of live blob files from
`Version::AddLiveFiles` and `VersionSet::AddLiveFiles` (in addition to
live table files), and also cleans up the code a bit, for example, by
exposing only the numbers of table files as opposed to the earlier
`FileDescriptor`s that no clients used. Moreover, the patch extends
the `GetLiveFiles` API so that it also exposes blob files in the current version.
Similarly to https://github.com/facebook/rocksdb/pull/6755,
this is a building block for identifying and purging obsolete blob files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6785

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D21336210

Pulled By: ltamasi

fbshipit-source-id: fc1aede8a49eacd03caafbc5f6f9ce43b6270821
2020-05-04 15:08:13 -07:00
sdong
680c416348 Avoid Swallowing Some File Consistency Checking Bugs (#6793)
Summary:
We are swallowing some file consistency checking failures. This is not expected. We are fixing two cases: DB reopen and manifest dump.
More places are not fixed and need follow-up.

Error from CheckConsistencyForDeletes() is also swallowed, which is not fixed in this PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6793

Test Plan: Add a unit test to cover the reopen case.

Reviewed By: riversand963

Differential Revision: D21366525

fbshipit-source-id: eb438a322237814e8d5125f916a3c6de97f39ded
2020-05-04 14:18:11 -07:00
Mian Qin
d9e170d82b Fix issues for reproducing synthetic ZippyDB workloads in the FAST20' paper (#6795)
Summary:
Fix issues for reproducing synthetic ZippyDB workloads in the FAST20' paper using db_bench. Details changes as follows.
1, add a separate random mode in MixGraph to produce all_random workload.
2, fix power inverse function for generating prefix_dist workload.
3, make sure key_offset in prefix mode is always unsigned.
note: Need to carefully choose key_dist_a/b to avoid aliasing. Power inverse function range should be close to overall key space.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6795

Reviewed By: akankshamahajan15

Differential Revision: D21371095

Pulled By: zhichao-cao

fbshipit-source-id: 80744381e242392c8c7cf8ac3d68fe67fe876048
2020-05-04 10:55:14 -07:00
Cheng Chang
211088df6e Remove redundant update of txn_state_ in transaction Prepare (#6778)
Summary:
When  expiration is set in a pessimistic transaction, `txn_state_` is already updated to `AWAITING_PREPARE` in the `if (expiration_time_ > 0)` block, there is  no need to update the state in `if (can_prepare)` block again.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6778

Test Plan: make check

Reviewed By: lth

Differential Revision: D21335319

Pulled By: cheng-chang

fbshipit-source-id: 251d634cc7d1a0e86e673a59f0bda8584da5a35f
2020-05-01 17:37:33 -07:00
Zhichao Cao
c8643edfc3 Fix multiple CF replay failure in db_bench replay (#6787)
Summary:
The multiple CF hash map is not passed to the multi-thread worker. When using multi-thread replay for multiple CFs, it will cause segment fault. Pass the cf_map to the argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6787

Test Plan: pass trace replay test.

Reviewed By: yhchiang

Differential Revision: D21339941

Pulled By: zhichao-cao

fbshipit-source-id: 434482b492287e6722c7cd5a706f057c5ec170ce
2020-05-01 00:03:38 -07:00
Yanqin Jin
6acbbbf9fc Add Github Action for some basic sanity test of PR (#6761)
Summary:
Add Github Action to perform some basic sanity check for PR, inclding the
following.
1) Buck TARGETS file.
On the one hand, The TARGETS file is used for internal buck, and we do not
manually update it. On the other hand, we need to run the buckifier scripts to
update TARGETS whenever new files are added, etc. With this Github Action, we
make sure that every PR does not forget this step. The GH Action uses
a Makefile target called check-buck-targets. Users can manually run `make
check-buck-targets` on local machine.

2) Code format
We use clang-format-diff.py to format our code. The GH Action in this PR makes
sure this step is not skipped. The checking script build_tools/format-diff.sh assumes that `clang-format-diff.py` is executable.
On host running GH Action, it is difficult to download `clang-format-diff.py` and make it
executable. Therefore, we modified build_tools/format-diff.sh to handle the case in which there is a non-executable clang-format-diff.py file in the top-level rocksdb repo directory.

Test Plan (Github and devserver):
Watch for Github Action result in the `Checks` tab.
On dev server
```
make check-format
make check-buck-targets
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6761

Test Plan: Watch for Github Action result in the `Checks` tab.

Reviewed By: pdillinger

Differential Revision: D21260209

Pulled By: riversand963

fbshipit-source-id: c646e2f37c6faf9f0614b68aa0efc818cff96787
2020-04-30 19:22:45 -07:00
sdong
6504ae0c4e Remove the support of setting CompressionOptions.parallel_threads from string for now (#6782)
Summary:
The current way of implementing CompressionOptions.parallel_threads introduces a format change. We plan to change CompressionOptions's serailization format to a new JSON-like format, which would be another format change. We would like to consolidate the two format changes into one, rather than making some users to change twice. Hold CompressionOptions.parallel_threads from being supported by option string for now. Will add it back after the general CompressionOptions's format change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6782

Test Plan: Run all existing tests.

Reviewed By: zhichao-cao

Differential Revision: D21338614

fbshipit-source-id: bca2dac3cb37d4e6e64b52cbbe8ea749cd848685
2020-04-30 17:01:17 -07:00
Cheng Chang
ef0c3eda27 Make users explicitly be aware of prepare before commit (#6775)
Summary:
In current commit protocol of pessimistic transaction, if the transaction is not prepared before commit, the commit protocol implicitly assumes that the user wants to commit without prepare.

This PR adds TransactionOptions::skip_prepare, the default value is `true` because if set to `false`, all existing users who commit without prepare need to update their code to set skip_prepare to true. Although this does not force the user to explicitly express their intention of skip_prepare, it at least lets the user be aware of the assumption of being able to commit without prepare.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6775

Test Plan: added a new unit test TransactionTest::CommitWithoutPrepare

Reviewed By: lth

Differential Revision: D21313270

Pulled By: cheng-chang

fbshipit-source-id: 3d95b7c9b2d6cdddc09bdd66c561bc4fae8c3251
2020-04-30 16:24:20 -07:00
sdong
079e50d2ba Disallow BlockBasedTableBuilder to set status from non-OK (#6776)
Summary:
There is no systematic mechanism to prevent BlockBasedTableBuilder's status to be set from non-OK to OK. Adding a mechanism to force this will help us prevent failures in the future.

The solution is to only make it possible to set the status code if the status code to set is not OK.

Since the status code passed to CompressAndVerifyBlock() is changed, a mini refactoring is done too so that the output arguments are changed from reference to pointers, based on Google C++ Style.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6776

Test Plan: Run all existing test.

Reviewed By: pdillinger

Differential Revision: D21314382

fbshipit-source-id: 27000c10f1e4c121661e026548d6882066409375
2020-04-30 15:37:03 -07:00
sdong
6277e28039 Flag CompressionOptions::parallel_threads to be experimental (#6781)
Summary:
The feature of CompressionOptions::parallel_threads is still not yet mature. Mention it to be experimental in the comments for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6781

Reviewed By: pdillinger

Differential Revision: D21330678

fbshipit-source-id: d7dd7d099fb002a5c6a5d8da689ce5ee08a9eb13
2020-04-30 15:22:06 -07:00
anand76
ab13d43e1d Pass a timeout to FileSystem for random reads (#6751)
Summary:
Calculate ```IOOptions::timeout``` using ```ReadOptions::deadline``` and pass it to ```FileSystem::Read/FileSystem::MultiRead```. This allows us to impose a tighter bound on the time taken by Get/MultiGet on FileSystem/Envs that support IO timeouts. Even on those that don't support, check in ```RandomAccessFileReader::Read``` and ```MultiRead``` and return ```Status::TimedOut()``` if the deadline is exceeded.

For now, TableReader creation, which might do file opens and reads, are not covered. It will be implemented in another PR.

Tests:
Update existing unit tests to verify the correct timeout value is being passed
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6751

Reviewed By: riversand963

Differential Revision: D21285631

Pulled By: anand1976

fbshipit-source-id: d89af843e5a91ece866e87aa29438b52a65a8567
2020-04-30 14:50:39 -07:00
Peter Dillinger
eecd8fba46 Fix assertion that can fail on sst corruption (#6780)
Summary:
An assertion that a char == a CompressionType (unsigned char)
originally cast from a char can fail if the original value is negative,
due to numeric promotion.  The assertion should pass even if the value
is invalid CompressionType, because the callee
UncompressBlockContentsForCompressionType checks for that and reports
status appropriately.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6780

Test Plan:
Temporarily change kZSTD = 0x88 and see tests fail. Make this
change (in addition), and tests pass.

Reviewed By: siying

Differential Revision: D21328498

Pulled By: pdillinger

fbshipit-source-id: 61caf8d815581ce49261ecb7ab0f396e9ac4bb92
2020-04-30 12:11:00 -07:00
Levi Tamasi
fe238e5438 Keep track of obsolete blob files in VersionSet (#6755)
Summary:
The patch adds logic to keep track of obsolete blob files. A blob file becomes
obsolete when the last `shared_ptr` that points to the corresponding
`SharedBlobFileMetaData` object goes away, which, in turn, happens when the
last `Version` that contains the blob file is destroyed. No longer needed blob
files are added to the obsolete list in `VersionSet` using a custom deleter to
avoid unnecessary coupling between `SharedBlobFileMetaData` and `VersionSet`.
Obsolete blob files are returned by `VersionSet::GetObsoleteFiles` and stored
in `JobContext`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6755

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D21233155

Pulled By: ltamasi

fbshipit-source-id: 47757e06fdc0127f27ed57f51abd27893d9a7b7a
2020-04-30 11:25:51 -07:00
Adam Retter
cf342464ca Add Slack forum to README (#6773)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6773

Reviewed By: siying

Differential Revision: D21310229

Pulled By: pdillinger

fbshipit-source-id: c0d52d0c51121d307d7d5c1374abc7bf78b0c4cf
2020-04-30 11:00:28 -07:00
Ziyue Yang
e619a20e93 Add an option for parallel compression in for db_stress (#6722)
Summary:
This commit adds an `compression_parallel_threads` option in
db_stress. It also fixes the naming of parallel compression
option in db_bench to keep it aligned with others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722

Reviewed By: pdillinger

Differential Revision: D21091385

fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100
2020-04-30 10:49:07 -07:00
Zhichao Cao
8c694025e9 Fix potential size_t overflow in import_column_family (#6762)
Summary:
The issue is reported in https://github.com/facebook/rocksdb/issues/6753 . size_t is unsigned and if sorted_file.size() is 0, the end condition of i will be extremely large, cause segment fault in sorted_files[i] and sorted_files[i+1]. Added condition to fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6762

Test Plan: make asan_check

Reviewed By: pdillinger

Differential Revision: D21323063

Pulled By: zhichao-cao

fbshipit-source-id: 56ce59201949ed319448228553202b8642c2cc3a
2020-04-30 08:40:42 -07:00
anand76
b938e6042b Fix a couple of bugs in FaultInjectionTestFS (#6777)
Summary:
Fix the following cases that can cause false alarms in db_stress when read fault injection is
 enabled -
1. Turn off corruption/truncation when direct IO is enabled. Since the actual IO size is larger than block size due to alignment requirements, the corruption may not result in a detectable error.
2. Handle the case when the randomly generated string to overwrite the original block is identical to the original.

Tests:
Run db_stress w/ and wo/ direct IO and fault injection turned on
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6777

Reviewed By: zhichao-cao

Differential Revision: D21316734

Pulled By: anand1976

fbshipit-source-id: bf0e6468043063ca81ff877d4bf71d3f296c77aa
2020-04-29 19:28:29 -07:00
Peter Dillinger
28fe8e4620 Fix bug in format-diff.sh (#6772)
Summary:
Nasty bug in which more/different changes would be applied than
those shown to user
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6772

Test Plan: manual

Reviewed By: siying

Differential Revision: D21304604

Pulled By: pdillinger

fbshipit-source-id: 7e20740e513c9c300d1522511290a025b35abedc
2020-04-29 13:45:54 -07:00
Derrick Pallas
5272305437 Fix FilterBench when RTTI=0 (#6732)
Summary:
The dynamic_cast in the filter benchmark causes release mode to fail due to
no-rtti.  Replace with static_cast_with_check.

Signed-off-by: Derrick Pallas <derrick@pallas.us>

Addition by peterd: Remove unnecessary 2nd template arg on all static_cast_with_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6732

Reviewed By: ltamasi

Differential Revision: D21304260

Pulled By: pdillinger

fbshipit-source-id: 6e8eb437c4ca5a16dbbfa4053d67c4ad55f1608c
2020-04-29 13:09:23 -07:00
Peter Dillinger
8086e5e294 Fix LITE build (#6770)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6770

Test Plan: make LITE=1 check

Reviewed By: ajkr

Differential Revision: D21296261

Pulled By: pdillinger

fbshipit-source-id: b6075cc13a6d6db48617b7e0e9ebeea9364dfd9f
2020-04-28 21:37:20 -07:00
anand76
335ea73e49 Fix a valgrind failure due to DBBasicTestMultiGetDeadline (#6756)
Summary:
Fix a valgrind failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6756

Test Plan: valgrind_test

Reviewed By: pdillinger

Differential Revision: D21284660

Pulled By: anand1976

fbshipit-source-id: 39bf1bd130b6adb585ddbf2f9aa2f53dbf666f80
2020-04-28 20:06:06 -07:00
mrambacher
618bf638aa Add Functions to OptionTypeInfo (#6422)
Summary:
Added functions for parsing, serializing, and comparing elements to OptionTypeInfo.  These functions allow all of the special cases that could not be handled directly in the map of OptionTypeInfo to be moved into the map.  Using these functions, every type can be handled via the map rather than special cased.

By adding these functions, the code for handling options can become more standardized (fewer special cases) and (eventually) handled completely by common classes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6422

Test Plan: pass make check

Reviewed By: siying

Differential Revision: D21269005

Pulled By: zhichao-cao

fbshipit-source-id: 9ba71c721a38ebf9ee88259d60bd81b3282b9077
2020-04-28 18:04:26 -07:00
Peter Dillinger
b810e62b39 Clarifying comments in db.h (#6768)
Summary:
And fix a confusingly worded log message
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6768

Reviewed By: anand1976

Differential Revision: D21284527

Pulled By: pdillinger

fbshipit-source-id: f03c1422c229a901c3a65e524740452349626164
2020-04-28 15:26:03 -07:00
Peter Dillinger
bae6f58696 Basic MultiGet support for partitioned filters (#6757)
Summary:
In MultiGet, access each applicable filter partition only once
per batch, rather than for each applicable key. Also,

* Fix Bloom stats for MultiGet
* Fix/refactor MultiGetContext::Range::KeysLeft, including
* Add efficient BitsSetToOne implementation
* Assert that MultiGetContext::Range does not go beyond shift range

Performance test: Generate db:

    $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10 -partition_index_and_filters=true
    ...

Before (middle performing run of three; note some missing Bloom stats):

    $ ./db_bench --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget'
    multireadrandom :      26.403 micros/op 597517 ops/sec; (548427 of 671968 found)
    rocksdb.block.cache.filter.hit COUNT : 83443275
    rocksdb.bloom.filter.useful COUNT : 0
    rocksdb.bloom.filter.full.positive COUNT : 0
    rocksdb.bloom.filter.full.true.positive COUNT : 7931450
    rocksdb.number.multiget.get COUNT : 385984
    rocksdb.number.multiget.keys.read COUNT : 12351488
    rocksdb.number.multiget.bytes.read COUNT : 793145000
    rocksdb.number.multiget.keys.found COUNT : 7931450

After (middle performing run of three):

    $ ./db_bench_new --use-existing-db --benchmarks=multireadrandom --num=15000000 --cache_index_and_filter_blocks --bloom_bits=10 --threads=16 --cache_size=20000000 -partition_index_and_filters -batch_size=32 -multiread_batched -statistics --duration=20 2>&1 | egrep 'micros/op|block.cache.filter.hit|bloom.filter.(full|use)|number.multiget'
    multireadrandom :      21.024 micros/op 752963 ops/sec; (705188 of 863968 found)
    rocksdb.block.cache.filter.hit COUNT : 49856682
    rocksdb.bloom.filter.useful COUNT : 45684579
    rocksdb.bloom.filter.full.positive COUNT : 10395458
    rocksdb.bloom.filter.full.true.positive COUNT : 9908456
    rocksdb.number.multiget.get COUNT : 481984
    rocksdb.number.multiget.keys.read COUNT : 15423488
    rocksdb.number.multiget.bytes.read COUNT : 990845600
    rocksdb.number.multiget.keys.found COUNT : 9908456

So that's about 25% higher throughput even for random keys
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6757

Test Plan: unit test included

Reviewed By: anand1976

Differential Revision: D21243256

Pulled By: pdillinger

fbshipit-source-id: 5644a1468d9e8c8575be02f4e04bc5d62dbbb57f
2020-04-28 14:49:34 -07:00
Peter Dillinger
a7f0b27b39 HISTORY.md update for bzip upgrade (#6767)
Summary:
See https://github.com/facebook/rocksdb/issues/6714 and https://github.com/facebook/rocksdb/issues/6703
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6767

Reviewed By: riversand963

Differential Revision: D21283307

Pulled By: pdillinger

fbshipit-source-id: 8463bec725669d13846c728ad4b5bde43f9a84f8
2020-04-28 12:29:31 -07:00
Peter Dillinger
4574d7513d Update HISTORY.md for block cache redundant adds (#6764)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6764

Reviewed By: ltamasi

Differential Revision: D21267108

Pulled By: pdillinger

fbshipit-source-id: a3dfe2dbe4e8f6309a53eb72903ef58d52308f97
2020-04-28 08:26:43 -07:00
Yanqin Jin
d4398e08fc Fix timestamp support for MultiGet (#6748)
Summary:
1. Avoid nullptr dereference when passing timestamp to KeyContext creation.
2. Construct LookupKey correctly with timestamp when creating MultiGetContext.
3. Compare without timestamp when sorting KeyContexts.

Fixes https://github.com/facebook/rocksdb/issues/6745

Test plan (dev server):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6748

Reviewed By: pdillinger

Differential Revision: D21258691

Pulled By: riversand963

fbshipit-source-id: 44e65b759c18b9986947783edf03be4f890bb004
2020-04-27 22:49:56 -07:00
Cheng Chang
4cd859edf1 Fix build under LITE (#6758)
Summary:
GetSupportedCompressions needs to be defined under LITE.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6758

Test Plan: build under LITE

Reviewed By: zhichao-cao

Differential Revision: D21247937

Pulled By: cheng-chang

fbshipit-source-id: 880e59d3e107cdd736d16427a68c5641d1318fb4
2020-04-27 16:55:14 -07:00
Levi Tamasi
bea91d5d61 Destroy any ColumnFamilyHandles in BlobDB::Open upon error (#6763)
Summary:
If an error happens during BlobDBImpl::Open after the base DB has been
opened, we need to destroy the `ColumnFamilyHandle`s returned by `DB::Open`
to prevent an assertion in `ColumnFamilySet`'s destructor from being hit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6763

Test Plan: Ran `make check` and tested using the BlobDB mode of `db_bench`.

Reviewed By: riversand963

Differential Revision: D21262643

Pulled By: ltamasi

fbshipit-source-id: 60ebc7ab19be66cf37fbe5f6d8957d58470f3d3b
2020-04-27 16:45:13 -07:00
Albert Hse-Lin Chen
cc8d16efd6 Fixed minor typo in comment for MergeOperator::FullMergeV2() (#6759)
Summary:
Fixed minor typo in comment for FullMergeV2().
Last operand up to snapshot should be +4 instead of +3.

Signed-off-by: Albert Hse-Lin Chen <hselin@kalista.io>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6759

Reviewed By: cheng-chang

Differential Revision: D21260295

Pulled By: zhichao-cao

fbshipit-source-id: cc942306f246c8606538feb30bfdf6df9fb6c54e
2020-04-27 14:44:47 -07:00
Peter Dillinger
249eff0f30 Stats for redundant insertions into block cache (#6681)
Summary:
Since read threads do not coordinate on loading data into block
cache, two threads between Lookup and Insert can end up loading and
inserting the same data. This is particularly concerning with
cache_index_and_filter_blocks since those are hot and more likely to
be race targets if ejected from (or not pre-populated in) the cache.

Particularly with moves toward disaggregated / network storage, the cost
of redundant retrieval might be high, and we should at least have some
hard statistics from which we can estimate impact.

Example with full filter thrashing "cliff":

    $ ./db_bench --benchmarks=fillrandom --num=15000000 --cache_index_and_filter_blocks -bloom_bits=10
    ...
    $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((130 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 | egrep '^rocksdb.block.cache.(.*add|.*redundant)' | grep -v compress | sort
    rocksdb.block.cache.add COUNT : 14181
    rocksdb.block.cache.add.failures COUNT : 0
    rocksdb.block.cache.add.redundant COUNT : 476
    rocksdb.block.cache.data.add COUNT : 12749
    rocksdb.block.cache.data.add.redundant COUNT : 18
    rocksdb.block.cache.filter.add COUNT : 1003
    rocksdb.block.cache.filter.add.redundant COUNT : 217
    rocksdb.block.cache.index.add COUNT : 429
    rocksdb.block.cache.index.add.redundant COUNT : 241
    $ ./db_bench --db=/tmp/rocksdbtest-172704/dbbench --use_existing_db --benchmarks=readrandom,stats --num=200000 --cache_index_and_filter_blocks --cache_size=$((120 * 1024 * 1024)) --bloom_bits=10 --threads=16 -statistics 2>&1 | egrep '^rocksdb.block.cache.(.*add|.*redundant)' | grep -v compress | sort
    rocksdb.block.cache.add COUNT : 1182223
    rocksdb.block.cache.add.failures COUNT : 0
    rocksdb.block.cache.add.redundant COUNT : 302728
    rocksdb.block.cache.data.add COUNT : 31425
    rocksdb.block.cache.data.add.redundant COUNT : 12
    rocksdb.block.cache.filter.add COUNT : 795455
    rocksdb.block.cache.filter.add.redundant COUNT : 130238
    rocksdb.block.cache.index.add COUNT : 355343
    rocksdb.block.cache.index.add.redundant COUNT : 172478
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6681

Test Plan: Some manual testing (above) and unit test covering key metrics is included

Reviewed By: ltamasi

Differential Revision: D21134113

Pulled By: pdillinger

fbshipit-source-id: c11497b5f00f4ffdfe919823904e52d0a1a91d87
2020-04-27 13:20:27 -07:00
Akanksha Mahajan
75b13ea94a Allow sst_dump to check size of different compression levels and report time (#6634)
Summary:
Summary : 1. Add two arguments --compression_level_from and --compression_level_to to check
	  the compression size with different compression level in the given range. Users must
          specify one compression type else it will error out. Both from and to levels must
	  also be specified together.
	  2. Display the time taken to compress each file with different compressions by default.

Test Plan : make -j64 check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6634

Test Plan: make -j64 check

Reviewed By: anand1976

Differential Revision: D20810282

Pulled By: akankshamahajan15

fbshipit-source-id: ac9098d3c079a1fad098f6678dbedb4d888a791b
2020-04-27 12:36:16 -07:00
Peter Dillinger
791e5714a5 Understand common build variables passed as make variables (#6740)
Summary:
Some common build variables like USE_CLANG and
COMPILE_WITH_UBSAN did not work if specified as make variables, as in
`make USE_CLANG=1 check` etc. rather than (in theory less hygienic)
`USE_CLANG=1 make check`. This patches Makefile to export some commonly
used ones to build_detect_platform so that they work. (I'm skeptical of
a broad `export` in Makefile because it's hard to predict how random
make variables might affect various invoked tools.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6740

Test Plan: manual / CI

Reviewed By: siying

Differential Revision: D21229011

Pulled By: pdillinger

fbshipit-source-id: b00c69b23eb2a13105bc8d860ce2d1e61ac5a355
2020-04-27 10:48:49 -07:00
Yanqin Jin
3b2f2719eb Update buckifier to unblock future internal release (#6726)
Summary:
Some recent PRs added new source files or modified TARGETS file manually.
During next internal release, executing the following command will revert the
manual changes.
Update buckifier so that the following command
```
python buckfier/buckify_rocksdb.py
```
does not change TARGETS file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6726

Test Plan:
```
python buckifier/buckify_rocksdb.py
```

Reviewed By: siying

Differential Revision: D21098930

Pulled By: riversand963

fbshipit-source-id: e884f507fefef88163363c9097a460c98f1ed850
2020-04-26 17:35:37 -07:00
Cheng Chang
0a77617820 Disable O_DIRECT in stress test when db directory does not support direct IO (#6727)
Summary:
In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test.

This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727

Test Plan:
export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"
make -j24 crash_test

Reviewed By: zhichao-cao

Differential Revision: D21139250

Pulled By: cheng-chang

fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee
2020-04-25 00:01:03 -07:00
Cheng Chang
40497a875a Reduce memory copies when fetching and uncompressing blocks from SST files (#6689)
Summary:
In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode.
This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode.

Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read` depending on the block size. then during uncompressing, it will copy the uncompressed block into `BlockFetcher::heap_buf_`.

In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689

Test Plan: A new unit test `block_fetcher_test` is added.

Reviewed By: anand1976

Differential Revision: D21006729

Pulled By: cheng-chang

fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980
2020-04-24 15:32:56 -07:00
Cheng Chang
1758f76f2d Fix unused variable of r in release mode (#6750)
Summary:
In release mode, asserts are not compiled, so `r` is not used, causing compiler warnings.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6750

Test Plan: make check under release mode

Reviewed By: anand1976

Differential Revision: D21220365

Pulled By: cheng-chang

fbshipit-source-id: fd4afa9843d54af68c4da8660ec61549803e1167
2020-04-24 15:14:13 -07:00
anand76
9e7b7e2c08 Silence false alarms in db_stress fault injection (#6741)
Summary:
False alarms are caused by codepaths that intentionally swallow IO
errors.

Tests:
make crash_test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6741

Reviewed By: ltamasi

Differential Revision: D21181138

Pulled By: anand1976

fbshipit-source-id: 5ccfbc68eb192033488de6269e59c00f2c65ce00
2020-04-24 13:06:12 -07:00
Yanqin Jin
e04f3bce4f Update CURRENT file after best-efforts recovery (#6746)
Summary:
After a successful recovery, the CURRENT file should be updated to point to the valid MANIFEST.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6746

Test Plan: make check

Reviewed By: anand1976

Differential Revision: D21189876

Pulled By: riversand963

fbshipit-source-id: 7537b49988c5c425ebe9505a5cc260de351ad79b
2020-04-23 16:21:09 -07:00
Cheng Chang
51bdfae010 Check alignment of MultiRead requests in direct IO mode (#6739)
Summary:
Add assertions to check direct IO's alignment requirements in MultiRead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6739

Test Plan: make check

Reviewed By: siying

Differential Revision: D21143825

Pulled By: cheng-chang

fbshipit-source-id: 26f1623b062a1851080771128feac0669a61f5e9
2020-04-23 15:19:31 -07:00
Levi Tamasi
bc51e33d9c Make sure (Shared)BlobFileMetaData are owned by shared_ptrs (#6749)
Summary:
The patch makes a couple of small cleanups to `SharedBlobFileMetaData` and `BlobFileMetaData`:
* It makes the constructors private and introduces factory methods to ensure these objects are always owned by `shared_ptr`s. Note that `SharedBlobFileMetaData` has an additional factory that takes a deleter object; we can utilize this to e.g. notify `VersionSet` when a blob file becomes obsolete (which is exactly when `SharedBlobFileMetaData` is destroyed).
* It disables move operations explicitly instead of relying on them being suppressed because of a user-declared destructor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6749

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D21206947

Pulled By: ltamasi

fbshipit-source-id: 9094c14cc335b3e226f883e5a0df4f87a5cdeb95
2020-04-23 13:44:29 -07:00
Ibrahim Jarif
ae77880223 Fix some typos in code comments (#6733)
Summary:
This PR fixes some typos in code comments.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6733

Reviewed By: siying

Differential Revision: D21209037

Pulled By: zhichao-cao

fbshipit-source-id: d9274611fab1f5e992998c8c4117b8078c4cbc69
2020-04-23 12:28:49 -07:00
mrambacher
4cbc19d2a1 Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389)
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings.  There is a mishmash of parameters in use here with more needed in the future.  This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389

Reviewed By: siying

Differential Revision: D21163707

Pulled By: zhichao-cao

fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
2020-04-21 17:38:17 -07:00
anand76
c1ccd6b6af Implement deadline support for MultiGet (#6710)
Summary:
Initial implementation of ReadOptions.deadline for MultiGet. If the request takes longer than the deadline, the keys not yet found will be returned with Status::TimedOut(). This
implementation enforces the deadline in DBImpl, which is fairly high
level. Its best effort and may not check the deadline after every key
lookup, but may do so after a batch of keys.

In subsequent stages, we will extend this to passing a timeout down to the FileSystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6710

Test Plan: Add new unit tests

Reviewed By: riversand963

Differential Revision: D21149158

Pulled By: anand1976

fbshipit-source-id: 9f44eecffeb40873f5034ed59a66d21f9f88879e
2020-04-21 14:51:51 -07:00
Tomas Kolda
6ee66cf8f0 Prevents Table Cache to open same files more times (#6707)
Summary:
In highly concurrent requests table cache opens same file more times which lowers purpose of max_open_files. Fixes (https://github.com/facebook/rocksdb/issues/6699)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6707

Reviewed By: ltamasi

Differential Revision: D21044965

fbshipit-source-id: f6e91d90b60dad86e518b5147021da42460ee1d2
2020-04-21 13:16:31 -07:00
Andrew Kryczka
f9155a3404 Prevent uninitialized load in IndexBlockIter (#6736)
Summary:
When index block is empty or an error happens while reading it,
`Invalidate()` is called rather than `Initialize()`. So `Seek()` must
not refer to member variables that are only initialized in
`Initialize()` until it is sure `Initialize()` has been called.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6736

Reviewed By: siying

Differential Revision: D21139641

Pulled By: ajkr

fbshipit-source-id: 71c58cc1adbd795dc3729dd5023bf7df1515ff32
2020-04-20 16:32:43 -07:00
Akanksha Mahajan
03a1d95db0 Set max_background_flushes dynamically (#6701)
Summary:
1. Add changes so that max_background_flushes can be set dynamically.
                   2. Add a testcase DBOptionsTest.SetBackgroundFlushThreads which set the
                        max_background_flushes dynamically using SetDBOptions.

TestPlan:  1. make -j64 check
                  2. Using new testcase DBOptionsTest.SetBackgroundFlushThreads
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701

Reviewed By: ajkr

Differential Revision: D21028010

Pulled By: akankshamahajan15

fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1
2020-04-20 16:19:02 -07:00