Commit Graph

8857 Commits

Author SHA1 Message Date
phantomape
cb671ea1ca env: Add clearerr() before repeating an interrupted file read (#6609)
Summary:
This change updates PosixSequentialFile::Read to call clearerr()
before fread()ing again after an EINTR is returned on a previous
fread.

The original fix is from bd8f1ebb91.
Fixing https://github.com/facebook/rocksdb/issues/6509

Signed-off-by: phantomape <cxucheng@outlook.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6609

Reviewed By: zhichao-cao

Differential Revision: D20731482

Pulled By: riversand963

fbshipit-source-id: 7f1f3a1449077d5560f45c465a78d08633740ba0
2020-03-29 21:56:31 -07:00
Zhichao Cao
e8d332d97e Use FileChecksumGenFactory for SST file checksum (#6600)
Summary:
In the current implementation, sst file checksum is calculated by a shared checksum function object, which may make some checksum function hard to be applied here such as SHA1. In this implementation, each sst file will have its own checksum generator obejct, created by FileChecksumGenFactory. User needs to implement its own FilechecksumGenerator and Factory to plugin the in checksum calculation method.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6600

Test Plan: tested with make asan_check

Reviewed By: riversand963

Differential Revision: D20717670

Pulled By: zhichao-cao

fbshipit-source-id: 2a74c1c280ac11a07a1980185b43b671acaa71c6
2020-03-29 15:58:46 -07:00
Cheng Chang
ee50b8d499 Be able to decrease background thread's CPU priority when creating database backup (#6602)
Summary:
When creating a database backup, the background threads will not only consume IO resources by copying files, but also consuming CPU such as by computing checksums. During peak times, the CPU consumption by the background threads might affect online queries.

This PR makes it possible to decrease CPU priority of these threads when creating a new backup.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6602

Test Plan: make check

Reviewed By: siying, zhichao-cao

Differential Revision: D20683216

Pulled By: cheng-chang

fbshipit-source-id: 9978b9ed9488e8ce135e90ca083e5b4b7221fd84
2020-03-28 19:07:25 -07:00
Levi Tamasi
3a35542f86 Call out the cache deleter related interface change in HISTORY.md (#6606)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6606

Reviewed By: riversand963

Differential Revision: D20708411

Pulled By: ltamasi

fbshipit-source-id: c15b4ded19a4b5c84e3e4240bdcec15460806c88
2020-03-27 16:18:23 -07:00
Cheng Chang
3881a678d5 Refactor IsLockExpired (#6586)
Summary:
1. If expiration_time is non-positive, no need to call NowMicros, save a syscall.
2. expire_time should only be set when expired is false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6586

Test Plan: make check

Reviewed By: lth

Differential Revision: D20673730

Pulled By: cheng-chang

fbshipit-source-id: a69e8d7b16dc6d0d00487bb1c19f0710d79482e2
2020-03-27 16:14:22 -07:00
Zhichao Cao
4246888101 Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487)
Summary:
In the current code base, we use Status to get and store the returned status from the call. Specifically, for IO related functions, the current Status cannot reflect the IO Error details such as error scope, error retryable attribute, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new Wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is purged at the lower level of write path and transferred to Status.

The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and Compaction). The second job is to identify the Retryable IO Error as HardError, and set the bg_error_ as HardError. In this case, the DB Instance becomes read only. User is informed of the Status and need to take actions to deal with it (e.g., call db->Resume()).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487

Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check

Reviewed By: anand1976

Differential Revision: D20685017

Pulled By: zhichao-cao

fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0
2020-03-27 16:04:43 -07:00
Cheng Chang
2e276973e4 Compute cv_end_time with simpler logic (#6585)
Summary:
The refactored logic is easier to read.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6585

Test Plan: make check

Reviewed By: lth

Differential Revision: D20663225

Pulled By: cheng-chang

fbshipit-source-id: cfd28955cd03b0a71d9087085170875f6dd0be9e
2020-03-27 16:01:23 -07:00
Burton Li
8abd41a544 Fix write_unprepared_transaction_test crash on debug version. (#6574)
Summary:
The last key may hit index of out bound exception when id = 9.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6574

Reviewed By: riversand963

Differential Revision: D20699791

Pulled By: cheng-chang

fbshipit-source-id: 8e2c5be5ff0e53e9857cfd59cea97cff21446819
2020-03-27 11:12:23 -07:00
Peter Dillinger
e91d1a21a6 Streamline persistent_cache_test for testing efficiency (#6601)
Summary:
This test was written like a stress test, using up to 3x26GB
RSS memory during parallel 'make check'. Now, while this code is mostly
dormant, I've made the "for Travis" versions of the expensive tests the
canonical versions and disabled the expensive versions. This has the
side benefit of removing some arbitrary conditional compilation.

For unknown reason, the super expensive tests were gated on
Snappy_Supported, which appears to be irrelevant, so I removed it.

The tests can be fixed / improved / migrated to stress test if/when they
are deemed important again.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6601

Test Plan:
make check + CI

./persistent_cache_test Before:
...
[==========] 10 tests from 2 test cases ran. (114541 ms total)
[  PASSED  ] 10 tests.
YOU HAVE 1 DISABLED TEST

After:
...
[==========] 3 tests from 2 test cases ran. (1714 ms total)
[  PASSED  ] 3 tests.
YOU HAVE 10 DISABLED TESTS

Reviewed By: siying

Differential Revision: D20680983

Pulled By: pdillinger

fbshipit-source-id: 2be0fde13eeb0a71110ac7f5477cfe63996a509e
2020-03-26 19:36:32 -07:00
Levi Tamasi
6f62322fe4 Add blob files to VersionStorageInfo/VersionBuilder (#6597)
Summary:
The patch adds a couple of classes to represent metadata about
blob files: `SharedBlobFileMetaData` contains the information elements
that are immutable (once the blob file is closed), e.g. blob file number,
total number and size of blob files, checksum method/value, while
`BlobFileMetaData` contains attributes that can vary across versions like
the amount of garbage in the file. There is a single `SharedBlobFileMetaData`
for each blob file, which is jointly owned by the `BlobFileMetaData` objects
that point to it; `BlobFileMetaData` objects, in turn, are owned by `Version`s
and can also be shared if the (immutable _and_ mutable) state of the blob file
is the same in two versions.

In addition, the patch adds the blob file metadata to `VersionStorageInfo`, and extends
`VersionBuilder` so that it can apply blob file related `VersionEdit`s (i.e. those
containing `BlobFileAddition`s and/or `BlobFileGarbage`), and save blob file metadata
to a new `VersionStorageInfo`. Consistency checks are also extended to ensure
that table files point to blob files that are part of the `Version`, and that all blob files
that are part of any given `Version` have at least some _non_-garbage data in them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6597

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D20656803

Pulled By: ltamasi

fbshipit-source-id: f1f74d135045b3b42d0146f03ee576ef0a4bfd80
2020-03-26 18:51:53 -07:00
Levi Tamasi
6301dbe7a7 Use function objects as deleters in the block cache (#6545)
Summary:
As the first step of reintroducing eviction statistics for the block
cache, the patch switches from using simple function pointers as deleters
to function objects implementing an interface. This will enable using
deleters that have state, like a smart pointer to the statistics object
that is to be updated when an entry is removed from the cache. For now,
the patch adds a deleter template class `SimpleDeleter`, which simply
casts the `value` pointer to its original type and calls `delete` or
`delete[]` on it as appropriate. Note: to prevent object lifecycle
issues, deleters must outlive the cache entries referring to them;
`SimpleDeleter` ensures this by using the ("leaky") Meyers singleton
pattern.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6545

Test Plan: `make asan_check`

Reviewed By: siying

Differential Revision: D20475823

Pulled By: ltamasi

fbshipit-source-id: fe354c33dd96d9bafc094605462352305449a22a
2020-03-26 16:19:58 -07:00
Mike Kolupaev
963af52f15 Fix iterator reading filter block despite read_tier == kBlockCacheTier (#6562)
Summary:
We're seeing iterators with `ReadOptions::read_tier == kBlockCacheTier` sometimes doing file reads. Stack trace:

```
rocksdb::RandomAccessFileReader::Read(unsigned long, unsigned long, rocksdb::Slice*, char*, bool) const
rocksdb::BlockFetcher::ReadBlockContents()
rocksdb::Status rocksdb::BlockBasedTable::MaybeReadBlockAndLoadToCache<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::BlockContents*) const
rocksdb::Status rocksdb::BlockBasedTable::RetrieveBlock<rocksdb::ParsedFullFilterBlock>(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, bool, bool) const
rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::ReadFilterBlock(rocksdb::BlockBasedTable const*, rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*)
rocksdb::FilterBlockReaderCommon<rocksdb::ParsedFullFilterBlock>::GetOrReadFilterBlock(bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::CachableEntry<rocksdb::ParsedFullFilterBlock>*) const
rocksdb::FullFilterBlockReader::MayMatch(rocksdb::Slice const&, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*) const
rocksdb::FullFilterBlockReader::RangeMayExist(rocksdb::Slice const*, rocksdb::Slice const&, rocksdb::SliceTransform const*, rocksdb::Comparator const*, rocksdb::Slice const*, bool*, bool, rocksdb::BlockCacheLookupContext*)
rocksdb::BlockBasedTable::PrefixMayMatch(rocksdb::Slice const&, rocksdb::ReadOptions const&, rocksdb::SliceTransform const*, bool, rocksdb::BlockCacheLookupContext*) const
rocksdb::BlockBasedTableIterator<rocksdb::DataBlockIter, rocksdb::Slice>::SeekImpl(rocksdb::Slice const*)
rocksdb::ForwardIterator::SeekInternal(rocksdb::Slice const&, bool)
rocksdb::DBIter::Seek(rocksdb::Slice const&)
```

`BlockBasedTableIterator::CheckPrefixMayMatch` was missing a check for `kBlockCacheTier`. This PR adds it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6562

Test Plan: deployed it to a logdevice test cluster and looked at logdevice's IO tracing.

Reviewed By: siying

Differential Revision: D20529368

Pulled By: al13n321

fbshipit-source-id: 65bf33964b1951464415c900336635fb20919611
2020-03-26 15:21:26 -07:00
Peter Dillinger
e70629e5f7 Re-update check_format_compatible.sh for default format_version=4 (#6598)
Summary:
Forward compatibility with new defaults only starts from 5.16
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6598

Test Plan: facebook automated test (so much easier than running myself)

Reviewed By: riversand963

Differential Revision: D20665553

Pulled By: pdillinger

fbshipit-source-id: b846bfaccf4d0946f92d323a3b4ee6e3e548df93
2020-03-26 10:11:09 -07:00
Peter Dillinger
8599efabab Update check_format_compatible.sh for default format_version=4 (#6594)
Summary:
And add releases that should have been added before (6.6 - 6.8)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6594

Test Plan: facebook automated test (so much easier than running myself)

Reviewed By: riversand963

Differential Revision: D20649106

Pulled By: pdillinger

fbshipit-source-id: 78832449d9295580282cebf117e3968362fbdc69
2020-03-25 13:54:58 -07:00
Peter Dillinger
93b80ca7ba Update default BBTO::format_version from 2 to 4 (#6582)
Summary:
Version 4 has been around long enough, for compatibility and
extensive validation, that it should be default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6582

Test Plan:
CI (w.r.t. changing the default; format_version=4 is well
tested and massively in production at Facebook)

Reviewed By: siying

Differential Revision: D20625233

Pulled By: pdillinger

fbshipit-source-id: 2f83ed874cffa4a39bc7a66cdf3833b978fbb948
2020-03-24 21:22:21 -07:00
Yanqin Jin
ccf7676455 Update a few scripts to be python3 compatible (#6525)
Summary:
There are a few scripts with python3 compatibility issues that were not
detected by automated tool before. Update them now.

Test Plan (devserver):
python2 tools/ldb_test.py
python3 tools/ldb_test.py

python2 tools/write_stress_runner.py --runtime_sec=30
python3 tools/write_stress_runner.py --runtime_sec=30

python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox
python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox

python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525

Reviewed By: cheng-chang

Differential Revision: D20627820

Pulled By: riversand963

fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3
2020-03-24 21:00:27 -07:00
sdong
6fd0ed4993 CompactRange() to use bottom pool when goes to bottommost level (#6593)
Summary:
In automatic compaction, if a compaction is bottommost, it goes to bottom thread pool. We should do the same for manual compaction too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6593

Test Plan: Add a unit test. See all existing tests pass.

Reviewed By: ajkr

Differential Revision: D20637408

fbshipit-source-id: cb03031e8f895085f7acf6d2d65e69e84c9ddef3
2020-03-24 20:24:32 -07:00
akankshamahajan
ceeca7542d Create a thread in DeleteScheduler only when rate limit enabled (#6564)
Summary:
Create a thread in DeleteScheduler only when delete rate limit is set
	 because when there is no rate limit on deletion, a thread per DeleteScheduler
	 consumes unnecessary resources.

Test Plan: make -j64 check

Reviewed By: riversand963

Differential Revision: D20538138

Pulled By: akankshamahajan15

fbshipit-source-id: 137499e810e817156345c30d627f8678b9adadf7
2020-03-24 11:29:51 -07:00
Huisheng Liu
a6ce5c823b multiget support for timestamps (#6483)
Summary:
Add timestamp support for MultiGet().
timestamp from readoptions is honored, and timestamps can be returned along with values.

MultiReadRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within marge of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
base line (commit 17bef7d3a):
  multireadrandom :     104.173 micros/op 307167 ops/sec; (5462999 of 5462999 found)
This PR:
  multireadrandom :     104.199 micros/op 307095 ops/sec; (5307999 of 5307999 found)

.\db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=multireadrandom --use_existing_db=1 --num=25000000 --threads=32 --allow_concurrent_memtable_write=0
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6483

Reviewed By: anand1976

Differential Revision: D20498373

Pulled By: riversand963

fbshipit-source-id: 8505f22bc40fd791bc7dd05e48d7e67c91edb627
2020-03-24 11:24:09 -07:00
sdong
921cdd37e2 Fix bug that number of table loading threads is set as a boolean (#6576)
Summary:
When applying a new version in non DB open case, optimize_filters_for_hits is used for max_threads, which is clearly a bug. It is not clear what the indented value in the first place, but it value 1 makes sense here, which would create no extra threads. This bug is not expected to cause user visible problems, assuming C++ implicitly cast bool to 0 or 1.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6576

Test Plan: Run all exsiting test.

Reviewed By: ajkr

Differential Revision: D20602467

fbshipit-source-id: 40b2cd8619aba09ae9242b36c415464db3c9b737
2020-03-24 10:17:40 -07:00
anand76
a9d168cfd7 Simplify migration to FileSystem API (#6552)
Summary:
The current Env/FileSystem API separation has a couple of issues -
1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation.
2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes.

This PR solves the above issues and simplifies the migration in the following ways -
1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```.
1a. - This also makes it more robust by ensuring that even if RocksDB
  has some stray calls to Env APIs rather than FileSystem, they will go
  through the same object and thus there is no risk of getting out of
  sync.
2. Provide a ```NewCompositeEnv()``` API that can be used to construct a
PosixEnv with a custom FileSystem implementation. This eliminates an
indirection to call Env APIs, and relieves the FileSystem developer of
the burden of having to implement wrappers for the Env APIs.
3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and
```NewLogger()```

Tests:
1. New unit tests
2. make check and make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552

Reviewed By: riversand963

Differential Revision: D20592038

Pulled By: anand1976

fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7
2020-03-23 21:54:21 -07:00
Cheng Chang
43aee93d2b Initialize scratch to nullptr explicitly to make clang analyzer happy (#6577)
Summary:
`scratch` is not initialized in `Align` because it will be set outside of it. But clang analyzer is strict on initializing it before return.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6577

Test Plan: make analyze

Reviewed By: siying

Differential Revision: D20607303

Pulled By: cheng-chang

fbshipit-source-id: 2843d759345a057a8e122178d30b90deff0f9b2a
2020-03-23 20:15:27 -07:00
Zhichao Cao
d300d10962 Fix the MultiGet testing failure in Circleci (#6578)
Summary:
The MultiGet test in db_basic_test fails in CircleCI vs2019. The reason is that even Snappy compression is enabled, the first compression type is still kNoCompression. This PR checks the list and ensure that only when compression is enable and the compression type is valid, compression will be enabled. Such that, it will not fail the combined read test in MultiGet.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6578

Test Plan: make check, db_basic_test.

Reviewed By: anand1976

Differential Revision: D20607529

Pulled By: zhichao-cao

fbshipit-source-id: dcead264d5c2da105912c18caad34b8510bb04b0
2020-03-23 18:51:09 -07:00
Yanqin Jin
617f479266 Fix LITE build (#6575)
Summary:
Fix LITE build by excluding some unit tests that use features not supported in LITE.
```
db/db_basic_test.cc:1778:8: error: ‘void rocksdb::{anonymous}::TableFileListener::OnTableFileCreated(const rocksdb::TableFileCreationInfo&)’ marked ‘override’, but does not override
   void OnTableFileCreated(const TableFileCreationInfo& info) override {
        ^~~~~~~~~~~~~~~~~~
make: *** [db/db_basic_test.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6575

Reviewed By: ltamasi

Differential Revision: D20598598

Pulled By: riversand963

fbshipit-source-id: 367f7cb2500360ad57030b138a94c0f731a04339
2020-03-23 13:05:36 -07:00
Zhichao Cao
5c6346c420 Revert "Added the safe-to-ignore tag to version_edit (#6530)" (#6569)
Summary:
This reverts commit e10553f2a6.

Pass make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6569

Reviewed By: riversand963

Differential Revision: D20574319

Pulled By: zhichao-cao

fbshipit-source-id: ce36981a21596f5f2e14da6a59a2bb3619509a8b
2020-03-23 10:27:47 -07:00
Yanqin Jin
fb09ef05dc Attempt to recover from db with missing table files (#6334)
Summary:
There are situations when RocksDB tries to recover, but the db is in an inconsistent state due to SST files referenced in the MANIFEST being missing. In this case, previous RocksDB will just fail the recovery and return a non-ok status.
This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files, and try to recover to the most recent state without missing table file. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version.
`DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed.
To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush.

Test plan (on devserver):
```
$make check
$COMPILE_WITH_ASAN=1 make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334

Reviewed By: anand1976

Differential Revision: D19778960

Pulled By: riversand963

fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
2020-03-20 19:30:48 -07:00
Cheng Chang
4fc216649d Support direct IO in RandomAccessFileReader::MultiRead (#6446)
Summary:
By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined.

In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API.

For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446

Test Plan: Added a new test `random_access_file_reader_test.cc`.

Reviewed By: anand1976

Differential Revision: D20097518

Pulled By: cheng-chang

fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1
2020-03-20 16:33:26 -07:00
Cheng Chang
5fd152b7ad Get block size only in direct IO mode (#6522)
Summary:
When `use_direct_reads` and `use_direct_writes` are `false`, `logical_sector_size_` inside various `*File` implementations are not actually used, so `GetLogicalBlockSize` does not necessarily need to be called for `logical_sector_size_`, just set a default page size.

This is a follow up PR for https://github.com/facebook/rocksdb/pull/6457.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6522

Test Plan: make check

Reviewed By: siying

Differential Revision: D20408885

Pulled By: cheng-chang

fbshipit-source-id: f2d3808f41265237e7fa2c0be9f084f8fa97fe3d
2020-03-20 15:26:10 -07:00
sdong
6c50fe1ec9 Change HashMap::Insert()'s value to a const reference (#6567)
Summary:
When building RocksDB on VS2015, an error shows up with

hash_map.h(39): error C2719: 'value': formal parameter with requested alignment of 8 won't be aligned

Making the reference a reference can solve the problem, and there isn't a reason we can't do that, at least for the current use of the hash map.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6567

Test Plan: See CI tests pass.

Reviewed By: pdillinger

Differential Revision: D20548543

fbshipit-source-id: 255b55d74cf68a0b324e6f504c56608a97ea6276
2020-03-20 14:59:54 -07:00
Peter Dillinger
66cd07c6d9 Exclude more Travis builds for each pull request (#6557)
Summary:
This commit fixes an incorrect version of this change that was previously landed.

On recently adding ARM64 and PPC64LE builds to Travis, we
seem to have hit some parallel build limits that dramatically increased
queue times.

This change majorly limits the configurations for ARM64 and PPC64LE to
build on each pull request, but keeps the large matrix for branch
builds.

In the process, I changed some previously excluded osx build configurations
to happen in branch builds.

NB: we might want to move master branch Travis build to daily trigger
rather than push trigger to further reduce contention.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6557

Test Plan: Travis only

Reviewed By: siying

Differential Revision: D20563425

Pulled By: pdillinger

fbshipit-source-id: d619eb9f196486ed000364aa40de4661f0b1029d
2020-03-20 13:20:19 -07:00
Ben Mehne
d2e3822d67 Make testpilot recognize that these tests have coverage instrumentation
Summary: TestPilot uses two flags to determine whether coverage is already instrumented: `fbcode_macros` and `coverage`.  Normally, these two tags are added automatically to cpp tests, but this is a fake cpp test, so we must manually add them.  The first is easy - `fbcode_macros` is added by the `custom_unittest` library, which is in `fbcode_macros`, so it is appropriate.  The second is harder - we need to verify that we should add the macro.  We do this using the `coverage.bzl` functions.

Reviewed By: siying

Differential Revision: D20549040

fbshipit-source-id: d2732b3ec26f3dff065efdf398abe3241075bb2f
2020-03-20 11:23:23 -07:00
Peter Dillinger
093ff0b2ce Exclude more Travis builds for each pull request (#6557)
Summary:
On recently adding ARM64 and PPC64LE builds to Travis, we
seem to have hit some parallel build limits that dramatically increased
queue times.

This change majorly limits the configurations for ARM64 and PPC64LE to
build on each pull request, but keeps the large matrix for branch
builds.

In the process, I changed some previously excluded osx build configurations
to happen in branch builds.

NB: we might want to move master branch Travis build to daily trigger
rather than push trigger to further reduce contention.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6557

Test Plan: Travis only

Reviewed By: siying

Differential Revision: D20524575

Pulled By: pdillinger

fbshipit-source-id: babcb2c64e195679e472473a1cbdf42de47231ff
2020-03-19 12:53:37 -07:00
Zhichao Cao
e10553f2a6 Added the safe-to-ignore tag to version_edit (#6530)
Summary:
Each time RocksDB switches to a new MANIFEST file from old one, it calls WriteCurrentStateToManifest() which writes a 'snapshot' of the current in-memory state of versions to the beginning of the new manifest as a bunch of version edits. We can distinguish these version edits from other version edits written during normal operations with a custom, safe-to-ignore tag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6530

Test Plan: added test to version_edit_test, pass make asan_check

Reviewed By: riversand963

Differential Revision: D20524516

Pulled By: zhichao-cao

fbshipit-source-id: f1de102f5499bfa88dae3caa2f32c7f42cf904db
2020-03-19 11:30:26 -07:00
Levi Tamasi
442404558a Clean up VersionBuilder a bit (#6556)
Summary:
The whole point of the pimpl idiom is to hide implementation details.
Internal helper methods like `CheckConsistency`, `CheckConsistencyForDeletes`,
and `MaybeAddFile` do not belong in the public interface of the class.
In addition, the patch switches to `unique_ptr` for the implementation
object instead of using a raw `delete`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6556

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D20523568

Pulled By: ltamasi

fbshipit-source-id: 5bbb0ccebd0c47a33b815398c7f9cfe13bd775ac
2020-03-19 10:44:16 -07:00
Levi Tamasi
217ce20021 Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491)
Summary:
Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`,
`GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the
command line parameter `get_live_files_and_wal_files_one_in` is specified.
The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable
in the sense that they can return errors if another thread removes a WAL file
while they are executing (which is a perfectly plausible and legitimate scenario).
The patch splits this command line parameter into three (one for each API),
and changes the crash test script so that only `GetLiveFiles` is tested during
our continuous crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491

Test Plan:
```
make check
python tools/db_crashtest.py whitebox
```

Reviewed By: siying

Differential Revision: D20312200

Pulled By: ltamasi

fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8
2020-03-18 17:14:15 -07:00
sdong
8ad4b32c5d cmake: add option WITH_CORE_TOOLS to exclude tools except ldb and sst_dump (#6506)
Summary:
ldb and sst_dump are most important tools and they don't dependend on gflags. In cmake, we don't have an way to only build these two tools and exclude other tools. This is inconvenient if the environment has a problem with gflags. Add such an option WITH_CORE_TOOLS.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6506

Test Plan: cmake and build with WITH_TOOLS and without.

Differential Revision: D20473029

fbshipit-source-id: 3d730fd14bbae6eeeae7f9cc9aec50a4e488ad72
2020-03-18 11:01:38 -07:00
Levi Tamasi
1df9b01680 Disable distributed mutex test for valgrind_test (#6553)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6553

Test Plan:
```
$ make valgrind_test -j24
$ ./folly_synchronization_distributed_mutex_test
DistributedMutex is not supported in ROCKSDB_LITE, on ARM, or in valgrind_test runs
```

Reviewed By: pdillinger

Differential Revision: D20501966

Pulled By: ltamasi

fbshipit-source-id: 386ec5f258f89d0781a36c5b390c665787093a74
2020-03-18 09:24:31 -07:00
sdong
712bc4b6a2 Fix regression bug in partitioned index reseek caused by #6531 (#6551)
Summary:
https://github.com/facebook/rocksdb/pull/6531 removed some code in partitioned index seek logic. By mistake the logic of storing previous index offset is removed, while the logic of using it is preserved, so that the code might use wrong value to determine reseeking condition.
This will trigger a bug, if following a Seek() not going to the last block, SeekToLast() is called, and then Seek() is called which should position the cursor to the block before SeekToLast().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6551

Test Plan: Add a unit test that reproduces the bug. In the same unit test, also some reseek cases are covered to avoid regression.

Reviewed By: pdillinger

Differential Revision: D20493990

fbshipit-source-id: 3919aa4861c0481ec96844e053048da1a934b91d
2020-03-17 12:33:10 -07:00
akankshamahajan
a8149aef1e Allow table/sst_file_reader_test.cc to use custom Env (#6536)
Summary:
Allowing table/sst_file_reader_test.cc to use custom Env specified by TEST_ENV_URI.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6536

Reviewed By: riversand963

Differential Revision: D20448525

Pulled By: akankshamahajan15

fbshipit-source-id: 74e4d34c8ac4c2743741e78bf599571a4a465459
2020-03-17 11:02:13 -07:00
Yanqin Jin
66ed58083a Reduce runtime of db_with_timestamp_basic_test (#6546)
Summary:
Reduce runtime by reducing test scale to avoid test time-outs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6546

Test Plan:
time ./db_with_timestamp_basic_test
and watch internal tests.

Reviewed By: zhichao-cao

Differential Revision: D20479292

Pulled By: riversand963

fbshipit-source-id: c9e4a155be7699dd4de60fa531de86d442a3ba0a
2020-03-17 10:50:48 -07:00
Yanqin Jin
098dce2d1a Fix compiler warning treated as error (#6547)
Summary:
Define a private member variable only in debug mode. Without fix, build will fail
```
In file included from table/block_based/partitioned_index_iterator.cc:9:
./table/block_based/partitioned_index_iterator.h:125:32: error: private field 'icomp_' is not used [-Werror,-Wunused-private-field]
  const InternalKeyComparator& icomp_;
```

Test plan (dev server)
1. make check
2. Make sure fixed in Travis
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6547

Reviewed By: siying

Differential Revision: D20480027

Pulled By: pdillinger

fbshipit-source-id: 288bc94280e240c3136335b6c73eb1ccb0db459d
2020-03-17 09:59:28 -07:00
Peter Dillinger
6c595f008a Update folly/lang/Align.h (backport to C++11) (#6534)
Summary:
For s390x support, some updates in newer version of Align.h are
needed. Upgrading just that file as best we can, with one addition to
Portability.h and tweaking new code in Align.h to use C++11 only (no
non-trivial constexpr functions).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6534

Test Plan: CI, further work in PR https://github.com/facebook/rocksdb/issues/6168

Differential Revision: D20445942

Pulled By: pdillinger

fbshipit-source-id: 0cef3c367463c71f3123d12cdf287c573af5e342
2020-03-16 19:07:31 -07:00
Peter Dillinger
db02664f35 Remove XXH3(preview) streaming APIs (#6540)
Summary:
There was an alignment bug in our copy of the streaming APIs
for XXH3 (which we dubbed "XXH3p" for "preview" release). Since those
APIs are unused and some values for XXH3 have changed since XXH3p, I'm
simply removing those APIs, expecting it's better to use finalized XXH3
function if/when we decide to use those APIs (e.g. for checksums).

Fixes https://github.com/facebook/rocksdb/issues/6508
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6540

Test Plan: make check

Differential Revision: D20479271

Pulled By: pdillinger

fbshipit-source-id: 246cf1690d614d3b31042b563d249de32dec1e0d
2020-03-16 17:02:00 -07:00
Yanqin Jin
58918d4ccc Use correct Env for DestroyDB in stress test (#6539)
Summary:
When using custom Env, trying to call DestroyDB() with default Options will
fail.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6539

Test Plan: ./db_stress

Differential Revision: D20476204

Pulled By: riversand963

fbshipit-source-id: 612c6754660cc9b5bb3e9c2dbb2f6ecd7f648797
2020-03-16 16:57:48 -07:00
sdong
488b1e6739 Fix an error in db_bench with gcc 4.8 (#6537)
Summary:
I start to see following failures:

tools/db_bench_tool.cc: In constructor ‘rocksdb::NormalDistribution::NormalDistribution(unsigned int, unsigned int)’:
tools/db_bench_tool.cc:1528:58: error: declaration of ‘max’ shadows a member of 'this' [-Werror=shadow]
   NormalDistribution(unsigned int min, unsigned int max) :
                                                          ^
tools/db_bench_tool.cc:1528:58: error: declaration of ‘min’ shadows a member of 'this' [-Werror=shadow]
tools/db_bench_tool.cc: In constructor ‘rocksdb::UniformDistribution::UniformDistribution(unsigned int, unsigned int)’:
tools/db_bench_tool.cc:1546:59: error: declaration of ‘max’ shadows a member of 'this' [-Werror=shadow]
   UniformDistribution(unsigned int min, unsigned int max) :
                                                           ^
tools/db_bench_tool.cc:1546:59: error: declaration of ‘min’ shadows a member of 'this' [-Werror=shadow]

when I build from GCC 4.8. Rename those variables to fix the problem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6537

Test Plan: make all with the compiler that used to show the failure.

Differential Revision: D20448741

fbshipit-source-id: 18bcf012dbe020f22f79038a9b08f447befa2574
2020-03-16 13:50:40 -07:00
sdong
d66908091d De-template block based table iterator (#6531)
Summary:
Right now block based table iterator is used as both of iterating data for block based table, and for the index iterator for partitioend index. This was initially convenient for introducing a new iterator and block type for new index format, while reducing code change. However, these two usage doesn't go with each other very well. For example, Prev() is never called for partitioned index iterator, and some other complexity is maintained in block based iterators, which is not needed for index iterator but maintainers will always need to reason about it. Furthermore, the template usage is not following Google C++ Style which we are following, and makes a large chunk of code tangled together. This commit separate the two iterators. Right now, here is what it is done:
1. Copy the block based iterator code into partitioned index iterator, and de-template them.
2. Remove some code not needed for partitioned index. The upper bound check and tricks are removed. We never tested performance for those tricks when partitioned index is enabled in the first place. It's unlikelyl to generate performance regression, as creating new partitioned index block is much rarer than data blocks.
3. Separate out the prefetch logic to a helper class and both classes call them.

This commit will enable future follow-ups. One direction is that we might separate index iterator interface for data blocks and index blocks, as they are quite different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6531

Test Plan: build using make and cmake. And build release

Differential Revision: D20473108

fbshipit-source-id: e48011783b339a4257c204cc07507b171b834b0f
2020-03-16 12:20:50 -07:00
Cheng Chang
402da454cb Migrate AppVeyor to CircleCI (#6518)
Summary:
CircleCI is the new recommended CI system internally.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6518

Test Plan: Watch https://app.circleci.com/pipelines/github/facebook/rocksdb

Differential Revision: D20454743

Pulled By: cheng-chang

fbshipit-source-id: 39031568d6c1d3d25b7fbd78fa9a0e6067ddc47c
2020-03-13 21:58:51 -07:00
Cheng Chang
23eae14d24 Destroy DB at the end of each test in db_logical_block_size_cache_test (#6532)
Summary:
If DB is not deleted, in concurrent test, the tests might fail because of the previously existing DB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6532

Test Plan:
make clean && make -j24 LITE=1  db_logical_block_size_cache_test && ./db_logical_block_size_cache_test
make clean && make -j24 db_logical_block_size_cache_test && ./db_logical_block_size_cache_test

Differential Revision: D20454734

Pulled By: cheng-chang

fbshipit-source-id: 8abede2ec1d79c1a4fe1bc95fbda489f8f7ee052
2020-03-13 21:53:38 -07:00
Zhichao Cao
a824727db4 Fix build bug caused by PR 6516 (#6535)
Summary:
Fix the build corruption caused by PR https://github.com/facebook/rocksdb/issues/6516

Testing plan: make make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6535

Differential Revision: D20448614

Pulled By: zhichao-cao

fbshipit-source-id: 4d2a3dae6cdd781fcfe8e28a84ac3f536db1b067
2020-03-13 16:48:03 -07:00
Peter Dillinger
85dbdf2586 Use an Amazon S3 bucket for downloading deps (#6526)
Summary:
After we had a lot of failures with maven.org downloads, we
wanted an alternative location for downloading binary dependencies.
Hosting them through github would have been good in terms of
organizational and network dependencies, but that approach seems to be
awkward (fake releases, so would need a 'rocksdb-deps' repo) and
strangely complicated for Facebook policy on open source repositories.

This commit moves the downloads (that are not officially hosted by
others on github) from my personal rocksdb fork to an S3 bucket owned
by the Facebook RocksDB AWS account. Facebook employees can access
this through an internal tool, and we should be able to grant permission
to outside collaborators.

Assuming this works out, I will back-port to older branches to stabilize
their CI testing as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6526

Test Plan: CI

Differential Revision: D20430130

Pulled By: pdillinger

fbshipit-source-id: df52394a65e0a57942db3039bdaade8a4d520cb2
2020-03-13 13:39:03 -07:00