Commit Graph

9032 Commits

Author SHA1 Message Date
Yanqin Jin
fb09ef05dc Attempt to recover from db with missing table files (#6334)
Summary:
There are situations when RocksDB tries to recover, but the db is in an inconsistent state due to SST files referenced in the MANIFEST being missing. In this case, previous RocksDB will just fail the recovery and return a non-ok status.
This PR enables another possibility. During recovery, RocksDB checks possible MANIFEST files, and try to recover to the most recent state without missing table file. `VersionSet::Recover()` applies version edits incrementally and "materializes" a version only when this version does not reference any missing table file. After processing the entire MANIFEST, the version created last will be the latest version.
`DBImpl::Recover()` calls `VersionSet::Recover()`. Afterwards, WAL replay will *not* be performed.
To use this capability, set `options.best_efforts_recovery = true` when opening the db. Best-efforts recovery is currently incompatible with atomic flush.

Test plan (on devserver):
```
$make check
$COMPILE_WITH_ASAN=1 make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6334

Reviewed By: anand1976

Differential Revision: D19778960

Pulled By: riversand963

fbshipit-source-id: c27ea80f29bc952e7d3311ecf5ee9c54393b40a8
2020-03-20 19:30:48 -07:00
Cheng Chang
4fc216649d Support direct IO in RandomAccessFileReader::MultiRead (#6446)
Summary:
By supporting direct IO in RandomAccessFileReader::MultiRead, the benefits of parallel IO (IO uring) and direct IO can be combined.

In direct IO mode, read requests are aligned and merged together before being issued to RandomAccessFile::MultiRead, so blocks in the original requests might share the same underlying buffer, the shared buffers are returned in `aligned_bufs`, which is a new parameter of the `MultiRead` API.

For example, suppose alignment requirement for direct IO is 4KB, one request is (offset: 1KB, len: 1KB), another request is (offset: 3KB, len: 1KB), then since they all belong to page (offset: 0, len: 4KB), `MultiRead` only reads the page with direct IO into a buffer on heap, and returns 2 Slices referencing regions in that same buffer. See `random_access_file_reader_test.cc` for more examples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6446

Test Plan: Added a new test `random_access_file_reader_test.cc`.

Reviewed By: anand1976

Differential Revision: D20097518

Pulled By: cheng-chang

fbshipit-source-id: ca48a8faf9c3af146465c102ef6b266a363e78d1
2020-03-20 16:33:26 -07:00
Cheng Chang
5fd152b7ad Get block size only in direct IO mode (#6522)
Summary:
When `use_direct_reads` and `use_direct_writes` are `false`, `logical_sector_size_` inside various `*File` implementations are not actually used, so `GetLogicalBlockSize` does not necessarily need to be called for `logical_sector_size_`, just set a default page size.

This is a follow up PR for https://github.com/facebook/rocksdb/pull/6457.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6522

Test Plan: make check

Reviewed By: siying

Differential Revision: D20408885

Pulled By: cheng-chang

fbshipit-source-id: f2d3808f41265237e7fa2c0be9f084f8fa97fe3d
2020-03-20 15:26:10 -07:00
sdong
6c50fe1ec9 Change HashMap::Insert()'s value to a const reference (#6567)
Summary:
When building RocksDB on VS2015, an error shows up with

hash_map.h(39): error C2719: 'value': formal parameter with requested alignment of 8 won't be aligned

Making the reference a reference can solve the problem, and there isn't a reason we can't do that, at least for the current use of the hash map.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6567

Test Plan: See CI tests pass.

Reviewed By: pdillinger

Differential Revision: D20548543

fbshipit-source-id: 255b55d74cf68a0b324e6f504c56608a97ea6276
2020-03-20 14:59:54 -07:00
Peter Dillinger
66cd07c6d9 Exclude more Travis builds for each pull request (#6557)
Summary:
This commit fixes an incorrect version of this change that was previously landed.

On recently adding ARM64 and PPC64LE builds to Travis, we
seem to have hit some parallel build limits that dramatically increased
queue times.

This change majorly limits the configurations for ARM64 and PPC64LE to
build on each pull request, but keeps the large matrix for branch
builds.

In the process, I changed some previously excluded osx build configurations
to happen in branch builds.

NB: we might want to move master branch Travis build to daily trigger
rather than push trigger to further reduce contention.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6557

Test Plan: Travis only

Reviewed By: siying

Differential Revision: D20563425

Pulled By: pdillinger

fbshipit-source-id: d619eb9f196486ed000364aa40de4661f0b1029d
2020-03-20 13:20:19 -07:00
Ben Mehne
d2e3822d67 Make testpilot recognize that these tests have coverage instrumentation
Summary: TestPilot uses two flags to determine whether coverage is already instrumented: `fbcode_macros` and `coverage`.  Normally, these two tags are added automatically to cpp tests, but this is a fake cpp test, so we must manually add them.  The first is easy - `fbcode_macros` is added by the `custom_unittest` library, which is in `fbcode_macros`, so it is appropriate.  The second is harder - we need to verify that we should add the macro.  We do this using the `coverage.bzl` functions.

Reviewed By: siying

Differential Revision: D20549040

fbshipit-source-id: d2732b3ec26f3dff065efdf398abe3241075bb2f
2020-03-20 11:23:23 -07:00
Peter Dillinger
093ff0b2ce Exclude more Travis builds for each pull request (#6557)
Summary:
On recently adding ARM64 and PPC64LE builds to Travis, we
seem to have hit some parallel build limits that dramatically increased
queue times.

This change majorly limits the configurations for ARM64 and PPC64LE to
build on each pull request, but keeps the large matrix for branch
builds.

In the process, I changed some previously excluded osx build configurations
to happen in branch builds.

NB: we might want to move master branch Travis build to daily trigger
rather than push trigger to further reduce contention.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6557

Test Plan: Travis only

Reviewed By: siying

Differential Revision: D20524575

Pulled By: pdillinger

fbshipit-source-id: babcb2c64e195679e472473a1cbdf42de47231ff
2020-03-19 12:53:37 -07:00
Zhichao Cao
e10553f2a6 Added the safe-to-ignore tag to version_edit (#6530)
Summary:
Each time RocksDB switches to a new MANIFEST file from old one, it calls WriteCurrentStateToManifest() which writes a 'snapshot' of the current in-memory state of versions to the beginning of the new manifest as a bunch of version edits. We can distinguish these version edits from other version edits written during normal operations with a custom, safe-to-ignore tag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6530

Test Plan: added test to version_edit_test, pass make asan_check

Reviewed By: riversand963

Differential Revision: D20524516

Pulled By: zhichao-cao

fbshipit-source-id: f1de102f5499bfa88dae3caa2f32c7f42cf904db
2020-03-19 11:30:26 -07:00
Levi Tamasi
442404558a Clean up VersionBuilder a bit (#6556)
Summary:
The whole point of the pimpl idiom is to hide implementation details.
Internal helper methods like `CheckConsistency`, `CheckConsistencyForDeletes`,
and `MaybeAddFile` do not belong in the public interface of the class.
In addition, the patch switches to `unique_ptr` for the implementation
object instead of using a raw `delete`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6556

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D20523568

Pulled By: ltamasi

fbshipit-source-id: 5bbb0ccebd0c47a33b815398c7f9cfe13bd775ac
2020-03-19 10:44:16 -07:00
Levi Tamasi
217ce20021 Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491)
Summary:
Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`,
`GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the
command line parameter `get_live_files_and_wal_files_one_in` is specified.
The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable
in the sense that they can return errors if another thread removes a WAL file
while they are executing (which is a perfectly plausible and legitimate scenario).
The patch splits this command line parameter into three (one for each API),
and changes the crash test script so that only `GetLiveFiles` is tested during
our continuous crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491

Test Plan:
```
make check
python tools/db_crashtest.py whitebox
```

Reviewed By: siying

Differential Revision: D20312200

Pulled By: ltamasi

fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8
2020-03-18 17:14:15 -07:00
sdong
8ad4b32c5d cmake: add option WITH_CORE_TOOLS to exclude tools except ldb and sst_dump (#6506)
Summary:
ldb and sst_dump are most important tools and they don't dependend on gflags. In cmake, we don't have an way to only build these two tools and exclude other tools. This is inconvenient if the environment has a problem with gflags. Add such an option WITH_CORE_TOOLS.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6506

Test Plan: cmake and build with WITH_TOOLS and without.

Differential Revision: D20473029

fbshipit-source-id: 3d730fd14bbae6eeeae7f9cc9aec50a4e488ad72
2020-03-18 11:01:38 -07:00
Levi Tamasi
1df9b01680 Disable distributed mutex test for valgrind_test (#6553)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6553

Test Plan:
```
$ make valgrind_test -j24
$ ./folly_synchronization_distributed_mutex_test
DistributedMutex is not supported in ROCKSDB_LITE, on ARM, or in valgrind_test runs
```

Reviewed By: pdillinger

Differential Revision: D20501966

Pulled By: ltamasi

fbshipit-source-id: 386ec5f258f89d0781a36c5b390c665787093a74
2020-03-18 09:24:31 -07:00
sdong
712bc4b6a2 Fix regression bug in partitioned index reseek caused by #6531 (#6551)
Summary:
https://github.com/facebook/rocksdb/pull/6531 removed some code in partitioned index seek logic. By mistake the logic of storing previous index offset is removed, while the logic of using it is preserved, so that the code might use wrong value to determine reseeking condition.
This will trigger a bug, if following a Seek() not going to the last block, SeekToLast() is called, and then Seek() is called which should position the cursor to the block before SeekToLast().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6551

Test Plan: Add a unit test that reproduces the bug. In the same unit test, also some reseek cases are covered to avoid regression.

Reviewed By: pdillinger

Differential Revision: D20493990

fbshipit-source-id: 3919aa4861c0481ec96844e053048da1a934b91d
2020-03-17 12:33:10 -07:00
akankshamahajan
a8149aef1e Allow table/sst_file_reader_test.cc to use custom Env (#6536)
Summary:
Allowing table/sst_file_reader_test.cc to use custom Env specified by TEST_ENV_URI.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6536

Reviewed By: riversand963

Differential Revision: D20448525

Pulled By: akankshamahajan15

fbshipit-source-id: 74e4d34c8ac4c2743741e78bf599571a4a465459
2020-03-17 11:02:13 -07:00
Yanqin Jin
66ed58083a Reduce runtime of db_with_timestamp_basic_test (#6546)
Summary:
Reduce runtime by reducing test scale to avoid test time-outs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6546

Test Plan:
time ./db_with_timestamp_basic_test
and watch internal tests.

Reviewed By: zhichao-cao

Differential Revision: D20479292

Pulled By: riversand963

fbshipit-source-id: c9e4a155be7699dd4de60fa531de86d442a3ba0a
2020-03-17 10:50:48 -07:00
Yanqin Jin
098dce2d1a Fix compiler warning treated as error (#6547)
Summary:
Define a private member variable only in debug mode. Without fix, build will fail
```
In file included from table/block_based/partitioned_index_iterator.cc:9:
./table/block_based/partitioned_index_iterator.h:125:32: error: private field 'icomp_' is not used [-Werror,-Wunused-private-field]
  const InternalKeyComparator& icomp_;
```

Test plan (dev server)
1. make check
2. Make sure fixed in Travis
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6547

Reviewed By: siying

Differential Revision: D20480027

Pulled By: pdillinger

fbshipit-source-id: 288bc94280e240c3136335b6c73eb1ccb0db459d
2020-03-17 09:59:28 -07:00
Peter Dillinger
6c595f008a Update folly/lang/Align.h (backport to C++11) (#6534)
Summary:
For s390x support, some updates in newer version of Align.h are
needed. Upgrading just that file as best we can, with one addition to
Portability.h and tweaking new code in Align.h to use C++11 only (no
non-trivial constexpr functions).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6534

Test Plan: CI, further work in PR https://github.com/facebook/rocksdb/issues/6168

Differential Revision: D20445942

Pulled By: pdillinger

fbshipit-source-id: 0cef3c367463c71f3123d12cdf287c573af5e342
2020-03-16 19:07:31 -07:00
Peter Dillinger
db02664f35 Remove XXH3(preview) streaming APIs (#6540)
Summary:
There was an alignment bug in our copy of the streaming APIs
for XXH3 (which we dubbed "XXH3p" for "preview" release). Since those
APIs are unused and some values for XXH3 have changed since XXH3p, I'm
simply removing those APIs, expecting it's better to use finalized XXH3
function if/when we decide to use those APIs (e.g. for checksums).

Fixes https://github.com/facebook/rocksdb/issues/6508
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6540

Test Plan: make check

Differential Revision: D20479271

Pulled By: pdillinger

fbshipit-source-id: 246cf1690d614d3b31042b563d249de32dec1e0d
2020-03-16 17:02:00 -07:00
Yanqin Jin
58918d4ccc Use correct Env for DestroyDB in stress test (#6539)
Summary:
When using custom Env, trying to call DestroyDB() with default Options will
fail.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6539

Test Plan: ./db_stress

Differential Revision: D20476204

Pulled By: riversand963

fbshipit-source-id: 612c6754660cc9b5bb3e9c2dbb2f6ecd7f648797
2020-03-16 16:57:48 -07:00
sdong
488b1e6739 Fix an error in db_bench with gcc 4.8 (#6537)
Summary:
I start to see following failures:

tools/db_bench_tool.cc: In constructor ‘rocksdb::NormalDistribution::NormalDistribution(unsigned int, unsigned int)’:
tools/db_bench_tool.cc:1528:58: error: declaration of ‘max’ shadows a member of 'this' [-Werror=shadow]
   NormalDistribution(unsigned int min, unsigned int max) :
                                                          ^
tools/db_bench_tool.cc:1528:58: error: declaration of ‘min’ shadows a member of 'this' [-Werror=shadow]
tools/db_bench_tool.cc: In constructor ‘rocksdb::UniformDistribution::UniformDistribution(unsigned int, unsigned int)’:
tools/db_bench_tool.cc:1546:59: error: declaration of ‘max’ shadows a member of 'this' [-Werror=shadow]
   UniformDistribution(unsigned int min, unsigned int max) :
                                                           ^
tools/db_bench_tool.cc:1546:59: error: declaration of ‘min’ shadows a member of 'this' [-Werror=shadow]

when I build from GCC 4.8. Rename those variables to fix the problem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6537

Test Plan: make all with the compiler that used to show the failure.

Differential Revision: D20448741

fbshipit-source-id: 18bcf012dbe020f22f79038a9b08f447befa2574
2020-03-16 13:50:40 -07:00
sdong
d66908091d De-template block based table iterator (#6531)
Summary:
Right now block based table iterator is used as both of iterating data for block based table, and for the index iterator for partitioend index. This was initially convenient for introducing a new iterator and block type for new index format, while reducing code change. However, these two usage doesn't go with each other very well. For example, Prev() is never called for partitioned index iterator, and some other complexity is maintained in block based iterators, which is not needed for index iterator but maintainers will always need to reason about it. Furthermore, the template usage is not following Google C++ Style which we are following, and makes a large chunk of code tangled together. This commit separate the two iterators. Right now, here is what it is done:
1. Copy the block based iterator code into partitioned index iterator, and de-template them.
2. Remove some code not needed for partitioned index. The upper bound check and tricks are removed. We never tested performance for those tricks when partitioned index is enabled in the first place. It's unlikelyl to generate performance regression, as creating new partitioned index block is much rarer than data blocks.
3. Separate out the prefetch logic to a helper class and both classes call them.

This commit will enable future follow-ups. One direction is that we might separate index iterator interface for data blocks and index blocks, as they are quite different.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6531

Test Plan: build using make and cmake. And build release

Differential Revision: D20473108

fbshipit-source-id: e48011783b339a4257c204cc07507b171b834b0f
2020-03-16 12:20:50 -07:00
Cheng Chang
402da454cb Migrate AppVeyor to CircleCI (#6518)
Summary:
CircleCI is the new recommended CI system internally.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6518

Test Plan: Watch https://app.circleci.com/pipelines/github/facebook/rocksdb

Differential Revision: D20454743

Pulled By: cheng-chang

fbshipit-source-id: 39031568d6c1d3d25b7fbd78fa9a0e6067ddc47c
2020-03-13 21:58:51 -07:00
Cheng Chang
23eae14d24 Destroy DB at the end of each test in db_logical_block_size_cache_test (#6532)
Summary:
If DB is not deleted, in concurrent test, the tests might fail because of the previously existing DB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6532

Test Plan:
make clean && make -j24 LITE=1  db_logical_block_size_cache_test && ./db_logical_block_size_cache_test
make clean && make -j24 db_logical_block_size_cache_test && ./db_logical_block_size_cache_test

Differential Revision: D20454734

Pulled By: cheng-chang

fbshipit-source-id: 8abede2ec1d79c1a4fe1bc95fbda489f8f7ee052
2020-03-13 21:53:38 -07:00
Zhichao Cao
a824727db4 Fix build bug caused by PR 6516 (#6535)
Summary:
Fix the build corruption caused by PR https://github.com/facebook/rocksdb/issues/6516

Testing plan: make make asan_check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6535

Differential Revision: D20448614

Pulled By: zhichao-cao

fbshipit-source-id: 4d2a3dae6cdd781fcfe8e28a84ac3f536db1b067
2020-03-13 16:48:03 -07:00
Peter Dillinger
85dbdf2586 Use an Amazon S3 bucket for downloading deps (#6526)
Summary:
After we had a lot of failures with maven.org downloads, we
wanted an alternative location for downloading binary dependencies.
Hosting them through github would have been good in terms of
organizational and network dependencies, but that approach seems to be
awkward (fake releases, so would need a 'rocksdb-deps' repo) and
strangely complicated for Facebook policy on open source repositories.

This commit moves the downloads (that are not officially hosted by
others on github) from my personal rocksdb fork to an S3 bucket owned
by the Facebook RocksDB AWS account. Facebook employees can access
this through an internal tool, and we should be able to grant permission
to outside collaborators.

Assuming this works out, I will back-port to older branches to stabilize
their CI testing as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6526

Test Plan: CI

Differential Revision: D20430130

Pulled By: pdillinger

fbshipit-source-id: df52394a65e0a57942db3039bdaade8a4d520cb2
2020-03-13 13:39:03 -07:00
Zhichao Cao
5c30e6c088 Separate timestamp related test from db_basic_test (#6516)
Summary:
In some of the test, db_basic_test may cause time out due to its long running time. Separate the timestamp related test from db_basic_test to avoid the potential issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6516

Test Plan: pass make asan_check

Differential Revision: D20423922

Pulled By: zhichao-cao

fbshipit-source-id: d6306f89a8de55b07bf57233e4554c09ef1fe23a
2020-03-13 11:37:15 -07:00
sdong
674cf41732 Divide block_based_table_reader.cc (#6527)
Summary:
block_based_table_reader.cc is a giant file, which makes it hard for users to navigate the code. Divide the files to multiple files.
Some class templates cannot be moved to .cc file. They are moved to .h files. It is still better than including them all in block_based_table_reader.cc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6527

Test Plan: "make all check" and "make release". Also build using cmake.

Differential Revision: D20428455

fbshipit-source-id: ca713c698469f07f35bc0c271358c0874ed4eb28
2020-03-12 21:41:50 -07:00
Yuqi Gu
dd7a4a8f03 CI: add Arm support to travis CI matrix (#6436)
Summary:
This patch based on https://github.com/facebook/rocksdb/issues/5932 offers a better solution to add arm64 to TravisCI matrix.
Really thank adamretter for initiating Arm CI setup.

Difference comparing to amd64:
1. For CMake, as no official arm64 release ready on Kitware page,
a third party (conda-forge) released one is used instead of
building from source. The main reason is to save CI time.
2. Explicit export JAVA_HOME on arm64
3. Disable mingw test

Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6436

Differential Revision: D20428505

Pulled By: pdillinger

fbshipit-source-id: 81ef02435e41480bb71710b783d85ebf452ce926
2020-03-12 21:01:20 -07:00
Ben Mehne
a8851f2d05 Fix coverage for internal_repo_rocksdb
Summary:
tcc gtest runner need to know the location of the binary in order to collect coverage.  We can give them the location in an environment variable.

Note that all these tests will break in tpx currently, though this is a bug in rocksdb's wrapper script, not tpx.

Reviewed By: siying

Differential Revision: D20430043

fbshipit-source-id: c77d5f70bbc28f6011c6f91906bce2ceecc2f167
2020-03-12 17:48:16 -07:00
Cheng Chang
2ccb794eb6 Use DestroyColumnFamilyHandle instead of directly deleting column family handle (#6505)
Summary:
Update example usage of closing column family.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6505

Test Plan: cd examples && make column_families_example && ./column_families_example

Differential Revision: D20362100

Pulled By: cheng-chang

fbshipit-source-id: 493c5e0068a40b4f237f8f8511cddd22dc15ea5c
2020-03-12 14:30:46 -07:00
Cheng Chang
0d2c8e47e8 OpenForReadOnly is not supported in LITE mode (#6523)
Summary:
In DBLogicalBlockSizeCacheTest, do not test OpenForReadOnly in LITE mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6523

Test Plan: watch test for LITE mode

Differential Revision: D20420321

Pulled By: cheng-chang

fbshipit-source-id: e45bf6f2800206d6f8ce9af7308e76a08de80643
2020-03-12 14:13:59 -07:00
Adam Retter
0772768d07 Force Java version on Travis CI (#6512)
Summary:
In the `.travis.yml` file the `jdk: openjdk7` element is ignored when `language: cpp`. So whatever version of the JDK that was installed in the Travis container was used - typically JDK 11.

To ensure our RocksJava builds are working, we now instead install and use OpenJDK 8. Ideally we would use OpenJDK 7, as RocksJava supports Java 7, but many of the newer Travis containers don't support Java 7, so Java 8 is the next best thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6512

Differential Revision: D20388296

Pulled By: pdillinger

fbshipit-source-id: 8bbe6b59b70cfab7fe81ff63867d907fefdd2df1
2020-03-12 12:24:51 -07:00
Levi Tamasi
c15e85bdcb Move BlobDB related files under db/ to db/blob/ (#6519)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6519

Test Plan:
```
make all
make check
```

Differential Revision: D20400691

Pulled By: ltamasi

fbshipit-source-id: 20ef911cf1c2c92c7f71ef0b493f9be64f2eef94
2020-03-12 11:00:56 -07:00
Huisheng Liu
07a3f7f008 fix MSVC build failures (#6517)
Summary:
fix a few build warnings that are treated as failures with more strict MSVC warning settings
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6517

Differential Revision: D20401325

Pulled By: pdillinger

fbshipit-source-id: b44979dfaafdc7b3b8cb44a565400a99b331dd30
2020-03-12 08:42:39 -07:00
Cheng Chang
6dea7530b5 Remove copy of pairs from the for range loop (#6514)
Summary:
Remove copy of pairs from the for range loop
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6514

Test Plan: make check

Differential Revision: D20389688

Pulled By: cheng-chang

fbshipit-source-id: 1c772091f955be33267514010f3596c61a6f46b5
2020-03-11 21:38:09 -07:00
Cheng Chang
2d9efc9ab2 Cache result of GetLogicalBufferSize in Linux (#6457)
Summary:
In Linux, when reopening DB with many SST files, profiling shows that 100% system cpu time spent for a couple of seconds for `GetLogicalBufferSize`. This slows down MyRocks' recovery time when site is down.

This PR introduces two new APIs:
1. `Env::RegisterDbPaths` and `Env::UnregisterDbPaths` lets `DB` tell the env when it starts or stops using its database directories . The `PosixFileSystem` takes this opportunity to set up a cache from database directories to the corresponding logical block sizes.
2. `LogicalBlockSizeCache` is defined only for OS_LINUX to cache the logical block sizes.

Other modifications:
1. rename `logical buffer size` to `logical block size` to be consistent with Linux terms.
2. declare `GetLogicalBlockSize` in `PosixHelper` to expose it to `PosixFileSystem`.
3. change the functions `IOError` and `IOStatus` in `env/io_posix.h` to have external linkage since they are used in other translation units too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6457

Test Plan:
1. A new unit test is added for `LogicalBlockSizeCache` in `env/io_posix_test.cc`.
2. A new integration test is added for `DB` operations related to the cache in `db/db_logical_block_size_cache_test.cc`.

`make check`

Differential Revision: D20131243

Pulled By: cheng-chang

fbshipit-source-id: 3077c50f8065c0bffb544d8f49fb10bba9408d04
2020-03-11 18:40:05 -07:00
sdong
331e6199df Include more information in file lock failure (#6507)
Summary:
When users fail to open a DB with file lock failure, it is sometimes hard for users to debug. We now include the time the lock is acquired and the thread ID that acquired the lock, to help users debug problems like this. Default Env's thread ID is used.

Since type of lockedFiles is changed, rename it to follow naming convention too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6507

Test Plan: Add a unit test and improve an existing test to validate the case.

Differential Revision: D20378333

fbshipit-source-id: 312fe0e9733fd1d1e9969c321b90ce523cf4708a
2020-03-11 16:23:08 -07:00
Levi Tamasi
37a635cfe6 Disambiguate CustomFieldTags for the unity build (#6513)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6513

Test Plan: `make unity_test`

Differential Revision: D20388919

Pulled By: ltamasi

fbshipit-source-id: 88dbceab0723a54ee3939e1644e13dc9a4c70420
2020-03-11 14:45:12 -07:00
Adam Retter
8fc20ac468 Add ppc64le builds to Travis (#6144)
Summary:
Let's see how this goes...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6144

Differential Revision: D20387515

Pulled By: pdillinger

fbshipit-source-id: ba2669348c267141dfddff910b4c2224a22cbb38
2020-03-11 12:33:45 -07:00
Adam Retter
65b60db9e1 Update to latest Snappy to fix compilation issue on latest MacOS XCode (#6496)
Summary:
* **macOS version:** 10.15.2 (Catalina)
* **XCode/Clang version:** Apple clang version 11.0.0 (clang-1100.0.33.16)

Before this bugfix the error generated is:

```
In file included from ./util/compression.h:23:
./snappy-1.1.7/snappy.h:76:59: error: unknown type name 'string'; did you mean 'std::string'?
  size_t Compress(const char* input, size_t input_length, string* output);
                                                          ^~~~~~
                                                          std::string
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here
typedef basic_string<char, char_traits<char>, allocator<char> > string;
                                                                ^
In file included from db/builder.cc:10:
In file included from ./db/builder.h:12:
In file included from ./db/range_tombstone_fragmenter.h:15:
In file included from ./db/pinned_iterators_manager.h:12:
In file included from ./table/internal_iterator.h:13:
In file included from ./table/format.h:25:
In file included from ./options/cf_options.h:14:
In file included from ./util/compression.h:23:
./snappy-1.1.7/snappy.h:85:19: error: unknown type name 'string'; did you mean 'std::string'?
                  string* uncompressed);
                  ^~~~~~
                  std::string
/Library/Developer/CommandLineTools/usr/bin/../include/c++/v1/iosfwd:211:65: note: 'std::string' declared here
typedef basic_string<char, char_traits<char>, allocator<char> > string;
                                                                ^
2 errors generated.
make: *** [jls/db/builder.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6496

Differential Revision: D20389254

Pulled By: pdillinger

fbshipit-source-id: 2864245c8d0dba7b2ab81294241a62f2adf02e20
2020-03-11 11:46:13 -07:00
Adam Retter
00c4ab01b9 When CMake fails to download a file, display the error message (#6511)
Summary:
This helps to diagnose errors in the CMake build where it tries to retrieve dependencies.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6511

Differential Revision: D20387392

Pulled By: pdillinger

fbshipit-source-id: 7028dfd62704bcc747f39ff864ea9c9bf51cd1be
2020-03-11 08:52:46 -07:00
Levi Tamasi
f5bc3b99d5 Split BlobFileState into an immutable and a mutable part (#6502)
Summary:
It's never too soon to refactor something. The patch splits the recently
introduced (`VersionEdit` related) `BlobFileState` into two classes
`BlobFileAddition` and `BlobFileGarbage`. The idea is that once blob files
are closed, they are immutable, and the only thing that changes is the
amount of garbage in them. In the new design, `BlobFileAddition` contains
the immutable attributes (currently, the count and total size of all blobs, checksum
method, and checksum value), while `BlobFileGarbage` contains the mutable
GC-related information elements (count and total size of garbage blobs). This is a
better fit for the GC logic and is more consistent with how SST files are handled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6502

Test Plan: `make check`

Differential Revision: D20348352

Pulled By: ltamasi

fbshipit-source-id: ff93f0121e80ab15e0e0a6525ba0d6af16a0e008
2020-03-10 17:27:26 -07:00
Chao Zhao
4028eba67b Optional sequence number exporting during checkpoint creation (#5528)
Summary:
Add sequence_number_ptr to the checkpoint interface to expose the sequence number during taking the checkpoint. The number will be consistent with the seq # in rocksdb log.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5528

Test Plan: make check -j64

Reviewed By: Winger1994

Differential Revision: D16080209

fbshipit-source-id: 6dc3c7680287ee97d673c5e61f89aae1f43e33df
2020-03-10 13:40:18 -07:00
Yanqin Jin
fd1da22111 Support options.max_open_files != -1 with FIFO compaction (#6503)
Summary:
Allow user to specify options.max_open_files != -1 with FIFO compaction.
If max_open_files != -1, not all table files are kept open.
In the past, FIFO style compaction requires all table files to be open in order
to read file creation time from table properties. Later, we added file creation
time to MANIFEST, making it possible to read file creation time without opening
file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6503

Test Plan: make check

Differential Revision: D20353758

Pulled By: riversand963

fbshipit-source-id: ba5c61a648419e47e9ef6d74e0e280e3ee24f296
2020-03-09 18:45:06 -07:00
Yanqin Jin
d93812c9ae Iterator with timestamp (#6255)
Summary:
Preliminary support for iterator with user timestamp. Current implementation does not consider merge operator and reverse iterator. Auto compaction is also disabled in unit tests.

Create an iterator with timestamp.
```
...
read_opts.timestamp = &ts;
auto* iter = db->NewIterator(read_opts);
// target is key without timestamp.
for (iter->Seek(target); iter->Valid(); iter->Next()) {}
for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {}
delete iter;
read_opts.timestamp = &ts1;
// lower_bound and upper_bound are without timestamp.
read_opts.iterate_lower_bound = &lower_bound;
read_opts.iterate_upper_bound = &upper_bound;
auto* iter1 = db->NewIterator(read_opts);
// Do Seek or SeekToFirst()
delete iter1;
```

Test plan (dev server)
```
$make check
```

Simple benchmarking (dev server)
1. The overhead introduced by this PR even when timestamp is disabled.
key size: 16 bytes
value size: 100 bytes
Entries: 1000000
Data reside in main memory, and try to stress iterator.
Repeated three times on master and this PR.
- Seek without next
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3
```
master: 159047.0 ops/sec
this PR: 158922.3 ops/sec (2% drop in throughput)
- Seek and next 10 times
```
./db_bench -db=/dev/shm/rocksdbtest-1000 -benchmarks=fillseq,seekrandom -enable_pipelined_write=false -disable_wal=true -format_version=3 -seek_nexts=10
```
master: 109539.3 ops/sec
this PR: 107519.7 ops/sec (2% drop in throughput)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6255

Differential Revision: D19438227

Pulled By: riversand963

fbshipit-source-id: b66b4979486f8474619f4aa6bdd88598870b0746
2020-03-06 16:24:27 -08:00
Cheng Chang
0a0151fb99 Remove memcpy from RandomAccessFileReader::Read in direct IO mode (#6455)
Summary:
In direct IO mode, RandomAccessFileReader::Read allocates an internal aligned buffer, and then copies the result into the scratch buffer. If the result is only temporarily used inside a function, there is no need to do the memcpy and just let the result Slice refer to the internally allocated buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6455

Test Plan: make check

Differential Revision: D20106753

Pulled By: cheng-chang

fbshipit-source-id: 44f505843837bba47a56e3fa2c4dd3bd76486b58
2020-03-06 14:05:12 -08:00
Otto Kekäläinen
f6c2777d95 Fix spelling: commited -> committed (#6481)
Summary:
In most places in the code the variable names are spelled correctly as
COMMITTED but in a couple places not. This fixes them and ensures the
variable is always called COMMITTED everywhere.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6481

Differential Revision: D20306776

Pulled By: pdillinger

fbshipit-source-id: b6c1bfe41db559b4bc6955c530934460c07f7022
2020-03-06 12:45:20 -08:00
Yuqi Gu
e171a219d5 Fix db_wal_test::TruncateLastLogAfterRecoverWithoutFlush failure (#6437)
Summary:
`TruncateLastLogAfterRecoverWithoutFlush` case depends on fallocate support
of underlying file system.

On a file system which lacks of this feature, like zfs, it will fail to allocate predefined file size as this test case intends to do;

So a check block is added to detect fallocate support and skip test if not.
The related work is done by JunHe77. Thanks!

Signed-off-by: Yuqi Gu <yuqi.gu@arm.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6437

Differential Revision: D20145032

Pulled By: pdillinger

fbshipit-source-id: c8b691dc508e95acfa2a004ddbc07e2faa76680d
2020-03-05 17:18:16 -08:00
Cheng Chang
afb97094ae Skip high levels with no key falling in the range in CompactRange (#6482)
Summary:
In CompactRange, if there is no key in memtable falling in the specified range, then flush is skipped.
This PR extends this skipping logic to SST file levels: it starts compaction from the highest level (starting from L0) that has files with key falling in the specified range, instead of always starts from L0.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6482

Test Plan:
A new test ManualCompactionTest::SkipLevel is added.

Also updated a test related to statistics of index block cache hit in db_test2, the index cache hit is increased by 1 in this PR because when checking overlap for the key range in L0, OverlapWithLevelIterator will do a seek in the table cache iterator, which will read from the cached index.

Also updated db_compaction_test and db_test to use correct range for full compaction.

Differential Revision: D20251149

Pulled By: cheng-chang

fbshipit-source-id: f822157cf4796972bd5035d9d7178d8dfb7af08b
2020-03-04 20:15:25 -08:00
Zhichao Cao
e62fe50634 Introduce FaultInjectionTestFS to test fault File system instead of Env (#6414)
Summary:
In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller.

A set of ErrorHandlerFSTests are introduced for testing
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414

Test Plan: pass make asan_check, pass error_handler_fs_test.

Differential Revision: D20252421

Pulled By: zhichao-cao

fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21
2020-03-04 12:35:05 -08:00