Commit Graph

8942 Commits

Author SHA1 Message Date
Cheng Chang
40497a875a Reduce memory copies when fetching and uncompressing blocks from SST files (#6689)
Summary:
In https://github.com/facebook/rocksdb/pull/6455, we modified the interface of `RandomAccessFileReader::Read` to be able to get rid of memcpy in direct IO mode.
This PR applies the new interface to `BlockFetcher` when reading blocks from SST files in direct IO mode.

Without this PR, in direct IO mode, when fetching and uncompressing compressed blocks, `BlockFetcher` will first copy the raw compressed block into `BlockFetcher::compressed_buf_` or `BlockFetcher::stack_buf_` inside `RandomAccessFileReader::Read` depending on the block size. then during uncompressing, it will copy the uncompressed block into `BlockFetcher::heap_buf_`.

In this PR, we get rid of the first memcpy and directly uncompress the block from `direct_io_buf_` to `heap_buf_`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6689

Test Plan: A new unit test `block_fetcher_test` is added.

Reviewed By: anand1976

Differential Revision: D21006729

Pulled By: cheng-chang

fbshipit-source-id: 2370b92c24075692423b81277415feb2aed5d980
2020-04-24 15:32:56 -07:00
Cheng Chang
1758f76f2d Fix unused variable of r in release mode (#6750)
Summary:
In release mode, asserts are not compiled, so `r` is not used, causing compiler warnings.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6750

Test Plan: make check under release mode

Reviewed By: anand1976

Differential Revision: D21220365

Pulled By: cheng-chang

fbshipit-source-id: fd4afa9843d54af68c4da8660ec61549803e1167
2020-04-24 15:14:13 -07:00
anand76
9e7b7e2c08 Silence false alarms in db_stress fault injection (#6741)
Summary:
False alarms are caused by codepaths that intentionally swallow IO
errors.

Tests:
make crash_test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6741

Reviewed By: ltamasi

Differential Revision: D21181138

Pulled By: anand1976

fbshipit-source-id: 5ccfbc68eb192033488de6269e59c00f2c65ce00
2020-04-24 13:06:12 -07:00
Yanqin Jin
e04f3bce4f Update CURRENT file after best-efforts recovery (#6746)
Summary:
After a successful recovery, the CURRENT file should be updated to point to the valid MANIFEST.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6746

Test Plan: make check

Reviewed By: anand1976

Differential Revision: D21189876

Pulled By: riversand963

fbshipit-source-id: 7537b49988c5c425ebe9505a5cc260de351ad79b
2020-04-23 16:21:09 -07:00
Cheng Chang
51bdfae010 Check alignment of MultiRead requests in direct IO mode (#6739)
Summary:
Add assertions to check direct IO's alignment requirements in MultiRead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6739

Test Plan: make check

Reviewed By: siying

Differential Revision: D21143825

Pulled By: cheng-chang

fbshipit-source-id: 26f1623b062a1851080771128feac0669a61f5e9
2020-04-23 15:19:31 -07:00
Levi Tamasi
bc51e33d9c Make sure (Shared)BlobFileMetaData are owned by shared_ptrs (#6749)
Summary:
The patch makes a couple of small cleanups to `SharedBlobFileMetaData` and `BlobFileMetaData`:
* It makes the constructors private and introduces factory methods to ensure these objects are always owned by `shared_ptr`s. Note that `SharedBlobFileMetaData` has an additional factory that takes a deleter object; we can utilize this to e.g. notify `VersionSet` when a blob file becomes obsolete (which is exactly when `SharedBlobFileMetaData` is destroyed).
* It disables move operations explicitly instead of relying on them being suppressed because of a user-declared destructor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6749

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D21206947

Pulled By: ltamasi

fbshipit-source-id: 9094c14cc335b3e226f883e5a0df4f87a5cdeb95
2020-04-23 13:44:29 -07:00
Ibrahim Jarif
ae77880223 Fix some typos in code comments (#6733)
Summary:
This PR fixes some typos in code comments.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6733

Reviewed By: siying

Differential Revision: D21209037

Pulled By: zhichao-cao

fbshipit-source-id: d9274611fab1f5e992998c8c4117b8078c4cbc69
2020-04-23 12:28:49 -07:00
mrambacher
4cbc19d2a1 Add a ConfigOptions for use in comparing objects and converting to/from strings (#6389)
Summary:
The methods in convenience.h are used to compare/convert objects to/from strings.  There is a mishmash of parameters in use here with more needed in the future.  This PR replaces those parameters with a single structure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6389

Reviewed By: siying

Differential Revision: D21163707

Pulled By: zhichao-cao

fbshipit-source-id: f807b4cc7e2b0af3871536b69546b2604dfa81bd
2020-04-21 17:38:17 -07:00
anand76
c1ccd6b6af Implement deadline support for MultiGet (#6710)
Summary:
Initial implementation of ReadOptions.deadline for MultiGet. If the request takes longer than the deadline, the keys not yet found will be returned with Status::TimedOut(). This
implementation enforces the deadline in DBImpl, which is fairly high
level. Its best effort and may not check the deadline after every key
lookup, but may do so after a batch of keys.

In subsequent stages, we will extend this to passing a timeout down to the FileSystem.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6710

Test Plan: Add new unit tests

Reviewed By: riversand963

Differential Revision: D21149158

Pulled By: anand1976

fbshipit-source-id: 9f44eecffeb40873f5034ed59a66d21f9f88879e
2020-04-21 14:51:51 -07:00
Tomas Kolda
6ee66cf8f0 Prevents Table Cache to open same files more times (#6707)
Summary:
In highly concurrent requests table cache opens same file more times which lowers purpose of max_open_files. Fixes (https://github.com/facebook/rocksdb/issues/6699)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6707

Reviewed By: ltamasi

Differential Revision: D21044965

fbshipit-source-id: f6e91d90b60dad86e518b5147021da42460ee1d2
2020-04-21 13:16:31 -07:00
Andrew Kryczka
f9155a3404 Prevent uninitialized load in IndexBlockIter (#6736)
Summary:
When index block is empty or an error happens while reading it,
`Invalidate()` is called rather than `Initialize()`. So `Seek()` must
not refer to member variables that are only initialized in
`Initialize()` until it is sure `Initialize()` has been called.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6736

Reviewed By: siying

Differential Revision: D21139641

Pulled By: ajkr

fbshipit-source-id: 71c58cc1adbd795dc3729dd5023bf7df1515ff32
2020-04-20 16:32:43 -07:00
Akanksha Mahajan
03a1d95db0 Set max_background_flushes dynamically (#6701)
Summary:
1. Add changes so that max_background_flushes can be set dynamically.
                   2. Add a testcase DBOptionsTest.SetBackgroundFlushThreads which set the
                        max_background_flushes dynamically using SetDBOptions.

TestPlan:  1. make -j64 check
                  2. Using new testcase DBOptionsTest.SetBackgroundFlushThreads
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6701

Reviewed By: ajkr

Differential Revision: D21028010

Pulled By: akankshamahajan15

fbshipit-source-id: 5f949e4a8fd3c32537b637947b7ee09a69cfc7c1
2020-04-20 16:19:02 -07:00
Peter Dillinger
31da5e34c1 C++20 compatibility (#6697)
Summary:
Based on https://github.com/facebook/rocksdb/issues/6648 (CLA Signed), but heavily modified / extended:

* Implicit capture of this via [=] deprecated in C++20, and [=,this] not standard before C++20 -> now using explicit capture lists
* Implicit copy operator deprecated in gcc 9 -> add explicit '= default' definition
* std::random_shuffle deprecated in C++17 and removed in C++20 -> migrated to a replacement in RocksDB random.h API
* Add the ability to build with different std version though -DCMAKE_CXX_STANDARD=11/14/17/20 on the cmake command line
* Minimal rebuild flag of MSVC is deprecated and is forbidden with /std:c++latest (C++20)
* Added MSVC 2019 C++11 & MSVC 2019 C++20 in AppVeyor
* Added GCC 9 C++11 & GCC9 C++20 in Travis
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6697

Test Plan: make check and CI

Reviewed By: cheng-chang

Differential Revision: D21020318

Pulled By: pdillinger

fbshipit-source-id: 12311be5dbd8675a0e2c817f7ec50fa11c18ab91
2020-04-20 13:24:25 -07:00
sdong
fe206f4f7c crash_test to cover index_type kBinarySearchWithFirstKey (#6721)
Summary:
Recently index_type kBinarySearchWithFirstKey is improved so that the API guarantee is exactly the same as other types and it is ready for wide production. We should cover it in crash tst.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6721

Test Plan: Run crash_test

Reviewed By: anand1976

Differential Revision: D21099781

fbshipit-source-id: fda91eba831d9eacbb140c703e9768bb1701f935
2020-04-20 12:57:15 -07:00
Peter Dillinger
45d2b4efca Fix tabs and lint-ignores (#6734)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6734

Reviewed By: cheng-chang

Differential Revision: D21134556

Pulled By: pdillinger

fbshipit-source-id: 3636cc1d1333137b70031f8277458781c21631fb
2020-04-20 11:39:31 -07:00
Yanqin Jin
243852ec15 Add IsDirectory() to Env and FS (#6711)
Summary:
IsDirectory() is a common API to check whether a path is a regular file or
directory.
POSIX: call stat() and use S_ISDIR(st_mode)
Windows: PathIsDirectoryA() and PathIsDirectoryW()
HDFS: FileSystem.IsDirectory()
Java: File.IsDirectory()
...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6711

Test Plan: make check

Reviewed By: anand1976

Differential Revision: D21053520

Pulled By: riversand963

fbshipit-source-id: 680aadfd8ce982b63689190cf31b3145d5a89e27
2020-04-17 14:39:18 -07:00
sdong
63d82e57b9 crash_test to cover small max_open_files (#6719)
Summary:
RocksDB behavior is different while max_open_files is small or large. Add the coverage to small max_open_files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6719

Test Plan: Run crash_test

Reviewed By: pdillinger

Differential Revision: D21081021

fbshipit-source-id: e3e211761a9bd25d93d19a61c1f7b62d48cf5e3c
2020-04-17 11:00:07 -07:00
Nicolas Pépin-Perreault
9e6f3efcd2 Add RocksIterator::Refresh (#6573)
Summary:
This PR exposes the `Iterator::Refresh` method to the Java API by adding it on the `RocksIteratorInterface` interface. There are three concrete implementations: `RocksIterator`, `SstFileReaderIterator`, and `WBWIRocksIterator`. For the first two cases, the JNI side simply delegates to the underlying `Iterator::Refresh` method; in the last case, as it doesn't share an ancestor, and per the discussion in https://github.com/facebook/rocksdb/issues/3465, a `Status::NotSupported` exception is thrown.

As the last PR had no activity in a while, I'm opening a new one - I'm completely fine with merging the previous PR if it gets completed before this is reviewed.

Let me know if there's anything missing or anything else I can do 👍
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6573

Reviewed By: cheng-chang

Differential Revision: D20604666

Pulled By: pdillinger

fbshipit-source-id: 4de17df1180c3b87b76cfdd77b674b81fc0563f7
2020-04-16 15:55:26 -07:00
Adam Retter
9ca49bd4df Keep building RocksJava on all architectures (#6583)
Summary:
Adding solid support for multiple architectures was initially triggered by RocksJava users. As such I would like to keep the CI for RocksJava on all architectures, to ensure we don't break backwards compatibility.

pdillinger okay let's see how long it takes to complete Travis-CI with this one...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6583

Reviewed By: cheng-chang

Differential Revision: D21036718

Pulled By: pdillinger

fbshipit-source-id: 97afe0db2e4c575cc0284fdc1d4cc45d5deb2272
2020-04-16 15:50:45 -07:00
Adam Retter
5fef0ffd66 Update RocksJava static version of bzip2 (#6714)
Summary:
Updates the version of bzip2 used for RocksJava static builds.

Please, can we also get this cherry-picked to:

1. 6.7.fb
2. 6.8.fb
3. 6.9.fb
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6714

Reviewed By: cheng-chang

Differential Revision: D21067233

Pulled By: pdillinger

fbshipit-source-id: 8164b7eb99c5ca7b2021ab8c371ba9ded4cb4f7e
2020-04-16 15:35:24 -07:00
Andrew Kryczka
6717ada899 Fix CF import with overlapping SST files (#6663)
Summary:
Invariant checking should use internal key comparator rather than
`sstableKeyCompare()`. The latter was intended for checking whether a
compaction input file's neighboring files need to be included in the
same compaction. Using it for invariant checking was leading to false
positives for files with overlapping endpoints.

Fixes https://github.com/facebook/rocksdb/issues/6647.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6663

Test Plan: regression test

Reviewed By: zhichao-cao

Differential Revision: D20910466

Pulled By: ajkr

fbshipit-source-id: f0b70dad7c4096fce635cab7a36f16e14f74ae3f
2020-04-16 13:16:06 -07:00
sdong
73523baeb1 crash_test to cover options.avoid_flush_during_recovery (#6712)
Summary:
Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712

Test Plan: Run crash_test and see the option can be used or not used by chance.

Reviewed By: ltamasi

Differential Revision: D21056566

fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793
2020-04-16 12:11:45 -07:00
Yueh-Hsuan Chiang
5801af4646 Add env_fault_injection argument to db_stress (#6687)
Summary:
Add env_fault_injection argument to db_stress.  When enabled,
FaultInjectionTestEnv will be used instead.  Currently this
option does not support running with other env setting.

This will allow
us to later manually produce error when running db_crashtest.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687

Test Plan:
make db_stress -j32
./db_stress --env_fault_injection
./db_stress --env_fault_injection --hdfs   // expect error message

Reviewed By: ajkr

Differential Revision: D21014683

Pulled By: yhchiang

fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a
2020-04-16 11:13:44 -07:00
Cheng Chang
2767972386 Fix warning when O_CLOEXEC is not defined (#6695)
Summary:
Compilation fails on systems that do not support O_CLOEXEC. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6695

Test Plan: compile without O_CLOEXEC support

Reviewed By: anand1976

Differential Revision: D21011850

Pulled By: cheng-chang

fbshipit-source-id: f1bf1cce2aa65c7d10b5a9613e941db30e928347
2020-04-16 11:02:50 -07:00
Mike Kolupaev
e45673dece Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621)
Summary:
Context: Index type `kBinarySearchWithFirstKey` added the ability for sst file iterator to sometimes report a key from index without reading the corresponding data block. This is useful when sst blocks are cut at some meaningful boundaries (e.g. one block per key prefix), and many seeks land between blocks (e.g. for each prefix, the ranges of keys in different sst files are nearly disjoint, so a typical seek needs to read a data block from only one file even if all files have the prefix). But this added a new error condition, which rocksdb code was really not equipped to deal with: `InternalIterator::value()` may fail with an IO error or Status::Incomplete, but it's just a method returning a Slice, with no way to report error instead. Before this PR, this type of error wasn't handled at all (an empty slice was returned), and kBinarySearchWithFirstKey implementation was considered a prototype.

Now that we (LogDevice) have experimented with kBinarySearchWithFirstKey for a while and confirmed that it's really useful, this PR is adding the missing error handling.

It's a pretty inconvenient situation implementation-wise. The error needs to be reported from InternalIterator when trying to access value. But there are ~700 call sites of `InternalIterator::value()`, most of which either can't hit the error condition (because the iterator is reading from memtable or from index or something) or wouldn't benefit from the deferred loading of the value (e.g. compaction iterator that reads all values anyway). Adding error handling to all these call sites would needlessly bloat the code. So instead I made the deferred value loading optional: only the call sites that may use deferred loading have to call the new method `PrepareValue()` before calling `value()`. The feature is enabled with a new bool argument `allow_unprepared_value` to a bunch of methods that create iterators (it wouldn't make sense to put it in ReadOptions because it's completely internal to iterators, with virtually no user-visible effect). Lmk if you have better ideas.

Note that the deferred value loading only happens for *internal* iterators. The user-visible iterator (DBIter) always prepares the value before returning from Seek/Next/etc. We could go further and add an API to defer that value loading too, but that's most likely not useful for LogDevice, so it doesn't seem worth the complexity for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6621

Test Plan: make -j5 check . Will also deploy to some logdevice test clusters and look at stats.

Reviewed By: siying

Differential Revision: D20786930

Pulled By: al13n321

fbshipit-source-id: 6da77d918bad3780522e918f17f4d5513d3e99ee
2020-04-15 17:40:44 -07:00
anand76
610a09ccff Remove a printf from db_stress that's not useful info (#6705)
Summary:
This was causing db_crashtest.py to wrongly assume an error by parsing the output. Hopefully this will stabilize the crash tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6705

Test Plan: make blackbox_crash_test

Reviewed By: ltamasi

Differential Revision: D21043335

Pulled By: anand1976

fbshipit-source-id: 5cddd112b124d4e2ebd11724a17d4ef0f50c1cf8
2020-04-15 12:13:35 -07:00
sdong
165560fb32 Two Improvements to tools/check_format_compatible.sh (#6702)
Summary:
Improve it in two ways:
1. tools/check_format_compatible.sh is not friendly to run outside FB environment. remove the hard-coded http proxy setting. Instead, move it to Legocastle configuration
2. Always disable warning as error, so that older build is more likely to pass.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6702

Test Plan: Run the test and make sure at least it doesn't break.

Reviewed By: riversand963

Differential Revision: D21033329

fbshipit-source-id: 88b4ec1ec49547b772790050a165466bdc4a62a0
2020-04-15 11:28:11 -07:00
anand76
234e2ed5b6 Fix a couple of bugs in db_stress fault injection (#6700)
Summary:
1. Fix a memory leak in FaultInjectionTestFS in the stack trace related
code
2. Check status of all MultiGet keys before deciding whether an error
was swallowed, instead of assuming an ok status for any key means an
undetected error
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6700

Test Plan: Run db_stress with asan and fault injection

Reviewed By: cheng-chang

Differential Revision: D21021498

Pulled By: anand1976

fbshipit-source-id: 489191efd1ab0fa834923a1e1d57253a7a315465
2020-04-14 11:06:55 -07:00
Cheng Chang
9ae8058d95 Suppress file deletion error message in FaultInjectionTestEnv (#6696)
Summary:
The error message is causing problems in the crash tests due to the
error parsing logic in db_crashtest.py.

This is a follow up PR for https://github.com/facebook/rocksdb/pull/6694.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6696

Test Plan: make check

Reviewed By: zhichao-cao

Differential Revision: D21021875

Pulled By: cheng-chang

fbshipit-source-id: 11e3f536df16941a89949ebcd2147cd8dfa3fbe0
2020-04-14 10:55:10 -07:00
anand76
3d6d7bcf17 Log CompactOnDeletionCollectorFactory parameters on DB open (#6686)
Summary:
Log it in the info log to help in troubleshooting. It is logged as follows -
```
2020/04/10-10:51:39.886662 7ffff7fef340                   Options.table_properties_collectors: CompactOnDeletionCollector (Sliding window size = 100 Deletion trigger = 90);
```

Tests:
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6686

Reviewed By: ltamasi

Differential Revision: D21002442

Pulled By: anand1976

fbshipit-source-id: 7adf0dbae7f1febcb00ce61fea5097118ede5c6a
2020-04-13 19:58:04 -07:00
Zhichao Cao
38dfa406ff Add NewFileChecksumGenCrc32cFactory to file checksum (#6688)
Summary:
Add NewFileChecksumGenCrc32cFactory to file checksum public interface such that applications can use the build in crc32 checksum factory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6688

Test Plan: pass make asan_check

Reviewed By: riversand963

Differential Revision: D21006859

Pulled By: zhichao-cao

fbshipit-source-id: ea8a45196a8b77c310728ab05f6cc0f49f3baef0
2020-04-13 19:13:41 -07:00
Ziyue Yang
41563b61db Fix data racing of BlockBasedTableBuilder::ParallelCompressionRep::first_block (#6640)
Summary:
BlockBasedTableBuilder::ParallelCompressionRep::first_block can be read in
Flush() and written in BGWorkWriteRawBlock() concurrently. This commit fixes
the issue by reading first_block out before pushing the block to compression
and write.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6640

Test Plan: Run all tests concurrently with TSAN.

Reviewed By: cheng-chang

Differential Revision: D20851370

fbshipit-source-id: 6f039222e8319d31e15f1b45e05c106527253f72
2020-04-13 16:24:57 -07:00
anand76
d9cad3a526 Suppress file deletion error message in FaultInjectionTestFS (#6694)
Summary:
The error message is causing problems in the crash tests due to the
error parsing logic in db_crashtest.py.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6694

Reviewed By: siying

Differential Revision: D20998531

Pulled By: anand1976

fbshipit-source-id: 89cb54a5f5bb664ae6d239c37559f10e14c5ea07
2020-04-13 15:18:38 -07:00
Andrew Kryczka
9eca6d651d fix comparison count for format_version=3 indexes (#6650)
Summary:
In index blocks since `format_version=3`, user keys are written
rather than internal keys. When reading such blocks, the comparator is
obtained via `InternalKeyComparator::user_comparator()`. That function
must not return an unwrapped result as the wrapper class provides
accounting logic to populate `PerfContext::user_key_comparison_count`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6650

Test Plan:
ran db_bench and verified
`PerfContext::user_key_comparison_count` became larger.

Reviewed By: cheng-chang

Differential Revision: D20866325

Pulled By: ajkr

fbshipit-source-id: ad755d46bda31157dacc5b66e532279f19ad538c
2020-04-13 11:18:37 -07:00
anand76
79c838eb0f Fix a few bugs in db_stress fault injection (#6693)
Summary:
Fix the following issues -
1. Output parsing error in db_crashtest.py
2. Memory leak on exit
3. False alarm on filter block read error
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6693

Test Plan: asan_crash

Reviewed By: cheng-chang

Differential Revision: D20990399

Pulled By: anand1976

fbshipit-source-id: 178ee0dd7c69a4bc5db698379db0dedb29281699
2020-04-13 11:01:03 -07:00
Yanqin Jin
eeb3cf3f58 Fix release build (#6690)
Summary:
Fix release build caused by variable defined but unused.

Test plan (devserver)
```
make release
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6690

Reviewed By: cheng-chang

Differential Revision: D20980571

Pulled By: riversand963

fbshipit-source-id: c3f3b13f81dce4bdb19876dc2e710d5902ff8a02
2020-04-11 22:04:04 -07:00
anand76
5c19a441c4 Fault injection in db_stress (#6538)
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538

Test Plan:
crash_test
make check

Reviewed By: riversand963

Differential Revision: D20714347

Pulled By: anand1976

fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
2020-04-10 17:21:26 -07:00
Yanqin Jin
0c05624d50 Compaction with timestamp: input boundaries (#6645)
Summary:
Towards making compaction logic compatible with user timestamp.
When computing boundaries and overlapping ranges for inputs of compaction, We need to compare SSTs by user key without timestamp.

Test plan (devserver):
```
make check
```
Several individual tests:
```
./version_set_test --gtest_filter=VersionStorageInfoTimestampTest.GetOverlappingInputs
./db_with_timestamp_compaction_test
./db_with_timestamp_basic_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6645

Reviewed By: ltamasi

Differential Revision: D20960012

Pulled By: riversand963

fbshipit-source-id: ad377fa9eb481bf7a8a3e1824aaade48cdc653a4
2020-04-10 16:05:49 -07:00
Akanksha Mahajan
a0faff126d Report kFilesMarkedForCompaction for delete triggered compactions (#6680)
Summary:
Summary : Set manual_compaction false in case of DeleteTriggeredCompaction object so that kFilesMarkedForComapaction can be reported.
          Added a DeletionTriggeredUniversalCompactionMarking test case for Deletion Triggered compaction in case of Universal Compaction.

Test Plan : make check -j64
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6680

Test Plan: make check -j64

Reviewed By: anand1976

Differential Revision: D20945946

Pulled By: akankshamahajan15

fbshipit-source-id: af84e417bd7127652aaae9143c560d1ab3815d25
2020-04-10 15:30:38 -07:00
anand76
d600e5b0eb Fix a Centos build failure reported in #6651 (#6656)
Summary:
Fixes issue https://github.com/facebook/rocksdb/issues/6651

Tests:
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6656

Reviewed By: cheng-chang

Differential Revision: D20879084

Pulled By: anand1976

fbshipit-source-id: c2cc508ca2716fcf80dcf9d2ba31c32d211f941e
2020-04-10 11:47:46 -07:00
sdong
1be3be5522 Auto-Format two recent diffs and add HISTORY.md (#6685)
Summary:
Two recent diffs can be autoformatted.
Also add HISTORY.md entry for https://github.com/facebook/rocksdb/pull/6214
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6685

Test Plan: Run all existing tests

Reviewed By: cheng-chang

Differential Revision: D20965780

fbshipit-source-id: 195b08d7849513d42fe14073112cd19fdda6af95
2020-04-10 11:32:44 -07:00
Andrew Kryczka
f08630b914 explicitly mark backup interfaces non-extensible (#6654)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6654

Reviewed By: cheng-chang

Differential Revision: D20878094

Pulled By: ajkr

fbshipit-source-id: 94d2561bdb6ffb7fe3773ca07d475337600a5b57
2020-04-10 10:51:09 -07:00
Connor1996
c8c739a877 Fix sst_dump not able to open ingested file (#6673)
Summary:
When investigating https://github.com/facebook/rocksdb/issues/6666, we encounter an error for sst_dump to dump an ingested SST file with global seqno.
```
Corruption: An external sst file with version 2 have global seqno property with value ��/, while largest seqno in the file is 0)
```

Same as https://github.com/facebook/rocksdb/pull/5097, it is due to SstFileReader don't know the largest seqno of a file, it will fail this check when it open a file with global seqno. ca89ac2ba9/table/block_based_table_reader.cc (L730)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6673

Test Plan: run it manually

Reviewed By: cheng-chang

Differential Revision: D20937546

Pulled By: ajkr

fbshipit-source-id: c3fd04d60916a738533ee1885f3ea844669a9479
2020-04-10 10:47:46 -07:00
Huisheng Liu
9e89ffb776 make iterator return versions between timestamp bounds (#6544)
Summary:
(Based on Yanqin's idea) Add a new field in readoptions as lower timestamp bound for iterator. When the parameter is not supplied (nullptr), the iterator returns the latest visible version of a record. When it is supplied, the existing timestamp field is the upper bound. Together the two serves as a bounded time window. The iterator returns all versions of a record falling in the window.

SeekRandom perf test (10 minutes) on the same development machine ram drive with the same DB data shows no regression (within marge of error). The test is adapted from https://github.com/facebook/rocksdb/wiki/RocksDB-In-Memory-Workload-Performance-Benchmarks.
base line (commit e860f8840):
seekrandom   : 7.836 micros/op 4082449 ops/sec; (0 of 73481999 found)
This PR:
seekrandom   : 7.764 micros/op 4120935 ops/sec; (0 of 71303999 found)

db_bench --db=r:\rocksdb.github --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --cache_size=2147483648 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=r:\rocksdb.github\WAL_LOG --sync=0 --verify_checksum=1 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --duration=600 --benchmarks=seekrandom --use_existing_db=1 --num=25000000 --threads=32 --allow_concurrent_memtable_write=0
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6544

Reviewed By: ltamasi

Differential Revision: D20844069

Pulled By: riversand963

fbshipit-source-id: d97f2bf38a323c8c6a68db213b2d3c694b1c1f74
2020-04-10 09:51:58 -07:00
Luca Giacchino
66a95f0fac Provide an allocator for new memory type to be used with RocksDB block cache (#6214)
Summary:
New memory technologies are being developed by various hardware vendors (Intel DCPMM is one such technology currently available). These new memory types require different libraries for allocation and management (such as PMDK and memkind). The high capacities available make it possible to provision large caches (up to several TBs in size), beyond what is achievable with DRAM.
The new allocator provided in this PR uses the memkind library to allocate memory on different media.

**Performance**

We tested the new allocator using db_bench.
- For each test, we vary the size of the block cache (relative to the size of the uncompressed data in the database).
- The database is filled sequentially. Throughput is then measured with a readrandom benchmark.
- We use a uniform distribution as a worst-case scenario.

The plot shows throughput (ops/s) relative to a configuration with no block cache and default allocator.
For all tests, p99 latency is below 500 us.

![image](https://user-images.githubusercontent.com/26400080/71108594-42479100-2178-11ea-8231-8a775bbc92db.png)

**Changes**

- Add MemkindKmemAllocator
- Add --use_cache_memkind_kmem_allocator db_bench option (to create an LRU block cache with the new allocator)
- Add detection of memkind library with KMEM DAX support
- Add test for MemkindKmemAllocator

**Minimum Requirements**

- kernel 5.3.12
- ndctl v67 - https://github.com/pmem/ndctl
- memkind v1.10.0 - https://github.com/memkind/memkind

**Memory Configuration**

The allocator uses the MEMKIND_DAX_KMEM memory kind. Follow the instructions on[ memkind’s GitHub page](https://github.com/memkind/memkind) to set up NVDIMM memory accordingly.

Note on memory allocation with NVDIMM memory exposed as system memory.
- The MemkindKmemAllocator will only allocate from NVDIMM memory (using memkind_malloc with MEMKIND_DAX_KMEM kind).
- The default allocator is not restricted to RAM by default. Based on NUMA node latency, the kernel should allocate from local RAM preferentially, but it’s a kernel decision. numactl --preferred/--membind can be used to allocate preferentially/exclusively from the local RAM node.

**Usage**

When creating an LRU cache, pass a MemkindKmemAllocator object as argument.
For example (replace capacity with the desired value in bytes):

```
#include "rocksdb/cache.h"
#include "memory/memkind_kmem_allocator.h"

NewLRUCache(
    capacity /*size_t*/,
    6 /*cache_numshardbits*/,
    false /*strict_capacity_limit*/,
    false /*cache_high_pri_pool_ratio*/,
    std::make_shared<MemkindKmemAllocator>());
```

Refer to [RocksDB’s block cache documentation](https://github.com/facebook/rocksdb/wiki/Block-Cache) to assign the LRU cache as block cache for a database.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6214

Reviewed By: cheng-chang

Differential Revision: D19292435

fbshipit-source-id: 7202f47b769e7722b539c86c2ffd669f64d7b4e1
2020-04-09 20:47:23 -07:00
Peter Dillinger
9d6974d3c9 Temporarily disable ppc64le unit tests in PRs (#6682)
Summary:
Until Travis gets its act together (https://github.com/facebook/rocksdb/issues/6653)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6682

Test Plan: CI

Reviewed By: riversand963

Differential Revision: D20948865

Pulled By: pdillinger

fbshipit-source-id: 215de523c91a83d2a159f466b853e700c925ba4f
2020-04-09 16:42:44 -07:00
sdong
e860f8840a Fix memory corruption caused by new test in options_settable_test (#6676)
Summary:
https://github.com/facebook/rocksdb/pull/6668 added some new test code but it has a risk of memory corruption. Fix it
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6676

Test Plan: Run the test under ASAN and see it passes.

Reviewed By: ajkr

Differential Revision: D20937108

fbshipit-source-id: 22cc96bb02030df0a37a02e67a2cc37ca31ba22d
2020-04-09 11:23:32 -07:00
Cheng Chang
6e6f807917 Add two more optimization improvements to HISTORY (#6679)
Summary:
Although these optimizations are not user facing, still feel it's valuable to call out in HISTORY.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6679

Test Plan: no need

Reviewed By: zhichao-cao

Differential Revision: D20945916

Pulled By: cheng-chang

fbshipit-source-id: f3e790c07f3bcc4a8a74246c4fa232800ddd4438
2020-04-09 11:19:51 -07:00
Yi Wu
eb287c72d7 Fix wrong key being read on ingested file with global seqno and delta encoding (#6669)
Summary:
On reading an ingested SST file, `DataBlockIter` will replace seqno encoded in a key with global seqno. However, if the original seqno was part of the prefix used for the next key, the global seqno is by mistake used as part of the prefix to construct the next key, causing wrong result being returned. Although at this point it is only software error while data in the file is not corrupted, the issue can further cause compaction output out of order and corrupted result when the ingested SST participated in compaction. Fixing the issue by save the actual seqno and restore it before the key being used as prefix to construct next key.

The unit test is by Little-Wallace from https://github.com/facebook/rocksdb/issues/6666. Fixing https://github.com/facebook/rocksdb/issues/6666.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6669

Test Plan:
New unit test

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Reviewed By: cheng-chang

Differential Revision: D20931808

Pulled By: ajkr

fbshipit-source-id: f01959c35d6a493954dca981663766c7a5a9e8ab
2020-04-08 21:22:15 -07:00
Cheng Chang
31759a7094 Fix result slice's address for direct io read (#6672)
Summary:
When aligned_buf is provided, the result slice's starting address should take offset advance into account.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6672

Test Plan: make check

Reviewed By: anand1976

Differential Revision: D20934198

Pulled By: cheng-chang

fbshipit-source-id: c3475c9c132b92c50d8c7c399fca2e9e76870803
2020-04-08 21:20:31 -07:00