Commit Graph

9154 Commits

Author SHA1 Message Date
Levi Tamasi
78e291b17e Improve consistency checks in VersionBuilder (#6901)
Summary:
The patch cleans up the code and improves the consistency checks around
adding/deleting table files in `VersionBuilder`. Namely, it makes the checks
stricter and improves them in the following ways:
1) A table file can now only be deleted from the LSM tree using the level it
resides on. Earlier, there was some unnecessary wiggle room for
trivially moved files (they could be deleted using a lower level number than
the actual one).
2) A table file cannot be added to the tree if it is already present in the tree
on any level (not just the target level). The earlier code only had an assertion
(which is a no-op in release builds) that the newly added file is not already
present on the target level.
3) The above consistency checks around state transitions are now mandatory,
as opposed to the earlier `CheckConsistencyForDeletes`, which was a no-op
in release mode unless `force_consistency_checks` was set to `true`. The rationale
here is that assuming that the initial state is consistent, a valid transition leads to a
next state that is also consistent; however, an *invalid* transition offers no such
guarantee. Hence it makes sense to validate the transitions unconditionally,
and save `force_consistency_checks` for the paranoid checks that re-validate
the entire state.
4) The new checks build on the mechanism introduced in https://github.com/facebook/rocksdb/pull/6862,
which enables us to efficiently look up the location (level and position within level)
of files in a `Version` by file number. This makes the consistency checks much more
efficient than the earlier `CheckConsistencyForDeletes`, which essentially
performed a linear search.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6901

Test Plan:
Extended the unit tests and ran:

`make check`
`make whitebox_crash_test`

Reviewed By: ajkr

Differential Revision: D21822714

Pulled By: ltamasi

fbshipit-source-id: e2b29c8b6da1bf0f59004acc889e4870b2d18215
2020-06-03 11:24:55 -07:00
Peter Dillinger
9360776cb9 Fix handling of too-small filter partition size (#6905)
Summary:
Because ARM and some other platforms have a larger cache line
size, they have a larger minimum filter size, which causes recently
added PartitionedMultiGet test in db_bloom_filter_test to fail on those
platforms. The code would actually end up using larger partitions,
because keys_per_partition_ would be 0 and never == number of keys
added.

The code now attempts to get as close as possible to the small target
size, while fully utilizing that filter size, if the target partition
size is smaller than the minimum filter size.

Also updated the test to break more uniformly across platforms
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6905

Test Plan: updated test, tested on ARM

Reviewed By: anand1976

Differential Revision: D21840639

Pulled By: pdillinger

fbshipit-source-id: 11684b6d35f43d2e98b85ddb2c8dcfd59d670817
2020-06-03 10:43:01 -07:00
Zhichao Cao
2adb7e3768 Fix potential overflow of unsigned type in for loop (#6902)
Summary:
x.size() -1 or y - 1 can overflow to an extremely large value when x.size() pr y is 0 when they are unsigned type. The end condition of i in the for loop will be extremely large, potentially causes segment fault. Fix them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6902

Test Plan: pass make asan_check

Reviewed By: ajkr

Differential Revision: D21843767

Pulled By: zhichao-cao

fbshipit-source-id: 5b8b88155ac5a93d86246d832e89905a783bb5a1
2020-06-02 15:05:07 -07:00
Zhichao Cao
556972e964 Replace Status with IOStatus in CopyFile and CreateFile (#6916)
Summary:
Replace Status with IOStatus in CopyFile and CreateFile.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6916

Test Plan: pass make asan_check

Reviewed By: cheng-chang

Differential Revision: D21843775

Pulled By: zhichao-cao

fbshipit-source-id: 524d4a0fcf47f0941b923da0346e0de71607f5f6
2020-06-02 15:05:06 -07:00
sdong
bfc9737aca Remove gtest dependency in non-test code under utilities/cassandra (#6908)
Summary:
production code under utilities/cassandra depends on gtest.h. Remove them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6908

Test Plan: Run all existing tests.

Reviewed By: ajkr

Differential Revision: D21842606

fbshipit-source-id: a098e0b49c9aeac51cc90a79562ad9897a36122c
2020-06-02 13:56:29 -07:00
Stanislav Tkach
38f988d3b4 Expose rocksdb_options_copy function to the C API (#6880)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6880

Reviewed By: ajkr

Differential Revision: D21842752

Pulled By: pdillinger

fbshipit-source-id: eda326f551ddd9cb397681544b9e9799ea614e52
2020-06-02 13:48:51 -07:00
Peter Dillinger
14eca6bf04 For ApproximateSizes, pro-rate table metadata size over data blocks (#6784)
Summary:
The implementation of GetApproximateSizes was inconsistent in
its treatment of the size of non-data blocks of SST files, sometimes
including and sometimes now. This was at its worst with large portion
of table file used by filters and querying a small range that crossed
a table boundary: the size estimate would include large filter size.

It's conceivable that someone might want only to know the size in terms
of data blocks, but I believe that's unlikely enough to ignore for now.
Similarly, there's no evidence the internal function AppoximateOffsetOf
is used for anything other than a one-sided ApproximateSize, so I intend
to refactor to remove redundancy in a follow-up commit.

So to fix this, GetApproximateSizes (and implementation details
ApproximateSize and ApproximateOffsetOf) now consistently include in
their returned sizes a portion of table file metadata (incl filters
and indexes) based on the size portion of the data blocks in range. In
other words, if a key range covers data blocks that are X% by size of all
the table's data blocks, returned approximate size is X% of the total
file size. It would technically be more accurate to attribute metadata
based on number of keys, but that's not computationally efficient with
data available and rarely a meaningful difference.

Also includes miscellaneous comment improvements / clarifications.

Also included is a new approximatesizerandom benchmark for db_bench.
No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784

Test Plan:
Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
Old code running new test...

    [ RUN      ] DBTest.ApproximateSizesFilesWithErrorMargin
    db/db_test.cc:1562: Failure
    Expected: (size) <= (11 * 100), actual: 9478 vs 1100

Other tests updated to reflect consistent accounting of metadata.

Reviewed By: siying

Differential Revision: D21334706

Pulled By: pdillinger

fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
2020-06-02 12:30:23 -07:00
sdong
298b00a396 Reduce dependency on gtest dependency in release code (#6907)
Summary:
Release code now depends on gtest, indirectly through including "test_util/testharness.h". This creates multiple problems. One important reason is the definition of IGNORE_STATUS_IF_ERROR() in test_util/testharness.h. Move it to sync_point.h instead.
Note that utilities/cassandra/format.h still depends on "test_util/testharness.h". This will be resolved in a separate diff.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6907

Test Plan: Run all existing tests.

Reviewed By: ajkr

Differential Revision: D21829884

fbshipit-source-id: 9253c19ffde2936f3ae68998210f8e54f645a6e6
2020-06-02 12:11:24 -07:00
Adam Retter
8d87e9cea1 Update googletest from 1.8.1 to 1.10.0 (#6808)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6808

Reviewed By: anand1976

Differential Revision: D21483984

Pulled By: pdillinger

fbshipit-source-id: 70c5eff2bd54ddba469761d95e4cd4611fb8e598
2020-06-01 20:33:42 -07:00
anand76
66942e8158 Avoid unnecessary reads of uncompression dictionary in MultiGet (#6906)
Summary:
We may sometimes read the uncompression dictionary when its not
necessary, when we lookup a key in an SST file but the index indicates
the key is not present. This can happen with index_type 3.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6906

Test Plan: make check

Reviewed By: cheng-chang

Differential Revision: D21828944

Pulled By: anand1976

fbshipit-source-id: 7aef4f0a39548d0874eafefd2687006d2652f9bb
2020-06-01 19:43:37 -07:00
sdong
02f59ed669 Find the correct gcov (#6904)
Summary:
Right now in FB environment, wrong gcov is used. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6904

Test Plan: "make coverage" and watch results.

Reviewed By: riversand963

Differential Revision: D21824291

fbshipit-source-id: 666011fd86c36adafa09ebd9eb97742f94fb90bb
2020-06-01 16:33:05 -07:00
Cheng Chang
bcb9e41080 Explicitly free allocated buffer when status is not ok (#6903)
Summary:
Currently we rely on `BlockContents` to implicitly free the allocated scratch buffer, but when IO error happens, it doesn't make sense to construct the `BlockContents` which might be corrupted. In the stress test, we find that `assert(req.result.size() == block_size(handle));` fails because of potential IO errors.

In this PR, we explicitly free the scratch buffer on error without constructing `BlockContents`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6903

Test Plan: watch stress test

Reviewed By: anand1976

Differential Revision: D21823869

Pulled By: cheng-chang

fbshipit-source-id: 5603fc80e9bf3f44a9d7250ddebd871afe1eb89f
2020-06-01 15:19:40 -07:00
zitan
038e02d8d9 Remove extraneous newline from ldb stderr (#6897)
Summary:
**Summary**
Remove the extraneous newline when using ldb tool. For example, the subcommand list_column_families will print an empty line to stderr even if there are no errors.

**Test plan**
Passed make check; manually tested a few ldb subcommands.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6897

Reviewed By: pdillinger

Differential Revision: D21819352

Pulled By: gg814

fbshipit-source-id: 5a16a6431bb96684fe97647f4d3ac5bf0ec7fc90
2020-06-01 12:15:36 -07:00
Peter Dillinger
0c56fc4d66 Allow missing "unversioned" python, as in CentOS 8 (#6883)
Summary:
RocksDB Makefile was assuming existence of 'python' command,
which is not present in CentOS 8. We avoid using 'python' if 'python3' is available.

Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)

Also, now use just 'python3' for PYTHON if not found so that an informative
"command not found" error will result rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883

Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.

Reviewed By: gg814

Differential Revision: D21767029

Pulled By: pdillinger

fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
2020-05-29 11:29:23 -07:00
Andrew Kryczka
c5abf78bca avoid IterKey::UpdateInternalKey() in BlockIter (#6843)
Summary:
`IterKey::UpdateInternalKey()` is an error-prone API as it's
incompatible with `IterKey::TrimAppend()`, which is used for
decoding delta-encoded internal keys. This PR stops using it in
`BlockIter`. Instead, it assigns global seqno in a separate `IterKey`'s
buffer when needed. The logic for safely getting a Slice with global
seqno properly assigned is encapsulated in `GlobalSeqnoAppliedKey`.
`BinarySeek()` is also migrated to use this API (previously it ignored
global seqno entirely).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6843

Test Plan:
benchmark setup -- single file DBs, in-memory, no compression. "normal_db"
created by regular flush; "ingestion_db" created by ingesting a file. Both
DBs have same contents.

```
$ TEST_TMPDIR=/dev/shm/normal_db/ ./db_bench -benchmarks=fillrandom,compact -write_buffer_size=10485760000 -disable_auto_compactions=true -compression_type=none -num=1000000
$ ./ldb write_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/ --compression_type=no --hex --create_if_missing < <(./sst_dump --command=scan --output_hex --file=/dev/shm/normal_db/dbbench/000007.sst | awk 'began {print "0x" substr($1, 2, length($1) - 2), "==>", "0x" $5} ; /^Sst file format: block-based/ {began=1}')
$ ./ldb ingest_extern_sst ./tmp.sst --db=/dev/shm/ingestion_db/dbbench/
```

benchmark run command:
```
TEST_TMPDIR=/dev/shm/$DB/ ./db_bench -benchmarks=seekrandom -seek_nexts=10 -use_existing_db=true -cache_index_and_filter_blocks=false -num=1000000 -cache_size=1048576000 -threads=1 -reads=40000000
```

results:

| DB | code | throughput |
|---|---|---|
| normal_db | master |  267.9 |
| normal_db   |    PR6843 | 254.2 (-5.1%) |
| ingestion_db |   master |  259.6 |
| ingestion_db |   PR6843 | 250.5 (-3.5%) |

Reviewed By: pdillinger

Differential Revision: D21562604

Pulled By: ajkr

fbshipit-source-id: 937596f836930515da8084d11755e1f247dcb264
2020-05-28 10:51:30 -07:00
Yanqin Jin
961c7590d6 Add timestamp to delete (#6253)
Summary:
Preliminary user-timestamp support for delete.

If ["a", ts=100] exists, you can delete it by calling `DB::Delete(write_options, key)` in which `write_options.timestamp` points to a `ts` higher than 100.

Implementation
A new ValueType, i.e. `kTypeDeletionWithTimestamp` is added for deletion marker with timestamp.
The reason for a separate `kTypeDeletionWithTimestamp`: RocksDB may drop tombstones (keys with kTypeDeletion) when compacting them to the bottom level. This is OK and useful if timestamp is disabled. When timestamp is enabled, should we still reuse `kTypeDeletion`, we may drop the tombstone with a more recent timestamp, causing deleted keys to re-appear.

Test plan (dev server)
```
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6253

Reviewed By: ltamasi

Differential Revision: D20995328

Pulled By: riversand963

fbshipit-source-id: a9e5c22968ad76f98e3dc6ee0151265a3f0df619
2020-05-28 10:40:03 -07:00
Levi Tamasi
e3f953a863 Make it possible to look up files by number in VersionStorageInfo (#6862)
Summary:
Does what it says on the can: the patch adds a hash map to `VersionStorageInfo`
that maps file numbers to file locations, i.e. (level, position in level) pairs. This
will enable stricter consistency checks in `VersionBuilder`. The patch also fixes
all the unit tests that used duplicate file numbers in a version (which would trigger
an assertion with the new code).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6862

Test Plan:
`make check`
`make whitebox_crash_test`

Reviewed By: riversand963

Differential Revision: D21670446

Pulled By: ltamasi

fbshipit-source-id: 2eac249945cf33d8fb8597b26bfff5221e1a861a
2020-05-28 10:03:06 -07:00
Akanksha Mahajan
bcefc59e9f Allow MultiGet users to limit cumulative value size (#6826)
Summary:
1. Add a value_size in read options which limits the cumulative value size of keys read in batches. Once the size exceeds read_options.value_size, all the remaining keys are returned with status Abort without further fetching any key.
2. Add a unit test case MultiGetBatchedValueSizeSimple the reads keys from memory and sst files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6826

Test Plan:
1. make check -j64
	   2. Add a new unit test case

Reviewed By: anand1976

Differential Revision: D21471483

Pulled By: akankshamahajan15

fbshipit-source-id: dea51b8e76d5d1df38ece8cdb29933b1d798b900
2020-05-27 13:07:14 -07:00
Adam Retter
9060e6fa79 Add newer WBWI::NewIteratorWithBase functions to RocksJava (#6872)
Summary:
Exposes the `ReadOptions` arguments to `NewIteratorWithBase`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6872

Reviewed By: ajkr

Differential Revision: D21725867

Pulled By: pdillinger

fbshipit-source-id: 4079ba590cc13ba7a6244ed91439d89c40a543b6
2020-05-27 11:59:12 -07:00
Cheng Chang
82a82c76e7 Fix potential memory leak of scratch buffer (#6879)
Summary:
If `req.scratch` is an internally allocated buffer, but `raw_block_contents` is not constructed to own `req.scratch`, then `req.scratch` will be leaked.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6879

Test Plan: make asan_check

Reviewed By: anand1976

Differential Revision: D21728498

Pulled By: cheng-chang

fbshipit-source-id: 8fc6a4f2543918c565ddc16ecfad1807eb9a42cf
2020-05-26 15:29:04 -07:00
Kefu Chai
bd68bfb41b cmake: link env_librados_test against rados (#6855)
Summary:
otherwise we have FTBFS like:
2020-05-18T15:12:06.400 INFO:tasks.workunit.client.0.smithi032.stdout:[100%] Linking CXX executable env_librados_test
2020-05-18T15:12:06.620 INFO:tasks.workunit.client.0.smithi032.stderr:/usr/bin/ld: CMakeFiles/rocksdb_env_librados_test.dir/utilities/env_librados_test.cc.o: undefined reference to symbol
'_ZN8librados7v14_2_05Rados4initEPKc@LIBRADOS_14.2.0'
2020-05-18T15:12:06.620 INFO:tasks.workunit.client.0.smithi032.stderr:/usr/bin/ld: /lib/librados.so.2: error adding symbols: DSO missing from command line
2020-05-18T15:12:06.620 INFO:tasks.workunit.client.0.smithi032.stderr:collect2: error: ld returned 1 exit status

this addresses the regression introduced by 07204837ce,
which hides the symbols exposed by `${THIRDPARTY_LIBS}` from
consumers of librocksdb

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6855

Reviewed By: riversand963

Differential Revision: D21621904

fbshipit-source-id: 7022ba4dc0003504401fce6f06547e4d74a32ac0
2020-05-25 22:53:00 -07:00
Andrew Kryczka
b464a85e33 fix transaction rollback in db_stress TestMultiGet (#6873)
Summary:
There were further uses of `txn` after `RollbackTxn(txn)` leading to
stress test errors. Moved the rollback to the end of the function.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6873

Test Plan:
found a command from the crash test that previously failed immediately under TSAN; verified now it succeeds.
```
./db_stress --acquire_snapshot_one_in=10000 --allow_concurrent_memtable_write=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --block_size=16384 --bloom_bits=222.913637674 --bottommost_compression_type=none --cache_index_and_filter_blocks=1 --cache_size=1048576 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_style=1 --compaction_ttl=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox --db_write_buffer_size=1048576 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=0 --enable_pipelined_write=0 --flush_one_in=1000000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=12 --index_type=2 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --log2_keys_per_lock=22 --long_running_snapshots=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_levels=1 --open_files=100 --ops_per_thread=200000 --partition_filters=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefixpercent=5 --progress_reports=0 --read_fault_one_in=0 --readpercent=45 --recycle_log_file_num=0 --reopen=20 --snapshot_hold_ops=100000 --subcompactions=4 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --txn_write_policy=1 --unordered_write=1 --use_block_based_filter=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=1 --use_multiget=1 --use_txn=1 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=4194304 --write_dbid_to_manifest=0 --writepercent=35
```

Reviewed By: pdillinger

Differential Revision: D21708338

Pulled By: ajkr

fbshipit-source-id: dcf55cddee0a14f429a75e7a8a505acf8025f2b1
2020-05-24 15:27:24 -07:00
Andrew Kryczka
b559dba817 pin image version in circle CI builds (#6876)
Summary:
somehow the windows-server-2019-vs2019 image changed in a way that made
VS 14 2015 the default. This caused an error when we specify VS 16 2019
as the cmake generator. I could not figure out the right arguments/env
vars to get the latest VS working so pinned the image to the previous
version instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6876

Reviewed By: pdillinger

Differential Revision: D21709679

Pulled By: ajkr

fbshipit-source-id: 2d16819ad239b4611fa199547744e1c101dc9da0
2020-05-23 21:23:27 -07:00
Peter Dillinger
803a517b48 Misc things for ASSERT_STATUS_CHECKED, also gcc 4.8.5 (#6871)
Summary:
* Print stack trace on status checked failure
* Make folly_synchronization_distributed_mutex_test a parallel test
* Disable ldb_test.py and rocksdb_dump_test.sh with
  ASSERT_STATUS_CHECKED (broken)
* Fix shadow warning in random_access_file_reader.h reported by gcc
  4.8.5 (ROCKSDB_NO_FBCODE), also https://github.com/facebook/rocksdb/issues/6866
* Work around compiler bug on max_align_t for gcc < 4.9
* Remove an apparently wrong comment in status.h
* Use check_some in Travis config (for proper diagnostic output)
* Fix ignored Status in loop in options_helper.cc
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6871

Test Plan: manual, CI

Reviewed By: ajkr

Differential Revision: D21706619

Pulled By: pdillinger

fbshipit-source-id: daf6364173d6689904eb394461a69a11f5bee2cb
2020-05-23 06:53:37 -07:00
Marek Kurdej
bcd32560dd Fix warning -Wextra-semi. NFC. (#6869)
Summary:
Minor fix.

CLA signed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6869

Reviewed By: ajkr

Differential Revision: D21704001

Pulled By: pdillinger

fbshipit-source-id: 57fd08114f3234f51f34758e25e708cc70962582
2020-05-22 11:20:13 -07:00
Peter Dillinger
35a25a3fb9 Fix/expand ASSERT_STATUS_CHECKED build, add to Travis (#6870)
Summary:
Fixed some option handling code that recently broke the
ASSERT_STATUS_CHECKED build for options_test.

Added all other existing tests that pass under ASSERT_STATUS_CHECKED to
the whitelist.

Added a Travis configuration to run all whitelisted tests with
ASSERT_STATUS_CHECKED. (Someday we might enable this check by default in
debug builds.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6870

Test Plan: ASSERT_STATUS_CHECKED=1 make check, Travis

Reviewed By: ajkr

Differential Revision: D21704374

Pulled By: pdillinger

fbshipit-source-id: 15daef98136a19d7a6843fa0c9ec08738c2ac693
2020-05-22 11:17:29 -07:00
mrambacher
826295a5e9 Change autovector to have a reserved size in LITE mode (#6868)
Summary:
Previously in LITE mode, an autovector did not have a reserved size. When
elements were added to the vector, the underlying array could be reallocated.

There was a set of code that never expands the autovector and was doing &autovector::back().  When the vector is resized, the old addresses may become invalid, causing a later exception to be thrown.

By reserving space in the autovector up front, this problem is eliminated for those uses where the vector will never exceed the initial size.

the resize happens, these pointers become invalid, leading to SEGV or other exceptions.

This change allows the autovector to be fully populated before we take the address of any of its elements, thereby elminating the potential for a resize.

There is comparable code to this change in Version::MultiGet for dealing with the context objects.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6868

Reviewed By: ajkr

Differential Revision: D21693505

Pulled By: cheng-chang

fbshipit-source-id: e71d516b15e08f202593cb80f2a42f048fc95768
2020-05-21 14:48:10 -07:00
Andrew Kryczka
292bcf6227 skip direct I/O tests in rocksdb lite (#6867)
Summary:
Fix a couple places where direct I/O was used even though it is
unsupported in lite builds.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6867

Test Plan: `LITE=1 make check -j48`

Reviewed By: pdillinger

Differential Revision: D21689185

Pulled By: ajkr

fbshipit-source-id: 3eaa3abf69cd7d0bcaabbcad3bb5a26fb8dd7301
2020-05-21 13:57:17 -07:00
mrambacher
38be686160 Add Struct Type to OptionsTypeInfo (#6425)
Summary:
Added code for generically handing structs to OptionTypeInfo.  A struct is a collection of variables handled by their own map of OptionTypeInfos.  Examples of structs include Compaction and Cache options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6425

Reviewed By: siying

Differential Revision: D21668789

Pulled By: zhichao-cao

fbshipit-source-id: 064b110de39dadf82361ed4663f7ac1a535b0b07
2020-05-21 10:58:39 -07:00
Peter Dillinger
c7aedf1b48 Clean up some code related to file checksums (#6861)
Summary:
* Add missing unit test for schema stability of FileChecksumGenCrc32c
  (previously was only comparing to itself)
* A lot of clarifying comments
* Add some assertions for preconditions
* Rename WritableFileWriter::CalculateFileChecksum -> UpdateFileChecksum
* Simplify FileChecksumGenCrc32c with shared functions
* Implement EndianSwapValue to replace unused EndianTransform

And incidentally since I had trouble with 'make check-format' GitHub action disagreeing with local run,
* Output full diagnostic information when 'make check-format' fails in CI
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6861

Test Plan: new unit test passes before & after other changes

Reviewed By: zhichao-cao

Differential Revision: D21667115

Pulled By: pdillinger

fbshipit-source-id: 6a99970f87605aa024fa540c78cd519ff322c3e6
2020-05-21 08:12:51 -07:00
anand76
eb04bb86c6 Fix a bug in crash_test_with_txn (#6860)
Summary:
In NoBatchedOpsStress::TestMultiGet, call txn->Get() when transactions
are in use.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6860

Test Plan: make crash_test_with_txn

Reviewed By: pdillinger

Differential Revision: D21667249

Pulled By: anand1976

fbshipit-source-id: 194bd7b9630a8efc3ae29d85422a61214e9e200e
2020-05-20 14:47:05 -07:00
Zhichao Cao
545e14b53b Generate file checksum in SstFileWriter (#6859)
Summary:
If Option.file_checksum_gen_factory is set, rocksdb generates the file checksum during flush and compaction based on the checksum generator created by the factory and store the checksum and function name in vstorage and Manifest.

This PR enable file checksum generation in SstFileWrite and store the checksum and checksum function name in the  ExternalSstFileInfo, such that application can use them for other purpose, for example, ingest the file checksum with files in IngestExternalFile().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6859

Test Plan: add unit test and pass make asan_check.

Reviewed By: ajkr

Differential Revision: D21656247

Pulled By: zhichao-cao

fbshipit-source-id: 78a3570c76031d8832e3d2de3d6c79cdf2b675d0
2020-05-20 11:55:31 -07:00
Peter Dillinger
aaafcb80ab Use in-repo gtest in buck build (#6858)
Summary:
... so that we have freedom to upgrade it (see https://github.com/facebook/rocksdb/issues/6808).

As a side benefit, gtest will no longer be linked into main library in
buck build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6858

Test Plan: fb internal build & link

Reviewed By: riversand963

Differential Revision: D21652061

Pulled By: pdillinger

fbshipit-source-id: 6018104af944debde576b5beda6c134e737acedb
2020-05-20 11:37:45 -07:00
Akanksha Mahajan
a1523efcdf Status check enforcement for io_posix_test and options_settable_test (#6857)
Summary:
Added status check enforcement for io_posix_test and options_settable_test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6857

Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 check

Reviewed By: ajkr

Differential Revision: D21647904

Pulled By: akankshamahajan15

fbshipit-source-id: b7f2321eb6c141a88cd5e1270ecb7d58f00341af
2020-05-19 19:22:28 -07:00
anand76
39b24432d4 Strengthen MultiGet correctness verification in NoBatchedOpsStress (#6849)
Summary:
Add MultiGet to VerifyDb and check consistency with Get in TestMultiGet.

Test plan -
make crash_test
ASAN crash test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6849

Reviewed By: pdillinger

Differential Revision: D21635011

Pulled By: anand1976

fbshipit-source-id: deb5a79d08fefd8d8010204f1f20b83adc92310e
2020-05-19 12:47:22 -07:00
Levi Tamasi
1551f1011a Refactor the blob file related logic in VersionBuilder (#6835)
Summary:
This patch is groundwork for an upcoming change to store the set of
linked SSTs in `BlobFileMetaData`. With the current code, a new
`BlobFileMetaData` object is created each time a `VersionEdit` touches
a certain blob file. This is fine as long as these objects are lightweight
and cheap to create; however, with the addition of the linked SST set, it would
be very inefficient since the set would have to be copied over and over again.
Note that this is the same kind of problem that `VersionBuilder` is solving
w/r/t `Version`s and files, and we can apply the same solution; that is, we can
accumulate the changes in a different mutable object, and apply the delta in
one shot when the changes are committed. The patch does exactly that by
adding a new `BlobFileMetaDataDelta` class to `VersionBuilder`. In addition,
it turns the existing `GetBlobFileMetaData` helper into `IsBlobFileInVersion`
(which is fine since that's the only thing the method's clients care about now),
and adds a couple of helper methods that can create a `BlobFileMetaData`
object from the `BlobFileMetaData` in the base (if applicable) and the delta
when the `Version` is saved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6835

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D21505187

Pulled By: ltamasi

fbshipit-source-id: d81a48c5f2ca7b79d7124c935332a6bcf3d5d988
2020-05-19 10:00:04 -07:00
mrambacher
0ac0098705 Make options length longer for sst_dump_test (#6846)
Summary:
Under MacOS when running with make -j 8 check, the temporary directory generated was > 100 characters.  This caused the tests to do nothing under MacOS.  Most of them still reported success for doing nothing, but ReadaheadSize was expecting the test to run.

By making the option name longer, the tests will no run successfully (and do something!)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6846

Reviewed By: ajkr

Differential Revision: D21576032

fbshipit-source-id: b089cde0d598137b572aa8527cc5459085252af7
2020-05-19 09:22:12 -07:00
Cheng Chang
ada700b906 Re-read the whole request in direct IO mode when IO uring returns partial result (#6853)
Summary:
If both direct IO and IO uring are enabled, when IO uring returns partial result, we'll try to read the remaining part of the request, but the starting address/offset of the remaining part might not be aligned to the block size, in direct IO mode, the unaligned offset causes bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6853

Test Plan: run make check with both direct IO and IO uring enabled, this is covered by one of the continuous tests.

Reviewed By: anand1976

Differential Revision: D21603023

Pulled By: cheng-chang

fbshipit-source-id: 942f6a11ff21e1892af6c4464e02bab4c707787c
2020-05-18 17:25:57 -07:00
Cheng Chang
b9d65f5aa6 Trigger compaction in CompactOnDeletionCollector based on deletion ratio (#6806)
Summary:
In level compaction, if the total size (even if compensated after taking account of the deletions) of a level hasn't exceeded the limit, but there are lots of deletion entries in some SST files of the level, these files should also be good candidates for compaction. Otherwise, queries for the deleted keys might be slow because they need to go over all the tombstones.

This PR adds an option `deletion_ratio` to the factory of `CompactOnDeletionCollector` to configure it to trigger compaction when the ratio of tombstones >= `deletion_ratio`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6806

Test Plan:
Added new unit test in `compact_on_deletion_collector_test.cc`.
make compact_on_deletion_collector_test && ./compact_on_deletion_collector_test

Reviewed By: ajkr

Differential Revision: D21511981

Pulled By: cheng-chang

fbshipit-source-id: 65a9d0150e8c9c00337787686475252e4535a3e1
2020-05-18 08:42:05 -07:00
Yanqin Jin
d790e6004f Fix buck target db_stress_lib in opt mode (#6847)
Summary:
In buck build with opt mode, target should not include rocksdb_test_lib.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6847

Test Plan: Watch for internal cont build.

Reviewed By: ajkr

Differential Revision: D21586803

Pulled By: riversand963

fbshipit-source-id: 76d253c18d16fac6cab86a8c3f6b471ad5b6efb3
2020-05-16 21:48:20 -07:00
Peter Dillinger
50ba245f6d Use 'make all' in LITE Travis configuration (#6834)
Summary:
So that we don't miss LITE compilation errors in tests
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6834

Reviewed By: anand1976

Differential Revision: D21503227

Pulled By: pdillinger

fbshipit-source-id: 3b2fdf3c4d395354d0ababac06da32addbafb3a5
2020-05-15 13:59:24 -07:00
Cheng Chang
91b7553293 Enable IO Uring in MultiGet in direct IO mode (#6815)
Summary:
Currently, in direct IO mode, `MultiGet` retrieves the data blocks one by one instead of in parallel, see `BlockBasedTable::RetrieveMultipleBlocks`.

Since direct IO is supported in `RandomAccessFileReader::MultiRead` in https://github.com/facebook/rocksdb/pull/6446, this PR applies `MultiRead` to `MultiGet` so that the data blocks can be retrieved in parallel.

Also, in direct IO mode and when data blocks are compressed and need to uncompressed, this PR only allocates one continuous aligned buffer to hold the data blocks, and then directly uncompress the blocks to insert into block cache, there is no longer intermediate copies to scratch buffers.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6815

Test Plan:
1. added a new unit test `BlockBasedTableReaderTest::MultiGet`.
2. existing unit tests and stress tests  contain tests against `MultiGet` in direct IO mode.

Reviewed By: anand1976

Differential Revision: D21426347

Pulled By: cheng-chang

fbshipit-source-id: b8446ae0e74152444ef9111e97f8e402ac31b24f
2020-05-14 23:26:26 -07:00
Yanqin Jin
b11a8b1b9a Fix valgrind error by init memory region (#6842)
Summary:
As title. After allocating a memory buffer, initialize its content to 0s.
This fixes valgrind issue introduced in https://github.com/facebook/rocksdb/issues/6709.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6842

Test Plan:
```
$make valgrind_test
$valgrind --tool=memcheck --track-origins=yes ./db_test2 --gtest_filter=DBTest2.CompressionFailures
```

Reviewed By: pdillinger

Differential Revision: D21551848

Pulled By: riversand963

fbshipit-source-id: e87a6f413e3f3d92d8e23d8ecc4cf93479c6674c
2020-05-14 18:50:03 -07:00
Zhichao Cao
06a2dcebea Move checksum calculation ahead of memory copy (#6844)
Summary:
Originally, the checksum of appended data in writable file writer is calculated after the data is copied to the buffer. It will not be able to catch the bit flip happens during copy. Move the checksum calculation before it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6844

Test Plan: pass make asan_check

Reviewed By: cheng-chang

Differential Revision: D21576726

Pulled By: zhichao-cao

fbshipit-source-id: 0a062a1f19886f6ea0d4e3f557e6f4b799773254
2020-05-14 14:58:14 -07:00
anand76
50d63a2af0 Fix LITE build failure in compaction_picker_test (#6839)
Summary:
Fix LITE build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6839

Test Plan: LITE=1 make check

Reviewed By: riversand963

Differential Revision: D21535808

Pulled By: anand1976

fbshipit-source-id: fcad961eca08e13fb0c256c92d18c3c1f1165f22
2020-05-13 10:47:15 -07:00
Yanqin Jin
3bea276fc8 Do not print u'string' in TARGETS file (#6841)
Summary:
Before this PR, extra deps passed in from cmd line to buckifier will be parsed
and used to populate a dict. Using this dict and printing to TARGETS file will
lead to printing u'', disallowed by build tools. This PR removes the u''.

Test Plan (local dev server):
```
python buckifier/buckify_rocksdb.py  '{"fake": {"extra_deps": [":test_dep", "//fake/module:mock1"], "extra_compiler_flags": ["-Os", "-DROCKSDB_LITE"]}}'
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6841

Reviewed By: siying

Differential Revision: D21538155

Pulled By: riversand963

fbshipit-source-id: 09403668a4aa1a15bad7dac229c2bc8ce8ee1349
2020-05-12 21:37:31 -07:00
Tongliang Liao
244797aa4b Add find_dependency() in cmake config file. (#6791)
Summary:
Currently when building PyTorch with latest RocksDB we get errors like "missing target Snappy::snappy", because they are simply not there.
With old `${VAR}` approach we essentially hard-code the abs path found during RocksDB build, which is:
 - Not relocatable.
 - Doesn't work when changed to modern target-based design because that requires target to present when used for expansion.

This fix allows cmake to setup imported target, if enabled during RocksDB build, when downstream uses `find_package(RocksDB)`.

This is for https://github.com/facebook/rocksdb/issues/6179
tchaikov Please help review, thanks!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6791

Reviewed By: riversand963

Differential Revision: D21471553

fbshipit-source-id: 8d4ff2ab589a97ca6e6ba27e1f17b97a00f06206
2020-05-12 21:18:29 -07:00
Tongliang Liao
07204837ce Mark dependencies as PRIVATE and fix missing dependencies in tools. (#6790)
Summary:
Tools were mistakenly using leaked (PUBLIC) dependencies from main lib.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6790

Reviewed By: riversand963

Differential Revision: D21471551

fbshipit-source-id: ec43b92e231777e0fcf0f865444391af09d6963b
2020-05-12 21:07:55 -07:00
sdong
4a4b8a1344 sst_dump to reduce number of file reads (#6836)
Summary:
sst_dump can issue many file reads from the file system. This doesn't work well with file systems without a OS cache, especially remote file systems. In order to mitigate this problem, several improvements are done:
1. --readahead_size is added, so that users can specify readahead size when scanning the data.
2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too.
3. Consoldiate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836

Test Plan: Add a test that covers this new feature.

Reviewed By: pdillinger

Differential Revision: D21516607

fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e
2020-05-12 18:23:33 -07:00
Stanislav Tkach
70aaa9ceeb Expose CancellAllBackgroundWork to C api (#6832)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6832

Reviewed By: cheng-chang

Differential Revision: D21498186

Pulled By: riversand963

fbshipit-source-id: 66bb0d7c06af2bf0df3c6a09b61bca2fb81f2dd3
2020-05-12 14:50:52 -07:00