Commit Graph

4587 Commits

Author SHA1 Message Date
Peter Dillinger
2819c7840e Fix PrepopulateBlockCache::kFlushOnly (#8750)
Summary:
kFlushOnly currently means "always" except in the case of
remote compaction. This makes it flushes only.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8750

Test Plan: test updated

Reviewed By: akankshamahajan15

Differential Revision: D30968034

Pulled By: pdillinger

fbshipit-source-id: 5dbd24dde18852a0e937a540995fba9bfbe89037
2021-09-15 15:33:20 -07:00
sdong
12d798ac06 Always iniitalize ArenaWrappedDBIter::db_iter_ to nullptr (#8889)
Summary:
ArenaWrappedDBIter::db_iter_ should never be nullptr. However, when debugging a segfault, it's hard to distinguish it is not initialized (not possible) and other corruption. Add this nullptr to help distinguish the case.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8889

Test Plan: Run existing unit tests.

Reviewed By: pdillinger

Differential Revision: D30814756

fbshipit-source-id: 4b1f36896a33dc203d4f1f424ded9554927d61ba
2021-09-14 14:33:15 -07:00
Andrew Kryczka
d648cb47b9 Adapt key-value checksum for timestamp-suffixed keys (#8914)
Summary:
After https://github.com/facebook/rocksdb/issues/8725, keys added to `WriteBatch` may be timestamp-suffixed, while `WriteBatch` has no awareness of the timestamp size. Therefore, `WriteBatch` can no longer calculate timestamp checksum separately from the rest of the key's checksum in all cases.

This PR changes the definition of key in KV checksum to include the timestamp suffix. That way we do not need to worry about where the timestamp begins within the key. I believe the only practical effect of this change is now `AssignTimestamp()` requires recomputing the whole key checksum (`UpdateK()`) rather than just the timestamp portion (`UpdateT()`).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8914

Test Plan:
run stress command that used to fail

```
$ ./db_stress --batch_protection_bytes_per_key=8 -clear_column_family_one_in=0 -test_batches_snapshots=1
```

Reviewed By: riversand963

Differential Revision: D30925715

Pulled By: ajkr

fbshipit-source-id: c143f7ccb46c0efb390ad57ef415c250d754deff
2021-09-14 13:14:39 -07:00
eharry
0b6be7eb68 Fix WAL log data corruption #8723 (#8746)
Summary:
Fix WAL log data corruption when using DBOptions.manual_wal_flush(true) and WriteOptions.sync(true) together (https://github.com/facebook/rocksdb/issues/8723)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8746

Reviewed By: ajkr

Differential Revision: D30758468

Pulled By: riversand963

fbshipit-source-id: 07c20899d5f2447dc77861b4845efc68a59aa4e8
2021-09-13 20:15:59 -07:00
Peter Dillinger
7bef598440 Bypass unused parameterization in ExternalSSTFileBasicTest.IngestExte… (#8910)
Summary:
Facebook infrastructure doesn't like continuously skipping
tests, so fixing this permanently disabled parameterization to BYPASS
instead of SKIP. (Internal ref: T100525285)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8910

Test Plan: manual

Reviewed By: anand1976

Differential Revision: D30905169

Pulled By: pdillinger

fbshipit-source-id: e23d63d2aa800e54676269fad3a093cd3f9f222d
2021-09-13 12:18:15 -07:00
Levi Tamasi
306b779957 Use GetBlobFileSize instead of GetTotalBlobBytes in DB properties (#8902)
Summary:
The patch adjusts the definition of BlobDB's DB properties a bit by
switching to `GetBlobFileSize` from `GetTotalBlobBytes`. The
difference is that the value returned by `GetBlobFileSize` includes
the blob file header and footer as well, and thus matches the on-disk
size of blob files. In addition, the patch removes the `Version` number
from the `blob_stats` property, and updates/extends the unit tests a little.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8902

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D30859542

Pulled By: ltamasi

fbshipit-source-id: e3426d2d567bd1bd8c8636abdafaafa0743c854c
2021-09-13 10:47:16 -07:00
mrambacher
dafa584fd1 Change the File System File Wrappers to std::unique_ptr (#8618)
Summary:
This allows the wrapper classes to own the wrapped object and eliminates confusion as to ownership.  Previously, many classes implemented their own ownership solutions.  Fixes https://github.com/facebook/rocksdb/issues/8606

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8618

Reviewed By: pdillinger

Differential Revision: D30136064

Pulled By: mrambacher

fbshipit-source-id: d0bf471df8818dbb1770a86335fe98f761cca193
2021-09-13 08:46:19 -07:00
Yanqin Jin
2a2b3e03a5 Allow WriteBatch to have keys with different timestamp sizes (#8725)
Summary:
In the past, we unnecessarily requires all keys in the same write batch
to be from column families whose timestamps' formats are the same for
simplicity. Specifically, we cannot use the same write batch to write to
two column families, one of which enables timestamp while the other
disables it.

The limitation is due to the member `timestamp_size_` that used to exist
in each `WriteBatch` object. We pass a timestamp_size to the constructor
of `WriteBatch`. Therefore, users can simply use the old
`WriteBatch::Put()`, `WriteBatch::Delete()`, etc APIs for write, while
the internal implementation of `WriteBatch` will take care of memory
allocation for timestamps.

The above is not necessary.
One the one hand, users can set up a memory buffer to store user key and
then contiguously append the timestamp to the user key. Then the user
can pass this buffer to the `WriteBatch::Put(Slice&)` API.
On the other hand, users can set up a SliceParts object which is an
array of Slices and let the last Slice to point to the memory buffer
storing timestamp. Then the user can pass the SliceParts object to the
`WriteBatch::Put(SliceParts&)` API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8725

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D30654499

Pulled By: riversand963

fbshipit-source-id: 9d848c77ad3c9dd629aa5fc4e2bc16fb0687b4a2
2021-09-12 15:34:26 -07:00
Peter Dillinger
bda8d93ba9 Fix and detect headers with missing dependencies (#8893)
Summary:
It's always annoying to find a header does not include its own
dependencies and only works when included after other includes. This
change adds `make check-headers` which validates that each header can
be included at the top of a file. Some headers are excluded e.g. because
of platform or external dependencies.

rocksdb_namespace.h had to be re-worked slightly to enable checking for
failure to include it. (ROCKSDB_NAMESPACE is a valid namespace name.)

Fixes mostly involve adding and cleaning up #includes, but for
FileTraceWriter, a constructor was out-of-lined to make a forward
declaration sufficient.

This check is not currently run with `make check` but is added to
CircleCI build-linux-unity since that one is already relatively fast.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8893

Test Plan: existing tests and resolving issues detected by new check

Reviewed By: mrambacher

Differential Revision: D30823300

Pulled By: pdillinger

fbshipit-source-id: 9fff223944994c83c105e2e6496d24845dc8e572
2021-09-10 10:00:26 -07:00
mrambacher
dc0dc90cf5 Make Statistics a Customizable Class (#8637)
Summary:
Make the Statistics object into a Customizable object.  Statistics can now be stored and created to/from the Options file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8637

Reviewed By: zhichao-cao

Differential Revision: D30530550

Pulled By: mrambacher

fbshipit-source-id: 5fc7d01d8431f37b2c205bbbd8342c9f697023bd
2021-09-10 09:47:39 -07:00
anand76
eea566864e Support custom Env in db_sst_test and external_sst_file_basic_test (#8888)
Summary:
Support custom Env in these tests. Some custom Envs do not support reopening a file for write, either normal mode or Random RW mode. Added some additional checks in external_sst_file_basic_test to accommodate those Envs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8888

Reviewed By: riversand963

Differential Revision: D30824481

Pulled By: anand1976

fbshipit-source-id: c3ac7a628e6df29e94f42e370e679934a4f77eac
2021-09-08 21:21:49 -07:00
Zhiyi Zhang
0cb0fc6fd3 Add DB properties for BlobDB (#8734)
Summary:
RocksDB exposes certain internal statistics via the DB property interface.
However, there are currently no properties related to BlobDB.

For starters, we would like to add the following BlobDB properties:
`rocksdb.num-blob-files`: number of blob files in the current Version (kind of like `num-files-at-level` but note this is not per level, since blob files are not part of the LSM tree).
`rocksdb.blob-stats`: this could return the total number and size of all blob files, and potentially also the total amount of garbage (in bytes) in the blob files in the current Version.
`rocksdb.total-blob-file-size`: the total size of all blob files (as a blob counterpart for `total-sst-file-size`) of all Versions.
`rocksdb.live-blob-file-size`: the total size of all blob files in the current Version.
`rocksdb.estimate-live-data-size`: this is actually an existing property that we can extend so it considers blob files as well. When it comes to blobs, we actually have an exact value for live bytes. Namely, live bytes can be computed simply as total bytes minus garbage bytes, summed over the entire set of blob files in the Version.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8734

Test Plan:
```
➜  rocksdb git:(new_feature_blobDB_properties) ./db_blob_basic_test
[==========] Running 16 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 10 tests from DBBlobBasicTest
[ RUN      ] DBBlobBasicTest.GetBlob
[       OK ] DBBlobBasicTest.GetBlob (12 ms)
[ RUN      ] DBBlobBasicTest.MultiGetBlobs
[       OK ] DBBlobBasicTest.MultiGetBlobs (11 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_CorruptIndex
[       OK ] DBBlobBasicTest.GetBlob_CorruptIndex (10 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_InlinedTTLIndex
[       OK ] DBBlobBasicTest.GetBlob_InlinedTTLIndex (12 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_IndexWithInvalidFileNumber
[       OK ] DBBlobBasicTest.GetBlob_IndexWithInvalidFileNumber (9 ms)
[ RUN      ] DBBlobBasicTest.GenerateIOTracing
[       OK ] DBBlobBasicTest.GenerateIOTracing (11 ms)
[ RUN      ] DBBlobBasicTest.BestEffortsRecovery_MissingNewestBlobFile
[       OK ] DBBlobBasicTest.BestEffortsRecovery_MissingNewestBlobFile (13 ms)
[ RUN      ] DBBlobBasicTest.GetMergeBlobWithPut
[       OK ] DBBlobBasicTest.GetMergeBlobWithPut (11 ms)
[ RUN      ] DBBlobBasicTest.MultiGetMergeBlobWithPut
[       OK ] DBBlobBasicTest.MultiGetMergeBlobWithPut (14 ms)
[ RUN      ] DBBlobBasicTest.BlobDBProperties
[       OK ] DBBlobBasicTest.BlobDBProperties (21 ms)
[----------] 10 tests from DBBlobBasicTest (124 ms total)

[----------] 6 tests from DBBlobBasicTest/DBBlobBasicIOErrorTest
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/0 (12 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/1 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/0 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/1 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/0 (1011 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/1 (1013 ms)
[----------] 6 tests from DBBlobBasicTest/DBBlobBasicIOErrorTest (2066 ms total)

[----------] Global test environment tear-down
[==========] 16 tests from 2 test cases ran. (2190 ms total)
[  PASSED  ] 16 tests.
```

Reviewed By: ltamasi

Differential Revision: D30690849

Pulled By: Zhiyi-Zhang

fbshipit-source-id: a7567319487ad76bd1a2e24bf143afdbbd9e4346
2021-09-08 12:22:04 -07:00
mrambacher
beed86473a Make MemTableRepFactory into a Customizable class (#8419)
Summary:
This PR does the following:
-> Makes the MemTableRepFactory into a Customizable class and creatable/configurable via CreateFromString
-> Makes the existing implementations compatible with configurations
-> Moves the "SpecialRepFactory" test class into testutil, accessible via the ObjectRegistry or a NewSpecial API

New tests were added to validate the functionality and all existing tests pass.  db_bench and memtablerep_bench were hand-tested to verify the functionality in those tools.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8419

Reviewed By: zhichao-cao

Differential Revision: D29558961

Pulled By: mrambacher

fbshipit-source-id: 81b7229636e4e649a0c914e73ac7b0f8454c931c
2021-09-08 07:46:44 -07:00
Andrew Kryczka
941543721d Bytes read stat for VerifyChecksum() and VerifyFileChecksums() APIs (#8741)
Summary:
- Clarified some comments on compatibility for adding new ticker stats
- Added read I/O stats for `VerifyChecksum()` and `VerifyFileChecksums()` APIs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8741

Test Plan: new unit test

Reviewed By: zhichao-cao

Differential Revision: D30708578

Pulled By: ajkr

fbshipit-source-id: d06b961f7e199ae92c266b683e39870aa8f63449
2021-09-07 13:28:29 -07:00
Peter Dillinger
0ef88538c6 Improve support for using regexes (#8740)
Summary:
* Consolidate use of std::regex for testing to testharness.cc, to
minimize Facebook linters constantly flagging uses in non-production
code.
* Improve syntax and error messages for asserting some string matches a
regex in tests.
* Add a public Regex wrapper class to encapsulate existing usage in
ObjectRegistry.
* Remove unnecessary include <regex>
* Put warnings that use of Regex in production code could cause bad
performance or stack overflow.

Intended follow-up work:
* Replace std::regex with another underlying implementation like RE2
* Improve ObjectRegistry interface in terms of possibly confusing literal
string matching vs. regex and in terms of reporting invalid regex.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8740

Test Plan:
tests updated, basic unit test for public Regex, and some manual
testing of temporary changes to see example error messages:

utilities/backupable/backupable_db_test.cc:917: Failure
000010_1162373755_138626.blob (child.name)
does not match regex
[0-9]+_[0-9]+_[0-9]+[.]blobHAHAHA (pattern)

db/db_basic_test.cc:74: Failure
R3SHSBA8C4U0CIMV2ZB0 (sid3)
does not match regex [0-9A-Z]{20}HAHAHA

Reviewed By: mrambacher

Differential Revision: D30706246

Pulled By: pdillinger

fbshipit-source-id: ba845e8f563ccad39bdb58f44f04e9da8f78c3fd
2021-09-07 13:05:23 -07:00
Peter Dillinger
4750421ece Replace most typedef with using= (#8751)
Summary:
Old typedef syntax is confusing

Most but not all changes with

    perl -pi -e 's/typedef (.*) ([a-zA-Z0-9_]+);/using $2 = $1;/g' list_of_files
    make format

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8751

Test Plan: existing

Reviewed By: zhichao-cao

Differential Revision: D30745277

Pulled By: pdillinger

fbshipit-source-id: 6f65f0631c3563382d43347896020413cc2366d9
2021-09-07 11:31:59 -07:00
Levi Tamasi
55ef8972fc Support custom env in db_blob_{basic,compaction,corruption,index}_test (#8817)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8817

Test Plan: Ran `make check` and built/tested using internal custom environment.

Reviewed By: riversand963

Differential Revision: D30768215

Pulled By: ltamasi

fbshipit-source-id: cce96211d4c097612d20247f2e997358f40cc3d3
2021-09-07 11:13:56 -07:00
Akanksha Mahajan
e8a7001159 Update branch as "main" in tools/advisor/README.md (#8744)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8744

Reviewed By: ltamasi

Differential Revision: D30716145

Pulled By: akankshamahajan15

fbshipit-source-id: c2fcaf9ddcae85a86c0f10496acab28cd795ff12
2021-09-01 20:26:28 -07:00
Peter Dillinger
c9cd5d25a8 Remove some unneeded code (#8736)
Summary:
* FullKey and ParseFullKey appear to serve no purpose in the public API
(or anything else) so removed. Only use in one test updated.
* NumberToString serves no purpose vs. ToString so removed, numerous
calls updated
* Remove unnecessary forward declarations in metadata.h by re-arranging
class definitions.
* Remove some unneeded semicolons

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736

Test Plan: existing tests

Reviewed By: mrambacher

Differential Revision: D30700039

Pulled By: pdillinger

fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2
2021-09-01 14:28:58 -07:00
Peter Dillinger
13ded69484 Built-in support for generating unique IDs, bug fix (#8708)
Summary:
Env::GenerateUniqueId() works fine on Windows and on POSIX
where /proc/sys/kernel/random/uuid exists. Our other implementation is
flawed and easily produces collision in a new multi-threaded test.
As we rely more heavily on DB session ID uniqueness, this becomes a
serious issue.

This change combines several individually suitable entropy sources
for reliable generation of random unique IDs, with goal of uniqueness
and portability, not cryptographic strength nor maximum speed.

Specifically:
* Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
rather than in Env implementation details. Callers are now told whether
the operation fails or succeeds.
* Adds an internal API GenerateRawUniqueId for generating high-quality
128-bit unique identifiers, by combining entropy from three "tracks":
  * Lots of info from default Env like time, process id, and hostname.
  * std::random_device
  * port::GenerateRfcUuid (when working)
* Built-in implementations of Env::GenerateUniqueId() will now always
produce an RFC 4122 UUID string, either from platform-specific API or
by converting the output of GenerateRawUniqueId.

DB session IDs now use GenerateRawUniqueId while DB IDs (not as
critical) try to use port::GenerateRfcUuid but fall back on
GenerateRawUniqueId with conversion to an RFC 4122 UUID.

GenerateRawUniqueId is declared and defined under env/ rather than util/
or even port/ because of the Env dependency.

Likely follow-up: enhance GenerateRawUniqueId to be faster after the
first call and to guarantee uniqueness within the lifetime of a single
process (imparting the same property onto DB session IDs).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708

Test Plan:
A new mini-stress test in env_test checks the various public
and internal APIs for uniqueness, including each track of
GenerateRawUniqueId individually. We can't hope to verify anywhere close
to 128 bits of entropy, but it can at least detect flaws as bad as the
old code. Serial execution of the new tests takes about 350 ms on
my machine.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D30563780

Pulled By: pdillinger

fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
2021-08-30 15:20:41 -07:00
Zaorang Yang
2bc914094d Refactor with VersionBuilder (#8706)
Summary:
Introduce a new function to save sst files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8706

Reviewed By: jay-zhuang

Differential Revision: D30544242

Pulled By: riversand963

fbshipit-source-id: 554755852daff7ae1c7864b0029f51b27099ee09
2021-08-27 12:15:08 -07:00
James Yin
7ddc096d7d Fix typo in the comment of log_empty_ (#8711)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8711

Reviewed By: riversand963

Differential Revision: D30566761

Pulled By: jay-zhuang

fbshipit-source-id: dd4690f5e2af2d263ed75ea1b9ed24692fe81362
2021-08-27 12:10:29 -07:00
anand76
ebaa3c8a59 Fix a race condition in DumpStats() during iteration of the ColumnFamilySet (#8714)
Summary:
DumpStats() iterates through the ColumnFamilySet. There is a potential
race condition because it does Ref the cfd, and the cfd could get
destroyed during the iteration.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8714

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D30580199

Pulled By: anand1976

fbshipit-source-id: 60a3443ad0d4f7ac6a977dec780e6d2c1b70b850
2021-08-26 15:40:26 -07:00
Jay Zhuang
4afa24f8ae Deflake test CompactionJobTest.InputSerialization (#8712)
Summary:
It's invalid to have an empty file name.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8712

Test Plan:
```
$ gtest-parallel ./compaction_job_test --gtest_filter=CompactionJobTest.InputSerialization -r 10000
```

Reviewed By: pdillinger

Differential Revision: D30566739

Pulled By: jay-zhuang

fbshipit-source-id: 41e73175e3c95c4b73b4fdcd33470788d4e29d37
2021-08-26 09:27:37 -07:00
Yanqin Jin
f235f4b0a3 Fix a bug of secondary instance sequence going backward (#8653)
Summary:
Recent refactor of `ReactiveVersionSet::ReadAndApply()` uses
`ManifestTailer` whose `Iterate()` method can cause the db's
`last_sequence_` to go backward. Consequently, read requests can see
out-dated data. For example, latest changes to the primary will not be
seen on the secondary even after a `TryCatchUpWithPrimary()` if no new
write batches are read from the WALs and no new MANIFEST entries are
read from the MANIFEST.

Fix the bug so that `VersionEditHandler::CheckIterationResult` will
never decrease `last_sequence_`, `last_allocated_sequence_` and
`last_published_sequence_`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8653

Test Plan: make check

Reviewed By: jay-zhuang

Differential Revision: D30272084

Pulled By: riversand963

fbshipit-source-id: c6a49c534b2509b93ef62d8936ed0acd5b860eaa
2021-08-24 18:18:36 -07:00
Peter Dillinger
318fe6941a Add port::GetProcessID() (#8693)
Summary:
Useful in some places for object uniqueness across processes.
Currently used for generating a host-wide identifier of Cache objects
but expected to be used soon in some unique id generation code.

`int64_t` is chosen for return type because POSIX uses signed integer type,
usually `int`, for `pid_t` and Windows uses `DWORD`, which is `uint32_t`.

Future work: avoid copy-pasted declarations in port_*.h, perhaps with
port_common.h always included from port.h

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8693

Test Plan: manual for now

Reviewed By: ajkr, anand1976

Differential Revision: D30492876

Pulled By: pdillinger

fbshipit-source-id: 39fc2788623cc9f4787866bdb67a4d183dde7eef
2021-08-24 17:46:14 -07:00
Yanqin Jin
229350ef48 Allow iterate refresh for secondary instance (#8700)
Summary:
Test plan
make check

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8700

Reviewed By: zhichao-cao

Differential Revision: D30523907

Pulled By: riversand963

fbshipit-source-id: 68928ab4dafb64ce80ab7bc69d83727a4713ab91
2021-08-24 15:40:56 -07:00
Andrew Kryczka
c521f22a1e Deflake write-prepared and write-unprepared tests (#8696)
Summary:
The `JobContext::job_snapshot` referenced DB state but could
have been deleted by a BG thread after the signal/unlock allowing
shutdown to proceed. Then we would see an error like this (valgrind):

```
==354104== Thread 2:
==354104== Invalid read of size 8
==354104==    at 0x694C4D: rocksdb::ManagedSnapshot::~ManagedSnapshot() (snapshot_impl.cc:20)
==354104==    by 0x58F5BA: operator() (unique_ptr.h:81)
==354104==    by 0x58F5BA: operator() (unique_ptr.h:75)
==354104==    by 0x58F5BA: ~unique_ptr (unique_ptr.h:292)
==354104==    by 0x58F5BA: rocksdb::JobContext::~JobContext() (job_context.h:221)
==354104==    by 0x5F155E: rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) (db_impl_compaction_flush.cc:2696)
==354104==    by 0x5F1BC2: rocksdb::DBImpl::BGWorkCompaction(void*) (db_impl_compaction_flush.cc:2468)
==354104==    by 0x83707A: operator() (std_function.h:688)
==354104==    by 0x83707A: rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long) (threadpool_imp.cc:266)
==354104==    by 0x8373ED: rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*) (threadpool_imp.cc:307)
==354104==    by 0x492A800: execute_native_thread_routine (in /usr/local/fbcode/platform009/lib/libstdc++.so.6.0.28)
==354104==    by 0x4A5020B: start_thread (in /usr/local/fbcode/platform009/lib/libpthread-2.30.so)
==354104==    by 0x4CF281E: clone (in /usr/local/fbcode/platform009/lib/libc-2.30.so)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8696

Test Plan: unable to repro

Reviewed By: pdillinger

Differential Revision: D30505277

Pulled By: ajkr

fbshipit-source-id: 5a99f34137cd14d06b0f624add6d37a70a61135d
2021-08-23 23:09:17 -07:00
Jay Zhuang
249b1078c9 Add extra information to RemoteCompaction APIs (#8680)
Summary:
Currently, we only provide job_id in RemoteCompaction APIs, the
main problem of `job_id` is it cannot uniquely identify a compaction job
between DB instances or between sessions.
Providing DB and session id to the user, which will make building cross
DB compaction service easier.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8680

Test Plan: unittest

Reviewed By: ajkr

Differential Revision: D30444859

Pulled By: jay-zhuang

fbshipit-source-id: fdf107f4286564049637f154193c6d94c3c59448
2021-08-23 16:27:38 -07:00
Peter Dillinger
04db764831 Embed original file number in SST table properties (#8686)
Summary:
I very recently realized that with https://github.com/facebook/rocksdb/issues/8669 we cannot later add
file numbers to external SST files (so that more can share db session
ids for better uniqueness properties), because of forward compatibility.
We would have a version of RocksDB that assumes session IDs are unique
on external SST files and therefore can't really break that invariant in
future files.

This change adds a table property for "orig_file_number" which is
populated by normal SST files and also external SST files generated by
SstFileWriter. SstFileWriter now keeps a db_session_id for life of the
object and increments its own file numbers for embedding in table
properties. (They are arguably "fake" file numbers because these numbers
and not embedded in the file name.)

While updating block_based_table_builder, I removed several unnecessary
fields from Rep, because following the pattern would have created
another unnecessary field.

This change also updates block_based_table_reader to use this new
property when available, which means that for newer SST files, we can
determine the stable/original <db_session_id,file_number> unique
identifier using just the file contents, not the file name. (It's a bit
complicated; detailed comments in block_based_table_reader.)

Also added DB host id to properties listing by sst_dump, which could be
useful in debugging.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8686

Test Plan: majorly overhauled StableCacheKeys test for this change

Reviewed By: zhichao-cao

Differential Revision: D30457742

Pulled By: pdillinger

fbshipit-source-id: 2e5ae7dddeb94fb9d8eac8a928486aed8b8cd445
2021-08-20 20:40:48 -07:00
Peter Dillinger
2a383f21f4 Add Bloom/Ribbon hybrid API support (#8679)
Summary:
This is essentially resurrection and fixing of the part of
https://github.com/facebook/rocksdb/issues/8198 that was reverted in https://github.com/facebook/rocksdb/issues/8212, using data added in https://github.com/facebook/rocksdb/issues/8246. Basically,
when configuring Ribbon filter, you can specify an LSM level before which
Bloom will be used instead of Ribbon. But Bloom is only considered for
Leveled and Universal compaction styles and file going into a known LSM
level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as
you would expect with NewRibbonFilterPolicy.

So that this can be controlled with a single int value and so that flushes
can be distinguished from intra-L0, we consider flush to go to level -1 for
the purposes of this option. (Explained in API comment.)

I also expect the most common and recommended Ribbon configuration to
use Bloom during flush, to minimize slowing down writes and because according
to my estimates, Ribbon only pays off if the structure lives in memory for
more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy
to be this mild hybrid configuration. I don't really want to add something like
NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for
flush, Ribbon otherwise) should be considered a natural choice.

C APIs also updated, but because they don't support overloading,
rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and
rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid
configuration. While touching C API, I changed bits per key options from
int to double.

BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit
unused fields from BloomFilterPolicy.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8679

Test Plan: new + updated tests, including crash test

Reviewed By: jay-zhuang

Differential Revision: D30445797

Pulled By: pdillinger

fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1
2021-08-20 18:00:16 -07:00
Merlin Mao
baf22b4ee6 Add IteratorTraceExecutionResult for iterator related trace records. (#8687)
Summary:
- Allow to get `Valid()`, `status()`, `key()` and `value()` of an iterator from `IteratorTraceExecutionResult`.
- Move lower bound and upper bound from `IteratorSeekQueryTraceRecord` to `IteratorQueryTraceRecord`.

Added test in `DBTest2.TraceAndReplay`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8687

Reviewed By: zhichao-cao

Differential Revision: D30457630

Pulled By: autopear

fbshipit-source-id: be433099a25895b3aa6f0c00f95ad7b1d7489c1d
2021-08-20 15:35:56 -07:00
Akanksha Mahajan
5efec84c60 Fix blob callback in compaction and atomic flush (#8681)
Summary:
Pass BlobFileCompletionCallback  in case of atomic flush and
compaction job which is currently nullptr(default parameter).
BlobFileCompletionCallback is used in case of IntegratedBlobDB to report new blob files to
SstFileManager.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8681

Test Plan: CircleCI jobs

Reviewed By: ltamasi

Differential Revision: D30445998

Pulled By: akankshamahajan15

fbshipit-source-id: ba48093843864faec57f1f365cce7b5a569c4021
2021-08-20 11:41:14 -07:00
Merlin Mao
ff8953380f Add iterator's lower and upper bounds to TraceRecord (#8677)
Summary:
Trace file V2 added lower/upper bounds to `Iterator::Seek()` and `Iterator::SeekForPrev()`. They were not used anywhere during the execution of a `TraceRecord`. Now they are added to be used by `ReadOptions` during `Iterator::Seek()` and `Iterator::SeekForPrev()` if they are set.

Added test cases in `DBTest2.TraceAndManualReplay`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8677

Reviewed By: zhichao-cao

Differential Revision: D30438255

Pulled By: autopear

fbshipit-source-id: 82563006be0b69155990e506a74951c18af8d288
2021-08-19 17:27:12 -07:00
Baptiste Lemaire
c625b8d017 Add condition on NotifyOnFlushComplete that FlushJob was not mempurge. Add event listeners to mempurge tests. (#8672)
Summary:
Previously, when a `FlushJob` was redirected to a MemPurge, the function `DBImpl::NotifyOnFlushComplete` was called, which created a series of issues because the JobInfo was not correctly collected from the memtables.
This diff aims at correcting these two issues (`FlushJobInfo` collection in `FlushJob::MemPurge` , no call to `DBImpl::NotifyOnFlushComplete` after successful mempurge).
Event listeners were added to the unit tests to handle these situations.
Surprisingly none of the crashtests caught this issue, I will try to add event listeners to crash tests in the future.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8672

Reviewed By: akankshamahajan15

Differential Revision: D30383109

Pulled By: bjlemaire

fbshipit-source-id: 35a8d4295886923ee4049a6447f00022cb221c73
2021-08-18 17:40:01 -07:00
Merlin Mao
d10801e983 Allow Replayer to report the results of TraceRecords. (#8657)
Summary:
`Replayer::Execute()` can directly returns the result (e.g, request latency, DB::Get() return code, returned value, etc.)
`Replayer::Replay()` reports the results via a callback function.

New interface:
`TraceRecordResult` in "rocksdb/trace_record_result.h".

`DBTest2.TraceAndReplay` and `DBTest2.TraceAndManualReplay` are updated accordingly.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8657

Reviewed By: ajkr

Differential Revision: D30290216

Pulled By: autopear

fbshipit-source-id: 3c8d4e6b180ec743de1a9d9dcaee86064c74f0d6
2021-08-18 17:06:14 -07:00
Peter Dillinger
b6269b078a Stable cache keys on ingested SST files (#8669)
Summary:
Extends https://github.com/facebook/rocksdb/issues/8659 to work for ingested external SST files, even
the same file ingested into different DBs sharing a block cache.

Note: These new cache keys are currently only enabled when FileSystem
does not provide GetUniqueId. For now, they are typically larger,
so slightly less efficient.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8669

Test Plan: Extended unit test

Reviewed By: zhichao-cao

Differential Revision: D30398532

Pulled By: pdillinger

fbshipit-source-id: 1f13e2af4b8bfff5741953a69466e9589fbc23c7
2021-08-18 11:33:03 -07:00
Yanqin Jin
2b367fa8cc Fix bug caused by releasing snapshot(s) during compaction (#8608)
Summary:
In debug mode, we are seeing assertion failure as follows

```
db/compaction/compaction_iterator.cc:980: void rocksdb::CompactionIterator::PrepareOutput(): \
Assertion `ikey_.type != kTypeDeletion && ikey_.type != kTypeSingleDeletion' failed.
```

It is caused by releasing earliest snapshot during compaction between the execution of
`NextFromInput()` and `PrepareOutput()`.

In one case, as demonstrated in unit test `WritePreparedTransaction.ReleaseEarliestSnapshotDuringCompaction_WithSD2`,
incorrect result may be returned by a following range scan if we disable assertion, as in opt compilation
level: the SingleDelete marker's sequence number is zeroed out, but the preceding PUT is also
outputted to the SST file after compaction. Due to the logic of DBIter, the PUT will not be
skipped and will be returned by iterator in range scan. https://github.com/facebook/rocksdb/issues/8661 illustrates what happened.

Fix by taking a more conservative approach: make compaction zero out sequence number only
if key is in the earliest snapshot when the compaction starts.

Another assertion failure is
```
Assertion `current_user_key_snapshot_ == last_snapshot' failed.
```

It's caused by releasing the snapshot between the PUT and SingleDelete during compaction.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8608

Test Plan: make check

Reviewed By: jay-zhuang

Differential Revision: D30145645

Pulled By: riversand963

fbshipit-source-id: 699f58e66faf70732ad53810ccef43935d3bbe81
2021-08-17 22:14:20 -07:00
Levi Tamasi
6878cedcc3 Add statistics support to integrated BlobDB (#8667)
Summary:
The patch adds statistics support to the integrated BlobDB implementation,
namely the tickers `BLOB_DB_BLOB_FILE_BYTES_READ` and
`BLOB_DB_GC_{NUM_KEYS,BYTES}_RELOCATED`, and the histograms
`BLOB_DB_(DE)COMPRESSION_MICROS`. (Some other statistics, like
`BLOB_DB_BLOB_FILE_BYTES_WRITTEN`, `BLOB_DB_BLOB_FILE_SYNCED`,
`BLOB_DB_BLOB_FILE_{READ,WRITE,SYNC}_MICROS` were already supported.)
Note that the vast majority of the old BlobDB's tickers/histograms are not
really applicable to the new implementation, since they e.g. pertain to calling
dedicated BlobDB APIs (which the integrated BlobDB does not have) or are
tied to the legacy BlobDB's design of writing blob files synchronously when
a write API is called. Such statistics are marked "legacy BlobDB only" in
`statistics.h`.

Fixes https://github.com/facebook/rocksdb/issues/8645 .

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8667

Test Plan: Ran `make check` and tested the new statistics using `db_bench`.

Reviewed By: riversand963

Differential Revision: D30356884

Pulled By: ltamasi

fbshipit-source-id: 5f8a833faee60401c5643c2f0a6c0415488190a4
2021-08-17 17:22:31 -07:00
Peter Dillinger
a207c27809 Stable cache keys using DB session ids in SSTs (#8659)
Summary:
Use DB session ids in SST table properties to make cache keys
stable across DB re-open and copy / move / restore / etc.

These new cache keys are currently only enabled when FileSystem does not
provide GetUniqueId. For now, they are typically larger, so slightly
less efficient.

Relevant to https://github.com/facebook/rocksdb/issues/7405

This change has a minor regression in PersistentCache functionality:
metaindex blocks are no longer cached in PersistentCache. Table properties
blocks already were not but ideally should be. I didn't spent effort to
fix & test these issues because we don't believe PersistentCache is used much
if at all and expect SecondaryCache to replace it. (Though PRs are welcome.)

FIXME: there is more to be fixed for stable cache keys on external SST files

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8659

Test Plan:
new unit test added, which fails when disabling new
functionality

Reviewed By: zhichao-cao

Differential Revision: D30297705

Pulled By: pdillinger

fbshipit-source-id: e8539a5c8802a79340405629870f2e3fb3822d3a
2021-08-16 20:37:20 -07:00
Adam Retter
5de333fd99 Add db_test2 to to ASSERT_STATUS_CHECKED (#8640)
Summary:
This is the `db_test2` parts of https://github.com/facebook/rocksdb/pull/7737 reworked on the latest HEAD.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8640

Reviewed By: akankshamahajan15

Differential Revision: D30303684

Pulled By: mrambacher

fbshipit-source-id: 263e2f82d849bde4048b60aed8b31e7deed4706a
2021-08-16 08:10:32 -07:00
Jay Zhuang
c55460c734 Add property LiveSstFilesSizeAtTemperature for tiered storage (#8644)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8644

Reviewed By: siying, zhichao-cao

Differential Revision: D30236535

Pulled By: jay-zhuang

fbshipit-source-id: 1758d1c46d83a5087560fb63d53a016bf999da81
2021-08-15 14:17:45 -07:00
Baptiste Lemaire
e51be2c5a1 Improve MemPurge sampling (#8656)
Summary:
Previously, the `MemPurge` sampling function was assessing whether a random entry from a memtable was garbage or not by simply querying the given memtable (see https://github.com/facebook/rocksdb/issues/8628 for more details).
In this diff, I am updating the sampling function by querying not only the memtable the entry was drawn from, but also all subsequent memtables that have a greater memtable ID.
I also added the size of the value for KV entries in the payload/useful payload estimates (which was also one of the reasons why sampling was not as good as mempurging all the time in terms of L0 SST files reduction).
Once these changes were made, I was able to clean obsolete objects and functions from the `MemtableList` struct, and did a bit of cleanup everywhere.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8656

Reviewed By: pdillinger

Differential Revision: D30288583

Pulled By: bjlemaire

fbshipit-source-id: 7646a545ec56f4715949daa59ab5eee74540feb3
2021-08-13 14:35:41 -07:00
Merlin Mao
f58d276764 Make TraceRecord and Replayer public (#8611)
Summary:
New public interfaces:
`TraceRecord` and `TraceRecord::Handler`, available in "rocksdb/trace_record.h".
`Replayer`, available in `rocksdb/utilities/replayer.h`.

User can use `DB::NewDefaultReplayer()` to create a Replayer to auto/manual replay a trace file.

Unit tests:
- `./db_test2 --gtest_filter="DBTest2.TraceAndReplay"`: Updated with the internal API changes.
- `./db_test2 --gtest_filter="DBTest2.TraceAndManualReplay"`: New for manual replay.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8611

Reviewed By: ajkr

Differential Revision: D30266329

Pulled By: autopear

fbshipit-source-id: 1ecb3cbbedae0f6a67c18f0cc82e002b4d81b6f8
2021-08-11 19:32:46 -07:00
Baptiste Lemaire
e3a96c4823 Memtable sampling for mempurge heuristic. (#8628)
Summary:
Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option.
This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then compare this useful payload estimate with the `double experimental_mempurge_threshold` value.
Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate.
At the moment, a `double experimental_mempurge_threshold` value else than 0.0 or `DBL_MAX` is opnly supported`with the `SkipList` memtable representation.
Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries.
The unit tests have been readapted to support this new API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628

Reviewed By: pdillinger

Differential Revision: D30149315

Pulled By: bjlemaire

fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27
2021-08-10 18:09:03 -07:00
Levi Tamasi
f63331ebaf Attempt to deflake DBTestXactLogIterator.TransactionLogIteratorCorruptedLog (#8627)
Summary:
The patch attempts to deflake `DBTestXactLogIterator.TransactionLogIteratorCorruptedLog`
by disabling file deletions while retrieving the list of WAL files and truncating the first WAL file.
This is to prevent the `PurgeObsoleteFiles` call triggered by `GetSortedWalFiles` from
invalidating the result of `GetSortedWalFiles`. The patch also cleans up the test case a bit
and changes it to using `test::TruncateFile` instead of calling the `truncate` syscall directly.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8627

Test Plan: `make check`

Reviewed By: akankshamahajan15

Differential Revision: D30147002

Pulled By: ltamasi

fbshipit-source-id: db11072a4ad8900a2f859cb5294e22b1888c23f6
2021-08-10 11:10:07 -07:00
Jay Zhuang
61f83dfeb7 Add an unittest for tiered storage universal compaction (#8631)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8631

Reviewed By: siying

Differential Revision: D30200385

Pulled By: jay-zhuang

fbshipit-source-id: 0fa2bb15e74ff81762d767f234078e0fe0106c55
2021-08-09 13:44:23 -07:00
sdong
e7c24168d8 Move old files to warm tier in FIFO compactions (#8310)
Summary:
Some FIFO users want to keep the data for longer, but the old data is rarely accessed. This feature allows users to configure FIFO compaction so that data older than a threshold is moved to a warm storage tier.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8310

Test Plan: Add several unit tests.

Reviewed By: ajkr

Differential Revision: D28493792

fbshipit-source-id: c14824ea634814dee5278b449ab5c98b6e0b5501
2021-08-09 12:51:14 -07:00
Roy Crihfield
d4b75d295f Add more C bindings for OptimisticTransactionDB (#8526)
Summary:
* `rocksdb_optimistictransactiondb_checkpoint_object_create`
* `rocksdb_optimistictransactiondb_write`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8526

Reviewed By: ajkr

Differential Revision: D30076822

Pulled By: jay-zhuang

fbshipit-source-id: a59956a8d5449e75d39a8087fbb2bad148cf697d
2021-08-06 19:10:48 -07:00
Levi Tamasi
87882736ef Fix the sorting of KeyContexts for batched MultiGet (#8633)
Summary:
`CompareKeyContext::operator()` on the trunk has a bug: when comparing
column family IDs, `lhs` is used for both sides of the comparison. This
results in the `KeyContext`s getting sorted solely based on key, which
in turn means that keys with the same column family do not necessarily
form a single range in the sorted list. This violates an assumption of the
batched `MultiGet` logic, leading to the same column family
showing up multiple times in the list of `MultiGetColumnFamilyData`.
The end result is the code attempting to check out the thread-local
`SuperVersion` for the same CF multiple times, causing an
assertion violation in debug builds and memory corruption/crash in
release builds.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8633

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D30169182

Pulled By: ltamasi

fbshipit-source-id: a47710652df7e95b14b40fb710924c11a8478023
2021-08-06 16:27:42 -07:00