936 Commits

Author SHA1 Message Date
Yanqin Jin
3122cb4358 Revise APIs related to user-defined timestamp (#8946)
Summary:
ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`.
Namely, `WriteOptions` should not include information about "what-to-write", but should just
include information about "how-to-write".

According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore,
this PR removes `WriteOptions::timestamp` for compliance.
After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set
of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and
`SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity
made me reconsider doing it in another PR (maybe).
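
To illustrate the "how-to-write" versus "what-to-write" split, here is a minimal sketch. `ToyWriteOptions` and `ToyDb` are hypothetical stand-ins written for this note, not the RocksDB classes; only the shape of the `Put(write_opts, key, value, ts)` overload follows the description above.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in: holds only "how-to-write" settings.
struct ToyWriteOptions {
  bool sync = false;         // flush to stable storage before returning?
  bool disable_wal = false;  // skip the write-ahead log?
};

// Hypothetical stand-in for a DB with user-defined timestamps.
class ToyDb {
 public:
  // "What-to-write" (key, value, timestamp) travels as arguments,
  // not as fields of the options struct.
  bool Put(const ToyWriteOptions& /*write_opts*/, const std::string& key,
           const std::string& value, uint64_t ts) {
    data_[std::make_pair(key, ts)] = value;
    return true;
  }

  // Returns the newest value of `key` whose timestamp is <= read_ts,
  // or an empty string if there is none.
  std::string Get(const std::string& key, uint64_t read_ts) const {
    std::string result;
    for (const auto& kv : data_) {  // ordered by (key, ts) ascending
      if (kv.first.first == key && kv.first.second <= read_ts) {
        result = kv.second;
      }
    }
    return result;
  }

 private:
  std::map<std::pair<std::string, uint64_t>, std::string> data_;
};
```

With this shape, one options object can be reused for many writes, each carrying its own timestamp.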

For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take
extra `timestamp` information when writing to `WriteBatch`es.
This set of APIs is not yet supported in `WriteBatchWithIndex` and is on our TODO list.

Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to
`WriteBatch::UpdateTimestamps()` since this method requires that all keys have space for timestamps
allocated already and multiple timestamps can be updated.

The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp
size of the default column family. This will be used to allocate space when calling APIs that do not
specify a column family handle.

Also updated the `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, and `DB::NewIterators()` methods, replacing
some timestamp-related assertions with returned Status codes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8946

Test Plan:
make check
./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8
./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0

Make sure there is no perf regression by running the following
```
./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom
```

Before this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.831 micros/op 546235 ops/sec;   60.4 MB/s
```
After this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.820 micros/op 549404 ops/sec;   60.8 MB/s
```

Reviewed By: ltamasi

Differential Revision: D33721359

Pulled By: riversand963

fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09
2022-02-01 22:19:01 -08:00
Yanqin Jin
d10c5c08d3 Remove iter_start_seqnum and preserve_deletes (#9430)
Summary:
According to https://github.com/facebook/rocksdb/blob/6.27.fb/db/db_impl/db_impl.cc#L2896:L2911 and https://github.com/facebook/rocksdb/blob/6.27.fb/db/db_impl/db_impl_open.cc#L203:L208,
we are going to remove `iter_start_seqnum` and `preserve_deletes` starting from RocksDB 7.0

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9430

Test Plan: make check and CI

Reviewed By: ajkr

Differential Revision: D33753639

Pulled By: riversand963

fbshipit-source-id: c80aab8e8d8fc33e52472fed524ed703d0ffc8b6
2022-01-28 13:28:38 -08:00
Hui Xiao
1e0e883ca5 Remove deprecated API AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit (#9452)
Summary:
**Context/Summary:**
AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit have been marked as deprecated and it's time to actually remove the code.
- Keep `soft_rate_limit`/`hard_rate_limit` in `cf_mutable_options_type_info` to prevent throwing `InvalidArgument` in `GetColumnFamilyOptionsFromMap` when reading an option file that still contains these options (e.g., an old option file generated by RocksDB before the deprecation)
- Keep `soft_rate_limit`/`hard_rate_limit` under `OptionsOldApiTest.GetOptionsFromMapTest` to test the case mentioned above.
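
The idea of accepting removed option names while rejecting unknown ones can be sketched as follows. `ToyOptions`, `IsDeprecatedOption`, and `ParseToyOptions` are hypothetical names invented for this sketch, and the `bool` return stands in for a status; this is not the RocksDB options parser.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical options struct with a single live option.
struct ToyOptions {
  uint64_t write_buffer_size = 0;
};

// Names that no longer have any effect but may still appear in old files.
bool IsDeprecatedOption(const std::string& name) {
  return name == "soft_rate_limit" || name == "hard_rate_limit";
}

// Returns false (think "invalid argument") only for names that were never
// valid; deprecated names are accepted and ignored.
bool ParseToyOptions(const std::map<std::string, std::string>& in,
                     ToyOptions* out) {
  for (const auto& kv : in) {
    if (kv.first == "write_buffer_size") {
      out->write_buffer_size = std::stoull(kv.second);
    } else if (IsDeprecatedOption(kv.first)) {
      continue;  // tolerated for backward compatibility
    } else {
      return false;
    }
  }
  return true;
}
```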

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9452

Test Plan: Rely on my eyeball and CI

Reviewed By: ajkr

Differential Revision: D33804938

Pulled By: hx235

fbshipit-source-id: 133d49f7ec5238d7efceeb0a3122a5792a2b9945
2022-01-27 13:01:09 -08:00
Si Ke
93b1de4f45 Enable db_test running in Centos 32 bit OS and Alpine 32 bit OS (#9294)
Summary:
Closes https://github.com/facebook/rocksdb/issues/9271

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9294

Reviewed By: riversand963, hx235

Differential Revision: D33586002

Pulled By: pdillinger

fbshipit-source-id: 3d1a2fa71023e108613ff03dbd37a5f954fc4920
2022-01-14 11:58:18 -08:00
slk
2e5f764294 Make IncreaseFullHistoryTsLow a public API (#9221)
Summary:
As (https://github.com/facebook/rocksdb/issues/9210) discussed, the **full_history_ts_low** is a member of CompactRangeOptions currently, which means a CF's fullHistoryTsLow is advanced only when users submit a CompactRange request.
However, users may want to advance the fullHistoryTsLow without an immediate compaction.
This merge makes IncreaseFullHistoryTsLow a public API so users can advance each CF's fullHistoryTsLow separately.
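
A minimal sketch of a per-column-family threshold that can only move forward, independent of compaction. `ToyTsLowTracker` is a hypothetical class, and rejecting any attempt to lower the threshold is an assumption of this sketch, suggested by the word "Increase" rather than confirmed by the text above.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical tracker: one full-history timestamp lower bound per CF name.
class ToyTsLowTracker {
 public:
  // Advances the bound for `cf`. Returns false and changes nothing if the
  // new value would move the bound backwards (assumed behavior).
  bool Increase(const std::string& cf, uint64_t ts_low) {
    uint64_t& current = ts_low_[cf];  // starts at 0 for a new CF
    if (ts_low < current) return false;
    current = ts_low;
    return true;
  }

  uint64_t Get(const std::string& cf) const {
    auto it = ts_low_.find(cf);
    return it == ts_low_.end() ? 0 : it->second;
  }

 private:
  std::map<std::string, uint64_t> ts_low_;
};
```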

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9221

Reviewed By: akankshamahajan15

Differential Revision: D33201106

Pulled By: riversand963

fbshipit-source-id: 9cb1d013ba93260f72e16353e693ffee167b47ee
2021-12-23 11:03:51 -08:00
mrambacher
9a116ab4b4 Add NewMetaDataIterator method (#8692)
Summary:
Fixes a problem where the iterator for metadata was being treated as a non-user key when in fact it was a user key.  This led to a problem where the property keys could not be searched for correctly.

The main exposure of this problem was that the HashIndexReader could not get the "prefixes" property correctly, resulting in the failure of retrieval/creation of the BlockPrefixIndex.

Added BlockBasedTableTest.SeekMetaBlocks test to validate this condition.

Fixing this condition exposed two other tests (SeekWithPrefixLongerThanKey, MultiGetPrefixFilter) that passed incorrectly previously and now failed.  Updated those two tests to pass.  Not sure if the tests are functionally correct/still appropriate, but made them pass...

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8692

Reviewed By: riversand963

Differential Revision: D33119539

Pulled By: mrambacher

fbshipit-source-id: 658969fe9265f73dc184dab97cc3f4eaed2d881a
2021-12-21 11:32:49 -08:00
Si Ke
79f4a04ee3 Get DBTest passing Assert Status Checked (#7737)
Summary:
Closes https://github.com/facebook/rocksdb/pull/7737

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9231

Reviewed By: hx235

Differential Revision: D32978332

Pulled By: pdillinger

fbshipit-source-id: b28900b685d60c668529a90dbaa8e1b357b28f76
2021-12-09 11:00:17 -08:00
mrambacher
7cd5835a28 Make RateLimiter Customizable (#9141)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9141

Reviewed By: zhichao-cao

Differential Revision: D32432190

Pulled By: mrambacher

fbshipit-source-id: 7930ed88a02412128cd407b5063522484e45c6ce
2021-12-01 06:57:02 -08:00
anand76
dddb791c18 Enable a few unit tests to use custom Env objects (#9087)
Summary:
Allow compaction_job_test, db_io_failure_test, dbformat_test, deletefile_test, and fault_injection_test to use a custom Env object. Also move `RegisterCustomObjects` declaration to a header file to simplify things.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9087

Test Plan: Run manually using "buck test rocksdb/src:compaction_job_test_fbcode" etc.

Reviewed By: riversand963

Differential Revision: D32007222

Pulled By: anand1976

fbshipit-source-id: 99af58559e25bf61563dfa95dc46e31fa7375792
2021-11-08 11:05:59 -08:00
Peter Dillinger
ad5325a736 Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties
which computes a universally unique identifier based on table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique in the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on first 128 bits
and on full 192 bits).
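
To show what a bijective hashing layer between an internal and an external ID can look like, here is a sketch on a single 64-bit word. It is not the function used for SST unique IDs (which are 128 or 192 bits); the names and the mixing steps are assumptions chosen only because each step is invertible.

```cpp
#include <cstdint>

// Any odd constant is invertible modulo 2^64; this one is arbitrary.
const uint64_t kOddMultiplier = 0x9E3779B97F4A7C15ULL;

// Multiplicative inverse of an odd number modulo 2^64 (Newton iteration).
uint64_t ModInverse64(uint64_t odd) {
  uint64_t inv = odd;  // already correct in the low 3 bits
  for (int i = 0; i < 5; ++i) {
    inv *= 2 - odd * inv;  // each step doubles the number of correct bits
  }
  return inv;
}

// Hypothetical "internal -> external" mixing: multiply, then fold the high
// half into the low half. Both steps are bijections on 64-bit values.
uint64_t InternalToExternal(uint64_t internal_id) {
  uint64_t x = internal_id * kOddMultiplier;
  return x ^ (x >> 32);
}

// Exact inverse: undo the fold (it is its own inverse), then the multiply.
uint64_t ExternalToInternal(uint64_t external_id) {
  uint64_t x = external_id ^ (external_id >> 32);
  return x * ModInverse64(kOddMultiplier);
}
```

Because the mapping is one-to-one, two distinct internal IDs can never collide after mixing, which is the property the summary relies on.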

Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990

Test Plan:
Unit tests added, and checking of unique ids in stress test.
NOTE in stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness so that we can infer (with some assumptions) stronger
properties in the aggregate.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D31582865

Pulled By: pdillinger

fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
2021-10-18 23:32:01 -07:00
Peter Dillinger
3ffb3baa0b Add (Live)FileStorageInfo API (#8968)
Summary:
New classes FileStorageInfo and LiveFileStorageInfo and
'experimental' function DB::GetLiveFilesStorageInfo, which is intended
to largely replace several fragmented DB functions needed to create
checkpoints and backups.

This function is now used to create checkpoints and backups, because
it fixes many (probably not all) of the prior complexities of checkpoint
not having atomic access to DB metadata. This also ensures strong
functional test coverage of the new API. Specifically, much of the old
CheckpointImpl::CreateCustomCheckpoint has been migrated to and
updated in DBImpl::GetLiveFilesStorageInfo, with the former now
calling the latter.

Also, the class FileStorageInfo in metadata.h compatibly replaces
BackupFileInfo and serves as a new base class for SstFileMetaData.
Some old fields of SstFileMetaData are still provided (for now) but
deprecated.

Although FileStorageInfo::directory is accurate when using db_paths
and/or cf_paths, these have never been supported by Checkpoint
nor BackupEngine and still are not. This change does now detect
these cases and return NotSupported when appropriate. (More work
needed for support.)

Somehow this change broke ProgressCallbackDuringBackup, but
the progress_callback logic was dubious to begin with because it
would call the callback based on copy buffer size, not size actually
copied. Logic and test updated to track size actually copied
per-thread.
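
The difference between counting the buffer size and counting bytes actually copied can be sketched as below. `CopyWithProgress` and its parameters are hypothetical; this is an illustration of the accounting, not the BackupEngine code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Copies `src` into `*dst` in chunks of at most `buf_size` bytes and calls
// `on_progress` once per `callback_every` bytes actually copied.
void CopyWithProgress(const std::string& src, std::string* dst,
                      size_t buf_size, size_t callback_every,
                      const std::function<void()>& on_progress) {
  assert(buf_size > 0 && callback_every > 0);
  size_t copied_since_callback = 0;
  for (size_t pos = 0; pos < src.size();) {
    const size_t n = std::min(buf_size, src.size() - pos);
    dst->append(src, pos, n);
    pos += n;
    copied_since_callback += n;  // count real bytes, not buf_size
    while (copied_since_callback >= callback_every) {
      copied_since_callback -= callback_every;
      on_progress();
    }
  }
}
```

With a 3-byte file, a 4096-byte buffer, and a 1000-byte callback interval, this version never fires the callback, whereas accounting by buffer size would.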

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8968

Test Plan:
tests updated.
DB::GetLiveFilesStorageInfo mostly tested by use in CheckpointImpl.
DBTest.SnapshotFiles updated to also test GetLiveFilesStorageInfo,
including reading the data after DB close.
Added CheckpointTest.CheckpointWithDbPath (NotSupported).

Reviewed By: siying

Differential Revision: D31242045

Pulled By: pdillinger

fbshipit-source-id: b183d1ce9799e220daaefd6b3b5365d98de676c0
2021-10-16 10:04:32 -07:00
Hui Xiao
58444eadda Make RateLimiter::GetTotalPendingRequest() non pure virtual for backward compatibility (#8938)
Summary:
Context/Summary:
https://github.com/facebook/rocksdb/pull/8890 added a public API `RateLimiter::GetTotalPendingRequest()` but mistakenly marked it as pure virtual, forcing RateLimiter's derived classes to implement this function and breaking backward compatibility.

This PR makes `RateLimiter::GetTotalPendingRequest()` a non-pure virtual method by providing a trivial implementation in rate_limiter.h
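
Why a default body preserves backward compatibility can be seen in a small sketch. `ToyLimiter`, `LegacyLimiter`, and `PendingRequests` are hypothetical names; the return convention (false meaning "not supported") is an assumption of this sketch.

```cpp
#include <cstdint>

// Hypothetical base class standing in for a public interface.
class ToyLimiter {
 public:
  virtual ~ToyLimiter() {}

  // Pre-existing pure virtual method: every subclass already implements it.
  virtual void Request(int64_t bytes) = 0;

  // Newly added method. Because it has a trivial body instead of "= 0",
  // subclasses written before it existed still compile unchanged.
  virtual bool PendingRequests(int64_t* total) const {
    (void)total;
    return false;  // "not supported" in this sketch
  }
};

// A subclass written against the old interface; it knows nothing about
// PendingRequests() and would fail to compile if that method were pure.
class LegacyLimiter : public ToyLimiter {
 public:
  void Request(int64_t bytes) override { total_ += bytes; }
  int64_t total() const { return total_; }

 private:
  int64_t total_ = 0;
};
```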

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8938

Test Plan: Passing existing tests

Reviewed By: pdillinger

Differential Revision: D31100661

Pulled By: hx235

fbshipit-source-id: 06eff1005156a6e5a881e393b2c5b2ad706897d8
2021-09-21 21:29:26 -07:00
mrambacher
6924869867 Make SystemClock into a Customizable Class (#8636)
Summary:
Made SystemClock into a Customizable class, complete with CreateFromString.

Cleaned up some of the existing SystemClock implementations that were redundant (NoSleep was the same as the internal one for MockEnv).

Changed MockEnv construction to allow Clock to be passed to the Memory/MockFileSystem.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8636

Reviewed By: zhichao-cao

Differential Revision: D30483360

Pulled By: mrambacher

fbshipit-source-id: cd0e3a876c39f8c98fe13374c06e8edbd5b9f2a1
2021-09-21 09:23:48 -07:00
eharry
0b6be7eb68 Fix WAL log data corruption #8723 (#8746)
Summary:
Fix WAL log data corruption when using DBOptions.manual_wal_flush(true) and WriteOptions.sync(true) together (https://github.com/facebook/rocksdb/issues/8723)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8746

Reviewed By: ajkr

Differential Revision: D30758468

Pulled By: riversand963

fbshipit-source-id: 07c20899d5f2447dc77861b4845efc68a59aa4e8
2021-09-13 20:15:59 -07:00
mrambacher
beed86473a Make MemTableRepFactory into a Customizable class (#8419)
Summary:
This PR does the following:
-> Makes the MemTableRepFactory into a Customizable class and creatable/configurable via CreateFromString
-> Makes the existing implementations compatible with configurations
-> Moves the "SpecialRepFactory" test class into testutil, accessible via the ObjectRegistry or a NewSpecial API

New tests were added to validate the functionality and all existing tests pass.  db_bench and memtablerep_bench were hand-tested to verify the functionality in those tools.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8419

Reviewed By: zhichao-cao

Differential Revision: D29558961

Pulled By: mrambacher

fbshipit-source-id: 81b7229636e4e649a0c914e73ac7b0f8454c931c
2021-09-08 07:46:44 -07:00
Peter Dillinger
4750421ece Replace most typedef with using= (#8751)
Summary:
Old typedef syntax is confusing

Most but not all changes with

    perl -pi -e 's/typedef (.*) ([a-zA-Z0-9_]+);/using $2 = $1;/g' list_of_files
    make format
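
For reference, a short sketch of the two spellings. The alias names are made up for illustration. A function-pointer typedef like the one below does not end in `identifier;`, so it would not be matched by a simple pattern of the kind shown above and would need a manual rewrite.

```cpp
#include <type_traits>

// Old spelling: the new name is buried in the middle of the declaration.
typedef int (*OldCompare)(const char*, const char*);

// New spelling: name on the left, type on the right.
using NewCompare = int (*)(const char*, const char*);

static_assert(std::is_same<OldCompare, NewCompare>::value,
              "both spellings name the same type");

// Only the alias-declaration form can be a template.
template <typename T>
using Ptr = T*;

static_assert(std::is_same<Ptr<int>, int*>::value,
              "Ptr<int> is an alias for int*");
```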

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8751

Test Plan: existing

Reviewed By: zhichao-cao

Differential Revision: D30745277

Pulled By: pdillinger

fbshipit-source-id: 6f65f0631c3563382d43347896020413cc2366d9
2021-09-07 11:31:59 -07:00
Drewryz
3b27725245 Fix a minor issue with initializing the test path (#8555)
Summary:
The PerThreadDBPath has already specified a slash. It does not need to be specified when initializing the test path.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8555

Reviewed By: ajkr

Differential Revision: D29758399

Pulled By: jay-zhuang

fbshipit-source-id: 6d2b878523e3e8580536e2829cb25489844d9011
2021-07-23 08:38:45 -07:00
Andrew Kryczka
ed8eb436db Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475)
Summary:
Various tests had disabled valgrind because it slowed down the CI runs and made them
time out (as is the case right now). Where a test was disabled with no comment,
I assumed slowness was the cause. For these tests that were slow under
valgrind, as well as the ones identified in https://github.com/facebook/rocksdb/issues/8352, this PR moves them
behind the compiler flag `-DROCKSDB_FULL_VALGRIND_RUN`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8475

Test Plan: running `make full_valgrind_test`, `make valgrind_test`, `make check`; will verify that they appear to work correctly

Reviewed By: jay-zhuang

Differential Revision: D29504843

Pulled By: ajkr

fbshipit-source-id: 2aac90749cfbd30d5ce11cb29a07a1b9314eeea7
2021-07-07 11:14:05 -07:00
mrambacher
d45b837701 Fix TSAN issue (#8477)
Summary:
Added mutex to fix TSAN issue

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8477

Reviewed By: zhichao-cao

Differential Revision: D29517053

Pulled By: mrambacher

fbshipit-source-id: 661ccb1f495b7d34874a79e0a3d7aea1123d6047
2021-07-01 11:53:18 -07:00
mrambacher
be219089ad Add BlobMetaData retrieval methods (#8273)
Summary:
Added BlobMetaData to ColumnFamilyMetaData and LiveBlobMetaData and DB API GetLiveBlobMetaData to retrieve it.

First pass at struct.  More tests and maybe fields to come...

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8273

Reviewed By: ltamasi

Differential Revision: D29102400

Pulled By: mrambacher

fbshipit-source-id: 8a2383a4446328be6b91dced9841fdd3dfc80b73
2021-06-28 08:13:29 -07:00
Levi Tamasi
cbb3b25915 Print blob file checksums as hex (#8437)
Summary:
Currently, blob file checksums are incorrectly dumped as raw bytes
in the `ldb manifest_dump` output (i.e. they are not printed as hex).
The patch fixes this and also updates some test cases to reflect that
the checksum value field in `BlobFileAddition` and `SharedBlobFileMetaData`
contains the raw checksum and not a hex string.
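
Converting a raw checksum to a printable form is a small transformation; a sketch follows. `ToHex` is a helper written for this note (uppercase digits are an arbitrary choice), not the function `ldb` uses.

```cpp
#include <string>

// Encodes each raw byte as two hexadecimal digits.
std::string ToHex(const std::string& raw) {
  static const char kDigits[] = "0123456789ABCDEF";
  std::string out;
  out.reserve(raw.size() * 2);
  for (unsigned char c : raw) {
    out.push_back(kDigits[c >> 4]);    // high nibble
    out.push_back(kDigits[c & 0x0F]);  // low nibble
  }
  return out;
}
```

Raw checksum bytes may include zero bytes and non-printable characters, which is why dumping them directly produces unreadable output.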

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8437

Test Plan:
`make check`
Tested using `ldb manifest_dump`

Reviewed By: akankshamahajan15

Differential Revision: D29284170

Pulled By: ltamasi

fbshipit-source-id: d11cfb3435b14cd73c8a3d3eb14fa0f9fa1d2228
2021-06-22 09:49:44 -07:00
Andrew Kryczka
c2a3424de5 Deflake DBTest.L0L1L2AndUpHitCounter (#8259)
Summary:
Previously we saw flakes on platforms like arm on CircleCI, such as the following:

```
Note: Google Test filter = DBTest.L0L1L2AndUpHitCounter
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBTest
[ RUN      ] DBTest.L0L1L2AndUpHitCounter
db/db_test.cc:5345: Failure
Expected: (TestGetTickerCount(options, GET_HIT_L0)) > (100), actual: 30 vs 100
[  FAILED  ] DBTest.L0L1L2AndUpHitCounter (150 ms)
[----------] 1 test from DBTest (150 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (150 ms total)
[  PASSED  ] 0 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] DBTest.L0L1L2AndUpHitCounter
```

The test was totally non-deterministic, e.g., flush/compaction timing would affect how many files on each level. Furthermore, it depended heavily on platform-specific details, e.g., by having a 32KB memtable, it could become full with a very different number of entries depending on the platform.

This PR rewrites the test to build a deterministic LSM with one file per level.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8259

Reviewed By: mrambacher

Differential Revision: D28178100

Pulled By: ajkr

fbshipit-source-id: 0a03b26e8d23c29d8297c1bccb1b115dce33bdcd
2021-05-04 11:02:59 -07:00
Levi Tamasi
0c6e4674a6 Fix a data race related to DB properties (#8206)
Summary:
Historically, the DB properties `rocksdb.cur-size-active-mem-table`,
`rocksdb.cur-size-all-mem-tables`, and `rocksdb.size-all-mem-tables` called
the method `MemTable::ApproximateMemoryUsage` for mutable memtables,
which is not safe without synchronization. This resulted in data races with
memtable inserts. The patch changes the code handling these properties
to use `MemTable::ApproximateMemoryUsageFast` instead, which returns a
cached value backed by an atomic variable. Two test cases had to be updated
for this change. `MemoryTest.MemTableAndTableReadersTotal` was fixed by
increasing the value size used so each value ends up in its own memtable,
which was the original intention (note: the test has been broken in the sense
that the test code didn't consider that memtable sizes below 64 KB get
increased to 64 KB by `SanitizeOptions`, and has been passing only by
accident). `DBTest.MemoryUsageWithMaxWriteBufferSizeToMaintain` relies on
completely up-to-date values and thus was changed to use `ApproximateMemoryUsage`
directly instead of going through the DB properties. Note: this should be safe in this case
since there's only a single thread involved.
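
The trade-off between an exact but unsynchronized counter and a cached atomic one can be sketched as follows. `ToyMemTable` is hypothetical, and refreshing the cache every fourth insert is an assumption made only to show that the fast value may lag behind the exact one.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical memtable that tracks its own memory usage.
class ToyMemTable {
 public:
  // Called only by the (single) writer thread in this sketch.
  void Insert(size_t entry_bytes) {
    exact_usage_ += entry_bytes;
    if (++inserts_since_refresh_ >= kRefreshEvery) {
      cached_usage_.store(exact_usage_, std::memory_order_relaxed);
      inserts_since_refresh_ = 0;
    }
  }

  // Exact, but reads a plain field: racy if another thread is inserting.
  size_t ApproximateUsage() const { return exact_usage_; }

  // Safe to call from any thread; may return a slightly stale value.
  size_t ApproximateUsageFast() const {
    return cached_usage_.load(std::memory_order_relaxed);
  }

 private:
  static const int kRefreshEvery = 4;  // arbitrary refresh interval
  size_t exact_usage_ = 0;
  int inserts_since_refresh_ = 0;
  std::atomic<size_t> cached_usage_{0};
};
```

A test that needs completely up-to-date numbers, and runs on a single thread, can read the exact value directly, which mirrors the choice described above.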

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8206

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D27866811

Pulled By: ltamasi

fbshipit-source-id: 7bd754d0565e0a65f1f7f0e78ffc093beef79394
2021-04-19 16:38:02 -07:00
Justin Chapman
d89483098f Assert unlimited max_open_files for FIFO compaction. (#8172)
Summary:
Resolves https://github.com/facebook/rocksdb/issues/8014

- Add an assertion on `DB::Open` to ensure `db_options.max_open_files` is unlimited if FIFO Compaction is being used.
- This is to align with what the docs mention and to prevent premature data deletion.
- Update tests to work with this assertion.
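
The check described above can be sketched as a small validation function. The types are hypothetical stand-ins; the sketch mirrors the RocksDB convention that `max_open_files = -1` means unlimited, and it returns `bool` instead of asserting.

```cpp
// Hypothetical stand-ins for the relevant option fields.
enum class ToyCompactionStyle { kLevel, kUniversal, kFifo };

struct ToyDbOptions {
  int max_open_files = -1;  // -1 means "unlimited"
  ToyCompactionStyle compaction_style = ToyCompactionStyle::kLevel;
};

// Returns true if the combination is acceptable: FIFO compaction requires
// an unlimited number of open files; other styles are unconstrained here.
bool ValidateFifoOpenFiles(const ToyDbOptions& options) {
  if (options.compaction_style == ToyCompactionStyle::kFifo &&
      options.max_open_files != -1) {
    return false;
  }
  return true;
}
```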

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8172

Test Plan:
```bash
$ make check -j$(nproc)

Generated TARGETS Summary:
- 6 libs
- 0 binarys
- 180 tests
```

Reviewed By: ajkr

Differential Revision: D27768792

Pulled By: thejchap

fbshipit-source-id: cf6350535e3a3577fec72bcba75b3c094dc7a6f3
2021-04-14 12:05:47 -07:00
Giuseppe Ottaviano
48cd7a3aae Fix flush reason attribution (#8150)
Summary:
Current flush reason attribution is misleading or incorrect (depending on what the original intention was):

- Flush due to WAL reaching its maximum size is attributed to `kWriteBufferManager`
- Flushes due to full write buffer and write buffer manager are not distinguishable, both are attributed to `kWriteBufferFull`

This changes the first to a new flush reason `kWALFull`, and splits the second between `kWriteBufferManager` and `kWriteBufferFull`.
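
The before and after mappings can be written out as two small functions. The enum and function names are hypothetical; only the three reason names and the mapping itself come from the description above.

```cpp
// What actually triggered the flush (hypothetical enum for this sketch).
enum class ToyTrigger { kWalSizeLimit, kGlobalMemoryLimit, kMemtableFull };

// The reason recorded for the flush.
enum class ToyFlushReason { kWriteBufferFull, kWriteBufferManager, kWalFull };

// Attribution before the fix: WAL flushes are mislabeled, and the two
// memory-related triggers are indistinguishable.
ToyFlushReason OldAttribution(ToyTrigger trigger) {
  switch (trigger) {
    case ToyTrigger::kWalSizeLimit:
      return ToyFlushReason::kWriteBufferManager;
    case ToyTrigger::kGlobalMemoryLimit:
    case ToyTrigger::kMemtableFull:
      break;
  }
  return ToyFlushReason::kWriteBufferFull;
}

// Attribution after the fix: one distinct reason per trigger.
ToyFlushReason NewAttribution(ToyTrigger trigger) {
  switch (trigger) {
    case ToyTrigger::kWalSizeLimit:
      return ToyFlushReason::kWalFull;
    case ToyTrigger::kGlobalMemoryLimit:
      return ToyFlushReason::kWriteBufferManager;
    case ToyTrigger::kMemtableFull:
      break;
  }
  return ToyFlushReason::kWriteBufferFull;
}
```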

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8150

Reviewed By: zhichao-cao

Differential Revision: D27569645

Pulled By: ot

fbshipit-source-id: 7e3c8ca186a6e71976e6b8e937297eebd4b769cc
2021-04-07 23:18:37 -07:00
Peter Dillinger
e7a60d01b2 Revamp WriteController (#8064)
Summary:
WriteController had a number of issues:
* It could introduce a delay of 1ms even if the write rate never exceeded the
configured delayed_write_rate.
* The DB-wide delayed_write_rate could be exceeded in a number of ways
with multiple column families:
  * Wiping all pending delay "debts" when another column family joins
  the delay with GetDelayToken().
  * Resetting last_refill_time_ to (now + sleep amount) means each
  column family can write with delayed_write_rate for large writes.
  * Updating bytes_left_ for a partial refill without updating
  last_refill_time_ would essentially give out random bonuses,
  especially to medium-sized writes.

Now the code is much simpler, with these issues fixed. See comments in
the new code and new (replacement) tests.
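
One simple way to avoid the listed problems is to keep a single shared "next free time" instead of a refillable byte budget. The sketch below illustrates that idea only; `ToyWriteDelayer` is hypothetical and is not the algorithm in the revamped WriteController.

```cpp
#include <cstdint>

// Hypothetical DB-wide write delayer shared by all column families.
class ToyWriteDelayer {
 public:
  explicit ToyWriteDelayer(uint64_t bytes_per_sec)
      : bytes_per_sec_(bytes_per_sec) {}

  // Returns how long (in microseconds) a write of `bytes` arriving at
  // `now_micros` should sleep so the configured rate is not exceeded.
  uint64_t GetDelayMicros(uint64_t now_micros, uint64_t bytes) {
    // Idle time is not banked: an idle period never creates a bonus.
    if (next_free_micros_ < now_micros) next_free_micros_ = now_micros;
    const uint64_t delay = next_free_micros_ - now_micros;
    // The time this write "occupies" is charged to everyone after it.
    next_free_micros_ += bytes * 1000000ULL / bytes_per_sec_;
    return delay;
  }

 private:
  uint64_t bytes_per_sec_;
  uint64_t next_free_micros_ = 0;
};
```

In this model a write that stays within the rate incurs no delay, and pending debt is never wiped when another writer joins, because there is only one shared timeline.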

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8064

Test Plan: new tests, better than old tests

Reviewed By: mrambacher

Differential Revision: D27064936

Pulled By: pdillinger

fbshipit-source-id: 497c23fe6819340b8f3d440bd634d8a2bc47323f
2021-03-18 09:47:31 -07:00
Levi Tamasi
b708b166dc Fix a harmless data race affecting two test cases (#8055)
Summary:
`DBTest.GetLiveBlobFiles` and `ObsoleteFilesTest.BlobFiles` both modify the
current `Version` in their setup phase, implicitly assuming that no other
threads would touch the `Version` while this is happening. The periodic
stats dumper thread violates this assumption; the patch fixes this by
disabling it in the affected test cases. (Note: the data race is
harmless in the sense that it only affects test code.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8055

Test Plan:
```
COMPILE_WITH_TSAN=1 make db_test -j24
gtest-parallel --repeat=10000 ./db_test --gtest_filter="*GetLiveBlobFiles"
COMPILE_WITH_TSAN=1 make obsolete_files_test -j24
gtest-parallel --repeat=10000 ./obsolete_files_test --gtest_filter="*BlobFiles"
```

Reviewed By: riversand963

Differential Revision: D27022715

Pulled By: ltamasi

fbshipit-source-id: b6cc77ed63d8bc1cbe0603522ff1a572182fc9ab
2021-03-12 16:44:35 -08:00
Levi Tamasi
974458891c Revert "Turn on memtable bloom filter by default. (#6584)" (#7939)
Summary:
This reverts commit ee79a28963.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7939

Reviewed By: siying

Differential Revision: D26298564

Pulled By: ltamasi

fbshipit-source-id: 6d663516e82e6de436f8d5317932ca9a98e152bd
2021-02-06 22:34:30 -08:00
sdong
ee79a28963 Turn on memtable bloom filter by default. (#6584)
Summary:
Memtable bloom filter is useful in many use cases. Turning it on by default with a conservative 1.5% memory budget benefits more use cases than it hurts.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6584

Test Plan: Run all existing tests.

Reviewed By: pdillinger

Differential Revision: D20626739

fbshipit-source-id: 1dd45532b932139552519b8c2682bd954550c2f9
2021-02-05 12:59:46 -08:00
Adam Retter
6e0f62f2b6 Add more tests to ASSERT_STATUS_CHECKED (3), API change (#7715)
Summary:
Third batch of adding more tests to ASSERT_STATUS_CHECKED.

* db_compaction_filter_test
* db_compaction_test
* db_dynamic_level_test
* db_inplace_update_test
* db_sst_test
* db_tailing_iter_test
* db_io_failure_test

Also update GetApproximateSizes APIs to all return Status.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7715

Reviewed By: jay-zhuang

Differential Revision: D25806896

Pulled By: pdillinger

fbshipit-source-id: 6cb9d62ba5a756c645812754c596ad3995d7c262
2021-01-06 14:15:02 -08:00
Peter Dillinger
a727efca99 Remove flaky, redundant, and dubious DBTest.SparseMerge (#7800)
Summary:
This test would occasionally fail like this:

    WARNING: c:\users\circleci\project\db\db_test.cc(1343): error: Expected:
    (dbfull()->TEST_MaxNextLevelOverlappingBytes(handles_[1])) <= (20 * 1048576), actual: 33501540 vs 20971520

And being a super old test, it's not structured in a sound way. And it appears that DBTest2.MaxCompactionBytesTest is a better test of what SparseMerge was intended to test. In fact, SparseMerge fails if I set

    options.max_compaction_bytes = options.target_file_size_base * 1000;

Thus, we are removing this negative-value test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7800

Test Plan: Q.E.D.

Reviewed By: ajkr

Differential Revision: D25693366

Pulled By: pdillinger

fbshipit-source-id: 9da07d4dce0559547fc938b2163a2015e956c548
2020-12-23 11:08:12 -08:00
mrambacher
f35f7f2704 Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566)
Summary:
This PR does a few things:

1.  The MockFileSystem class was split out from the MockEnv.  This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one).  The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation.

2.  Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set.  To accomplish this, a few things were needed:
- The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated
- The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory).

3.  Updated the test framework to have a ROCKSDB_GTEST_SKIP macro.  This can be used to flag tests that are skipped.  Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10).

I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV,  both, and neither under both MacOS and RedHat.  A few tests were disabled/skipped for the MEM/ENCRYPTED cases.  The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem.  (I will also push a change to disable those tests soon).  There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged.

Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently.

Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale.  I do not know how to do that, so if someone could write that job, it would be appreciated :)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566

Reviewed By: zhichao-cao

Differential Revision: D24408980

Pulled By: jay-zhuang

fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f
2020-10-27 10:33:09 -07:00
Levi Tamasi
786c1a2cc4 Reduce the number of iterations in DBTest.FileCreationRandomFailure (#7481)
Summary:
`DBTest.FileCreationRandomFailure` frequently times out during our
continuous test runs. (It's a case of "stress test posing as unit test.")
The patch reduces the number of iterations to avoid this. Note that
the lower numbers are still sufficient to trigger both flushes and
compactions, so test coverage is still the same.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7481

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D24034712

Pulled By: ltamasi

fbshipit-source-id: 8731a9446e5a121a1041b00f0df473b9f714935a
2020-10-01 10:42:58 -07:00
sdong
7508175558 Introduce options.check_flush_compaction_key_order (#7467)
Summary:
Introduce a new option options.check_flush_compaction_key_order, set to true by default, which checks the key order during flush and compaction and fails the operation if the order is violated.
Also did a minor refactor of the hash checking code, consolidating the hashing logic into a validation class, where the key ordering logic is added.
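
A sketch of an order check that can be switched off. `ToyKeyOrderValidator` is hypothetical; plain bytewise comparison and a strictly increasing requirement are assumptions of this sketch, not a description of the actual validation class.

```cpp
#include <string>

// Hypothetical validator fed with every key written by a flush or compaction.
class ToyKeyOrderValidator {
 public:
  explicit ToyKeyOrderValidator(bool enabled) : enabled_(enabled) {}

  // Returns false if checking is enabled and `key` is not strictly greater
  // than the previously added key; the caller would then fail the operation.
  bool Add(const std::string& key) {
    const bool ok = !enabled_ || !has_prev_ || prev_ < key;
    prev_ = key;
    has_prev_ = true;
    return ok;
  }

 private:
  bool enabled_;
  bool has_prev_ = false;
  std::string prev_;
};
```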

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7467

Test Plan: Add unit tests to validate the check can catch reordering in flush and compaction, and can be properly disabled.

Reviewed By: riversand963

Differential Revision: D24010683

fbshipit-source-id: 8dd6292d2cda8006054e9ded7cfa4bf405f0527c
2020-10-01 10:10:26 -07:00
mrambacher
7d472accdc Bring the Configurable options together (#5753)
Summary:
This PR combines the changes that make ColumnFamilyOptions, TableFactory, and DBOptions Configurable into a single PR, resolving any merge conflicts

Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753

Reviewed By: ajkr

Differential Revision: D23385030

Pulled By: zhichao-cao

fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
2020-09-14 17:01:01 -07:00
Akanksha Mahajan
963314ffd6 Add unit test for max_write_buffer_size_to_maintain (#7311)
Summary:
Add a unit test case to check memory usage when
max_write_buffer_size_to_maintain is set, verifying whether flushed immutable
memtables are trimmed in a timely manner.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7311

Test Plan: Compared the results with before bug fix.

Reviewed By: ltamasi

Differential Revision: D23321702

Pulled By: akankshamahajan15

fbshipit-source-id: da04ee21137d641a07fd499a9e2749eb036fcb1e
2020-08-28 17:38:05 -07:00
sdong
b194c21bba Whole DBTest to skip fsync (#7274)
Summary:
After https://github.com/facebook/rocksdb/pull/7036, we still see additional DBTest cases that can time out when running 10 or 20 in parallel. Expand skip-fsync mode to the whole DBTest. To be conservative, other tests still do not use this mode.

This commit reinstates https://github.com/facebook/rocksdb/issues/7049, whose un-revert was lost in an automatic
infrastructure mis-merge.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7274

Test Plan: Run all existing files.

Reviewed By: pdillinger

Differential Revision: D23177444

fbshipit-source-id: 1f61690b2ac6333c3b2c87176fef6b2cba086b33
2020-08-17 18:42:25 -07:00
Peter Dillinger
6ac1d25fd0 Fix+clean up handling of mock sleeps (#7101)
Summary:
We have a number of tests hanging on MacOS and Windows due to
mishandling of code for mock sleeps. In addition, the code was in
terrible shape because the same variable (addon_time_) would sometimes
refer to microseconds and sometimes to seconds. One test even assumed it
was nanoseconds but was written to pass anyway.

This has been cleaned up so that DB tests generally use a SpecialEnv
function to mock sleep, for either some number of microseconds or seconds
depending on the function called. But to call one of these, the test must first
call SetMockSleep (precondition enforced with assertion), which also turns
sleeps in RocksDB into mock sleeps. To also remove accounting for actual
clock time, call SetTimeElapseOnlySleepOnReopen, which implies
SetMockSleep (on DB re-open). This latter setting only works by applying
on DB re-open, otherwise havoc can ensue if Env goes back in time with
DB open.
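
The precondition and the single time unit described above could be sketched roughly as follows. `MockClock` is a hypothetical stand-in rather than the actual SpecialEnv code, and it models only the "time elapses only through sleep" case, where real clock time is not counted at all.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mock clock (not SpecialEnv). All time is kept in microseconds
// so that one variable never means seconds in one place and microseconds in
// another.
class MockClock {
 public:
  // Must be called before any mock sleep; models turning sleeps into mock
  // sleeps.
  void SetMockSleep(bool enabled = true) { mock_sleep_ = enabled; }

  void MockSleepForMicroseconds(int64_t micros) {
    assert(mock_sleep_);  // precondition enforced with an assertion
    addon_micros_ += micros;
  }

  void MockSleepForSeconds(int64_t seconds) {
    MockSleepForMicroseconds(seconds * kMicrosPerSecond);
  }

  // Only mocked sleeps advance this clock; actual clock time is ignored.
  int64_t NowMicros() const { return addon_micros_; }

 private:
  static constexpr int64_t kMicrosPerSecond = 1000000;
  bool mock_sleep_ = false;
  int64_t addon_micros_ = 0;
};
```

Because the clock only ever moves forward by the requested amount, tests can reason about relative time without resetting it to 0.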

More specifics:

Removed some unused test classes, and updated comments on the general
problem.

Fixed DBSSTTest.GetTotalSstFilesSize using a sync point callback instead
of mock time. For this we have the only modification to production code,
inserting a sync point callback in flush_job.cc, which is not a change to
production behavior.

Removed unnecessary resetting of mock times to 0 in many tests. RocksDB
deals in relative time. Any behaviors relying on absolute date/time are likely
a bug. (The above test DBSSTTest.GetTotalSstFilesSize was the only one
clearly injecting a specific absolute time for actual testing convenience.) Just
in case I misunderstood some test, I put this note in each replacement:
// NOTE: Presumed unnecessary and removed: resetting mock time in env

Strengthened some tests like MergeTestTime, MergeCompactionTimeTest, and
FilterCompactionTimeTest in db_test.cc

stats_history_test and blob_db_test are each their own beast, rather deeply
dependent on MockTimeEnv. Each gets its own variant of a work-around for
TimedWait in a mock time environment. (Reduces redundancy and
inconsistency in stats_history_test.)

Intended follow-up:

Remove TimedWait from the public API of InstrumentedCondVar, and only
make that accessible through Env by passing in an InstrumentedCondVar and
a deadline. Then the Env implementations mocking time can fix this problem
without using sync points. (Test infrastructure using sync points interferes
with individual tests' control over sync points.)

With that change, we can simplify/consolidate the scattered work-arounds.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7101

Test Plan: make check on Linux and MacOS

Reviewed By: zhichao-cao

Differential Revision: D23032815

Pulled By: pdillinger

fbshipit-source-id: 7f33967ada8b83011fb54e8279365c008bd6610b
2020-08-11 12:41:30 -07:00
Remington Brasga
633bff2f19 Fixed typo on Value mismatch error in db_test (#6587)
Summary:
The debug is supposed to print out two keys to show the value mismatch, which was compared just a few lines above.

However, the actual print-out shows the same values (so they obviously won't be mismatched).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6587

Reviewed By: riversand963

Differential Revision: D23025279

Pulled By: ajkr

fbshipit-source-id: 4c6c35bc60b273f13c08b5464b6f690d8a5cfe41
2020-08-10 10:06:08 -07:00
Cheng Chang
96ce0470a7 Clean snapshot dir before taking snapshot (#7156)
Summary:
`DBTest::SnapshotFiles` runs the tests in a `while` loop.
Currently, the snapshot directory is not cleaned up in each loop, so previous snapshot files may remain in the next loop's snapshot.
While working on https://github.com/facebook/rocksdb/pull/7129 and checking the tracked WALs in MANIFEST, I found that this test always fails because it reads an unknown WAL. It turns out that the unknown WAL is left over from previous loops.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7156

Test Plan: make db_test && ./db_test --gtest_filter=*SnapshotFiles

Reviewed By: siying

Differential Revision: D22668360

Pulled By: cheng-chang

fbshipit-source-id: 69d4aa3506038ba30e218e8ae966357935a99c6c
2020-07-22 13:54:01 -07:00
mrambacher
c7c7b07f06 More Makefile Cleanup (#7097)
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env

These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.

Tested both release and debug builds via Make and CMake for both static and shared libraries.

More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097

Reviewed By: riversand963

Differential Revision: D22463160

Pulled By: pdillinger

fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
2020-07-09 14:35:17 -07:00
Jay Zhuang
00de699096 Replace reinterpret_cast with static_cast_with_check (#7067)
Summary:
Replace `reinterpret_cast` with `static_cast_with_check` for `DBImpl` and `ColumnFamilyHandleImpl`.
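
For illustration, a checked downcast of this kind can be sketched as below. `checked_static_cast`, `Handle`, and `HandleImpl` are hypothetical names, and the sketch assumes RTTI is available; it is not the RocksDB helper itself.

```cpp
#include <cassert>

// Hypothetical helper: performs a static_cast, and when assertions are
// enabled, cross-checks the result against dynamic_cast.
template <class Derived, class Base>
Derived* checked_static_cast(Base* base) {
  Derived* result = static_cast<Derived*>(base);
  assert(result == dynamic_cast<Derived*>(base));
  return result;
}

// Illustrative types standing in for an interface and its implementation.
struct Handle {
  virtual ~Handle() = default;
};

struct HandleImpl : Handle {
  int id = 7;
};
```

Unlike `reinterpret_cast`, this rejects unrelated types at compile time and catches a wrong dynamic type in debug builds, while costing nothing extra when assertions are compiled out.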
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7067

Reviewed By: siying

Differential Revision: D22361587

Pulled By: jay-zhuang

fbshipit-source-id: dfe9e8f3af39c3d27cc372c55ab9ad905eb0a5a1
2020-07-02 19:25:41 -07:00
Zitan Chen
373d5ac485 BackupEngine verifies table file checksums on creating new backups (#7015)
Summary:
When table file checksums are enabled and stored in the DB manifest by using the RocksDB default crc32c checksum function, BackupEngine will calculate the crc32c checksum of the file to be copied and compare the calculated result with the one stored in the DB manifest before copying the file to the backup directory.

After copying to the backup directory, BackupEngine will verify the checksum of the copied file with the one calculated before copying. This helps detect some rare corruption events such as bit-flips during the copying process.

No verification with checksums in DB manifest will be performed if the table file checksum function is not the RocksDB default crc32c checksum function.

In addition, if `share_table_files` and `share_files_with_checksum` are true, BackupEngine will compare the checksums computed before and after copying of the table files.
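
The verify-before and verify-after flow can be sketched as below. This is an illustration under stated assumptions: in-memory strings stand in for files, `VerifiedCopy` and `CopyResult` are hypothetical names, and `Crc32c` is a simple bitwise CRC-32C written for this example, not RocksDB's optimized implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simple bitwise CRC-32C (Castagnoli, reflected polynomial 0x82F63B78).
uint32_t Crc32c(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int i = 0; i < 8; ++i) {
      crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

enum class CopyResult { kOk, kSourceMismatch, kCopyCorrupted };

// Hypothetical copy routine: checks the source against the checksum recorded
// in the manifest, copies, then re-checks the copy against the value computed
// before copying.
CopyResult VerifiedCopy(const std::string& src, uint32_t manifest_checksum,
                        std::string* dst) {
  const uint32_t before = Crc32c(src);
  if (before != manifest_checksum) {
    return CopyResult::kSourceMismatch;
  }
  *dst = src;  // stands in for copying the file to the backup directory
  if (Crc32c(*dst) != before) {
    return CopyResult::kCopyCorrupted;
  }
  return CopyResult::kOk;
}
```

The second comparison is what would surface a bit-flip introduced during the copy itself.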

Corresponding tests are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7015

Test Plan: Passed make check

Reviewed By: pdillinger

Differential Revision: D22165732

Pulled By: gg814

fbshipit-source-id: ee0e8cc397c455eba64545c29380b9d9853588ec
2020-07-02 18:15:12 -07:00
Peter Dillinger
52d59e0c93 Revert "Whole DBTest to skip fsync (#7049)" (#7070)
Summary:
This reverts commit 4f1534bdb0.

This commit caused failures and deadlocks in
MultiThreadedDBTest.MultiThreaded/69 and others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7070

Reviewed By: riversand963

Differential Revision: D22358778

Pulled By: pdillinger

fbshipit-source-id: faf8f2cb469a7063a113921c8e9c64a9f7610dac
2020-07-02 10:22:43 -07:00
sdong
4f1534bdb0 Whole DBTest to skip fsync (#7049)
Summary:
After https://github.com/facebook/rocksdb/pull/7036, we still see additional DBTest cases that can time out when running 10 or 20 in parallel. Expand skip-fsync mode to the whole of DBTest. To be conservative, other tests are still kept out of this mode.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7049

Test Plan: Run all existing files.

Reviewed By: pdillinger

Differential Revision: D22301700

fbshipit-source-id: f9a9e3b3b26ce640665a47cb8bff33ba0c89b565
2020-07-01 19:37:56 -07:00
Peter Dillinger
e2fd501d44 Stabilize DBTest.ApproximateSizesMemTable (#7064)
Summary:
Random memtable layouts could cause random failure,
reproducible with command below running for a while. Test now using
deterministic behavior.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7064

Test Plan: while ./db_test --gtest_filter=*SizesMemTable*; do true; done

Reviewed By: siying

Differential Revision: D22339442

Pulled By: pdillinger

fbshipit-source-id: 8e74e5a9b5e88f7030854045a22c12cf561d5de6
2020-07-01 13:52:20 -07:00
Levi Tamasi
e367bc7f4b Clean up blob files based on the linked SST set (#7001)
Summary:
The earlier `VersionBuilder` code only cleaned up blob files that were
marked as entirely consisting of garbage using `VersionEdits` with
`BlobFileGarbage`. This covers the cases when table files go through
regular compaction, where we iterate through the KVs and thus have an
opportunity to calculate the amount of garbage (that is, most cases).
However, it does not help when table files are simply dropped (e.g. deletion
compactions or the `DeleteFile` API). To deal with such cases, the patch
adds logic that cleans up all blob files at the head of the list until the first
one with linked SSTs is found. (As an example, let's assume we have blob files
with numbers 1..10, and the first one with any linked SSTs is number 8.
This means that SSTs in the `Version` only rely on blob files with numbers >= 8,
and thus 1..7 are no longer needed.)
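
The head-of-list cleanup in the example above can be sketched as follows, assuming blob files are kept in a map ordered by file number with the set of SST file numbers linked to each. `BlobFiles` and `DropLeadingUnlinkedBlobFiles` are hypothetical names, not the `VersionBuilder` code.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical model: blob file number -> SST file numbers linked to it.
// std::map keeps the entries ordered by blob file number.
using BlobFiles = std::map<uint64_t, std::set<uint64_t>>;

// Removes blob files from the head of the ordered map until the first one
// that still has linked SSTs is found.
void DropLeadingUnlinkedBlobFiles(BlobFiles* blob_files) {
  auto it = blob_files->begin();
  while (it != blob_files->end() && it->second.empty()) {
    it = blob_files->erase(it);
  }
}
```

With blob files 1..10 where 8 is the first one with a linked SST, this leaves files 8, 9, and 10 in place, including any later file that currently has no linked SSTs.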

The code change itself is pretty small; however, changing the logic like this
necessitated changes to some tests that have been added recently (namely
to the ones that use blob files in isolation, i.e. without any table files referring
to them). Some of these cases were fixed by bypassing `VersionBuilder` altogether
in order to keep the tests simple (which actually makes them more proper unit tests
as well), while the `VersionBuilder` unit tests were fixed by adding dummy table
files to the test cases as needed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7001

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D22119474

Pulled By: ltamasi

fbshipit-source-id: c6547141355667d4291d9661d6518eb741e7b54a
2020-06-30 15:31:21 -07:00
sdong
58547e533b Disable fsync in some tests to speed them up (#7036)
Summary:
Fsyncing files is not providing more test coverage in many tests. Provide an option in SpecialEnv to turn it off to speed it up and enable this option in some tests with relatively long run time.
Most of those tests can be divided into parameterized gtests too. These two speed-up approaches are orthogonal and we can do both if needed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7036

Test Plan: Run all tests and make sure they pass.

Reviewed By: ltamasi

Differential Revision: D22268084

fbshipit-source-id: 6d4a838a1b7328c13931a2a5d93de57aa02afaab
2020-06-29 16:56:59 -07:00
Yanqin Jin
e66199d848 First step towards handling MANIFEST write error (#6949)
Summary:
This PR provides preliminary support for handling IO error during MANIFEST write.
File write/sync is not guaranteed to be atomic. If we encounter an IOError while writing/syncing to the MANIFEST file, we cannot be sure about the state of the MANIFEST file. The version edits may or may not have reached the file. During cleanup, if we delete the newly-generated SST files referenced by the pending version edit(s), but the version edit(s) are actually persisted in the MANIFEST, then the next recovery attempt will process the version edit(s) and then fail since the SST files have already been deleted.
One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach.
If an IOError is detected during MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (set by the underlying file system), either RocksDB or the application can call `DB::Resume()`, or simply shut down and restart. During `Resume()`, RocksDB will try to switch to a new MANIFEST and write all existing in-memory version storage to the new file. If this succeeds, then RocksDB may proceed. If all recovery is completed, then file deletions will be re-enabled.
Note that multiple threads can call `LogAndApply()` at the same time, though only one of them will be going through the process of MANIFEST write, possibly batching the version edits of other threads. When the leading MANIFEST writer finishes, all of the MANIFEST writing threads in this batch will have the same IOError. They will all call `ErrorHandler::SetBGError()`, in which file deletion will be disabled.
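
As a rough illustration of the flow above, the state transitions can be modeled as below. This is a highly simplified, hypothetical model (`ManifestErrorModel` and its methods are invented for this sketch); it does not reflect the actual `ErrorHandler` or `DB::Resume()` implementation.

```cpp
#include <cassert>

// Hypothetical state model: an IO error during MANIFEST write disables file
// deletions; a successful resume re-enables them.
class ManifestErrorModel {
 public:
  // Called when a MANIFEST write/sync reports an IO error. Safe to call
  // repeatedly, as several batched writers may report the same error.
  void OnManifestIOError() {
    has_error_ = true;
    file_deletions_enabled_ = false;
  }

  // `new_manifest_ok` stands in for whether switching to a new MANIFEST and
  // writing the in-memory version state to it succeeded.
  bool Resume(bool new_manifest_ok) {
    if (!has_error_) {
      return true;
    }
    if (!new_manifest_ok) {
      return false;  // stay in the error state; deletions remain disabled
    }
    has_error_ = false;
    file_deletions_enabled_ = true;
    return true;
  }

  bool has_error() const { return has_error_; }
  bool file_deletions_enabled() const { return file_deletions_enabled_; }

 private:
  bool has_error_ = false;
  bool file_deletions_enabled_ = true;
};
```

Keeping deletions disabled until recovery completes is what protects SST files that a possibly persisted version edit may still reference.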

Possible future directions:
- Add an `ErrorContext` structure so that it is easier to pass more info to `ErrorHandler`. Currently, as in this example, a new `BackgroundErrorReason` has to be added.

Test plan (dev server):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949

Reviewed By: anand1976

Differential Revision: D22026020

Pulled By: riversand963

fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
2020-06-24 19:07:08 -07:00
Zitan Chen
88db97b06d Add a DB Session ID (#6959)
Summary:
Added DB::GetDbSessionId by using the same format and machinery as DB::GetDbIdentity.
The DB Session ID is generated (and therefore, updated) each time a DB object is opened. It is written to the LOG file right after the line of “DB SUMMARY”.
A test for the uniqueness, for different openings and during the same opening, is also added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6959

Test Plan: Passed make check

Reviewed By: zhichao-cao

Differential Revision: D21951721

Pulled By: gg814

fbshipit-source-id: 958a48a612db49a39998ea703cded45987d3fa8b
2020-06-15 10:47:02 -07:00