Commit Graph

1128 Commits

Author SHA1 Message Date
Zhichao Cao
b0fd1cc45a Introduce a new trace file format (v 0.2) for better extension (#7977)
Summary:
The trace file record and payload encode is fixed, which requires complex backward compatibility resolving. This PR introduce a new trace file format, which makes it easier to add new entries to the payload and does not have backward compatible issues. V 0.1 is still supported in this PR. Added the tracing for lower_bound and upper_bound for iterator.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7977

Test Plan: make check. tested with old trace file in replay and analyzing.

Reviewed By: anand1976

Differential Revision: D26529948

Pulled By: zhichao-cao

fbshipit-source-id: ebb75a127ce3c07c25a1ccc194c551f917896a76
2021-02-18 23:05:35 -08:00
Levi Tamasi
dab4fe5bcd Add checkpoint support to BlobDB (#7959)
Summary:
The patch adds checkpoint support to BlobDB. Blob files are hard linked or
copied, depending on whether the checkpoint directory is on the same filesystem
or not, similarly to table files.

TODO: Add support for blob files to `ExportColumnFamily` and to the checksum
verification logic used by backup/restore.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959

Test Plan: Ran `make check` and the crash test for a while.

Reviewed By: riversand963

Differential Revision: D26434768

Pulled By: ltamasi

fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5
2021-02-17 12:42:36 -08:00
Levi Tamasi
0743eba0c4 Add support for the integrated BlobDB to db_bench (#7956)
Summary:
The patch adds the configuration options of the new BlobDB implementation
to `db_bench` and adjusts the help messages of the old (`StackableDB`-based)
BlobDB's options to make it clear which implementation they pertain to.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7956

Test Plan: Ran `make check` and `db_bench` with the new options.

Reviewed By: jay-zhuang

Differential Revision: D26384808

Pulled By: ltamasi

fbshipit-source-id: b4405bb2c56cfd3506d4c32e3329c08dfdf69c94
2021-02-17 11:10:18 -08:00
Jay Zhuang
00519187a6 Update internal build script (#7957)
Summary:
For internal build.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7957

Test Plan: https://www.internalfb.com/intern/sandcastle/group/nonce/6483925578523975/

Reviewed By: ltamasi

Differential Revision: D26410919

Pulled By: jay-zhuang

fbshipit-source-id: a5f9516c91ea85c384a4208aa73331ecad833d01
2021-02-11 14:55:43 -08:00
David CARLIER
14fbb43f3e db_bench: dump cpu info for Mac. (#7932)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7932

Reviewed By: jay-zhuang

Differential Revision: D26316480

Pulled By: zhichao-cao

fbshipit-source-id: 3e002e49fcb7f60bc9270550a6b3e182fe197551
2021-02-10 12:56:44 -08:00
Levi Tamasi
4fba83b4c2 Fix db_bench_tool_test (#7935)
Summary:
The patch fixes the build for `db_bench_tool_test` and makes the tests pass.
Namely, it fixes the following issues:

* https://github.com/facebook/rocksdb/issues/7703 removed the member variable `fs_` but the test case `OptionsFileMultiLevelUniversal`
was not updated.
* https://github.com/facebook/rocksdb/issues/7344 fixed the `OptionsFile` test case for the case when Snappy is *not* available but at the
same time broke it for the case when it *is* available. (The test used a default-constructed
`ColumnFamilyOptions` object, and the default value of the `compression` option is either
Snappy or no compression depending on whether Snappy is supported.)
* The test used `google::ParseCommandLineFlags` instead of
`GFLAGS_NAMESPACE::ParseCommandLineFlags`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7935

Test Plan: Ran the test both with and without Snappy support.

Reviewed By: zhichao-cao

Differential Revision: D26269765

Pulled By: ltamasi

fbshipit-source-id: b7303d8a981ab299d22ab540e0cbd12d149ed9bb
2021-02-05 15:41:48 -08:00
Levi Tamasi
0288bdbc53 Add the integrated BlobDB to the stress/crash tests (#7900)
Summary:
The patch adds support for the options related to the new BlobDB implementation
to `db_stress`, including support for dynamically adjusting them using `SetOptions`
when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
are specified. (The latter is used to prevent the options from being enabled when
incompatible features are in use.)

The patch also updates the `db_stress` help messages of the existing stacked BlobDB
related options to clarify that they pertain to the old implementation. In addition, it
adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900

Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.

Reviewed By: jay-zhuang

Differential Revision: D26094913

Pulled By: ltamasi

fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
2021-02-02 11:41:18 -08:00
Zhichao Cao
108e6b6354 Return Status::OK for unimplemented write batch handler in trace analyzer (#7910)
Summary:
The unimplemented handler will return Status::InvalidArgument() and caused issues when using trace analyzer for write batch record. Override with returning Status::OK()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7910

Test Plan: tested with real trace, make check

Reviewed By: siying

Differential Revision: D26154327

Pulled By: zhichao-cao

fbshipit-source-id: bcdefd4891f839b2e89e4c079f9f430245f482fb
2021-02-02 10:52:28 -08:00
Andrew Kryczka
78ee8564ad Integrity protection for live updates to WriteBatch (#7748)
Summary:
This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).

The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.

When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748

Test Plan:
- an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
- add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
- [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.

Reviewed By: pdillinger

Differential Revision: D25754492

Pulled By: ajkr

fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
2021-01-29 12:18:58 -08:00
mrambacher
4a09d632c4 Remove Legacy and Custom FileWrapper classes from header files (#7851)
Summary:
Removed the uses of the Legacy FileWrapper classes from the source code.  The wrappers were creating an additional layer of indirection/wrapping, as the Env already has a FileSystem.

Moved the Custom FileWrapper classes into the CustomEnv, as these classes are really for the private use the the CustomEnv class.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7851

Reviewed By: anand1976

Differential Revision: D26114816

Pulled By: mrambacher

fbshipit-source-id: db32840e58d969d3a0fa6c25aaf13d6dcdc74150
2021-01-28 22:10:32 -08:00
mrambacher
0a9a05ae12 Make builds reproducible (#7866)
Summary:
Closes https://github.com/facebook/rocksdb/issues/7035

Changed how build_version.cc was generated:
- Included the GIT tag/branch in the build_version file
- Changed the "Build Date" to be:
      - If the GIT branch is "clean" (no changes), the date of the last git commit
      - If the branch is not clean, the current date
 - Added APIs to access the "build information", rather than accessing the strings directly.

The build_version.cc file is now regenerated whenever the library objects are rebuilt.

Verified that the built files remain the same size across builds on a "clean build" and the same information is reported by sst_dump --version

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7866

Reviewed By: pdillinger

Differential Revision: D26086565

Pulled By: mrambacher

fbshipit-source-id: 6fcbe47f6033989d5cf26a0ccb6dfdd9dd239d7f
2021-01-28 17:42:16 -08:00
anand76
4ee991b1e6 Cleanup multiple DBs after running db_bench in multi-DB mode (#7891)
Summary:
Currently, db_bench cleanup only deletes the main DB, if there's one.
Multiple DBs that are opened when --num_multi_db is specified are not
deleted, which can lead to crashes due to running compaction threads on
process exit.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7891

Test Plan: Run regression test

Reviewed By: jay-zhuang

Differential Revision: D26049914

Pulled By: anand1976

fbshipit-source-id: acef2821001ca5e208a96a6a273c724e56353316
2021-01-26 11:12:22 -08:00
Levi Tamasi
f9a30e0a5a Add 6.16 and 6.17 to check_format_compatible.sh (#7895)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7895

Test Plan: `tools/check_format_compatible.sh`

Reviewed By: zhichao-cao

Differential Revision: D26055885

Pulled By: ltamasi

fbshipit-source-id: fd669a439e7bf924b6abd9ef209130f528768c06
2021-01-26 09:39:21 -08:00
mrambacher
12f1137355 Add a SystemClock class to capture the time functions of an Env (#7858)
Summary:
Introduces and uses a SystemClock class to RocksDB.  This class contains the time-related functions of an Env and these functions can be redirected from the Env to the SystemClock.

Many of the places that used an Env (Timer, PerfStepTimer, RepeatableThread, RateLimiter, WriteController) for time-related functions have been changed to use SystemClock instead.  There are likely more places that can be changed, but this is a start to show what can/should be done.  Over time it would be nice to migrate most (if not all) of the uses of the time functions from the Env to the SystemClock.

There are several Env classes that implement these functions.  Most of these have not been converted yet to SystemClock implementations; that will come in a subsequent PR.  It would be good to unify many of the Mock Timer implementations, so that they behave similarly and be tested similarly (some override Sleep, some use a MockSleep, etc).

Additionally, this change will allow new methods to be introduced to the SystemClock (like https://github.com/facebook/rocksdb/issues/7101 WaitFor) in a consistent manner across a smaller number of classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7858

Reviewed By: pdillinger

Differential Revision: D26006406

Pulled By: mrambacher

fbshipit-source-id: ed10a8abbdab7ff2e23d69d85bd25b3e7e899e90
2021-01-25 22:09:11 -08:00
Akanksha Mahajan
1d226018af In IOTracing, add filename with each operation in trace file. (#7885)
Summary:
1. In IOTracing, add filename with each IOTrace record. Filename is stored in file object (Tracing Wrappers).
         2. Change the logic of figuring out which additional information (file_size,
            length, offset etc) needs to be store with each operation
            which is different for different operations.
            When new information will be added in future (depends on operation),
            this change would make the future additions simple.

Logic: In IOTraceRecord, io_op_data is added and its
         bitwise positions represent which additional information need
         to added in the record from enum IOTraceOp. Values in IOTraceOp represent bitwise positions.
         So if length and offset needs to be stored (IOTraceOp::kIOLen
         is 1 and IOTraceOp::kIOOffset is 2), position 1 and 2 (from rightmost bit) will be set
         and io_op_data will contain 110.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7885

Test Plan: Updated io_tracer_test and verified the trace file manually.

Reviewed By: anand1976

Differential Revision: D25982353

Pulled By: akankshamahajan15

fbshipit-source-id: ebfc5539cc0e231d7794a6b42b73f5403e360b22
2021-01-25 14:37:35 -08:00
anand76
7189ea8fb7 Make regression test load options from file for checkpoint (#7864)
Summary:
The regression_test.sh script checkpoints the DB directory before running db_bench on it. Specify the --try_load_options when creating the checkpoint in order to load options from the OPTIONS file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7864

Test Plan: manually run db_bench on the checkpoint dir

Reviewed By: akankshamahajan15

Differential Revision: D25926960

Pulled By: anand1976

fbshipit-source-id: d3442ae24a7044b474dc80efc9c06bdc6ebe0388
2021-01-15 11:16:28 -08:00
anand76
8e7b068ecc Make ldb load column family options from OPTIONS file (#7847)
Summary:
When the --try_load_options is used in conjunction with the
--column_family option, ldb incorrectly sets the ColumnFamilyOptions for
that column family to defaults. This PR fixes that by retaining from the
OPTIONS file and applying command line overrides.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7847

Test Plan: Add a unit test in ldb_cmd_test

Reviewed By: ajkr

Differential Revision: D25874720

Pulled By: anand1976

fbshipit-source-id: 04bcf23b55e5a30b5b6a59b0e5cb4faef3da7429
2021-01-11 20:56:34 -08:00
Adam Retter
6e0f62f2b6 Add more tests to ASSERT_STATUS_CHECKED (3), API change (#7715)
Summary:
Third batch of adding more tests to ASSERT_STATUS_CHECKED.

* db_compaction_filter_test
* db_compaction_test
* db_dynamic_level_test
* db_inplace_update_test
* db_sst_test
* db_tailing_iter_test
* db_io_failure_test

Also update GetApproximateSizes APIs to all return Status.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7715

Reviewed By: jay-zhuang

Differential Revision: D25806896

Pulled By: pdillinger

fbshipit-source-id: 6cb9d62ba5a756c645812754c596ad3995d7c262
2021-01-06 14:15:02 -08:00
mrambacher
e628f59e87 Create a CustomEnv class; Add WinFileSystem; Make LegacyFileSystemWrapper private (#7703)
Summary:
This PR does the following:
-> Creates a WinFileSystem class.  This class is the Windows equivalent of the PosixFileSystem and will be used on Windows systems.
-> Introduces a CustomEnv class.  A CustomEnv is an Env that takes a FileSystem as constructor argument.  I believe there will only ever be two implementations of this class (PosixEnv and WinEnv).  There is still a CustomEnvWrapper class that takes an Env and a FileSystem and wraps the Env calls with the input Env but uses the FileSystem for the FileSystem calls
-> Eliminates the public uses of the LegacyFileSystemWrapper.

With this change in place, there are effectively the following patterns of Env:
- "Base Env classes" (PosixEnv, WinEnv).  These classes implement the core Env functions (e.g. Threads) and have a hard-coded input FileSystem.  These classes inherit from CompositeEnv, implement the core Env functions (threads) and delegate the FileSystem-like calls to the input file system.
- Wrapped Composite Env classes (MemEnv).  These classes take in an Env and a FileSystem.  The core env functions are re-directed to the wrapped env.  The file system calls are redirected to the input file system
- Legacy Wrapped Env classes.  These classes take in an Env input (but no FileSystem).  The core env functions are re-directed to the wrapped env.  A "Legacy File System" is created using this env and the file system calls directed to the env itself.

With these changes in place, the PosixEnv becomes a singleton -- there is only ever one created.  Any other use of the PosixEnv is via another wrapped env.  This cleans up some of the issues with the env construction and destruction.

Additionally, there were places in the code that required had an Env when they required a FileSystem.  Many of these places would wrap the Env with a LegacyFileSystemWrapper instead of using the env->GetFileSystem().  These places were changed, thereby removing layers of additional redirection (LegacyFileSystem --> Env --> Env::FileSystem).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7703

Reviewed By: zhichao-cao

Differential Revision: D25762190

Pulled By: anand1976

fbshipit-source-id: 1a088e97fc916f28ac69c149cd1dcad0ab31704b
2021-01-06 10:49:32 -08:00
Andrew Kryczka
b8c01ed38a Support --hex flag in ldb file_checksum_dump (#7820)
Summary:
Prior to this PR it prints the raw bytes which can include non-printable
characters. This PR adds the option to print in hex instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7820

Test Plan:
try it out

```
$ ./ldb file_checksum_dump --hex --db=/tmp/rocksdbtest-9383//db_basic_test_12281129388755189514/
16, FileChecksumCrc32c, 0xC789D948
```

Reviewed By: jay-zhuang

Differential Revision: D25738072

Pulled By: ajkr

fbshipit-source-id: 8cf2856877971756c0495cfa63a9a1281c414dc7
2021-01-04 11:13:14 -08:00
anand76
d7738666b0 Fix db_bench duration for multireadrandom benchmark (#7817)
Summary:
The multireadrandom benchmark, when run for a specific number of reads (--reads argument), should base the duration on the actual number of keys read rather than number of batches.

Tests:
Run db_bench multireadrandom benchmark

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7817

Reviewed By: zhichao-cao

Differential Revision: D25717230

Pulled By: anand1976

fbshipit-source-id: 13f4d8162268cf9a34918655e60302d0aba3864b
2020-12-28 13:38:10 -08:00
mrambacher
55e99688cc No elide constructors (#7798)
Summary:
Added "no-elide-constructors to the ASSERT_STATUS_CHECK builds.  This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it.  In this case,  without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798

Reviewed By: jay-zhuang

Differential Revision: D25680451

Pulled By: pdillinger

fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917
2020-12-23 16:55:53 -08:00
anand76
bd2645bc34 Update regression_test.sh to run multireadrandom benchmark (#7802)
Summary:
Update the regression_test.sh script to run the multireadrandom benchmark.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7802

Reviewed By: zhichao-cao

Differential Revision: D25685482

Pulled By: anand1976

fbshipit-source-id: ef2973b551a1bbdbce198a0adf29fc277f3e65e2
2020-12-23 11:26:12 -08:00
mrambacher
02418194d7 Add more tests for assert status checked (#7524)
Summary:
Added 10 more tests that pass the ASSERT_STATUS_CHECKED test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7524

Reviewed By: akankshamahajan15

Differential Revision: D24323093

Pulled By: ajkr

fbshipit-source-id: 28d4106d0ca1740c3b896c755edf82d504b74801
2020-12-22 23:45:58 -08:00
sdong
f4db3e4119 Avoid to force PORTABLE mode in tools/regression_test.sh (#7806)
Summary:
Right now tools/regression_test.sh always builds RocksDB with PORTABLE=1. There isn't a reason for that. Remove it. Users can always specify PORTABLE through  envirionement variable.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7806

Test Plan: Run tools/regression_test.sh and see it still builds.

Reviewed By: ajkr

Differential Revision: D25687911

fbshipit-source-id: 1c0b03e5df890babc8b7d8af48b48774d9a4600c
2020-12-22 16:54:07 -08:00
Peter Dillinger
4d897e51df Migrate away from Travis+Linux+amd64 (#7791)
Summary:
This disables Linux/amd64 builds in Travis for PRs, and adds a
gcc-10+c++20 build in CircleCI, which should fill out sufficient coverage
vs. what we had in Travis

Fixed a use of std::is_pod, which is deprecated in c++20

Fixed ++ on a volatile in db_repl_stress.cc, with bigger refactoring.
Although ++ on this volatile was probably ok with one thread writer and
one thread reader, the code was still overly complex. There was a
deadcode check for error
`if (replThread.no_read < dataPump.no_records)` which can be proven
never to happen based on the structure of the code. It infinite loops
instead for the case intended to be checked. I just simplified the code
for what should be the same checking power.

Also most configurations seem to be using make parallelism = 2 * vcores,
so fixing / using that.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7791

Test Plan:
CI
and `while ./db_repl_stress; do echo again; done` for a while

Reviewed By: siying

Differential Revision: D25669834

Pulled By: pdillinger

fbshipit-source-id: b2c688053d0b1d52c989903449d3cd27a04130d6
2020-12-22 00:20:57 -08:00
Jay Zhuang
fd0d35d390 Fix block_cache_test failure (#7783)
Summary:
`block_cache_tracer_test` and `block_cache_trace_analyzer_test` are using the same test directory. Which causes build failure if these 2 tests are running in parallel, for example: https://app.circleci.com/pipelines/github/facebook/rocksdb/5211/workflows/8639afbe-9fec-43e2-a6a4-6d47ea9cbcbe/jobs/74598/tests

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7783

Reviewed By: akankshamahajan15

Differential Revision: D25656762

Pulled By: jay-zhuang

fbshipit-source-id: 68aa020aa5b4b3bce324315edecb4e1a60cc18e6
2020-12-21 08:47:08 -08:00
Peter Dillinger
4d1ac19e3d aggregated-table-properties with GetMapProperty (#7779)
Summary:
So that we can more easily get aggregate live table data such
as total filter, index, and data sizes.

Also adds ldb support for getting properties

Also fixed some missing/inaccurate related comments in db.h

For example:

    $ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties
    rocksdb.aggregated-table-properties.data_size: 102871
    rocksdb.aggregated-table-properties.filter_size: 0
    rocksdb.aggregated-table-properties.index_partitions: 0
    rocksdb.aggregated-table-properties.index_size: 2232
    rocksdb.aggregated-table-properties.num_data_blocks: 100
    rocksdb.aggregated-table-properties.num_deletions: 0
    rocksdb.aggregated-table-properties.num_entries: 15000
    rocksdb.aggregated-table-properties.num_merge_operands: 0
    rocksdb.aggregated-table-properties.num_range_deletions: 0
    rocksdb.aggregated-table-properties.raw_key_size: 288890
    rocksdb.aggregated-table-properties.raw_value_size: 198890
    rocksdb.aggregated-table-properties.top_level_index_size: 0
    $ ./ldb --db=testdb get_property rocksdb.aggregated-table-properties-at-level1
    rocksdb.aggregated-table-properties-at-level1.data_size: 80909
    rocksdb.aggregated-table-properties-at-level1.filter_size: 0
    rocksdb.aggregated-table-properties-at-level1.index_partitions: 0
    rocksdb.aggregated-table-properties-at-level1.index_size: 1787
    rocksdb.aggregated-table-properties-at-level1.num_data_blocks: 81
    rocksdb.aggregated-table-properties-at-level1.num_deletions: 0
    rocksdb.aggregated-table-properties-at-level1.num_entries: 12466
    rocksdb.aggregated-table-properties-at-level1.num_merge_operands: 0
    rocksdb.aggregated-table-properties-at-level1.num_range_deletions: 0
    rocksdb.aggregated-table-properties-at-level1.raw_key_size: 238210
    rocksdb.aggregated-table-properties-at-level1.raw_value_size: 163414
    rocksdb.aggregated-table-properties-at-level1.top_level_index_size: 0
    $

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7779

Test Plan: Added a test to ldb_test.py

Reviewed By: jay-zhuang

Differential Revision: D25653103

Pulled By: pdillinger

fbshipit-source-id: 2905469a08a64dd6b5510cbd7be2e64d3234d6d3
2020-12-19 08:00:14 -08:00
Peter Dillinger
239d17a19c Support optimize_filters_for_memory for Ribbon filter (#7774)
Summary:
Primarily this change refactors the optimize_filters_for_memory
code for Bloom filters, based on malloc_usable_size, to also work for
Ribbon filters.

This change also replaces the somewhat slow but general
BuiltinFilterBitsBuilder::ApproximateNumEntries with
implementation-specific versions for Ribbon (new) and Legacy Bloom
(based on a recently deleted version). The reason is to emphasize
speed in ApproximateNumEntries rather than 100% accuracy.

Justification: ApproximateNumEntries (formerly CalculateNumEntry) is
only used by RocksDB for range-partitioned filters, called each time we
start to construct one. (In theory, it should be possible to reuse the
estimate, but the abstractions provided by FilterPolicy don't really
make that workable.) But this is only used as a heuristic estimate for
hitting a desired partitioned filter size because of alignment to data
blocks, which have various numbers of unique keys or prefixes. The two
factors lead us to prioritize reasonable speed over 100% accuracy.

optimize_filters_for_memory adds extra complication, because precisely
calculating num_entries for some allowed number of bytes depends on state
with optimize_filters_for_memory enabled. And the allocator-agnostic
implementation of optimize_filters_for_memory, using malloc_usable_size,
means we would have to actually allocate memory, many times, just to
precisely determine how many entries (keys) could be added and stay below
some size budget, for the current state. (In a draft, I got this
working, and then realized the balance of speed vs. accuracy was all
wrong.)

So related to that, I have made CalculateSpace, an internal-only API
only used for testing, non-authoritative also if
optimize_filters_for_memory is enabled. This simplifies some code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7774

Test Plan:
unit test updated, and for FilterSize test, range of tested
values is greatly expanded (still super fast)

Also tested `db_bench -benchmarks=fillrandom,stats -bloom_bits=10 -num=1000000 -partition_index_and_filters -format_version=5 [-optimize_filters_for_memory] [-use_ribbon_filter]` with temporary debug output of generated filter sizes.

Bloom+optimize_filters_for_memory:

      1 Filter size: 197 (224 in memory)
    134 Filter size: 3525 (3584 in memory)
    107 Filter size: 4037 (4096 in memory)
    Total on disk: 904,506
    Total in memory: 918,752

Ribbon+optimize_filters_for_memory:

      1 Filter size: 3061 (3072 in memory)
    110 Filter size: 3573 (3584 in memory)
     58 Filter size: 4085 (4096 in memory)
    Total on disk: 633,021 (-30.0%)
    Total in memory: 634,880 (-30.9%)

Bloom (no offm):

      1 Filter size: 261 (320 in memory)
      1 Filter size: 3333 (3584 in memory)
    240 Filter size: 3717 (4096 in memory)
    Total on disk: 895,674 (-1% on disk vs. +offm; known tolerable overhead of offm)
    Total in memory: 986,944 (+7.4% vs. +offm)

Ribbon (no offm):

      1 Filter size: 2949 (3072 in memory)
      1 Filter size: 3381 (3584 in memory)
    167 Filter size: 3701 (4096 in memory)
    Total on disk: 624,397 (-30.3% vs. Bloom)
    Total in memory: 690,688 (-30.0% vs. Bloom)

Note that optimize_filters_for_memory is even more effective for Ribbon filter than for cache-local Bloom, because it can close the unused memory gap even tighter than Bloom filter, because of 16 byte increments for Ribbon vs. 64 byte increments for Bloom.

Reviewed By: jay-zhuang

Differential Revision: D25592970

Pulled By: pdillinger

fbshipit-source-id: 606fdaa025bb790d7e9c21601e8ea86e10541912
2020-12-18 14:31:03 -08:00
Zhichao Cao
04b3524ad0 Inject the random write error to stress test (#7653)
Summary:
Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653

Test Plan: pass db_stress and python3 db_crashtest.py blackbox

Reviewed By: ajkr

Differential Revision: D25354132

Pulled By: zhichao-cao

fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669
2020-12-17 11:52:28 -08:00
Ramkumar Vadivelu
ac2f90d6f9 add 6.15.fb to check_format_compatible.sh (#7738)
Summary:
Update check_format_compatible.sh with 6.15.fb

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7738

Reviewed By: ajkr

Differential Revision: D25307717

Pulled By: ramvadiv

fbshipit-source-id: 49f5c6366e8c8a2ade9697975453c9c65e919f1b
2020-12-03 12:45:14 -08:00
Mammo, Mulugeta
1861de455e Add arena_block_size flag to db_bench (#7654)
Summary:
db_bench currently does not allow overriding the default `arena_block_size `calculation ([memtable size/8](https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L216)). For memtables whose size is in gigabytes, the `arena_block_size` defaults to hundreds of megabytes (affecting performance).

Exposing this option in db_bench would allow us to test the workloads with various `arena_block_size` values.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7654

Reviewed By: jay-zhuang

Differential Revision: D24996812

Pulled By: ajkr

fbshipit-source-id: a5e3d2c83d9f89e1bb8382f2e8dd476c79e33bef
2020-11-16 13:06:30 -08:00
Peter Dillinger
60af964372 Experimental (production candidate) SST schema for Ribbon filter (#7658)
Summary:
Added experimental public API for Ribbon filter:
NewExperimentalRibbonFilterPolicy(). This experimental API will
take a "Bloom equivalent" bits per key, and configure the Ribbon
filter for the same FP rate as Bloom would have but ~30% space
savings. (Note: optimize_filters_for_memory is not yet implemented
for Ribbon filter. That can be added with no effect on schema.)

Internally, the Ribbon filter is configured using a "one_in_fp_rate"
value, which is 1 over desired FP rate. For example, use 100 for 1%
FP rate. I'm expecting this will be used in the future for configuring
Bloom-like filters, as I expect people to more commonly hold constant
the filter accuracy and change the space vs. time trade-off, rather than
hold constant the space (per key) and change the accuracy vs. time
trade-off, though we might make that available.

### Benchmarking

```
$ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 34.1341
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
----------------------------
Mixed inside/outside queries...
  Single filter net ns/op: 18.7508
  Random filter net ns/op: 258.246
    Average FP rate %: 0.968672
----------------------------
Done. (For more info, run with -legend or -help.)
$ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 130.851
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
----------------------------
Mixed inside/outside queries...
  Single filter net ns/op: 58.4523
  Random filter net ns/op: 363.717
    Average FP rate %: 0.952978
----------------------------
Done. (For more info, run with -legend or -help.)
```

168.166 / 238.488 = 0.705  -> 29.5% space reduction

130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction)

### Working around a hashing "flaw"

bloom_test discovered a flaw in the simple hashing applied in
StandardHasher when num_starts == 1 (num_slots == 128), showing an
excessively high FP rate.  The problem is that when many entries, on the
order of number of hash bits or kCoeffBits, are associated with the same
start location, the correlation between the CoeffRow and ResultRow (for
efficiency) can lead to a solution that is "universal," or nearly so, for
entries mapping to that start location. (Normally, variance in start
location breaks the effective association between CoeffRow and
ResultRow; the same value for CoeffRow is effectively different if start
locations are different.) Without kUseSmash and with num_starts > 1 (thus
num_starts ~= num_slots), this flaw should be completely irrelevant.  Even
with 10M slots, the chances of a single slot having just 16 (or more)
entries map to it--not enough to cause an FP problem, which would be local
to that slot if it happened--is 1 in millions. This spreadsheet formula
shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE)))

As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is
intended for CPU efficiency of filters with many more entries/slots than
kCoeffBits, a very reasonable work-around is to disallow num_starts==1
when !kUseSmash, by making the minimum non-zero number of slots
2*kCoeffBits. This is the work-around I've applied. This also means that
the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not
space-efficient for less than a few hundred entries. Because of this, I
have made it fall back on constructing a Bloom filter, under existing
schema, when that is more space efficient for small filters. (We can
change this in the future if we want.)

TODO: better unit tests for this case in ribbon_test, and probably
update StandardHasher for kUseSmash case so that it can scale nicely to
small filters.

### Other related changes

* Add Ribbon filter to stress/crash test
* Add Ribbon filter to filter_bench as -impl=3
* Add option string support, as in "filter_policy=experimental_ribbon:5.678;"
where 5.678 is the Bloom equivalent bits per key.
* Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom
* Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on
binary searching CalculateSpace (inefficient), so that subclasses
(especially experimental ones) don't have to provide an efficient
implementation inverting CalculateSpace.
* Minor refactor FastLocalBloomBitsBuilder for new base class
XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder,
which allows the latter to fall back on Bloom construction in some
extreme cases.
* Mostly updated bloom_test for Ribbon filter, though a test like
FullBloomTest::Schema is a next TODO to ensure schema stability
(in case this becomes production-ready schema as it is).
* Add some APIs to ribbon_impl.h for configuring Ribbon filters.
Although these are reasonably covered by bloom_test, TODO more unit
tests in ribbon_test
* Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data
for constructing the linear approximations in GetNumSlotsFor95PctSuccess.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658

Test Plan:
Some unit tests updated but other testing is left TODO. This
is considered experimental but laying down schema compatibility as early
as possible in case it proves production-quality. Also tested in
stress/crash test.

Reviewed By: jay-zhuang

Differential Revision: D24899349

Pulled By: pdillinger

fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8
2020-11-12 20:46:14 -08:00
Akanksha Mahajan
202605143b Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643)
Summary:
crash tests donot run in DEBUG_MODE=0 on tmpfs when
use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because
direct I/O is not supported on tmpfs and tests exit.

Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash
tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are
unset so we can sanitize direct I/O reads options in case of tmpfs as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643

Test Plan:
1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
           export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
           make crash_test -j64
           2. In DEBUG_LEVEL=1 mode:  make crash_test -j64

Reviewed By: jay-zhuang

Differential Revision: D24766550

Pulled By: akankshamahajan15

fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
2020-11-10 10:50:34 -08:00
Akanksha Mahajan
06a92fcf5c Add "max_write_buffer_size_to_maintain" to crash test (#7634)
Summary:
Add "max_write_buffer_size_to_maintain" to crash test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634

Test Plan: make crash_test -j64

Reviewed By: zhichao-cao

Differential Revision: D24710401

Pulled By: akankshamahajan15

fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7
2020-11-03 13:55:18 -08:00
Akanksha Mahajan
6773901f76 Add 6.14 branch to check_format_compatible.sh (#7613)
Summary:
Add 6.14 to check_format_compatible.sh

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7613

Test Plan: ./tools/check_format_compatible.sh

Reviewed By: riversand963

Differential Revision: D24628535

Pulled By: akankshamahajan15

fbshipit-source-id: a8bf1d5505a1fcc8a5bedc5ff4fdf33a22c3f2e6
2020-10-29 15:51:50 -07:00
Yanqin Jin
394210f280 Remove unused includes (#7604)
Summary:
This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D24579392

Pulled By: riversand963

fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
2020-10-28 23:22:27 -07:00
Ramkumar Vadivelu
9a690a74e1 In ParseInternalKey(), include corrupt key info in Status (#7515)
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497

When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`

Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.

Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test

Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515

Reviewed By: ajkr

Differential Revision: D24240264

Pulled By: ramvadiv

fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
2020-10-28 10:12:58 -07:00
Zhichao Cao
d8ec0a760a Make FileType Public and Replace kLogFile with kWalFile (#7580)
Summary:
As suggested by pdillinger ,The name of kLogFile is misleading, in some tests, kLogFile is defined as info log. Replace it with kWalFile and move it to public, which will be used in https://github.com/facebook/rocksdb/issues/7523

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7580

Test Plan: make check

Reviewed By: riversand963

Differential Revision: D24485420

Pulled By: zhichao-cao

fbshipit-source-id: 955e3dacc1021bb590fde93b0a568ffe9ad80799
2020-10-22 17:06:20 -07:00
Jay Zhuang
9e03e4dd52 db_crashtest preserves expected values file for failed crash tests (#7534)
Summary:
If crash test fails, don't delete the `expected_values_file` for later
debug. More details: https://github.com/facebook/rocksdb/issues/7530

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7534

Test Plan: local host

Reviewed By: ajkr

Differential Revision: D24239655

Pulled By: jay-zhuang

fbshipit-source-id: 3566f91a30aae1e27d2f51d910cddd08edb7d4cf
2020-10-12 14:10:14 -07:00
Andrew Kryczka
75d3b6fdf0 Redesign block cache pinning API (#7520)
Summary:
The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520

Test Plan:
- new unit test
- added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0`

Reviewed By: pdillinger

Differential Revision: D24200034

Pulled By: ajkr

fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e
2020-10-11 14:58:24 -07:00
sdong
82d42606ad Enable paranoid_file_checks in crash test (#7489)
Summary:
Cover paranoid_file_checks in crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7489

Test Plan: Run crash tests for hours and didn't see any failure.

Reviewed By: ajkr

Differential Revision: D24063868

fbshipit-source-id: 7b48b110e66ce78ae5d0c99a9f32af86edd34c1e
2020-10-09 08:32:22 -07:00
Zhichao Cao
685cabdafa Add trace_analyzer_test to ASSERT_STATUS_CHECKED list (#7480)
Summary:
Add trace_analyzer_test to ASSERT_STATUS_CHECKED list

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7480

Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 trace_analyzer_test

Reviewed By: riversand963

Differential Revision: D24033768

Pulled By: zhichao-cao

fbshipit-source-id: b415045e6fab01d6193448650772368c21c6dba6
2020-10-01 15:58:52 -07:00
Ramkumar Vadivelu
e04a50923d Change ParseInternalKey() to return Status instead of bool (#7457)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/7430

Change ParseInternalKey() to return Status instead of bool.

db_bench (seekrandom) based before/after results with value size of 100 bytes and 16 bytes can be found at (tests ran on an udb server):
https://www.dropbox.com/s/47bwamdy5ozngph/PIK_ret_Status_results.xlsx?dl=0

![db_bench_results](https://user-images.githubusercontent.com/62277872/94642825-2a21a800-029a-11eb-88f2-124136c83fd3.png)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7457

Reviewed By: ajkr

Differential Revision: D24002433

Pulled By: ramvadiv

fbshipit-source-id: ac253ecf577a29044c47c3fe254a01e71404c44c
2020-09-30 19:16:47 -07:00
sdong
aedcaaef99 Stress test to support paranoid_file_checks (#7473)
Summary:
It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473

Test Plan: Run crash test

Reviewed By: ajkr

Differential Revision: D24026939

fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c
2020-09-30 14:41:33 -07:00
Levi Tamasi
30fb9dd50f Introduce a helper method UncompressData (#7434)
Summary:
The patch introduces a helper method in `util/compression.h` called `UncompressData`
that dispatches calls to the correct uncompression method based on type, and changes
`UncompressBlockContentsForCompressionType` and `Benchmark::Uncompress` in
`db_bench` so they are implemented in terms of the new method. This eliminates
some code duplication. (`Benchmark::Compress` is also updated to use the previously
introduced `CompressData` helper.)

In addition, the patch brings the implementation of `Snappy_Uncompress` into sync with
the other uncompression methods by making the method compute the buffer size and allocate
the buffer itself. Finally, the patch eliminates some potentially risky back-and-forth conversions
between various unsigned and signed integer types by exposing the size of the allocated buffer
as a `size_t` instead of an `int`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7434

Test Plan:
`make check`
`./db_bench -benchmarks=compress,uncompress --compression_type ...`

Reviewed By: riversand963

Differential Revision: D23900011

Pulled By: ltamasi

fbshipit-source-id: b25df63ceec4639889be94acb22eb53e530c54e0
2020-09-25 09:01:45 -07:00
Akanksha Mahajan
98ac6b646a Add IO Tracer Parser (#7333)
Summary:
Implement a parsing tool io_tracer_parser that takes IO trace file (binary file) with command line argument --io_trace_file and output file with --output_file and dumps the IO trace records in outputfile in human readable form.

Also added unit test cases that generates IO trace records and calls io_tracer_parse to parse those records.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7333

Test Plan:
make check -j64,
 Add unit test cases.

Reviewed By: anand1976

Differential Revision: D23772360

Pulled By: akankshamahajan15

fbshipit-source-id: 9c20519c189362e6663352d08863326f3e496271
2020-09-23 15:50:26 -07:00
Peter Dillinger
7780a360eb Fix HISTORY.md and check_format_compatible.sh for 6.13 branch (#7401)
Summary:
Make "unreleased" section for HISTORY.md with things misplaced
into 6.12 and 6.13

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7401

Test Plan: see how it goes, and `git diff origin/6.13.fb HISTORY.md`

Reviewed By: jay-zhuang

Differential Revision: D23759740

Pulled By: pdillinger

fbshipit-source-id: fc441916c7ff2bbb8d5384137653b340d4c47674
2020-09-17 09:00:13 -07:00
Yanqin Jin
a28df7a75a Add basic support for user-defined timestamp to db_bench (#7389)
Summary:
Update db_bench so that we can run it with user-defined timestamp.
Currently, only 64-bit timestamp is supported, while others are disabled by assertion.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7389

Test Plan: ./db_bench -benchmarks=fillseq,fillrandom,readrandom,readsequential,....., -user_timestamp_size=8

Reviewed By: ltamasi

Differential Revision: D23720830

Pulled By: riversand963

fbshipit-source-id: 486eacbb82de9a5441e79a61bfa9beef6581608a
2020-09-15 20:34:26 -07:00
mrambacher
7d472accdc Bring the Configurable options together (#5753)
Summary:
This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts

Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753

Reviewed By: ajkr

Differential Revision: D23385030

Pulled By: zhichao-cao

fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
2020-09-14 17:01:01 -07:00
mrambacher
a6ac51b99a Fix db_bench_tool_test. Fixes 7341 (#7344)
Summary:
1.  Failed to compile because of use of FileSystem* instead of Env* to some methods;

2.  Failed to compile with addition of ConfigOptions to some methods

3.  Failed to run successfully because the database and/or db_bench would change some of the options, invalidating the comparison

4.  Failed to run successfully if Snappy was not available.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7344

Reviewed By: siying

Differential Revision: D23501093

Pulled By: jay-zhuang

fbshipit-source-id: 81fd947e95fff9db8a4c5ff419d69d4c36bef23f
2020-09-09 09:07:16 -07:00
Peter Dillinger
9de912de3f Fix some errors showing up in Travis builds (#7359)
Summary:
Also enables a pull request to trigger all the Travis
configurations by writing FULL_CI in the commit message. (See what I did
there?)

First issue

    make: *** No rule to make target 'jl/util/crc32c_ppc_asm.o', needed by 'rocksdbjava'.  Stop.

Second issue

    tools/db_bench_tool.cc:5514:38: error: ‘gen_exp.rocksdb::Benchmark::GenerateTwoTermExpKeys::keyrange_size_’ may be used uninitialized in this function

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7359

Test Plan: CI

Reviewed By: zhichao-cao

Differential Revision: D23582132

Pulled By: pdillinger

fbshipit-source-id: 06d794673fd522ba11cf6398385387e6bd97ef89
2020-09-08 15:11:47 -07:00
Jay Zhuang
8a8a01c642 Fix compile error for old gcc-4.8 (#7358)
Summary:
gcc-4.8 returns error when using the constructor. Not sure if it's a compiler bug/limitation or code issue:
```
table/block_based/block_based_table_reader.cc:3183:67: error: use of deleted function ‘rocksdb::WritableFileStringStreamAdapter::WritableFileStringStreamAdapter(rocksdb::WritableFileStringStreamAdapter&&)’
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7358

Reviewed By: pdillinger

Differential Revision: D23577651

Pulled By: jay-zhuang

fbshipit-source-id: b0197e3d3538da61a6f3866410d88d2047fb9695
2020-09-08 12:09:34 -07:00
Peter Dillinger
4e258d3e63 Fix backup/restore in stress/crash test (#7357)
Summary:
(1) Skip check on specific key if restoring an old backup
(small minority of cases) because it can fail in those cases. (2) Remove
an old assertion about number of column families and number of keys
passed in, which is broken by atomic flush (cf_consistency) test. Like
other code (for better or worse) assume a single key and iterate over
column families. (3) Apply mock_direct_io to NewSequentialFile so that
db_stress backup works on /dev/shm.

Also add more context to output in case of backup/restore db_stress
failure.

Also a minor fix to BackupEngine to report first failure status in
creating new backup, and drop another clue about the potential
source of a "Backup failed" status.

Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357

Test Plan:
Using backup_one_in=10000,
"USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes
"USE_CLANG=1 make blackbox_crash_test" for 30+ minutes
And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb

Reviewed By: riversand963

Differential Revision: D23567244

Pulled By: pdillinger

fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77
2020-09-08 10:50:19 -07:00
Jay Zhuang
27aa443a15 Add sst_file_dumper status check (#7315)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7315

Test Plan:
`ASSERT_STATUS_CHECKED=1 make sst_dump_test && ./sst_dump_test`
And manually run `./sst_dump --file=*.sst` before and after the change.

Reviewed By: pdillinger

Differential Revision: D23361669

Pulled By: jay-zhuang

fbshipit-source-id: 5bf51a2a90ee35c8c679e5f604732ec2aef5949a
2020-09-04 19:26:42 -07:00
Jay Zhuang
ef32f11004 Disable backup/restore stress test (#7350)
Summary:
Seems it's causing some tests failures.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7350

Reviewed By: riversand963

Differential Revision: D23544109

Pulled By: jay-zhuang

fbshipit-source-id: 798a0ca374a20b6c2d0f29582729ff101c6a2e99
2020-09-04 11:58:18 -07:00
Peter Dillinger
06ad5dd293 Add file checksum to stress/crash test (#7343)
Summary:
This change has the crash test randomly select from a few file
checksum implementations, or nullptr, for DB file_checksum_gen_factory.
For compatibility across runs on same DB, each non-null factory can
understand all the other functions, but the default changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343

Test Plan:
'make blackbox_crash_test' for a while, including with some
debug output to ensure code is being exercised.

Reviewed By: zhichao-cao

Differential Revision: D23494580

Pulled By: pdillinger

fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a
2020-09-03 23:50:33 -07:00
Peter Dillinger
499c9448d0 Fix, enable, and enhance backup/restore in db_stress (#7348)
Summary:
Although added to db_stress, testing of backup/restore
was never integrated into the crash test, originally concerned about
performance. I've enabled it now and to address the peformance concern,
testing backup/restore is always skipped once the db exceeds a certain
size threshold, default 100MB. This should provide sufficient
opportunity for testing BackupEngine without bogging down everything
else with heavier and heavier operations.

Also fixed backup/restore in db_stress by making sure PurgeOldBackups
can remove manifest files, which are normally kept around for db_stress.

Added more coverage of backup options, and up to three backups being
saved in one backup directory (in some cases).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348

Test Plan:
ran 'make blackbox_crash_test' for a while, with heightened
probabilitly of taking backups (1/10k). Also confirmed with some debug
output that the code is being covered, TestBackupRestore only takes
a few seconds to complete when triggered, and even at 1/10k and ~50MB
database, there's <,~ 1 thread testing backups at any time.

Reviewed By: ajkr

Differential Revision: D23510835

Pulled By: pdillinger

fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0
2020-09-03 20:13:15 -07:00
Andrew Kryczka
5746767387 add ldb unsafe_remove_sst_file subcommand (#7335)
Summary:
This is adapted from https://github.com/facebook/rocksdb/issues/6678 but takes a different approach, avoiding opening a read-write DB and avoiding the `DeleteFile()` API.

First, this PR refactors how options variables are initialized in `ldb` so it can be reused in a subcommand that doesn't open a DB:

- Separated remaining option initialization logic out of `OpenDB()`. The new `PrepareOptions()` function initializes the full options state.
- Fixed an old TODO about applying the subcommand CF option overrides to the proper `ColumnFamilyOptions` object.

Second, this PR adds the `ldb unsafe_remove_sst_file` subcommand. It uses the `VersionSet`-level APIs to remove the file with the specified number.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7335

Test Plan: played with interactive python and this file removal command. Verified openability/correct results in case of multiple column families, multiple levels, etc.

Reviewed By: pdillinger

Differential Revision: D23454575

Pulled By: ajkr

fbshipit-source-id: 039b7a8cbfc42fd123dcb25821eef51d61148afe
2020-09-03 16:54:51 -07:00
Andrew Kryczka
af54c4092a fix SstFileWriter with dictionary compression (#7323)
Summary:
In block-based table builder, the cut-over from buffered to unbuffered
mode involves sampling the buffered blocks and generating a dictionary.
There was a bug where `SstFileWriter` passed zero as the `target_file_size`
causing the cutover to happen immediately, so there were no samples
available for generating the dictionary.

This PR changes the meaning of `target_file_size == 0` to mean buffer
the whole file before cutting over. It also adds dictionary compression
support to `sst_dump --command=recompress` for easy evaluation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7323

Reviewed By: cheng-chang

Differential Revision: D23412158

Pulled By: ajkr

fbshipit-source-id: 3b232050e70ef3c2ee85a4b5f6fadb139c569873
2020-09-03 15:49:57 -07:00
Hans Holmberg
679a413f11 Close databases on benchmark error exits in db_bench (#7327)
Summary:
Delete database instances to make sure there are no loose threads
running before exit(). This fixes segfaults seen when running
workloads through CompositeEnvs with custom file systems.

For further background on the issues arising when using CompositeEnvs, see the discussion in:
https://github.com/facebook/rocksdb/pull/6878

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7327

Reviewed By: cheng-chang

Differential Revision: D23433244

Pulled By: ajkr

fbshipit-source-id: 4e19cf2067e3fe68c2a3fe1823f24b4091336bbe
2020-09-03 14:36:30 -07:00
Hans Holmberg
2a0d3c7054 Add a file system parameter: --fs_uri to db_stress and db_bench (#6878)
Summary:
This pull request adds the parameter --fs_uri to db_bench and db_stress, creating a composite env combining the default env with a specified registered rocksdb file system.

This makes it easier to develop and test new RocksDB FileSystems.

The pull request also registers the posix file system for testing purposes.

Examples:
```
$./db_bench --fs_uri=posix:// --benchmarks=fillseq

$./db_stress --fs_uri=zenfs://nullb1
```

zenfs is a RocksDB FileSystem I'm developing to add support for zoned block devices, and in that case the zoned block device is specified in the uri (a zoned null block device in the above example).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6878

Reviewed By: siying

Differential Revision: D23023063

Pulled By: ajkr

fbshipit-source-id: 8b3fe7193ce45e683043b021779b7a4d547af247
2020-08-17 11:55:24 -07:00
Akanksha Mahajan
1f9f630b27 Store FileSystemPtr object that contains FileSystem ptr (#7180)
Summary:
As part of the IOTracing project, this PR
    1. Caches "FileSystemPtr" object(wrapper class that returns file system pointer based on tracing enabled) instead of "FileSystem" pointer.
    2. FileSystemPtr object is created using FileSystem pointer and IOTracer
    pointer.
    3. IOTracer shared_ptr is created in DBImpl and it is passed to different classes through constructor.
    4. When tracing is enabled through DB::StartIOTrace, FileSystemPtr
    returns FileSystemTracingWrapper pointer for tracing purpose and when
    it is disabled underlying FileSystem pointer is returned.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7180

Test Plan:
make check -j64
                COMPILE_WITH_TSAN=1 make check -j64

Reviewed By: anand1976

Differential Revision: D22987117

Pulled By: akankshamahajan15

fbshipit-source-id: 6073617e4c2d5bc363914f3a1f55ae3b0a58fbf1
2020-08-12 17:31:23 -07:00
Andrew Kryczka
7eebe6d38a Mark files for compaction in stress/crash tests (#7231)
Summary:
The mechanism to mark files for compaction is most commonly used in
delete-triggered compaction. This PR adds an option to exercise the
marking mechanism on random files created by db_stress. This PR also
enables that option in db_crashtest.py on its db_stress runs at random.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231

Test Plan:
- ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs.

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887
```

Reviewed By: anand1976

Differential Revision: D23025156

Pulled By: ajkr

fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96
2020-08-10 16:17:56 -07:00
Yanqin Jin
8a1da56b96 Include 6.12.fb branch in format compatibility check (#7226)
Summary:
As title.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7226

Test Plan:
```
tools/check_format_compatible.sh
```

Reviewed By: pdillinger

Differential Revision: D23006013

Pulled By: riversand963

fbshipit-source-id: 428f29ad984691183beae72d3d3e753ec9b085a7
2020-08-07 14:20:06 -07:00
Aaron Kabcenell
56ed601df3 Compaction Read/Write Stats by Compaction Type (#7165)
Summary:
Adds compaction statistics (total bytes read and written) for compactions that occur for delete-triggered, periodic, and TTL compaction reasons.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7165

Test Plan:
TTL and periodic can be checked by runnning db_bench with the options activated:

/db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -periodic_compaction_seconds=1
./db_bench --benchmarks="fillrandom,stats" --statistics --num=10000000 -base_background_compactions=16 -fifo_compaction_ttl=1

Setting the time to one second causes non-zero bytes read/written for those compaction reasons. Disabling them or setting them to times longer than the test run length causes the stats to return to zero as expected.

Delete-triggered compaction counting is tested in DBTablePropertiesTest.DeletionTriggeredCompactionMarking

Reviewed By: ajkr

Differential Revision: D22693050

Pulled By: akabcenell

fbshipit-source-id: d15cef4d94576f703015c8942d5f0d492f69401d
2020-07-29 13:39:29 -07:00
Stanislau Hlebik
961dd6228a remediation of S205607
fbshipit-source-id: 798decc90db4f13770e97cdce3c0df7d5421b2a3
2020-07-17 17:20:49 -07:00
Stanislau Hlebik
961a496abc remediation of S205607
fbshipit-source-id: 5113fe0c527595e4227ff827253b7414abbdf7ac
2020-07-17 17:20:49 -07:00
Jay Zhuang
fc4d5f5065 Add stress test for GetProperty (#7111)
Summary:
Add stress test coverage for `DB::GetProperty()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111

Test Plan:
```
./db_stress -get_property_one_in=1
make crash_test
```

Reviewed By: ajkr

Differential Revision: D22487906

Pulled By: jay-zhuang

fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5
2020-07-14 12:12:36 -07:00
mrambacher
c7c7b07f06 More Makefile Cleanup (#7097)
Summary:
Cleans up some of the dependencies on test code in the Makefile while building tools:
- Moves the test::RandomString, DBBaseTest::RandomString into Random
- Moves the test::RandomHumanReadableString into Random
- Moves the DestroyDir method into file_utils
- Moves the SetupSyncPointsToMockDirectIO into sync_point.
- Moves the FaultInjection Env and FS classes under env

These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies.  By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated.

Tested both release and debug builds via Make and CMake for both static and shared libraries.

More work remains to clean up how the tools are built and remove some unnecessary dependencies.  There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097

Reviewed By: riversand963

Differential Revision: D22463160

Pulled By: pdillinger

fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2
2020-07-09 14:35:17 -07:00
Yanqin Jin
842bd2742a Running ./ldb without any extra arg print usage (#7107)
Summary:
as title.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7107

Test Plan: make ldb && ./ldb

Reviewed By: pdillinger

Differential Revision: D22451399

Pulled By: riversand963

fbshipit-source-id: 797645e06473bb9cf139c533877e5161281515e8
2020-07-09 10:20:06 -07:00
Jay Zhuang
00de699096 Replace reinterpret_cast with static_cast_with_check (#7067)
Summary:
Replace `reinterpret_cast` with `static_cast_with_check` for `DBImpl` and `ColumnFamilyHandleImpl`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7067

Reviewed By: siying

Differential Revision: D22361587

Pulled By: jay-zhuang

fbshipit-source-id: dfe9e8f3af39c3d27cc372c55ab9ad905eb0a5a1
2020-07-02 19:25:41 -07:00
Yanqin Jin
f8bfd66b97 Fix python in format check script for Centos8 (#7057)
Summary:
As title.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7057

Test Plan: ./tools/check_format_compatible.sh

Reviewed By: pdillinger

Differential Revision: D22319831

Pulled By: riversand963

fbshipit-source-id: 82653a525a5296ef65a6a7a439cdd6bff88f498e
2020-06-30 16:37:21 -07:00
Yanqin Jin
f5554fd7b6 Add recent versions to format compatibility check (#7059)
Summary:
as title.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7059

Test Plan: ./tools/check_format_compatible.sh

Reviewed By: siying

Differential Revision: D22320774

Pulled By: riversand963

fbshipit-source-id: 124d13b08703d077a7aab3678e1eb639fcbcceca
2020-06-30 15:07:41 -07:00
sdong
d64cf0e4ee Move away from direct TmpDir() call in some tests (#7030)
Summary:
Some tests directly uses TmpDir() as temporary directory without adding any randomize factor. This would cause failures when tests run in parallel. Fix it by moving some of them to test::PerThreadDBPath()
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7030

Test Plan: Watch existing tests pass

Reviewed By: zhichao-cao

Differential Revision: D22224710

fbshipit-source-id: 28c9932fede0a4a64670e5b5fdb08f4fb5dccdd0
2020-06-25 12:09:57 -07:00
Zitan Chen
be41c61f22 Add a new option for BackupEngine to store table files under shared_checksum using DB session id in the backup filenames (#6997)
Summary:
`BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.

Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.

Three new test cases are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997

Test Plan: Passed make check.

Reviewed By: ajkr

Differential Revision: D22098895

Pulled By: gg814

fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76
2020-06-24 19:31:25 -07:00
sdong
9cc25190e1 Test CircleCI with CLANG-10 (#7025)
Summary:
It's useful to build RocksDB using a more recent clang version in CI. Add a CircleCI build and fix some issues with it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7025

Test Plan: See all tests pass.

Reviewed By: pdillinger

Differential Revision: D22215700

fbshipit-source-id: 914a729c2cd3f3ac4a627cc0ac58d4691dca2168
2020-06-24 16:22:49 -07:00
Andrew Kryczka
8efb5cfb6f add SstFileManager to crash test (#6993)
Summary:
SstFileManager is already supported in the stress test as of https://github.com/facebook/rocksdb/issues/6454. This
PR enables the SstFileManager in some of the crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6993

Reviewed By: riversand963

Differential Revision: D22084406

Pulled By: ajkr

fbshipit-source-id: 78b8642682e7570ff6ec3a1c3ccd9940f4362289
2020-06-23 16:27:20 -07:00
Peter Dillinger
5b2bbacb6f Minimize memory internal fragmentation for Bloom filters (#6427)
Summary:
New experimental option BBTO::optimize_filters_for_memory builds
filters that maximize their use of "usable size" from malloc_usable_size,
which is also used to compute block cache charges.

Rather than always "rounding up," we track state in the
BloomFilterPolicy object to mix essentially "rounding down" and
"rounding up" so that the average FP rate of all generated filters is
the same as without the option. (YMMV as heavily accessed filters might
be unluckily lower accuracy.)

Thus, the option near-minimizes what the block cache considers as
"memory used" for a given target Bloom filter false positive rate and
Bloom filter implementation. There are no forward or backward
compatibility issues with this change, though it only works on the
format_version=5 Bloom filter.

With Jemalloc, we see about 10% reduction in memory footprint (and block
cache charge) for Bloom filters, but 1-2% increase in storage footprint,
due to encoding efficiency losses (FP rate is non-linear with bits/key).

Why not weighted random round up/down rather than state tracking? By
only requiring malloc_usable_size, we don't actually know what the next
larger and next smaller usable sizes for the allocator are. We pick a
requested size, accept and use whatever usable size it has, and use the
difference to inform our next choice. This allows us to narrow in on the
right balance without tracking/predicting usable sizes.

Why not weight history of generated filter false positive rates by
number of keys? This could lead to excess skew in small filters after
generating a large filter.

Results from filter_bench with jemalloc (irrelevant details omitted):

    (normal keys/filter, but high variance)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.6278
    Number of filters: 5516
    Total size (MB): 200.046
    Reported total allocated memory (MB): 220.597
    Reported internal fragmentation: 10.2732%
    Bits/key stored: 10.0097
    Average FP rate %: 0.965228
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 30.5104
    Number of filters: 5464
    Total size (MB): 200.015
    Reported total allocated memory (MB): 200.322
    Reported internal fragmentation: 0.153709%
    Bits/key stored: 10.1011
    Average FP rate %: 0.966313

    (very few keys / filter, optimization not as effective due to ~59 byte
     internal fragmentation in blocked Bloom filter representation)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.5649
    Number of filters: 162950
    Total size (MB): 200.001
    Reported total allocated memory (MB): 224.624
    Reported internal fragmentation: 12.3117%
    Bits/key stored: 10.2951
    Average FP rate %: 0.821534
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 31.8057
    Number of filters: 159849
    Total size (MB): 200
    Reported total allocated memory (MB): 208.846
    Reported internal fragmentation: 4.42297%
    Bits/key stored: 10.4948
    Average FP rate %: 0.811006

    (high keys/filter)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.7017
    Number of filters: 164
    Total size (MB): 200.352
    Reported total allocated memory (MB): 221.5
    Reported internal fragmentation: 10.5552%
    Bits/key stored: 10.0003
    Average FP rate %: 0.969358
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 30.7131
    Number of filters: 160
    Total size (MB): 200.928
    Reported total allocated memory (MB): 200.938
    Reported internal fragmentation: 0.00448054%
    Bits/key stored: 10.1852
    Average FP rate %: 0.963387

And from db_bench (block cache) with jemalloc:

    $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
    $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
    $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
    17063835
    $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
    17430747
    $ #^ 2.1% additional filter storage
    $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
    rocksdb.block.cache.index.add COUNT : 33
    rocksdb.block.cache.index.bytes.insert COUNT : 8440400
    rocksdb.block.cache.filter.add COUNT : 33
    rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
    rocksdb.bloom.filter.useful COUNT : 4963889
    rocksdb.bloom.filter.full.positive COUNT : 1214081
    rocksdb.bloom.filter.full.true.positive COUNT : 1161999
    $ #^ 1.04 % observed FP rate
    $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
    rocksdb.block.cache.index.add COUNT : 33
    rocksdb.block.cache.index.bytes.insert COUNT : 8448592
    rocksdb.block.cache.filter.add COUNT : 33
    rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
    rocksdb.bloom.filter.useful COUNT : 5360933
    rocksdb.bloom.filter.full.positive COUNT : 1321315
    rocksdb.bloom.filter.full.true.positive COUNT : 1262999
    $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters

(Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427

Test Plan: unit test added, 'make check' with gcc, clang and valgrind

Reviewed By: siying

Differential Revision: D22124374

Pulled By: pdillinger

fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
2020-06-22 13:32:07 -07:00
Andrew Kryczka
d76eed4839 minor fixes for stress/crash contruns (#7006)
Summary:
Avoid using `cf_consistency` together with `enable_compaction_filter` as
the former heavily uses snapshots while the latter is incompatible with
snapshots.

Also fix a clang-analyze error for a write to a variable that is never
read.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006

Reviewed By: zhichao-cao

Differential Revision: D22141679

Pulled By: ajkr

fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447
2020-06-19 16:05:17 -07:00
Peter Dillinger
88b4210701 Remove racially charged terms "whitelist" and "blacklist" (#7008)
Summary:
We don't need them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008

Test Plan: "make check" and ensure "make crash_test" starts

Reviewed By: ajkr

Differential Revision: D22143838

Pulled By: pdillinger

fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e
2020-06-19 15:27:32 -07:00
Andrew Kryczka
775dc623ad add CompactionFilter to stress/crash tests (#6988)
Summary:
Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988

Test Plan:
running a minified blackbox crash test

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600
```

Reviewed By: anand1976

Differential Revision: D22072888

Pulled By: ajkr

fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320
2020-06-18 09:54:55 -07:00
Yanqin Jin
15d9f28da5 Add stress test for best-efforts recovery (#6819)
Summary:
Add crash test for the case of best-efforts recovery.
After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819

Test Plan:
```
./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0
./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0
make crash_test_with_best_efforts_recovery
```

Reviewed By: anand1976

Differential Revision: D21436753

Pulled By: riversand963

fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926
2020-06-12 19:27:46 -07:00
Peter Dillinger
edf74d1cb1 Add --version and --help to ldb and sst_dump (#6951)
Summary:
as title
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6951

Test Plan: tests included + manual

Reviewed By: ajkr

Differential Revision: D21918540

Pulled By: pdillinger

fbshipit-source-id: 79d4991f2a831214fc7e477a839ec19dbbace6c5
2020-06-09 10:04:01 -07:00
Zitan Chen
119b26fac0 Implement a new subcommand "identify" for sst_dump (#6943)
Summary:
Implemented a subcommand of sst_dump called identify, which determines whether a file is an SST file or identifies and lists all the SST files in a directory;

This update also fixes the problem that sst_dump exits with a success state even if target file/directory does not exist/is not an SST file/is empty/is corrupted.

One test is added to sst_dump_test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6943

Test Plan: Passed make check and a few manual tests

Reviewed By: pdillinger

Differential Revision: D21928985

Pulled By: gg814

fbshipit-source-id: 9a8b48e0cf1a0e96b13f42b690aba8ad981afad3
2020-06-08 13:58:28 -07:00
Zhichao Cao
fb08330f74 decouple the dependency of trace_analyzer_test unit test (#6941)
Summary:
Since gflags use the global variable to store the flags passed in. In the unit test, if we git one flag per unit test, the result is that all the flags are combined together in the following tests. Therefore, it has the dependency. In this PR, we pass the full arguments each time to ensure that the old arguments will be overwritten by the new one such that the dependency is removed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6941

Test Plan: make asan_check. run each unit test in trace_analyzer_test independently and in arbitrary orders.

Reviewed By: pdillinger

Differential Revision: D21909176

Pulled By: zhichao-cao

fbshipit-source-id: dca550a0a4a205c30faa620e258a020a3b5b4e13
2020-06-08 10:36:43 -07:00
Peter Dillinger
c7432cc3c0 Fix more defects reported by Coverity Scan (#6935)
Summary:
Mostly uninitialized values: some probably written before use, but some seem like bugs. Also, destructor needs to be virtual, and possible use-after-free in test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6935

Test Plan: make check

Reviewed By: siying

Differential Revision: D21885484

Pulled By: pdillinger

fbshipit-source-id: e2e7cb0a0cf196f2b55edd16f0634e81f6cc8e08
2020-06-04 15:35:08 -07:00
mrambacher
e85cbdb4e8 Fix two core dumps when files are missing (#6922)
Summary:
The LDB create and drop column family commands failed to check if theere was a valid database prior to dereferencing it, leading to a core dump.

The SstFileDumper prefetch code would dereference a file when the file did not exist as part of the Prefetch code.  This dereference was moved inside an st.ok() check.

Tests were added for both failure conditions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6922

Reviewed By: gg814

Differential Revision: D21884024

Pulled By: pdillinger

fbshipit-source-id: bddd45c299aa9dc7e928c17a37a96521f8c9149e
2020-06-04 11:40:32 -07:00
Zitan Chen
02df00d97b API change: DB::OpenForReadOnly will not write to the file system unless create_if_missing is true (#6900)
Summary:
DB::OpenForReadOnly will not write anything to the file system (i.e., create directories or files for the DB) unless create_if_missing is true.

This change also fixes some subcommands of ldb, which write to the file system even if the purpose is for readonly.

Two tests for this updated behavior of DB::OpenForReadOnly are also added.

Other minor changes:
1. Updated HISTORY.md to include this API change of DB::OpenForReadOnly;
2. Updated the help information for the put and batchput subcommands of ldb with the option [--create_if_missing];
3. Updated the comment of Env::DeleteDir to emphasize that it returns OK only if the directory to be deleted is empty.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6900

Test Plan: passed make check; also manually tested a few ldb subcommands

Reviewed By: pdillinger

Differential Revision: D21822188

Pulled By: gg814

fbshipit-source-id: 604cc0f0d0326a937ee25a32cdc2b512f9a3be6e
2020-06-03 18:57:49 -07:00
Peter Dillinger
14eca6bf04 For ApproximateSizes, pro-rate table metadata size over data blocks (#6784)
Summary:
The implementation of GetApproximateSizes was inconsistent in
its treatment of the size of non-data blocks of SST files, sometimes
including and sometimes now. This was at its worst with large portion
of table file used by filters and querying a small range that crossed
a table boundary: the size estimate would include large filter size.

It's conceivable that someone might want only to know the size in terms
of data blocks, but I believe that's unlikely enough to ignore for now.
Similarly, there's no evidence the internal function AppoximateOffsetOf
is used for anything other than a one-sided ApproximateSize, so I intend
to refactor to remove redundancy in a follow-up commit.

So to fix this, GetApproximateSizes (and implementation details
ApproximateSize and ApproximateOffsetOf) now consistently include in
their returned sizes a portion of table file metadata (incl filters
and indexes) based on the size portion of the data blocks in range. In
other words, if a key range covers data blocks that are X% by size of all
the table's data blocks, returned approximate size is X% of the total
file size. It would technically be more accurate to attribute metadata
based on number of keys, but that's not computationally efficient with
data available and rarely a meaningful difference.

Also includes miscellaneous comment improvements / clarifications.

Also included is a new approximatesizerandom benchmark for db_bench.
No significant performance difference seen with this change, whether ~700 ops/sec with cache_index_and_filter_blocks and small cache or ~150k ops/sec without cache_index_and_filter_blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6784

Test Plan:
Test added to DBTest.ApproximateSizesFilesWithErrorMargin.
Old code running new test...

    [ RUN      ] DBTest.ApproximateSizesFilesWithErrorMargin
    db/db_test.cc:1562: Failure
    Expected: (size) <= (11 * 100), actual: 9478 vs 1100

Other tests updated to reflect consistent accounting of metadata.

Reviewed By: siying

Differential Revision: D21334706

Pulled By: pdillinger

fbshipit-source-id: 6f86870e45213334fedbe9c73b4ebb1d8d611185
2020-06-02 12:30:23 -07:00
zitan
038e02d8d9 Remove extraneous newline from ldb stderr (#6897)
Summary:
**Summary**
Remove the extraneous newline when using ldb tool. For example, the subcommand list_column_families will print an empty line to stderr even if there are no errors.

**Test plan**
Passed make check; manually tested a few ldb subcommands.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6897

Reviewed By: pdillinger

Differential Revision: D21819352

Pulled By: gg814

fbshipit-source-id: 5a16a6431bb96684fe97647f4d3ac5bf0ec7fc90
2020-06-01 12:15:36 -07:00
Peter Dillinger
0c56fc4d66 Allow missing "unversioned" python, as in CentOS 8 (#6883)
Summary:
RocksDB Makefile was assuming existence of 'python' command,
which is not present in CentOS 8. We avoid using 'python' if 'python3' is available.

Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)

Also, now use just 'python3' for PYTHON if not found so that an informative
"command not found" error will result rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883

Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.

Reviewed By: gg814

Differential Revision: D21767029

Pulled By: pdillinger

fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
2020-05-29 11:29:23 -07:00
mrambacher
0ac0098705 Make options length longer for sst_dump_test (#6846)
Summary:
Under MacOS when running with make -j 8 check, the temporary directory generated was > 100 characters.  This caused the tests to do nothing under MacOS.  Most of them still reported success for doing nothing, but ReadaheadSize was expecting the test to run.

By making the option name longer, the tests will no run successfully (and do something!)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6846

Reviewed By: ajkr

Differential Revision: D21576032

fbshipit-source-id: b089cde0d598137b572aa8527cc5459085252af7
2020-05-19 09:22:12 -07:00
Tongliang Liao
07204837ce Mark dependencies as PRIVATE and fix missing dependencies in tools. (#6790)
Summary:
Tools were mistakenly using leaked (PUBLIC) dependencies from main lib.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6790

Reviewed By: riversand963

Differential Revision: D21471551

fbshipit-source-id: ec43b92e231777e0fcf0f865444391af09d6963b
2020-05-12 21:07:55 -07:00
sdong
4a4b8a1344 sst_dump to reduce number of file reads (#6836)
Summary:
sst_dump can issue many file reads from the file system. This doesn't work well with file systems without a OS cache, especially remote file systems. In order to mitigate this problem, several improvements are done:
1. --readahead_size is added, so that users can specify readahead size when scanning the data.
2. Force a 512KB tail readahead, which prevents three I/Os for footer, meta index and property blocks and hopefully index and filter blocks too.
3. Consoldiate SSTDump's I/Os before opening the file for read. Use the same file prefetch buffer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6836

Test Plan: Add a test that covers this new feature.

Reviewed By: pdillinger

Differential Revision: D21516607

fbshipit-source-id: 3ae43526286f67b2f4a5bdedfbc92719d579b87e
2020-05-12 18:23:33 -07:00
sdong
a50ea71c00 Improve ldb consistency checks (#6802)
Summary:
When using ldb, users cannot turn on force consistency check in most commands, while they cannot use checksonsistnecy with --try_load_options. The change fixes both by:
1. checkconsistency now calls OpenDB() so that it gets all the options loading and sanitized options logic
2. use options.check_consistency_checks = true by default, and add a --disable_consistency_checks to turn it off.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6802

Test Plan: Add a new unit test. Some manual tests with corrupted DBs.

Reviewed By: pdillinger

Differential Revision: D21388051

fbshipit-source-id: 8d122732d391b426e3982a1c3232a8e3763ffad0
2020-05-08 14:17:47 -07:00
sdong
12825894a2 Disable "compression_parallel_threads" in crash test for now (#6816)
Summary:
"compressio_parallel_threads" caused several test failure tests. To keep crash test clean, disable it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6816

Test Plan: "make crash_test" to make sure the python script doesn't break

Reviewed By: zhichao-cao

Differential Revision: D21462112

fbshipit-source-id: 9eecc764800da82cd19665dc8b167eacead3310b
2020-05-07 20:52:29 -07:00
Andrew Kryczka
1f20df2f38 cover single level universal in crash test (#6818)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818

Test Plan:
fast whitebox test and verify there are some single-level universal and
some multi-level universal runs.

```
$ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887
```

Reviewed By: riversand963

Differential Revision: D21432138

Pulled By: ajkr

fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7
2020-05-06 18:08:09 -07:00
Mian Qin
d9e170d82b Fix issues for reproducing synthetic ZippyDB workloads in the FAST20' paper (#6795)
Summary:
Fix issues for reproducing synthetic ZippyDB workloads in the FAST20' paper using db_bench. Details changes as follows.
1, add a separate random mode in MixGraph to produce all_random workload.
2, fix power inverse function for generating prefix_dist workload.
3, make sure key_offset in prefix mode is always unsigned.
note: Need to carefully choose key_dist_a/b to avoid aliasing. Power inverse function range should be close to overall key space.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6795

Reviewed By: akankshamahajan15

Differential Revision: D21371095

Pulled By: zhichao-cao

fbshipit-source-id: 80744381e242392c8c7cf8ac3d68fe67fe876048
2020-05-04 10:55:14 -07:00
Ziyue Yang
e619a20e93 Add an option for parallel compression in for db_stress (#6722)
Summary:
This commit adds an `compression_parallel_threads` option in
db_stress. It also fixes the naming of parallel compression
option in db_bench to keep it aligned with others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722

Reviewed By: pdillinger

Differential Revision: D21091385

fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100
2020-04-30 10:49:07 -07:00