Commit Graph

10039 Commits

Author SHA1 Message Date
Zhichao Cao
58162835d1 All the NoSpace() errors will be handled by regular SetBGError and RecoverFromNoSpace() (#8376)
Summary:
In the current logic, any IO Error with retryable flag == true will be handled by the special logic and in most cases, StartRecoverFromRetryableBGIOError will be called to do the auto resume. If the NoSpace error with retryable flag is set during WAL write, it is mapped as a hard error, which will trigger the auto recovery. During the recover process, if write continues and append to the WAL, the write process sees that bg_error is set to HardError and it calls WriteStatusCheck(), which calls SetBGError() with Status (not IOStatus). This will redirect to the regular SetBGError interface, in which recovery_error_ will be set to the corresponding error. With the recovery_error_ set, the auto resume thread created in StartRecoverFromRetryableBGIOError will keep failing as long as user keeps trying to write.

To fix this issue. All the NoSpace error (no matter retryable flag is set or not) will be redirect to the regular SetBGError, and RecoverFromNoSpace() will do the recovery job which calls SstFileManager::StartErrorRecovery().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8376

Test Plan: make check and added the new testing case

Reviewed By: anand1976

Differential Revision: D29071828

Pulled By: zhichao-cao

fbshipit-source-id: 7171d7e14cc4620fdab49b7eff7a2fe9a89942c2
2021-06-11 14:48:28 -07:00
Peter Dillinger
a42a342a7a Make platform009 default for FB developers (#8389)
Summary:
platform007 being phased out and sometimes broken

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8389

Test Plan: `make V=1` to see which compiler is being used

Reviewed By: jay-zhuang

Differential Revision: D29067183

Pulled By: pdillinger

fbshipit-source-id: d1b07267cbc55baa9395f2f4fe3967cc6dad52f7
2021-06-11 11:37:05 -07:00
mrambacher
6ad0810393 Make Comparator into a Customizable Object (#8336)
Summary:
Makes the Comparator class into a Customizable object.  Added/Updated the CreateFromString method to create Comparators.  Added test for using the ObjectRegistry to create one.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8336

Reviewed By: jay-zhuang

Differential Revision: D28999612

Pulled By: mrambacher

fbshipit-source-id: bff2cb2814eeb9fef6a00fddc61d6e34b6fbcf2e
2021-06-11 06:22:59 -07:00
Akanksha Mahajan
3897ce3125 Support for Merge in Integrated BlobDB with base values (#8292)
Summary:
This PR add support for Merge operation in Integrated BlobDB with base values(i.e DB::Put). Merged values can be retrieved through  DB::Get, DB::MultiGet, DB::GetMergeOperands and Iterator operation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8292

Test Plan: Add new unit tests

Reviewed By: ltamasi

Differential Revision: D28415896

Pulled By: akankshamahajan15

fbshipit-source-id: e9b3478bef51d2f214fb88c31ed3c8d2f4a531ff
2021-06-10 12:58:37 -07:00
Baptiste Lemaire
d61a449364 Fixed manifest_dump issues when printing keys and values containing null characters (#8378)
Summary:
Changed fprintf function to fputc in ApplyVersionEdit, and replaced null characters with whitespaces.
Added unit test in ldb_test.py - verifies that manifest_dump --verbose output is correct when keys and values containing null characters are inserted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8378

Reviewed By: pdillinger

Differential Revision: D29034584

Pulled By: bjlemaire

fbshipit-source-id: 50833687a8a5f726e247c38457eadc3e6dbab862
2021-06-10 12:55:20 -07:00
matthewvon
5a2b4ed671 BugFix: fs_posix.cc GetFreeSpace uses wrong value non-root users (#8370)
Summary:
fs_posix.cc GetFreeSpace() calculates free space based upon a call to statvfs().  However, there are two extremely different values in statvfs's returned structure:  f_bfree which is free space for root and f_bavail which is free space for non-root users.  The existing code uses f_bfree.  Many disks have 5 to 10% of the total disk space reserved for root only.  Therefore GetFreeSpace() does not realize that non-root users may not have storage available.

This PR detects whether the effective posix user is root or not, then selects the appropriate available space value.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8370

Reviewed By: mrambacher

Differential Revision: D29032710

Pulled By: jay-zhuang

fbshipit-source-id: 57feba34ed035615a479956d28f98d85735281c0
2021-06-10 11:11:54 -07:00
Zhichao Cao
f44e69c64a Use DbSessionId as cache key prefix when secondary cache is enabled (#8360)
Summary:
Currently, we either use the file system inode or a monotonically incrementing runtime ID as the block cache key prefix. However, if we use a monotonically incrementing runtime ID (in the case that the file system does not support inode id generation), in some cases, it cannot ensure uniqueness (e.g., we have secondary cache migrated from host to host). We use DbSessionID (20 bytes) + current file number (at most 10 bytes) as the new cache block key prefix when the secondary cache is enabled. So can accommodate scenarios such as transfer of cache state across hosts.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8360

Test Plan: add the test to lru_cache_test

Reviewed By: pdillinger

Differential Revision: D29006215

Pulled By: zhichao-cao

fbshipit-source-id: 6cff686b38d83904667a2bd39923cd030df16814
2021-06-10 11:02:43 -07:00
Levi Tamasi
db325a5904 Add a clipping internal iterator (#8327)
Summary:
Logically, subcompactions process a key range [start, end); however, the way
this is currently implemented is that the `CompactionIterator` for any given
subcompaction keeps processing key-values until it actually outputs a key that
is out of range, which is then discarded. Instead of doing this, the patch
introduces a new type of internal iterator called `ClippingIterator` which wraps
another internal iterator and "clips" its range of key-values so that any KVs
returned are strictly in the [start, end) interval. This does eliminate a (minor)
inefficiency by stopping processing in subcompactions exactly at the limit;
however, the main motivation is related to BlobDB: namely, we need this to be
able to measure the amount of garbage generated by a subcompaction
precisely and prevent off-by-one errors.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8327

Test Plan: `make check`

Reviewed By: siying

Differential Revision: D28761541

Pulled By: ltamasi

fbshipit-source-id: ee0e7229f04edabbc7bed5adb51771fbdc287f69
2021-06-09 15:41:16 -07:00
Peter Dillinger
2f93a3b809 Fix a major performance bug in 6.21 for cache entry stats (#8369)
Summary:
In final polishing of https://github.com/facebook/rocksdb/issues/8297 (after most manual testing), I
broke my own caching layer by sanitizing an input parameter with
std::min(0, x) instead of std::max(0, x). I resisted unit testing the
timing part of the result caching because historically, these test
are either flaky or difficult to write, and this was not a correctness
issue. This bug is essentially unnoticeable with a small number
of column families but can explode background work with a
large number of column families.

This change fixes the logical error, removes some unnecessary related
optimization, and adds mock time/sleeps to the unit test to ensure we
can cache hit within the age limit.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8369

Test Plan: added time testing logic to existing unit test

Reviewed By: ajkr

Differential Revision: D28950892

Pulled By: pdillinger

fbshipit-source-id: e79cd4ff3eec68fd0119d994f1ed468c38026c3b
2021-06-08 05:03:32 -07:00
David Devecsery
80a59a03a7 Cancel compact range (#8351)
Summary:
Added the ability to cancel an in-progress range compaction by storing to an atomic "canceled" variable pointed to within the CompactRangeOptions structure.

Tested via two tests added to db_tests2.cc.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8351

Reviewed By: ajkr

Differential Revision: D28808894

Pulled By: ddevec

fbshipit-source-id: cb321361c9e23b084b188bb203f11c375a22c2dd
2021-06-07 11:41:31 -07:00
Stepan Koltsov
707f8d168a Modify script which generates TARGETS (#8366)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8366

Test Plan: Run it, `TARGETS` now unchanged.

Reviewed By: jay-zhuang

Differential Revision: D28914138

Pulled By: stepancheg

fbshipit-source-id: 04d24cdf1439edf4204a3ba1f646e9e75a00d92b
2021-06-04 16:28:59 -07:00
Stiopa Koltsov
4d5b575563 Enable Starlark for fbcode//i*
Summary: #forcetdhashing

Reviewed By: ndmitchell

Differential Revision: D28873060

fbshipit-source-id: 7d3be3e7d38619ec5b0b117f462ca1b9f427aa94
2021-06-04 13:19:01 -07:00
Andrew Kryczka
9167ece586 Snapshot release triggered compaction without multiple tombstones (#8357)
Summary:
This is a duplicate of https://github.com/facebook/rocksdb/issues/4948 by mzhaom to fix tests after rebase.

This change is a follow-up to https://github.com/facebook/rocksdb/issues/4927, which made this possible by allowing tombstone dropping/seqnum zeroing optimizations on the last key in the compaction. Now the `largest_seqno != 0` condition suffices to prevent snapshot release triggered compaction from entering an infinite loop.

The issues caused by the extraneous condition `level_and_file.second->num_deletions > 1` are:

- files could have `largest_seqno > 0` forever making it impossible to tell they cannot contain any covering keys
- it doesn't trigger compaction when there are many overwritten keys. Some MyRocks use case actually doesn't use Delete but instead calls Put with empty value to "delete" keys, so we'd like to be able to trigger compaction in this case too.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8357

Test Plan: - make check

Reviewed By: jay-zhuang

Differential Revision: D28855340

Pulled By: ajkr

fbshipit-source-id: a261b51eecafec492499e6d01e8e43112f801798
2021-06-04 00:21:40 -07:00
anand76
799cf37cb1 Update HISTORY and version to 6.21 (#8363)
Summary:
Update HISTORY and version to 6.21 on master.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8363

Reviewed By: jay-zhuang

Differential Revision: D28888818

Pulled By: anand1976

fbshipit-source-id: 9e5fac3b99ecc9f3b7d9f21474a39fa50decb117
2021-06-03 19:32:14 -07:00
PiyushDatta
2655477c67 Fix "Interval WAL" bytes to say GB instead of MB (#8350)
Summary:
Reference: https://github.com/facebook/rocksdb/issues/7201

Before fix:
`/tmp/rocksdb_test_file/LOG.old.1622492586055679:Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 MB, 0.00 MB/s`

After fix:
`/tmp/rocksdb_test_file/LOG:Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, written: 0.00 GB, 0.00 MB/s`

Tests:
```
Computer:jobs running/jobs completed/%of started jobs/Average seconds to complete
ETA: 0s Left: 0 AVG: 0.05s  local:0/7720/100%/0.0s
rm -rf /dev/shm/rocksdb.CLRh
/usr/bin/python3 tools/check_all_python.py
No syntax errors in 34 .py files
/usr/bin/python3 tools/ldb_test.py
Running testCheckConsistency...
.Running testColumnFamilies...
.Running testCountDelimDump...
.Running testCountDelimIDump...
.Running testDumpLiveFiles...
.Running testDumpLoad...
Warning: 7 bad lines ignored.
.Running testGetProperty...
.Running testHexPutGet...
.Running testIDumpBasics...
.Running testIngestExternalSst...
.Running testInvalidCmdLines...
.Running testListColumnFamilies...
.Running testManifestDump...
.Running testMiscAdminTask...
Sequence,Count,ByteSize,Physical Offset,Key(s)
.Running testSSTDump...
.Running testSimpleStringPutGet...
.Running testStringBatchPut...
.Running testTtlPutGet...
.Running testWALDump...
.
----------------------------------------------------------------------
Ran 19 tests in 15.945s

OK
sh tools/rocksdb_dump_test.sh
make check-format
make[1]: Entering directory '/home/piydatta/Documents/rocksdb'
$DEBUG_LEVEL is 1
Makefile:176: Warning: Compiling in debug mode. Don't use the resulting binary in production
build_tools/format-diff.sh -c
Checking format of uncommitted changes...
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8350

Reviewed By: jay-zhuang

Differential Revision: D28790567

Pulled By: zhichao-cao

fbshipit-source-id: dcb1e4c124361156435122f21f0a288335b2c8c8
2021-06-01 15:19:21 -07:00
Jay Zhuang
eda83eaac0 Fix cmake build failure with gflags (#8324)
Summary:
- Fix cmake build failure with gflags.
- Add CI tests for both gflags 2.1 and 2.2.
- Fix ctest config with gtest.
- Add CI to run test with ctest.

One benefit of ctest is it support timeout, it's set to 5min in our CI, so we will know which test is hang.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8324

Test Plan: CI pass

Reviewed By: ajkr

Differential Revision: D28762517

Pulled By: jay-zhuang

fbshipit-source-id: 09063c5af5f9f33abfcdeb48593acbd9826cd199
2021-06-01 14:43:15 -07:00
sdong
ab718b415f Kill whitebox crash test if it is 15 minutes over the limit (#8341)
Summary:
Whitebox crash test can run significantly over the time limit for test slowness or no kiling points. This indefinite job can create problem when this test is periodically scheduled as a job. Instead, kill the job if it is 15 minutes over the limit.
Refactor the code slightly to consolidate the code for executing commands for white and black box tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8341

Test Plan: Run both of black and white box tests with both of natual and explicit kill condition.

Reviewed By: jay-zhuang

Differential Revision: D28756170

fbshipit-source-id: f253149890e62ace78f871be927e093e9b12f49b
2021-06-01 09:34:53 -07:00
Andrew Kryczka
d561af487c Preset dictionary compression blog post (#8342)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8342

Reviewed By: ramvadiv

Differential Revision: D28762140

Pulled By: ajkr

fbshipit-source-id: c66ca865f5136d6ad321d0f54a62cbf46d9251ba
2021-05-31 21:31:13 -07:00
anand76
9e701b48e0 Update graphs and link in the secondary cache blog post (#8348)
Summary:
Update graphs to remove FB specific terms such as WSF, and update link to the Github issue in the secondary cache blog post.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8348

Reviewed By: ramvadiv

Differential Revision: D28773858

Pulled By: anand1976

fbshipit-source-id: 86281d5c6928550d68d5aa66aae39a41a41f928f
2021-05-31 19:10:07 -07:00
sdong
1c88f66ff8 Add a new blog post for online validation (#8338)
Summary:
A new blog post to introduce recent development related to online validation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8338

Test Plan: Local test with "bundle exec jekyll serve"

Reviewed By: ltamasi

Differential Revision: D28757134

fbshipit-source-id: 42268e1af8dc0c6a42ae62ea61568409b7ce10e4
2021-05-27 13:26:32 -07:00
sdong
cda7923169 Use bloom filter to speed up sync point (#8337)
Summary:
Now SyncPoint is used in crash test but can signiciantly slow down the run. Add a bloom filter before each process to speed itup

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8337

Test Plan: Run all existing tests

Reviewed By: ajkr

Differential Revision: D28730282

fbshipit-source-id: a187377a9d47877a36c5649e4b1f67d5e3033238
2021-05-27 13:14:29 -07:00
anand76
b53e3d2adb Blog post about SecondaryCache (#8339)
Summary:
Blog post about SecondaryCache

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8339

Reviewed By: zhichao-cao

Differential Revision: D28753501

Pulled By: anand1976

fbshipit-source-id: d3241b746a9266fb523e13ad45fd0288083f7470
2021-05-27 12:16:12 -07:00
Peter (Stig) Edwards
c75ef03e58 Do not truncate WAL if in read_only mode (#8313)
Summary:
I noticed ```openat``` system call with ```O_WRONLY``` flag and ```sync_file_range``` and ```truncate``` on WAL file when using ```rocksdb::DB::OpenForReadOnly``` by way of ```db_bench --readonly=true --benchmarks=readseq --use_existing_db=1 --num=1 ...```

Noticed in ```strace``` after seeing the last modification time of the WAL file change after each run (with ```--readonly=true```).

  I think introduced by 7d7f14480e from https://github.com/facebook/rocksdb/pull/8122

I added a test to catch the WAL file being truncated and the modification time on it changing.
I am not sure if a mock filesystem with mock clock could be used to avoid having to sleep 1.1s.
The test could also check the set of files is the same and that the sizes are also unchanged.

Before:

```
[ RUN      ] DBBasicTest.ReadOnlyReopenMtimeUnchanged
db/db_basic_test.cc:182: Failure
Expected equality of these values:
  file_mtime_after_readonly_reopen
    Which is: 1621611136
  file_mtime_before_readonly_reopen
    Which is: 1621611135
  file is: 000010.log
[  FAILED  ] DBBasicTest.ReadOnlyReopenMtimeUnchanged (1108 ms)
```

After:

```
[ RUN      ] DBBasicTest.ReadOnlyReopenMtimeUnchanged
[       OK ] DBBasicTest.ReadOnlyReopenMtimeUnchanged (1108 ms)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8313

Reviewed By: pdillinger

Differential Revision: D28656925

Pulled By: jay-zhuang

fbshipit-source-id: ea9e215cb53e7c830e76bc5fc75c45e21f12a1d6
2021-05-27 10:27:55 -07:00
sdong
dfa6b408fe Improve comments of iterate_upper_bound (#8331)
Summary:
ReadOptions.iterate_upper_bound's comment is confusing. Improve it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8331

Reviewed By: jay-zhuang

Differential Revision: D28696635

fbshipit-source-id: 7d9fa6fd1642562572140998c89d434058db8dda
2021-05-26 18:23:03 -07:00
Levi Tamasi
886774eabf Add blog post about the new BlobDB implementation (#8335)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8335

Reviewed By: ramvadiv

Differential Revision: D28715167

Pulled By: ltamasi

fbshipit-source-id: 1816196664b0d31aed0b9002df426579441da3f1
2021-05-26 13:23:28 -07:00
Peter Dillinger
956ce9bde2 Some API clarification for manual compaction and listeners (#8330)
Summary:
Avoid people hitting bugs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8330

Test Plan: comments only

Reviewed By: siying

Differential Revision: D28683157

Pulled By: pdillinger

fbshipit-source-id: 2b34d3efb5e2fa34bea93d54c940cbd425212d25
2021-05-26 08:14:38 -07:00
sdong
a607b88240 SequenceIterWrapper should use internal comparator (#8328)
Summary:
https://github.com/facebook/rocksdb/pull/8288 introduces a bug: SequenceIterWrapper should do next for seek key using internal key comparator rather than user comparator. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8328

Test Plan: Pass all existing tests

Reviewed By: ltamasi

Differential Revision: D28647263

fbshipit-source-id: 4081d684fd8a86d248c485ef8a1563c7af136447
2021-05-24 12:46:38 -07:00
Zhichao Cao
a4405fd981 fix lru caching test and fix reference binding to null pointer (#8326)
Summary:
Fix for https://github.com/facebook/rocksdb/issues/8315. Inhe lru caching test, 5100 is not enough to hold meta block and first block in some random case, increase to 6100. Fix the reference binding to null pointer, use template.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8326

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D28625666

Pulled By: zhichao-cao

fbshipit-source-id: 97b85306ae3d09bfb74addc7c65e57fe55a976a5
2021-05-24 08:37:00 -07:00
Jay Zhuang
55853de661 Fix clang-analyze: use uninitiated variable (#8325)
Summary:
Error:
```
db/db_compaction_test.cc:5211:47: warning: The left operand of '*' is a garbage value
uint64_t total = (l1_avg_size + l2_avg_size * 10) * 10;
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8325

Test Plan: `$ make analyze`

Reviewed By: pdillinger

Differential Revision: D28620916

Pulled By: jay-zhuang

fbshipit-source-id: f6d58ab84eefbcc905cda45afb9522b0c6d230f8
2021-05-21 19:06:47 -07:00
Zhichao Cao
7303d02bdf Use new Insert and Lookup APIs in table reader to support secondary cache (#8315)
Summary:
Secondary cache is implemented to achieve the secondary cache tier for block cache. New Insert and Lookup APIs are introduced in https://github.com/facebook/rocksdb/issues/8271  . To support and use the secondary cache in block based table reader, this PR introduces the corresponding callback functions that will be used in secondary cache, and update the Insert and Lookup APIs accordingly.

benchmarking:
./db_bench --benchmarks="fillrandom" -num=1000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/tmp/rocks_t/db -partition_index_and_filters=true

./db_bench -db=/tmp/rocks_t/db -use_existing_db=true -benchmarks=readrandom -num=1000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=5 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -stats_dump_period_sec=30 -reads=50000000

master benchmarking results:
readrandom   :       3.923 micros/op 254881 ops/sec;   33.4 MB/s (23849796 of 50000000 found)
rocksdb.db.get.micros P50 : 2.820992 P95 : 5.636716 P99 : 16.450553 P100 : 8396.000000 COUNT : 50000000 SUM : 179947064

Current PR benchmarking results
readrandom   :       4.083 micros/op 244925 ops/sec;   32.1 MB/s (23849796 of 50000000 found)
rocksdb.db.get.micros P50 : 2.967687 P95 : 5.754916 P99 : 15.665912 P100 : 8213.000000 COUNT : 50000000 SUM : 187250053

About 3.8% throughput reduction.
P50: 5.2% increasing, P95, 2.09% increasing, P99 4.77% improvement

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8315

Test Plan: added the testing case

Reviewed By: anand1976

Differential Revision: D28599774

Pulled By: zhichao-cao

fbshipit-source-id: 098c4df0d7327d3a546df7604b2f1602f13044ed
2021-05-21 18:29:12 -07:00
Jay Zhuang
6c7c3e8cb3 Use large macos instance (#8320)
Summary:
Macos build is taking more than 1 hour, bump the instance type from the
default medium to large (large macos instance was not available before).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8320

Test Plan: watch CI pass

Reviewed By: ajkr

Differential Revision: D28589456

Pulled By: jay-zhuang

fbshipit-source-id: cff78dae5aaf9de90ade3468469290176de5ff32
2021-05-21 18:17:03 -07:00
Peter Dillinger
3469d60fcc Add table properties for number of entries added to filters (#8323)
Summary:
With Ribbon filter work and possible variance in actual bits
per key (or prefix; general term "entry") to achieve certain FP rates,
I've received a request to be able to track actual bits per key in
generated filters. This change adds a num_filter_entries table
property, which can be combined with filter_size to get bits per key
(entry).

This can vary from num_entries in at least these ways:
* Different versions of same key are only counted once in filters.
* With prefix filters, several user keys map to the same filter entry.
* A single filter can include both prefixes and user keys.

Note that FilterBlockBuilder::NumAdded() didn't do anything useful
except distinguish empty from non-empty.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8323

Test Plan: basic unit test included, others updated

Reviewed By: jay-zhuang

Differential Revision: D28596210

Pulled By: pdillinger

fbshipit-source-id: 529a111f3c84501e5a470bc84705e436ee68c376
2021-05-21 17:11:32 -07:00
Jay Zhuang
6c86543590 Fix manual compaction max_compaction_bytes under-calculated issue (#8269)
Summary:
Fix a bug that for manual compaction, `max_compaction_bytes` is only
limit the SST files from input level, but not overlapped files on output
level.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8269

Test Plan: `make check`

Reviewed By: ajkr

Differential Revision: D28231044

Pulled By: jay-zhuang

fbshipit-source-id: 9d7d03004f30cc4b1b9819830141436907554b7c
2021-05-21 14:03:44 -07:00
sdong
bd3d080ef8 Try to build with liburing by default. (#8322)
Summary:
By default, try to build with liburing. For make, if ROCKSDB_USE_IO_URING is not set, treat as 1, which means RocksDB will try to build with liburing. For cmake, add WITH_LIBURING to control it, with default on.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8322

Test Plan: Build using cmake and make.

Reviewed By: anand1976

Differential Revision: D28586498

fbshipit-source-id: cfd39159ab697f4b93a9293a59c07f839b1e7ed5
2021-05-21 10:21:53 -07:00
sdong
2f1984dd45 Compare memtable insert and flush count (#8288)
Summary:
When a memtable is flushed, it will validate number of entries it reads, and compare the number with how many entries inserted into memtable. This serves as one sanity c\
heck against memory corruption. This change will also allow more counters to be added in the future for better validation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8288

Test Plan: Pass all existing tests

Reviewed By: ajkr

Differential Revision: D28369194

fbshipit-source-id: 7ff870380c41eab7f99eee508550dcdce32838ad
2021-05-20 16:07:28 -07:00
Jay Zhuang
94b4faa0f1 Deflake ExternalSSTFileTest.PickedLevelBug (#8307)
Summary:
The test want to make sure these's no compaction during `AddFile`
(between `DBImpl::AddFile:MutexLock` and `DBImpl::AddFile:MutexUnlock`)
but the mutex could be unlocked by `EnterUnbatched()`.
Move the lock start point after bumping the ingest file number.

Also fix the dead lock when ASSERT fails.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8307

Reviewed By: ajkr

Differential Revision: D28479849

Pulled By: jay-zhuang

fbshipit-source-id: b3c50f66aa5d5f59c5c27f815bfea189c4cd06cb
2021-05-20 09:29:57 -07:00
dependabot[bot]
f76326e370 Bump nokogiri from 1.11.1 to 1.11.4 in /docs (#8318)
Summary:
Bumps [nokogiri](https://github.com/sparklemotion/nokogiri) from 1.11.1 to 1.11.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/releases">nokogiri's releases</a>.</em></p>
<blockquote>
<h2>1.11.4 / 2021-05-14</h2>
<h3>Security</h3>
<p>[CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:</p>
<ul>
<li><a href="https://security.archlinux.org/CVE-2019-20388">CVE-2019-20388</a></li>
<li><a href="https://security.archlinux.org/CVE-2020-24977">CVE-2020-24977</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3517">CVE-2021-3517</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3518">CVE-2021-3518</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3537">CVE-2021-3537</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3541">CVE-2021-3541</a></li>
</ul>
<p>Note that two additional CVEs were addressed upstream but are not relevant to this release. <a href="https://security.archlinux.org/CVE-2021-3516">CVE-2021-3516</a> via <code>xmllint</code> is not present in Nokogiri, and <a href="https://security.archlinux.org/CVE-2020-7595">CVE-2020-7595</a> has been patched in Nokogiri since v1.10.8 (see <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1992">https://github.com/facebook/rocksdb/issues/1992</a>).</p>
<p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-7rrm-v45f-jp64">nokogiri/GHSA-7rrm-v45f-jp64 </a> or <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2233">https://github.com/facebook/rocksdb/issues/2233</a> for a more complete analysis of these CVEs and patches.</p>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. (Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)</li>
</ul>
<h2>1.11.3 / 2021-04-07</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this likely segfaulted. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1900">https://github.com/facebook/rocksdb/issues/1900</a>]</li>
<li>[JRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this raised a <code>TypeError</code> exception.</li>
<li>[CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience)</li>
</ul>
<h2>1.11.2 / 2021-03-11</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] <code>NodeSet</code> may now safely contain <code>Node</code> objects from multiple documents. Previously the GC lifecycle of the parent <code>Document</code> objects could lead to nodes being GCed while still in scope. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1952#issuecomment-770856928">https://github.com/facebook/rocksdb/issues/1952</a>]</li>
<li>[CRuby] Patch libxml2 to avoid &quot;huge input lookup&quot; errors on large CDATA elements. (See upstream <a href="https://gitlab.gnome.org/GNOME/libxml2/-/issues/200">GNOME/libxml2#200</a> and <a href="https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/100">GNOME/libxml2!100</a>.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2132">https://github.com/facebook/rocksdb/issues/2132</a>].</li>
<li>[CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against <code>nokogiri.so</code> by including <code>LDFLAGS</code> in <code>Nokogiri::VERSION_INFO</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2167">https://github.com/facebook/rocksdb/issues/2167</a>]</li>
<li>[CRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was invoked twice on each object.</li>
<li>[JRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was not called, which was a problem for subclassing such as done by <code>Loofah</code>.</li>
</ul>
<h3>Improved</h3>
<ul>
<li>Reduce the number of object allocations needed when parsing an HTML::DocumentFragment. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2087">https://github.com/facebook/rocksdb/issues/2087</a>] (Thanks, <a href="https://github.com/ashmaroli"><code>@​ashmaroli</code></a>!)</li>
<li>[JRuby] Update the algorithm used to calculate <code>Node#line</code> to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1223">https://github.com/facebook/rocksdb/issues/1223</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2177">https://github.com/facebook/rocksdb/issues/2177</a>]</li>
<li>Introduce <code>--enable-system-libraries</code> and <code>--disable-system-libraries</code> flags to <code>extconf.rb</code>. These flags provide the same functionality as <code>--use-system-libraries</code> and the <code>NOKOGIRI_USE_SYSTEM_LIBRARIES</code> environment variable, but are more idiomatic. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li>
<li>[TruffleRuby] <code>--disable-static</code> is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2191#issuecomment-780724627">https://github.com/facebook/rocksdb/issues/2191</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li>
</ul>

</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a href="https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md">nokogiri's changelog</a>.</em></p>
<blockquote>
<h2>1.11.4 / 2021-05-14</h2>
<h3>Security</h3>
<p>[CRuby] Vendored libxml2 upgraded to v2.9.12 which addresses:</p>
<ul>
<li><a href="https://security.archlinux.org/CVE-2019-20388">CVE-2019-20388</a></li>
<li><a href="https://security.archlinux.org/CVE-2020-24977">CVE-2020-24977</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3517">CVE-2021-3517</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3518">CVE-2021-3518</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3537">CVE-2021-3537</a></li>
<li><a href="https://security.archlinux.org/CVE-2021-3541">CVE-2021-3541</a></li>
</ul>
<p>Note that two additional CVEs were addressed upstream but are not relevant to this release. <a href="https://security.archlinux.org/CVE-2021-3516">CVE-2021-3516</a> via <code>xmllint</code> is not present in Nokogiri, and <a href="https://security.archlinux.org/CVE-2020-7595">CVE-2020-7595</a> has been patched in Nokogiri since v1.10.8 (see <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1992">https://github.com/facebook/rocksdb/issues/1992</a>).</p>
<p>Please see <a href="https://github.com/sparklemotion/nokogiri/security/advisories/GHSA-7rrm-v45f-jp64">nokogiri/GHSA-7rrm-v45f-jp64 </a> or <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2233">https://github.com/facebook/rocksdb/issues/2233</a> for a more complete analysis of these CVEs and patches.</p>
<h3>Dependencies</h3>
<ul>
<li>[CRuby] vendored libxml2 is updated from 2.9.10 to 2.9.12. (Note that 2.9.11 was skipped because it was superseded by 2.9.12 a few hours after its release.)</li>
</ul>
<h2>1.11.3 / 2021-04-07</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this likely segfaulted. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1900">https://github.com/facebook/rocksdb/issues/1900</a>]</li>
<li>[JRuby] Passing non-<code>Node</code> objects to <code>Document#root=</code> now raises an <code>ArgumentError</code> exception. Previously this raised a <code>TypeError</code> exception.</li>
<li>[CRuby] arm64/aarch64 systems (like Apple's M1) can now compile libxml2 and libxslt from source (though we continue to strongly advise users to install the native gems for the best possible experience)</li>
</ul>
<h2>1.11.2 / 2021-03-11</h2>
<h3>Fixed</h3>
<ul>
<li>[CRuby] <code>NodeSet</code> may now safely contain <code>Node</code> objects from multiple documents. Previously the GC lifecycle of the parent <code>Document</code> objects could lead to nodes being GCed while still in scope. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1952#issuecomment-770856928">https://github.com/facebook/rocksdb/issues/1952</a>]</li>
<li>[CRuby] Patch libxml2 to avoid &quot;huge input lookup&quot; errors on large CDATA elements. (See upstream <a href="https://gitlab.gnome.org/GNOME/libxml2/-/issues/200">GNOME/libxml2#200</a> and <a href="https://gitlab.gnome.org/GNOME/libxml2/-/merge_requests/100">GNOME/libxml2!100</a>.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2132">https://github.com/facebook/rocksdb/issues/2132</a>].</li>
<li>[CRuby+Windows] Enable Nokogumbo (and other downstream gems) to compile and link against <code>nokogiri.so</code> by including <code>LDFLAGS</code> in <code>Nokogiri::VERSION_INFO</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2167">https://github.com/facebook/rocksdb/issues/2167</a>]</li>
<li>[CRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was invoked twice on each object.</li>
<li>[JRuby] <code>{XML,HTML}::Document.parse</code> now invokes <code>#initialize</code> exactly once. Previously <code>#initialize</code> was not called, which was a problem for subclassing such as done by <code>Loofah</code>.</li>
</ul>
<h3>Improved</h3>
<ul>
<li>Reduce the number of object allocations needed when parsing an <code>HTML::DocumentFragment</code>. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2087">https://github.com/facebook/rocksdb/issues/2087</a>] (Thanks, <a href="https://github.com/ashmaroli"><code>@​ashmaroli</code></a>!)</li>
<li>[JRuby] Update the algorithm used to calculate <code>Node#line</code> to be wrong less-often. The underlying parser, Xerces, does not track line numbers, and so we've always used a hacky solution for this method. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/1223">https://github.com/facebook/rocksdb/issues/1223</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2177">https://github.com/facebook/rocksdb/issues/2177</a>]</li>
<li>Introduce <code>--enable-system-libraries</code> and <code>--disable-system-libraries</code> flags to <code>extconf.rb</code>. These flags provide the same functionality as <code>--use-system-libraries</code> and the <code>NOKOGIRI_USE_SYSTEM_LIBRARIES</code> environment variable, but are more idiomatic. [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li>
<li>[TruffleRuby] <code>--disable-static</code> is now the default on TruffleRuby when the packaged libraries are used. This is more flexible and compiles faster. (Note, though, that the default on TR is still to use system libraries.) [<a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2191#issuecomment-780724627">https://github.com/facebook/rocksdb/issues/2191</a>, <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2193">https://github.com/facebook/rocksdb/issues/2193</a>] (Thanks, <a href="https://github.com/eregon"><code>@​eregon</code></a>!)</li>
</ul>

</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a href="9d69b44ed3"><code>9d69b44</code></a> version bump to v1.11.4</li>
<li><a href="058e87fdfd"><code>058e87f</code></a> update CHANGELOG with complete CVE information</li>
<li><a href="92852514a0"><code>9285251</code></a> Merge pull request <a href="https://github-redirect.dependabot.com/sparklemotion/nokogiri/issues/2234">https://github.com/facebook/rocksdb/issues/2234</a> from sparklemotion/2233-upgrade-to-libxml-2-9-12</li>
<li><a href="5436f6120f"><code>5436f61</code></a> update CHANGELOG</li>
<li><a href="761d320af2"><code>761d320</code></a> patch: renumber libxml2 patches</li>
<li><a href="889ee2a9cb"><code>889ee2a</code></a> test: update behavior of namespaces in HTML</li>
<li><a href="9751d852c0"><code>9751d85</code></a> test: remove low-value HTML::SAX::PushParser encoding test</li>
<li><a href="9fcb7d25ea"><code>9fcb7d2</code></a> test: adjust xpath gc test to libxml2's max recursion depth</li>
<li><a href="1c99019f5f"><code>1c99019</code></a> patch: backport libxslt configure.ac change for libxml2 config</li>
<li><a href="82a253fe7c"><code>82a253f</code></a> patch: fix isnan/isinf patch to apply cleanly to libxml 2.9.12</li>
<li>Additional commits viewable in <a href="https://github.com/sparklemotion/nokogiri/compare/v1.11.1...v1.11.4">compare view</a></li>
</ul>
</details>
<br />

[![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=nokogiri&package-manager=bundler&previous-version=1.11.1&new-version=1.11.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

 ---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `dependabot rebase` will rebase this PR
- `dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `dependabot merge` will merge this PR after your CI passes on it
- `dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `dependabot cancel merge` will cancel a previously requested merge and block automerging
- `dependabot reopen` will reopen this PR if it is closed
- `dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
- `dependabot use these labels` will set the current labels as the default for future PRs for this repo and language
- `dependabot use these reviewers` will set the current reviewers as the default for future PRs for this repo and language
- `dependabot use these assignees` will set the current assignees as the default for future PRs for this repo and language
- `dependabot use this milestone` will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/facebook/rocksdb/network/alerts).

</details>

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8318

Reviewed By: pdillinger

Differential Revision: D28541823

Pulled By: jay-zhuang

fbshipit-source-id: e431517d1dcd4a19b358b3a98b1578539158e1fe
2021-05-20 08:39:28 -07:00
Jay Zhuang
3786181a90 Add remote compaction public API (#8300)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8300

Reviewed By: ajkr

Differential Revision: D28464726

Pulled By: jay-zhuang

fbshipit-source-id: 49e9f4fb791808a6cbf39a7b1a331373f645fc5e
2021-05-19 21:41:31 -07:00
Peter Dillinger
311a544c2a Use deleters to label cache entries and collect stats (#8297)
Summary:
This change gathers and publishes statistics about the
kinds of items in block cache. This is especially important for
profiling relative usage of cache by index vs. filter vs. data blocks.
It works by iterating over the cache during periodic stats dump
(InternalStats, stats_dump_period_sec) or on demand when
DB::Get(Map)Property(kBlockCacheEntryStats), except that for
efficiency and sharing among column families, saved data from
the last scan is used when the data is not considered too old.

The new information can be seen in info LOG, for example:

    Block cache LRUCache@0x7fca62229330 capacity: 95.37 MB collections: 8 last_copies: 0 last_secs: 0.00178 secs_since: 0
    Block cache entry stats(count,size,portion): DataBlock(7092,28.24 MB,29.6136%) FilterBlock(215,867.90 KB,0.888728%) FilterMetaBlock(2,5.31 KB,0.00544%) IndexBlock(217,180.11 KB,0.184432%) WriteBuffer(1,256.00 KB,0.262144%) Misc(1,0.00 KB,0%)

And also through DB::GetProperty and GetMapProperty (here using
ldb just for demonstration):

    $ ./ldb --db=/dev/shm/dbbench/ get_property rocksdb.block-cache-entry-stats
    rocksdb.block-cache-entry-stats.bytes.data-block: 0
    rocksdb.block-cache-entry-stats.bytes.deprecated-filter-block: 0
    rocksdb.block-cache-entry-stats.bytes.filter-block: 0
    rocksdb.block-cache-entry-stats.bytes.filter-meta-block: 0
    rocksdb.block-cache-entry-stats.bytes.index-block: 178992
    rocksdb.block-cache-entry-stats.bytes.misc: 0
    rocksdb.block-cache-entry-stats.bytes.other-block: 0
    rocksdb.block-cache-entry-stats.bytes.write-buffer: 0
    rocksdb.block-cache-entry-stats.capacity: 8388608
    rocksdb.block-cache-entry-stats.count.data-block: 0
    rocksdb.block-cache-entry-stats.count.deprecated-filter-block: 0
    rocksdb.block-cache-entry-stats.count.filter-block: 0
    rocksdb.block-cache-entry-stats.count.filter-meta-block: 0
    rocksdb.block-cache-entry-stats.count.index-block: 215
    rocksdb.block-cache-entry-stats.count.misc: 1
    rocksdb.block-cache-entry-stats.count.other-block: 0
    rocksdb.block-cache-entry-stats.count.write-buffer: 0
    rocksdb.block-cache-entry-stats.id: LRUCache@0x7f3636661290
    rocksdb.block-cache-entry-stats.percent.data-block: 0.000000
    rocksdb.block-cache-entry-stats.percent.deprecated-filter-block: 0.000000
    rocksdb.block-cache-entry-stats.percent.filter-block: 0.000000
    rocksdb.block-cache-entry-stats.percent.filter-meta-block: 0.000000
    rocksdb.block-cache-entry-stats.percent.index-block: 2.133751
    rocksdb.block-cache-entry-stats.percent.misc: 0.000000
    rocksdb.block-cache-entry-stats.percent.other-block: 0.000000
    rocksdb.block-cache-entry-stats.percent.write-buffer: 0.000000
    rocksdb.block-cache-entry-stats.secs_for_last_collection: 0.000052
    rocksdb.block-cache-entry-stats.secs_since_last_collection: 0

Solution detail - We need some way to flag what kind of blocks each
entry belongs to, preferably without changing the Cache API.
One of the complications is that Cache is a general interface that could
have other users that don't adhere to whichever convention we decide
on for keys and values. Or we would pay for an extra field in the Handle
that would only be used for this purpose.

This change uses a back-door approach, the deleter, to indicate the
"role" of a Cache entry (in addition to the value type, implicitly).
This has the added benefit of ensuring proper code origin whenever we
recognize a particular role for a cache entry; if the entry came from
some other part of the code, it will use an unrecognized deleter, which
we simply attribute to the "Misc" role.

An internal API makes for simple instantiation and automatic
registration of Cache deleters for a given value type and "role".

Another internal API, CacheEntryStatsCollector, solves the problem of
caching the results of a scan and sharing them, to ensure scans are
neither excessive nor redundant so as not to harm Cache performance.

Because code is added to BlocklikeTraits, it is pulled out of
block_based_table_reader.cc into its own file.

This is a reformulation of https://github.com/facebook/rocksdb/issues/8276, without the type checking option
(could still be added), and with actual stat gathering.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8297

Test Plan: manual testing with db_bench, and a couple of basic unit tests

Reviewed By: ltamasi

Differential Revision: D28488721

Pulled By: pdillinger

fbshipit-source-id: 472f524a9691b5afb107934be2d41d84f2b129fb
2021-05-19 16:51:13 -07:00
Glebanister
748e3acc11 Add StartThread type checking wrapper (#8303)
Summary:
- Add class `FunctorWrapper` to invoke the function with given parameters
- Implement `StartThreadTyped` which wraps `StartThread` with type checking cover
- Demonstrate `StartThreadTyped` in test `util/thread_local_test.cc`

https://github.com/facebook/rocksdb/issues/8285

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8303

Reviewed By: ajkr

Differential Revision: D28539318

Pulled By: pdillinger

fbshipit-source-id: 624789c236bde31163deda95c1e1471aee68933e
2021-05-19 16:51:13 -07:00
anand76
13232e11d4 Allow cache_bench/db_bench to use a custom secondary cache (#8312)
Summary:
This PR adds a ```-secondary_cache_uri``` option to the cache_bench and db_bench tools to allow the user to specify a custom secondary cache URI. The object registry is used to create an instance of the ```SecondaryCache``` object of the type specified in the URI.

The main cache_bench code is packaged into a separate library, similar to db_bench.

An example invocation of db_bench with a secondary cache URI -
```db_bench --env_uri=ws://ws.flash_sandbox.vll1_2/ -db=anand/nvm_cache_2 -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=67108864 -cache_index_and_filter_blocks=true  -secondary_cache_uri='cachelibwrapper://filename=/home/anand76/nvm_cache/cache_file;size=2147483648;regionSize=16777216;admPolicy=random;admProbability=1.0;volatileSize=8388608;bktPower=20;lockPower=12' -partition_index_and_filters=true -duration=1800```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8312

Reviewed By: zhichao-cao

Differential Revision: D28544325

Pulled By: anand1976

fbshipit-source-id: 8f209b9af900c459dc42daa7a610d5f00176eeed
2021-05-19 15:26:18 -07:00
sdong
871a2cb292 Fix test issue in new env_test tests (#8319)
Summary:
The two new tests added to env_test don't clear sync points, so if tests are run in continuous mode, rather than parallel mode, the next test will trigger previous sync point and fail. Fix it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8319

Test Plan: Run the tests in continuous mode which used to fail and see them passing.

Reviewed By: pdillinger

Differential Revision: D28542562

fbshipit-source-id: 4052d487635188fe68a2a9df4b03d97b23f96720
2021-05-19 10:59:02 -07:00
sdong
ce0fc71adf Minor improvements in env_test (#8317)
Summary:
Fix typo in comments in env_test and add PermitUncheckedError() to two statuses.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8317

Reviewed By: jay-zhuang

Differential Revision: D28525093

fbshipit-source-id: 7a1ed3e45b6f500b8d2ae19fa339c9368111e922
2021-05-19 10:28:08 -07:00
anand76
9d61a0856d Sync ingested files only if reopen is supported by the FS (#8296)
Summary:
Some file systems (especially distributed FS) do not support reopening a file for writing. The ExternalSstFileIngestionJob calls ReopenWritableFile in order to sync the ingested file, which typically makes sense only on a local file system with a page cache (i.e Posix). So this change tries to sync the ingested file only if ReopenWritableFile doesn't return Status::NotSupported().

Tests:
Add a new unit test in external_sst_file_basic_test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8296

Reviewed By: jay-zhuang

Differential Revision: D28420865

Pulled By: anand1976

fbshipit-source-id: 380e7f5ff95324997f7a59864a9ac96ebbd0100c
2021-05-18 19:33:55 -07:00
sdong
60e5af83c1 Handle return code by io_uring_submit_and_wait() and io_uring_wait_cqe() (#8311)
Summary:
Right now return codes by io_uring_submit_and_wait() and io_uring_wait_cqe() are not handled. It is not the good practice. Although these two functions are not supposed to return non-0 values in normal exeuction, people suspect that they might return non-0 value when an interruption happens, and the code might cause hanging.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8311

Test Plan: Make sure at least normal test cases still pass.

Reviewed By: anand1976

Differential Revision: D28500828

fbshipit-source-id: 8a76cea9cafbd041102e0b6a8eef9d0bfed7c211
2021-05-18 16:09:14 -07:00
mrambacher
6b0a22a4b0 Fix MultiGet with PinnableSlices and Merge for WBWI (#8299)
Summary:
The MultiGetFromBatchAndDB would fail if the PinnableSlice value being returned was pinned.  This could happen if the value was retrieved from the DB (not memtable) or potentially if the values were reused (and a previous iteration returned a slice that was pinned).

This change resets the pinnable value to clear it prior to attempting to use it, thereby eliminating the problem with the value already being pinned.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8299

Reviewed By: jay-zhuang

Differential Revision: D28455426

Pulled By: mrambacher

fbshipit-source-id: a34d7d983ec9b6bb4c8a2b4892f72858d43e6972
2021-05-18 14:35:47 -07:00
Stanislav Tkach
83d1a66598 Expose CompressionOptions::parallel_threads through C API (#8302)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8302

Reviewed By: jay-zhuang

Differential Revision: D28499262

Pulled By: ajkr

fbshipit-source-id: 7b17b79af871d874dfca76db9bca0d640a6cd854
2021-05-17 22:53:04 -07:00
Levi Tamasi
d83542ca83 Make it possible to apply only a subrange of table property collectors (#8298)
Summary:
This patch does two things:
1) Introduces some aliases in order to eliminate/prevent long-winded type names
w/r/t the internal table property collectors (see e.g.
`std::vector<std::unique_ptr<IntTblPropCollectorFactory>>`).
2) Makes it possible to apply only a subrange of table property collectors during
table building by turning `TableBuilderOptions::int_tbl_prop_collector_factories`
from a pointer to a `vector` into a range (i.e. a pair of iterators).

Rationale: I plan to introduce a BlobDB related table property collector, which
should only be applied during table creation if blob storage is enabled at the moment
(which can be changed dynamically). This change will make it possible to include/
exclude the BlobDB related collector as needed without having to introduce
a second `vector` of collectors in `ColumnFamilyData` with pretty much the same
contents.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8298

Test Plan: `make check`

Reviewed By: jay-zhuang

Differential Revision: D28430910

Pulled By: ltamasi

fbshipit-source-id: a81d28f2c59495865300f43deb2257d2e6977c8e
2021-05-17 18:28:39 -07:00
sdong
0ed8cb666d Write file temperature information to manifest (#8284)
Summary:
As a part of tiered storage, writing tempeature information to manifest is needed so that after DB recovery, RocksDB still has the tiering information, to implement some further necessary functionalities.

Also fix some issues in simulated hybrid FS.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8284

Test Plan: Add a new unit test to validate that the information is indeed written and read back.

Reviewed By: zhichao-cao

Differential Revision: D28335801

fbshipit-source-id: 56aeb2e6ea090be0200181dd968c8a7278037def
2021-05-17 15:15:23 -07:00
anand76
feb06e83b2 Initial support for secondary cache in LRUCache (#8271)
Summary:
Defined the abstract interface for a secondary cache in include/rocksdb/secondary_cache.h, and updated LRUCacheOptions to take a std::shared_ptr<SecondaryCache>. An item is initially inserted into the LRU (primary) cache. When it ages out and evicted from memory, its inserted into the secondary cache. On a LRU cache miss and successful lookup in the secondary cache, the item is promoted to the LRU cache. Only support synchronous lookup currently. The secondary cache would be used to implement a persistent (flash cache) or compressed cache.

Tests:
Results from cache_bench and db_bench don't show any regression due to these changes.

cache_bench results before and after this change -
Command
```./cache_bench -ops_per_thread=10000000 -threads=1```
Before
```Complete in 40.688 s; QPS = 245774```
```Complete in 40.486 s; QPS = 246996```
```Complete in 42.019 s; QPS = 237989```
After
```Complete in 40.672 s; QPS = 245869```
```Complete in 44.622 s; QPS = 224107```
```Complete in 42.445 s; QPS = 235599```

db_bench results before this change, and with this change + https://github.com/facebook/rocksdb/issues/8213 and https://github.com/facebook/rocksdb/issues/8191 -
Commands
```./db_bench  --benchmarks="fillseq,compact" -num=30000000 -key_size=32 -value_size=256 -use_direct_io_for_flush_and_compaction=true -db=/home/anand76/nvm_cache/db -partition_index_and_filters=true```

```./db_bench -db=/home/anand76/nvm_cache/db -use_existing_db=true -benchmarks=readrandom -num=30000000 -key_size=32 -value_size=256 -use_direct_reads=true -cache_size=1073741824 -cache_numshardbits=6 -cache_index_and_filter_blocks=true -read_random_exp_range=17 -statistics -partition_index_and_filters=true -threads=16 -duration=300```
Before
```
DB path: [/home/anand76/nvm_cache/db]
readrandom   :      80.702 micros/op 198104 ops/sec;   54.4 MB/s (3708999 of 3708999 found)
```
```
DB path: [/home/anand76/nvm_cache/db]
readrandom   :      87.124 micros/op 183625 ops/sec;   50.4 MB/s (3439999 of 3439999 found)
```
After
```
DB path: [/home/anand76/nvm_cache/db]
readrandom   :      77.653 micros/op 206025 ops/sec;   56.6 MB/s (3866999 of 3866999 found)
```
```
DB path: [/home/anand76/nvm_cache/db]
readrandom   :      84.962 micros/op 188299 ops/sec;   51.7 MB/s (3535999 of 3535999 found)
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8271

Reviewed By: zhichao-cao

Differential Revision: D28357511

Pulled By: anand1976

fbshipit-source-id: d1cfa236f00e649a18c53328be10a8062a4b6da2
2021-05-13 22:58:40 -07:00