Commit Graph

1058 Commits

Author SHA1 Message Date
Andrew Kryczka
ffc48b6cad Update HISTORY.md for #9009 (#9036)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9036

Reviewed By: zhichao-cao

Differential Revision: D31640901

Pulled By: ajkr

fbshipit-source-id: 0b1e6e36094a74bb7906af44e29ecbeaa258de58
2021-10-14 09:36:32 -07:00
Giuseppe Ottaviano
4bfd415e34 Fix sequence number bump logic in multi-CF SST ingestion (#9005)
Summary:
The code in `IngestExternalFiles()` that bumps the DB's sequence number
depending on what seqnos were assigned to the files has 3 bugs:

1) There is an assertion that the sequence number is increased in all the
affected column families, but this is unnecessary, it is fine if some files can
stick to a lower sequence number. It is very easy to hit the assertion: it is
sufficient to insert 2 files in 2 CFs, one which overlaps the CF and one that
doesn't (for example the CF is empty). The line added in the
`IngestFilesIntoMultipleColumnFamilies_Success` test makes the assertion fail.

2) SetLastSequence() is called with the sum of all the bumps across CFs, but we
should take the maximum instead, as all CFs start with the current seqno and bump
it independently.

3) The code above is accidentally under a `#ifndef NDEBUG`, so it doesn't run in
optimized builds, so some files may be assigned seqnos from the future.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9005

Test Plan:
Added line in `IngestFilesIntoMultipleColumnFamilies_Success` that
triggers the assertion, verified that the test (and all the others) pass after
the fix.

Reviewed By: ajkr

Differential Revision: D31597892

Pulled By: ot

fbshipit-source-id: c2d3237f90290df1178736ace8653a9623f5a770
2021-10-12 20:39:52 -07:00
Levi Tamasi
7cc52cd8f5 Update HISTORY for PR 8994 (#9017)
Summary:
Also, expand on/clarify a comment in `VersionStorageInfoTest`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9017

Reviewed By: riversand963

Differential Revision: D31566130

Pulled By: ltamasi

fbshipit-source-id: 1d30c7af084c4de7b2030bc6c768838d65746010
2021-10-12 10:19:56 -07:00
Andrew Kryczka
a282eff3d1 Protect existing files in FaultInjectionTest{Env,FS}::ReopenWritableFile() (#8995)
Summary:
`FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage.

The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8995

Test Plan:
- Verified it fixes the following failure:

```
$ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
$ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound:
Crash-recovery verification failed :(
...
```

- `make check -j48`

Reviewed By: ltamasi

Differential Revision: D31495388

Pulled By: ajkr

fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385
2021-10-11 16:23:18 -07:00
Hui Xiao
6c3bf83d6f Update HISTORY.md for #8428 (#9001)
Summary:
Context:
HISTORY.md was not properly updated along with the change in https://github.com/facebook/rocksdb/pull/8428, where we introduced a change of accounting compression dictionary buffering memory and an extra condition of triggering data unbuffering.
Updated HISTORY.md for https://github.com/facebook/rocksdb/pull/8428 in 6.25.0 HISTORY.md section.
Updated blog post https://rocksdb.org/blog/2021/05/31/dictionary-compression.html.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9001

Reviewed By: ajkr

Differential Revision: D31517836

Pulled By: hx235

fbshipit-source-id: 01f6b30de4e1ff6b315aa8221139d9b700c7c629
2021-10-08 17:00:30 -07:00
Zhichao Cao
bcd049cd2d Ingest external SST files with Temperature hints (#8949)
Summary:
Add the file temperature to `IngestExternalFileArg` such that when SST files are ingested, user is able to assign the temperature to each SST file. If the temperature vector is empty or its size does not match the file name vector size, all ingested SST files will be assigned with `Temperature::unKnown`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8949

Test Plan: add the new test and make check

Reviewed By: siying

Differential Revision: D31127852

Pulled By: zhichao-cao

fbshipit-source-id: 141a81f0f7b473d88f4ab0cb2a21a114cbc6f83c
2021-10-08 10:32:24 -07:00
Andrew Kryczka
fcaa7ff638 Cancel manual compactions waiting on automatic compactions to drain (#8991)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8991

Test Plan: the new test hangs forever without this fix and passes with this fix.

Reviewed By: hx235

Differential Revision: D31456419

Pulled By: ajkr

fbshipit-source-id: a82c0e5560b6e6153089dccd8e46163c61b07bff
2021-10-07 15:23:55 -07:00
Zhichao Cao
699f45049d Introduce a mechanism to dump out blocks from block cache and re-insert to secondary cache (#8912)
Summary:
Background: Cache warming up will cause potential read performance degradation due to reading blocks from storage to the block cache. Since in production, the workload and access pattern to a certain DB is stable, it is a potential solution to dump out the blocks belonging to a certain DB to persist storage (e.g., to a file) and bulk-load the blocks to Secondary cache before the DB is relaunched. For example, when migrating a DB form host A to host B, it will take a short period of time, the access pattern to blocks in the block cache will not change much. It is efficient to dump out the blocks of certain DB, migrate to the destination host and insert them to the Secondary cache before we relaunch the DB.

Design: we introduce the interface of CacheDumpWriter and CacheDumpRead for user to store the blocks dumped out from block cache. RocksDB will encode all the information and send the string to the writer. User can implement their own writer it they want. CacheDumper and CacheLoad are introduced to save the blocks and load the blocks respectively.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8912

Test Plan: add new tests to lru_cache_test and pass make check.

Reviewed By: pdillinger

Differential Revision: D31452871

Pulled By: zhichao-cao

fbshipit-source-id: 11ab4f5d03e383f476947116361d54188d36ec48
2021-10-07 11:42:31 -07:00
Pradeep Ambati
e5bfb91d09 List blob files when using command - list_live_files_metadata (#8976)
Summary:
The ldb list_live_files_metadata command does not print any information about blob files currently. We would like to add this functionality. Note that list_live_files_metadata has two different modes of operation: the one shown above, which shows the LSM tree structure, and another one, which can be enabled using the flag --sort_by_filename and simply lists the files in numerical order regardless of level. We would like to show blob files in both modes.

Changes:
1. Using GetAllColumnFamilyMetaData API instead of GetLiveFilesMetaData API for fetching live files data.

Testing:
1. Created a sample rocksdb instance using dbbench command (this creates both SST and blob files)
2. Checked if the blob files are listed or not by using ldb commands.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8976

Reviewed By: ltamasi

Differential Revision: D31316061

Pulled By: pradeepambati

fbshipit-source-id: d15cdea192febf7a45f28deee2ba40615d3d84ab
2021-09-30 15:13:11 -07:00
anand76
532ff334d9 Don't ignore deletion rate limit if WAL dir is different (#8967)
Summary:
If WAL dir is different from the DB dir, we should still honor the SstFileManager deletion rate limit for SST files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8967

Test Plan: Add a new unit test in db_sst_test

Reviewed By: pdillinger

Differential Revision: D31220116

Pulled By: anand1976

fbshipit-source-id: bcde8a53a7d728e15e597fb5d07ee86c1b38bd28
2021-09-30 13:26:31 -07:00
Jay Zhuang
6b34eb0ebc Add remote compaction read/write bytes statistics (#8939)
Summary:
Add basic read/write bytes statistics on the primary side:
`REMOTE_COMPACT_READ_BYTES`
`REMOTE_COMPACT_WRITE_BYTES`

Fixed existing statistics missing some IO for remote compaction.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8939

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D31074672

Pulled By: jay-zhuang

fbshipit-source-id: c57afdba369990185008ffaec7e3fe7c62e8902f
2021-09-28 14:00:37 -07:00
Hui Xiao
d6bd1a0291 Support "level_at_creation" in TablePropertiesCollectorFactory::Context (#8919)
Summary:
Context:
Exposing the level of the sst file (i.e, table) where it is created in `TablePropertiesCollectorFactory::Context` allows users of `TablePropertiesCollectorFactory` to customize some implementation details of `TablePropertiesCollectorFactory` and `TablePropertiesCollector` based on the level of creation. For example, `TablePropertiesCollector::NeedCompact()` can return different values based on level of creation.
- Declared an extra field `level_at_creation` in `TablePropertiesCollectorFactory::Context`
- Allowed `level_at_creation` to be passed in as an argument in `IntTblPropCollectorFactory::CreateIntTblPropCollector()` and `UserKeyTablePropertiesCollectorFactory::CreateIntTblPropCollector()`, the latter of which is an internal wrapper of user's passed-in `TablePropertiesCollectorFactory::CreateTablePropertiesCollector()` used in table-building process
- Called `IntTblPropCollectorFactory::CreateIntTblPropCollector()` with `level_at_creation` passed into both `BlockBasedTableBuilder` and `PlainTableBuilder`
  -  `PlainTableBuilder` previously did not capture `level_at_creation` from `TableBuilderOptions` in `PlainTableFactory`. In order for it to call the method with this parameter, this PR also made `PlainTableBuilder` capture `level_at_creation` as a required parameter
- Called `IntTblPropCollectorFactory::CreateIntTblPropCollector()` with `level_at_creation` its overridden functions in its derived classes, including `RegularKeysStartWithAFactory::CreateIntTblPropCollector()` in `table_properties_collector_test.cc`, `SstFileWriterPropertiesCollectorFactory::CreateIntTblPropCollector()` in `sst_file_writer_collectors.h`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8919

Test Plan:
- Passed the added assertion for `context.level_at_creation`
- Passed existing tests
- Run `Make` to make sure adding a required parameter to `PlainTableBuilder`'s constructor does not break anything

Reviewed By: anand1976

Differential Revision: D30951729

Pulled By: hx235

fbshipit-source-id: c4a0173b0d9344a4cf47e1b987d759c1c73cb474
2021-09-28 12:35:24 -07:00
mrambacher
7fd68b7c39 Make WalFilter, SstPartitionerFactory, FileChecksumGenFactory, and TableProperties Customizable (#8638)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8638

Reviewed By: zhichao-cao

Differential Revision: D31024729

Pulled By: mrambacher

fbshipit-source-id: 954c04ccab0b8dee64050a27aadf78ed119106c0
2021-09-28 05:32:02 -07:00
sdong
b88109db19 Pollute buffer before calling Read() (#8955)
Summary:
Add a paranoid check where in case FileSystem layer doesn't fill the buffer but returns succeed, checksum is unlikely to match even if buffer contains a previous block. The byte modified is not useful anyway, so it isn't expect to change any behavior.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8955

Test Plan: See existing CI to pass.

Reviewed By: pdillinger

Differential Revision: D31183966

fbshipit-source-id: dcc4de429e18131873f783b90d3be55d7eb44a1f
2021-09-27 21:30:28 -07:00
Akanksha Mahajan
a2f29ce70a Update History.md for SingleDelete with user defined timestamp (#8964)
Summary:
Update History.md for SingleDelete with user defined timestamp

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8964

Reviewed By: zhichao-cao

Differential Revision: D31216214

Pulled By: akankshamahajan15

fbshipit-source-id: 0520132c75fe8f6823d154e41585b0df3086c04d
2021-09-27 14:58:30 -07:00
mrambacher
e0f697d2bd Make SliceTransform into a Customizable class (#8641)
Summary:
Made SliceTransform into a Customizable class.

Would be nice to write a test that stored and used a custom transform  in an SST table.

There are a set of tests (DBBlockFliterTest.PrefixExtractor*, SamePrefixTest.InDomainTest, PrefixTest.PrefixAndWholeKeyTest that run the same with or without a SliceTransform/PrefixFilter.  Is this expected?

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8641

Reviewed By: zhichao-cao

Differential Revision: D31142793

Pulled By: mrambacher

fbshipit-source-id: bb08672fccbfdc263dcae21f25a62307e1facda1
2021-09-27 07:43:47 -07:00
Yanqin Jin
b92cef2d1d Sort per-file blob read requests by offset (#8953)
Summary:
`RandomAccessFileReader::MultiRead()` tries to merge requests in direct IO, assuming input IO requests are
sorted by offsets.

Add a test in direct IO mode.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8953

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D31183546

Pulled By: riversand963

fbshipit-source-id: 5d043ec68e2daa47a3149066150afd41ee3d73e6
2021-09-24 22:14:30 -07:00
Hui Xiao
b25f2afeff Return Status::NotSupported() in RateLimiter::GetTotalPendingRequests default impl (#8950)
Summary:
Context:
After more discussion, a fix in https://github.com/facebook/rocksdb/issues/8938 might turn out to be too restrictive for the case where `GetTotalPendingRequests` might be invoked on RateLimiter classes that does not support the recently added API `RateLimiter::GetTotalPendingRequests` (https://github.com/facebook/rocksdb/issues/8890) due to the `assert(false)` in https://github.com/facebook/rocksdb/issues/8938. Furthermore, sentinel value like `-1` proposed in https://github.com/facebook/rocksdb/issues/8938 is easy to be ignored and unchecked. Therefore we decided to adopt `Status::NotSupported()`, which is also a convention of adding new API to public header in RocksDB.
- Changed return value type of  `RateLimiter::GetTotalPendingRequests` in related declaration/definition
- Passed in pointer argument to hold the output instead of returning it as before
- Adapted to the changes above in calling `RateLimiter::GetTotalPendingRequests` in test
- Minor improvement to `TEST_F(RateLimiterTest, GetTotalPendingRequests)`:  added failure message for assertion and replaced repetitive statements with a loop

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8950

Reviewed By: ajkr, pdillinger

Differential Revision: D31128450

Pulled By: hx235

fbshipit-source-id: 282ac9c4f3dacaa0aec6d0a993161f77ad47a040
2021-09-22 19:36:06 -07:00
sdong
c988e4720b Add HISTORY.md entry to a recent bug fix. (#8948)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8948

Reviewed By: anand1976

Differential Revision: D31127368

fbshipit-source-id: a374cb0baf88c3e15cd587a8f31e8a2d84432928
2021-09-22 16:23:08 -07:00
Hui Xiao
58444eadda Make RateLimiter::GetTotalPendingRequest() non pure virtual for backward compability (#8938)
Summary:
Context/Summary:
https://github.com/facebook/rocksdb/pull/8890 added a public API `RateLimiter::GetTotalPendingRequest()` but mistakenly marked it as pure virtual, forcing RateLimiter's derived classes to implement this function and breaking backward compatibility.

This PR makes `RateLimiter::GetTotalPendingRequest()` as non-pure virtual method by providing a trivial implementation in rate_limiter.h

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8938

Test Plan: Passing existing tests

Reviewed By: pdillinger

Differential Revision: D31100661

Pulled By: hx235

fbshipit-source-id: 06eff1005156a6e5a881e393b2c5b2ad706897d8
2021-09-21 21:29:26 -07:00
Peter Dillinger
5268cdc997 Finish BackupEngine migration to IOStatus (#8940)
Summary:
Updates a few remaining functions that should have been updated
from Status -> IOStatus, and adds to HISTORY for the overall change
including https://github.com/facebook/rocksdb/issues/8820.

This change is for inclusion in version 6.25.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8940

Test Plan: CI

Reviewed By: zhichao-cao

Differential Revision: D31085029

Pulled By: pdillinger

fbshipit-source-id: 91557c6a39ef1d90357d4f4dcd79af0645d87c7b
2021-09-21 11:13:17 -07:00
mrambacher
6924869867 Make SystemClock into a Customizable Class (#8636)
Summary:
Made SystemClock into a Customizable class, complete with CreateFromString.

Cleaned up some of the existing SystemClock implementations that were redundant (NoSleep was the same as the internal one for MockEnv).

Changed MockEnv construction to allow Clock to be passed to the Memory/MockFileSystem.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8636

Reviewed By: zhichao-cao

Differential Revision: D30483360

Pulled By: mrambacher

fbshipit-source-id: cd0e3a876c39f8c98fe13374c06e8edbd5b9f2a1
2021-09-21 09:23:48 -07:00
Peter Dillinger
d497cdfbb2 Update version to 6.25.0 (#8935)
Summary:
for release

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8935

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D31056726

Pulled By: pdillinger

fbshipit-source-id: 6fd022c39c19c35f10a2367df45dd2deb43df510
2021-09-20 11:22:41 -07:00
anand76
99fe4c5005 Add a gflag for IO uring enable/disable (#8931)
Summary:
In case of IO uring bugs, we need to provide a way for users to turn it off.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8931

Test Plan: Manually run db_bench with/without the option and verify the behavior

Reviewed By: pdillinger

Differential Revision: D31040252

Pulled By: anand1976

fbshipit-source-id: 56f2537d6ac8488c9e126296d8190ad9e0158f70
2021-09-18 10:24:56 -07:00
Jay Zhuang
1c290c785d RemoteCompaction support Fallback to local compaction (#8709)
Summary:
Add support for fallback to local compaction, the user can
return `CompactionServiceJobStatus::kUseLocal` to instruct RocksDB to
run the compaction locally instead of waiting for the remote compaction
result.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8709

Test Plan: unittest

Reviewed By: ajkr

Differential Revision: D30560163

Pulled By: jay-zhuang

fbshipit-source-id: 65d8905a4a1bc185a68daa120997f21d3198dbe1
2021-09-18 00:25:04 -07:00
Yanqin Jin
b512f4bc76 Batch blob read IO for MultiGet (#8699)
Summary:
In batched `MultiGet()`, RocksDB batches blob read IO and uses `RandomAccessFileReader::MultiRead()`
to read the blobs instead of issuing multiple `Read()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8699

Test Plan:
```
make check
```

Reviewed By: ltamasi

Differential Revision: D31030861

Pulled By: riversand963

fbshipit-source-id: a0df6060cbfd54cff9515a4eee08807b1dbcb0c8
2021-09-17 19:23:13 -07:00
Peter Dillinger
4149d044cd Change SstFileMetaData::size from size_t to uint64_t (#8926)
Summary:
Because even 32-bit systems can have large files

This is a "change" that I don't want intermingled with an upcoming refactoring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8926

Test Plan: CI

Reviewed By: zhichao-cao

Differential Revision: D31020974

Pulled By: pdillinger

fbshipit-source-id: ca9eb4510697df6f1f55e37b37730b88b1809a92
2021-09-17 13:23:34 -07:00
mrambacher
272cc77751 Added a default Name method to Statistics (#8918)
Summary:
This keeps the implementations/API backward compatible.  Implementations of Statistics will need to override this method (and be registered with the ObjectRegistry) in order to be created via CreateFromString.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8918

Reviewed By: pdillinger

Differential Revision: D30958916

Pulled By: mrambacher

fbshipit-source-id: 75b99a84e9e11fda2a9e8eff9ee1ef69a17517b2
2021-09-17 07:25:43 -07:00
Akanksha Mahajan
d6aa8c49f8 Expose blob file information through the EventListener interface (#8675)
Summary:
1. Extend FlushJobInfo and CompactionJobInfo with information about the blob files generated by flush/compaction jobs. This PR add two structures BlobFileInfo and BlobFileGarbageInfo that contains the required information of blob files.
 2. Notify the creation and deletion of blob files through OnBlobFileCreationStarted, OnBlobFileCreated, and OnBlobFileDeleted.
 3. Test OnFile*Finish operations notifications with Blob Files.
 4. Log the blob file creation/deletion events through EventLogger in Log file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8675

Test Plan: Add new unit tests in listener_test

Reviewed By: ltamasi

Differential Revision: D30412613

Pulled By: akankshamahajan15

fbshipit-source-id: ca51b63c6e8c8d0485a38c503572bc5a82bd5d07
2021-09-16 17:23:36 -07:00
Jay Zhuang
b97c53b629 Add compaction priority information in RemoteCompaction (#8707)
Summary:
Add compaction priority information in RemoteCompaction, which
can be used to schedule high priority job first.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8707

Test Plan: unittest

Reviewed By: ajkr

Differential Revision: D30548401

Pulled By: jay-zhuang

fbshipit-source-id: b30446511fb31b4583c49edd8565d496cf013a34
2021-09-16 15:09:35 -07:00
Peter Dillinger
2819c7840e Fix PrepopulateBlockCache::kFlushOnly (#8750)
Summary:
kFlushOnly currently means "always" except in the case of
remote compaction. This makes it flushes only.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8750

Test Plan: test updated

Reviewed By: akankshamahajan15

Differential Revision: D30968034

Pulled By: pdillinger

fbshipit-source-id: 5dbd24dde18852a0e937a540995fba9bfbe89037
2021-09-15 15:33:20 -07:00
anand76
7743f033b1 More robust checking of IO uring completion data (#8894)
Summary:
Potential bugs in the IO uring implementation can cause bad data to be returned in the completion queue. Add some checks in the PosixRandomAccessFile::MultiRead completion handling code to catch such errors and fail the entire MultiRead. Also log some diagnostic messages and stack trace.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8894

Reviewed By: siying, pdillinger

Differential Revision: D30826982

Pulled By: anand1976

fbshipit-source-id: af91815ac760e095d6cc0466cf8bd5c10167fd15
2021-09-15 12:44:43 -07:00
eharry
0b6be7eb68 Fix WAL log data corruption #8723 (#8746)
Summary:
Fix WAL log data corruption when using DBOptions.manual_wal_flush(true) and WriteOptions.sync(true) together (https://github.com/facebook/rocksdb/issues/8723)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8746

Reviewed By: ajkr

Differential Revision: D30758468

Pulled By: riversand963

fbshipit-source-id: 07c20899d5f2447dc77861b4845efc68a59aa4e8
2021-09-13 20:15:59 -07:00
Yanqin Jin
2a2b3e03a5 Allow WriteBatch to have keys with different timestamp sizes (#8725)
Summary:
In the past, we unnecessarily requires all keys in the same write batch
to be from column families whose timestamps' formats are the same for
simplicity. Specifically, we cannot use the same write batch to write to
two column families, one of which enables timestamp while the other
disables it.

The limitation is due to the member `timestamp_size_` that used to exist
in each `WriteBatch` object. We pass a timestamp_size to the constructor
of `WriteBatch`. Therefore, users can simply use the old
`WriteBatch::Put()`, `WriteBatch::Delete()`, etc APIs for write, while
the internal implementation of `WriteBatch` will take care of memory
allocation for timestamps.

The above is not necessary.
One the one hand, users can set up a memory buffer to store user key and
then contiguously append the timestamp to the user key. Then the user
can pass this buffer to the `WriteBatch::Put(Slice&)` API.
On the other hand, users can set up a SliceParts object which is an
array of Slices and let the last Slice to point to the memory buffer
storing timestamp. Then the user can pass the SliceParts object to the
`WriteBatch::Put(SliceParts&)` API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8725

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D30654499

Pulled By: riversand963

fbshipit-source-id: 9d848c77ad3c9dd629aa5fc4e2bc16fb0687b4a2
2021-09-12 15:34:26 -07:00
Levi Tamasi
5f40b05c98 Update HISTORY.md for PR 8899 (#8905)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8905

Reviewed By: zhichao-cao

Differential Revision: D30873416

Pulled By: ltamasi

fbshipit-source-id: 6e55ec14a7fd2e562aa24cd0274e2436369923f5
2021-09-12 08:19:05 -07:00
Hui Xiao
12542488ef Add public API RateLimiter::GetTotalPendingRequests() (#8890)
Summary:
Context/Summary:
As users requested, a public API RateLimiter::GetTotalPendingRequests() is added to expose the total number of pending requests for bytes in the rate limiter, which is the size of the request queue of that priority (or of all priorities, if IO_TOTAL is interested) at the time when this API is called.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8890

Test Plan:
- Passing added new unit tests
- Passing existing unit tests

Reviewed By: ajkr

Differential Revision: D30815500

Pulled By: hx235

fbshipit-source-id: 2dfa990f651c1c47378b6215c751ad76a5824300
2021-09-10 08:37:04 -07:00
Hui Xiao
6785135bc2 Update HISTORY.md for new rate limiter io priorities (#8896)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8896

Reviewed By: ajkr

Differential Revision: D30846120

Pulled By: hx235

fbshipit-source-id: 9224ebce5437d63b0fb8af9171c6041a9ea5d90f
2021-09-09 13:26:20 -07:00
hx235
45175ca2e1 Charge read to rate limiter in BackupEngine (#8722)
Summary:
Context:
While all the non-trivial write operations in BackupEngine go through the RateLimiter, reads currently do not. In general, this is not a huge issue because (especially since some I/O efficiency fixes) reads in BackupEngine are mostly limited by corresponding writes, for both backup and restore. But in principle we should charge the RateLimiter for reads as well.
- Charged read operations in `BackupEngineImpl::CopyOrCreateFile`, `BackupEngineImpl::ReadFileAndComputeChecksum`, `BackupEngineImpl::BackupMeta::LoadFromFile` and `BackupEngineImpl::GetFileDbIdentities`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8722

Test Plan:
- Passed existing tests
- Passed added unit tests

Reviewed By: pdillinger

Differential Revision: D30610464

Pulled By: hx235

fbshipit-source-id: 9b08c9387159a5385c8d390d6666377a0d0117e5
2021-09-08 16:24:40 -07:00
Zhiyi Zhang
0cb0fc6fd3 Add DB properties for BlobDB (#8734)
Summary:
RocksDB exposes certain internal statistics via the DB property interface.
However, there are currently no properties related to BlobDB.

For starters, we would like to add the following BlobDB properties:
`rocksdb.num-blob-files`: number of blob files in the current Version (kind of like `num-files-at-level` but note this is not per level, since blob files are not part of the LSM tree).
`rocksdb.blob-stats`: this could return the total number and size of all blob files, and potentially also the total amount of garbage (in bytes) in the blob files in the current Version.
`rocksdb.total-blob-file-size`: the total size of all blob files (as a blob counterpart for `total-sst-file-size`) of all Versions.
`rocksdb.live-blob-file-size`: the total size of all blob files in the current Version.
`rocksdb.estimate-live-data-size`: this is actually an existing property that we can extend so it considers blob files as well. When it comes to blobs, we actually have an exact value for live bytes. Namely, live bytes can be computed simply as total bytes minus garbage bytes, summed over the entire set of blob files in the Version.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8734

Test Plan:
```
➜  rocksdb git:(new_feature_blobDB_properties) ./db_blob_basic_test
[==========] Running 16 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 10 tests from DBBlobBasicTest
[ RUN      ] DBBlobBasicTest.GetBlob
[       OK ] DBBlobBasicTest.GetBlob (12 ms)
[ RUN      ] DBBlobBasicTest.MultiGetBlobs
[       OK ] DBBlobBasicTest.MultiGetBlobs (11 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_CorruptIndex
[       OK ] DBBlobBasicTest.GetBlob_CorruptIndex (10 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_InlinedTTLIndex
[       OK ] DBBlobBasicTest.GetBlob_InlinedTTLIndex (12 ms)
[ RUN      ] DBBlobBasicTest.GetBlob_IndexWithInvalidFileNumber
[       OK ] DBBlobBasicTest.GetBlob_IndexWithInvalidFileNumber (9 ms)
[ RUN      ] DBBlobBasicTest.GenerateIOTracing
[       OK ] DBBlobBasicTest.GenerateIOTracing (11 ms)
[ RUN      ] DBBlobBasicTest.BestEffortsRecovery_MissingNewestBlobFile
[       OK ] DBBlobBasicTest.BestEffortsRecovery_MissingNewestBlobFile (13 ms)
[ RUN      ] DBBlobBasicTest.GetMergeBlobWithPut
[       OK ] DBBlobBasicTest.GetMergeBlobWithPut (11 ms)
[ RUN      ] DBBlobBasicTest.MultiGetMergeBlobWithPut
[       OK ] DBBlobBasicTest.MultiGetMergeBlobWithPut (14 ms)
[ RUN      ] DBBlobBasicTest.BlobDBProperties
[       OK ] DBBlobBasicTest.BlobDBProperties (21 ms)
[----------] 10 tests from DBBlobBasicTest (124 ms total)

[----------] 6 tests from DBBlobBasicTest/DBBlobBasicIOErrorTest
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/0 (12 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.GetBlob_IOError/1 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/0 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.MultiGetBlobs_IOError/1 (10 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/0
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/0 (1011 ms)
[ RUN      ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/1
[       OK ] DBBlobBasicTest/DBBlobBasicIOErrorTest.CompactionFilterReadBlob_IOError/1 (1013 ms)
[----------] 6 tests from DBBlobBasicTest/DBBlobBasicIOErrorTest (2066 ms total)

[----------] Global test environment tear-down
[==========] 16 tests from 2 test cases ran. (2190 ms total)
[  PASSED  ] 16 tests.
```

Reviewed By: ltamasi

Differential Revision: D30690849

Pulled By: Zhiyi-Zhang

fbshipit-source-id: a7567319487ad76bd1a2e24bf143afdbbd9e4346
2021-09-08 12:22:04 -07:00
Peter Dillinger
e40b04e9fa Fix POSIX LockFile after failure to create file (#8747)
Summary:
Failure to create the lock file (e.g. out of space) could
prevent future LockFile attempts in the same process on the same file
from succeeding.

Also added DEBUG code to fail assertion if PosixFileLock is destroyed
without using UnlockFile (which is a risk because FileLock is in the
public API with virtual destructor).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8747

Test Plan: test added

Reviewed By: ajkr

Differential Revision: D30732543

Pulled By: pdillinger

fbshipit-source-id: 4c30a959566d91f778d6fad3fbbd5f3941b097c1
2021-09-07 22:41:36 -07:00
Andrew Kryczka
9308ff366c Bytes read/written stats for CreateNewBackup*() (#8819)
Summary:
Gets `Statistics` from the options associated with the `DB` undergoing backup, and populates new ticker stats with the thread-local `IOContext` read/write counters for the threads doing backup work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8819

Reviewed By: pdillinger

Differential Revision: D30779238

Pulled By: ajkr

fbshipit-source-id: 75ccafc355f90906df5cf80367f7245b985772d8
2021-09-07 18:25:16 -07:00
Andrew Kryczka
941543721d Bytes read stat for VerifyChecksum() and VerifyFileChecksums() APIs (#8741)
Summary:
- Clarified some comments on compatibility for adding new ticker stats
- Added read I/O stats for `VerifyChecksum()` and `VerifyFileChecksums()` APIs

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8741

Test Plan: new unit test

Reviewed By: zhichao-cao

Differential Revision: D30708578

Pulled By: ajkr

fbshipit-source-id: d06b961f7e199ae92c266b683e39870aa8f63449
2021-09-07 13:28:29 -07:00
Peter Dillinger
c9cd5d25a8 Remove some unneeded code (#8736)
Summary:
* FullKey and ParseFullKey appear to serve no purpose in the public API
(or anything else) so removed. Only use in one test updated.
* NumberToString serves no purpose vs. ToString so removed, numerous
calls updated
* Remove unnecessary forward declarations in metadata.h by re-arranging
class definitions.
* Remove some unneeded semicolons

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736

Test Plan: existing tests

Reviewed By: mrambacher

Differential Revision: D30700039

Pulled By: pdillinger

fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2
2021-09-01 14:28:58 -07:00
Peter Dillinger
32752551b9 Fix a buffer size race condition in BackupEngine (#8732)
Summary:
If RateLimiter burst bytes changes during concurrent Restore
operations

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8732

Test Plan: updated unit test fails with TSAN before change, passes after

Reviewed By: ajkr

Differential Revision: D30683879

Pulled By: pdillinger

fbshipit-source-id: d0ddb3587ade91ee2a4d926b475acf7781b03086
2021-09-01 14:28:58 -07:00
anand76
ec9f52ece6 Fix a race in LRUCacheShard::Promote (#8717)
Summary:
In ```LRUCacheShard::Promote```, a reference is released outside the LRU mutex. Fix the race condition.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8717

Reviewed By: zhichao-cao

Differential Revision: D30649206

Pulled By: anand1976

fbshipit-source-id: 09c0af05b2294a7fe2c02876a61b0bad6e3ada61
2021-08-30 19:10:55 -07:00
Peter Dillinger
13ded69484 Built-in support for generating unique IDs, bug fix (#8708)
Summary:
Env::GenerateUniqueId() works fine on Windows and on POSIX
where /proc/sys/kernel/random/uuid exists. Our other implementation is
flawed and easily produces collision in a new multi-threaded test.
As we rely more heavily on DB session ID uniqueness, this becomes a
serious issue.

This change combines several individually suitable entropy sources
for reliable generation of random unique IDs, with goal of uniqueness
and portability, not cryptographic strength nor maximum speed.

Specifically:
* Moves code for getting UUIDs from the OS to port::GenerateRfcUuid
rather than in Env implementation details. Callers are now told whether
the operation fails or succeeds.
* Adds an internal API GenerateRawUniqueId for generating high-quality
128-bit unique identifiers, by combining entropy from three "tracks":
  * Lots of info from default Env like time, process id, and hostname.
  * std::random_device
  * port::GenerateRfcUuid (when working)
* Built-in implementations of Env::GenerateUniqueId() will now always
produce an RFC 4122 UUID string, either from platform-specific API or
by converting the output of GenerateRawUniqueId.

DB session IDs now use GenerateRawUniqueId while DB IDs (not as
critical) try to use port::GenerateRfcUuid but fall back on
GenerateRawUniqueId with conversion to an RFC 4122 UUID.

GenerateRawUniqueId is declared and defined under env/ rather than util/
or even port/ because of the Env dependency.

Likely follow-up: enhance GenerateRawUniqueId to be faster after the
first call and to guarantee uniqueness within the lifetime of a single
process (imparting the same property onto DB session IDs).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708

Test Plan:
A new mini-stress test in env_test checks the various public
and internal APIs for uniqueness, including each track of
GenerateRawUniqueId individually. We can't hope to verify anywhere close
to 128 bits of entropy, but it can at least detect flaws as bad as the
old code. Serial execution of the new tests takes about 350 ms on
my machine.

Reviewed By: zhichao-cao, mrambacher

Differential Revision: D30563780

Pulled By: pdillinger

fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
2021-08-30 15:20:41 -07:00
Merlin Mao
6c2bd28a61 Update comments, fix typos. (#8721)
Summary:
- Removed the default empty constructors of `TraceWriter` and `TraceReader`.
- Removed unused `ReadFooter()` from `ReplayerImpl`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8721

Test Plan: None

Reviewed By: zhichao-cao

Differential Revision: D30609743

Pulled By: autopear

fbshipit-source-id: 7e2626b015bd57ebb408a2836b4b4217cea10002
2021-08-27 13:16:32 -07:00
anand76
ebaa3c8a59 Fix a race condition in DumpStats() during iteration of the ColumnFamilySet (#8714)
Summary:
DumpStats() iterates through the ColumnFamilySet. There is a potential
race condition because it does Ref the cfd, and the cfd could get
destroyed during the iteration.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8714

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D30580199

Pulled By: anand1976

fbshipit-source-id: 60a3443ad0d4f7ac6a977dec780e6d2c1b70b850
2021-08-26 15:40:26 -07:00
Yanqin Jin
f235f4b0a3 Fix a bug of secondary instance sequence going backward (#8653)
Summary:
Recent refactor of `ReactiveVersionSet::ReadAndApply()` uses
`ManifestTailer` whose `Iterate()` method can cause the db's
`last_sequence_` to go backward. Consequently, read requests can see
out-dated data. For example, latest changes to the primary will not be
seen on the secondary even after a `TryCatchUpWithPrimary()` if no new
write batches are read from the WALs and no new MANIFEST entries are
read from the MANIFEST.

Fix the bug so that `VersionEditHandler::CheckIterationResult` will
never decrease `last_sequence_`, `last_allocated_sequence_` and
`last_published_sequence_`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8653

Test Plan: make check

Reviewed By: jay-zhuang

Differential Revision: D30272084

Pulled By: riversand963

fbshipit-source-id: c6a49c534b2509b93ef62d8936ed0acd5b860eaa
2021-08-24 18:18:36 -07:00
Yanqin Jin
229350ef48 Allow iterate refresh for secondary instance (#8700)
Summary:
Test plan
make check

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8700

Reviewed By: zhichao-cao

Differential Revision: D30523907

Pulled By: riversand963

fbshipit-source-id: 68928ab4dafb64ce80ab7bc69d83727a4713ab91
2021-08-24 15:40:56 -07:00