Clarifying comments for Read() APIs (#8029)

Summary:
I recently discovered the confusing, undocumented semantics of
Read() functions in the FileSystem and Env APIs. I have added
clarification to the best of my reverse-engineered understanding, and
made a note in HISTORY.md for implementors to check their
implementations, as a subtly non-adherent implementation could lead to
RocksDB quietly ignoring some portion of a file.
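
To illustrate the kind of non-adherence meant here: a backend that may
return fewer bytes than requested before end of file (as POSIX read() is
allowed to do) has to loop until either `n` bytes are gathered or end of
file is really reached. The following is a minimal sketch of that loop and
does not use the real RocksDB classes; `ChunkyBackend`, `ReadSome`, and
`ReadFully` are hypothetical names invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical backend (not a RocksDB type) that never returns more than
// `max_chunk` bytes per call, mimicking a transport that can deliver short
// reads before end of file.
class ChunkyBackend {
 public:
  ChunkyBackend(std::string data, size_t max_chunk)
      : data_(std::move(data)), max_chunk_(max_chunk) {}

  // Copies up to min(n, max_chunk) bytes starting at `offset` into `out`.
  // Returns the number of bytes copied; 0 means end of file.
  size_t ReadSome(size_t offset, size_t n, char* out) const {
    if (offset >= data_.size()) return 0;
    size_t len = std::min({n, max_chunk_, data_.size() - offset});
    std::memcpy(out, data_.data() + offset, len);
    return len;
  }

 private:
  std::string data_;
  size_t max_chunk_;
};

// Hypothetical helper: keeps calling the backend until `n` bytes have been
// gathered or the backend reports end of file. The returned size is less
// than `n` only at end of file, which is the property the clarified Read()
// comments ask for.
size_t ReadFully(const ChunkyBackend& backend, size_t offset, size_t n,
                 char* scratch) {
  size_t total = 0;
  while (total < n) {
    size_t got = backend.ReadSome(offset + total, n - total, scratch + total);
    if (got == 0) break;  // end of file
    total += got;
  }
  return total;
}
```

An implementation that forwarded the result of a single `ReadSome` call
would report a short result in the middle of the file, which a caller
following the clarified semantics would interpret as end of file.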

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8029

Test Plan: no code changes

Reviewed By: anand1976

Differential Revision: D26831698

Pulled By: pdillinger

fbshipit-source-id: 208f97ff6037bc13bb2ef360b987c2640c79bd03
Peter Dillinger 2021-03-05 14:40:35 -08:00 committed by Facebook GitHub Bot
parent cb25bc1128
commit ce391ff84b
3 changed files with 31 additions and 2 deletions


@ -8,6 +8,7 @@
* Add suppport to extend DB::VerifyFileChecksums API to also verify blob files checksum.
* When using the new BlobDB, the amount of data written by flushes/compactions is now broken down into table files and blob files in the compaction statistics; namely, Write(GB) denotes the amount of data written to table files, while Wblob(GB) means the amount of data written to blob files.
* Add new SetBufferSize API to WriteBufferManager to allow dynamic management of memory allotted to all write buffers. This allows user code to adjust memory monitoring provided by WriteBufferManager as process memory needs change datasets grow and shrink.
* Clarified the required semantics of Read() functions in FileSystem and Env APIs. Please ensure any custom implementations are compliant.
* For the new integrated BlobDB implementation, compaction statistics now include the amount of data read from blob files during compaction (due to garbage collection or compaction filters). Write amplification metrics have also been extended to account for data read from blob files.
### New Features


@ -619,6 +619,10 @@ class SequentialFile {
// "scratch[0..n-1]" must be live when "*result" is used.
// If an error was encountered, returns a non-OK status.
//
// After call, result->size() < n only if end of file has been
// reached (or non-OK status). Read might fail if called again after
// first result->size() < n.
//
// REQUIRES: External synchronization
virtual Status Read(size_t n, Slice* result, char* scratch) = 0;
@ -664,7 +668,8 @@ struct ReadRequest {
// File offset in bytes
uint64_t offset;
// Length to read in bytes. `result` only returns fewer bytes if end of file
// is hit (or `status` is not OK).
size_t len;
// A buffer that MultiRead() can optionally place data in. It can
@ -693,6 +698,10 @@ class RandomAccessFile {
// "*result" is used. If an error was encountered, returns a non-OK
// status.
//
// After call, result->size() < n only if end of file has been
// reached (or non-OK status). Read might fail if called again after
// first result->size() < n.
//
// Safe for concurrent use by multiple threads.
// If Direct I/O enabled, offset, n, and scratch should be aligned properly.
virtual Status Read(uint64_t offset, size_t n, Slice* result,
@ -977,6 +986,11 @@ class RandomRWFile {
// Read up to `n` bytes starting from offset `offset` and store them in
// result, provided `scratch` size should be at least `n`.
//
// After call, result->size() < n only if end of file has been
// reached (or non-OK status). Read might fail if called again after
// first result->size() < n.
//
// Returns Status::OK() on success.
virtual Status Read(uint64_t offset, size_t n, Slice* result,
char* scratch) const = 0;


@ -563,6 +563,10 @@ class FSSequentialFile {
// "scratch[0..n-1]" must be live when "*result" is used.
// If an error was encountered, returns a non-OK status.
//
// After call, result->size() < n only if end of file has been
// reached (or non-OK status). Read might fail if called again after
// first result->size() < n.
//
// REQUIRES: External synchronization
virtual IOStatus Read(size_t n, const IOOptions& options, Slice* result,
char* scratch, IODebugContext* dbg) = 0;
@ -609,7 +613,8 @@ struct FSReadRequest {
// File offset in bytes
uint64_t offset;
// Length to read in bytes. `result` only returns fewer bytes if end of file
// is hit (or `status` is not OK).
size_t len;
// A buffer that MultiRead() can optionally place data in. It can
@ -639,6 +644,10 @@ class FSRandomAccessFile {
// "*result" is used. If an error was encountered, returns a non-OK
// status.
//
// After call, result->size() < n only if end of file has been
// reached (or non-OK status). Read might fail if called again after
// first result->size() < n.
//
// Safe for concurrent use by multiple threads.
// If Direct I/O enabled, offset, n, and scratch should be aligned properly.
virtual IOStatus Read(uint64_t offset, size_t n, const IOOptions& options,
@ -975,6 +984,11 @@ class FSRandomRWFile {
// Read up to `n` bytes starting from offset `offset` and store them in
// result, provided `scratch` size should be at least `n`.
//
// After call, result->size() < n only if end of file has been
// reached (or non-OK status). Read might fail if called again after
// first result->size() < n.
//
// Returns Status::OK() on success.
virtual IOStatus Read(uint64_t offset, size_t n, const IOOptions& options,
Slice* result, char* scratch,