rocksdb/utilities
Andrew Kryczka 8272a6de57 Optionally wait on bytes_per_sync to smooth I/O (#5183)
Summary:
The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyway.

My understanding of the existing behavior is that we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.

Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.
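
The call pattern above might look like the following minimal sketch, assuming Linux with glibc's `sync_file_range` wrapper. The names `StrictRangeSync` and `WriteWithStrictSync` are hypothetical and chosen for illustration; this is not the RocksDB implementation.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes sync_file_range in <fcntl.h> on glibc
#endif
#include <fcntl.h>      // sync_file_range, SYNC_FILE_RANGE_* (Linux only)
#include <sys/types.h>  // off_t, ssize_t
#include <unistd.h>     // write

#include <algorithm>  // std::min
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not RocksDB code). Waits for writeback already in
// flight anywhere in [0, offset + nbytes), then submits the still-dirty pages
// in that range, i.e. the latest nbytes, for asynchronous writeback.
// Returns 0 on success, -1 with errno set on failure.
inline int StrictRangeSync(int fd, off_t offset, off_t nbytes) {
  return sync_file_range(fd, 0, offset + nbytes,
                         SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE);
}

// Hypothetical driver: appends `total` bytes in writes of at most `chunk`
// bytes and calls StrictRangeSync after each write, so the writer can get at
// most one chunk ahead of the I/O. Returns 0 on success, -1 on failure.
inline int WriteWithStrictSync(int fd, size_t total, size_t chunk) {
  assert(chunk > 0);
  std::vector<char> buf(chunk, 'x');
  size_t written = 0;
  while (written < total) {
    const size_t want = std::min(chunk, total - written);
    const ssize_t n = write(fd, buf.data(), want);
    if (n <= 0) return -1;  // treat errors and zero-length writes as failure
    if (StrictRangeSync(fd, static_cast<off_t>(written),
                        static_cast<off_t>(n)) != 0) {
      return -1;
    }
    written += static_cast<size_t>(n);
  }
  return 0;
}
```

Here `chunk` plays the role of `bytes_per_sync`: each call blocks only if the previous chunk's writeback has not finished yet.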

There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).
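
One way to picture the fallback is the sketch below. It makes two assumptions that are not taken from the PR: the static check is modeled by testing whether the `SYNC_FILE_RANGE_WRITE` macro is defined, and the dynamic check is modeled by treating an `ENOSYS` failure as "unsupported here". The function name `RangeSyncOrFdatasync` and the caller-owned flag are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // exposes sync_file_range in <fcntl.h> on glibc
#endif
#include <fcntl.h>      // sync_file_range, SYNC_FILE_RANGE_* where available
#include <sys/types.h>  // off_t
#include <unistd.h>     // fdatasync

#include <cassert>
#include <cerrno>

// Hypothetical helper (not RocksDB code). Tries the waiting range sync first
// and falls back to fdatasync when sync_file_range is missing at compile time
// or reports ENOSYS at run time. `*range_sync_ok` is caller-owned state that
// remembers a run-time failure so later calls go straight to fdatasync.
// Returns 0 on success, -1 with errno set on failure.
inline int RangeSyncOrFdatasync(int fd, off_t offset, off_t nbytes,
                                bool* range_sync_ok) {
  assert(range_sync_ok != nullptr);
#ifdef SYNC_FILE_RANGE_WRITE  // assumed static check: flag macro is defined
  if (*range_sync_ok) {
    if (sync_file_range(fd, 0, offset + nbytes,
                        SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE) ==
        0) {
      return 0;
    }
    if (errno != ENOSYS) return -1;  // a real I/O error: report it
    *range_sync_ok = false;          // assumed dynamic check: stop retrying
  }
#else
  (void)offset;
  (void)nbytes;
#endif
  // Fallback: synchronously flush all of the file's dirty data.
  return fdatasync(fd);
}
```

The fallback is coarser than a range sync, since `fdatasync` blocks until all of the file's data is on disk, but it still bounds how much dirty data can accumulate between calls.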

The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183

Differential Revision: D14953553

Pulled By: siying

fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
2019-04-22 11:51:39 -07:00
backupable Remove some "using std::..." from header files. (#5113) 2019-03-27 10:28:21 -07:00
blob_db Introduce a new MultiGet batching implementation (#5011) 2019-04-11 14:28:26 -07:00
cassandra Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
checkpoint Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
compaction_filters Comment out unused variables 2018-03-05 13:13:41 -08:00
convenience Change RocksDB License 2017-07-15 16:11:23 -07:00
leveldb_options Change RocksDB License 2017-07-15 16:11:23 -07:00
memory Per-thread unique test db names (#4135) 2018-07-13 17:27:39 -07:00
merge_operators Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
option_change_migration Avoid double-compacting data in bottom level in manual compactions (#5138) 2019-04-16 23:32:20 -07:00
options Make it easier for users to load options from option file and set shared block cache. (#5063) 2019-03-21 16:25:28 -07:00
persistent_cache Removed const fields in copyable classes (#5095) 2019-04-05 15:40:30 -07:00
simulator_cache Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
table_properties_collectors Reduce runtime of compact_on_deletion_collector_test (#4779) 2018-12-13 14:47:08 -08:00
trace Add the max trace file size limitation option to Tracing (#4610) 2018-11-27 14:27:05 -08:00
transactions refactor SavePoints (#5192) 2019-04-19 20:33:04 -07:00
ttl Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
write_batch_with_index Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
debug.cc Remove v1 RangeDelAggregator (#4778) 2018-12-17 17:33:46 -08:00
env_librados_test.cc Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
env_librados.cc Optionally wait on bytes_per_sync to smooth I/O (#5183) 2019-04-22 11:51:39 -07:00
env_librados.md Add EnvLibrados - RocksDB Env of RADOS (#1222) 2016-07-21 11:16:34 -07:00
env_mirror_test.cc Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
env_mirror.cc Optionally wait on bytes_per_sync to smooth I/O (#5183) 2019-04-22 11:51:39 -07:00
env_timed_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
env_timed.cc Apply modernize-use-override (2nd iteration) 2019-02-14 14:41:36 -08:00
merge_operators.h Support StringAppendOperator(delimiter_char) constructor in java-api 2018-03-08 16:17:47 -08:00
object_registry_test.cc fix memory leak in two_level_iterator 2018-04-15 17:26:26 -07:00
util_merge_operators_test.cc Change RocksDB License 2017-07-15 16:11:23 -07:00