Commit Graph

36 Commits

Author SHA1 Message Date
Andrew Kryczka
8272a6de57 Optionally wait on bytes_per_sync to smooth I/O (#5183)
Summary:
The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways.

My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.

Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.

There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).

The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183

Differential Revision: D14953553

Pulled By: siying

fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
2019-04-22 11:51:39 -07:00
Zhongyi Xie
3c5eed5ebe remove incorrect assert in GetUniqueIdFromFile (#5102)
Summary:
User report has shown that sometimes `BlockBasedTable::SetupCacheKeyPrefix` would assert when trying to generate an id from the file. The actual cause seems to be hardware related but we might be better off without the incorrect assertion
See T42178927 for more information
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5102

Differential Revision: D14604677

Pulled By: miasantreble

fbshipit-source-id: fcb09207ebdc4fa66e941afbc0523d84797e7ad7
2019-03-25 23:28:29 -07:00
Rashmi Sharma
a4396f9218 Make it easier for users to load options from option file and set shared block cache. (#5063)
Summary:
[RocksDB] Make it easier for users to load options from option file and set shared block cache.
Right now, it requires several dynamic casting for users to set the shared block cache to their option struct cast from the option file.
If people don't do that, every CF of every DB will generate its own 8MB block cache. It's not a usable setting. So we are dragging every user who loads options from the file into such a mess.
Instead, we should allow them to pass their cache object to LoadLatestOptions() and LoadOptionsFromFile(), so that those loaded option structs will have the shared block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5063

Differential Revision: D14518584

Pulled By: rashmishrm

fbshipit-source-id: c91430ff9425a0e67d76fc67931d755f491ca5aa
2019-03-21 16:25:28 -07:00
Andrew Kryczka
186b3afaa8 Use fallocate even if hole-punching unsupported (#5023)
Summary:
The compiler flag `-DROCKSDB_FALLOCATE_PRESENT` was only set when
`fallocate`, `FALLOC_FL_KEEP_SIZE`, and `FALLOC_FL_PUNCH_HOLE` were all
present. However, the last of the three is not really necessary for the
primary `fallocate` use case; furthermore, it was introduced only in later
Linux kernel versions (2.6.38+).

This PR changes the flag `-DROCKSDB_FALLOCATE_PRESENT` to only require
`fallocate` and `FALLOC_FL_KEEP_SIZE` to be present. There is a separate
check for `FALLOC_FL_PUNCH_HOLE` only in the place where it is used.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5023

Differential Revision: D14248487

Pulled By: siying

fbshipit-source-id: a10ed0b902fa755988e957bd2dcec9081ec0502e
2019-03-04 15:43:17 -08:00
Young Tack Jin
4091597c67 fix for nvme device path (#4866)
Summary:
nvme device path doesn't have "block" as like "nvme/nvme0/nvme0n1"
or "nvme/nvme0/nvme0n1/nvme0n1p1". the last directory such as
"nvme0n1p1" should be removed if nvme drive is partitioned.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4866

Differential Revision: D13627824

Pulled By: riversand963

fbshipit-source-id: 09ab968f349f3dbb890beea20193f1359b17d317
2019-01-31 19:08:37 -08:00
Sagar Vemuri
645e57c22d Assert for Direct IO at the beginning in PositionedRead (#3891)
Summary:
Moved the direct-IO assertion to the top in `PosixSequentialFile::PositionedRead`, as it doesn't make sense to check for sector alignments before checking for direct IO.
Closes https://github.com/facebook/rocksdb/pull/3891

Differential Revision: D8267972

Pulled By: sagar0

fbshipit-source-id: 0ecf77c0fb5c35747a4ddbc15e278918c0849af7
2018-06-21 14:58:01 -07:00
奏之章
6e08916eb3 Fix Fadvise on closed file when reads use mmap
Summary:
```PosixMmapReadableFile::fd_``` is closed after created, but needs to remain open for the lifetime of `PosixMmapReadableFile` since it is used whenever `InvalidateCache` is called.
Closes https://github.com/facebook/rocksdb/pull/2764

Differential Revision: D8152515

Pulled By: ajkr

fbshipit-source-id: b738a6a55ba4e392f9b0f374ff396a1e61c64f65
2018-05-25 10:57:57 -07:00
Dmitri Smirnov
3db8504cde Catchup with posix features
Summary:
Catch up with Posix features
  NewWritableRWFile must fail when file does not exists
  Implement Env::Truncate()
  Adjust Env options optimization functions
  Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857

Differential Revision: D8053610

Pulled By: ajkr

fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
2018-05-24 15:13:04 -07:00
Andrew Kryczka
46152d53bf Second attempt at db_stress crash-recovery verification
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e

This PR includes the contents of the original commit plus two bug fixes, which are:

- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793

Differential Revision: D7811671

Pulled By: ajkr

fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
2018-04-30 12:27:34 -07:00
Andrew Kryczka
6afe22db2e revert db_stress crash-recovery verification
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786

Differential Revision: D7794516

Pulled By: ajkr

fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
2018-04-27 12:57:01 -07:00
Andrew Kryczka
a4fb1f8c04 Add crash-recovery correctness check to db_stress
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
2018-04-24 15:58:22 -07:00
Zhongyi Xie
954b496b3f fix memory leak in two_level_iterator
Summary:
this PR fixes a few failed contbuild:
1. ASAN memory leak in Block::NewIterator (table/block.cc:429). the proper destruction of first_level_iter_ and second_level_iter_ of two_level_iterator.cc is missing from the code after the refactoring in https://github.com/facebook/rocksdb/pull/3406
2. various unused param errors introduced by https://github.com/facebook/rocksdb/pull/3662
3. updated comment for `ForceReleaseCachedEntry` to emphasize the use of `force_erase` flag.
Closes https://github.com/facebook/rocksdb/pull/3718

Reviewed By: maysamyabandeh

Differential Revision: D7621192

Pulled By: miasantreble

fbshipit-source-id: 476c94264083a0730ded957c29de7807e4f5b146
2018-04-15 17:26:26 -07:00
David Lai
3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Tamir Duberstein
cd5092e168 Suppress unused warnings
Summary:
- Use `__unused__` everywhere
- Suppress unused warnings in Release mode
    + This currently affects non-MSVC builds (e.g. mingw64).
Closes https://github.com/facebook/rocksdb/pull/3448

Differential Revision: D6885496

Pulled By: miasantreble

fbshipit-source-id: f2f6adacec940cc3851a9eee328fafbf61aad211
2018-02-02 12:27:07 -08:00
Yi Wu
bbcd3b0bd2 Suppress valgrind "unimplemented functionality" error
Summary:
Add ROCKSDB_VALGRIND_RUN macro and suppress false-positive "unimplemented functionality" throw by valgrind for steam hints.

Another approach would be add a valgrind suppress file. Valgrind is suppose to print the suppression when given "--gen-suppressions=all" param, which is suppose to be the content for the suppression file. But it doesn't print.
Closes https://github.com/facebook/rocksdb/pull/3174

Differential Revision: D6338786

Pulled By: yiwu-arbug

fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa
2017-11-15 14:28:34 -08:00
Shaohua Li
eefd75a228 Stream
Summary:
Add a simple policy for NVMe write time life hint
Closes https://github.com/facebook/rocksdb/pull/3095

Differential Revision: D6298030

Pulled By: shligit

fbshipit-source-id: 9a72a42e32e92193af11599eb71f0cf77448e24d
2017-11-10 09:26:24 -08:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Aaron Gao
43e4eef77f remove unnecessary fadvise
Summary:
We has to remove this line because previously it is only called when use_os_buffer = false. But now we have direct io to replace it.
Closes https://github.com/facebook/rocksdb/pull/2573

Differential Revision: D5412824

Pulled By: yiwu-arbug

fbshipit-source-id: 81f3f0cdf94566bfc09ef2ff123e40cddbe36b36
2017-07-12 18:14:41 -07:00
Siying Dong
857e9960be Improve the error message for I/O related errors.
Summary:
Force people to write something other than file name while returning status for IOError.
Closes https://github.com/facebook/rocksdb/pull/2493

Differential Revision: D5321309

Pulled By: siying

fbshipit-source-id: 38bcf6c19e80831cd3e300a047e975cbb131d822
2017-06-26 12:57:01 -07:00
Mike Kolupaev
bc09c8a0db Fix crash in PosixWritableFile::Close() when fstat() fails
Summary:
We had a crash in this code: `fstat()` failed; `file_stats` contained garbage, in particular `file_stats.st_blksize == 6`; the expression `file_stats.st_blocks / (file_stats.st_blksize / 512)` divided by zero.
Closes https://github.com/facebook/rocksdb/pull/2420

Differential Revision: D5216110

Pulled By: al13n321

fbshipit-source-id: 6d8fc5e7c4f98c1139e68c7829ebdbac68b0fce0
2017-06-08 19:56:22 -07:00
Maysam Yabandeh
550a1df72c Fix clang errors by asserting the precondition
Summary:
USE_CLANG=1 make -j32 analyze
The two errors would disappear after the assertion.
Closes https://github.com/facebook/rocksdb/pull/2416

Differential Revision: D5193526

Pulled By: maysamyabandeh

fbshipit-source-id: 16a21f18f68023f862764dd3ab9e00ca60b0eefa
2017-06-06 12:56:52 -07:00
Aaron Gao
6b99dbe049 fix memory alignment with logical sector size
Summary:
we align the buffer with logical sector size and should not test it with page size, which is usually 4k.
Closes https://github.com/facebook/rocksdb/pull/2245

Differential Revision: D5001842

Pulled By: lightmark

fbshipit-source-id: a7135fcf6351c6db363e8908956b1e193a4a6291
2017-05-04 01:30:13 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Aaron Gao
6616e4d629 add prefetch to PosixRandomAccessFile in buffered io
Summary:
Every time after a compaction/flush finish, we issue user reads to put the table into block cache which includes a couple of IO that read footer, index blocks, meta block, etc. So we implement Prefetch here to reduce IO.
Closes https://github.com/facebook/rocksdb/pull/2196

Differential Revision: D4931782

Pulled By: lightmark

fbshipit-source-id: 5a13d58dcab209964352322217193bbf7ff78149
2017-04-26 14:47:23 -07:00
Aaron Gao
3b4d1b7a44 add <sys/sysmacros.h> to avoid warning with glibc 2.25
Summary:
https://github.com/facebook/rocksdb/issues/2152
Closes https://github.com/facebook/rocksdb/pull/2208

Differential Revision: D4945577

Pulled By: lightmark

fbshipit-source-id: 4e679150f2c9443d3be0b6008b26b65fabbda75a
2017-04-26 01:26:55 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Aaron Gao
0716527341 remove warning
Summary:
st_blocks shows 16 though the right value is 8. This happens occasionally which seems a bug.
Closes https://github.com/facebook/rocksdb/pull/2160

Differential Revision: D4893542

Pulled By: lightmark

fbshipit-source-id: 68e832586b58bbc6162efbe83ce273f1570d5be3
2017-04-14 18:56:14 -07:00
Aaron Gao
13b50358fb add space for buggy kernel warning
Summary:
add the missing space
Closes https://github.com/facebook/rocksdb/pull/2150

Differential Revision: D4880696

Pulled By: lightmark

fbshipit-source-id: a4e0ad6a8ea45d6469d3f6c8514fdeb4cf10aaf5
2017-04-13 16:12:04 -07:00
Aaron Gao
ba7da434ae fix db_stress crash caused by buggy kernel warning
Summary:
filter the warning out and only print it once.
Closes https://github.com/facebook/rocksdb/pull/2137

Differential Revision: D4870925

Pulled By: lightmark

fbshipit-source-id: 91b363ce7f70bce88b0780337f408fc4649139b8
2017-04-11 16:56:59 -07:00
Aaron Gao
9e72939029 only FALLOC_FL_PUNCH_HOLE when ftruncate is buggy
Summary:
In RocksDB, we sometimes preallocate the estimated space for a file to have better perf with fallocate (if supported). Usually it is a little bit bigger than the real resulting file size. At this time, we have to let the Filesystem reclaim the space not used.

Ideally, calling ftruncate to truncate the file to its real size should be enough. HOWEVER, it isn't on tmpfs, which we witness in our case, with some buggy kernel version. ftruncate a file with preallocated space doesn't change number of the blocks used by the file, which means the space not used by the file is not returned to the filesystems. So in this case we need fallocate with FALLOC_FL_PUNCH_HOLE to explicitly reclaim the used blocks. It is a hack to cope with the kernel bug and usually we should not need it.
Closes https://github.com/facebook/rocksdb/pull/2102

Differential Revision: D4848934

Pulled By: lightmark

fbshipit-source-id: f1b40b5
2017-04-06 18:25:03 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00