Summary:
This PR does the following:
-> Creates a WinFileSystem class. This class is the Windows equivalent of the PosixFileSystem and will be used on Windows systems.
-> Introduces a CustomEnv class. A CustomEnv is an Env that takes a FileSystem as constructor argument. I believe there will only ever be two implementations of this class (PosixEnv and WinEnv). There is still a CustomEnvWrapper class that takes an Env and a FileSystem and wraps the Env calls with the input Env but uses the FileSystem for the FileSystem calls
-> Eliminates the public uses of the LegacyFileSystemWrapper.
With this change in place, there are effectively the following patterns of Env:
- "Base Env classes" (PosixEnv, WinEnv). These classes implement the core Env functions (e.g. Threads) and have a hard-coded input FileSystem. These classes inherit from CompositeEnv, implement the core Env functions (threads) and delegate the FileSystem-like calls to the input file system.
- Wrapped Composite Env classes (MemEnv). These classes take in an Env and a FileSystem. The core env functions are re-directed to the wrapped env. The file system calls are redirected to the input file system
- Legacy Wrapped Env classes. These classes take in an Env input (but no FileSystem). The core env functions are re-directed to the wrapped env. A "Legacy File System" is created using this env and the file system calls directed to the env itself.
With these changes in place, the PosixEnv becomes a singleton -- there is only ever one created. Any other use of the PosixEnv is via another wrapped env. This cleans up some of the issues with the env construction and destruction.
Additionally, there were places in the code that required had an Env when they required a FileSystem. Many of these places would wrap the Env with a LegacyFileSystemWrapper instead of using the env->GetFileSystem(). These places were changed, thereby removing layers of additional redirection (LegacyFileSystem --> Env --> Env::FileSystem).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7703
Reviewed By: zhichao-cao
Differential Revision: D25762190
Pulled By: anand1976
fbshipit-source-id: 1a088e97fc916f28ac69c149cd1dcad0ab31704b
Summary:
While rocksdb can compile on both macOS and Linux with Buck, it couldn't be
compiled on Windows. The only way to compile it on Windows was with the CMake
build.
To keep the multi-platform complexity low, I've simply included all the Windows
bits in the TARGETS file, and added large #if blocks when not on Windows, the
same was done on the posix specific files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7406
Test Plan:
On my devserver:
buck test //rocksdb/...
On Windows:
buck build mode/win //rocksdb/src:rocksdb_lib
Reviewed By: pdillinger
Differential Revision: D23874358
Pulled By: xavierd
fbshipit-source-id: 8768b5d16d7e8f44b5ca1e2483881ca4b24bffbe
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
From the reset of the code, it looks this this maybe can be unconditionally given the attribute? But I couldn't test with MSVC so I defensively put under CPP.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6075
Differential Revision: D18723749
fbshipit-source-id: 45fc8732c28dd29aab1644225d68f3c6f39bd69b
Summary:
Since we do not evict a file's blocks from block cache before that file
is deleted, we require a file's cache ID prefix is both unique and
non-reusable. However, the Windows functionality we were relying on only
guaranteed uniqueness. That meant a newly created file could be assigned
the same cache ID prefix as a deleted file. If the newly created file
had block offsets matching the deleted file, full cache keys could be
exactly the same, resulting in obsolete data blocks returned from cache
when trying to read from the new file.
We noticed this when running on FAT32 where compaction was writing out
of order keys due to reading obsolete blocks from its input files. The
functionality is documented as behaving the same on NTFS, although I
wasn't able to repro it there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5844
Test Plan:
we had a reliable repro of out-of-order keys on FAT32 that
was fixed by this change
Differential Revision: D17752442
fbshipit-source-id: 95d983f9196cf415f269e19293b97341edbf7e00
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
Summary:
The existing implementation does not guarantee bytes reach disk every `bytes_per_sync` when writing SST files, or every `wal_bytes_per_sync` when writing WALs. This can cause confusing behavior for users who enable this feature to avoid large syncs during flush and compaction, but then end up hitting them anyways.
My understanding of the existing behavior is we used `sync_file_range` with `SYNC_FILE_RANGE_WRITE` to submit ranges for async writeback, such that we could continue processing the next range of bytes while that I/O is happening. I believe we can preserve that benefit while also limiting how far the processing can get ahead of the I/O, which prevents huge syncs from happening when the file finishes.
Consider this `sync_file_range` usage: `sync_file_range(fd_, 0, static_cast<off_t>(offset + nbytes), SYNC_FILE_RANGE_WAIT_BEFORE | SYNC_FILE_RANGE_WRITE)`. Expanding the range to start at 0 and adding the `SYNC_FILE_RANGE_WAIT_BEFORE` flag causes any pending writeback (like from a previous call to `sync_file_range`) to finish before it proceeds to submit the latest `nbytes` for writeback. The latest `nbytes` are still written back asynchronously, unless processing exceeds I/O speed, in which case the following `sync_file_range` will need to wait on it.
There is a second change in this PR to use `fdatasync` when `sync_file_range` is unavailable (determined statically) or has some known problem with the underlying filesystem (determined dynamically).
The above two changes only apply when the user enables a new option, `strict_bytes_per_sync`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5183
Differential Revision: D14953553
Pulled By: siying
fbshipit-source-id: 445c3862e019fb7b470f9c7f314fc231b62706e9
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.
Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339
Differential Revision: D9654990
Pulled By: ajkr
fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
Summary:
Catch up with Posix features
NewWritableRWFile must fail when file does not exists
Implement Env::Truncate()
Adjust Env options optimization functions
Implement MemoryMappedBuffer on Windows.
Closes https://github.com/facebook/rocksdb/pull/3857
Differential Revision: D8053610
Pulled By: ajkr
fbshipit-source-id: ccd0d46c29648a9f6f496873bc1c9d6c5547487e
Summary:
Returning bytes_read causes the caller to call GetLastError()
to report failure but the lasterror may be overwritten by then
so we lose the error code.
Fix up CMake file to include xpress source code only when needed.
Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795
Differential Revision: D7832935
Pulled By: anand1976
fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
Summary:
This patch addressed several issues.
Portability including db_test std::thread -> port::Thread Cc: @
and %z to ROCKSDB portable macro. Cc: maysamyabandeh
Implement Env::AreFilesSame
Make the implementation of file unique number more robust
Get rid of C-runtime and go directly to Windows API when dealing
with file primitives.
Implement GetSectorSize() and aling unbuffered read on the value if
available.
Adjust Windows Logger for the new interface, implement CloseImpl() Cc: anand1976
Fix test running script issue where $status var was of incorrect scope
so the failures were swallowed and not reported.
DestroyDB() creates a logger and opens a LOG file in the directory
being cleaned up. This holds a lock on the folder and the cleanup is
prevented. This fails one of the checkpoin tests. We observe the same in production.
We close the log file in this change.
Fix DBTest2.ReadAmpBitmapLiveInCacheAfterDBClose failure where the test
attempts to open a directory with NewRandomAccessFile which does not
work on Windows.
Fix DBTest.SoftLimit as it is dependent on thread timing. CC: yiwu-arbug
Closes https://github.com/facebook/rocksdb/pull/3552
Differential Revision: D7156304
Pulled By: siying
fbshipit-source-id: 43db0a757f1dfceffeb2b7988043156639173f5b
Summary:
BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
(1) improve the way the verification is done. With this it is 5 times faster
(2) run fewer tests for large blocks. This cut it down by another 10 times.
Now it can finish in similar time as other tests.
Closes https://github.com/facebook/rocksdb/pull/3313
Differential Revision: D6643711
Pulled By: siying
fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
Summary:
FILE_FLAG_WRITE_THROUGH is for disabling device on-board cache in windows API, which should be disabled if user doesn't need system cache.
There was a perf issue related with this, we found during memtable flush, the high percentile latency jumps significantly. During profiling, we found those high latency (P99.9) read requests got queue-jumped by write requests from memtable flush and takes 80ms or even more time to wait, even when SSD overall IO throughput is relatively low.
After enabling FILE_FLAG_WRITE_THROUGH, we rerun the test found high percentile latency drops a lot without observable impact on writes.
Scenario 1: 40MB/s + 40MB/s R/W compaction throughput
Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 56.897 ms | 35.593 ms | -37.4%
P99 | 3.905 ms | 3.896 ms | -2.8%
Scenario 2: 14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances have manually triggered memtable flush operations (memtable is tiny), creating a lot of randomized the small file writes operations during test.
Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 86.227 ms | 50.436 ms | -41.5%
P99 | 8.415 ms | 3.356 ms | -60.1%
Closes https://github.com/facebook/rocksdb/pull/3225
Differential Revision: D6624174
Pulled By: miasantreble
fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
Summary:
MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var.
Closes https://github.com/facebook/rocksdb/pull/3048
Differential Revision: D6126272
Pulled By: maysamyabandeh
fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f
Summary:
Make default impl return NoSupported so the db_blob
tests exist in a meaningful manner.
Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465
Differential Revision: D5275563
Pulled By: yiwu-arbug
fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
Summary:
This was exposed by a48a62d, which made NDEBUG the default for cmake
builds.
Closes https://github.com/facebook/rocksdb/pull/2315
Differential Revision: D5079583
Pulled By: sagar0
fbshipit-source-id: c614e96a40df016a834a62b6236852265e7ee4db
Summary:
Remove double buffering on RandomRead on Windows.
With more logic appear in file reader/write Read no longer
obeys forwarding calls to Windows implementation.
Previously direct_io (unbuffered) was only available on Windows
but now is supported as generic.
We remove intermediate buffering on Windows.
Remove random_access_max_buffer_size option which was windows specific.
Non-zero values for that opton introduced unnecessary lock contention.
Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
no longer necessary.
Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105
Differential Revision: D4847770
Pulled By: siying
fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
This fixes an issue when the most recent readers assume that alignment is always set even if direct io is off.
Also adjust slightly appveyor script to run db_basic_test cases concurrently.
Closes https://github.com/facebook/rocksdb/pull/1959
Differential Revision: D4671972
Pulled By: IslamAbdelRahman
fbshipit-source-id: 1886620
Summary:
also change variable name `direct_io_` to `use_direct_io_` in WritableFile to make it consistent with read path.
Closes https://github.com/facebook/rocksdb/pull/1770
Differential Revision: D4416435
Pulled By: lightmark
fbshipit-source-id: 4143c53
Summary:
Enable directIO on WritableFileImpl::Append
with offset being current length of the file.
Enable UniqueID tests on Windows, disable others but
leeting them to compile. Unique tests are valuable to
detect failures on different filesystems and upcoming
ReFS.
Clear output in WinEnv Getchildren.This is different from
previous strategy, do not touch output on failure.
Make sure DBTest.OpenWhenOpen works with windows error message
Closes https://github.com/facebook/rocksdb/pull/1746
Differential Revision: D4385681
Pulled By: IslamAbdelRahman
fbshipit-source-id: c07b702
For ease of reuse and customization as a library
without wrapping.
WinEnvThreads is a class for replacement.
WintEnvIO is a class for reuse and behavior override.
Added private virtual functions for custom override
of fallocate pread for io classes.