Summary:
_endthreadex does not return and thus objects
for stack destructors do not run. This creates a memory leak.
We remove the calls since _enthreadex called automatically after the
threadproc returns i.e. thread exits.
Closes https://github.com/facebook/rocksdb/pull/3542
Differential Revision: D7088713
Pulled By: ajkr
fbshipit-source-id: 749ecafc6a9572f587f76e516547e07734349a54
Summary:
Right now, users will encounter unexpected bahavior if they use key or value larger than 4GB. We should explicitly fail the queriers.
Closes https://github.com/facebook/rocksdb/pull/3484
Differential Revision: D6953895
Pulled By: siying
fbshipit-source-id: b60491e1af064fc5d52971956661f6c18ceac24f
Summary:
BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
(1) improve the way the verification is done. With this it is 5 times faster
(2) run fewer tests for large blocks. This cut it down by another 10 times.
Now it can finish in similar time as other tests.
Closes https://github.com/facebook/rocksdb/pull/3313
Differential Revision: D6643711
Pulled By: siying
fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
Summary:
FILE_FLAG_WRITE_THROUGH is for disabling device on-board cache in windows API, which should be disabled if user doesn't need system cache.
There was a perf issue related with this, we found during memtable flush, the high percentile latency jumps significantly. During profiling, we found those high latency (P99.9) read requests got queue-jumped by write requests from memtable flush and takes 80ms or even more time to wait, even when SSD overall IO throughput is relatively low.
After enabling FILE_FLAG_WRITE_THROUGH, we rerun the test found high percentile latency drops a lot without observable impact on writes.
Scenario 1: 40MB/s + 40MB/s R/W compaction throughput
Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 56.897 ms | 35.593 ms | -37.4%
P99 | 3.905 ms | 3.896 ms | -2.8%
Scenario 2: 14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances have manually triggered memtable flush operations (memtable is tiny), creating a lot of randomized the small file writes operations during test.
Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 86.227 ms | 50.436 ms | -41.5%
P99 | 8.415 ms | 3.356 ms | -60.1%
Closes https://github.com/facebook/rocksdb/pull/3225
Differential Revision: D6624174
Pulled By: miasantreble
fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
Summary:
Fix a race condition when we create a thread and immediately destroy
This case should be supported.
What happens is that the thread function needs the Data instance
to actually run but has no shared ownership and must rely on the
WindowsThread instance to continue existing.
To address this we change unique_ptr to shared_ptr and then
acquire an additional refcount for the threadproc which destroys it
just before the thread exit.
We choose to allocate shared_ptr instance on the heap as this allows
the original thread to continue w/o waiting for the new thread to start
running.
Closes https://github.com/facebook/rocksdb/pull/3240
Differential Revision: D6511324
Pulled By: yiwu-arbug
fbshipit-source-id: 4633ff7996daf4d287a9fe34f60c1dd28cf4ff36
Summary:
MSVC does not support unused attribute at this time. A separate assignment line fixes the issue probably by being counted as usage for MSVC and it no longer complains about unused var.
Closes https://github.com/facebook/rocksdb/pull/3048
Differential Revision: D6126272
Pulled By: maysamyabandeh
fbshipit-source-id: 4907865db45fd75a39a15725c0695aaa17509c1f
Summary:
snprintf is defined as _snprintf, which doesn't exist in the std
namespace.
Closes https://github.com/facebook/rocksdb/pull/2298
Differential Revision: D5070457
Pulled By: yiwu-arbug
fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
Summary:
Add -DPORTABLE=1
port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running.
Closes https://github.com/facebook/rocksdb/pull/2815
Differential Revision: D5751661
Pulled By: siying
fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0
Summary:
GCC < 5 + ASAN does not instrument aligned_alloc, which can make ASAN
report false-positive with "free on address which was not malloc" error.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61693
Also suppress leak warning with LRUCache::DisownData().
Closes https://github.com/facebook/rocksdb/pull/2783
Differential Revision: D5696465
Pulled By: yiwu-arbug
fbshipit-source-id: 87c607c002511fa089b18cc35e24909bee0e74b4
Summary:
This reverts the previous commit 1d7048c598, which broke the build.
Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627
Differential Revision: D5476473
Pulled By: sagar0
fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
Make default impl return NoSupported so the db_blob
tests exist in a meaningful manner.
Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465
Differential Revision: D5275563
Pulled By: yiwu-arbug
fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.
Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204
Differential Revision: D4936256
Pulled By: ajkr
fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
Summary:
This was exposed by a48a62d, which made NDEBUG the default for cmake
builds.
Closes https://github.com/facebook/rocksdb/pull/2315
Differential Revision: D5079583
Pulled By: sagar0
fbshipit-source-id: c614e96a40df016a834a62b6236852265e7ee4db
Summary:
- added a feature test in build_detect_platform to check whether sched_getcpu() is available. glibc offers it only on some platforms (e.g., linux but not mac); this way should be easier than maintaining a list of platforms on which it's available.
- refactored PhysicalCoreID() to be simpler / less repetitive. ordered the conditional compilation clauses from most-to-least preferred
Closes https://github.com/facebook/rocksdb/pull/2272
Differential Revision: D5038093
Pulled By: ajkr
fbshipit-source-id: 81d7db3cc620250de220bdeb3194b2b3d7673de7
Summary:
Updated PhysicalCoreID() to use sched_getcpu() on x86_64 for glibc >= 2.22. Added a new
function named GetCPUID() that calls sched_getcpu(), to avoid repeated code. This change is done as per the comments of PR: https://github.com/facebook/rocksdb/pull/2230
Signed-off-by: Jos Collin <jcollin@redhat.com>
Closes https://github.com/facebook/rocksdb/pull/2260
Differential Revision: D5025734
Pulled By: ajkr
fbshipit-source-id: f4cca68c12573cafcf8531e7411a1e733bbf8eef
Summary:
RocksDB is compiled as part of MyRocks (MySQL storage engine) build.
MySQL already defines `CACHE_LINE_SIZE` and therefore we're getting a
conflict. Change RocksDB definition to be more cognizant of this.
Closes https://github.com/facebook/rocksdb/pull/2257
Differential Revision: D5013188
Pulled By: gunnarku
fbshipit-source-id: cfa76fe99f90dcd82aa09204e2f1f35e07a82b41
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms
With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.
cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107
Differential Revision: D4849784
Pulled By: siying
fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
Summary: Checked the return value of __get_cpuid(). Implemented the else case where the arch is different from i386 and x86_64.
Pulled By: ajkr
Differential Revision: D4973496
fbshipit-source-id: c40fdef5840364c2a79b1d11df0db5d4ec3d6a4a
Summary:
Remove double buffering on RandomRead on Windows.
With more logic appear in file reader/write Read no longer
obeys forwarding calls to Windows implementation.
Previously direct_io (unbuffered) was only available on Windows
but now is supported as generic.
We remove intermediate buffering on Windows.
Remove random_access_max_buffer_size option which was windows specific.
Non-zero values for that opton introduced unnecessary lock contention.
Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
no longer necessary.
Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105
Differential Revision: D4847770
Pulled By: siying
fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
Summary:
Replacement of #2147
The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194
Differential Revision: D4929799
Pulled By: siying
fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
There still are many warnings (most of them about invalid printf format
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052
Differential Revision: D4807355
Pulled By: siying
fbshipit-source-id: ef03786
Summary:
This fixes an issue when the most recent readers assume that alignment is always set even if direct io is off.
Also adjust slightly appveyor script to run db_basic_test cases concurrently.
Closes https://github.com/facebook/rocksdb/pull/1959
Differential Revision: D4671972
Pulled By: IslamAbdelRahman
fbshipit-source-id: 1886620
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread. On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823
Differential Revision: D4492902
Pulled By: siying
fbshipit-source-id: c74cb11
Summary:
also change variable name `direct_io_` to `use_direct_io_` in WritableFile to make it consistent with read path.
Closes https://github.com/facebook/rocksdb/pull/1770
Differential Revision: D4416435
Pulled By: lightmark
fbshipit-source-id: 4143c53
Summary:
Enable directIO on WritableFileImpl::Append
with offset being current length of the file.
Enable UniqueID tests on Windows, disable others but
leeting them to compile. Unique tests are valuable to
detect failures on different filesystems and upcoming
ReFS.
Clear output in WinEnv Getchildren.This is different from
previous strategy, do not touch output on failure.
Make sure DBTest.OpenWhenOpen works with windows error message
Closes https://github.com/facebook/rocksdb/pull/1746
Differential Revision: D4385681
Pulled By: IslamAbdelRahman
fbshipit-source-id: c07b702
Summary:
To support scenarios where we want all instances of `Mutex` be adaptive
we're adding a conditional `#define` so that the desired behavior can be
easily enabled.
Closes https://github.com/facebook/rocksdb/pull/1710
Differential Revision: D4359863
Pulled By: gunnarku
fbshipit-source-id: 2f1e2f8
Summary:
The address of the array of string pointers is returned as the function result of backtrace_symbols(). This array is malloced by backtrace_symbols(), and must be freed by the caller.
Closes https://github.com/facebook/rocksdb/pull/1692
Differential Revision: D4355737
Pulled By: IslamAbdelRahman
fbshipit-source-id: 5742035
Summary:
It'd be nice to use the error status type to distinguish
between user error and system error. For example, GetChildren can fail
listing a backup directory's contents either because a bad path was provided
(user error) or because an operation failed, e.g., a remote storage service
call failed (system error). In the former case, we want to continue and treat
the backup directory as empty; in the latter case, we want to immediately
propagate the error to the caller.
This diff uses NotFound to indicate user error and IOError to indicate
system error. Previously IOError indicated both.
Closes https://github.com/facebook/rocksdb/pull/1644
Differential Revision: D4312157
Pulled By: ajkr
fbshipit-source-id: 51b4f24
Summary:
This diff split ThreadPool to
-ThreadPool (abstract interface exposed in include/rocksdb/threadpool.h)
-ThreadPoolImpl (actual implementation in util/threadpool_imp.h)
This allow us to expose ThreadPool to the user so we can use it as an option later
Test Plan: existing unit tests
Reviewers: andrewkr, yiwu, yhchiang, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D62085
* stack_trace,cc: The current Stacktrace code does not compile for FreeBSD
So set it to generate empty routines
* stack_trace,cc: The current Stacktrace code does not compile for FreeBSD
Use the definition also used in other commits
Summary: When rate_bytes_per_sec * refill_period_us_ overflows, the actual limited rate is very low. Handle this case so the rate will be large.
Test Plan: Add a unit test for it.
Reviewers: IslamAbdelRahman, andrewkr
Reviewed By: andrewkr
Subscribers: yiwu, lightmark, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D58929
For ease of reuse and customization as a library
without wrapping.
WinEnvThreads is a class for replacement.
WintEnvIO is a class for reuse and behavior override.
Added private virtual functions for custom override
of fallocate pread for io classes.
Conditionally retrofit thread_posix for use with std::thread
and reuse the same logic. Posix users continue using Posix interfaces.
Enable XPRESS compression in test runs.
Fix master introduced signed/unsigned mismatch.
* Musl libc does not provide adaptive mutex. Added feature test for PTHREAD_MUTEX_ADAPTIVE_NP.
* Musl libc does not provide backtrace(3). Added a feature check for backtrace(3).
* Fixed compiler error.
* Musl libc does not implement backtrace(3). Added platform check for libexecinfo.
* Alpine does not appear to support gcc -pg option. By default (gcc has PIE option enabled) it fails with:
gcc: error: -pie and -pg|p|profile are incompatible when linking
When -fno-PIE and -nopie are used it fails with:
/usr/lib/gcc/x86_64-alpine-linux-musl/5.3.0/../../../../x86_64-alpine-linux-musl/bin/ld: cannot find gcrt1.o: No such file or directory
Added gcc -pg platform test and output PROFILING_FLAGS accordingly. Replaced pg var in Makefile with PROFILING_FLAGS.
* fix segfault when TEST_IOCTL_FRIENDLY_TMPDIR is undefined and default candidates are not suitable
* use ASSERT_DOUBLE_EQ instead of ASSERT_EQ
* When compiled with ROCKSDB_MALLOC_USABLE_SIZE UniversalCompactionFourPaths and UniversalCompactionSecondPathRatio tests fail due to premature memtable flushes on systems with 16-byte alignment. Arena runs out of block space before GenerateNewFile() completes.
Increased options.write_buffer_size.
Comparable with Snappy on comp ratio.
Implemented using Windows API, does not require external package.
Avaiable since Windows 8 and server 2012.
Use -DXPRESS=1 with CMake to enable.
Summary: I broke it in D53781.
Test Plan: tried the same code in util/env_posix.cc and it compiled successfully
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54303
Summary:
Added this new function, which returns filename, size, and modified
timestamp for each file in the provided directory. The default implementation
retrieves the metadata sequentially using existing functions. In the next diff
I'll make HdfsEnv override this function to use libhdfs's bulk get function.
This won't work on windows due to the path separator.
Test Plan:
new unit test
$ ./env_test --gtest_filter=EnvPosixTest.ConsistentChildrenMetadata
Reviewers: yhchiang, sdong
Reviewed By: sdong
Subscribers: IslamAbdelRahman, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D53781
This change impacts only non-buffered I/O on Windows.
Currently, there is a buffer per RandomAccessFile
instance that is protected by a lock. The reason we
maintain the buffer is non-buffered I/O requires an aligned
buffer to work.
XPerf traces demonstrate that we accumulate a considerable
wait time while waiting for that lock.
This change enables to set random access buffer size to zero
which would indicate a per request allocation.
We are expecting that allocation expense would be much less than
I/O costs plus wait time due to the fact that the memory heap
would tend to re-use page aligned allocations especially with the
use of Jemalloc.
This change does not affect buffer use as a read_ahead_buffer for
compaction purposes.
Summary:
This diff adds support for concurrent adds to the skiplist memtable
implementations. Memory allocation is made thread-safe by the addition of
a spinlock, with small per-core buffers to avoid contention. Concurrent
memtable writes are made via an additional method and don't impose a
performance overhead on the non-concurrent case, so parallelism can be
selected on a per-batch basis.
Write thread synchronization is an increasing bottleneck for higher levels
of concurrency, so this diff adds --enable_write_thread_adaptive_yield
(default off). This feature causes threads joining a write batch
group to spin for a short time (default 100 usec) using sched_yield,
rather than going to sleep on a mutex. If the timing of the yield calls
indicates that another thread has actually run during the yield then
spinning is avoided. This option improves performance for concurrent
situations even without parallel adds, although it has the potential to
increase CPU usage (and the heuristic adaptation is not yet mature).
Parallel writes are not currently compatible with
inplace updates, update callbacks, or delete filtering.
Enable it with --allow_concurrent_memtable_write (and
--enable_write_thread_adaptive_yield). Parallel memtable writes
are performance neutral when there is no actual parallelism, and in
my experiments (SSD server-class Linux and varying contention and key
sizes for fillrandom) they are always a performance win when there is
more than one thread.
Statistics are updated earlier in the write path, dropping the number
of DB mutex acquisitions from 2 to 1 for almost all cases.
This diff was motivated and inspired by Yahoo's cLSM work. It is more
conservative than cLSM: RocksDB's write batch group leader role is
preserved (along with all of the existing flush and write throttling
logic) and concurrent writers are blocked until all memtable insertions
have completed and the sequence number has been advanced, to preserve
linearizability.
My test config is "db_bench -benchmarks=fillrandom -threads=$T
-batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
-level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
-disable_auto_compactions --max_write_buffer_number=8
-max_background_flushes=8 --disable_wal --write_buffer_size=160000000
--block_size=16384 --allow_concurrent_memtable_write" on a two-socket
Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive. With 1
thread I get ~440Kops/sec. Peak performance for 1 socket (numactl
-N1) is slightly more than 1Mops/sec, at 16 threads. Peak performance
across both sockets happens at 30 threads, and is ~900Kops/sec, although
with fewer threads there is less performance loss when the system has
background work.
Test Plan:
1. concurrent stress tests for InlineSkipList and DynamicBloom
2. make clean; make check
3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
6. make clean; OPT=-DROCKSDB_LITE make check
7. verify no perf regressions when disabled
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba
Differential Revision: https://reviews.facebook.net/D50589
Summary:
This diff provides a framework for doing manual
compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
BackgroundCompactions, so that RunManualCompactions can be reentrant.
Parallelism is controlled by the two routines
ConflictingManualCompaction to allow/disallow new parallel/manual
compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
I will be adding more tests later.
Test Plan: Rocksdb regression + new tests + valgrind
Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong
Reviewed By: sdong
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47973
* conversion from 'size_t' to 'type', by add static_cast
Tested:
* by build solution on Windows, Linux locally,
* run tests
* build CI system successful
in 64-bit.
Currently, a signed off_t type is being used for the following
interfaces for both offset and the length in bytes:
* `Allocate`
* `RangeSync`
On Linux `off_t` is automatically either 32 or 64-bit depending on
the platform. On Windows it is always a 32-bit signed long which
limits file access and in particular space pre-allocation
to effectively 2 Gb.
Proposal is to replace off_t with uint64_t as a portable type
always access files with 64-bit interfaces.
May need to modify posix code but lack resources to test it.
Summary: Run "make format" for some recent commits.
Test Plan: Build and run tests
Reviewers: IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D49707
introduce a new DBOption random_access_max_buffer_size to limit
the size of the random access buffer used for unbuffered access.
Implement read ahead buffering when enabled.
To that effect propagate compaction_readahead_size and the new option
to the env options to make it available for the implementation.
Add Hint() override so SetupForCompaction() call would call Hint()
readahead can now be setup from both Hint() and EnableReadAhead()
Add new option random_access_max_buffer_size support
db_bench, options_helper to make it string parsable
and the unit test.
Summary:
Give a name for every kill point, and allow users to disable some kill points based on prefixes. The kill points can be passed by db_stress through a command line paramter. This provides a way for users to boost the chance of triggering low frequency kill points
This allow follow up changes in crash test scripts to improve crash test coverage.
Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
--kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync
Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D48735
Summary:
This mirrors https://reviews.facebook.net/D45645
Currently, mmap returns IOError when user tries to read
data past the end of the file. This diff changes the behavior.
Now, we return just the bytes that we can, and report the size
we returned via a Slice result. This is consistent with non-mmap
behavior and also pread() system call.
Summary. A change https://reviews.facebook.net/differential/diff/224721/
Has attempted to move common functionality out of platform dependent
code to a new facility called file_reader_writer.
This includes:
- perf counters
- Buffering
- RateLimiting
However, the change did not attempt to refactor Windows code.
To mitigate, we introduce new quering interfaces such as UseOSBuffer(),
GetRequiredBufferAlignment() and ReaderWriterForward()
for pure forwarding where required.
Introduce WritableFile got a new method Truncate(). This is to communicate
to the file as to how much data it has on close.
- When space is pre-allocated on Linux it is filled with zeros implicitly,
no such thing exist on Windows so we must truncate file on close.
- When operating in unbuffered mode the last page is filled with zeros but we still want to truncate.
Previously, Close() would take care of it but now buffer management is shifted to the wrappers and the file has
no idea about the file true size.
This means that Close() on the wrapper level must always include
Truncate() as well as wrapper __dtor should call Close() and
against double Close().
Move buffered/unbuffered write logic to the wrapper.
Utilize Aligned buffer class.
Adjust tests and implement Truncate() where necessary.
Come up with reasonable defaults for new virtual interfaces.
Forward calls for RandomAccessReadAhead class to avoid double
buffering and locking (double locking in unbuffered mode on WIndows).
Summary: We should never set max_open_files to be bigger than the system's ulimit. Otherwise we will get "Too many open files" errors. See an example in this Travis run: https://travis-ci.org/facebook/rocksdb/jobs/79591566
Test Plan:
make check
I will also verify that max_max_open_files is reasonable.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46551
Previous change for the function
555ca3e7b7 (diff-bdc04e0404c2db4fd3ac5118a63eaa4a)
made use of the QueryPerformanceCounter to return microseconds values that do not repeat
as std::chrono::system_clock returned values that made auto_roll_logger_test fail.
The interface documentation does not state that we need to return
system time describing the return value as a number of microsecs since some
moment in time. However, because on Linux it is implemented using gettimeofday
various pieces of code (such as GenericRateLimiter) took advantage of that
and make use of NowMicros() as a system timestamp. Thus the previous change
broke rate_limiter_test on Windows.
In addition, the interface name NowMicros() suggests that it is actually
a timestamp so people use it as such.
This change makes use of the new system call on Windows that returns
system time with required precision. This change preserves the fix
for auto_roll_logger_test and fixes rate_limiter_test.
Note that DBTest.RateLimitingTest still fails due to a separately reported issue.