Commit Graph

1492 Commits

Author SHA1 Message Date
Mike Kolupaev
e85f2c64cb Prevent empty memtables from using a lot of memory
Summary:
This fixes OOMs that we (logdevice) are currently having in production.

SkipListRep constructor does a couple small allocations from ConcurrentArena (see InlineSkipList constructor). ConcurrentArena would sometimes allocate an entire block for that, which is a few megabytes (we use Options::arena_block_size = 4 MB). So an empty memtable can take take 4 MB of memory. We have ~40k column families (spread across 15 DB instances), so 4 MB per empty memtable easily OOMs a machine for us.

This PR makes ConcurrentArena always allocate from Arena's inline block when possible. So as long as InlineSkipList's initial allocations are below 2 KB there would be no blocks allocated for empty memtables.
Closes https://github.com/facebook/rocksdb/pull/2569

Differential Revision: D5404029

Pulled By: al13n321

fbshipit-source-id: 568ec22a3fd1a485c06123f6b2dfc5e9ef67cd23
2017-07-28 15:58:43 -07:00
Siying Dong
e7697b8ce8 Fix LITE unit tests
Summary: Closes https://github.com/facebook/rocksdb/pull/2649

Differential Revision: D5505778

Pulled By: siying

fbshipit-source-id: 7e935603ede3d958ea087ed6b8cfc4121e8797bc
2017-07-26 21:11:47 -07:00
Siying Dong
c281b44829 Revert "CRC32 Power Optimization Changes"
Summary:
This reverts commit 2289d38115.
Closes https://github.com/facebook/rocksdb/pull/2652

Differential Revision: D5506163

Pulled By: siying

fbshipit-source-id: 105e31dd9d99090453a6b9f32c165206cd3affa3
2017-07-26 19:31:36 -07:00
Kamalalochana Subbaiah
2289d38115 CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2353

Differential Revision: D5210948

Pulled By: siying

fbshipit-source-id: 859a8c063d37697addd89ba2b8a14e5efd5d24bf
2017-07-26 09:42:29 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Islam AbdelRahman
3e5ea29a83 Fix Flaky DeleteSchedulerTest::ImmediateDeleteOn25PercDBSize
Summary:
In this test we are deleting 100 files, and we are expecting DeleteScheduler to delete 26 files in the background and 74 files immediately in the foreground

The main purpose of the test is to make sure that we delete files in foreground thread, which is verified in line 546

But sometimes we may end up with 26 files or 25 files in the trash directory because the background thread may be slow and not be able to delete the first file fast enough, so sometimes this test fail.

Remove
```
ASSERT_EQ(CountFilesInDir(trash_dir_), 25);
```
Since it does not have any benefit any way
Closes https://github.com/facebook/rocksdb/pull/2618

Differential Revision: D5458674

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5556a9edfa049db71dce80b8e6ae0fdd25e1e74e
2017-07-20 11:29:01 -07:00
Yedidya Feldblum
f1a056e005 CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:26:02 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Giuseppe Ottaviano
8f927e5f75 Fix undefined behavior in Hash
Summary:
Instead of ignoring UBSan checks, fix the negative shifts in
Hash(). Also add test to make sure the hash values are stable over
time. The values were computed before this change, so the test also
verifies the correctness of the change.
Closes https://github.com/facebook/rocksdb/pull/2546

Differential Revision: D5386369

Pulled By: yiwu-arbug

fbshipit-source-id: 6de4b44461a544d6222cc5d72d8cda2c0373d17e
2017-07-10 12:29:24 -07:00
Maysam Yabandeh
b5fb85ec51 fix valgrind init complaint
Summary: Closes https://github.com/facebook/rocksdb/pull/2549

Differential Revision: D5386307

Pulled By: maysamyabandeh

fbshipit-source-id: 3032c95c54755053b6450765ec4dacbecb734f9d
2017-07-07 18:27:08 -07:00
Maysam Yabandeh
45b9bb0331 Cut filter partition based on metadata_block_size
Summary:
Currently metadata_block_size controls only index partition size. With this patch a partition is cut after any of index or filter partitions reaches metadata_block_size.
Closes https://github.com/facebook/rocksdb/pull/2452

Differential Revision: D5275651

Pulled By: maysamyabandeh

fbshipit-source-id: 5057e4424b4c8902043782e6bf8c38f0c4f25160
2017-07-02 10:42:12 -07:00
Siying Dong
afbef65187 Bug fix: Fast CRC Support printing is not honest
Summary:
11c5d4741a introduces a bug that IsFastCrc32Supported() returns wrong result. Fix it. Also fix some FB internal scripts.
Closes https://github.com/facebook/rocksdb/pull/2513

Differential Revision: D5343802

Pulled By: yiwu-arbug

fbshipit-source-id: 057dc7ae3b262fe951413d1190ce60afc788cc05
2017-06-28 21:41:42 -07:00
Mike Kolupaev
397ab11152 Improve Status message for block checksum mismatches
Summary:
We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages.

It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
Closes https://github.com/facebook/rocksdb/pull/2507

Differential Revision: D5345702

Pulled By: al13n321

fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
2017-06-28 21:27:01 -07:00
Aaron Gao
0025a36409 revert perf_context and io_stats to __thread
Summary:
https://github.com/facebook/rocksdb/pull/2380 introduces a regression by replacing __thread with ThreadLocalPtr. Revert the thread local implementation back.
Closes https://github.com/facebook/rocksdb/pull/2485

Differential Revision: D5308050

Pulled By: lightmark

fbshipit-source-id: 2676e9c22edf76e8133d3f4c50e2711e11a95480
2017-06-26 15:27:17 -07:00
Maysam Yabandeh
499ebb3ab5 Optimize for serial commits in 2PC
Summary:
Throughput: 46k tps in our sysbench settings (filling the details later)

The idea is to have the simplest change that gives us a reasonable boost
in 2PC throughput.

Major design changes:
1. The WAL file internal buffer is not flushed after each write. Instead
it is flushed before critical operations (WAL copy via fs) or when
FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
via mutex_.
2. Use two sequence numbers: last seq, and last seq for write. Last seq
is the last visible sequence number for reads. Last seq for write is the
next sequence number that should be used to write to WAL/memtable. This
allows to have a memtable write be in parallel to WAL writes.
3. BatchGroup is not used for writes. This means that we can have
parallel writers which changes a major assumption in the code base. To
accommodate for that i) allow only 1 WriteImpl that intends to write to
memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes
come via group commit phase which is serial anyway, ii) make all the
parts in the code base that assumed to be the only writer (via
EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are
protected via a stat_mutex_.

Note: the first commit has the approach figured out but is not clean.
Submitting the PR anyway to get the early feedback on the approach. If
we are ok with the approach I will go ahead with this updates:
0) Rebase with Yi's pipelining changes
1) Currently batching is disabled by default to make sure that it will be
consistent with all unit tests. Will make this optional via a config.
2) A couple of unit tests are disabled. They need to be updated with the
serial commit of 2PC taken into account.
3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
releasing mutex_ beforehand (the same way EnterUnbatched does). This
needs to be cleaned up.
Closes https://github.com/facebook/rocksdb/pull/2345

Differential Revision: D5210732

Pulled By: maysamyabandeh

fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
2017-06-24 14:11:29 -07:00
Dmitri Smirnov
a21db161c9 Implement ReopenWritibaleFile on Windows and other fixes
Summary:
Make default impl return NoSupported so the db_blob
  tests exist in a meaningful manner.
  Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465

Differential Revision: D5275563

Pulled By: yiwu-arbug

fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
2017-06-20 10:31:13 -07:00
Sagar Vemuri
53dda8797d Do not run RateLimiterTest.Rate test on Travis+Mac OSX.
Summary:
RateLimiterTest.Rate test has been failing continuously since many days on travis in Mac OSX PLATFORM_DEPENDENT test suite.
Check https://travis-ci.org/facebook/rocksdb/pull_requests.

Disabling this test for now, so that we can investigate more in depth.
Closes https://github.com/facebook/rocksdb/pull/2451

Differential Revision: D5250147

Pulled By: sagar0

fbshipit-source-id: d58476a3c2792d20e875754d1516c4bc7174e86c
2017-06-14 14:58:02 -07:00
hyunwoo
6b5a5dc5d8 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2430

Differential Revision: D5242471

Pulled By: IslamAbdelRahman

fbshipit-source-id: 832eb3a4c70221444ccd2ae63217823fec56c748
2017-06-13 16:58:01 -07:00
haoxiang
0f228be3bb fixed typo in util/dynamic_bloom.h
Summary:
fixed a typo in util/dynamic_bloom.h
Closes https://github.com/facebook/rocksdb/pull/2442

Differential Revision: D5242397

Pulled By: IslamAbdelRahman

fbshipit-source-id: c47fd18cc79afff6b022201a0410c0cd47626576
2017-06-13 16:41:36 -07:00
Andrew Kryczka
c217e0b9c7 Call RateLimiter for compaction reads
Summary:
Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed.
Closes https://github.com/facebook/rocksdb/pull/2433

Differential Revision: D5216946

Pulled By: ajkr

fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e
2017-06-13 14:56:46 -07:00
Siying Dong
0175d58c3c Make direct I/O write use incremental buffer
Summary:
Currently for direct I/O, the large maximum buffer is always allocated. This will be wasteful if users flush the data in much smaller chunks. This diff fix this by changing the behavior of incremental buffer works. When we enlarge buffer, we try to copy the existing data in the buffer to the enlarged buffer, rather than flush the buffer first. This can make sure that no extra I/O is introduced because of buffer enlargement.
Closes https://github.com/facebook/rocksdb/pull/2403

Differential Revision: D5178403

Pulled By: siying

fbshipit-source-id: a8fe1e7304bdb8cab2973340022fe80ff83449fd
2017-06-13 04:41:37 -07:00
Islam AbdelRahman
d713471da8 Limit trash directory to be 25% of total DB
Summary:
Update DeleteScheduler to delete files immediately if trash directory is >= 25% of DB size
Closes https://github.com/facebook/rocksdb/pull/2436

Differential Revision: D5230384

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5cbda8ac536a3cc72c774641621edc02c8202482
2017-06-12 16:57:21 -07:00
Siying Dong
db818d2d1a Fix RocksDB Lite build with CLANG
Summary: Closes https://github.com/facebook/rocksdb/pull/2419

Differential Revision: D5193976

Pulled By: siying

fbshipit-source-id: 62d115edee6043237e9d6ad3c2a05481e162c9eb
2017-06-12 06:41:27 -07:00
Siying Dong
52a7f38b19 WriteOptions.low_pri which can throttle low pri writes if needed
Summary:
If ReadOptions.low_pri=true and compaction is behind, the write will either return immediate or be slowed down based on ReadOptions.no_slowdown.
Closes https://github.com/facebook/rocksdb/pull/2369

Differential Revision: D5127619

Pulled By: siying

fbshipit-source-id: d30e1cff515890af0eff32dfb869d2e4c9545eb0
2017-06-05 15:02:35 -07:00
hyunwoo
c7662a44a4 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2376

Differential Revision: D5183630

Pulled By: ajkr

fbshipit-source-id: 133cfd0445959e70aa2cd1a12151bf3c0c5c3ac5
2017-06-05 11:27:34 -07:00
Aaron Gao
7f6c02dda1 using ThreadLocalPtr to hide ROCKSDB_SUPPORT_THREAD_LOCAL from public…
Summary:
… headers

https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB.

We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term.

make check -j64
Closes https://github.com/facebook/rocksdb/pull/2380

Differential Revision: D5177896

Pulled By: lightmark

fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390
2017-06-02 17:26:19 -07:00
Siying Dong
95b0e89b5d Improve write buffer manager (and allow the size to be tracked in block cache)
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350

Differential Revision: D5110648

Pulled By: siying

fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
2017-06-02 14:26:56 -07:00
Tamir Duberstein
103d0692ea Avoid unsupported attributes when not building with UBSAN
Summary:
yiwu-arbug see individual commits.
Closes https://github.com/facebook/rocksdb/pull/2318

Differential Revision: D5141520

Pulled By: yiwu-arbug

fbshipit-source-id: 7987c92ab4461eef36afce5a133d3a0ee0c96300
2017-05-30 11:13:01 -07:00
Siying Dong
41cbb72749 options.delayed_write_rate use the rate of rate_limiter by default.
Summary:
It's hard for RocksDB to come up with a good default of delayed write rate. Use rate given by rate limiter if it is availalbe. This provides the I/O order of magnitude.
Closes https://github.com/facebook/rocksdb/pull/2357

Differential Revision: D5115324

Pulled By: siying

fbshipit-source-id: 341065ad2211c981fc804011c0f0e59a50c7e754
2017-05-24 09:58:24 -07:00
Andrew Kryczka
6cc9aef162 New API for background work in single thread pool
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.

Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204

Differential Revision: D4936256

Pulled By: ajkr

fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
2017-05-23 11:12:27 -07:00
Andrew Kryczka
ac39d6bec5 Core-local statistics
Summary:
This diff changes `StatisticsImpl` from a thread-local approach to a core-local one. The goal is to perform faster aggregations, particularly for applications that have many threads. There should be no behavior change.
Closes https://github.com/facebook/rocksdb/pull/2258

Differential Revision: D5016258

Pulled By: ajkr

fbshipit-source-id: 7d4d165b4a91d8110f0409d113d1be91f22d31a9
2017-05-23 10:42:59 -07:00
Aaron Gao
3e86c0f07c disable direct reads for log and manifest and add direct io to tests
Summary:
Disable direct reads for log and manifest. Direct reads should not affect sequential_file
Also add kDirectIO for option_config_ in db_test_util
Closes https://github.com/facebook/rocksdb/pull/2337

Differential Revision: D5100261

Pulled By: lightmark

fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b
2017-05-22 18:41:28 -07:00
Dmitri Smirnov
15ba4d6c4b Address MS Visual Studio 2017 issue with autovector
Summary:
This addresses https://github.com/facebook/rocksdb/issues/2262
Closes https://github.com/facebook/rocksdb/pull/2333

Differential Revision: D5097941

Pulled By: siying

fbshipit-source-id: fb33582bfe7883ecc3f6da028703982522b5f75f
2017-05-22 10:57:06 -07:00
Aaron Gao
a36220ccfb fix unity test
Summary:
unity test will fail even if we have the same function names in different anonymous namespaces in different files.
Closes https://github.com/facebook/rocksdb/pull/2321

Differential Revision: D5083783

Pulled By: lightmark

fbshipit-source-id: 1347aaf866900af30d23cdd4f29c1b96f17352af
2017-05-17 18:56:55 -07:00
Nikhil Benesch
11c5d4741a cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.

See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.

I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199

Differential Revision: D5054042

Pulled By: yiwu-arbug

fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 16:15:38 -07:00
Andrew Kryczka
bbe9ee7dd4 core-local array type conversions
Summary:
try to clean up the type conversions and hope it passes on windows.

one interesting thing I learned is that bitshift operations are special: in `x << y`, the result type depends only on the type of `x`, unlike most arithmetic operations where the result type depends on both operands' types.
Closes https://github.com/facebook/rocksdb/pull/2277

Differential Revision: D5050145

Pulled By: ajkr

fbshipit-source-id: f3309e77526ac9612c632bf93a62d99757af9a29
2017-05-12 09:28:07 -07:00
Andrew Kryczka
cda5fde2d9 CoreLocalArray class
Summary:
Moved the logic for core-local array out of ConcurrentArena and into a separate class because I want to reuse it for core-local stats.
Closes https://github.com/facebook/rocksdb/pull/2256

Differential Revision: D5011518

Pulled By: ajkr

fbshipit-source-id: a75a7b8f7b7a42fd6273489ada405f14c6be196a
2017-05-10 18:25:36 -07:00
Anirban Rahut
d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
Tamir Duberstein
fdaefa0309 travis: add Windows cross-compilation
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms

With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.

cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107

Differential Revision: D4849784

Pulled By: siying

fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
2017-05-05 23:20:01 -07:00
Aaron Gao
a30a696034 do not read next datablock if upperbound is reached
Summary:
Now if we have iterate_upper_bound set, we continue read until get a key >= upper_bound. For a lot of cases that neighboring data blocks have a user key gap between them, our index key will be a user key in the middle to get a shorter size. For example, if we have blocks:
[a b c d][f g h]
Then the index key for the first block will be 'e'.
then if upper bound is any key between 'd' and 'e', for example, d1, d2, ..., d99999999999, we don't have to read the second block and also know that we have done our iteration by reaching the last key that smaller the upper bound already.

This diff can reduce RA in most cases.
Closes https://github.com/facebook/rocksdb/pull/2239

Differential Revision: D4990693

Pulled By: lightmark

fbshipit-source-id: ab30ea2e3c6edf3fddd5efed3c34fcf7739827ff
2017-05-05 23:20:01 -07:00
Aaron Gao
2d42cf5ea9 Roundup read bytes in ReadaheadRandomAccessFile
Summary:
Fix alignment in ReadaheadRandomAccessFile
Closes https://github.com/facebook/rocksdb/pull/2253

Differential Revision: D5012336

Pulled By: lightmark

fbshipit-source-id: 10d2c829520cb787227ef653ef63d5d701725778
2017-05-05 12:14:14 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Dmitri Smirnov
cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Aaron Gao
2150cc1f3e fix WritableFile buffer size in direct IO
Summary:
�fix the buffer size in case of ppl use buffer size as their block_size.
Closes https://github.com/facebook/rocksdb/pull/2198

Differential Revision: D4956878

Pulled By: lightmark

fbshipit-source-id: 8bb0dc9c133887aadcd625d5261a3d1110b71473
2017-04-26 15:57:02 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Yi Wu
0fcdccc33e Blob storage helper methods
Summary:
Split out interfaces needed for blob storage from #1560, including
* CompactionEventListener and OnFlushBegin listener interfaces.
* Blob filename support.
Closes https://github.com/facebook/rocksdb/pull/2169

Differential Revision: D4905463

Pulled By: yiwu-arbug

fbshipit-source-id: 564e73448f1b7a367e5e46216a521e57ea9011b5
2017-04-18 12:42:38 -07:00
Aaron Gao
6e8d6f429d readahead backwards from sst end
Summary:
prefetch some data from the end of the file for each compaction to reduce IO.
Closes https://github.com/facebook/rocksdb/pull/2149

Differential Revision: D4880576

Pulled By: lightmark

fbshipit-source-id: aa767cd1afc84c541837fbf1ad6c0d45b34d3932
2017-04-14 18:56:14 -07:00
Aaron Gao
44fa8ece9b change use_direct_writes to use_direct_io_for_flush_and_compaction
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117

Differential Revision: D4860912

Pulled By: lightmark

fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
2017-04-13 16:12:04 -07:00
Aaron Gao
10d7546961 set readahead buffer size from roundup(user_size) + 4k to roundup(use…
Summary:
Users usually set readahead buffer to a multiple of 4k, more than that, usually a multiple of blocks.
So previously we set real buffer size 512 * n + 4k, which may introduce an additional block reading.
Closes https://github.com/facebook/rocksdb/pull/2138

Differential Revision: D4871504

Pulled By: lightmark

fbshipit-source-id: b070faa51d92e976e8e8468c00692699e585e243
2017-04-11 17:13:33 -07:00