Commit Graph

2727 Commits

Author SHA1 Message Date
Dmitri Smirnov
cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Siying Dong
e15382c09c Disable two flaky tests
Summary: Closes https://github.com/facebook/rocksdb/pull/2217

Differential Revision: D4959351

Pulled By: siying

fbshipit-source-id: ce7c3a430bae0d15e06b3d5c958ebce969d08564
2017-04-26 17:13:46 -07:00
Andrew Kryczka
efc361ef7d Add user stats Reset API
Summary:
It resets all the ticker and histogram stats to zero. Needed to change the locking a bit since Reset() is the only operation that manipulates multiple tickers/histograms together, and that operation should be seen as atomic by other operations that access tickers/histograms.
Closes https://github.com/facebook/rocksdb/pull/2213

Differential Revision: D4952232

Pulled By: ajkr

fbshipit-source-id: c0475c3e4c7b940120d53891b69c3091149a0679
2017-04-26 15:57:01 -07:00
Andrew Kryczka
f6a27d0bce Extract statistics tests into separate file
Summary:
I'm going to add more DB tests for statistics as currently we have very few. I started a file dedicated to this purpose and moved the existing stats-specific tests there.
Closes https://github.com/facebook/rocksdb/pull/2211

Differential Revision: D4951558

Pulled By: ajkr

fbshipit-source-id: 05d11c35079c40ecabdfd2cf5556ccb761f694a4
2017-04-26 14:47:23 -07:00
Aaron Gao
7eddecce12 support bulk loading with universal compaction
Summary:
Support buck load with universal compaction.
More test cases to be added.
Closes https://github.com/facebook/rocksdb/pull/2202

Differential Revision: D4935360

Pulled By: lightmark

fbshipit-source-id: cc3ca1b6f42faa503207dab1408d6bcf393ee5b5
2017-04-26 13:41:32 -07:00
Aaron Gao
72c21fb3f2 call GetRootDB() before cast to DBImpl* in CancelAllBackgroundWork
Summary:
User could call this with wrapper class of DB or DBImpl
Closes https://github.com/facebook/rocksdb/pull/2200

Differential Revision: D4935530

Pulled By: lightmark

fbshipit-source-id: df9cb61d67d0f3bbcf62f714d77523a459a92883
2017-04-24 13:47:17 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Aaron Gao
cb885bccfe set compaction_iterator earliest_snapshot to max if no snapshot
Summary:
It is a potential bug that will be triggered if we ingest files before inserting the first key into an empty db.
0 is a special value reserved to indicate the concept of non-existence. But not good for seqno in this case because 0 is a valid seqno for ingestion(bulk loading)
Closes https://github.com/facebook/rocksdb/pull/2183

Differential Revision: D4919827

Pulled By: lightmark

fbshipit-source-id: 237eea40f88bd6487b66806109d90065dc02c362
2017-04-20 19:56:59 -07:00
Andrew Kryczka
1dd7760513 Change L0 compaction score using level size
Summary:
The goal is to avoid the problem of small number of L0 files triggering compaction to base level (which increased write-amp), while still allowing L0 compaction-by-size (so intra-L0 compactions cause score to increase).
Closes https://github.com/facebook/rocksdb/pull/2172

Differential Revision: D4908552

Pulled By: ajkr

fbshipit-source-id: 4b170142b2b368e24bd7948b2a6f24c69fabf73d
2017-04-19 12:00:01 -07:00
Yi Wu
966ebb02f5 Hide event listeners from lite build
Summary:
Fixing lite build failure introduce by #2169.
Closes https://github.com/facebook/rocksdb/pull/2174

Reviewed By: sagar0

Differential Revision: D4910619

Pulled By: yiwu-arbug

fbshipit-source-id: 5213b7b7431cc258688793c8c28153025588d8d9
2017-04-18 17:26:19 -07:00
Siying Dong
c49d704656 Add DB:ResetStats()
Summary:
Add a function to allow users to reset internal stats without restarting the DB.
Closes https://github.com/facebook/rocksdb/pull/2167

Differential Revision: D4907939

Pulled By: siying

fbshipit-source-id: ab2dd85b88aabe9380da7485320a1d460d3e1f68
2017-04-18 16:56:48 -07:00
Yi Wu
0fcdccc33e Blob storage helper methods
Summary:
Split out interfaces needed for blob storage from #1560, including
* CompactionEventListener and OnFlushBegin listener interfaces.
* Blob filename support.
Closes https://github.com/facebook/rocksdb/pull/2169

Differential Revision: D4905463

Pulled By: yiwu-arbug

fbshipit-source-id: 564e73448f1b7a367e5e46216a521e57ea9011b5
2017-04-18 12:42:38 -07:00
Yi Wu
e9e6e53247 Simplify write thread logic
Summary:
The concept about early exit in write thread implementation is a confusing one. It means that if early exit is allowed, batch group leader will not responsible to exit the batch group, but the last finished writer do. In case we need to mark log synced, or encounter memtable insert error, early exit is disallowed.

This patch remove such a concept by:
* In all cases, the last finished writer (not necessary leader) is responsible to exit batch group.
* In case of parallel memtable write, leader will also mark log synced after memtable insert and before signal finish (call `CompleteParallelWorker()`). The purpose is to allow mark log synced (which require locking mutex) can run in parallel to memtable insert in other writers.
* The last finish writer should handle memtable insert error (update bg_error_) before exiting batch group.
Closes https://github.com/facebook/rocksdb/pull/2134

Differential Revision: D4869667

Pulled By: yiwu-arbug

fbshipit-source-id: aec170847c85b90f4179d6a4608a4fe1361544e3
2017-04-13 16:12:04 -07:00
Aaron Gao
44fa8ece9b change use_direct_writes to use_direct_io_for_flush_and_compaction
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117

Differential Revision: D4860912

Pulled By: lightmark

fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
2017-04-13 16:12:04 -07:00
Igor Canadi
b6b9359ece Fix BYTES_WRITTEN accounting
Summary:
BYTES_WRITTEN accounting doesn't work with disabled WAL. For example, this is what we
get in the LOG:

```
Cumulative writes: 9794K writes, 228M keys, 9794K commit groups, 1.0
writes per commit group, ingest: 0.00 GB, 0.00 MB/s
```

WAL bytes are tracked in a different statistic:
https://github.com/facebook/rocksdb/blob/master/db/internal_stats.h#L105.
BYTES_WRITTEN should count all the writes.
Closes https://github.com/facebook/rocksdb/pull/2133

Differential Revision: D4880615

Pulled By: yiwu-arbug

fbshipit-source-id: 8fd0b223099f3f5ad7df79d4e737d313687fec69
2017-04-13 16:12:03 -07:00
Sagar Vemuri
6a6723ee1e Move MergeOperatorPinning tests to be with other merge operator tests
Summary:
Moved MergeOperatorPinning tests from db_test2.cc to db_merge_operator_test.cc.

[This is the same code as PR #2104 , which has already been reviewed,  but I am creating a new PR as I cannot import from #2104 onto phabricator anymore even after rebasing. I'll close and discard #2104.]
Closes https://github.com/facebook/rocksdb/pull/2125

Differential Revision: D4863312

Pulled By: sagar0

fbshipit-source-id: 0f71a7690aa09c1d03ee85ce2bc1d2d89e4f4399
2017-04-11 16:15:06 -07:00
Siying Dong
8f47a97512 File level histogram should be printed per CF, not per DB
Summary:
Currently level histogram is only printed out for DB stats and for default CF. This is confusing. Change to print for every CF instead.
Closes https://github.com/facebook/rocksdb/pull/2126

Differential Revision: D4865373

Pulled By: siying

fbshipit-source-id: 1c853e0ac66e00120ee931cabc9daf69ccc2d577
2017-04-11 08:42:03 -07:00
Manuel Ung
1f8b119ed6 Limit maximum memory used in the WriteBatch representation
Summary:
Extend TransactionOptions to include max_write_batch_size which determines the maximum size of the writebatch representation. If memory limit is exceeded, the operation will abort with subcode kMemoryLimit.
Closes https://github.com/facebook/rocksdb/pull/2124

Differential Revision: D4861842

Pulled By: lth

fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161
2017-04-10 15:42:26 -07:00
Maysam Yabandeh
20778f2f92 Adding comments to the write path
Summary:
also did minor refactoring
Closes https://github.com/facebook/rocksdb/pull/2115

Differential Revision: D4855818

Pulled By: maysamyabandeh

fbshipit-source-id: fbca6ac57e5c6677fffe8354f7291e596a50cb77
2017-04-10 12:43:34 -07:00
Sagar Vemuri
7124268a09 Reduce the number of params needed to construct DBIter
Summary:
DBIter, and in-turn NewDBIterator and NewArenaWrappedDBIterator, take a  bunch of params. They can be reduced by passing in ReadOptions directly instead of passing in every new param separately. It also seems much cleaner as a bunch of the params towards the end seem to be optional.

(Recently I introduced max_skippable_internal_keys, which added one more to the already huge count).

Idea courtesy IslamAbdelRahman
Closes https://github.com/facebook/rocksdb/pull/2116

Differential Revision: D4857128

Pulled By: sagar0

fbshipit-source-id: 7d239df094b94bd9ea79d145cdf825478ac037a8
2017-04-10 11:14:14 -07:00
Islam AbdelRahman
61730186df dummy diff
Summary: Closes https://github.com/facebook/rocksdb/pull/2114

Differential Revision: D4854860

Pulled By: IslamAbdelRahman

fbshipit-source-id: b871c5b9ccc52d20f5ceacdd172dc70b1dbf9110
2017-04-07 17:07:37 -07:00
Ayappan
dd8f9e38e9 Fix compilation for GCC-5
Summary:
Fixes this issue https://github.com/facebook/rocksdb/issues/2108
Closes https://github.com/facebook/rocksdb/pull/2109

Differential Revision: D4851965

Pulled By: yiwu-arbug

fbshipit-source-id: 6ee807b
2017-04-07 10:54:12 -07:00
Siying Dong
ff97287016 Refactor compaction picker code
Summary:
1. Move universal compaction picker to separate files compaction_picker_universal.cc and compaction_picker_universal.h.
2. Rename some functions to make the code easier to understand.
3. Move leveled compaction picking code to a dedicated class, so that we we don't need to pass some common variable around when calling functions. It also allowed us to break down LevelCompactionPicker::PickCompaction() to smaller functions.
Closes https://github.com/facebook/rocksdb/pull/2100

Differential Revision: D4845948

Pulled By: siying

fbshipit-source-id: efa0ab4
2017-04-06 20:09:34 -07:00
Sagar Vemuri
343b59d6ee Move various string utility functions into string_util
Summary:
This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.

Check the sub-commits for easier reviewing.
Closes https://github.com/facebook/rocksdb/pull/2094

Differential Revision: D4837730

Pulled By: sagar0

fbshipit-source-id: 344278a
2017-04-06 14:54:12 -07:00
Yi Wu
df6f5a3772 Move memtable related files into memtable directory
Summary:
Move memtable related files into memtable directory.
Closes https://github.com/facebook/rocksdb/pull/2087

Differential Revision: D4829242

Pulled By: yiwu-arbug

fbshipit-source-id: ca70ab6
2017-04-06 14:09:13 -07:00
Tamir Duberstein
107c5f6a60 CMake: more MinGW fixes
Summary:
siying this is a resubmission of #2081 with the 4th commit fixed. From that commit message:

> Note that the previous use of quotes in PLATFORM_{CC,CXX}FLAGS was
incorrect and caused GCC to produce the incorrect define:
>
>  #define ROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE 1
>
> This was the cause of the Linux build failure on the previous version
of this change.

I've tested this locally, and the Linux build succeeds now.
Closes https://github.com/facebook/rocksdb/pull/2097

Differential Revision: D4839964

Pulled By: siying

fbshipit-source-id: cc51322
2017-04-06 14:09:13 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Islam AbdelRahman
c50e3750dc Use a human readable size for level report
Summary:
Current
```
** Compaction Stats [default] **
Level    Files   Size(MB} Score Read(GB}  Rn(GB} Rnp1(GB} Write(GB} Wnew(GB} Moved(GB} W-Amp Rd(MB/s} Wr(MB/s} Comp(sec} Comp(cnt} Avg(sec} KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0      49.02   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     76.1         1         2    0.322       0      0
 Sum      2/0      49.02   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     76.1         1         2    0.322       0      0
 Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     76.1         1         2    0.322       0      0
```

New
```
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn Key
Closes https://github.com/facebook/rocksdb/pull/2055

Differential Revision: D4804576

Pulled By: IslamAbdelRahman

fbshipit-source-id: 719be6a
2017-04-05 17:24:19 -07:00
Siying Dong
ce64b8b719 Divide db/db_impl.cc
Summary:
db_impl.cc is too large to manage. Divide db_impl.cc into db/db_impl.cc, db/db_impl_compaction_flush.cc, db/db_impl_files.cc, db/db_impl_open.cc and db/db_impl_write.cc.
Closes https://github.com/facebook/rocksdb/pull/2095

Differential Revision: D4838188

Pulled By: siying

fbshipit-source-id: c5f3059
2017-04-05 17:24:19 -07:00
Andrew Kryczka
d659faad54 Level-based L0->L0 compaction
Summary:
Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.

- L0->L0 is triggered when base level is unavailable due to pending compactions
- L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
- Subcompactions are disabled for L0->L0 since we want to output one file.
- Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0.
Closes https://github.com/facebook/rocksdb/pull/2027

Differential Revision: D4760318

Pulled By: ajkr

fbshipit-source-id: 9d07183
2017-04-04 18:09:11 -07:00
Siying Dong
43010a929f Revert "[rocksdb][PR] CMake: more MinGW fixes"
fbshipit-source-id: 43b4529
2017-04-04 16:24:26 -07:00
Tamir Duberstein
3450ac8c1b CMake: more MinGW fixes
Summary:
See individual commits.

yuslepukhin siying
Closes https://github.com/facebook/rocksdb/pull/2081

Differential Revision: D4824639

Pulled By: IslamAbdelRahman

fbshipit-source-id: 2fc2b00
2017-04-04 15:09:17 -07:00
Aaron Gao
90cfd46458 update IterKey that can get user key and internal key explicitly
Summary:
to void future bug that caused by the mix of userkey/internalkey
Closes https://github.com/facebook/rocksdb/pull/2084

Differential Revision: D4825889

Pulled By: lightmark

fbshipit-source-id: 28411db
2017-04-04 14:24:20 -07:00
Yi Wu
9e44531803 Refactor WriteImpl (pipeline write part 1)
Summary:
Refactor WriteImpl() so when I plug-in the pipeline write code (which is
an alternative approach for WriteThread), some of the logic can be
reuse. I split out the following methods from WriteImpl():

* PreprocessWrite()
* HandleWALFull() (previous MaybeFlushColumnFamilies())
* HandleWriteBufferFull()
* WriteToWAL()

Also adding a constructor to WriteThread::Writer, and move WriteContext into db_impl.h.
No real logic change in this patch.
Closes https://github.com/facebook/rocksdb/pull/2042

Differential Revision: D4781014

Pulled By: yiwu-arbug

fbshipit-source-id: d45ca18
2017-04-04 10:24:32 -07:00
Siying Dong
6ef8c620d3 Move auto_roll_logger and filename out of db/
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080

Differential Revision: D4821141

Pulled By: siying

fbshipit-source-id: ca7d768
2017-04-03 18:39:14 -07:00
Siying Dong
88cc81df5c auto_roll_logger_test to move away from real sleep
Summary:
auto_roll_logger_test relies on timing conditon that some operations finish within 1 seconds. This caused flaky tests. Move away from real timing and sleep and use fake time to verify the time-based rolling.
Closes https://github.com/facebook/rocksdb/pull/2066

Differential Revision: D4810647

Pulled By: siying

fbshipit-source-id: c54d994
2017-04-03 11:39:09 -07:00
Nikhil Benesch
d25e28d584 replace sometimes-undefined uint type with unsigned int
Summary:
`uint` is nonstandard and not a built-in type on all compilers; replace it
with the always-valid `unsigned int`. I assume this went unnoticed because
it's inside an `#ifdef ROCKDB_JEMALLOC`.
Closes https://github.com/facebook/rocksdb/pull/2075

Differential Revision: D4820427

Pulled By: ajkr

fbshipit-source-id: 0876561
2017-04-03 11:39:09 -07:00
Andrew Kryczka
a1d7e487b3 Add L0 write-amp to compaction level stats
Summary:
Previously it always showed 0.0 for L0 write-amp because we were dividing by bytes read from non-output level. For L0, we should instead divide by bytes ingested to the DB. Note the numerator (bytes written to L0) includes flush bytes.
Closes https://github.com/facebook/rocksdb/pull/2078

Differential Revision: D4816902

Pulled By: ajkr

fbshipit-source-id: 7dca31a
2017-04-03 11:24:10 -07:00
Orgad Shaneh
6401a8b76b Fix build with MinGW
Summary:
There still are many warnings (most of them about invalid printf format
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052

Differential Revision: D4807355

Pulled By: siying

fbshipit-source-id: ef03786
2017-03-30 16:54:52 -07:00
Sagar Vemuri
c6d04f2ecf Option to fail a request as incomplete when skipping too many internal keys
Summary:
Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to be skipped. Adding an option, max_skippable_internal_keys -- which could be used to set a threshold for the maximum number of keys that can be skipped, will help to address these cases where it is much better to fail a request (as incomplete) than to wait for a considerable time for the request to complete.

This feature -- to fail an iterator seek request as incomplete, is disabled by default when max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.

This feature is based on the discussion mentioned in the PR https://github.com/facebook/rocksdb/pull/1084.
Closes https://github.com/facebook/rocksdb/pull/2000

Differential Revision: D4753223

Pulled By: sagar0

fbshipit-source-id: 1c973f7
2017-03-30 12:09:21 -07:00
Siying Dong
67d7623794 Expose the stalling information through DB::GetProperty()
Summary:
Add two DB properties: rocksdb.actual_delayed_write_rate and rocksdb.is_write_stooped, for people to know whether current writes are being throttled.
Closes https://github.com/facebook/rocksdb/pull/2043

Differential Revision: D4782975

Pulled By: siying

fbshipit-source-id: 6b2f5cf
2017-03-29 11:54:20 -07:00
Maysam Yabandeh
e7731d119a Configure index partition size
Summary:
Allow the users to specify the target index partition size.

With this patch an index partition is cut before its estimated in-memory size goes above the configured value for metadata_block_size. The filter partitions are still cut right after an index partition is cut.
Closes https://github.com/facebook/rocksdb/pull/2041

Differential Revision: D4780216

Pulled By: maysamyabandeh

fbshipit-source-id: 95a0831
2017-03-28 12:09:12 -07:00
Siying Dong
91b5feb37b Fix Windows Build broken by a recent commit
Summary: Closes https://github.com/facebook/rocksdb/pull/2032

Differential Revision: D4766260

Pulled By: siying

fbshipit-source-id: 415daa4
2017-03-23 18:09:57 -07:00
Warren Falk
41ccae6d26 Add C API functions (and tests) for WriteBatchWithIndex
Summary:
I've added functions to the C API to support WriteBatchWithIndex as requested in #1833.

I've also added unit tests to c_test

I've implemented the WriteBatchWithIndex variation of every function available for regular WriteBatch.  And added additional functions unique to WriteBatchWithIndex.

For now, the following is omitted:
  1. The ability to create WriteBatchWithIndex's custom batch-only iterator as I'm not sure what its purpose is.  It should be possible to add later if anyone wants it.
  2. The ability to create the batch with a fallback comparator, since it appears to be unnecessary.  I believe the column family comparator will be used for this, meaning those using a custom comparator can just use the column family variations.
Closes https://github.com/facebook/rocksdb/pull/1985

Differential Revision: D4760039

Pulled By: siying

fbshipit-source-id: 393227e
2017-03-23 15:54:13 -07:00
Daniel Black
f4fce4751e Fix clang compile error - [-Werror,-Wunused-lambda-capture]
Summary:
Errors where:

db/version_set.cc:1535:20: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
                  [this](const Fsize& f1, const Fsize& f2) -> bool {
                   ^
db/version_set.cc:1541:20: error: lambda capture 'this' is not used [-Werror,-Wunused-lambda-capture]
                  [this](const Fsize& f1, const Fsize& f2) -> bool {
                   ^
db/db_test.cc:2983:27: error: lambda capture 'kNumPutsBeforeWaitForFlush' is not required to be captured for this use [-Werror,-Wunused-lambda-capture]
  auto gen_l0_kb = [this, kNumPutsBeforeWaitForFlush](int size) {
                          ^
Closes https://github.com/facebook/rocksdb/pull/1972

Differential Revision: D4685991

Pulled By: siying

fbshipit-source-id: 9125379
2017-03-22 18:09:10 -07:00
Siying Dong
15950fe3a0 Remove ASSERT_EQ(boolean, ...)
Summary: Closes https://github.com/facebook/rocksdb/pull/2024

Differential Revision: D4755420

Pulled By: siying

fbshipit-source-id: 7332ab1
2017-03-22 15:54:12 -07:00
Aaron Gao
3e56c7e0c4 make total_log_size_ atomic
Summary:
make total_log_size_ atomic to avoid overflow caused by data race.
Closes https://github.com/facebook/rocksdb/pull/2019

Differential Revision: D4751391

Pulled By: siying

fbshipit-source-id: fac01dd
2017-03-22 11:54:40 -07:00
Dmitri Smirnov
be723a8d8c Optionally construct Post Processing Info map in MemTableInserter
Summary:
MemTableInserter default constructs Post processing info
  std::map. However, on Windows with 2015 STL the default
  constructed map still dynamically allocates one node
  which shows up on a profiler and we loose ~40% throughput
  on fillrandom benchmark.
  Solution: declare a map as std::aligned storage and optionally
  construct.

This addresses https://github.com/facebook/rocksdb/issues/1976

Before:
-------------------------------------------------------------------
  Initializing RocksDB Options from command-line flags
  DB path: [k:\data\BulkLoadRandom_10M_fillonly]
  fillrandom   :       2.775 micros/op 360334 ops/sec;  280.4 MB/s
  Microseconds per write:
  Count: 10000000 Average: 2.7749  StdDev: 39.92
  Min: 1  Median: 2.0826  Max: 26051
  Percentiles: P50: 2.08 P75: 2.55 P99: 3.55 P99.9: 9.58 P99.99: 51.5**6
  ------------------------------------------------------

  After:

  Initializing RocksDB Options from command-line flags
  DB path: [k:\data\BulkLoadRandom_10M_fillon
Closes https://github.com/facebook/rocksdb/pull/2011

Differential Revision: D4740823

Pulled By: siying

fbshipit-source-id: 1daaa2c
2017-03-22 11:24:12 -07:00
Maysam Yabandeh
8b0097b49b Readers for partition filter
Summary:
This is the last split of this pull request: https://github.com/facebook/rocksdb/pull/1891 which includes the reader part as well as the tests.
Closes https://github.com/facebook/rocksdb/pull/1961

Differential Revision: D4672216

Pulled By: maysamyabandeh

fbshipit-source-id: 6a2b829
2017-03-22 09:24:15 -07:00
Siying Dong
8f5bf04468 Flush triggered by DB write buffer size picks the oldest unflushed CF
Summary:
Previously, when DB write buffer size triggers, we always pick the CF with most data in its memtable to flush. This approach can minimize total flush happens. Change the behavior to always pick the oldest unflushed CF, which makes it the same behavior when max_total_wal_size hits. This approach will minimize size used by max_total_wal_size.
Closes https://github.com/facebook/rocksdb/pull/1987

Differential Revision: D4703214

Pulled By: siying

fbshipit-source-id: 9ff8b09
2017-03-21 11:09:10 -07:00