rocksdb

Author	SHA1	Message	Date
Maysam Yabandeh	01534db24c	Fix the reported asan issues Summary: This is to resolve the asan complains. In the meanwhile I am working on clarifying/revisiting the sync rules. Closes https://github.com/facebook/rocksdb/pull/2510 Differential Revision: D5338660 Pulled By: yiwu-arbug fbshipit-source-id: ce6f6e0826d43a2c0bfa4328a00c78f73cd6498a	2017-06-28 13:13:22 -07:00
Maysam Yabandeh	8e6345d2df	Update rename of ParanoidCheck Summary: Closes https://github.com/facebook/rocksdb/pull/2494 Differential Revision: D5317902 Pulled By: maysamyabandeh fbshipit-source-id: 097330292180816b3d0c9f4cbbdb6f68f0180200	2017-06-24 18:12:03 -07:00
Maysam Yabandeh	499ebb3ab5	Optimize for serial commits in 2PC Summary: Throughput: 46k tps in our sysbench settings (filling the details later) The idea is to have the simplest change that gives us a reasonable boost in 2PC throughput. Major design changes: 1. The WAL file internal buffer is not flushed after each write. Instead it is flushed before critical operations (WAL copy via fs) or when FlushWAL is called by MySQL. Flushing the WAL buffer is also protected via mutex_. 2. Use two sequence numbers: last seq, and last seq for write. Last seq is the last visible sequence number for reads. Last seq for write is the next sequence number that should be used to write to WAL/memtable. This allows to have a memtable write be in parallel to WAL writes. 3. BatchGroup is not used for writes. This means that we can have parallel writers which changes a major assumption in the code base. To accommodate for that i) allow only 1 WriteImpl that intends to write to memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes come via group commit phase which is serial anyway, ii) make all the parts in the code base that assumed to be the only writer (via EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are protected via a stat_mutex_. Note: the first commit has the approach figured out but is not clean. Submitting the PR anyway to get the early feedback on the approach. If we are ok with the approach I will go ahead with this updates: 0) Rebase with Yi's pipelining changes 1) Currently batching is disabled by default to make sure that it will be consistent with all unit tests. Will make this optional via a config. 2) A couple of unit tests are disabled. They need to be updated with the serial commit of 2PC taken into account. 3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires releasing mutex_ beforehand (the same way EnterUnbatched does). This needs to be cleaned up. Closes https://github.com/facebook/rocksdb/pull/2345 Differential Revision: D5210732 Pulled By: maysamyabandeh fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4	2017-06-24 14:11:29 -07:00
Andrew Kryczka	71f5bcb730	Introduce OnBackgroundError callback Summary: Some users want to prevent rocksdb from entering read-only mode in certain error cases. This diff gives them a callback, `OnBackgroundError`, that they can use to achieve it. - call `OnBackgroundError` every time we consider setting `bg_error_`. Use its result to assign `bg_error_` but not to change the function's return status. - classified calls using `BackgroundErrorReason` to give the callback some info about where the error happened - renamed `ParanoidCheck` to something more specific so we can provide a clear `BackgroundErrorReason` - unit tests for the most common cases: flush or compaction errors Closes https://github.com/facebook/rocksdb/pull/2477 Differential Revision: D5300190 Pulled By: ajkr fbshipit-source-id: a0ea4564249719b83428e3f4c6ca2c49e366e9b3	2017-06-22 19:41:50 -07:00
Andrew Kryczka	c217e0b9c7	Call RateLimiter for compaction reads Summary: Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed. Closes https://github.com/facebook/rocksdb/pull/2433 Differential Revision: D5216946 Pulled By: ajkr fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e	2017-06-13 14:56:46 -07:00
Siying Dong	52a7f38b19	WriteOptions.low_pri which can throttle low pri writes if needed Summary: If ReadOptions.low_pri=true and compaction is behind, the write will either return immediate or be slowed down based on ReadOptions.no_slowdown. Closes https://github.com/facebook/rocksdb/pull/2369 Differential Revision: D5127619 Pulled By: siying fbshipit-source-id: d30e1cff515890af0eff32dfb869d2e4c9545eb0	2017-06-05 15:02:35 -07:00
Yi Wu	ad19eb8686	Fixing blob db sequence number handling Summary: Blob db rely on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Writ()e no longer return sequence number in some cases. Fixing it by have WriteBatchInternal::InsertInto() always encode sequence number into write batch. Stacking on #2375. Closes https://github.com/facebook/rocksdb/pull/2385 Differential Revision: D5148358 Pulled By: yiwu-arbug fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33	2017-05-31 10:56:45 -07:00
Yi Wu	07bdcb91fe	New WriteImpl to pipeline WAL/memtable write Summary: PipelineWriteImpl is an alternative approach to WriteImpl. In WriteImpl, only one thread is allow to write at the same time. This thread will do both WAL and memtable writes for all write threads in the write group. Pending writers wait in queue until the current writer finishes. In the pipeline write approach, two queue is maintained: one WAL writer queue and one memtable writer queue. All writers (regardless of whether they need to write WAL) will still need to first join the WAL writer queue, and after the house keeping work and WAL writing, they will need to join memtable writer queue if needed. The benefit of this approach is that 1. Writers without memtable writes (e.g. the prepare phase of two phase commit) can exit write thread once WAL write is finish. They don't need to wait for memtable writes in case of group commit. 2. Pending writers only need to wait for previous WAL writer finish to be able to join the write thread, instead of wait also for previous memtable writes. Merging #2056 and #2058 into this PR. Closes https://github.com/facebook/rocksdb/pull/2286 Differential Revision: D5054606 Pulled By: yiwu-arbug fbshipit-source-id: ee5b11efd19d3e39d6b7210937b11cefdd4d1c8d	2017-05-19 14:26:42 -07:00
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	2017-04-27 18:06:12 -07:00
Yi Wu	e9e6e53247	Simplify write thread logic Summary: The concept about early exit in write thread implementation is a confusing one. It means that if early exit is allowed, batch group leader will not responsible to exit the batch group, but the last finished writer do. In case we need to mark log synced, or encounter memtable insert error, early exit is disallowed. This patch remove such a concept by: * In all cases, the last finished writer (not necessary leader) is responsible to exit batch group. * In case of parallel memtable write, leader will also mark log synced after memtable insert and before signal finish (call `CompleteParallelWorker()`). The purpose is to allow mark log synced (which require locking mutex) can run in parallel to memtable insert in other writers. * The last finish writer should handle memtable insert error (update bg_error_) before exiting batch group. Closes https://github.com/facebook/rocksdb/pull/2134 Differential Revision: D4869667 Pulled By: yiwu-arbug fbshipit-source-id: aec170847c85b90f4179d6a4608a4fe1361544e3	2017-04-13 16:12:04 -07:00
Igor Canadi	b6b9359ece	Fix BYTES_WRITTEN accounting Summary: BYTES_WRITTEN accounting doesn't work with disabled WAL. For example, this is what we get in the LOG: ``` Cumulative writes: 9794K writes, 228M keys, 9794K commit groups, 1.0 writes per commit group, ingest: 0.00 GB, 0.00 MB/s ``` WAL bytes are tracked in a different statistic: https://github.com/facebook/rocksdb/blob/master/db/internal_stats.h#L105. BYTES_WRITTEN should count all the writes. Closes https://github.com/facebook/rocksdb/pull/2133 Differential Revision: D4880615 Pulled By: yiwu-arbug fbshipit-source-id: 8fd0b223099f3f5ad7df79d4e737d313687fec69	2017-04-13 16:12:03 -07:00
Maysam Yabandeh	20778f2f92	Adding comments to the write path Summary: also did minor refactoring Closes https://github.com/facebook/rocksdb/pull/2115 Differential Revision: D4855818 Pulled By: maysamyabandeh fbshipit-source-id: fbca6ac57e5c6677fffe8354f7291e596a50cb77	2017-04-10 12:43:34 -07:00
Siying Dong	ff97287016	Refactor compaction picker code Summary: 1. Move universal compaction picker to separate files compaction_picker_universal.cc and compaction_picker_universal.h. 2. Rename some functions to make the code easier to understand. 3. Move leveled compaction picking code to a dedicated class, so that we we don't need to pass some common variable around when calling functions. It also allowed us to break down LevelCompactionPicker::PickCompaction() to smaller functions. Closes https://github.com/facebook/rocksdb/pull/2100 Differential Revision: D4845948 Pulled By: siying fbshipit-source-id: efa0ab4	2017-04-06 20:09:34 -07:00
Siying Dong	d2dce5611a	Move some files under util/ to separate dirs Summary: Move some files under util/ to new directories env/, monitoring/ options/ and cache/ Closes https://github.com/facebook/rocksdb/pull/2090 Differential Revision: D4833681 Pulled By: siying fbshipit-source-id: 2fd8bef	2017-04-05 19:09:16 -07:00
Siying Dong	ce64b8b719	Divide db/db_impl.cc Summary: db_impl.cc is too large to manage. Divide db_impl.cc into db/db_impl.cc, db/db_impl_compaction_flush.cc, db/db_impl_files.cc, db/db_impl_open.cc and db/db_impl_write.cc. Closes https://github.com/facebook/rocksdb/pull/2095 Differential Revision: D4838188 Pulled By: siying fbshipit-source-id: c5f3059	2017-04-05 17:24:19 -07:00

1 2

65 Commits