Summary: Add an option to stop writes if compaction lefts behind. If estimated pending compaction bytes is more than threshold specified by options.hard_pending_compaction_bytes_liimt, writes will stop until compactions are cleared to under the threshold.
Test Plan: Add unit test DBTest.HardLimit
Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45999
Summary:
Although there are currently counters to keep track of the
stall caused by having too many L0 files, there is no distinction as
to whether when that stall occurs either (A) L0-L1 compaction is taking
place to try and mitigate it, or (B) no L0-L1 compaction has been scheduled
at the moment. This diff adds a counter for (A) so that the nature of L0
stalls can be better understood.
Test Plan: make all && make check
Reviewers: sdong, igor, anthony, noetzli, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, dhruba
Differential Revision: https://reviews.facebook.net/D46749
Summary:
RocksDB debug version failed to build under gcc-4.8.1 on sandcastle with the following error:
```
db/db_compaction_filter_test.cc:570:33: error: ‘snapshot’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
```
Test Plan: make db_compaction_filter_test && ./db_compaction_filter_test
Reviewers: rven, anthony, yhchiang, aekmekji, igor, sdong
Reviewed By: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46725
Summary:
There are currently no statistics on seeks, only on gets. This adds the following counters:
rocksdb.number.db.seek
rocksdb.number.db.next
rocksdb.number.db.prev
(number of calls)
rocksdb.db.iterate.bytes.read
(number of bytes read from key + value using seek/next/prev)
rocksdb.number.keys.seek.found
rocksdb.number.keys.next.found
rocksdb.number.keys.prev.found
(number of calls where seek/next/prev found a value)
Test Plan:
./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5
./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5 -reverse_iterator
Reviewers: yhchiang, rven, kradhakrishnan, IslamAbdelRahman, MarkCallaghan, sdong, igor
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D46605
Summary:
Refactoring NewTableReader to accept TableReaderOptions
This will make it easier to add new options in the future, for example in this diff https://reviews.facebook.net/D46071
Test Plan: run existing tests
Reviewers: igor, yhchiang, anthony, rven, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D46179
Summary:
During the refactoring, the condition that makes sure that compaction
filters are only applied to records newer than the latest snapshot
got butchered. This patch fixes the condition and adds a test case.
Test Plan: make db_compaction_filter_test && ./db_compaction_filter_test
Reviewers: rven, anthony, yhchiang, sdong, aekmekji, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46707
Summary. A change https://reviews.facebook.net/differential/diff/224721/
Has attempted to move common functionality out of platform dependent
code to a new facility called file_reader_writer.
This includes:
- perf counters
- Buffering
- RateLimiting
However, the change did not attempt to refactor Windows code.
To mitigate, we introduce new quering interfaces such as UseOSBuffer(),
GetRequiredBufferAlignment() and ReaderWriterForward()
for pure forwarding where required.
Introduce WritableFile got a new method Truncate(). This is to communicate
to the file as to how much data it has on close.
- When space is pre-allocated on Linux it is filled with zeros implicitly,
no such thing exist on Windows so we must truncate file on close.
- When operating in unbuffered mode the last page is filled with zeros but we still want to truncate.
Previously, Close() would take care of it but now buffer management is shifted to the wrappers and the file has
no idea about the file true size.
This means that Close() on the wrapper level must always include
Truncate() as well as wrapper __dtor should call Close() and
against double Close().
Move buffered/unbuffered write logic to the wrapper.
Utilize Aligned buffer class.
Adjust tests and implement Truncate() where necessary.
Come up with reasonable defaults for new virtual interfaces.
Forward calls for RandomAccessReadAhead class to avoid double
buffering and locking (double locking in unbuffered mode on WIndows).
Summary:
The current build is failing on some platforms due to an __unused__ attribute.
This patch prevents the problem by using a pattern similar to MergeHelper
(assert not on the variable but inside a condition that uses the variable). We
should have better error handling in both cases in the future.
Test Plan: make clean all check
Reviewers: rven, anthony, yhchiang, sdong, igor, aekmekji
Reviewed By: aekmekji
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46623
Summary:
The test SuggestCompactRangeNoTwoLevel0Compactions in
DBCompactionTest fails when there are parallel L0-L1 compactions
taking place because the test makes sure that only one compaction
involving L0 takes place at any given time (since before having
parallel compactions this was impossible). I changed the test to only
run with DBOptions.max_subcompactions=1 so as to not hit this issue
which is not a correctness issue but just an inherent changing of
assumptions after introducing parallel compactions.
This failed after landing https://reviews.facebook.net/D43269#inline-321303
so now this should fix it
Test Plan: make all && make check
Reviewers: yhchiang, igor, anthony, noetzli, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D46617
Summary:
Builder and CompactionJob share a lot of fairly complex code. This patch
refactors this code into a separate class, the CompactionIterator. Because the
shared code is fairly complex, this patch hopefully improves maintainability.
While there are is a lot of potential for further improvements, the patch is
intentionally pretty close to the original structure because the change is
already complex enough.
Test Plan: make clean all check && ./db_stress
Reviewers: rven, anthony, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46197
Summary: ReadDroppedColumnFamily is consistently failing in Travis CI environment (can't repro locally). I suspect it might be failing with non-OK status. This diff will give us more info about the failure.
Test Plan: none
Reviewers: sdong, kradhakrishnan
Reviewed By: kradhakrishnan
Subscribers: kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46611
Summary:
Up to this point, the subcompactions that make up a compaction
job have been divided based on the key range of the L1 files, and each
subcompaction has handled the key range of only one file. However
DBOption.max_subcompactions allows the user to designate how many
subcompactions at most to perform. This patch updates the
CompactionJob::GetSubcompactionBoundaries() to determine these
divisions accordingly based on that option and other input/system factors.
The current approach orders the starting and/or ending keys of certain
compaction input files and then generates a histogram to approximate the
size covered by the key range between each consecutive pair of keys. Then
it groups these ranges into groups so that the sizes are approximately equal
to one another. The approach has also been adapted to work for universal
compaction as well instead of just for level-based compaction as it was before.
These subcompactions are then executed in parallel by locally spawning
threads, one for each. The results are then aggregated and the compaction
completed.
Test Plan: make all && make check
Reviewers: yhchiang, anthony, igor, noetzli, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43269
Summary: The current code, considers data to be consistent if the record
checksum passes. We do have customer issues where the record checksum passed but
the data was incomprehensible. There is no way to get out of this error case
since all WAL recovery model will consider this error as unrelated to WAL.
Relaxing the definition and including errors while inserting to memtable as WAL
errors and handing them as per the recovery level.
Test Plan: Used customer dump to verify the fix for different level. The db
opens for kSkipAnyCorruptedRecords and kPointInTimeRecovery, but fails for
kAbsoluteConsistency and kTolerateCorruptedTailRecords.
Reviewers: sdon igor
CC: leveldb@
Task ID: #7918721
Blame Rev:
Summary: DBTest.ReadLatencyHistogramByLevel was not written as expected. After writes, reads aren't guaranteed to hit data written. It was not expected. Fix it.
Test Plan: Run the test multiple times
Reviewers: IslamAbdelRahman, rven, anthony, kradhakrishnan, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D46587
Summary: We should never set max_open_files to be bigger than the system's ulimit. Otherwise we will get "Too many open files" errors. See an example in this Travis run: https://travis-ci.org/facebook/rocksdb/jobs/79591566
Test Plan:
make check
I will also verify that max_max_open_files is reasonable.
Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46551
Summary: Currently, if users didn't set options.arena_block_size, we set "result.arena_block_size = result.write_buffer_size / 10". It makes result.arena_block_size not a multiplier of 4KB, even if options.write_buffer_size is a multiplier of MBs. When calling malloc to arena_block_size, we may waste a small amount of memory for it. We now make the default to be /8 or /16 and align it to 4KB.
Test Plan: unit tests
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46467
Summary:
This commit makes DBTest.OptimizeFiltersForHits more deterministic by:
(1) make key inserts more random
(2) make sure L0 has one file
(3) make file size smaller compared to level target so L1 will cover more range.
Test Plan: Run the test many times.
Reviewers: rven, IslamAbdelRahman, kradhakrishnan, igor, anthony
Reviewed By: anthony
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D46461
Summary:
It looks like in some cases an assert in SeekInternal failed when computing the
hints for the next level because user_key was the same as the largest key and
not strictly smaller. Relaxing the assert to expect smaller or equal keys.
Test Plan: make clean all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46443
Summary:
In some cases, equality comparisons can be done more efficiently than three-way
comparisons. There are quite a few places in the code where we only care about
equality. This patch adds an Equal() method that defaults to using the
Compare() method.
Test Plan: make clean all check
Reviewers: rven, anthony, yhchiang, igor, sdong
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46233
Summary: When computing the hint for GetNextLevelIndex(), ForwardIterator was doing a redundant comparison. This patch fixes the comparison (using https://github.com/facebook/rocksdb/blob/master/db/version_set.cc#L158 as a reference) and moves it inside an assert because we expect `level_files[f_idx]` to contain the next key after Seek(), so user_key should always be smaller than the largest key.
Test Plan: make clean all check
Reviewers: rven, anthony, yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: tnovak, sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D46227
Summary:
This diff fixes a case when the forward iterator misses a new
insert when the mutable iterator is not current. The test is also
improved and the check for deleted iterators is made more informative.
Test Plan: DBTailingIteratorTest.*Trim
Reviewers: tnovak, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D46167
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).
Test Plan: make clean check all
Reviewers: rven, anthony, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45993
Summary: TableCache::Get() puts parameters in the wrong places so that table readers created by Get() will not have the histogram updated.
Test Plan: Will write a unit test for that.
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D46035
Summary:
I noticed that memtable iterator usually crosses the `iterate_upper_bound`
threshold when tailing. Changes introduced in D43833 made `NeedToSeekImmutable`
always return true in such case, even when `Seek()` only needs to rewind the
memtable iterator. In a test I ran, this caused the "tailing efficiency"
(ratio of calls to `Seek()` that only affect the memtable versus all seeks)
to drop almost to zero.
This diff attempts to fix the regression by using a different flag to indicate
that `current_` is over the limit instead of resetting `valid_` in
`UpdateCurrent()`.
Test Plan: `DBTestTailingIterator.TailingIteratorUpperBound`
Reviewers: sdong, rven
Reviewed By: rven
Subscribers: dhruba, march
Differential Revision: https://reviews.facebook.net/D45909
Summary:
Flushes in listener_test happened to early when ROCKSDB_MALLOC_USABLE_SIZE was
active (e.g. when compiling with ROCKSDB_FBCODE_BUILD_WITH_481=1) due to
malloc_usable_size() reporting a better estimate (similar to
https://reviews.facebook.net/D43317 ). This patch grows the write buffer size
slightly to compensate for this.
Test Plan: ROCKSDB_FBCODE_BUILD_WITH_481=1 make listener_test && ./listener_test
Reviewers: rven, anthony, yhchiang, igor, sdong
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45921
Summary:
Pessimistic Transaction expiration time checking currently causes a performace regression, Lets disable it in db_bench by default.
Also, in order to be able to better tune how much contention we're simulating, added new optinos to set lock timeout and snapshot.
Test Plan: run db_bench randomtranansaction
Reviewers: sdong, igor, yhchiang, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45831
Summary:
Now that the approach to parallelizing L0-L1 level-based
compactions by breaking the compaction job into subcompactions is
being extended to apply to universal compactions as well, the unit
tests need to account for this and run the universal compaction
tests with subcompactions both enabled and disabled.
Test Plan: make all && make check
Reviewers: sdong, igor, noetzli, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45657
Summary: malloc_usable_size() gets a better estimation of memory usage. It is already used to calculate block cache memory usage. Use it in arena too.
Test Plan: Run all unit tests
Reviewers: anthony, kradhakrishnan, rven, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43317
Summary:
MarkLogsSynced() was doing `logs_.erase(it++);`. The standard is saying:
```
all iterators and references are invalidated, unless the erased members are at an end (front or back) of the deque (in which case only iterators and references to the erased members are invalidated)
```
Because `it` is an iterator to the first element of the container, it is
invalidated, only one iteration is executed and `log.getting_synced = false;`
is not being done, so `while (logs_.front().getting_synced)` in `WriteImpl()`
is not terminating.
Test Plan: make db_bench && ./db_bench --benchmarks=fillsync
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang, sdong, tnovak
Reviewed By: tnovak
Subscribers: kolmike, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45807
Summary:
Just realized that after D45675, part of the code in
DBTest.ApproximateMemoryUsage, does not really test anything anymore, so I
removed it.
Test Plan: make clean all check
Reviewers: rven, igor, sdong, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45783
Summary:
The immutable memtable iterators are allocated from an arena and there
is no benefit from deleting these. Also the immutable memtables
themselves will continue to be in memory until the version set
containing it is alive. We will not remove immutable memtable iterators
over the upper bound. We now add immutable iterators to the test.
Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext
Reviewers: tnovak, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45597
Summary: Add ZSTD compression type. The same way as adding LZ4.
Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests.
Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
Reviewed By: igor
Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45747
Summary:
This patch fixes two issues in DBTest.ApproximateMemoryUsage:
- It was possible that a flush happened between getting the two properties in
Phase 1, resulting in different numbers for the properties and failing the
assertion. This is fixed by waiting for the flush to finish before getting
the properties.
- There was a similar issue in Phase 2 and additionally there was an issue that
rocksdb.size-all-mem-tables was not monotonically increasing because it was
possible that a flush happened just after getting the properties and then
another flush just before getting the properties in the next round. In this
situation, the reported memory usage decreased. This is fixed by forcing a
flush before getting the properties.
Note: during testing, I found that kFlushesPerRound does not seem very
accurate. I added a TODO for this and it would be great to get some input on
what to do there.
Test Plan:
The first issue can be made more likely to trigger by inserting a
`usleep(10000);` between the calls to GetIntProperty() in Phase 1.
The second issue can be made more likely to trigger by inserting a
`if (r != 0) usleep(10000);` before the calls to GetIntProperty() and a
`usleep(10000);` after the calls.
Then execute make db_test && ./db_test --gtest_filter=DBTest.ApproximateMemoryUsage
Reviewers: rven, yhchiang, igor, sdong, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45675
Summary:
Add argument --show_table_properties to db_bench
-show_table_properties (If true, then per-level table properties will be
printed on every stats-interval when stats_interval is set and
stats_per_interval is on.) type: bool default: false
Test Plan:
./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1
./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1 --num_column_families=2
Sample Output:
Compaction Stats [column_family_name_000001]
Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt) KeyIn KeyDrop
---------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 3/0 5 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 86.3 0 17 0.021 0 0 0
L1 5/0 9 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 0
L2 9/0 16 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 0
Sum 17/0 31 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 86.3 0 17 0.021 0 0 0
Int 0/0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 83.9 0 2 0.022 0 0 0
Flush(GB): cumulative 0.030, interval 0.004
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard
Level[0]: # data blocks=2571; # entries=84813; raw key size=2035512; raw average key size=24.000000; raw value size=8481300; raw average value size=100.000000; data block size=5690119; index block size=82415; filter block size=0; (estimated) table size=5772534; filter policy name=N/A;
Level[1]: # data blocks=4285; # entries=141355; raw key size=3392520; raw average key size=24.000000; raw value size=14135500; raw average value size=100.000000; data block size=9487353; index block size=137377; filter block size=0; (estimated) table size=9624730; filter policy name=N/A;
Level[2]: # data blocks=7713; # entries=254439; raw key size=6106536; raw average key size=24.000000; raw value size=25443900; raw average value size=100.000000; data block size=17077893; index block size=247269; filter block size=0; (estimated) table size=17325162; filter policy name=N/A;
Level[3]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
Level[4]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
Level[5]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
Level[6]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A;
Reviewers: anthony, IslamAbdelRahman, MarkCallaghan, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45651
Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.
We add ReadaheadRandomAccessFile layer only when file is read during compactions.
D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.
Test Plan: make check
Reviewers: MarkCallaghan, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45123
Summary:
When DBIter changes iterating direction from forward to backward, it might see some much larger keys with higher sequence ID. With this commit, these rows will be actively filtered out. It should fix existing disabled tests in db_iter_test.
This may not be a perfect fix, but it introduces least impact on existing codes, in order to be safe.
Test Plan:
Enable existing tests and make sure they pass. Add a new test DBIterWithMergeIterTest.InnerMergeIteratorDataRace8.
Also run all existing tests.
Reviewers: yhchiang, rven, anthony, IslamAbdelRahman, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45567
Summary:
DBTest.GetProperty was failing occasionally (see task #8131266). The reason was
that the test closed the database before the compaction was done. When the test
reopened the database, RocksDB would schedule a compaction which in turn
created table readers and lead the test to fail the assertion that
rocksdb.estimate-table-readers-mem is 0. In most cases, GetIntProperty() of
rocksdb.estimate-table-readers-mem happened before the compaction created the
table readers, hiding the problem. This patch changes the
WaitForFlushMemTable() to WaitForCompact(). WaitForFlushMemTable() is not
necessary because it is already being called a couple of lines before without
any insertions in-between.
Test Plan:
Insert `usleep(10000);` just after `Reopen(options);` on line 2333 to make the issue more likely, then run:
make db_test && while ./db_test --gtest_filter=DBTest.GetProperty; do true; done
Reviewers: rven, yhchiang, anthony, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45603
Summary:
It was pointed out to me that the members of SubCompactionState
'earliest_snapshot', 'latest_snapshot' and 'visible_at_tip' are never
modified by the subcompactions, so they can stay as global varaibles
instead to make things simpler.
Test Plan: make all && make check
Reviewers: sdong, igor, noetzli, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45477
Summary: There was a bad merge during refresh.
Test Plan: make -j all; make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45555
Summary:
We have earlier added a feature to delete file iterators when the
current key is over the iterate upper bound. We now add a whitebox test
to check if the file iterators were actually deleted.
Test Plan: Add check for a range which has deleted iterators.
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45321
Summary:
After deleting file iterators which are over the iterate upper
bound, we also need to check for null pointers in
ResetIncompletIterators.
Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext
Reviewers: tnovak, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45525
Summary:
See task #7983654. The example was triggering an assert in compaction job
because the compaction was not marked as manual. With this patch,
CompactionPicker::FormCompaction() marks compactions as manual. This patch
also fixes a couple of typos, adds optimistic_transaction_example to
.gitignore and librocksdb as a dependency for examples. Adding librocksdb as
a dependency makes sure that the examples are built with the latest changes
in librocksdb.
Test Plan: make clean && cd examples && make all && ./compact_files_example
Reviewers: rven, sdong, anthony, igor, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45117
Summary:
This patch adds "rocksdb.aggregated-table-properties"
and "rocksdb.aggregated-table-properties-at-levelN", the former
returns the aggreated table properties of a column family,
while the later returns the aggregated table properties
of the specified level N.
Test Plan: Added tests in db_test
Reviewers: igor, sdong, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45087
Summary:
This patch fixes a race condition in DBTEst.DynamicMemtableOptions. In rare cases,
it was possible that the main thread would fill up both memtables before the flush
job acquired its work. Then, the flush job was flushing both memtables together,
producing only one L0 file while the test expected two. Now, the test waits for
flushes to finish earlier, to make sure that the memtables are flushed in separate
flush jobs.
Test Plan:
Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush()
to make the issue more likely. Then test with:
make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done
Reviewers: rven, sdong, yhchiang, anthony, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45429
Summary: As title
Test Plan: make check
Reviewers: yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45447
Summary:
Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks.
This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point.
I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature.
Test Plan:
make check
I had to adjust some unit tests to understand this new behavior
Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli
Reviewed By: noetzli
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42087
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.
I also added a --subcompactions option to db_bench
Test Plan: make all make check
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45069
Summary: Add more test cases of data race causing wrong iterating results. Tag tests not passing as DISABLED_
Test Plan: Run the tests
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang
Reviewed By: yhchiang
Subscribers: tnovak, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44907
Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. It makes fadvise works less predictable. Add options.new_table_reader_for_compaction_inputs to enforce to create a new file descriptor and new table reader for it.
Test Plan: Add the option.
Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang
Reviewed By: igor
Subscribers: igor, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43311
Summary:
Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
In the future, we can use threshold of this counter to replace soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when should slow down and stopping than more abstract soft and hard rate limits.
Test Plan: Add unit tests
Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44205
Summary:
Currently, ThreadStatusFlush uses two sync-points to ensure
there's a flush currently running when calling GetThreadList().
However, one of the sync-point is inside db-mutex, which could
cause deadlock in case there's a DB::Get() call.
This patch fix this issue by moving the sync-point to a better
place where the flush job does not hold the mutex.
Test Plan: db_test
Reviewers: igor, sdong, anthony, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45045
Summary: Add a new DB property that calculate the total size of files used by all RocksDB Versions
Test Plan: Unittests for the new property
Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D44799
Summary: Removing two unused variables that prevented compilation.
Test Plan: make all
Reviewers: rven, sdong, yhchiang, anthony, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D44991
Summary:
This diff improves the memory utilization for tailing iterators RocksDB,
by freeing file iterators which are over the upper bound.
It is an updating on Siying's original diff for improving the memory usage for
tailing iterators. The changes for the seek and next path are now complete
and a test has been added to exercise these paths while deleting file iterators
which are above the upper bound.
Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext
Reviewers: march, tnovak, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43833
Summary: Update DestroyDB so that all SST files in the first path id go through DeleteScheduler instead of being deleted immediately
Test Plan: added a unittest
Reviewers: igor, yhchiang, anthony, kradhakrishnan, rven, sdong
Reviewed By: sdong
Subscribers: jeanxu2012, dhruba
Differential Revision: https://reviews.facebook.net/D44955
Summary:
Currently, GetIntProperty("rocksdb.cur-size-all-mem-tables") only returns
the memory usage by those memtables which have not yet been flushed.
This patch introduces GetIntProperty("rocksdb.size-all-mem-tables"),
which includes the memory usage by all the memtables, includes those
have been flushed but pinned by iterators.
Test Plan: Added a test in db_test
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D44229
Summary: There is a check to fail the iterator if prefix extractor is specified but upper bound is out of the prefix for the seek key. Relax this constraint to allow users to set upper bound to the next prefix of the current one.
Test Plan: make commit-prereq
Reviewers: igor, anthony, kradhakrishnan, yhchiang, rven
Reviewed By: rven
Subscribers: tnovak, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44949
Summary:
A couple variables were declared but only used in assertions
which causes issues when building in fbcode.
Test Plan: make dbg and make release
Reviewers: yhchiang, sdong, igor, anthony, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D44937
Summary:
In D43239 (https://reviews.facebook.net/D43239) the number
of subcompactions is set based on the number of L1 files with
unique starting keys. In certain cases when this number is very large
this causes issues, particularly with the overlap between files since
very small output files can be generated. This diff bounds the number
of subcompactions to the user option DBOption.num_subcompactions.
Test Plan: ./db_test ./db_compaction_test
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D44883
Summary:
Reseek mutable_iter if it is invalid in Next and immutable_iter
is invalid.
Test Plan: DBTestTailingIterator.TailingIteratorSeekToNext
Reviewers: tnovak, march, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D44865
Summary:
In D43239 (https://reviews.facebook.net/D43239) there is an
assertion to make sure a subcompaction's output is never empty at the
end of execution. This assertion however breaks the build because some
tests lead to exactly that scenario. So instead I have altered the logic
to handle this case instead of just failing the assertion.
The reason that it is possible for a subcompaction's output to be empty is
that during a sequential execution of subcompactions, if a user aborts the
compaction job then some of the later subcompactions to be executed may
have yet to process any keys and therefore have yet to generate output files.
This becomes very rare once the subcompactions are executed in parallel,
but for now they are still sequential so the case is possible when there is an
early termination, as in some of the tests.
Test Plan: ./db_test ./db_compaction_test
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D44877
Summary:
In prepration for running multiple threads at the same time during
a compaction job, this patch assigns each subcompaction its own state
(instead of sharing the one global CompactionState). Each subcompaction then
uses this state to update its statistics, keep track of its snapshots, etc.
during the course of execution. Then at the end of all the executions the
statistics are aggregated across the subcompactions so that the final result
is the same as if only one larger compaction had run.
Test Plan: ./db_test ./db_compaction_test ./compaction_job_test
Reviewers: sdong, anthony, igor, noetzli, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43239
Summary:
While working on supporting mixing merge operators with
single deletes ( https://reviews.facebook.net/D43179 ),
I realized that returning and dealing with merge results
can be made simpler. Submitting this as a separate diff
because it is not directly related to single deletes.
Before, callers of merge helper had to retrieve the merge
result in one of two ways depending on whether the merge
was successful or not (success = result of merge was single
kTypeValue). For successful merges, the caller could query
the resulting key/value pair and for unsuccessful merges,
the result could be retrieved in the form of two deques of
keys and values. However, with single deletes, a successful merge
does not return a single key/value pair (if merge
operands are merged with a single delete, we have to generate
a value and keep the original single delete around to make
sure that we are not accidentially producing a key overwrite).
In addition, the two existing call sites of the merge
helper were taking the same actions independently from whether
the merge was successful or not, so this patch simplifies that.
Test Plan: make clean all check
Reviewers: rven, sdong, yhchiang, anthony, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43353
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.
Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected
Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44193
Summary:
This diff allows a Writer to join the next write batch group
without acquiring any locks. Waiting is performed via a per-Writer mutex,
so all of the non-leader writers never need to acquire the db mutex.
It is now possible to join a write batch group after the leader has been
chosen but before the batch has been constructed. This diff doesn't
increase parallelism, but reduces synchronization overheads.
For some CPU-bound workloads (no WAL, RAM-sized working set) this can
substantially reduce contention on the db mutex in a multi-threaded
environment. With T=8 N=500000 in a CPU-bound scenario (see the test
plan) this is good for a 33% perf win. Not all scenarios see such a
win, but none show a loss. This code is slightly faster even for the
single-threaded case (about 2% for the CPU-bound scenario below).
Test Plan:
1. unit tests
2. COMPILE_WITH_TSAN=1 make check
3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000
Reviewers: sdong, igor, rven, ljin, yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43887
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
Add two more counters in iostats_context.
Also add a parameter of db_bench.
Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44115
Summary: Iterator has a bug: if a child iterator reaches its end, and user issues a Prev(), and just before SeekToLast() of the child iterator is called, some extra rows is added in the end, the position of iterator can be misplaced.
Test Plan: Run the tests with or without valgrind
Reviewers: rven, yhchiang, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: tnovak, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43671
Summary:
Initial implementation of Pessimistic Transactions. This diff contains the api changes discussed in D38913. This diff is pretty large, so let me know if people would prefer to meet up to discuss it.
MyRocks folks: please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues.
Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint(). After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex. We can then decide which route is preferable.
Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing.
Test Plan: Unit tests, db_bench parallel testing.
Reviewers: igor, rven, sdong, yhchiang, yoshinorim
Reviewed By: sdong
Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D40869
Summary: Add a new option that all LoadTableHandlers to use multiple threads to load files on DB Open and Recover
Test Plan:
make check -j64
COMPILE_WITH_TSAN=1 make check -j64
DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running)
Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43755
Summary:
While working on single delete support for db_bench, I realized that
db_bench/db_stress contain a bunch of duplicate code related to
copmression and found some typos. This patch removes duplicate code,
typos and a redundant #ifndef in internal_stats.cc.
Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress
Reviewers: yhchiang, sdong, rven, anthony, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43965
Summary:
Key comparison is the single largest CPU user for CPU-bound
workloads. This diff reduces the number of comparisons in two ways.
The first is that it moves predecessor array gathering from
FindGreaterOrEqual to FindLessThan, so that FindGreaterOrEqual can
return immediately if compare_ returns 0. As part of this change I
moved the sequential insertion optimization into Insert, to remove the
undocumented (and smelly) requirement that prev must be equal to prev_
if it is non-null.
The second optimization is that all of the search functions skip calling
compare_ when moving to a lower level that has the same Next pointer.
With a branching factor of 4 we would expect this to happen 1/4 of
the time.
On a single-threaded CPU-bound workload (-benchmarks=fillrandom -threads=1
-batch_size=1 -memtablerep=skip_list -value_size=0 --num=1600000
-level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
-disable_auto_compactions --max_write_buffer_number=8
-max_background_flushes=8 --disable_wal --write_buffer_size=160000000)
on my dev server this is good for a 7% perf win.
Test Plan: unit tests
Reviewers: rven, ljin, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43233
Summary:
Changed compaction_job_test to support better/more thorough
tests and added two tests. Also changed MockFileContents
to order using InternalKeyComparator.
Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check
Reviewers: sdong, rven, igor, yhchiang, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42837
Summary: Implemented this simple wrapper for something else I was working on. Seemed like it makes sense to expose it instead of burying it in some random code.
Test Plan: added test
Reviewers: rven, kradhakrishnan, sdong, yhchiang
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43293
Summary:
Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable for its unique_ptr member writer. Move away from it, and do explit free.
It is less safe but I can't think of a better way to unblock it.
Test Plan: valgrind check test
Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43647
Summary:
When seeking to the last occurrence of a key with sequence number 0, db_iter
ends up in an endless loop because it seeks to type kValueTypeForSeek
which is larger than kTypeDeletion/kTypeValue. Added test case that triggers
the behavior.
Test Plan: make clean all check
Reviewers: igor, rven, anthony, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43653
Summary: Update DeleteScheduler tests so that they verify the used penalties for waiting instead of measuring the time spent which is not reliable
Test Plan:
make -j64 delete_scheduler_test && ./delete_scheduler_test
COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test
COMPILE_WITH_ASAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test
make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
COMPILE_WITH_ASAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths"
Reviewers: yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43635
Summary: The list of info log files of a db can be obtained using the new function.
Test Plan: New test in db_test.cc passed.
Reviewers: yhchiang, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: IslamAbdelRahman, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41715
Summary:
Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. Another one makes sure SyncWAL() doesn't wait ongoing writes to finish before being executed.
Create a new test file db_wal_test and move two WAL related tests from db_test to here.
Test Plan: Run the new tests
Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43605
Summary: Measure read latency histogram and put in statistics. Compaction inputs are excluded from it when possible (unfortunately usually no possible as we usually take table reader from table cache.
Test Plan:
Run db_bench and it shows the stats, like:
rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180
Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43275
Summary: This patch will fix the false positive of DBTest.FlushSchedule under TSAN, we dont need to disable this test
Test Plan: COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.FlushSchedule"
Reviewers: yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43599
Summary:
While doing forward iterating, if current key is merge, internal iterator position is placed to the next key. If Prev() is called now, needs to do extra Prev() to recover the location.
This is second attempt of fixing after reverting ec70fea4c4. This time shrink the fix to only merge key is the current key and avoid the reseeking logic for max_iterating skipping
Test Plan: enable the two disabled tests and make sure they pass
Reviewers: rven, IslamAbdelRahman, kradhakrishnan, tnovak, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D43557
Summary:
While working on https://reviews.facebook.net/D43179 , I found
duplicate code in the tests. This patch removes it.
Test Plan: make clean all check
Reviewers: igor, sdong, rven, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43263
Summary:
Subj. We really need this feature.
Previous diff D40899 has most of the changes to make this possible, this diff just adds the method.
Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind.
Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba
Differential Revision: https://reviews.facebook.net/D40905
Summary:
Updated DBTest DBCompactionTest and CompactionJobStatsTest
to run compaction-related tests once with subcompactions enabled and
once disabled using the TEST_P test type in the Google Test suite.
Test Plan: ./db_test ./db_compaction-test ./compaction_job_stats_test
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43443
Summary:
Introduce DeleteScheduler that allow enforcing a rate limit on file deletion
Instead of deleting files immediately, files are moved to trash directory and deleted in a background thread that apply sleep penalty between deletes if needed.
I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile
Test Plan:
added delete_scheduler_test
existing unit tests
Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D43221
Summary:
Make DBCompactionTest.SkipStatsUpdateTest more stable by
removing flaky but unnecessary assertion on the size of db
as simply checking the random file open count is suffice.
Test Plan: db_compaction_test
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43533
Summary:
UpdateAccumulatedStats() is used to optimize compaction decision
esp. when the number of deletion entries are high, but this function
can slowdown DBOpen esp. in disk environment.
This patch adds DBOptions::skip_sats_update_on_db_open, which skips
UpdateAccumulatedStats() in DB::Open() time when it's set to true.
Test Plan: Add DBCompactionTest.SkipStatsUpdateTest
Reviewers: igor, anthony, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: tnovak, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42843