Summary:
Add options.compaction_pri, which specifies the policy about which file to compact first.
kCompactionPriByLargestSeq will compact oldest files first.
Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable.
Test Plan: Will write unit tests
Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45951
Summary: Add an option to stop writes if compaction lefts behind. If estimated pending compaction bytes is more than threshold specified by options.hard_pending_compaction_bytes_liimt, writes will stop until compactions are cleared to under the threshold.
Test Plan: Add unit test DBTest.HardLimit
Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45999
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.
I also added a --subcompactions option to db_bench
Test Plan: make all make check
Reviewers: sdong, igor, anthony, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D45069
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:
2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}
Add two more counters in iostats_context.
Also add a parameter of db_bench.
Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D44115
Summary:
As of now compactions involving files from Level 0 and Level 1 are single
threaded because the files in L0, although sorted, are not range partitioned like
the other levels. This means that during L0-L1 compaction each file from L1
needs to be merged with potentially all the files from L0.
This attempt to parallelize the L0-L1 compaction assigns a thread and a
corresponding iterator to each L1 file that then considers only the key range
found in that L1 file and only the L0 files that have those keys (and only the
specific portion of those L0 files in which those keys are found). In this way
the overlap is minimized and potentially eliminated between different iterators
focusing on the same files.
The first step is to restructure the compaction logic to break L0-L1 compactions
into multiple, smaller, sequential compactions. Eventually each of these smaller
jobs will be run simultaneously. Areas to pay extra attention to are
# Correct aggregation of compaction job statistics across multiple threads
# Proper opening/closing of output files (make sure each thread's is unique)
# Keys that span multiple L1 files
# Skewed distributions of keys within L0 files
Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test
Reviewers: igor, noetzli, anthony, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42699
Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.
We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.
Test Plan: make check (there might be some unit tests that depend on this behavior)
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41931
Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows.
Test Plan: Add a new unit test for it
Reviewers: rven, igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37335
Summary:
With this change, we use L1 and up to store compaction outputs in universal compaction.
The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.
If options.num_levels=1, it behaves all the same as now.
Test Plan:
1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels.
2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
3) add unit test to migrate from multiple level setting back to one level setting
4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.
Reviewers: rven, kradhakrishnan, yhchiang, igor
Reviewed By: igor
Subscribers: meyering, leveldb, MarkCallaghan, dhruba
Differential Revision: https://reviews.facebook.net/D34539
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
Test Plan: New unit tests and pass tests suites including valgrind.
Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
Reviewed By: ikabiljo
Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31437
Summary:
The very last reference happens in DBImpl::GetOptions()
I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out
Test Plan: make all check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29073
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary:
Make inplace_update_support and inplace_update_num_locks dynamic.
inplace_callback becomes immutable
We are almost free of references to cfd->options() in db_impl
Test Plan: unit test
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25293
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25347
Summary:
This is not a critical options. Making it dynamic so that we can remove
more reference to cfd->options()
Test Plan: unit test
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24957
Summary: as title
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24903
Summary: as title
Test Plan:
unit test
I am only able to build the test case for hard_rate_limit.
soft_rate_limit is essentially the same thing as hard_rate_limit
Reviewers: igor, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24759
Summary: Add more tests as well
Test Plan: unit test
Reviewers: igor, sdong, yhchiang
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24747
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24729
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23349
Summary: as title
Test Plan:
make all check
I will think a way to set up stress test for this
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23055