Summary:
The previous compression type selection caused unexpected behavior when the base level was also the bottommost level. The following sequence of events could happen:
- full compaction generates files with `bottommost_compression` type
- now base level is bottommost level since all files are in the same level
- any compaction causes files to be rewritten `compression_per_level` type since bottommost compression didn't apply to base level
I changed the code to make bottommost compression apply to base level.
Closes https://github.com/facebook/rocksdb/pull/3141
Differential Revision: D6264614
Pulled By: ajkr
fbshipit-source-id: d7aaa8675126896684154a1f2c9034d6214fde82
Summary:
When snapshots are held for a long time, files may reach the bottom level containing overwritten/deleted keys. We previously had no mechanism to trigger compaction on such files. This particularly impacted DBs that write to different parts of the keyspace over time, as such files would never be naturally compacted due to second-last level files moving down. This PR introduces a mechanism for bottommost files to be recompacted upon releasing all snapshots that prevent them from dropping their deleted/overwritten keys.
- Changed `CompactionPicker` to compact files in `BottommostFilesMarkedForCompaction()`. These are the last choice when picking. Each file will be compacted alone and output to the same level in which it originated. The goal of this type of compaction is to rewrite the data excluding deleted/overwritten keys.
- Changed `ReleaseSnapshot()` to recompute the bottom files marked for compaction when the oldest existing snapshot changes, and schedule a compaction if needed. We cache the value that oldest existing snapshot needs to exceed in order for another file to be marked in `bottommost_files_mark_threshold_`, which allows us to avoid recomputing marked files for most snapshot releases.
- Changed `VersionStorageInfo` to track the list of bottommost files, which is recomputed every time the version changes by `UpdateBottommostFiles()`. The list of marked bottommost files is first computed in `ComputeBottommostFilesMarkedForCompaction()` when the version changes, but may also be recomputed when `ReleaseSnapshot()` is called.
- Extracted core logic of `Compaction::IsBottommostLevel()` into `VersionStorageInfo::RangeMightExistAfterSortedRun()` since logic to check whether a file is bottommost is now necessary outside of compaction.
Closes https://github.com/facebook/rocksdb/pull/3009
Differential Revision: D6062044
Pulled By: ajkr
fbshipit-source-id: 123d201cf140715a7d5928e8b3cb4f9cd9f7ad21
Summary:
With FIFO compaction we would like to get the oldest data time for monitoring. The problem is we don't have timestamp for each key in the DB. As an approximation, we expose the earliest of sst file "creation_time" property.
My plan is to override the property with a more accurate value with blob db, where we actually have timestamp.
Closes https://github.com/facebook/rocksdb/pull/2842
Differential Revision: D5770600
Pulled By: yiwu-arbug
fbshipit-source-id: 03833c8f10bbfbee62f8ea5c0d03c0cafb5d853a
Summary:
ColumnFamilyOptions::compaction_options_fifo and all its sub-fields can be set dynamically now.
Some of the ways in which the fifo compaction options can be set are:
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024}"}})`
- `SetOptions({{"compaction_options_fifo", "{ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024;ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=51;ttl=49;allow_compaction=true;}"}})`
Most of the code has been made generic enough so that it could be reused later to make universal options (and other such nested defined-types) dynamic with very few lines of parsing/serializing code changes.
Introduced a few new functions like `ParseStruct`, `SerializeStruct` and `GetStringFromStruct`.
The duplicate code in `GetStringFromDBOptions` and `GetStringFromColumnFamilyOptions` has been moved into `GetStringFromStruct`. So they become just simple wrappers now.
Closes https://github.com/facebook/rocksdb/pull/3006
Differential Revision: D6058619
Pulled By: sagar0
fbshipit-source-id: 1e8f78b3374ca5249bb4f3be8a6d3bb4cbc52f92
Summary:
Hi,
As part of some optimization, we're using multiple DB locations (tmpfs and spindle) to store data and configured max_bytes_for_level_multiplier_additional. But, max_bytes_for_level_multiplier_additional is not used to compute the actual size for the level while picking the DB location. So, even if DB location does not have space, RocksDB mistakenly puts the level at that location.
Can someone pls. verify the fix? Let me know any other changes required.
Thanks,
Jay
Closes https://github.com/facebook/rocksdb/pull/2704
Differential Revision: D5992515
Pulled By: ajkr
fbshipit-source-id: cbbc6c0e0a7dbdca91c72e0f37b218c4cec57e28
Summary:
Fix for [2461](https://github.com/facebook/rocksdb/issues/2461).
Problem: When using multiple db_paths setting with RocksDB, RocksDB incorrectly calculates the size of L1 in LevelCompactionBuilder::GetPathId.
max_bytes_for_level_base is used as L0 size and L1 size is calculated as (L0 size * max_bytes_for_level_multiplier). However, L1 size should be max_bytes_for_level_base.
Solution: Use max_bytes_for_level_base as L1 size. Also, use L1 size as the estimated size of L0.
Closes https://github.com/facebook/rocksdb/pull/2903
Differential Revision: D5885442
Pulled By: maysamyabandeh
fbshipit-source-id: 036da1c9298d173b9b80479cc6661ee4b7a951f6
Summary:
Bug report: https://www.facebook.com/groups/rocksdb.dev/permalink/1389452781153232/
Non-empty `level0_compactions_in_progress_` was aborting `CompactFiles` after incrementing `bg_compaction_scheduled_`, and in that case we never decremented it. This blocked future compactions and prevented DB close as we wait for scheduled compactions to finish/abort during close.
I eliminated `CompactFiles`'s dependency on `level0_compactions_in_progress_`. Since it takes a contiguous span of L0 files -- through the last L0 file if any L1+ files are included -- it's fine to run in parallel with other compactions involving L0. We make the same assumption in intra-L0 compaction.
Closes https://github.com/facebook/rocksdb/pull/2849
Differential Revision: D5780440
Pulled By: ajkr
fbshipit-source-id: 15b15d3faf5a699aed4b82a58352d4a7bb23e027
Summary:
if we're moving any L0 files down, we need to include older L0 files since they may contain older versions of the keys being moved down.
Closes https://github.com/facebook/rocksdb/pull/2845
Differential Revision: D5773800
Pulled By: ajkr
fbshipit-source-id: 9f0770a8eaaeea4c87df2e7a2a1d65bf9d7f4f7e
Summary:
The previous implementation of caching `file_size` index made no sense. It only remembered the original span of locked files starting from beginning of `file_size`. We should remember the index after all compactions that have been considered but rejected. This will reduce the work we do while holding the db mutex.
Closes https://github.com/facebook/rocksdb/pull/2624
Differential Revision: D5468152
Pulled By: ajkr
fbshipit-source-id: ab92a4bffe76f9f174d861bb5812b974d1013400
Summary:
This reverts the previous commit 1d7048c598, which broke the build.
Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627
Differential Revision: D5476473
Pulled By: sagar0
fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
This diff addresses two problems. Both problems cause us to miss scheduling desirable compactions. One side effect is compaction picking can spam logs, as there's no delay after failed attempts to pick compactions.
1. If a compaction pulled in a locked input-level file due to user-key overlap, we would not consider picking another file from the same input level.
2. If a compaction pulled in a locked output-level file due to user-key overlap, we would not consider picking any other compaction on any level.
The code changes are dependent, which is why I solved both problems in a single diff.
- Moved input-level `ExpandInputsToCleanCut` into the loop inside `PickFileToCompact`. This gives two benefits: (1) if it fails, we will try the next-largest file on the same input level; (2) we get the fully-expanded input-level key-range with which we can check for pending compactions in output level.
- Added another call to `ExpandInputsToCleanCut` inside `PickFileToCompact`'s to check for compaction conflicts in output level.
- Deleted call to `IsRangeInCompaction` in `PickFileToCompact`, as `ExpandInputsToCleanCut` also correctly handles the case where original output-level files (i.e., ones not pulled in due to user-key overlap) are pending compaction.
Closes https://github.com/facebook/rocksdb/pull/2615
Differential Revision: D5454643
Pulled By: ajkr
fbshipit-source-id: ea3fb5477d83e97148951af3fd4558d2039e9872
Summary:
I decided not even to keep it as an INFO-level log as it is too normal for compactions to be skipped due to locked input files. Removing logging here makes us consistent with how we treat locked files that weren't pulled in due to overlap.
We may want some error handling on line 422, which should never happen when called by `LevelCompactionBuilder::PickCompaction`, as `SetupInitialFiles` skips compactions where overlap causes the output level to pull in locked files.
Closes https://github.com/facebook/rocksdb/pull/2617
Differential Revision: D5458502
Pulled By: ajkr
fbshipit-source-id: c2e5f867c0a77c1812ce4242ab3e085b3eee0bae
Summary:
Introducing FIFO compactions with TTL.
FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size.
To address that request:
- Added a new TTL option to FIFO compaction options.
- Updated FIFO compaction score to take TTL into consideration.
- Added a new table property, creation_time, to keep track of when the SST file is created.
- Creation_time is set as below:
- On Flush: Set to the time of flush.
- On Compaction: Set to the max creation_time of all the files involved in the compaction.
- On Repair and Recovery: Set to the time of repair/recovery.
- Old files created prior to this code change will have a creation_time of 0.
- FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day.
- FIFO compaction will fall back to the prior way of deleting files based on size if:
- the creation_time of all files involved in compaction is 0.
- the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted.
This feature is not supported if max_open_files != -1 or with table formats other than Block-based.
**Test Plan:**
Added tests.
**Benchmark results:**
Base: FIFO with max size: 100MB ::
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100
readwhilewriting : 1.924 micros/op 519858 ops/sec; 13.6 MB/s (1176277 of 5000000 found)
```
With TTL (a low one for testing) ::
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20
readwhilewriting : 1.902 micros/op 525817 ops/sec; 13.7 MB/s (1185057 of 5000000 found)
```
Example Log lines:
```
2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion
2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files
...
2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK
2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40}
```
SST Files remaining in the dbbench dir, after db_bench execution completed:
```
svemuri@dev15905 ~/rocksdb (fifo-compaction) $ ls -l /dev/shm//dbbench/*.sst
-rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst
-rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst
-rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst
```
Closes https://github.com/facebook/rocksdb/pull/2480
Differential Revision: D5305116
Pulled By: sagar0
fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174
Summary:
First cut for early review; there are few conceptual points to answer and some code structure issues.
For conceptual points -
- restriction-wise, we're going to disallow ingest_behind if (use_seqno_zero_out=true || disable_auto_compaction=false), the user is responsible to properly open and close DB with required params
- we wanted to ingest into reserved bottom most level. Should we fail fast if bottom level isn't empty, or should we attempt to ingest if file fits there key-ranges-wise?
- Modifying AssignLevelForIngestedFile seems the place we we'd handle that.
On code structure - going to refactor GenerateAndAddExternalFile call in the test class to allow passing instance of IngestionOptions, that's just going to incur lots of changes at callsites.
Closes https://github.com/facebook/rocksdb/pull/2144
Differential Revision: D4873732
Pulled By: lightmark
fbshipit-source-id: 81cb698106b68ef8797f564453651d50900e153a
Summary:
Allow an option for users to do some compaction in FIFO compaction, to pay some write amplification for fewer number of files.
Closes https://github.com/facebook/rocksdb/pull/2163
Differential Revision: D4895953
Pulled By: siying
fbshipit-source-id: a1ab608dd0627211f3e1f588a2e97159646e1231
Summary:
1. Move universal compaction picker to separate files compaction_picker_universal.cc and compaction_picker_universal.h.
2. Rename some functions to make the code easier to understand.
3. Move leveled compaction picking code to a dedicated class, so that we we don't need to pass some common variable around when calling functions. It also allowed us to break down LevelCompactionPicker::PickCompaction() to smaller functions.
Closes https://github.com/facebook/rocksdb/pull/2100
Differential Revision: D4845948
Pulled By: siying
fbshipit-source-id: efa0ab4
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090
Differential Revision: D4833681
Pulled By: siying
fbshipit-source-id: 2fd8bef
Summary:
Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.
- L0->L0 is triggered when base level is unavailable due to pending compactions
- L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
- Subcompactions are disabled for L0->L0 since we want to output one file.
- Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0.
Closes https://github.com/facebook/rocksdb/pull/2027
Differential Revision: D4760318
Pulled By: ajkr
fbshipit-source-id: 9d07183
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080
Differential Revision: D4821141
Pulled By: siying
fbshipit-source-id: ca7d768
Summary:
reimplement the compaction expansion on lower level.
Considering such a case:
input level file: 1[B E] 2[F G] 3[H I] 4 [J M]
output level file: 5[A C] 6[D K] 7[L O]
If we initially pick file 2, now we will compact file 2 and 6. But we can safely compact 2, 3 and 6 without expanding the output level.
The previous code is messy and wrong.
In this diff, I first determine the input range [a, b], and output range [c, d],
then we get the range [e,f] = [min(a, c), max(b, d] and put all eligible clean-cut files within [e, f] into this compaction.
**Note: clean-cut means the files don't have the same user key on the boundaries of some files that are not chosen in this compaction**.
Closes https://github.com/facebook/rocksdb/pull/1760
Differential Revision: D4395564
Pulled By: lightmark
fbshipit-source-id: 2dc2c5c
Summary:
The previous fix of reappearing of a deleted row 0ce258f9b3 missed a corner case, which can be reproduced using test CompactionPickerTest.OverlappingUserKeys7. Consider such an example:
input level file: 1[B E] 2[F H]
output level file: 3[A C] 4[D I] 5[I K]
First file 2 is picked, which overlaps to file 4. 4 expands to 5. Now the all range is [D K] with 2 output level files. When we try to expand that, [D K] overlaps with file 1 and 2 in the input level, and 1 and 2 overlaps with 3 and 4 in the output level. So we end up with picking 3 and 4 in the output level. Without expanding, it also has 2 files, so we determine the output level doesn't change, although they are the different two files.
The fix is to expand the output level files after we picked 3 and 4. In that case, there will be three output level files so we will abort the expanding.
I also added two unit tests related to marked_for_compaction and being_compacted. They have been passing though.
Test Plan: Run the new unit test, as well as all other tests.
Reviewers: andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: yoshinorim, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D65373
Summary:
Issue scenario:
(1) We have 3 files in L1 and we issue a compaction that will compact them into 1 file in L2
(2) While compaction (1) is running, we flush a file into L0 and trigger another compaction that decide to move this file to L1 and then move it again to L2 (this file don't overlap with any other files)
(3) compaction (1) finishes and install the file it generated in L2, but this file overlap with the file we generated in (2) so we break the LSM consistency
Looks like this issue can be triggered by using non-exclusive manual compaction or AddFile()
Test Plan: unit tests
Reviewers: sdong
Reviewed By: sdong
Subscribers: hermanlee4, jkedgar, andrewkr, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D64947
Summary:
* Change constructor of MutableCFOptions to depends only on ColumnFamilyOptions.
* Move `max_subcompactions`, `compaction_options_fifo` and `compaction_pri` to ImmutableCFOptions to make it clear that they are immutable.
Test Plan: existing unit tests.
Reviewers: yhchiang, IslamAbdelRahman, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D63945
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
Test Plan: Add two new unit tests. Run all existing tests, including jtest.
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59829
Summary:
If we have total number of sorted runs greater than level0_file_num_compaction_trigger, Universal compaction will always issue a compaction
even if the number of sorted runs that are not being compacted is less than level0_file_num_compaction_trigger.
This diff changes this behaviour to relay on the `number of sorted runs not being compacted` instead of `total number of sorted runs`
Test Plan: New unit test
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61533
Summary: We may wrongly drop delete operation if we pick a file with the entry to be delete, the put entry of the same user key is in the next file in the level, and the next file is not picked. We expand compaction inputs for output level too.
Test Plan: Add unit tests that reproduct the bug of dropping delete entry. Change compaction_picker_test to assert the new behavior.
Reviewers: IslamAbdelRahman, igor
Reviewed By: igor
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D61173
Summary: With max_size_amplification_percent = 0 to make sure that DBTestUniversalCompaction.UniversalCompactionSingleSortedRun tests the configuration to compact to one single sorted run.
Test Plan: Run all existing tests
Reviewers: yhchiang, andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60021
Summary: Make CompactionOptionsFIFO a part of mutable_cf_options
Test Plan: UT
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, lgalanis, dhruba
Differential Revision: https://reviews.facebook.net/D58653
Summary:
Add a new option that can be used to set a specific compression algorithm for bottommost level.
This option will only affect levels larger than base level.
I have also updated CompactionJobInfo to include the compression algorithm used in compaction
Test Plan:
added new unittest
existing unittests
Reviewers: andrewkr, yhchiang, sdong
Reviewed By: sdong
Subscribers: lightmark, andrewkr, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D57669
Summary: We always put compaction to level0_compactions_in_progress_ for universal compaction, so we should also remove it. The bug causes assert failure when running manual compaction.
Test Plan:
TEST_TMPDIR=/dev/shm/ ./db_bench --benchmarks=fillrandom,compact --subcompactions=16 --compaction_style=1
always fails on my host. After the fix, it doesn't fail any more.
Reviewers: IslamAbdelRahman, andrewkr, kradhakrishnan, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D55017
Summary: After we made manual compaction runnable concurrently with automaticallly compaction, we need to run ComputeCompactionScore() to prepare a coming compaction picking call before the compaction finishes.
Test Plan: Run existing tests.
Reviewers: yhchiang, IslamAbdelRahman, andrewkr, kradhakrishnan, anthony, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D54891
Summary: When L0->L1 is pending, there may be one L1->L2 compaction going on which prevents the L0->L1 compaction from happening. If L1 needs more data to be moved to L2, then we may continue scheduling more L1->L2 compactions. The end result may be that L0->L1 compaction will not happen until L1 size drops to below target size. We can reduce the stalling because of number of L0 files by stopping schedling new L1->L2 compaction when L0's score is higher than L1.
Test Plan: Run all existing tests.
Reviewers: yhchiang, MarkCallaghan, rven, anthony, IslamAbdelRahman, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D52401
Summary:
Add CompactionReason to CompactionJobInfo
This will allow users to understand why compaction started which will help options tuning
Test Plan:
added new tests
make check -j64
Reviewers: yhchiang, anthony, kradhakrishnan, sdong, rven
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D51975
Summary:
List of changes:
1) Fix the snprintf() usage in cases where wrong variable was used to determine the output buffer size.
2) Remove unnecessary checks before calling delete operator.
3) Increase code correctness by using size_t type when getting vector's size.
4) Unify the coding style by removing namespace::std usage at the top of the file to confirm to the majority usage.
5) Fix various lint errors pointed out by 'arc lint'.
Test Plan:
Code review and build:
git diff
make clean
make -j 32 commit-prereq
arc lint
Reviewers: kradhakrishnan, sdong, rven, anthony, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D51849
Summary:
This diff provides a framework for doing manual
compactions in parallel with other compactions. We now have a deque of manual compactions. We also pass manual compactions as an argument from RunManualCompactions down to
BackgroundCompactions, so that RunManualCompactions can be reentrant.
Parallelism is controlled by the two routines
ConflictingManualCompaction to allow/disallow new parallel/manual
compactions based on already existing ManualCompactions. In this diff, by default manual compactions still have to run exclusive of other compactions. However, by setting the compaction option, exclusive_manual_compaction to false, it is possible to run other compactions in parallel with a manual compaction. However, we are still restricted to one manual compaction per column family at a time. All of these restrictions will be relaxed in future diffs.
I will be adding more tests later.
Test Plan: Rocksdb regression + new tests + valgrind
Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong
Reviewed By: sdong
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D47973
Summary:
Currently RocksDB may break in lines like this:
for (size_t i = sorted_runs.size() - 1; i >= first_index_after; i--) {
if options.level0_file_num_compaction_trigger=0.
Fix it by not executing the logic of picking compactions if there is no file (sorted_runs.size() = 0). Also internally set options.level0_file_num_compaction_trigger=1 if users give a 0. 0 is a value makes no sense in RocksDB.
Test Plan: Run all tests. Will add a unit test too.
Reviewers: yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D50727
Summary:
Since level 0 files can overlap, two level 0 compactions cannot
run in parallel. Compact files needs to check this before running a
compaction.
Test Plan: CompactFilesTest.L0ConflictsFiles
Reviewers: igor, IslamAbdelRahman, anthony, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D50079
Summary:
Add options.compaction_pri, which specifies the policy about which file to compact first.
kCompactionPriByLargestSeq will compact oldest files first.
Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable.
Test Plan: Will write unit tests
Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D45951
Summary: The diff modifies the condition checked to determine the bottommost level during compaction. Previously, absence of files in higher levels alone was used as the condition. Now, the function additionally evaluates if the higher levels have files which have non-overlapping key ranges, then the level can be safely considered as the bottommost level.
Test Plan: Unit test cases added and passing. However, unit tests of universal compaction are failing as a result of the changes made in this diff. Need to understand why that is happening.
Reviewers: igor
Subscribers: dhruba, sdong, lgalanis, meyering
Differential Revision: https://reviews.facebook.net/D46473
Summary:
See task #7983654. The example was triggering an assert in compaction job
because the compaction was not marked as manual. With this patch,
CompactionPicker::FormCompaction() marks compactions as manual. This patch
also fixes a couple of typos, adds optimistic_transaction_example to
.gitignore and librocksdb as a dependency for examples. Adding librocksdb as
a dependency makes sure that the examples are built with the latest changes
in librocksdb.
Test Plan: make clean && cd examples && make all && ./compact_files_example
Reviewers: rven, sdong, anthony, igor, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D45117
Summary:
The compact files API had a bug where some overlapping files
are not added. These are files which overlap with files which were
added to the compaction input files, but not to the original set of
input files. This happens only when there are more than two levels
involved in the compaction. An example will illustrate this better.
Level 2 has 1 input file 1.sst which spans [20,30].
Level 3 has added file 2.sst which spans [10,25]
Level 4 has file 3.sst which spans [35,40] and
input file 4.sst which spans [46,50].
The existing code would not add 3.sst to the set of input_files because
it only becomes an overlapping file in level 4 and it wasn't one in
level 3.
When installing the results of the compaction, 3.sst would overlap with
output file from the compact files and result in the assertion in
version_set.cc:1130
// Must not overlap
assert(level <= 0 || level_files->empty() ||
internal_comparator_->Compare(
(*level_files)[level_files->size() - 1]->largest, f->smallest) <
0);
This change now adds overlapping files from the current level to the set
of input files also so that we don't hit the assertion above.
Test Plan:
d=/tmp/j; rm -rf $d; seq 1000 | parallel --gnu --eta
'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_compaction_test
--gtest_filter=*CompactilesOnLevel* --gtest_also_run_disabled_tests >&
'$d'/log-{}'
Reviewers: igor, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D43437