Commit Graph

67 Commits

Author SHA1 Message Date
Siying Dong
d94aa2f7db Make compaction_pri = kMinOverlappingRatio to be default (#4911)
Summary:
compaction_pri = kMinOverlappingRatio usually provides much better write amplification than the default.
https://github.com/facebook/rocksdb/pull/4907 fixes one shortcome of this option. Make it default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4911

Differential Revision: D13789262

Pulled By: siying

fbshipit-source-id: d90acf8c4dede44f00d183ca4c7a210259378269
2019-01-23 16:47:38 -08:00
Siying Dong
5bf941966b CompactionPri = kMinOverlappingRatio also uses compensated file size (#4907)
Summary:
Right now, CompactionPri = kMinOverlappingRatio provides best write amplification, but it doesn't
prioritize files with more tombstones. We combine the two good features: make kMinOverlappingRatio
to boost files with lots of tombstones too.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4907

Differential Revision: D13788774

Pulled By: siying

fbshipit-source-id: 1991cbb495fb76c8b529de69896e38d81ed9d9b3
2019-01-23 13:21:01 -08:00
Sagar Vemuri
70645355ad Move FIFOCompactionPicker to a separate file (#4724)
Summary:
**Summary:**
Simplified the code layout by moving FIFOCompactionPicker to a separate file.
**Why?:**
While trying to add ttl functionality to universal compaction, I found that `FIFOCompactionPicker` class and its impl methods to be interspersed between `LevelCompactionPicker` methods which kind-of made the code a little hard to traverse. So I moved `FIFOCompactionPicker` to a separate compaction_picker_fifo.h/cc file, similar to `UniversalCompactionPicker`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4724

Differential Revision: D13227914

Pulled By: sagar0

fbshipit-source-id: 89471766ea67fa4d87664a41c057dd7df4b3d4e3
2018-11-29 16:04:52 -08:00
Yanqin Jin
54de56844d Remove random writes from SST file ingestion (#4172)
Summary:
RocksDB used to store global_seqno in external SST files written by
SstFileWriter. During file ingestion, RocksDB uses `pwrite` to update the
`global_seqno`. Since random write is not supported in some non-POSIX compliant
file systems, external SST file ingestion is not supported on these file
systems. To address this limitation, we no longer update `global_seqno` during
file ingestion. Later RocksDB uses the MANIFEST and other information in table
properties to deduce global seqno for externally-ingested SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4172

Differential Revision: D8961465

Pulled By: riversand963

fbshipit-source-id: 4382ec85270a96be5bc0cf33758ca2b167b05071
2018-07-27 16:12:23 -07:00
Anand Ananthabhotla
a736255de8 Delete triggered compaction for universal style
Summary:
This is still WIP, but I'm hoping for early feedback on the overall approach.

This patch implements deletion triggered compaction, which till now only
worked for leveled, for universal style. SST files are marked for
compaction by the CompactOnDeletionCollertor table property. This is
expected to be used when free disk space is low and the user wants to
reclaim space by deleting a bunch of keys. The deletions are expected to
be dense. In such a situation, we want to avoid a full compaction due to
its space overhead.

The strategy used in this case is similar to leveled. We pick one file
from the set of files marked for compaction. We then expand the inputs
to a clean cut on the same level, and then pick overlapping files from
the next non-mepty level. Picking files from the next level can cause
the key range to expand, and we opportunistically expand inputs in the
source level to include files wholly in this key range.

The main side effect of this is that it breaks the property of no time
range overlap between levels. This shouldn't break any functionality.
Closes https://github.com/facebook/rocksdb/pull/3860

Differential Revision: D8124397

Pulled By: anand1976

fbshipit-source-id: bfa2a9dd6817930e991b35d3a8e7e61304ed3dcf
2018-05-29 15:44:34 -07:00
Phani Shekhar Mantripragada
446b32cfc3 Support for Column family specific paths.
Summary:
In this change, an option to set different paths for different column families is added.
This option is set via cf_paths setting of ColumnFamilyOptions. This option will work in a similar fashion to db_paths setting. Cf_paths is a vector of Dbpath values which contains a pair of the absolute path and target size. Multiple levels in a Column family can go to different paths if cf_paths has more than one path.
To maintain backward compatibility, if cf_paths is not specified for a column family, db_paths setting will be used. Note that, if db_paths setting is also not specified, RocksDB already has code to use db_name as the only path.

Changes :
1) A new member "cf_paths" is added to ImmutableCfOptions. This is set, based on cf_paths setting of ColumnFamilyOptions and db_paths setting of ImmutableDbOptions.  This member is used to identify the path information whenever files are accessed.
2) Validation checks are added for cf_paths setting based on existing checks for db_paths setting.
3) DestroyDB, PurgeObsoleteFiles etc. are edited to support multiple cf_paths.
4) Unit tests are added appropriately.
Closes https://github.com/facebook/rocksdb/pull/3102

Differential Revision: D6951697

Pulled By: ajkr

fbshipit-source-id: 60d2262862b0a8fd6605b09ccb0da32bb331787d
2018-04-05 19:58:20 -07:00
Andrew Kryczka
5d68243e61 Comment out unused variables
Summary:
Submitting on behalf of another employee.
Closes https://github.com/facebook/rocksdb/pull/3557

Differential Revision: D7146025

Pulled By: ajkr

fbshipit-source-id: 495ca5db5beec3789e671e26f78170957704e77e
2018-03-05 13:13:41 -08:00
Igor Sugak
aba3409740 Back out "[codemod] - comment out unused parameters"
Reviewed By: igorsugak

fbshipit-source-id: 4a93675cc1931089ddd574cacdb15d228b1e5f37
2018-02-22 12:43:17 -08:00
David Lai
f4a030ce81 - comment out unused parameters
Reviewed By: everiq, igorsugak

Differential Revision: D7046710

fbshipit-source-id: 8e10b1f1e2aecebbfb229c742e214db887e5a461
2018-02-22 09:44:23 -08:00
Siying Dong
7291a3f813 Improve fallocate size in compaction output
Summary:
Now in leveled compaction, we allocate solely based on output target file size. If the total input size is smaller than the number, we should use the total input size instead. Also, cap the allocate size to 1GB.
Closes https://github.com/facebook/rocksdb/pull/3385

Differential Revision: D6762363

Pulled By: siying

fbshipit-source-id: e30906f6e9bff3ec847d2166e44cb49c92f98a13
2018-01-22 16:43:46 -08:00
Zhongyi Xie
fcc8a6574d Make Universal compaction options dynamic
Summary:
Let me know if more test coverage is needed
Closes https://github.com/facebook/rocksdb/pull/3213

Differential Revision: D6457165

Pulled By: miasantreble

fbshipit-source-id: 3f944abff28aa7775237f1c4f61c64ccbad4eea9
2017-12-11 13:27:06 -08:00
Sagar Vemuri
f0804db7f7 Make FIFO compaction options dynamically configurable
Summary:
ColumnFamilyOptions::compaction_options_fifo and all its sub-fields can be set dynamically now.

Some of the ways in which the fifo compaction options can be set are:
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024}"}})`
- `SetOptions({{"compaction_options_fifo", "{ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024;ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=51;ttl=49;allow_compaction=true;}"}})`

Most of the code has been made generic enough so that it could be reused later to make universal options (and other such nested defined-types) dynamic with very few lines of parsing/serializing code changes.
Introduced a few new functions like `ParseStruct`, `SerializeStruct` and `GetStringFromStruct`.
The duplicate code in `GetStringFromDBOptions` and `GetStringFromColumnFamilyOptions` has been moved into `GetStringFromStruct`. So they become just simple wrappers now.
Closes https://github.com/facebook/rocksdb/pull/3006

Differential Revision: D6058619

Pulled By: sagar0

fbshipit-source-id: 1e8f78b3374ca5249bb4f3be8a6d3bb4cbc52f92
2017-10-19 15:26:36 -07:00
Sagar Vemuri
9980de262c Fix FIFO compaction picker test
Summary:
A FIFO compaction picker test is accidentally testing against an instance of level compaction picker.
Closes https://github.com/facebook/rocksdb/pull/2641

Differential Revision: D5495390

Pulled By: sagar0

fbshipit-source-id: 301962736f629b1c499570fb504cdbe66bacb46f
2017-07-26 12:12:26 -07:00
Andrew Kryczka
a34b2e388e Fix caching of compaction picker's next index
Summary:
The previous implementation of caching `file_size` index made no sense. It only remembered the original span of locked files starting from beginning of `file_size`. We should remember the index after all compactions that have been considered but rejected. This will reduce the work we do while holding the db mutex.
Closes https://github.com/facebook/rocksdb/pull/2624

Differential Revision: D5468152

Pulled By: ajkr

fbshipit-source-id: ab92a4bffe76f9f174d861bb5812b974d1013400
2017-07-21 20:57:15 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Andrew Kryczka
a22b9cc6fe overlapping endpoint fixes in level compaction picker
Summary:
This diff addresses two problems. Both problems cause us to miss scheduling desirable compactions. One side effect is compaction picking can spam logs, as there's no delay after failed attempts to pick compactions.

1. If a compaction pulled in a locked input-level file due to user-key overlap, we would not consider picking another file from the same input level.
2. If a compaction pulled in a locked output-level file due to user-key overlap, we would not consider picking any other compaction on any level.

The code changes are dependent, which is why I solved both problems in a single diff.

- Moved input-level `ExpandInputsToCleanCut` into the loop inside `PickFileToCompact`. This gives two benefits: (1) if it fails, we will try the next-largest file on the same input level; (2) we get the fully-expanded input-level key-range with which we can check for pending compactions in output level.
- Added another call to `ExpandInputsToCleanCut` inside `PickFileToCompact`'s to check for compaction conflicts in output level.
- Deleted call to `IsRangeInCompaction` in `PickFileToCompact`, as `ExpandInputsToCleanCut` also correctly handles the case where original output-level files (i.e., ones not pulled in due to user-key overlap) are pending compaction.
Closes https://github.com/facebook/rocksdb/pull/2615

Differential Revision: D5454643

Pulled By: ajkr

fbshipit-source-id: ea3fb5477d83e97148951af3fd4558d2039e9872
2017-07-19 20:42:00 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Andrew Kryczka
3a8a848a55 account for L0 size in estimated compaction bytes
Summary:
also changed the `>` in the comparison against `level0_file_num_compaction_trigger` into a `>=` since exactly `level0_file_num_compaction_trigger` can trigger a compaction from L0.
Closes https://github.com/facebook/rocksdb/pull/2179

Differential Revision: D4915772

Pulled By: ajkr

fbshipit-source-id: e38fec6253de6f9a40e61734615c6670d84038aa
2017-06-01 17:56:59 -07:00
Mikhail Antonov
ba685a472a Support ingest_behind for IngestExternalFile
Summary:
First cut for early review; there are few conceptual points to answer and some code structure issues.

For conceptual points -

 - restriction-wise, we're going to disallow ingest_behind if (use_seqno_zero_out=true || disable_auto_compaction=false), the user is responsible to properly open and close DB with required params
 - we wanted to ingest into reserved bottom most level. Should we fail fast if bottom level isn't empty, or should we attempt to ingest if file fits there key-ranges-wise?
 - Modifying AssignLevelForIngestedFile seems the place we we'd handle that.

On code structure - going to refactor GenerateAndAddExternalFile call in the test class to allow passing instance of IngestionOptions, that's just going to incur lots of changes at callsites.
Closes https://github.com/facebook/rocksdb/pull/2144

Differential Revision: D4873732

Pulled By: lightmark

fbshipit-source-id: 81cb698106b68ef8797f564453651d50900e153a
2017-05-17 11:42:42 -07:00
Andrew Kryczka
8c3a180e83 Set lower-bound on dynamic level sizes
Summary:
Changed dynamic leveling to stop setting the base level's size bound below `max_bytes_for_level_base`.

Behavior for config where `max_bytes_for_level_base == level0_file_num_compaction_trigger * write_buffer_size` and same amount of data in L0 and base-level:

- Before #2027, compaction scoring would favor base-level due to dividing by size smaller than `max_bytes_for_level_base`.
- After #2027, L0 and Lbase get equal scores. The disadvantage is L0 is often compacted before reaching the num files trigger since `write_buffer_size` can be bigger than the dynamically chosen base-level size. This increases write-amp.
- After this diff, L0 and Lbase still get equal scores. Now it takes `level0_file_num_compaction_trigger` files of size `write_buffer_size` to trigger L0 compaction by size, fixing the write-amp problem above.
Closes https://github.com/facebook/rocksdb/pull/2123

Differential Revision: D4861570

Pulled By: ajkr

fbshipit-source-id: 467ddef56ed1f647c14d86bb018bcb044c39b964
2017-05-04 18:16:12 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Siying Dong
ff97287016 Refactor compaction picker code
Summary:
1. Move universal compaction picker to separate files compaction_picker_universal.cc and compaction_picker_universal.h.
2. Rename some functions to make the code easier to understand.
3. Move leveled compaction picking code to a dedicated class, so that we we don't need to pass some common variable around when calling functions. It also allowed us to break down LevelCompactionPicker::PickCompaction() to smaller functions.
Closes https://github.com/facebook/rocksdb/pull/2100

Differential Revision: D4845948

Pulled By: siying

fbshipit-source-id: efa0ab4
2017-04-06 20:09:34 -07:00
Andrew Kryczka
d659faad54 Level-based L0->L0 compaction
Summary:
Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach.

- L0->L0 is triggered when base level is unavailable due to pending compactions
- L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes.
- Subcompactions are disabled for L0->L0 since we want to output one file.
- Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0.
Closes https://github.com/facebook/rocksdb/pull/2027

Differential Revision: D4760318

Pulled By: ajkr

fbshipit-source-id: 9d07183
2017-04-04 18:09:11 -07:00
Aaron Gao
2a0f3d0de1 level compaction expansion
Summary:
reimplement the compaction expansion on lower level.

Considering such a case:
input level file: 1[B E] 2[F G] 3[H I] 4 [J M]
output level file: 5[A C] 6[D K] 7[L O]

If we initially pick file 2, now we will compact file 2 and 6. But we can safely compact 2, 3 and 6 without expanding the output level.

The previous code is messy and wrong.

In this diff, I first determine the input range [a, b], and output range [c, d],
then we get the range [e,f] = [min(a, c), max(b, d] and put all eligible clean-cut files within [e, f] into this compaction.

**Note: clean-cut means the files don't have the same user key on the boundaries of some files that are not chosen in this compaction**.
Closes https://github.com/facebook/rocksdb/pull/1760

Differential Revision: D4395564

Pulled By: lightmark

fbshipit-source-id: 2dc2c5c
2017-02-21 10:24:17 -08:00
Artemiy Kolesnikov
2f4fc539c6 Compaction::IsTrivialMove relaxing
Summary:
IsTrivialMove returns true if no input file overlaps with output_level+1 with more than max_compaction_bytes_ bytes.
Closes https://github.com/facebook/rocksdb/pull/1619

Differential Revision: D4278338

Pulled By: yiwu-arbug

fbshipit-source-id: 994c001
2016-12-07 11:54:11 -08:00
sdong
1168cb810a Fix a bug that may cause a deleted row to appear again
Summary:
The previous fix of reappearing of a deleted row 0ce258f9b3 missed a corner case, which can be reproduced using test CompactionPickerTest.OverlappingUserKeys7. Consider such an example:

input level file: 1[B E] 2[F H]
output level file: 3[A C] 4[D I] 5[I K]

First file 2 is picked, which overlaps to file 4. 4 expands to 5. Now the all range is [D K] with 2 output level files. When we try to expand that, [D K] overlaps with file 1 and 2 in the input level, and 1 and 2 overlaps with 3 and 4 in the output level. So we end up with picking 3 and 4 in the output level. Without expanding, it also has 2 files, so we determine the output level doesn't change, although they are the different two files.

The fix is to expand the output level files after we picked 3 and 4. In that case, there will be three output level files so we will abort the expanding.

I also added two unit tests related to marked_for_compaction and being_compacted. They have been passing though.

Test Plan: Run the new unit test, as well as all other tests.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yoshinorim, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65373
2016-10-24 09:49:07 -07:00
Islam AbdelRahman
2ad68b971a Support running consistency checks in release mode
Summary:
We always run consistency checks when compiling in debug mode
allow users to set Options::force_consistency_checks to true to be able to run such checks even when compiling in release mode

Test Plan:
make check -j64
make release

Reviewers: lightmark, sdong, yiwu

Reviewed By: yiwu

Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba

Differential Revision: https://reviews.facebook.net/D64701
2016-10-07 17:21:45 -07:00
Yi Wu
81747f1be6 Refactor MutableCFOptions
Summary:
* Change constructor of MutableCFOptions to depends only on ColumnFamilyOptions.
* Move `max_subcompactions`, `compaction_options_fifo` and `compaction_pri` to ImmutableCFOptions to make it clear that they are immutable.

Test Plan: existing unit tests.

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63945
2016-09-13 21:11:59 -07:00
sdong
32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
sdong
0ce258f9b3 Compaction picker to expand output level files for keys cross files' boundary too.
Summary: We may wrongly drop delete operation if we pick a file with the entry to be delete, the put entry of the same user key is in the next file in the level, and the next file is not picked. We expand compaction inputs for output level too.

Test Plan: Add unit tests that reproduct the bug of dropping delete entry. Change compaction_picker_test to assert the new behavior.

Reviewers: IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61173
2016-07-26 17:56:36 -07:00
Ashish Shenoy
99765ed855 Clean up the ComputeCompactionScore() API
Summary: Make CompactionOptionsFIFO a part of mutable_cf_options

Test Plan: UT

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, lgalanis, dhruba

Differential Revision: https://reviews.facebook.net/D58653
2016-05-23 15:55:29 -07:00
Aaron Gao
43afd72bee [rocksdb] make more options dynamic
Summary:
make more ColumnFamilyOptions dynamic:
- compression
- soft_pending_compaction_bytes_limit
- hard_pending_compaction_bytes_limit
- min_partial_merge_operands
- report_bg_io_stats
- paranoid_file_checks

Test Plan:
Add sanity check in `db_test.cc` for all above options except for soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit.
All passed.

Reviewers: andrewkr, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57519
2016-05-17 13:11:56 -07:00
sdong
bfb6b1b8a8 Estimate pending compaction bytes more accurately
Summary: Currently we estimate bytes needed for compaction by assuming fanout value to be level multiplier. It overestimates when size of a level exceeds the target by large. We estimate by the ratio of actual sizes in levels instead.

Test Plan: Fix existing test cases and add a new one.

Reviewers: IslamAbdelRahman, igor, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57789
2016-05-09 15:30:02 -07:00
sdong
6a14f7a976 Change several option defaults
Summary:
Changing several option defaults:
 options.max_open_files changes from 5000 to -1
 options.base_background_compactions changes from max_background_compactions to 1
 options.wal_recovery_mode changes from kTolerateCorruptedTailRecords to kTolerateCorruptedTailRecords
 options.compaction_pri changes from kByCompensatedSize to kByCompensatedSize

Test Plan: Write unit tests to see OldDefaults() works as expected.

Reviewers: IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, yiwu, kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56427
2016-04-28 17:50:58 -07:00
sdong
2feafa3db9 Change some RocksDB default options
Summary: Change some RocksDB default options to make it more friendly to server workloads.

Test Plan: Run all existing tests

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: sumeet, muthu, benj, MarkCallaghan, igor, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55941
2016-03-31 17:12:18 -07:00
sdong
92a9ccf1a6 Add a new compaction priority that picks file whose overlapping ratio is smallest
Summary:
Add a new compaction priority as following:
For every file, we calculate total size of files overalapping with the file in the next level, over the file's size itself. The file with smallest ratio will be picked first.
My "db_bench --fillrandom" shows about 5% less compaction than kOldestSmallestSeqFirst if --hard_pending_compaction_bytes_limit value to keep LSM tree in shape. If not limiting hard_pending_compaction_bytes_limit, improvement is only 1% or 2%.

Test Plan: Add a unit test

Reviewers: andrewkr, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54075
2016-02-11 15:59:19 -08:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
sdong
235b162be1 Not scheduling more L1->L2 compaction if L0->L1 is pending with higher priority
Summary: When L0->L1 is pending, there may be one L1->L2 compaction going on which prevents the L0->L1 compaction from happening. If L1 needs more data to be moved to L2, then we may continue scheduling more L1->L2 compactions. The end result may be that L0->L1 compaction will not happen until L1 size drops to below target size. We can reduce the stalling because of number of L0 files by stopping schedling new L1->L2 compaction when L0's score is higher than L1.

Test Plan: Run all existing tests.

Reviewers: yhchiang, MarkCallaghan, rven, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52401
2016-01-08 13:56:57 -08:00
Igor Canadi
eb5b637fb0 Fix condition for bottommost level
Summary:
The function GetBoundaryKeys() returns the smallest key from the first file and largest key from the last file. This is good for any level >0, but it's not correct for level 0. In level 0, files can overlap, so we need to check all files for boundary keys. This bug can cause wrong value for bottommost_level in compaction (value of true, although correct is false), which means we can set sequence numbers to 0 even if the key is not the oldest one in the database.

Herman reported corruption while testing MyRocks. Fortunately, the patch that added the bug was not released yet.

Test Plan: added a new test to compaction_picker_test.

Reviewers: hermanlee4, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48201
2015-10-05 17:40:18 -07:00
Mayank Pundir
c58bac701c Fix valgrind failure due to memory leaks
Summary: Test cases for IsBottommostLevel function create FileMetaData objects which were not getting deleted in the destructor.

Test Plan: Valgrind check on compaction_picker_test

Reviewers: yhchiang, igor, sdong

Subscribers: rven, kradhakrishnan, IslamAbdelRahman, dhruba, anthony

Differential Revision: https://reviews.facebook.net/D47463
2015-09-23 17:41:42 -07:00
sdong
f1b9f804e9 Add a mode to always pick the oldest file to compact for each level
Summary:
Add options.compaction_pri, which specifies the policy about which file to compact first.
kCompactionPriByLargestSeq will compact oldest files first.
Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable.

Test Plan: Will write unit tests

Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45951
2015-09-21 17:21:59 -07:00
Mayank Pundir
a5e312a7a4 Improving condition for bottommost level during compaction
Summary: The diff modifies the condition checked to determine the bottommost level during compaction. Previously, absence of files in higher levels alone was used as the condition. Now, the function additionally evaluates if the higher levels have files which have non-overlapping key ranges, then the level can be safely considered as the bottommost level.

Test Plan: Unit test cases added and passing. However, unit tests of universal compaction are failing as a result of the changes made in this diff. Need to understand why that is happening.

Reviewers: igor

Subscribers: dhruba, sdong, lgalanis, meyering

Differential Revision: https://reviews.facebook.net/D46473
2015-09-16 17:47:50 -07:00
sdong
07d2d34160 Add a counter about estimated pending compaction bytes
Summary:
Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property.
In the future, we can use threshold of this counter to replace soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when should slow down and stopping than more abstract soft and hard rate limits.

Test Plan: Add unit tests

Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44205
2015-08-20 22:17:10 -07:00
Islam AbdelRahman
20922c4a5a Make compaction_picker_test runnable in ROCKSDB_LITE
Summary: Remove universal and fifo compaction tests from ROCKSDB_LITE since they are not supported

Test Plan: compaction_picker_test

Reviewers: sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D42129
2015-07-20 10:46:09 -07:00
Poornima Chozhiyath Raman
c0b23dd5b0 Enabling trivial move in universal compaction
Summary: This change enables trivial move if all the input files are non onverlapping while doing Universal Compaction.

Test Plan: ./compaction_picker_test and db_test ran successfully with the new testcases.

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D40875
2015-07-07 14:18:55 -07:00
Andres Notzli
58d7ab3c68 Added tests for ExpandWhileOverlapping()
Summary:
This patch adds three test cases for ExpandWhileOverlapping()
to the compaction_picker_test test suite.
ExpandWhileOverlapping() only has an effect if the comparison
function for the internal keys allows for overlapping user
keys in different SST files on the same level. Thus, this
patch adds a comparator based on sequence numbers to
compaction_picker_test for the new test cases.

Test Plan:
- make compaction_picker_test && ./compaction_picker_test
  -> All tests pass
- Replace body of ExpandWhileOverlapping() with `return true`
  -> Compile and run ./compaction_picker_test as before
  -> New tests fail

Reviewers: sdong, yhchiang, rven, anthony, IslamAbdelRahman, kradhakrishnan, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41277
2015-07-06 22:25:27 -07:00
Islam AbdelRahman
3ce3bb3da2 Allowing L0 -> L1 trivial move on sorted data
Summary:
This diff updates the logic of how we do trivial move, now trivial move can run on any number of files in input level as long as they are not overlapping

The conditions for trivial move have been updated

Introduced conditions:
  - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual)
  - Input level files cannot be overlapping

Removed conditions:
  - Trivial move only run when the compaction is not manual
  - Input level should can contain only 1 file

More context on what tests failed because of Trivial move
```
DBTest.CompactionsGenerateMultipleFiles
This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1
```

```
DBTest.NoSpaceCompactRange
This test expect compaction to fail when we force environment to report running out of space, of course this is not valid in trivial move situation
because trivial move does not need any extra space, and did not check for that
```

```
DBTest.DropWrites
Similar to DBTest.NoSpaceCompactRange
```

```
DBTest.DeleteObsoleteFilesPendingOutputs
This test expect that a file in L2 is deleted after it's moved to L3, this is not valid with trivial move because although the file was moved it is now used by L3
```

```
CuckooTableDBTest.CompactionIntoMultipleFiles
Same as DBTest.CompactionsGenerateMultipleFiles
```

This diff is based on a work by @sdong https://reviews.facebook.net/D34149

Test Plan: make -j64 check

Reviewers: rven, sdong, igor

Reviewed By: igor

Subscribers: yhchiang, ott, march, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D34797
2015-06-04 16:51:25 -07:00
Igor Canadi
b5881762bc Reset parent_index and base_index when picking files marked for compaction
Summary: This caused a crash of our MongoDB + RocksDB instance. PickCompactionBySize() sets its own parent_index. We never reset this parent_index when picking PickFilesMarkedForCompactionExperimental(). So we might end up doing SetupOtherInputs() with parent_index that was set by PickCompactionBySize, although we're using compaction calculated using PickFilesMarkedForCompactionExperimental.

Test Plan: Added a unit test that fails with assertion on master.

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38337
2015-05-12 11:16:25 -07:00
Igor Canadi
6059bdf86a Add experimental API MarkForCompaction()
Summary:
Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).

All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.

MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.

Test Plan: added a new unit test

Reviewers: yhchiang, rven, MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37083
2015-04-17 16:44:45 -07:00