rocksdb

Author	SHA1	Message	Date
PeifengSi	43defe9872	Correct the code comment in Compaction::KeyNotExistsBeyondOutputLevel (#4902 ) Summary: Even one key falls in a file's range, we can not infer it definitely exists in this file. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4902 Differential Revision: D13795018 Pulled By: siying fbshipit-source-id: 590956f727e9440fcdee55ad9541ace934c64914	2019-01-23 18:00:56 -08:00
Abhishek Madan	3a4bd36fed	Truncate range tombstones by leveraging InternalKeys (#4432 ) Summary: To more accurately truncate range tombstones at SST boundaries, we now represent them in RangeDelAggregator using InternalKeys, which are end-key-exclusive as they were before this change. During compaction, "atomic compaction unit boundaries" (the range of keys contained in neighbouring and overlaping SSTs) are propagated down to RangeDelAggregator to truncate range tombstones at those boundariies instead. See https://github.com/facebook/rocksdb/pull/4432#discussion_r221072219 and https://github.com/facebook/rocksdb/pull/4432#discussion_r221138683 for motivating examples. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4432 Differential Revision: D10263952 Pulled By: abhimadan fbshipit-source-id: 2fe85ff8a02b3a6a2de2edfe708012797a7bd579	2018-10-09 15:19:38 -07:00
Maysam Yabandeh	1fb6805527	Fix snprintf buffer overflow bug (#4465 ) Summary: The contract of snprintf says that it returns "The number of characters that would have been written if n had been sufficiently large" http://www.cplusplus.com/reference/cstdio/snprintf/ The existing code however was assuming that the return value is the actual number of written bytes and uses that to reposition the starting point on the next call to snprintf. This leads to buffer overflow when the last call to snprintf has filled up the buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4465 Differential Revision: D10224080 Pulled By: maysamyabandeh fbshipit-source-id: 40f44e122d15b0db439812a0a361167cf012de3e	2018-10-05 14:50:51 -07:00
Zhichao Cao	1f6efabe23	Add bottommost_compression_opts to for bottommost_compression (#3985 ) Summary: …ression For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consitent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts Closes https://github.com/facebook/rocksdb/pull/3985 Reviewed By: riversand963 Differential Revision: D8385911 Pulled By: zhichao-cao fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc	2018-06-27 17:42:38 -07:00
Zhongyi Xie	a703432808	MaxFileSizeForLevel: adjust max_file_size for dynamic level compaction Summary: `MutableCFOptions::RefreshDerivedOptions` always assume base level is L1, which is not true when `level_compaction_dynamic_level_bytes=true` and Level based compaction is used. This PR fixes this by recomputing `max_file_size` at query time (in `MaxFileSizeForLevel`) Fixes https://github.com/facebook/rocksdb/issues/3229 In master: ``` Level Files Size(MB) -------------------- 0 14 846 1 0 0 2 0 0 3 0 0 4 0 0 5 15 366 6 11 481 Cumulative compaction: 3.83 GB write, 2.27 GB read ``` In branch: ``` Level Files Size(MB) -------------------- 0 9 544 1 0 0 2 0 0 3 0 0 4 0 0 5 0 0 6 445 935 Cumulative compaction: 2.91 GB write, 1.46 GB read ``` db_bench command used: ``` ./db_bench --benchmarks="fillrandom,deleterandom,fillrandom,levelstats,stats" --statistics -deletes=5000 -db=tmp -compression_type=none --num=20000 -value_size=100000 -level_compaction_dynamic_level_bytes=true -target_file_size_base=2097152 -target_file_size_multiplier=2 ``` Closes https://github.com/facebook/rocksdb/pull/3755 Differential Revision: D7721381 Pulled By: miasantreble fbshipit-source-id: 39afb8503190bac3b466adf9bbf2a9b3655789f8	2018-05-03 16:42:13 -07:00
Huachao Huang	ed7a95b28c	Add max_subcompactions as a compaction option Summary: Sometimes we want to compact files as fast as possible, but don't want to set a large `max_subcompactions` in the `DBOptions` by default. I add a `max_subcompactions` options to `CompactionOptions` so that we can choose a proper concurrency dynamically. Closes https://github.com/facebook/rocksdb/pull/3775 Differential Revision: D7792357 Pulled By: ajkr fbshipit-source-id: 94f54c3784dce69e40a229721a79a97e80cd6a6c	2018-04-27 11:57:39 -07:00
Andrew Kryczka	20c508c1ed	Enable subcompactions in manual level-based compaction Summary: This is the simplest way I could think of to speed up `CompactRange`. It works but isn't that optimal because it relies on the same `max_compaction_bytes` and `max_subcompactions` options that are used in other places. If it turns out to be useful we can allow overriding these in `CompactRangeOptions` in the future. Closes https://github.com/facebook/rocksdb/pull/3549 Differential Revision: D7117634 Pulled By: ajkr fbshipit-source-id: d0cd03d6bd0d2fd7ea3fb13cd3b8bf7c47d11e42	2018-03-06 12:43:51 -08:00
Siying Dong	7291a3f813	Improve fallocate size in compaction output Summary: Now in leveled compaction, we allocate solely based on output target file size. If the total input size is smaller than the number, we should use the total input size instead. Also, cap the allocate size to 1GB. Closes https://github.com/facebook/rocksdb/pull/3385 Differential Revision: D6762363 Pulled By: siying fbshipit-source-id: e30906f6e9bff3ec847d2166e44cb49c92f98a13	2018-01-22 16:43:46 -08:00
Zhongyi Xie	fcc8a6574d	Make Universal compaction options dynamic Summary: Let me know if more test coverage is needed Closes https://github.com/facebook/rocksdb/pull/3213 Differential Revision: D6457165 Pulled By: miasantreble fbshipit-source-id: 3f944abff28aa7775237f1c4f61c64ccbad4eea9	2017-12-11 13:27:06 -08:00
Andrew Kryczka	6a9335dbbb	always drop tombstones compacted to bottommost level Summary: Problem was in bottommost compaction, when an L0->L0 compaction happened and L0 was bottommost. Then we'd preserve tombstones according to `Compaction::KeyNotExistsBeyondOutputLevel`, while zeroing seqnum according to `CompactionIterator::PrepareOutput`, thus triggering the assertion in `PrepareOutput`. To fix, we can just drop tombstones in L0->L0 when the output is "bottommost", i.e., the compaction includes the oldest L0 file and there's nothing at lower levels. Closes https://github.com/facebook/rocksdb/pull/3085 Differential Revision: D6175742 Pulled By: ajkr fbshipit-source-id: 8ab19a2e001496f362e9eb0a71757e2f6ecfdb3b	2017-10-27 15:56:35 -07:00
Andrew Kryczka	9b18cc2363	single-file bottom-level compaction when snapshot released Summary: When snapshots are held for a long time, files may reach the bottom level containing overwritten/deleted keys. We previously had no mechanism to trigger compaction on such files. This particularly impacted DBs that write to different parts of the keyspace over time, as such files would never be naturally compacted due to second-last level files moving down. This PR introduces a mechanism for bottommost files to be recompacted upon releasing all snapshots that prevent them from dropping their deleted/overwritten keys. - Changed `CompactionPicker` to compact files in `BottommostFilesMarkedForCompaction()`. These are the last choice when picking. Each file will be compacted alone and output to the same level in which it originated. The goal of this type of compaction is to rewrite the data excluding deleted/overwritten keys. - Changed `ReleaseSnapshot()` to recompute the bottom files marked for compaction when the oldest existing snapshot changes, and schedule a compaction if needed. We cache the value that oldest existing snapshot needs to exceed in order for another file to be marked in `bottommost_files_mark_threshold_`, which allows us to avoid recomputing marked files for most snapshot releases. - Changed `VersionStorageInfo` to track the list of bottommost files, which is recomputed every time the version changes by `UpdateBottommostFiles()`. The list of marked bottommost files is first computed in `ComputeBottommostFilesMarkedForCompaction()` when the version changes, but may also be recomputed when `ReleaseSnapshot()` is called. - Extracted core logic of `Compaction::IsBottommostLevel()` into `VersionStorageInfo::RangeMightExistAfterSortedRun()` since logic to check whether a file is bottommost is now necessary outside of compaction. Closes https://github.com/facebook/rocksdb/pull/3009 Differential Revision: D6062044 Pulled By: ajkr fbshipit-source-id: 123d201cf140715a7d5928e8b3cb4f9cd9f7ad21	2017-10-25 16:30:37 -07:00
Siying Dong	71598cdc75	Fix false removal of tombstone issue in FIFO and kCompactionStyleNone Summary: Similar to the bug fixed by https://github.com/facebook/rocksdb/pull/2726, FIFO with compaction and kCompactionStyleNone during user customized CompactFiles() with output level to be 0 can suffer from the same problem. Fix it by leveraging the bottommost_level_ flag. Closes https://github.com/facebook/rocksdb/pull/2735 Differential Revision: D5626906 Pulled By: siying fbshipit-source-id: 2b148d0461c61dbd986d74655e384419ae442158	2017-08-15 13:02:19 -07:00
Andrew Kryczka	acf935e40f	fix deletion dropping in intra-L0 Summary: `KeyNotExistsBeyondOutputLevel` didn't consider L0 files' key-ranges. So if a key only was covered by older L0 files' key-ranges, we would incorrectly drop deletions of that key. This PR just skips the deletion-dropping optimization when output level is L0. Closes https://github.com/facebook/rocksdb/pull/2726 Differential Revision: D5617286 Pulled By: ajkr fbshipit-source-id: 4bff1396b06d49a828ba4542f249191052915bce	2017-08-11 18:12:38 -07:00
奏之章	3218edc573	Fix universal compaction bug Summary: this value ``` Compaction::is_trivial_move_ ``` uninitialized . under universal compaction , we enable ``` CompactionOptionsUniversal::allow_trivial_move ``` , `9b11d4345a/db/compaction.cc (L245)` here is a disastrous bug , some sst trivial move to target level without overlap check ... THEN , DATABASE DAMAGED , WE GOT A LEVEL WITH OVERLAP ! Closes https://github.com/facebook/rocksdb/pull/2634 Differential Revision: D5530722 Pulled By: siying fbshipit-source-id: 425ab55bca5967110377d634258360bcf88c200e	2017-07-31 14:27:45 -07:00
Siying Dong	3c327ac2d0	Change RocksDB License Summary: Closes https://github.com/facebook/rocksdb/pull/2589 Differential Revision: D5431502 Pulled By: siying fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75	2017-07-15 16:11:23 -07:00
Sagar Vemuri	1cd45cd1b3	FIFO Compaction with TTL Summary: Introducing FIFO compactions with TTL. FIFO compaction is based on size only which makes it tricky to enable in production as use cases can have organic growth. A user requested an option to drop files based on the time of their creation instead of the total size. To address that request: - Added a new TTL option to FIFO compaction options. - Updated FIFO compaction score to take TTL into consideration. - Added a new table property, creation_time, to keep track of when the SST file is created. - Creation_time is set as below: - On Flush: Set to the time of flush. - On Compaction: Set to the max creation_time of all the files involved in the compaction. - On Repair and Recovery: Set to the time of repair/recovery. - Old files created prior to this code change will have a creation_time of 0. - FIFO compaction with TTL is enabled when ttl > 0. All files older than ttl will be deleted during compaction. i.e. `if (file.creation_time < (current_time - ttl)) then delete(file)`. This will enable cases where you might want to delete all files older than, say, 1 day. - FIFO compaction will fall back to the prior way of deleting files based on size if: - the creation_time of all files involved in compaction is 0. - the total size (of all SST files combined) does not drop below `compaction_options_fifo.max_table_files_size` even if the files older than ttl are deleted. This feature is not supported if max_open_files != -1 or with table formats other than Block-based. Test Plan: Added tests. Benchmark results: Base: FIFO with max size: 100MB :: ``` svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 readwhilewriting : 1.924 micros/op 519858 ops/sec; 13.6 MB/s (1176277 of 5000000 found) ``` With TTL (a low one for testing) :: ``` svemuri@dev15905 ~/rocksdb (fifo-compaction) $ TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=readwhilewriting --num=5000000 --threads=16 --compaction_style=2 --fifo_compaction_max_table_files_size_mb=100 --fifo_compaction_ttl=20 readwhilewriting : 1.902 micros/op 525817 ops/sec; 13.7 MB/s (1185057 of 5000000 found) ``` Example Log lines: ``` 2017/06/26-15:17:24.609249 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609177) [db/compaction_picker.cc:1471] [default] FIFO compaction: picking file 40 with creation time 1498515423 for deletion 2017/06/26-15:17:24.609255 7fd5a45ff700 (Original Log Time 2017/06/26-15:17:24.609234) [db/db_impl_compaction_flush.cc:1541] [default] Deleted 1 files ... 2017/06/26-15:17:25.553185 7fd5a61a5800 [DEBUG] [db/db_impl_files.cc:309] [JOB 0] Delete /dev/shm/dbbench/000040.sst type=2 #40 -- OK 2017/06/26-15:17:25.553205 7fd5a61a5800 EVENT_LOG_v1 {"time_micros": 1498515445553199, "job": 0, "event": "table_file_deletion", "file_number": 40} ``` SST Files remaining in the dbbench dir, after db_bench execution completed: ``` svemuri@dev15905 ~/rocksdb (fifo-compaction) $ ls -l /dev/shm//dbbench/*.sst -rw-r--r--. 1 svemuri users 30749887 Jun 26 15:17 /dev/shm//dbbench/000042.sst -rw-r--r--. 1 svemuri users 30768779 Jun 26 15:17 /dev/shm//dbbench/000044.sst -rw-r--r--. 1 svemuri users 30757481 Jun 26 15:17 /dev/shm//dbbench/000046.sst ``` Closes https://github.com/facebook/rocksdb/pull/2480 Differential Revision: D5305116 Pulled By: sagar0 fbshipit-source-id: 3e5cfcf5dd07ed2211b5b37492eb235b45139174	2017-06-27 17:11:48 -07:00
Andrew Kryczka	7c1c8ce5ac	Avoid calling fallocate with UINT64_MAX Summary: When user doesn't set a limit on compaction output file size, let's use the sum of the input files' sizes. This will avoid passing UINT64_MAX as fallocate()'s length. Reported in #2249. Test setup: - command: `TEST_TMPDIR=/data/rocksdb-test/ strace -e fallocate ./db_compaction_test --gtest_filter=DBCompactionTest.ManualCompactionUnknownOutputSize` - filesystem: xfs before this diff: `fallocate(10, 01, 0, 1844674407370955160) = -1 ENOSPC (No space left on device)` after this diff: `fallocate(10, 01, 0, 1977) = 0` Closes https://github.com/facebook/rocksdb/pull/2252 Differential Revision: D5007275 Pulled By: ajkr fbshipit-source-id: 4491404a6ae8a41328aede2e2d6f4d9ac3e38880	2017-05-04 17:43:22 -07:00
Siying Dong	d616ebea23	Add GPLv2 as an alternative license. Summary: Closes https://github.com/facebook/rocksdb/pull/2226 Differential Revision: D4967547 Pulled By: siying fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4	2017-04-27 18:06:12 -07:00
Sagar Vemuri	343b59d6ee	Move various string utility functions into string_util Summary: This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper. Check the sub-commits for easier reviewing. Closes https://github.com/facebook/rocksdb/pull/2094 Differential Revision: D4837730 Pulled By: sagar0 fbshipit-source-id: 344278a	2017-04-06 14:54:12 -07:00
Andrew Kryczka	d659faad54	Level-based L0->L0 compaction Summary: Level-based L0->L0 compaction operates on spans of files that aren't currently being compacted. It reduces the number of L0 files, thus making write stall conditions harder to reach. - L0->L0 is triggered when base level is unavailable due to pending compactions - L0->L0 always outputs one file of at most `max_level0_burst_file_size` bytes. - Subcompactions are disabled for L0->L0 since we want to output one file. - Input files are chosen as the longest span of available files that will fit within the size limit. This minimizes number of files in L0. Closes https://github.com/facebook/rocksdb/pull/2027 Differential Revision: D4760318 Pulled By: ajkr fbshipit-source-id: 9d07183	2017-04-04 18:09:11 -07:00
Artemiy Kolesnikov	2f4fc539c6	Compaction::IsTrivialMove relaxing Summary: IsTrivialMove returns true if no input file overlaps with output_level+1 with more than max_compaction_bytes_ bytes. Closes https://github.com/facebook/rocksdb/pull/1619 Differential Revision: D4278338 Pulled By: yiwu-arbug fbshipit-source-id: 994c001	2016-12-07 11:54:11 -08:00
Yi Wu	81747f1be6	Refactor MutableCFOptions Summary: * Change constructor of MutableCFOptions to depends only on ColumnFamilyOptions. * Move `max_subcompactions`, `compaction_options_fifo` and `compaction_pri` to ImmutableCFOptions to make it clear that they are immutable. Test Plan: existing unit tests. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D63945	2016-09-13 21:11:59 -07:00
sdong	32149059f9	Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes. Test Plan: Add two new unit tests. Run all existing tests, including jtest. Reviewers: yhchiang, igor, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: leveldb, andrewkr, dhruba Differential Revision: https://reviews.facebook.net/D59829	2016-09-01 14:33:24 -07:00
Islam AbdelRahman	fa813f7478	Update DB::AddFile() to ingest the file to the lowest possible level Summary: DB::AddFile() right now always add the ingested file to L0 update the logic to add the file to the lowest possible level Test Plan: unit tests Reviewers: jkedgar, sdong Reviewed By: sdong Subscribers: andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D59637	2016-06-21 17:57:59 -07:00
Aaron Gao	43afd72bee	[rocksdb] make more options dynamic Summary: make more ColumnFamilyOptions dynamic: - compression - soft_pending_compaction_bytes_limit - hard_pending_compaction_bytes_limit - min_partial_merge_operands - report_bg_io_stats - paranoid_file_checks Test Plan: Add sanity check in `db_test.cc` for all above options except for soft_pending_compaction_bytes_limit and hard_pending_compaction_bytes_limit. All passed. Reviewers: andrewkr, sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D57519	2016-05-17 13:11:56 -07:00
Islam AbdelRahman	4b31723433	Add bottommost_compression option Summary: Add a new option that can be used to set a specific compression algorithm for bottommost level. This option will only affect levels larger than base level. I have also updated CompactionJobInfo to include the compression algorithm used in compaction Test Plan: added new unittest existing unittests Reviewers: andrewkr, yhchiang, sdong Reviewed By: sdong Subscribers: lightmark, andrewkr, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D57669	2016-05-09 15:57:19 -07:00
Yueh-Hsuan Chiang	be9816b3d9	Fix data race issue when sub-compaction is used in CompactionJob Summary: When subcompaction is used, all subcompactions share the same Compaction pointer in CompactionJob while each subcompaction all keeps their mutable stats in SubcompactionState. However, there're still some mutable part that is currently store in the shared Compaction pointer. This patch makes two changes: 1. Make the shared Compaction pointer const so that it can never be modified during the compaction. 2. Move necessary states from Compaction to SubcompactionState. 3. Make functions of Compaction const if the function does not modify its internal state. Test Plan: rocksdb and MyRocks test Reviewers: sdong, kradhakrishnan, andrewkr, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: andrewkr, dhruba, yoshinorim, gunnarku, leveldb Differential Revision: https://reviews.facebook.net/D55923	2016-03-24 19:36:39 -07:00
Baraa Hamodi	21e95811d1	Updated all copyright headers to the new format.	2016-02-09 15:12:00 -08:00
Siying Dong	298ba27ae2	Merge pull request #846 from yuslepukhin/enble_c4244_lossofdata Enable MS compiler warning c4244.	2015-12-23 22:59:42 -08:00
Islam AbdelRahman	d005c66faf	Report compaction reason in CompactionListener Summary: Add CompactionReason to CompactionJobInfo This will allow users to understand why compaction started which will help options tuning Test Plan: added new tests make check -j64 Reviewers: yhchiang, anthony, kradhakrishnan, sdong, rven Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D51975	2015-12-22 11:37:19 -08:00
Reid Horuff	97ea8afaaf	compaction assertion triggering test fix for sequence zeroing assertion trip	2015-12-18 16:08:31 -08:00
Gunnar Kudrjavets	97265f5f14	Fix minor bugs in delete operator, snprintf, and size_t usage Summary: List of changes: 1) Fix the snprintf() usage in cases where wrong variable was used to determine the output buffer size. 2) Remove unnecessary checks before calling delete operator. 3) Increase code correctness by using size_t type when getting vector's size. 4) Unify the coding style by removing namespace::std usage at the top of the file to confirm to the majority usage. 5) Fix various lint errors pointed out by 'arc lint'. Test Plan: Code review and build: git diff make clean make -j 32 commit-prereq arc lint Reviewers: kradhakrishnan, sdong, rven, anthony, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D51849	2015-12-15 15:26:20 -08:00
Dmitri Smirnov	236fe21c92	Enable MS compiler warning c4244. Mostly due to the fact that there are differences in sizes of int,long on 64 bit systems vs GNU.	2015-12-11 16:47:34 -08:00
Vasili Svirski	41b32c6059	Enable C4267 warning * conversion from 'size_t' to 'type', by add static_cast Tested: * by build solution on Windows, Linux locally, * run tests * build CI system successful	2015-11-24 16:33:09 +03:00
sdong	000836a880	CompactionFilter::Context to contain column family ID Summary: Add the column family ID to compaction filter context, so it is easier for compaction filter to apply different logic for different column families. Test Plan: Add a unit test to verify the column family ID passed is correct. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D48357	2015-10-08 11:27:38 -07:00
Igor Canadi	eb5b637fb0	Fix condition for bottommost level Summary: The function GetBoundaryKeys() returns the smallest key from the first file and largest key from the last file. This is good for any level >0, but it's not correct for level 0. In level 0, files can overlap, so we need to check all files for boundary keys. This bug can cause wrong value for bottommost_level in compaction (value of true, although correct is false), which means we can set sequence numbers to 0 even if the key is not the oldest one in the database. Herman reported corruption while testing MyRocks. Fortunately, the patch that added the bug was not released yet. Test Plan: added a new test to compaction_picker_test. Reviewers: hermanlee4, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48201	2015-10-05 17:40:18 -07:00
Mayank Pundir	a5e312a7a4	Improving condition for bottommost level during compaction Summary: The diff modifies the condition checked to determine the bottommost level during compaction. Previously, absence of files in higher levels alone was used as the condition. Now, the function additionally evaluates if the higher levels have files which have non-overlapping key ranges, then the level can be safely considered as the bottommost level. Test Plan: Unit test cases added and passing. However, unit tests of universal compaction are failing as a result of the changes made in this diff. Need to understand why that is happening. Reviewers: igor Subscribers: dhruba, sdong, lgalanis, meyering Differential Revision: https://reviews.facebook.net/D46473	2015-09-16 17:47:50 -07:00
Ari Ekmekji	3c37b3cccd	Determine boundaries of subcompactions Summary: Up to this point, the subcompactions that make up a compaction job have been divided based on the key range of the L1 files, and each subcompaction has handled the key range of only one file. However DBOption.max_subcompactions allows the user to designate how many subcompactions at most to perform. This patch updates the CompactionJob::GetSubcompactionBoundaries() to determine these divisions accordingly based on that option and other input/system factors. The current approach orders the starting and/or ending keys of certain compaction input files and then generates a histogram to approximate the size covered by the key range between each consecutive pair of keys. Then it groups these ranges into groups so that the sizes are approximately equal to one another. The approach has also been adapted to work for universal compaction as well instead of just for level-based compaction as it was before. These subcompactions are then executed in parallel by locally spawning threads, one for each. The results are then aggregated and the compaction completed. Test Plan: make all && make check Reviewers: yhchiang, anthony, igor, noetzli, sdong Reviewed By: sdong Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43269	2015-09-10 13:50:00 -07:00
Ari Ekmekji	f0da6977a3	[Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State Summary: In prepration for running multiple threads at the same time during a compaction job, this patch assigns each subcompaction its own state (instead of sharing the one global CompactionState). Each subcompaction then uses this state to update its statistics, keep track of its snapshots, etc. during the course of execution. Then at the end of all the executions the statistics are aggregated across the subcompactions so that the final result is the same as if only one larger compaction had run. Test Plan: ./db_test ./db_compaction_test ./compaction_job_test Reviewers: sdong, anthony, igor, noetzli, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43239	2015-08-18 11:06:23 -07:00
Poornima Chozhiyath Raman	1bdfcef7bf	Fix when output level is 0 of universal compaction with trivial move Summary: Fix for universal compaction with trivial move, when the ouput level is 0. The tests where failing. Fixed by allowing normal compaction when output level is 0. Test Plan: modified test cases run successfully. Reviewers: sdong, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: anthony, kradhakrishnan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42933	2015-07-27 14:25:57 -07:00
Igor Canadi	a96fcd09b7	Deprecate CompactionFilterV2 Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature. Test Plan: make check Reviewers: noetzli, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42405	2015-07-17 18:59:11 +02:00
sdong	f9728640f3	"make format" against last 10 commits Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more. Test Plan: Build it. Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41961	2015-07-13 13:50:18 -07:00
Dmitri Smirnov	c903ccc4c2	Merge from github/master	2015-07-09 18:01:08 -07:00
Poornima Chozhiyath Raman	4bed00a44b	Fix function name format according to google style Summary: Change the naming style of getter and setters according to Google C++ style in compaction.h file Test Plan: Compilation success Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41265	2015-07-08 15:21:10 -07:00
Poornima Chozhiyath Raman	c0b23dd5b0	Enabling trivial move in universal compaction Summary: This change enables trivial move if all the input files are non onverlapping while doing Universal Compaction. Test Plan: ./compaction_picker_test and db_test ran successfully with the new testcases. Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40875	2015-07-07 14:18:55 -07:00
Dmitri Smirnov	18285c1e2f	Windows Port from Microsoft Summary: Make RocksDb build and run on Windows to be functionally complete and performant. All existing test cases run with no regressions. Performance numbers are in the pull-request. Test plan: make all of the existing unit tests pass, obtain perf numbers. Co-authored-by: Praveen Rao praveensinghrao@outlook.com Co-authored-by: Sherlock Huang baihan.huang@gmail.com Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com	2015-07-01 16:13:56 -07:00
Islam AbdelRahman	643bbbf081	Use nullptr for default compaction_filter_factory Summary: Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2 The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI Test Plan: make check Reviewers: yoshinorim, ott, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39693	2015-06-08 16:34:26 -07:00
Islam AbdelRahman	3ce3bb3da2	Allowing L0 -> L1 trivial move on sorted data Summary: This diff updates the logic of how we do trivial move, now trivial move can run on any number of files in input level as long as they are not overlapping The conditions for trivial move have been updated Introduced conditions: - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual) - Input level files cannot be overlapping Removed conditions: - Trivial move only run when the compaction is not manual - Input level should can contain only 1 file More context on what tests failed because of Trivial move ``` DBTest.CompactionsGenerateMultipleFiles This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1 ``` ``` DBTest.NoSpaceCompactRange This test expect compaction to fail when we force environment to report running out of space, of course this is not valid in trivial move situation because trivial move does not need any extra space, and did not check for that ``` ``` DBTest.DropWrites Similar to DBTest.NoSpaceCompactRange ``` ``` DBTest.DeleteObsoleteFilesPendingOutputs This test expect that a file in L2 is deleted after it's moved to L3, this is not valid with trivial move because although the file was moved it is now used by L3 ``` ``` CuckooTableDBTest.CompactionIntoMultipleFiles Same as DBTest.CompactionsGenerateMultipleFiles ``` This diff is based on a work by @sdong https://reviews.facebook.net/D34149 Test Plan: make -j64 check Reviewers: rven, sdong, igor Reviewed By: igor Subscribers: yhchiang, ott, march, dhruba, sdong Differential Revision: https://reviews.facebook.net/D34797	2015-06-04 16:51:25 -07:00
sdong	ec43a8b9fb	Universal Compaction with multiple levels won't allocate up to output size Summary: Universal compactions with multiple levels should use file preallocation size based on file size if output level is not level 0 Test Plan: Run all tests. Reviewers: igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38439	2015-05-13 14:15:46 -07:00
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes has been read currently. 6. the number of bytes has been written currently. Flush operation properties will be done in a seperate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	2015-05-06 22:51:06 -07:00

1 2

97 Commits