rocksdb

Author	SHA1	Message	Date
Igor Canadi	d80ce7f99a	Compaction filter on merge operands Summary: Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit. The behavior in this diff is a bit simpler than in D42555: * only merge operators are passed through FilterMergeValue(). If fitler function returns true, the merge operator is ignored * compaction filter is not called on: 1) results of merge operations and 2) base values that are getting merged with merge operands (the second case was also true in previous diff) Do we also need a compaction filter to get called on merge results? Test Plan: make && make check Reviewers: lovro, tnovak, rven, yhchiang, sdong Reviewed By: sdong Subscribers: noetzli, kolmike, leveldb, dhruba, sdong Differential Revision: https://reviews.facebook.net/D47847	2015-10-07 09:30:03 -07:00
Ari Ekmekji	5ba3297d0d	Add compaction time to log output Summary: Although compaction time is recorded in the statistics, it is helpful to include this value in the log output corresponding to the end of compaction. Test Plan: make all && make check Reviewers: yhchiang, sdong, igor, noetzli, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D47007	2015-09-15 17:11:44 -07:00
Ari Ekmekji	03ddce9a01	Add counters for L0 stall while L0-L1 compaction is taking place Summary: Although there are currently counters to keep track of the stall caused by having too many L0 files, there is no distinction as to whether when that stall occurs either (A) L0-L1 compaction is taking place to try and mitigate it, or (B) no L0-L1 compaction has been scheduled at the moment. This diff adds a counter for (A) so that the nature of L0 stalls can be better understood. Test Plan: make all && make check Reviewers: sdong, igor, anthony, noetzli, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba Differential Revision: https://reviews.facebook.net/D46749	2015-09-14 11:03:37 -07:00
Andres Noetzli	8aa1f15197	Refactored common code of Builder/CompactionJob out into a CompactionIterator Summary: Builder and CompactionJob share a lot of fairly complex code. This patch refactors this code into a separate class, the CompactionIterator. Because the shared code is fairly complex, this patch hopefully improves maintainability. While there are is a lot of potential for further improvements, the patch is intentionally pretty close to the original structure because the change is already complex enough. Test Plan: make clean all check && ./db_stress Reviewers: rven, anthony, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46197	2015-09-10 14:35:25 -07:00
Ari Ekmekji	3c37b3cccd	Determine boundaries of subcompactions Summary: Up to this point, the subcompactions that make up a compaction job have been divided based on the key range of the L1 files, and each subcompaction has handled the key range of only one file. However DBOption.max_subcompactions allows the user to designate how many subcompactions at most to perform. This patch updates the CompactionJob::GetSubcompactionBoundaries() to determine these divisions accordingly based on that option and other input/system factors. The current approach orders the starting and/or ending keys of certain compaction input files and then generates a histogram to approximate the size covered by the key range between each consecutive pair of keys. Then it groups these ranges into groups so that the sizes are approximately equal to one another. The approach has also been adapted to work for universal compaction as well instead of just for level-based compaction as it was before. These subcompactions are then executed in parallel by locally spawning threads, one for each. The results are then aggregated and the compaction completed. Test Plan: make all && make check Reviewers: yhchiang, anthony, igor, noetzli, sdong Reviewed By: sdong Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43269	2015-09-10 13:50:00 -07:00
Andres Noetzli	6bdc484fd8	Added Equal method to Comparator interface Summary: In some cases, equality comparisons can be done more efficiently than three-way comparisons. There are quite a few places in the code where we only care about equality. This patch adds an Equal() method that defaults to using the Compare() method. Test Plan: make clean all check Reviewers: rven, anthony, yhchiang, igor, sdong Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46233	2015-09-08 15:30:49 -07:00
Ari Ekmekji	2f8d71ec05	Moving sequence number compaction variables from SubCompactionState to CompactionJob Summary: It was pointed out to me that the members of SubCompactionState 'earliest_snapshot', 'latest_snapshot' and 'visible_at_tip' are never modified by the subcompactions, so they can stay as global varaibles instead to make things simpler. Test Plan: make all && make check Reviewers: sdong, igor, noetzli, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45477	2015-08-25 14:03:10 -07:00
Ari Ekmekji	b6def58f73	Changed 'num_subcompactions' to the more accurate 'max_subcompactions' Summary: Up until this point we had DbOptions.num_subcompactions, but it is semantically more correct to call this max_subcompactions since we will schedule up to DbOptions.max_subcompactions smaller compactions at a time during a compaction job. I also added a --subcompactions option to db_bench Test Plan: make all make check Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45069	2015-08-21 14:25:34 -07:00
sdong	c852968465	db_iter_test: add more test cases for the data race bug Summary: Add more test cases of data race causing wrong iterating results. Tag tests not passing as DISABLED_ Test Plan: Run the tests Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang Reviewed By: yhchiang Subscribers: tnovak, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44907	2015-08-21 12:14:12 -07:00
Dmitri Smirnov	5bf8907622	More indent adjustment.	2015-08-20 14:14:02 -07:00
Dmitri Smirnov	e2a9f43d64	Adjust indent	2015-08-20 14:10:51 -07:00
Dmitri Smirnov	1cac89c9b1	Address windows build issues Intro SubCompactionState move functionality =delete copy functionality #ifdef SyncPoint in tests for Windows Release builds	2015-08-20 14:08:24 -07:00
Ari Ekmekji	137c376675	Removing variables used only in assertions to prevent build error Summary: A couple variables were declared but only used in assertions which causes issues when building in fbcode. Test Plan: make dbg and make release Reviewers: yhchiang, sdong, igor, anthony, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44937	2015-08-19 08:52:22 -07:00
Ari Ekmekji	b47cc58516	Bounding Number of Subcompactions Summary: In D43239 (https://reviews.facebook.net/D43239) the number of subcompactions is set based on the number of L1 files with unique starting keys. In certain cases when this number is very large this causes issues, particularly with the overlap between files since very small output files can be generated. This diff bounds the number of subcompactions to the user option DBOption.num_subcompactions. Test Plan: ./db_test ./db_compaction_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44883	2015-08-18 14:56:31 -07:00
Ari Ekmekji	601b1aaca0	Fixing Failed Assertion in Subcompaction State Diff Summary: In D43239 (https://reviews.facebook.net/D43239) there is an assertion to make sure a subcompaction's output is never empty at the end of execution. This assertion however breaks the build because some tests lead to exactly that scenario. So instead I have altered the logic to handle this case instead of just failing the assertion. The reason that it is possible for a subcompaction's output to be empty is that during a sequential execution of subcompactions, if a user aborts the compaction job then some of the later subcompactions to be executed may have yet to process any keys and therefore have yet to generate output files. This becomes very rare once the subcompactions are executed in parallel, but for now they are still sequential so the case is possible when there is an early termination, as in some of the tests. Test Plan: ./db_test ./db_compaction_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44877	2015-08-18 12:27:12 -07:00
Ari Ekmekji	f0da6977a3	[Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State Summary: In prepration for running multiple threads at the same time during a compaction job, this patch assigns each subcompaction its own state (instead of sharing the one global CompactionState). Each subcompaction then uses this state to update its statistics, keep track of its snapshots, etc. during the course of execution. Then at the end of all the executions the statistics are aggregated across the subcompactions so that the final result is the same as if only one larger compaction had run. Test Plan: ./db_test ./db_compaction_test ./compaction_job_test Reviewers: sdong, anthony, igor, noetzli, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43239	2015-08-18 11:06:23 -07:00
Andres Notzli	f32a572099	Simplify querying of merge results Summary: While working on supporting mixing merge operators with single deletes ( https://reviews.facebook.net/D43179 ), I realized that returning and dealing with merge results can be made simpler. Submitting this as a separate diff because it is not directly related to single deletes. Before, callers of merge helper had to retrieve the merge result in one of two ways depending on whether the merge was successful or not (success = result of merge was single kTypeValue). For successful merges, the caller could query the resulting key/value pair and for unsuccessful merges, the result could be retrieved in the form of two deques of keys and values. However, with single deletes, a successful merge does not return a single key/value pair (if merge operands are merged with a single delete, we have to generate a value and keep the original single delete around to make sure that we are not accidentially producing a key overwrite). In addition, the two existing call sites of the merge helper were taking the same actions independently from whether the merge was successful or not, so this patch simplifies that. Test Plan: make clean all check Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43353	2015-08-17 17:34:38 -07:00
sdong	72613657f0	Measure file read latency histogram per level Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled. Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44193	2015-08-14 17:32:42 -07:00
sdong	603b6da8b8	Add options.compaction_measure_io_stats to print write I/O stats in compactions Summary: Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs: 2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]} Add two more counters in iostats_context. Also add a parameter of db_bench. Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44115	2015-08-13 16:52:26 -07:00
Andres Notzli	68f934355a	Better CompactionJob testing Summary: Changed compaction_job_test to support better/more thorough tests and added two tests. Also changed MockFileContents to order using InternalKeyComparator. Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check Reviewers: sdong, rven, igor, yhchiang, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42837	2015-08-07 21:59:51 -07:00
sdong	3ae386eafe	Add statistic histogram "rocksdb.sst.read.micros" Summary: Measure read latency histogram and put in statistics. Compaction inputs are excluded from it when possible (unfortunately usually no possible as we usually take table reader from table cache. Test Plan: Run db_bench and it shows the stats, like: rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180 Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43275	2015-08-05 13:02:33 -07:00
Ari Ekmekji	40c64434d4	Parallelize L0-L1 Compaction: Restructure Compaction Job Summary: As of now compactions involving files from Level 0 and Level 1 are single threaded because the files in L0, although sorted, are not range partitioned like the other levels. This means that during L0-L1 compaction each file from L1 needs to be merged with potentially all the files from L0. This attempt to parallelize the L0-L1 compaction assigns a thread and a corresponding iterator to each L1 file that then considers only the key range found in that L1 file and only the L0 files that have those keys (and only the specific portion of those L0 files in which those keys are found). In this way the overlap is minimized and potentially eliminated between different iterators focusing on the same files. The first step is to restructure the compaction logic to break L0-L1 compactions into multiple, smaller, sequential compactions. Eventually each of these smaller jobs will be run simultaneously. Areas to pay extra attention to are # Correct aggregation of compaction job statistics across multiple threads # Proper opening/closing of output files (make sure each thread's is unique) # Keys that span multiple L1 files # Skewed distributions of keys within L0 files Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test Reviewers: igor, noetzli, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42699	2015-08-03 11:32:14 -07:00
Andres Notzli	d06c82e477	Further cleanup of CompactionJob and MergeHelper Summary: Simplified logic in CompactionJob and removed unused parameter in MergeHelper. Test Plan: make && make check Reviewers: rven, igor, sdong, yhchiang Reviewed By: sdong Subscribers: aekmekji, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42687	2015-07-28 19:21:55 -07:00
Andres Notzli	e95c59cd2f	Count number of corrupt keys during compaction Summary: For task #7771355, we would like to log the number of corrupt keys during a compaction. This patch implements and tests the count as part of CompactionJobStats. Test Plan: make && make check Reviewers: rven, igor, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42921	2015-07-28 16:41:40 -07:00
sdong	6e9fbeb27c	Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future. Test Plan: Run all existing unit tests. Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42321	2015-07-17 16:58:18 -07:00
Igor Canadi	35ca59364c	Don't let flushes preempt compactions Summary: When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler. We have a special case that when you set max_background_flushes to 0, we schedule the flush to execute on the compaction thread. Test Plan: make check (there might be some unit tests that depend on this behavior) Reviewers: IslamAbdelRahman, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41931	2015-07-17 12:02:52 -07:00
Igor Canadi	a96fcd09b7	Deprecate CompactionFilterV2 Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature. Test Plan: make check Reviewers: noetzli, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42405	2015-07-17 18:59:11 +02:00
Andres Notzli	6b2d44b2ff	Refactoring of writing key/value pairs Summary: Before, writing key/value pairs out to files was done inside ProcessKeyValueCompaction(). To make ProcessKeyValueCompaction() more understandable, this patch moves the writing part to a separate function. This is intended to be a stepping stone for additional changes. Test Plan: make && make check Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42243	2015-07-15 09:55:45 -07:00
Igor Canadi	9a6a0bd8c9	Style fix in compaction_job.cc Summary: I didn't know this can work :) Test Plan: none Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42081	2015-07-15 09:36:47 +02:00
Andres Notzli	ddad40e930	Fixed nullptr deref and added assert Summary: Fixes two minor issues in CompactionJob. CompactionJob::Run() dereferences log_buffer_ without a check, so this patch adds an assert in the constructor where log_buffer_ is assigned. compaction_job_stats_ can be null but ProcessKeyValueCompaction was dereferencing it without a check. Test Plan: make && make check Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42231	2015-07-14 23:12:34 -07:00
Andres Notzli	ae29495e4b	Avoid manipulating const char* arrays Summary: We were manipulating `const char*` arrays in CompactionJob to change the sequence number/types of keys. This patch changes UpdateInternalKey() to use string methods to do the manipulation and updates all calls accordingly. Test Plan: Added test case for UpdateInternalKey() in dbformat_test. make && make check Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41985	2015-07-14 00:21:41 -07:00
Andres Notzli	ab137af4ba	Partial cleanup of CompactionJob Summary: Logging, dealing with key prefix batches and updating stats moved from CompactionJob::Run() into separate functions. Test Plan: make all && make check Reviewers: sdong, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41919	2015-07-14 00:09:20 -07:00
Ari Ekmekji	8bca83e5dd	Add tombstone information in CompactionJobStats Summary: Added new statistics in CompactionJobStats to keep track of deletion entries and the expiration of those entries. Updated these fields in compaction_job.cc as compaction took place and wrote a new test in compaction_job_stats_test.cc to verify accuracy. Test Plan: Wrote new test DeletionStatsTest in compaction_job_stats_test.cc to verify Reviewers: sdong, igor, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41355	2015-07-13 15:51:38 -07:00
sdong	f9728640f3	"make format" against last 10 commits Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more. Test Plan: Build it. Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41961	2015-07-13 13:50:18 -07:00
Poornima Chozhiyath Raman	4bed00a44b	Fix function name format according to google style Summary: Change the naming style of getter and setters according to Google C++ style in compaction.h file Test Plan: Compilation success Reviewers: sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41265	2015-07-08 15:21:10 -07:00
Yueh-Hsuan Chiang	bb1c74ce18	Fixed a bug of CompactionStats in multi-level universal compaction case Summary: Universal compaction can involves in multiple levels. However, the current implementation of bytes_readn and bytes_readnp1 (and some other stats with postfix `n` and `np1`) assumes compaction can only have two levels. This patch fixes this bug and redefines bytes_readn and bytes_readnp1: * bytes_readnp1: the number of bytes read in the compaction output level. * bytes_readn: the total number of bytes read minus bytes_readnp1 Test Plan: Add a test in compaction_job_stats_test Reviewers: igor, sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40239	2015-06-17 23:40:34 -07:00
sdong	75d7075a8a	Print info message about files need compaction for debuging purpose Summary: When there are files marked for compaction after compactions, print extra messages to help debugging. Example: 2015/06/08-23:12:55.212855 7ff5013ff700 [default] [JOB 121] Generated table #75: 54 keys, 4807 bytes (need compaction) 2015/06/08-23:12:55.556194 7ff5013ff700 (Original Log Time 2015/06/08-23:12:55.556160) [default] compacted to: base level 1 max bytes base 10240 files[0 1 9 32 12 0 0 0] max score 0.96 (2 files need compaction), MB/sec: 0.0 rd, 0.1 wr, level 2, files in(1, 3) out(5) MB in(0.0, 0.0) out(0.0), read-write-amplify(11.3) write-amplify(5.7) OK, records in: 40, records dropped: 0 Test Plan: Run test and see LOG files. valgrind test DBTest.TablePropertiesNeedCompactTest Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, igor Reviewed By: igor Subscribers: yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39771	2015-06-09 11:23:29 -07:00
sdong	6df589b446	Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files Summary: It is experimental. Allow users to return from a call back function TablePropertiesCollector::NeedCompact(), based on the data in the file. It can be used to allow users to suggest DB to clear up delete tombstones faster. Test Plan: Add a unit test. Reviewers: igor, yhchiang, kradhakrishnan, rven Reviewed By: rven Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39585	2015-06-05 20:18:21 -07:00
Islam AbdelRahman	3ce3bb3da2	Allowing L0 -> L1 trivial move on sorted data Summary: This diff updates the logic of how we do trivial move, now trivial move can run on any number of files in input level as long as they are not overlapping The conditions for trivial move have been updated Introduced conditions: - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual) - Input level files cannot be overlapping Removed conditions: - Trivial move only run when the compaction is not manual - Input level should can contain only 1 file More context on what tests failed because of Trivial move ``` DBTest.CompactionsGenerateMultipleFiles This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1 ``` ``` DBTest.NoSpaceCompactRange This test expect compaction to fail when we force environment to report running out of space, of course this is not valid in trivial move situation because trivial move does not need any extra space, and did not check for that ``` ``` DBTest.DropWrites Similar to DBTest.NoSpaceCompactRange ``` ``` DBTest.DeleteObsoleteFilesPendingOutputs This test expect that a file in L2 is deleted after it's moved to L3, this is not valid with trivial move because although the file was moved it is now used by L3 ``` ``` CuckooTableDBTest.CompactionIntoMultipleFiles Same as DBTest.CompactionsGenerateMultipleFiles ``` This diff is based on a work by @sdong https://reviews.facebook.net/D34149 Test Plan: make -j64 check Reviewers: rven, sdong, igor Reviewed By: igor Subscribers: yhchiang, ott, march, dhruba, sdong Differential Revision: https://reviews.facebook.net/D34797	2015-06-04 16:51:25 -07:00
Yueh-Hsuan Chiang	bb808eaddb	Changed the CompactionJobStats::output_key_prefix type from char[] to string. Summary: Keys in RocksDB can be arbitrary byte strings. However, in the current CompactionJobStats, smallest_output_key_prefix and largest_output_key_prefix are of type char[] without having a length, which is insufficient to handle non-null terminated strings. This patch change their type to std::string. Test Plan: compaction_job_stats_test Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39537	2015-06-04 12:31:12 -07:00
Yueh-Hsuan Chiang	fe5c6321cb	Allow EventListener::OnCompactionCompleted to return CompactionJobStats. Summary: Allow EventListener::OnCompactionCompleted to return CompactionJobStats, which contains useful information about a compaction. Example CompactionJobStats returned by OnCompactionCompleted(): smallest_output_key_prefix 05000000 largest_output_key_prefix 06990000 elapsed_time 42419 num_input_records 300 num_input_files 3 num_input_files_at_output_level 2 num_output_records 200 num_output_files 1 actual_bytes_input 167200 actual_bytes_output 110688 total_input_raw_key_bytes 5400 total_input_raw_value_bytes 300000 num_records_replaced 100 is_manual_compaction 1 Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats. Reviewers: rven, igor, anthony, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38463	2015-06-02 17:07:16 -07:00
Yueh-Hsuan Chiang	fc83821270	Add EventListener::OnTableFileCreated() Summary: Add EventListener::OnTableFileCreated(), which will be called when a table file is created. This patch is part of the EventLogger and EventListener integration. Test Plan: Augment existing test in db/listener_test.cc Reviewers: anthony, kradhakrishnan, rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38865	2015-06-02 14:12:23 -07:00
Yueh-Hsuan Chiang	ec4ff4e99c	Rename EventLoggerHelpers EventHelpers Summary: Rename EventLoggerHelpers EventHelpers, as it's going to include all event-related helper functions instead of EventLogger only stuffs. Test Plan: make Reviewers: sdong, rven, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39093	2015-05-28 13:37:47 -07:00
Igor Canadi	dbd95b7532	Add more table properties to EventLogger Summary: Example output: {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}} Also fixed a bug where BuildTable() in recovery was passing Env::IOHigh argument into paranoid_checks_file parameter. Test Plan: make check + check out the output in the log Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38343	2015-05-12 15:53:55 -07:00
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes has been read currently. 6. the number of bytes has been written currently. Flush operation properties will be done in a seperate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	2015-05-06 22:51:06 -07:00
Igor Canadi	65fe1cfbb3	Cleanup CompactionJob Summary: Couple changes: 1. instead of SnapshotList, just take a vector of snapshots 2. don't take a separate parameter is_snapshots_supported. If there are snapshots in the list, that means they are supported. I actually think we should get rid of this notion of snapshots not being supported. 3. don't pass in mutable_cf_options as a parameter. Lifetime of mutable_cf_options is a bit tricky to maintain, so it's better to not pass it in for the whole compaction job. We only really need it when we install the compaction results. Test Plan: make check Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36627	2015-05-05 19:01:12 -07:00
Igor Canadi	1bb4928da9	Include bunch of more events into EventLogger Summary: Added these events: * Recovery start, finish and also when recovery creates a file * Trivial move * Compaction start, finish and when compaction creates a file * Flush start, finish Also includes small fix to EventLogger Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files. Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff. TODO: * Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec * Write tools that parse and analyze LOGs. This can be in python or go. Good intern task. Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976 Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37521	2015-04-27 15:20:02 -07:00
sdong	397b6588bd	options.paranoid_file_checks to read all rows after writing to a file. Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows. Test Plan: Add a new unit test for it Reviewers: rven, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37335	2015-04-23 11:34:35 -07:00
Igor Canadi	47b8743984	Make Compaction class easier to use Summary: The goal of this diff is to make Compaction class easier to use. This should also make new compaction algorithms easier to write (like CompactFiles from @yhchiang and dynamic leveled and multi-leveled universal from @sdong). Here are couple of things demonstrating that Compaction class is hard to use: 1. we have two constructors of Compaction class 2. there's this thing called grandparents_, but it appears to only be setup for leveled compaction and not compactfiles 3. it's easy to introduce a subtle and dangerous bug like this: D36225 4. SetupBottomMostLevel() is hard to understand and it shouldn't be. See this comment: `afbafeaeae/db/compaction.cc (L236-L241)`. It also made it harder for @yhchiang to write CompactFiles, as evidenced by this: `afbafeaeae/db/compaction_picker.cc (L204-L210)` The problem is that we create Compaction object, which holds a lot of state, and then pass it around to some functions. After those functions are done mutating, then we call couple of functions on Compaction object, like SetupBottommostLevel() and MarkFilesBeingCompacted(). It is very hard to see what's happening with all that Compaction's state while it's travelling across different functions. If you're writing a new PickCompaction() function you need to try really hard to understand what are all the functions you need to run on Compaction object and what state you need to setup. My proposed solution is to make important parts of Compaction immutable after construction. PickCompaction() should calculate compaction inputs and then pass them onto Compaction object once they are finalized. That makes it easy to create a new compaction -- just provide all the parameters to the constructor and you're done. No need to call confusing functions after you created your object. This diff doesn't fully achieve that goal, but it comes pretty close. Here are some of the changes: * have one Compaction constructor instead of two. * inputs_ is constant after construction * MarkFilesBeingCompacted() is now private to Compaction class and automatically called on construction/destruction. * SetupBottommostLevel() is gone. Compaction figures it out on its own based on the input. * CompactionPicker's functions are not passing around Compaction object anymore. They are only passing around the state that they need. Test Plan: make check make asan_check make valgrind_check Reviewers: rven, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: sdong, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36687	2015-04-10 15:01:54 -07:00
sdong	953a885ebf	A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge Summary: Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it. Also refactor the codes so that (1) make table property collector and internal table property collector two separate data structures with the later one now exposed (2) table builders only receive internal table properties Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector. Reviewers: yhchiang, igor.sugak, rven, igor Reviewed By: rven, igor Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D35373	2015-04-06 10:27:21 -07:00

1 2

74 Commits