rocksdb

Author	SHA1	Message	Date
Igor Canadi	2dc3910b5e	Add --benchmark_write_rate_limit option to db_bench Summary: So far, we benchmarked RocksDB by writing as fast as possible. With this change, we're able to limit our write throughput, which should help us better understand how RocksDB performes under varying write workloads. Specifically, I'm currently interested in the shape of the graph that has write throughput on one axis and write rate on another. This should help us with designing our stall system, as we have started to do with D36351. Test Plan: $ ./db_bench --benchmarks=fillrandom --benchmark_write_rate_limit=1000000 fillrandom : 118.523 micros/op 8437 ops/sec; 0.9 MB/s $ ./db_bench --benchmarks=fillrandom --benchmark_write_rate_limit=2000000 fillrandom : 59.136 micros/op 16910 ops/sec; 1.9 MB/s Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39759	2015-06-17 16:44:52 -07:00
Islam AbdelRahman	12e030a992	Use CompactRangeOptions for CompactRange Summary: This diff update DB::CompactRange to use RangeCompactionOptions instead of using multiple parameters Old CompactRange is still available but deprecated Test Plan: make all check make rocksdbjava USE_CLANG=1 make all OPT=-DROCKSDB_LITE make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40209	2015-06-17 14:36:14 -07:00
Igor Canadi	4716ab4d16	Merge pull request #638 from HolodovAlexander/master C api: human-readable statistics	2015-06-17 13:16:20 -07:00
Igor Canadi	25d600569d	Clean up InstallSuperVersion Summary: We go to great lengths to make sure MaybeScheduleFlushOrCompaction() is called outside of write thread. But anyway, it's still called in the mutex, so it's not that much cheaper. This diff removes the "optimization" and cleans up the code a bit. Test Plan: make check Reviewers: rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40113	2015-06-17 12:37:59 -07:00
Yueh-Hsuan Chiang	1a08d0beb5	Block c_test in ROCKSDB_LITE Summary: Block c_test in ROCKSDB_LITE as it's not supported in ROCKSDB_LITE. Test Plan: c_test Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40257	2015-06-17 10:54:51 -07:00
sdong	40f562e747	Allow GetApproximateSize() to include mem table size if it is skip list memtable Summary: Add an option in GetApproximateSize() so that the result will include estimated sizes in mem tables. To implement it, implement an estimated count from the beginning to a key in skip list. The approach is to count to find the entry, how many Next() is issued from each level, and sum them with a weight that is <branching factor> ^ <level>. Test Plan: Add a test case Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40119	2015-06-16 18:13:23 -07:00
Igor Canadi	d59d90bb1f	db_bench periodically writes QPS to CSV file Summary: This is part of an effort to better understand and optimize RocksDB stalls under high load. I added a feature to db_bench to periodically write QPS to CSV files. That way we can nicely see how our QPS changes in time (especially when DB is stalled) and can do a better job of evaluating our stall system (i.e. we want the QPS to be as constant as possible, as opposed to having bunch of stalls) Cool part of CSV files is that we can easily graph them -- there are a bunch of tools available. Test Plan: Ran ./db_bench --report_interval_seconds=10 --benchmarks=fillrandom --num=10000000 and observed this in report.csv: secs_elapsed,interval_qps 10,2725860 20,1980480 30,1863456 40,1454359 50,1460389 Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40047	2015-06-12 14:31:53 -07:00
Yueh-Hsuan Chiang	5fec963877	Fixed false alarm of size comparison in compaction_job_stats_test Summary: Fixed false alarm of size comparison in compaction_job_stats_test Test Plan: compaction_job_stats_test Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39921	2015-06-12 10:44:07 -07:00
Islam AbdelRahman	cccd2199a6	Revert skip bottommost compaction Summary: Reverting this diff https://reviews.facebook.net/D39999 Will add an option to force bottom most level compaction and then re submit it Test Plan: make check Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40041	2015-06-12 10:43:33 -07:00
Islam AbdelRahman	20f2b54252	Skip bottom most level compaction if no compaction filter Summary: If we don't have a compaction filter then we can skip compacting the bottom most level Test Plan: make check added unit tests Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39999	2015-06-12 09:56:08 -07:00
sdong	7842920be5	Slow down writes by bytes written Summary: We slow down data into the database to the rate of options.delayed_write_rate (a new option) with this patch. The thread synchronization approach I take is to still synchronize write controller by DB mutex and GetDelay() is inside DB mutex. Try to minimize the frequency of getting time in GetDelay(). I verified it through db_bench and it seems to work hard_rate_limit is deprecated. options.delayed_write_rate is still not dynamically changeable. Need to work on it as a follow-up. Test Plan: Add new unit tests in db_test Reviewers: yhchiang, rven, kradhakrishnan, anthony, MarkCallaghan, igor Reviewed By: igor Subscribers: ikabiljo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36351	2015-06-11 20:42:18 -07:00
Igor Canadi	a84df655f3	Don't let two L0->L1 compactions run in parallel Summary: With experimental feature SuggestCompactRange() we don't restrict running two L0->L1 compactions in parallel. This diff fixes this. Test Plan: added a unit test to reproduce the failure. fixed the unit test Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39981	2015-06-11 15:42:16 -07:00
Islam AbdelRahman	d6ce0f7c61	Add largest sequence to FlushJobInfo Summary: Adding largest sequence number to FlushJobInfo and passing flushed file metadata to NotifyOnFlushCompleted which include alot of other values that we may want to expose in FlushJobInfo Test Plan: make check Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39927	2015-06-11 15:22:22 -07:00
Yueh-Hsuan Chiang	3eddd1abe9	Add Env::GetThreadID(), which returns the ID of the current thread. Summary: Add Env::GetThreadID(), which returns the ID of the current thread. In addition, make GetThreadList() and InfoLog use same unique ID for the same thread. Test Plan: db_test listener_test Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39735	2015-06-11 14:18:02 -07:00
Islam AbdelRahman	73faa3d41d	Handling edge cases for ReFitLevel Summary: Right now the level we pass to ReFitLevel is the maximum level with files (before compaction), there are multiple cases where this maximum level have changed after compaction - all files where in L0 (now maximum level is L1) - using kCompactionStyleUniversal (now maximum level in the last level) - level_compaction_dynamic_level_bytes ?? We can handle each of these cases individually, but I felt it's safer to calculate max_level_with_files again if we want to do a ReFitLevel Test Plan: adding some tests make -j64 check Reviewers: igor, sdong Reviewed By: sdong Subscribers: ott, dhruba Differential Revision: https://reviews.facebook.net/D39663	2015-06-11 14:15:52 -07:00
Reed Allman	735df66552	C: add WriteBatch.PutLogData support	2015-06-10 00:12:33 -07:00
sdong	e409d3d745	Make "make all" work for CYGWIN Summary: Some test and benchmark codes don't build for CYGWIN. Fix it. Test Plan: Build "make all" with TARGET_OS=Cygwin on cygwin and make sure it passes. Reviewers: rven, yhchiang, anthony, igor, kradhakrishnan Reviewed By: igor, kradhakrishnan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39711	2015-06-09 16:36:07 -07:00
sdong	75d7075a8a	Print info message about files need compaction for debuging purpose Summary: When there are files marked for compaction after compactions, print extra messages to help debugging. Example: 2015/06/08-23:12:55.212855 7ff5013ff700 [default] [JOB 121] Generated table #75: 54 keys, 4807 bytes (need compaction) 2015/06/08-23:12:55.556194 7ff5013ff700 (Original Log Time 2015/06/08-23:12:55.556160) [default] compacted to: base level 1 max bytes base 10240 files[0 1 9 32 12 0 0 0] max score 0.96 (2 files need compaction), MB/sec: 0.0 rd, 0.1 wr, level 2, files in(1, 3) out(5) MB in(0.0, 0.0) out(0.0), read-write-amplify(11.3) write-amplify(5.7) OK, records in: 40, records dropped: 0 Test Plan: Run test and see LOG files. valgrind test DBTest.TablePropertiesNeedCompactTest Reviewers: rven, yhchiang, kradhakrishnan, IslamAbdelRahman, igor Reviewed By: igor Subscribers: yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39771	2015-06-09 11:23:29 -07:00
Venkatesh Radhakrishnan	406a5682eb	Fix hang when closing a DB after doing loads with WAL disabled. Summary: There is a hang during DB close in the following scenario: a) a load with WAL disabled was done, b) CancelAllBackgroundWork was called, c) DB Close was called This was because in that we will wait for a flush but we cannot do a background flush because we have called CancelAllBackgroundWork which marks the DB as shutting downn. Test Plan: Added DBTest FlushOnDestroy Reviewers: sdong Reviewed By: sdong Subscribers: yoshinorim, hermanlee4, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39747	2015-06-09 10:39:49 -07:00
sdong	d8c8f08c12	GetSnapshot() and ReleaseSnapshot() to move new and free out of DB mutex Summary: We currently issue malloc and free inside DB mutex in GetSnapshot() and ReleaseSnapshot(). Move them out. Test Plan: Go through all tests make valgrind_check Reviewers: yhchiang, rven, IslamAbdelRahman, anthony, igor Reviewed By: igor Subscribers: maykov, hermanlee4, MarkCallaghan, yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39753	2015-06-08 21:57:02 -07:00
Islam AbdelRahman	643bbbf081	Use nullptr for default compaction_filter_factory Summary: Replacing the default value for compaction_filter_factory and compaction_filter_factory_v2 to be nullptr instead of DefaultCompactionFilterFactory / DefaultCompactionFilterFactoryV2 The reason for this is to be able to determine easily if we have compaction filter factory or not without depending on RTTI Test Plan: make check Reviewers: yoshinorim, ott, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39693	2015-06-08 16:34:26 -07:00
Igor Canadi	f02ce0c651	Fix ASAN errors in c_test Summary: key_sizes claims that 3rd key is of length 8, but it's really only 3. This diff makes it length 8. Test Plan: asan c_test works again. Reviewers: sdong, yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39699	2015-06-08 11:28:40 -07:00
Igor Canadi	133130a4f5	Merge pull request #625 from rdallman/c-slice-parts-support C: add support for WriteBatch SliceParts params	2015-06-08 13:14:44 -04:00
Igor Canadi	de4d172d0f	Merge pull request #622 from rdallman/c-multiget C: add MultiGet support	2015-06-08 13:13:54 -04:00
sdong	6df589b446	Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files Summary: It is experimental. Allow users to return from a call back function TablePropertiesCollector::NeedCompact(), based on the data in the file. It can be used to allow users to suggest DB to clear up delete tombstones faster. Test Plan: Add a unit test. Reviewers: igor, yhchiang, kradhakrishnan, rven Reviewed By: rven Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39585	2015-06-05 20:18:21 -07:00
Yueh-Hsuan Chiang	2e764f06ea	[API Change] Improve EventListener::OnFlushCompleted interface Summary: EventListener::OnFlushCompleted() now passes a structure instead of a list of parameters. This minimizes the API change in the future. Test Plan: listener_test compact_files_test example/compact_files_example Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor Reviewed By: rven, igor Subscribers: IslamAbdelRahman, rven, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39543	2015-06-05 12:28:51 -07:00
Yueh-Hsuan Chiang	7322c74012	Revert incorrect commit Summary: Revert incorrect commit Test Plan: db_test Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39651	2015-06-05 11:23:09 -07:00
Islam AbdelRahman	31e60e2a77	Unlock mutex in ReFitLevel Summary: I encountered an issue where the database hang, it looks like the mutex is not unlocked on return in ReFitLevel function Test Plan: make -j64 check Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39609	2015-06-05 11:06:14 -07:00
Yueh-Hsuan Chiang	7647df8f9e	Fixed the tsan failure in util/compaction_job_stats_impl.cc Summary: The type of smallest_output_key_prefix and largest_output_key_prefix have been changed to std::string in https://reviews.facebook.net/D39537. As a result, we shouldn't do smallest_output_key_prefix[0] = 0 in the initialization. Test Plan: compile db_test with tsan enabled and repeat DBTest.CompactionDeletionTrigger test to verify the tsan issue has been gone. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39645	2015-06-05 11:05:35 -07:00
Igor Canadi	b2785472c8	Fix compile Summary: This commit broke the compile: `3ce3bb3da2` As evidenced here: https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1404_rocksdb_compile_ce2b1d11d42de93f7b375f7e6c41fb709f66e969_15_06_04_23_09_36 This should fix it Test Plan: make check Reviewers: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39627	2015-06-05 09:41:45 -04:00
Islam AbdelRahman	3ce3bb3da2	Allowing L0 -> L1 trivial move on sorted data Summary: This diff updates the logic of how we do trivial move, now trivial move can run on any number of files in input level as long as they are not overlapping The conditions for trivial move have been updated Introduced conditions: - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual) - Input level files cannot be overlapping Removed conditions: - Trivial move only run when the compaction is not manual - Input level should can contain only 1 file More context on what tests failed because of Trivial move ``` DBTest.CompactionsGenerateMultipleFiles This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1 ``` ``` DBTest.NoSpaceCompactRange This test expect compaction to fail when we force environment to report running out of space, of course this is not valid in trivial move situation because trivial move does not need any extra space, and did not check for that ``` ``` DBTest.DropWrites Similar to DBTest.NoSpaceCompactRange ``` ``` DBTest.DeleteObsoleteFilesPendingOutputs This test expect that a file in L2 is deleted after it's moved to L3, this is not valid with trivial move because although the file was moved it is now used by L3 ``` ``` CuckooTableDBTest.CompactionIntoMultipleFiles Same as DBTest.CompactionsGenerateMultipleFiles ``` This diff is based on a work by @sdong https://reviews.facebook.net/D34149 Test Plan: make -j64 check Reviewers: rven, sdong, igor Reviewed By: igor Subscribers: yhchiang, ott, march, dhruba, sdong Differential Revision: https://reviews.facebook.net/D34797	2015-06-04 16:51:25 -07:00
Yueh-Hsuan Chiang	bb808eaddb	Changed the CompactionJobStats::output_key_prefix type from char[] to string. Summary: Keys in RocksDB can be arbitrary byte strings. However, in the current CompactionJobStats, smallest_output_key_prefix and largest_output_key_prefix are of type char[] without having a length, which is insufficient to handle non-null terminated strings. This patch change their type to std::string. Test Plan: compaction_job_stats_test Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39537	2015-06-04 12:31:12 -07:00
Yueh-Hsuan Chiang	0b3172d071	Add EventListener::OnTableFileDeletion() Summary: Add EventListener::OnTableFileDeletion(), which will be called when a table file is deleted. Test Plan: Extend three existing tests in db_test to verify the deleted files. Reviewers: rven, anthony, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38931	2015-06-03 19:57:01 -07:00
Reed Allman	211a195d41	C: add MultiGet support	2015-06-03 17:57:42 -07:00
Reed Allman	5dc174e11a	C: add support for WriteBatch SliceParts params	2015-06-03 17:08:00 -07:00
Igor Canadi	2d0b9e5f0a	Fix compile on darwin Summary: We need to start doing some CI on Macs. Test Plan: works now Reviewers: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39489	2015-06-03 16:52:51 -04:00
sdong	3af668ed17	Fix DBTest.MigrateToDynamicLevelMaxBytesBase slowness with valgrind Summary: DBTest.MigrateToDynamicLevelMaxBytesBase with valgrind test is extremely slow. Work it around by not having both threads running everything non-stop. Test Plan: Run the test with valgrind which used to take too long to finish and see it finish in reasonable time. Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39477	2015-06-03 12:08:37 -07:00
Igor Canadi	408cc4b8e0	Revert "Merge pull request #621 from rdallman/c-slice-parts-support" This reverts commit `78382d4ba7`, reversing changes made to `ca8b85ac04`.	2015-06-03 13:34:07 -04:00
Igor Canadi	78382d4ba7	Merge pull request #621 from rdallman/c-slice-parts-support C: add support for WriteBatch SliceParts params	2015-06-03 13:16:14 -04:00
Yueh-Hsuan Chiang	0483dab2ab	Remove a TODO that has been done Summary: Remove a TODO that has been done Test Plan: make Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39429	2015-06-02 18:38:57 -07:00
Yueh-Hsuan Chiang	8afafc2783	Fix compile warning in db/db_impl Summary: Fix the following compile warning in db/db_impl db/db_impl.cc:1603:19: error: implicit conversion loses integer precision: 'const uint64_t' (aka 'const unsigned long') to 'int' [-Werror,-Wshorten-64-to-32] info.job_id = job_id; ~ ^~~~~~ Test Plan: db_test Reviewers: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39423	2015-06-02 17:36:45 -07:00
Yueh-Hsuan Chiang	fe5c6321cb	Allow EventListener::OnCompactionCompleted to return CompactionJobStats. Summary: Allow EventListener::OnCompactionCompleted to return CompactionJobStats, which contains useful information about a compaction. Example CompactionJobStats returned by OnCompactionCompleted(): smallest_output_key_prefix 05000000 largest_output_key_prefix 06990000 elapsed_time 42419 num_input_records 300 num_input_files 3 num_input_files_at_output_level 2 num_output_records 200 num_output_files 1 actual_bytes_input 167200 actual_bytes_output 110688 total_input_raw_key_bytes 5400 total_input_raw_value_bytes 300000 num_records_replaced 100 is_manual_compaction 1 Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats. Reviewers: rven, igor, anthony, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38463	2015-06-02 17:07:16 -07:00
Yueh-Hsuan Chiang	3083ed2129	Fixed heap-use-after-free error in compaction_job_test.cc Summary: Fixed heap-use-after-free error in compaction_job_test.cc Test Plan: compaction_job_test Reviewers: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39411	2015-06-02 16:23:01 -07:00
Yueh-Hsuan Chiang	ab946af08a	Fix a compile warning in listener_test.cc Summary: Fixed the following compile warning in listener_test.cc: db/listener_test.cc:214:8: error: 'OnTableFileCreated' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override] 14:16:46 void OnTableFileCreated( Test Plan: make listener_test Reviewers: sdong, igor Subscribers: leveldb	2015-06-02 14:20:27 -07:00
Yueh-Hsuan Chiang	fc83821270	Add EventListener::OnTableFileCreated() Summary: Add EventListener::OnTableFileCreated(), which will be called when a table file is created. This patch is part of the EventLogger and EventListener integration. Test Plan: Augment existing test in db/listener_test.cc Reviewers: anthony, kradhakrishnan, rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38865	2015-06-02 14:12:23 -07:00
Yueh-Hsuan Chiang	898e803fc5	Add a stats counter for DB_WRITE back which was mistakenly removed. Summary: Add a stats counter for DB_WRITE back which was mistakenly removed. Test Plan: augment GroupCommitTest Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39399	2015-06-02 12:35:12 -07:00
Mike Kolupaev	ec7a944360	more times in perf_context and iostats_context Summary: We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls: - perf_context.db_mutex_lock_nanos - perf_context.db_condition_wait_nanos - iostats_context.open_time - iostats_context.allocate_time - iostats_context.write_time - iostats_context.range_sync_time - iostats_context.logger_time In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long. Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took. Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: march, dhruba, tnovak Differential Revision: https://reviews.facebook.net/D39177	2015-06-02 02:07:58 -07:00
sdong	4266d4fd90	Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange() Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true. Test Plan: Add a unit test for it. Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39099	2015-06-01 18:21:14 -07:00
Yueh-Hsuan Chiang	d333820bad	Removed DBImpl::notifying_events_ Summary: DBImpl::notifying_events_ is a internal counter in DBImpl which is used to prevent DB close when DB is notifying events. However, as the current events all rely on either compaction or flush which already have similar counters to prevent DB close, it is safe to remove notifying_events_. Test Plan: listener_test examples/compact_files_example Reviewers: igor, anthony, kradhakrishnan, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39315	2015-06-01 15:32:23 -07:00
Igor Canadi	a187e66ad0	Merge pull request #617 from rdallman/wb-merge-sliceparts WriteBatch.Merge w/ SliceParts support	2015-05-31 13:34:34 -04:00
Igor Canadi	4c181f08bc	Fix compile on darwin Summary: As title Test Plan: make check Reviewers: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39243	2015-05-30 12:25:45 -04:00
agiardullo	bc7a7a400c	fix LITE build Summary: Broken by optimistic transaction diff. (I only built 'release' not 'static_lib' when testing). Test Plan: build Reviewers: yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39219	2015-05-29 15:22:00 -07:00
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	2015-05-29 14:36:35 -07:00
Reed Allman	21cd6b7ad8	C: add support for WriteBatch SliceParts params	2015-05-29 10:23:43 -07:00
Reed Allman	a0635ba3f6	WriteBatch.Merge w/ SliceParts support also hooked up WriteBatchInternal	2015-05-29 04:30:03 -07:00
agiardullo	c815351038	Support saving history in memtable_list Summary: For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit. After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much. This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit). However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions. Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here. Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37443	2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang	ec4ff4e99c	Rename EventLoggerHelpers EventHelpers Summary: Rename EventLoggerHelpers EventHelpers, as it's going to include all event-related helper functions instead of EventLogger only stuffs. Test Plan: make Reviewers: sdong, rven, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39093	2015-05-28 13:37:47 -07:00
Yueh-Hsuan Chiang	672dda9b3b	[API Change] Move listeners from ColumnFamilyOptions to DBOptions Summary: Move listeners from ColumnFamilyOptions to DBOptions Test Plan: listener_test compact_files_test Reviewers: rven, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D39087	2015-05-28 13:21:39 -07:00
Yueh-Hsuan Chiang	3ab8ffd4dd	Compaction now conditionally boosts the size of deletion entries. Summary: Compaction now boosts the size of deletion entries of a file only when the number of deletion entries is greater than the number of non-deletion entries in the file. The motivation here is that in a stable workload, the number of deletion entries should be roughly equal to the number of non-deletion entries. If we compensate the size of deletion entries in a stable workload, the deletion compensation logic might introduce unwanted effet which changes the shape of LSM tree. Test Plan: db_test --gtest_filter="Deletion" Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38703	2015-05-26 14:05:38 -07:00
Igor Canadi	a81ac24127	Merge pull request #615 from rdallman/master C: add more block based table stuff, some aux slice transform/merge ops	2015-05-26 14:19:31 -04:00
Yueh-Hsuan Chiang	6d299b70b8	Fixed a bug in EventLoggerHelpers::LogTableFileCreation Summary: Fixed a missing "}" at the end of the generated JSON Log in EventLoggerHelpers::LogTableFileCreation. Test Plan: db_bench Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38919	2015-05-26 10:55:46 -07:00
Yueh-Hsuan Chiang	a0580205c8	Removed an unused private variable in db_impl.h Summary: Removed an unused private variable in db_impl.h Test Plan: make db_test Reviewers: sdong, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38925	2015-05-26 10:46:26 -07:00
Reed Allman	9c38ce1d02	C: extra bbto / noop slice transform	2015-05-22 22:56:28 -07:00
Igor Canadi	ea6d3a8ac0	Don't skip last level when calculating compaction stats Summary: We have a bug where we don't report the last level's files as being compacted. This fixes it. Test Plan: See the fix in action here: https://phabricator.fb.com/P19845738 Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38727	2015-05-22 15:30:43 -04:00
Yueh-Hsuan Chiang	5c224d1b70	Fixed two bugs on logging file deletion. Summary: This patch fixes the following two bugs on logging file deletion. 1. Previously, file deletion failure was only logged in INFO_LEVEL. This patch changes it to ERROR_LEVEL and does some code clean. 2. EventLogger previously will always generate the same log on table file deletion even when file deletion is not successful. Now the resulting status of file deletion will also be logged. Test Plan: make all check Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38817	2015-05-22 12:10:51 -07:00
Yueh-Hsuan Chiang	dc81efe415	Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL Summary: Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL Test Plan: Use db_bench to verify the log level. Sample output: 2015/05/22-00:20:39.778064 7fff75b41300 [WARN] RocksDB version: 3.11.0 2015/05/22-00:20:39.778095 7fff75b41300 [WARN] Git sha rocksdb_build_git_sha:7fee8775a459134c4cb04baae5bd1687e268f2a0 2015/05/22-00:20:39.778099 7fff75b41300 [WARN] Compile date May 22 2015 2015/05/22-00:20:39.778101 7fff75b41300 [WARN] DB SUMMARY 2015/05/22-00:20:39.778145 7fff75b41300 [WARN] SST files in /tmp/rocksdbtest-691931916/dbbench dir, Total Num: 0, files: 2015/05/22-00:20:39.778148 7fff75b41300 [WARN] Write Ahead Log file in /tmp/rocksdbtest-691931916/dbbench: 2015/05/22-00:20:39.778150 7fff75b41300 [WARN] Options.error_if_exists: 0 2015/05/22-00:20:39.778152 7fff75b41300 [WARN] Options.create_if_missing: 1 2015/05/22-00:20:39.778153 7fff75b41300 [WARN] Options.paranoid_checks: 1 Reviewers: MarkCallaghan, igor, kradhakrishnan Reviewed By: igor Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38835	2015-05-22 11:54:59 -07:00
Yueh-Hsuan Chiang	687214f878	Ensure ColumnFamilyOptions.num_levels >= 2 when level compaction is used. Summary: Ensure ColumnFamilyOptions.num_levels >= 2 when level compaction is used. Test Plan: Extend SanitizeOptions test in column_family_test Reviewers: sdong, rven, anthony, krishnanm86, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38829	2015-05-22 11:35:40 -07:00
Yueh-Hsuan Chiang	2abb592688	Avoid logging under mutex in DBImpl::WriteLevel0TableForRecovery(). Summary: Avoid logging under mutex in DBImpl::WriteLevel0TableForRecovery(). Test Plan: make all check Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38823	2015-05-22 11:24:12 -07:00
Yueh-Hsuan Chiang	7fee8775a4	Allow EventLogger to directly log from a JSONWriter. Summary: Allow EventLogger to directly log from a JSONWriter. This allows the JSONWriter to be shared by EventLogger and potentially EventListener, which is an important step to integrate EventLogger and EventListener. This patch also rewrites EventLoggerHelpers::LogTableFileCreation(), which uses the new API to generate identical log. Test Plan: Run db_bench in debug mode and make sure the log is correct and no assertions fail. Reviewers: sdong, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38709	2015-05-21 15:39:30 -07:00
Igor Canadi	7a3577519f	Don't artificially inflate L0 score Summary: This turns out to be pretty bad because if we prioritize L0->L1 then L1 can grow artificially large, which makes L0->L1 more and more expensive. For example: 256MB @ L0 + 256MB @ L1 --> 512MB @ L1 256MB @ L0 + 512MB @ L1 --> 768MB @ L1 256MB @ L0 + 768MB @ L1 --> 1GB @ L1 .... 256MB @ L0 + 10GB @ L1 --> 10.2GB @ L1 At some point we need to start compacting L1->L2 to speed up L0->L1. Test Plan: The performance improvement is massive for heavy write workload. This is the benchmark I ran: https://phabricator.fb.com/P19842671. Before this change, the benchmark took 47 minutes to complete. After, the benchmark finished in 2minutes. You can see full results here: https://phabricator.fb.com/P19842674 Also, we ran this diff on MongoDB on RocksDB on one replicaset. Before the change, our initial sync was so slow that it couldn't keep up with primary writes. After the change, the import finished without any issues Reviewers: dynamike, MarkCallaghan, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38637	2015-05-21 11:40:48 -07:00
Yueh-Hsuan Chiang	e2c1d4b57f	[Public API Change] Make DB::GetDbIdentity() be const function. Summary: Make DB::GetDbIdentity() be const function. Test Plan: make db_test Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38745	2015-05-21 11:01:48 -07:00
Yueh-Hsuan Chiang	812c461c96	Dump db stats in WARN level Summary: Dump db stats in WARN level Test Plan: run db_bench and verify the LOG Reviewers: igor, MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38691	2015-05-19 18:42:17 -07:00
Mark Callaghan	944043d683	Add --wal_bytes_per_sync for db_bench and more IO stats Summary: See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats. I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job (IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes. The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the Interval compaction line uses the interval uptime. So these Cumalative & Interval compaction IO rates cannot be compared to the per-level IO rates. But both forms of the rates are useful for debugging perf. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D38667	2015-05-19 16:19:30 -07:00
Igor Canadi	04feaeebb9	Fix comparison between signed and usigned integers Summary: Not sure why this fails on some compilers and doesn't on others. Test Plan: none Reviewers: meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38673	2015-05-19 10:59:30 -07:00
Igor Canadi	4a855c0799	Add an option wal_bytes_per_sync to control sync_file_range for WAL files Summary: sync_file_range is not always asyncronous and thus can block writes if we do this for WAL in the foreground thread. See more here: http://yoshinorimatsunobu.blogspot.com/2014/03/how-syncfilerange-really-works.html Some users don't want us to call sync_file_range on WALs. Some other do. Thus, I'm adding a separate option wal_bytes_per_sync to control calling sync_file_range on WAL files. bytes_per_sync will apply only to table files now. Test Plan: no more sync_file_range for WAL as evidenced by strace Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38253	2015-05-18 17:03:59 -07:00
Igor Canadi	b0fdda4ff0	Allow flushes to run in parallel with manual compaction Summary: As title. I spent some time thinking about it and I don't think there should be any issue with running manual compaction and flushes in parallel Test Plan: make check works Reviewers: rven, yhchiang, sdong Reviewed By: yhchiang, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38355	2015-05-18 15:34:33 -07:00
sdong	fb5bdbf987	DBTest.DynamicLevelMaxBytesCompactRange: make sure L0 is not empty before running compact range Summary: DBTest.DynamicLevelMaxBytesCompactRange needs to make sure L0 is not empty to properly cover the code paths we want to cover. However, current codes have a bug that might leave the condition not held. Improve the test to ensure it. Test Plan: Run the test in an environment that is used to fail. Also run it many times. Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38631	2015-05-18 11:49:45 -07:00
sdong	6fa7085121	CompactRange skips levels 1 to base_level -1 for dynamic level base size Summary: CompactRange() now is much more expensive for dynamic level base size as it goes through all the levels. Skip those not used levels between level 0 an base level. Test Plan: Run all unit tests Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37125	2015-05-18 10:54:11 -07:00
Holodov Alexander	eeb44366ba	C api: human-readable statistics	2015-05-16 12:34:28 +04:00
Yueh-Hsuan Chiang	3f0867c0fe	Allow GetThreadList to report Flush properties. Summary: Allow GetThreadList to report Flush properties, which includes: * job id * number of bytes that has been written since flush started. * total size of input mem-tables Test Plan: ./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000 Sample output from db_bench which tracks same flush job ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140213879898240 High Pri default Flush 5789 us FlushJob::WriteLevel0Table BytesMemtables 4112835 \| BytesWritten 577104 \| JobID 8 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140213879898240 High Pri default Flush 30.634 ms FlushJob::WriteLevel0Table BytesMemtables 4112835 \| BytesWritten 1734865 \| JobID 8 \| Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38505	2015-05-15 23:22:22 -07:00
Igor Canadi	7413306d94	Take a chance on a random file when choosing compaction Summary: When trying to compact entire database with SuggestCompactRange(), we'll first try the left-most files. This is pretty bad, because: 1) the left part of LSM tree will be overly compacted, but right part will not be touched 2) First compaction will pick up the left-most file. Second compaction will try to pick up next left-most, but this will not be possible, because there's a big chance that second's file range on N+1 level is already being compacted. I observe both of those problems when running Mongo+RocksDB and trying to compact the DB to clean up tombstones. I'm unable to clean them up :( This diff adds a bit of randomness into choosing a file. First, it chooses a file at random and tries to compact that one. This should solve both problems specified here. Test Plan: make check Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38379	2015-05-15 14:14:40 -07:00
sdong	5aad881298	DBTest.DynamicLevelMaxBytesBase2: remove an unnecesary check Summary: DBTest.DynamicLevelMaxBytesBase2 has a check that is not necessary and may fail. Remove it, and add two unrelated check. Test Plan: Run the test Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38457	2015-05-14 09:22:43 -07:00
sdong	ec43a8b9fb	Universal Compaction with multiple levels won't allocate up to output size Summary: Universal compactions with multiple levels should use file preallocation size based on file size if output level is not level 0 Test Plan: Run all tests. Reviewers: igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38439	2015-05-13 14:15:46 -07:00
sdong	bc68bd5a13	db_bench to support rate limiter Summary: Add --rate_limiter_bytes_per_sec to db_bench to allow rater limit to disk Test Plan: Run ./db_bench --benchmarks=fillseq --num=30000000 --rate_limiter_bytes_per_sec=3000000 --num_multi_db=8 -disable_wal And see io_stats to have the rate limited. Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D38385	2015-05-13 10:03:41 -07:00
Yueh-Hsuan Chiang	df1f87a882	Fixed compile error in db/column_family.cc Summary: Fixed the following compile error in db/column_family.cc db/column_family.cc:633:33: error: ‘ASSERT_GT’ was not declared in this scope 16:14:45 ASSERT_GT(listeners.size(), 0U); Test Plan: make db_test Reviewers: igor, sdong, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38367	2015-05-12 16:20:03 -07:00
Yueh-Hsuan Chiang	14431e971d	Fixed a bug in EventListener::OnCompactionCompleted(). Summary: Fixed a bug in EventListener::OnCompactionCompleted() that returns incorrect list of input / output file names. Test Plan: Extend existing test in listener_test.cc Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38349	2015-05-12 16:10:23 -07:00
Igor Canadi	dbd95b7532	Add more table properties to EventLogger Summary: Example output: {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}} Also fixed a bug where BuildTable() in recovery was passing Env::IOHigh argument into paranoid_checks_file parameter. Test Plan: make check + check out the output in the log Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38343	2015-05-12 15:53:55 -07:00
Igor Canadi	b5881762bc	Reset parent_index and base_index when picking files marked for compaction Summary: This caused a crash of our MongoDB + RocksDB instance. PickCompactionBySize() sets its own parent_index. We never reset this parent_index when picking PickFilesMarkedForCompactionExperimental(). So we might end up doing SetupOtherInputs() with parent_index that was set by PickCompactionBySize, although we're using compaction calculated using PickFilesMarkedForCompactionExperimental. Test Plan: Added a unit test that fails with assertion on master. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38337	2015-05-12 11:16:25 -07:00
agiardullo	711465ccec	API to fetch from both a WriteBatchWithIndex and the db Summary: Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch. (This is needed for transactions). I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep). Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc. WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)). Test Plan: Added new unit tests. Reviewers: rven, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38037	2015-05-11 14:51:51 -07:00
Igor Canadi	3996fff8a1	Fix clang build - add override Summary: In new clang we need to add override to every overriden function Test Plan: none Reviewers: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D38259	2015-05-09 11:04:14 -07:00
Igor Canadi	d978139063	SuggestCompactRange() is manual compaction Summary: When reporting compaction that was started because of SuggestCompactRange() we should treat it as manual compaction. Test Plan: none Reviewers: yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38139	2015-05-08 19:37:02 -07:00
Yueh-Hsuan Chiang	77a5a543a5	Allow GetThreadList() to report basic compaction operation properties. Summary: Now we're able to show more details about a compaction in GetThreadList() :) This patch allows GetThreadList() to report basic compaction operation properties. Basic compaction properties include: 1. job id 2. compaction input / output level 3. compaction property flags (is_manual, is_deletion, .. etc) 4. total input bytes 5. the number of bytes has been read currently. 6. the number of bytes has been written currently. Flush operation properties will be done in a seperate diff. Test Plan: /db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1 Sample output of tracking same job: ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 31.357 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 59.440 ms CompactionJob::FinishCompactionOutputFile BaseInputLevel 1 \| BytesRead 2264663 \| BytesWritten 1934241 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| ThreadID ThreadType cfName Operation ElapsedTime Stage State OperationProperties 140664171987072 Low Pri default Compaction 226.375 ms CompactionJob::Install BaseInputLevel 1 \| BytesRead 3958013 \| BytesWritten 3621940 \| IsDeletion 0 \| IsManual 0 \| IsTrivialMove 0 \| JobID 277 \| OutputLevel 2 \| TotalInputBytes 3964158 \| Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37653	2015-05-06 22:51:06 -07:00
Igor Canadi	65fe1cfbb3	Cleanup CompactionJob Summary: Couple changes: 1. instead of SnapshotList, just take a vector of snapshots 2. don't take a separate parameter is_snapshots_supported. If there are snapshots in the list, that means they are supported. I actually think we should get rid of this notion of snapshots not being supported. 3. don't pass in mutable_cf_options as a parameter. Lifetime of mutable_cf_options is a bit tricky to maintain, so it's better to not pass it in for the whole compaction job. We only really need it when we install the compaction results. Test Plan: make check Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36627	2015-05-05 19:01:12 -07:00
Laurent Demailly	df4130ad85	fix crashes in stats and compaction filter for db_ttl_impl Summary: fix crashes in stats and compaction filter for db_ttl_impl Test Plan: Ran build with lots of debugging https://reviews.facebook.net/differential/diff/194175/ Reviewers: yhchiang, igor, rven Reviewed By: igor Subscribers: rven, dhruba Differential Revision: https://reviews.facebook.net/D38001	2015-05-05 16:54:47 -07:00
Venkatesh Radhakrishnan	7ea769487f	Fix flakiness in column_family_test Summary: Fixes #6840824, running "make check" on centos6 hits a deadlock in column_family_test Test Plan: seq 10000 \| parallel --gnu --eta 't=/dev/shm/rdb-{}; rm -rf $t; mkdir $t && export TEST_TMPDIR=$t; ./column_family_test > $t/log-{}' Made the test deterministic by narrrowing the window for the flush. Reviewers: igor, meyering Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38079	2015-05-05 15:59:02 -07:00
Liangjun Feng	9aa011fa36	Optimize GetRange Function Summary: Optimize GetRange Function by checking the level of the files Test Plan: pass make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37977	2015-05-05 09:57:47 -07:00
Igor Canadi	36a7408896	Fix UNLIKELY parenthesis Summary: Ooops :) status.ok() is acutally highly likely :) Test Plan: none Reviewers: rven, yhchiang, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38043	2015-05-05 08:57:34 -07:00
Jim Meyering	2ab7065af1	build: avoid unused-variable warning Summary: [noticed a new warning when building with the very latest gcc] * db/memtablerep_bench.cc (FLAGS_env): Remove declaration of unused varaible, to avoid this warning/error: db/memtablerep_bench.cc:135:22: error: ‘FLAGS_env’ defined but not\ used [-Werror=unused-variable] static rocksdb::Env* FLAGS_env = rocksdb::Env::Default(); ^ Test Plan: compile Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37983	2015-05-02 13:19:10 -07:00
Venkatesh Radhakrishnan	d2346c2cf0	Fix hang with large write batches and column families. Summary: This diff fixes a hang reported by a Github user. https://www.facebook.com/l.php?u=https%3A%2F%2Fgithub.com%2Ffacebook%2Frocksdb%2Fissues%2F595%23issuecomment-96983273&h=9AQFYOWlo Multiple large write batches with column families cause a hang. The issue was caused by not doing flushes/compaction when the write controller was stopped. Test Plan: Create a DBTest from the user's test case Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37929	2015-05-01 15:41:50 -07:00
Mark Callaghan	b6b100fe04	Remove iter_refresh_interval_us Summary: The default, use one iter for the whole test, isn't good. This cost me a few hours of debugging and a few days of tessting. For readonly that isn't realistic and for read-write that keeps a lot of old sst files around. I remove the option because nothing uses it and not calling gettimeofday per loop iteration adds about 3% to QPS at 20 threads. Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37965	2015-05-01 14:17:45 -07:00
Igor Canadi	dddceefe5e	Fix clang build Summary: fix build Test Plan: works Reviewers: kradhakrishnan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37911	2015-04-30 11:11:35 -07:00
krad	d4540654e9	Optimize GetApproximateSizes() to use lesser CPU cycles. Summary: CPU profiling reveals GetApproximateSizes as a bottleneck for performance. The current implementation is sub-optimal, it scans every file in every level to compute the result. We can take advantage of the fact that all levels above 0 are sorted in the increasing order of key ranges and use binary search to locate the starting index. This can reduce the number of comparisons required to compute the result. Test Plan: We have good test coverage. Run the tests. Reviewers: sdong, igor, rven, dynamike Subscribers: dynamike, maykov, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37755	2015-04-30 10:55:03 -07:00
Igor Canadi	7246ad34d0	Don't compact bottommost level in SuggestCompactRange Summary: Before the fix we also marked the bottommost level for compaction. This is wrong because then RocksDB has N+1 levels instead of N as before the compaction. Test Plan: SuggestCompactRangeTest in db_test Reviewers: yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37869	2015-04-29 13:35:48 -07:00
Igor Canadi	7f47ba0e26	Fix possible SIGSEGV in CompactRange (github issue #596 ) Summary: For very detailed explanation of what's happening read this: https://github.com/facebook/rocksdb/issues/596 Test Plan: make check + new unit test Reviewers: yhchiang, anthony, rven Reviewed By: rven Subscribers: adamretter, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37779	2015-04-29 10:52:31 -07:00
agiardullo	d6f39c5ae3	Helper function to time Merges Summary: Remove duplicate code. If this diff looks good, I will cleanup other call sites as well. Test Plan: unit tests Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37761	2015-04-27 20:23:50 -07:00
Igor Canadi	1bb4928da9	Include bunch of more events into EventLogger Summary: Added these events: * Recovery start, finish and also when recovery creates a file * Trivial move * Compaction start, finish and when compaction creates a file * Flush start, finish Also includes small fix to EventLogger Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files. Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff. TODO: * Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec * Write tools that parse and analyze LOGs. This can be in python or go. Good intern task. Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976 Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37521	2015-04-27 15:20:02 -07:00
clark.kang	6ede020dc4	fix typos	2015-04-25 18:14:27 +09:00
sdong	98a44559d5	Build for CYGWIN Summary: Make it build for CYGWIN. Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions. Test Plan: Build it and run some unit tests in CYGWIN Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37605	2015-04-23 21:33:44 -07:00
sdong	d01bbb53ae	Fix CompactRange for universal compaction with num_levels > 1 Summary: CompactRange for universal compaction with num_levels > 1 seems to have a bug. The unit test also has a bug so it doesn't capture the problem. Fix it. Revert the compact range to the logic equivalent to num_levels=1. Always compact all files together. It should also fix DBTest.IncreaseUniversalCompactionNumLevels. The issue was that options.write_buffer_size = 100 << 10 and options.write_buffer_size = 100 << 10 are not used in later test scenarios. So write_buffer_size of 4MB was used. The compaction trigger condition is not anymore obvious as expected. Test Plan: Run the new test and all test suites Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37551	2015-04-23 19:12:31 -07:00
Igor Canadi	e003d3864c	Abstract out SetMaxPossibleForUserKey() and SetMinPossibleForUserKey Summary: Based on feedback from D37083. Are all of these correct? In some spaces it seems like we're doing SetMaxPossibleForUserKey() although we want the smallest possible internal key for user key. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37341	2015-04-23 18:08:37 -07:00
Igor Canadi	aa14670b27	Add an assertion in CompactionPicker Summary: Reading CompactionPicker I noticed this dangerous substraction of two unsigned integers. We should assert to mark this as safe. Test Plan: make check Reviewers: anthony, rven, yhchiang, sdong Reviewed By: sdong Subscribers: kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37041	2015-04-23 17:46:15 -07:00
Giuseppe Ottaviano	2dc421df48	Implement DB::PromoteL0 method Summary: This diff implements a new `DB` method `PromoteL0` which moves all files in L0 to a given level skipping compaction, provided that the files have disjoint ranges and all levels up to the target level are empty. This method provides finer-grain control for trivial compactions, and it is useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change the semantics of an existing operation, which can impact existing code. PromoteL0 is designed to work well in combination with the proposed `GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure by populating one level at a time. Such fine-grained control can be very useful for static or mostly-static databases. Test Plan: `make check` Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37107	2015-04-23 12:10:36 -07:00
sdong	9bf40b64d0	Print max score in level summary Summary: Add more logging to help debugging issues. Test Plan: Run test suites Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37401	2015-04-23 11:34:36 -07:00
sdong	397b6588bd	options.paranoid_file_checks to read all rows after writing to a file. Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows. Test Plan: Add a new unit test for it Reviewers: rven, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37335	2015-04-23 11:34:35 -07:00
Venkatesh Radhakrishnan	618d07b068	Making PreShutdown tests more reliable. Summary: A couple of times on Travis, we have had the thread status say that there were no compactions done and since we assert for it, the test failed. We now fix this by waiting till compaction started. Test Plan: run DBTEST::PreShutdown d=/tmp/j; rm -rf $d; seq 200 \| parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_test --gtest_filter=DBTest.PreShutdown* >& '$d'/log-{}' Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37545	2015-04-23 08:35:02 -07:00
Igor Canadi	6059bdf86a	Add experimental API MarkForCompaction() Summary: Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys). All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart. MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction. Test Plan: added a new unit test Reviewers: yhchiang, rven, MarkCallaghan, sdong Reviewed By: sdong Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37083	2015-04-17 16:44:45 -07:00
Jim Meyering	acf8a4141d	maint: use ASSERT_TRUE, not ASSERT_EQ(true; same for false Summary: The usage I'm fixing here caused trouble on Fedora 21 when compiling with the current gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC): db/write_controller_test.cc: In member function ‘virtual void rocksdb::WriteControllerTest_SanityTest_Test::TestBody()’: db/write_controller_test.cc:23:165: error: converting ‘false’ to pointer type for argument 1 of ‘char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)’ [-Werror=conversion-null] ASSERT_EQ(false, controller.IsStopped()); ^ This change was induced mechanically via: git grep -l -E 'ASSERT_EQ\(false'\|xargs perl -pi -e 's/ASSERT_EQ\(false, /ASSERT_FALSE(/' git grep -l -E 'ASSERT_EQ\(true'\|xargs perl -pi -e 's/ASSERT_EQ\(true, /ASSERT_TRUE(/' Except for the three in utilities/backupable/backupable_db_test.cc for which I ended up reformatting (joining lines) in the result. As for why this problem is exhibited with that version of gcc, and none of the others I've used (from 4.8.1 through gcc-5.0.0 and newer), I suspect it's a bug in F21's gcc that has been fixed in gcc-5.0.0. Test Plan: "make" now succeed on Fedora 21 Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D37329	2015-04-17 14:54:17 -07:00
Igor Canadi	b5400f90fe	Kill dead code Summary: this is not used anywhere Test Plan: compiles Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37053	2015-04-17 12:07:47 -07:00
Igor Canadi	00c2afcd38	Fix bug in ExpandWhileOverlapping() Summary: If ExpandWhileOverlapping() we don't clear inputs. That's a bug introduced by my recent patch https://reviews.facebook.net/D36687. However, we have no tests covering ExpandWhileOverlapping(). I created a task t6771252 to add ExpandWhileOverlapping() tests. Test Plan: make check Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D37077	2015-04-16 19:31:10 -07:00
sdong	debaf85ef5	Bug of trivial move of dynamic level Summary: D36669 introduces a bug that trivial moved data is not going to specific level but the next level, which will incorrectly be level 1 for level 0 compaciton if base level is not level 1. Fixing it by appreciating the output level Test Plan: Run all tests Reviewers: MarkCallaghan, rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37119	2015-04-14 21:42:08 -07:00
sdong	12d7d3d28d	Fix and Improve DBTest.DynamicLevelCompressionPerLevel2 Summary: Recent change of DBTest.DynamicLevelCompressionPerLevel2 has a bug that the second sync point is not enabled. Fix it. Also add an assert for that. Also, flush compression is not tracked in the test. Add it. Test Plan: Build everything Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37101	2015-04-14 21:42:08 -07:00
sdong	a1271c6c6f	Fix build break introduced by new SyncPoint interface change Summary: When commiting the sync point interface change, didn't resolve the new occurance of the old interface in rebase. Fix it. Test Plan: Build and see it pass Reviewers: igor, yhchiang, rven, anthony, kradhakrishnan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37095	2015-04-14 16:42:37 -07:00
sdong	fcb206b667	SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward Summary: Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests. Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug. Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check. Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36999	2015-04-14 16:18:50 -07:00
Igor Canadi	281db8bb62	Temporarily disable test CompactFilesOnLevelCompaction Summary: https://reviews.facebook.net/D36963 made the debug build much faster and that triggered failures of CompactFilesOnLevelCompaction test. 3 out of 4 last tests on Jenkins failed. I'm disabling this test temporarily, since we likely know the reason why it's failing and there's already work in progress to address it -- https://reviews.facebook.net/D36225 Test Plan: none Reviewers: sdong, rven, yhchiang, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36993	2015-04-13 19:30:40 -07:00
Igor Canadi	9b983befa8	Fix flakiness of WalManagerTest Summary: We should use mocked-out env for these tests to make it more realiable. Added benefit is that instead of actually sleeping for 3 seconds, we can instead pretend to sleep and just increase time counters. Test Plan: for i in `seq 100`; do ./wal_manager_test --gtest_filter=WalManagerTest.WALArchivalTtl ;done Reviewers: rven, meyering Reviewed By: meyering Subscribers: meyering, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36951	2015-04-13 16:15:05 -07:00
Igor Canadi	e7ad14926a	Fix flakiness in FIFOCompaction test (github issue #573 ) Summary: The problem is that sometimes two memtables will be compacted together into a single file. In that case, our assertion ASSERT_EQ(NumTableFilesAtLevel(0), 5); fails because same amount of data is in 4 files instead of 5. We should wait for flush so that we prevent two memtables merging into a single file. Test Plan: `for i in `seq 20`; do mrtest FIFOCompactionTest; done` -- fails at least once before. fails zero times after. Reviewers: rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36939	2015-04-13 11:39:45 -07:00
Igor Canadi	abb4052278	Kill benchharness Summary: 1. it doesn't work 2. we're not using it In the future, if we need general benchmark framework, we should probably use https://github.com/google/benchmark Test Plan: make all Reviewers: yhchiang, rven, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36777	2015-04-13 10:17:42 -07:00
Igor Canadi	590fadc407	Fix compile warning on CLANG Summary: oops Test Plan: compiles now Reviewers: sdong, yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36867	2015-04-10 15:14:57 -07:00
Igor Canadi	47b8743984	Make Compaction class easier to use Summary: The goal of this diff is to make Compaction class easier to use. This should also make new compaction algorithms easier to write (like CompactFiles from @yhchiang and dynamic leveled and multi-leveled universal from @sdong). Here are couple of things demonstrating that Compaction class is hard to use: 1. we have two constructors of Compaction class 2. there's this thing called grandparents_, but it appears to only be setup for leveled compaction and not compactfiles 3. it's easy to introduce a subtle and dangerous bug like this: D36225 4. SetupBottomMostLevel() is hard to understand and it shouldn't be. See this comment: `afbafeaeae/db/compaction.cc (L236-L241)`. It also made it harder for @yhchiang to write CompactFiles, as evidenced by this: `afbafeaeae/db/compaction_picker.cc (L204-L210)` The problem is that we create Compaction object, which holds a lot of state, and then pass it around to some functions. After those functions are done mutating, then we call couple of functions on Compaction object, like SetupBottommostLevel() and MarkFilesBeingCompacted(). It is very hard to see what's happening with all that Compaction's state while it's travelling across different functions. If you're writing a new PickCompaction() function you need to try really hard to understand what are all the functions you need to run on Compaction object and what state you need to setup. My proposed solution is to make important parts of Compaction immutable after construction. PickCompaction() should calculate compaction inputs and then pass them onto Compaction object once they are finalized. That makes it easy to create a new compaction -- just provide all the parameters to the constructor and you're done. No need to call confusing functions after you created your object. This diff doesn't fully achieve that goal, but it comes pretty close. Here are some of the changes: * have one Compaction constructor instead of two. * inputs_ is constant after construction * MarkFilesBeingCompacted() is now private to Compaction class and automatically called on construction/destruction. * SetupBottommostLevel() is gone. Compaction figures it out on its own based on the input. * CompactionPicker's functions are not passing around Compaction object anymore. They are only passing around the state that they need. Test Plan: make check make asan_check make valgrind_check Reviewers: rven, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: sdong, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36687	2015-04-10 15:01:54 -07:00
agiardullo	753dd1fdd0	Fix valgrind issues in memtable_list_test Summary: Need to remember to unref MemTableList->current() before deleting. Test Plan: ran test with valgrind Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36855	2015-04-10 14:16:03 -07:00
krad	697380f3d7	Repairer documentation improvement. Summary: Adding verbosity to existing comments. Test Plan: None Reviewers: sdong CC: leveldb Task ID: #6718960 Blame Rev:	2015-04-10 12:35:28 -07:00
agiardullo	0feeee6433	Fix memtable_list_test Summary: Test failing due to a missing directory caused by a simple bug (did not run into this on my dev box since the path already existed). We should look into deleting test::TmpDir() before each test run. Test Plan: ran test Reviewers: igor, yhchiang, meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36831	2015-04-09 22:11:35 -07:00
Yueh-Hsuan Chiang	7b9581bc3b	Fixed xfunc related compile errors in ROCKSDB_LITE Summary: Fixed xfunc related compile errors in ROCKSDB_LITE Now make OPT=-DROCKSDB_LITE shared_lib -j32 would work Test Plan: make clean make OPT=-DROCKSDB_LITE shared_lib -j32 make clean make OPT=-DROCKSDB_LITE static_lib -j32 Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36825	2015-04-09 21:05:18 -07:00
agiardullo	fabc115690	MemTableList tests Summary: Add tests for MemTableList Test Plan: run test Reviewers: yhchiang, kradhakrishnan, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36735	2015-04-09 18:01:11 -07:00
Yueh-Hsuan Chiang	9741dec0e5	Fix a compile error in ROCKSDB_LITE in db/db_impl.cc Summary: Fix a compile error in ROCKSDB_LITE in db/db_impl.cc related to internal_stats. Test Plan: make OPT=-DROCKSDB_LITE shared_lib Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36819	2015-04-09 17:07:29 -07:00
Yueh-Hsuan Chiang	d2a056241a	Fix a compilation error in ROCKSDB_LITE in db/internal_stats.h Summary: Fix a compilation error in ROCKSDB_LITE in db/internal_stats.h Other compilation errors will be fixed in a separate diff. Test Plan: make OPT=-DROCKSDB_LITE Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36807	2015-04-09 16:43:54 -07:00
sdong	316ec80bf8	fault_injection_test: add a test case to cover log syncing after a log roll Summary: Add a test case: Write some keys without sync, flush, write other keys and do sync. Before flush finishes, host crashes and unsync data is dropped. Tag the new test as disabled since it is not passing. Test Plan: Run the test Reviewers: MarkCallaghan, rven, anthony, igor, kradhakrishnan Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36741	2015-04-09 16:15:42 -07:00
Mark Callaghan	ed229a0dee	Fixes for readcache-flashcache Summary: This fixes two problems: 1) the env should not be created twice when use_existing_db is false 2) the env dtor should run before cachedev_fd_ is closed. Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36795	2015-04-09 15:51:34 -07:00
agiardullo	84c5bd7eb9	Add thread-safety documentation to MemTable and related classes Summary: Other than making some class members private, this is a documentation-only change Test Plan: unit tests Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36567	2015-04-08 21:10:35 -07:00
krad	2b019a1512	Enabling checksum in repair db as it should have been. Summary: I think the checksum was turned off by mistake. Test Plan: Run make check Reviewers: igor sdong chip CC: Task ID: Blame Rev:	2015-04-08 15:52:02 -07:00
sdong	b1bbdd7919	Create EnvOptions using sanitized DB Options Summary: Now EnvOptions uses unsanitized DB options. bytes_per_sync is tuned off when rate_limiter is used, but this change doesn't take effort. Test Plan: See different I/O pattern in db_bench running fillseq. Reviewers: yhchiang, kradhakrishnan, rven, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36723	2015-04-08 14:40:42 -07:00
sdong	b118238a57	Trivial move to cover multiple input levels Summary: Now trivial move is only triggered when moving from level n to n+1. With dynamic level base, it is possible that file is moved from level 0 to level n, while levels from 1 to n-1 are empty. Extend trivial move to this case. Test Plan: Add a more unit test of sequential loading. Non-trivial compaction happened without the patch and now doesn't happen. Reviewers: rven, yhchiang, MarkCallaghan, igor Reviewed By: igor Subscribers: leveldb, dhruba, IslamAbdelRahman Differential Revision: https://reviews.facebook.net/D36669	2015-04-08 09:26:40 -07:00
krad	58346b9e29	Log writer record format doc. Summary: Added a ASCII doodle to represent the log writer format. Test Plan: None Reviewers: sdong CC: leveldb Task ID: 6179896 Blame Rev:	2015-04-07 16:25:56 -07:00
Yoshinori Matsunobu	f12614070f	Fix TSAN build error of D36447 Summary: D36447 caused build error when using COMPILE_WITH_TSAN=1. This diff fixes the error. Test Plan: jenkins Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36579	2015-04-06 17:37:36 -07:00
Yoshinori Matsunobu	824e646341	Adding another NewFlashcacheAwareEnv function to support pre-opened fd Summary: There are some cases when flachcache file descriptor was already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an error at open() because fd was already assigned. This diff adds another function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd. Test Plan: Tested with MyRocks using this function, then worked Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, MarkCallaghan, rven Differential Revision: https://reviews.facebook.net/D36447	2015-04-06 16:50:36 -07:00
Igor Canadi	5e067a7b19	Clean up compression logging Summary: Now we add warnings when user configures compression and the compression is not supported. Test Plan: Configured compression to non-supported values. Observed messages in my log: 2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2. 2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data. Reviewers: rven, sdong, yhchiang Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35979	2015-04-06 12:50:44 -07:00
sdong	953a885ebf	A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge Summary: Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it. Also refactor the codes so that (1) make table property collector and internal table property collector two separate data structures with the later one now exposed (2) table builders only receive internal table properties Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector. Reviewers: yhchiang, igor.sugak, rven, igor Reviewed By: rven, igor Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D35373	2015-04-06 10:27:21 -07:00
Jim Meyering	d2a92c13bc	avoid returning a number-of-active-keys estimate of nearly 2^64 Summary: If accumulated_num_non_deletions_ were ever smaller than accumulated_num_deletions_, the computation of "accumulated_num_non_deletions_ - accumulated_num_deletions_" would result in a logically "negative" value, but since the two operands are unsigned (uint64_t), the result corresponding to e.g., -1 would 2^64-1. Instead, return 0 in that case. Test Plan: - ensure "make check" still passes - temporarily add an "abort();" call in the new "if"-block, and observe that it fails in some test cases. However, note that this case is triggered only when the two numbers are equal. Thus, no test case triggers the erroneous behavior this change is designed to avoid. If anyone can construct a scenario in which that bug would be triggered, I'll be happy to add a test case. Reviewers: ljin, igor, rven, igor.sugak, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36489	2015-04-03 14:46:35 -07:00
sdong	a7ac6cef1f	Fix level size overflow for options_.level_compaction_dynamic_level_bytes=true Summary: Int is used for level size targets when options_.level_compaction_dynamic_level_bytes=true, which will cause overflow when database grows big. Fix it. Test Plan: Add a new unit test which fails without the fix. Reviewers: rven, yhchiang, MarkCallaghan, igor Reviewed By: igor Subscribers: leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D36453	2015-04-03 09:04:35 -07:00
sdong	089509b847	db_test: clean up sync points in test cleaning up Summary: In some db_test tests sync points are not cleared which will cause unexpected results in the next tests. Clean them up in test cleaning up. Test Plan: Run the same tests that used to fail: build using USE_CLANG=1 and run ./db_test --gtest_filter="DBTest.CompressLevelCompaction:DBTestUniversalCompactionParallel" Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36429	2015-04-02 16:17:58 -07:00
Venkatesh Radhakrishnan	afbafeaeae	Disallow trivial move if compression level is different Summary: Check compression level of start_level with output_compression before allowing trivial move Test Plan: New DBTest CompressLevelCompactionThirdPath added Reviewers: igor, yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36213	2015-04-02 11:06:30 -07:00
Venkatesh Radhakrishnan	d0695f3e26	Fix crash caused by opening an empty DB in readonly mode Summary: This diff fixes a crash found when an empty database is opened in readonly mode. We now check the number of levels before we open the DB as a compacted DB. Test Plan: DBTest.EmptyCompactedDB Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36327	2015-04-01 16:55:08 -07:00
Herman Lee	51c8133a72	Fix make unity build compiler warning about "stats" shadowing global variable Summary: Fix the make unity build. The local stats variable name was shadowing a global stats variable. Test Plan: Run the build OPT=-DTRAVIS V=1 make unity Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36285	2015-04-01 10:48:42 -07:00
sdong	76d63b4525	Fix one non-determinism of DBTest.DynamicCompactionOptions Summary: After recent change of DBTest.DynamicCompactionOptions, occasionally hit another non-deterministic case where L0 showdown is triggered while timeout should not triggered for hard limit. Fix it by increasing L0 slowdown trigger at the same time. Test Plan: Run the failed test. Reviewers: igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36219	2015-03-30 15:53:44 -07:00
sdong	b23bbaa82a	Universal Compactions with Small Files Summary: With this change, we use L1 and up to store compaction outputs in universal compaction. The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible. If options.num_levels=1, it behaves all the same as now. Test Plan: 1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels. 2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels 3) add unit test to migrate from multiple level setting back to one level setting 4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results. Reviewers: rven, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: meyering, leveldb, MarkCallaghan, dhruba Differential Revision: https://reviews.facebook.net/D34539	2015-03-30 15:12:02 -07:00
Igor Canadi	2511b7d947	Makefile minor cleanup Summary: Just couple of small changes: 1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check` 2. moved perf_context_test to TESTS instead of PROGRAMS 3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there. This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week. Test Plan: make release Reviewers: anthony, rven, yhchiang, sdong, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36171	2015-03-30 16:05:35 -04:00
Mark Callaghan	1bd70fb54a	Add --stats_interval_seconds to db_bench Summary: The --stats_interval_seconds determines interval for stats reporting and overrides --stats_interval when set. I also changed tools/benchmark.sh to report stats every 60 seconds so I can avoid trying to figure out a good value for --stats_interval per test and per storage device. Task ID: #6631621 Blame Rev: Test Plan: run tools/run_flash_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36189	2015-03-30 12:58:32 -07:00
Igor Canadi	fd3dbef22b	Clean up old log files in background threads Summary: Cleaning up log files can do heavy IO, since we call ftruncate() in the destructor. We don't want to call ftruncate() in user threads. This diff moves cleaning to background threads (flush and compaction) Test Plan: make check, will also run valgrind Reviewers: yhchiang, rven, MarkCallaghan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D36177	2015-03-30 15:04:10 -04:00
Mark Callaghan	99ec2412e5	Make the benchmark scripts configurable and add tests Summary: This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests, makes the duration per-test configurable and refactors the test scripts. Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except the writer thread does Merge rather than Put. Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format. Sometimes automation and humans want different format. Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary stats in the output at test end. Adds the average ingest rate to compaction IO stats. Output now looks like: https://gist.github.com/mdcallag/2bd64d18be1b93adc494 More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1 For benchmark.sh changes default RocksDB configuration to reduce stalls: * min_level_to_compress from 2 to 3 * hard_rate_limit from 2 to 3 * max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8 * L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop) Task ID: #6596829 Blame Rev: Test Plan: run tools/run_flash_bench.sh Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D36075	2015-03-30 11:28:25 -07:00
Igor Canadi	d61cb0b9de	db_bench can now disable flashcache for background threads Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code. Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35391	2015-03-30 09:51:11 -07:00
Herman Lee	e018892bb6	Formalize the DB properties string definitions. Summary: Assign the string properties to const string variables under the DB::Properties namespace. This helps catch typos during compilation and also consolidates the property definition in one place. Test Plan: Run rocksdb unit tests Reviewers: sdong, yoshinorim, igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35991	2015-03-27 14:50:20 -07:00
Igor Canadi	030859eb5d	Dump compression info on startup Summary: It's useful to know if we have compression support or no Test Plan: Observed this in my LOG: 2015/03/26-10:34:35.460681 7f5b322b7840 Snappy supported 2015/03/26-10:34:35.460682 7f5b322b7840 Zlib supported 2015/03/26-10:34:35.460686 7f5b322b7840 Bzip supported 2015/03/26-10:34:35.460687 7f5b322b7840 LZ4 NOT supported Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35955	2015-03-26 11:22:20 -07:00
Alexander.Mikhaylov	a3e4b32483	fix compilation error (same as fix #284 ) [maa@srv2-nskb-devg2 rocksdb-master]$ CXX=/usr/local/CC/gcc-4.7.4/bin/g++ EXTRA_CXXFLAGS=-std=c++11 DISABLE_WARNING_AS_ERROR=1 make db_bench CC db/db_bench.o db/db_bench.cc: In member function 'rocksdb::Slice rocksdb::Benchmark::AllocateKey(std::unique_ptr<const char []>)': db/db_bench.cc:1434:41: error: use of deleted function 'void std::unique_ptr<_Tp [], _Dp>::reset(_Up) [with _Up = char; _Tp = const char; _Dp = std::default_delete<const char []>]' In file included from /usr/local/CC/gcc-4.7.4/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/../../../../include/c++/4.7.4/memory:86:0, from ./include/rocksdb/db.h:14, from ./db/dbformat.h:14, from ./db/db_impl.h:21, from db/db_bench.cc:33:	2015-03-26 14:53:42 +06:00
Anurag Indu	3d1a924ff3	Adding stats for the merge and filter operation Summary: We have addded new stats and perf_context for measuring the merge and filter operation time consumption. We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB. Test Plan: WIP Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34377	2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang	248c063ba1	Report elapsed time in micros in ThreadStatus instead of start time. Summary: Report elapsed time of a thread operation in micros in ThreadStatus instead of start time of a thread operation in seconds since the Epoch, 1970-01-01 00:00:00 (UTC). Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample Output: ThreadID ThreadType cfName Operation ElapsedTime Stage State 140667724562496 High Pri column_family_name_000002 Flush 772.419 ms FlushJob::WriteLevel0Table 140667728756800 High Pri default Flush 617.845 ms FlushJob::WriteLevel0Table 140667732951104 High Pri column_family_name_000005 Flush 772.078 ms FlushJob::WriteLevel0Table 140667875557440 Low Pri column_family_name_000008 Compaction 1409.216 ms CompactionJob::Install 140667737145408 Low Pri 140667749728320 Low Pri 140667816837184 Low Pri column_family_name_000007 Compaction 1071.815 ms CompactionJob::ProcessKeyValueCompaction 140667787477056 Low Pri column_family_name_000009 Compaction 772.516 ms CompactionJob::ProcessKeyValueCompaction 140667741339712 Low Pri 140667758116928 Low Pri column_family_name_000004 Compaction 620.739 ms CompactionJob::ProcessKeyValueCompaction 140667753922624 Low Pri 140667842003008 Low Pri column_family_name_000006 Compaction 1260.079 ms CompactionJob::ProcessKeyValueCompaction 140667745534016 Low Pri Reviewers: sdong, igor, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35769	2015-03-24 11:32:25 -07:00
Yueh-Hsuan Chiang	a057bb2a8e	Improve ThreadStatusSingleCompaction Summary: Improve ThreadStatusSingleCompaction in two ways: 1. Use SYNC_POINT to ensure compaction won't happen before the test finishes its "Put Phase" instead of using sleep. 2. In Put Phase, it continues until we have sufficient number of L0 files. Note that during the put phase, there won't be any compaction that consumes L0 files because of item 1. Test Plan: ./db_test --gtest_filter="ThreadStatusSingleCompaction" Reviewers: sdong, igor, rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35727	2015-03-23 15:30:45 -07:00
sdong	38d286f146	Clean-up WAL directory before running db_test Summary: DBTest doesn't clean up wal directory. It might cause failure after a failure test run. Fix it. Test Plan: Run unit tests Try open DB with non-empty db_path/wal. Reviewers: rven, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D35559	2015-03-20 12:34:24 -07:00
Igor Sugak	9405b5ef8f	rocksdb: Remove #include "util/string_util.h" from util/testharness.h Summary: 1. Manually deleted #include "util/string_util.h" from util/testharness.h 2. ``` % USE_CLANG=1 make all -j55 -k 2> build.log % perl -naF: -E 'say $F[0] if /: error:/' build.log \| sort -u \| xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"' ``` Test Plan: Make sure make all completes with no errors. ``` % make all -j55 ``` Reviewers: meyering, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35493	2015-03-19 17:29:37 -07:00
Igor Canadi	b088c83e6e	Don't delete files when column family is dropped Summary: To understand the bug read t5943287 and check out the new test in column_family_test (ReadDroppedColumnFamily), iter 0. RocksDB contract allowes you to read a drop column family as long as there is a live reference. However, since our iteration ignores dropped column families, AddLiveFiles() didn't mark files of a dropped column families as live. So we deleted them. In this patch I no longer ignore dropped column families in the iteration. I think this behavior was confusing and it also led to this bug. Now if an iterator client wants to ignore dropped column families, he needs to do it explicitly. Test Plan: Added a new unit test that is failing on master. Unit test succeeds now. Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32535	2015-03-19 17:04:29 -07:00
Igor Canadi	52e0f3353f	Clean up compactions_in_progress_ Summary: Suprisingly, the only way we use this vector is to keep track of level0 compactions. Thus, I simplified it. Test Plan: make check Reviewers: rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35313	2015-03-18 18:25:15 -07:00
Igor Sugak	6b626ff24c	rocksdb: change db_test::MultiThreadedDBTest as value parameterized test. Summary: This is a simple change to make db_test::MultiThreadedDBTest as value parameterized test. There is a value of creating a separate set of such tests later. Test Plan: ```lang=bash % make db_test % ./make db_test ``` Also with the following command I can execute all db_test in 2:37.87 on my box ``` % ./db_test --gtest_list_tests \| sed 's/\# GetParam.//' \| tr -d ' ' \| env time parallel --gnu --eta --joblog=LOG -- 'TEST_TMPDIR=/dev/shm/rocksdb-{} ./db_test --gtest_filter="{}"' ``` Reviewers: igor, rven, meyering, sdong Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35361	2015-03-18 18:18:12 -07:00
sdong	0831a35994	Add a DB Property For Number of Deletions in Memtables Summary: Add a DB property for number of deletions in memtables. It can sometimes help people debug slowness because of too many deletes. Test Plan: Add test cases. Reviewers: rven, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D35247	2015-03-18 17:03:59 -07:00
Mark Callaghan	dfccc7b4e2	Add readwhilemerging benchmark Summary: This is like readwhilewriting but uses Merge rather than Put in the writer thread. I am using it for in-progress benchmarks. I don't think the other benchmarks for Merge cover this behavior. The purpose for this test is to measure read performance when readers might have to merge results. This will also benefit from work-in-progress to add skewed key generation. Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35115	2015-03-18 13:50:52 -07:00
agiardullo	81345b90f9	Create an abstract interface for write batches Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class. This makes it easier to write code that is agnostic toward the implementation of the particular write batch. In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch. Test Plan: modified existing WriteBatchWithIndex tests to test new functions. Running all tests. Reviewers: igor, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34017	2015-03-17 19:23:08 -07:00
Igor Canadi	c88ff4ca76	Deprecate removeScanCountLimit in NewLRUCache Summary: It is no longer used by the implementation, so we should also remove it from the public API. Test Plan: make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34971	2015-03-17 15:04:37 -07:00
Igor Sugak	b4b69e4f77	rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te \| test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files 'test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157	2015-03-17 14:08:00 -07:00
Venkatesh Radhakrishnan	98c37fda5d	Remove unused parameter in CancelAllBackgroundWork Summary: Some suggestions for cleanup from Igor. Test Plan: Regression tests. Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35169	2015-03-16 21:07:54 -07:00
Igor Sugak	9fd6edf81c	rocksdb: Replace ASSERT* with EXPECT* in functions that does not return void value Summary: gtest does not use exceptions to fail a unit test by design, and `ASSERT`s are implemented using `return`. As a consequence we cannot use `ASSERT` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement \| 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes. In order to detect all existing `ASSERT` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases. In `util/testharness.h` I defined `EXPECT` assertions, the same way as `ASSERT`, and redefined `ASSERT` to return `void`. Then executed: ```lang=bash % USE_CLANG=1 make all -j55 -k 2> build.log % perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \ build.log \| xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number' % make format ``` After that I reverted back change to `ASSERT` in `util/testharness.h`. But preserved introduced `EXPECT`, which is the same as `ASSERT*`. This will be deleted once switched to gtest. This diff is independent and contains manual changes only in `util/testharness.h`. Test Plan: Make sure all tests are passing. ```lang=bash % USE_CLANG=1 make check ``` Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33333	2015-03-16 20:52:32 -07:00
Venkatesh Radhakrishnan	b2b3086524	Speed up rocksDB close call. Summary: On RocksDB, when there are multiple instances doing flushes/compactions in the background, the close call takes a long time because the flushes/compactions need to complete before the database can shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up. Test Plan: DB Test added. Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33741	2015-03-16 18:49:14 -07:00
Igor Sugak	95344346af	rocksdb: Small refactoring before migrating to gtest Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest. Test Plan: Make sure no build errors, and all test are passing. ``` % make check ``` Reviewers: igor, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35145	2015-03-16 18:08:59 -07:00
Mark Callaghan	56337faf3e	Fix compaction IO stats to handle large file counts Summary: The output did not have space for 6-digit file counts or for 3-digit counts of files being compacted. This adds space for that while preserving existing alignment. See https://gist.github.com/mdcallag/0a61c6a18dd467224c11 Task ID: # Blame Rev: Test Plan: run db_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35091	2015-03-16 11:50:23 -07:00
Igor Canadi	c6967a1a5e	Make RecordIn/RecordOut human readable Summary: I had hard time understanding these big numbers. Here's how the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992 Test Plan: db_bench Reviewers: sdong, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35073	2015-03-14 15:12:41 -07:00
Mark Callaghan	c8da670325	Stop printing per-level stall times. Summary: Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them both in the per-level output lines and in the summary. Also changed output for total stall time to include units in all cases. The new output looks like: Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt) RecordIn RecordDrop ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 4/1 7 0.8 0.0 0.0 0.0 0.6 0.6 0.0 0.0 0.0 12.9 50 352 0.141 882 0 0 L1 5/0 9 0.9 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0 0 0.000 0 0 0 L2 54/0 99 1.0 0.0 0.0 0.0 0.0 0.0 0.6 0.0 0.0 0.0 0 0 0.000 0 0 0 L3 289/0 527 0.5 0.0 0.0 0.0 0.0 0.0 0.5 0.0 0.0 0.0 0 0 0.000 0 0 0 Sum 352/1 642 0.0 0.0 0.0 0.0 0.6 0.6 1.7 1.0 0.0 12.9 50 352 0.141 882 0 0 Int 0/0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 15.5 0 3 0.118 7 0 0 Flush(GB): accumulative 0.627, interval 0.005 Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard Task ID: #6493861 Blame Rev: Test Plan: run db_bench, look at output Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35085	2015-03-14 15:01:43 -07:00
Yueh-Hsuan Chiang	12134139e3	Fixed the unit-test issue in PreShutdownCompactionMiddle Summary: Fixed the unit-test issue in PreShutdownCompactionMiddle Test Plan: export ROCKSDB_TESTS=PreShutdownCompactionMiddle Reviewers: rven, sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35061	2015-03-14 08:25:27 -07:00
Yueh-Hsuan Chiang	fd1b3f385a	Fix the issue in PreShutdownMultipleCompaction Summary: Fix the issue in PreShutdownMultipleCompaction Test Plan: export ROCKSDB_TESTS=PreShutdownMultipleCompaction ./db_test Reviewers: rven, sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35055	2015-03-14 08:03:02 -07:00
Igor Canadi	417367c42d	Fix SIGSEGV when not using cache	2015-03-13 16:41:00 -07:00
Venkatesh Radhakrishnan	e25ff039c8	Prevent slowdowns and stalls in PreShutdown tests Summary: The preshutdown tests check for stopped compactions/flushes. Removing stalls on the write path. Test Plan: DBTests.PreShutdown* Reviewers: yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35037	2015-03-13 14:51:40 -07:00
Igor Canadi	f690712652	Speed up db_bench shutdown Summary: See t6489044 Test Plan: compiles Reviewers: MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34977	2015-03-13 14:45:15 -07:00
Yueh-Hsuan Chiang	c1b3cde18a	Improve the robustness of ThreadStatusSingleCompaction Summary: Improve the robustness of ThreadStatusSingleCompaction by ensuring the number of files flushed in the test. Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35019	2015-03-13 13:16:53 -07:00
Yueh-Hsuan Chiang	8c12426c93	Fix the deadlock issue in ThreadStatusSingleCompaction. Summary: Fix the deadlock issue in ThreadStatusSingleCompaction. In the previous version of ThreadStatusSingleCompaction, the compaction thread will wait for a SYNC_POINT while its db_mutex is held. However, if the test hasn't finished its Put cycle while a compaction is running, a deadlock will happen in the test. Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test Reviewers: sdong, igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35001	2015-03-13 12:53:00 -07:00
sdong	b16ead531d	DBTest.DynamicLevelCompressionPerLevel should not run without snappy support Summary: The test depends on snappy to be used. Skip the test if it is not supported. Test Plan: Run the test Reviewers: meyering, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34995	2015-03-13 11:26:17 -07:00
Yueh-Hsuan Chiang	a5e60bafc2	Fix a typo / test failure in ThreadStatusSingleCompaction Summary: Fix a typo / test failure in ThreadStatusSingleCompaction Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test	2015-03-13 11:20:17 -07:00
Igor Canadi	cb2c91850c	Don't run some tests is snappy is not present Summary: Currently, we have `ifdef SNAPPY` around bunch of db_test code. Some tests that don't even use compression are also blocked when running system doesn't have snappy. This also causes hard-to-catch bugs, like D34983. We should dynamically figure out if compression is supported or not. Test Plan: compiles Reviewers: sdong, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34989	2015-03-13 11:08:50 -07:00
Yueh-Hsuan Chiang	c594b0e89d	Allow GetThreadList() to report operation stage. Summary: Allow GetThreadList() to report operation stage. Test Plan: ./thread_list_test ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 export ROCKSDB_TESTS=ThreadStatus ./db_test Sample output ThreadID ThreadType cfName Operation OP_StartTime ElapsedTime Stage State 140116265861184 Low Pri 140116270055488 Low Pri 140116274249792 High Pri column_family_name_000005 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116400078912 Low Pri column_family_name_000004 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116358135872 Low Pri column_family_name_000006 Compaction 2015/03/10-14:58:10 1 us CompactionJob::FinishCompactionOutputFile 140116341358656 Low Pri 140116295221312 High Pri default Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116324581440 Low Pri column_family_name_000009 Compaction 2015/03/10-14:58:11 0 us CompactionJob::ProcessKeyValueCompaction 140116278444096 Low Pri 140116299415616 Low Pri column_family_name_000008 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116291027008 High Pri column_family_name_000001 Flush 2015/03/10-14:58:11 0 us FlushJob::WriteLevel0Table 140116286832704 Low Pri column_family_name_000002 Compaction 2015/03/10-14:58:11 0 us CompactionJob::FinishCompactionOutputFile 140116282638400 Low Pri Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34683	2015-03-13 10:45:40 -07:00
Igor Canadi	52d8347a91	EventLogger Summary: Here's my proposal for making our LOGs easier to read by machines. The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize. I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc) Test Plan: Ran db_bench. Observed: 2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699} 2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12} Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31647	2015-03-13 10:15:54 -07:00
Islam AbdelRahman	9d22a1f136	Allow negative Wnew Summary: we are using uint64_t for Wnew this is not correct since this value can be negative https://github.com/facebook/rocksdb/issues/535 Test Plan: run db_bench and check what happens when Wnew is -ve Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34935	2015-03-12 20:53:18 -07:00
Venkatesh Radhakrishnan	b411d06031	Prevent stalls in preshutdown tests Summary: The tests using sync_point for intent to shutdown stop compaction and this results in stalls if too many rows are written. We now limit the number of rows written to prevent stalls, since the focus of the test is to cancel background work, which is being correctly tested. This fixes a Jenkins issue. Test Plan: DBTest.PreShutdown* Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34893	2015-03-12 10:49:06 -07:00
Islam AbdelRahman	1d43bc41fb	Fixing segmentation fault in db_bench Summary: Fixing segmentation fault when running db_bench This seg fault happens because num_created is used without being initialized Test Plan: running db_bench using these arguments bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=1000000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=overwrite --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/home/tec/koko/ --sync=$sync --disable_wal=1 --compression_type=zlib --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1 Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34881	2015-03-11 17:57:16 -07:00
sdong	e9de8b65a6	Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true Summary: Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc. Test Plan: run all tests Reviewers: rven, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34431	2015-03-11 13:14:52 -07:00
Yueh-Hsuan Chiang	2b785d76b8	Fixed a bug where CompactFiles won't delete obsolete files until flush. Summary: Fixed a bug where CompactFiles won't delete obsolete files until flush. Test Plan: ./compact_files_test export ROCKSDB_TESTS=CompactFiles ./db_test Reviewers: rven, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34671	2015-03-11 13:06:59 -07:00
sdong	2884b100ba	db_bench: Better way to randomize repeated read keys in -read_random_exp_range Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected. Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values> Reviewers: MarkCallaghan, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34761	2015-03-11 11:46:14 -07:00
Venkatesh Radhakrishnan	284be570c8	Provide a mechanism to inform Rocksdb that it is shutting down Summary: Provide an API which enables users to infor Rocksdb that it is shutting down. Test Plan: db_test Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34617	2015-03-11 10:31:02 -07:00
Igor Canadi	2ddf53b2ca	Get OptimizeFilterForHits work on Mac Summary: Got it working by some voodoo programming Test Plan: works! Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34611	2015-03-10 17:53:22 -07:00
Yueh-Hsuan Chiang	89597bb66b	Allow GetThreadList() to report the start time of the current operation. Summary: Allow GetThreadList() to report the start time of the current operation. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \ --max_background_compactions=10 --max_background_flushes=3 \ --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \ --num_column_families=10 Sample output: ThreadID ThreadType cfName Operation OP_StartTime State 140338840797248 High Pri column_family_name_000003 Flush 2015/03/09-17:49:59 140338844991552 High Pri column_family_name_000004 Flush 2015/03/09-17:49:59 140338849185856 Low Pri 140338983403584 Low Pri 140339008569408 Low Pri 140338861768768 Low Pri 140338924683328 Low Pri 140338899517504 Low Pri 140338853380160 Low Pri 140338882740288 Low Pri 140338865963072 High Pri column_family_name_000006 Flush 2015/03/09-17:49:59 140338954043456 Low Pri 140338857574464 Low Pri Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: lgalanis, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34689	2015-03-10 14:51:28 -07:00
sdong	37921b4997	db_bench: Add Option -read_random_exp_range to allow read skewness. Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better. Test Plan: Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB): ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0 rocksdb.block.cache.data.miss COUNT : 958418 rocksdb.block.cache.data.hit COUNT : 41582 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0 rocksdb.block.cache.data.miss COUNT : 819518 rocksdb.block.cache.data.hit COUNT : 180482 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0 rocksdb.block.cache.data.miss COUNT : 450479 rocksdb.block.cache.data.hit COUNT : 549521 ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0 rocksdb.block.cache.data.miss COUNT : 223192 rocksdb.block.cache.data.hit COUNT : 776808 Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34629	2015-03-09 11:34:52 -07:00
Igor Canadi	485ac0dbd0	Add rate_limiter to string options Summary: I want to be able to set this through mongo config. Test Plan: added unit test Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34599	2015-03-06 14:21:15 -08:00
Yueh-Hsuan Chiang	dc4532c497	Add --thread_status_per_interval to db_bench Summary: Add --thread_status_per_interval to db_bench, which allows db_bench to optionally enable print the current thread status periodically. Test Plan: ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10 Sample output: ThreadID ThreadType dbName cfName Operation State 140281571770432 Low Pri 140281575964736 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000001 Flush 140281710182464 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000008 Compaction 140281638879296 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000007 Compaction 140281592741952 Low Pri 140281580159040 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000002 Flush 140281676628032 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000006 Compaction 140281584353344 Low Pri 140281622102080 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000009 Compaction 140281605324864 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000004 Compaction 140281601130560 High Pri /tmp/rocksdbtest-5297/dbbench default Flush 140281596936256 Low Pri 140281588547648 Low Pri Reviewers: igor, rven, sdong Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34515	2015-03-06 11:22:06 -08:00
Yueh-Hsuan Chiang	694988b627	Fix a bug in stall time counter. Improve its output format. Summary: Fix a bug in stall time counter. Improve its output format. Test Plan: export ROCKSDB_TESTS=Timeout ./db_test ./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2 sample output: Uptime(secs): 35.8 total, 0.0 interval Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written Reviewers: MarkCallaghan, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34275	2015-03-03 12:48:12 -08:00
Igor Canadi	db03739340	options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically. Summary: When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases. In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier. Test Plan: New unit tests and pass tests suites including valgrind. Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo Reviewed By: ikabiljo Subscribers: yoshinorim, ikabiljo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31437	2015-03-02 22:40:41 -08:00
Mark Callaghan	c4bd03a97e	Fix typo in log message Summary: fix typo Task ID: # Blame Rev: Test Plan: Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34251	2015-03-02 09:35:50 -08:00
Igor Canadi	3cf7f353d9	Instrument memtable seeks Summary: As title Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34191	2015-02-27 17:06:06 -08:00
Sameet Agarwal	e7c434c364	Add columnfamily option optimize_filters_for_hits to optimize for key hits only Summary: Summary: Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most accesses to the store are key hits and we dont need to optimize performance for key misses. This is useful when you have a very large database and most of your lookups succeed. The option allows the store to not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level. This option is only provided for BlockBasedTable. We skip the filters when we are compacting Test Plan: 1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level) 2. Added another unit test to db_test which specifically tests that filters are being skipped Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33717	2015-02-26 16:25:56 -08:00
Igor Sugak	62247ffa3b	rocksdb: Add missing override Summary: When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes. Prerequisites: bear and clang 3.5 build with extra tools ```lang=bash % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html % clang-modernize -p . -include . -add-override % make format ``` Test Plan: Make sure all tests are passing. ```lang=bash % #Use default fb code clang. % make check ``` Verify less error and no missing override errors. ```lang=bash % # Have trunk clang present in path. % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make ``` Reviewers: igor, kradhakrishnan, rven, meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34077	2015-02-26 11:28:41 -08:00
Mark Callaghan	182b4ceacd	Limit key range to number of keys, not number of writes Summary: An old commit (482401) changed DoWrite to use the value of --writes rather than --num to determine the range for keys. This restores the old and correct behavior which is to limit it using --num. Task ID: #6353043 Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34065	2015-02-25 15:53:45 -08:00
Venkatesh Radhakrishnan	4ade89962d	Fix compile error on MacOS. Summary: In a release build, a member was not being accessed. This member was only being accessed in a debug build. We now add an accessor function for this member and the buid succeeds. Test Plan: build release/unity/debug on linux/mac Reviewers: sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34035	2015-02-24 16:24:53 -08:00
Igor Canadi	ace3d85068	Revert "Unused managed iterator" This reverts commit `bd339a9798`. Conflicts: db/managed_iterator.cc	2015-02-24 13:27:41 -08:00
Igor Canadi	7b8f348e56	Attempt at fixing travis issue	2015-02-24 12:20:43 -08:00
Igor Canadi	bd339a9798	Unused managed iterator Summary: This causes warnings on OS X Test Plan: compiles Reviewers: rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33969	2015-02-24 09:51:52 -08:00
Jinfu Leng	96d989f70d	catch config errors with L0 file count triggers Test Plan: Run "make clean && make all check" Reviewers: rven, igor, yhchiang, kradhakrishnan, MarkCallaghan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33627	2015-02-23 16:08:27 -08:00
Igor Sugak	62f7a1be4f	rocksdb: Fixed 'Dead assignment' and 'Dead initialization' scan-build warnings Summary: This diff contains trivial fixes for 6 scan-build warnings: db/c_test.c `db` variable is never read. Removed assignment. scan-build report: http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath db/db_iter.cc `skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return. scan-build report: http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath db/log_reader.cc In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion. scan-build report: http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases. scan-build reprots: http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPath http://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath table/plain_table_key_coding.cc Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration. scan-build report: http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath tools/db_stress.cc Missing `break` in switch case block. This seems to be a bug. Added missing `break`. Test Plan: Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs. ```lang=bash % make check % make analyze ``` Reviewers: meyering, igor, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33795	2015-02-23 14:10:09 -08:00
Jim Meyering	a2b911b63f	inputs: restore "const" attribute removed by D33759 Summary: The "const" attribute applies to the type, and placing it before that return type retains the desired semantics, yet avoids the compiler error/warning. Test Plan: Run make Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33789	2015-02-20 11:52:20 -08:00
Jim Meyering	c6d54b5037	fix erroneous assert: cast kBlockSize (of type unsigned int) to "int" Summary: Otherwise, we would assert that an unsigned expression is always >= 0. The intent was to form a possibly negative number, and to assert that that value is always >= 0, but since one variable in the computation was unsigned, the result was guaranteed to be unsigned, too, rendering the assertion useless. Cast that unsigned variable to "int", so that all operands are signed, and thus so that the result can be negative. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33771	2015-02-20 11:07:17 -08:00
Jim Meyering	c37937a9ce	maint: remove extraneous "const" attribute from return type Summary: The "const" attribute does not make sense on a return type, and provokes a warning/error from gcc -W -Wextra. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33759	2015-02-20 11:07:07 -08:00
Jim Meyering	9283c7afd2	build: remove always-true assertions Summary: Remove some always-true assertions. They provoke these compilation failures: table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] * table/plain_table_key_coding.cc (rocksdb): Remove assertion that unsigned type variable is >= 0. * db/version_set.cc (DoGenerateLevelFilesBrief): Likewise. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33747	2015-02-20 11:07:03 -08:00
Igor Sugak	73711f956c	rocksdb: Fix scan-build bug 'Memory leak' in db/db_bench.cc Summary: The bug is detected by scan-build. In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`. Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management. scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath Test Plan: Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing. ```lang=bash % make analyze % make check ``` Reviewers: lgalanis, igor, meyering, sdong Reviewed By: meyering, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33501	2015-02-19 14:27:48 -08:00
Jim Meyering	a42324e370	build: do not relink every single binary just for a timestamp Summary: Prior to this change, "make check" would always waste a lot of time relinking 60+ binaries. With this change, it does that only when the generated file, util/build_version.cc, changes, and that happens only when the date changes or when the current git SHA changes. This change makes some other improvements: before, there was no rule to build a deleted util/build_version.cc. If it was somehow removed, any attempt to link a program would fail. There is no longer any need for the separate file, build_tools/build_detect_version. Its functionality is now in the Makefile. * Makefile (DEPFILES): Don't filter-out util/build_version.cc. No need, and besides, removing that dependency was wrong. (date, git_sha, gen_build_version): New helper variables. (util/build_version.cc): New rule, to create this file and update it only if it would contain new information. * build_tools/build_detect_platform: Remove file. * db/db_impl.cc: Now, print only date (not the time). * util/build_version.h (rocksdb_build_compile_time): Remove declaration. No longer used. Test Plan: - Run "make check" twice, and note that the second time no linking is performed. - Remove util/build_version.cc and ensure that any "make" command regenerates it before doing anything else. - Run this: strings librocksdb.a\|grep _build_. That prints output including the following: rocksdb_build_git_date:2015-02-19 rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0 Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33591	2015-02-19 13:11:10 -08:00
sdong	d45a6a4002	Add rocksdb.num-live-versions: number of live versions Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33327	2015-02-19 13:10:37 -08:00
Venkatesh Radhakrishnan	7d817268b9	Managed iterator Summary: This is a diff for managed iterator. A managed iterator is a wrapper around an iterator which saves the options for that iterator as well as the current key/value so that the underlying iterator and its associated memory can be released when it is aged out automatically or on the request of the user. Will provide the automatic release as a follow-up diff. Test Plan: Managed* tests in db_test and XF tests for managed iterator Reviewers: igor, yhchiang, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31401	2015-02-18 11:49:31 -08:00
Yueh-Hsuan Chiang	12753130ec	Remove ThreadStatusMultiCompaction test Summary: Remove ThreadStatusMultiCompaction test as it's currently written in a way that depends on some randomness, while the flush / compaction status of a single thread is also covered in ThreadStatusFlush and ThreadStatusSingleCompaction tests. Test Plan: ./db_test Reviewers: igor, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33537	2015-02-17 12:04:56 -08:00
Yueh-Hsuan Chiang	e60bc99fe0	Allow GetThreadList to reflect flush activity. Summary: Allow GetThreadList to reflect flush activity. Test Plan: Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test. ./db_test ./thread_list_test Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D32871	2015-02-17 10:13:52 -08:00
Igor Canadi	e7ea51a8e7	Introduce job_id for flush and compaction Summary: It would be good to assing background job their IDs. Two benefits: 1) makes LOGs more readable 2) I might use it in my EventLogger, which will try to make our LOG easier to read/query/visualize Test Plan: ran rocksdb, read the LOG Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31617	2015-02-12 09:54:48 -08:00
sdong	5f00af4570	DBTest.DestroyDBMetaDatabase: create DB directories if not exists Summary: DBTest.DestroyDBMetaDatabase occasionally fails on my dev host, for file not existing. Always create directories to avoid that. Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33321	2015-02-11 16:16:50 -08:00
sdong	68af7811ea	Remember whole key/prefix filtering on/off in SST file Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter. Test Plan: Add a unit test for it Reviewers: yhchiang, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32889	2015-02-11 11:20:04 -08:00
sdong	6d6305dd7d	Perf Context to report DB mutex waiting time Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex Test Plan: Add a test and run it. Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33177	2015-02-09 17:55:12 -08:00
Igor Canadi	863009b5a5	Fix deleting obsolete files #2 Summary: For description of the bug, see comment in db_test. The fix is pretty straight forward. Test Plan: added unit test. eventually we need better testing of FOF/POF process. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33081	2015-02-09 17:38:32 -08:00
Grace Law	1851f977c2	Added RocksDB stats GET_HIT_L0 and GET_HIT_L1 Summary: - In statistics.h , added tickers. - In version_set.cc, -- Added a getter method for hit_file_level_ in the class FilePicker -- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively. Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818 Personal fork: `0c3f2e3600` Test Plan: In terminal, ``` make -j32 db_test ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test ``` Or to use debugger, ``` make -j32 db_test export ROCKSDB_TESTS=L0L1L2AndUpHitCounter gdb db_test ``` Reviewers: rven, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D32205	2015-02-09 14:53:58 -08:00
sdong	91ac3b2067	Print DB pointer when opening a DB Summary: Having a pointer for DB will be helpful to debug when GDB or working on a dump. If the client process doesn't have any thread actively working on RocksDB, it can be hard to find out. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33159	2015-02-09 12:52:58 -08:00
fyrz	cfe8837e43	Switch logv with loglevel to virtual	2015-02-09 20:59:29 +01:00
Igor Canadi	aaceef3638	Fix formatting	2015-02-09 09:53:30 -08:00
Igor Canadi	ee4aa9a0ee	Merge pull request #481 from mkevac/backupable Allow creating and restoring backups from C	2015-02-09 09:51:19 -08:00
Marko Kevac	9651308307	renamed backup to backup_and_restore in c_test for clarity	2015-02-09 12:11:42 +03:00
Marko Kevac	7e50ed8c24	Added some more wrappers and wrote a test for backup in C	2015-02-07 14:25:10 +03:00
Igor Canadi	2a979822b6	Fix deleting obsolete files Summary: This diff basically reverts D30249 and also adds a unit test that was failing before this patch. I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :( I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas. BTW this diff is also a regression when running lots of column families. I plan to revisit this separately. Test Plan: added a unit test Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33045	2015-02-06 08:44:30 -08:00
Igor Canadi	6f10130354	Fix DestroyDB Summary: When DestroyDB() finds a wal file in the DB directory, it assumes it is actually in WAL directory. This can lead to confusion, since it reports IO error when it tries to delete wal file from DB directory. For example: https://ci-builds.fb.com/job/rocksdb_clang_build/296/console This change will fix our unit tests. Test Plan: unit tests work Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32907	2015-02-05 20:09:42 -08:00
Igor Canadi	7de4e99a8e	Revert "Fix wal_dir not getting cleaned" This reverts commit `f36d394aed`.	2015-02-05 11:44:17 -08:00
Yueh-Hsuan Chiang	181191a1e4	Add a counter for collecting the wait time on db mutex. Summary: Add a counter for collecting the wait time on db mutex. Also add MutexWrapper and CondVarWrapper for measuring wait time. Test Plan: ./db_test export ROCKSDB_TESTS=MutexWaitStats ./db_test verify stats output using db_bench make clean make release ./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10 Sample output: rocksdb.db.mutex.wait.micros COUNT : 7546866 Reviewers: MarkCallaghan, rven, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32787	2015-02-04 21:39:45 -08:00
Igor Canadi	f36d394aed	Fix wal_dir not getting cleaned	2015-02-04 18:57:22 -08:00
sdong	53ae09c398	db_test: fix a data race in SpecialEnv Summary: db_test's test class SpecialEnv has a thread unsafe variable rnd_ but it can be accessed by multiple threads. It is complained by TSAN. Protect it by a mutex. Test Plan: Run the test Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32895	2015-02-04 18:32:53 -08:00
Igor Canadi	3e53760fc4	Fix compaction_picker_test	2015-02-04 16:20:25 -08:00
Igor Canadi	e39f4f6cf9	Fix data race #3 Summary: Added requirement that ComputeCompactionScore() be executed in mutex, since it's accessing being_compacted bool, which can be mutated by other threads. Also added more comments about thread safety of FileMetaData, since it was a bit confusing. However, it seems that FileMetaData doesn't have data races (except being_compacted) Test Plan: Ran 100 ConvertCompactionStyle tests with thread sanitizer. On master -- some failures. With this patch -- none. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32283	2015-02-04 16:04:51 -08:00
sdong	e63140d52b	Get() to use prefix bloom filter when filter is not block based Summary: Get() now doesn't make use of bloom filter if it is prefix based. Add the check. Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter. Test Plan: make all check Add a test case in DBTest Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31941	2015-02-04 15:15:41 -08:00
Venkatesh Radhakrishnan	dad98dd4ae	Changes for supporting cross functional tests for inplace_update Summary: This diff containes the changes to the code and db_test for supporting cross functional tests for inplace_update Test Plan: Run XF with inplace_test and also without Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32367	2015-02-03 12:19:56 -08:00
sdong	9898f63988	Divide test DBIteratorTest.DBIterator to smaller tests Summary: When building on my host, I saw warning: In file included from db/db_iter_test.cc:17:0: db/db_iter_test.cc: In member function â€˜void rocksdb::_Test_DBIterator::_Run()â€™: ./util/testharness.h:147:14: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without void TCONCAT(_Test_,name)::_Run() ^ ./util/testharness.h:134:23: note: in definition of macro â€˜TCONCAT1â€™ #define TCONCAT1(a,b) a##b ^ ./util/testharness.h:147:6: note: in expansion of macro â€˜TCONCATâ€™ void TCONCAT(_Test_,name)::_Run() ^ db/db_iter_test.cc:589:1: note: in expansion of macro â€˜TESTâ€™ TEST(DBIteratorTest, DBIterator) { ^ By dividing the test into small tests, it should fix the problem Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32679	2015-02-03 09:48:03 -08:00
Venkatesh Radhakrishnan	0b8dec7172	Cross functional test infrastructure for RocksDB. Summary: This Diff provides the implementation of the cross functional test infrastructure. This provides the ability to test a single feature with every existing regression test in order to identify issues with interoperability between features. Test Plan: Reference implementation of inplace update support cross functional test. Able to find interoperability issues with inplace support and ran all of db_test. Will add separate diff for those changes. Reviewers: igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32247	2015-02-02 14:49:22 -08:00
Marko Kevac	86e2a1eeea	Allow creating backups from C	2015-01-31 15:47:49 +03:00
sdong	5917de0bae	CappedFixTransform: return fixed length prefix, or full key if key is shorter than the fixed length Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key. Test Plan: Add a test case for db_test options_test and a new test Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31887	2015-01-30 16:04:30 -08:00
Igor Canadi	6c6037f60c	Expose Snapshot's SequenceNumber Summary: Requested here: https://www.facebook.com/groups/rocksdb.dev/permalink/705524519546065/ It might also help with mongo. I don't see a reason why we shouldn't expose this info. Test Plan: make check Reviewers: sdong, yhchiang, rven Reviewed By: rven Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32547	2015-01-29 18:22:43 -08:00
sdong	d07fec3bdc	make DBTest.SharedWriteBuffer to pass MockEnv Summary: DBTest.SharedWriteBuffer uses an Options that doesn't pass CurrentOptions(), so that it doesn't use MockEnv. However, DBTest's constructor uses MockEnv to call DestoryDB() to clean up, causing uncleaned state before it runs. Test Plan: Run the test modified to make sure they pass default Env and SharedWriteBuffer now passes MockEnv. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32475	2015-01-28 16:19:27 -08:00
Igor Canadi	4bdf38b16e	Disable FlushSchedule when running TSAN Summary: There's a bug in TSAN (or libstdc++?) with std::shared_ptr<> for some reason. In db_test, only FlushSchedule is affected. See more: https://groups.google.com/forum/#!topic/thread-sanitizer/vz_s-t226Vg With this change and all other @sdong's and mine diffs, our db_test should be TSAN-clean. I'll move to other tests. Test Plan: no more flush schedule when running TSAN Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32469	2015-01-28 15:31:48 -08:00
sdong	10af17f3d7	fault_injection_test: add a unit test to allow parallel compactions and multiple levels Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files. Test Plan: Run ./fault_injection_test, including valgrind Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32415	2015-01-28 14:07:25 -08:00
Igor Canadi	560ed402bd	[minor] fprintf to stderr instead of stdout in test	2015-01-27 21:00:33 -08:00
sdong	d2a2b058f0	fault_injection_test: to support file closed after being deleted Summary: fault_injection_test occasionally fails because file closing can happen after deletion. Improve the test to support it. Test Plan: I have a new test case I'm working on, where the issue appears almost every time. With the patch, the problem goes away. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32373	2015-01-27 17:06:47 -08:00
Yueh-Hsuan Chiang	d6c7300ccf	Fixed a compile warning in clang in db/listener_test.cc Summary: Fixed a compile warning in clang in db/listener_test.cc Test Plan: make listener_test Reviewers: oridb Reviewed By: oridb Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D32337	2015-01-27 15:01:04 -08:00
Ori Bernstein	f9758e0129	Add compaction listener. Summary: This adds a listener for compactions, and gives some useful statistics on each compaction pass. Test Plan: Unit tests. Reviewers: sdong, igor, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D31641	2015-01-27 14:44:02 -08:00
sdong	e919ecedfc	SuperVersion::Unref() to use sequential consistency to decrease ref counting Summary: I'm not sure the expected results of std::atomic::fetch_sub() when using memory_order_relaxed, and I suspect TSAN complains. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32259	2015-01-27 14:08:08 -08:00
sdong	1b43ab58d9	fault_injection_test: add more logging and makes synchronization slightly stronger Summary: We see failure of the test in travis but I can't repro it. Add more logging in failure cases to help us figure out which failure it is. Also makes synchronization slightly stronger, though there isn't seem to be a problem without it Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32319	2015-01-27 13:53:17 -08:00
sdong	be8f0b12ed	Rename DBImpl::log_dir_unsynced_ to log_dir_synced_ Summary: log_dir_unsynced_ is a confusing name. Rename it to log_dir_synced_ and flip the value. Test Plan: Run ./fault_injection_test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32235	2015-01-26 16:01:36 -08:00
sdong	c1de6c42a0	fault_injection_test: add a test case to drop random number of unsynced data Summary: Currently fault_injection_test has a test case to drop all the unsynced data. Add one more case to take a randomized bytes from it. Test Plan: Run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32229	2015-01-26 15:53:21 -08:00
sdong	d888c95748	Sync WAL Directory and DB Path if different from DB directory Summary: 1. If WAL directory is different from db directory. Sync the directory after creating a log file under it. 2. After creating an SST file, sync its parent directory instead of DB directory. 3. change the check of kResetDeleteUnsyncedFiles in fault_injection_test. Since we changed the behavior to sync log files' parent directory after first WAL sync, instead of creating, kResetDeleteUnsyncedFiles will not guarantee to show post sync updates. Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32067	2015-01-26 14:17:45 -08:00
Igor Canadi	f1c8862479	Fix data race #1 Summary: This is first in a series of diffs that fixes data races detected by thread sanitizer. Here the problem is that we call Ref() on a column family during a single-threaded write, without holding a mutex. Test Plan: TSAN is no longer complaining about LevelLimitReopen. Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32121	2015-01-26 11:48:07 -08:00
Igor Canadi	42189612c3	Fix data race #2 Summary: We should not be calling InternalStats methods outside of the mutex. Test Plan: COMPILE_WITH_TSAN=1 m db_test && ROCKSDB_TESTS=CompactionTrigger ./db_test failing before the diff, works now Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32127	2015-01-23 18:04:39 -08:00
Igor Canadi	f5a8398352	Fix archive WAL race conditions Summary: More race condition bugs with our archive WAL files. I do believe this caused t5988326, but can't reproduce the failure unfortunately. Test Plan: make check Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32103	2015-01-23 17:35:12 -08:00
sdong	43ec4e68ba	fault_injection_test: bring back 3 iteration runs Summary: 3 iterations were disabled by mistake by one recent commit, causing CLANG build error. Fix it Test Plan: USE_CLANG=1 make fault_injection_test and run the test Reviewers: igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32109	2015-01-23 16:30:43 -08:00
sdong	c2e8e8c1c0	Fix two namings in fault_injection_test.cc Summary: fault_injection_test.cc has two variable names not following the convention fix it. Test Plan: run the test Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32097	2015-01-23 16:12:48 -08:00
sdong	b4c13a868a	fault_injection_test: improvements and add new tests Summary: Wrapper classes in fault_injection_test doesn't simulate RocksDB Env behavior close enough. Improve it by: (1) when fsync, don't sync parent (2) support directory fsync (3) support multiple directories Add test cases of (1) persisting by WAL fsync, not just compact range (2) different WAL dir (3) combination of (1) and (2) (4) data directory is not the same as db name. Test Plan: Run the test and make sure it passes. Reviewers: rven, yhchiang, igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32031	2015-01-23 15:50:15 -08:00
Yueh-Hsuan Chiang	46a7048dcd	Reduce false alarm in ThreadStatusMultipleCompaction test	2015-01-22 15:45:02 -08:00
sdong	4e48753b73	Sync manifest file when initializing it Summary: Now we don't sync manifest file when initializing it, so DB cannot be safely reopened before the first mem table flush. Fix it by syncing it. This fixes fault_injection_test. Test Plan: make all check Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D32001	2015-01-22 14:32:03 -08:00
Igor Canadi	ae82849bc9	Fix build failure	2015-01-21 18:23:12 -08:00
Igor Canadi	423dee8418	Abort db_bench if Get() returns error Summary: I saw this when running readrandom benchmark with corrupted database -- benchmark worked! If a Get() returns corruption we should probably abort. Test Plan: compiles Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31701	2015-01-21 18:18:15 -08:00
sdong	206237d121	DBImpl::CheckConsistency() shouldn't create path name with double "/" Summary: GetLiveFilesMetaData() already adds a leading "/" in file name. No need to add one extra "/" in DBImpl::CheckConsistency() Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D31779	2015-01-21 16:36:13 -08:00
Yueh-Hsuan Chiang	b229f970df	Remove Compaction::ReleaseInputs(). Summary: This patch remove the unnecessary Compaction::ReleaseInputs(). Compaction::ReleaseInputs() tries to unref its input_version and column_family. However, such unref is always done in ~Compaction(), and all current ReleaseInputs() calls are right before the destructor. Test Plan: ./db_test Reviewers: igor Reviewed By: igor Subscribers: igor, rven, dhruba, sdong Differential Revision: https://reviews.facebook.net/D31605	2015-01-15 12:44:19 -08:00
Thomas Dudziak	d10f1de2b4	Ported LevelDB's fault_injection_test Summary: This is a port of [[ https://github.com/google/leveldb/blob/master/db/fault_injection_test.cc \| LevelDB's fault_injection_test ]] to RocksDB. Unfortunately it fails with: ``` ==== Test FaultInjectionTest.FaultTest db/fault_injection_test.cc:491: Corruption: no meta-nextfile entry in descriptor #0 ./fault_injection_test() [0x41477a] rocksdb::FaultInjectionTest::PartialCompactTestReopenWithFault(rocksdb::FaultInjectionTest::ResetMethod, int, int) /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:491 #1 ./fault_injection_test() [0x40a38a] rocksdb::_Test_FaultTest::_Run() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:517 #2 ./fault_injection_test() [0x415bea] rocksdb::_Test_FaultTest::_RunIt() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:507 #3 ./fault_injection_test() [0x584367] rocksdb::test::RunAllTests() /data/users/tomdzk/rocksdb/util/testharness.cc:70 #4 /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6(__libc_start_main+0x10e) [0x7f7a40857efe] ?? ??:0 #5 ./fault_injection_test() [0x408bb8] _start ??:0 ``` so I commented out the test invocation in the source code for now (lines 514-520) so it can be merged. Test Plan: This is a new test. Reviewers: igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31587	2015-01-15 10:28:10 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
sdong	bb128bfec3	More accurate message for compaction applied to a different version Test Plan: Compile. Run it. Reviewers: yhchiang, dhruba, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D31479	2015-01-13 16:48:18 -08:00
Igor Canadi	53f615df6a	Fix clang build	2015-01-13 12:27:28 -08:00
Yueh-Hsuan Chiang	2159484dd6	Remove two unnecessary blank lines in db/db_test.cc	2015-01-13 01:40:11 -08:00
Yueh-Hsuan Chiang	d2c018fd5b	Make ThreadStatusMultipleCompaction more robust.	2015-01-13 01:02:10 -08:00
Yueh-Hsuan Chiang	bf9aa4dfcd	Improve GetThreadStatus to avoid false alarm in some case.	2015-01-13 00:38:09 -08:00
Yueh-Hsuan Chiang	c91cdd59c1	Allow GetThreadList() to indicate a thread is doing Compaction. Summary: Allow GetThreadList() to indicate a thread is doing Compaction. Test Plan: export ROCKSDB_TESTS=ThreadStatus ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb, dhruba, jonahcohen, rven Differential Revision: https://reviews.facebook.net/D30105	2015-01-13 00:04:08 -08:00
Igor Canadi	15d2abbec3	Fix build issues	2015-01-09 13:04:06 -08:00
Igor Canadi	abb9b95ffe	Move compression functions from port/ to util/ Summary: We keep checksum functions in util/, there is no reason for compression to be in port/ Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31281	2015-01-09 12:57:11 -08:00
sdong	9132e52ea4	DB Stats Dump to print total stall time Summary: Add printing of stall time in DB Stats: Sample outputs: DB Stats Uptime(secs): 53.2 total, 1.7 interval Cumulative writes: 625940 writes, 625939 keys, 625940 batches, 1.0 writes per batch, 0.49 GB user ingest, stall micros: 50691070 Cumulative WAL: 625940 writes, 625939 syncs, 1.00 writes per sync, 0.49 GB written Interval writes: 10859 writes, 10859 keys, 10859 batches, 1.0 writes per batch, 8.7 MB user ingest, stall micros: 1692319 Interval WAL: 10859 writes, 10859 syncs, 1.00 writes per sync, 0.01 MB written Test Plan: make all check verify printing using db_bench Reviewers: igor, yhchiang, rven, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D31239	2015-01-09 11:44:19 -08:00
Ameya Gupte	242b9769c3	Memtablerep Benchmark Summary: Create a benchmark for testing memtablereps. This diff is a bit rough, but it should do the trick until other bootcampers can clean it up. Addressing comments Removed the mutexes Changed ReadWriteBenchmark to fix number of reads and count the number of writes we can perform in that time. Test Plan: Run it. Below runs pass ./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep skiplist ./memtablerep_bench --benchmarks fillseq,readseq --memtablerep skiplist ./memtablerep_bench --benchmarks readwrite,seqreadwrite --memtablerep skiplist --num_operations 200 --num_threads 5 ./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep hashskiplist ./memtablerep_bench --benchmarks fillseq,readseq --memtablerep hashskiplist --num_scans 2 ./memtablerep_bench --benchmarks fillseq,readseq --memtablerep vector Reviewers: jpaton, ikabiljo, sdong Reviewed By: sdong Subscribers: dhruba, ameyag Differential Revision: https://reviews.facebook.net/D22683	2015-01-07 15:15:30 -08:00
sdong	73ee4febab	Add comments about properties supported by DB::GetProperty() and DB::GetIntProperty() Summary: Add comments in db.h to help users discover their options. Test Plan: Compile Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31077	2015-01-07 15:09:35 -08:00
sdong	9ef59a09a5	VersionSet::AddLiveFiles() to assert current version is included. Summary: Add an extra assert to make sure current version is included in VersionSet::AddLiveFiles(). Test Plan: make all check Reviewers: yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, hermanlee4, leveldb Differential Revision: https://reviews.facebook.net/D30819	2015-01-07 12:03:40 -08:00
sdong	4d16a9a633	VersionBuilder to optimize for applying a later edit deleting files added by previous edits Summary: During recovery, VersionBuilder::Apply() was called multiple times. If the DB is open for long enough, most of files added earlier will be deleted by later deletes. In current solution, sorting added file happens first and then deletes are applied. In this patch, deletes are applied when possible inside Apply(), which can significantly reduce the sorting time in some cases. Test Plan: Add unit tests in version_builder valgrind_check Open a manifest of 50MB, with 9K live files. The manifest read time reduced from 1.6 seconds to 0.7 seconds. Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30765	2015-01-07 10:36:58 -08:00
Igor Canadi	7731d51c82	Simplify column family concurrency Summary: This patch changes concurrency guarantees around ColumnFamilySet::column_families_ and ColumnFamilySet::column_families_data_. Before: * When mutating: lock DB mutex and spin lock * When reading: lock DB mutex OR spin lock After: * When mutating: lock DB mutex and be in write thread * When reading: lock DB mutex or be in write thread That way, we eliminate the spin lock that protects these hash maps and simplify concurrency. That means we don't need to lock the spin lock during writing, since writing is mutually exclusive with column family create/drop (the only operations that mutate those hash maps). With these new restrictions, I also needed to move column family create to the write thread (column family drop was already in the write thread). Even though we don't need to lock the spin lock during write, impact on performance should be minimal -- the spin lock is almost never busy, so locking it is almost free. This addresses task t5116919. Test Plan: make check Stress test with lots and lots of column family drop and create: time ./db_stress --threads=30 --ops_per_thread=5000000 --max_key=5000 --column_families=200 --clear_column_family_one_in=100000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress/ Reviewers: yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30651	2015-01-06 12:44:21 -08:00
Igor Canadi	07aa4e0e35	Fix compaction summary log for trivial move Summary: When trivial move commit is done, we log the summary of the input version instead of current. This is inconsistent with other log messages and confusing. Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30939	2015-01-05 17:32:49 -08:00
Leonidas Galanis	9d5bd411be	benchmark.sh won't run through all tests properly if one specifies wal_dir to be different than db directory. Summary: A command line like this to run all the tests: source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' where benchmark.config.sh is: export DB_DIR=/data/mysql/rocksdata export WAL_DIR=/txlogs/rockswal export OUTPUT_DIR=/root/rocks_benchmarking/output Will fail for the tests that need a new DB . Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty. Reviewers: sdong, yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D30909	2015-01-05 15:36:47 -08:00
Igor Canadi	62ad0a9b19	Deprecating skip_log_error_on_recovery Summary: Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery. The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings. Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery. Reviewers: yhchiang, rven, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D30603	2015-01-05 13:35:56 -08:00

... 4 5 6 7 8 ...

1960 Commits