rocksdb

Author	SHA1	Message	Date
Igor Canadi	77e4ad7ce2	Fix compile failure on Travis Summary: Travis is complaining against using {} to initialize KVMap: https://travis-ci.org/facebook/rocksdb/jobs/84132600 db/compaction_job_test.cc:526:26: error: chosen constructor is explicit in copy-initialization RunCompaction({files}, {}); This diff should fix it Test Plan: travis Reviewers: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48309	2015-10-07 10:17:47 -07:00
Igor Canadi	d80ce7f99a	Compaction filter on merge operands Summary: Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit. The behavior in this diff is a bit simpler than in D42555: * only merge operators are passed through FilterMergeValue(). If the filter function returns true, the merge operator is ignored * compaction filter is not called on: 1) results of merge operations and 2) base values that are getting merged with merge operands (the second case was also true in previous diff) Do we also need a compaction filter to get called on merge results? Test Plan: make && make check Reviewers: lovro, tnovak, rven, yhchiang, sdong Reviewed By: sdong Subscribers: noetzli, kolmike, leveldb, dhruba, sdong Differential Revision: https://reviews.facebook.net/D47847	2015-10-07 09:30:03 -07:00
dyniusz	0267502655	Support for LevelDB SST with .ldb suffix Summary: Handle SST files with both ".sst" and ".ldb" suffix. This enables user to migrate from leveldb to rocksdb. Test Plan: Added unit test with DB operating on SSTs with names schema. See db/dc_test.cc:SSTsWithLdbSuffixHandling for details Reviewers: yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48003	2015-10-06 17:46:22 -07:00
Igor Canadi	eb5b637fb0	Fix condition for bottommost level Summary: The function GetBoundaryKeys() returns the smallest key from the first file and largest key from the last file. This is good for any level >0, but it's not correct for level 0. In level 0, files can overlap, so we need to check all files for boundary keys. This bug can cause wrong value for bottommost_level in compaction (value of true, although correct is false), which means we can set sequence numbers to 0 even if the key is not the oldest one in the database. Herman reported corruption while testing MyRocks. Fortunately, the patch that added the bug was not released yet. Test Plan: added a new test to compaction_picker_test. Reviewers: hermanlee4, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48201	2015-10-05 17:40:18 -07:00
Igor Canadi	9eaff629e3	Make corruption_test more robust Summary: Latest travis failed because of corruption test TableFileIndexData: https://travis-ci.org/facebook/rocksdb/jobs/83732558 This diff makes the test more explicit: 1. create two files 2. corrupt the second's file index 3. expect to get only 5000 keys when range scanning Test Plan: the test is still passing :) Reviewers: sdong, rven, yhchiang, kradhakrishnan, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48183	2015-10-05 14:46:28 -07:00
Igor Canadi	bf19dbff44	Fix valgrind - Initialize done variable Summary: Fixes the valgrind warning "Conditional jump or move depends on uninitialised value(s)" Test Plan: valgrind test, no more warning Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D48177	2015-10-05 10:10:11 -07:00
Igor Canadi	115427ef63	Add APIs PauseBackgroundWork() and ContinueBackgroundWork() Summary: To support a new MongoDB capability, we need to make sure that we don't do any IO for a short period of time. For background, see: * https://jira.mongodb.org/browse/SERVER-20704 * https://jira.mongodb.org/browse/SERVER-18899 To implement that, I add a new API calls PauseBackgroundWork() and ContinueBackgroundWork() which reuse the capability we already have in place for RefitLevel() function. Test Plan: Added a new test in db_test. Made sure that test fails when PauseBackgroundWork() is commented out. Reviewers: IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47901	2015-10-02 13:17:34 -07:00
Islam AbdelRahman	c29af48d3e	Add max_file_opening_threads to db_bench Summary: Add an option to db_bench for max_file_opening_threads Test Plan: compile and run db_bench Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, paultuckfield Differential Revision: https://reviews.facebook.net/D47811	2015-09-30 09:51:31 -07:00
Yueh-Hsuan Chiang	30f74fa964	Make CompactionJobStatsTest.UniversalCompactionTest more robust Summary: CompactionJobStatsTest.UniversalCompactionTest assumes compaction kicks in when the number of L0 files equals to the compaction trigger. However, in some case, the compaction might not catch up the write speed and thus compaction might not kick in until the number of L0 files is GREATER than the compaction trigger. This patch tries to fix this corner case by making the Put thread wait for a potential compaction whenever it flushes. Test Plan: ./compaction_job_stats_test Reviewers: sdong, anthony, IslamAbdelRahman, igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D47589	2015-09-28 13:55:53 -07:00
Mike Lin	60fa9cf0b5	Override DBImplReadOnly::SyncWAL() to return NotSupported. Previously, calling it caused program abort.	2015-09-25 21:25:30 -07:00
Yueh-Hsuan Chiang	63e0f86797	Fixed a bug which causes rocksdb.flush.write.bytes stat is always zero Summary: Fixed a bug which causes rocksdb.flush.write.bytes stat is always zero Test Plan: augment existing db_test Reviewers: sdong, anthony, IslamAbdelRahman, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47595	2015-09-25 13:34:49 -07:00
Yueh-Hsuan Chiang	b6aa3f962d	Fixed a memory leak issue in DBTest.UnremovableSingleDelete Summary: Fixed a memory leak issue in DBTest.UnremovableSingleDelete Test Plan: valgrind --error-exitcode=2 --leak-check=full ./db_test --gtest_filter="UnremovableSingleDelete" Reviewers: sdong, anthony, IslamAbdelRahman, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47583	2015-09-25 12:07:32 -07:00
Igor Canadi	7b7b5d9f18	[minor] Reuse SleepingBackgroundTask Summary: As title Test Plan: make check Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46983	2015-09-25 10:29:44 -07:00
Mayank Pundir	c58bac701c	Fix valgrind failure due to memory leaks Summary: Test cases for IsBottommostLevel function create FileMetaData objects which were not getting deleted in the destructor. Test Plan: Valgrind check on compaction_picker_test Reviewers: yhchiang, igor, sdong Subscribers: rven, kradhakrishnan, IslamAbdelRahman, dhruba, anthony Differential Revision: https://reviews.facebook.net/D47463	2015-09-23 17:41:42 -07:00
Islam AbdelRahman	f03b5c987b	Add experimental DB::AddFile() to plug sst files into empty DB Summary: This is an initial version of bulk load feature This diff allow us to create sst files, and then bulk load them later, right now the restrictions for loading an sst file are (1) Memtables are empty (2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0 (3) Added sst files values are not overlapping Test Plan: unit testing Reviewers: igor, ott, sdong Reviewed By: sdong Subscribers: leveldb, ott, dhruba Differential Revision: https://reviews.facebook.net/D39081	2015-09-23 12:42:43 -07:00
Yueh-Hsuan Chiang	3fdb6e5234	Fixed old lint errors in db/filename.cc Summary: Fixed old lint errors in db/filename.cc Test Plan: make Reviewers: igor, sdong, anthony, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47445	2015-09-23 12:39:16 -07:00
Yueh-Hsuan Chiang	b349d22786	Fixed old lint errors in db/filename.h Summary: Fixed old lint errors in db/filename.h Test Plan: make Reviewers: igor, sdong, anthony, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47439	2015-09-23 12:22:44 -07:00
sdong	df34aea331	PlainTableReader to support non-mmap mode Summary: PlainTableReader currently allows only mmap mode. Add support for non-mmap mode for more flexibility. Refactor the code to move all data-reading logic into PlainTableKeyDecoder, and consolidate the reads into Read() and ReadVarint32() calls. Implement these calls separately for the mmap and non-mmap cases. For non-mmap mode, copy keys in several places where the buffer must be moved after reading the keys. Test Plan: Add the non-mmap case to plain_table_db_test. Run it in valgrind mode too. Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D47187	2015-09-23 11:41:07 -07:00
sdong	d0c31641d2	Internal stats WAL file synced to match meaning of the stats of the same name Summary: https://reviews.facebook.net/D23343 changed WAL sync bytes to extra fsync. This change does the same for internal stats. Test Plan: Run all existing unit tests and verify results in db_bench. Reviewers: anthony, rven, igor, MarkCallaghan, kradhakrishnan, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D47349	2015-09-22 14:23:11 -07:00
Siying Dong	48b4497f75	Merge pull request #730 from yuslepukhin/fix_write_batch_win_const_expr Fix Windows constexpr issue and '#ifdef' column_family_test in Release.	2015-09-22 11:08:10 -07:00
sdong	f1b9f804e9	Add a mode to always pick the oldest file to compact for each level Summary: Add options.compaction_pri, which specifies the policy about which file to compact first. kCompactionPriByLargestSeq will compact oldest files first. Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable. Test Plan: Will write unit tests Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45951	2015-09-21 17:21:59 -07:00
Dmitri Smirnov	2754ec9994	Fix Windows constexpr issue and '#ifdef' column_family_test in Release.	2015-09-21 16:21:01 -07:00
jsteemann	5ec129971b	key_ cannot become nullptr, so no check is needed for that (ignoring the unlikely case that some overrides `operator new throw(std::bad_alloc)` with a function that returns a nullptr)	2015-09-18 20:15:20 +02:00
jsteemann	834b12a8d5	made Size() function const because it does not modify data	2015-09-18 20:10:00 +02:00
Andres Noetzli	014fd55adc	Support for SingleDelete() Summary: This patch fixes #7460559. It introduces SingleDelete as a new database operation. This operation can be used to delete keys that were never overwritten (no put following another put of the same key). If an overwritten key is single deleted the behavior is undefined. Single deletion of a non-existent key has no effect but multiple consecutive single deletions are not allowed (see limitations). In contrast to the conventional Delete() operation, the deletion entry is removed along with the value when the two are lined up in a compaction. Note: The semantics are similar to @igor's prototype that allowed this behavior at the granularity of a column family ( https://reviews.facebook.net/D42093 ). This new patch, however, is more aggressive when it comes to removing tombstones: It removes the SingleDelete together with the value whenever there is no snapshot between them while the older patch only did this when the sequence number of the deletion was older than the earliest snapshot. Most of the complex additions are in the Compaction Iterator, all other changes should be relatively straightforward. The patch also includes basic support for single deletions in db_stress and db_bench. Limitations: - Not compatible with cuckoo hash tables - Single deletions cannot be used in combination with merges and normal deletions on the same key (other keys are not affected by this) - Consecutive single deletions are currently not allowed (an older version of this patch supported this so it could be resurrected if needed) Test Plan: make all check Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor Reviewed By: igor Subscribers: maykov, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43179	2015-09-17 11:42:56 -07:00
Venkatesh Radhakrishnan	51e1c11254	Do not flag error if file to be deleted does not exist Summary: Some users have observed errors in the log file when the log file or sst file is already deleted. Test Plan: Make sure that the errors do not appear for already deleted files. Reviewers: sdong Reviewed By: sdong Subscribers: anthony, kradhakrishnan, yhchiang, rven, igor, IslamAbdelRahman, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47115	2015-09-17 10:21:34 -07:00
Mayank Pundir	a5e312a7a4	Improving condition for bottommost level during compaction Summary: The diff modifies the condition checked to determine the bottommost level during compaction. Previously, absence of files in higher levels alone was used as the condition. Now, the function additionally evaluates if the higher levels have files which have non-overlapping key ranges, then the level can be safely considered as the bottommost level. Test Plan: Unit test cases added and passing. However, unit tests of universal compaction are failing as a result of the changes made in this diff. Need to understand why that is happening. Reviewers: igor Subscribers: dhruba, sdong, lgalanis, meyering Differential Revision: https://reviews.facebook.net/D46473	2015-09-16 17:47:50 -07:00
sdong	9aca7cd6d8	DB::Open() to flush info log after printing DB pointer Summary: Now DB::Open() flushes info log before printing DB pointer, so it may not show up if no activity after DB open. Move log flushing from after printing options to printing DB pointer. Test Plan: make commit-prereq Reviewers: igor, IslamAbdelRahman, yhchiang, kradhakrishnan, anthony, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D47121	2015-09-16 16:33:39 -07:00
Yueh-Hsuan Chiang	f21c7415a7	Change the log level of DB start-up log from Warn to Header. Summary: Change the log level of DB start-up log from Warn to Header. Test Plan: db_bench and observe the LOG header Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47067	2015-09-16 11:31:45 -07:00
Alexey Maykov	3ebf11ed16	Adding the increment for a counter for a number of WAL syncs Summary: This will unblock the corresponding change in MyRocks Test Plan: ran rocksdb.write_sync test Reviewers: sdong, kolmike Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D46911	2015-09-16 11:00:49 -07:00
Igor Canadi	1b7ea8ce81	Skipped tests shouldn't be failures Summary: If we skip a test, we shouldn't mark `make check` as failure. This fixes travis CI test. Test Plan: Travis CI Reviewers: noetzli, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D47031	2015-09-15 18:10:36 -07:00
Ari Ekmekji	5ba3297d0d	Add compaction time to log output Summary: Although compaction time is recorded in the statistics, it is helpful to include this value in the log output corresponding to the end of compaction. Test Plan: make all && make check Reviewers: yhchiang, sdong, igor, noetzli, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D47007	2015-09-15 17:11:44 -07:00
Igor Canadi	0e50a3fcc0	Merge issue with D46773 Summary: There was a merge issue with SleepingBackgroundTask Test Plan: compiles now Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46977	2015-09-15 11:35:23 -07:00
Igor Canadi	a7e80379b0	LogAndApply() should fail if the column family has been dropped Summary: This patch finally fixes the ColumnFamilyTest.ReadDroppedColumnFamily test. The test has been failing very sporadically and it was hard to repro. However, I managed to write a new test that reproes the failure deterministically. Here's what happens: 1. We start the flush for the column family 2. We check if the column family was dropped here: `a3fc49bfdd/db/flush_job.cc (L149)` 3. This check goes through, ends up in InstallMemtableFlushResults() and it goes into LogAndApply() 4. At about this time, we start dropping the column family. Dropping the column family process gets to LogAndApply() at about the same time as LogAndApply() from flush process 5. Drop column family goes through LogAndApply() first, marking the column family as dropped. 6. Flush process gets woken up and gets a chance to write to the MANIFEST. However, this is where it gets stuck: `a3fc49bfdd/db/version_set.cc (L1975)` 7. We see that the column family was dropped, so there is no need to write to the MANIFEST. We return OK. 8. Flush gets OK back from LogAndApply() and it deletes the memtable, thinking that the data is now safely persisted to sst file. The fix is pretty simple. Instead of OK, we return ShutdownInProgress. This is not really true, but we have been using this status code to also mean "this operation was canceled because the column family has been dropped". The fix is only one LOC. All other code is related to tests. I added a new test that reproes the failure. I also moved SleepingBackgroundTask to util/testutil.h (because I needed it in column_family_test for my new test). There's plenty of other places where we reimplement SleepingBackgroundTask, but I'll address that in a separate commit. Test Plan: 1. new test 2. make check 3. Make sure the ColumnFamilyTest.ReadDroppedColumnFamily doesn't fail on Travis: https://travis-ci.org/facebook/rocksdb/jobs/79952386 Reviewers: yhchiang, anthony, IslamAbdelRahman, kradhakrishnan, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46773	2015-09-15 11:28:44 -07:00
Yoshinori Matsunobu	4886073174	Adding Slice::difference_offset() function Summary: There are some use cases in MyRocks to compare two slices and to return the first byte where they differ. It may be useful to add it as a RocksDB Slice function. Test Plan: db_test Reviewers: sdong, rven, igor Reviewed By: igor Subscribers: jkedgar, dhruba Differential Revision: https://reviews.facebook.net/D46935	2015-09-15 10:32:42 -07:00
sdong	f3170b6f6c	DBImpl::FindObsoleteFiles() shouldn't release mutex between getting min_pending_output and scanning files Summary: Releasing mutex between getting min_pending_output and scanning files may cause min_pending_output to be max but some non-final files are found in file scanning, ending up with deleting wrong files. As a recent regression, mutex can be released while waiting for log sync. We move it to after file scanning. Test Plan: Run all existing tests. Don't think it is easy to write a unit test. Maybe we should find a way to assert lock not released so that we can have some test verification for similar cases. Reviewers: igor, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, kolmike, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D46899	2015-09-14 23:39:30 -07:00
Islam AbdelRahman	7cb314b9e6	Skip some tests in ROCKSDB_LITE Summary: Skip these tests under ROCKSDB_LITE: compaction_job_stats_test, corruption_test, transactions/transaction_test Test Plan: compile using ROCKSDB_LITE Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D46923	2015-09-14 16:44:35 -07:00
sdong	5de807ac16	Add options.hard_pending_compaction_bytes_limit to stop writes if compaction lagging behind Summary: Add an option to stop writes if compaction falls behind. If the estimated pending compaction bytes exceed the threshold specified by options.hard_pending_compaction_bytes_limit, writes stop until compactions bring the pending bytes back under the threshold. Test Plan: Add unit test DBTest.HardLimit Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45999	2015-09-14 12:51:16 -07:00
Siying Dong	592f6bf782	Merge pull request #716 from yuslepukhin/refactor_file_reader_writer_win Refactor to support file_reader_writer on Windows.	2015-09-14 12:29:01 -07:00
Ari Ekmekji	03ddce9a01	Add counters for L0 stall while L0-L1 compaction is taking place Summary: Although there are currently counters to keep track of the stall caused by having too many L0 files, there is no distinction as to whether when that stall occurs either (A) L0-L1 compaction is taking place to try and mitigate it, or (B) no L0-L1 compaction has been scheduled at the moment. This diff adds a counter for (A) so that the nature of L0 stalls can be better understood. Test Plan: make all && make check Reviewers: sdong, igor, anthony, noetzli, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba Differential Revision: https://reviews.facebook.net/D46749	2015-09-14 11:03:37 -07:00
Dmitri Smirnov	ddc8b44998	Address code review comments both GH and internal Fix compilation issues on GCC/CLANG Address Windows Release test build issues due to Sync	2015-09-11 17:36:48 -07:00
Andres Noetzli	34cedaff66	Initialize variable to avoid warning Summary: RocksDB debug version failed to build under gcc-4.8.1 on sandcastle with the following error: ``` db/db_compaction_filter_test.cc:570:33: error: ‘snapshot’ may be used uninitialized in this function [-Werror=maybe-uninitialized] ``` Test Plan: make db_compaction_filter_test && ./db_compaction_filter_test Reviewers: rven, anthony, yhchiang, aekmekji, igor, sdong Reviewed By: igor, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46725	2015-09-11 12:07:54 -07:00
Manuel Ung	aeb4612685	Add counters for seek/next/prev Summary: There are currently no statistics on seeks, only on gets. This adds the following counters: rocksdb.number.db.seek rocksdb.number.db.next rocksdb.number.db.prev (number of calls) rocksdb.db.iterate.bytes.read (number of bytes read from key + value using seek/next/prev) rocksdb.number.keys.seek.found rocksdb.number.keys.next.found rocksdb.number.keys.prev.found (number of calls where seek/next/prev found a value) Test Plan: ./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5 ./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5 -reverse_iterator Reviewers: yhchiang, rven, kradhakrishnan, IslamAbdelRahman, MarkCallaghan, sdong, igor Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D46605	2015-09-11 11:37:44 -07:00
Islam AbdelRahman	45e9e4f0bb	Refactor NewTableReader to accept TableReaderOptions Summary: Refactoring NewTableReader to accept TableReaderOptions This will make it easier to add new options in the future, for example in this diff https://reviews.facebook.net/D46071 Test Plan: run existing tests Reviewers: igor, yhchiang, anthony, rven, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D46179	2015-09-11 11:36:33 -07:00
Andres Noetzli	ddb950f83f	Fixed bug in compaction iterator Summary: During the refactoring, the condition that makes sure that compaction filters are only applied to records newer than the latest snapshot got butchered. This patch fixes the condition and adds a test case. Test Plan: make db_compaction_filter_test && ./db_compaction_filter_test Reviewers: rven, anthony, yhchiang, sdong, aekmekji, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46707	2015-09-11 10:13:49 -07:00
Dmitri Smirnov	30e82d5c41	Refactor to support file_reader_writer on Windows. Summary: A change, https://reviews.facebook.net/differential/diff/224721/, attempted to move common functionality out of platform-dependent code into a new facility called file_reader_writer. This includes: - perf counters - Buffering - RateLimiting However, the change did not attempt to refactor Windows code. To mitigate, we introduce new querying interfaces such as UseOSBuffer(), GetRequiredBufferAlignment() and ReaderWriterForward() for pure forwarding where required. WritableFile gets a new method, Truncate(), which tells the file how much data it holds on close. - When space is pre-allocated on Linux it is implicitly filled with zeros; no such thing exists on Windows, so we must truncate the file on close. - When operating in unbuffered mode the last page is filled with zeros but we still want to truncate. Previously, Close() would take care of it, but now buffer management is shifted to the wrappers and the file has no idea of its true size. This means that Close() at the wrapper level must always include Truncate(), and the wrapper __dtor should call Close() and guard against a double Close(). Move buffered/unbuffered write logic to the wrapper. Utilize Aligned buffer class. Adjust tests and implement Truncate() where necessary. Come up with reasonable defaults for new virtual interfaces. Forward calls for RandomAccessReadAhead class to avoid double buffering and locking (double locking in unbuffered mode on Windows).	2015-09-11 09:57:02 -07:00
Andres Noetzli	c25f6a85bf	Removed __unused__ attribute Summary: The current build is failing on some platforms due to an __unused__ attribute. This patch prevents the problem by using a pattern similar to MergeHelper (assert not on the variable but inside a condition that uses the variable). We should have better error handling in both cases in the future. Test Plan: make clean all check Reviewers: rven, anthony, yhchiang, sdong, igor, aekmekji Reviewed By: aekmekji Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46623	2015-09-10 15:16:32 -07:00
Ari Ekmekji	6db0a939d2	Fix DBCompactionTest failure with parallel L0-L1 compactions Summary: The test SuggestCompactRangeNoTwoLevel0Compactions in DBCompactionTest fails when there are parallel L0-L1 compactions taking place because the test makes sure that only one compaction involving L0 takes place at any given time (since before having parallel compactions this was impossible). I changed the test to only run with DBOptions.max_subcompactions=1 so as to not hit this issue which is not a correctness issue but just an inherent changing of assumptions after introducing parallel compactions. This failed after landing https://reviews.facebook.net/D43269#inline-321303 so now this should fix it Test Plan: make all && make check Reviewers: yhchiang, igor, anthony, noetzli, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D46617	2015-09-10 14:37:00 -07:00
Andres Noetzli	8aa1f15197	Refactored common code of Builder/CompactionJob out into a CompactionIterator Summary: Builder and CompactionJob share a lot of fairly complex code. This patch refactors this code into a separate class, the CompactionIterator. Because the shared code is fairly complex, this patch hopefully improves maintainability. While there is a lot of potential for further improvements, the patch is intentionally pretty close to the original structure because the change is already complex enough. Test Plan: make clean all check && ./db_stress Reviewers: rven, anthony, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46197	2015-09-10 14:35:25 -07:00
Igor Canadi	95ffc5d2bc	Correct ASSERT_OK() in ReadDroppedColumnFamily Summary: ReadDroppedColumnFamily is consistently failing in Travis CI environment (can't repro locally). I suspect it might be failing with non-OK status. This diff will give us more info about the failure. Test Plan: none Reviewers: sdong, kradhakrishnan Reviewed By: kradhakrishnan Subscribers: kradhakrishnan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46611	2015-09-10 14:17:12 -07:00
Ari Ekmekji	3c37b3cccd	Determine boundaries of subcompactions Summary: Up to this point, the subcompactions that make up a compaction job have been divided based on the key range of the L1 files, and each subcompaction has handled the key range of only one file. However DBOption.max_subcompactions allows the user to designate how many subcompactions at most to perform. This patch updates the CompactionJob::GetSubcompactionBoundaries() to determine these divisions accordingly based on that option and other input/system factors. The current approach orders the starting and/or ending keys of certain compaction input files and then generates a histogram to approximate the size covered by the key range between each consecutive pair of keys. Then it groups these ranges into groups so that the sizes are approximately equal to one another. The approach has also been adapted to work for universal compaction as well instead of just for level-based compaction as it was before. These subcompactions are then executed in parallel by locally spawning threads, one for each. The results are then aggregated and the compaction completed. Test Plan: make all && make check Reviewers: yhchiang, anthony, igor, noetzli, sdong Reviewed By: sdong Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43269	2015-09-10 13:50:00 -07:00
krad	1126644082	Relaxing consistency detection to include errors while inserting to memtable as WAL recovery error. Summary: The current code considers data to be consistent if the record checksum passes. We do have customer issues where the record checksum passed but the data was incomprehensible. There is no way to get out of this error case, since every WAL recovery mode treats this error as unrelated to the WAL. This change relaxes the definition: errors while inserting into the memtable are treated as WAL errors and handled according to the recovery level. Test Plan: Used a customer dump to verify the fix for the different levels. The db opens for kSkipAnyCorruptedRecords and kPointInTimeRecovery, but fails for kAbsoluteConsistency and kTolerateCorruptedTailRecords. Reviewers: sdon igor CC: leveldb@ Task ID: #7918721 Blame Rev:	2015-09-10 12:56:17 -07:00
sdong	abc7f5fdb2	Make DBTest.ReadLatencyHistogramByLevel more robust Summary: DBTest.ReadLatencyHistogramByLevel was not written as expected. After writes, reads aren't guaranteed to hit data written. It was not expected. Fix it. Test Plan: Run the test multiple times Reviewers: IslamAbdelRahman, rven, anthony, kradhakrishnan, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D46587	2015-09-10 11:32:19 -07:00
Igor Canadi	ac9bcb55ce	Set max_open_files based on ulimit Summary: We should never set max_open_files to be bigger than the system's ulimit. Otherwise we will get "Too many open files" errors. See an example in this Travis run: https://travis-ci.org/facebook/rocksdb/jobs/79591566 Test Plan: make check I will also verify that max_max_open_files is reasonable. Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46551	2015-09-10 10:49:28 -07:00
agiardullo	b5b2b75e52	better tuning of arena block size Summary: Currently, if users didn't set options.arena_block_size, we set "result.arena_block_size = result.write_buffer_size / 10". It makes result.arena_block_size not a multiple of 4KB, even if options.write_buffer_size is a multiple of MBs. When calling malloc with arena_block_size, we may waste a small amount of memory for it. We now make the default to be /8 or /16 and align it to 4KB. Test Plan: unit tests Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46467	2015-09-08 20:53:32 -07:00
sdong	342ba80895	Make DBTest.OptimizeFiltersForHits more deterministic Summary: This commit makes DBTest.OptimizeFiltersForHits more deterministic by: (1) make key inserts more random (2) make sure L0 has one file (3) make file size smaller compared to level target so L1 will cover more range. Test Plan: Run the test many times. Reviewers: rven, IslamAbdelRahman, kradhakrishnan, igor, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D46461	2015-09-08 19:31:34 -07:00
Andres Notzli	e17e92ea19	Relaxed assert in forward iterator Summary: It looks like in some cases an assert in SeekInternal failed when computing the hints for the next level because user_key was the same as the largest key and not strictly smaller. Relaxing the assert to expect smaller or equal keys. Test Plan: make clean all check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46443	2015-09-08 17:15:11 -07:00
Andres Noetzli	6bdc484fd8	Added Equal method to Comparator interface Summary: In some cases, equality comparisons can be done more efficiently than three-way comparisons. There are quite a few places in the code where we only care about equality. This patch adds an Equal() method that defaults to using the Compare() method. Test Plan: make clean all check Reviewers: rven, anthony, yhchiang, igor, sdong Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46233	2015-09-08 15:30:49 -07:00
Andres Noetzli	3a0df7f161	Fixed comparison in ForwardIterator when computing hint for GetNextLevelIndex() Summary: When computing the hint for GetNextLevelIndex(), ForwardIterator was doing a redundant comparison. This patch fixes the comparison (using https://github.com/facebook/rocksdb/blob/master/db/version_set.cc#L158 as a reference) and moves it inside an assert because we expect `level_files[f_idx]` to contain the next key after Seek(), so user_key should always be smaller than the largest key. Test Plan: make clean all check Reviewers: rven, anthony, yhchiang, igor, sdong Reviewed By: sdong Subscribers: tnovak, sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D46227	2015-09-08 09:47:54 -07:00
Venkatesh Radhakrishnan	91f3c90792	Fix case when forward iterator misses a new update Summary: This diff fixes a case when the forward iterator misses a new insert when the mutable iterator is not current. The test is also improved and the check for deleted iterators is made more informative. Test Plan: DBTailingIteratorTest.*Trim Reviewers: tnovak, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D46167	2015-09-04 14:28:45 -07:00
Andres Noetzli	3c9cef1eed	Unified maps with Comparator for sorting, other cleanup Summary: This diff is a collection of cleanups that were initially part of D43179. Additionally it adds a unified way of defining key-value maps that use a Comparator for sorting (this was previously implemented in four different places). Test Plan: make clean check all Reviewers: rven, anthony, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45993	2015-09-02 13:58:22 -07:00
sdong	3e0a672c50	Bug fix: table readers created by TableCache::Get() don't have latency histogram reported Summary: TableCache::Get() puts parameters in the wrong places so that table readers created by Get() will not have the histogram updated. Test Plan: Will write a unit test for that. Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D46035	2015-09-02 12:57:07 -07:00
Tomislav Novak	5508122ed6	Fix a perf regression in ForwardIterator Summary: I noticed that memtable iterator usually crosses the `iterate_upper_bound` threshold when tailing. Changes introduced in D43833 made `NeedToSeekImmutable` always return true in such case, even when `Seek()` only needs to rewind the memtable iterator. In a test I ran, this caused the "tailing efficiency" (ratio of calls to `Seek()` that only affect the memtable versus all seeks) to drop almost to zero. This diff attempts to fix the regression by using a different flag to indicate that `current_` is over the limit instead of resetting `valid_` in `UpdateCurrent()`. Test Plan: `DBTestTailingIterator.TailingIteratorUpperBound` Reviewers: sdong, rven Reviewed By: rven Subscribers: dhruba, march Differential Revision: https://reviews.facebook.net/D45909	2015-09-01 09:54:30 -07:00
Andres Notzli	b722007778	Fix listener_test when using ROCKSDB_MALLOC_USABLE_SIZE Summary: Flushes in listener_test happened too early when ROCKSDB_MALLOC_USABLE_SIZE was active (e.g. when compiling with ROCKSDB_FBCODE_BUILD_WITH_481=1) due to malloc_usable_size() reporting a better estimate (similar to https://reviews.facebook.net/D43317 ). This patch grows the write buffer size slightly to compensate for this. Test Plan: ROCKSDB_FBCODE_BUILD_WITH_481=1 make listener_test && ./listener_test Reviewers: rven, anthony, yhchiang, igor, sdong Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45921	2015-08-31 23:11:12 -07:00
agiardullo	18db1e4695	better db_bench options for transactions Summary: Pessimistic Transaction expiration time checking currently causes a performance regression. Let's disable it in db_bench by default. Also, in order to be able to better tune how much contention we're simulating, added new options to set lock timeout and snapshot. Test Plan: run db_bench randomtransaction Reviewers: sdong, igor, yhchiang, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45831	2015-08-31 15:56:07 -07:00
Ari Ekmekji	8b689546b6	Add Subcompactions to Universal Compaction Unit Tests Summary: Now that the approach to parallelizing L0-L1 level-based compactions by breaking the compaction job into subcompactions is being extended to apply to universal compactions as well, the unit tests need to account for this and run the universal compaction tests with subcompactions both enabled and disabled. Test Plan: make all && make check Reviewers: sdong, igor, noetzli, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45657	2015-08-31 12:59:02 -07:00
sdong	3d78eb66bb	Arena usage to be calculated using malloc_usable_size() Summary: malloc_usable_size() gets a better estimation of memory usage. It is already used to calculate block cache memory usage. Use it in arena too. Test Plan: Run all unit tests Reviewers: anthony, kradhakrishnan, rven, IslamAbdelRahman, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43317	2015-08-31 09:39:27 -07:00
Andres Noetzli	effd9dd1e1	Fix deadlock in WAL sync Summary: MarkLogsSynced() was doing `logs_.erase(it++);`. The standard is saying: ``` all iterators and references are invalidated, unless the erased members are at an end (front or back) of the deque (in which case only iterators and references to the erased members are invalidated) ``` Because `it` is an iterator to the first element of the container, it is invalidated, only one iteration is executed and `log.getting_synced = false;` is not being done, so `while (logs_.front().getting_synced)` in `WriteImpl()` is not terminating. Test Plan: make db_bench && ./db_bench --benchmarks=fillsync Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang, sdong, tnovak Reviewed By: tnovak Subscribers: kolmike, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45807	2015-08-28 18:06:32 -07:00
Andres Noetzli	72a9b73c9e	Removed unnecessary checks in DBTest.ApproximateMemoryUsage Summary: Just realized that after D45675, part of the code in DBTest.ApproximateMemoryUsage, does not really test anything anymore, so I removed it. Test Plan: make clean all check Reviewers: rven, igor, sdong, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45783	2015-08-28 11:13:20 -07:00
Venkatesh Radhakrishnan	cb164bfc48	Do not delete iterators for immutable memtables. Summary: The immutable memtable iterators are allocated from an arena and there is no benefit from deleting these. Also the immutable memtables themselves will continue to be in memory as long as the version set containing them is alive. We will not remove immutable memtable iterators over the upper bound. We now add immutable iterators to the test. Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext Reviewers: tnovak, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45597	2015-08-28 11:07:07 -07:00
sdong	7a0dbdf3ac	Add ZSTD (not final format) compression type Summary: Add ZSTD compression type. The same way as adding LZ4. Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests. Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor Reviewed By: igor Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45747	2015-08-28 11:01:13 -07:00
Andres Noetzli	e853191c17	Fix DBTest.ApproximateMemoryUsage Summary: This patch fixes two issues in DBTest.ApproximateMemoryUsage: - It was possible that a flush happened between getting the two properties in Phase 1, resulting in different numbers for the properties and failing the assertion. This is fixed by waiting for the flush to finish before getting the properties. - There was a similar issue in Phase 2 and additionally there was an issue that rocksdb.size-all-mem-tables was not monotonically increasing because it was possible that a flush happened just after getting the properties and then another flush just before getting the properties in the next round. In this situation, the reported memory usage decreased. This is fixed by forcing a flush before getting the properties. Note: during testing, I found that kFlushesPerRound does not seem very accurate. I added a TODO for this and it would be great to get some input on what to do there. Test Plan: The first issue can be made more likely to trigger by inserting a `usleep(10000);` between the calls to GetIntProperty() in Phase 1. The second issue can be made more likely to trigger by inserting a `if (r != 0) usleep(10000);` before the calls to GetIntProperty() and a `usleep(10000);` after the calls. Then execute make db_test && ./db_test --gtest_filter=DBTest.ApproximateMemoryUsage Reviewers: rven, yhchiang, igor, sdong, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45675	2015-08-27 16:17:08 -07:00
Yueh-Hsuan Chiang	8ef0144e2f	Add argument --show_table_properties to db_bench Summary: Add argument --show_table_properties to db_bench -show_table_properties (If true, then per-level table properties will be printed on every stats-interval when stats_interval is set and stats_per_interval is on.) type: bool default: false Test Plan: ./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1 ./db_bench --show_table_properties=1 --stats_interval=100000 --stats_per_interval=1 --num_column_families=2 Sample Output: Compaction Stats [column_family_name_000001] Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt) KeyIn KeyDrop --------------------------------------------------------------------------------------------------------------------------------------------------------------------- L0 3/0 5 0.8 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 86.3 0 17 0.021 0 0 0 L1 5/0 9 0.9 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 0 L2 9/0 16 0.2 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0.000 0 0 0 Sum 17/0 31 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 86.3 0 17 0.021 0 0 0 Int 0/0 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 83.9 0 2 0.022 0 0 0 Flush(GB): cumulative 0.030, interval 0.004 Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard Level[0]: # data blocks=2571; # entries=84813; raw key size=2035512; raw average key size=24.000000; raw value size=8481300; raw average value size=100.000000; data block size=5690119; index block size=82415; filter block size=0; (estimated) table size=5772534; filter policy name=N/A; Level[1]: # data blocks=4285; # entries=141355; raw key size=3392520; raw average key size=24.000000; raw value size=14135500; raw average value size=100.000000; data block size=9487353; index block size=137377; filter block size=0; (estimated) table size=9624730; filter policy name=N/A; Level[2]: 
# data blocks=7713; # entries=254439; raw key size=6106536; raw average key size=24.000000; raw value size=25443900; raw average value size=100.000000; data block size=17077893; index block size=247269; filter block size=0; (estimated) table size=17325162; filter policy name=N/A; Level[3]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Level[4]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Level[5]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Level[6]: # data blocks=0; # entries=0; raw key size=0; raw average key size=0.000000; raw value size=0; raw average value size=0.000000; data block size=0; index block size=0; filter block size=0; (estimated) table size=0; filter policy name=N/A; Reviewers: anthony, IslamAbdelRahman, MarkCallaghan, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45651	2015-08-26 18:27:23 -07:00
Igor Canadi	5f4166c90e	ReadaheadRandomAccessFile -- userspace readahead Summary: ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS. We add ReadaheadRandomAccessFile layer only when file is read during compactions. D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff. Test Plan: make check Reviewers: MarkCallaghan, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45123	2015-08-26 15:25:59 -07:00
sdong	d286b5df90	DBIter to filter out extra keys with higher sequence numbers when changing direction from forward to backward Summary: When DBIter changes iterating direction from forward to backward, it might see some much larger keys with higher sequence ID. With this commit, these rows will be actively filtered out. It should fix existing disabled tests in db_iter_test. This may not be a perfect fix, but it introduces the least impact on existing code, in order to be safe. Test Plan: Enable existing tests and make sure they pass. Add a new test DBIterWithMergeIterTest.InnerMergeIteratorDataRace8. Also run all existing tests. Reviewers: yhchiang, rven, anthony, IslamAbdelRahman, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D45567	2015-08-26 13:01:39 -07:00
Andres Noetzli	3795449c9d	Fix DBTest.GetProperty Summary: DBTest.GetProperty was failing occasionally (see task #8131266). The reason was that the test closed the database before the compaction was done. When the test reopened the database, RocksDB would schedule a compaction which in turn created table readers and led the test to fail the assertion that rocksdb.estimate-table-readers-mem is 0. In most cases, GetIntProperty() of rocksdb.estimate-table-readers-mem happened before the compaction created the table readers, hiding the problem. This patch changes the WaitForFlushMemTable() to WaitForCompact(). WaitForFlushMemTable() is not necessary because it is already being called a couple of lines before without any insertions in-between. Test Plan: Insert `usleep(10000);` just after `Reopen(options);` on line 2333 to make the issue more likely, then run: make db_test && while ./db_test --gtest_filter=DBTest.GetProperty; do true; done Reviewers: rven, yhchiang, anthony, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45603	2015-08-26 10:10:26 -07:00
Igor Canadi	a7834a1292	Merge pull request #698 from yuslepukhin/address_noexcept_windows Address noexcept and const integer lambda capture on win	2015-08-25 17:15:23 -07:00
Dmitri Smirnov	6924d7582b	Address noexcept and const integer lambda capture VS 2013 does not support noexcept. Complains about usage of integer constant within lambda requiring explicit capture.	2015-08-25 15:17:14 -07:00
Ari Ekmekji	2f8d71ec05	Moving sequence number compaction variables from SubCompactionState to CompactionJob Summary: It was pointed out to me that the members of SubCompactionState 'earliest_snapshot', 'latest_snapshot' and 'visible_at_tip' are never modified by the subcompactions, so they can stay as global variables instead to make things simpler. Test Plan: make all && make check Reviewers: sdong, igor, noetzli, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45477	2015-08-25 14:03:10 -07:00
Venkatesh Radhakrishnan	bab9934d9e	Fix build failure caused by bad merge. Summary: There was a bad merge during refresh. Test Plan: make -j all; make check Reviewers: sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45555	2015-08-25 14:02:03 -07:00
Venkatesh Radhakrishnan	4d28a7d8ab	Add a whitebox test for deleted file iterators. Summary: We have earlier added a feature to delete file iterators when the current key is over the iterate upper bound. We now add a whitebox test to check if the file iterators were actually deleted. Test Plan: Add check for a range which has deleted iterators. Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45321	2015-08-25 13:40:58 -07:00
Venkatesh Radhakrishnan	249fb4f881	Fix use of deleted file iterators with incomplete iterators Summary: After deleting file iterators which are over the iterate upper bound, we also need to check for null pointers in ResetIncompleteIterators. Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext Reviewers: tnovak, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45525	2015-08-25 13:38:35 -07:00
Andres Notzli	09d982f9e0	Fix compact_files_example Summary: See task #7983654. The example was triggering an assert in compaction job because the compaction was not marked as manual. With this patch, CompactionPicker::FormCompaction() marks compactions as manual. This patch also fixes a couple of typos, adds optimistic_transaction_example to .gitignore and librocksdb as a dependency for examples. Adding librocksdb as a dependency makes sure that the examples are built with the latest changes in librocksdb. Test Plan: make clean && cd examples && make all && ./compact_files_example Reviewers: rven, sdong, anthony, igor, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45117	2015-08-25 12:29:44 -07:00
Yueh-Hsuan Chiang	6996de87af	Expose per-level aggregated table properties via GetProperty() Summary: This patch adds "rocksdb.aggregated-table-properties" and "rocksdb.aggregated-table-properties-at-levelN". The former returns the aggregated table properties of a column family, while the latter returns the aggregated table properties of the specified level N. Test Plan: Added tests in db_test Reviewers: igor, sdong, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45087	2015-08-25 12:03:54 -07:00
Andres Noetzli	2050832974	Fixing race condition in DBTest.DynamicMemtableOptions Summary: This patch fixes a race condition in DBTest.DynamicMemtableOptions. In rare cases, it was possible that the main thread would fill up both memtables before the flush job acquired its work. Then, the flush job was flushing both memtables together, producing only one L0 file while the test expected two. Now, the test waits for flushes to finish earlier, to make sure that the memtables are flushed in separate flush jobs. Test Plan: Insert "usleep(10000);" after "IOSTATS_SET_THREAD_POOL_ID(Env::Priority::HIGH);" in BGWorkFlush() to make the issue more likely. Then test with: make db_test && time while ./db_test --gtest_filter=*DynamicMemtableOptions; do true; done Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45429	2015-08-24 17:04:18 -07:00
Igor Canadi	e46bcc08b9	Remove an extra 's' from cur-size-all-mem-tabless Summary: As title Test Plan: make check Reviewers: yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45447	2015-08-24 16:43:18 -07:00
Igor Canadi	4ab26c5ad1	Smarter purging during flush Summary: Currently, we only purge duplicate keys and deletions during flush if `earliest_seqno_in_memtable <= newest_snapshot`. This means that the newest snapshot happened before we first created the memtable. This is almost never true for MyRocks and MongoRocks. This patch makes purging during flush able to understand snapshots. The main logic is copied from compaction_job.cc, although the logic over there is much more complicated and extensive. However, we should try to merge the common functionality at some point. I need this patch to implement no_overwrite_i_promise functionality for flush. We'll also need this to support SingleDelete() during Flush(). @yoshinorim requested the feature. Test Plan: make check I had to adjust some unit tests to understand this new behavior Reviewers: yhchiang, yoshinorim, anthony, sdong, noetzli Reviewed By: noetzli Subscribers: yoshinorim, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42087	2015-08-24 11:11:12 -07:00
Ari Ekmekji	b6def58f73	Changed 'num_subcompactions' to the more accurate 'max_subcompactions' Summary: Up until this point we had DbOptions.num_subcompactions, but it is semantically more correct to call this max_subcompactions since we will schedule up to DbOptions.max_subcompactions smaller compactions at a time during a compaction job. I also added a --subcompactions option to db_bench Test Plan: make all make check Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D45069	2015-08-21 14:25:34 -07:00
sdong	c852968465	db_iter_test: add more test cases for the data race bug Summary: Add more test cases of data race causing wrong iterating results. Tag tests not passing as DISABLED_ Test Plan: Run the tests Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, yhchiang Reviewed By: yhchiang Subscribers: tnovak, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44907	2015-08-21 12:14:12 -07:00
sdong	9130873a13	Add options.new_table_reader_for_compaction_inputs Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. It makes fadvise work less predictably. Add options.new_table_reader_for_compaction_inputs to force creation of a new file descriptor and new table reader for it. Test Plan: Add the option. Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang Reviewed By: igor Subscribers: igor, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43311	2015-08-21 08:46:29 -07:00
sdong	07d2d34160	Add a counter about estimated pending compaction bytes Summary: Add a counter of estimated bytes the DB needs to compact for all the compactions to finish. Expose it as a DB Property. In the future, we can use a threshold on this counter to replace soft rate limit and hard rate limit. A single threshold of estimated compaction debt in bytes will be easier for users to reason about when deciding when to slow down and stop than more abstract soft and hard rate limits. Test Plan: Add unit tests Reviewers: IslamAbdelRahman, yhchiang, rven, kradhakrishnan, anthony, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44205	2015-08-20 22:17:10 -07:00
Yueh-Hsuan Chiang	a203b913c1	Fixed a rare deadlock in DBTest.ThreadStatusFlush Summary: Currently, ThreadStatusFlush uses two sync-points to ensure there's a flush currently running when calling GetThreadList(). However, one of the sync-points is inside db-mutex, which could cause deadlock in case there's a DB::Get() call. This patch fixes this issue by moving the sync-point to a better place where the flush job does not hold the mutex. Test Plan: db_test Reviewers: igor, sdong, anthony, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D45045	2015-08-20 17:18:47 -07:00
Siying Dong	962aa64292	Merge pull request #695 from yuslepukhin/address_windows_build Address windows build issues caused by introducing Subcompaction	2015-08-20 17:04:48 -07:00
Dmitri Smirnov	5bf8907622	More indent adjustment.	2015-08-20 14:14:02 -07:00
Dmitri Smirnov	e2a9f43d64	Adjust indent	2015-08-20 14:10:51 -07:00
Dmitri Smirnov	1cac89c9b1	Address windows build issues Intro SubCompactionState move functionality =delete copy functionality #ifdef SyncPoint in tests for Windows Release builds	2015-08-20 14:08:24 -07:00
Islam AbdelRahman	027ca5b2cd	Total SST files size DB Property Summary: Add a new DB property that calculates the total size of files used by all RocksDB Versions Test Plan: Unittests for the new property Reviewers: igor, yhchiang, anthony, rven, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44799	2015-08-20 11:47:19 -07:00
Andres Noetzli	b604d2562f	Removing unused variables to fix build Summary: Removing two unused variables that prevented compilation. Test Plan: make all Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D44991	2015-08-19 16:57:40 -07:00
Venkatesh Radhakrishnan	1b114eed4d	Free file iterators for files which are above the iterate upper bound to Improve memory utilization Summary: This diff improves the memory utilization for tailing iterators in RocksDB, by freeing file iterators which are over the upper bound. It is an update to Siying's original diff for improving the memory usage for tailing iterators. The changes for the seek and next path are now complete and a test has been added to exercise these paths while deleting file iterators which are above the upper bound. Test Plan: db_tailing_iter_test.TailingIteratorTrimSeekToNext Reviewers: march, tnovak, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43833	2015-08-19 16:05:51 -07:00
Islam AbdelRahman	3fd70b05b8	Rate limit deletes issued by DestroyDB Summary: Update DestroyDB so that all SST files in the first path id go through DeleteScheduler instead of being deleted immediately Test Plan: added a unittest Reviewers: igor, yhchiang, anthony, kradhakrishnan, rven, sdong Reviewed By: sdong Subscribers: jeanxu2012, dhruba Differential Revision: https://reviews.facebook.net/D44955	2015-08-19 15:02:17 -07:00
Yueh-Hsuan Chiang	df79eafcb3	Introduce GetIntProperty("rocksdb.size-all-mem-tables") Summary: Currently, GetIntProperty("rocksdb.cur-size-all-mem-tables") only returns the memory usage by those memtables which have not yet been flushed. This patch introduces GetIntProperty("rocksdb.size-all-mem-tables"), which includes the memory usage by all the memtables, includes those have been flushed but pinned by iterators. Test Plan: Added a test in db_test Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D44229	2015-08-19 13:32:09 -07:00
sdong	888fbdc889	Remove the constraint that iterator upper bound needs to be within a prefix Summary: There is a check to fail the iterator if prefix extractor is specified but upper bound is out of the prefix for the seek key. Relax this constraint to allow users to set upper bound to the next prefix of the current one. Test Plan: make commit-prereq Reviewers: igor, anthony, kradhakrishnan, yhchiang, rven Reviewed By: rven Subscribers: tnovak, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44949	2015-08-19 11:03:51 -07:00
Ari Ekmekji	137c376675	Removing variables used only in assertions to prevent build error Summary: A couple variables were declared but only used in assertions which causes issues when building in fbcode. Test Plan: make dbg and make release Reviewers: yhchiang, sdong, igor, anthony, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44937	2015-08-19 08:52:22 -07:00
Ari Ekmekji	b47cc58516	Bounding Number of Subcompactions Summary: In D43239 (https://reviews.facebook.net/D43239) the number of subcompactions is set based on the number of L1 files with unique starting keys. In certain cases when this number is very large this causes issues, particularly with the overlap between files since very small output files can be generated. This diff bounds the number of subcompactions to the user option DBOption.num_subcompactions. Test Plan: ./db_test ./db_compaction_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44883	2015-08-18 14:56:31 -07:00
Venkatesh Radhakrishnan	e58e1b18e7	Make tailing iterator show new entries in memtable. Summary: Reseek mutable_iter if it is invalid in Next and immutable_iter is invalid. Test Plan: DBTestTailingIterator.TailingIteratorSeekToNext Reviewers: tnovak, march, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D44865	2015-08-18 14:40:06 -07:00
Ari Ekmekji	601b1aaca0	Fixing Failed Assertion in Subcompaction State Diff Summary: In D43239 (https://reviews.facebook.net/D43239) there is an assertion to make sure a subcompaction's output is never empty at the end of execution. This assertion however breaks the build because some tests lead to exactly that scenario. So instead I have altered the logic to handle this case instead of just failing the assertion. The reason that it is possible for a subcompaction's output to be empty is that during a sequential execution of subcompactions, if a user aborts the compaction job then some of the later subcompactions to be executed may have yet to process any keys and therefore have yet to generate output files. This becomes very rare once the subcompactions are executed in parallel, but for now they are still sequential so the case is possible when there is an early termination, as in some of the tests. Test Plan: ./db_test ./db_compaction_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D44877	2015-08-18 12:27:12 -07:00
Ari Ekmekji	f0da6977a3	[Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State Summary: In preparation for running multiple threads at the same time during a compaction job, this patch assigns each subcompaction its own state (instead of sharing the one global CompactionState). Each subcompaction then uses this state to update its statistics, keep track of its snapshots, etc. during the course of execution. Then at the end of all the executions the statistics are aggregated across the subcompactions so that the final result is the same as if only one larger compaction had run. Test Plan: ./db_test ./db_compaction_test ./compaction_job_test Reviewers: sdong, anthony, igor, noetzli, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43239	2015-08-18 11:06:23 -07:00
Andres Notzli	f32a572099	Simplify querying of merge results Summary: While working on supporting mixing merge operators with single deletes ( https://reviews.facebook.net/D43179 ), I realized that returning and dealing with merge results can be made simpler. Submitting this as a separate diff because it is not directly related to single deletes. Before, callers of merge helper had to retrieve the merge result in one of two ways depending on whether the merge was successful or not (success = result of merge was single kTypeValue). For successful merges, the caller could query the resulting key/value pair and for unsuccessful merges, the result could be retrieved in the form of two deques of keys and values. However, with single deletes, a successful merge does not return a single key/value pair (if merge operands are merged with a single delete, we have to generate a value and keep the original single delete around to make sure that we are not accidentally producing a key overwrite). In addition, the two existing call sites of the merge helper were taking the same actions independently from whether the merge was successful or not, so this patch simplifies that. Test Plan: make clean all check Reviewers: rven, sdong, yhchiang, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43353	2015-08-17 17:34:38 -07:00
sdong	72613657f0	Measure file read latency histogram per level Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled. Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44193	2015-08-14 17:32:42 -07:00
Nathan Bronson	b7198c3afe	reduce db mutex contention for write batch groups Summary: This diff allows a Writer to join the next write batch group without acquiring any locks. Waiting is performed via a per-Writer mutex, so all of the non-leader writers never need to acquire the db mutex. It is now possible to join a write batch group after the leader has been chosen but before the batch has been constructed. This diff doesn't increase parallelism, but reduces synchronization overheads. For some CPU-bound workloads (no WAL, RAM-sized working set) this can substantially reduce contention on the db mutex in a multi-threaded environment. With T=8 N=500000 in a CPU-bound scenario (see the test plan) this is good for a 33% perf win. Not all scenarios see such a win, but none show a loss. This code is slightly faster even for the single-threaded case (about 2% for the CPU-bound scenario below). Test Plan: 1. unit tests 2. COMPILE_WITH_TSAN=1 make check 3. stress high-contention scenarios with db_bench -benchmarks=fillrandom -threads=$T -batch_size=1 -memtablerep=skip_list -value_size=0 --num=$N -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000 Reviewers: sdong, igor, rven, ljin, yhchiang Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43887	2015-08-14 10:55:43 -07:00
sdong	603b6da8b8	Add options.compaction_measure_io_stats to print write I/O stats in compactions Summary: Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs: 2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]} Add two more counters in iostats_context. Also add a parameter of db_bench. Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D44115	2015-08-13 16:52:26 -07:00
sdong	4637207120	Add test case to repro the mispositional iterator in a low-chance data race case Summary: Iterator has a bug: if a child iterator reaches its end, and user issues a Prev(), and just before SeekToLast() of the child iterator is called, some extra rows are added at the end, the position of iterator can be misplaced. Test Plan: Run the tests with or without valgrind Reviewers: rven, yhchiang, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: tnovak, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43671	2015-08-12 10:50:52 -07:00
agiardullo	0db807ec28	Transaction error statuses Summary: Based on feedback from spetrunia, we should better differentiate error statuses for transaction failures. https://github.com/MySQLOnRocksDB/mysql-5.6/issues/86#issuecomment-124605954 Test Plan: unit tests Reviewers: rven, kradhakrishnan, spetrunia, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43323	2015-08-11 17:52:56 -07:00
agiardullo	c2f2cb0214	Pessimistic Transactions Summary: Initial implementation of Pessimistic Transactions. This diff contains the api changes discussed in D38913. This diff is pretty large, so let me know if people would prefer to meet up to discuss it. MyRocks folks: please take a look at the API in include/rocksdb/utilities/transaction[_db].h and let me know if you have any issues. Also, you'll notice a couple of TODOs in the implementation of RollbackToSavePoint(). After chatting with Siying, I'm going to send out a separate diff for an alternate implementation of this feature that implements the rollback inside of WriteBatch/WriteBatchWithIndex. We can then decide which route is preferable. Next, I'm planning on doing some perf testing and then integrating this diff into MongoRocks for further testing. Test Plan: Unit tests, db_bench parallel testing. Reviewers: igor, rven, sdong, yhchiang, yoshinorim Reviewed By: sdong Subscribers: hermanlee4, maykov, spetrunia, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40869	2015-08-11 17:52:23 -07:00
Islam AbdelRahman	c2868cbc52	Use manual_compaction for compaction_job_test Summary: Under certain conditions (disable compression) the compactions that are created in compaction_job_test will pass the trivial_move conditions. This will cause problems since we assert that we don't run a compaction if it's a trivial move: https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L144-L147 For example, when we disable compression, compactions become a valid trivial move and the assert fails: https://ci-builds.fb.com/view/rocksdb/job/rocksdb_no_compression/180/console Test Plan: compaction_job_test Reviewers: sdong, yhchiang, noetzli, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43983	2015-08-11 14:47:14 -07:00
Islam AbdelRahman	cee1e8a080	Parallelize LoadTableHandlers Summary: Add a new option that allows LoadTableHandlers to use multiple threads to load files on DB Open and Recover Test Plan: make check -j64 COMPILE_WITH_TSAN=1 make check -j64 DISABLE_JEMALLOC=1 make all valgrind_check -j64 (still running) Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43755	2015-08-11 12:19:56 -07:00
Andres Notzli	4249f159d5	Removing duplicate code in db_bench/db_stress, fixing typos Summary: While working on single delete support for db_bench, I realized that db_bench/db_stress contain a bunch of duplicate code related to compression and found some typos. This patch removes duplicate code, typos and a redundant #ifndef in internal_stats.cc. Test Plan: make db_stress && make db_bench && ./db_bench --benchmarks=compress,uncompress Reviewers: yhchiang, sdong, rven, anthony, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43965	2015-08-11 11:46:15 -07:00
Nathan Bronson	1ae27113c7	reduce comparisons by skiplist Summary: Key comparison is the single largest CPU user for CPU-bound workloads. This diff reduces the number of comparisons in two ways. The first is that it moves predecessor array gathering from FindGreaterOrEqual to FindLessThan, so that FindGreaterOrEqual can return immediately if compare_ returns 0. As part of this change I moved the sequential insertion optimization into Insert, to remove the undocumented (and smelly) requirement that prev must be equal to prev_ if it is non-null. The second optimization is that all of the search functions skip calling compare_ when moving to a lower level that has the same Next pointer. With a branching factor of 4 we would expect this to happen 1/4 of the time. On a single-threaded CPU-bound workload (-benchmarks=fillrandom -threads=1 -batch_size=1 -memtablerep=skip_list -value_size=0 --num=1600000 -level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999 -disable_auto_compactions --max_write_buffer_number=8 -max_background_flushes=8 --disable_wal --write_buffer_size=160000000) on my dev server this is good for a 7% perf win. Test Plan: unit tests Reviewers: rven, ljin, yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43233	2015-08-11 11:25:22 -07:00
Islam AbdelRahman	a9dcc0a638	Fix clang build Summary: https://ci-builds.fb.com/view/rocksdb/job/rocksdb_clang_build/893/console Fixing clang build Test Plan: make clean USE_CLANG=1 make all -j64 Reviewers: sdong, noetzli, yhchiang, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43959	2015-08-10 11:30:36 -07:00
Andres Notzli	68f934355a	Better CompactionJob testing Summary: Changed compaction_job_test to support better/more thorough tests and added two tests. Also changed MockFileContents to order using InternalKeyComparator. Test Plan: make compaction_job_test && ./compaction_job_test; make all && make check Reviewers: sdong, rven, igor, yhchiang, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42837	2015-08-07 21:59:51 -07:00
agiardullo	16ea1c7d1c	simple ManagedSnapshot wrapper Summary: Implemented this simple wrapper for something else I was working on. Seemed like it makes sense to expose it instead of burying it in some random code. Test Plan: added test Reviewers: rven, kradhakrishnan, sdong, yhchiang Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43293	2015-08-06 17:59:05 -07:00
sdong	6a4aaadcd7	Avoid type unique_ptr in LogWriterNumber::writer for Windows build break Summary: Visual Studio complains about deque<LogWriterNumber> because LogWriterNumber is non-copyable for its unique_ptr member writer. Move away from it, and do explicit free. It is less safe but I can't think of a better way to unblock it. Test Plan: valgrind check test Reviewers: anthony, IslamAbdelRahman, kolmike, rven, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43647	2015-08-06 10:52:41 -07:00
Andres Noetzli	d7314ba759	Fixing endless loop if seeking to end of key with seq num 0 Summary: When seeking to the last occurrence of a key with sequence number 0, db_iter ends up in an endless loop because it seeks to type kValueTypeForSeek which is larger than kTypeDeletion/kTypeValue. Added test case that triggers the behavior. Test Plan: make clean all check Reviewers: igor, rven, anthony, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43653	2015-08-06 10:43:28 -07:00
Islam AbdelRahman	29b028b0ed	Make DeleteScheduler tests more reliable Summary: Update DeleteScheduler tests so that they verify the used penalties for waiting instead of measuring the time spent which is not reliable Test Plan: make -j64 delete_scheduler_test && ./delete_scheduler_test COMPILE_WITH_TSAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test COMPILE_WITH_ASAN=1 make -j64 delete_scheduler_test && ./delete_scheduler_test make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" COMPILE_WITH_ASAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.RateLimitedDelete:DBTest.DeleteSchedulerMultipleDBPaths" Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43635	2015-08-05 19:16:52 -07:00
Poornima Chozhiyath Raman	7d364d0d94	Fix build failure Summary: fix the build failure Test Plan: make all Reviewers: sdong, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43623	2015-08-05 16:38:12 -07:00
Poornima Chozhiyath Raman	960d936e83	Add function 'GetInfoLogList()' Summary: The list of info log files of a db can be obtained using the new function. Test Plan: New test in db_test.cc passed. Reviewers: yhchiang, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: IslamAbdelRahman, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D41715	2015-08-05 16:16:46 -07:00
sdong	7ccd1c80a7	Add two unit tests for SyncWAL() Summary: Add two unit tests for SyncWAL(). One makes sure SyncWAL() doesn't block writes in the other thread. Another one makes sure SyncWAL() doesn't wait ongoing writes to finish before being executed. Create a new test file db_wal_test and move two WAL related tests from db_test to here. Test Plan: Run the new tests Reviewers: IslamAbdelRahman, rven, kradhakrishnan, kolmike, tnovak, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43605	2015-08-05 14:27:02 -07:00
sdong	3ae386eafe	Add statistic histogram "rocksdb.sst.read.micros" Summary: Measure read latency histogram and put in statistics. Compaction inputs are excluded from it when possible (unfortunately usually not possible, as we usually take the table reader from the table cache). Test Plan: Run db_bench and it shows the stats, like: rocksdb.sst.read.micros statistics Percentiles :=> 50 : 1.238522 95 : 2.529740 99 : 3.912180 Reviewers: kradhakrishnan, rven, anthony, IslamAbdelRahman, MarkCallaghan, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43275	2015-08-05 13:02:33 -07:00
Islam AbdelRahman	9aec75fbb9	Enable DBTest.FlushSchedule under TSAN Summary: This patch will fix the false positive of DBTest.FlushSchedule under TSAN, we dont need to disable this test Test Plan: COMPILE_WITH_TSAN=1 make -j64 db_test && ./db_test --gtest_filter="DBTest.FlushSchedule" Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43599	2015-08-05 11:47:07 -07:00
sdong	8e01bd1144	Fix misplaced position for reversing iterator direction while current key is a merge Summary: While doing forward iterating, if current key is merge, internal iterator position is placed to the next key. If Prev() is called now, needs to do extra Prev() to recover the location. This is second attempt of fixing after reverting `ec70fea4c4`. This time shrink the fix to only merge key is the current key and avoid the reseeking logic for max_iterating skipping Test Plan: enable the two disabled tests and make sure they pass Reviewers: rven, IslamAbdelRahman, kradhakrishnan, tnovak, yhchiang Reviewed By: yhchiang Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D43557	2015-08-05 11:08:50 -07:00
Andres Notzli	c465071029	Removing duplicate code Summary: While working on https://reviews.facebook.net/D43179 , I found duplicate code in the tests. This patch removes it. Test Plan: make clean all check Reviewers: igor, sdong, rven, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43263	2015-08-05 07:33:27 -07:00
Mike Kolupaev	e06cf1a098	[wal changes 3/3] method in DB to sync WAL without blocking writers Summary: Subj. We really need this feature. Previous diff D40899 has most of the changes to make this possible, this diff just adds the method. Test Plan: `make check`, the new test fails without this diff; ran with ASAN, TSAN and valgrind. Reviewers: igor, rven, IslamAbdelRahman, anthony, kradhakrishnan, tnovak, yhchiang, sdong Reviewed By: sdong Subscribers: MarkCallaghan, maykov, hermanlee4, yoshinorim, tnovak, dhruba Differential Revision: https://reviews.facebook.net/D40905	2015-08-05 06:06:39 -07:00
Ari Ekmekji	5dc3e6881a	Update Tests To Enable Subcompactions Summary: Updated DBTest DBCompactionTest and CompactionJobStatsTest to run compaction-related tests once with subcompactions enabled and once disabled using the TEST_P test type in the Google Test suite. Test Plan: ./db_test ./db_compaction-test ./compaction_job_stats_test Reviewers: sdong, igor, anthony, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43443	2015-08-04 22:19:07 -07:00
Islam AbdelRahman	c45a57b41e	Support delete rate limiting Summary: Introduce DeleteScheduler that allows enforcing a rate limit on file deletion. Instead of deleting files immediately, files are moved to trash directory and deleted in a background thread that applies sleep penalty between deletes if needed. I have updated PurgeObsoleteFiles and PurgeObsoleteWALFiles to use the delete_scheduler instead of env_->DeleteFile Test Plan: added delete_scheduler_test existing unit tests Reviewers: kradhakrishnan, anthony, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D43221	2015-08-04 20:45:27 -07:00
Yueh-Hsuan Chiang	241bb2aef3	Make DBCompactionTest.SkipStatsUpdateTest more stable. Summary: Make DBCompactionTest.SkipStatsUpdateTest more stable by removing flaky but unnecessary assertion on the size of db as simply checking the random file open count suffices. Test Plan: db_compaction_test Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43533	2015-08-04 15:47:05 -07:00
Yueh-Hsuan Chiang	14d0bfa429	Add DBOptions::skip_sats_update_on_db_open Summary: UpdateAccumulatedStats() is used to optimize compaction decision esp. when the number of deletion entries are high, but this function can slowdown DBOpen esp. in disk environment. This patch adds DBOptions::skip_sats_update_on_db_open, which skips UpdateAccumulatedStats() in DB::Open() time when it's set to true. Test Plan: Add DBCompactionTest.SkipStatsUpdateTest Reviewers: igor, anthony, IslamAbdelRahman, sdong Reviewed By: sdong Subscribers: tnovak, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42843	2015-08-04 13:48:16 -07:00
Venkatesh Radhakrishnan	20b244fcca	Fix CompactFiles by adding all necessary files Summary: The compact files API had a bug where some overlapping files are not added. These are files which overlap with files which were added to the compaction input files, but not to the original set of input files. This happens only when there are more than two levels involved in the compaction. An example will illustrate this better. Level 2 has 1 input file 1.sst which spans [20,30]. Level 3 has added file 2.sst which spans [10,25] Level 4 has file 3.sst which spans [35,40] and input file 4.sst which spans [46,50]. The existing code would not add 3.sst to the set of input_files because it only becomes an overlapping file in level 4 and it wasn't one in level 3. When installing the results of the compaction, 3.sst would overlap with output file from the compact files and result in the assertion in version_set.cc:1130 // Must not overlap assert(level <= 0 || level_files->empty() || internal_comparator_->Compare( (level_files)[level_files->size() - 1]->largest, f->smallest) < 0); This change now adds overlapping files from the current level to the set of input files also so that we don't hit the assertion above. Test Plan: d=/tmp/j; rm -rf $d; seq 1000 | parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_compaction_test --gtest_filter=CompactilesOnLevel* --gtest_also_run_disabled_tests >& '$d'/log-{}' Reviewers: igor, yhchiang, sdong Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43437	2015-08-03 15:53:22 -07:00
Venkatesh Radhakrishnan	87df6295dd	Make SuggestCompactRangeNoTwoLevel0Compactions deterministic Summary: Made SuggestCompactRangeNoTwoLevel0Compactions deterministic by forcing a flush after generating a file and waiting for compaction at the end. Test Plan: Run SuggestCompactRangeNoTwoLevel0Compactions Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43449	2015-08-03 15:52:52 -07:00
Ari Ekmekji	40c64434d4	Parallelize L0-L1 Compaction: Restructure Compaction Job Summary: As of now compactions involving files from Level 0 and Level 1 are single threaded because the files in L0, although sorted, are not range partitioned like the other levels. This means that during L0-L1 compaction each file from L1 needs to be merged with potentially all the files from L0. This attempt to parallelize the L0-L1 compaction assigns a thread and a corresponding iterator to each L1 file that then considers only the key range found in that L1 file and only the L0 files that have those keys (and only the specific portion of those L0 files in which those keys are found). In this way the overlap is minimized and potentially eliminated between different iterators focusing on the same files. The first step is to restructure the compaction logic to break L0-L1 compactions into multiple, smaller, sequential compactions. Eventually each of these smaller jobs will be run simultaneously. Areas to pay extra attention to are # Correct aggregation of compaction job statistics across multiple threads # Proper opening/closing of output files (make sure each thread's is unique) # Keys that span multiple L1 files # Skewed distributions of keys within L0 files Test Plan: Make and run db_test (newer version has separate compaction tests) and compaction_job_stats_test Reviewers: igor, noetzli, anthony, sdong, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42699	2015-08-03 11:32:14 -07:00
Andres Notzli	193dc977e7	Fixing dead code in table_properties_collector_test Summary: There was a bug in table_properties_collector_test that this patch is fixing: `!backward_mode && !test_int_tbl_prop_collector` in TestCustomizedTablePropertiesCollector was never true, so the code in the if-block never got executed. The reason is that the CustomizedTablePropertiesCollector test was skipping tests with `!backward_mode_ && !encode_as_internal`. The reason for skipping the tests is unknown. Test Plan: make table_properties_collector_test && ./table_properties_collector_test Reviewers: rven, igor, yhchiang, anthony, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D43281	2015-07-30 16:59:03 -07:00
agiardullo	8161bdb5a0	WriteBatch Save Points Summary: Support RollbackToSavePoint() in WriteBatch and WriteBatchWithIndex. Support for partial transaction rollback is needed for MyRocks. An alternate implementation of Transaction::RollbackToSavePoint() exists in D40869. However, the other implementation is messier because it is implemented outside of WriteBatch. This implementation is much cleaner and also exposes a potentially useful feature to WriteBatch. Test Plan: Added unit tests Reviewers: IslamAbdelRahman, kradhakrishnan, maykov, yoshinorim, hermanlee4, spetrunia, sdong, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42723	2015-07-29 16:54:23 -07:00
Andres Notzli	d06c82e477	Further cleanup of CompactionJob and MergeHelper Summary: Simplified logic in CompactionJob and removed unused parameter in MergeHelper. Test Plan: make && make check Reviewers: rven, igor, sdong, yhchiang Reviewed By: sdong Subscribers: aekmekji, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42687	2015-07-28 19:21:55 -07:00
Andres Notzli	e95c59cd2f	Count number of corrupt keys during compaction Summary: For task #7771355, we would like to log the number of corrupt keys during a compaction. This patch implements and tests the count as part of CompactionJobStats. Test Plan: make && make check Reviewers: rven, igor, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42921	2015-07-28 16:41:40 -07:00
Poornima Chozhiyath Raman	1bdfcef7bf	Fix when output level is 0 of universal compaction with trivial move Summary: Fix for universal compaction with trivial move, when the output level is 0. The tests were failing. Fixed by allowing normal compaction when output level is 0. Test Plan: modified test cases run successfully. Reviewers: sdong, yhchiang, IslamAbdelRahman Reviewed By: IslamAbdelRahman Subscribers: anthony, kradhakrishnan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42933	2015-07-27 14:25:57 -07:00
sdong	82f148ef97	Fix test DBCompactionTest.PartialCompactionFailure undeterministic failure Summary: DBCompactionTest.PartialCompactionFailure has a risk that one flush job writes out two mem tables into one file, so that the total files flushed are less than expected. Fix it by waiting for flush to finish after every write. Test Plan: Run the test Reviewers: IslamAbdelRahman, kradhakrishnan, yhchiang, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42831	2015-07-22 13:46:56 -07:00
Mike Kolupaev	4922af6f8d	fixed DBTest.GetPropertiesOfAllTablesTest and DBTest.GetUserDefinedTablaProperties flakiness Summary: These tests used to fail if a compaction happened between flushing tables and enumerating them to get properties. Test Plan: this reports occasional failures without this diff and no failures with it: `for i in {1..10000}; do echo $i; done | parallel --gnu -j100 'TEST_TMPDIR=`TMPDIR=/dev/shm/rockstemp mktemp -d -t` ./db_test --gtest_filter=DBTest.GetUserDefinedTablaProperties >&/dev/null || echo {} failed'` Reviewers: sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D42861	2015-07-22 12:37:49 -07:00
Mike Kolupaev	fe09a6dae3	[wal changes 2/3] write with sync=true syncs previous unsynced wals to prevent illegal data loss Summary: I'll just copy internal task summary here: " This sequence will cause data loss in the middle after a sync write: non-sync write key 1 flush triggered, not yet scheduled sync write key 2 system crash After rebooting, users might see key 2 but not key 1, which violates the API of sync write. This can be reproduced using unit test FaultInjectionTest::DISABLED_WriteOptionSyncTest. One way to fix it is for a sync write, if there is outstanding unsynced log files, we need to sync them too. " This diff should be considered together with the next diff D40905; in isolation this fix probably could be a little simpler. Test Plan: `make check`; added a test for that (DBTest.SyncingPreviousLogs) before noticing FaultInjectionTest.WriteOptionSyncTest (keeping both since mine asserts a bit more); both tests fail without this diff; for D40905 stacked on top of this diff, ran tests with ASAN, TSAN and valgrind Reviewers: rven, yhchiang, IslamAbdelRahman, anthony, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D40899	2015-07-22 03:28:08 -07:00
Andres Notzli	06aebca592	Report live data size estimate Summary: Fixes T6548822. Added a new function for estimating the size of the live data as proposed in the task. The value can be accessed through the property rocksdb.estimate-live-data-size. Test Plan: There are two unit tests in version_set_test and a simple test in db_test. make version_set_test && ./version_set_test; make db_test && ./db_test gtest_filter=GetProperty Reviewers: rven, igor, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41493	2015-07-21 21:33:20 -07:00
sdong	02b635fa38	Fix undeterministic failure of DBTest.GetPropertiesOfAllTablesTest Summary: DBTest.GetPropertiesOfAllTablesTest generates four files and expects four files there, but a L0->L1 compaction can trigger to compact to one single file. Fix it by raising level 0 number of file compaction trigger Test Plan: Run it many times and see it never fails. Reviewers: kradhakrishnan, IslamAbdelRahman, yhchiang, anthony Reviewed By: anthony Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D42789	2015-07-21 17:13:23 -07:00
Yueh-Hsuan Chiang	7219088cda	Move general compaction tests from db_test.cc to db_compaction_test.cc Summary: Move general compaction tests from db_test.cc to db_compaction_test.cc Test Plan: db_test db_compaction_test Reviewers: igor, sdong, IslamAbdelRahman, anthony Reviewed By: anthony Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D42651	2015-07-21 03:05:57 -07:00

2049 Commits