rocksdb

Author	SHA1	Message	Date
Yueh-Hsuan Chiang	5db9e76644	Fix Mac compile error: C++11 forbids default arguments for lambda expressions Summary: Fix the following Mac compile error. db/db_test.cc:8686:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions] auto gen_l0_kb = [this](int start, int size, int stride = 1) { ^ ~ Test Plan: db_test	2014-10-17 14:47:26 -07:00
Lei Jin	f4363fb81c	Fix DynamicMemtableOptions test Summary: as title Test Plan: make release Reviewers: igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25029	2014-10-17 10:09:45 -07:00
Igor Canadi	ee80fb4b4a	Total memtables size counter Summary: Added one new counter for GetProperty Test Plan: Not sure if needs a test case. compiles Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25023	2014-10-17 09:26:27 -07:00
Lei Jin	274dc81c92	fix build failure Summary: missed default value during merge Test Plan: ./db_test Reviewers: igor, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24975	2014-10-16 17:33:09 -07:00
Lei Jin	d6c8dba727	Log MutableCFOptions in SetOptions Summary: as title Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24903	2014-10-16 17:22:28 -07:00
Lei Jin	4d5708aa56	dynamic soft_rate_limit and hard_rate_limit Summary: as title Test Plan: unit test I am only able to build the test case for hard_rate_limit. soft_rate_limit is essentially the same thing as hard_rate_limit Reviewers: igor, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24759	2014-10-16 17:21:31 -07:00
Lei Jin	065a67c4f0	dynamic disable_auto_compactions Summary: Add more tests as well Test Plan: unit test Reviewers: igor, sdong, yhchiang Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24747	2014-10-16 17:14:17 -07:00
Lei Jin	dc50a1a593	make max_write_buffer_number dynamic Summary: as title Test Plan: unit test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D24729	2014-10-16 16:57:59 -07:00
Igor Canadi	ca250d71a1	Move logging out of mutex Summary: As title Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24897	2014-10-15 10:56:50 -07:00
Igor Canadi	cc6c883f59	Stop stopping writes on bg_error_ Summary: This might have caused https://github.com/facebook/rocksdb/issues/345. If we're stopping writes and bg_error comes along, we will never unblock the write. Test Plan: compiles Reviewers: ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24807	2014-10-13 14:25:55 -07:00
Yueh-Hsuan Chiang	5a76186340	Fixed compile error on Mac: default arguments for lambda expressions Summary: Fixed the following compile error on Mac. db/db_test.cc:8618:52: error: C++11 forbids default arguments for lambda expressions [-Werror,-Wlambda-extensions] auto gen_l0_kb = [this](int start, int size, int stride = 1) { ^ ~ 1 error generated. Test Plan: db_test	2014-10-10 14:10:16 -07:00
sdong	b7d3d6ebc5	db_bench: set thread pool size according to max_background_flushes Summary: option max_background_flushes doesn't make sense if thread pool size is not set accordingly. Set the thread pool size as what we do for max_background_compactions. Test Plan: Run db_bench with max_background_flushes > 1 Reviewers: yhchiang, igor, rven, ljin Reviewed By: ljin Subscribers: MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D24717	2014-10-09 20:38:15 -07:00
Tomislav Novak	88edfd90ae	SkipListRep::LookaheadIterator Summary: This diff introduces the `lookahead` argument to `SkipListFactory()`. This is an optimization for the tailing use case which includes many seeks. E.g. consider the following operations on a skip list iterator: Seek(x), Next(), Next(), Seek(x+2), Next(), Seek(x+3), Next(), Next(), ... If `lookahead` is positive, `SkipListRep` will return an iterator which also keeps track of the previously visited node. Seek() then first does a linear search starting from that node (up to `lookahead` steps). As in the tailing example above, this may require fewer than ~log(n) comparisons as with regular skip list search. Test Plan: Added a new benchmark (`fillseekseq`) which simulates the usage pattern. It first writes N records (with consecutive keys), then measures how much time it takes to read them by calling `Seek()` and `Next()`. $ time ./db_bench -num 10000000 -benchmarks fillseekseq -prefix_size 1 \ -key_size 8 -write_buffer_size $[1024*1024*1024] -value_size 50 \ -seekseq_next 2 -skip_list_lookahead=0 [...] DB path: [/dev/shm/rocksdbtest/dbbench] fillseekseq : 0.389 micros/op 2569047 ops/sec; real 0m21.806s user 0m12.106s sys 0m9.672s $ time ./db_bench [...] -skip_list_lookahead=2 [...] DB path: [/dev/shm/rocksdbtest/dbbench] fillseekseq : 0.153 micros/op 6540684 ops/sec; real 0m19.469s user 0m10.192s sys 0m9.252s Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb, march, lovro Differential Revision: https://reviews.facebook.net/D23997	2014-10-07 11:48:23 -07:00
Igor Canadi	f78b832e5d	Log RocksDB version Summary: This will be much easier than reviewing git sha's we currently have in our LOGs Test Plan: none Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24591	2014-10-07 10:40:57 -07:00
Lei Jin	25f6a852e4	add db_test for changing memtable size Summary: The test only covers changing write_buffer_size. For other changeable parameters, such as bloom bits/probes, it is not obvious how to test them. Suggestions are welcome. Test Plan: db_test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24429	2014-10-07 10:40:45 -07:00
Yueh-Hsuan Chiang	56dfd363fd	Fix a check in database shutdown or Column family drop during flush. Summary: Fix a check in database shutdown or Column family drop during flush. Special thanks to Maurice Barnum who spots the problem :) Test Plan: db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24273	2014-10-03 00:25:27 -07:00
sdong	8ea232b9e3	Add number of records dropped in compaction summary Summary: Add two stats to compaction summary: 1. Total input records from previous level 2. Total number of records dropped after compaction Test Plan: See outputs of printing when running locally Reviewers: ljin, igor, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24411	2014-10-02 17:54:25 -07:00
sdong	f4086a88b4	perf_context.get_from_output_files_time is set for MultiGet() and ReadOnly DB too. Summary: perf_context.get_from_output_files_time is now only set for writable DB's DB::Get(). Extend it to MultiGet() and read only DB. Test Plan: make all check Fix perf_context_test and extend it to cover MultiGet(), as well as read-only DB. Run it and watch the results Reviewers: ljin, yhchiang, igor Reviewed By: igor Subscribers: rven, leveldb Differential Revision: https://reviews.facebook.net/D24207	2014-10-02 17:02:50 -07:00
Nik Bougalis	a213971d8a	Don't return (or dereference) dangling pointer	2014-10-02 14:33:16 -07:00
Igor Canadi	d6987216c9	Merge pull request #327 from dalgaaf/wip-da-SCA-20141001 Fix some issues from SCA	2014-10-02 10:59:52 -07:00
Yueh-Hsuan Chiang	89833e5a85	Fixed signed-unsigned comparison warning in db_test.cc Summary: Fixed signed-unsigned comparison warning in db_test.cc db/db_test.cc:8606:3: note: in instantiation of function template specialization 'rocksdb::test::Tester::IsEq<int, unsigned long>' requested here ASSERT_EQ(2, metadata.size()); ^ Test Plan: make db_test	2014-10-02 01:05:59 -07:00
Yueh-Hsuan Chiang	fcac705f95	Fixed compile warning on Mac caused by unused variables. Summary: Fixed compile warning caused by unused variables. ./db/compaction_picker.h:118:7: error: private field 'max_grandparent_overlap_factor_' is not used [-Werror,-Wunused-private-field] int max_grandparent_overlap_factor_; ^ ./db/compaction_picker.h:119:7: error: private field 'expanded_compaction_factor_' is not used [-Werror,-Wunused-private-field] int expanded_compaction_factor_; ^ 2 errors generated. Test Plan: make db_test	2014-10-02 01:03:08 -07:00
Tomislav Novak	187b29938c	ForwardIterator: update prev_key_ only if prefix hasn't changed Summary: Since ForwardIterator is on a level below DBIter, the latter may call Next() on it (e.g. in order to skip deletion markers). Since this also updates `prev_key_`, it may prevent the Seek() optimization. For example, assume that there's only one SST file and it contains the following entries: 0101, 0201 (`ValueType::kTypeDeletion`, i.e. a tombstone record), 0201 (`kTypeValue`), 0202. Memtable is empty. `Seek(0102)` will result in `prev_key_` being set to `0201` instead of `0102`, since `DBIter::Seek()` will call `ForwardIterator::Next()` to skip record 0201. Therefore, when `Seek(0102)` is called again, `NeedToSeekImmutable()` will return true. This fix relies on `prefix_extractor_` to detect prefix changes. `prev_key_` is only set to `current_->key()` as long as they have the same prefix. I also made a small change to `NeedToSeekImmutable()` so it no longer returns true when the db is empty (i.e. there's nothing but a memtable). Test Plan: $ TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test Reviewers: sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23823	2014-10-01 17:10:48 -07:00
Lei Jin	5ec53f3edf	make compaction related options changeable Summary: make compaction related options changeable. Most of changes are tedious, following the same convention: grabs MutableCFOptions at the beginning of compaction under mutex, then pass it throughout the job and register it in SuperVersion at the end. Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23349	2014-10-01 16:19:16 -07:00
Danny Al-Gaaf	4a171882d6	db/version_set.cc: remove unnecessary checks Fix for: [db/version_set.cc:1219]: (style) Unsigned variable 'last_file' can't be negative so it is unnecessary to test it. [db/version_set.cc:1234]: (style) Unsigned variable 'first_file' can't be negative so it is unnecessary to test it. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 11:09:22 +02:00
Danny Al-Gaaf	091153493c	db/db_test.cc: remove unused variable Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:09 +02:00
Danny Al-Gaaf	5abd8add7d	db/deletefile_test.cc: remove unused variable Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	d6483af870	db/db_test.cc: reduce scope of some variables Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	44cca0cd8f	db/db_iter.cc: remove unused variable Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	8ee75dca2e	db/memtable.cc: remove unused variable merge_result Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:33 +02:00
Danny Al-Gaaf	0fd8bbca53	db/db_impl.cc: reduce scope of prefix_initialized Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:33 +02:00
Danny Al-Gaaf	676ff7b1fb	compaction_picker.cc: remove check for >=0 for unsigned Fix for: [db/compaction_picker.cc:923]: (style) Unsigned variable 'start_index' can't be negative so it is unnecessary to test it. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:33 +02:00
Danny Al-Gaaf	33580fa39a	db/db_impl.cc: fix object handling, remove double lines Fix for: [db/db_impl.cc:4039]: (error) Instance of 'StopWatch' object is destroyed immediately. [db/db_impl.cc:4042]: (error) Instance of 'StopWatch' object is destroyed immediately. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	b8b7117e97	db/version_set.cc: use !empty() instead of 'size() > 0' Use empty() since it should be preferred as it has, following the standard, a constant time complexity regardless of the container type. The same is not guaranteed for size(). Fix for: [db/version_set.cc:2250]: (performance) Possible inefficient checking for 'column_families_not_found' emptiness. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:31 +02:00
Danny Al-Gaaf	53910ddb15	db_test.cc: pass parameter by reference Fix for: [db/db_test.cc:6141]: (performance) Function parameter 'key' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:31 +02:00
Danny Al-Gaaf	68ca534169	corruption_test.cc: pass parameter by reference Fix for: [db/corruption_test.cc:134]: (performance) Function parameter 'fname' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:31 +02:00
Danny Al-Gaaf	7506198da2	cuckoo_table_db_test.cc: add flush after delete It seems that a FlushMemTable() call is needed in the Uint64Comparator test after call Delete(). Otherwise the later via Put() added keys get lost with the next FlushMemTable() call before the check. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 17:53:49 +02:00
Mark Callaghan	1f963305a8	Print MB per second compaction throughput separately for reads and writes Summary: From this line there used to be one column (MB/sec) that includes reads and writes. This change splits it and for real workloads the rd and wr rates might not match when keys are dropped. 2014/09/29-17:31:01.213162 7f929fbff700 (Original Log Time 2014/09/29-17:31:01.180025) [default] compacted to: files[2 5 0 0 0 0 0], MB/sec: 14.0 rd, 14.0 wr, level 1, files in(4, 0) out(5) MB in(8.5, 0.0) out(8.5), read-write-amplify(2.0) write-amplify(1.0) OK Test Plan: make check, grepped LOG - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Differential Revision: https://reviews.facebook.net/D24237	2014-09-29 17:51:40 -07:00
mike@arpaia.co	f0f7955497	Fixing compile errors on OS X Summary: Building master on OS X has some compile errors due to implicit type conversions which generate warnings which RocksDB's build settings raise as errors. Test Plan: It compiles! Reviewers: ljin, igor Reviewed By: ljin Differential Revision: https://reviews.facebook.net/D24135	2014-09-29 16:05:25 -07:00
Mark Callaghan	747523d241	Print per column family metrics in db_bench Summary: see above Test Plan: make check, ran db_bench and looked at output - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: igor Differential Revision: https://reviews.facebook.net/D24189	2014-09-29 15:47:05 -07:00
erik	827e31c746	Make test use a compatible type in the size checks.	2014-09-29 14:52:16 -07:00
Lei Jin	fd5d80d55e	CompactedDB: log using the correct info_log Summary: info_log from supplied Options can be nullptr. Using the one from db_impl. Also call flush after that since no more logging will happen and LOG can contain partial output Test Plan: verified with db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24183	2014-09-29 12:45:04 -07:00
Lei Jin	2faf49d5f1	use GetContext to replace callback function pointer Summary: Instead of passing callback function pointer and its arg on Table::Get() interface, passing GetContext. This makes the interface cleaner and possibly better perf. Also adding a fast pass for SaveValue() Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24057	2014-09-29 11:09:09 -07:00
sdong	389edb6b1b	universal compaction picker: use double for potential overflow Summary: There is a possible overflow case in universal compaction picker. Use double to make the logic straight-forward Test Plan: make all check Reviewers: yhchiang, igor, MarkCallaghan, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23817	2014-09-26 16:17:05 -07:00
Lei Jin	fbd2dafc9f	CompactedDBImpl::MultiGet() for better CuckooTable performance Summary: Add the MultiGet API to allow prefetching. With file size of 1.5G, I configured it to have 0.9 hash ratio that can fill With 115M keys and result in 2 hash functions, the lookup QPS is ~4.9M/s vs. 3M/s for Get(). It is tricky to set the parameters right. Since files size is determined by power-of-two factor, that means # of keys is fixed in each file. With big file size (thus smaller # of files), we will have more chance to waste lot of space in the last file - lower space utilization as a result. Using smaller file size can improve the situation, but that harms lookup speed. Test Plan: db_bench Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23673	2014-09-25 13:34:51 -07:00
Lei Jin	3c68006109	CompactedDBImpl Summary: Add a CompactedDBImpl that will be enabled when calling OpenForReadOnly() and the DB only has one level (>0) of files. As a performance comparison, CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with ReadOnlyDBImpl. Test Plan: db_bench Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23553	2014-09-25 11:14:01 -07:00
Igor Canadi	f7375f39fd	Fix double deletes Summary: While debugging clients compaction issues, I noticed bunch of delete bugs: P16329995. MakeTableName returns sst file with "/" prefix. We also need "/" prefix when we get the files though GetChildren(), so that we can properly dedup the files. Test Plan: none Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23457	2014-09-25 11:08:16 -07:00
Igor Canadi	21ddcf6e4f	Remove allow_thread_local Summary: See https://reviews.facebook.net/D19365 Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23907	2014-09-24 13:12:16 -07:00
sdong	cdaf44f9ae	Enlarge log size cap when printing file summary Summary: Now the file summary is too small for printing. Enlarge it. To enable it, allow to pass a size to log buffer. Test Plan: Add a unit test. make all check Reviewers: ljin, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21723	2014-09-23 16:56:34 -07:00
sdong	d0de413f4d	WriteBatchWithIndex to allow different Comparators for different column families Summary: Previously, one single column family is given to WriteBatchWithIndex to index keys for all column families. An extra map from column family ID to comparator is maintained which can override the default comparator given in the constructor. A WriteBatchWithIndex::SetComparatorForCF() is added for user to add comparators per column family. Also move more codes into anonymous namespace. Test Plan: Add a unit test Reviewers: ljin, igor Reviewed By: igor Subscribers: dhruba, leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D23355	2014-09-22 13:47:39 -07:00
Lei Jin	57a32f147f	change target_file_size_base to uint64_t Summary: It constrains the file size to be 4G max with int Test Plan: tried to grep instance and made sure other related variables are also uint64 Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23697	2014-09-22 11:15:03 -07:00
Lei Jin	5e6aee4325	dont create backup_input if compaction filter v2 is not used Summary: Compaction creates backup_input iterator even though it only needed when compaction filter v2 is enabled Test Plan: make all check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23769	2014-09-22 10:36:53 -07:00
Venkatesh Radhakrishnan	f44594743f	RocksDB: Format uint64 using PRIu64 in db_impl.cc Summary: Use PRIu64 to format uint64 in a portable manner Test Plan: Run "make all check" Reviewers: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23595	2014-09-18 22:19:41 -07:00
Igor Canadi	90b8c07b48	Fix unit tests errors Summary: Those were introduced with `2fb1fea30f` because the flushing behavior changed when max_background_flushes is > 0. Test Plan: make check Reviewers: ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23577	2014-09-18 13:32:44 -07:00
Lei Jin	51af7c326c	CuckooTable: add one option to allow identity function for the first hash function Summary: MurmurHash becomes expensive when we do millions Get() a second in one thread. Add this option to allow the first hash function to use identity function as hash function. It results in QPS increase from 3.7M/s to ~4.3M/s. I did not observe improvement for end to end RocksDB performance. This may be caused by other bottlenecks that I will address in a separate diff. Test Plan: ``` [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. 
Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. 
Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320 [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. 
Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. 
Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320 ``` Reviewers: sdong, igor, yhchiang Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23451	2014-09-18 11:00:48 -07:00
Igor Canadi	2fb1fea30f	Fix synchronization issues	2014-09-18 10:42:54 -07:00
Lei Jin	a062e1f2c4	SetOptions() for memtable related options Summary: as title Test Plan: make all check I will think a way to set up stress test for this Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23055	2014-09-17 12:49:13 -07:00
Igor Canadi	60a4aa175e	Test use_mmap_reads Summary: We currently don't test mmap reads as part of db_test. Piggyback it on kWalDir test config. Test Plan: make check Reviewers: ljin, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23337	2014-09-17 12:31:53 -07:00
Igor Canadi	4a27a2f193	Don't sync manifest when disableDataSync = true Summary: As we discussed offline Test Plan: compiles Reviewers: yhchiang, sdong, ljin, dhruba Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22989	2014-09-15 11:32:01 -07:00
Igor Canadi	04ce1b25f3	Fix #284	2014-09-13 14:14:10 -07:00
Igor Canadi	dee91c259d	WriteThread Summary: This diff just moves the write thread control out of the DBImpl. I will need this as I will control column family data concurrency by only accessing some data in the write thread. That way, we won't have to lock our accesses to column family hash table (mappings from IDs to CFDs). Test Plan: make check Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23301	2014-09-12 16:23:58 -07:00
Igor Canadi	540a257f2c	Fix WAL synced Summary: Uhm... Test Plan: nope Reviewers: sdong, yhchiang, tnovak, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23343	2014-09-12 16:15:29 -07:00
Chilledheart	49fe329e5e	Fix build issue under macosx	2014-09-13 05:05:22 +08:00
Feng Zhu	0352a9fa91	add_wrapped_bloom_test Summary: 1. wrap a filter policy like what fbcode/multifeed/rocksdb/MultifeedRocksDbKey.h to ensure that rocksdb works fine after filterpolicy interface change Test Plan: 1. valgrind ./bloom_test Reviewers: ljin, igor, yhchiang, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23229	2014-09-11 16:33:46 -07:00
Igor Canadi	9c0e66ce98	Don't run background jobs (flush, compactions) when bg_error_ is set Summary: If bg_error_ is set, that means that we mark DB read only. However, current behavior still continues the flushes and compactions, even though bg_error_ is set. On the other hand, if bg_error_ is set, we will return Status::OK() from CompactRange(), although the compaction didn't actually succeed. This is clearly not desired behavior. I found this when I was debugging t5132159, although I'm pretty sure these aren't related. Also, when we're shutting down, it's dangerous to exit RunManualCompaction(), since that will destruct ManualCompaction object. Background compaction job might still hold a reference to manual_compaction_ and this will lead to undefined behavior. I changed the behavior so that we only exit RunManualCompaction when manual compaction job is marked done. Test Plan: make check Reviewers: sdong, ljin, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23223	2014-09-11 16:24:16 -07:00
Igor Canadi	a9639bda84	Fix valgrind test Summary: Get valgrind to stop complaining about uninitialized value Test Plan: valgrind not complaining anymore Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23289	2014-09-11 15:36:30 -07:00
Igor Canadi	d1f24dc7ee	Relax FlushSchedule test Summary: The test makes sure that we don't call flush too often. For that, it's ok to check if we have less than 10 table files. Otherwise, the test is flaky because it's hard to estimate number of entries in the memtable before it gets flushed (any ideas?) Test Plan: Still works, but hopefully less flaky. Reviewers: ljin, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23241	2014-09-11 11:00:45 -07:00
Igor Canadi	3d9e6f7759	Push model for flushing memtables Summary: When memtable is full it calls the registered callback. That callback then registers column family as needing the flush. Every write checks if there are some column families that need to be flushed. This completely eliminates the need for MakeRoomForWrite() function and simplifies our Write code-path. There is some complexity with the concurrency when the column family is dropped. I made it a bit less complex by dropping the column family from the write thread in https://reviews.facebook.net/D22965. Let me know if you want to discuss this. Test Plan: make check works. I'll also run db_stress with creating and dropping column families for a while. Reviewers: yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23067	2014-09-10 18:46:09 -07:00
Igor Canadi	059e584dd3	[unit test] CompactRange should fail if we don't have space Summary: See t5106397. Also, few more changes: 1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it. 2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption Test Plan: make check Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23145	2014-09-10 17:00:00 -07:00
Igor Canadi	a52cecb56c	Fix Mac compile	2014-09-09 18:42:35 -07:00
Jonah Cohen	092f97e219	Fix comments and typos Summary: Correct some comments and typos in RocksDB. Test Plan: Inspection Reviewers: sdong, igor Reviewed By: igor Differential Revision: https://reviews.facebook.net/D23133	2014-09-09 15:20:49 -07:00
Igor Canadi	0a42295a24	Fix SimpleWriteTimeoutTest Summary: In column family's SanitizeOptions() [1], we make sure that min_write_buffer_number_to_merge is a normal value. However, this test depended on the fact that setting min_write_buffer_number_to_merge to be bigger than max_write_buffer_number will cause a deadlock. I'm not sure how it worked before. This diff fixes it by scheduling a sleeping background task, which will actually block any attempts at flushing. [1] https://github.com/facebook/rocksdb/blob/master/db/column_family.cc#L104 Test Plan: the test works now Reviewers: yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23103	2014-09-09 11:50:05 -07:00
sdong	06d986252a	Always pass MergeContext as pointer, not reference Summary: To follow the coding convention and make sure when passing reference as a parameter it is also const, pass MergeContext as a pointer to mem tables. Test Plan: make all check Reviewers: ljin, igor Reviewed By: igor Subscribers: leveldb, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D23085	2014-09-09 11:37:32 -07:00
Stanislau Hlebik	d343c3fe46	Improve db recovery Summary: Avoid creating unnecessary sst files while db opening Test Plan: make all check Reviewers: sdong, igor Reviewed By: igor Subscribers: zagfox, yhchiang, ljin, leveldb Differential Revision: https://reviews.facebook.net/D20661	2014-09-09 11:18:50 -07:00
Lei Jin	52311463e9	MemTableOptions Summary: removed reference to options in WriteBatch and DBImpl::Get() Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23049	2014-09-08 18:46:52 -07:00
Lei Jin	171d4ff4a2	remove TailingIterator reference in db_impl.h Summary: as title Test Plan: make release Reviewers: igor Differential Revision: https://reviews.facebook.net/D23073	2014-09-08 15:39:53 -07:00
Lei Jin	9b0f7ffa1c	rename version_set options_ to db_options_ to avoid confusion Summary: as title Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23007	2014-09-08 15:25:01 -07:00
Igor Canadi	2d57828d0e	Check stop level trigger-0 before slowdown level-0 trigger Summary: ... Test Plan: Can't repro the test failure, but let's see what jenkins says Reviewers: zagfox, sdong, ljin Reviewed By: sdong, ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23061	2014-09-08 15:23:58 -07:00
Lei Jin	659d2d50c3	move compaction_filter to immutable_options Summary: all shared_ptrs are in immutable_options now. This will also make options assignment a little cheaper Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23001	2014-09-08 15:09:25 -07:00
Lei Jin	048560a642	reduce references to cfd->options() in DBImpl Summary: I found it is almost impossible to get rid of this function in a single batch. I will take a step by step approach Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22995	2014-09-08 15:04:34 -07:00
sdong	011241bb99	DB::Flush() Do not wait for background threads when there is nothing in mem table Summary: When we have multiple column families, users can issue Flush() on every column family to make sure everything is flushed, even if some of them might be empty. By skipping the wait for empty cases, it can be greatly sped up. Still wait for people's comments before writing unit tests for it. Test Plan: Will write a unit test to make sure it is correct. Reviewers: ljin, yhchiang, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D22953	2014-09-08 13:40:42 -07:00
Igor Canadi	a2bb7c3c33	Push- instead of pull-model for managing Write stalls Summary: Introducing WriteController, which is a source of truth about per-DB write delays. Let's define a DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either: * proceed with all writes without delay * delay all writes by fixed time * stop all writes The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case). When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal. This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write. Test Plan: make check for now. I'll add some unit tests later. Also, perf test. Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22791	2014-09-08 11:20:25 -07:00
Feng Zhu	0af157f9bf	Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contains all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or traditional (block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the filter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit `1d23b5c470` Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom --disable_auto_compactions=1 Read QPS increased by about 30%, from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979	2014-09-08 10:37:05 -07:00
Igor Canadi	9360cc690e	Fix valgrind issue	2014-09-08 08:01:25 -07:00
Igor Canadi	9f1c80b556	Drop column family from write thread Summary: If we drop column family only from (single) write thread, we can be sure that nobody will drop the column family while we're writing (and our mutex is released). This greatly simplifies my patch that's getting rid of MakeRoomForWrite(). Test Plan: make check, but also running stress test Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22965	2014-09-05 15:20:05 -07:00
Igor Canadi	8de151bb99	Add db_bench with lots of column families to regression tests Summary: That way we can see when this graph goes up and be happy. Couple of changes: 1. title 2. fix db_bench to delete column families before deleting the DB. this was asserting when compiled in debug mode 3. don't sync manifest when disableDataSync. We discussed this offline. I can move it to separate diff if you'd like Test Plan: ran it Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22815	2014-09-05 14:20:18 -07:00
Lei Jin	c9e419ccb6	rename options_ to db_options_ in DBImpl to avoid confusion Summary: as title Test Plan: make release Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22935	2014-09-05 11:48:17 -07:00
Radheshyam Balasundaram	5cd0576ffe	Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests. Test Plan: make check all Also ran db_bench to generate multiple files. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22743	2014-09-05 11:18:01 -07:00
Raghav Pisolkar	0fbb3facc0	fixed memory leak in unit test DBIteratorBoundTest Summary: fixed memory leak in unit test DBIteratorBoundTest Test Plan: ran valgrind test on my unit test Reviewers: sdong Differential Revision: https://reviews.facebook.net/D22911	2014-09-05 10:35:28 -07:00
Lei Jin	adcd2532ca	fix asan check Summary: PlainTable takes reference instead of a copy. Keep a copy in the test code Test Plan: make asan_check Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22899	2014-09-05 09:53:04 -07:00
liuhuahang	bb6ae0f80c	fix more compile warnings N/A Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-05 14:14:37 +08:00
Nik Bougalis	4329d74e05	Fix swapped variable names to accurately reflect usage	2014-09-04 20:09:45 -07:00
Stanislau Hlebik	45a5e3ede0	Remove path with arena==nullptr from NewInternalIterator Summary: Simplify code by removing code path which does not use Arena from NewInternalIterator Test Plan: make all check make valgrind_check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22395	2014-09-04 17:40:41 -07:00
Lei Jin	5665e5e285	introduce ImmutableOptions Summary: As a preparation to support updating some options dynamically, I'd like to first introduce ImmutableOptions, which is a subset of Options that cannot be changed during the course of a DB lifetime without restart. ColumnFamily will keep both Options and ImmutableOptions. Any component below ColumnFamily should only take ImmutableOptions in their constructor. Other options should be taken from APIs, which will be allowed to adjust dynamically. I am yet to make changes to memtable and other related classes to take ImmutableOptions in their ctor. That can be done in a separate diff as this one is already pretty big. Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D22545	2014-09-04 16:18:36 -07:00
Raghav Pisolkar	e0b99d4f5d	created a new ReadOptions parameter 'iterate_upper_bound'	2014-09-04 11:00:16 -07:00
liuhuahang	ef5b384729	fix a few compile warnings 1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2, class HistogramImpl has virtual functions and thus should have a virtual destructor 3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99 Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-04 23:06:23 +08:00
Lei Jin	9b58c73c7c	call SanitizeDBOptionsByCFOptions() in the right place Summary: It only covers Open() with default column family right now Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22467	2014-09-02 14:42:23 -07:00
Igor Canadi	a84234a61b	Ignore missing column families Summary: Before this diff, whenever we Write to non-existing column family, Write() would fail. This diff adds an option to not fail a Write() when WriteBatch points to non-existing column family. MongoDB said this would be useful for them, since they might have a transaction updating an index that was dropped by another thread. This way, they don't have to worry about checking if all indexes are alive on every write. They don't care if they lose writes to dropped index. Test Plan: added a small unit test Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22143	2014-09-02 13:29:05 -07:00
Igor Canadi	7f19bb93c6	Merge pull request #242 from tdfischer/perf-timer-destructors Refactor PerfStepTimer to automatically stop on destruct	2014-09-02 13:06:40 -07:00
Feng Zhu	8438a19360	fix dropping column family bug Summary: 1. db/db_impl.cc:2324 (DBImpl::BackgroundCompaction) should not raise bg_error_ when column family is dropped during compaction. Test Plan: 1. db_stress Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22653	2014-09-02 12:25:58 -07:00
Torrie Fischer	6614a48418	Refactor PerfStepTimer to stop on destruct This eliminates the need to remember to call PERF_TIMER_STOP when a section has been timed. This allows more useful design with the perf timers and enables possible return value optimizations. Simplistic example: class Foo { public: Foo(int v) : m_v(v); private: int m_v; } Foo makeFrobbedFoo(int errno) { errno = 0; return Foo(); } Foo bar(int *errno) { PERF_TIMER_GUARD(some_timer); return makeFrobbedFoo(errno); } int main(int argc, char[] argv) { Foo f; int errno; f = bar(&errno); if (errno) return -1; return 0; } After bar() is called, perf_context.some_timer would be incremented as if Stop(&perf_context.some_timer) was called at the end, and the compiler is still able to produce optimizations on the return value from makeFrobbedFoo() through to main().	2014-09-02 12:04:22 -07:00
Igor Canadi	990df99a61	Fix ios compile Summary: We need to set contbuild for this :) Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22701	2014-09-02 10:50:15 -07:00
Igor Canadi	7dcadb1d37	Don't let flush preempt compaction in certain cases Summary: I have an application configured with 16 background threads. Write rates are high. L0->L1 compaction is very slow and it limits the concurrency of the system. While it's happening, the other 15 threads are idle. However, when there is a need for a flush, the one thread busy with L0->L1 does the flush, instead of any of the other 15 threads that are just sitting there. This diff prevents that. If there are threads that are idle, we don't let flush preempt compaction. Test Plan: Will run stress test Reviewers: ljin, sdong, yhchiang Reviewed By: sdong, yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D22299	2014-09-02 08:34:54 -07:00
Nik Bougalis	f09329cb01	Fix candidate file comparison when using path ids	2014-08-31 00:54:15 -07:00
Tomislav Novak	0f9c43ea36	ForwardIterator: reset incomplete iterators on Seek() Summary: When reading from kBlockCacheTier, ForwardIterator's internal child iterators may end up in the incomplete state (read was unable to complete without doing disk I/O). `ForwardIterator::status()` will correctly report that; however, the iterator may be stuck in that state until all sub-iterators are rebuilt: * `NeedToSeekImmutable()` may return false even if some sub-iterators are incomplete * one of the child iterators may be an empty iterator without any state other than the kIncomplete status (created using `NewErrorIterator()`); seeking on any such iterator has no effect -- we need to construct it again Akin to rebuilding iterators after a superversion bump, this diff makes forward iterator reset all incomplete child iterators when `Seek()` or `Next()` are called. Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: lovro, march, leveldb Differential Revision: https://reviews.facebook.net/D22575	2014-08-29 16:21:29 -07:00
Lei Jin	722d80c374	reduce recordTick overhead in compaction loop Summary: It is too expensive to bump ticker for every key/value pair Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22527	2014-08-29 09:51:09 -07:00
Igor Canadi	0c26e76b28	Merge pull request #237 from tdfischer/tdfischer/faster-timeout-test test: db: fix test to have a smaller timeout for when it runs on faster ...	2014-08-28 20:40:10 -04:00
Feng Zhu	1d23b5c470	remove_internal_filter_policy Summary: 1. remove class InternalFilterPolicy in db/dbformat.h 2. Transformation from internal key to user key is done in filter_block.cc 3. This is a preparation for patch D20979 Test Plan: make all check valgrind ./db_test Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22509	2014-08-28 17:06:29 -07:00
Igor Canadi	d977e55596	Don't let other compactions run when manual compaction runs Summary: Based on discussions from t4982833. This is just a short-term fix, I plan to revamp manual compaction process as part of t4982812. Also, I think we should schedule automatic compactions at the very end of manual compactions, not when we're done with one level. I made that change as part of this diff. Let me know if you disagree. Test Plan: make check for now Reviewers: sdong, tnovak, yhchiang, ljin Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22401	2014-08-28 13:06:28 -04:00
Igor Canadi	d5bd6c772b	Fix ios compile Summary: No __thread for ios. Test Plan: compile works for ios now Reviewers: ljin, dhruba Reviewed By: dhruba Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22491	2014-08-28 12:46:05 -04:00
Radheshyam Balasundaram	4142a3e783	Adding a user comparator for comparing Uint64 slices. Summary: - New Uint64 comparator - Modify Reader and Builder to take custom user comparators instead of bytewise comparator - Modify logic for choosing unused user key in builder - Modify iterator logic in reader - test changes Test Plan: cuckoo_table_{builder,reader,db}_test make check all Reviewers: ljin, sdong Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D22377	2014-08-27 10:39:31 -07:00
Radheshyam Balasundaram	b6fd7811eb	Don't do memtable lookup in db_impl_readonly if memtables are empty while opening db. Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not. Test Plan: db_test make check all Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22281	2014-08-26 17:19:03 -07:00
Stanislau Hlebik	9dcb75b6d9	Add is-file-deletions-enabled property Summary: Add property 'rocksdb.is-file-deletions-enabled' which equals disable_delete_obsolete_files_ Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22119	2014-08-26 16:26:29 -07:00
Lei Jin	1755581f19	improve OptimizeForPointLookup() Summary: also fix HISTORY.md Test Plan: make all check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22437	2014-08-26 14:15:00 -07:00
Lei Jin	bda6f3363d	fix valgrind error in c_test caused by BlockBasedTableOptions Summary: It was creating BlockBasedTableOptions object in a loop without calling destroy() Test Plan: valgrind ./c_test --leak-check=full --show-reachable=yes Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22431	2014-08-26 09:57:25 -07:00
Torrie Fischer	0db6b028e7	Update timeout to 50ms instead of 3.	2014-08-26 09:38:45 -07:00
Lei Jin	23861857c4	ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled Summary: as title Test Plan: table_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22239	2014-08-25 16:14:30 -07:00
Lei Jin	a98badff16	print table options Summary: Add a virtual function in table factory that will print table options Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22149	2014-08-25 14:24:09 -07:00
Lei Jin	384400128f	move block based table related options to BlockBasedTableOptions Summary: I will move compression related options in a separate diff since this diff is already pretty lengthy. I guess I will also need to change JNI accordingly :( Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21915	2014-08-25 14:22:05 -07:00
Igor Canadi	42ea795209	Fix concurrency issue in CompactionPicker Summary: I am currently working on a project that uses RocksDB. While debugging some perf issues, I came across an interesting compaction concurrency issue. Namely, I had 15 idle threads and a good compaction to do, but CompactionPicker returned "Compaction nothing to do". Here's how Internal stats looked: 2014/08/22-08:08:04.551982 7fc7fc3f5700 ------- DUMPING STATS ------- 2014/08/22-08:08:04.552000 7fc7fc3f5700 Compaction Stats [default] Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 7/5 353 1.0 0.0 0.0 0.0 2.3 2.3 0.0 0.0 0.0 9.4 0 0 0 0 247 46 5.359 8.53 1 8526.25 L1 2/2 86 1.3 2.6 1.9 0.7 2.6 1.9 2.7 1.3 24.3 24.0 39 19 71 52 109 11 9.938 0.00 0 0.00 L2 26/0 833 1.3 5.7 1.7 4.0 5.2 1.2 6.3 3.0 15.6 14.2 47 112 147 35 373 44 8.468 0.00 0 0.00 L3 12/0 505 0.1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0.000 0.00 0 0.00 Sum 47/7 1778 0.0 8.3 3.6 4.6 10.0 5.4 8.1 4.4 11.6 14.1 86 131 218 87 728 101 7.212 8.53 1 8526.25 Int 0/0 0 0.0 2.4 0.8 1.6 2.7 1.2 11.5 6.1 12.0 13.6 20 43 63 20 203 23 8.845 0.00 0 0.00 Flush(GB): accumulative 2.266, interval 0.444 Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 8.526 memtable_compaction, 0.000 leveln_slowdown_soft, 0.000 leveln_slowdown_hard Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 1 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard DB Stats Uptime(secs): 336.8 total, 60.4 interval Cumulative writes: 61584000 writes, 6480589 batches, 9.5 writes per batch, 1.39 GB user ingest Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 GB written Interval 
writes: 11235257 writes, 1175050 batches, 9.6 writes per batch, 259.9 MB user ingest Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 MB written To see what happened, go here: `47b452cfcf/db/compaction_picker.cc (L430)` * The for loop started with level 1, because it has the worst score. * PickCompactionBySize on L429 returned nullptr because all files were being compacted * ExpandWhileOverlapping(c) returned true (because that's what it does when it gets nullptr!?) * for loop break-ed, never trying compactions for level 2 :( :( This bug was present at least since January. I have no idea how we didn't find this sooner. Test Plan: Unit testing compaction picker is hard. I tested this by running my service and observing L0->L1 and L2->L3 compactions in parallel. However, for long-term, I opened the task #4968469. @yhchiang is currently refactoring CompactionPicker, hopefully the new version will be unit-testable ;) Here's how my compactions look like after the patch: 2014/08/22-08:50:02.166699 7f3400ffb700 ------- DUMPING STATS ------- 2014/08/22-08:50:02.166722 7f3400ffb700 Compaction Stats [default] Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 8/5 404 1.5 0.0 0.0 0.0 4.3 4.3 0.0 0.0 0.0 9.6 0 0 0 0 463 88 5.260 0.00 0 0.00 L1 2/2 60 0.9 4.8 3.9 0.8 4.7 3.9 2.4 1.2 23.9 23.6 80 23 131 108 204 19 10.747 0.00 0 0.00 L2 23/3 697 1.0 11.6 3.5 8.1 10.9 2.8 6.4 3.1 17.7 16.6 95 242 317 75 669 92 7.268 0.00 0 0.00 L3 58/14 2207 0.3 6.2 1.6 4.6 5.9 1.3 7.4 3.6 14.6 13.9 43 121 159 38 436 36 12.106 0.00 0 0.00 Sum 91/24 3368 0.0 22.5 9.1 13.5 25.8 12.4 11.2 6.0 13.0 14.9 218 386 607 221 1772 235 7.538 0.00 0 
0.00 Int 0/0 0 0.0 3.2 0.9 2.3 3.6 1.3 15.3 8.0 12.4 13.7 24 66 89 23 266 27 9.838 0.00 0 0.00 Flush(GB): accumulative 4.336, interval 0.444 Stalls(secs): 0.000 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 0.000 leveln_slowdown_soft, 0.000 leveln_slowdown_hard Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard DB Stats Uptime(secs): 577.7 total, 60.1 interval Cumulative writes: 116960736 writes, 11966220 batches, 9.8 writes per batch, 2.64 GB user ingest Cumulative WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 GB written Interval writes: 11643735 writes, 1206136 batches, 9.7 writes per batch, 269.2 MB user ingest Interval WAL: 0 writes, 0 syncs, 0.00 writes per sync, 0.00 MB written Yay for concurrent L0->L1 and L2->L3 compactions! Reviewers: sdong, yhchiang, ljin Reviewed By: yhchiang Subscribers: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D22305	2014-08-22 11:32:40 -07:00
Yueh-Hsuan Chiang	47b452cfcf	Fix the error of c_test.c Summary: Fix the error of c_test.c Test Plan: make c_test ./c_test	2014-08-20 17:05:29 -07:00
Yueh-Hsuan Chiang	562b7a1f28	Add missing implementation of SanitizeDBOptions in simple_table_db_test.cc Summary: Add missing implementation of SanitizeDBOptions in simple_table_db_test.cc Test Plan: make simple_table_db_test.cc	2014-08-20 16:33:25 -07:00
Yueh-Hsuan Chiang	63a2215c63	Improve Options sanitization and add MmapReadRequired() to TableFactory Summary: Currently, PlainTable must use mmap_reads. When PlainTable is used but allow_mmap_reads is not set, rocksdb will fail in flush. This diff improves Options sanitization and adds MmapReadRequired() to TableFactory. Test Plan: export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest make db_test -j32 ./db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: you, leveldb Differential Revision: https://reviews.facebook.net/D21939	2014-08-20 15:53:39 -07:00
sdong	10720a5587	Revert the unintended change that DestroyDB() doesn't clean up info logs. Summary: A previous change triggered a change by mistake: DestroyDB() will keep info logs under DB directory. Revert the unintended change. Test Plan: Add a unit test case to verify it. Reviewers: ljin, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22209	2014-08-20 12:07:32 -07:00
Torrie Fischer	7c5173d27f	test: db: fix test to have a smaller timeout for when it runs on faster hardware	2014-08-19 13:45:12 -07:00
Radheshyam Balasundaram	162b8151f1	Adding Column Family support in db_bench. Summary: Adding num_column_families flag. Adding support for column families in DoWrite and ReadRandom methods. [Igor, please let me know if this approach sounds good. I shall add it to other methods too.] Test Plan: Ran fillseq on 1M keys and 10 Column families and ran readrandom. Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21387	2014-08-18 18:15:01 -07:00
sdong	28b5c76004	WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index Summary: Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write. WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by reading the entry from the write batch at that offset. Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries. I will add more unit tests if people are OK about this API. Test Plan: make all check Add unit tests. Reviewers: yhchiang, igor, MarkCallaghan, ljin Reviewed By: ljin Subscribers: dhruba, leveldb, xjin Differential Revision: https://reviews.facebook.net/D21381	2014-08-18 16:37:38 -07:00
Radheshyam Balasundaram	36e759d199	Adding Cuckoo Table SST option to db_bench Summary: Adding flags to use cuckoo table SST in db_bench.cc Test Plan: Ran benchmark with fillseq and readrandom Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21729	2014-08-18 11:59:38 -07:00
Igor Canadi	a6fd14c881	Fix valgrind error in c_test	2014-08-18 11:08:51 -07:00
Igor Canadi	c8ecfaedd0	Merge pull request #230 from cockroachdb/spencerkimball/send-user-keys-to-v2-filter Pass parsed user key to prefix extractor in V2 compaction	2014-08-18 11:09:30 -04:00
Yueh-Hsuan Chiang	570ba5aca8	Avoid retrying to read property block from a table when it does not exist. Summary: Avoid retrying to read property block from a table when it does not exist in updating stats for compensating deletion entries. In addition, ReadTableProperties() now returns Status::NotFound instead of Status::Corruption when table properties does not exist in the file. Test Plan: make db_test -j32 export ROCKSDB_TESTS=CompactionDeleteionTrigger ./db_test Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21867	2014-08-15 12:17:44 -07:00
sdong	58b0f9d890	Support purging logs from separate log directory Summary: 1. Support purging info logs from a path separate from the DB path. Refactor the code that generates info log prefixes so that it can be called both when generating new files and when scanning the log directory. 2. Fix the bug of not scanning multiple DB paths (should only impact multiple DB paths) Test Plan: Add unit test for generating and parsing info log files Add end-to-end test in db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: leveldb, igor, dhruba Differential Revision: https://reviews.facebook.net/D21801	2014-08-14 13:22:50 -07:00
Lei Jin	58c49466d2	Allow env_posix to lower background thread IO priority Summary: This is a linux-specific system call. Test Plan: ran db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: haobo, leveldb Differential Revision: https://reviews.facebook.net/D21183	2014-08-13 20:49:58 -07:00
Lei Jin	5a5953b388	Add histogram for DB_SEEK Summary: as title Test Plan: make release Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21717	2014-08-13 15:56:37 -07:00
Feng Zhu	5e642403a9	log db path info before open Summary: 1. write db MANIFEST, CURRENT, IDENTITY, sst files, log files to log before open Test Plan: run db and check LOG file Reviewers: ljin, yhchiang, igor, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21459	2014-08-13 13:45:13 -07:00
Stanislau Hlebik	0c9dc9f8e0	Remove malloc from FormatFileNumber Summary: Replace unnecessary malloc with stack allocation Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21771	2014-08-13 11:57:40 -07:00
sdong	48081777f3	Revert "Include candidate files under options.db_log_dir in FindObsoleteFiles()" This reverts commit `54153ab07a`.	2014-08-12 18:14:27 -07:00
Yueh-Hsuan Chiang	0138b8eba8	Fixed compile errors (signed / unsigned comparison) in cuckoo_table_db_test on Mac Summary: Fixed compile errors (signed / unsigned comparison) in cuckoo_table_db_test on Mac Test Plan: make cuckoo_table_db_test	2014-08-12 17:35:09 -07:00
Yueh-Hsuan Chiang	1562653ba0	Fixed a signed-unsigned comparison error in db_test Summary: Fixed a signed-unsigned comparison error in db_test Test Plan: make db_test	2014-08-12 17:26:47 -07:00
Lei Jin	218857b3f5	remove tailing_iter.h/cc Summary: as title Test Plan: make all check ran db_bench and saw seek stats at the end Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21651	2014-08-12 17:13:15 -07:00
Lei Jin	5d0074c471	set bytes_per_sync to 1MB if rate limiter is enabled Summary: as title Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21201	2014-08-12 16:42:18 -07:00
Spencer Kimball	3fcf7b26b9	Pass parsed user key to prefix extractor in V2 compaction Previously, the prefix extractor was being supplied with the RocksDB key instead of a parsed user key. This makes correct interpretation by calling application fragile or impossible.	2014-08-12 18:48:28 -04:00
Stanislau Hlebik	2fa643466d	Add scope guard Summary: Small change: replace mutex_.Lock/mutex_.Unlock() with scope guard Test Plan: make all check Reviewers: igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21609	2014-08-12 12:13:13 -07:00
Stanislau Hlebik	06a52bda64	Flush only one column family Summary: Currently DBImpl::Flush() triggers flushes in all column families. Instead we need to trigger just the column family specified. Test Plan: make all check Reviewers: igor, ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20841	2014-08-11 22:10:32 -07:00
Radheshyam Balasundaram	9674c11d01	Integrating Cuckoo Hash SST Table format into RocksDB Summary: Contains the following changes: - Implementation of cuckoo_table_factory - Adding cuckoo table into AdaptiveTableFactory - Adding cuckoo_table_db_test, similar to lines of plain_table_db_test - Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key. - Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test cuckoo_table_db_test make check all make valgrind_check make asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21219	2014-08-11 20:21:07 -07:00
Siying Dong	18efdba8d5	Merge pull request #228 from miguelportilla/develop Changes to support unity build: Script for building the unity.cc file via Makefile Unity executable Makefile target for testing builds Source code changes to fix compilation of unity build	2014-08-11 11:10:23 -07:00
Feng Zhu	d3f2ec694f	check prefix_size when using hash search in db_bench Summary: 1. Check prefix_size when enable use_hash_search in db_bench 2. Remove include/statistics.h in db_bench Test Plan: ./db_bench --use_hash_search=1 Reviewers: ljin, yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21375	2014-08-11 10:47:52 -07:00
miguelportilla	93e6b5e9d9	Changes to support unity build: * Script for building the unity.cc file via Makefile * Unity executable Makefile target for testing builds * Source code changes to fix compilation of unity build	2014-08-11 13:22:47 -04:00
sdong	54153ab07a	Include candidate files under options.db_log_dir in FindObsoleteFiles() Summary: In FindObsoleteFiles(), we don't scan db_log_dir. Add it. Test Plan: make all check Reviewers: ljin, igor, yhchiang Reviewed By: yhchiang Subscribers: leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D21429	2014-08-08 17:37:03 -07:00
sdong	4632239d13	Need to schedule compactions when manual compaction finishes Summary: If there is an outstanding compaction scheduled but at the time a manual compaction is triggered, the manual compaction will preempt. In the end of the manual compaction, we should try to schedule compactions to make sure those preempted ones are not skipped. Test Plan: make all check Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D21321	2014-08-08 12:28:36 -07:00
Igor Canadi	5e0868147d	Fix SIGSEGV in travis Summary: Travis build was failing a lot. For example see https://travis-ci.org/facebook/rocksdb/builds/31425845 This fixes it. Also, please don't put any code after SignalAll :) Test Plan: no more SIGSEGV Reviewers: yhchiang, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D21417	2014-08-08 10:24:00 -07:00
Stanislau Hlebik	7c88249f51	Fix db_test and DBIter Summary: Fix old issue with DBTest.Randomized with BlockBasedTableWithWholeKeyHashIndex + added printing in DBTest.Randomized. Test Plan: make all check Reviewers: zagfox, igor, ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21003	2014-08-08 09:44:14 -07:00
Igor Canadi	894a77abdf	Fix leak in c_test	2014-08-07 15:06:52 -07:00
Igor Canadi	323d6e3542	Fix c_test	2014-08-07 14:29:38 -04:00
Igor Canadi	f8d6a2981f	Merge pull request #224 from cockroachdb/spencerkimball/compaction-filter-v2-c-bindings Add support for C bindings to the compaction V2 filter mechanism.	2014-08-07 14:10:54 -04:00
sdong	76dcf7eefd	Minor: fix a format Summary: A format fixing Test Plan: N/A Reviewers: ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21255	2014-08-06 18:11:33 -07:00
sdong	7abe9655d3	Fix valgrind failure caused by recent checked-in. Summary: Initialize un-initialized parameters Test Plan: run the failed test (c_test) Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21249	2014-08-06 17:45:47 -07:00
Spencer Kimball	38e8b727a8	Fix typo, add missing inclusion of state void* in invocation of create_compaction_filter_v2_.	2014-08-06 18:42:15 -04:00
Spencer Kimball	c1f588af71	Add support for C bindings to the compaction V2 filter mechanism. Test Plan: make c_test && ./c_test Some fixes after merge.	2014-08-06 15:55:48 -04:00
sdong	1242bfcad7	Add DB property "rocksdb.estimate-table-readers-mem" Summary: Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache. Refactor the property codes to allow getting property from a version, with DB mutex not acquired. Test Plan: Add several checks of this new property in existing codes for various cases. Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, leveldb Differential Revision: https://reviews.facebook.net/D20733	2014-08-06 11:39:46 -07:00
Feng Zhu	1129921e9b	logging_when_create_and_delete_manifest Summary: 1. Log when the manifest file is created and deleted 2. Fix formatting in table/format.cc Test Plan: make all check run db_bench, track the LOG file. Reviewers: ljin, yhchiang, igor, yufei.zhu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21009	2014-08-04 11:25:42 -07:00
Igor Canadi	e4c3673923	Never CompactRange to level 0 in level compaction Summary: I was bit by this when developing SpatialDB. In case all files are at level 0, CompactRange() will output the compacted files to level 0. This is not ideal, since read amp. is much better at level 1 and higher. Test Plan: Compacted data in SpatialDB, read manifest using ldb, verified that files are now at level 1 instead of 0. Reviewers: sdong, ljin, yhchiang, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20901	2014-08-01 06:41:48 -07:00
Yueh-Hsuan Chiang	1903aa5cc7	Fixed a warning / error in signed and unsigned comparison Summary: Fixed the following compilation error detected in mac: db/db_test.cc:2524:3: note: in instantiation of function template specialization 'rocksdb::test::Tester::IsEq<unsigned long long, int>' requested here ASSERT_EQ(int_num, 0); ^ Test Plan: make	2014-07-31 14:48:00 -07:00
Yueh-Hsuan Chiang	67dae255a9	Remove a check for merge operator in builder.cc Summary: Previously, builder.cc has a check for merge operator which prevents RocksDB from crash when reopening a DB w/o properly specifying the merge operator. However, currently we observed a memory leak on failing in RocksDB recovery. This diff removes such check and let it crash instead of causing memory leak for now before we have identified the real cause of the memory leak. Test Plan: make all check Reviewers: sdong Subscribers: ljin, igor Differential Revision: https://reviews.facebook.net/D20913	2014-07-31 14:22:21 -07:00
Yueh-Hsuan Chiang	2105ecac4d	Temporarily remove the last test in merge_test	2014-07-31 11:20:49 -07:00
Stanislau Hlebik	3215967205	Fix readonly db Summary: DBImplReadOnly::CompactRange did not override DBImpl::CompactRange; this can cause problems when using StackableDB inheritors like DbWithTtl. P. S. Thanks C++11 for override :) Test Plan: make all check Reviewers: igor, sdong Reviewed By: sdong Subscribers: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D20829	2014-07-30 18:21:55 -07:00
Yueh-Hsuan Chiang	e9269e6ece	Fixed a typo in the comment for merge operator. Summary: Fixed a typo in the comment for merge operator. Test Plan: n/a	2014-07-30 17:25:11 -07:00
Yueh-Hsuan Chiang	49ee5a4ac4	Fixed the crash when merge_operator is not properly set after reopen. Summary: Fixed the crash when merge_operator is not properly set after reopen and added two test cases for this. Test Plan: make merge_test ./merge_test Reviewers: igor, ljin, sdong Reviewed By: sdong Subscribers: benj, mvikjord, leveldb Differential Revision: https://reviews.facebook.net/D20793	2014-07-30 17:24:36 -07:00
Stanislau Hlebik	76286ee67e	Remove unnecessary constructor parameter from ColumnFamilyData Summary: const string& dbname parameter is not used Test Plan: make all Reviewers: sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20703	2014-07-30 13:53:08 -07:00
sdong	473e829784	Fix ldb dump_manifest Summary: We now read table properties in VersionSet::LogAndApply(), which requires options.db_paths to be set. But ldb_cmd directly creates a VersionSet without initializing db_paths, causing a seg fault. This patch fixes it by initializing db_paths. log_and_apply_bench still segfaults, because the table cache is nullptr in the VersionSet it creates. Test Plan: Run ldb dump_manifest which used to fail. Reviewers: yhchiang, ljin, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20751	2014-07-30 10:17:48 -07:00
sdong	f04356e660	Add DB::GetIntProperty() to return integer properties to be returned as integers Summary: We have quite some properties that are integers and we are adding more. Add a function to directly return them as an integer, instead of a string Test Plan: Add several unit test checks Reviewers: yhchiang, igor, dhruba, haobo, ljin Reviewed By: ljin Subscribers: yoshinorim, leveldb Differential Revision: https://reviews.facebook.net/D20637	2014-07-28 16:55:57 -07:00
sdong	f6784766db	Add DB property estimated number of keys Summary: Add a DB property of estimated number of live keys, by adding number of entries of all mem tables and all files, subtracted by all deletions in all files. Test Plan: Add the case in unit tests Reviewers: hobbymanyp, ljin Reviewed By: ljin Subscribers: MarkCallaghan, yoshinorim, leveldb, igor, dhruba Differential Revision: https://reviews.facebook.net/D20631	2014-07-28 15:27:58 -07:00
Lei Jin	7e8bb71dd0	InternalStats to take cfd on constructor Summary: It has one-to-one relationship with CFD. Take a pointer to CFD on constructor to avoid passing cfd through member functions. Test Plan: make Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20565	2014-07-28 12:27:08 -07:00
Lei Jin	1bd3431f7c	Change StopWatch interface Summary: So that we can avoid calling NowSecs() in MakeRoomForWrite twice Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20529	2014-07-28 12:22:37 -07:00
Lei Jin	f6ca226c17	make statistics forward-able Summary: Make StatisticsImpl able to forward stats to a provided statistics implementation. The main purpose is to allow us to collect internal stats in the future even when the user supplies a custom statistics implementation. It avoids instrumenting 2 sets of stats collection code. One immediate use case is the tuning advisor, which needs to collect some internal stats that users may not be interested in. Test Plan: ran db_bench and saw stats show up at the end of the run Will run make all check since some tests rely on statistics Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20145	2014-07-28 12:10:49 -07:00
Lei Jin	40fa8a4cd5	make statistics forward-able Summary: Make StatisticsImpl able to forward stats to a provided statistics implementation. The main purpose is to allow us to collect internal stats in the future even when the user supplies a custom statistics implementation. It avoids instrumenting 2 sets of stats collection code. One immediate use case is the tuning advisor, which needs to collect some internal stats that users may not be interested in. Test Plan: ran db_bench and saw stats show up at the end of the run Will run make all check since some tests rely on statistics Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20145	2014-07-28 12:05:36 -07:00
Lei Jin	d650612c4c	expose RateLimiter definition Summary: Users get an undefined error since the definition is not exposed. Also re-enable the db test with only upper bound check Test Plan: db_test, rate_limit_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20403	2014-07-25 15:17:06 -07:00
Yueh-Hsuan Chiang	6480717a26	Fixed compaction-related errors where number of input levels are hard-coded. Summary: Fixed compaction-related errors where number of input levels are hard-coded. It's a bug found in compaction branch. This diff will be pushed into master. Test Plan: export ROCKSDB_TESTS=Compact make db_test -j32 ./db_test also passed the tests in compaction branch Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20577	2014-07-24 17:06:00 -07:00
Igor Canadi	41a697256f	NewIterators in read-only mode Summary: As title. Test Plan: Added test to column_family_test Reviewers: ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20523	2014-07-23 16:52:11 -04:00
Feng Zhu	da9274574f	Use IterKey instead of string in Block::Iter to reduce malloc Summary: Modify the function TrimAppend of IterKey in dbformat.h. Write a test for it in dbformat_test. Use IterKey in Block::Iter to replace std::string to reduce malloc. Evaluate it using perf record. malloc: 4.26% -> 2.91% free: 3.61% -> 3.08% Test Plan: make all check ./valgrind db_test dbformat_test Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20433	2014-07-23 12:31:11 -07:00
Yueh-Hsuan Chiang	0e1b4787ed	Fixed a bug in Compaction.cc where input_levels_ was not properly resized. Summary: Fixed a bug in Compaction.cc where input_levels_ was not properly resized. Without this fix, there would be invalid access in input_levels_ when more than two levels are involved in one compaction run. This fix will go to master instead of compaction branch. Test Plan: tested in compaction branch. Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20481	2014-07-23 10:22:21 -07:00
Igor Canadi	0ff183a0d9	Move include/utilities/*.h to include/rocksdb/utilities/*.h Summary: All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this: #include <rocksdb/db.h> #include <utilities/backupable_db.h> // still our public header! Also, internally, we include: #include "utilities/backupable/backupable_db.h" // internal header #include "utilities/backupable_db.h" // public header which is confusing. This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files. Test Plan: compiles Reviewers: dhruba, ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20409	2014-07-23 10:21:38 -04:00
sdong	f6b7e1ed1a	Allow user to specify DB path of output file of manual compaction Summary: Add a parameter path_id to DB::CompactRange(), to indicate where the output file should be placed to. Test Plan: add a unit test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, dhruba, MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D20085	2014-07-21 19:06:00 -07:00
Lei Jin	f6f1533c6f	make internal stats independent of statistics Summary: also make it aware of column family output from db_bench ``` Compaction Stats [default] Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 14 956 0.9 0.0 0.0 0.0 2.7 2.7 0.0 0.0 0.0 111.6 0 0 0 0 24 40 0.612 75.20 492387 0.15 L1 21 2001 2.0 5.7 2.0 3.7 5.3 1.6 5.4 2.6 71.2 65.7 31 43 55 12 82 2 41.242 43.72 41183 1.06 L2 217 18974 1.9 16.5 2.0 14.4 15.1 0.7 15.6 7.4 70.1 64.3 17 182 185 3 241 16 15.052 0.00 0 0.00 L3 1641 188245 1.8 9.1 1.1 8.0 8.5 0.5 15.4 7.4 61.3 57.2 9 75 76 1 152 9 16.887 0.00 0 0.00 L4 4447 449025 0.4 13.4 4.8 8.6 9.1 0.5 4.7 1.9 77.8 52.7 38 79 100 21 176 38 4.639 0.00 0 0.00 Sum 6340 659201 0.0 44.7 10.0 34.7 40.6 6.0 32.0 15.2 67.7 61.6 95 379 416 37 676 105 6.439 118.91 533570 0.22 Int 0 0 0.0 1.2 0.4 0.8 1.3 0.5 5.2 2.7 59.1 65.6 3 7 9 2 20 10 2.003 0.00 0 0.00 Stalls(secs): 75.197 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 43.717 leveln_slowdown Stalls(count): 492387 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 41183 leveln_slowdown DB Stats Uptime(secs): 202.1 total, 13.5 interval Cumulative writes: 6291456 writes, 6291456 batches, 1.0 writes per batch, 4.90 ingest GB Cumulative WAL: 6291456 writes, 6291456 syncs, 1.00 writes per sync, 4.90 GB written Interval writes: 1048576 writes, 1048576 batches, 1.0 writes per batch, 836.0 ingest MB Interval WAL: 1048576 writes, 1048576 syncs, 1.00 writes per sync, 0.82 MB written Test Plan: ran it Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19917	2014-07-21 12:57:29 -07:00
Feng Zhu	50c2dcb78f	add options.block_restart_interval in db_bench Summary: Add block_restart_interval in db_bench, default value 16 Test Plan: make Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20331	2014-07-21 12:01:40 -07:00
Chilledheart	54f4e2f188	Fix clang compiler warnings	2014-07-20 22:57:20 +08:00
Stanislau Hlebik	9d70cce047	Adding option to save PlainTable index and bloom filter in SST file. Summary: Adding option to save PlainTable index and bloom filter in SST file. If there is no bloom block and/or index block, PlainTableReader builds new ones. Otherwise PlainTableReader just use these blocks. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19527	2014-07-18 16:58:13 -07:00
Stanislau Hlebik	92d73cbe78	Add PlainTableOptions Summary: Since we have a lot of options for PlainTable, add a struct PlainTableOptions to manage them Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20175	2014-07-18 00:08:38 -07:00
Yueh-Hsuan Chiang	052ddbe0e2	Add MaxInputLevel() to CompactionPicker Summary: Having if-then branch for different compaction strategies is considered hacky and make CompactionPicker less pluggable. This diff removes two of such if-then branches in version_set.cc by adding MaxInputLevel() to CompactionPicker. // Given the current number of levels, returns the lowest allowed level // for compaction input. virtual int MaxInputLevel(int current_num_levels) const; Test Plan: make db_test export ROCKSDB_TESTS=Compaction ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19971	2014-07-17 18:01:04 -07:00
Yueh-Hsuan Chiang	aac941b3f0	Fixed a signed and unsigned comparison in Compaction Summary: Fixed a signed and unsigned comparison in Compaction Test Plan: make db_test export ROCKSDB_TESTS=Compaction ./db_test	2014-07-17 16:37:25 -07:00
Radheshyam Balasundaram	0d57e3ad7d	Guarding files_ attribute with #ifndef NDEBUG guard in FilePicker class. Summary: Adding guards to files_ attribute of FilePicker class. This attribute is used only in DEBUG mode. This fixes build of static_lib in mac. Test Plan: make static_lib in mac make check all in devserver Reviewers: ljin, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20163	2014-07-17 15:07:05 -07:00
Yueh-Hsuan Chiang	3178510153	Allow class Compaction to handle input files from multiple levels. Summary: Allow class Compaction to handle input files from multiple levels. This diff is a subset of https://reviews.facebook.net/D19263 where only db/compaction.cc and db/compaction.h are changed. Test Plan: make db_test export ROCKSDB_TESTS=Compaction ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19923	2014-07-17 14:36:41 -07:00
Yueh-Hsuan Chiang	296e340753	Add struct CompactionInputFiles to manage compaction input files. Summary: Add struct CompactionInputFiles to manage compaction input files. Test Plan: export ROCKSDB_TESTS=Compact make db_test ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20061	2014-07-16 18:12:17 -07:00
Feng Zhu	bc6b2ab401	enable kHashSearch for blocktable in db_bench Summary: add a flag called use_hash_search in db_bench Test Plan: make all check ./db_bench --use_hash_search=1 Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D20067	2014-07-16 17:32:30 -07:00
Feng Zhu	87895c62db	fix bug in LOG for flush memtable Summary: One line change to fix a bug in the LOG when flush memtable Test Plan: NONE Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20049	2014-07-16 16:56:49 -07:00
Stanislau Hlebik	1c9f190ae3	Fix db_test Summary: Added deletion of DBIterators in DBIterator's tests Test Plan: make valgrind_check Reviewers: igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20043	2014-07-16 14:51:43 -07:00
Radheshyam Balasundaram	0418e66e2a	Refactoring Version::Get() Summary: Refactoring Version::Get() method to move file picker logic to a separate class. Test Plan: make check all Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19713	2014-07-16 13:33:02 -07:00
Feng Zhu	c11d604ab3	store file_indexer info in sequential memory Summary: use arena to allocate space for next_level_index_ and level_rb_ Thus increasing data locality and make Version::Get faster. Benchmark detail Base version: commit `d2a727c182` command used: ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=2097152 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=fillseq, readrandom,readrandom,readrandom --use_existing_db=0 --num=52428800 --threads=1 Result: cpu running percentage: Version::Get, improved from 7.98% to 7.42% FileIndexer::GetNextLevelIndex, improved from 1.18% to 0.68%. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, igor Differential Revision: https://reviews.facebook.net/D19845	2014-07-16 11:21:30 -07:00
Stanislau Hlebik	d916593ead	Add Prev() for merge operator Summary: Implement Prev() with merge operator for DBIterator. Request from mongoDB. Task 4673663. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19743	2014-07-15 16:10:18 -07:00
sdong	0abaed2e08	Support multiple DB directories in universal compaction style Summary: This patch adds a target size parameter in options.db_paths and universal compaction will base it to determine which DB path to place a new file. Level-style stays the same. Test Plan: Add new unit tests Reviewers: ljin, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D19869	2014-07-15 12:06:28 -07:00
Igor Canadi	20c056306b	Remove stats logger Summary: Browsing through the code, looks like StatsLogger is not used at all! Test Plan: compiles Reviewers: ljin, sdong, yhchiang, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19827	2014-07-15 09:16:32 -04:00
Igor Canadi	d2a727c182	BG -> GB	2014-07-14 09:06:38 -07:00
Igor Canadi	ee6b35e55a	Fix mac compile Summary: We should use PRIu64 instead of "%lu" for portability Test Plan: compiles now Reviewers: ljin, dhruba Reviewed By: dhruba Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19809	2014-07-14 09:56:52 -04:00
Lei Jin	46f0f6ddd5	improve InternalStats output Summary: as title Test Plan: sample output: Level Files Size(MB) Score Read(GB) Rn(GB) Rnp1(GB) Write(BG) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s) Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 15 1024 1.0 0.0 0.0 0.0 8.2 8.2 0.0 0.0 0.0 111.4 0 0 1 1 75 123 0.612 295.94 1939238 0.15 L1 23 2118 2.1 20.9 8.3 12.7 20.0 7.3 5.0 2.4 73.2 69.9 124 141 208 67 293 8 36.582 17.05 16100 1.06 L2 162 15333 1.5 47.0 7.1 40.0 42.6 2.6 12.7 6.0 67.9 61.5 62 457 482 25 709 55 12.898 0.00 0 0.00 L3 985 108065 1.1 37.8 4.0 33.9 36.9 3.0 18.8 9.3 60.1 58.5 41 338 363 25 645 31 20.812 0.00 0 0.00 L4 2788 356033 0.3 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0 0 0 0 0 0 0.000 0.00 0 0.00 Sum 3973 482572 0.0 105.8 19.3 86.5 107.7 21.2 11.1 5.6 62.9 64.0 227 936 1054 118 1723 217 7.938 312.99 1955338 0.16 Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19707	2014-07-11 15:03:30 -07:00
Feng Zhu	178fd6f9db	use FileLevel in LevelFileNumIterator Summary: Use FileLevel in LevelFileNumIterator, thus use new version of findFile. Old version of findFile function is deleted. Write a function in version_set.cc to generate FileLevel from files_. Add GenerateFileLevelTest in version_set_test.cc Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19659	2014-07-11 12:52:41 -07:00
Tomislav Novak	3b97ee96c4	ForwardIterator seek bugfix Summary: If `NeedToSeekImmutable()` returns false, `SeekInternal()` won't reset the contents of `immutable_min_heap_`. However, since it calls `UpdateCurrent()` unconditionally, if `current_` is one of immutable iterators (previously popped from `immutable_min_heap_`), `UpdateCurrent()` will overwrite it. As a result, if old `current_` in fact pointed to the smallest entry, forward iterator will skip some records. Fix implemented in this diff pushes `current_` back to `immutable_min_heap_` before calling `UpdateCurrent()`. Test Plan: New unit test (courtesy of @lovro): $ ROCKSDB_TESTS=TailingIteratorSeekToSame ./db_test Reviewers: igor, dhruba, haobo, ljin Reviewed By: ljin Subscribers: lovro, leveldb Differential Revision: https://reviews.facebook.net/D19653	2014-07-10 16:46:13 -07:00
Reed Allman	1fc71a4b16	C API: create missing cf's, cleanup	2014-07-10 12:55:53 -07:00
Tomislav Novak	105c1e099b	ForwardIterator::status() checks all child iterators Summary: Forward iterator only checked `status_` and `mutable_iter_->status()`, which is not sufficient. For example, when reading exclusively from cache (kBlockCacheTier), `mutable_iter_->status()` may return kOk (e.g. there's nothing in the memtable), but one of immutable iterators could be in kIncomplete. In this case, `ForwardIterator::status()` ought to return that status instead of kOk. This diff changes `status()` to also check `imm_iters_`, `l0_iters_`, and `level_iters_`. Test Plan: ROCKSDB_TESTS=TailingIteratorIncomplete ./db_test Reviewers: ljin, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19581	2014-07-10 12:43:12 -07:00
sdong	36de0e5359	Add a function to return current perf level Summary: Add a function to return the perf level. It is to allow a wrapper of DB to increase the perf level and restore the original perf level after finishing the function call. Test Plan: Add a verification in db_test Reviewers: yhchiang, igor, ljin Reviewed By: ljin Subscribers: xjin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19551	2014-07-10 11:35:48 -07:00
Stanislau Hlebik	30c81e7717	Removing NewTotalOrderPlainTableFactory Summary: Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect. Total order mode indicator is prefix_extractor == nullptr, but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests in plain_table_db_tests is incorrect. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19587	2014-07-10 11:32:04 -07:00
Igor Canadi	f0a8be253e	JSON (Document) API sketch Summary: This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API. I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :) Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time. Test Plan: Added a simple unit test Reviewers: haobo, yhchiang, dhruba, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18747	2014-07-10 09:31:42 -07:00
Feng Zhu	222cf2555a	change the init parameter for FileDescriptor Summary: fix a bug in improve_file_key_search, change the parameter for FileDescriptor Test Plan: make all check Reviewers: sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D19611	2014-07-09 23:40:03 -07:00
Lei Jin	8a7d1fe616	disable rate limiter test Summary: The test is not stable because it relies on disk and only runs for a short period of time. So missing a compaction/flush would greatly affect the rate. I am disabling it for now. What do you guys think? Test Plan: make Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19599	2014-07-09 22:46:15 -07:00
Feng Zhu	f697cad15c	create compressed_levels_ in Version, allocate its space using arena. Make Version::Get, Version::FindFile faster Summary: Define CompressedFileMetaData that just contains fd, smallest_slice, largest_slice. Create compressed_levels_ in Version, the space is allocated using arena Thus increase the file meta data locality, speed up "Get" and "FindFile" benchmark with in-memory tmpfs, could have 4% improvement under "random read" and 2% improvement under "read while writing" benchmark command: ./db_bench --db=/mnt/db/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --disable_wal=0 --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --benchmarks=readwhilewriting,readwhilewriting,readwhilewriting --use_existing_db=1 --num=52428800 --threads=1 --writes_per_second=81920 Read Random: From 1.8363 ms/op, improve to 1.7587 ms/op. Read while writing: From 2.985 ms/op, improve to 2.924 ms/op. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, igor Differential Revision: https://reviews.facebook.net/D19419	2014-07-09 22:14:39 -07:00
Yueh-Hsuan Chiang	70828557ef	Some fixes on size compensation logic for deletion entry in compaction Summary: This patch include two fixes: 1. newly created Version will now takes the aggregated stats for average-value-size from the latest Version. 2. compensated size of a file is now computed only for newly created / loaded file, this addresses the issue where files are already sorted by their compensated file size but might sometimes observe some out-of-order due to later update on compensated file size. Test Plan: export ROCKSDB_TESTS=CompactionDele ./db_test Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19557	2014-07-09 12:46:08 -07:00
Lei Jin	ef1aad97f9	fix one more internal_stats issue Summary: stall count is wrong Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19539	2014-07-08 15:29:13 -07:00
Lei Jin	73d7147096	make rate limiter test more reliable Summary: Randomize keys so that compaction actually happens. Change the config so that compaction happens more aggressively. The test takes longer time, but the results are more stable shown by iostat Test Plan: ran it Reviewers: igor, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19533	2014-07-08 15:15:00 -07:00
Lei Jin	8a9cc7885c	report correct interval amplification Summary: as title Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19515	2014-07-08 12:48:10 -07:00
Lei Jin	534357ca3a	integrate rate limiter into rocksdb Summary: Add option and plugin rate limiter for PosixWritableFile. The rate limiter only applies to flush and compaction. WAL and MANIFEST are excluded from this enforcement. Test Plan: db_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19425	2014-07-08 12:31:49 -07:00
Lei Jin	b278ae8e50	Apply fractional cascading in ForwardIterator::Seek() Summary: Use search hint to reduce FindFile range thus avoid comparison For a small DB with 50M keys, perf_context counter shows it reduces comparison from 2B to 1.3B for a 15-minute run. No perf change was observed for 1 seek thread, but quite good improvement was seen for 32 seek threads, when CPU was busy. will post detail results when ready Test Plan: db_bench and db_test Reviewers: haobo, sdong, dhruba, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18879	2014-07-08 11:40:42 -07:00
Igor Canadi	1b95bf734d	Merge pull request #197 from rdallman/update-options C API: update options w/ convenience funcs & fifo compaction	2014-07-08 11:40:11 -07:00
Reed Allman	fd3fb4b0bf	C API: update options w/ convenience funcs & fifo compaction	2014-07-08 10:57:45 -07:00
Reed Allman	e9b18b6b89	C API: bugfix column_family_comact_range	2014-07-07 21:48:49 -07:00
Igor Canadi	4adf64e068	Fix compile issue	2014-07-07 14:54:11 -07:00
Igor Canadi	8a03935f8c	Fix valgrind error in c_test Summary: External contribution caused some valgrind errors: `1a34aaaef0` This diff fixes them Test Plan: ran valgrind Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19485	2014-07-07 14:41:54 -07:00
Evan Shaw	13a130cc00	C API: Add test for compaction filter factories Also refactored the compaction filter tests to share some code and ensure that options were getting reset so future test results aren't confused.	2014-07-08 08:12:36 +12:00
Evan Shaw	3f7104d7c5	C API: Allow setting compaction filter factory	2014-07-08 08:12:36 +12:00
Evan Shaw	91bede79cc	C API: Add support for compaction filter factories (v1)	2014-07-08 08:12:36 +12:00
Radheshyam Balasundaram	f0660d5253	Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353	2014-07-07 10:53:31 -07:00
Igor Canadi	0bc5fa9f40	Merge pull request #190 from edsrzf/c-api-writebatch-serialized C API: support constructing write batch from serialized representation	2014-07-07 10:17:43 -07:00
Reed Allman	1a34aaaef0	C API: column family support	2014-07-07 01:41:01 -07:00
Evan Shaw	9fc23d0c56	C API: support constructing write batch from serialized representation	2014-07-06 10:36:33 +12:00
Yueh-Hsuan Chiang	7b85c1e900	Improve SimpleWriteTimeoutTest to avoid false alarm. Summary: SimpleWriteTimeoutTest has two parts: 1) insert two large key/values to make memtable full and expect both of them are successful; 2) insert another key / value and expect it to be timed-out. Previously we also set a timeout in the first step, but this might sometimes cause false alarm. This diff makes the first two writes run without timeout setting. Test Plan: export ROCKSDB_TESTS=Time make db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19461	2014-07-04 00:02:12 -07:00
Yueh-Hsuan Chiang	d33657a4a5	Fixed a warning in release mode. Summary: Removed a variable that is only used in assertion check. Test Plan: make release Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19455	2014-07-03 17:19:17 -07:00
Yueh-Hsuan Chiang	90a6aca48e	Finer report I/O stats about Flush and Compaction. Summary: This diff allows the I/O stats about Flush and Compaction to be reported in a more accurate way. Instead of measuring the size of a file, it measure I/O cost in per read / write basis. Test Plan: make all check Reviewers: sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19383	2014-07-03 16:28:03 -07:00
Yueh-Hsuan Chiang	d4d338de33	Add timeout_hint_us to WriteOptions and introduce Status::TimeOut. Summary: This diff adds timeout_hint_us to WriteOptions. If it's non-zero, then 1) writes associated with this options MAY be aborted when it has been waiting for longer than the specified time. If an abortion happens, associated writes will return Status::TimeOut. 2) the stall time of the associated write caused by flush or compaction will be limited by timeout_hint_us. The default value of timeout_hint_us is 0 (i.e., OFF.) The statistics of timeout writes will be recorded in WRITE_TIMEDOUT. Test Plan: export ROCKSDB_TESTS=WriteTimeoutAndDelayTest make db_test ./db_test Reviewers: igor, ljin, haobo, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18837	2014-07-03 15:47:02 -07:00
Igor Canadi	4203431e71	Fix mac os compile error	2014-07-03 23:03:24 +02:00
sdong	2459f7ec4e	Support Multiple DB paths (without having an interface to expose to users) Summary: In this patch, we allow RocksDB to support multiple DB paths internally. No user interface is supported yet so this patch is silent to users. Test Plan: make all check Reviewers: igor, haobo, ljin, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18921	2014-07-02 21:14:44 -07:00
Igor Canadi	f146cab261	Centralize compression decision to compaction picker Summary: Before this diff, we're deciding enable_compression in CompactionPicker and then we're deciding final compression type in DBImpl. This is kind of confusing. After the diff, the final compression type will be decided in CompactionPicker. The reason for this is that I want CompactFiles() to specify output compression type, so that people can mix and match compression styles in their compaction algorithms. This diff makes it much easier to do that. Test Plan: make check Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19137	2014-07-02 20:40:57 +02:00
sdong	1d05006740	Re-commit the correct part (WalDir) of the revision: Commit `6634844dba` by sdong Two small fixes in db_test Summary: Two fixes: (1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other (2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it. Test Plan: ./db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: nkg-, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19389	2014-07-01 18:54:50 -07:00
sdong	30b20604db	Revert "Two small fixes in db_test" This reverts commit `6634844dba`.	2014-07-01 17:41:38 -07:00
sdong	9c332aa11a	HashLinkList memtable switches a bucket to a skip list to reduce performance outliers Summary: In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search. Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list. The new data structure is documented in comments in the codes. Test Plan: make all check set threshold_use_skiplist in several tests Reviewers: yhchiang, haobo, ljin Reviewed By: yhchiang, ljin Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19299	2014-07-01 17:14:15 -07:00
sdong	6634844dba	Two small fixes in db_test Summary: Two fixes: (1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other (2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it. Test Plan: ./db_test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: nkg-, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19389	2014-07-01 16:58:03 -07:00
Igor Canadi	f5d4df1c02	Fix compile error	2014-07-01 10:55:03 +02:00
Igor Canadi	a2e0d890ed	No need for files_by_size_ in universal compaction Summary: files_by_size_ is sorted by time in case of universal compaction. However, Version::files_ is also sorted by time. So no need for files_by_size_ Test Plan: 1) make check with the change 2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker Reviewers: dhruba, haobo, yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19125	2014-07-01 08:55:04 +02:00
Feng Zhu	5656367416	use arena to allocate memtable's bloomfilter and hashskiplist's buckets_ Summary: Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena DynamicBloom: pass arena via constructor, allocate space in SetTotalBits HashSkipListRep: allocate space of buckets_ using arena. do not delete it in deconstructor because arena would take care of it. Several test files are changed. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19335	2014-06-30 15:54:31 -07:00
sdong	dd337bc0b2	In logging format, use PRIu64 instead of casting Summary: Code cleaning up, since we are already using __STDC_FORMAT_MACROS in printing uint64_t, change other places. Only logging is changed. Test Plan: make all check Reviewers: ljin Reviewed By: ljin Subscribers: dhruba, yhchiang, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19113	2014-06-27 16:34:15 -07:00
Stanislau Hlebik	a3594867ba	Cache some conditions for DBImpl::MakeRoomForWrite Summary: Task 4580155. Some conditions in DBImpl::MakeRoomForWrite can be cached in ColumnFamilyData, because theirs value can be changed only during compaction, adding new memtable and/or add recalculation of compaction score. These conditions are: cfd->imm()->size() == cfd->options()->max_write_buffer_number - 1 cfd->current()->NumLevelFiles(0) >= cfd->options()->level0_stop_writes_trigger cfd->options()->soft_rate_limit > 0.0 && (score = cfd->current()->MaxCompactionScore()) > cfd->options()->soft_rate_limit cfd->options()->hard_rate_limit > 1.0 && (score = cfd->current()->MaxCompactionScore()) > cfd->options()->hard_rate_limit P.S. As it's my first diff, Siying suggested to add everybody as a reviewers for this diff. Sorry, if I forgot someone or add someone by mistake. Test Plan: make all check Reviewers: haobo, xjin, dhruba, yhchiang, zagfox, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19311	2014-06-26 16:45:27 -07:00
sdong	19de6a7aad	Remove MemTableRep::GetIterator(const Slice& slice) Summary: It seems to me that when ever function MemTableRep::GetIterator(const Slice& slice) is used, we can use MemTableRep::GetDynamicPrefixIterator() instead. Just delete it to simplify the codes. Test Plan: make all check Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19281	2014-06-25 14:09:29 -07:00
Yueh-Hsuan Chiang	8898a0a0d1	Reorder the member variables of FileMetaData to improve cache locality. Summary: Move stats related member variables of FileMetaData to the bottom to improve cache locality of normal DB operations. Test Plan: make Reviewers: haobo, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19287	2014-06-24 19:22:11 -06:00
Yueh-Hsuan Chiang	e813f5b6d9	Allow compaction to reclaim storage more effectively. Summary: This diff allows compaction to reclaim storage more effectively. In the current design, compactions are mainly triggered based on the file sizes. However, since deletion entries do not have values, files which have many deletion entries are less likely to be compacted. As a result, it may take a while for deletion entries to be compacted. This diff addresses this issue by compensating the size of deletion entries during the compaction process: the size of each deletion entry in the compaction process is augmented by 2x average value size. The diff applies to both leveled and universal compactions. Test Plan: develop CompactionDeletionTrigger make db_test ./db_test Reviewers: haobo, igor, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19029	2014-06-24 16:37:06 -06:00
Yueh-Hsuan Chiang	faa8d21922	Improve an assertion in RandomGenerator::Generate() in db_bench. Summary: RandomGenerator::Generate() currently has an assertion len < data_.size(). However, it is actually fine to have len == data_.size(). This diff change the assertion to len <= data_.size(). Test Plan: make db_bench ./db_bench Reviewers: haobo, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19269	2014-06-24 15:29:28 -06:00
Lei Jin	3b0dc76699	db_bench: measure the real latency of write/delete Summary: as title Test Plan: make release Reviewers: haobo, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19227	2014-06-23 13:23:02 -07:00
Lei Jin	a1b5650a75	db_bench: sanity check on compression ratio Summary: as requested by mark Test Plan: make release Reviewers: sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19221	2014-06-23 10:46:16 -07:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Igor Canadi	107e08baa7	Use same sorting for all level 0 files Summary: We decided that one of the long term goals is to unify level and universal compaction. As a small first step, I'm unifying level 0 sorting methods. Previously, we used to sort level 0 files in level compaction by file number and in universal compaction by sequence number. But it turns out that in level compaction, sorting by file number is exactly the same as sorting by sequence number. Test Plan: Ran make check with bunch of asserts to verify the sorting order is exactly the same. Also, make check with this patch Reviewers: haobo, yhchiang, ljin, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19131	2014-06-20 09:12:14 +02:00
Haobo Xu	7a9dd5f214	[RocksDB] Make block based table hash index more adaptive Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is unnecessarily harsh. Without a prefix extractor, we could always fall back to the normal binary index. Test Plan: unit test, also manually verified LOG that fallback did occur. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19191	2014-06-19 16:40:32 -07:00
Yueh-Hsuan Chiang	4f5ccfd179	Fixed a potential write hang Summary: Currently, when something badly happen in the DB::Write() while the write-queue contains more than one element, the current design seems to forget to clean up the queue as well as wake-up all the writers, this potentially makes rocksdb hang on writes. Test Plan: make all check Reviewers: sdong, ljin, igor, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19167	2014-06-19 14:53:03 -07:00
Igor Canadi	bae495740d	Merge pull request #179 from edsrzf/c-api-compaction-filter Support for compaction filters in the C API	2014-06-19 21:22:46 +02:00
Lei Jin	c4e90c79ed	bug fix: iteration over ColumnFamilySet needs to be under mutex Summary: asan_crash_test is failing on segfault Test Plan: running asan_crash_test Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19149	2014-06-19 09:31:14 -07:00
Evan Shaw	5363eb8ad4	Add a test for using compaction filters via the C API	2014-06-19 21:46:58 +12:00
Evan Shaw	d72313a7fa	Add a way to set compaction filter in the C API	2014-06-19 16:31:24 +12:00
Evan Shaw	df2701373d	Support for compaction filters in the C API	2014-06-19 16:31:17 +12:00
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	2014-06-18 20:41:52 -07:00
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing table factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	2014-06-18 07:04:37 +02:00
sdong	8c265c08f1	HashLinkList to log distribution of number of entries across buckets Summary: Add two parameters of hash linked list to log distribution of number of entries across all buckets, and a sample row when there are too many entries in one single bucket. Test Plan: Turn it on in plain_table_db_test and see the logs. Reviewers: haobo, ljin Reviewed By: ljin Subscribers: leveldb, nkg-, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D19095	2014-06-17 17:55:36 -07:00
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	2014-06-17 11:49:22 -07:00
Igor Canadi	4f18bfe376	Merge pull request #176 from bgrainger/mutexrw-unlock Add separate Read/WriteUnlock methods in MutexRW.	2014-06-17 20:38:06 +02:00
Yueh-Hsuan Chiang	e6e259b8ab	Include max_write_buffer_number >= 2 to SanitizeOptions. Summary: Include max_write_buffer_number >= 2 to SanitizeOptions. Test Plan: make all check Reviewers: haobo, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19077	2014-06-16 16:26:46 -07:00
sdong	cadc1adffa	Refactor: group metadata needed to open an SST file to a separate copyable struct Summary: We added multiple fields to FileMetaData recently and are planning to add more. This refactoring separate the minimum information for accessing the file. This object is copyable (FileMetaData is not copyable since the ref counter). I hope this refactoring can enable further improvements: (1) use it to design a more efficient data structure to speed up read queries. (2) in the future, when we add information of storage level, we can easily do the encoding, instead of enlarge this structure, which might expand memory work set for file meta data. The definition is same as current EncodedFileMetaData used in two level iterator, so now the logic in two level iterator is easier to understand. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: ljin Subscribers: leveldb, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D18933	2014-06-16 16:10:52 -07:00
Bradley Grainger	2d02ec6533	Add separate Read/WriteUnlock methods in MutexRW. Some platforms, particularly Windows, do not have a single method that can release both a held reader lock and a held writer lock; instead, a separate method (ReleaseSRWLockShared or ReleaseSRWLockExclusive) must be called in each case. This may also be necessary to back MutexRW with a shared_mutex in C++14; the current language proposal includes both an unlock() and a shared_unlock() method.	2014-06-16 15:41:46 -07:00
sdong	983c93d731	VersionSet::Get(): Bring back the logic of skipping key range check when there are <=3 level 0 files Summary: https://reviews.facebook.net/D17205 removed the logic of skipping file key range check when there are less than 3 level 0 files. This patch brings it back. Other than that, add another small optimization to avoid to check all the levels if most higher levels don't have any file. Test Plan: make all check Reviewers: ljin Reviewed By: ljin Subscribers: yhchiang, igor, haobo, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19035	2014-06-13 15:51:44 -07:00
Lei Jin	c83b085770	prefetch bloom filter data block for L0 files Summary: as title Test Plan: db_bench the initial result is very promising. I will post results of complete runs Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18867	2014-06-12 10:06:18 -07:00
Lei Jin	77db08f27b	fix forward iterator bug Summary: obvious Test Plan: db_test Reviewers: sdong, haobo, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18987	2014-06-10 09:57:26 -07:00
sdong	80f409ea37	Clean PlainTableReader's variables for better data locality Summary: Clean PlainTableReader's data structures: (1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization) (2) remove some variables only used when initialization from the class (3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly (4) make Options a reference. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18891	2014-06-09 13:53:39 -07:00
Igor Canadi	0365eaf12e	remove unnecessary printf	2014-06-06 18:27:44 -07:00
Igor Canadi	a0191c9dfe	Create Missing Column Families Summary: Provide a convenience option to create column families if they are missing from the DB. Task #4460490 Test Plan: added unit test. also, stress test for some time Reviewers: sdong, haobo, dhruba, ljin, yhchiang Reviewed By: yhchiang Subscribers: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D18951	2014-06-06 18:04:56 -07:00
Igor Canadi	99d3eed2fd	Write Fast-path for single column family Summary: We have a perf regression of Write() even with one column family. Make fast path for single column family to avoid the perf regression. See task #4455480 Test Plan: make check Reviewers: sdong, ljin Reviewed By: sdong, ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18963	2014-06-06 17:26:23 -07:00
sdong	b92a19a431	sst_dump: Set dummy prefix extractor for binary search index in block based table Summary: Now sst_dump fails in block based tables if binary search index is used, as it requires a prefix extractor. Add it. Test Plan: Run it against such a file to make sure it fixes the problem. Reviewers: yhchiang, kailiu Reviewed By: kailiu Subscribers: ljin, igor, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D18927	2014-06-05 15:37:23 -07:00
Igor Canadi	5d870717ae	Correctly preallocate files in universal compaction Summary: In universal compaction, MaxFileSizeForLevel is ULLONG_MAX. We've been preallocating files to ULLONG_MAX size all this time :) Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18915	2014-06-05 13:19:35 -07:00
Igor Canadi	fd27001072	Fix compile errors on Mac Summary: https://phabricator.fb.com/P11372644 Test Plan: compiles Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18873	2014-06-03 12:28:58 -07:00
sdong	df9069d23f	In DB::NewIterator(), try to allocate the whole iterator tree in an arena Summary: In this patch, try to allocate the whole iterator tree starting from DBIter from an arena 1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it. 2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator. 3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it. Limitations: (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc (2) Two level iterator itself is allocated in arena, but not iterators inside it. Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: leveldb, dhruba, yhchiang, igor Differential Revision: https://reviews.facebook.net/D18513	2014-06-02 17:44:57 -07:00
Igor Canadi	91ddd587cc	Only signal cond variable if need to Summary: At the end of BackgroundCallCompaction(), we call SignalAll(), even though we don't need to. If compaction hasn't done anything and there's another compaction running, there is no need to signal on the condition variable. Doing so creates a tight feedback loop which results in log files like: wait for memtable flush compaction nothing to do wait for memtable flush compaction nothing to do This change eliminates that Test Plan: make check Also: icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG \| wc -l 7435 icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG \| wc -l 372 First version is before the change, second version is after the change. Reviewers: dhruba, ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18855	2014-06-02 17:23:55 -07:00
Igor Canadi	8cb7ad83c3	Flush stale column families less aggressively Summary: We've seen some production issues where column family is detected as stale, although there is only one column family in the system. This is a quick fix that: 1) doesn't flush stale column families if there's only one of them 2) Use 4 as a coefficient instead of 2 for determining when a column family is stale. This will make flushing less aggressive, while still keeping a nice dynamic flushing of very stale CFs. Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18861	2014-06-02 15:33:54 -07:00
Lei Jin	388d2054c7	forward iterator Summary: Forward iterator puts everything together in a flat structure instead of a hierarchy of nested iterators. This should simplify the code and provide better performance. It also enables more optimization since all information is accessible in one place. Initial evaluation shows about 6% improvement Test Plan: db_test and db_bench Reviewers: dhruba, igor, tnovak, sdong, haobo Reviewed By: haobo Subscribers: sdong, leveldb Differential Revision: https://reviews.facebook.net/D18795	2014-05-30 14:31:55 -07:00
Lei Jin	f29c62fc6f	add an iterator refresh option for SeekRandom Summary: One more option to allow iterator refreshing when using normal iterator Test Plan: ran db_bench Reviewers: haobo, sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18849	2014-05-30 14:09:22 -07:00
Igor Canadi	6de6a06631	FIFO compaction style Summary: Introducing new compaction style -- FIFO. FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values. FIFO compaction style is suited for storing high-frequency event logs. Test Plan: Added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba Subscribers: alberts, leveldb Differential Revision: https://reviews.facebook.net/D18765	2014-05-21 11:43:35 -07:00
Igor Canadi	b2cf95fe38	Call EnableFileDeletions with false as argument	2014-05-20 14:28:51 -07:00
sdong	4e0602f941	Remove maximum key_size check in db_bench Summary: Key size limit doesn't seem to be applicable anymore. Remove it. Test Plan: run a couple of tests in db_bench Reviewers: haobo, igor, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D18723	2014-05-15 11:06:37 -07:00
Igor Canadi	f4574449e9	Clean up compaction logging Summary: Cleaned up compaction logging a little bit. Now file sizes are easier to read. Also, removed the trailing space. Test Plan: verified that i'm happy with logging output: files_size[#33(seq=101,sz=98KB,0) #31(seq=81,sz=159KB,0) #26(seq=0,sz=637KB,0)] Reviewers: sdong, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18549	2014-05-14 12:13:50 -07:00
sdong	3e4a9ec241	Arena to inline 2KB of data in it. Summary: In order to use arena for a use case where the total allocation size might be small (LogBuffer is already such a case), inline 2KB of data in it, so that it can be mostly in stack or inline in another class. If always inlining 2KB is a concern, I could make it a template to determine what to inline. However, dependents would need to change. Doesn't go with it for now Test Plan: make all check. Reviewers: haobo, igor, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18609	2014-05-14 11:49:01 -07:00
Igor Canadi	26f5dd9a5a	TablePropertiesCollectorFactory Summary: This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options. Here's description of task #4296714: I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB. For example, this table has 3155 entries and 1391828 deleted keys. The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for. Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API. In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe. Test Plan: Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one). Also, all other tests Reviewers: sdong, dhruba, haobo, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D18579	2014-05-13 12:30:55 -07:00
Yueh-Hsuan Chiang	1c7799d8aa	Fixed a file-not-found issue when a log file is moved to archive. Summary: Fixed a file-not-found issue when a log file is moved to archive by doing a missing retry. Test Plan: make db_test export ROCKSDB_TEST=TransactionLogIteratorRace ./db_test Reviewers: sdong, haobo Reviewed By: sdong CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D18669	2014-05-12 17:50:21 -07:00
sdong	acd17fd002	Remove unused variable in DBIter Summary: as title Test Plan: Still compile Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18603	2014-05-09 13:20:35 -07:00
Igor Canadi	a1068c91a1	Make RocksDB work with newer gflags Summary: Newer gflags switched from `google` namespace to `gflags` namespace. See: https://github.com/facebook/rocksdb/issues/139 and https://github.com/facebook/rocksdb/issues/102 Unfortunately, they don't define any macro with their namespace, so we need to actually try to compile gflags with two different namespaces to figure out which one is the correct one. Test Plan: works in fbcode environment. I'll also try in Ubuntu with newer gflags Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18537	2014-05-08 17:25:13 -07:00
Igor Canadi	8e37a29bfb	Compaction with zero outputs Summary: We had a hypothesis in https://reviews.facebook.net/D18507 that empty-string internal keys might have been caused by compaction filter deleting all the entries. I added a unit test for that case. Unfortunately, everything works as expected. Test Plan: this is a test Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18519	2014-05-08 13:48:39 -07:00
Igor Canadi	768d424dd9	[fix] SIGSEGV when VersionEdit in MANIFEST is corrupted Summary: This was reported by our customers in task #4295529. Cause: * MANIFEST file contains a VersionEdit, which contains file entries whose 'smallest' and 'largest' internal keys are empty. String with zero characters. Root cause of corruption was not investigated. We should report corruption when this happens. However, we currently SIGSEGV. Here's what happens: * VersionEdit encodes zero-strings happily and stores them in smallest and largest InternalKeys. InternalKey::Encode() does assert when `rep_.empty()`, but we don't assert in production environments. Also, we should never assert as a result of DB corruption. * As part of our ConsistencyCheck, we call GetLiveFilesMetaData() * GetLiveFilesMetadata() calls `file->largest.user_key().ToString()` * user_key() function does: 1. assert(size > 8) (ooops, no assert), 2. returns `Slice(internal_key.data(), internal_key.size() - 8)` * since `internal_key.size()` is unsigned int, this call translates to `Slice(whatever, 1298471928561892576182756)`. Bazinga. Fix: * VersionEdit checks if InternalKey is valid in `VersionEdit::GetInternalKey()`. If it's invalid, returns corruption. Lessons learned: * Always keep in mind that even if you `assert()`, production code will continue execution even if assert fails. * Never `assert` based on DB corruption. Assert only if the code should guarantee that assert can't fail. Test Plan: dumped offending manifest. Before: assert. Now: corruption Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18507	2014-05-07 16:52:12 -07:00
sdong	9efbd85ac9	fsync directory after creating current file in NewDB() Summary: One of our users reported current file corruption. The machine was rebooted during the time. This is the only thing I can think of which could cause current file corruption. Just add this paranoid check. Test Plan: make all check Reviewers: haobo, igor Reviewed By: haobo CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18495	2014-05-06 17:51:33 -07:00
sdong	3a171dcb51	Pass logger to memtable rep and TLB page allocation error logged to info logs Summary: TLB page allocation errors are now logged to info logs, instead of stderr. In order to do that, mem table rep's factory functions take a info logger now. Test Plan: make all check Reviewers: haobo, igor, yhchiang Reviewed By: yhchiang CC: leveldb, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18471	2014-05-05 16:43:37 -07:00
Igor Canadi	15c3991933	Add comment about ValueType	2014-05-05 12:57:47 -07:00
Igor Canadi	d2569fea47	log_and_apply_bench on a new benchmark framework Summary: db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works. I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions. Test Plan: no Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18261	2014-05-05 11:11:48 -07:00
sdong	4a7c747064	Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"" And make the default 0 for hash linked list memtable This reverts commit `d69dc64be7`.	2014-05-04 13:56:29 -07:00
Igor Canadi	d69dc64be7	Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB" This reverts commit `7dafa3a1d7`.	2014-05-04 08:37:09 -07:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
Igor Canadi	096f5be0ed	Put column family information in LiveFileMetaData Summary: As summary Test Plan: compiles :) Reviewers: dhruba, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18405	2014-04-30 16:24:52 -04:00
Igor Canadi	16f1aa7b2d	Fix signed/unsigned compare	2014-04-30 14:38:01 -04:00
Igor Canadi	df70047669	Flush stale column families Summary: Added a new option `max_total_wal_size`. Once the total WAL size goes over that, we make an attempt to flush all column families that still have data in the earliest WAL file. By default, I calculate `max_total_wal_size` dynamically, that should be good-enough for non-advanced customers. Test Plan: Added a test Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18345	2014-04-30 14:33:40 -04:00
sdong	7dafa3a1d7	Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes and hash linked list hash table. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, dhruba, leveldb, igor, yhchiang Differential Revision: https://reviews.facebook.net/D18357	2014-04-30 11:02:26 -07:00
Igor Canadi	66f88c43a5	Some fixes as preparation for release	2014-04-30 09:03:24 -07:00
Igor Canadi	d6d67c0efe	More s/us fixes	2014-04-30 07:04:36 -07:00
Yueh-Hsuan Chiang	9d9d2965cb	Add a new mem-table representation based on cuckoo hash. Summary: = Major Changes = * Add a new mem-table representation, HashCuckooRep, which is based on cuckoo hash. Cuckoo hash uses multiple hash functions. This allows each key to have multiple possible locations in the mem-table. - Put: When inserting a key, it will try to find whether one of its possible locations is vacant and store the key. If none of its possible locations are available, then it will kick out a victim key and store at that location. The kicked-out victim key will then be stored at a vacant space of its possible locations or kick-out another victim. In this diff, the kick-out path (known as cuckoo-path) is found using BFS, which guarantees to be the shortest. - Get: Simply tries all possible locations of a key --- this guarantees worst-case constant time complexity. - Time complexity: O(1) for Get, and average O(1) for Put if the fullness of the mem-table is below 80%. - Default using two hash functions, the number of hash functions used by the cuckoo-hash may dynamically increase if it fails to find a short-enough kick-out path. - Currently, HashCuckooRep does not support iteration and snapshots, as our current main purpose of this is to optimize point access. = Minor Changes = * Add IsSnapshotSupported() to DB to indicate whether the current DB supports snapshots. If it returns false, then DB::GetSnapshot() will always return nullptr. Test Plan: Run existing tests. Will develop a test specifically for cuckoo hash in the next diff. Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D16155	2014-04-29 17:13:46 -07:00
Igor Canadi	f1c9aa6ebe	More unsigned/signed compare fixes	2014-04-29 13:01:06 -07:00
Igor Canadi	38693d99c4	Fix more signed/unsigned comparisons	2014-04-29 12:40:18 -07:00
Igor Canadi	dd9eb7a7d5	Cache result of ReadFirstRecord() Summary: ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince(). I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage. Test Plan: make check Reviewers: haobo, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18387	2014-04-29 13:27:58 -04:00
Igor Canadi	91ef2eae23	Use new DBWithTTL API in tests	2014-04-28 23:46:24 -04:00
Igor Canadi	72ff275e3c	Fix TransactionLogIterator EOF caching Summary: When TransactionLogIterator comes to EOF, it calls UnmarkEOF and continues reading. However, if glibc cached the EOF status of the file, it will get EOF again, even though the new data might have been written to it. This has been causing errors in Mac OS. Test Plan: test passes, was failing before Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18381	2014-04-28 23:30:27 -04:00
Donovan Hide	4f9fae9bb7	Add rocksdb_open_for_read_only to C API	2014-04-27 20:57:10 +01:00
Igor Canadi	c489499a2b	Fix OSX compile	2014-04-26 17:15:43 -04:00
Lei Jin	ccaca59bee	avoid calling FindFile twice in TwoLevelIterator for PlainTable Summary: this is to reclaim the regression introduced in https://reviews.facebook.net/D17853 Test Plan: make all check Reviewers: igor, haobo, sdong, dhruba, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17985	2014-04-25 12:23:07 -07:00
Lei Jin	d642c60bdc	Check PrefixMayMatch on Seek() Summary: As a follow-up diff for https://reviews.facebook.net/D17805, add optimization to check PrefixMayMatch on Seek() Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17853	2014-04-25 12:22:23 -07:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
Igor Canadi	8ce5492623	Delete superversion and log outside of mutex Summary: As summary. Add two autovectors that get filled up in MakeRoomForWrite and they get deleted outside of mutex Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18249	2014-04-25 14:58:02 -04:00
Igor Canadi	ad3cd39ccd	Column family logging Summary: Now that we have column families involved, we need to add extra context to every log message. They now start with "[column family name] log message" Also added some logging that I think would be useful, like level summary after every flush (I often needed that when going through the logs). Test Plan: make check + ran db_bench to confirm I'm happy with log output Reviewers: dhruba, haobo, ljin, yhchiang, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18303	2014-04-25 09:51:16 -04:00
Igor Canadi	4cd9f58c04	Fix corruption test	2014-04-24 14:56:41 -04:00
Igor Canadi	478990c81b	Make CompactionInputErrorParanoid less flakey Summary: I'm getting lots of e-mails with CompactionInputErrorParanoid failing. Most recent example early morning today was: http://ci-builds.fb.com/job/rocksdb_valgrind/562/consoleFull I'm putting a stop to these e-mails. I investigated why the test is flakey and it turns out it's because of non-determinism of compaction scheduling. If there is a compaction after the last flush, CorruptFile will corrupt the compacted file instead of file at level 0 (as it assumes). That makes `Check(9, 9)` fail big time. I also saw some errors with table file getting output to >= 1 levels instead of 0. Also fixed that. Test Plan: Ran corruption_test 100 times without a failure. Previously it usually failed at 10th occurrence. Reviewers: dhruba, haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18285	2014-04-24 11:13:28 -07:00
sdong	4de5b84ee0	Fix a bug in IterKey Summary: IterKey set buffer_size_ to a wrong initial value, causing it to always allocate values from heap instead of stack if the key size is smaller. Fix it. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D18279	2014-04-23 19:45:22 -07:00
Igor Canadi	f9f8965e96	Print out stack trace in mac, too Summary: While debugging Mac-only issue with ThreadLocalPtr, this was very useful. Let's print out stack trace in MAC OS, too. Test Plan: Verified that somewhat useful stack trace was generated on mac. Will run PrintStack() on linux, too. Reviewers: ljin, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18189	2014-04-23 09:11:35 -04:00
sdong	a570740727	Expose number of entries in mem tables to users Summary: In this patch, two new DB properties are defined: rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, from where number of entries in mem tables can be exposed to users Test Plan: Cover the codes in db_test make all check Reviewers: haobo, ljin, igor Reviewed By: igor CC: nkg-, igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18207	2014-04-22 22:13:21 -07:00
Lei Jin	5f1daf7ae3	get rid of shared_ptr in memtable.cc Summary: Get rid of the devil. Probably won't impact anything on the perf side. Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D18153	2014-04-22 21:14:25 -07:00
sdong	86a0133d05	PlainTableReader to expose index size to users Summary: This is a temp solution to expose index sizes to users from PlainTableReader before we persist them to files. In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18195	2014-04-22 19:29:05 -07:00
Igor Canadi	1068d2fa60	Revert "Better port::Mutex::AssertHeld() and AssertNotHeld()" This reverts commit `ddafceb6c2`.	2014-04-22 18:38:10 -07:00
Igor Canadi	ddafceb6c2	Better port::Mutex::AssertHeld() and AssertNotHeld() Summary: Using ThreadLocalPtr as a flag to determine if a mutex is locked or not enables us to implement AssertNotHeld(). It also makes AssertHeld() actually correct. I had to remove port::Mutex as a dependency for util/thread_local.h, but that's fine since we can just use std::mutex :) Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong, yhchiang Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18171	2014-04-22 17:26:21 -07:00
Igor Canadi	3992aec8fa	Support for column families in TTL DB Summary: This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one. TODO: Implement CreateColumnFamily() in TTL world. Test Plan: Added a very simple sanity test. Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb, alberts Differential Revision: https://reviews.facebook.net/D17859	2014-04-22 11:27:33 -07:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
Igor Canadi	f813279da5	Remove TransactionLogIteratorRace when -DNDEBUG	2014-04-21 11:08:30 -07:00
Lei Jin	0f2d768191	hints for narrowing down FindFile range and avoiding checking irrelevant L0 files Summary: The file tree structure in Version is prebuilt and the range of each file is known. On the Get() code path, we do binary search in FindFile() by comparing target key with each file's largest key and also check the range for each L0 file. With some pre-calculated knowledge, each key comparison that has been done can serve as a hint to narrow down further searches: (1) If a key falls within a L0 file's range, we can safely skip the next file if its range does not overlap with the current one. (2) If a key falls within a file's range in level L0 - Ln-1, we should only need to binary search in the next level for files that overlap with the current one. (1) will be able to skip some files depending on the key distribution. (2) can greatly reduce the range of binary search, especially for bottom levels, given that one file most likely only overlaps with N files from the level below (where N is max_bytes_for_level_multiplier). So on level L, we will only look at ~N files instead of N^L files. Some initial results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s = 9.6M/s), it gives us ~13% improvement. Test Plan: make all check Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17205	2014-04-21 09:10:12 -07:00
sdong	651792251a	Fix bugs introduced by D17961 Summary: D17961 has two bugs: (1) two level iterator fails to populate FileMetaData.table_reader, causing performance regression. (2) table cache handle the !status.ok() case in the wrong place, causing seg fault which shouldn't happen. Test Plan: make all check Reviewers: ljin, igor, haobo Reviewed By: ljin CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17991	2014-04-17 17:25:28 -07:00
sdong	fa430bfd04	Minimize accessing multiple objects in Version::Get() Summary: One of our profilings shows that Version::Get() sometimes is slow when getting pointer of user comparators or other global objects. In this patch: (1) we keep pointers of immutable objects in Version to avoid accessing them through option objects or cfd objects (2) table_reader is directly cached in FileMetaData so that table cache doesn't have to go through handle first to fetch it (3) If level 0 has less than 3 files, skip the filtering logic based on SST tables' key range. Smallest and largest key are stored in separate memory locations, which has potential cache misses Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17739	2014-04-17 14:14:00 -07:00
Igor Canadi	161d9e586b	Don't overflow size_t in mac	2014-04-16 15:15:22 -07:00
Igor Canadi	5c12f27791	Remove tautological assert	2014-04-16 09:09:28 -07:00
Igor Canadi	faf7691358	Close DB at the end of DontRollEmptyLogs test	2014-04-15 17:20:56 -07:00
Igor Canadi	1803ed2ccb	Fix Mac OS compile	2014-04-15 16:31:49 -07:00
Igor Canadi	7d838856cf	Fix compile issues when doing make release	2014-04-15 16:00:10 -07:00
sdong	0f40fe4bc7	When creating a new DB, fail it when wal_dir contains existing log files Summary: Current behavior of creating new DB is, if there is existing log files, we will go ahead and replay them on top of empty DB. This is a behavior that no user would expect. With this patch, we will fail the creation if a user creates a DB with existing log files. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: haobo CC: nkg-, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17817	2014-04-15 14:01:57 -07:00
Igor Canadi	c166615850	Fix compile issues introduced by RocksDBLite	2014-04-15 13:51:07 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Igor Canadi	dbe0f327ca	Set log_empty to false even when options.sync is off [fix tests]	2014-04-15 10:28:34 -07:00
Igor Canadi	e6acb874cd	Don't roll empty logs Summary: With multiple column families, especially when manual Flush is executed, we might roll the log file, although the current log file is empty (no data has been written to the log). After the diff, we won't create new log file if current is empty. Next, I will write an algorithm that will flush column families that reference old log files (i.e., that weren't flushed in a while) Test Plan: Added a unit test. Confirmed that unit test fails in master Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17631	2014-04-15 09:57:25 -07:00
sdong	c87ed0942c	Fix db_bench's multireadrandom Summary: multireadrandom is broken. Fix it Test Plan: run it and see segfault has gone. Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17781	2014-04-14 15:43:34 -07:00
Yueh-Hsuan Chiang	118f88d25d	Fix compile error in tailing_iter.h Summary: Fix the following compile error ./db/tailing_iter.h:17:1: error: class 'SuperVersion' was previously declared as a struct [-Werror,-Wmismatched-tags] class SuperVersion; ^ ./db/column_family.h:77:8: note: previous use is here struct SuperVersion { ^ ./db/tailing_iter.h:17:1: note: did you mean struct here? class SuperVersion; ^~~~~ struct 1 error generated. Test Plan: make Reviewers: ljin, igor, haobo, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17799	2014-04-14 14:05:15 -07:00
Yueh-Hsuan Chiang	327102efa5	Fix merge_test failure due to incorrect assert behavior in the release mode.	2014-04-14 12:06:49 -07:00
Lei Jin	82b37a18bd	thread local for tailing iterator Summary: replace the super version acquisition in tailing iterator with thread local Test Plan: will post results Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17757	2014-04-14 10:48:01 -07:00
Lei Jin	539dd207df	using thread local SuperVersion for NewIterator Summary: Similar to GetImpl(), use SuperVersion from thread local instead of acquiring mutex. I don't expect this change will make a dent on NewIterator() performance because the bottleneck seems to be in the rest of the API Test Plan: make asan_check will post perf numbers Reviewers: haobo, igor, sdong, dhruba, yhchiang Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17643	2014-04-14 09:34:59 -07:00
sdong	d5e087b6df	db_bench: add a mode to operate multiple DBs Summary: This patch introduces a new parameter num_multi_db in db_bench. When this parameter is larger than 1, multiple DBs will be created. In all benchmarks, any operation applies to a random DB among them. This is to benchmark the performance of similar applications. Test Plan: run db_bench on both of num_multi_db=0 and more. Reviewers: haobo, ljin, igor Reviewed By: igor CC: igor, yhchiang, dhruba, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17769	2014-04-11 16:59:08 -07:00
Lei Jin	eba3fc644a	make corruption_test:CompactionInputErrorParanoid deterministic Summary: it writes ~10M data, default L0 compaction trigger is 4, plus 2 writer buffer, so that can accommodate ~6M data before compaction happens for sure. I guess encoding is doing a good job to shrink the data so that sometime, compaction does not get triggered. I get test failure quite often. Test Plan: ran it multiple times and all got pass Reviewers: igor, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17775	2014-04-11 12:48:38 -07:00
Igor Canadi	de41357a18	Don't dump rocksdb version on IOS	2014-04-11 10:19:58 -07:00
Lei Jin	0af36d6aa6	SeekRandomWhileWriting Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17751	2014-04-11 09:47:20 -07:00
Kai Liu	1405232b6d	Temporarily disable a test case in db_test Summary: Root cause is still under investigation. Just Disable the troubling use case for now.	2014-04-10 17:17:39 -07:00
Igor Canadi	ddef6841b3	Renamed InfoLogLevel::DEBUG to InfoLogLevel::DEBUG_LEVEL Summary: XCode for some reason injects `#define DEBUG 1` into our code, which makes compile fail because we use `DEBUG` keyword for other stuff. This diff fixes the issue by renaming `DEBUG` to `DEBUG_LEVEL`. Test Plan: compiles Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17709	2014-04-10 15:27:42 -07:00
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	2014-04-10 14:19:43 -07:00
Lei Jin	7a92537fc4	db_bench: add IteratorCreationWhileWriting mode and allow prefix_seek Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17655	2014-04-10 10:15:59 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
sdong	df2a8b6a1a	Polish IterKey and use it in DBImpl::ProcessKeyValueCompaction() Summary: 1. Polish IterKey a little bit. 2. Turn to use it in local parameter of current_user_key in DBImpl::ProcessKeyValueCompaction(). Our profile showing that DBImpl::ProcessKeyValueCompaction() has about 14% costs in std::string (the base including reading and writing data but excluding compaction filtering), which is higher than it should be. There are two std::string used in DBImpl::ProcessKeyValueCompaction(), compaction_filter_value and current_user_key and it's hard to distinguish the two. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17613	2014-04-09 20:50:58 -07:00
Igor Canadi	dc55903293	Improved CompressedCache Summary: This is testing behavior that was reported in https://github.com/facebook/rocksdb/issues/111 No issue was found, but it is still good to commit this and make CompressedCache more robust. Test Plan: this is a plan Reviewers: ljin, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17625	2014-04-09 11:43:14 -07:00
Lei Jin	4824014e3b	speed up db_bench filluniquerandom mode Summary: filluniquerandom is painfully slow due to the naive bitmap check to find out if a key has been seen before. Majority of time is spent on searching the last few keys. Split a giant BitSet to smaller ones so that we can quickly check if a BitSet is full and thus can skip quickly. It used to take over one hour to filluniquerandom for 100M keys, now it takes about 10 mins. Test Plan: unit test also verified correctness in db_bench and make sure all keys are generated Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17607	2014-04-09 11:25:21 -07:00
Igor Canadi	2014915d32	Fix ASAN issue	2014-04-09 10:38:05 -07:00
Igor Canadi	b947fdc89d	Column family support for DB::OpenForReadOnly() Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though) Test Plan: added a unit test in column_family_test Reviewers: haobo, sdong, ljin, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17565	2014-04-09 09:56:17 -07:00
Igor Canadi	731e55c01c	Fix GetProperty() test Summary: GetProperty test is flaky. Before this diff: P8635927 After: P8635945 We need to make sure the thread is done before we destruct sleeping tasks. Otherwise, bad things happen. Test Plan: See summary Reviewers: ljin, sdong, haobo, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17595	2014-04-08 14:57:00 -07:00
Igor Canadi	34455deb06	Fix Mac OS compile issues	2014-04-08 14:05:53 -07:00
Igor Canadi	5b345b76cb	Remove env_ from MergingIterator Summary: env_ is not used. Compiling for iOS complains. Test Plan: compiles now Reviewers: ljin, haobo, sdong, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17589	2014-04-08 13:40:42 -07:00
Lei Jin	0c1126d4cf	db_bench cleanup Summary: clean up the db_bench a little bit. also avoid allocating memory for key in the loop Test Plan: I verified a run with filluniquerandom & readrandom. Iterator seek will be used lot to measure performance. Will fix whatever comes up Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17559	2014-04-08 11:21:09 -07:00
Igor Canadi	beeee9dccc	Small speedup of CompactionFilterV2 Summary: ToString() is expensive. Profiling shows that most compaction threads are stuck in jemalloc, allocating a new string. This will help out a little. Test Plan: make check Reviewers: haobo, danguo Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17583	2014-04-08 11:06:39 -07:00
Lei Jin	92c1eb0291	macros for perf_context Summary: This will allow us to disable them completely for iOS or for better performance Test Plan: will run make all check Reviewers: igor, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17511	2014-04-08 10:58:07 -07:00
sdong	5e2db3b434	PlainTableIterator not to store copied key in std::string Summary: Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h. This patch improves iterator performance significantly. Running this benchmark: ./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond The average latency is improved to about 750 nanoseconds from 1100 nanoseconds. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17547	2014-04-07 19:06:09 -07:00
Igor Canadi	664559fe2d	Small final fixes before merge	2014-04-07 15:38:53 -07:00
Igor Canadi	d1e2bce42d	CallFlushDuringCompaction	2014-04-07 15:03:15 -07:00
Igor Canadi	b42ceb9598	Simplify cleanup of dead (refcount == 0) column families	2014-04-07 14:31:02 -07:00
Igor Canadi	e48348d196	Make flush part of compaction process This will enable user to use only 1 background thread.	2014-04-07 13:53:08 -07:00
Igor Canadi	2a0917b28e	Merge branch 'master' into columnfamilies	2014-04-07 13:04:25 -07:00
Igor Canadi	751e4b1a35	Fix wal_dir sanitizing	2014-04-07 11:36:03 -07:00
Igor Canadi	3d2fe844ab	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/memtable_list.cc db/version_set.cc	2014-04-07 11:31:11 -07:00
Igor Canadi	7efdd9ef4d	Options::wal_dir shouldn't end in '/' Summary: If a client specifies wal_dir with trailing '/', we will fail in deleting obsolete log files. See task #4083746 Test Plan: make check Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17535	2014-04-07 10:25:38 -07:00
sdong	ea0198fe9a	Create log::Writer out of DB Mutex Summary: Our measurement shows that sometimes new log::Writer's constructor can take hundreds of milliseconds. It's unclear why, but simply move it out of DB mutex. Test Plan: make all check Reviewers: haobo, ljin, igor Reviewed By: haobo CC: nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17487	2014-04-04 15:46:28 -07:00
Lei Jin	c90d446ee7	make hash_link_list Node's key space consecutively followed at the end Summary: per sdong's request, this will help processor prefetch on n->key case. Test Plan: make all check Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17415	2014-04-04 15:37:28 -07:00
sdong	99c756f0fe	Flush Buffered Info Logs Before Doing Compaction (one line change) Summary: Flushing log buffer earlier to avoid confusion of time holding the locks. Test Plan: Should be safe as long as several related db test passes Reviewers: haobo, igor, ljin Reviewed By: igor CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17493	2014-04-04 10:58:30 -07:00
Igor Canadi	040657aec9	Fix MacOS errors	2014-04-03 16:04:34 -07:00
Igor Canadi	f76e4027ca	initialize candidate count	2014-04-03 11:45:44 -07:00
sdong	b9767d0e09	Move several more logging inside DB mutex to log buffer Summary: Move some common logging still in DB mutex to log buffer. Test Plan: make all check Reviewers: haobo, igor, ljin, nkg- Reviewed By: nkg- CC: nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17439	2014-04-03 10:47:18 -07:00
Thomas Adam	98422cba77	[C-API] implemented more options	2014-04-03 10:47:37 +02:00
Thomas Adam	3a30b5b0be	[C-API] added "rocksdb_options_set_plain_table_factory" to make it possible to use plain table factory	2014-04-03 10:47:37 +02:00
Haobo Xu	48bc0c6ad3	[RocksDB] Fix a race condition in GetSortedWalFiles Summary: This patch fixed a race condition where a log file is moved to archived dir in the middle of GetSortedWalFiles. Without the fix, the log file would be missed in the result, which leads to transaction log iterator gap. A test utility SyncPoint is added to help reproducing the race condition. Test Plan: TransactionLogIteratorRace; make check Reviewers: dhruba, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17121	2014-04-02 22:12:29 -07:00
Igor Canadi	d1d19f5db3	Fix valgrind error in c_test	2014-04-02 17:24:30 -07:00
sdong	158845ba9a	Move a info logging out of DB Mutex Summary: As we know, logging can be slow, or even hang for some file systems. Move one more logging out of DB mutex. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: igor CC: yhchiang, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17427	2014-04-02 16:48:32 -07:00
sdong	4af1954fd6	Compaction Filter V1 to use old context struct to keep backward compatible Summary: The previous change D15087 changed existing compaction filter, which makes the commonly used class not backward compatible. Revert the older interface. Use a new interface for V2 instead. Test Plan: make all check Reviewers: haobo, yhchiang, igor CC: danguo, dhruba, ljin, igor, leveldb Differential Revision: https://reviews.facebook.net/D17223	2014-04-02 14:57:51 -07:00
sdong	284c365b77	Fix valgrind error caused by FileMetaData as two level iterator's index block handle Summary: It is a regression valgrind bug caused by using FileMetaData as index block handle. One of the fields of FileMetaData is not initialized after being constructed and copied, but I'm not able to find which one. Also, I realized that it's not a good idea to use FileMetaData as in TwoLevelIterator::InitDataBlock(), a copied FileMetaData can be compared with the one in version set byte by byte, but the refs can be changed. Also comparing such a large structure is slightly more expensive. Use a simpler structure instead Test Plan: Run the failing valgrind test (Harness.RandomizedLongDB) make all check Reviewers: igor, haobo, ljin Reviewed By: igor CC: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17409	2014-04-02 14:38:28 -07:00
Igor Canadi	8555ce2dec	Merge branch 'master' into columnfamilies	2014-04-02 10:48:05 -07:00
sdong	e0a87c4cf1	DBIter to use static allocated char array for saved_key_ (if it is not too long) Summary: DBIter now uses a std::string for saved_key. Based on some profiling, it could be more expensive than we thought. Optimize it with the same technique as LookupKey -- if it is short, we copy it to a static allocated char. Otherwise, dynamically allocate memory for it. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: dhruba, igor, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17289	2014-04-01 16:43:11 -07:00
sdong	d50619a559	PlainTableIterator::Seek() shouldn't check bloom filter in total order mode Summary: In total order mode, iterator's Seek() shouldn't check the bloom filter. Also some cleaning up about checking null for shared pointers. I don't know the behavior before it. This bug was reported by @igor. Test Plan: test plain_table_db_test Reviewers: ljin, haobo, igor Reviewed By: igor CC: yhchiang, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D17391	2014-04-01 15:05:16 -07:00
Thomas Adam	38dc5ef45f	[C-API] added the possibility to create a HashSkipList or HashLinkedList to support prefix seeks	2014-04-01 12:44:27 +02:00
Igor Canadi	ddbd1ece88	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc db/internal_stats.cc db/internal_stats.h db/version_edit.cc db/version_edit.h db/version_set.cc include/rocksdb/options.h util/options.cc	2014-03-31 13:39:24 -07:00
Igor Canadi	577556d5f9	Don't store version number in MANIFEST Summary: Talked to <insert internal project name> folks and they found it really scary that they won't be able to roll back once they upgrade to 2.8. We should fix this. Test Plan: make check Reviewers: haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17343	2014-03-31 11:33:09 -07:00
Igor Canadi	8a139a054c	More valgrind issues! Summary: Fix some more CompactionFilterV2 valgrind issues. Maybe it would make sense for CompactionFilterV2 to delete its prefix_extractor? Test Plan: ran CompactionFilterV2* tests with valgrind. issues before patch -> no issues after Reviewers: haobo, sdong, ljin, dhruba Reviewed By: dhruba CC: leveldb, danguo Differential Revision: https://reviews.facebook.net/D17337	2014-03-29 10:34:47 -07:00
sdong	43a593a6d9	Change default value of some Options Summary: Since we are optimizing for server workloads, some default values are not optimized any more. We change some of those values that I feel it's less prone to regression bugs. Test Plan: make all check Reviewers: dhruba, haobo, ljin, igor, yhchiang Reviewed By: igor CC: leveldb, MarkCallaghan Differential Revision: https://reviews.facebook.net/D16995	2014-03-28 17:09:28 -07:00
sdong	2d3468c293	MemTableIterator not to reference Memtable Summary: One of the perf runs shows that 10%-25% of the CPU costs of MemTableIterator.Seek(), when doing dynamic prefix seek, are spent on checking whether we need to do bloom filter check or finding out the prefix extractor. Seems that more level of pointer checking makes CPU cache miss more likely. This patch makes things slightly simpler by copying pointer of bloom of prefix extractor into the iterator. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin CC: igor, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17247	2014-03-28 16:46:25 -07:00
Lei Jin	0d755fff14	cache friendly blocked bloomfilter Summary: By constraining the probes within cache line(s), we can improve the cache miss rate thus performance. This probably only makes sense for in-memory workload so defaults the option to off. Numbers and comparison can be found in wiki: https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study Test Plan: benchmarked this change substantially. Will run make all check as well Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17133	2014-03-28 09:21:20 -07:00
Yueh-Hsuan Chiang	10cebec79e	Fix the bug in MergeUtil which causes mixing values of different keys. Summary: Fix the bug in MergeUtil which causes mixing values of different keys. Test Plan: stringappend_test make all check Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17235	2014-03-27 16:15:25 -07:00
Haobo Xu	a92194e5b2	[RocksDB] Add db property "rocksdb.cur-size-active-mem-table" Summary: as title Test Plan: db_test Reviewers: sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17217	2014-03-27 15:14:04 -07:00
Igor Canadi	1c9f8f0884	Fix valgrind issues Summary: NewFixedPrefixTransform is leaked in default options. Broken by `b47812fba6` Also included in the diff some code cleanup Test Plan: valgrind env_test also make check Reviewers: haobo, danguo, yhchiang Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17211	2014-03-27 08:22:59 -07:00
sdong	d556200264	Some small cleaning up to make some compiling environment happy Summary: Compiler complains some errors when building using our internal build settings. Fix them. Test Plan: rebuild Reviewers: haobo, dhruba, igor, yhchiang, ljin Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17199	2014-03-26 18:11:41 -07:00
Igor Canadi	6a08bc042a	Fix no return warning in FileComparator	2014-03-26 14:46:07 -07:00
Igor Canadi	1e9621d4e5	Sort files correctly in Builder::SaveTo Summary: Previously, we used to sort all files by BySmallestFirst comparator and then re-sort level0 files in the Finalize() (recently moved to end of SaveTo). In this diff, I chose the correct comparator at the beginning and sort the files correctly in Builder::SaveTo. I also added a verification that all files are sorted correctly in CheckConsistency() NOTE: This diff depends on D17037 Test Plan: make check. Will also run db_stress Reviewers: dhruba, haobo, sdong, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17049	2014-03-26 13:30:14 -07:00
Igor Canadi	954679bb0f	AssertHeld() should do things Summary: AssertHeld() was a no-op before. Now it does things. Also, this change caught a bad bug in SuperVersion::Init(). The method is calling db->mutex.AssertHeld(), but db variable is not initialized yet! I also fixed that issue. Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17193	2014-03-26 11:24:52 -07:00
Igor Canadi	ad9a39c9b4	[RocksDB] Preallocate new MANIFEST files Summary: We don't preallocate MANIFEST file, even though we have an option for that. This diff preallocates manifest file every time we create it Test Plan: make check Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17163	2014-03-26 09:37:53 -07:00
sdong	6b2e7a2a01	When Options.max_num_files=-1, non level0 files also by pass table cache Summary: This is the part that was not finished when doing the Options.max_num_files=-1 feature. For iterating non level0 SST files (which was done using two level iterator), table cache is not bypassed. With this patch, the leftover feature is done. Test Plan: make all check; change Options.max_num_files=-1 in one of the tests to cover the codes. Reviewers: haobo, igor, dhruba, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17001	2014-03-25 18:40:52 -07:00
Igor Canadi	e86d7dffd7	Merge branch 'master' into columnfamilies	2014-03-25 15:24:02 -07:00
Yueh-Hsuan Chiang	b9ce156e38	Add assert to MergeOperator::PartialMergeMulti to check # of operands. Summary: Add assert(operands_list.size() >= 2) in MergeOperator::PartialMergeMulti to ensure it's only be called when we have at least two merge operands. Test Plan: run merge_test and stringappend_test. Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17169	2014-03-25 13:39:17 -07:00
Danny Guo	d9ca83df28	[rocksdb] make init prefix more robust Summary: Currently if client uses kNULLString as the prefix, it will confuse compaction filter v2. This diff added a bool to indicate if the prefix has been initialized. I also added a unit test to cover this case and make sure the new code path is hit. Test Plan: db_test Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17151	2014-03-25 11:59:40 -07:00
Yueh-Hsuan Chiang	34f9da1cef	Fix the failure of stringappend_test caused by PartialMergeMulti. Summary: Fix a bug that PartialMergeMulti will try to merge the first operand with an empty slice. Test Plan: run stringappend_test and merge_test. Reviewers: haobo, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17157	2014-03-25 11:50:09 -07:00
Igor Canadi	e8168382c4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc include/rocksdb/options.h util/options.cc	2014-03-25 11:09:40 -07:00
Danny Guo	b47812fba6	[rocksdb] new CompactionFilterV2 API Summary: This diff adds a new CompactionFilterV2 API that roll up the decisions of kv pairs during compactions. These kv pairs must share the same key prefix. They are buffered inside the db. typedef std::vector<Slice> SliceVector; virtual std::vector<bool> Filter(int level, const SliceVector& keys, const SliceVector& existing_values, std::vector<std::string>* new_values, std::vector<bool>* values_changed ) const = 0; Application can override the Filter() function to operate on the buffered kv pairs. More details in the inline documentation. Test Plan: make check. Added unit tests to make sure Keep, Delete, Change all works. Reviewers: haobo CCs: leveldb Differential Revision: https://reviews.facebook.net/D15087	2014-03-24 20:47:53 -07:00
Yueh-Hsuan Chiang	cda4006e87	Enhance partial merge to support multiple arguments Summary: * PartialMerge api now takes a list of operands instead of two operands. * Add min_partial_merge_operands to Options, indicating the minimum number of operands to trigger partial merge. * This diff is based on Schalk's previous diff (D14601), but it also includes necessary changes such as updating the pure C api for partial merge. Test Plan: * make check all * develop tests for cases where partial merge takes more than two operands. TODOs (from Schalk): * Add test with min_partial_merge_operands > 2. * Perform benchmarks to measure the performance improvements (can probably use results of task #2837810.) * Add description of problem to doc/index.html. * Change wiki pages to reflect the interface changes. Reviewers: haobo, igor, vamsi Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16815	2014-03-24 17:57:13 -07:00
Igor Canadi	ac328a86b9	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc	2014-03-20 14:41:37 -07:00
Igor Canadi	c21ce14fa5	Fix double-free in corruption_test	2014-03-20 14:37:30 -07:00
Igor Canadi	e67241f0b9	Sanity check on Open Summary: Every time a client opens a DB, we do a sanity check that: * checks the existence of all the necessary files * verifies that file sizes are correct Some of the code was stolen from https://reviews.facebook.net/D16935 Test Plan: added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17097	2014-03-20 14:18:29 -07:00
Yiting Li	7981a43274	Consistency Check Function Summary: Added a function/command to check the consistency of live files' meta data Test Plan: Manual test (size mismatch, file not exist). Command test script. Reviewers: haobo Reviewed By: haobo CC: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D16935	2014-03-20 13:42:45 -07:00
Igor Canadi	8ea3cb621e	If paranoid_checks -- Mark DB read-only on any IOError Summary: Whenever we get an IOError from GetImpl() or NewIterator(), we should immediately mark the DB read-only. The same check already exists in Write() and Compaction(). This should help with clients that are somehow missing a file. Test Plan: make check Reviewers: dhruba, haobo, sdong, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17061	2014-03-20 13:10:02 -07:00
sdong	f681030c80	Fix DBTest.UniversalCompactionTrigger failure caused by D17067 Summary: D17067 breaks DBTest.UniversalCompactionTrigger because of wrong location of the checking. Fix it. Test Plan: Run the test and make sure it passes. Reviewers: igor, haobo Reviewed By: igor CC: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17079	2014-03-20 11:10:11 -07:00
sdong	752ec46cd5	Add a unit test to verify compaction filter context Summary: Add unit tests to make sure CompactionFilterContext::is_manual_compaction_ and CompactionFilterContext::is_full_compaction_ are set correctly. Test Plan: run the new tests. Reviewers: haobo, igor, dhruba, yhchiang, ljin Reviewed By: haobo CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17067	2014-03-19 18:10:48 -07:00
Igor Canadi	e20fa3f8a4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/internal_stats.cc db/internal_stats.h db/version_set.cc	2014-03-19 17:22:20 -07:00
Igor Canadi	fcd5c5e828	ComputeCompactionScore in CompactionPicker Summary: As it turns out, we need the call to ComputeCompactionScore (previously: Finalize) in CompactionPicker. The issue caused a deadlock in db_stress: http://ci-builds.fb.com/job/rocksdb_crashtest/290/console The last two lines before a deadlock were: 2014/03/18-22:43:41.481029 7facafbee700 (Original Log Time 2014/03/18-22:43:41.480989) Compaction nothing to do 2014/03/18-22:43:41.481041 7faccf7fc700 wait for fewer level0 files... "Compaction nothing to do" and other thread waiting for fewer level0 files. Hm hm. I moved the pre-sorting to SaveTo, which should fix both the original and the new issue. Test Plan: make check for now, will run db_stress in jenkins Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17037	2014-03-19 16:52:26 -07:00
Igor Canadi	e493f2f54e	Don't compact with zero input files Summary: We have an issue with internal service trying to run compaction with zero input files: 2014/02/07-02:26:58.386531 7f79117ec700 Compaction start summary: Base version 1420 Base level 3, seek compaction:0, inputs:[ϛ~^Qy^?],[] 2014/02/07-02:26:58.386539 7f79117ec700 Compacted 0@3 + 0@4 files => 0 bytes There are two issues: * inputsummary is printing out junk * it's constantly retrying (since I guess madeProgress is true), so it prints out a lot of data in the LOG file (40GB in one day). I read through the Level compaction picker and added some failure condition if input[0] is empty. I think PickCompaction() should not return compaction with zero input files with this change. I'm not confident enough to add an assertion though :) Test Plan: make check Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16005	2014-03-19 16:01:25 -07:00
Igor Canadi	22507aff6c	Fix compile issue in Mac OS Summary: Compile issues are: * Unused variable env_ * Unused fallocate_with_keep_size_ Test Plan: compiles Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17043	2014-03-19 15:40:12 -07:00
Lei Jin	6dc940d4c9	avoid shared_ptr assignment in Version::Get() Summary: This is a 500ns operation while the whole Get() call takes only a few micro! Test Plan: ran db_bench, for a DB with 50M keys, QPS jumps from 5.2M/s to 7.2M/s Reviewers: haobo, igor, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17007	2014-03-19 10:54:32 -07:00
sdong	71e6a34271	Add a DB property to indicate number of background errors encountered Summary: Add a property to calculate number of background errors encountered to help users build their monitoring Test Plan: Add a unit test. make all check Reviewers: haobo, igor, dhruba Reviewed By: igor CC: ljin, nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16959	2014-03-18 14:28:30 -07:00
Igor Canadi	69aa6ecb26	Finalize first version in column family	2014-03-18 14:23:47 -07:00
Igor Canadi	e25819a185	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc	2014-03-18 14:00:20 -07:00
Kai Liu	1ec72b37b1	Several easy-to-add properties related to compaction and flushes Summary: To partly address the request @nkg- raised, add three easy-to-add properties to compactions and flushes. Test Plan: run unit tests and add a new unit test to cover new properties. Reviewers: haobo, dhruba Reviewed By: dhruba CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D13677	2014-03-18 14:00:09 -07:00
Igor Canadi	758fa8c359	Don't Finalize in CompactionPicker Summary: Finalize re-sorts (read: mutates) the files_ in Version* and it is called by CompactionPicker during normal runtime. At the same time, this same Version* lives in the SuperVersion* and is accessed without the mutex in GetImpl() code path. Mutating the files_ in one thread and reading the same files_ in another thread is a bad idea. It caused this issue: http://ci-builds.fb.com/job/rocksdb_crashtest/285/console Long-term, we need to be more careful with method contracts and clearly document what state can be mutated when. Now that we are much faster because we don't lock in GetImpl(), we keep running into data races that were not a problem before when we were slower. db_stress has been very helpful in detecting those. Short-term, I removed Finalize() from CompactionPicker. Note: I believe this is an issue in current 2.7 version running in production. Test Plan: make check Will also run db_stress to see if issue is gone Reviewers: sdong, ljin, dhruba, haobo Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16983	2014-03-18 13:59:59 -07:00
Igor Canadi	3055a15b29	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/version_edit.cc db/version_edit.h db/version_set.cc	2014-03-18 13:24:27 -07:00
Lei Jin	63cef90078	disable the log_number check in Recover() Summary: There is a chance that an old MANIFEST is corrupted in 2.7 but just not noticed. This check would fail them. Change it to log instead of returning a Corruption status. Test Plan: make Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16923	2014-03-18 12:46:29 -07:00
Igor Canadi	bcea9c1296	Finalize version in dumpmanifest	2014-03-18 09:45:52 -07:00
Igor Canadi	f26cb0f093	Optimize fallocation Summary: Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads. This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x. At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported. Test Plan: make check Reviewers: dhruba, sdong, haobo, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16761	2014-03-17 21:52:14 -07:00
Igor Canadi	ae25742af9	Fix race condition in manifest roll Summary: When the manifest is getting rolled the following happens: 1) manifest_file_number_ is assigned to a new manifest number (even though the old one is still current) 2) mutex is unlocked 3) SetCurrentFile() creates temporary file manifest_file_number_.dbtmp 4) SetCurrentFile() renames manifest_file_number_.dbtmp to CURRENT 5) mutex is locked If FindObsoleteFiles happens between (3) and (4) it will: 1) Delete manifest_file_number_.dbtmp (because it's not in pending_outputs_) 2) Delete old manifest (because the manifest_file_number_ already points to a new one) I introduce the concept of prev_manifest_file_number_ that will avoid the race condition. However, we should discuss the future of MANIFEST file rolling. We found some race conditions with it last week and who knows how many more are there. Nobody is using it in production because we don't trust the implementation. Should we even support it? Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16929	2014-03-17 21:50:15 -07:00
Igor Canadi	d63ae5cb59	Adjust memtable sizes in unit test	2014-03-17 18:37:34 -07:00
Igor Canadi	64904b39a0	Merge branch 'master' into columnfamilies Conflicts: utilities/backupable/backupable_db.cc	2014-03-17 17:57:14 -07:00
Igor Canadi	e0c1211555	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc tools/db_stress.cc	2014-03-17 12:21:05 -07:00
Yueh-Hsuan Chiang	a5fafd4f46	Correct the logic of MemTable::ShouldFlushNow(). Summary: Memtable will now be forced to flush if one of the following conditions is met: 1. Already allocated more than write_buffer_size + 60% arena block size. (the overflowing condition) 2. Unable to safely allocate one more arena block without hitting the overflowing condition AND the unused allocated memory < 25% arena block size. Test Plan: make all check Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16893	2014-03-17 12:20:11 -07:00
sdong	c61c9830d4	Fix a bug that Prev() can hang. Summary: Prev() now can hang when there is a key with more than max_skipped number of appearance internally but all of them are newer than the sequence ID to seek. Add unit tests to confirm the bug and fix it. Test Plan: make all check Reviewers: igor, haobo Reviewed By: igor CC: ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16899	2014-03-17 10:00:41 -07:00
Igor Canadi	30447b7251	Merge pull request #99 from caiosba/master Make it compile on Debian/GCC 4.7	2014-03-17 08:51:35 -07:00
Lei Jin	0cf6c8f7ce	fix: use the correct edit when comparing log_number Summary: In the last fix, I forgot to point to the writer when comparing edit, which is apparently not correct. Test Plan: still running make whitebox_crash_test Reviewers: igor, haobo, igor2 Reviewed By: igor2 CC: leveldb Differential Revision: https://reviews.facebook.net/D16911	2014-03-15 23:30:43 -07:00
Lei Jin	453ec52ca1	journal log_number correctly in MANIFEST Summary: Here is how it can cause a problem: There is one memtable flush and one compaction. Both call LogAndApply(). If both edits are applied in the same batch, with the flush edit first and the compaction edit following, LogAndApplyHelper() will assign the compaction edit the current VersionSet's log number (which should be smaller than the log number from the flush edit). This causes log_numbers in MANIFEST to not be monotonically increasing, which violates the assumption Recover() makes. What is more, after committing to the MANIFEST file, log_number_ in VersionSet is updated to the log_number from the last edit, which is the compaction one. It ends up not updating the log_number. Test Plan: make whitebox_crash_test got another assertion about iter->valid(), not sure if that is related to this. Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16875	2014-03-14 18:36:47 -07:00
Caio SBA	b9c78d2db6	Make it compile on Debian/GCC 4.7	2014-03-14 22:44:35 +00:00
Igor Canadi	a782bb989e	Fix log_number in LogAndApply	2014-03-14 13:45:30 -07:00
Igor Canadi	8b169e949a	Merge branch 'master' into columnfamilies	2014-03-14 13:42:36 -07:00
Igor Canadi	928ee23567	Change WriteBatch interface	2014-03-14 13:40:06 -07:00
Igor Canadi	2bad3cb0db	Missing includes	2014-03-14 13:02:20 -07:00
Igor Canadi	db234133a9	[CF] WriteBatch to take in ColumnFamilyHandle Summary: Client doesn't need to know anything about ColumnFamily ID. By making WriteBatch take ColumnFamilyHandle as a parameter, we can eliminate method GetID() from ColumnFamilyHandle Test Plan: column_family_test Reviewers: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16887	2014-03-14 11:30:14 -07:00
Igor Canadi	3c75cc15a9	Fix HashSkipList and HashLinkedList SIGSEGV Summary: Original Summary: Yesterday, @ljin and I were debugging various db_stress issues. We suspected one of them happens when we concurrently call NewIterator without prefix_seek on HashSkipList. This test demonstrates it. Update: Arena is not thread-safe!! When creating a new full iterator, we have to create a new arena, otherwise we're doomed. Test Plan: SIGSEGV and assertion-throwing test now works! Reviewers: ljin, haobo, sdong Reviewed By: sdong CC: leveldb, ljin Differential Revision: https://reviews.facebook.net/D16857	2014-03-14 10:02:04 -07:00
Igor Canadi	6c72079d77	Fix warning on Mac OS	2014-03-14 09:54:23 -07:00
Igor Canadi	f0e1e3ebf1	CF cleanup part 2	2014-03-13 14:34:01 -07:00
Igor Canadi	f071a20f6e	Need more data in memtable to flush due to 11da8b	2014-03-13 13:52:20 -07:00
Igor Canadi	e1f56e12cf	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc tools/db_stress.cc	2014-03-13 13:21:20 -07:00
sdong	5aa81f04fa	Fix extra compaction tasks scheduled after D16767 in some cases Summary: With D16767, there is a case where compaction tasks are scheduled infinitely: (1) no flush thread is configured and there is more than 1 compaction thread (2) a flush is being run by one compaction thread (3) the SST files are in a state where versions_->current()->NeedsCompaction() will generate a false positive (return true when there is actually no work to be done) In that case, an infinite loop will be formed. This patch fixes it. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: igor CC: dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16863	2014-03-13 13:06:08 -07:00
Kai Liu	11da8bc5df	A heuristic way to check if a memtable is full Summary: This is based on https://reviews.facebook.net/D15027. It's not finished but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks. Instead of checking the approximate memtable size, we will take a deeper look at the arena, which incorporates the essential idea that @sdong suggests: flush when the arena has allocated its last block and that block is "almost full" Test Plan: N/A Reviewers: haobo, sdong Reviewed By: sdong CC: leveldb, sdong Differential Revision: https://reviews.facebook.net/D15051	2014-03-12 16:40:14 -07:00
Igor Canadi	25c8a1a20f	More fixes for bugs introduced by code cleanup	2014-03-12 12:28:23 -07:00
Igor Canadi	b5d6ad69fc	Bug fixes introduced by code cleanup	2014-03-12 11:10:26 -07:00
Igor Canadi	dff9214165	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc tools/db_stress.cc	2014-03-12 10:17:41 -07:00
Igor Canadi	fb2346fc1f	[CF] Code cleanup part 1 Summary: I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup. Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch. This diff also fixes race condition when dropping column family. Test Plan: Ran db_stress with variety of parameters Reviewers: dhruba, haobo Differential Revision: https://reviews.facebook.net/D16833	2014-03-12 09:56:53 -07:00
Igor Canadi	45ad75db80	Correct version of D16821	2014-03-12 09:38:59 -07:00
Igor Canadi	2b95dc1542	Revert "Fix bad merge of D16791 and D16767" This reverts commit `839c8ecfcd`.	2014-03-12 09:37:43 -07:00
sdong	839c8ecfcd	Fix bad merge of D16791 and D16767 Summary: A bad Auto-Merge caused log buffer is flushed twice. Remove the unintended one. Test Plan: Should already be tested (the code looks the same as when I ran unit tests). Reviewers: haobo, igor Reviewed By: haobo CC: ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16821	2014-03-11 21:31:57 -07:00
sdong	bd45633b71	Fix data race against logging data structure because of LogBuffer Summary: @igor pointed out that there is a potential data race because of the way we use the newly introduced LogBuffer. After "bg_compaction_scheduled_--" or "bg_flush_scheduled_--", they can both become 0. As soon as the lock is released after that, DBImpl's destructor can go ahead and destroy all the state inside DB, including the info_log object held in a shared pointer of the options object it keeps. At that point it is not safe anymore to continue using the info logger to write the delayed logs. With the patch, the lock is released temporarily for the log buffer to be flushed before "bg_compaction_scheduled_--" or "bg_flush_scheduled_--". In order to make sure we don't miss any pending flush or compaction, a new flag bg_schedule_needed_ is added, which is set to true if there is a pending flush or compaction that was not scheduled because of the max thread limit. If the flag is set to true, the scheduling function will be called before the compaction or flush thread finishes. Thanks @igor for this finding! Test Plan: make all check Reviewers: haobo, igor Reviewed By: haobo CC: dhruba, ljin, yhchiang, igor, leveldb Differential Revision: https://reviews.facebook.net/D16767	2014-03-11 16:09:53 -07:00
Igor Canadi	d833f15738	Fix bug in VersionEdit::DebugString()	2014-03-11 12:14:09 -07:00
Igor Canadi	37472bb279	Add MaxColumnFamily to VersionEdit::DebugString()	2014-03-11 12:08:42 -07:00
Igor Canadi	457c78eb89	[CF] db_stress for column families Summary: I had this diff for a while to test column families implementation. Last night, I ran it successfully for 10 hours with the command: time ./db_stress --threads=30 --ops_per_thread=200000000 --max_key=5000 --column_families=20 --clear_column_family_one_in=3000000 --verify_before_write=1 --reopen=50 --max_background_compactions=10 --max_background_flushes=10 --db=/tmp/db_stress It is ready to be committed :) Test Plan: Ran it for 10 hours Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16797	2014-03-11 12:06:12 -07:00
sdong	6c66bc08d9	Temp Fix of LogBuffer flushing Summary: To temp fix the log buffer flushing. Flush the buffer inside the lock. Clean the trunk before we find an eventual fix. Test Plan: make all check Reviewers: haobo, igor Reviewed By: igor CC: ljin, leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D16791	2014-03-11 11:37:40 -07:00
Igor Canadi	cb9802168f	Add a comment after SignalAll() Summary: Having code after SignalAll has already caused 2 bugs. Let's make sure this doesn't happen again. Test Plan: no test Reviewers: sdong, dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16785	2014-03-11 11:27:19 -07:00
Igor Canadi	dad8603fc4	[CF] Fix column family dropping Summary: Column family should be dropped after the change has been committed Test Plan: db stress Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16779	2014-03-11 09:47:29 -07:00
Igor Canadi	9634ba42ac	Merge branch 'master' into columnfamilies Conflicts: db/compaction_picker.cc db/db_impl.cc db/db_impl.h db/tailing_iter.cc db/version_set.h include/rocksdb/options.h util/options.cc	2014-03-10 17:26:09 -07:00
Igor Canadi	d5de22dc09	Call PurgeObsoleteFiles() only when HaveSomethingToDelete() Summary: as title Test Plan: fixed the build failure http://ci-builds.fb.com/job/rocksdb_build/987/console Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16743	2014-03-10 15:42:14 -07:00
sdong	fac58c0504	DBTest: remove perf_context's time > 0 check Summary: DBTest checks perf_context.seek_internal_seek_time > 0 and perf_context.find_next_user_entry_time > 0, which is not reliable. Remove them. Test Plan: ./db_test Reviewers: igor, haobo, ljin Reviewed By: igor CC: dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16737	2014-03-10 14:24:56 -07:00
Haobo Xu	a91aed615a	[RocksDB] Minor cleanup of PurgeObsoleteFiles Summary: as title. also made info log output of file deletion a bit more descriptive. Test Plan: make check; db_bench and look at LOG output Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16731	2014-03-10 14:13:38 -07:00
Lei Jin	8d007b4aaf	Consolidate SliceTransform object ownership Summary: (1) Fix SanitizeOptions() to also check HashLinkList. The current dynamic case just happens to work because the 2 classes have the same layout. (2) Do not delete SliceTransform object in HashSkipListFactory and HashLinkListFactory destructor. Reason: SanitizeOptions() enforces prefix_extractor and SliceTransform to be the same object when HashFactory is used. This makes the behavior strange: when HashFactory is used, prefix_extractor will be released by RocksDB. If other memtable factory is used, prefix_extractor should be released by user. Test Plan: db_bench && make asan_check Reviewers: haobo, igor, sdong Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16587	2014-03-10 12:56:46 -07:00
Haobo Xu	9e0e6aa7f6	[RocksDB] make sure KSVObsolete does not get accessed as a valid pointer. Summary: KSVObsolete is no longer nullptr and needs to be checked explicitly. Also did some minor code cleanup and added a stat counter to track superversion cleanups incurred in the foreground. Test Plan: make check Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16701	2014-03-10 12:55:25 -07:00
Haobo Xu	66da467983	[RocksDB] LogBuffer Cleanup Summary: Moved LogBuffer class to an internal header. Removed some unnecessary indirection. Enabled log buffer for BackgroundCallFlush. Forced log buffer flush right after Unlock to improve time ordering of info log. Test Plan: make check; db_bench compare LOG output Reviewers: sdong Reviewed By: sdong CC: leveldb, igor Differential Revision: https://reviews.facebook.net/D16707	2014-03-10 11:05:44 -07:00
Igor Canadi	04d2c26e17	Add option verify_checksums_in_compaction Summary: If verify_checksums_in_compaction is true, compaction will verify checksums. This is default. If it's false, compaction doesn't verify checksums. This is useful for in-memory workloads. Test Plan: corruption_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16695	2014-03-10 10:06:34 -07:00
Igor Canadi	d4f2c610d3	Ignore dropped column families -- don't flush or compact them	2014-03-07 18:43:21 -08:00
Igor Canadi	1e0d47276c	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h	2014-03-07 16:59:47 -08:00
Igor Canadi	9f15092ebd	[CF] NewIterators Summary: Adding the last missing function -- NewIterators(). Pretty simple implementation Test Plan: added a unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16689	2014-03-07 16:15:25 -08:00
Lei Jin	e5fa4944fc	use CAS when returning SuperVersion to ThreadLocal Summary: Add a check at the end of GetImpl to release SuperVersion if it becomes obsolete. Also do Scrape() inside InstallSuperVersion so it happens more frequently. Test Plan: make all check running asan_check now Reviewers: igor, haobo, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16641	2014-03-07 14:43:22 -08:00
Igor Canadi	eec8695206	Delete local sv when destroying DB from stress test Summary: Not deleting local SV caused a crash test issue: http://ci-builds.fb.com/job/rocksdb_asan_crash_test/83/console Test Plan: ran unit tests Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16635	2014-03-06 18:15:26 -08:00
Igor Canadi	80a207fc90	Merge branch 'master' into columnfamilies Conflicts: db/compaction_picker.cc db/compaction_picker.h db/db_impl.cc db/version_set.cc db/version_set.h include/rocksdb/options.h util/options.cc	2014-03-05 16:59:22 -08:00
sdong	ecb1ffa2a8	Buffer info logs when picking compactions and write them out after releasing the mutex Summary: Now while the background thread is picking compactions, it writes out multiple info_logs, especially for universal compaction, which introduces a chance of waiting on log writes while holding the mutex, which is bad. To remove this risk, write all those info logs to a buffer and flush it after releasing the mutex. Test Plan: make all check check the log lines while running some tests that trigger compactions. Reviewers: haobo, igor, dhruba Reviewed By: dhruba CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D16515	2014-03-05 15:36:32 -08:00
Igor Canadi	e2dd148a8b	Fix compile fail introduced by merge	2014-03-05 12:47:44 -08:00
Igor Canadi	a329dd1b25	Fix TEST_Destroy_DBImpl() to work with column families	2014-03-05 12:27:39 -08:00
Igor Canadi	0738ae6dc9	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc	2014-03-05 12:25:05 -08:00
Igor Canadi	9625acbf70	[CF] Don't reuse dropped column family IDs Summary: Column family IDs should be unique, even if column family is dropped. To achieve this, we save max column family in manifest. Note that the diff is still not ready. I'm only using differential to move the patch to my Mac machine. Test Plan: added a test to column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16581	2014-03-05 12:13:44 -08:00
Igor Canadi	8ca30bd51b	Merge pull request #47 from mlin/kCompactionStopStyleSimilarSize An initial implementation of kCompactionStopStyleSimilarSize for universal compaction	2014-03-05 10:35:30 -08:00
Lei Jin	04298f8c33	output perf_context in db_bench readrandom Summary: Add helper function to print perf context data in db_bench if enabled. I didn't find any code that actually exports perf context data. Not sure if I missed anything Test Plan: ran db_bench Reviewers: haobo, sdong, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16575	2014-03-05 10:32:54 -08:00
Lei Jin	64138b5d9c	fix db_bench to use HashSkipList for real Summary: For HashSkipList case, DBImpl has sanity check to see if prefix_extractor in options is the same as the one in memtable factory. If not, it falls back to SkipList. As a result, I was experimenting with SkipList performance. No wonder it is much worse than LinkedList Test Plan: ran benchmark Reviewers: haobo, sdong, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16569	2014-03-05 10:28:53 -08:00
Lei Jin	51560ba755	config max_background_flushes in db_bench Summary: as title Test Plan: make release Reviewers: haobo, sdong, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16437	2014-03-05 10:27:17 -08:00
Igor Canadi	c0ccf43648	MergingIterator assertion Summary: I wrote a test that triggers assertion in MergingIterator. I have not touched that code ever, so I'm looking for somebody with good understanding of the MergingIterator code to fix this. The solution is probably a one-liner. Let me know if you're willing to take a look. Test Plan: This test fails with an assertion `use_heap_ == false` Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16521	2014-03-05 09:13:07 -08:00
sdong	e8ecca9e86	CleanupIteratorState() only to initialize DeletionState when super version cleanup needed Summary: Two changes: 1. DeletionState is only constructed when cleaning up is needed 2. Fix the deletion state construction bug. A change was made in a previous patch: https://reviews.facebook.net/rROCKSDB774ed89c2405ee058086b099cbc8b29e243739cc#71a34e2e However, it somehow got lost when merging Test Plan: make all check Reviewers: kailiu, haobo, igor Reviewed By: igor CC: igor, dhruba, i.am.jin.lei, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16233	2014-03-04 20:58:20 -08:00
Igor Canadi	e21d5b8bbc	[CF] Flush all memtables on column family drop Summary: When column family is dropped, we want to delete all WALs that refer to it. To do that, we need to make them obsolete by flushing all the memtables Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16557	2014-03-04 17:21:30 -08:00
Lei Jin	a5b1d2f146	make key evenly distributed between 0 and FLAGS_num Summary: The issue is that when FLAGS_num is small, the leading bytes of the key are padded with 0s. This makes all keys have the same prefix 00000000 Most of the changes are just to make lint happy Test Plan: ran db_bench Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16317	2014-03-04 17:08:05 -08:00
Igor Canadi	fa34697237	Merge branch 'master' into columnfamilies	2014-03-04 09:39:14 -08:00
Igor Canadi	335b207974	[CF] Delete SuperVersion in a special function Summary: Added a function DeleteSuperVersion that can be called in DBImpl destructor before PurgingObsoleteFiles. That way, PurgeObsoleteFiles will be able to delete all files held by alive super versions. Test Plan: column_family_test with valgrind Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16545	2014-03-04 09:35:44 -08:00
kailiu	906f3dca72	Add a hash-index component for block Summary: this is the key component extracted from diff: https://reviews.facebook.net/D14271 I separate it to a dedicated patch to make the review easier. Test Plan: added a unit test and passed it. Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16245	2014-03-03 21:11:49 -08:00
Igor Canadi	9d0577a6be	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/transaction_log_impl.cc db/transaction_log_impl.h include/rocksdb/options.h util/env.cc util/options.cc	2014-03-03 18:29:03 -08:00
Igor Canadi	5142b37000	Fix a group commit bug in LogAndApply Summary: EncodeTo(&record) does not overwrite, it appends to it. This means that group commit log and apply will look something like: record1 record1record2 record1record2record3 I'm surprised this didn't show up in production, but I think the reason is that MANIFEST group commit almost never happens. This bug turned up in column family work, where opening a database failed with "adding a same column family twice". Test Plan: Tested the change in column family branch and observed that the problem is gone (with db_stress) Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16461	2014-03-03 17:10:43 -08:00
Igor Canadi	f9b2f0ad79	[CF] Fix CF bugs in WriteBatch Summary: This diff fixes two bugs: * Increase sequence number even if WriteBatch fails. This is important because WriteBatches in WAL logs have implicitly increasing sequence numbers, even if one update in a write batch fails. This caused some writes to get lost in my CF stress testing * Tolerate 'invalid column family' errors on recovery. When a column family is dropped, processing WAL logs can have some WriteBatches that still refer to the dropped column family. In recovery environment, we want to ignore those errors. In client's Write() code path, however, we want to return the failure to the client if they are trying to add data to an invalid column family. Test Plan: db_stress's verification works now Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16533	2014-03-03 17:07:46 -08:00
kailiu	bf86af5174	Remove the terrible hack for flush_block_policy_factory Summary: Previous code is too convoluted and I must be drunk for letting such code to be written without a second thought. Thanks to the discussion with @sdong, I added the `Options` when generating the flusher, thus avoiding the tricks. Just FYI: I resisted adding Options in flush_block_policy.h since I wanted to avoid cyclic dependencies: FlushBlockPolicy depends on Options and Options also depends on FlushBlockPolicy... While I appreciate my effort to prevent it, the old design turns out to create more trouble than it tried to avoid. Test Plan: ran ./table_test Reviewers: sdong Reviewed By: sdong CC: sdong, leveldb Differential Revision: https://reviews.facebook.net/D16503	2014-02-28 16:39:27 -08:00
Igor Canadi	8ea21a778b	[CF] Rethink LogAndApply for column families Summary: I thought I might get away with as few changes to LogAndApply() as possible. It turns out this is not the case. This diff introduces different behavior of LogAndApply() for three cases: 1. column family add 2. column family drop 3. no-column family manipulation (1) and (2) don't support group commit yet. There were a lot of problems with the old version of LogAndApply, detected by db_stress. The biggest was non-atomicity of manifest writes and metadata changes (i.e. if column family add is in manifest, it also has to be in in-memory data structure). Test Plan: db_stress Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16491	2014-02-28 14:46:48 -08:00
Igor Canadi	58ca641d53	Make Log::Reader more robust Summary: This diff does two things: (1) Log::Reader does not report a corruption when the last record in a log or manifest file is truncated (meaning that log writer died in the middle of the write). Inherited the code from LevelDB: https://code.google.com/p/leveldb/source/detail?r=269fc6ca9416129248db5ca57050cd5d39d177c8# (2) Turn off mmap writes for all writes to log and manifest files (2) is necessary because if we use mmap writes, the last record is not truncated, but is actually filled with zeros, making checksum fail. It is hard to recover from checksum failing. Test Plan: Added unit tests from LevelDB Actually recovered a "corrupted" MANIFEST file. Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16119	2014-02-28 13:19:47 -08:00
Igor Canadi	12966ec1bb	Fix LogAndApply() group commit	2014-02-28 12:22:45 -08:00
Yueh-Hsuan Chiang	a77527f2af	Add ReadOptions to TransactionLogIterator. Summary: Add an optional input parameter ReadOptions to DB::GetUpdateSince(), which allows the verification of checksums to be disabled by setting ReadOptions::verify_checksums to false. Test Plan: Tests are done off-line and will not be included in the regular unit test. Reviewers: igor Reviewed By: igor CC: leveldb, xjin, dhruba Differential Revision: https://reviews.facebook.net/D16305	2014-02-28 11:50:36 -08:00
Igor Canadi	f6a257b6a1	Set dropped column family before persisting in the manifest	2014-02-28 11:49:32 -08:00
Igor Canadi	670f3ba212	[CF] Small refactor of Recover() and DumpManifest()	2014-02-28 11:25:38 -08:00
Igor Canadi	099ad94306	Set log number for column family	2014-02-28 11:08:24 -08:00
Igor Canadi	510f84b686	[CF] CreateColumnFamily fix Summary: This fixes a few bugs with CreateColumnFamily * We first have to LogAndApply and then call VersionSet::CreateColumnFamily. Otherwise, WriteSnapshot might be invoked, writing out column family add inside of LogAndApply, even though it's not really committed * Fix LogAndApplyHelper() to not apply log number to column_family_data, which is, in case of column family add, just a dummy (default) column family * Create SuperVersion when creating column family Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16443	2014-02-28 10:40:52 -08:00
Igor Canadi	206b38f31c	SetLogNumber in CreateColumnFamily	2014-02-27 16:53:45 -08:00
Igor Canadi	b41a3bc4da	[CF] Change flow of CreateColumnFamily Summary: Previously, we first wrote to the manifest and then created internal data structure. Now, we first create internal data structure. That way, we can write out internal comparator to the manifest Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16425	2014-02-27 16:49:49 -08:00
Igor Canadi	492c9f71c6	[CF] Column family support for LDB tool Summary: Added list_column_family command and also updated dump_manifest Test Plan: no Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16419	2014-02-27 16:39:23 -08:00
Lei Jin	ad0c3747cb	cache SuperVersion in thread local storage to avoid mutex lock Summary: as title Test Plan: asan_check will post results later Reviewers: haobo, igor, dhruba, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16257	2014-02-27 11:38:55 -08:00
Igor Canadi	85b1b5e1b9	[CF] WaitForFlush() instead of sleeping Summary: If we sleep for 300ms the test fails in valgrind because it takes more than 300ms to flush. This way we WaitForFlush() when we're expecting flush, but still sleep and check if the flush happens even though it's not supposed to. Test Plan: notest Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16401	2014-02-27 10:31:05 -08:00
Igor Canadi	4c42201204	[CF] Test fixes and speedup	2014-02-26 17:34:39 -08:00
Igor Canadi	343c32be7b	[CF] DifferentMergeOperators and DifferentCompactionStyles tests Summary: Two new column family tests: * DifferentMergeOperators -- three column families, one without merge operator, one with add operator and one with append operator. verify that operations work as expected. * DifferentCompactionStyles -- three column families, two with level compactions and one with universal compaction. trigger the compactions and verify they work as expected. Test Plan: nope Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16377	2014-02-26 16:05:24 -08:00
Igor Canadi	3c81546422	[CF] Make LogDeletionTest less flakey Summary: Retry GetSortedWalFiles() and also wait 20ms before counting number of log files. WaitForFlush() doesn't necessarily wait for logs to be deleted, since logs are deleted outside of the mutex. Test Plan: column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16371	2014-02-26 14:41:18 -08:00
Igor Canadi	6e7cae7711	[CF] More tests Summary: New unit tests for column families Test Plan: this is a test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16359	2014-02-26 14:16:23 -08:00
Igor Canadi	9bce2b2a84	[CF] Fix lint errors in CF code Summary: Big CF diff uncovered some lint errors. This diff fixes some of them. Not much to see here Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16347	2014-02-26 10:10:00 -08:00
Igor Canadi	8b7ab9951c	[CF] Handle failure in WriteBatch::Handler Summary: * Add ColumnFamilyHandle::GetID() function. Client needs to know column family's ID to be able to construct WriteBatch * Handle WriteBatch::Handler failure gracefully. Since WriteBatch is not a very smart function (it takes raw CF id), client can add data to WriteBatch for column family that doesn't exist. In that case, we need to gracefully return failure status from DB::Write(). To do that, I added a return Status to WriteBatch functions PutCF, DeleteCF and MergeCF. Test Plan: Added test to column_family_test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16323	2014-02-26 10:10:00 -08:00
Igor Canadi	8895526308	Merge branch 'master' into columnfamilies	2014-02-25 17:04:48 -08:00
Igor Canadi	5ad7ee03ea	[CF] Log deletion in column families Summary: * Added unit test that verifies that obsolete files are deleted. * Advance log number for empty column family when cutting log file. * MinLogNumber() bug fix! (caught by the new unit test) Test Plan: unit test Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16311	2014-02-25 16:54:41 -08:00
Igor Canadi	dc277f0ab7	[CF] Adaptation of GetLiveFiles for CF Summary: Even if user flushes the memtables before getting live files, we still can't guarantee that new data didn't come in (to already-flushed memtables). If we want backups to provide consistent view of the database, we still need to get WAL files. Test Plan: backupable_db_test Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16299	2014-02-25 13:21:14 -08:00
Igor Canadi	5a91746277	log file is uint64_t	2014-02-25 12:57:43 -08:00
Igor Canadi	4209516359	Schedule flush when waiting on flush Summary: This will also help with avoiding the deadlock. If a flush failed and we're waiting for a memtable to be flushed, we should schedule a new flush and hope the new one succeeds. If paranoid_checks = false, Wait() will still hang on ENOSPC, but at least it will automatically continue when the space frees up. Current behavior both hangs and deadlocks. Also, I renamed some 'compaction' to 'flush'. 'compaction' was the leveldb way of saying things. Test Plan: make check Reviewers: dhruba, haobo, ljin Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16281	2014-02-25 12:04:14 -08:00
Lei Jin	dea894ef8d	expose wal_dir in db_bench Summary: as title Test Plan: ran db_bench Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16269	2014-02-25 10:43:46 -08:00
Albert Strasheim	72aacf6b96	A few more C API functions.	2014-02-25 10:32:28 -08:00
Igor Canadi	b69e7d99d5	[CF] Better handling of memtable logs Summary: DBImpl now keeps a list of alive_log_files_. On every FindObsoleteFiles, it deletes all alive log files that are smaller than versions_->MinLogNumber() Test Plan: make check passes no specific unit tests yet, will add Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16293	2014-02-25 09:55:13 -08:00
Igor Canadi	d39da4b578	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc	2014-02-24 17:09:05 -08:00
Igor Canadi	6ed450a58c	DeleteFile should schedule Flush or Compaction Summary: More info here: https://github.com/facebook/rocksdb/issues/89 If flush fails because of ENOSPC, we have a deadlock problem. This is a quick fix that will continue the normal operation when user deletes the file and frees up the space on the device. We need to address the issue more broadly with bg_error_ cleanup. Test Plan: make check Reviewers: dhruba, haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D16275	2014-02-24 16:00:13 -08:00
Igor Canadi	2bf1151a25	Fix C API	2014-02-24 15:15:34 -08:00
Igor Canadi	18a7cdfba0	Merge pull request #82 from tecbot/api-enhancements Enhancements to the API	2014-02-24 14:20:13 -08:00
Thomas Adam	ce2b1f7b44	added a test case for custom merge operator	2014-02-23 17:58:38 +01:00
Thomas Adam	68248a2ac5	added a delete method for custom filter policy and merge operator to make it possible to override the cleanup behaviour of the return value	2014-02-23 17:58:11 +01:00
sdong	b2d29675c8	Add a test in prefix_test to verify correctness of results Summary: Add a test to verify that HashLinkList and HashSkipList (mainly the former) return the correct results when inserting into the same bucket in different orders. Some other changes: (1) add the test to test list (2) fix compile error (3) add header Test Plan: ./prefix_test Reviewers: haobo, kailiu Reviewed By: haobo CC: igor, yhchiang, i.am.jin.lei, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D16143	2014-02-19 17:00:34 -08:00
Kai Liu	2b205b35d8	Disable putting filter block to block cache Summary: This bug caused server crash issues because the filter block is too big and kept getting purged out of the cache. Test Plan: Wrote a new unit test to make sure it works. Reviewers: dhruba, haobo, igor, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16221	2014-02-19 15:38:57 -08:00
Thomas Adam	d74c9b79ea	Enhancements to the API	2014-02-19 23:59:54 +01:00
sdong	e90d3f7752	First Transaction Logs Should Not Skip Storage Options Given Summary: Currently, the first transaction log file ignores bytes_per_sync and other storage-related options. It is not consistent. Fix it. Test Plan: make all check. See the options set in GDB. Reviewers: haobo, kailiu Reviewed By: haobo CC: igor, ljin, yhchiang, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16215	2014-02-19 10:58:39 -08:00


1746 Commits