rocksdb

Author	SHA1	Message	Date
sdong	9efbd85ac9	fsync directory after creating current file in NewDB() Summary: One of our users reported current file corruption. The machine was rebooted during the time. This is the only think I can think of which could cause current file corruption. Just add this paranoid check. Test Plan: make all check Reviewers: haobo, igor Reviewed By: haobo CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18495	2014-05-06 17:51:33 -07:00
sdong	3a171dcb51	Pass logger to memtable rep and TLB page allocation error logged to info logs Summary: TLB page allocation errors are now logged to info logs, instead of stderr. In order to do that, mem table rep's factory functions take a info logger now. Test Plan: make all check Reviewers: haobo, igor, yhchiang Reviewed By: yhchiang CC: leveldb, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18471	2014-05-05 16:43:37 -07:00
Igor Canadi	15c3991933	Add comment about ValueType	2014-05-05 12:57:47 -07:00
Igor Canadi	d2569fea47	log_and_apply_bench on a new benchmark framework Summary: db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works. I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions. Test Plan: no Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18261	2014-05-05 11:11:48 -07:00
sdong	4a7c747064	Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"" And make the default 0 for hash linked list memtable This reverts commit `d69dc64be7`.	2014-05-04 13:56:29 -07:00
Igor Canadi	d69dc64be7	Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB" This reverts commit `7dafa3a1d7`.	2014-05-04 08:37:09 -07:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
Igor Canadi	096f5be0ed	Put column family information in LiveFileMetaData Summary: As summary Test Plan: compiles :) Reviewers: dhruba, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18405	2014-04-30 16:24:52 -04:00
Igor Canadi	16f1aa7b2d	Fix signed/unsigned compare	2014-04-30 14:38:01 -04:00
Igor Canadi	df70047669	Flush stale column families Summary: Added a new option `max_total_wal_size`. Once the total WAL size goes over that, we make an attempt to flush all column families that still have data in the earliest WAL file. By default, I calculate `max_total_wal_size` dynamically, that should be good-enough for non-advanced customers. Test Plan: Added a test Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18345	2014-04-30 14:33:40 -04:00
sdong	7dafa3a1d7	Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, dhruba, leveldb, igor, yhchiang Differential Revision: https://reviews.facebook.net/D18357	2014-04-30 11:02:26 -07:00
Igor Canadi	66f88c43a5	Some fixes as preparation for release	2014-04-30 09:03:24 -07:00
Igor Canadi	d6d67c0efe	More s/us fixes	2014-04-30 07:04:36 -07:00
Yueh-Hsuan Chiang	9d9d2965cb	Add a new mem-table representation based on cuckoo hash. Summary: = Major Changes = * Add a new mem-table representation, HashCuckooRep, which is based cuckoo hash. Cuckoo hash uses multiple hash functions. This allows each key to have multiple possible locations in the mem-table. - Put: When insert a key, it will try to find whether one of its possible locations is vacant and store the key. If none of its possible locations are available, then it will kick out a victim key and store at that location. The kicked-out victim key will then be stored at a vacant space of its possible locations or kick-out another victim. In this diff, the kick-out path (known as cuckoo-path) is found using BFS, which guarantees to be the shortest. - Get: Simply tries all possible locations of a key --- this guarantees worst-case constant time complexity. - Time complexity: O(1) for Get, and average O(1) for Put if the fullness of the mem-table is below 80%. - Default using two hash functions, the number of hash functions used by the cuckoo-hash may dynamically increase if it fails to find a short-enough kick-out path. - Currently, HashCuckooRep does not support iteration and snapshots, as our current main purpose of this is to optimize point access. = Minor Changes = * Add IsSnapshotSupported() to DB to indicate whether the current DB supports snapshots. If it returns false, then DB::GetSnapshot() will always return nullptr. Test Plan: Run existing tests. Will develop a test specifically for cuckoo hash in the next diff. Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D16155	2014-04-29 17:13:46 -07:00
Igor Canadi	f1c9aa6ebe	More unsigned/signed compare fixes	2014-04-29 13:01:06 -07:00
Igor Canadi	38693d99c4	Fix more signed/unsigned comparsions	2014-04-29 12:40:18 -07:00
Igor Canadi	dd9eb7a7d5	Cache result of ReadFirstRecord() Summary: ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince(). I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage. Test Plan: make check Reviewers: haobo, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18387	2014-04-29 13:27:58 -04:00
Igor Canadi	91ef2eae23	Use new DBWithTTL API in tests	2014-04-28 23:46:24 -04:00
Igor Canadi	72ff275e3c	Fix TransactionLogIterator EOF caching Summary: When TransactionLogIterator comes to EOF, it calls UnmarkEOF and continues reading. However, if glibc cached the EOF status of the file, it will get EOF again, even though the new data might have been written to it. This has been causing errors in Mac OS. Test Plan: test passes, was failing before Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18381	2014-04-28 23:30:27 -04:00
Donovan Hide	4f9fae9bb7	Add rocksdb_open_for_read_only to C API	2014-04-27 20:57:10 +01:00
Igor Canadi	c489499a2b	Fix OSX compile	2014-04-26 17:15:43 -04:00
Lei Jin	ccaca59bee	avoid calling FindFile twice in TwoLevelIterator for PlainTable Summary: this is to reclaim the regression introduced in https://reviews.facebook.net/D17853 Test Plan: make all check Reviewers: igor, haobo, sdong, dhruba, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17985	2014-04-25 12:23:07 -07:00
Lei Jin	d642c60bdc	Check PrefixMayMatch on Seek() Summary: As a follow-up diff for https://reviews.facebook.net/D17805, add optimization to check PrefixMayMatch on Seek() Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17853	2014-04-25 12:22:23 -07:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
Igor Canadi	8ce5492623	Delete superversion and log outside of mutex Summary: As summary. Add two autovectors that get filled up in MakeRoomForWrite and they get deleted outside of mutex Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18249	2014-04-25 14:58:02 -04:00
Igor Canadi	ad3cd39ccd	Column family logging Summary: Now that we have column families involved, we need to add extra context to every log message. They now start with "[column family name] log message" Also added some logging that I think would be useful, like level summary after every flush (I often needed that when going through the logs). Test Plan: make check + ran db_bench to confirm I'm happy with log output Reviewers: dhruba, haobo, ljin, yhchiang, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18303	2014-04-25 09:51:16 -04:00
Igor Canadi	4cd9f58c04	Fix corruption test	2014-04-24 14:56:41 -04:00
Igor Canadi	478990c81b	Make CompactionInputErrorParanoid less flakey Summary: I'm getting lots of e-mails with CompactionInputErrorParanoid failing. Most recent example early morning today was: http://ci-builds.fb.com/job/rocksdb_valgrind/562/consoleFull I'm putting a stop to these e-mails. I investigated why the test is flakey and it turns out it's because of non-determinsim of compaction scheduling. If there is a compaction after the last flush, CorruptFile will corrupt the compacted file instead of file at level 0 (as it assumes). That makes `Check(9, 9)` fail big time. I also saw some errors with table file getting outputed to >= 1 levels instead of 0. Also fixed that. Test Plan: Ran corruption_test 100 times without a failure. Previously it usually failed at 10th occurrence. Reviewers: dhruba, haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18285	2014-04-24 11:13:28 -07:00
sdong	4de5b84ee0	Fix a bug in IterKey Summary: IterKey set buffer_size_ to a wrong initial value, causing it to always allocate values from heap instead of stack if the key size is smaller. Fix it. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D18279	2014-04-23 19:45:22 -07:00
Igor Canadi	f9f8965e96	Print out stack trace in mac, too Summary: While debugging Mac-only issue with ThreadLocalPtr, this was very useful. Let's print out stack trace in MAC OS, too. Test Plan: Verified that somewhat useful stack trace was generated on mac. Will run PrintStack() on linux, too. Reviewers: ljin, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18189	2014-04-23 09:11:35 -04:00
sdong	a570740727	Expose number of entries in mem tables to users Summary: In this patch, two new DB properties are defined: rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, from where number of entries in mem tables can be exposed to users Test Plan: Cover the codes in db_test make all check Reviewers: haobo, ljin, igor Reviewed By: igor CC: nkg-, igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18207	2014-04-22 22:13:21 -07:00
Lei Jin	5f1daf7ae3	get rid of shared_ptr in memtable.cc Summary: Get rid of the devil. Probably won't impact anything on the perf side. Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D18153	2014-04-22 21:14:25 -07:00
sdong	86a0133d05	PlainTableReader to expose index size to users Summary: This is a temp solution to expose index sizes to users from PlainTableReader before we persistent them to files. In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them. Test Plan: Add a unit test. make all check` Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18195	2014-04-22 19:29:05 -07:00
Igor Canadi	1068d2fa60	Revert "Better port::Mutex::AssertHeld() and AssertNotHeld()" This reverts commit `ddafceb6c2`.	2014-04-22 18:38:10 -07:00
Igor Canadi	ddafceb6c2	Better port::Mutex::AssertHeld() and AssertNotHeld() Summary: Using ThreadLocalPtr as a flag to determine if a mutex is locked or not enables us to implement AssertNotHeld(). It also makes AssertHeld() actually correct. I had to remove port::Mutex as a dependency for util/thread_local.h, but that's fine since we can just use std::mutex :) Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong, yhchiang Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18171	2014-04-22 17:26:21 -07:00
Igor Canadi	3992aec8fa	Support for column families in TTL DB Summary: This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one. TODO: Implement CreateColumnFamily() in TTL world. Test Plan: Added a very simple sanity test. Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb, alberts Differential Revision: https://reviews.facebook.net/D17859	2014-04-22 11:27:33 -07:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
Igor Canadi	f813279da5	Remove TransactionLogIteratorRace when -DNDEBUG	2014-04-21 11:08:30 -07:00
Lei Jin	0f2d768191	hints for narrowing down FindFile range and avoiding checking unrelevant L0 files Summary: The file tree structure in Version is prebuilt and the range of each file is known. On the Get() code path, we do binary search in FindFile() by comparing target key with each file's largest key and also check the range for each L0 file. With some pre-calculated knowledge, each key comparision that has been done can serve as a hint to narrow down further searches: (1) If a key falls within a L0 file's range, we can safely skip the next file if its range does not overlap with the current one. (2) If a key falls within a file's range in level L0 - Ln-1, we should only need to binary search in the next level for files that overlap with the current one. (1) will be able to skip some files depending one the key distribution. (2) can greatly reduce the range of binary search, especially for bottom levels, given that one file most likely only overlaps with N files from the level below (where N is max_bytes_for_level_multiplier). So on level L, we will only look at ~N files instead of N^L files. Some inital results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s = 9.6M/s), it gives us ~13% improvement. Test Plan: make all check Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17205	2014-04-21 09:10:12 -07:00
sdong	651792251a	Fix bugs introduced by D17961 Summary: D17961 has two bugs: (1) two level iterator fails to populate FileMetaData.table_reader, causing performance regression. (2) table cache handle the !status.ok() case in the wrong place, causing seg fault which shouldn't happen. Test Plan: make all check Reviewers: ljin, igor, haobo Reviewed By: ljin CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17991	2014-04-17 17:25:28 -07:00
sdong	fa430bfd04	Minimize accessing multiple objects in Version::Get() Summary: One of our profilings shows that Version::Get() sometimes is slow when getting pointer of user comparators or other global objects. In this patch: (1) we keep pointers of immutable objects in Version to avoid accesses them though option objects or cfd objects (2) table_reader is directly cached in FileMetaData so that table cache don't have to go through handle first to fetch it (3) If level 0 has less than 3 files, skip the filtering logic based on SST tables' key range. Smallest and largest key are stored in separated memory locations, which has potential cache misses Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17739	2014-04-17 14:14:00 -07:00
Igor Canadi	161d9e586b	Don't overflow size_t in mac	2014-04-16 15:15:22 -07:00
Igor Canadi	5c12f27791	Remove tautological assert	2014-04-16 09:09:28 -07:00
Igor Canadi	faf7691358	Close DB at the end of DontRollEmptyLogs test	2014-04-15 17:20:56 -07:00
Igor Canadi	1803ed2ccb	Fix Mac OS compile	2014-04-15 16:31:49 -07:00
Igor Canadi	7d838856cf	Fix compile issues when doing make release	2014-04-15 16:00:10 -07:00
sdong	0f40fe4bc7	When creating a new DB, fail it when wal_dir contains existing log files Summary: Current behavior of creating new DB is, if there is existing log files, we will go ahead and replay them on top of empty DB. This is a behavior that no user would expect. With this patch, we will fail the creation if a user creates a DB with existing log files. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: haobo CC: nkg-, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17817	2014-04-15 14:01:57 -07:00
Igor Canadi	c166615850	Fix compile issues introduced by RocksDBLite	2014-04-15 13:51:07 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Igor Canadi	dbe0f327ca	Set log_empty to false even when options.sync is off [fix tests]	2014-04-15 10:28:34 -07:00
Igor Canadi	e6acb874cd	Don't roll empty logs Summary: With multiple column families, especially when manual Flush is executed, we might roll the log file, although the current log file is empty (no data has been written to the log). After the diff, we won't create new log file if current is empty. Next, I will write an algorithm that will flush column families that reference old log files (i.e., that weren't flushed in a while) Test Plan: Added an unit test. Confirmed that unit test failes in master Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17631	2014-04-15 09:57:25 -07:00
sdong	c87ed0942c	Fix db_bench's multireadrandom Summary: multireadrandom is broken. Fix it Test Plan: run it and see segfault has gone. Reviewers: ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17781	2014-04-14 15:43:34 -07:00
Yueh-Hsuan Chiang	118f88d25d	Fix compile error in tailing_iter.h Summary: Fix the following compile error ./db/tailing_iter.h:17:1: error: class 'SuperVersion' was previously declared as a struct [-Werror,-Wmismatched-tags] class SuperVersion; ^ ./db/column_family.h:77:8: note: previous use is here struct SuperVersion { ^ ./db/tailing_iter.h:17:1: note: did you mean struct here? class SuperVersion; ^~~~~ struct 1 error generated. Test Plan: make Reviewers: ljin, igor, haobo, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17799	2014-04-14 14:05:15 -07:00
Yueh-Hsuan Chiang	327102efa5	Fix merge_test failure due to incorrect assert behavior in the release mode.	2014-04-14 12:06:49 -07:00
Lei Jin	82b37a18bd	thread local for tailing iterator Summary: replace the super version acquisision in tailing itrator with thread local Test Plan: will post results Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17757	2014-04-14 10:48:01 -07:00
Lei Jin	539dd207df	using thread local SuperVersion for NewIterator Summary: Similar to GetImp(), use SuperVersion from thread local instead of acquriing mutex. I don't expect this change will make a dent on NewIterator() performance because the bottleneck seems to be on the rest part of the API Test Plan: make asan_check will post perf numbers Reviewers: haobo, igor, sdong, dhruba, yhchiang Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17643	2014-04-14 09:34:59 -07:00
sdong	d5e087b6df	db_bench: add a mode to operate multiple DBs Summary: This patch introduces a new parameter num_multi_db in db_bench. When this parameter is larger than 1, multiple DBs will be created. In all benchmarks, any operation applies to a random DB among them. This is to benchmark the performance of similar applications. Test Plan: run db_bench on both of num_multi_db=0 and more. Reviewers: haobo, ljin, igor Reviewed By: igor CC: igor, yhchiang, dhruba, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17769	2014-04-11 16:59:08 -07:00
Lei Jin	eba3fc644a	make corruption_test:CompactionInputErrorParanoid deterministic Summary: it writes ~10M data, default L0 compaction trigger is 4, plus 2 writer buffer, so that can accommodate ~6M data before compaction happens for sure. I guess encoding is doing a good job to shrink the data so that sometime, compaction does not get triggered. I get test failure quite often. Test Plan: ran it multiple times and all got pass Reviewers: igor, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17775	2014-04-11 12:48:38 -07:00
Igor Canadi	de41357a18	Don't dump rocksdb version on IOS	2014-04-11 10:19:58 -07:00
Lei Jin	0af36d6aa6	SeekRandomWhileWriting Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17751	2014-04-11 09:47:20 -07:00
Kai Liu	1405232b6d	Temporarily disable a test case in db_test Summary: Root cause is still under investigation. Just Disable the troubling use case for now.	2014-04-10 17:17:39 -07:00
Igor Canadi	ddef6841b3	Renamed InfoLogLevel::DEBUG to InfoLogLevel::DEBUG_LEVEL Summary: XCode for some reason injects `#define DEBUG 1` into our code, which makes compile fail because we use `DEBUG` keyword for other stuff. This diff fixes the issue by renaming `DEBUG` to `DEBUG_LEVEL`. Test Plan: compiles Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17709	2014-04-10 15:27:42 -07:00
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	2014-04-10 14:19:43 -07:00
Lei Jin	7a92537fc4	db_bench: add IteratorCreationWhileWriting mode and allow prefix_seek Summary: as title Test Plan: ran it Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17655	2014-04-10 10:15:59 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
sdong	df2a8b6a1a	Polish IterKey and use it in DBImpl::ProcessKeyValueCompaction() Summary: 1. Polish IterKey a little bit. 2. Turn to use it in local parameter of current_user_key in DBImpl::ProcessKeyValueCompaction(). Our profile showing that DBImpl::ProcessKeyValueCompaction() has about 14% costs in std::string (the base including reading and writing data but excluding compaction filtering), which is higher than it should be. There are two std::string used in DBImpl::ProcessKeyValueCompaction(), compaction_filter_value and current_user_key and it's hard to distinguish the two. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17613	2014-04-09 20:50:58 -07:00
Igor Canadi	dc55903293	Improved CompressedCache Summary: This is testing behavior that was reported in https://github.com/facebook/rocksdb/issues/111 No issue was found, but it still good to commit this and make CompressedCache more robust. Test Plan: this is a plan Reviewers: ljin, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17625	2014-04-09 11:43:14 -07:00
Lei Jin	4824014e3b	speed up db_bench filluniquerandom mode Summary: filluniquerandom is painfully slow due to the naive bitmap check to find out if a key has been seen before. Majority of time is spent on searching the last few keys. Split a giant BitSet to smaller ones so that we can quickly check if a BitSet is full and thus can skip quickly. It used to take over one hour to filluniquerandom for 100M keys, now it takes about 10 mins. Test Plan: unit test also verified correctness in db_bench and make sure all keys are generated Reviewers: igor, haobo, yhchiang Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17607	2014-04-09 11:25:21 -07:00
Igor Canadi	2014915d32	Fix ASAN issue	2014-04-09 10:38:05 -07:00
Igor Canadi	b947fdc89d	Column family support for DB::OpenForReadOnly() Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though) Test Plan: added a unit test in column_family_test Reviewers: haobo, sdong, ljin, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17565	2014-04-09 09:56:17 -07:00
Igor Canadi	731e55c01c	Fix GetProperty() test Summary: GetProperty test is flakey. Before this diff: P8635927 After: P8635945 We need to make sure the thread is done before we destruct sleeping tasks. Otherwise, bad things happen. Test Plan: See summary Reviewers: ljin, sdong, haobo, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17595	2014-04-08 14:57:00 -07:00
Igor Canadi	34455deb06	Fix Mac OS compile issues	2014-04-08 14:05:53 -07:00
Igor Canadi	5b345b76cb	Remove env_ from MergingIterator Summary: env_ is not used. Compiling for iOS complains. Test Plan: compiles now Reviewers: ljin, haobo, sdong, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17589	2014-04-08 13:40:42 -07:00
Lei Jin	0c1126d4cf	db_bench cleanup Summary: clean up the db_bench a little bit. also avoid allocating memory for key in the loop Test Plan: I verified a run with filluniquerandom & readrandom. Iterator seek will be used lot to measure performance. Will fix whatever comes up Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17559	2014-04-08 11:21:09 -07:00
Igor Canadi	beeee9dccc	Small speedup of CompactionFilterV2 Summary: ToString() is expensive. Profiling shows that most compaction threads are stuck in jemalloc, allocating a new string. This will help out a litte. Test Plan: make check Reviewers: haobo, danguo Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17583	2014-04-08 11:06:39 -07:00
Lei Jin	92c1eb0291	macros for perf_context Summary: This will allow us to disable them completely for iOS or for better performance Test Plan: will run make all check Reviewers: igor, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17511	2014-04-08 10:58:07 -07:00
sdong	5e2db3b434	PlainTableIterator not to store copied key in std::string Summary: Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h. This patch improves iterator performance significantly. Running this benchmark: ./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond The average latency is improved to about 750 nanoseconds from 1100 nanoseconds. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17547	2014-04-07 19:06:09 -07:00
Igor Canadi	664559fe2d	Small final fixes before merge	2014-04-07 15:38:53 -07:00
Igor Canadi	d1e2bce42d	CallFlushDuringCompaction	2014-04-07 15:03:15 -07:00
Igor Canadi	b42ceb9598	Simplify cleanup of dead (refcount == 0) column families	2014-04-07 14:31:02 -07:00
Igor Canadi	e48348d196	Make flush part of compaction process This will enable user to use only 1 background thread.	2014-04-07 13:53:08 -07:00
Igor Canadi	2a0917b28e	Merge branch 'master' into columnfamilies	2014-04-07 13:04:25 -07:00
Igor Canadi	751e4b1a35	Fix wal_dir sanitizing	2014-04-07 11:36:03 -07:00
Igor Canadi	3d2fe844ab	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/memtable_list.cc db/version_set.cc	2014-04-07 11:31:11 -07:00
Igor Canadi	7efdd9ef4d	Options::wal_dir shouldn't end in '/' Summary: If a client specifies wal_dir with trailing '/', we will fail in deleting obsolete log files. See task #4083746 Test Plan: make check Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17535	2014-04-07 10:25:38 -07:00
sdong	ea0198fe9a	Create log::Writer out of DB Mutex Summary: Our measurement shows that sometimes new log::Write's constructor can take hundreds of milliseconds. It's unclear why but just simply move it out of DB mutex. Test Plan: make all check Reviewers: haobo, ljin, igor Reviewed By: haobo CC: nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17487	2014-04-04 15:46:28 -07:00
Lei Jin	c90d446ee7	make hash_link_list Node's key space consecutively followed at the end Summary: per sdong's request, this will help processor prefetch on n->key case. Test Plan: make all check Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17415	2014-04-04 15:37:28 -07:00
sdong	99c756f0fe	Flush Buffered Info Logs Before Doing Compaction (one line change) Summary: Flushing log buffer earlier to avoid confusion of time holding the locks. Test Plan: Should be safe as long as several related db test passes Reviewers: haobo, igor, ljin Reviewed By: igor CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17493	2014-04-04 10:58:30 -07:00
Igor Canadi	040657aec9	Fix MacOS errors	2014-04-03 16:04:34 -07:00
Igor Canadi	f76e4027ca	initialize candidate count	2014-04-03 11:45:44 -07:00
sdong	b9767d0e09	Move several more logging inside DB mutex to log buffer Summary: Move several some common logging still in DB mutex to log buffer. Test Plan: make all check Reviewers: haobo, igor, ljin, nkg- Reviewed By: nkg- CC: nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17439	2014-04-03 10:47:18 -07:00
Thomas Adam	98422cba77	[C-API] implemented more options	2014-04-03 10:47:37 +02:00
Thomas Adam	3a30b5b0be	[C-API] added "rocksdb_options_set_plain_table_factory" to make it possible to use plain table factory	2014-04-03 10:47:37 +02:00
Haobo Xu	48bc0c6ad3	[RocksDB] Fix a race condition in GetSortedWalFiles Summary: This patch fixed a race condition where a log file is moved to archived dir in the middle of GetSortedWalFiles. Without the fix, the log file would be missed in the result, which leads to transaction log iterator gap. A test utility SyncPoint is added to help reproducing the race condition. Test Plan: TransactionLogIteratorRace; make check Reviewers: dhruba, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17121	2014-04-02 22:12:29 -07:00
Igor Canadi	d1d19f5db3	Fix valgrind error in c_test	2014-04-02 17:24:30 -07:00
sdong	158845ba9a	Move a info logging out of DB Mutex Summary: As we know, logging can be slow, or even hang for some file systems. Move one more logging out of DB mutex. Test Plan: make all check Reviewers: haobo, igor, ljin Reviewed By: igor CC: yhchiang, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17427	2014-04-02 16:48:32 -07:00
sdong	4af1954fd6	Compaction Filter V1 to use old context struct to keep backward compatible Summary: The previous change D15087 changed existing compaction filter, which makes the commonly used class not backward compatible. Revert the older interface. Use a new interface for V2 instead. Test Plan: make all check Reviewers: haobo, yhchiang, igor CC: danguo, dhruba, ljin, igor, leveldb Differential Revision: https://reviews.facebook.net/D17223	2014-04-02 14:57:51 -07:00
sdong	284c365b77	Fix valgrind error caused by FileMetaData as two level iterator's index block handle Summary: It is a regression valgrind bug caused by using FileMetaData as index block handle. One of the fields of FileMetaData is not initialized after being contructed and copied, but I'm not able to find which one. Also, I realized that it's not a good idea to use FileMetaData as in TwoLevelIterator::InitDataBlock(), a copied FileMetaData can be compared with the one in version set byte by byte, but the refs can be changed. Also comparing such a large structure is slightly more expensive. Use a simpler structure instead Test Plan: Run the failing valgrind test (Harness.RandomizedLongDB) make all check Reviewers: igor, haobo, ljin Reviewed By: igor CC: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17409	2014-04-02 14:38:28 -07:00
Igor Canadi	8555ce2dec	Merge branch 'master' into columnfamilies	2014-04-02 10:48:05 -07:00
sdong	e0a87c4cf1	DBIter to use static allocated char array for saved_key_ (if it is not too long) Summary: DBIter now uses a std::string for saved_key. Based on some profiling, it could be more expensive than we though. Optimize it with the same technique as LookupKey -- if it is short, we copy it to a static allocated char. Otherwise, dynamically allocate memory for it. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: dhruba, igor, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17289	2014-04-01 16:43:11 -07:00
sdong	d50619a559	PlainTableIterator::Seek() shouldn't check bloom filter in total order mode Summary: In total order mode, iterator's seek() shouldn't check total order. Also some cleaning up about checking null for shared pointers. I don't know the behavior before it. This bug was reported by @igor. Test Plan: test plain_table_db_test Reviewers: ljin, haobo, igor Reviewed By: igor CC: yhchiang, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D17391	2014-04-01 15:05:16 -07:00
Thomas Adam	38dc5ef45f	[C-API] added the possiblity to create a HashSkipList or HashLinkedList to support prefix seeks	2014-04-01 12:44:27 +02:00
Igor Canadi	ddbd1ece88	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc db/internal_stats.cc db/internal_stats.h db/version_edit.cc db/version_edit.h db/version_set.cc include/rocksdb/options.h util/options.cc	2014-03-31 13:39:24 -07:00
Igor Canadi	577556d5f9	Don't store version number in MANIFEST Summary: Talked to <insert internal project name> folks and they found it really scary that they won't be able to roll back once they upgrade to 2.8. We should fix this. Test Plan: make check Reviewers: haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17343	2014-03-31 11:33:09 -07:00
Igor Canadi	8a139a054c	More valgrind issues! Summary: Fix some more CompactionFilterV2 valgrind issues. Maybe it would make sense for CompactionFilterV2 to delete its prefix_extractor? Test Plan: ran CompactionFilterV2* tests with valgrind. issues before patch -> no issues after Reviewers: haobo, sdong, ljin, dhruba Reviewed By: dhruba CC: leveldb, danguo Differential Revision: https://reviews.facebook.net/D17337	2014-03-29 10:34:47 -07:00
sdong	43a593a6d9	Change default value of some Options Summary: Since we are optimizing for server workloads, some default values are not optimized any more. We change some of those values that I feel it's less prone to regression bugs. Test Plan: make all check Reviewers: dhruba, haobo, ljin, igor, yhchiang Reviewed By: igor CC: leveldb, MarkCallaghan Differential Revision: https://reviews.facebook.net/D16995	2014-03-28 17:09:28 -07:00
sdong	2d3468c293	MemTableIterator not to reference Memtable Summary: In one of the perf, I shows 10%-25% CPU costs of MemTableIterator.Seek(), when doing dynamic prefix seek, are spent on checking whether we need to do bloom filter check or finding out the prefix extractor. Seems that more level of pointer checking makes CPU cache miss more likely. This patch makes things slightly simpler by copying pointer of bloom of prefix extractor into the iterator. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin CC: igor, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17247	2014-03-28 16:46:25 -07:00
Lei Jin	0d755fff14	cache friendly blocked bloomfilter Summary: By constraining the probes within cache line(s), we can improve the cache miss rate thus performance. This probably only makes sense for in-memory workload so defaults the option to off. Numbers and comparision can be found in wiki: https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study Test Plan: benchmarked this change substantially. Will run make all check as well Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17133	2014-03-28 09:21:20 -07:00
Yueh-Hsuan Chiang	10cebec79e	Fix the bug in MergeUtil which causes mixing values of different keys. Summary: Fix the bug in MergeUtil which causes mixing values of different keys. Test Plan: stringappend_test make all check Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17235	2014-03-27 16:15:25 -07:00
Haobo Xu	a92194e5b2	[RocksDB] Add db property "rocksdb.cur-size-active-mem-table" Summary: as title Test Plan: db_test Reviewers: sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17217	2014-03-27 15:14:04 -07:00
Igor Canadi	1c9f8f0884	Fix valgrind issues Summary: NewFixedPrefixTransform is leaked in default options. Broken by `b47812fba6` Also included in the diff some code cleanup Test Plan: valgrind env_test also make check Reviewers: haobo, danguo, yhchiang Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17211	2014-03-27 08:22:59 -07:00
sdong	d556200264	Some small cleaning up to make some compiling environment happy Summary: Compiler complains some errors when building using our internal build settings. Fix them. Test Plan: rebuild Reviewers: haobo, dhruba, igor, yhchiang, ljin Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17199	2014-03-26 18:11:41 -07:00
Igor Canadi	6a08bc042a	Fix no return warning in FileComparator	2014-03-26 14:46:07 -07:00
Igor Canadi	1e9621d4e5	Sort files correctly in Builder::SaveTo Summary: Previously, we used to sort all files by BySmallestFirst comparator and then re-sort level0 files in the Finalize() (recently moved to end of SaveTo). In this diff, I chose the correct comparator at the beginning and sort the files correctly in Builder::SaveTo. I also added a verification that all files are sorted correctly in CheckConsistency() NOTE: This diff depends on D17037 Test Plan: make check. Will also run db_stress Reviewers: dhruba, haobo, sdong, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17049	2014-03-26 13:30:14 -07:00
Igor Canadi	954679bb0f	AssertHeld() should do things Summary: AssertHeld() was a no-op before. Now it does things. Also, this change caught a bad bug in SuperVersion::Init(). The method is calling db->mutex.AssertHeld(), but db variable is not initialized yet! I also fixed that issue. Test Plan: make check Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17193	2014-03-26 11:24:52 -07:00
Igor Canadi	ad9a39c9b4	[RocksDB] Preallocate new MANIFEST files Summary: We don't preallocate MANIFEST file, even though we have an option for that. This diff preallocates manifest file every time we create it Test Plan: make check Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17163	2014-03-26 09:37:53 -07:00
sdong	6b2e7a2a01	When Options.max_num_files=-1, non level0 files also by pass table cache Summary: This is the part that was not finished when doing the Options.max_num_files=-1 feature. For iterating non level0 SST files (which was done using two level iterator), table cache is not bypassed. With this patch, the leftover feature is done. Test Plan: make all check; change Options.max_num_files=-1 in one of the tests to cover the codes. Reviewers: haobo, igor, dhruba, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17001	2014-03-25 18:40:52 -07:00
Igor Canadi	e86d7dffd7	Merge branch 'master' into columnfamilies	2014-03-25 15:24:02 -07:00
Yueh-Hsuan Chiang	b9ce156e38	Add assert to MergeOperator::PartialMergeMulti to check # of operands. Summary: Add assert(operands_list.size() >= 2) in MergeOperator::PartialMergeMulti to ensure it's only be called when we have at least two merge operands. Test Plan: run merge_test and stringappend_test. Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17169	2014-03-25 13:39:17 -07:00
Danny Guo	d9ca83df28	[rocksdb] make init prefix more robust Summary: Currently if client uses kNULLString as the prefix, it will confuse compaction filter v2. This diff added a bool to indicate if the prefix has been intialized. I also added a unit test to cover this case and make sure the new code path is hit. Test Plan: db_test Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17151	2014-03-25 11:59:40 -07:00
Yueh-Hsuan Chiang	34f9da1cef	Fix the failure of stringappend_test caused by PartialMergeMulti. Summary: Fix a bug that PartialMergeMulti will try to merge the first operand with an empty slice. Test Plan: run stringappend_test and merge_test. Reviewers: haobo, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17157	2014-03-25 11:50:09 -07:00
Igor Canadi	e8168382c4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc include/rocksdb/options.h util/options.cc	2014-03-25 11:09:40 -07:00
Danny Guo	b47812fba6	[rocksdb] new CompactionFilterV2 API Summary: This diff adds a new CompactionFilterV2 API that roll up the decisions of kv pairs during compactions. These kv pairs must share the same key prefix. They are buffered inside the db. typedef std::vector<Slice> SliceVector; virtual std::vector<bool> Filter(int level, const SliceVector& keys, const SliceVector& existing_values, std::vector<std::string>* new_values, std::vector<bool>* values_changed ) const = 0; Application can override the Filter() function to operate on the buffered kv pairs. More details in the inline documentation. Test Plan: make check. Added unit tests to make sure Keep, Delete, Change all works. Reviewers: haobo CCs: leveldb Differential Revision: https://reviews.facebook.net/D15087	2014-03-24 20:47:53 -07:00
Yueh-Hsuan Chiang	cda4006e87	Enhance partial merge to support multiple arguments Summary: * PartialMerge api now takes a list of operands instead of two operands. * Add min_pertial_merge_operands to Options, indicating the minimum number of operands to trigger partial merge. * This diff is based on Schalk's previous diff (D14601), but it also includes necessary changes such as updating the pure C api for partial merge. Test Plan: * make check all * develop tests for cases where partial merge takes more than two operands. TODOs (from Schalk): * Add test with min_partial_merge_operands > 2. * Perform benchmarks to measure the performance improvements (can probably use results of task #2837810.) * Add description of problem to doc/index.html. * Change wiki pages to reflect the interface changes. Reviewers: haobo, igor, vamsi Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16815	2014-03-24 17:57:13 -07:00
Igor Canadi	ac328a86b9	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc	2014-03-20 14:41:37 -07:00
Igor Canadi	c21ce14fa5	Fix double-free in corruption_test	2014-03-20 14:37:30 -07:00
Igor Canadi	e67241f0b9	Sanity check on Open Summary: Everytime a client opens a DB, we do a sanity check that: * checks the existance of all the necessary files * verifies that file sizes are correct Some of the code was stolen from https://reviews.facebook.net/D16935 Test Plan: added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17097	2014-03-20 14:18:29 -07:00
Yiting Li	7981a43274	Consistency Check Function Summary: Added a function/command to check the consistency of live files' meta data Test Plan: Manual test (size mismatch, file not exist). Command test script. Reviewers: haobo Reviewed By: haobo CC: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D16935	2014-03-20 13:42:45 -07:00
Igor Canadi	8ea3cb621e	If paranoid_checks -- Mark DB read-only on any IOError Summary: Whenever we get an IOError from GetImpl() or NewIterator(), we should immediatelly mark the DB read-only. The same check already exists in Write() and Compaction(). This should help with clients that are somehow missing a file. Test Plan: make check Reviewers: dhruba, haobo, sdong, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17061	2014-03-20 13:10:02 -07:00
sdong	f681030c80	Fix DBTest.UniversalCompactionTrigger failure caused by D17067 Summary: D17067 breaks DBTest.UniversalCompactionTrigger because of wrong location of the checking. Fix it. Test Plan: Run the test and make sure it passes. Reviewers: igor, haobo Reviewed By: igor CC: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D17079	2014-03-20 11:10:11 -07:00
sdong	752ec46cd5	Add a unit test to verify compaction filter context Summary: Add unit tests to make sure CompactionFilterContext::is_manual_compaction_ and CompactionFilterContext::is_full_compaction_ are set correctly. Test Plan: run the new tests. Reviewers: haobo, igor, dhruba, yhchiang, ljin Reviewed By: haobo CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D17067	2014-03-19 18:10:48 -07:00
Igor Canadi	e20fa3f8a4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/internal_stats.cc db/internal_stats.h db/version_set.cc	2014-03-19 17:22:20 -07:00
Igor Canadi	fcd5c5e828	ComputeCompactionScore in CompactionPicker Summary: As it turns out, we need the call to ComputeCompactionScore (previously: Finalize) in CompactionPicker. The issue caused a deadlock in db_stress: http://ci-builds.fb.com/job/rocksdb_crashtest/290/console The last two lines before a deadlock were: 2014/03/18-22:43:41.481029 7facafbee700 (Original Log Time 2014/03/18-22:43:41.480989) Compaction nothing to do 2014/03/18-22:43:41.481041 7faccf7fc700 wait for fewer level0 files... "Compaction nothing to do" and other thread waiting for fewer level0 files. Hm hm. I moved the pre-sorting to SaveTo, which should fix both the original and the new issue. Test Plan: make check for now, will run db_stress in jenkins Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17037	2014-03-19 16:52:26 -07:00
Igor Canadi	e493f2f54e	Don't compact with zero input files Summary: We have an issue with internal service trying to run compaction with zero input files: 2014/02/07-02:26:58.386531 7f79117ec700 Compaction start summary: Base version 1420 Base level 3, seek compaction:0, inputs:[ϛ~^Qy^?],[] 2014/02/07-02:26:58.386539 7f79117ec700 Compacted 0@3 + 0@4 files => 0 bytes There are two issues: * inputsummary is printing out junk * it's constantly retrying (since I guess madeProgress is true), so it prints out a lot of data in the LOG file (40GB in one day). I read through the Level compaction picker and added some failure condition if input[0] is empty. I think PickCompaction() should not return compaction with zero input files with this change. I'm not confident enough to add an assertion though :) Test Plan: make check Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16005	2014-03-19 16:01:25 -07:00
Igor Canadi	22507aff6c	Fix compile issue in Mac OS Summary: Compile issues are: * Unused variable env_ * Unused fallocate_with_keep_size_ Test Plan: compiles Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17043	2014-03-19 15:40:12 -07:00
Lei Jin	6dc940d4c9	avoid shared_ptr assignment in Version::Get() Summary: This is a 500ns operation while the whole Get() call takes only a few micro! Test Plan: ran db_bench, for a DB with 50M keys, QPS jumps from 5.2M/s to 7.2M/s Reviewers: haobo, igor, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17007	2014-03-19 10:54:32 -07:00
sdong	71e6a34271	Add a DB property to indicate number of background errors encountered Summary: Add a property to calculate number of background errors encountered to help users build their monitoring Test Plan: Add a unit test. make all check Reviewers: haobo, igor, dhruba Reviewed By: igor CC: ljin, nkg-, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16959	2014-03-18 14:28:30 -07:00
Igor Canadi	69aa6ecb26	Finalize fist version in column family	2014-03-18 14:23:47 -07:00
Igor Canadi	e25819a185	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc	2014-03-18 14:00:20 -07:00
Kai Liu	1ec72b37b1	Several easy-to-add properties related to compaction and flushes Summary: To partly address the request @nkg- raised, add three easy-to-add properties to compactions and flushes. Test Plan: run unit tests and add a new unit test to cover new properties. Reviewers: haobo, dhruba Reviewed By: dhruba CC: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D13677	2014-03-18 14:00:09 -07:00
Igor Canadi	758fa8c359	Don't Finalize in CompactionPicker Summary: Finalize re-sorts (read: mutates) the files_ in Version* and it is called by CompactionPicker during normal runtime. At the same time, this same Version* lives in the SuperVersion* and is accessed without the mutex in GetImpl() code path. Mutating the files_ in one thread and reading the same files_ in another thread is a bad idea. It caused this issue: http://ci-builds.fb.com/job/rocksdb_crashtest/285/console Long-term, we need to be more careful with method contracts and clearly document what state can be mutated when. Now that we are much faster because we don't lock in GetImpl(), we keep running into data races that were not a problem before when we were slower. db_stress has been very helpful in detecting those. Short-term, I removed Finalize() from CompactionPicker. Note: I believe this is an issue in current 2.7 version running in production. Test Plan: make check Will also run db_stress to see if issue is gone Reviewers: sdong, ljin, dhruba, haobo Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16983	2014-03-18 13:59:59 -07:00
Igor Canadi	3055a15b29	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/version_edit.cc db/version_edit.h db/version_set.cc	2014-03-18 13:24:27 -07:00
Lei Jin	63cef90078	disable the log_number check in Recover() Summary: There is a chance that an old MANIFEST is corrupted in 2.7 but just not noticed. This check would fail them. Change it to log instead of returning a Corruption status. Test Plan: make Reviewers: haobo, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D16923	2014-03-18 12:46:29 -07:00
Igor Canadi	bcea9c1296	Finalize version in dumpmanifest	2014-03-18 09:45:52 -07:00
Igor Canadi	f26cb0f093	Optimize fallocation Summary: Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads. This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x. At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported. Test Plan: make check Reviewers: dhruba, sdong, haobo, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16761	2014-03-17 21:52:14 -07:00
Igor Canadi	ae25742af9	Fix race condition in manifest roll Summary: When the manifest is getting rolled the following happens: 1) manifest_file_number_ is assigned to a new manifest number (even though the old one is still current) 2) mutex is unlocked 3) SetCurrentFile() creates temporary file manifest_file_number_.dbtmp 4) SetCurrentFile() renames manifest_file_number_.dbtmp to CURRENT 5) mutex is locked If FindObsoleteFiles happens between (3) and (4) it will: 1) Delete manifest_file_number_.dbtmp (because it's not in pending_outputs_) 2) Delete old manifest (because the manifest_file_number_ already points to a new one) I introduce the concept of prev_manifest_file_number_ that will avoid the race condition. However, we should discuss the future of MANIFEST file rolling. We found some race conditions with it last week and who knows how many more are there. Nobody is using it in production because we don't trust the implementation. Should we even support it? Test Plan: make check Reviewers: ljin, dhruba, haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16929	2014-03-17 21:50:15 -07:00
Igor Canadi	d63ae5cb59	Adjust memtable sizes in unit test	2014-03-17 18:37:34 -07:00
Igor Canadi	64904b39a0	Merge branch 'master' into columnfamilies Conflicts: utilities/backupable/backupable_db.cc	2014-03-17 17:57:14 -07:00
Igor Canadi	e0c1211555	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc tools/db_stress.cc	2014-03-17 12:21:05 -07:00

1 2 3 4 5 ...

1049 Commits