rocksdb

Author	SHA1	Message	Date
Lei Jin	ad0c3747cb	cache SuperVersion in thread local storage to avoid mutex lock Summary: as title Test Plan: asan_check will post results later Reviewers: haobo, igor, dhruba, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16257	2014-02-27 11:38:55 -08:00
kailiu	e41c060a06	Make sure logger is safely released in `InfoLogLevel` Summary: fix the memory leak that was captured by jenkin build. Test Plan: ran the valgrind test locally Reviewers: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D16389	2014-02-26 19:07:57 -08:00
Yueh-Hsuan Chiang	ccaedd16d4	Enable log info with different levels. Summary: * Now each Log related function has a variant that takes an additional argument indicating its log level, which is one of the following: - DEBUG, INFO, WARN, ERROR, FATAL. * To ensure backward-compatibility, old version Log functions are kept unchanged. * Logger now has a member variable indicating its log level. Any incoming Log request which log level is lower than Logger's log level will not be output. * The output of the newer version Log will be prefixed by its log level. Test Plan: Add a LogType test in auto_roll_logger_test.cc = Sample log output = 2014/02/11-00:03:07.683895 7feded179840 [DEBUG] this is the message to be written to the log file!! 2014/02/11-00:03:07.683898 7feded179840 [INFO] this is the message to be written to the log file!! 2014/02/11-00:03:07.683900 7feded179840 [WARN] this is the message to be written to the log file!! 2014/02/11-00:03:07.683903 7feded179840 [ERROR] this is the message to be written to the log file!! 2014/02/11-00:03:07.683906 7feded179840 [FATAL] this is the message to be written to the log file!! Reviewers: dhruba, xjin, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D16071	2014-02-26 14:41:28 -08:00
Lei Jin	b2795b799e	thread local pointer storage Summary: This is not a generic thread local implementation in the sense that it only takes pointer. But it does support multiple instances per thread and lets user plugin function to perform cleanup when thread exits or an instance gets destroyed. Test Plan: unit test for now Reviewers: haobo, igor, sdong, dhruba Reviewed By: igor CC: leveldb, kailiu Differential Revision: https://reviews.facebook.net/D16131	2014-02-25 17:47:37 -08:00
sdong	01c27be5fb	A simple benchmark to measure WAL append latency Summary: A simple benchmark that simulates WAL append. It can be used to test different platform/file system's performance on WAL. Test Plan: run it. Reviewers: haobo, kailiu Reviewed By: haobo CC: igor, dhruba, i.am.jin.lei, yhchiang, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D16239	2014-02-24 14:39:32 -08:00
Lei Jin	994c327b86	IOError cleanup Summary: Clean up IOErrors so that it only indicates errors talking to device. Test Plan: make all check Reviewers: igor, haobo, dhruba, emayanke Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15831	2014-02-12 11:42:54 -08:00
Siying Dong	33042669f6	Reduce malloc of iterators in Get() code paths Summary: This patch optimized Get() code paths by avoiding malloc of iterators. Iterator creation is moved to mem table rep implementations, where a callback is called when any key is found. This is the same practice as what we do in (SST) table readers. db_bench result for readrandom following a writeseq, with no compression, single thread and tmpfs, we see throughput improved to 144958 from 139027, about 3%. Test Plan: make all check Reviewers: dhruba, haobo, igor Reviewed By: haobo CC: leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D14685	2014-02-11 10:32:51 -08:00
Albert Strasheim	df2f92214a	Support for LZ4 compression.	2014-02-08 14:15:51 -08:00
Igor Canadi	d53b188228	Fix some errors detected by coverity scan Summary: Nothing major, just an extra return line and posibility of leaking fb in NewRandomRWFile Test Plan: make check Reviewers: kailiu, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15993	2014-02-06 21:59:44 -08:00
kailiu	84f8185fc0	Merge branch 'master' into performance Conflicts: HISTORY.md db/db_impl.cc db/memtable.cc	2014-02-05 21:21:00 -08:00
Albert Strasheim	d411dc5884	crc32c: choose function in static initialization Before: 4.430 micros/op 225732 ops/sec; 881.8 MB/s (4K per op) After: 4.125 micros/op 242425 ops/sec; 947.0 MB/s (4K per op)	2014-02-04 19:13:57 -08:00
Igor Canadi	2966d764cd	Fix some 32-bit compile errors Summary: RocksDB doesn't compile on 32-bit architecture apparently. This is attempt to fix some of 32-bit errors. They are reported here: https://gist.github.com/paxos/8789697 Test Plan: RocksDB still compiles on 64-bit :) Reviewers: kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15825	2014-02-03 13:48:30 -08:00
Siying Dong	d169b67680	[Performance Branch] PlainTable to encode rows with seqID 0, value type using 1 internal byte. Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases. Test Plan: make all check Reviewers: haobo, dhruba, kailiu Reviewed By: haobo CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15489	2014-02-03 12:19:30 -08:00
Igor Canadi	5c6ef56152	Fix printf format	2014-02-03 10:25:37 -08:00
Kai Liu	87bda51d77	Merge pull request #58 from mlin/no-stdout Eliminate stdout message when launching a posix thread.	2014-02-03 00:38:11 -08:00
kailiu	4f6cb17bdb	First phase API clean up Summary: Addressed all the issues in https://reviews.facebook.net/D15447. Now most table-related modules are hidden from user land. Test Plan: make check Reviewers: sdong, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15525	2014-02-03 00:30:43 -08:00
kailiu	4e0298f23c	Clean up arena API Summary: Easy thing goes first. This patch moves arena to internal dir; based on which, the coming patch will deal with memtable_rep. Test Plan: make check Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15615	2014-01-30 22:10:10 -08:00
Igor Canadi	ac92420fc5	Merge branch 'master' into performance Conflicts: db/db_impl.h	2014-01-30 10:09:23 -08:00
Lei Jin	fb84c49a36	convert Tickers back to array with padding and alignment Summary: Pad each Ticker structure to be 64 bytes and make them align on 64 bytes boundary to avoid cache line false sharing issue. Please refer to task 3615553 for more details Test Plan: db_bench LevelDB: version 2.0s Date: Wed Jan 29 12:23:17 2014 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB rocksdb.build.overwrite.qps 49638 rocksdb.build.overwrite.p50_micros 58.73 rocksdb.build.overwrite.p75_micros 210.56 rocksdb.build.overwrite.p99_micros 733.28 rocksdb.build.fillseq.qps 366729 rocksdb.build.fillseq.p50_micros 1.00 rocksdb.build.fillseq.p75_micros 1.00 rocksdb.build.fillseq.p99_micros 2.65 rocksdb.build.readrandom.qps 1152995 rocksdb.build.readrandom.p50_micros 11.27 rocksdb.build.readrandom.p75_micros 15.69 rocksdb.build.readrandom.p99_micros 33.59 rocksdb.build.readrandom_smallblockcache.qps 956047 rocksdb.build.readrandom_smallblockcache.p50_micros 15.23 rocksdb.build.readrandom_smallblockcache.p75_micros 17.31 rocksdb.build.readrandom_smallblockcache.p99_micros 31.49 rocksdb.build.readrandom_memtable_sst.qps 1105183 rocksdb.build.readrandom_memtable_sst.p50_micros 12.04 rocksdb.build.readrandom_memtable_sst.p75_micros 15.78 rocksdb.build.readrandom_memtable_sst.p99_micros 32.49 rocksdb.build.readrandom_fillunique_random.qps 487856 rocksdb.build.readrandom_fillunique_random.p50_micros 29.65 rocksdb.build.readrandom_fillunique_random.p75_micros 40.93 rocksdb.build.readrandom_fillunique_random.p99_micros 78.68 rocksdb.build.memtablefillrandom.qps 91304 rocksdb.build.memtablefillrandom.p50_micros 171.05 rocksdb.build.memtablefillrandom.p75_micros 196.12 rocksdb.build.memtablefillrandom.p99_micros 291.73 rocksdb.build.memtablereadrandom.qps 1340411 rocksdb.build.memtablereadrandom.p50_micros 9.48 rocksdb.build.memtablereadrandom.p75_micros 13.95 rocksdb.build.memtablereadrandom.p99_micros 30.36 rocksdb.build.readwhilewriting.qps 491004 rocksdb.build.readwhilewriting.p50_micros 29.58 rocksdb.build.readwhilewriting.p75_micros 40.34 rocksdb.build.readwhilewriting.p99_micros 76.78 Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15573	2014-01-29 15:08:41 -08:00
kailiu	a5e220f5ef	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_test.cc db/memtable_list.cc db/memtable_list.h table/block_based_table_reader.cc table/table_test.cc util/cache.cc util/coding.cc	2014-01-28 10:35:55 -08:00
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	2014-01-27 11:02:21 -08:00
Siying Dong	b20486f294	[Performance Branch] HashLinkList to avoid to convert length prefixed string back to internal keys Summary: Converting from length prefixed buffer back to internal key costs some CPU but it is not necessary. In this patch, internal keys are pass though the functions so that we don't need to convert back to it. Test Plan: make all check Reviewers: haobo, kailiu Reviewed By: kailiu CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15393	2014-01-27 10:26:14 -08:00
Siying Dong	8477255da3	Moving Some includes from options.h to forward declaration Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies. Test Plan: make all check Reviewers: kailiu, haobo, igor, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15411	2014-01-24 17:16:22 -08:00
Igor Canadi	b13bdfa500	Add a call DisownData() to Cache, which should speed up shutdown Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache. Test Plan: I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368 Clean shutdown took 7.2 seconds, while fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but should be bigger with bigger block_cache. I have set up the capacity to 80GB, but the script filled up only ~7GB. Reviewers: dhruba, haobo, MarkCallaghan, xjin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15069	2014-01-24 14:57:52 -08:00
Igor Canadi	677fee27c6	Make VersionSet::ReduceNumberOfLevels() static Summary: A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method. Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file. Test Plan: reduce_levels_test Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15303	2014-01-24 14:57:04 -08:00
kailiu	eda924a03a	Remove an unused `GetLengthPrefixedSlice` Summary: We have 3 versions of GetLengthPrefixedSlice() and one of them is no longer in use. Test Plan: make Reviewers: sdong, igor, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15399	2014-01-23 23:06:52 -08:00
Kai Liu	054c5dda8c	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h util/statistics_imp.h	2014-01-23 16:32:49 -08:00
Igor Canadi	4e91f27c3a	Fix performance regression in statistics Summary: For some reason, D15099 caused a big performance regression: https://fburl.com/16059000 After digging a bit, I figured out that the reason was that std::atomic_uint_fast64_t was allocated in an array. When I switched from an array to vector, the QPS returned to the previous level. I'm not sure why this is happening, but this diff seems to fix the performance regression. Test Plan: I ran the regression script, observed the performance going back to normal Reviewers: tnovak, kailiu, haobo Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15375	2014-01-23 16:11:55 -08:00
Kai Liu	bb19b530ca	Aggressively inlining the short functions in coding.cc Summary: This diff takes an even more aggressive way to inline the functions. A decent rule that I followed is "not inline a function if it is more than 10 lines long." Normally optimizing code by inline is ugly and hard to control, but since one of our usecase has significant amount of CPU used in functions from coding.cc, I'd like to try this diff out. Test Plan: 1. the size for some .o file increased a little bit, but most less than 1%. So I think the negative impact of inline is negligible. 2. As the regression test shows (ran for 10 times and I calculated the average number) Metrics Befor After ======================================================================== rocksdb.build.fillseq.qps 426595 444515 (+4.6%) rocksdb.build.memtablefillrandom.qps 121739 123110 rocksdb.build.memtablereadrandom.qps 1285103 1280520 rocksdb.build.overwrite.qps 125816 135570 (+9%) rocksdb.build.readrandom_fillunique_random.qps 285995 296863 rocksdb.build.readrandom_memtable_sst.qps 1027132 1027279 rocksdb.build.readrandom.qps 1041427 1054665 rocksdb.build.readrandom_smallblockcache.qps 1028631 1038433 rocksdb.build.readwhilewriting.qps 918352 914629 Reviewers: haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15291	2014-01-23 16:03:34 -08:00
kailiu	4036d58dc9	Fix a Statistics-related unit test faulure Summary: In my MacOS, the member variables are populated with random numbers after initialization. This diff fixes it by fill these arrays with 0. Test Plan: make && ./table_test Reviewers: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15315	2014-01-21 18:02:55 -08:00
Kai Liu	ef602f6275	Misc cleanup on performance branch Summary: Did some trivial stuffs: * Add more comments; * fix compiler's warning messages (uninitialized variables). * etc Test Plan: make check	2014-01-17 14:26:29 -08:00
Igor Canadi	83681bf9ef	Statistics code cleanup Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review. Test Plan: make check Reviewers: haobo, kailiu, sdong, dhruba, tnovak Reviewed By: tnovak CC: leveldb Differential Revision: https://reviews.facebook.net/D15099	2014-01-17 12:46:06 -08:00
kailiu	1304d8c8ce	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_impl.h db/db_test.cc db/memtable.cc db/memtable.h db/version_edit.h db/version_set.cc include/rocksdb/options.h util/hash_skiplist_rep.cc util/options.cc	2014-01-15 23:12:31 -08:00
kailiu	eae1804f29	Remove the unnecessary use of shared_ptr Summary: shared_ptr is slower than unique_ptr (which literally comes with no performance cost compare with raw pointers). In memtable and memtable rep, we use shared_ptr when we'd actually should use unique_ptr. According to igor's previous work, we are likely to make quite some performance gain from this diff. Test Plan: make check Reviewers: dhruba, igor, sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15213	2014-01-15 18:22:01 -08:00
Igor Canadi	65a8a52b54	Decrease reliance on VersionSet::NumberLevels() Summary: With column families VersionSet will not have a constant number of levels (each CF can have different options), so we'll need to eliminate call to VersionSet::NumberLevels() This diff decreases number of callsites, but we're not there yet. It associates number of levels with Version (each version is associated with single CF) instead of VersionSet. I have also slightly changed how VersionSet keeps track of manifest size. This diff also modifies constructor of Compaction such that it takes input_version and automatically Ref()s it. Before this was done outside of constructor. In next diffs I will continue to decrease number of callsites of VersionSet::NumberLevels() and also references to current_ Test Plan: make check Reviewers: haobo, dhruba, kailiu, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D15171	2014-01-15 16:15:43 -08:00
Kai Liu	cd535c2280	Optimize MayContainHash() Summary: In latest leaf's, MayContainHash() consistently consumes 5%~7% CPU usage. I checked the code and did an experiment with/without inlining this method. In release mode, with `1024 * 1024 * 256` bits and `1024 * 512` entries, both call 2^30 MayContainHash() with distinctive parameters. As the result showed, this patch reduced the running time from 9.127 sec to 7.891 sec. Test Plan: make check Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15177	2014-01-14 22:03:57 -08:00
Igor Canadi	d9cd7a063f	Fix CompactRange to apply filter to every key Summary: When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`. This patch fixed the unit test. Test Plan: Added a failing unit test and a fix, so it's not failing anymore. Reviewers: dhruba, haobo, sdong Reviewed By: haobo CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D14421	2014-01-14 16:19:09 -08:00
Naman Gupta	8454cfe569	Add read/modify/write functionality to Put() api Summary: The application can set a callback function, which is applied on the previous value. And calculates the new value. This new value can be set, either inplace, if the previous value existed in memtable, and new value is smaller than previous value. Otherwise the new value is added normally. Test Plan: fbmake. Added unit tests. All unit tests pass. Reviewers: dhruba, haobo Reviewed By: haobo CC: sdong, kailiu, xinyaohu, sumeet, leveldb Differential Revision: https://reviews.facebook.net/D14745	2014-01-14 07:55:16 -08:00
Schalk-Willem Kruger	a09ee1069d	Improve RocksDB "get" performance by computing merge result in memtable Summary: Added an option (max_successive_merges) that can be used to specify the maximum number of successive merge operations on a key in the memtable. This can be used to improve performance of the "get" operation. If many successive merge operations are performed on a key, the performance of "get" operations on the key deteriorates, as the value has to be computed for each "get" operation by applying all the successive merge operations. FB Task ID: #3428853 Test Plan: make all check db_bench --benchmarks=readrandommergerandom counter_stress_test Reviewers: haobo, vamsi, dhruba, sdong Reviewed By: haobo CC: zshao Differential Revision: https://reviews.facebook.net/D14991	2014-01-10 17:33:56 -08:00
Siying Dong	5b5ab0c1a8	[Performance Branch] Fix memory leak in HashLinkListRep.GetIterator() Summary: Full list constructed for full iterator can be leaked. This was a bug introduced when I copy the full iterator codes from hash skip list to hash link list. This patch fixes it. Test Plan: Run valgrind test against db_test and make sure the memory leak is fixed Reviewers: kailiu, haobo Reviewed By: kailiu CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15093	2014-01-10 12:12:28 -08:00
Yancey	afdd2d1a46	fix compile warning	2014-01-10 17:56:35 +08:00
Siying Dong	237a3da677	StopWatch not to get time if it is created for statistics and it is disabled Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case. Test Plan: make all check Reviewers: dhruba, haobo, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14703 Conflicts: db/db_impl.cc	2014-01-09 17:39:48 -08:00
Siying Dong	424a524ac9	[Performance Branch] A Hashed Linked List Based Mem Table Summary: Implement a mem table, in which keys are hashed based on prefixes. In each bucket, entries are organized in a sorted linked list. It has the same thread safety guarantee as skip list. The motivation is to optimize memory usage for the case that prefix hashing is primary way of seeking to the entry. Compared to hash skip list implementation, this implementation is more memory efficient, but inside each bucket, search is always linear. The target scenario is that there are only very limited number of records in each hash bucket. Test Plan: Add a test case in db_test Reviewers: haobo, kailiu, dhruba Reviewed By: haobo CC: igor, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D14979	2014-01-09 16:19:11 -08:00
Siying Dong	5575316350	StopWatch not to get time if it is created for statistics and it is disabled Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case. Test Plan: make all check Reviewers: dhruba, haobo, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14703	2014-01-08 16:05:36 -08:00
kailiu	12b6d2b839	Separate the aligned and unaligned memory allocation Summary: Use two vectors for different types of memory allocation. Test Plan: run all unit tests. Reviewers: haobo, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15027	2014-01-08 15:11:42 -08:00
Igor Canadi	17a222670b	Merge branch 'master' into performance	2014-01-07 11:04:21 -08:00
Mike Lin	4c75e21c20	Eliminate stdout message when launching a posix thread. This seems out of place as it's the only time RocksDB prints to stdout in the normal course of operations. Thread IDs can still be retrieved from the LOG file: cut -d ' ' -f2 LOG \| sort \| uniq \| egrep -x '[0-9a-f]+'	2014-01-07 10:44:02 -08:00
kailiu	7e70ff63d6	Fix issue #57	2014-01-06 11:11:19 -08:00
Kai Liu	774ed89c24	Replace vector with autovector Summary: this diff only replace the cases when we need to frequently create vector with small amount of entries. This diff doesn't aim to improve performance of a specific area, but more like a small scale test for the autovector and see how it works in real life. Test Plan: make check I also ran the performance tests, however there is no performance gain/loss. All performance numbers are pretty much the same before/after the change. Reviewers: dhruba, haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14985	2014-01-02 16:43:35 -08:00
kailiu	f1cec73a76	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h	2013-12-27 12:23:17 -08:00
Siying Dong	18df47b79a	Avoid malloc in NotFound key status if no message is given. Summary: In some places we have NotFound status created with empty message, but it doesn't avoid a malloc. With this patch, the malloc is avoided for that case. The motivation of it is that I found in db_bench readrandom test when all keys are not existing, about 4% of the total running time is spent on malloc of Status, plus a similar amount of CPU spent on free of them, which is not necessary. Test Plan: make all check Reviewers: dhruba, haobo, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14691	2013-12-26 16:23:10 -08:00
Kai Liu	b40c052bfa	Fix all the comparison issue in fb dev servers	2013-12-26 16:13:49 -08:00
kailiu	113a08c929	Fix [-Werror=sign-compare] in autovector_test	2013-12-26 15:47:07 -08:00
kailiu	c01676e46d	Implement autovector Summary: A vector that leverages pre-allocated stack-based array to achieve better performance for array with small amount of items. Test Plan: Added tests for both correctness and performance Here is the performance benchmark between vector and autovector Please note that in the test "Creation and Insertion Test", the test case were designed with the motivation described below: * no element inserted: internal array of std::vector may not really get initialize. * one element inserted: internal array of std::vector must have initialized. * kSize elements inserted. This shows the most time we'll spend if we keep everything in stack. * 2 * kSize elements inserted. The internal vector of autovector must have been initialized. Note: kSize is the capacity of autovector ===================================================== Creation and Insertion Test ===================================================== created 100000 vectors: each was inserted with 0 elements total time elapsed: 128000 (ns) created 100000 autovectors: each was inserted with 0 elements total time elapsed: 3641000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 0 elements total time elapsed: 9896000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 1 elements total time elapsed: 11089000 (ns) created 100000 autovectors: each was inserted with 1 elements total time elapsed: 5008000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 1 elements total time elapsed: 24271000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 4 elements total time elapsed: 39369000 (ns) created 100000 autovectors: each was inserted with 4 elements total time elapsed: 10121000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 4 elements total time elapsed: 28473000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 8 elements total time elapsed: 75013000 (ns) created 100000 autovectors: each was inserted with 8 elements total time elapsed: 18237000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 8 elements total time elapsed: 42464000 (ns) ----------------------------------- created 100000 vectors: each was inserted with 16 elements total time elapsed: 102319000 (ns) created 100000 autovectors: each was inserted with 16 elements total time elapsed: 76724000 (ns) created 100000 VectorWithReserveSizes: each was inserted with 16 elements total time elapsed: 68285000 (ns) ----------------------------------- ===================================================== Sequence Access Test ===================================================== performed 100000 sequence access against vector size: 4 total time elapsed: 198000 (ns) performed 100000 sequence access against autovector size: 4 total time elapsed: 306000 (ns) ----------------------------------- performed 100000 sequence access against vector size: 8 total time elapsed: 565000 (ns) performed 100000 sequence access against autovector size: 8 total time elapsed: 512000 (ns) ----------------------------------- performed 100000 sequence access against vector size: 16 total time elapsed: 1076000 (ns) performed 100000 sequence access against autovector size: 16 total time elapsed: 1070000 (ns) ----------------------------------- Reviewers: dhruba, haobo, sdong, chip Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14655	2013-12-26 15:03:47 -08:00
Kai Liu	5643ae1a3f	Merge pull request #32 from jamesgolick/master Only try to use fallocate if it's actually present on the system.	2013-12-26 14:20:22 -08:00
Siying Dong	abaf26266d	[RocksDB] [Performance Branch] Some Changes to PlainTable format Summary: Some changes to PlainTable format: (1) support variable key length (2) use user defined slice transformer to extract prefixes (3) Run some test cases against PlainTable in db_test and table_test Test Plan: test db_test Reviewers: haobo, kailiu CC: dhruba, igor, leveldb, nkg- Differential Revision: https://reviews.facebook.net/D14457	2013-12-20 12:08:35 -08:00
kailiu	5f5e5fc2e9	Revert `atomic_size_t usage` Summary: By disassemble the function, we found that the atomic variables do invoke the `lock` that locks the memory bus. As a tradeoff, we protect the GetUsage by mutex and leave usage_ as plain size_t. Test Plan: passed `cache_test` Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14667	2013-12-13 17:03:19 -08:00
Haobo Xu	5090316f0d	[RocksDB] [Performance Branch] Trivia build fix Summary: make release complains signed unsigned comparison. Test Plan: make release Reviewers: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14661	2013-12-13 14:21:59 -08:00
kailiu	b660e2d468	Expose usage info for the cache Summary: This diff will help us to figure out the memory usage for the cache part. Test Plan: added a new memory usage test for cache Reviewers: haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14559	2013-12-13 12:53:45 -08:00
James Golick	c28dd2a891	oops - missed a spot	2013-12-11 11:18:00 -08:00
Igor Canadi	e8d40c31b3	[RocksDB perf] Cache speedup Summary: I have ran a get benchmark where all the data is in the cache and observed that most of the time is spent on waiting for lock in LRUCache. This is an effort to optimize LRUCache. Test Plan: The data was loaded with fillseq. Then, I ran a benchmark: /db_bench --db=/tmp/rocksdb_stat_bench --num=1000000 --benchmarks=readrandom --statistics=1 --use_existing_db=1 --threads=16 --disable_seek_compaction=1 --cache_size=20000000000 --cache_numshardbits=8 --table_cache_numshardbits=8 I ran the benchmark three times. Here are the results: AFTER THE PATCH: 798072, 803998, 811807 BEFORE THE PATCH: 782008, 815593, 763017 Reviewers: dhruba, haobo, kailiu Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14571	2013-12-11 08:33:29 -08:00
Haobo Xu	3c02c363b3	[RocksDB] [Performance Branch] Added dynamic bloom, to be used for memable non-existing key filtering Summary: as title Test Plan: dynamic_bloom_test Reviewers: dhruba, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14385	2013-12-11 00:15:14 -08:00
James Golick	43c386b72e	only try to use fallocate if it's actually present on the system	2013-12-10 22:34:19 -08:00
kailiu	c79e595471	Make Cache::GetCapacity constant Summary: This will allow us to access constant via `DB::GetOptions().table_cache.GetCapacity()` or `DB::GetOptions().block_cache.GetCapacity()` since GetOptions() is also constant method.	2013-12-10 17:34:35 -08:00
Igor Canadi	19f5463d3f	Don't LogFlush() in foreground threads Summary: So fflush() takes a lock which is heavyweight. I added flush_pending_, but more importantly, I removed LogFlush() from foreground threads. Test Plan: ./db_test Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14535	2013-12-10 10:57:46 -08:00
Igor Canadi	fb9fce4fc3	[RocksDB] BackupableDB Summary: In this diff I present you BackupableDB v1. You can easily use it to backup your DB and it will do incremental snapshots for you. Let's first describe how you would use BackupableDB. It's inheriting StackableDB interface so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backuping every X minutes. There are multiple things you can configure: 1. backup_env and db_env can be different, which is awesome because then you can easily backup to HDFS or wherever you feel like. 2. sync - if true, it guarantees backup consistency on machine reboot 3. number of snapshots to keep - this will keep last N snapshots around if you want, for some reason, be able to restore from an earlier snapshot. All the backuping is done in incremental fashion - if we already have 00010.sst, we will not copy it again. IMPORTANT -- This is based on assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files. 4. You can decide if you want to flush the memtables before you backup, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at a time of backup. 5. More things you can find in BackupableDBOptions Here is the directory structure I use: backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot 0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to backup it from the DB, it will just reference the same file files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files All the files are ref counted and deleted immediatelly when they get out of scope. Some other stuff in this diff: 1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems right thing to do. 2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB. Test Plan: I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff. Also, `make asan_check` Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14295	2013-12-09 14:06:52 -08:00
Igor Canadi	9644e0e0c7	Print stack trace on assertion failure Summary: This will help me a lot! When we hit an assertion in unittest, we get the whole stack trace now. Also, changed stack trace a bit, we now include actual demangled C++ class::function symbols! Test Plan: Added ASSERT_TRUE(false) to a test, observed a stack trace Reviewers: haobo, dhruba, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14499	2013-12-06 17:11:09 -08:00
kailiu	c7707f24c2	Refine the statistics	2013-12-06 16:51:35 -08:00
kailiu	551e9428ce	Merge branch 'master' into performance	2013-12-06 14:15:42 -08:00
kailiu	e1d92dfd2e	Fix a bunch of mac compilation issues in performance branch	2013-12-04 23:00:33 -08:00
Vamsi Ponnekanti	fa88cbc71e	[Log dumper broken when merge operator is in log] Summary: $title Test Plan: on my dev box Revert Plan: OK Task ID: # Reviewers: emayanke, dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14451	2013-12-04 16:22:54 -08:00
Mark Callaghan	97aa401e2f	Add compression options to db_bench Summary: This adds 2 options for compression to db_bench: * universal_compression_size_percent * compression_level - to set zlib compression level It also logs compression_size_percent at startup in LOG Task ID: # Blame Rev: Test Plan: make check, run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14439	2013-12-03 14:28:48 -08:00
Igor Canadi	eb12e47e0e	Killing Transform Rep Summary: Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around. This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release. I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14397	2013-12-03 12:42:15 -08:00
lovro	930cb0b9ee	Clarify CompactionFilter thread safety requirements Summary: Documenting our discussion Test Plan: make Reviewers: dhruba, haobo Reviewed By: dhruba CC: igor Differential Revision: https://reviews.facebook.net/D14403	2013-12-02 16:41:43 -08:00
lovro	45a2f2d8d3	Fix build without glibc Summary: The preprocessor does not follow normal rules of && evaluation, tries to evaluate __GLIBC_PREREQ(2, 12) even though the defined() check fails. This breaks the build if __GLIBC_PREREQ is absent. Test Plan: Try adding #undef __GLIBC_PREREQ above the offending line, build no longer breaks Reviewed By: igor Blame Rev: `4c81383628`	2013-12-01 11:32:54 -08:00
Kai Liu	1966b63137	Merge branch 'master' into perf	2013-11-27 11:47:40 -08:00
lovro	4c81383628	Set background thread name with pthread_setname_np() Summary: Makes it easier to monitor performance with top Test Plan: ./manual_compaction_test with `top -H` running. Previously was two `manual_compacti`, now one shows `rocksdb:bg0`. Reviewers: igor, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14367	2013-11-27 11:28:06 -08:00
Haobo Xu	4e6463ea44	[RocksDB][Performance Branch] Make height and branching factor configurable for skiplist implementation Summary: As title. Especially, HashSkipListRepFactory will be able to specify a relatively small height, to reduce the memory overhead of one skiplist per bucket. Test Plan: make check and test it on leaf4 Reviewers: dhruba, sdong, kailiu CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14307	2013-11-26 21:59:36 -08:00
Siying Dong	8aac46d686	[RocksDB Performance Branch] Fix a regression bug of munmap Summary: Fix a stupid bug I just introduced in `b59d4d5a50`, which I didn't even mean to include. GCC might remove the munmap. Test Plan: Run it and make sure munmap succeeds Reviewers: haobo, kailiu Reviewed By: kailiu CC: dhruba, reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14361	2013-11-26 14:05:37 -08:00
Haobo Xu	5b825d6964	[RocksDB] Use raw pointer instead of shared pointer when passing Statistics object internally Summary: liveness of the statistics object is already ensured by the shared pointer in DB options. There's no reason to pass again shared pointer among internal functions. Raw pointer is sufficient and efficient. Test Plan: make check Reviewers: dhruba, MarkCallaghan, igor Reviewed By: dhruba CC: leveldb, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14289	2013-11-25 10:38:15 -08:00
Siying Dong	3e35aa6412	Revert "Allow users to profile a query and see bottleneck of the query" This reverts commit `3d8ac31d71`.	2013-11-21 17:40:39 -08:00
Siying Dong	b135d01e7b	Allow users to profile a query and see bottleneck of the query Summary: Provide a framework to profile a query in detail to figure out latency bottleneck. Currently, in Get(), Put() and iterators, 2-3 simple timing is used. We can easily add more profile counters to the framework later. Test Plan: Enable this profiling in seveal existing tests. Reviewers: haobo, dhruba, kailiu, emayanke, vamsi, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14001 Conflicts: table/merger.cc	2013-11-21 17:39:19 -08:00
Siying Dong	3d8ac31d71	Allow users to profile a query and see bottleneck of the query Summary: Provide a framework to profile a query in detail to figure out latency bottleneck. Currently, in Get(), Put() and iterators, 2-3 simple timing is used. We can easily add more profile counters to the framework later. Test Plan: Enable this profiling in seveal existing tests. Reviewers: haobo, dhruba, kailiu, emayanke, vamsi, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14001	2013-11-21 16:29:57 -08:00
Siying Dong	58e1956d50	[Only for Performance Branch] A Hacky patch to lazily generate memtable key for prefix-hashed memtables. Summary: For prefix mem tables, encoding mem table key may be unnecessary if the prefix doesn't have any key. This patch is a little bit hacky but I want to try out the performance gain of removing this lazy initialization. In longer term, we might want to revisit the way we abstract mem tables implementations. Test Plan: make all check Reviewers: haobo, igor, kailiu Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14265	2013-11-20 20:49:23 -08:00
Siying Dong	b59d4d5a50	A Simple Plain Table Summary: A Simple plain table format. No block structure. When creating the table reader, scanning the full table to create indexes. Test Plan:Add unit test Reviewers:haobo,dhruba,kailiu CC: Task ID: # Blame Rev:	2013-11-20 18:44:22 -08:00
kailiu	6eb5649800	Move flush_block_policy from Options to TableFactory Summary: Previously we introduce a `flush_block_policy_factory` in Options, however, that options is strongly releated to Table based tables. It will make more sense to move it to block based table's own factory class. Test Plan: make check to pass existing tests Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14211	2013-11-19 22:00:48 -08:00
kailiu	1415f8820d	Improve the "table stats" Summary: The primary motivation of the changes is to make it easier to figure out the inside of the tables. * rename "table stats" to "table properties" since now we have more than "integers" to store in the property block. * Add filter block size to the basic table properties. * Whenever a table is built, we'll log the table properties (the sample output is in Test Plan). * Make an api to expose deleted keys. Test Plan: Passed all existing test. and the sample output of table stats: ================================================================== Basic Properties ------------------------------------------------------------------ # data blocks: 1 # entries: 1 raw key size: 9 raw average key size: 9 raw value size: 9 raw average value size: 0 data block size: 25 index block size: 27 filter block size: 18 (estimated) table size: 70 filter policy: rocksdb.BuiltinBloomFilter ================================================================== User collected properties: InternalKeyPropertiesCollector ------------------------------------------------------------------ kDeletedKeys: 1 ================================================================== Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14187	2013-11-19 16:29:42 -08:00
Igor Canadi	f611aba559	Move the compiler back to 4.8.1 + more small fixes Summary: 1. Moved the compiler back to 4.8.1 and uses Centos 5.2 binaries if OS is Centos 5.2. 2. Fixes this issue: https://github.com/facebook/rocksdb/issues/7 3. We use lot of c++11 features, so we can't pretend we can compile without them. Makes it a first class dependency. 4. Fix blob_store_test, which failes on Ubuntu with "too many files opened" error 5. Removed dependency on port/port_chromium.h, which does not even exist on our system Test Plan: make clean; make check Reviewers: dhruba, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D14145	2013-11-18 11:40:16 -08:00
kailiu	97d8e573a6	make util/env_posix.cc work under mac Summary: This diff invoves some more complicated issues in the posix environment. Test Plan: works under mac os. will need to verify dev box. Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14061	2013-11-16 23:44:39 -08:00
Pascal Borreli	443e04e62d	Fixed typos	2013-11-16 11:21:34 +00:00
Kai Liu	80bb81c6fe	Add the correct table_factory for tables in table_tests	2013-11-12 23:54:31 -08:00
Kai Liu	22e1b04deb	Quick fix for a string format Summary: Fix one more string format issue that throws warning in mac	2013-11-12 21:22:32 -08:00
Kai Liu	35460ccb53	Fix the string format issue Summary: mac and our dev server has totally differnt definition of uint64_t, therefore fixing the warning in mac has actually made code in linux uncompileable. Test Plan: make clean && make -j32	2013-11-12 21:05:39 -08:00
kailiu	21587760b9	Fixing the warning messages captured under mac os # Consider using `git commit -m 'One line title' && arc diff`. # You will save time by running lint and unit in the background. Summary: The work to make sure mac os compiles rocksdb is not completed yet. But at least we can start cleaning some warnings captured only by g++ from mac os.. Test Plan: ran make in mac os Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14049	2013-11-12 20:05:28 -08:00
Igor Canadi	9bc4a26f56	Small changes in Deleting obsolete files Summary: @haobo's suggestions from https://reviews.facebook.net/D13827 Renaming some variables, deprecating purge_log_after_flush, changing for loop into auto for loop. I have not implemented deleting objects outside of mutex yet because it would require a big code change - we would delete object in db_impl, which currently does not know anything about object because it's defined in version_edit.h (FileMetaData). We should do it at some point, though. Test Plan: Ran deletefile_test Reviewers: haobo Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14025	2013-11-12 11:53:26 -08:00
lovro	8a46ecd357	WriteBatch::Put() overload that gathers key and value from arrays of slices Summary: In our project, when writing to the database, we want to form the value as the concatenation of a small header and a larger payload. It's a shame to have to copy the payload just so we can give RocksDB API a linear view of the value. Since RocksDB makes a copy internally, it's easy to support gather writes. Test Plan: write_batch_test, new test case Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13947	2013-11-08 16:34:32 -08:00
Igor Canadi	1510339e52	Speed up FindObsoleteFiles Summary: Here's one solution we discussed on speeding up FindObsoleteFiles. Keep a set of all files in DBImpl and update the set every time we create a file. I probably missed few other spots where we create a file. It might speed things up a bit, but makes code uglier. I don't really like it. Much better approach would be to abstract all file handling to a separate class. Think of it as layer between DBImpl and Env. Having a separate class deal with file namings and deletion would benefit both code cleanliness (especially with huge DBImpl) and speed things up. It will take a huge effort to do this, though. Let's discuss offline today. Test Plan: Ran ./db_stress, verified that files are getting deleted Reviewers: dhruba, haobo, kailiu, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D13827	2013-11-08 15:23:46 -08:00
Igor Canadi	8b3379dc0a	Implementing DynamicIterator for TransformRepNoLock Summary: What @haobo done with TransformRep, now in TransformRepNoLock. Similar implementation, except that I made DynamicIterator a subclass of Iterator which makes me have less iterator initializations. Test Plan: ./prefix_test. Seeing huge savings vs. TransformRep again! Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D13953	2013-11-08 00:31:09 -08:00
Kai Liu	fd075d6edd	Provide mechanism to configure when to flush the block Summary: Allow block based table to configure the way flushing the blocks. This feature will allow us to add support for prefix-aligned block. Test Plan: make check Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D13875	2013-11-07 21:27:21 -08:00
Igor Canadi	444cf88a56	Flush the log outside of lock Summary: Added a new call LogFlush() that flushes the log contents to the OS buffers. We never call it with lock held. We call it once for every Read/Write and often in compaction/flush process so the frequency should not be a problem. Test Plan: db_test Reviewers: dhruba, haobo, kailiu, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13935	2013-11-07 11:31:56 -08:00

1 2 3 4 5 ...

403 Commits