rocksdb

Author	SHA1	Message	Date
Igor Canadi	9fe87b17aa	Fix compile	2014-06-20 10:36:48 +02:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Haobo Xu	7a9dd5f214	[RocksDB] Make block based table hash index more adaptive Summary: Currently, RocksDB returns an error if a DB written with a prefix hash index is later opened without providing a prefix extractor. This is unnecessarily harsh. Without a prefix extractor, we could always fall back to the normal binary index. Test Plan: unit test, also manually verified in LOG that fallback did occur. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19191	2014-06-19 16:40:32 -07:00
Lei Jin	1ec2d1c69d	fix make shared_lib compilation error Summary: s/class ParsedInternalKey/struct ParsedInternalKey Test Plan: make shared_lib Reviewers: igor, yhchiang, sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19173	2014-06-19 10:12:26 -07:00
Haobo Xu	167738256f	[RocksDB] Fix unit test Summary: fix a bug in D19047, which caused DBTest.RecoverDuringMemtableCompaction to fail. Test Plan: unit test Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19155	2014-06-19 01:37:21 -07:00
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	2014-06-18 20:41:52 -07:00
Haobo Xu	0f0076ed5a	[RocksDB] Reduce memory footprint of the blockbased table hash index. Summary: Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption. Just a quick draft, still testing and refining. Test Plan: unit test and shadow testing Reviewers: dhruba, kailiu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19047	2014-06-18 18:16:07 -07:00
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing table factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	2014-06-18 07:04:37 +02:00
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	2014-06-17 11:49:22 -07:00
Lei Jin	c83b085770	prefetch bloom filter data block for L0 files Summary: as title Test Plan: db_bench the initial result is very promising. I will post results of complete runs Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18867	2014-06-12 10:06:18 -07:00
sdong	88a1691a1e	BlockBasedTable::PrefixMayMatch() to bloom setting to the beginning of the function Summary: In BlockBasedTable::PrefixMayMatch() we calculate prefix even if bloom is not config. Move the check before Test Plan: make all check Reviewers: igor, ljin Reviewed By: ljin Subscribers: wuj, leveldb, haobo, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18993	2014-06-10 11:14:22 -07:00
sdong	80f409ea37	Clean PlainTableReader's variables for better data locality Summary: Clean PlainTableReader's data structures: (1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization) (2) remove some variables only used when initialization from the class (3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly (4) make Options a reference. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18891	2014-06-09 13:53:39 -07:00
Igor Canadi	f43c8262c2	Don't compress block bigger than 2GB Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446. Test Plan: make check doesn't complain :) Reviewers: haobo, ljin, yhchiang, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18975	2014-06-09 12:26:09 -07:00
sdong	df9069d23f	In DB::NewIterator(), try to allocate the whole iterator tree in an arena Summary: In this patch, try to allocate the whole iterator tree starting from DBIter from an arena 1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it. 2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator. 3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it. Limitations: (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc (2) Two level iterator itself is allocated in arena, but not iterators inside it. Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: leveldb, dhruba, yhchiang, igor Differential Revision: https://reviews.facebook.net/D18513	2014-06-02 17:44:57 -07:00
Lei Jin	388d2054c7	forward iterator Summary: Forward iterator puts everything together in a flat structure instead of a hierarchy of nested iterators. This should simplify the code and provide better performance. It also enables more optimization since all information is accessible in one place. Initial evaluation shows about 6% improvement Test Plan: db_test and db_bench Reviewers: dhruba, igor, tnovak, sdong, haobo Reviewed By: haobo Subscribers: sdong, leveldb Differential Revision: https://reviews.facebook.net/D18795	2014-05-30 14:31:55 -07:00
Kai Liu	0b3d03d026	Materialize the hash index Summary: Materialize the hash index to avoid the soaring cpu/flash usage when initializing the database. Test Plan: existing unit tests passed Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D18339	2014-05-15 14:09:03 -07:00
Igor Canadi	26f5dd9a5a	TablePropertiesCollectorFactory Summary: This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options. Here's description of task #4296714: I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB. For example, this table has 3155 entries and 1391828 deleted keys. The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for. Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API. In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe. Test Plan: Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one). Also, all other tests Reviewers: sdong, dhruba, haobo, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D18579	2014-05-13 12:30:55 -07:00
Igor Canadi	fec4269966	Fix more gflag namespace issues	2014-05-09 08:41:02 -07:00
sdong	ddd41146c4	MergingIterator uses autovector instead of vector Summary: Use autovector in MergingIterator so that if there are 4 or less child iterators in it, iterator wrappers are inline, which is more likely to be cache friendly. Based on one test run with a shadow traffic of one product, it reduces CPU of MergingIterator::Seek() by half. Test Plan: make all check Reviewers: haobo, yhchiang, igor, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D18531	2014-05-08 15:01:20 -07:00
Igor Canadi	b5616dafd1	Fix iOS compile	2014-05-07 17:48:31 -07:00
sdong	3a171dcb51	Pass logger to memtable rep and TLB page allocation error logged to info logs Summary: TLB page allocation errors are now logged to info logs, instead of stderr. In order to do that, mem table rep's factory functions take a info logger now. Test Plan: make all check Reviewers: haobo, igor, yhchiang Reviewed By: yhchiang CC: leveldb, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18471	2014-05-05 16:43:37 -07:00
sdong	9b17558311	PlainTableFactory::PlainTableFactory() to have huge TLB turned off by default Summary: PlainTableFactory::PlainTableFactory() now has Huge TLB page feature turned on by default. Although it is not a public API (which we always turn the feature off now), our unit tests, like db_test sometimes uses it directly, which causes wrong coverage of codes. This patch fix it to allow unit tests to run with the correct setting Test Plan: Run db_test and make sure this feature is not on any more. Reviewers: igor, haobo Reviewed By: igor CC: yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18483	2014-05-05 11:05:54 -07:00
sdong	4a7c747064	Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"" And make the default 0 for hash linked list memtable This reverts commit d69dc64be78a8da3ce661454655966d11ff61bb6.	2014-05-04 13:56:29 -07:00
Igor Canadi	d69dc64be7	Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB" This reverts commit 7dafa3a1d7e63d4cc4a7d717ab958efc616a2892.	2014-05-04 08:37:09 -07:00
Igor Canadi	d28ed6931f	fix release build	2014-05-01 12:42:06 -07:00
Igor Canadi	d29e48bb2e	fix compile warning	2014-05-01 14:12:35 -04:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
sdong	7dafa3a1d7	Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes and hash linked list hash table. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, dhruba, leveldb, igor, yhchiang Differential Revision: https://reviews.facebook.net/D18357	2014-04-30 11:02:26 -07:00
Igor Canadi	c489499a2b	Fix OSX compile	2014-04-26 17:15:43 -04:00
Lei Jin	ccaca59bee	avoid calling FindFile twice in TwoLevelIterator for PlainTable Summary: this is to reclaim the regression introduced in https://reviews.facebook.net/D17853 Test Plan: make all check Reviewers: igor, haobo, sdong, dhruba, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17985	2014-04-25 12:23:07 -07:00
Lei Jin	d642c60bdc	Check PrefixMayMatch on Seek() Summary: As a follow-up diff for https://reviews.facebook.net/D17805, add optimization to check PrefixMayMatch on Seek() Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17853	2014-04-25 12:22:23 -07:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
sdong	86a0133d05	PlainTableReader to expose index size to users Summary: This is a temp solution to expose index sizes to users from PlainTableReader before we persist them to files. In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18195	2014-04-22 19:29:05 -07:00
Yueh-Hsuan Chiang	af6ad113a8	Fix SIGFAULT when running sst_dump on v2.6 db Summary: Fix the sigfault when running sst_dump on v2.6 db. Test Plan: git checkout bba6595b1f3f42cf79bb21c2d5b981ede1cc0063 make clean make db_bench ./db_bench --db=/tmp/some/db --benchmarks=fillseq arc patch D18039 make clean make sst_dump ./sst_dump --file=/tmp/some/db --command=check Reviewers: igor, haobo, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D18039	2014-04-21 17:49:47 -07:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
sdong	27d3bc184e	Use a different approach to make sure BlockBasedTableReader can use hash index on older files Summary: A recent commit `e37dd216f9` makes sure hash index can be used when reading existing files. This patch uses another way to achieve the approach: (1) Currently, always writing kBinarySearch to files, despite of BlockBasedTableOptions.IndexType setting. (2) When reading a file, read out the field, and make sure it is kBinarySearch, while always use index type by users. The reason for doing it is, to reserve kHashSearch property on disk to future. If now we write out binary index for both of kHashSearch and kBinarySearch. We have to use a new flag in the future for hash index on disk, otherwise compatibility would break. Also, we want the real index type and type shown in properties block to be consistent. Test Plan: make all check Reviewers: haobo, kailiu Reviewed By: kailiu CC: igor, ljin, yhchiang, xjin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18009	2014-04-18 14:09:21 -07:00
Kai Liu	e37dd216f9	Index type doesn't have to be persisted Summary: With the recent changes, there is no need to check the property block about the index block type. If user want to use it, they don't really need any disk format change; everything happens in the fly. Also another team encountered an error while reading the index type from properties. Test Plan: ran all the tests Reviewers: sdong CC: Task ID: # Blame Rev:	2014-04-17 11:08:12 -07:00
sdong	5cef458a2c	RocksDB 2.8 to be able to read files generated by 2.6 Summary: From 2.6 to 2.7, property block name is renamed from rocksdb.stats to rocksdb.properties. Older properties were not able to be loaded. In 2.8, we seem to have added some logic that uses property block without checking null pointers, which create segment faults. In this patch, we fix it by: (1) try rocksdb.stats if rocksdb.properties is not found (2) add some null checking before consuming rep->table_properties Test Plan: make sure a file generated in 2.7 couldn't be opened now can be opened. Reviewers: haobo, igor, yhchiang Reviewed By: igor CC: ljin, xjin, dhruba, kailiu, leveldb Differential Revision: https://reviews.facebook.net/D17961	2014-04-17 09:51:43 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Kai Liu	e23e73e67c	Use shorten index key for hash-index Summary: I was wrong about the "index builder", right now since we create index by scanning both whole table and index, there is no need to preserve the whole key as the index key. I switch back to original way index which is both space efficient and able to support in-fly construction of hash index. In this patch, I made minimal change since I'm not sure if we still need the "pluggable index builder", under current circumstance it is of no use and kind of over-engineered. But I'm not sure if we can still exploit its usefulness in the future; otherwise I think I can just burn them with great vengeance. Test Plan: unit tests Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17745	2014-04-10 22:45:25 -07:00
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	2014-04-10 14:19:43 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
sdong	e9ed28f9c8	PlainTableBuilder::Add() to use local char array instead of reused std::string as tmp buffer Summary: Our profile shows that in one of the applications, 5% of the CPU costs of PlainTableBuilder::Add() are spent on std::string stacks. By this simple change, we avoid this global reusable string. Also, we avoid another call of file appending, which probably gives another 2%. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17601	2014-04-09 10:17:32 -07:00
Igor Canadi	34455deb06	Fix Mac OS compile issues	2014-04-08 14:05:53 -07:00
Igor Canadi	5b345b76cb	Remove env_ from MergingIterator Summary: env_ is not used. Compiling for iOS complains. Test Plan: compiles now Reviewers: ljin, haobo, sdong, dhruba Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17589	2014-04-08 13:40:42 -07:00
Lei Jin	92c1eb0291	macros for perf_context Summary: This will allow us to disable them completely for iOS or for better performance Test Plan: will run make all check Reviewers: igor, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17511	2014-04-08 10:58:07 -07:00
sdong	5e2db3b434	PlainTableIterator not to store copied key in std::string Summary: Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h. This patch improves iterator performance significantly. Running this benchmark: ./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond The average latency is improved to about 750 nanoseconds from 1100 nanoseconds. Test Plan: Add a unit test. make all check Reviewers: haobo, ljin Reviewed By: haobo CC: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D17547	2014-04-07 19:06:09 -07:00
Igor Canadi	3d2fe844ab	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/memtable_list.cc db/version_set.cc	2014-04-07 11:31:11 -07:00
Igor Canadi	bcd1f15b60	Remove -Wno-unused-const-variable	2014-04-04 16:15:47 -07:00

245 Commits