rocksdb

Author	SHA1	Message	Date
Igor Canadi	6c2ca1d3e6	Move NeedsCompaction() from VersionSet to Version Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet. Test Plan: make check Reviewers: kailiu, haobo, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15333	2014-01-27 09:59:00 -08:00
Igor Canadi	e55b3c040c	Fixing ref-counting memtables	2014-01-26 18:03:38 -08:00
Igor Canadi	983fafa56c	Fix memory leak	2014-01-25 13:57:11 -08:00
Kai Liu	4b51dffcf8	Some refactorings on plain table Summary: Plain table has been working well and this is just a nit-picking patch, which is generated during my coding reading. No real functional changes. only some changes regarding: * Improve some comments from the perspective a "new" code reader. * Change some magic number to constant, which can help us to parameterize them in the future. * Did some style, naming, C++ convention changes. * Fix warnings from new "arc lint" Test Plan: make check Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15429	2014-01-24 21:28:10 -08:00
Igor Canadi	68a91a2e6a	missing include	2014-01-24 18:40:05 -08:00
Igor Canadi	5356b2a680	Merge branch 'master' into columnfamilies	2014-01-24 18:34:48 -08:00
Igor Canadi	04afa32134	Fix reduce levels ReduceNumberOfLevels had segmentation fault in WriteSnapshot() since we didn't change the number of levels in VersionSet (we consider them immutable from now on). This fixes the problem.	2014-01-24 18:32:38 -08:00
Siying Dong	8477255da3	Moving Some includes from options.h to forward declaration Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies. Test Plan: make all check Reviewers: kailiu, haobo, igor, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15411	2014-01-24 17:16:22 -08:00
Igor Canadi	6a404de4ba	Merge branch 'master' into columnfamilies	2014-01-24 15:54:18 -08:00
Igor Canadi	f653fdcf5a	Fixing iterator cleanup for Tailing iterator Immutable tailing iterator doesn't set CleanupState::mem, so we don't have to unref it.	2014-01-24 15:51:06 -08:00
Igor Canadi	1423e7c9de	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set_reduce_num_levels.cc util/ldb_cmd.cc	2014-01-24 15:03:54 -08:00
Igor Canadi	b13bdfa500	Add a call DisownData() to Cache, which should speed up shutdown Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache. Test Plan: I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368 Clean shutdown took 7.2 seconds, while fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but should be bigger with bigger block_cache. I have set up the capacity to 80GB, but the script filled up only ~7GB. Reviewers: dhruba, haobo, MarkCallaghan, xjin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15069	2014-01-24 14:57:52 -08:00
Igor Canadi	677fee27c6	Make VersionSet::ReduceNumberOfLevels() static Summary: A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method. Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file. Test Plan: reduce_levels_test Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15303	2014-01-24 14:57:04 -08:00
Igor Canadi	c583157d49	MemTableListVersion Summary: MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful. This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team. I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet()) Test Plan: `make check` works. I will also do `make valgrind_check` before commit Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15255	2014-01-24 14:52:08 -08:00
Kai Liu	0ab766132b	Re-org the table tests Summary: We'll divide the table tests into 3 buckets, plain table test, block-based table test and general table feature test. This diff does no real change and only does the rename and reorg. Test Plan: run table_test Reviewers: sdong, haobo, igor, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15417	2014-01-24 12:46:17 -08:00
kailiu	f131d4c280	Add a make target for shared library Summary: Previous we made `make release` also compile shared library. However it takes a long time to complete. To make our development process more efficient. I added a new make target shared_lib. User can of course run `make <library_name>` for direct compilation. However the <library_name> changed under certain condition. Thus we need `make shared_lib` to get rid of the memorization from users' side. Test Plan: make shared_lib Reviewers: igor, sdong, haobo, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15309	2014-01-24 11:56:01 -08:00
Igor Canadi	e832e72b31	Revert "Moving to glibc-fb" This reverts commit `d24961b65e`. For some reason, glibc2.17-fb breaks gflags. Reverting for now	2014-01-24 11:50:38 -08:00
Kai Liu	7d991be400	Some small refactorings on table_test Summary: Just revise some hard-to-read or unnecessarily verbose code. Test Plan: make check	2014-01-24 11:10:32 -08:00
kailiu	66dc033af3	Temporarily disable caching index/filter blocks Summary: Mixing index/filter blocks with data blocks resulted in some known issues. To make sure in next release our users won't be affected, we added a new option in BlockBasedTableFactory::TableOption to conceal this functionality for now. This patch also introduced a BlockBasedTableReader::OpenOptions, which avoids the "infinite" growth of parameters in BlockBasedTableReader::Open(). Test Plan: make check Reviewers: haobo, sdong, igor, dhruba Reviewed By: igor CC: leveldb, tnovak Differential Revision: https://reviews.facebook.net/D15327	2014-01-24 10:57:15 -08:00
Igor Canadi	d24961b65e	Moving to glibc-fb Summary: It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group. This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works. Test Plan: Compiled Ran ./db_bench ./db_stress ./db_repl_stress Reviewers: kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15405	2014-01-24 10:24:08 -08:00
Siying Dong	4605e20c58	If User setting of compaction multipliers overflow, use default value 1 instead Summary: Currently, compaction multipliers can overflow and cause unexpected behaviors. In this patch, we detect those overflows and use multiplier 1 for them. Test Plan: make all check Reviewers: dhruba, haobo, igor, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15321	2014-01-24 10:14:23 -08:00
Igor Canadi	28d1a0c6f5	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.h db/db_test.cc include/rocksdb/db.h include/utilities/stackable_db.h	2014-01-24 09:27:29 -08:00
Igor Canadi	09489d395f	Fix a bug in DBImpl::CreateColumnFamily	2014-01-24 09:12:00 -08:00
kailiu	eda924a03a	Remove an unused `GetLengthPrefixedSlice` Summary: We have 3 versions of GetLengthPrefixedSlice() and one of them is no longer in use. Test Plan: make Reviewers: sdong, igor, haobo, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15399	2014-01-23 23:06:52 -08:00
Lei Jin	aba2acb5ec	CompactRange() to return status Summary: as title Test Plan: make all check What else tests shall I cover? Reviewers: igor, haobo CC: Differential Revision: https://reviews.facebook.net/D15339	2014-01-23 16:41:46 -08:00
Kai Liu	054c5dda8c	Merge branch 'master' into performance Conflicts: db/db_impl.cc db/db_test.cc db/memtable.cc db/version_set.cc include/rocksdb/statistics.h util/statistics_imp.h	2014-01-23 16:32:49 -08:00
Tomislav Novak	81c9cc9b3b	Tailing iterator Summary: This diff implements a special type of iterator that doesn't create a snapshot (can be used to read newly inserted data) and is optimized for doing sequential reads. TailingIterator uses current superversion number to determine whether to invalidate its internal iterators. If the version hasn't changed, it can often avoid doing expensive seeks over immutable structures (sst files and immutable memtables). Test Plan: * new unit tests * running LD with this patch Reviewers: igor, dhruba, haobo, sdong, kailiu Reviewed By: sdong CC: leveldb, lovro, march Differential Revision: https://reviews.facebook.net/D15285	2014-01-23 16:26:08 -08:00
Igor Canadi	4e91f27c3a	Fix performance regression in statistics Summary: For some reason, D15099 caused a big performance regression: https://fburl.com/16059000 After digging a bit, I figured out that the reason was that std::atomic_uint_fast64_t was allocated in an array. When I switched from an array to vector, the QPS returned to the previous level. I'm not sure why this is happening, but this diff seems to fix the performance regression. Test Plan: I ran the regression script, observed the performance going back to normal Reviewers: tnovak, kailiu, haobo Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15375	2014-01-23 16:11:55 -08:00
Kai Liu	bb19b530ca	Aggressively inlining the short functions in coding.cc Summary: This diff takes an even more aggressive way to inline the functions. A decent rule that I followed is "not inline a function if it is more than 10 lines long." Normally optimizing code by inline is ugly and hard to control, but since one of our usecase has significant amount of CPU used in functions from coding.cc, I'd like to try this diff out. Test Plan: 1. the size for some .o file increased a little bit, but most less than 1%. So I think the negative impact of inline is negligible. 2. As the regression test shows (ran for 10 times and I calculated the average number) Metrics Befor After ======================================================================== rocksdb.build.fillseq.qps 426595 444515 (+4.6%) rocksdb.build.memtablefillrandom.qps 121739 123110 rocksdb.build.memtablereadrandom.qps 1285103 1280520 rocksdb.build.overwrite.qps 125816 135570 (+9%) rocksdb.build.readrandom_fillunique_random.qps 285995 296863 rocksdb.build.readrandom_memtable_sst.qps 1027132 1027279 rocksdb.build.readrandom.qps 1041427 1054665 rocksdb.build.readrandom_smallblockcache.qps 1028631 1038433 rocksdb.build.readwhilewriting.qps 918352 914629 Reviewers: haobo, sdong, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15291	2014-01-23 16:03:34 -08:00
kailiu	d0458469c8	Add google-style checker to "arc lint" Summary: After we reached a consensus on code format, which follows exactly Google's coding style, a natural follow-up is to have a style checker that can handle stuffs beyond format. Google already has a powerful style checker "cpplint.py" and, luckily, phabricator already provides the built-in linter for it! Next time with "arc lint" most style inconsistency will be detected (but will not be fixed). Also I copied cpplint.py to linters directory, which is mostly because we may need the flexibility to make some modifications on it for our own need. Test Plan: ran arc lint table/block_based_table_builder.cc to see the amazing results. Reviewers: haobo, sdong, igor, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15369	2014-01-23 15:04:12 -08:00
Igor Canadi	7c5e583a27	ColumnFamilySet Summary: I created a separate class ColumnFamilySet to keep track of column families. Before we did this in VersionSet and I believe this approach is cleaner. Let me know if you have any comments. I will commit tomorrow. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15357	2014-01-23 14:03:38 -08:00
Igor Canadi	f9a25dda9f	Fix wrong merge	2014-01-22 11:30:19 -08:00
Igor Canadi	92a022ad07	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/db_impl_readonly.cc db/version_set.cc	2014-01-22 10:59:07 -08:00
Igor Canadi	fb01755aa4	Unfriending classes Summary: In this diff I made some effort to reduce usage of friending. To do that, I had to expose Compaction::inputs_ through a method inputs(). Not sure if this is a good idea, there is a trade-off. I think it's less confusing than having lots of friends. I also thought about other friendship relationships, but they are too much tangled at this point. Once you friend two classes, it's very hard to unfriend them :) Test Plan: make check Reviewers: haobo, kailiu, sdong, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15267	2014-01-22 10:55:16 -08:00
Igor Canadi	6fe9b57748	Refactor Recover() code Summary: This diff does two things: * Rethinks how we call Recover() with read_only option. Before, we call it with pointer to memtable where we'd like to apply those changes to. This memtable is set in db_impl_readonly.cc and it's actually DBImpl::mem_. Why don't we just apply updates to mem_ right away? It seems more intuitive. * Changes when we apply updates to manifest. Before, the process is to recover all the logs, flush it to sst files and then do one giant commit that atomically adds all recovered sst files and sets the next log number. This works good enough, but causes some small troubles for my column family approach, since I can't have one VersionEdit apply to more than single column family[1]. The change here is to commit the files recovered from logs right away. Here is the state of the world before the change: 1. Recover log 5, add new sst files to edit 2. Recover log 7, add new sst files to edit 3. Recover log 8, add new sst files to edit 4. Commit all added sst files to manifest and mark log files 5, 7 and 8 as recoverd (via SetLogNumber(9) function) After the change, we'll do: 1. Recover log 5, commit the new sst files and set log 5 as recovered 2. Recover log 7, commit the new sst files and set log 7 as recovered 3. Recover log 8, commit the new sst files and set log 8 as recovered The added (small) benefit is that if we fail after (2), the new recovery will only have to recover log 8. In previous case, we'll have to restart the recovery from the beginning. The bigger benefit will be to enable easier integration of multiple column families in Recovery code path. [1] I'm happy to dicuss this decison, but I believe this is the cleanest way to go. It also makes backward compatibility much easier. We don't have a requirement of adding multiple column families atomically. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15237	2014-01-22 10:45:26 -08:00
kailiu	4036d58dc9	Fix a Statistics-related unit test faulure Summary: In my MacOS, the member variables are populated with random numbers after initialization. This diff fixes it by fill these arrays with 0. Test Plan: make && ./table_test Reviewers: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15315	2014-01-21 18:02:55 -08:00
Igor Canadi	23f6791c9e	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl_readonly.cc db/db_test.cc db/version_edit.cc db/version_edit.h db/version_set.cc db/version_set.h db/version_set_reduce_num_levels.cc	2014-01-21 17:01:52 -08:00
Siying Dong	7dea558e6d	[Performance Branch] Fix a bug when merging from master Summary: Commit "1304d8c8cefe66be1a3caa5e93413211ba2486f2" (Merge branch 'master' into performance) removes a line in performance branch by mistake. This patch fixes it. Test Plan: make all check Reviewers: haobo, kailiu, igor Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15297	2014-01-21 12:44:43 -08:00
Mark Callaghan	4e8321bfea	Boost access before mutex is unlocked Summary: This moves the use of versions_ to before the mutex is unlocked to avoid a possible race. Task ID: # Blame Rev: Test Plan: make check Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15279	2014-01-17 21:32:23 -08:00
Kai Liu	ef602f6275	Misc cleanup on performance branch Summary: Did some trivial stuffs: * Add more comments; * fix compiler's warning messages (uninitialized variables). * etc Test Plan: make check	2014-01-17 14:26:29 -08:00
Igor Canadi	83681bf9ef	Statistics code cleanup Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review. Test Plan: make check Reviewers: haobo, kailiu, sdong, dhruba, tnovak Reviewed By: tnovak CC: leveldb Differential Revision: https://reviews.facebook.net/D15099	2014-01-17 12:46:06 -08:00
Igor Canadi	8079dd5d24	Merge branch 'master' into performance	2014-01-17 12:20:07 -08:00
Igor Canadi	0f4a75b710	Fix SIGSEGV in compaction picker Summary: The SIGSEGV was introduced by https://reviews.facebook.net/D15171 I also fixed ExpandWhileOverlapping() which returned the failure by setting it's own stack variable to nullptr (!). This bug is present in 2.6 release, so I guess ExpandWhileOverlapping never fails :) Test Plan: `make check`. Also MarkCallaghan confirmed it fixed the SIGSEGV he reported. Reviewers: MarkCallaghan, kailiu, sdong, dhruba, haobo Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15261	2014-01-17 12:02:03 -08:00
Mark Callaghan	439e36db21	Fix SlowdownAmount Summary: This had a few bugs. 1) bottom and top were reversed. top is for the max value but the callers were passing the max value to bottom. The result is that the max sleep is used when n >= bottom. 2) one of the callers passed values with type double and these values are frequently between 1.0 and 2.0 so rounding will do some bad things 3) sometimes the function returned 0 when there should be a stall With this change and one other diff (out for review soon) there are slightly fewer stalls on one workload. With the fix. Stalls(secs): 160.166 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 58.495 leveln_slowdown Stalls(count): 910261 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 54526 leveln_slowdown Without the fix. Stalls(secs): 172.227 level0_slowdown, 0.000 level0_numfiles, 0.000 memtable_compaction, 56.538 leveln_slowdown Stalls(count): 160831 level0_slowdown, 0 level0_numfiles, 0 memtable_compaction, 52845 leveln_slowdown Task ID: # Blame Rev: Test Plan: run db_bench for --benchmarks=overwrite with IO-bound database Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15243	2014-01-17 10:15:30 -08:00
Kai Liu	23576d773f	Remove the extra line in "make release" Summary: that line was introduced during merge.	2014-01-16 17:01:34 -08:00
Naman Gupta	1447bb5919	Allow callback to change size of existing value. Change return type of the callback function to an enum status to handle 3 cases. Summary: This diff fixes 2 hacks: * The callback function can modify the existing value inplace, if the merged value fits within the existing buffer size. But currently the existing buffer size is not being modified. Now the callback recieves a int* allowing the size to be modified. Since size is encoded as a varint in the internal key for memtable. It might happen that the entire value might have be copied to the new location if the new size varint is smaller than the existing size varint. * The callback function has 3 functionalities 1. Modify existing buffer inplace, and update size correspondingly. Now to indicate that, Returns 1. 2. Generate a new buffer indicating merged value. Returns 2. 3. Fails to do either of above, based on whatever application logic. Returns 0. Test Plan: Just make all for now. I'm adding another unit test to test each scenario. Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb, sdong, kailiu, xinyaohu, sumeet, danguo Differential Revision: https://reviews.facebook.net/D15195	2014-01-16 15:12:39 -08:00
Kai Liu	d4f65f1683	Merge branch 'master' into performance This patch merges master's changes on build_tools/format-diff.sh. Conflicts: db/version_edit.cc	2014-01-16 14:31:18 -08:00
Kai Liu	e19bad9bbd	Fix some "make format" issue Summary: * make sure when some pre-check fails, the script won't halt immediately. * change fburl to google's short url. * Fix a bug in this script: now it checks the uncommitted code only. Test Plan: Ran the script under differnet environments. Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15231	2014-01-16 14:26:51 -08:00
Igor Canadi	6d6fb70960	Remove compaction pointers Summary: The only thing we do with compaction pointers is set them to some values, we never actually read them. I don't know what we used them for, but it doesn't look like we use them anymore. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15225	2014-01-16 14:06:53 -08:00
Igor Canadi	c699c84af4	CompactionPicker Summary: This is a big one. This diff moves all the code related to picking compactions from VersionSet to new class CompactionPicker. Column families' compactions will be completely separate processes, so we need to have multiple CompactionPickers. To make this easier to review, most of the code change is just copy/paste. There is also a small change not to use VersionSet::current_, but rather to take `Version* version` as a parameter. Most of the other code is exactly the same. In future diffs, I will also make some improvements to CompactionPickers. I think the most important part will be encapsulating it better. Currently Version, VersionSet, Compaction and CompactionPicker are all friend classes, which makes it harder to change the implementation. This diff depends on D15171, D15183, D15189 and D15201 Test Plan: `make check` Reviewers: kailiu, sdong, dhruba, haobo Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15207	2014-01-16 13:03:52 -08:00

... 2 3 4 5 6 ...

1253 Commits