rocksdb

Author	SHA1	Message	Date
Lei Jin	d118707f8d	set bg_error_ when background flush goes wrong Summary: as title Test Plan: unit test Reviewers: haobo, igor, sdong, kailiu, dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15435	2014-01-29 15:55:58 -08:00
Igor Canadi	514e42c7cc	Fix some lint warnings	2014-01-29 15:27:27 -08:00
Igor Canadi	fa99d53e55	Change ColumnFamilyData from struct to class Summary: ColumnFamilyData grew a lot, there's much more data that it holds now. It makes more sense to encapsulate it better by making it a class. Test Plan: make check Reviewers: dhruba, haobo, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15579	2014-01-29 15:18:36 -08:00
Lei Jin	fb84c49a36	convert Tickers back to array with padding and alignment Summary: Pad each Ticker structure to be 64 bytes and make them align on 64 bytes boundary to avoid cache line false sharing issue. Please refer to task 3615553 for more details Test Plan: db_bench LevelDB: version 2.0s Date: Wed Jan 29 12:23:17 2014 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB rocksdb.build.overwrite.qps 49638 rocksdb.build.overwrite.p50_micros 58.73 rocksdb.build.overwrite.p75_micros 210.56 rocksdb.build.overwrite.p99_micros 733.28 rocksdb.build.fillseq.qps 366729 rocksdb.build.fillseq.p50_micros 1.00 rocksdb.build.fillseq.p75_micros 1.00 rocksdb.build.fillseq.p99_micros 2.65 rocksdb.build.readrandom.qps 1152995 rocksdb.build.readrandom.p50_micros 11.27 rocksdb.build.readrandom.p75_micros 15.69 rocksdb.build.readrandom.p99_micros 33.59 rocksdb.build.readrandom_smallblockcache.qps 956047 rocksdb.build.readrandom_smallblockcache.p50_micros 15.23 rocksdb.build.readrandom_smallblockcache.p75_micros 17.31 rocksdb.build.readrandom_smallblockcache.p99_micros 31.49 rocksdb.build.readrandom_memtable_sst.qps 1105183 rocksdb.build.readrandom_memtable_sst.p50_micros 12.04 rocksdb.build.readrandom_memtable_sst.p75_micros 15.78 rocksdb.build.readrandom_memtable_sst.p99_micros 32.49 rocksdb.build.readrandom_fillunique_random.qps 487856 rocksdb.build.readrandom_fillunique_random.p50_micros 29.65 rocksdb.build.readrandom_fillunique_random.p75_micros 40.93 rocksdb.build.readrandom_fillunique_random.p99_micros 78.68 rocksdb.build.memtablefillrandom.qps 91304 rocksdb.build.memtablefillrandom.p50_micros 171.05 rocksdb.build.memtablefillrandom.p75_micros 196.12 rocksdb.build.memtablefillrandom.p99_micros 291.73 rocksdb.build.memtablereadrandom.qps 1340411 rocksdb.build.memtablereadrandom.p50_micros 9.48 rocksdb.build.memtablereadrandom.p75_micros 13.95 rocksdb.build.memtablereadrandom.p99_micros 30.36 rocksdb.build.readwhilewriting.qps 491004 rocksdb.build.readwhilewriting.p50_micros 29.58 rocksdb.build.readwhilewriting.p75_micros 40.34 rocksdb.build.readwhilewriting.p99_micros 76.78 Reviewers: igor, haobo Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15573	2014-01-29 15:08:41 -08:00
Igor Canadi	15999e728e	Fix column family test (create directory)	2014-01-29 14:06:59 -08:00
Igor Canadi	4662969bf5	PurgeObsoleteFiles in DropColumnFamily Summary: When we drop the column family, we want to delete all the files from that column family. Test Plan: make check Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15561	2014-01-29 11:58:33 -08:00
Igor Canadi	20b231d712	Merge branch 'master' into columnfamilies	2014-01-29 11:47:43 -08:00
Igor Canadi	f24a3ee52d	Read from and write to different column families Summary: This one is big. It adds ability to write to and read from different column families (see the unit test). It also supports recovery of different column families from log, which was the hardest part to reason about. We need to make sure to never delete the log file which has unflushed data from any column family. To support that, I added another concept, which is versions_->MinLogNumber() Test Plan: Added a unit test in column_family_test Reviewers: dhruba, haobo, sdong, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15537	2014-01-29 11:38:16 -08:00
Kai Liu	b7db241118	LIBNAME in Makefile is not really configurable Summary: In new third-party release tool, `LIBNAME=<customized_library> make` will not really change the LIBNAME. However it's very odd that the same approach works with old third-party release tools. I checked previous rocksdb version and both librocksdb.a and librocksdb_debug.a were correctly generated and copied to the right place. Test Plan: `LIBNAME=hello make -j32` generates hello.a `make -j32` generates librocksdb.a Reviewers: igor, sdong, haobo, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15555	2014-01-29 11:35:05 -08:00
kailiu	b1874af8bc	Canonicalize "RocksDB" in make_new_version.sh Summary: Change all occurrences of "rocksdb" to its canonical form "RocksDB". Test Plan: N/A Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15549	2014-01-29 10:19:34 -08:00
kailiu	c9eef784b7	Improve make_new_version.sh	2014-01-29 10:04:53 -08:00
Igor Canadi	9a597dc696	Installation instructions for CentOS	2014-01-29 08:43:11 -08:00
Igor Canadi	e57f0cc1a1	add include <atomic> to version_set.h	2014-01-29 08:17:43 -08:00
kailiu	9fe60d50ff	Add history log and revise script Summary: * Add a change log for rocksdb releases. * Remove the hacky parts of make_new_version.sh, which are either no longer useful or will be done in our dedicated 3rd-party release tool. Test Plan: N/A Reviewers: igor, haobo, sdong, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15543 2.7.fb v2.7	2014-01-28 20:24:41 -08:00
Igor Canadi	c1071ed95c	Merge branch 'master' into columnfamilies	2014-01-28 16:04:00 -08:00
Lei Jin	9a126ba3b3	only corrupt private file checksum in backupable_db_test Summary: if it happens (randomly) to corrupt shared file in the test, then the checksum will be inconsistent between meta files from different backup. BackupEngine will then detect this issue and fail. But in reality, this does not happen since the checksum is checked on every backup. So here, only corrupt checksum of private file to let BackupEngine to construct properly (but fail during restore). Test Plan: run test with valgrind Reviewers: igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15531	2014-01-28 16:03:55 -08:00
Igor Canadi	5d2c62822e	Only get the manifest file size if there is no error Summary: I came across this while working on column families. CorruptionTest::RecoverWriteError threw a SIGSEG because the descriptor_log_->file() was nullptr. I'm not sure why it doesn't happen in master, but better safe than sorry. @kailiu, can we get this in release, too? Test Plan: make check Reviewers: kailiu, dhruba, haobo Reviewed By: haobo CC: leveldb, kailiu Differential Revision: https://reviews.facebook.net/D15513	2014-01-28 16:02:51 -08:00
Igor Canadi	e5ec7384a0	Better interface to create BackupEngine Summary: I think it looks nicer. In RocksDB we have both styles, but I think that static method is the more common version. Test Plan: backupable_db_test Reviewers: ljin, benj, swk Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D15519	2014-01-28 16:01:53 -08:00
Igor Canadi	ec2fa4a690	Export BackupEngine Summary: Lots of clients have problems with using StackableDB interface. It's nice to have BackupableDB as a layer on top of DB, but not necessary. This diff exports BackupEngine, which can be used to create backups without forcing clients to use StackableDB interface. Test Plan: backupable_db_test Reviewers: dhruba, ljin, swk Reviewed By: ljin CC: leveldb, benj Differential Revision: https://reviews.facebook.net/D15477	2014-01-28 11:26:07 -08:00
kailiu	a5e220f5ef	Merge branch 'master' into performance Conflicts: Makefile db/db_impl.cc db/db_test.cc db/memtable_list.cc db/memtable_list.h table/block_based_table_reader.cc table/table_test.cc util/cache.cc util/coding.cc	2014-01-28 10:35:55 -08:00
Lei Jin	9dc29414e3	add checksum for backup files Summary: Keep checksum of each backuped file in meta file. When it restores these files, compute their checksum on the fly and compare against what is in the meta file. Fail the restore process if checksum mismatch. Test Plan: unit test Reviewers: haobo, igor, sdong, kailiu Reviewed By: igor CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D15381	2014-01-28 09:43:36 -08:00
Igor Canadi	4bf25357ae	[column families] Removing VersionSet::current() Summary: Instead of VersionSet::current(), DBImpl uses default_cfd_->current directly. Test Plan: make check Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15483	2014-01-27 17:04:46 -08:00
Mark Callaghan	90f29ccbef	Update monitoring to include average time per compaction and stall Summary: The new columns are msComp and msStall that provide average time per compaction and stall for that level in milliseconds. Level Files Size(MB) Score Time(sec) Read(MB) Write(MB) Rn(MB) Rnp1(MB) Wnew(MB) RW-Amplify Read(MB/s) Write(MB/s) Rn Rnp1 Wnp1 NewW Count msComp msStall Ln-stall Stall-cnt ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 0 8 15 1.5 2 0 30 0 0 30 0.0 0.0 15.5 0 0 0 0 16 112 0.2 1.3 7568 1 8 16 1.6 1 26 26 15 11 16 3.5 17.6 18.1 8 6 13 7 3 362 0.0 0.0 0 2 1 2 0.0 0 0 2 0 0 2 0.0 0.0 18.4 0 0 0 0 1 50 0.0 0.0 0 Task ID: # Blame Rev: Test Plan: run db_bench Revert Plan: Database Impact: Memcache Impact: Other Notes: EImportant: - begin PUBLIC platform impact section - Bugzilla: # - end platform impact - Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15345	2014-01-27 15:06:04 -08:00
Schalk-Willem Kruger	3d33da75ef	Fix UnmarkEOF for partial blocks Summary: Blocks in the transaction log are a fixed size, but the last block in the transaction log file is usually a partial block. When a new record is added after the reader hit the end of the file, a new physical record will be appended to the last block. ReadPhysicalRecord can only read full blocks and assumes that the file position indicator is aligned to the start of a block. If the reader is forced to read further by simply clearing the EOF flag, ReadPhysicalRecord will read a full block starting from somewhere in the middle of a real block, causing it to lose alignment and to have a partial physical record at the end of the read buffer. This will result in length mismatches and checksum failures. When the log file is tailed for replication this will cause the log iterator to become invalid, necessitating the creation of a new iterator which will have to read the log file from scratch. This diff fixes this issue by reading the remaining portion of the last block we read from. This is done when the reader is forced to read further (UnmarkEOF is called). Test Plan: - Added unit tests - Stress test (with replication). Check dbdir/LOG file for corruptions. - Test on test tier Reviewers: emayanke, haobo, dhruba Reviewed By: haobo CC: vamsi, sheki, dhruba, kailiu, igor Differential Revision: https://reviews.facebook.net/D15249	2014-01-27 14:49:10 -08:00
Igor Canadi	511b03a5b5	LogAndApply to take ColumnFamilyData Summary: This removes the default implementation of LogAndApply that applied the changed to the default column family by default. It is mostly simple reformatting. Test Plan: make check Reviewers: dhruba, kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15465	2014-01-27 13:57:58 -08:00
Igor Canadi	eb055609e4	[column families] Move memtable and immutable memtable list to column family data Summary: All memtables and immutable memtables are moved from DBImpl to ColumnFamilyData. For now, they are all referenced from default column family in DBImpl. It shouldn't be hard to get them from custom column family. Test Plan: make check Reviewers: dhruba, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D15459	2014-01-27 13:37:14 -08:00
Igor Canadi	ae16606f98	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set.h	2014-01-27 11:11:51 -08:00
Igor Canadi	832158e7f7	Fsync directory after we create a new file Summary: @dhruba, I'm not sure where we need to sync the directory. I implemented the function in Env() and added the dir sync just after we close the newly created file in the builder. Should I also add FsyncDir() to new files that get created by a compaction? Test Plan: Confirmed that FsyncDir is returning Status::OK() Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D14751	2014-01-27 11:02:21 -08:00
Siying Dong	b20486f294	[Performance Branch] HashLinkList to avoid to convert length prefixed string back to internal keys Summary: Converting from length prefixed buffer back to internal key costs some CPU but it is not necessary. In this patch, internal keys are pass though the functions so that we don't need to convert back to it. Test Plan: make all check Reviewers: haobo, kailiu Reviewed By: kailiu CC: igor, leveldb Differential Revision: https://reviews.facebook.net/D15393	2014-01-27 10:26:14 -08:00
Igor Canadi	cf783c674a	Merge branch 'master' into columnfamilies Conflicts: db/version_set.h	2014-01-27 10:00:24 -08:00
Igor Canadi	6c2ca1d3e6	Move NeedsCompaction() from VersionSet to Version Summary: There is no reason to have functions NeedCompaction(), MaxCompactionScore() and MaxCompactionScoreLevel() in VersionSet, since they don't access any data in VersionSet. Test Plan: make check Reviewers: kailiu, haobo, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15333	2014-01-27 09:59:00 -08:00
Igor Canadi	e55b3c040c	Fixing ref-counting memtables	2014-01-26 18:03:38 -08:00
Mike Lin	af7838de36	address code review comments on 5e3aeb5f8e - reduce string copying in Compaction::Summary - simplify file number checking in UniversalCompactionStopStyleSimilarSize unit test	2014-01-25 14:12:24 -08:00
Igor Canadi	983fafa56c	Fix memory leak	2014-01-25 13:57:11 -08:00
Kai Liu	4b51dffcf8	Some refactorings on plain table Summary: Plain table has been working well and this is just a nit-picking patch, which is generated during my coding reading. No real functional changes. only some changes regarding: * Improve some comments from the perspective a "new" code reader. * Change some magic number to constant, which can help us to parameterize them in the future. * Did some style, naming, C++ convention changes. * Fix warnings from new "arc lint" Test Plan: make check Reviewers: sdong, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D15429	2014-01-24 21:28:10 -08:00
Igor Canadi	68a91a2e6a	missing include	2014-01-24 18:40:05 -08:00
Igor Canadi	5356b2a680	Merge branch 'master' into columnfamilies	2014-01-24 18:34:48 -08:00
Igor Canadi	04afa32134	Fix reduce levels ReduceNumberOfLevels had segmentation fault in WriteSnapshot() since we didn't change the number of levels in VersionSet (we consider them immutable from now on). This fixes the problem.	2014-01-24 18:32:38 -08:00
Siying Dong	8477255da3	Moving Some includes from options.h to forward declaration Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies. Test Plan: make all check Reviewers: kailiu, haobo, igor, dhruba Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15411	2014-01-24 17:16:22 -08:00
Igor Canadi	6a404de4ba	Merge branch 'master' into columnfamilies	2014-01-24 15:54:18 -08:00
Igor Canadi	f653fdcf5a	Fixing iterator cleanup for Tailing iterator Immutable tailing iterator doesn't set CleanupState::mem, so we don't have to unref it.	2014-01-24 15:51:06 -08:00
Igor Canadi	1423e7c9de	Merge branch 'master' into columnfamilies Conflicts: db/version_set.cc db/version_set_reduce_num_levels.cc util/ldb_cmd.cc	2014-01-24 15:03:54 -08:00
Igor Canadi	b13bdfa500	Add a call DisownData() to Cache, which should speed up shutdown Summary: On a shutdown, freeing memory takes a long time. If we're shutting down, we don't really care about memory leaks. I added a call to Cache that will avoid freeing all objects in cache. Test Plan: I created a script to test the speedup and demonstrate how to use the call: https://phabricator.fb.com/P3864368 Clean shutdown took 7.2 seconds, while fast and dirty one took 6.3 seconds. Unfortunately, the speedup is not that big, but should be bigger with bigger block_cache. I have set up the capacity to 80GB, but the script filled up only ~7GB. Reviewers: dhruba, haobo, MarkCallaghan, xjin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15069	2014-01-24 14:57:52 -08:00
Igor Canadi	677fee27c6	Make VersionSet::ReduceNumberOfLevels() static Summary: A lot of our code implicitly assumes number_levels to be static. ReduceNumberOfLevels() breaks that assumption. For example, after calling ReduceNumberOfLevels(), DBImpl::NumberLevels() will be different from VersionSet::NumberLevels(). This is dangerous. Thankfully, it's not in public headers and is only used from LDB cmd tool. LDB tool is only using it statically, i.e. it never calls it with running DB instance. With this diff, we make it explicitly static. This way, we can assume number_levels to be immutable and not break assumption that lot of our code is relying upon. LDB tool can still use the method. Also, I removed the method from a separate file since it breaks filename completition. version_se<TAB> now completes to "version_set." instead of "version_set" (without the dot). I don't see a big reason that the function should be in a different file. Test Plan: reduce_levels_test Reviewers: dhruba, haobo, kailiu, sdong Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15303	2014-01-24 14:57:04 -08:00
Igor Canadi	c583157d49	MemTableListVersion Summary: MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful. This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team. I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet()) Test Plan: `make check` works. I will also do `make valgrind_check` before commit Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D15255	2014-01-24 14:52:08 -08:00
Kai Liu	0ab766132b	Re-org the table tests Summary: We'll divide the table tests into 3 buckets, plain table test, block-based table test and general table feature test. This diff does no real change and only does the rename and reorg. Test Plan: run table_test Reviewers: sdong, haobo, igor, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D15417	2014-01-24 12:46:17 -08:00
kailiu	f131d4c280	Add a make target for shared library Summary: Previous we made `make release` also compile shared library. However it takes a long time to complete. To make our development process more efficient. I added a new make target shared_lib. User can of course run `make <library_name>` for direct compilation. However the <library_name> changed under certain condition. Thus we need `make shared_lib` to get rid of the memorization from users' side. Test Plan: make shared_lib Reviewers: igor, sdong, haobo, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D15309	2014-01-24 11:56:01 -08:00
Igor Canadi	e832e72b31	Revert "Moving to glibc-fb" This reverts commit d24961b65edeb155ebfc9f31c33e5aae380a4fec. For some reason, glibc2.17-fb breaks gflags. Reverting for now	2014-01-24 11:50:38 -08:00
Kai Liu	7d991be400	Some small refactorings on table_test Summary: Just revise some hard-to-read or unnecessarily verbose code. Test Plan: make check	2014-01-24 11:10:32 -08:00
kailiu	66dc033af3	Temporarily disable caching index/filter blocks Summary: Mixing index/filter blocks with data blocks resulted in some known issues. To make sure in next release our users won't be affected, we added a new option in BlockBasedTableFactory::TableOption to conceal this functionality for now. This patch also introduced a BlockBasedTableReader::OpenOptions, which avoids the "infinite" growth of parameters in BlockBasedTableReader::Open(). Test Plan: make check Reviewers: haobo, sdong, igor, dhruba Reviewed By: igor CC: leveldb, tnovak Differential Revision: https://reviews.facebook.net/D15327	2014-01-24 10:57:15 -08:00

... 3 4 5 6 7 ...

1336 Commits