rocksdb

Author	SHA1	Message	Date
Kai Liu	1966b63137	Merge branch 'master' into perf	2013-11-27 11:47:40 -08:00
Haobo Xu	4e6463ea44	[RocksDB][Performance Branch] Make height and branching factor configurable for skiplist implementation Summary: As title. Especially, HashSkipListRepFactory will be able to specify a relatively small height, to reduce the memory overhead of one skiplist per bucket. Test Plan: make check and test it on leaf4 Reviewers: dhruba, sdong, kailiu CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14307	2013-11-26 21:59:36 -08:00
Dhruba Borthakur	8478f380a0	During benchmarking, I see excessive use of vector.reserve(). Summary: This code path can potentially accumulate multiple important_files for level 0. But for other levels, it should have only one file in the important_files, so it is ok not to reserve excessive space, is it not? Test Plan: make check Reviewers: haobo Reviewed By: haobo CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14349	2013-11-26 07:47:08 -08:00
Dhruba Borthakur	27bbef1180	Free obsolete memtables outside the dbmutex. Summary: Large memory allocations and frees are costly and best done outside the db-mutex. The memtables are already allocated outside the db-mutex but they were being freed while holding the db-mutex. This patch frees obsolete memtables outside the db-mutex. Test Plan: make check db_stress Unit tests pass, I am in the process of running stress tests. Reviewers: haobo, igor, emayanke Reviewed By: haobo CC: reconnect.grayhat, leveldb Differential Revision: https://reviews.facebook.net/D14319	2013-11-25 21:04:48 -08:00
Igor Canadi	3ce3658411	DB::GetOptions() Summary: We need access to options for BackupableDB Test Plan: make check Reviewers: dhruba Reviewed By: dhruba CC: leveldb, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14331	2013-11-25 15:51:50 -08:00
Igor Canadi	11c26bd4a4	[RocksDB] Interface changes required for BackupableDB Summary: This is part of https://reviews.facebook.net/D14295 -- smaller diff that is easier to review Test Plan: make asan_check Reviewers: dhruba, haobo, emayanke Reviewed By: emayanke CC: leveldb, kailiu, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14301	2013-11-25 12:39:23 -08:00
Dhruba Borthakur	299f5c76bb	Create new log file outside the dbmutex. Summary: All filesystem Io should be done outside the dbmutex. There was one place when we have to roll the transaction log that we were creating the new log file while holding the dbmutex. I rearranged this code so that the act of creating the new transaction log file is done without holding the dbmutex. I also allocate the new memtable outside the dbmutex, this is important because creating the memtable could be heavyweight. Test Plan: make check and dbstress Reviewers: haobo, igor Reviewed By: haobo CC: leveldb, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14283	2013-11-25 11:23:42 -08:00
Haobo Xu	5b825d6964	[RocksDB] Use raw pointer instead of shared pointer when passing Statistics object internally Summary: liveness of the statistics object is already ensured by the shared pointer in DB options. There's no reason to pass again shared pointer among internal functions. Raw pointer is sufficient and efficient. Test Plan: make check Reviewers: dhruba, MarkCallaghan, igor Reviewed By: dhruba CC: leveldb, reconnect.grayhat Differential Revision: https://reviews.facebook.net/D14289	2013-11-25 10:38:15 -08:00
Siying Dong	718488abc5	Add BloomFilter to PlainTableIterator::Seek() Summary: This patch adds a simple bloom filter in PlainTableIterator::Seek() Test Plan: N/A Reviewers: CC: Task ID: # Blame Rev:	2013-11-21 22:26:39 -08:00
kailiu	0c93df912e	Improve the readability of the TableProperties::ToString()	2013-11-21 17:54:23 -08:00
Siying Dong	3e35aa6412	Revert "Allow users to profile a query and see bottleneck of the query" This reverts commit `3d8ac31d71`.	2013-11-21 17:40:39 -08:00
Siying Dong	b135d01e7b	Allow users to profile a query and see bottleneck of the query Summary: Provide a framework to profile a query in detail to figure out the latency bottleneck. Currently, in Get(), Put() and iterators, 2-3 simple timings are used. We can easily add more profile counters to the framework later. Test Plan: Enable this profiling in several existing tests. Reviewers: haobo, dhruba, kailiu, emayanke, vamsi, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14001 Conflicts: table/merger.cc	2013-11-21 17:39:19 -08:00
Siying Dong	3d8ac31d71	Allow users to profile a query and see bottleneck of the query Summary: Provide a framework to profile a query in detail to figure out the latency bottleneck. Currently, in Get(), Put() and iterators, 2-3 simple timings are used. We can easily add more profile counters to the framework later. Test Plan: Enable this profiling in several existing tests. Reviewers: haobo, dhruba, kailiu, emayanke, vamsi, igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14001	2013-11-21 16:29:57 -08:00
Siying Dong	58e1956d50	[Only for Performance Branch] A Hacky patch to lazily generate memtable key for prefix-hashed memtables. Summary: For prefix mem tables, encoding mem table key may be unnecessary if the prefix doesn't have any key. This patch is a little bit hacky but I want to try out the performance gain of removing this lazy initialization. In longer term, we might want to revisit the way we abstract mem tables implementations. Test Plan: make all check Reviewers: haobo, igor, kailiu Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14265	2013-11-20 20:49:23 -08:00
Siying Dong	b59d4d5a50	A Simple Plain Table Summary: A Simple plain table format. No block structure. When creating the table reader, scanning the full table to create indexes. Test Plan:Add unit test Reviewers:haobo,dhruba,kailiu CC: Task ID: # Blame Rev:	2013-11-20 18:44:22 -08:00
Siying Dong	071fb0d77b	Inline a couple of functions and put one save lazily clearing Summary: Make several functions inline. Also, in DBIter.Seek() make value cleanup happen lazily. These are for the use case where Seek() is called many times but few calls return values. Test Plan: make all check Differential Revision: https://reviews.facebook.net/D14217	2013-11-20 17:32:57 -08:00
Haobo Xu	37b459f0aa	[RocksDB] Test diff on performance branch Summary: trivial comment change Test Plan: Go through the steps of developing under the performance branch Reviewers: dhruba, kailiu, sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D14259	2013-11-20 14:34:52 -08:00
Haobo Xu	a617227a36	[RocksDB] fix prefix_test Summary: user comparator needs to work if either input is prefix only. Test Plan: ./prefix_test --write_buffer_size=100000 --total_prefixes=10000 --items_per_prefix=10 Reviewers: dhruba, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D14241	2013-11-20 09:16:23 -08:00
kailiu	6eb5649800	Move flush_block_policy from Options to TableFactory Summary: Previously we introduced a `flush_block_policy_factory` in Options; however, that option is strongly related to block based tables. It makes more sense to move it to the block based table's own factory class. Test Plan: make check to pass existing tests Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14211	2013-11-19 22:00:48 -08:00
kailiu	1415f8820d	Improve the "table stats" Summary: The primary motivation of the changes is to make it easier to figure out the inside of the tables. * rename "table stats" to "table properties" since now we have more than "integers" to store in the property block. * Add filter block size to the basic table properties. * Whenever a table is built, we'll log the table properties (the sample output is in Test Plan). * Make an api to expose deleted keys. Test Plan: Passed all existing test. and the sample output of table stats: ================================================================== Basic Properties ------------------------------------------------------------------ # data blocks: 1 # entries: 1 raw key size: 9 raw average key size: 9 raw value size: 9 raw average value size: 0 data block size: 25 index block size: 27 filter block size: 18 (estimated) table size: 70 filter policy: rocksdb.BuiltinBloomFilter ================================================================== User collected properties: InternalKeyPropertiesCollector ------------------------------------------------------------------ kDeletedKeys: 1 ================================================================== Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14187	2013-11-19 16:29:42 -08:00
Igor Canadi	fc61428288	Include <unistd.h> in db_test Summary: This is the only compile issue in Ubuntu. It might be better to include <unistd.h> only in env_posix and add Truncate function to Env, but since we use truncate only in db_test, I don't think it makes much sense. Test Plan: Rocksdb now compiles on Ubuntu! Reviewers: dhruba, kailiu Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14127	2013-11-17 21:58:16 -08:00
Igor Canadi	de9ce7d439	Upgrading compiler to gcc4.8.1 Summary: Finally did it - the trick was in using --dynamic-linker option. This is first step to running ASAN. All of our code seems to compile just fine on 4.8.1. However, I still left fbcode.471.sh in the 'build_tools/' just in case. Test Plan: make clean; make Reviewers: dhruba, haobo, kailiu, emayanke, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14109	2013-11-17 13:52:55 -08:00
Kai Liu	75df72f2a5	Change the logic in KeyMayExist() Summary: Previously in KeyMayExist(), if DB::Get() returned anything other than Status::OK(), we assumed the key may not exist. However, if the index block is not in the block cache, Status::Incomplete() is returned. Worse still, if options::filter_delete is enabled, we may falsely ignore the "delete" operation: https://github.com/facebook/rocksdb/blob/master/db/write_batch.cc#L217-L220 This diff fixes this bug and will let crash-test pass. Test Plan: Ran: ./db_stress --test_batches_snapshots=1 --ops_per_thread=1000000 --threads=32 --write_buffer_size=4194304 --destroy_db_initially=1 --reopen=0 --readpercent=5 --prefixpercent=45 --writepercent=35 --delpercent=5 --iterpercent=10 --db=/home/kailiu/local/newer --max_key=100000000 --disable_seek_compaction=0 --mmap_read=0 --block_size=16384 --cache_size=1048576 --open_files=500000 --verify_checksum=1 --sync=0 --disable_wal=0 --disable_data_sync=0 --target_file_size_base=2097152 --target_file_size_multiplier=2 --max_write_buffer_number=3 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --filter_deletes=1 Previously the crash happened very soon. Reviewers: igor, dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14115	2013-11-17 01:00:34 -08:00
kailiu	97d8e573a6	make util/env_posix.cc work under mac Summary: This diff involves some more complicated issues in the posix environment. Test Plan: works under mac os. will need to verify dev box. Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14061	2013-11-16 23:44:39 -08:00
Pascal Borreli	443e04e62d	Fixed typos	2013-11-16 11:21:34 +00:00
Igor Canadi	21905dd4a8	Start DeleteFileTest with clean plate Summary: Remove all the files from the test dir before the test. The test failed when there were some old files still in the directory, since it checks the file counts. This is what caused jenkins' test failures. It was running fine on my machine so it was hard to repro. Test Plan: 1. create an extra 000001.log file in the test directory 2. run a ./deletefile_test - test fails 3. patch ./deletefile_test with this 4. test succeeds Reviewers: haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14097	2013-11-15 16:30:23 -08:00
Igor Canadi	29c931f70b	Avoid populating live set if we don't need to Summary: Also changed some comments Test Plan: ./deletefile_test Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14091	2013-11-14 22:42:02 -08:00
Igor Canadi	a0ce3fd00a	PurgeObsoleteFiles() unittest Summary: Created a unittest that verifies that automatic deletion performed by PurgeObsoleteFiles() works correctly. Also, few small fixes on the logic part -- call version_set_->GetObsoleteFiles() in FindObsoleteFiles() instead of on some arbitrary positions. Test Plan: Created a unit test Reviewers: dhruba, haobo, nkg- Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14079	2013-11-14 18:03:57 -08:00
Vamsi Ponnekanti	94dde686bb	[Merge operand meant for key K is being applied on wrong key] Summary: We iterate until we find a different key than original key. ikey is pointing to next key when we break out of loop. After the loop we apply all merge operands meant for original key on the next key! Test Plan: Need to give a build to Marcin to test out. Revert Plan: OK Task ID: #3181932 Reviewers: haobo, emayanke, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14073	2013-11-14 17:13:24 -08:00
Igor Canadi	fda8142f29	Delete log files in the correct dir Summary: Log files are stored in wal_dir, not dbname_ Test Plan: deletefile_test Reviewers: nkg- Reviewed By: nkg- CC: leveldb Differential Revision: https://reviews.facebook.net/D14067	2013-11-13 14:54:54 -08:00
Kai Liu	88ba331c1a	Add the index/filter block cache Summary: This diff leverages the existing block cache and extends it to cache index/filter blocks. Test Plan: Added new tests in db_test and table_test The correctness is checked by: 1. make check 2. make valgrind_check Performance is tested by: 1. 10 times of build_tools/regression_build_test.sh on two versions of rocksdb before/after the code change. Test results suggest no significant difference between them. For the two key operations `overwrite` and `readrandom`, the average iops are both 20k and ~260k, with very small variance. 2. db_stress. Reviewers: dhruba Reviewed By: dhruba CC: leveldb, haobo, xjin Differential Revision: https://reviews.facebook.net/D13167	2013-11-12 22:46:51 -08:00
Kai Liu	35460ccb53	Fix the string format issue Summary: mac and our dev server have totally different definitions of uint64_t, so fixing the warning on mac actually made the code on linux uncompilable. Test Plan: make clean && make -j32	2013-11-12 21:05:39 -08:00
Igor Canadi	d88d8ecf80	Fix deleting files Summary: One more fix! In some cases, our filenames start with "/". Apparently, env_ can't handle filenames with double // Test Plan: deletefile_test does not include this line in the LOG anymore: 2013/11/12-18:11:43.150149 7fe4a6fff700 RenameFile logfile #3 FAILED -- IO error: /tmp/rocksdbtest-3574/deletefile_test//000003.log: No such file or directory Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14055	2013-11-12 20:32:07 -08:00
kailiu	21587760b9	Fixing the warning messages captured under mac os # Consider using `git commit -m 'One line title' && arc diff`. # You will save time by running lint and unit in the background. Summary: The work to make sure mac os compiles rocksdb is not completed yet. But at least we can start cleaning some warnings captured only by g++ from mac os.. Test Plan: ran make in mac os Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14049	2013-11-12 20:05:28 -08:00
Igor Canadi	9df2b217e9	Move fast and break things Summary: Broke the compile when I removed purge_log_after_memtable_flush. sorrybus Test Plan: make db_bench works now Reviewers: haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D14037	2013-11-12 12:42:42 -08:00
Igor Canadi	9bc4a26f56	Small changes in Deleting obsolete files Summary: @haobo's suggestions from https://reviews.facebook.net/D13827 Renaming some variables, deprecating purge_log_after_flush, changing for loop into auto for loop. I have not implemented deleting objects outside of mutex yet because it would require a big code change - we would delete object in db_impl, which currently does not know anything about object because it's defined in version_edit.h (FileMetaData). We should do it at some point, though. Test Plan: Ran deletefile_test Reviewers: haobo Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14025	2013-11-12 11:53:26 -08:00
Igor Canadi	dad425562f	Move the comment Summary: Moving the comment per @haobo suggestion. Test Plan: No Reviewers: haobo Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D14019	2013-11-12 10:07:55 -08:00
Igor Canadi	4abd219cfc	Combine two FindObsoleteFiles() Summary: We don't need to call FindObsoleteFiles() twice Test Plan: deletefile_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D14007	2013-11-11 21:41:32 -08:00
Igor Canadi	94e139f94d	Fixing failed delete file test Summary: FindObsoleteFiles() has to be called before PurgeObsoleteFiles() because FindObsoleteFiles() sets manifest_file_number, log_number and prev_log_number to valid values. Test Plan: deletefile_test now works Reviewers: dhruba, emayanke, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D13995	2013-11-11 21:03:41 -08:00
Dhruba Borthakur	318a4919d2	Fix valgrind check by initialising DeletionState. Summary: The valgrind error was introduced by commit `1510339e52`. Initialize DeletionState in constructor. Test Plan: valgrind --leak-check=yes ./deletefile_test Reviewers: igor, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D13983	2013-11-11 16:01:13 -08:00
lovro	8a46ecd357	WriteBatch::Put() overload that gathers key and value from arrays of slices Summary: In our project, when writing to the database, we want to form the value as the concatenation of a small header and a larger payload. It's a shame to have to copy the payload just so we can give RocksDB API a linear view of the value. Since RocksDB makes a copy internally, it's easy to support gather writes. Test Plan: write_batch_test, new test case Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13947	2013-11-08 16:34:32 -08:00
Igor Canadi	1510339e52	Speed up FindObsoleteFiles Summary: Here's one solution we discussed on speeding up FindObsoleteFiles. Keep a set of all files in DBImpl and update the set every time we create a file. I probably missed few other spots where we create a file. It might speed things up a bit, but makes code uglier. I don't really like it. Much better approach would be to abstract all file handling to a separate class. Think of it as layer between DBImpl and Env. Having a separate class deal with file namings and deletion would benefit both code cleanliness (especially with huge DBImpl) and speed things up. It will take a huge effort to do this, though. Let's discuss offline today. Test Plan: Ran ./db_stress, verified that files are getting deleted Reviewers: dhruba, haobo, kailiu, emayanke Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D13827	2013-11-08 15:23:46 -08:00
Igor Canadi	dd218bbc88	Forgot to change interface everywhere Summary: Changed the name and interface for creating HashSkipListRep. Forgot to change it in db_test. Test Plan: make db_test Reviewers: haobo Reviewed By: haobo Differential Revision: https://reviews.facebook.net/D13965	2013-11-08 12:23:12 -08:00
Igor Canadi	8b3379dc0a	Implementing DynamicIterator for TransformRepNoLock Summary: What @haobo did with TransformRep, now in TransformRepNoLock. Similar implementation, except that I made DynamicIterator a subclass of Iterator, which lets me have fewer iterator initializations. Test Plan: ./prefix_test. Seeing huge savings vs. TransformRep again! Reviewers: dhruba, haobo, sdong, kailiu Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D13953	2013-11-08 00:31:09 -08:00
Kai Liu	fd075d6edd	Provide mechanism to configure when to flush the block Summary: Allow block based table to configure how blocks are flushed. This feature will allow us to add support for prefix-aligned blocks. Test Plan: make check Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D13875	2013-11-07 21:27:21 -08:00
Kai Liu	bba6595b1f	Fix the valgrind error Summary: I this bug from valgrind report and found a place that may potentially leak memory. Test Plan: re-ran the valgrind and no error any more Reviewers: emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13959	2013-11-07 15:46:48 -08:00
Igor Canadi	444cf88a56	Flush the log outside of lock Summary: Added a new call LogFlush() that flushes the log contents to the OS buffers. We never call it with lock held. We call it once for every Read/Write and often in compaction/flush process so the frequency should not be a problem. Test Plan: db_test Reviewers: dhruba, haobo, kailiu, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13935	2013-11-07 11:31:56 -08:00
Haobo Xu	fd2044883a	[RocksDB] Generalize prefix-aware iterator to be used for more than one Seek Summary: Added a prefix_seek flag in ReadOptions to indicate that Seek is prefix aware(might not return data with different prefix), and also not bound to a specific prefix. Multiple Seeks and range scans can be invoked on the same iterator. If a specific prefix is specified, this flag will be ignored. Just a quick prototype that works for PrefixHashRep, the new lockless memtable could be easily extended with this support too. Test Plan: test it on Leaf Reviewers: dhruba, kailiu, sdong, igor Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D13929	2013-11-06 20:45:49 -08:00
shamdor	c2be2cba04	WAL log retention policy based on archive size. Summary: Archive cleaning will still happen every WAL_ttl seconds but archived logs will be deleted only if archive size is greater than a WAL_size_limit value. Empty archived logs will be deleted every WAL_ttl. Test Plan: 1. Unit tests pass. 2. Benchmark. Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D13869	2013-11-06 18:46:28 -08:00
Igor Canadi	be96f2498e	TransformRep - use array instead of unordered_map Summary: I'm sending this diff together with https://reviews.facebook.net/D13881 because it didn't allow me to send only the array one. Here I also replaced unordered_map with just an array of shared_ptrs. This eliminated all the locks. I will run the new benchmark and post the results here. Test Plan: db_test Reviewers: dhruba, haobo Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D13893	2013-11-06 11:55:43 -08:00

480 Commits