rocksdb

Author	SHA1	Message	Date
Dhruba Borthakur	5e9f3a9aa7	Better locking in vectorrep that increases throughput to match speed of storage. Summary: There is a use-case where we want to insert data into rocksdb as fast as possible. Vector rep is used for this purpose. The background flush thread needs to flush the vectorrep to storage. It acquires the dblock then sorts the vector, releases the dblock and then writes the sorted vector to storage. This is suboptimal because the lock is held during the sort, which prevents new writes for occuring. This patch moves the sorting of the vector rep to outside the db mutex. Performance is now as fastas the underlying storage system. If you are doing buffered writes to rocksdb files, then you can observe throughput upwards of 200 MB/sec writes. This is an early draft and not yet ready to be reviewed. Test Plan: make check Task ID: # Blame Rev: Reviewers: haobo Reviewed By: haobo CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D12987	2013-09-19 21:48:10 -07:00
Haobo Xu	4734dbb742	[RocksDB] Unit test to show Seek key comparison number Summary: Added SeekKeyComparison to show the uer key comparison incurred by Seek. Test Plan: make perf_context_test export LEVELDB_TESTS=DBTest.SeekKeyComparison ./perf_context_test --write_buffer_size=500000 --total_keys=10000 ./perf_context_test --write_buffer_size=250000 --total_keys=10000 Reviewers: dhruba, xjin Reviewed By: xjin CC: leveldb Differential Revision: https://reviews.facebook.net/D12843	2013-09-18 21:43:41 -07:00
Haobo Xu	72fcbf055d	[RocksDB] Fix DBTest.UniversalCompactionSizeAmplification too Summary: as title Test Plan: make db_test; ./db_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D13005	2013-09-17 21:29:33 -07:00
Haobo Xu	5b76338c01	[RocksDB] Fix DBTest.UniversalCompactionTrigger to reflect the correct compaction trigger condition. Summary: as title Test Plan: make db_test; ./db_test Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12981	2013-09-17 14:17:48 -07:00
Rajat Goel	11c65021fb	Revert "Minor fixes found while trying to compile it using clang on Mac OS X" This reverts commit 5f2c136c328a8dbb6c3cb3818881e30eeb916cd6.	2013-09-15 23:01:26 -07:00
Haobo Xu	1d8c57db23	[RocksDB] Universal compaction trigger condition minor fix Summary: Currently, when total number of files reaches level0_file_num_compaction_trigger, universal compaction will schedule a compaction job, but the job will not honor the compaction until the total number of files is level0_file_num_compaction_trigger+1. Fixed the condition for consistent behavior (start compaction on reaching level0_file_num_compaction_trigger). Test Plan: make check; db_stress Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12945	2013-09-15 22:35:59 -07:00
Rajat Goel	5f2c136c32	Minor fixes found while trying to compile it using clang on Mac OS X	2013-09-15 22:06:14 -07:00
Dhruba Borthakur	4012ca1c7b	Added a parameter to limit the maximum space amplification for universal compaction. Summary: Added a new field called max_size_amplification_ratio in the CompactionOptionsUniversal structure. This determines the maximum percentage overhead of space amplification. The size amplification is defined to be the ratio between the size of the oldest file to the sum of the sizes of all other files. If the size amplification exceeds the specified value, then min_merge_width and max_merge_width are ignored and a full compaction of all files is done. A value of 10 means that the size a database that stores 100 bytes of user data could occupy 110 bytes of physical storage. Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added. Reviewers: haobo, emayanke, xjin Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12825	2013-09-13 16:27:18 -07:00
Haobo Xu	0e422308aa	[RocksDB] Remove Log file immediately after memtable flush Summary: As title. The DB log file life cycle is tied up with the memtable it backs. Once the memtable is flushed to sst and committed, we should be able to delete the log file, without holding the mutex. This is part of the bigger change to avoid FindObsoleteFiles at runtime. It deals with log files. sst files will be dealt with later. Test Plan: make check; db_bench Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D11709	2013-09-12 11:54:44 -07:00
Haobo Xu	f2f4c8072f	[RocksDB] Added nano second stopwatch and new perf counters to track block read cost Summary: The pupose of this diff is to expose per user-call level precise timing of block read, so that we can answer questions like: a Get() costs me 100ms, is that somehow related to loading blocks from file system, or sth else? We will answer that with EXACTLY how many blocks have been read, how much time was spent on transfering the bytes from os, how much time was spent on checksum verification and how much time was spent on block decompression, just for that one Get. A nano second stopwatch was introduced to track time with higher precision. The cost/precision of the stopwatch is also measured in unit-test. On my dev box, retrieving one time instance costs about 30ns, on average. The deviation of timing results is good enough to track 100ns-1us level events. And the overhead could be safely ignored for 100us level events (10000 instances/s), for example, a viewstate thrift call. Test Plan: perf_context_test, also testing with viewstate shadow traffic. Reviewers: dhruba Reviewed By: dhruba CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D12351	2013-09-07 21:14:54 -07:00
Dhruba Borthakur	32c965d417	Flush was hanging because the configured options specified that more than 1 memtable need to be merged. Summary: There is an config option called Options.min_write_buffer_number_to_merge that specifies the minimum number of write buffers to merge in memory before flushing to a file in L0. But in the the case when the db is being closed, we should not be using this config, instead we should flush whatever write buffers were available at that time. Test Plan: Unit test attached. Reviewers: haobo, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12717	2013-09-06 16:28:33 -07:00
Dhruba Borthakur	197034e4c3	An iterator may automatically invoke reseeks. Summary: An iterator invokes reseek if the number of sequential skips over the same userkey exceeds a configured number. This makes iter->Next() faster (bacause of fewer key compares) if a large number of adjacent internal keys in a table (sst or memtable) have the same userkey. Test Plan: Unit test DBTest.IterReseek. Reviewers: emayanke, haobo, xjin Reviewed By: xjin CC: leveldb, xjin Differential Revision: https://reviews.facebook.net/D11865	2013-09-06 11:50:53 -07:00
Mayank Agarwal	aa5c897d42	Return pathname relative to db dir in LogFile and cleanup AppendSortedWalsOfType Summary: So that replication can just download from wherever LogFile.Pathname is pointing them. Test Plan: make all check;./db_repl_stress Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12609	2013-09-04 13:44:43 -07:00
Xing Jin	42c109cc2e	New ldb command to convert compaction style Summary: Add new command "change_compaction_style" to ldb tool. For universal->level, it shows "nothing to do". For level->universal, it compacts all files into a single one and moves the file to level 0. Also add check for number of files at level 1+ when opening db with universal compaction style. Test Plan: 'make all check'. New unit test for internal convertion function. Also manully test various cmd like: ./ldb change_compaction_style --old_compaction_style=0 --new_compaction_style=1 --db=/tmp/leveldbtest-3088/db_test Reviewers: haobo, dhruba Reviewed By: haobo CC: vamsi, emayanke Differential Revision: https://reviews.facebook.net/D12603	2013-09-04 13:13:08 -07:00
Mayank Agarwal	c34271a5a5	Fix bug in Counters and record Sequencenumber using only TickerCount Summary: The way counters/statistics are implemented in rocksdb demands that enum Tickers and TickerNameMap follow the same order, otherwise statistics exposed from fbcode/rocks get out-of-sync. 2 counters for prefix had violated this order and when I built counters for fbcode/mcrocksdb, statistics for sequence number were appearing out-of-sync. The other change is to record sequence-number using setTickerCount only and not recordTick. This is because of difference in statistics as understood by rocks/utils which uses ServiceData::statistics function and rocksdb statistics. In rocksdb there is just 1 counter for a countername. But in ServiceData there are 4 independent buckets for every countername-Count, Sum, Average and Rate. SetTickerCount and RecordTick update the same variable in rocksdb but different buckets in ServiceData. Therefore, I had to choose one consistent function from RecordTick or SetTickerCount for sequence number in rocksdb. I chose SetTickerCount because the statistics object in options passed during rocksdb-open is user-dependent and SetTickerCount makes sense there. There will be a corresponding diff to mcorcksdb in fbcode shortly. Test Plan: make all check; check ticker value using fprintfs Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12669	2013-09-01 17:59:32 -07:00
Mayank Agarwal	ab5c5c28fe	Fix build caused by DeleteFile not tolerating / at the beginning Summary: db->DeleteFile calls ParseFileName to check name that was returned for sst file. Now, sst filename is returned using TableFileName which uses MakeFileName. This puts a / at the front of the name and ParseFileName doesn't like that. Changed ParseFileName to tolerate /s at the beginning. The test delet_file_test used to pass earlier because this behaviour of MakeFileName had been changed a while back to not return a / during which delete_file_test was checked in. But MakeFileName had to be reverted to add / at the front because GetLiveFiles used at many places outside rocksdb used the previous behaviour of MakeFileName. Test Plan: make;./delete_filetest;make all check Reviewers: dhruba, haobo, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12663	2013-09-01 17:59:13 -07:00
Mayank Agarwal	46dcf51ca5	Return a '/' before names of all files through MakeFileName Summary: // won't hurt but a missing / hurts sometimes Test Plan: make all check; ./db_repl_stress Reviewers: vamsi Reviewed By: vamsi CC: dhruba Differential Revision: https://reviews.facebook.net/D12621	2013-08-30 14:25:22 -07:00
Dhruba Borthakur	59de2dbad7	Cleanup DeleteFile API Summary: The DeleteFile API was removing files inside the db-lock. This is now changed to remove files outside the db-lock. The GetLiveFilesMetadata() returns the smallest and largest seqnuence number of each file as well. Test Plan: deletefile_test Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Maniphest Tasks: T63 Differential Revision: https://reviews.facebook.net/D12567	2013-08-28 21:18:58 -07:00
Haobo Xu	48e5ea0c34	[RocksDB] Fix TransformRepFactory related valgrind problem Summary: Let TransformRepFactory own the passed in transform. Also make it better encapsulated. Test Plan: make valgrind_check; Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D12591	2013-08-28 19:27:54 -07:00
Dhruba Borthakur	fc0c399d2e	Introduced a new flag non_blocking_io in ReadOptions. Summary: If ReadOptions.non_blocking_io is set to true, then KeyMayExists and Iterators will return data that is cached in RAM. If the Iterator needs to do IO from storage to serve the data, then the Iterator.status() will return Status::IsRetry(). Test Plan: Enhanced unit test DBTest.KeyMayExist to detect if there were are IOs issues from storage. Added DBTest.NonBlockingIteration to verify nonblocking Iterations. Reviewers: emayanke, haobo Reviewed By: haobo CC: leveldb Maniphest Tasks: T63 Differential Revision: https://reviews.facebook.net/D12531	2013-08-28 10:49:14 -07:00
Haobo Xu	43eef52001	[RocksDB] move stats counting outside of mutex protected region for DB::Get() Summary: As title. This is possible as tickers are atomic now. db_bench on high qps in-memory muti-thread random get workload, showed ~5% throughput improvement. Test Plan: make check; db_bench; db_stress Reviewers: dhruba Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12555	2013-08-27 13:36:10 -07:00
Mayank Agarwal	dad2731729	Fix bug in KeyMayExist Summary: In KeyMayExist.db_test we do a Flush which causes sst file to be written and added as open file in TableCache, but block cache for the file is not populated. So value_found should have been false where it was true and KeyMayExist.db_test should not have passed earlier. But it passed because BlockReader in table/table.cc takes 2 default arguments at the end called for_compaction and no_io. Although I passed no_io=true from InternalGet to BlockReader, but it understood for_compaction=true and defaulted no_io to false. This is a bug and although will be removed by Dhruba's new patch to incorporate no_io in readoptions, I'm submitting this patch to fix this bug independently of that patch. Test Plan: make all check Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12537	2013-08-26 08:45:58 -07:00
Mayank Agarwal	b1074ac24f	Use initializer list for VersionSet Summary: initialiszer list is fasteri/preferable because it can straightaway call the constructor for this object, otherwise it will be created first and then again initialized. Although gain may not be much in this case because files_ is just a pointer and not a complex object, this is recommended practice. Test Plan: make all check Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12519	2013-08-24 18:16:01 -07:00
Deon Nicholas	573844807c	Fix for no_io Summary: Oops. My bad. Test Plan: Make all check Reviewers: emayanke Reviewed By: emayanke CC: haobo, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D12525	2013-08-23 16:36:01 -07:00
Tyler Harter	4504c99030	Internal/user key bug fix. Summary: Fix code so that the filter_block layer only assumes keys are internal when prefix_extractor is set. Test Plan: ./filter_block_test Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12501	2013-08-23 14:49:57 -07:00
Dhruba Borthakur	1186192ed1	Replace include/leveldb with include/rocksdb. Summary: Replace include/leveldb with include/rocksdb. Test Plan: make clean; make check make clean; make release Differential Revision: https://reviews.facebook.net/D12489	2013-08-23 10:51:00 -07:00
Jim Paton	74781a0c49	Add three new MemTableRep's Summary: This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep. UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that. VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector. PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets. I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed). Test Plan: make -j32 check ./db_stress --memtablerep=vector ./db_stress --memtablerep=unsorted ./db_stress --memtablerep=prefixhash --prefix_size=10 Reviewers: dhruba, haobo, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12117	2013-08-22 23:10:02 -07:00
Xing Jin	17dc128048	Pull from https://reviews.facebook.net/D10917 Summary: Pull Mark's patch and slightly revise it. I revised another place in db_impl.cc with similar new formula. Test Plan: make all check. Also run "time ./db_bench --num=2500000000 --numdistinct=2200000000". It has run for 20+ hours and hasn't finished. Looks good so far: Installed stack trace handler for SIGILL SIGSEGV SIGBUS SIGABRT LevelDB: version 2.0 Date: Tue Aug 20 23:11:55 2013 CPU: 32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz CPUCache: 20480 KB Keys: 16 bytes each Values: 100 bytes each (50 bytes after compression) Entries: 2500000000 RawSize: 276565.6 MB (estimated) FileSize: 157356.3 MB (estimated) Write rate limit: 0 Compression: snappy WARNING: Assertions are enabled; benchmarks unnecessarily slow ------------------------------------------------ DB path: [/tmp/leveldbtest-3088/dbbench] fillseq : 7202.000 micros/op 138 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] fillsync : 7148.000 micros/op 139 ops/sec; (2500000 ops) DB path: [/tmp/leveldbtest-3088/dbbench] fillrandom : 7105.000 micros/op 140 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] overwrite : 6930.000 micros/op 144 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] readrandom : 1.020 micros/op 980507 ops/sec; (0 of 2500000000 found) DB path: [/tmp/leveldbtest-3088/dbbench] readrandom : 1.021 micros/op 979620 ops/sec; (0 of 2500000000 found) DB path: [/tmp/leveldbtest-3088/dbbench] readseq : 113.000 micros/op 8849 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] readreverse : 102.000 micros/op 9803 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] Created bg thread 0x7f0ac17f7700 compact : 111701.000 micros/op 8 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] readrandom : 1.020 micros/op 980376 ops/sec; (0 of 2500000000 found) DB path: [/tmp/leveldbtest-3088/dbbench] readseq : 120.000 micros/op 8333 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] readreverse : 29.000 micros/op 34482 ops/sec; DB path: [/tmp/leveldbtest-3088/dbbench] ... finished 618100000 ops Reviewers: MarkCallaghan, haobo, dhruba, chip Reviewed By: dhruba Differential Revision: https://reviews.facebook.net/D12441	2013-08-22 22:37:13 -07:00
Tyler Harter	94cf218720	Revert "Prefix scan: db_bench and bug fixes" This reverts commit c2bd8f4824bda98db8699f1e08d6969cf21ef86f.	2013-08-22 18:01:11 -07:00
Tyler Harter	c2bd8f4824	Prefix scan: db_bench and bug fixes Summary: If use_prefix_filters is set and read_range>1, then the random seeks will set a the prefix filter to be the prefix of the key which was randomly selected as the target. Still need to add statistics (perhaps in a separate diff). Test Plan: ./db_bench --benchmarks=fillseq,prefixscanrandom --num=10000000 --statistics=1 --use_prefix_blooms=1 --use_prefix_api=1 --bloom_bits=10 Reviewers: dhruba Reviewed By: dhruba CC: leveldb, haobo Differential Revision: https://reviews.facebook.net/D12273	2013-08-22 16:06:50 -07:00
Simha Venkataramaiah	60bf2b7d4a	Add APIs to query SST file metadata and to delete specific SST files Summary: An api to query the level, key ranges, size etc for each SST file and an api to delete a specific file from the db and all associated state in the bookkeeping datastructures. Notes: Editing the manifest version does not release the obsolete files right away. However deleting the file directly will mess up the iterator. We may need a more aggressive/timely file deletion api. I have used std::unique_ptr - will switch to boost:: since this is external. thoughts? Unit test is fragile right now as it expects the compaction at certain levels. Test Plan: unittest Reviewers: dhruba, vamsi, emayanke CC: zshao, leveldb, haobo Task ID: # Blame Rev:	2013-08-22 15:27:19 -07:00
Jim Paton	cb703c9d03	Allow WriteBatch::Handler to abort iteration Summary: Sometimes you don't need to iterate through the whole WriteBatch. This diff makes the Handler member functions return a bool that indicates whether to abort or not. If they return true, the iteration stops. One thing I just thought of is that this will break backwards-compability. Maybe it would be better to add a virtual member function WriteBatch::Handler::ShouldAbort() that returns false by default. Comments requested. I still have to add a new unit test for the abort code, but let's finalize the API first. Test Plan: make -j32 check Reviewers: dhruba, haobo, vamsi, emayanke Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12339	2013-08-21 18:27:48 -07:00
Haobo Xu	f9e2decf7c	[RocksDB] Minor iterator cleanup Summary: Was going through the iterator related code, did some cleanup along the way. Basically replaced array with vector and adopted range based loop where applicable. Test Plan: make check; make valgrind_check Reviewers: dhruba, emayanke Reviewed By: emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D12435	2013-08-21 16:54:48 -07:00
Deon Nicholas	b87dcae1a3	Made merge_oprator a shared_ptr; and added TTL unit tests Test Plan: - make all check; - make release; - make stringappend_test; ./stringappend_test Reviewers: haobo, emayanke Reviewed By: haobo CC: leveldb, kailiu Differential Revision: https://reviews.facebook.net/D12381	2013-08-20 13:35:28 -07:00
Mayank Agarwal	8a3547d38e	API for getting archived log files Summary: Also expanded class LogFile to have startSequene and FileSize and exposed it publicly Test Plan: make all check Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12087	2013-08-19 13:37:04 -07:00
Deon Nicholas	e1346968d8	Merge operator fixes part 1. Summary: -Added null checks and revisions to DBIter::MergeValuesNewToOld() -Added DBIter test to stringappend_test -Major fix with Merge and TTL More plans for fixes later. Test Plan: -make clean; make stringappend_test -j 32; ./stringappend_test -make all check; Reviewers: haobo, emayanke, vamsi, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12315	2013-08-19 11:42:47 -07:00
Mayank Agarwal	387ac0f1e1	Expose statistic for sequence number and implement setTickerCount Summary: statistic for sequence number is needed by wormhole. setTickerCount is demanded for this statistic. I can't simply recordTick(max_sequence) when db recovers because the statistic iobject is owned by client and may/may not be reset during reopen. Eg. statistic is reset in mcrocksdb whereas it is not in db_stress. Therefore it is best to go with setTickerCount Test Plan: ./db_stress ... --statistics=1 and observed expected sequence number Reviewers: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12327	2013-08-15 23:00:20 -07:00
Deon Nicholas	d1d3d15eb7	Tiny fix to db_bench for make release. Summary: In release, "found variable assigned but not used anywhere". Changed it to work with assert. Someone accept this :). Test Plan: make release -j 32 Reviewers: haobo, dhruba, emayanke Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D12309	2013-08-15 17:50:12 -07:00
Deon Nicholas	ad48c3c262	Benchmarking for Merge Operator Summary: Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag so that the tester can easily benchmark different merge operators. Currently supports the PutOperator and UInt64Add operator. Support for stringappend or list append may come later. Test Plan: 1. make db_bench 2. Test the PutOperator (simulating Put) as follows: ./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put --threads=2 3. Test the UInt64AddOperator (simulating numeric addition) similarly: ./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=uint64add --threads=2 Reviewers: haobo, dhruba, zshao, MarkCallaghan Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D11535	2013-08-15 17:13:07 -07:00
Xing Jin	0a5afd1afc	Minor fix to current codes Summary: Minor fix to current codes, including: coding style, output format, comments. No major logic change. There are only 2 real changes, please see my inline comments. Test Plan: make all check Reviewers: haobo, dhruba, emayanke Differential Revision: https://reviews.facebook.net/D12297	2013-08-14 23:03:57 -07:00
Jim Paton	0307c5fe3a	Implement log blobs Summary: This patch adds the ability for the user to add sequences of arbitrary data (blobs) to write batches. These blobs are saved to the log along with everything else in the write batch. You can add multiple blobs per WriteBatch and the ordering of blobs, puts, merges, and deletes are preserved. Blobs are not saves to SST files. RocksDB ignores blobs in every way except for writing them to the log. Before committing this patch, I need to add some test code. But I'm submitting it now so people can comment on the API. Test Plan: make -j32 check Reviewers: dhruba, haobo, vamsi Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D12195	2013-08-14 16:32:46 -07:00
Haobo Xu	d9dd2a1926	[RocksDB] Expose thread local perf counter for low overhead, per call level performance statistics. Summary: As title. No locking/atomic is needed due to thread local. There is also no need to modify the existing client interface, in order to expose related counters. perf_context_test shows a simple example of retrieving the number of user key comparison done for each put and get call. More counters could be added later. Sample output ./perf_context_test 1000000 ==== Test PerfContextTest.KeyComparisonCount Inserting 1000000 key/value pairs ... total user key comparison get: 43446523 total user key comparison put: 8017877 max user key comparison get: 88939 avg user key comparison get:43 Basically, the current skiplist does well on average, but could perform poorly in extreme cases. Test Plan: run perf_context_test <total number of entries to put/get> Reviewers: dhruba Differential Revision: https://reviews.facebook.net/D12225	2013-08-14 15:24:06 -07:00
Mayank Agarwal	f1bf169484	Counter for merge failure Summary: With Merge returning bool, it can keep failing silently(eg. While faling to fetch timestamp in TTL). We need to detect this through a rocksdb counter which can get bumped whenever Merge returns false. This will also be super-useful for the mcrocksdb-counter service where Merge may fail. Added a counter NUMBER_MERGE_FAILURES and appropriately updated db/merge_helper.cc I felt that it would be better to directly add counter-bumping in Merge as a default function of MergeOperator class but user should not be aware of this, so this approach seems better to me. Test Plan: make all check Reviewers: dnicholas, haobo, dhruba, vamsi CC: leveldb Differential Revision: https://reviews.facebook.net/D12129	2013-08-13 14:25:42 -07:00
Tyler Harter	f5f1842282	Prefix filters for scans (v4) Summary: Similar to v2 (db and table code understands prefixes), but use ReadOptions as in v3. Also, make the CreateFilter code faster and cleaner. Test Plan: make db_test; export LEVELDB_TESTS=PrefixScan; ./db_test Reviewers: dhruba Reviewed By: dhruba CC: haobo, emayanke Differential Revision: https://reviews.facebook.net/D12027	2013-08-13 14:04:56 -07:00
sumeet	3b81df34bd	Separate compaction filter for each compaction Summary: If we have same compaction filter for each compaction, application cannot know about the different compaction processes. Later on, we can put in more details in compaction filter for the application to consume and use it according to its needs. For e.g. In the universal compaction, we have a compaction process involving all the files while others don't involve all the files. Applications may want to collect some stats only when during full compaction. Test Plan: run existing unit tests Reviewers: haobo, dhruba Reviewed By: dhruba CC: xinyaohu, leveldb Differential Revision: https://reviews.facebook.net/D12057	2013-08-13 10:56:20 -07:00
Dhruba Borthakur	03bd4461ad	Merge branch 'performance' of github.com:facebook/rocksdb into performance	2013-08-12 10:31:21 -07:00
Dhruba Borthakur	f3967a5132	Merge remote-tracking branch 'origin' into performance	2013-08-12 09:58:50 -07:00
Dhruba Borthakur	93d77a27d2	Universal Compaction should keep DeleteMarkers unless it is the earliest file. Summary: The pre-existing code was purging a DeleteMarker if thay key did not exist in deeper levels. But in the Universal Compaction Style, all files are in Level0. For compaction runs that did not include the earliest file, we were erroneously purging the DeleteMarkers. The fix is to purge DeleteMarkers only if the compaction includes the earlist file. Test Plan: DBTest.Randomized triggers this code path. Differential Revision: https://reviews.facebook.net/D12081	2013-08-09 14:03:57 -07:00
Xing Jin	8ae905ed63	Fix unit tests for universal compaction (step 2) Summary: Continue fixing existing unit tests for universal compaction. I have tried to apply universal compaction to all unit tests those haven't called ChangeOptions(). I left a few which are either apparently not applicable to universal compaction (because they check files/keys/values at level 1 or above levels), or apparently not related to compaction (e.g., open a file, open a db). I also add a new unit test for universal compaction. Good news is I didn't see any bugs during this round. Test Plan: Ran "make all check" yesterday. Has rebased and is rerunning Reviewers: haobo, dhruba Differential Revision: https://reviews.facebook.net/D12135	2013-08-09 13:35:44 -07:00
Haobo Xu	3a3b1c3e6c	[RocksDB] Improve manifest dump to print internal keys in hex for version edits. Summary: Currently, VersionEdit::DebugString always display internal keys in the original ascii format. This could cause manifest dump to be truncated if internal keys contain special charactors (like null). Also added an option --input_key_hex for ldb idump to indicate that the passed in user keys are in hex. Test Plan: run ldb manifest_dump Reviewers: dhruba, emayanke CC: leveldb Differential Revision: https://reviews.facebook.net/D12111	2013-08-08 16:19:01 -07:00

1 2 3 4 5 ...

377 Commits