rocksdb

Author	SHA1	Message	Date
Igor Canadi	767777c2bd	Turn on -Wshorten-64-to-32 and fix all the errors Summary: We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details. This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables. Test Plan: compiles Reviewers: ljin, rven, yhchiang, sdong Reviewed By: yhchiang Subscribers: bobbaldwin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28689	2014-11-11 16:47:22 -05:00
Igor Canadi	113796c493	Fix NewFileNumber() Summary: I mistakenly changed the behavior to ++next_file_number_ instead of next_file_number_++, as it should have been: `344edbb044/db/version_set.h (L539)` Test Plan: none. not sure if this would break anything. It's just different behavior, so I'd rather not risk Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28557	2014-11-11 06:58:47 -08:00
Igor Canadi	c7ee9c3ab7	Fix -Wnon-virtual-dtor errors Summary: This breaks mongo+rocks build https://mci.10gen.com/task_log_raw/rocksdb_ubuntu1404_rocksdb_c6e8e3d868660dc66b3bbd438cdc135df6356c5a_14_11_10_21_36_10_compile_ubuntu1404_rocksdb/0?type=T Test Plan: m check + -Wnon-virtual-dtor Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28653	2014-11-10 17:39:38 -05:00
sdong	dd726a59ef	Bump Version Number to 3.8 Summary: As tittle. Test Plan: Not needed. Reviewers: ljin, igor, yhchiang, rven, fpi Reviewed By: fpi Subscribers: leveldb, fpi, dhruba Differential Revision: https://reviews.facebook.net/D28629	2014-11-10 12:10:56 -08:00
Federico Piccinini	543df158c5	Expose sst_dump functionality as library call. Summary: Refactor sst_dump to follow the same structure as ldb. Introduce a SSTDump interface. Test Plan: built sst_dump and tried it manually. Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28485	2014-11-07 17:23:58 -08:00
Yueh-Hsuan Chiang	344edbb044	Fixed the shadowing in db/compaction.cc and include/rocksdb/db.h Summary: Fixed the shadowing in db/compaction.cc and include/rocksdb/db.h Test Plan: make	2014-11-07 15:22:10 -08:00
Yueh-Hsuan Chiang	8447861896	Fixed -WShadow errors in db/db_test.cc and include/rocksdb/metadata.h Summary: Fixed -WShadow errors in db/db_test.cc and include/rocksdb/metadata.h Test Plan: make	2014-11-07 14:57:51 -08:00
Yueh-Hsuan Chiang	28c82ff1b3	CompactFiles, EventListener and GetDatabaseMetaData Summary: This diff adds three sets of APIs to RocksDB. = GetColumnFamilyMetaData = * This APIs allow users to obtain the current state of a RocksDB instance on one column family. * See GetColumnFamilyMetaData in include/rocksdb/db.h = EventListener = * A virtual class that allows users to implement a set of call-back functions which will be called when specific events of a RocksDB instance happens. * To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners = CompactFiles = * CompactFiles API inputs a set of file numbers and an output level, and RocksDB will try to compact those files into the specified level. = Example = * Example code can be found in example/compact_files_example.cc, which implements a simple external compactor using EventListener, GetColumnFamilyMetaData, and CompactFiles API. Test Plan: listener_test compactor_test example/compact_files_example export ROCKSDB_TESTS=CompactFiles db_test export ROCKSDB_TESTS=MetaData db_test Reviewers: ljin, igor, rven, sdong Reviewed By: sdong Subscribers: MarkCallaghan, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D24705	2014-11-07 14:45:18 -08:00
Igor Canadi	31342c4005	Fix implicit compare	2014-11-07 12:41:05 -08:00
Igor Canadi	9f20395cd6	Turn -Wshadow back on Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc. Test Plan: compiles Reviewers: ljin, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28131	2014-11-06 11:14:28 -08:00
sdong	ac95ae1b5d	Make sure WAL is synced for DB::Write() if write batch is empty Summary: This patch makes it a contract that if an empty write batch is passed to DB::Write() and WriteOptions.sync = true, fsync is called to WAL. Test Plan: A new unit test Reviewers: ljin, rven, yhchiang, igor Reviewed By: igor Subscribers: dhruba, MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D28365	2014-11-06 09:48:19 -08:00
Lei Jin	29a9161f34	Note dynamic options in options.h Summary: as title Test Plan: n/a Reviewers: igor, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28287	2014-11-04 16:23:45 -08:00
Lei Jin	fd24ae9d05	SetOptions() to return status and also add it to StackableDB Summary: as title Test Plan: ./db_test Reviewers: sdong, yhchiang, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28269	2014-11-04 16:23:05 -08:00
sdong	83bf09144b	Bump verison number to 3.7 Summary: As tittle Test Plan: N/A Reviewers: ljin, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D28299	2014-11-04 14:52:02 -08:00
sdong	09899f0b51	DB::Open() to automatically increase thread pool size if it is smaller than max number of parallel compactions or flushes Summary: With the patch, thread pool size will be automatically increased if DB's options ask for more parallelism of compactions or flushes. Too many users have been confused by the API. Change it to make it harder for users to make mistakes Test Plan: Add two unit tests to cover the function. Reviewers: yhchiang, rven, igor, MarkCallaghan, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27555	2014-11-03 17:22:34 -08:00
Jonah Cohen	c1a924b9f0	Move convenience.h to /include Summary: Move header file so it can be referenced externally. Test Plan: Rebuild. Reviewers: ljin Reviewed By: ljin Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D28095	2014-10-31 12:08:43 -07:00
Igor Canadi	c2999f54bd	Revert "tmp" This reverts commit `9ab0132360`.	2014-10-29 15:29:33 -07:00
Lei Jin	9ab0132360	tmp Summary: Test Plan: Reviewers: CC: Task ID: # Blame Rev:	2014-10-29 13:36:47 -07:00
Lei Jin	f1841985e4	dynamic inplace_update options Summary: Make inplace_update_support and inplace_update_num_locks dynamic. inplace_callback becomes immutable We are almost free of references to cfd->options() in db_impl Test Plan: unit test Reviewers: igor, yhchiang, rven, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25293	2014-10-27 12:10:13 -07:00
Lei Jin	2dd9bfe3a8	Sanitize block-based table index type and check prefix_extractor Summary: Respond to issue reported https://www.facebook.com/groups/rocksdb.dev/permalink/651090261656158/ Change the Sanitize signature to take both DBOptions and CFOptions Test Plan: unit test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25041	2014-10-17 21:18:36 -07:00
Igor Canadi	833357402c	WriteBatchWithIndex supports an iterator that merge its change with a base iterator. Summary: Add an iterator that combines base_iterator of type Iterator* with delta iterator of type WBWIIterator*. Test Plan: nothing yet. work in progress Reviewers: ljin, igor Reviewed By: igor Subscribers: rven, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D24741	2014-10-10 19:02:58 -07:00
sdong	4f65fbd197	WriteBatchWithIndex's iterator to support SeekToFirst(), SeekToLast() and Prev() Summary: Support SeekToFirst(), SeekToLast() and Prev() in WBWIIterator, returned by WriteBatchWithIndex::NewIterator(). Test Plan: Write unit test cases to cover the case. Reviewers: ljin, igor Reviewed By: igor Subscribers: rven, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D24765	2014-10-10 16:19:34 -07:00
sdong	f441b273ae	WriteBatchWithIndex to support an option to overwrite rows when operating the same key Summary: With a new option, when accepting a new key, WriteBatchWithIndex will find an existing index of the same key, and replace the content of it. Test Plan: Add a unit test case. Reviewers: ljin, yhchiang, rven, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24753	2014-10-10 15:19:21 -07:00
Lei Jin	cd0d581ff5	convert Options from string Summary: Allow accepting Options as a string of key/value pairs Test Plan: unit test Reviewers: yhchiang, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D24597	2014-10-10 10:00:12 -07:00
Tomislav Novak	88edfd90ae	SkipListRep::LookaheadIterator Summary: This diff introduces the `lookahead` argument to `SkipListFactory()`. This is an optimization for the tailing use case which includes many seeks. E.g. consider the following operations on a skip list iterator: Seek(x), Next(), Next(), Seek(x+2), Next(), Seek(x+3), Next(), Next(), ... If `lookahead` is positive, `SkipListRep` will return an iterator which also keeps track of the previously visited node. Seek() then first does a linear search starting from that node (up to `lookahead` steps). As in the tailing example above, this may require fewer than ~log(n) comparisons as with regular skip list search. Test Plan: Added a new benchmark (`fillseekseq`) which simulates the usage pattern. It first writes N records (with consecutive keys), then measures how much time it takes to read them by calling `Seek()` and `Next()`. $ time ./db_bench -num 10000000 -benchmarks fillseekseq -prefix_size 1 \ -key_size 8 -write_buffer_size $[102410241024] -value_size 50 \ -seekseq_next 2 -skip_list_lookahead=0 [...] DB path: [/dev/shm/rocksdbtest/dbbench] fillseekseq : 0.389 micros/op 2569047 ops/sec; real 0m21.806s user 0m12.106s sys 0m9.672s $ time ./db_bench [...] -skip_list_lookahead=2 [...] DB path: [/dev/shm/rocksdbtest/dbbench] fillseekseq : 0.153 micros/op 6540684 ops/sec; real 0m19.469s user 0m10.192s sys 0m9.252s Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb, march, lovro Differential Revision: https://reviews.facebook.net/D23997	2014-10-07 11:48:23 -07:00
Igor Canadi	0908ddcea5	Don't keep managing two rocksdb version Summary: Before this diff, there are two places with rocksdb versions. After the diff: 1. we only have one source of truth for rocksdb version 2. we have a script that we can use to get the version that we can use in other compilations (java, go, etc). Test Plan: make Reviewers: yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24333	2014-10-02 11:59:22 -07:00
Lei Jin	5ec53f3edf	make compaction related options changeable Summary: make compaction related options changeable. Most of changes are tedious, following the same convention: grabs MutableCFOptions at the beginning of compaction under mutex, then pass it throughout the job and register it in SuperVersion at the end. Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23349	2014-10-01 16:19:16 -07:00
fyrz	5340484266	Built-in comparator(s) in RocksJava Extended Built-in comparators with ReverseBytewiseComparator. Reverse key handling is under certain conditions essential. E.g. while using timestamp versioned data. As native-comparators were not available using JAVA-API. Both built-in comparators were exposed via JNI to be set upon database creation time.	2014-09-26 10:35:12 +02:00
Lei Jin	c6275956e2	improve memory efficiency of cuckoo reader Summary: When creating a new iterator, instead of storing mapping from key to bucket id for sorting, store only bucket id and read key from mmap file based on the id. This reduces from 20 bytes per entry to only 4 bytes. Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23757	2014-09-25 16:15:23 -07:00
Lei Jin	581442d446	option to choose module when calculating CuckooTable hash Summary: Using module to calculate hash makes lookup ~8% slower. But it has its benefit: file size is more predictable, more space enffient Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23691	2014-09-25 13:53:27 -07:00
Igor Canadi	21ddcf6e4f	Remove allow_thread_local Summary: See https://reviews.facebook.net/D19365 Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23907	2014-09-24 13:12:16 -07:00
Lei Jin	0a29ce5393	re-enable BlockBasedTable::SetupForCompaction() Summary: It was commented out in D22545 by accident. Keep the option in ImmutableOptions for now. I can make it dynamic in https://reviews.facebook.net/D23349 Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23865	2014-09-23 14:18:57 -07:00
sdong	d0de413f4d	WriteBatchWithIndex to allow different Comparators for different column families Summary: Previously, one single column family is given to WriteBatchWithIndex to index keys for all column families. An extra map from column family ID to comparator is maintained which can override the default comparator given in the constructor. A WriteBatchWithIndex::SetComparatorForCF() is added for user to add comparators per column family. Also move more codes into anonymous namespace. Test Plan: Add a unit test Reviewers: ljin, igor Reviewed By: igor Subscribers: dhruba, leveldb, yhchiang Differential Revision: https://reviews.facebook.net/D23355	2014-09-22 13:47:39 -07:00
Lei Jin	57a32f147f	change target_file_size_base to uint64_t Summary: It contrains the file size to be 4G max with int Test Plan: tried to grep instance and made sure other related variables are also uint64 Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23697	2014-09-22 11:15:03 -07:00
Lei Jin	51af7c326c	CuckooTable: add one option to allow identity function for the first hash function Summary: MurmurHash becomes expensive when we do millions Get() a second in one thread. Add this option to allow the first hash function to use identity function as hash function. It results in QPS increase from 3.7M/s to ~4.3M/s. I did not observe improvement for end to end RocksDB performance. This may be caused by other bottlenecks that I will address in a separate diff. Test Plan: ``` [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320 [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320 ``` Reviewers: sdong, igor, yhchiang Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23451	2014-09-18 11:00:48 -07:00
Lei Jin	a062e1f2c4	SetOptions() for memtable related options Summary: as title Test Plan: make all check I will think a way to set up stress test for this Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23055	2014-09-17 12:49:13 -07:00
Lei Jin	e4eca6a1e5	Options conversion function for convenience Summary: as title Test Plan: options_test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23283	2014-09-17 12:46:32 -07:00
Igor Canadi	4a27a2f193	Don't sync manifest when disableDataSync = true Summary: As we discussed offline Test Plan: compiles Reviewers: yhchiang, sdong, ljin, dhruba Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22989	2014-09-15 11:32:01 -07:00
Jonah Cohen	092f97e219	Fix comments and typos Summary: Correct some comments and typos in RocksDB. Test Plan: Inspection Reviewers: sdong, igor Reviewed By: igor Differential Revision: https://reviews.facebook.net/D23133	2014-09-09 15:20:49 -07:00
Xiaozheng Tie	6cc12860f0	Added a few statistics for BackupableDB Summary: Added the following statistics to BackupableDB: 1. Number of successful and failed backups in class BackupStatistics 2. Time taken to do a backup 3. Number of files in a backup 1 is implemented in the BackupStatistics class 2 and 3 are added in the BackupMeta and BackupInfo class Test Plan: 1 can be tested using BackupStatistics::ToString(), 2 and 3 can be tested in the BackupInfo class Reviewers: sdong, igor2, ljin, igor Reviewed By: igor Differential Revision: https://reviews.facebook.net/D22785	2014-09-09 13:44:42 -07:00
Lei Jin	52311463e9	MemTableOptions Summary: removed reference to options in WriteBatch and DBImpl::Get() Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23049	2014-09-08 18:46:52 -07:00
Lei Jin	659d2d50c3	move compaction_filter to immutable_options Summary: all shared_ptrs are in immutable_options now. This will also make options assignment a little cheaper Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23001	2014-09-08 15:09:25 -07:00
Lei Jin	048560a642	reduce references to cfd->options() in DBImpl Summary: I found it is almost impossible to get rid of this function in a single batch. I will take a step by step approach Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22995	2014-09-08 15:04:34 -07:00
Igor Canadi	a2bb7c3c33	Push- instead of pull-model for managing Write stalls Summary: Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either: * proceed with all writes without delay * delay all writes by fixed time * stop all writes The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case). When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal. This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write. Test Plan: make check for now. I'll add some unit tests later. Also, perf test. Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22791	2014-09-08 11:20:25 -07:00
Feng Zhu	0af157f9bf	Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit `1d23b5c470` Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979	2014-09-08 10:37:05 -07:00
Nik Bougalis	d1cfb71ec7	Remove unused member(s)	2014-09-05 20:50:29 -07:00
liuhuahang	bb6ae0f80c	fix more compile warnings N/A Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-05 14:14:37 +08:00
Lei Jin	5665e5e285	introduce ImmutableOptions Summary: As a preparation to support updating some options dynamically, I'd like to first introduce ImmutableOptions, which is a subset of Options that cannot be changed during the course of a DB lifetime without restart. ColumnFamily will keep both Options and ImmutableOptions. Any component below ColumnFamily should only take ImmutableOptions in their constructor. Other options should be taken from APIs, which will be allowed to adjust dynamically. I am yet to make changes to memtable and other related classes to take ImmutableOptions in their ctor. That can be done in a seprate diff as this one is already pretty big. Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D22545	2014-09-04 16:18:36 -07:00
Raghav Pisolkar	e0b99d4f5d	created a new ReadOptions parameter 'iterate_upper_bound'	2014-09-04 11:00:16 -07:00
Lei Jin	703c3eacd9	comments about the BlockBasedTableOptions migration in Options Summary: as title Test Plan: none Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22737	2014-09-03 17:01:34 -07:00
Lei Jin	9b58c73c7c	call SanitizeDBOptionsByCFOptions() in the right place Summary: It only covers Open() with default column family right now Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22467	2014-09-02 14:42:23 -07:00
Igor Canadi	a84234a61b	Ignore missing column families Summary: Before this diff, whenever we Write to non-existing column family, Write() would fail. This diff adds an option to not fail a Write() when WriteBatch points to non-existing column family. MongoDB said this would be useful for them, since they might have a transaction updating an index that was dropped by another thread. This way, they don't have to worry about checking if all indexes are alive on every write. They don't care if they lose writes to dropped index. Test Plan: added a small unit test Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22143	2014-09-02 13:29:05 -07:00
Feng Zhu	8438a19360	fix dropping column family bug Summary: 1. db/db_impl.cc:2324 (DBImpl::BackgroundCompaction) should not raise bg_error_ when column family is dropped during compaction. Test Plan: 1. db_stress Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22653	2014-09-02 12:25:58 -07:00
Radheshyam Balasundaram	7f71448388	Implementing a cache friendly version of Cuckoo Hash Summary: This implements a cache friendly version of Cuckoo Hash in which, in case of collission, we try to insert in next few locations. The size of the neighborhood to check is taken as an input parameter in builder and stored in the table. Test Plan: make check all cuckoo_table_{db,reader,builder}_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22455	2014-08-28 10:42:23 -07:00
Igor Canadi	d5bd6c772b	Fix ios compile Summary: No __thread for ios. Test Plan: compile works for ios now Reviewers: ljin, dhruba Reviewed By: dhruba Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22491	2014-08-28 12:46:05 -04:00
Lei Jin	1755581f19	improve OptimizeForPointLookup() Summary: also fix HISTORY.md Test Plan: make all check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22437	2014-08-26 14:15:00 -07:00
Lei Jin	23861857c4	ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled Summary: as title Test Plan: table_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22239	2014-08-25 16:14:30 -07:00
Lei Jin	a98badff16	print table options Summary: Add a virtual function in table factory that will print table options Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22149	2014-08-25 14:24:09 -07:00
Lei Jin	384400128f	move block based table related options BlockBasedTableOptions Summary: I will move compression related options in a separate diff since this diff is already pretty lengthy. I guess I will also need to change JNI accordingly :( Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21915	2014-08-25 14:22:05 -07:00
Andrew Bonventre	050869177e	Add missing include to use std::unique_ptr This was causing issues when including this header from another file.	2014-08-23 13:02:21 -04:00
Yueh-Hsuan Chiang	63a2215c63	Improve Options sanitization and add MmapReadRequired() to TableFactory Summary: Currently, PlainTable must use mmap_reads. When PlainTable is used but allow_mmap_reads is not set, rocksdb will fail in flush. This diff improve Options sanitization and add MmapReadRequired() to TableFactory. Test Plan: export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest make db_test -j32 ./db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: you, leveldb Differential Revision: https://reviews.facebook.net/D21939	2014-08-20 15:53:39 -07:00
sdong	28b5c76004	WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index Summary: Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write. WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset. Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries. I will add more unit tests if people are OK about this API. Test Plan: make all check Add unit tests. Reviewers: yhchiang, igor, MarkCallaghan, ljin Reviewed By: ljin Subscribers: dhruba, leveldb, xjin Differential Revision: https://reviews.facebook.net/D21381	2014-08-18 16:37:38 -07:00
sdong	68eed8c9f0	Bump up version Summary: Bump up version after we've cut 3.4 Test Plan: N/A Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D22047	2014-08-18 12:02:02 -07:00
Lei Jin	58c49466d2	Allow env_posix to lower background thread IO priority Summary: This is a linux-specific system call. Test Plan: ran db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: haobo, leveldb Differential Revision: https://reviews.facebook.net/D21183	2014-08-13 20:49:58 -07:00
Lei Jin	5a5953b388	Add histogram for DB_SEEK Summary: as title Test Plan: make release Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21717	2014-08-13 15:56:37 -07:00
Lei Jin	5d0074c471	set bytes_per_sync to 1MB if rate limiter is enabled Summary: as title Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21201	2014-08-12 16:42:18 -07:00
Radheshyam Balasundaram	9674c11d01	Integrating Cuckoo Hash SST Table format into RocksDB Summary: Contains the following changes: - Implementation of cuckoo_table_factory - Adding cuckoo table into AdaptiveTableFactory - Adding cuckoo_table_db_test, similar to lines of plain_table_db_test - Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key. - Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test cuckoo_table_db_test make check all make valgrind_check make asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21219	2014-08-11 20:21:07 -07:00
Lei Jin	37c6740c38	make statistics ToString function empty instead of pure virtual Summary: as title Test Plan: make release Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21549	2014-08-11 15:04:41 -07:00
Spencer Kimball	38e8b727a8	Fix typo, add missing inclusion of state void* in invocation of create_compaction_filter_v2_.	2014-08-06 18:42:15 -04:00
Spencer Kimball	c1f588af71	Add support for C bindings to the compaction V2 filter mechanism. Test Plan: make c_test && ./c_test Some fixes after merge.	2014-08-06 15:55:48 -04:00
Igor Canadi	9c5a3f4746	FeatureSet DebugString Summary: This will help debugging Test Plan: ran, observed output Reviewers: yinwang Reviewed By: yinwang Differential Revision: https://reviews.facebook.net/D20937	2014-08-01 08:55:37 -07:00
Demon	7ef7df005f	fix project name in the comments fix project name in the comments	2014-07-31 11:42:36 +08:00
Igor Canadi	99e03bcbf1	Better comment for inplace_update_support Summary: See https://github.com/facebook/rocksdb/issues/215 Test Plan: none Reviewers: dhruba, sdong, ljin, yhchiang, nkg- Reviewed By: nkg- Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20769	2014-07-30 09:32:47 -07:00
sdong	f04356e660	Add DB::GetIntProperty() to return integer properties to be returned as integers Summary: We have quite some properties that are integers and we are adding more. Add a function to directly return them as an integer, instead of a string Test Plan: Add several unit test checks Reviewers: yhchiang, igor, dhruba, haobo, ljin Reviewed By: ljin Subscribers: yoshinorim, leveldb Differential Revision: https://reviews.facebook.net/D20637	2014-07-28 16:55:57 -07:00
Lei Jin	40fa8a4cd5	make statistics forward-able Summary: Make StatisticsImpl being able to forward stats to provided statistics implementation. The main purpose is to allow us to collect internal stats in the future even when user supplies custom statistics implementation. It avoids intrumenting 2 sets of stats collection code. One immediate use case is tuning advisor, which needs to collect some internal stats, users may not be interested. Test Plan: ran db_bench and see stats show up at the end of run Will run make all check since some tests rely on statistics Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20145	2014-07-28 12:05:36 -07:00
Radheshyam Balasundaram	62f9b071ff	Implementation of CuckooTableReader Summary: Contains: - Implementation of TableReader based on Cuckoo Hashing - Unittests for CuckooTableReader - Performance test for TableReader Test Plan: make cuckoo_table_reader_test ./cuckoo_table_reader_test make valgrind_check make asan_check Reviewers: yhchiang, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20511	2014-07-25 16:37:32 -07:00
Lei Jin	d650612c4c	expose RateLimiter definition Summary: User gets undefinied error since the definition is not exposed. Also re-enable the db test with only upper bound check Test Plan: db_test, rate_limit_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20403	2014-07-25 15:17:06 -07:00
Igor Canadi	0754d4cb3b	SpatialDB change API Summary: I changed SpatialDB API so that we only specify list of indexes when we create the database. That way, whoever is querying the DB doesn't need to know the full list of indexes and their options. Test Plan: spatial_db_test Reviewers: yinwang Reviewed By: yinwang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20571	2014-07-24 16:39:33 -04:00
Radheshyam Balasundaram	07a7d870b8	Addressing TODOs in CuckooTableBuilder Summary: Contains the following changes in CuckooTableBuilder: - Take an extra parameter in constructor to identify last level file. - Implement a better way to identify if a bucket has been inserted into the tree already during BFS search. - Minor typos Test Plan: make cuckoo_table_builder ./cuckoo_table_builder make valgrind_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20445	2014-07-24 10:07:41 -07:00
Igor Canadi	6296330417	SpatialDB Summary: This diff is adding spatial index support to RocksDB. When creating the DB user specifies a list of spatial indexes. Spatial indexes can cover different areas and have different resolution (i.e. number of tiles). This is useful for supporting different zoom levels. Each element inserted into SpatialDB has: * a bounding box, which determines how will the element be indexed * string blob, which will usually be WKB representation of the polygon (http://en.wikipedia.org/wiki/Well-known_text) * feature set, which is a map of key-value pairs, where value can be int, double, bool, null or a string. FeatureSet will be a set of tags associated with geo elements (for example, 'road': 'highway' and similar) * a list of indexes to insert the element in. For example, small river element will be inserted in index for high zoom level, while country border will be inserted in all indexes (including the index for low zoom level). Each query is executed on single spatial index. Query guarantees that it will return all elements intersecting the specified bounding box, but it might also return some extra non-intersecting elements. Test Plan: Added bunch of unit tests in spatial_db_test Reviewers: dhruba, yinwang Reviewed By: yinwang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20361	2014-07-23 14:22:58 -04:00
Igor Canadi	1053358a84	Bump the version	2014-07-23 10:21:54 -04:00
Igor Canadi	0ff183a0d9	Move include/utilities/.h to include/rocksdb/utilities/.h Summary: All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this: #include <rocksdb/db.h> #include <utilities/backupable_db.h> // still our public header! Also, internally, we include: #include "utilities/backupable/backupable_db.h" // internal header #include "utilities/backupable_db.h" // public header which is confusing. This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files. Test Plan: compiles Reviewers: dhruba, ljin, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20409	2014-07-23 10:21:38 -04:00
sdong	e6de02103a	Add a utility function to guess optimized options based on constraints Summary: Add a function GetOptions(), where based on four parameters users give: read/write amplification threshold, memory budget for mem tables and target DB size, it picks up a compaction style and parameters for them. Background threads are not touched yet. One limit of this algorithm: since compression rate and key/value size are hard to predict, it's hard to predict level 0 file size from write buffer size. Simply make 1:1 ratio here. Sample results: https://reviews.facebook.net/P477 Test Plan: Will add some a unit test where some sample scenarios are given and see they pick the results that make sense Reviewers: yhchiang, dhruba, haobo, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18741	2014-07-22 15:24:21 -07:00
sdong	f6b7e1ed1a	Allow user to specify DB path of output file of manual compaction Summary: Add a parameter path_id to DB::CompactRange(), to indicate where the output file should be placed to. Test Plan: add a unit test Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, dhruba, MarkCallaghan, leveldb Differential Revision: https://reviews.facebook.net/D20085	2014-07-21 19:06:00 -07:00
Radheshyam Balasundaram	cf3da899b0	Adding a new SST table builder based on Cuckoo Hashing Summary: Cuckoo Hashing based SST table builder. Contains: - Cuckoo Hashing logic and file storage logic. - Unit tests for logic Test Plan: make cuckoo_table_builder_test ./cuckoo_table_builder_test make check all Reviewers: yhchiang, igor, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19545	2014-07-21 13:26:09 -07:00
Lei Jin	9c0d84d240	improve comments for CrateRateLimiter() Summary: Suggested by @dhruba from the other diff, here is the improved comments for parameters of the function Test Plan: none Reviewers: dhruba, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19623	2014-07-21 11:03:16 -07:00
Stanislau Hlebik	9d70cce047	Adding option to save PlainTable index and bloom filter in SST file. Summary: Adding option to save PlainTable index and bloom filter in SST file. If there is no bloom block and/or index block, PlainTableReader builds new ones. Otherwise PlainTableReader just use these blocks. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19527	2014-07-18 16:58:13 -07:00
Stanislau Hlebik	92d73cbe78	Add PlainTableOptions Summary: Since we have a lot of options for PlainTable, add a struct PlainTableOptions to manage them Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20175	2014-07-18 00:08:38 -07:00
sdong	0abaed2e08	Support multiple DB directories in universal compaction style Summary: This patch adds a target size parameter in options.db_paths and universal compaction will base it to determine which DB path to place a new file. Level-style stays the same. Test Plan: Add new unit tests Reviewers: ljin, yhchiang Reviewed By: yhchiang Subscribers: MarkCallaghan, dhruba, igor, leveldb Differential Revision: https://reviews.facebook.net/D19869	2014-07-15 12:06:28 -07:00
Igor Canadi	20c056306b	Remove stats logger Summary: Browsing through the code, looks like StatsLogger is not used at all! Test Plan: compiles Reviewers: ljin, sdong, yhchiang, dhruba Reviewed By: dhruba Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19827	2014-07-15 09:16:32 -04:00
sdong	dd6c444822	Improve Put()'s comment to indicate that the key is overwritten if existing Summary: As title Test Plan: Not needed for comment only. Reviewers: yhchiang, ljin, MarkCallaghan Reviewed By: MarkCallaghan Subscribers: xjin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19887	2014-07-14 14:43:59 -07:00
Reed Allman	1fc71a4b16	C API: create missing cf's, cleanup	2014-07-10 12:55:53 -07:00
sdong	01700b6911	Update master to version 3.3 Summary: As tittle Test Plan: no need Reviewers: igor, yhchiang, ljin Reviewed By: ljin Subscribers: haobo, dhruba, xjin, leveldb Differential Revision: https://reviews.facebook.net/D19629	2014-07-10 11:59:35 -07:00
sdong	36de0e5359	Add a function to return current perf level Summary: Add a function to return the perf level. It is to allow a wrapper of DB to increase the perf level and restore the original perf level after finishing the function call. Test Plan: Add a verification in db_test Reviewers: yhchiang, igor, ljin Reviewed By: ljin Subscribers: xjin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19551	2014-07-10 11:35:48 -07:00
Stanislau Hlebik	30c81e7717	Removing NewTotalOrderPlainTableFactory Summary: Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect. Total order mode indicator is prefix_extractor == nullptr, but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests in plain_table_db_tests is incorrect. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19587	2014-07-10 11:32:04 -07:00
Igor Canadi	f0a8be253e	JSON (Document) API sketch Summary: This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API. I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :) Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time. Test Plan: Added a simple unit test Reviewers: haobo, yhchiang, dhruba, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18747	2014-07-10 09:31:42 -07:00
Lei Jin	534357ca3a	integrate rate limiter into rocksdb Summary: Add option and plugin rate limiter for PosixWritableFile. The rate limiter only applies to flush and compaction. WAL and MANIFEST are excluded from this enforcement. Test Plan: db_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19425	2014-07-08 12:31:49 -07:00
Lei Jin	5ef1ba7ff5	generic rate limiter Summary: A generic rate limiter that can be shared by threads and rocksdb instances. Will use this to smooth out write traffic generated by compaction and flush. This will help us get better p99 behavior on flash storage. Test Plan: unit test output ==== Test RateLimiterTest.Rate request size [1 - 1023], limit 10 KB/sec, actual rate: 10.374969 KB/sec, elapsed 2002265 request size [1 - 2047], limit 20 KB/sec, actual rate: 20.771242 KB/sec, elapsed 2002139 request size [1 - 4095], limit 40 KB/sec, actual rate: 41.285299 KB/sec, elapsed 2202424 request size [1 - 8191], limit 80 KB/sec, actual rate: 81.371605 KB/sec, elapsed 2402558 request size [1 - 16383], limit 160 KB/sec, actual rate: 162.541268 KB/sec, elapsed 3303500 Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19359	2014-07-08 11:41:57 -07:00
Reed Allman	fd3fb4b0bf	C API: update options w/ convenience funcs & fifo compaction	2014-07-08 10:57:45 -07:00
sdong	01159aa802	stackable_db: add default function for GetLiveFilesMetaData() Summary: stackable_db doesn't have GetLiveFilesMetaData() implemented. Add it Test Plan: make all check Reviewers: yhchiang, ljin, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19491	2014-07-07 17:02:57 -07:00
Igor Canadi	8a03935f8c	Fix valgrind error in c_test Summary: External contribution caused some valgrind errors: `1a34aaaef0` This diff fixes them Test Plan: ran valgrind Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19485	2014-07-07 14:41:54 -07:00
Evan Shaw	3f7104d7c5	C API: Allow setting compaction filter factory	2014-07-08 08:12:36 +12:00
Evan Shaw	91bede79cc	C API: Add support for compaction filter factories (v1)	2014-07-08 08:12:36 +12:00
Evan Shaw	0bf5589c74	Fix compaction_filter.h typos	2014-07-08 08:11:01 +12:00
Igor Canadi	0bc5fa9f40	Merge pull request #190 from edsrzf/c-api-writebatch-serialized C API: support constructing write batch from serialized representation	2014-07-07 10:17:43 -07:00
Reed Allman	1a34aaaef0	C API: column family support	2014-07-07 01:41:01 -07:00
Evan Shaw	9fc23d0c56	C API: support constructing write batch from serialized representation	2014-07-06 10:36:33 +12:00
Yueh-Hsuan Chiang	90a6aca48e	Finer report I/O stats about Flush and Compaction. Summary: This diff allows the I/O stats about Flush and Compaction to be reported in a more accurate way. Instead of measuring the size of a file, it measure I/O cost in per read / write basis. Test Plan: make all check Reviewers: sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19383	2014-07-03 16:28:03 -07:00
Yueh-Hsuan Chiang	d4d338de33	Add timeout_hint_us to WriteOptions and introduce Status::TimeOut. Summary: This diff adds timeout_hint_us to WriteOptions. If it's non-zero, then 1) writes associated with this options MAY be aborted when it has been waiting for longer than the specified time. If an abortion happens, associated writes will return Status::TimeOut. 2) the stall time of the associated write caused by flush or compaction will be limited by timeout_hint_us. The default value of timeout_hint_us is 0 (i.e., OFF.) The statistics of timeout writes will be recorded in WRITE_TIMEDOUT. Test Plan: export ROCKSDB_TESTS=WriteTimeoutAndDelayTest make db_test ./db_test Reviewers: igor, ljin, haobo, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18837	2014-07-03 15:47:02 -07:00
sdong	2459f7ec4e	Support Multiple DB paths (without having an interface to expose to users) Summary: In this patch, we allow RocksDB to support multiple DB paths internally. No user interface is supported yet so this patch is silent to users. Test Plan: make all check Reviewers: igor, haobo, ljin, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18921	2014-07-02 21:14:44 -07:00
sdong	9c332aa11a	HashLinkList memtable switches a bucket to a skip list to reduce performance outliers Summary: In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search. Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list. The new data structure is documented in comments in the codes. Test Plan: make all check set threshold_use_skiplist in several tests Reviewers: yhchiang, haobo, ljin Reviewed By: yhchiang, ljin Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19299	2014-07-01 17:14:15 -07:00
sdong	19de6a7aad	Remove MemTableRep::GetIterator(const Slice& slice) Summary: It seems to me that when ever function MemTableRep::GetIterator(const Slice& slice) is used, we can use MemTableRep::GetDynamicPrefixIterator() instead. Just delete it to simplify the codes. Test Plan: make all check Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, dhruba, haobo, leveldb Differential Revision: https://reviews.facebook.net/D19281	2014-06-25 14:09:29 -07:00
Haobo Xu	dfb31d152d	[RocksDB] allow LDB tool to have customized key formatter Summary: Currently ldb tool dump keys either in ascii format or hex format - neither is ideal if the key has a binary structure and is not readable in ascii. This diff also allows LDB tool to be customized in ways beyond DB options. Test Plan: verify that key formatter works with some simple db with binary key. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19209	2014-06-23 15:35:40 -07:00
Igor Canadi	00b26c3a83	JSONDocument Summary: After evaluating options for JSON storage, I decided to implement our own. The reason is that we'll be able to optimize it better and we get to reduce unnecessary dependencies (which is what we'd get with folly). I also plan to write a serializer/deserializer for JSONDocument with our own binary format similar to BSON. That way we'll store binary JSON format in RocksDB instead of the plain-text JSON. This means less storage and faster deserialization. There are still some inefficiencies left here. I plan to optimize them after we develop a functioning DocumentDB. That way we can move and iterate faster. Test Plan: added a unit test Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18831	2014-06-20 11:14:14 +02:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Igor Canadi	bae495740d	Merge pull request #179 from edsrzf/c-api-compaction-filter Support for compaction filters in the C API	2014-06-19 21:22:46 +02:00
Evan Shaw	d72313a7fa	Add a way to set compaction filter in the C API	2014-06-19 16:31:24 +12:00
Evan Shaw	df2701373d	Support for compaction filters in the C API	2014-06-19 16:31:17 +12:00
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	2014-06-18 20:41:52 -07:00
Haobo Xu	0f0076ed5a	[RocksDB] Reduce memory footprint of the blockbased table hash index. Summary: Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption. Just a quick draft, still testing and refining. Test Plan: unit test and shadow testing Reviewers: dhruba, kailiu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19047	2014-06-18 18:16:07 -07:00
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	2014-06-18 07:04:37 +02:00
sdong	8c265c08f1	HashLinkList to log distribution of number of entries aross buckets Summary: Add two parameters of hash linked list to log distribution of number of entries across all buckets, and a sample row when there are too many entries in one single bucket. Test Plan: Turn it on in plain_table_db_test and see the logs. Reviewers: haobo, ljin Reviewed By: ljin Subscribers: leveldb, nkg-, dhruba, yhchiang Differential Revision: https://reviews.facebook.net/D19095	2014-06-17 17:55:36 -07:00
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	2014-06-17 11:49:22 -07:00
Yueh-Hsuan Chiang	e6e259b8ab	Include max_write_buffer_number >= 2 to SanitizeOptions. Summary: Include max_write_buffer_number >= 2 to SanitizeOptions. Test Plan: make all check Reviewers: haobo, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19077	2014-06-16 16:26:46 -07:00
Igor Canadi	f43c8262c2	Don't compress block bigger than 2GB Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446. Test Plan: make check doesn't complain :) Reviewers: haobo, ljin, yhchiang, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18975	2014-06-09 12:26:09 -07:00
Igor Canadi	a0191c9dfe	Create Missing Column Families Summary: Provide an convenience option to create column families if they are missing from the DB. Task #4460490 Test Plan: added unit test. also, stress test for some time Reviewers: sdong, haobo, dhruba, ljin, yhchiang Reviewed By: yhchiang Subscribers: yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D18951	2014-06-06 18:04:56 -07:00
sdong	df9069d23f	In DB::NewIterator(), try to allocate the whole iterator tree in an arena Summary: In this patch, try to allocate the whole iterator tree starting from DBIter from an arena 1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it. 2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator. 3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it. Limitations: (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc (2) Two level iterator itself is allocated in arena, but not iterators inside it. Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: leveldb, dhruba, yhchiang, igor Differential Revision: https://reviews.facebook.net/D18513	2014-06-02 17:44:57 -07:00
sdong	462796697c	dynamic_bloom: replace some divide (remainder) operations with shifts in locality mode, and other improvements Summary: This patch changes meaning of options.bloom_locality: 0 means disable cache line optimization and any positive number means use CACHE_LINE_SIZE as block size (the previous behavior is the block size will be CACHE_LINE_SIZE*options.bloom_locality). By doing it, the divide operations inside a block can be replaced by a shift. Performance is improved: https://reviews.facebook.net/P471 Also, improve the basic algorithm in two ways: (1) make sure num of blocks is an odd number (2) rotate bytes after every probe in locality mode. Since the divider is 2^n, unless doing it, we are never able to use all the bits. Improvements of false positive: https://reviews.facebook.net/P459 Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: dhruba, yhchiang, igor, leveldb Differential Revision: https://reviews.facebook.net/D18843	2014-06-02 17:36:38 -07:00
Igor Canadi	f068d2a94d	Move master version to 3.2	2014-05-23 10:27:56 -07:00
Igor Canadi	6de6a06631	FIFO compaction style Summary: Introducing new compaction style -- FIFO. FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values. FIFO compaction style is suited for storing high-frequency event logs. Test Plan: Added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba Subscribers: alberts, leveldb Differential Revision: https://reviews.facebook.net/D18765	2014-05-21 11:43:35 -07:00
Igor Canadi	b2cf95fe38	Call EnableFileDeletions with false as argument	2014-05-20 14:28:51 -07:00
Igor Canadi	c07c9606ed	Expose Status::code()	2014-05-14 16:23:40 -07:00
Igor Canadi	26f5dd9a5a	TablePropertiesCollectorFactory Summary: This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options. Here's description of task #4296714: I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB. For example, this table has 3155 entries and 1391828 deleted keys. The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for. Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API. In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe. Test Plan: Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one). Also, all other tests Reviewers: sdong, dhruba, haobo, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D18579	2014-05-13 12:30:55 -07:00
Igor Canadi	3edc056f6d	comment	2014-05-10 11:25:56 -07:00
Igor Canadi	038a477b53	Make it easier to start using RocksDB Summary: This diff is addressing multiple things with a single goal -- to make RocksDB easier to use: * Add some functions to Options that make RocksDB easier to tune. * Add example code for both simple RocksDB and RocksDB with Column Families. * Rewrite our README.md Regarding Options, I took a stab at something we talked about for a long time: * https://www.facebook.com/groups/rocksdb.dev/permalink/563169950448190/ I added functions: * IncreaseParallelism() -- easy, increases the thread pool and max_background_compactions * OptimizeLevelStyleCompaction(memtable_memory_budget) -- the easiest way to optimize rocksdb for less stalls with level style compaction. This is very likely not ideal configuration. Feel free to suggest improvements. I used some of Mark's suggestions from here: https://github.com/facebook/rocksdb/issues/54 * OptimizeUniversalStyleCompaction(memtable_memory_budget) -- optimize for universal compaction. Test Plan: compiled rocksdb. ran examples. Reviewers: dhruba, MarkCallaghan, haobo, sdong, yhchiang Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18621	2014-05-10 10:49:33 -07:00
sdong	3a171dcb51	Pass logger to memtable rep and TLB page allocation error logged to info logs Summary: TLB page allocation errors are now logged to info logs, instead of stderr. In order to do that, mem table rep's factory functions take a info logger now. Test Plan: make all check Reviewers: haobo, igor, yhchiang Reviewed By: yhchiang CC: leveldb, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18471	2014-05-05 16:43:37 -07:00
sdong	4a7c747064	Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"" And make the default 0 for hash linked list memtable This reverts commit `d69dc64be7`.	2014-05-04 13:56:29 -07:00
Igor Canadi	db1854d78c	Declare all DB methods virtual so that StackableDB can override them	2014-05-04 11:39:49 -07:00
Igor Canadi	d69dc64be7	Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB" This reverts commit `7dafa3a1d7`.	2014-05-04 08:37:09 -07:00
Benjamin Renard	41e5cf2392	Add share_files_with_cheksum option to BackupEngine Summary: added a new option to BackupEngine: if share_files_with_checksum is set to true, sst files are stored in shared_checksum/ and are identified by the triple (file name, checksum, file size) instead of just the file name. This option is targeted at distributed databases that want to backup their primary replica. Test Plan: unit tests and tested backup and restore on a distributed rocksdb Reviewers: igor Reviewed By: igor Differential Revision: https://reviews.facebook.net/D18393	2014-05-02 17:08:55 -07:00
Igor Canadi	4ecfbcf865	ApplyToAllCacheEntries Summary: Added a method that executes a callback on every cache entry. Test Plan: added a unit test Reviewers: haobo Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D18441	2014-05-02 16:24:04 -04:00
Igor Canadi	82042f451c	Include version in options	2014-05-01 19:28:23 -04:00
Igor Canadi	0afc8bc29a	xxHash Summary: Originally: https://github.com/facebook/rocksdb/pull/87/files I'm taking over to apply some finishing touches Test Plan: will add tests Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18315	2014-05-01 14:09:32 -04:00
Igor Canadi	096f5be0ed	Put column family information in LiveFileMetaData Summary: As summary Test Plan: compiles :) Reviewers: dhruba, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18405	2014-04-30 16:24:52 -04:00
Igor Canadi	df70047669	Flush stale column families Summary: Added a new option `max_total_wal_size`. Once the total WAL size goes over that, we make an attempt to flush all column families that still have data in the earliest WAL file. By default, I calculate `max_total_wal_size` dynamically, that should be good-enough for non-advanced customers. Test Plan: Added a test Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18345	2014-04-30 14:33:40 -04:00
sdong	7dafa3a1d7	Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, dhruba, leveldb, igor, yhchiang Differential Revision: https://reviews.facebook.net/D18357	2014-04-30 11:02:26 -07:00
Yueh-Hsuan Chiang	9d9d2965cb	Add a new mem-table representation based on cuckoo hash. Summary: = Major Changes = * Add a new mem-table representation, HashCuckooRep, which is based cuckoo hash. Cuckoo hash uses multiple hash functions. This allows each key to have multiple possible locations in the mem-table. - Put: When insert a key, it will try to find whether one of its possible locations is vacant and store the key. If none of its possible locations are available, then it will kick out a victim key and store at that location. The kicked-out victim key will then be stored at a vacant space of its possible locations or kick-out another victim. In this diff, the kick-out path (known as cuckoo-path) is found using BFS, which guarantees to be the shortest. - Get: Simply tries all possible locations of a key --- this guarantees worst-case constant time complexity. - Time complexity: O(1) for Get, and average O(1) for Put if the fullness of the mem-table is below 80%. - Default using two hash functions, the number of hash functions used by the cuckoo-hash may dynamically increase if it fails to find a short-enough kick-out path. - Currently, HashCuckooRep does not support iteration and snapshots, as our current main purpose of this is to optimize point access. = Minor Changes = * Add IsSnapshotSupported() to DB to indicate whether the current DB supports snapshots. If it returns false, then DB::GetSnapshot() will always return nullptr. Test Plan: Run existing tests. Will develop a test specifically for cuckoo hash in the next diff. Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb, dhruba, igor Differential Revision: https://reviews.facebook.net/D16155	2014-04-29 17:13:46 -07:00
Igor Canadi	e525bb16ea	Make kMajorVersion and kMinorVersion take version from version macros	2014-04-29 11:59:48 -04:00
Igor Canadi	6cb0cb300c	Add version.h	2014-04-29 11:33:00 -04:00
Igor Canadi	f868dcbbed	Support for adding TTL-ed column family Summary: This enables user to add a TTL column family to normal DB. Next step should be to expand StackableDB and create StackableColumnFamily, such that users can for example add geo-spatial column families to normal DB. Test Plan: added a test Reviewers: dhruba, haobo, ljin Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D18201	2014-04-28 20:34:20 -07:00
Donovan Hide	4f9fae9bb7	Add rocksdb_open_for_read_only to C API	2014-04-27 20:57:10 +01:00
Igor Canadi	a618691a3b	Read-only BackupEngine Summary: Read-only BackupEngine can connect to the same backup directory that is already running BackupEngine. That enables some interesting use-cases (i.e. restoring replica from primary's backup directory) Test Plan: added a unit test Reviewers: dhruba, haobo, ljin Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D18297	2014-04-25 15:49:29 -04:00
Lei Jin	3995e801ab	kill ReadOptions.prefix and .prefix_seek Summary: also add an override option total_order_iteration if you want to use full iterator with prefix_extractor Test Plan: make all check Reviewers: igor, haobo, sdong, yhchiang Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D17805	2014-04-25 12:21:34 -07:00
sdong	86a0133d05	PlainTableReader to expose index size to users Summary: This is a temp solution to expose index sizes to users from PlainTableReader before we persistent them to files. In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them. Test Plan: Add a unit test. make all check` Reviewers: haobo, ljin Reviewed By: haobo CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18195	2014-04-22 19:29:05 -07:00
Igor Canadi	3992aec8fa	Support for column families in TTL DB Summary: This will enable people using TTL DB to do so with multiple column families. They can also specify different TTLs for each one. TODO: Implement CreateColumnFamily() in TTL world. Test Plan: Added a very simple sanity test. Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: haobo CC: leveldb, alberts Differential Revision: https://reviews.facebook.net/D17859	2014-04-22 11:27:33 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Igor Canadi	23c8f89b57	Revert "Don't compile ldb tool into static library" This reverts commit `e296577ef6`.	2014-04-15 11:29:02 -07:00
Igor Canadi	e296577ef6	Don't compile ldb tool into static library Summary: This is first step of my effort to reduce size of librocksdb.a for use in mobile. ldb object files are huge and are ment to be used as a command line tool. I moved them to `tools/` directory and include them only when compiling `ldb` This diff reduced librocksdb.a from 42MB to 39MB on my mac (not stripped). Test Plan: ran ldb Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17823	2014-04-15 10:52:39 -07:00
Igor Canadi	be016613c2	Expose in memory Env to the world Summary: That will help with some iOS testing I'm doing. Test Plan: compiles Reviewers: dhruba, haobo, ljin, yhchiang, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17787	2014-04-14 12:28:15 -07:00
Igor Canadi	30aff72f77	Don't shadow in ColumnFamilyDescriptor	2014-04-11 14:48:20 -07:00
Lei Jin	eba3fc644a	make corruption_test:CompactionInputErrorParanoid deterministic Summary: it writes ~10M data, default L0 compaction trigger is 4, plus 2 writer buffer, so that can accommodate ~6M data before compaction happens for sure. I guess encoding is doing a good job to shrink the data so that sometime, compaction does not get triggered. I get test failure quite often. Test Plan: ran it multiple times and all got pass Reviewers: igor, sdong Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17775	2014-04-11 12:48:38 -07:00
Igor Canadi	b3d7435b4e	No shadow in public headers	2014-04-10 17:19:03 -07:00
Igor Canadi	ddef6841b3	Renamed InfoLogLevel::DEBUG to InfoLogLevel::DEBUG_LEVEL Summary: XCode for some reason injects `#define DEBUG 1` into our code, which makes compile fail because we use `DEBUG` keyword for other stuff. This diff fixes the issue by renaming `DEBUG` to `DEBUG_LEVEL`. Test Plan: compiles Reviewers: dhruba, haobo, sdong, yhchiang, ljin Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17709	2014-04-10 15:27:42 -07:00
Kai Liu	75b59d5146	Enable hash index for block-based table Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index. Test Plan: Wrote several new unit tests. Reviewers: sdong, haobo, dhruba Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D16539	2014-04-10 14:19:43 -07:00
Igor Canadi	4daea66343	Turn on -Wmissing-prototypes Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes. Test Plan: compiles Reviewers: dhruba, haobo, ljin, sdong Reviewed By: ljin CC: leveldb Differential Revision: https://reviews.facebook.net/D17649	2014-04-09 21:17:14 -07:00
Igor Canadi	b947fdc89d	Column family support for DB::OpenForReadOnly() Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though) Test Plan: added a unit test in column_family_test Reviewers: haobo, sdong, ljin, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17565	2014-04-09 09:56:17 -07:00
Lei Jin	92c1eb0291	macros for perf_context Summary: This will allow us to disable them completely for iOS or for better performance Test Plan: will run make all check Reviewers: igor, haobo, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17511	2014-04-08 10:58:07 -07:00
Igor Canadi	664559fe2d	Small final fixes before merge	2014-04-07 15:38:53 -07:00
Igor Canadi	3d2fe844ab	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_impl.h db/memtable_list.cc db/version_set.cc	2014-04-07 11:31:11 -07:00
Igor Canadi	bcd1f15b60	Remove -Wno-unused-const-variable	2014-04-04 16:15:47 -07:00
Lei Jin	c90d446ee7	make hash_link_list Node's key space consecutively followed at the end Summary: per sdong's request, this will help processor prefetch on n->key case. Test Plan: make all check Reviewers: sdong, haobo, igor Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D17415	2014-04-04 15:37:28 -07:00
Igor Canadi	51023c3911	Make RocksDB compile for iOS Summary: I had to make number of changes to the code and Makefile: * Add `make lib`, that will create static library without debug info. We need this to avoid growing binary too much. Currently it's 14MB. * Remove cpuinfo() function and use __SSE4_2__ macro. We actually used the macro as part of Fast_CRC32() function. As a result, I also accidentally fixed this issue: https://www.facebook.com/groups/rocksdb.dev/permalink/549700778461774/?stream_ref=2 * Remove __thread locals in OS_MACOSX Test Plan: `make lib PLATFORM=IOS` Reviewers: ljin, haobo, dhruba, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17475	2014-04-04 13:11:44 -07:00
Thomas Adam	98422cba77	[C-API] implemented more options	2014-04-03 10:47:37 +02:00
Thomas Adam	3a30b5b0be	[C-API] added "rocksdb_options_set_plain_table_factory" to make it possible to use plain table factory	2014-04-03 10:47:37 +02:00
sdong	4af1954fd6	Compaction Filter V1 to use old context struct to keep backward compatible Summary: The previous change D15087 changed existing compaction filter, which makes the commonly used class not backward compatible. Revert the older interface. Use a new interface for V2 instead. Test Plan: make all check Reviewers: haobo, yhchiang, igor CC: danguo, dhruba, ljin, igor, leveldb Differential Revision: https://reviews.facebook.net/D17223	2014-04-02 14:57:51 -07:00
Igor Canadi	8555ce2dec	Merge branch 'master' into columnfamilies	2014-04-02 10:48:05 -07:00
Thomas Adam	38dc5ef45f	[C-API] added the possiblity to create a HashSkipList or HashLinkedList to support prefix seeks	2014-04-01 12:44:27 +02:00
Igor Canadi	ddbd1ece88	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/db_test.cc db/internal_stats.cc db/internal_stats.h db/version_edit.cc db/version_edit.h db/version_set.cc include/rocksdb/options.h util/options.cc	2014-03-31 13:39:24 -07:00
sdong	43a593a6d9	Change default value of some Options Summary: Since we are optimizing for server workloads, some default values are not optimized any more. We change some of those values that I feel it's less prone to regression bugs. Test Plan: make all check Reviewers: dhruba, haobo, ljin, igor, yhchiang Reviewed By: igor CC: leveldb, MarkCallaghan Differential Revision: https://reviews.facebook.net/D16995	2014-03-28 17:09:28 -07:00
Dhruba Borthakur	4031b98373	A GIS implementation for rocksdb. Summary: This patch stores gps locations in rocksdb. Each object is uniquely identified by an id. Each object has a gps (latitude, longitude) associated with it. The geodb supports looking up an object either by its gps location or by its id. There is a method to retrieve all objects within a circular radius centered at a specified gps location. Test Plan: Simple unit-test attached. Reviewers: leveldb, haobo Reviewed By: haobo CC: leveldb, tecbot, haobo Differential Revision: https://reviews.facebook.net/D15567	2014-03-28 15:26:44 -07:00
Lei Jin	0d755fff14	cache friendly blocked bloomfilter Summary: By constraining the probes within cache line(s), we can improve the cache miss rate thus performance. This probably only makes sense for in-memory workload so defaults the option to off. Numbers and comparision can be found in wiki: https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study Test Plan: benchmarked this change substantially. Will run make all check as well Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17133	2014-03-28 09:21:20 -07:00
Igor Canadi	1c9f8f0884	Fix valgrind issues Summary: NewFixedPrefixTransform is leaked in default options. Broken by `b47812fba6` Also included in the diff some code cleanup Test Plan: valgrind env_test also make check Reviewers: haobo, danguo, yhchiang Reviewed By: danguo CC: leveldb Differential Revision: https://reviews.facebook.net/D17211	2014-03-27 08:22:59 -07:00
Igor Canadi	e8168382c4	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc include/rocksdb/options.h util/options.cc	2014-03-25 11:09:40 -07:00
Danny Guo	b47812fba6	[rocksdb] new CompactionFilterV2 API Summary: This diff adds a new CompactionFilterV2 API that roll up the decisions of kv pairs during compactions. These kv pairs must share the same key prefix. They are buffered inside the db. typedef std::vector<Slice> SliceVector; virtual std::vector<bool> Filter(int level, const SliceVector& keys, const SliceVector& existing_values, std::vector<std::string>* new_values, std::vector<bool>* values_changed ) const = 0; Application can override the Filter() function to operate on the buffered kv pairs. More details in the inline documentation. Test Plan: make check. Added unit tests to make sure Keep, Delete, Change all works. Reviewers: haobo CCs: leveldb Differential Revision: https://reviews.facebook.net/D15087	2014-03-24 20:47:53 -07:00
Yueh-Hsuan Chiang	cda4006e87	Enhance partial merge to support multiple arguments Summary: * PartialMerge api now takes a list of operands instead of two operands. * Add min_pertial_merge_operands to Options, indicating the minimum number of operands to trigger partial merge. * This diff is based on Schalk's previous diff (D14601), but it also includes necessary changes such as updating the pure C api for partial merge. Test Plan: * make check all * develop tests for cases where partial merge takes more than two operands. TODOs (from Schalk): * Add test with min_partial_merge_operands > 2. * Perform benchmarks to measure the performance improvements (can probably use results of task #2837810.) * Add description of problem to doc/index.html. * Change wiki pages to reflect the interface changes. Reviewers: haobo, igor, vamsi Reviewed By: haobo CC: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D16815	2014-03-24 17:57:13 -07:00
Igor Canadi	b253f24403	Rate limiter for BackupableDB Summary: Might be useful if client doesn't want to effect running system during backup too much. Test Plan: added a test case Reviewers: dhruba, haobo, ljin Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17091	2014-03-24 11:38:44 -07:00
Igor Canadi	e67241f0b9	Sanity check on Open Summary: Everytime a client opens a DB, we do a sanity check that: * checks the existance of all the necessary files * verifies that file sizes are correct Some of the code was stolen from https://reviews.facebook.net/D16935 Test Plan: added a unit test Reviewers: dhruba, haobo, sdong Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D17097	2014-03-20 14:18:29 -07:00
Yiting Li	7981a43274	Consistency Check Function Summary: Added a function/command to check the consistency of live files' meta data Test Plan: Manual test (size mismatch, file not exist). Command test script. Reviewers: haobo Reviewed By: haobo CC: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D16935	2014-03-20 13:42:45 -07:00
Igor Canadi	3055a15b29	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc db/version_edit.cc db/version_edit.h db/version_set.cc	2014-03-18 13:24:27 -07:00
Igor Canadi	f26cb0f093	Optimize fallocation Summary: Based on my recent findings (posted in our internal group), if we use fallocate without KEEP_SIZE flag, we get superior performance of fdatasync() in append-only workloads. This diff provides an option for user to not use KEEP_SIZE flag, thus optimizing his sync performance by up to 2x-3x. At one point we also just called posix_fallocate instead of fallocate, which isn't very fast: http://code.woboq.org/userspace/glibc/sysdeps/posix/posix_fallocate.c.html (tl;dr it manually writes out zero bytes to allocate storage). This diff also fixes that, by first calling fallocate and then posix_fallocate if fallocate is not supported. Test Plan: make check Reviewers: dhruba, sdong, haobo, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16761	2014-03-17 21:52:14 -07:00
Igor Canadi	64904b39a0	Merge branch 'master' into columnfamilies Conflicts: utilities/backupable/backupable_db.cc	2014-03-17 17:57:14 -07:00
Igor Canadi	9caeff516e	keep_log_files option in BackupableDB Summary: Added an option to BackupableDB implementation that allows users to persist in-memory databases. When the restore happens with keep_log_files = true, it will ) Not delete existing log files in wal_dir ) Move log files from archive directory to wal_dir, so that DB can replay them if necessary Test Plan: Added an unit test Reviewers: dhruba, ljin Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D16941	2014-03-17 15:39:23 -07:00
Igor Canadi	db234133a9	[CF] WriteBatch to take in ColumnFamilyHandle Summary: Client doesn't need to know anything about ColumnFamily ID. By making WriteBatch take ColumnFamilyHandle as a parameter, we can eliminate method GetID() from ColumnFamilyHandle Test Plan: column_family_test Reviewers: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16887	2014-03-14 11:30:14 -07:00
Igor Canadi	dff9214165	Merge branch 'master' into columnfamilies Conflicts: db/db_impl.cc tools/db_stress.cc	2014-03-12 10:17:41 -07:00
Igor Canadi	2b95dc1542	Revert "Fix bad merge of D16791 and D16767" This reverts commit `839c8ecfcd`.	2014-03-12 09:37:43 -07:00
sdong	839c8ecfcd	Fix bad merge of D16791 and D16767 Summary: A bad Auto-Merge caused log buffer is flushed twice. Remove the unintended one. Test Plan: Should already be tested (the code looks the same as when I ran unit tests). Reviewers: haobo, igor Reviewed By: haobo CC: ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D16821	2014-03-11 21:31:57 -07:00
Igor Canadi	457c78eb89	[CF] db_stress for column families Summary: I had this diff for a while to test column families implementation. Last night, I ran it sucessfully for 10 hours with the command: time ./db_stress --threads=30 --ops_per_thread=200000000 --max_key=5000 --column_families=20 --clear_column_family_one_in=3000000 --verify_before_write=1 --reopen=50 --max_background_compactions=10 --max_background_flushes=10 --db=/tmp/db_stress It is ready to be committed :) Test Plan: Ran it for 10 hours Reviewers: dhruba, haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D16797	2014-03-11 12:06:12 -07:00
sdong	01dcef114b	Env to add a function to allow users to query waiting queue length Summary: Add a function to Env so that users can query the waiting queue length of each thread pool Test Plan: add a test in env_test Reviewers: haobo Reviewed By: haobo CC: dhruba, igor, yhchiang, ljin, nkg-, leveldb Differential Revision: https://reviews.facebook.net/D16755	2014-03-11 10:19:02 -07:00
Igor Canadi	9634ba42ac	Merge branch 'master' into columnfamilies Conflicts: db/compaction_picker.cc db/db_impl.cc db/db_impl.h db/tailing_iter.cc db/version_set.h include/rocksdb/options.h util/options.cc	2014-03-10 17:26:09 -07:00
Igor Canadi	9db8c4c556	Fix share_table_files bug Summary: constructor wasn't properly constructing BackupableDBOptions Test Plan: no test Reviewers: benj Reviewed By: benj CC: leveldb Differential Revision: https://reviews.facebook.net/D16749	2014-03-10 14:42:03 -07:00

... 2 3 4 5 6 ...

742 Commits