rocksdb

Author	SHA1	Message	Date
Igor Canadi	6bb7e3ef25	Merger test Summary: I abandoned https://reviews.facebook.net/D18789, but I wrote a good unit test there, so let's check it in. :) Test Plan: this is test Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22827	2014-09-08 22:24:40 -07:00
Igor Canadi	a2bb7c3c33	Push- instead of pull-model for managing Write stalls Summary: Introducing WriteController, which is a source of truth about per-DB write delays. Let's define a DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either: * proceed with all writes without delay * delay all writes by fixed time * stop all writes The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case). When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal. This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write. Test Plan: make check for now. I'll add some unit tests later. Also, perf test. Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22791	2014-09-08 11:20:25 -07:00
Feng Zhu	0af157f9bf	Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contains all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or traditional (block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the filter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom --disable_auto_compactions=1 Read QPS increases by about 30%, from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979	2014-09-08 10:37:05 -07:00
Feng Zhu	40ddc3d6c4	add cache bench Summary: 1. A benchmark for cache Test Plan: ./cache_bench Reviewers: yhchiang, dhruba, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22809	2014-09-05 15:55:43 -07:00
Radheshyam Balasundaram	b6fd7811eb	Don't do memtable lookup in db_impl_readonly if memtables are empty while opening db. Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not. Test Plan: db_test make check all Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22281	2014-08-26 17:19:03 -07:00
sdong	28b5c76004	WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index Summary: Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write. WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by reading the entry from the write batch at the offset. Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries. I will add more unit tests if people are OK about this API. Test Plan: make all check Add unit tests. Reviewers: yhchiang, igor, MarkCallaghan, ljin Reviewed By: ljin Subscribers: dhruba, leveldb, xjin Differential Revision: https://reviews.facebook.net/D21381	2014-08-18 16:37:38 -07:00
Lei Jin	218857b3f5	remove tailing_iter.h/cc Summary: as title Test Plan: make all check ran db_bench and saw seek stats at the end Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21651	2014-08-12 17:13:15 -07:00
Radheshyam Balasundaram	9674c11d01	Integrating Cuckoo Hash SST Table format into RocksDB Summary: Contains the following changes: - Implementation of cuckoo_table_factory - Adding cuckoo table into AdaptiveTableFactory - Adding cuckoo_table_db_test, similar to lines of plain_table_db_test - Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key. - Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test cuckoo_table_db_test make check all make valgrind_check make asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21219	2014-08-11 20:21:07 -07:00
miguelportilla	93e6b5e9d9	Changes to support unity build: * Script for building the unity.cc file via Makefile * Unity executable Makefile target for testing builds * Source code changes to fix compilation of unity build	2014-08-11 13:22:47 -04:00
Radheshyam Balasundaram	62f9b071ff	Implementation of CuckooTableReader Summary: Contains: - Implementation of TableReader based on Cuckoo Hashing - Unittests for CuckooTableReader - Performance test for TableReader Test Plan: make cuckoo_table_reader_test ./cuckoo_table_reader_test make valgrind_check make asan_check Reviewers: yhchiang, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20511	2014-07-25 16:37:32 -07:00
Igor Canadi	6296330417	SpatialDB Summary: This diff is adding spatial index support to RocksDB. When creating the DB user specifies a list of spatial indexes. Spatial indexes can cover different areas and have different resolution (i.e. number of tiles). This is useful for supporting different zoom levels. Each element inserted into SpatialDB has: * a bounding box, which determines how will the element be indexed * string blob, which will usually be WKB representation of the polygon (http://en.wikipedia.org/wiki/Well-known_text) * feature set, which is a map of key-value pairs, where value can be int, double, bool, null or a string. FeatureSet will be a set of tags associated with geo elements (for example, 'road': 'highway' and similar) * a list of indexes to insert the element in. For example, small river element will be inserted in index for high zoom level, while country border will be inserted in all indexes (including the index for low zoom level). Each query is executed on single spatial index. Query guarantees that it will return all elements intersecting the specified bounding box, but it might also return some extra non-intersecting elements. Test Plan: Added bunch of unit tests in spatial_db_test Reviewers: dhruba, yinwang Reviewed By: yinwang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20361	2014-07-23 14:22:58 -04:00
Yueh-Hsuan Chiang	b5c4c0b86b	[Java] Add the missing ROCKSDB_JAR variable in Makefile Summary: Add the missing ROCKSDB_JAR variable in Makefile, which was mistakenly removed in https://reviews.facebook.net/D20289. Test Plan: export ROCKSDB_JAR= make rocksdbjava	2014-07-23 11:16:18 -07:00
Igor Canadi	f82d4a2498	Also bump version in Makefile	2014-07-23 10:28:41 -04:00
sdong	e6de02103a	Add a utility function to guess optimized options based on constraints Summary: Add a function GetOptions(), where based on four parameters users give: read/write amplification threshold, memory budget for mem tables and target DB size, it picks up a compaction style and parameters for them. Background threads are not touched yet. One limit of this algorithm: since compression rate and key/value size are hard to predict, it's hard to predict level 0 file size from write buffer size. Simply make 1:1 ratio here. Sample results: https://reviews.facebook.net/P477 Test Plan: Will add a unit test where some sample scenarios are given and see they pick the results that make sense Reviewers: yhchiang, dhruba, haobo, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18741	2014-07-22 15:24:21 -07:00
Yueh-Hsuan Chiang	ae7743f226	Fixed some make and linking issues of RocksDBJava Summary: Fixed some make and linking issues of RocksDBJava. Specifically: * Add JAVA_LDFLAGS, which does not include gflags * rocksdbjava library now uses JAVA_LDFLAGS instead of LDFLAGS * java/Makefile now includes build_config.mk * rearrange make rocksdbjava workflow to ensure the library file is correctly included in the jar file. Test Plan: make rocksdbjava make jdb_bench java/jdb_bench.sh Reviewers: dhruba, swapnilghike, zzbennett, rsumbaly, ankgup87 Reviewed By: ankgup87 Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20289	2014-07-21 22:41:54 -07:00
Radheshyam Balasundaram	cf3da899b0	Adding a new SST table builder based on Cuckoo Hashing Summary: Cuckoo Hashing based SST table builder. Contains: - Cuckoo Hashing logic and file storage logic. - Unit tests for logic Test Plan: make cuckoo_table_builder_test ./cuckoo_table_builder_test make check all Reviewers: yhchiang, igor, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19545	2014-07-21 13:26:09 -07:00
Yueh-Hsuan Chiang	2f289dccf3	Add -Wsign-compare to WARNING_FLAGS in Makefile Summary: Add -Wsign-compare to WARNING_FLAGS in Makefile as not all g++ compilers include -Wsign-compare in -Wall when compiling '.h' files. Test Plan: make -j32 Reviewers: ljin, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20169	2014-07-17 17:26:12 -07:00
Stanislau Hlebik	1c9f190ae3	Fix db_test Summary: Added deletion of DBIterators in DBIterator's tests Test Plan: make valgrind_check Reviewers: igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20043	2014-07-16 14:51:43 -07:00
sdong	01700b6911	Update master to version 3.3 Summary: As title Test Plan: no need Reviewers: igor, yhchiang, ljin Reviewed By: ljin Subscribers: haobo, dhruba, xjin, leveldb Differential Revision: https://reviews.facebook.net/D19629	2014-07-10 11:59:35 -07:00
Igor Canadi	f0a8be253e	JSON (Document) API sketch Summary: This is a rough sketch of our new document API. Would like to get some thoughts and comments about the high-level architecture and API. I didn't optimize for performance at all. Leaving some low-hanging fruit so that we can be happy when we fix them! :) Currently, bunch of features are not supported at all. Indexes can be only specified when creating database. There is no query planner whatsoever. This will all be added in due time. Test Plan: Added a simple unit test Reviewers: haobo, yhchiang, dhruba, sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18747	2014-07-10 09:31:42 -07:00
Lei Jin	5ef1ba7ff5	generic rate limiter Summary: A generic rate limiter that can be shared by threads and rocksdb instances. Will use this to smooth out write traffic generated by compaction and flush. This will help us get better p99 behavior on flash storage. Test Plan: unit test output ==== Test RateLimiterTest.Rate request size [1 - 1023], limit 10 KB/sec, actual rate: 10.374969 KB/sec, elapsed 2002265 request size [1 - 2047], limit 20 KB/sec, actual rate: 20.771242 KB/sec, elapsed 2002139 request size [1 - 4095], limit 40 KB/sec, actual rate: 41.285299 KB/sec, elapsed 2202424 request size [1 - 8191], limit 80 KB/sec, actual rate: 81.371605 KB/sec, elapsed 2402558 request size [1 - 16383], limit 160 KB/sec, actual rate: 162.541268 KB/sec, elapsed 3303500 Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19359	2014-07-08 11:41:57 -07:00
Ankit Gupta	e0ebea6cc2	Add doc and end of line	2014-06-22 13:27:22 -07:00
Igor Canadi	00b26c3a83	JSONDocument Summary: After evaluating options for JSON storage, I decided to implement our own. The reason is that we'll be able to optimize it better and we get to reduce unnecessary dependencies (which is what we'd get with folly). I also plan to write a serializer/deserializer for JSONDocument with our own binary format similar to BSON. That way we'll store binary JSON format in RocksDB instead of the plain-text JSON. This means less storage and faster deserialization. There are still some inefficiencies left here. I plan to optimize them after we develop a functioning DocumentDB. That way we can move and iterate faster. Test Plan: added a unit test Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18831	2014-06-20 11:14:14 +02:00
Igor Canadi	f068d2a94d	Move master version to 3.2	2014-05-23 10:27:56 -07:00
Mike Lin	76596b5318	Fix building RocksDB in paths containing spaces -- quote path names in Makefile and build_detect_platform.	2014-05-10 21:01:25 -07:00
Igor Canadi	313b2e5da1	Better INSTALL.md and Makefile rules Summary: We have a lot of problems with gflags. However, when compiling rocksdb static library, we don't need gflags dependency. Reorganize INSTALL.md such that first-time customers don't need any dependency installed to actually build rocksdb static library. Test Plan: none Reviewers: dhruba, haobo Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D18501	2014-05-07 16:51:30 -07:00
Igor Canadi	d2569fea47	log_and_apply_bench on a new benchmark framework Summary: db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works. I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions. Test Plan: no Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D18261	2014-05-05 11:11:48 -07:00
Yueh-Hsuan Chiang	d56959a5fc	[Java] Use environmental variable JAVA_HOME in Makefile for RocksJava.	2014-05-04 13:23:21 -07:00
Krzysztof Kowalczyk	2b7cf03e0d	Update Makefile	2014-04-29 14:29:45 -07:00
Igor Canadi	f1c9aa6ebe	More unsigned/signed compare fixes	2014-04-29 13:01:06 -07:00
Igor Canadi	38693d99c4	Fix more signed/unsigned comparisons	2014-04-29 12:40:18 -07:00
Igor Canadi	e525bb16ea	Make kMajorVersion and kMinorVersion take version from version macros	2014-04-29 11:59:48 -04:00
Igor Canadi	a40970aa31	Run whitebox test before black box	2014-04-24 12:28:16 -04:00
Igor Canadi	d0939cdcea	Single-threaded asan_crash_test	2014-04-21 15:42:28 -07:00
Igor Canadi	8dc34364d2	Rename "benchmark" back to "bench". Also, make `benchharness.cc` not compiled into rocksdb library.	2014-04-21 13:12:15 -07:00
Pratyush Seth	ff1b5df4c6	Added benchmark functionality on the lines of folly/Benchmark.h Summary: Added benchmark functionality on the lines of folly/Benchmark.h Test Plan: Added unit tests Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba Reviewed By: igor CC: leveldb Differential Revision: https://reviews.facebook.net/D17973	2014-04-21 12:29:55 -07:00
Lei Jin	0f2d768191	hints for narrowing down FindFile range and avoiding checking irrelevant L0 files Summary: The file tree structure in Version is prebuilt and the range of each file is known. On the Get() code path, we do binary search in FindFile() by comparing target key with each file's largest key and also check the range for each L0 file. With some pre-calculated knowledge, each key comparison that has been done can serve as a hint to narrow down further searches: (1) If a key falls within a L0 file's range, we can safely skip the next file if its range does not overlap with the current one. (2) If a key falls within a file's range in level L0 - Ln-1, we should only need to binary search in the next level for files that overlap with the current one. (1) will be able to skip some files depending on the key distribution. (2) can greatly reduce the range of binary search, especially for bottom levels, given that one file most likely only overlaps with N files from the level below (where N is max_bytes_for_level_multiplier). So on level L, we will only look at ~N files instead of N^L files. Some initial results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s = 9.6M/s), it gives us ~13% improvement. Test Plan: make all check Reviewers: haobo, igor, dhruba, sdong, yhchiang Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17205	2014-04-21 09:10:12 -07:00
Ankit Gupta	ebd85e8f3a	Fix build	2014-04-18 10:47:03 -07:00
Ankit Gupta	dc291f5bf0	Merge branch 'master' of https://github.com/facebook/rocksdb Conflicts: Makefile java/Makefile java/org/rocksdb/Options.java java/rocksjni/portal.h	2014-04-18 10:32:14 -07:00
Yueh-Hsuan Chiang	bb6fd15a6e	[Java] Add a basic binding and test for BackupableDB and StackableDB. Summary: Add a skeleton binding and test for BackupableDB which shows that BackupableDB and RocksDB can share the same JNI calls. Test Plan: make rocksdbjava make jtest Reviewers: haobo, ankgup87, sdong, dhruba Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17793	2014-04-17 17:28:51 -07:00
Igor Canadi	62551b1c4e	Don't compile sync_point if NDEBUG Summary: We don't really need sync_point.o if we're compiling with NDEBUG. This diff depends on D17823 Test Plan: compiles Reviewers: haobo, ljin, sdong Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17829	2014-04-17 10:49:58 -07:00
Ankit Gupta	320ae72e17	Add histogramType for statistics	2014-04-16 21:38:33 -07:00
Igor Canadi	7d838856cf	Fix compile issues when doing make release	2014-04-15 16:00:10 -07:00
Igor Canadi	588bca2020	RocksDBLite Summary: Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile. Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping) Test Plan: compiles :) Reviewers: dhruba, haobo, ljin, sdong, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17835	2014-04-15 13:39:26 -07:00
Igor Canadi	23c8f89b57	Revert "Don't compile ldb tool into static library" This reverts commit e296577ef64deac036a275a1a4c0d6172cfa42df.	2014-04-15 11:29:02 -07:00
Igor Canadi	a347ffe92f	Revert "Fix sst_dump and reduce_levels_test compile errors" This reverts commit d8f00b4109e3df10be56141f3ff3ba9b0d10f585.	2014-04-15 11:28:52 -07:00
Igor Canadi	d8f00b4109	Fix sst_dump and reduce_levels_test compile errors	2014-04-15 11:13:12 -07:00
Igor Canadi	e296577ef6	Don't compile ldb tool into static library Summary: This is first step of my effort to reduce size of librocksdb.a for use in mobile. ldb object files are huge and are meant to be used as a command line tool. I moved them to `tools/` directory and include them only when compiling `ldb` This diff reduced librocksdb.a from 42MB to 39MB on my mac (not stripped). Test Plan: ran ldb Reviewers: dhruba, haobo, sdong, ljin, yhchiang Reviewed By: yhchiang CC: leveldb Differential Revision: https://reviews.facebook.net/D17823	2014-04-15 10:52:39 -07:00
Yueh-Hsuan Chiang	ca4fa2047e	[Java] rename 'make jni' to 'make rocksdbjava'	2014-04-10 10:04:48 -07:00
Yueh-Hsuan Chiang	0f5cbcd798	[JNI] Add an initial benchmark for java binding for rocksdb. Summary: * Add a benchmark for java binding for rocksdb. The java benchmark is a complete rewrite based on the c++ db/db_bench.cc and the DbBenchmark in dain's java leveldb. * Support multithreading. * 'readseq' is currently not supported as it requires RocksDB Iterator. * usage: --benchmarks Comma-separated list of operations to run in the specified order Actual benchmarks: fillseq -- write N values in sequential key order in async mode fillrandom -- write N values in random key order in async mode fillbatch -- write N/1000 batch where each batch has 1000 values in random key order in sync mode fillsync -- write N/100 values in random key order in sync mode fill100K -- write N/1000 100K values in random order in async mode readseq -- read N times sequentially readrandom -- read N times in random order readhot -- read N times in random order from 1% section of DB Meta Operations: delete -- delete DB DEFAULT: [fillseq, readrandom, fillrandom] --compression_ratio Arrange to generate values that shrink to this fraction of their original size after compression DEFAULT: 0.5 --use_existing_db If true, do not destroy the existing database. If you set this flag and also specify a benchmark that wants a fresh database, that benchmark will fail. DEFAULT: false --num Number of key/values to place in database. DEFAULT: 1000000 --threads Number of concurrent threads to run. DEFAULT: 1 --reads Number of read operations to do. If negative, do --num reads. --key_size The size of each key in bytes. DEFAULT: 16 --value_size The size of each value in bytes. DEFAULT: 100 --write_buffer_size Number of bytes to buffer in memtable before compacting (initialized to default value by 'main'.) DEFAULT: 4194304 --cache_size Number of bytes to use as a cache of uncompressed data. Negative means use default settings. DEFAULT: -1 --seed Seed base for random number generators. DEFAULT: 0 --db Use the db with the following name. DEFAULT: /tmp/rocksdbjni-bench * Add RocksDB.write(). Test Plan: make jbench Reviewers: haobo, sdong, dhruba, ankgup87 Reviewed By: haobo CC: leveldb Differential Revision: https://reviews.facebook.net/D17433	2014-04-09 00:48:20 -07:00
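
Note on commit a2bb7c3c33 (push- instead of pull-model for write stalls): the idea in that message can be sketched roughly as below. This is a minimal Python illustration under stated assumptions, not RocksDB's C++ WriteController. The names WriteControllerSketch and WriteMode are hypothetical, and so is the choice of per-column-family L0 file counts and thresholds as the stall condition; the commit message does not say which conditions select each mode. What the sketch does take from the message is the shape: the mode is recomputed only when an epoch changes (a flush or compaction finishes), and the write path reads the cached mode without looping over column families.

```python
from enum import Enum


class WriteMode(Enum):
    PROCEED = "proceed"  # all writes go through without delay
    DELAY = "delay"      # all writes are delayed by a fixed time
    STOP = "stop"        # all writes are stopped


class WriteControllerSketch:
    """Hypothetical per-DB source of truth for write stalls."""

    def __init__(self):
        self.mode = WriteMode.PROCEED
        self.delay_micros = 0
        self.recomputations = 0  # counts epoch changes, for illustration

    def on_epoch_change(self, l0_files_per_cf, slowdown_trigger,
                        stop_trigger, delay_micros):
        """Push side: called when a flush or compaction finishes.

        Loops over all column families once per epoch. The thresholds
        are assumed inputs, not RocksDB option names.
        """
        self.recomputations += 1
        worst = max(l0_files_per_cf, default=0)
        if worst >= stop_trigger:
            self.mode, self.delay_micros = WriteMode.STOP, 0
        elif worst >= slowdown_trigger:
            self.mode, self.delay_micros = WriteMode.DELAY, delay_micros
        else:
            self.mode, self.delay_micros = WriteMode.PROCEED, 0

    def admit_write(self):
        """Write path: constant time, no loop over column families."""
        return self.mode, self.delay_micros
```

In this sketch, any number of admit_write() calls between two epoch changes costs one attribute read each, whereas a pull model would repeat the loop in on_epoch_change() on every write.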


214 Commits