rocksdb

Author	SHA1	Message	Date
krad	d9f4875e52	Disable pre-fetching of index and filter blocks for sst_dump_tool. Summary: BlockBasedTable pre-fetches the filter and index blocks on Open call. This is an optimistic optimization targeted for runtime scenario. The optimization is unnecessary for sst_dump_tool - Added a provision to disable pre-fetching of index and filter blocks in BlockBasedTable - Disabled pre-fetching for the sst_dump tool Stack for reference : #01 0x00000000005ed944 in snappy::InternalUncompress<snappy::SnappyArrayWriter> () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:148 #02 0x00000000005edeee in snappy::RawUncompress () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:947 #03 0x00000000004e0b4d in rocksdb::UncompressBlockContents () from /data/users/paultuckfield/rocksdb/./util/compression.h:69 #04 0x00000000004e145c in rocksdb::ReadBlockContents () from /data/users/paultuckfield/rocksdb/table/format.cc:334 #05 0x00000000004ca424 in rocksdb::(anonymous namespace)::ReadBlockFromFile () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:70 #06 0x00000000004cccad in rocksdb::BlockBasedTable::CreateIndexReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:173 #07 0x00000000004d17e5 in rocksdb::BlockBasedTable::Open () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:553 #08 0x00000000004c8184 in rocksdb::BlockBasedTableFactory::NewTableReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_factory.cc:51 #09 0x0000000000598463 in rocksdb::SstFileReader::NewTableReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:69 #10 0x00000000005986c2 in rocksdb::SstFileReader::SstFileReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:26 #11 0x0000000000599047 in rocksdb::SSTDumpTool::Run () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:332 #12 0x0000000000409b06 in main () from /data/users/paultuckfield/rocksdb/tools/sst_dump.cc:12 Test Plan: - Added a unit test to trigger the code. - Also did some manual verification. - Passed all unit tests task #6296048 Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34041	2015-02-25 16:34:26 -08:00
Igor Sugak	62f7a1be4f	rocksdb: Fixed 'Dead assignment' and 'Dead initialization' scan-build warnings Summary: This diff contains trivial fixes for 6 scan-build warnings: db/c_test.c `db` variable is never read. Removed assignment. scan-build report: http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath db/db_iter.cc `skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return. scan-build report: http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath db/log_reader.cc In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion. scan-build report: http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases. scan-build reprots: http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPath http://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath table/plain_table_key_coding.cc Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration. scan-build report: http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath tools/db_stress.cc Missing `break` in switch case block. This seems to be a bug. Added missing `break`. Test Plan: Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs. ```lang=bash % make check % make analyze ``` Reviewers: meyering, igor, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33795	2015-02-23 14:10:09 -08:00
Jim Meyering	aa5d8e6d95	table_test.cc: add missing 5th arg in TestArgs initializer Summary: Adding -W and -Wextra to CXXFLAGS provoked this failure: table/table_test.cc:1854:56: error: missing initializer for member ‘rocksdb::TestArgs::format_version’ [-Werror=missing-field-initializers] TestArgs args = { DB_TEST, false, 16, kNoCompression }; ^ Add the missing, 5th value (format_version). Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33765	2015-02-20 11:07:11 -08:00
Jim Meyering	9283c7afd2	build: remove always-true assertions Summary: Remove some always-true assertions. They provoke these compilation failures: table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] * table/plain_table_key_coding.cc (rocksdb): Remove assertion that unsigned type variable is >= 0. * db/version_set.cc (DoGenerateLevelFilesBrief): Likewise. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33747	2015-02-20 11:07:03 -08:00
Igor Sugak	98870c7b9c	rocksdb: Fix scan-build memory warning in table/block_based_table_reader.cc Summary: scan-build is reporting two memory leak bugs in `table/block_based_table_reader.cc`. They are both false positives. In both cases we allocate memory in `ReadBlockFromFile` if `s.ok()`. Then after the function `ReadBlockFromFile` returns we check for the same variable if `s.ok()` and then use the memory that was allocated. The bugs reported by scan-build is if `ReadBlockFromFile` allocates memory and returns, but for some reason status `s` is not the same and `s.ok() != true`. In this case scan-build is concerned that memory owner transfer is not explicit. I modified `ReadBlockFromFile` to accept `std::unique_ptr<Block>*` as a parameter, instead of raw pointer. scan-build reports: http://home.fburl.com/~sugak/latest2/report-a4b3fa.html#EndPath http://home.fburl.com/~sugak/latest2/report-29adbf.html#EndPath Test Plan: Make sure scan-build does not report these bugs and all tests are passing. ```lang=bash % make check % make analyze ``` Reviewers: sdong, lgalanis, meyering, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33681	2015-02-19 14:07:38 -08:00
sdong	68af7811ea	Remember whole key/prefix filtering on/off in SST file Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter. Test Plan: Add a unit test for it Reviewers: yhchiang, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32889	2015-02-11 11:20:04 -08:00
sdong	e63140d52b	Get() to use prefix bloom filter when filter is not block based Summary: Get() now doesn't make use of bloom filter if it is prefix based. Add the check. Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter. Test Plan: make all check Add a test case in DBTest Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31941	2015-02-04 15:15:41 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
Igor Canadi	96b8240bc5	Support footer versions bigger than 1 Summary: In this diff I add another parameter to BlockBasedTableOptions that will let users specify block based table's format. This will greatly simplify block based table's format changes in the future. First format change that this will support is encoding decompressed size in Zlib and BZip2 blocks. This diff is blocking https://reviews.facebook.net/D31311. Test Plan: Added a unit tests. More tests to come as part of https://reviews.facebook.net/D31311. Reviewers: dhruba, MarkCallaghan, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31383	2015-01-13 14:33:04 -08:00
Igor Canadi	15d2abbec3	Fix build issues	2015-01-09 13:04:06 -08:00
Igor Canadi	abb9b95ffe	Move compression functions from port/ to util/ Summary: We keep checksum functions in util/, there is no reason for compression to be in port/ Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31281	2015-01-09 12:57:11 -08:00
Manish Patil	7ea7bdf04d	Dump routine to BlockBasedTableReader Summary: Added necessary routines for dumping block based SST with block filter Test Plan: Added "raw" mode to utility sst_dump Reviewers: sdong, rven Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D29679	2014-12-23 13:24:07 -08:00
Jonah Cohen	a14b7873ee	Enforce write buffer memory limit across column families Summary: Introduces a new class for managing write buffer memory across column families. We supplement ColumnFamilyOptions::write_buffer_size with ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer instance that enforces memory limits before flushing out to disk. Test Plan: Added SharedWriteBuffer unit test to db_test.cc Reviewers: sdong, rven, ljin, igor Reviewed By: igor Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim Differential Revision: https://reviews.facebook.net/D22581	2014-12-02 12:09:20 -08:00
Yueh-Hsuan Chiang	7e608e2fe3	Block plain_table_index.cc in ROCKSDB_LITE Summary: Block plain_table_index.cc in ROCKSDB_LITE Test Plan: make clean make OPT=-DROCKSDB_LITE shared_lib -j32 make clean make shared_lib -j32 Reviewers: ljin, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29535	2014-11-24 20:47:27 -08:00
Yueh-Hsuan Chiang	13de000f07	Add rocksdb::ToString() to address cases where std::to_string is not available. Summary: In some environment such as android, the c++ library does not have std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string when std::to_string is not available, and implements std::to_string in the other case. Test Plan: make dbg -j32 ./db_test make clean make dbg OPT=-DOS_ANDROID -j32 ./db_test Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29181	2014-11-24 20:44:49 -08:00
Bryan Rosario	9e285d4238	Added CompatibleOptions for compatibility with LevelDB Options Summary: Created a CompatibleOptions object that can be used as a LevelDB Options object and then converted to a RocksDB Options object using the ConvertOptions() method. Test Plan: Unit test included in diff. Reviewers: ljin Reviewed By: ljin Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28893	2014-11-20 19:24:39 -08:00
Lei Jin	8d3f8f9696	remove all remaining references to cfd->options() Summary: The very last reference happens in DBImpl::GetOptions() I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out Test Plan: make all check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29073	2014-11-18 10:20:10 -08:00
Igor Canadi	9be338cf9d	CompactionJobTest Summary: This is just a simple test that passes two files though a compaction. It shows the framework so that people can continue building new compaction unit tests. In the future we might want to move some Compaction* tests from DBTest here. For example, CompactBetweenSnapshot seems a good candidate. Hopefully this test can be simpler when we mock out VersionSet. Test Plan: this is a test Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28449	2014-11-14 11:35:48 -08:00
sdong	a177742a9b	Make db_stress built for ROCKSDB_LITE Summary: Make db_stress built for ROCKSDB_LITE. The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB. Test Plan: make db_stress Reviewers: yhchiang, rven, ljin, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D28797	2014-11-14 10:20:51 -08:00
Igor Canadi	25f273027b	Fix iOS compile with -Wshorten-64-to-32 Summary: So iOS size_t is 32-bit, so we need to static_cast<size_t> any uint64_t :( Test Plan: TARGET_OS=IOS make static_lib Reviewers: dhruba, ljin, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28743	2014-11-13 14:39:30 -05:00
Igor Canadi	767777c2bd	Turn on -Wshorten-64-to-32 and fix all the errors Summary: We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details. This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables. Test Plan: compiles Reviewers: ljin, rven, yhchiang, sdong Reviewed By: yhchiang Subscribers: bobbaldwin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28689	2014-11-11 16:47:22 -05:00
Igor Canadi	68effa0348	Fix -Wshadow for tools Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`. Test Plan: make all Reviewers: yhchiang, rven, ljin, sdong Reviewed By: ljin, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28497	2014-11-07 15:04:30 -08:00
Igor Canadi	9f20395cd6	Turn -Wshadow back on Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc. Test Plan: compiles Reviewers: ljin, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28131	2014-11-06 11:14:28 -08:00
Igor Canadi	9f7fc3ac45	Turn on -Wshadow Summary: ...and fix all the errors :) Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean. Test Plan: compiles Reviewers: yhchiang, rven, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27711	2014-10-31 11:59:54 -07:00
Yueh-Hsuan Chiang	98849a35fa	Apply InfoLogLevel to the logs in table/block_based_table_reader.cc Summary: Apply InfoLogLevel to the logs in table/block_based_table_reader.cc Also, add missing checks for the returned status in BlockBasedTable::Open Test Plan: make Reviewers: sdong, ljin, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D28005	2014-10-31 11:41:15 -07:00
Yueh-Hsuan Chiang	217cc217d7	Apply InfoLogLevel to the logs in table/meta_blocks.cc Summary: Apply InfoLogLevel to the logs in table/meta_blocks.cc Test Plan: make Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27903	2014-10-29 17:55:19 -07:00
Yueh-Hsuan Chiang	fd95745a59	Fix compile error in table/plain_table_index.cc Summary: Fix compile error in table/plain_table_index.cc Test Plan: make	2014-10-29 17:42:38 -07:00
Yueh-Hsuan Chiang	e7ad69b9fe	Apply InfoLogLevel to the logs in table/plain_table_index.cc Summary: Apply InfoLogLevel to the logs in table/plain_table_index.cc Test Plan: make Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27909	2014-10-29 17:08:40 -07:00
Yueh-Hsuan Chiang	bbd9c53457	Apply InfoLogLevel to the logs in table/block_based_table_builder.cc Summary: Apply InfoLogLevel to the logs in table/block_based_table_builder.cc Test Plan: make Reviewers: igor, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27921	2014-10-29 17:08:20 -07:00
Igor Canadi	412b7f85bb	Include atomic in mock_table.h	2014-10-28 18:10:55 -07:00
Igor Canadi	abac3d6476	TableMock + framework for mock classes Summary: This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on less things and making it closer to an unit test than integration test. It also introduces a framework to compile mock classes -- Any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. What way we can mock out most other classes, Version, VersionSet, DBImpl, etc. Test Plan: flush_job_test Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27681	2014-10-28 17:52:32 -07:00
Igor Canadi	c1c68bce43	remove atomic_pointer.h references	2014-10-27 15:12:20 -07:00
Lei Jin	f1841985e4	dynamic inplace_update options Summary: Make inplace_update_support and inplace_update_num_locks dynamic. inplace_callback becomes immutable We are almost free of references to cfd->options() in db_impl Test Plan: unit test Reviewers: igor, yhchiang, rven, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25293	2014-10-27 12:10:13 -07:00
Lei Jin	839c376bd1	fix table_test Summary: SaveValue expects an internal key but I previously added to table a user key Test Plan: ran the test	2014-10-22 13:53:35 -07:00
Lei Jin	0fd985f427	Avoid reloading filter on Get() if cache_index_and_filter_blocks == false Summary: This fixes the case that filter policy is missing in SST file, but we open the table with filter policy on and cache_index_and_filter_blocks = false. The current behavior is that we will try to load it every time on Get() but fail. Test Plan: unit test Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25455	2014-10-22 11:52:35 -07:00
Lei Jin	2dd9bfe3a8	Sanitize block-based table index type and check prefix_extractor Summary: Respond to issue reported https://www.facebook.com/groups/rocksdb.dev/permalink/651090261656158/ Change the Sanitize signature to take both DBOptions and CFOptions Test Plan: unit test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25041	2014-10-17 21:18:36 -07:00
Igor Canadi	d6987216c9	Merge pull request #327 from dalgaaf/wip-da-SCA-20141001 Fix some issues from SCA	2014-10-02 10:59:52 -07:00
Lei Jin	5ec53f3edf	make compaction related options changeable Summary: make compaction related options changeable. Most of changes are tedious, following the same convention: grabs MutableCFOptions at the beginning of compaction under mutex, then pass it throughout the job and register it in SuperVersion at the end. Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23349	2014-10-01 16:19:16 -07:00
Danny Al-Gaaf	28a6e31583	table/block_based_table_builder.cc: remove unused variable Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:09 +02:00
Danny Al-Gaaf	6b6cedbb1b	table/format.cc: reduce scope of some variables Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	55652043c8	table/cuckoo_table_reader.cc: pass func parameter by reference Fix for: [table/cuckoo_table_reader.cc:196]: (performance) Function parameter 'target' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	d517c83648	in_table_factory.cc: use correct format specifier Use %zu instead of %zd since size_t and uint32_t are unsigned. Fix for: [table/plain_table_factory.cc:55]: (warning) %zd in format string (no. 1) requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'. [table/plain_table_factory.cc:58]: (warning) %zd in format string (no. 1) requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	063471bf76	table/table_test.cc: pass func parameter by reference Fix for: [table/table_test.cc:1218]: (performance) Function parameter 'prefix' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	93548ce8f4	table/cuckoo_table_reader.cc: pass func parameter by ref Fix for: [table/cuckoo_table_reader.cc:198]: (performance) Function parameter 'file_data' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	8ce050b51b	table/bloom_block.*: pass func parameter by reference [table/bloom_block.h:29]: (performance) Function parameter 'keys_hashes' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:31 +02:00
Lei Jin	2faf49d5f1	use GetContext to replace callback function pointer Summary: Intead of passing callback function pointer and its arg on Table::Get() interface, passing GetContext. This makes the interface cleaner and possible better perf. Also adding a fast pass for SaveValue() Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24057	2014-09-29 11:09:09 -07:00
Lei Jin	2dc6f62bb9	handle kDelete type in cuckoo builder Summary: when I changed std::vector<std::string, std::string> to std::string to store key/value pairs in builder, I missed the handling for kDeletion type. As a result, value_size_ can be wrong if the first add key is for deletion. The is captured by ./cuckoo_table_db_test Test Plan: ./cuckoo_table_db_test ./cuckoo_table_reader_test ./cuckoo_table_builder_test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24045	2014-09-29 10:25:21 -07:00
Lei Jin	d439451fab	delay initialization of cuckoo table iterator Summary: cuckoo table iterator creation is quite expensive since it needs to load all data and sort them. After compaction, RocksDB creates a new iterator of the new file to make sure it is in good state. That makes the DB creation quite slow. Delay the iterator db sort to the seek time to speed it up. Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23775	2014-09-25 16:45:37 -07:00
Lei Jin	94997eab5e	reduce memory usage of cuckoo table builder Summary: builder currently buffers all key value pairs as a vector of pair<string, string>. That is too much due to std::string overhead. It wasn't able to fit 1B key/values (12bytes total) in 100GB of ram. Switch to use a plain string to store the key/value sequence and use only 12GB of ram as a result. Test Plan: db_bench Reviewers: igor, sdong, yhchiang Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23763	2014-09-25 16:34:24 -07:00
Lei Jin	c6275956e2	improve memory efficiency of cuckoo reader Summary: When creating a new iterator, instead of storing mapping from key to bucket id for sorting, store only bucket id and read key from mmap file based on the id. This reduces from 20 bytes per entry to only 4 bytes. Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23757	2014-09-25 16:15:23 -07:00
Lei Jin	581442d446	option to choose module when calculating CuckooTable hash Summary: Using module to calculate hash makes lookup ~8% slower. But it has its benefit: file size is more predictable, more space enffient Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23691	2014-09-25 13:53:27 -07:00
Lei Jin	3c68006109	CompactedDBImpl Summary: Add a CompactedDBImpl that will enabled when calling OpenForReadOnly() and the DB only has one level (>0) of files. As a performan comparison, CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with ReadOnlyDBImpl. Test Plan: db_bench Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23553	2014-09-25 11:14:01 -07:00
Lei Jin	0a29ce5393	re-enable BlockBasedTable::SetupForCompaction() Summary: It was commented out in D22545 by accident. Keep the option in ImmutableOptions for now. I can make it dynamic in https://reviews.facebook.net/D23349 Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23865	2014-09-23 14:18:57 -07:00
Igor Canadi	55af370756	Remove TODO for checking index checksums	2014-09-23 13:02:23 -07:00
Igor Canadi	3d74f09979	Fix compile	2014-09-22 15:19:20 -07:00
liuchang0812	4436f17bd8	fixed #303 : replace %ld with % PRId64	2014-09-21 22:09:48 +08:00
Lei Jin	51af7c326c	CuckooTable: add one option to allow identity function for the first hash function Summary: MurmurHash becomes expensive when we do millions Get() a second in one thread. Add this option to allow the first hash function to use identity function as hash function. It results in QPS increase from 3.7M/s to ~4.3M/s. I did not observe improvement for end to end RocksDB performance. This may be caused by other bottlenecks that I will address in a separate diff. Test Plan: ``` [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320 [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320 ``` Reviewers: sdong, igor, yhchiang Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23451	2014-09-18 11:00:48 -07:00
Igor Canadi	ff76895614	Remove some unnecessary constructors Summary: This is continuing the work done by `27b22f13a3` It's just cleaning up some unnecessary constructors. The most important change is removing Block::Block(const BlockContents& contents) constructor. It was only used from the unit test. Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23547	2014-09-17 16:45:58 -07:00
Lei Jin	feadb9df53	fix cuckoo table builder test Summary: as title Test Plan: ./cuckoo_table_builder_test Reviewers:igor CC:leveldb Task ID: # Blame Rev:	2014-09-17 15:48:33 -07:00
Igor Canadi	54cada92b1	Run make format on PR #249	2014-09-17 15:08:50 -07:00
Igor Canadi	27b22f13a3	Merge pull request #249 from tdfischer/decompression-refactoring Decompression refactoring	2014-09-17 15:01:40 -07:00
Torrie Fischer	fb6456b00d	Replace naked calls to operator new and delete (Fixes #222 ) This replaces a mishmash of pointers in the Block and BlockContents classes with std::unique_ptr. It also changes the semantics of BlockContents to be limited to use as a constructor parameter for Block objects, as it owns any block buffers handed to it.	2014-09-17 13:50:07 -07:00
Lei Jin	5600c8f6e5	cuckoo table: return estimated size - 1 Summary: This is to avoid cutting file prematurely and resulting file size to be half of specified. Test Plan: db_bench Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23541	2014-09-17 13:25:29 -07:00
Lei Jin	a062e1f2c4	SetOptions() for memtable related options Summary: as title Test Plan: make all check I will think a way to set up stress test for this Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23055	2014-09-17 12:49:13 -07:00
Igor Canadi	faad439ac4	Fix #284 Summary: This work on my compiler, but it turns out some compilers don't implicitly add constness, see: https://github.com/facebook/rocksdb/issues/284. This diff adds constness explicitly. Test Plan: still compiles Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23409	2014-09-16 10:30:32 -07:00
Igor Canadi	059e584dd3	[unit test] CompactRange should fail if we don't have space Summary: See t5106397. Also, few more changes: 1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it. 2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption Test Plan: make check Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23145	2014-09-10 17:00:00 -07:00
Igor Canadi	6bb7e3ef25	Merger test Summary: I abandoned https://reviews.facebook.net/D18789, but I wrote a good unit test there, so let's check it in. :) Test Plan: this is test Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22827	2014-09-08 22:24:40 -07:00
Lei Jin	52311463e9	MemTableOptions Summary: removed reference to options in WriteBatch and DBImpl::Get() Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23049	2014-09-08 18:46:52 -07:00
Feng Zhu	0af157f9bf	Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit `1d23b5c470` Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979	2014-09-08 10:37:05 -07:00
wankai	88a2f44f99	fix comments	2014-09-08 16:34:04 +08:00
wankai	823773837b	replace hard-coded number with named variable	2014-09-08 11:10:17 +08:00
wankai	4c2b1f097b	Merge remote-tracking branch 'upstream/master'	2014-09-06 23:23:11 +08:00
wankai	a5d2863074	typo improvement	2014-09-06 23:21:26 +08:00
Radheshyam Balasundaram	5cd0576ffe	Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests. Test Plan: make check all Also ran db_bench to generate multiple files. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22743	2014-09-05 11:18:01 -07:00
liuhuahang	bb6ae0f80c	fix more compile warnings N/A Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-05 14:14:37 +08:00
Stanislau Hlebik	45a5e3ede0	Remove path with arena==nullptr from NewInternalIterator Summary: Simply code by removing code path which does not use Arena from NewInternalIterator Test Plan: make all check make valgrind_check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22395	2014-09-04 17:40:41 -07:00
Lei Jin	5665e5e285	introduce ImmutableOptions Summary: As a preparation to support updating some options dynamically, I'd like to first introduce ImmutableOptions, which is a subset of Options that cannot be changed during the course of a DB lifetime without restart. ColumnFamily will keep both Options and ImmutableOptions. Any component below ColumnFamily should only take ImmutableOptions in their constructor. Other options should be taken from APIs, which will be allowed to adjust dynamically. I am yet to make changes to memtable and other related classes to take ImmutableOptions in their ctor. That can be done in a seprate diff as this one is already pretty big. Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D22545	2014-09-04 16:18:36 -07:00
Igor Canadi	f7f973d354	Merge pull request #269 from huahang/patch-2 fix a few compile warnings	2014-09-04 09:43:00 -07:00
liuhuahang	ef5b384729	fix a few compile warnings 1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2, class HistogramImpl has virtual functions and thus should have a virtual destructor 3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99 Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-04 23:06:23 +08:00
wankai	1785114a6f	delete unused Comparator	2014-09-04 09:10:13 +08:00
wankai	19cc588b77	change to filter_block std::unique_ptr support RAII	2014-09-04 00:44:49 +08:00
wankai	5d25a46936	Merge remote-tracking branch 'upstream/master'	2014-09-03 21:57:13 +08:00
Lei Jin	9b58c73c7c	call SanitizeDBOptionsByCFOptions() in the right place Summary: It only covers Open() with default column family right now Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22467	2014-09-02 14:42:23 -07:00
Igor Canadi	7f19bb93c6	Merge pull request #242 from tdfischer/perf-timer-destructors Refactor PerfStepTimer to automatically stop on destruct	2014-09-02 13:06:40 -07:00
Torrie Fischer	6614a48418	Refactor PerfStepTimer to stop on destruct This eliminates the need to remember to call PERF_TIMER_STOP when a section has been timed. This allows more useful design with the perf timers and enables possible return value optimizations. Simplistic example: class Foo { public: Foo(int v) : m_v(v); private: int m_v; } Foo makeFrobbedFoo(int errno) { errno = 0; return Foo(); } Foo bar(int *errno) { PERF_TIMER_GUARD(some_timer); return makeFrobbedFoo(errno); } int main(int argc, char[] argv) { Foo f; int errno; f = bar(&errno); if (errno) return -1; return 0; } After bar() is called, perf_context.some_timer would be incremented as if Stop(&perf_context.some_timer) was called at the end, and the compiler is still able to produce optimizations on the return value from makeFrobbedFoo() through to main().	2014-09-02 12:04:22 -07:00
Igor Canadi	076bd01a29	Fix compile Summary: gcc on our dev boxes is not happy about __attribute__((unused)) Test Plan: compiles now Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22707	2014-09-02 11:49:38 -07:00
Igor Canadi	990df99a61	Fix ios compile Summary: We need to set contbuild for this :) Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22701	2014-09-02 10:50:15 -07:00
Wankai Zhang	dff2b1a8f8	typo improvement	2014-09-02 22:57:03 +08:00
Radheshyam Balasundaram	d20b8cfaa1	Improve Cuckoo Table Reader performance. Inlined hash function and number of buckets a power of two. Summary: Use inlined hash functions instead of function pointer. Make number of buckets a power of two and use bitwise and instead of mod. After these changes, we get almost 50% improvement in performance. Results: With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.231us (4.3 Mqps) with batch size of 0 Time taken per op is 0.229us (4.4 Mqps) with batch size of 0 Time taken per op is 0.185us (5.4 Mqps) with batch size of 0 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.108us (9.3 Mqps) with batch size of 10 Time taken per op is 0.100us (10.0 Mqps) with batch size of 10 Time taken per op is 0.103us (9.7 Mqps) with batch size of 10 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.101us (9.9 Mqps) with batch size of 25 Time taken per op is 0.098us (10.2 Mqps) with batch size of 25 Time taken per op is 0.097us (10.3 Mqps) with batch size of 25 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50 Time taken per op is 0.097us (10.3 Mqps) with batch size of 50 Time taken per op is 0.097us (10.3 Mqps) with batch size of 50 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.102us (9.8 Mqps) with batch size of 100 Time taken per op is 0.098us (10.2 Mqps) with batch size of 100 Time taken per op is 0.115us (8.7 Mqps) with batch size of 100 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0 Time taken per op is 0.155us (6.5 Mqps) with batch size of 0 Time taken per op is 0.152us (6.6 Mqps) with batch size of 0 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.089us (11.3 Mqps) with batch size of 10 Time taken per op is 0.084us (11.9 Mqps) with batch size of 10 Time taken per op is 0.086us (11.6 Mqps) with batch size of 10 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.087us (11.5 Mqps) with batch size of 25 Time taken per op is 0.085us (11.7 Mqps) with batch size of 25 Time taken per op is 0.093us (10.8 Mqps) with batch size of 25 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.094us (10.6 Mqps) with batch size of 50 Time taken per op is 0.094us (10.7 Mqps) with batch size of 50 Time taken per op is 0.093us (10.8 Mqps) with batch size of 50 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.092us (10.9 Mqps) with batch size of 100 Time taken per op is 0.089us (11.2 Mqps) with batch size of 100 Time taken per op is 0.088us (11.3 Mqps) with batch size of 100 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0 Time taken per op is 0.168us (6.0 Mqps) with batch size of 0 Time taken per op is 0.190us (5.3 Mqps) with batch size of 0 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.081us (12.4 Mqps) with batch size of 10 Time taken per op is 0.077us (13.0 Mqps) with batch size of 10 Time taken per op is 0.083us (12.1 Mqps) with batch size of 10 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 25 Time taken per op is 0.073us (13.7 Mqps) with batch size of 25 Time taken per op is 0.073us (13.7 Mqps) with batch size of 25 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.076us (13.1 Mqps) with batch size of 50 Time taken per op is 0.072us (13.8 Mqps) with batch size of 50 Time taken per op is 0.072us (13.8 Mqps) with batch size of 50 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 100 Time taken per op is 0.074us (13.6 Mqps) with batch size of 100 Time taken per op is 0.073us (13.6 Mqps) with batch size of 100 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.190us (5.3 Mqps) with batch size of 0 Time taken per op is 0.186us (5.4 Mqps) with batch size of 0 Time taken per op is 0.184us (5.4 Mqps) with batch size of 0 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.079us (12.7 Mqps) with batch size of 10 Time taken per op is 0.070us (14.2 Mqps) with batch size of 10 Time taken per op is 0.072us (14.0 Mqps) with batch size of 10 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 25 Time taken per op is 0.072us (14.0 Mqps) with batch size of 25 Time taken per op is 0.071us (14.1 Mqps) with batch size of 25 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.082us (12.1 Mqps) with batch size of 50 Time taken per op is 0.071us (14.1 Mqps) with batch size of 50 Time taken per op is 0.073us (13.6 Mqps) with batch size of 50 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 100 Time taken per op is 0.077us (13.0 Mqps) with batch size of 100 Time taken per op is 0.078us (12.8 Mqps) with batch size of 100 Test Plan: make check all make valgrind_check make asan_check Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22539	2014-08-29 19:06:15 -07:00
Tomislav Novak	0f9c43ea36	ForwardIterator: reset incomplete iterators on Seek() Summary: When reading from kBlockCacheTier, ForwardIterator's internal child iterators may end up in the incomplete state (read was unable to complete without doing disk I/O). `ForwardIterator::status()` will correctly report that; however, the iterator may be stuck in that state until all sub-iterators are rebuilt: * `NeedToSeekImmutable()` may return false even if some sub-iterators are incomplete * one of the child iterators may be an empty iterator without any state other that the kIncomplete status (created using `NewErrorIterator()`); seeking on any such iterator has no effect -- we need to construct it again Akin to rebuilding iterators after a superversion bump, this diff makes forward iterator reset all incomplete child iterators when `Seek()` or `Next()` are called. Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: lovro, march, leveldb Differential Revision: https://reviews.facebook.net/D22575	2014-08-29 16:21:29 -07:00
Igor Canadi	22a0a60dc4	Merge pull request #250 from wankai/master delete unused struct Options	2014-08-29 09:53:18 -04:00
Wankai Zhang	be25ee44fe	delete unused struct Options	2014-08-29 17:31:04 +08:00
Feng Zhu	1d23b5c470	remove_internal_filter_policy Summary: 1. remove class InternalFilterPolicy in db/dbformat.h 2. Transformation from internal key to user key is done in filter_block.cc 3. This is a preparation for patch D20979 Test Plan: make all check valgrind ./db_test Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22509	2014-08-28 17:06:29 -07:00
Radheshyam Balasundaram	7f71448388	Implementing a cache friendly version of Cuckoo Hash Summary: This implements a cache friendly version of Cuckoo Hash in which, in case of collission, we try to insert in next few locations. The size of the neighborhood to check is taken as an input parameter in builder and stored in the table. Test Plan: make check all cuckoo_table_{db,reader,builder}_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22455	2014-08-28 10:42:23 -07:00
Wankai Zhang	528a11c635	Update block_builder.h more c++11 way noncopyable and keep parameter's name of constructor consistent	2014-08-28 13:37:10 +08:00
Radheshyam Balasundaram	4142a3e783	Adding a user comparator for comparing Uint64 slices. Summary: - New Uint64 comparator - Modify Reader and Builder to take custom user comparators instead of bytewise comparator - Modify logic for choosing unused user key in builder - Modify iterator logic in reader - test changes Test Plan: cuckoo_table_{builder,reader,db}_test make check all Reviewers: ljin, sdong Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D22377	2014-08-27 10:39:31 -07:00
Lei Jin	23861857c4	ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled Summary: as title Test Plan: table_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22239	2014-08-25 16:14:30 -07:00
Lei Jin	a98badff16	print table options Summary: Add a virtual function in table factory that will print table options Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22149	2014-08-25 14:24:09 -07:00
Lei Jin	384400128f	move block based table related options BlockBasedTableOptions Summary: I will move compression related options in a separate diff since this diff is already pretty lengthy. I guess I will also need to change JNI accordingly :( Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21915	2014-08-25 14:22:05 -07:00
Shao Yu Zhang	f76eda74d6	Fix compilation issue on OSX	2014-08-21 18:11:33 -07:00
Radheshyam Balasundaram	08be7f5266	Implement Prepare method in CuckooTableReader Summary: - Implement Prepare method - Rewrite performance tests in cuckoo_table_reader_test to write new file only if one doesn't already exist. - Add performance tests for batch lookup along with prefetching. Test Plan: ./cuckoo_table_reader_test --enable_perf Results (We get better results if we used int64 comparator instead of string comparator (TBD in future diffs)): With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.208us (4.8 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.182us (5.5 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.161us (6.2 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.161us (6.2 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.163us (6.1 Mqps) with batch size of 100 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.252us (4.0 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.192us (5.2 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.195us (5.1 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.191us (5.2 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.194us (5.1 Mqps) with batch size of 100 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.228us (4.4 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.185us (5.4 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.186us (5.4 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.189us (5.3 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.188us (5.3 Mqps) with batch size of 100 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.325us (3.1 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.196us (5.1 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.199us (5.0 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.196us (5.1 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.209us (4.8 Mqps) with batch size of 100 Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22167	2014-08-20 18:35:35 -07:00
Yueh-Hsuan Chiang	63a2215c63	Improve Options sanitization and add MmapReadRequired() to TableFactory Summary: Currently, PlainTable must use mmap_reads. When PlainTable is used but allow_mmap_reads is not set, rocksdb will fail in flush. This diff improve Options sanitization and add MmapReadRequired() to TableFactory. Test Plan: export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest make db_test -j32 ./db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: you, leveldb Differential Revision: https://reviews.facebook.net/D21939	2014-08-20 15:53:39 -07:00
sdong	045575ad0c	Add CuckooHash table format to table_reader_bench Summary: Make table_reader_bench cover all the three table formats. Test Plan: Run it using three options Reviewers: radheshyamb, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22137	2014-08-19 14:58:15 -07:00
Yueh-Hsuan Chiang	570ba5aca8	Avoid retrying to read property block from a table when it does not exist. Summary: Avoid retrying to read property block from a table when it does not exist in updating stats for compensating deletion entries. In addition, ReadTableProperties() now returns Status::NotFound instead of Status::Corruption when table properties does not exist in the file. Test Plan: make db_test -j32 export ROCKSDB_TESTS=CompactionDeleteionTrigger ./db_test Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21867	2014-08-15 12:17:44 -07:00
Radheshyam Balasundaram	9674c11d01	Integrating Cuckoo Hash SST Table format into RocksDB Summary: Contains the following changes: - Implementation of cuckoo_table_factory - Adding cuckoo table into AdaptiveTableFactory - Adding cuckoo_table_db_test, similar to lines of plain_table_db_test - Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key. - Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test cuckoo_table_db_test make check all make valgrind_check make asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21219	2014-08-11 20:21:07 -07:00
miguelportilla	93e6b5e9d9	Changes to support unity build: * Script for building the unity.cc file via Makefile * Unity executable Makefile target for testing builds * Source code changes to fix compilation of unity build	2014-08-11 13:22:47 -04:00
ZHANG Biao	63d5cc72fa	fix various 'comparison of integers of different signs' compiling errors under macosx	2014-08-07 17:06:07 +08:00
sdong	1242bfcad7	Add DB property "rocksdb.estimate-table-readers-mem" Summary: Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache. Refactor the property codes to allow getting property from a version, with DB mutex not acquired. Test Plan: Add several checks of this new property in existing codes for various cases. Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, leveldb Differential Revision: https://reviews.facebook.net/D20733	2014-08-06 11:39:46 -07:00
Radheshyam Balasundaram	606a126703	Changing implementaiton of CuckooTableBuilder to not take file_size, key_length, value_length. Summary: - Maintain a list of key-value pairs as vectors during Add operation. - Start building hash table only when Finish() is called. - This approach takes more time and space but avoids taking file_size, key and value lengths. - Rewrote cuckoo_table_builder_test I did not know about IterKey while writing this diff. I shall change places where IterKey could be used instead of std::string tomorrow. Please review rest of the logic. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test valgrind_check asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20907	2014-08-05 20:55:46 -07:00
Radheshyam Balasundaram	2124c85cc6	Implementing CuckooTableReader::NewIterator Summary: - Reads key-value pairs from file and builds an in-memory index of key-to-bucket id map in sorted order of key. - Assumes bytewise comparator for sorting keys. - Test changes Test Plan: cuckoo_table_reader_test --enable_perf valgrind_check asan_check Reviewers: yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb, igor Differential Revision: https://reviews.facebook.net/D20721	2014-08-05 16:35:02 -07:00
sdong	02c4023666	Remove port::MemoryBarrier() from table_reader_bench Summary: port::MemoryBarrier() is not recommended to use outside of port. Remove it. Test Plan: run table_reader_bench Reviewers: ljin, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21075	2014-08-05 11:33:56 -07:00
Feng Zhu	1129921e9b	logging_when_create_and_delete_manifest Summary: 1. logging when create and delete manifest file 2. fix formating in table/format.cc Test Plan: make all check run db_bench, track the LOG file. Reviewers: ljin, yhchiang, igor, yufei.zhu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21009	2014-08-04 11:25:42 -07:00
Radheshyam Balasundaram	0c9d03ba10	Fixing broken Mac build Summary: Made some small changes to fix the broken mac build Test Plan: make check all in both linux and mac. All tests pass. Reviewers: sdong, igor, ljin, yhchiang Reviewed By: ljin, yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20895	2014-07-31 20:52:13 -07:00
Feng Zhu	b0999011e2	use stack instead of heap memory in ReadBlockContents in some case Summary: When compression is enabled, and blocksize is not too big, use the space in stack to hold bytes read from block. Bencmark: base version: commit `8f09d53fd1` malloc: 1.30% -> 0.98% free: 1.49% -> 1.07% Test Plan: make all check Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20679	2014-07-30 23:11:59 -07:00
Feng Zhu	8f09d53fd1	remove malloc when create data and index iterator in Get Summary: Define Block::Iter to be an independent class to be used by block_based_table_reader When creating data and index iterator, update an existing iterator rather than new one Thus malloc and free could be reduced Benchmark, Base: commit `76286ee67e` commands: --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=2621440 --use_hash_search=1 --block_size=1024 --block_restart_interval=1 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 malloc: 3.30% -> 1.42% free: 3.59%->1.61% Test Plan: make all check run db_stress valgrind ./db_test ./table_test Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20655	2014-07-30 16:34:35 -07:00
Radheshyam Balasundaram	91c01485d1	Minor changes to CuckooTableBuilder Summary: - Copy the key and value to in-memory hash table during Add operation. Also modified cuckoo_table_reader_test to use this. - Store only the user_key in in-memory hash table if it is last level file. - Handle Carryover while chosing unused key in Finish() method in case unused key was never found before Finish() call. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test valgrind_check asan_check Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20715	2014-07-28 17:14:25 -07:00
Lei Jin	40fa8a4cd5	make statistics forward-able Summary: Make StatisticsImpl being able to forward stats to provided statistics implementation. The main purpose is to allow us to collect internal stats in the future even when user supplies custom statistics implementation. It avoids intrumenting 2 sets of stats collection code. One immediate use case is tuning advisor, which needs to collect some internal stats, users may not be interested. Test Plan: ran db_bench and see stats show up at the end of run Will run make all check since some tests rely on statistics Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20145	2014-07-28 12:05:36 -07:00
sdong	4a8f0c957c	Block::Iter::PrefixSeek() to have an extra check to filter out some false matches Summary: In block based table's hash index checking, when looking for a key that doesn't exist, there is a high chance that a false block is returned because of hash bucket conflicts. In this revision, another check is done to filter out some of those cases: comparing previous key of the block boundary to see whether the target block is what we are looking for. In a favored test setting (bloom filter disabled, 8 L0 files), I saw about 80% improvements. In a non-favored test setting (bloom filter enabled, files are all in L1, files are all cached), I see the performance penalty is less than 3%. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: wuj, leveldb, zagfox, yhchiang Differential Revision: https://reviews.facebook.net/D20595	2014-07-25 17:27:57 -07:00
Radheshyam Balasundaram	62f9b071ff	Implementation of CuckooTableReader Summary: Contains: - Implementation of TableReader based on Cuckoo Hashing - Unittests for CuckooTableReader - Performance test for TableReader Test Plan: make cuckoo_table_reader_test ./cuckoo_table_reader_test make valgrind_check make asan_check Reviewers: yhchiang, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20511	2014-07-25 16:37:32 -07:00
Radheshyam Balasundaram	07a7d870b8	Addressing TODOs in CuckooTableBuilder Summary: Contains the following changes in CuckooTableBuilder: - Take an extra parameter in constructor to identify last level file. - Implement a better way to identify if a bucket has been inserted into the tree already during BFS search. - Minor typos Test Plan: make cuckoo_table_builder ./cuckoo_table_builder make valgrind_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20445	2014-07-24 10:07:41 -07:00
Feng Zhu	da9274574f	Use IterKey instead of string in Block::Iter to reduce malloc Summary: Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test Use IterKey in block::Iter to replace std::string to reduce malloc. Evaluate it using perf record. malloc: 4.26% -> 2.91% free: 3.61% -> 3.08% Test Plan: make all check ./valgrind db_test dbformat_test Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20433	2014-07-23 12:31:11 -07:00
Igor Canadi	2d3d63597a	Fix signed-unsigned compare error	2014-07-22 15:35:07 -04:00
Radheshyam Balasundaram	f6272e3055	Fixing memory leaks in cuckoo_table_builder_test Summary: Fixes some memory leaks in cuckoo_builder_test.cc. This also fixed broken valgrind_check tests Test Plan: make valgrind_check ./cuckoo_builder_test Currently running make check all. I shall update once it is done. Reviewers: ljin, sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20385	2014-07-22 09:49:04 -07:00
Yueh-Hsuan Chiang	bbe2e91d00	Fixed a compile error of cuckoo_table_builder. Summary: Fixed the following compile error. ./table/cuckoo_table_builder.h:72:22: error: private field 'key_length_' is not used [-Werror,-Wunused-private-field] const unsigned int key_length_; ^ 1 error generated. Test Plan: make Reviewers: sdong, ljin, radheshyamb, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20349	2014-07-21 15:17:09 -07:00
Radheshyam Balasundaram	cf3da899b0	Adding a new SST table builder based on Cuckoo Hashing Summary: Cuckoo Hashing based SST table builder. Contains: - Cuckoo Hashing logic and file storage logic. - Unit tests for logic Test Plan: make cuckoo_table_builder_test ./cuckoo_table_builder_test make check all Reviewers: yhchiang, igor, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D19545	2014-07-21 13:26:09 -07:00
Stanislau Hlebik	c1a90b0848	Fix db_bench Summary: Adding check for zero size index Test Plan: ./build_tools/regression_build_test.sh Reviewers: yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20259	2014-07-21 10:31:33 -07:00
Chilledheart	54f4e2f188	Fix clang compiler warnings	2014-07-20 22:57:20 +08:00
Stanislau Hlebik	9d70cce047	Adding option to save PlainTable index and bloom filter in SST file. Summary: Adding option to save PlainTable index and bloom filter in SST file. If there is no bloom block and/or index block, PlainTableReader builds new ones. Otherwise PlainTableReader just use these blocks. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19527	2014-07-18 16:58:13 -07:00
Stanislau Hlebik	92d73cbe78	Add PlainTableOptions Summary: Since we have a lot of options for PlainTable, add a struct PlainTableOptions to manage them Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20175	2014-07-18 00:08:38 -07:00
Igor Canadi	1614284eff	Fix compressed cache	2014-07-16 06:45:49 -07:00
Stanislau Hlebik	30c81e7717	Removing NewTotalOrderPlainTableFactory Summary: Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect. Total order mode indicator is prefix_extractor == nullptr, but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests in plain_table_db_tests is incorrect. Test Plan: make all check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19587	2014-07-10 11:32:04 -07:00
Lei Jin	8d9a46fcd1	initialize decoded_internal_key_valid Summary: ReadInternalKey() will assign correct value anyway. Initialize it to true to suppress compiler error reported https://github.com/facebook/rocksdb/issues/186 Test Plan: I cannot reproduce it but this is obvious Reviewers: sdong, yhchiang Reviewed By: yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19467	2014-07-03 23:13:08 -07:00
Feng Zhu	5656367416	use arena to allocate memtable's bloomfilter and hashskiplist's buckets_ Summary: Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena DynamicBloom: pass arena via constructor, allocate space in SetTotalBits HashSkipListRep: allocate space of buckets_ using arena. do not delete it in deconstructor because arena would take care of it. Several test files are changed. Test Plan: make all check Reviewers: ljin, haobo, yhchiang, sdong Reviewed By: sdong Subscribers: igor, dhruba Differential Revision: https://reviews.facebook.net/D19335	2014-06-30 15:54:31 -07:00
Igor Canadi	9fe87b17aa	Fix compile	2014-06-20 10:36:48 +02:00
Igor Canadi	d4a8423334	Remove seek compaction Summary: As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases. This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction. There is one test case that relied on didIO information. I'll try to find another way to implement it. Test Plan: make check Reviewers: sdong, haobo, yhchiang, ljin, dhruba Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19161	2014-06-20 10:23:02 +02:00
Haobo Xu	7a9dd5f214	[RocksDB] Make block based table hash index more adaptive Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is uncessarily harsh. Without a prefix extractor, we could always fallback to the normal binary index. Test Plan: unit test, also manually veried LOG that fallback did occur. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19191	2014-06-19 16:40:32 -07:00
Lei Jin	1ec2d1c69d	fix make shared_lib compilation error Summary: s/class ParsedInternalKey/struct ParsedInternalKey Test Plan: make shared_lib Reviewers: igor, yhchiang, sdong, haobo Reviewed By: haobo Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19173	2014-06-19 10:12:26 -07:00
Haobo Xu	167738256f	[RocksDB] Fix unit test Summary: fix a bug in D19047, which caused DBTest.RecoverDuringMemtableCompaction to fail. Test Plan: unit test Reviewers: sdong, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19155	2014-06-19 01:37:21 -07:00
sdong	edd47c5104	PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key Summary: Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes. The data format is documented in table/plain_table_factory.h Test Plan: Add unit test coverage in plain_table_db_test Reviewers: yhchiang, igor, dhruba, ljin, haobo Reviewed By: haobo Subscribers: nkg-, leveldb Differential Revision: https://reviews.facebook.net/D18735	2014-06-18 20:41:52 -07:00
Haobo Xu	0f0076ed5a	[RocksDB] Reduce memory footprint of the blockbased table hash index. Summary: Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption. Just a quick draft, still testing and refining. Test Plan: unit test and shadow testing Reviewers: dhruba, kailiu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19047	2014-06-18 18:16:07 -07:00
Igor Canadi	3525aac9e5	Change order of parameters in adaptive table factory Summary: This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation: 1) block based factory 2) plain table factory 3) output factory 4) new format factory I think it makes more sense to have output as the first parameter. Also, fixed a NewAdaptiveTableFactory() call in unit test Test Plan: unit test Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D19119	2014-06-18 07:04:37 +02:00
sdong	200e4b4a72	Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format. Test Plan: add a unit test Reviewers: haobo, igor Reviewed By: igor Subscribers: dhruba, ljin, yhchiang, leveldb Differential Revision: https://reviews.facebook.net/D19017	2014-06-17 11:49:22 -07:00
Lei Jin	c83b085770	prefetch bloom filter data block for L0 files Summary: as title Test Plan: db_bench the initial result is very promising. I will post results of complete runs Reviewers: dhruba, haobo, sdong, igor Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18867	2014-06-12 10:06:18 -07:00
sdong	88a1691a1e	BlockBasedTable::PrefixMayMatch() to bloom setting to the beginning of the function Summary: In BlockBasedTable::PrefixMayMatch() we calculate prefix even if bloom is not config. Move the check before Test Plan: make all check Reviewers: igor, ljin Reviewed By: ljin Subscribers: wuj, leveldb, haobo, yhchiang, dhruba Differential Revision: https://reviews.facebook.net/D18993	2014-06-10 11:14:22 -07:00
sdong	80f409ea37	Clean PlainTableReader's variables for better data locality Summary: Clean PlainTableReader's data structures: (1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization) (2) remove some variables only used when initialization from the class (3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly (4) make Options a reference. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: igor, yhchiang, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D18891	2014-06-09 13:53:39 -07:00
Igor Canadi	f43c8262c2	Don't compress block bigger than 2GB Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446. Test Plan: make check doesn't complain :) Reviewers: haobo, ljin, yhchiang, dhruba, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D18975	2014-06-09 12:26:09 -07:00
sdong	df9069d23f	In DB::NewIterator(), try to allocate the whole iterator tree in an arena Summary: In this patch, try to allocate the whole iterator tree starting from DBIter from an arena 1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it. 2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator. 3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it. Limitations: (1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc (2) Two level iterator itself is allocated in arena, but not iterators inside it. Test Plan: make all check Reviewers: ljin, haobo Reviewed By: haobo Subscribers: leveldb, dhruba, yhchiang, igor Differential Revision: https://reviews.facebook.net/D18513	2014-06-02 17:44:57 -07:00
Lei Jin	388d2054c7	forward iterator Summary: Forward iterator puts everything together in a flat structure instead of a hierarchy of nested iterators. this should simplify the code and provide better performance. It also enables more optimization since all information are accessiable in one place. Init evaluation shows about 6% improvement Test Plan: db_test and db_bench Reviewers: dhruba, igor, tnovak, sdong, haobo Reviewed By: haobo Subscribers: sdong, leveldb Differential Revision: https://reviews.facebook.net/D18795	2014-05-30 14:31:55 -07:00
Kai Liu	0b3d03d026	Materialize the hash index Summary: Materialize the hash index to avoid the soaring cpu/flash usage when initializing the database. Test Plan: existing unit tests passed Reviewers: sdong, haobo Reviewed By: sdong CC: leveldb Differential Revision: https://reviews.facebook.net/D18339	2014-05-15 14:09:03 -07:00
Igor Canadi	26f5dd9a5a	TablePropertiesCollectorFactory Summary: This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options. Here's description of task #4296714: I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB. For example, this table has 3155 entries and 1391828 deleted keys. The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for. Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API. In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe. Test Plan: Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one). Also, all other tests Reviewers: sdong, dhruba, haobo, kailiu Reviewed By: kailiu CC: leveldb Differential Revision: https://reviews.facebook.net/D18579	2014-05-13 12:30:55 -07:00

1 2 3 4 5 ...

478 Commits