rocksdb

Author	SHA1	Message	Date
Yueh-Hsuan Chiang	685582a0b4	Revert two diffs related to DBIter::FindPrevUserKey() Summary: This diff reverts the following two previous diffs related to DBIter::FindPrevUserKey(), which makes db_stress unstable. We should bake a better fix for this. * "Fix a comparison in DBIter::FindPrevUserKey()" `ec70fea4c4`. * "Fixed endless loop in DBIter::FindPrevUserKey()" `acee2b08a2`. Test Plan: db_stress Reviewers: anthony, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41301	2015-07-07 11:36:24 -07:00
lovro	b6655a679d	Replace std::priority_queue in MergingIterator with custom heap Summary: While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison. Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge. Keys in our dataset include sequence numbers that increase with time. Adjacent keys in an L0 file are very likely to be adjacent in the full database. Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one. It would be great to avoid the O(log K) operation per row while compacting. This diff replaces std::priority_queue with a custom binary heap implementation. It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top). Test Plan: make check To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number). I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs). The exact improvement depends on the number of L0 files and the amount of locality. Performance on randomly distributed keys seems on par with the old code. Reviewers: kailiu, sdong, igor Reviewed By: igor Subscribers: yoshinorim, dhruba, tnovak Differential Revision: https://reviews.facebook.net/D29133	2015-07-06 04:24:09 -07:00
Yueh-Hsuan Chiang	acee2b08a2	Fixed endless loop in DBIter::FindPrevUserKey() Summary: Fixed endless loop in DBIter::FindPrevUserKey() Test Plan: ./db_stress --test_batches_snapshots=1 --threads=32 --write_buffer_size=4194304 --destroy_db_initially=0 --reopen=20 --readpercent=45 --prefixpercent=5 --writepercent=35 --delpercent=5 --iterpercent=10 --db=/tmp/rocksdb_crashtest_KdCI5F --max_key=100000000 --mmap_read=0 --block_size=16384 --cache_size=1048576 --open_files=500000 --verify_checksum=1 --sync=0 --progress_reports=0 --disable_wal=0 --disable_data_sync=1 --target_file_size_base=2097152 --target_file_size_multiplier=2 --max_write_buffer_number=3 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --filter_deletes=0 --memtablerep=prefix_hash --prefix_size=7 --ops_per_thread=200 --kill_random_test=97 Reviewers: tnovak, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D41085	2015-07-02 16:10:31 -07:00
Dmitri Smirnov	9dbde7277c	Merge remote-tracking branch 'origin' into ms_win_port	2015-07-02 11:34:22 -07:00
Dmitri Smirnov	18285c1e2f	Windows Port from Microsoft Summary: Make RocksDb build and run on Windows to be functionally complete and performant. All existing test cases run with no regressions. Performance numbers are in the pull-request. Test plan: make all of the existing unit tests pass, obtain perf numbers. Co-authored-by: Praveen Rao praveensinghrao@outlook.com Co-authored-by: Sherlock Huang baihan.huang@gmail.com Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com	2015-07-01 16:13:56 -07:00
sdong	05e2831966	Allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena Summary: Try to allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena, instead of calling malloc and free. Test Plan: valgrind check Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D40929	2015-06-30 17:30:38 -07:00
Igor Canadi	0a019d74a0	Use malloc_usable_size() for accounting block cache size Summary: Currently, when we insert something into block cache, we say that the block cache capacity decreased by the size of the block. However, size of the block might be less than the actual memory used by this object. For example, 4.5KB block will actually use 8KB of memory. So even if we configure block cache to 10GB, our actually memory usage of block cache will be 20GB! This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected. This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory. I'm using non-portable function and I couldn't find info on portability on Google. However, it seems to work on Linux, which will cover majority of our use-cases. Test Plan: 1. fill up mongo instance with 80GB of data 2. restart mongo with block cache size configured to 10GB 3. do a table scan in mongo 4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB Reviewers: sdong, MarkCallaghan, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D40635	2015-06-26 11:48:09 -07:00
Giuseppe Ottaviano	782a1590f9	Implement a table-level row cache Summary: Implementation of a table-level row cache. It only caches point queries done through the `DB::Get` interface, queries done through the `Iterator` interface will completely skip the cache. Supports snapshots and merge operations. Test Plan: Ran `make valgrind_check commit-prereq` Reviewers: igor, philipp, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D39849	2015-06-23 10:25:45 -07:00
sdong	6df589b446	Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files Summary: It is experimental. Allow users to return from a call back function TablePropertiesCollector::NeedCompact(), based on the data in the file. It can be used to allow users to suggest DB to clear up delete tombstones faster. Test Plan: Add a unit test. Reviewers: igor, yhchiang, kradhakrishnan, rven Reviewed By: rven Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D39585	2015-06-05 20:18:21 -07:00
agiardullo	dc9d70de65	Optimistic Transactions Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported. Test Plan: Added a new test, but still need to look into stress testing. Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: adamretter, MarkCallaghan, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D33435	2015-05-29 14:36:35 -07:00
Igor Canadi	dbd95b7532	Add more table properties to EventLogger Summary: Example output: {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}} Also fixed a bug where BuildTable() in recovery was passing Env::IOHigh argument into paranoid_checks_file parameter. Test Plan: make check + check out the output in the log Reviewers: sdong, rven, yhchiang Reviewed By: yhchiang Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D38343	2015-05-12 15:53:55 -07:00
clark.kang	6ede020dc4	fix typos	2015-04-25 18:14:27 +09:00
sdong	98a44559d5	Build for CYGWIN Summary: Make it build for CYGWIN. Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions. Test Plan: Build it and run some unit tests in CYGWIN Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D37605	2015-04-23 21:33:44 -07:00
sdong	fcb206b667	SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward Summary: Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests. Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug. Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check. Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36999	2015-04-14 16:18:50 -07:00
Igor Canadi	5e067a7b19	Clean up compression logging Summary: Now we add warnings when user configures compression and the compression is not supported. Test Plan: Configured compression to non-supported values. Observed messages in my log: 2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2. 2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data. Reviewers: rven, sdong, yhchiang Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35979	2015-04-06 12:50:44 -07:00
sdong	a45e7581b7	Avoid naming conflict of EntryType Summary: Fix build break on travis build: $ OPT=-DTRAVIS V=1 make unity && make clean && OPT=-DTRAVIS V=1 make db_test && ./db_test ...... In file included from unity.cc:65:0: ./table/plain_table_key_coding.cc: In member function ‘rocksdb::Status rocksdb::PlainTableKeyDecoder::NextPrefixEncodingKey(const char, const char, rocksdb::ParsedInternalKey, rocksdb::Slice, size_t, bool)’: ./table/plain_table_key_coding.cc:224:3: error: reference to ‘EntryType’ is ambiguous EntryType entry_type; ^ In file included from ./db/table_properties_collector.h:9:0, from ./db/builder.h:11, from ./db/builder.cc:10, from unity.cc:1: ./include/rocksdb/table_properties.h:81:6: note: candidates are: enum rocksdb::EntryType enum EntryType { ^ In file included from unity.cc:65:0: ./table/plain_table_key_coding.cc:16:6: note: enum rocksdb::{anonymous}::EntryType enum EntryType : unsigned char { ^ ./table/plain_table_key_coding.cc:231:51: error: ‘entry_type’ was not declared in this scope const char* pos = DecodeSize(key_ptr, limit, &entry_type, &size); ^ make: *** [unity.o] Error 1 Test Plan: OPT=-DTRAVIS V=1 make unity And make sure it doesn't break anymore. Reviewers: yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D36549	2015-04-06 11:49:13 -07:00
sdong	953a885ebf	A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge Summary: Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it. Also refactor the codes so that (1) make table property collector and internal table property collector two separate data structures with the later one now exposed (2) table builders only receive internal table properties Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector. Reviewers: yhchiang, igor.sugak, rven, igor Reviewed By: rven, igor Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D35373	2015-04-06 10:27:21 -07:00
Anurag Indu	1e57f2bf2b	Fix build Test Plan: Running make all Reviewers: sdong Reviewed By: sdong Subscribers: rven, yhchiang, igor, meyering, dhruba Differential Revision: https://reviews.facebook.net/D35889	2015-03-24 17:00:28 -07:00
Anurag Indu	211ca26aee	Fixing build issue Summary: Fixing issues with get context function. Test Plan: Run make commit-prereq Reviewers: sdong, meyering, yhchiang Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D35853	2015-03-24 16:27:24 -07:00
Anurag Indu	3d1a924ff3	Adding stats for the merge and filter operation Summary: We have addded new stats and perf_context for measuring the merge and filter operation time consumption. We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB. Test Plan: WIP Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong Reviewed By: sdong Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D34377	2015-03-24 14:42:04 -07:00
Igor Sugak	9405b5ef8f	rocksdb: Remove #include "util/string_util.h" from util/testharness.h Summary: 1. Manually deleted #include "util/string_util.h" from util/testharness.h 2. ``` % USE_CLANG=1 make all -j55 -k 2> build.log % perl -naF: -E 'say $F[0] if /: error:/' build.log \| sort -u \| xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"' ``` Test Plan: Make sure make all completes with no errors. ``` % make all -j55 ``` Reviewers: meyering, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35493	2015-03-19 17:29:37 -07:00
Igor Sugak	b4b69e4f77	rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te \| test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files 'test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157	2015-03-17 14:08:00 -07:00
Igor Sugak	9fd6edf81c	rocksdb: Replace ASSERT* with EXPECT* in functions that does not return void value Summary: gtest does not use exceptions to fail a unit test by design, and `ASSERT`s are implemented using `return`. As a consequence we cannot use `ASSERT` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement \| 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes. In order to detect all existing `ASSERT` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases. In `util/testharness.h` I defined `EXPECT` assertions, the same way as `ASSERT`, and redefined `ASSERT` to return `void`. Then executed: ```lang=bash % USE_CLANG=1 make all -j55 -k 2> build.log % perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \ build.log \| xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number' % make format ``` After that I reverted back change to `ASSERT` in `util/testharness.h`. But preserved introduced `EXPECT`, which is the same as `ASSERT*`. This will be deleted once switched to gtest. This diff is independent and contains manual changes only in `util/testharness.h`. Test Plan: Make sure all tests are passing. ```lang=bash % USE_CLANG=1 make check ``` Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33333	2015-03-16 20:52:32 -07:00
Igor Sugak	95344346af	rocksdb: Small refactoring before migrating to gtest Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest. Test Plan: Make sure no build errors, and all test are passing. ``` % make check ``` Reviewers: igor, meyering Reviewed By: meyering Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35145	2015-03-16 18:08:59 -07:00
sdong	e9de8b65a6	Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true Summary: Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc. Test Plan: run all tests Reviewers: rven, yhchiang, kradhakrishnan, igor Reviewed By: igor Subscribers: yoshinorim, leveldb, dhruba Differential Revision: https://reviews.facebook.net/D34431	2015-03-11 13:14:52 -07:00
krad	f29b33c73b	Add functionality to pre-fetch blocks specified by a key range to BlockBasedTable implementation. Summary: Pre-fetching is a common operation performed by data stores for disk/flash based systems as part of database startup. This is part of task 5197184. Test Plan: Run the newly added unit test Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33933	2015-03-02 17:07:03 -08:00
Sameet Agarwal	e7c434c364	Add columnfamily option optimize_filters_for_hits to optimize for key hits only Summary: Summary: Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most accesses to the store are key hits and we dont need to optimize performance for key misses. This is useful when you have a very large database and most of your lookups succeed. The option allows the store to not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level. This option is only provided for BlockBasedTable. We skip the filters when we are compacting Test Plan: 1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level) 2. Added another unit test to db_test which specifically tests that filters are being skipped Reviewers: rven, igor, sdong Reviewed By: sdong Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33717	2015-02-26 16:25:56 -08:00
Igor Sugak	62247ffa3b	rocksdb: Add missing override Summary: When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes. Prerequisites: bear and clang 3.5 build with extra tools ```lang=bash % USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html % clang-modernize -p . -include . -add-override % make format ``` Test Plan: Make sure all tests are passing. ```lang=bash % #Use default fb code clang. % make check ``` Verify less error and no missing override errors. ```lang=bash % # Have trunk clang present in path. % ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make ``` Reviewers: igor, kradhakrishnan, rven, meyering, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34077	2015-02-26 11:28:41 -08:00
krad	d9f4875e52	Disable pre-fetching of index and filter blocks for sst_dump_tool. Summary: BlockBasedTable pre-fetches the filter and index blocks on Open call. This is an optimistic optimization targeted for runtime scenario. The optimization is unnecessary for sst_dump_tool - Added a provision to disable pre-fetching of index and filter blocks in BlockBasedTable - Disabled pre-fetching for the sst_dump tool Stack for reference : #01 0x00000000005ed944 in snappy::InternalUncompress<snappy::SnappyArrayWriter> () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:148 #02 0x00000000005edeee in snappy::RawUncompress () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:947 #03 0x00000000004e0b4d in rocksdb::UncompressBlockContents () from /data/users/paultuckfield/rocksdb/./util/compression.h:69 #04 0x00000000004e145c in rocksdb::ReadBlockContents () from /data/users/paultuckfield/rocksdb/table/format.cc:334 #05 0x00000000004ca424 in rocksdb::(anonymous namespace)::ReadBlockFromFile () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:70 #06 0x00000000004cccad in rocksdb::BlockBasedTable::CreateIndexReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:173 #07 0x00000000004d17e5 in rocksdb::BlockBasedTable::Open () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:553 #08 0x00000000004c8184 in rocksdb::BlockBasedTableFactory::NewTableReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_factory.cc:51 #09 0x0000000000598463 in rocksdb::SstFileReader::NewTableReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:69 #10 0x00000000005986c2 in rocksdb::SstFileReader::SstFileReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:26 #11 0x0000000000599047 in rocksdb::SSTDumpTool::Run () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:332 #12 0x0000000000409b06 in main () from /data/users/paultuckfield/rocksdb/tools/sst_dump.cc:12 Test Plan: - Added a unit test to trigger the code. - Also did some manual verification. - Passed all unit tests task #6296048 Reviewers: igor, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D34041	2015-02-25 16:34:26 -08:00
Igor Sugak	62f7a1be4f	rocksdb: Fixed 'Dead assignment' and 'Dead initialization' scan-build warnings Summary: This diff contains trivial fixes for 6 scan-build warnings: db/c_test.c `db` variable is never read. Removed assignment. scan-build report: http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath db/db_iter.cc `skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return. scan-build report: http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath db/log_reader.cc In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion. scan-build report: http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases. scan-build reprots: http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPath http://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath table/plain_table_key_coding.cc Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration. scan-build report: http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath tools/db_stress.cc Missing `break` in switch case block. This seems to be a bug. Added missing `break`. Test Plan: Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs. ```lang=bash % make check % make analyze ``` Reviewers: meyering, igor, kradhakrishnan, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33795	2015-02-23 14:10:09 -08:00
Jim Meyering	aa5d8e6d95	table_test.cc: add missing 5th arg in TestArgs initializer Summary: Adding -W and -Wextra to CXXFLAGS provoked this failure: table/table_test.cc:1854:56: error: missing initializer for member ‘rocksdb::TestArgs::format_version’ [-Werror=missing-field-initializers] TestArgs args = { DB_TEST, false, 16, kNoCompression }; ^ Add the missing, 5th value (format_version). Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33765	2015-02-20 11:07:11 -08:00
Jim Meyering	9283c7afd2	build: remove always-true assertions Summary: Remove some always-true assertions. They provoke these compilation failures: table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits] * table/plain_table_key_coding.cc (rocksdb): Remove assertion that unsigned type variable is >= 0. * db/version_set.cc (DoGenerateLevelFilesBrief): Likewise. Test Plan: Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors. Reviewers: ljin, sdong, igor.sugak, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33747	2015-02-20 11:07:03 -08:00
Igor Sugak	98870c7b9c	rocksdb: Fix scan-build memory warning in table/block_based_table_reader.cc Summary: scan-build is reporting two memory leak bugs in `table/block_based_table_reader.cc`. They are both false positives. In both cases we allocate memory in `ReadBlockFromFile` if `s.ok()`. Then after the function `ReadBlockFromFile` returns we check for the same variable if `s.ok()` and then use the memory that was allocated. The bugs reported by scan-build is if `ReadBlockFromFile` allocates memory and returns, but for some reason status `s` is not the same and `s.ok() != true`. In this case scan-build is concerned that memory owner transfer is not explicit. I modified `ReadBlockFromFile` to accept `std::unique_ptr<Block>*` as a parameter, instead of raw pointer. scan-build reports: http://home.fburl.com/~sugak/latest2/report-a4b3fa.html#EndPath http://home.fburl.com/~sugak/latest2/report-29adbf.html#EndPath Test Plan: Make sure scan-build does not report these bugs and all tests are passing. ```lang=bash % make check % make analyze ``` Reviewers: sdong, lgalanis, meyering, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D33681	2015-02-19 14:07:38 -08:00
sdong	68af7811ea	Remember whole key/prefix filtering on/off in SST file Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter. Test Plan: Add a unit test for it Reviewers: yhchiang, igor, rven Reviewed By: rven Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D32889	2015-02-11 11:20:04 -08:00
sdong	e63140d52b	Get() to use prefix bloom filter when filter is not block based Summary: Get() now doesn't make use of bloom filter if it is prefix based. Add the check. Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter. Test Plan: make all check Add a test case in DBTest Reviewers: rven, yhchiang, igor Reviewed By: igor Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim Differential Revision: https://reviews.facebook.net/D31941	2015-02-04 15:15:41 -08:00
Igor Canadi	9ab5adfc59	New BlockBasedTable version -- better compressed block format Summary: This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions: 1) Zlib -- encode decompressed size in compressed block header 2) BZip2 -- encode decompressed size in compressed block header 3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes. It does not affect format for snappy. If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case. Test Plan: Added a new test in db_test. I will also run db_bench and verify VSIZE when block_cache == 1GB Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31461	2015-01-14 16:24:24 -08:00
Igor Canadi	96b8240bc5	Support footer versions bigger than 1 Summary: In this diff I add another parameter to BlockBasedTableOptions that will let users specify block based table's format. This will greatly simplify block based table's format changes in the future. First format change that this will support is encoding decompressed size in Zlib and BZip2 blocks. This diff is blocking https://reviews.facebook.net/D31311. Test Plan: Added a unit tests. More tests to come as part of https://reviews.facebook.net/D31311. Reviewers: dhruba, MarkCallaghan, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31383	2015-01-13 14:33:04 -08:00
Igor Canadi	15d2abbec3	Fix build issues	2015-01-09 13:04:06 -08:00
Igor Canadi	abb9b95ffe	Move compression functions from port/ to util/ Summary: We keep checksum functions in util/, there is no reason for compression to be in port/ Test Plan: compiles Reviewers: sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D31281	2015-01-09 12:57:11 -08:00
Manish Patil	7ea7bdf04d	Dump routine to BlockBasedTableReader Summary: Added necessary routines for dumping block based SST with block filter Test Plan: Added "raw" mode to utility sst_dump Reviewers: sdong, rven Reviewed By: rven Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D29679	2014-12-23 13:24:07 -08:00
Jonah Cohen	a14b7873ee	Enforce write buffer memory limit across column families Summary: Introduces a new class for managing write buffer memory across column families. We supplement ColumnFamilyOptions::write_buffer_size with ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer instance that enforces memory limits before flushing out to disk. Test Plan: Added SharedWriteBuffer unit test to db_test.cc Reviewers: sdong, rven, ljin, igor Reviewed By: igor Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim Differential Revision: https://reviews.facebook.net/D22581	2014-12-02 12:09:20 -08:00
Yueh-Hsuan Chiang	7e608e2fe3	Block plain_table_index.cc in ROCKSDB_LITE Summary: Block plain_table_index.cc in ROCKSDB_LITE Test Plan: make clean make OPT=-DROCKSDB_LITE shared_lib -j32 make clean make shared_lib -j32 Reviewers: ljin, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29535	2014-11-24 20:47:27 -08:00
Yueh-Hsuan Chiang	13de000f07	Add rocksdb::ToString() to address cases where std::to_string is not available. Summary: In some environment such as android, the c++ library does not have std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string when std::to_string is not available, and implements std::to_string in the other case. Test Plan: make dbg -j32 ./db_test make clean make dbg OPT=-DOS_ANDROID -j32 ./db_test Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29181	2014-11-24 20:44:49 -08:00
Bryan Rosario	9e285d4238	Added CompatibleOptions for compatibility with LevelDB Options Summary: Created a CompatibleOptions object that can be used as a LevelDB Options object and then converted to a RocksDB Options object using the ConvertOptions() method. Test Plan: Unit test included in diff. Reviewers: ljin Reviewed By: ljin Subscribers: sdong, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28893	2014-11-20 19:24:39 -08:00
Lei Jin	8d3f8f9696	remove all remaining references to cfd->options() Summary: The very last reference happens in DBImpl::GetOptions() I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out Test Plan: make all check Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D29073	2014-11-18 10:20:10 -08:00
Igor Canadi	9be338cf9d	CompactionJobTest Summary: This is just a simple test that passes two files though a compaction. It shows the framework so that people can continue building new compaction unit tests. In the future we might want to move some Compaction* tests from DBTest here. For example, CompactBetweenSnapshot seems a good candidate. Hopefully this test can be simpler when we mock out VersionSet. Test Plan: this is a test Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28449	2014-11-14 11:35:48 -08:00
sdong	a177742a9b	Make db_stress built for ROCKSDB_LITE Summary: Make db_stress built for ROCKSDB_LITE. The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB. Test Plan: make db_stress Reviewers: yhchiang, rven, ljin, igor Reviewed By: igor Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D28797	2014-11-14 10:20:51 -08:00
Igor Canadi	25f273027b	Fix iOS compile with -Wshorten-64-to-32 Summary: So iOS size_t is 32-bit, so we need to static_cast<size_t> any uint64_t :( Test Plan: TARGET_OS=IOS make static_lib Reviewers: dhruba, ljin, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28743	2014-11-13 14:39:30 -05:00
Igor Canadi	767777c2bd	Turn on -Wshorten-64-to-32 and fix all the errors Summary: We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details. This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables. Test Plan: compiles Reviewers: ljin, rven, yhchiang, sdong Reviewed By: yhchiang Subscribers: bobbaldwin, dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28689	2014-11-11 16:47:22 -05:00
Igor Canadi	68effa0348	Fix -Wshadow for tools Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`. Test Plan: make all Reviewers: yhchiang, rven, ljin, sdong Reviewed By: ljin, sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28497	2014-11-07 15:04:30 -08:00
Igor Canadi	9f20395cd6	Turn -Wshadow back on Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc. Test Plan: compiles Reviewers: ljin, yhchiang, rven, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D28131	2014-11-06 11:14:28 -08:00
Igor Canadi	9f7fc3ac45	Turn on -Wshadow Summary: ...and fix all the errors :) Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean. Test Plan: compiles Reviewers: yhchiang, rven, sdong, ljin Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27711	2014-10-31 11:59:54 -07:00
Yueh-Hsuan Chiang	98849a35fa	Apply InfoLogLevel to the logs in table/block_based_table_reader.cc Summary: Apply InfoLogLevel to the logs in table/block_based_table_reader.cc Also, add missing checks for the returned status in BlockBasedTable::Open Test Plan: make Reviewers: sdong, ljin, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D28005	2014-10-31 11:41:15 -07:00
Yueh-Hsuan Chiang	217cc217d7	Apply InfoLogLevel to the logs in table/meta_blocks.cc Summary: Apply InfoLogLevel to the logs in table/meta_blocks.cc Test Plan: make Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27903	2014-10-29 17:55:19 -07:00
Yueh-Hsuan Chiang	fd95745a59	Fix compile error in table/plain_table_index.cc Summary: Fix compile error in table/plain_table_index.cc Test Plan: make	2014-10-29 17:42:38 -07:00
Yueh-Hsuan Chiang	e7ad69b9fe	Apply InfoLogLevel to the logs in table/plain_table_index.cc Summary: Apply InfoLogLevel to the logs in table/plain_table_index.cc Test Plan: make Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27909	2014-10-29 17:08:40 -07:00
Yueh-Hsuan Chiang	bbd9c53457	Apply InfoLogLevel to the logs in table/block_based_table_builder.cc Summary: Apply InfoLogLevel to the logs in table/block_based_table_builder.cc Test Plan: make Reviewers: igor, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27921	2014-10-29 17:08:20 -07:00
Igor Canadi	412b7f85bb	Include atomic in mock_table.h	2014-10-28 18:10:55 -07:00
Igor Canadi	abac3d6476	TableMock + framework for mock classes Summary: This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on less things and making it closer to an unit test than integration test. It also introduces a framework to compile mock classes -- Any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. What way we can mock out most other classes, Version, VersionSet, DBImpl, etc. Test Plan: flush_job_test Reviewers: ljin, rven, yhchiang, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D27681	2014-10-28 17:52:32 -07:00
Igor Canadi	c1c68bce43	remove atomic_pointer.h references	2014-10-27 15:12:20 -07:00
Lei Jin	f1841985e4	dynamic inplace_update options Summary: Make inplace_update_support and inplace_update_num_locks dynamic. inplace_callback becomes immutable We are almost free of references to cfd->options() in db_impl Test Plan: unit test Reviewers: igor, yhchiang, rven, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25293	2014-10-27 12:10:13 -07:00
Lei Jin	839c376bd1	fix table_test Summary: SaveValue expects an internal key but I previously added to table a user key Test Plan: ran the test	2014-10-22 13:53:35 -07:00
Lei Jin	0fd985f427	Avoid reloading filter on Get() if cache_index_and_filter_blocks == false Summary: This fixes the case that filter policy is missing in SST file, but we open the table with filter policy on and cache_index_and_filter_blocks = false. The current behavior is that we will try to load it every time on Get() but fail. Test Plan: unit test Reviewers: yhchiang, igor, rven, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25455	2014-10-22 11:52:35 -07:00
Lei Jin	2dd9bfe3a8	Sanitize block-based table index type and check prefix_extractor Summary: Respond to issue reported https://www.facebook.com/groups/rocksdb.dev/permalink/651090261656158/ Change the Sanitize signature to take both DBOptions and CFOptions Test Plan: unit test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D25041	2014-10-17 21:18:36 -07:00
Igor Canadi	d6987216c9	Merge pull request #327 from dalgaaf/wip-da-SCA-20141001 Fix some issues from SCA	2014-10-02 10:59:52 -07:00
Lei Jin	5ec53f3edf	make compaction related options changeable Summary: make compaction related options changeable. Most of changes are tedious, following the same convention: grabs MutableCFOptions at the beginning of compaction under mutex, then pass it throughout the job and register it in SuperVersion at the end. Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23349	2014-10-01 16:19:16 -07:00
Danny Al-Gaaf	28a6e31583	table/block_based_table_builder.cc: remove unused variable Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:09 +02:00
Danny Al-Gaaf	6b6cedbb1b	table/format.cc: reduce scope of some variables Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	55652043c8	table/cuckoo_table_reader.cc: pass func parameter by reference Fix for: [table/cuckoo_table_reader.cc:196]: (performance) Function parameter 'target' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-10-01 10:49:08 +02:00
Danny Al-Gaaf	d517c83648	in_table_factory.cc: use correct format specifier Use %zu instead of %zd since size_t and uint32_t are unsigned. Fix for: [table/plain_table_factory.cc:55]: (warning) %zd in format string (no. 1) requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'. [table/plain_table_factory.cc:58]: (warning) %zd in format string (no. 1) requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	063471bf76	table/table_test.cc: pass func parameter by reference Fix for: [table/table_test.cc:1218]: (performance) Function parameter 'prefix' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	93548ce8f4	table/cuckoo_table_reader.cc: pass func parameter by ref Fix for: [table/cuckoo_table_reader.cc:198]: (performance) Function parameter 'file_data' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:32 +02:00
Danny Al-Gaaf	8ce050b51b	table/bloom_block.*: pass func parameter by reference [table/bloom_block.h:29]: (performance) Function parameter 'keys_hashes' should be passed by reference. Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>	2014-09-30 23:30:31 +02:00
Lei Jin	2faf49d5f1	use GetContext to replace callback function pointer Summary: Intead of passing callback function pointer and its arg on Table::Get() interface, passing GetContext. This makes the interface cleaner and possible better perf. Also adding a fast pass for SaveValue() Test Plan: make all check Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24057	2014-09-29 11:09:09 -07:00
Lei Jin	2dc6f62bb9	handle kDelete type in cuckoo builder Summary: when I changed std::vector<std::string, std::string> to std::string to store key/value pairs in builder, I missed the handling for kDeletion type. As a result, value_size_ can be wrong if the first add key is for deletion. The is captured by ./cuckoo_table_db_test Test Plan: ./cuckoo_table_db_test ./cuckoo_table_reader_test ./cuckoo_table_builder_test Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D24045	2014-09-29 10:25:21 -07:00
Lei Jin	d439451fab	delay initialization of cuckoo table iterator Summary: cuckoo table iterator creation is quite expensive since it needs to load all data and sort them. After compaction, RocksDB creates a new iterator of the new file to make sure it is in good state. That makes the DB creation quite slow. Delay the iterator db sort to the seek time to speed it up. Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23775	2014-09-25 16:45:37 -07:00
Lei Jin	94997eab5e	reduce memory usage of cuckoo table builder Summary: builder currently buffers all key value pairs as a vector of pair<string, string>. That is too much due to std::string overhead. It wasn't able to fit 1B key/values (12bytes total) in 100GB of ram. Switch to use a plain string to store the key/value sequence and use only 12GB of ram as a result. Test Plan: db_bench Reviewers: igor, sdong, yhchiang Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23763	2014-09-25 16:34:24 -07:00
Lei Jin	c6275956e2	improve memory efficiency of cuckoo reader Summary: When creating a new iterator, instead of storing mapping from key to bucket id for sorting, store only bucket id and read key from mmap file based on the id. This reduces from 20 bytes per entry to only 4 bytes. Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23757	2014-09-25 16:15:23 -07:00
Lei Jin	581442d446	option to choose module when calculating CuckooTable hash Summary: Using module to calculate hash makes lookup ~8% slower. But it has its benefit: file size is more predictable, more space enffient Test Plan: db_bench Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23691	2014-09-25 13:53:27 -07:00
Lei Jin	3c68006109	CompactedDBImpl Summary: Add a CompactedDBImpl that will enabled when calling OpenForReadOnly() and the DB only has one level (>0) of files. As a performan comparison, CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with ReadOnlyDBImpl. Test Plan: db_bench Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23553	2014-09-25 11:14:01 -07:00
Lei Jin	0a29ce5393	re-enable BlockBasedTable::SetupForCompaction() Summary: It was commented out in D22545 by accident. Keep the option in ImmutableOptions for now. I can make it dynamic in https://reviews.facebook.net/D23349 Test Plan: make release Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23865	2014-09-23 14:18:57 -07:00
Igor Canadi	55af370756	Remove TODO for checking index checksums	2014-09-23 13:02:23 -07:00
Igor Canadi	3d74f09979	Fix compile	2014-09-22 15:19:20 -07:00
liuchang0812	4436f17bd8	fixed #303 : replace %ld with % PRId64	2014-09-21 22:09:48 +08:00
Lei Jin	51af7c326c	CuckooTable: add one option to allow identity function for the first hash function Summary: MurmurHash becomes expensive when we do millions Get() a second in one thread. Add this option to allow the first hash function to use identity function as hash function. It results in QPS increase from 3.7M/s to ~4.3M/s. I did not observe improvement for end to end RocksDB performance. This may be caused by other bottlenecks that I will address in a separate diff. Test Plan: ``` [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320 [ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1 ==== Test CuckooReaderTest.WhenKeyExists ==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator ==== Test CuckooReaderTest.CheckIterator ==== Test CuckooReaderTest.CheckIteratorUint64 ==== Test CuckooReaderTest.WhenKeyNotFound ==== Test CuckooReaderTest.TestReadPerformance With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120 With 125829120 items, utilization is 93.75%, number of hash functions: 2. Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600 With 104857600 items, utilization is 78.12%, number of hash functions: 2. Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080 With 83886080 items, utilization is 62.50%, number of hash functions: 2. Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320 With 73400320 items, utilization is 54.69%, number of hash functions: 2. Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320 ``` Reviewers: sdong, igor, yhchiang Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23451	2014-09-18 11:00:48 -07:00
Igor Canadi	ff76895614	Remove some unnecessary constructors Summary: This is continuing the work done by `27b22f13a3` It's just cleaning up some unnecessary constructors. The most important change is removing Block::Block(const BlockContents& contents) constructor. It was only used from the unit test. Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23547	2014-09-17 16:45:58 -07:00
Lei Jin	feadb9df53	fix cuckoo table builder test Summary: as title Test Plan: ./cuckoo_table_builder_test Reviewers:igor CC:leveldb Task ID: # Blame Rev:	2014-09-17 15:48:33 -07:00
Igor Canadi	54cada92b1	Run make format on PR #249	2014-09-17 15:08:50 -07:00
Igor Canadi	27b22f13a3	Merge pull request #249 from tdfischer/decompression-refactoring Decompression refactoring	2014-09-17 15:01:40 -07:00
Torrie Fischer	fb6456b00d	Replace naked calls to operator new and delete (Fixes #222 ) This replaces a mishmash of pointers in the Block and BlockContents classes with std::unique_ptr. It also changes the semantics of BlockContents to be limited to use as a constructor parameter for Block objects, as it owns any block buffers handed to it.	2014-09-17 13:50:07 -07:00
Lei Jin	5600c8f6e5	cuckoo table: return estimated size - 1 Summary: This is to avoid cutting file prematurely and resulting file size to be half of specified. Test Plan: db_bench Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23541	2014-09-17 13:25:29 -07:00
Lei Jin	a062e1f2c4	SetOptions() for memtable related options Summary: as title Test Plan: make all check I will think a way to set up stress test for this Reviewers: sdong, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23055	2014-09-17 12:49:13 -07:00
Igor Canadi	faad439ac4	Fix #284 Summary: This work on my compiler, but it turns out some compilers don't implicitly add constness, see: https://github.com/facebook/rocksdb/issues/284. This diff adds constness explicitly. Test Plan: still compiles Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23409	2014-09-16 10:30:32 -07:00
Igor Canadi	059e584dd3	[unit test] CompactRange should fail if we don't have space Summary: See t5106397. Also, few more changes: 1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it. 2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption Test Plan: make check Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23145	2014-09-10 17:00:00 -07:00
Igor Canadi	6bb7e3ef25	Merger test Summary: I abandoned https://reviews.facebook.net/D18789, but I wrote a good unit test there, so let's check it in. :) Test Plan: this is test Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22827	2014-09-08 22:24:40 -07:00
Lei Jin	52311463e9	MemTableOptions Summary: removed reference to options in WriteBatch and DBImpl::Get() Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D23049	2014-09-08 18:46:52 -07:00
Feng Zhu	0af157f9bf	Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit `1d23b5c470` Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979	2014-09-08 10:37:05 -07:00
wankai	88a2f44f99	fix comments	2014-09-08 16:34:04 +08:00
wankai	823773837b	replace hard-coded number with named variable	2014-09-08 11:10:17 +08:00
wankai	4c2b1f097b	Merge remote-tracking branch 'upstream/master'	2014-09-06 23:23:11 +08:00
wankai	a5d2863074	typo improvement	2014-09-06 23:21:26 +08:00
Radheshyam Balasundaram	5cd0576ffe	Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests. Test Plan: make check all Also ran db_bench to generate multiple files. Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22743	2014-09-05 11:18:01 -07:00
liuhuahang	bb6ae0f80c	fix more compile warnings N/A Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-05 14:14:37 +08:00
Stanislau Hlebik	45a5e3ede0	Remove path with arena==nullptr from NewInternalIterator Summary: Simply code by removing code path which does not use Arena from NewInternalIterator Test Plan: make all check make valgrind_check Reviewers: sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22395	2014-09-04 17:40:41 -07:00
Lei Jin	5665e5e285	introduce ImmutableOptions Summary: As a preparation to support updating some options dynamically, I'd like to first introduce ImmutableOptions, which is a subset of Options that cannot be changed during the course of a DB lifetime without restart. ColumnFamily will keep both Options and ImmutableOptions. Any component below ColumnFamily should only take ImmutableOptions in their constructor. Other options should be taken from APIs, which will be allowed to adjust dynamically. I am yet to make changes to memtable and other related classes to take ImmutableOptions in their ctor. That can be done in a seprate diff as this one is already pretty big. Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: sdong Subscribers: leveldb, dhruba Differential Revision: https://reviews.facebook.net/D22545	2014-09-04 16:18:36 -07:00
Igor Canadi	f7f973d354	Merge pull request #269 from huahang/patch-2 fix a few compile warnings	2014-09-04 09:43:00 -07:00
liuhuahang	ef5b384729	fix a few compile warnings 1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers] 2, class HistogramImpl has virtual functions and thus should have a virtual destructor 3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99 Signed-off-by: liuhuahang <liuhuahang@zerus.co>	2014-09-04 23:06:23 +08:00
wankai	1785114a6f	delete unused Comparator	2014-09-04 09:10:13 +08:00
wankai	19cc588b77	change to filter_block std::unique_ptr support RAII	2014-09-04 00:44:49 +08:00
wankai	5d25a46936	Merge remote-tracking branch 'upstream/master'	2014-09-03 21:57:13 +08:00
Lei Jin	9b58c73c7c	call SanitizeDBOptionsByCFOptions() in the right place Summary: It only covers Open() with default column family right now Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22467	2014-09-02 14:42:23 -07:00
Igor Canadi	7f19bb93c6	Merge pull request #242 from tdfischer/perf-timer-destructors Refactor PerfStepTimer to automatically stop on destruct	2014-09-02 13:06:40 -07:00
Torrie Fischer	6614a48418	Refactor PerfStepTimer to stop on destruct This eliminates the need to remember to call PERF_TIMER_STOP when a section has been timed. This allows more useful design with the perf timers and enables possible return value optimizations. Simplistic example: class Foo { public: Foo(int v) : m_v(v); private: int m_v; } Foo makeFrobbedFoo(int errno) { errno = 0; return Foo(); } Foo bar(int *errno) { PERF_TIMER_GUARD(some_timer); return makeFrobbedFoo(errno); } int main(int argc, char[] argv) { Foo f; int errno; f = bar(&errno); if (errno) return -1; return 0; } After bar() is called, perf_context.some_timer would be incremented as if Stop(&perf_context.some_timer) was called at the end, and the compiler is still able to produce optimizations on the return value from makeFrobbedFoo() through to main().	2014-09-02 12:04:22 -07:00
Igor Canadi	076bd01a29	Fix compile Summary: gcc on our dev boxes is not happy about __attribute__((unused)) Test Plan: compiles now Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22707	2014-09-02 11:49:38 -07:00
Igor Canadi	990df99a61	Fix ios compile Summary: We need to set contbuild for this :) Test Plan: compiles Reviewers: sdong, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22701	2014-09-02 10:50:15 -07:00
Wankai Zhang	dff2b1a8f8	typo improvement	2014-09-02 22:57:03 +08:00
Radheshyam Balasundaram	d20b8cfaa1	Improve Cuckoo Table Reader performance. Inlined hash function and number of buckets a power of two. Summary: Use inlined hash functions instead of function pointer. Make number of buckets a power of two and use bitwise and instead of mod. After these changes, we get almost 50% improvement in performance. Results: With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.231us (4.3 Mqps) with batch size of 0 Time taken per op is 0.229us (4.4 Mqps) with batch size of 0 Time taken per op is 0.185us (5.4 Mqps) with batch size of 0 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.108us (9.3 Mqps) with batch size of 10 Time taken per op is 0.100us (10.0 Mqps) with batch size of 10 Time taken per op is 0.103us (9.7 Mqps) with batch size of 10 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.101us (9.9 Mqps) with batch size of 25 Time taken per op is 0.098us (10.2 Mqps) with batch size of 25 Time taken per op is 0.097us (10.3 Mqps) with batch size of 25 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.100us (10.0 Mqps) with batch size of 50 Time taken per op is 0.097us (10.3 Mqps) with batch size of 50 Time taken per op is 0.097us (10.3 Mqps) with batch size of 50 With 120000000 items, utilization is 89.41%, number of hash functions: 2. Time taken per op is 0.102us (9.8 Mqps) with batch size of 100 Time taken per op is 0.098us (10.2 Mqps) with batch size of 100 Time taken per op is 0.115us (8.7 Mqps) with batch size of 100 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.201us (5.0 Mqps) with batch size of 0 Time taken per op is 0.155us (6.5 Mqps) with batch size of 0 Time taken per op is 0.152us (6.6 Mqps) with batch size of 0 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.089us (11.3 Mqps) with batch size of 10 Time taken per op is 0.084us (11.9 Mqps) with batch size of 10 Time taken per op is 0.086us (11.6 Mqps) with batch size of 10 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.087us (11.5 Mqps) with batch size of 25 Time taken per op is 0.085us (11.7 Mqps) with batch size of 25 Time taken per op is 0.093us (10.8 Mqps) with batch size of 25 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.094us (10.6 Mqps) with batch size of 50 Time taken per op is 0.094us (10.7 Mqps) with batch size of 50 Time taken per op is 0.093us (10.8 Mqps) with batch size of 50 With 100000000 items, utilization is 74.51%, number of hash functions: 2. Time taken per op is 0.092us (10.9 Mqps) with batch size of 100 Time taken per op is 0.089us (11.2 Mqps) with batch size of 100 Time taken per op is 0.088us (11.3 Mqps) with batch size of 100 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.154us (6.5 Mqps) with batch size of 0 Time taken per op is 0.168us (6.0 Mqps) with batch size of 0 Time taken per op is 0.190us (5.3 Mqps) with batch size of 0 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.081us (12.4 Mqps) with batch size of 10 Time taken per op is 0.077us (13.0 Mqps) with batch size of 10 Time taken per op is 0.083us (12.1 Mqps) with batch size of 10 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 25 Time taken per op is 0.073us (13.7 Mqps) with batch size of 25 Time taken per op is 0.073us (13.7 Mqps) with batch size of 25 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.076us (13.1 Mqps) with batch size of 50 Time taken per op is 0.072us (13.8 Mqps) with batch size of 50 Time taken per op is 0.072us (13.8 Mqps) with batch size of 50 With 80000000 items, utilization is 59.60%, number of hash functions: 2. Time taken per op is 0.077us (13.0 Mqps) with batch size of 100 Time taken per op is 0.074us (13.6 Mqps) with batch size of 100 Time taken per op is 0.073us (13.6 Mqps) with batch size of 100 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.190us (5.3 Mqps) with batch size of 0 Time taken per op is 0.186us (5.4 Mqps) with batch size of 0 Time taken per op is 0.184us (5.4 Mqps) with batch size of 0 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.079us (12.7 Mqps) with batch size of 10 Time taken per op is 0.070us (14.2 Mqps) with batch size of 10 Time taken per op is 0.072us (14.0 Mqps) with batch size of 10 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 25 Time taken per op is 0.072us (14.0 Mqps) with batch size of 25 Time taken per op is 0.071us (14.1 Mqps) with batch size of 25 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.082us (12.1 Mqps) with batch size of 50 Time taken per op is 0.071us (14.1 Mqps) with batch size of 50 Time taken per op is 0.073us (13.6 Mqps) with batch size of 50 With 70000000 items, utilization is 52.15%, number of hash functions: 2. Time taken per op is 0.080us (12.5 Mqps) with batch size of 100 Time taken per op is 0.077us (13.0 Mqps) with batch size of 100 Time taken per op is 0.078us (12.8 Mqps) with batch size of 100 Test Plan: make check all make valgrind_check make asan_check Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22539	2014-08-29 19:06:15 -07:00
Tomislav Novak	0f9c43ea36	ForwardIterator: reset incomplete iterators on Seek() Summary: When reading from kBlockCacheTier, ForwardIterator's internal child iterators may end up in the incomplete state (read was unable to complete without doing disk I/O). `ForwardIterator::status()` will correctly report that; however, the iterator may be stuck in that state until all sub-iterators are rebuilt: * `NeedToSeekImmutable()` may return false even if some sub-iterators are incomplete * one of the child iterators may be an empty iterator without any state other that the kIncomplete status (created using `NewErrorIterator()`); seeking on any such iterator has no effect -- we need to construct it again Akin to rebuilding iterators after a superversion bump, this diff makes forward iterator reset all incomplete child iterators when `Seek()` or `Next()` are called. Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test Reviewers: igor, sdong, ljin Reviewed By: ljin Subscribers: lovro, march, leveldb Differential Revision: https://reviews.facebook.net/D22575	2014-08-29 16:21:29 -07:00
Igor Canadi	22a0a60dc4	Merge pull request #250 from wankai/master delete unused struct Options	2014-08-29 09:53:18 -04:00
Wankai Zhang	be25ee44fe	delete unused struct Options	2014-08-29 17:31:04 +08:00
Feng Zhu	1d23b5c470	remove_internal_filter_policy Summary: 1. remove class InternalFilterPolicy in db/dbformat.h 2. Transformation from internal key to user key is done in filter_block.cc 3. This is a preparation for patch D20979 Test Plan: make all check valgrind ./db_test Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22509	2014-08-28 17:06:29 -07:00
Radheshyam Balasundaram	7f71448388	Implementing a cache friendly version of Cuckoo Hash Summary: This implements a cache friendly version of Cuckoo Hash in which, in case of collission, we try to insert in next few locations. The size of the neighborhood to check is taken as an input parameter in builder and stored in the table. Test Plan: make check all cuckoo_table_{db,reader,builder}_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22455	2014-08-28 10:42:23 -07:00
Wankai Zhang	528a11c635	Update block_builder.h more c++11 way noncopyable and keep parameter's name of constructor consistent	2014-08-28 13:37:10 +08:00
Radheshyam Balasundaram	4142a3e783	Adding a user comparator for comparing Uint64 slices. Summary: - New Uint64 comparator - Modify Reader and Builder to take custom user comparators instead of bytewise comparator - Modify logic for choosing unused user key in builder - Modify iterator logic in reader - test changes Test Plan: cuckoo_table_{builder,reader,db}_test make check all Reviewers: ljin, sdong Reviewed By: ljin Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D22377	2014-08-27 10:39:31 -07:00
Lei Jin	23861857c4	ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled Summary: as title Test Plan: table_test Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22239	2014-08-25 16:14:30 -07:00
Lei Jin	a98badff16	print table options Summary: Add a virtual function in table factory that will print table options Test Plan: make release Reviewers: igor, yhchiang, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22149	2014-08-25 14:24:09 -07:00
Lei Jin	384400128f	move block based table related options BlockBasedTableOptions Summary: I will move compression related options in a separate diff since this diff is already pretty lengthy. I guess I will also need to change JNI accordingly :( Test Plan: make all check Reviewers: yhchiang, igor, sdong Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21915	2014-08-25 14:22:05 -07:00
Shao Yu Zhang	f76eda74d6	Fix compilation issue on OSX	2014-08-21 18:11:33 -07:00
Radheshyam Balasundaram	08be7f5266	Implement Prepare method in CuckooTableReader Summary: - Implement Prepare method - Rewrite performance tests in cuckoo_table_reader_test to write new file only if one doesn't already exist. - Add performance tests for batch lookup along with prefetching. Test Plan: ./cuckoo_table_reader_test --enable_perf Results (We get better results if we used int64 comparator instead of string comparator (TBD in future diffs)): With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.208us (4.8 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.182us (5.5 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.161us (6.2 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.161us (6.2 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2. Time taken per op is 0.163us (6.1 Mqps) with batch size of 100 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.252us (4.0 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.192us (5.2 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.195us (5.1 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.191us (5.2 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3. Time taken per op is 0.194us (5.1 Mqps) with batch size of 100 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.228us (4.4 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.185us (5.4 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.186us (5.4 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.189us (5.3 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3. Time taken per op is 0.188us (5.3 Mqps) with batch size of 100 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.325us (3.1 Mqps) with batch size of 0 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.196us (5.1 Mqps) with batch size of 10 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.199us (5.0 Mqps) with batch size of 25 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.196us (5.1 Mqps) with batch size of 50 With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3. Time taken per op is 0.209us (4.8 Mqps) with batch size of 100 Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22167	2014-08-20 18:35:35 -07:00
Yueh-Hsuan Chiang	63a2215c63	Improve Options sanitization and add MmapReadRequired() to TableFactory Summary: Currently, PlainTable must use mmap_reads. When PlainTable is used but allow_mmap_reads is not set, rocksdb will fail in flush. This diff improve Options sanitization and add MmapReadRequired() to TableFactory. Test Plan: export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest make db_test -j32 ./db_test Reviewers: sdong, ljin Reviewed By: ljin Subscribers: you, leveldb Differential Revision: https://reviews.facebook.net/D21939	2014-08-20 15:53:39 -07:00
sdong	045575ad0c	Add CuckooHash table format to table_reader_bench Summary: Make table_reader_bench cover all the three table formats. Test Plan: Run it using three options Reviewers: radheshyamb, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D22137	2014-08-19 14:58:15 -07:00
Yueh-Hsuan Chiang	570ba5aca8	Avoid retrying to read property block from a table when it does not exist. Summary: Avoid retrying to read property block from a table when it does not exist in updating stats for compensating deletion entries. In addition, ReadTableProperties() now returns Status::NotFound instead of Status::Corruption when table properties does not exist in the file. Test Plan: make db_test -j32 export ROCKSDB_TESTS=CompactionDeleteionTrigger ./db_test Reviewers: ljin, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21867	2014-08-15 12:17:44 -07:00
Radheshyam Balasundaram	9674c11d01	Integrating Cuckoo Hash SST Table format into RocksDB Summary: Contains the following changes: - Implementation of cuckoo_table_factory - Adding cuckoo table into AdaptiveTableFactory - Adding cuckoo_table_db_test, similar to lines of plain_table_db_test - Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key. - Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test cuckoo_table_db_test make check all make valgrind_check make asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21219	2014-08-11 20:21:07 -07:00
miguelportilla	93e6b5e9d9	Changes to support unity build: * Script for building the unity.cc file via Makefile * Unity executable Makefile target for testing builds * Source code changes to fix compilation of unity build	2014-08-11 13:22:47 -04:00
ZHANG Biao	63d5cc72fa	fix various 'comparison of integers of different signs' compiling errors under macosx	2014-08-07 17:06:07 +08:00
sdong	1242bfcad7	Add DB property "rocksdb.estimate-table-readers-mem" Summary: Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache. Refactor the property codes to allow getting property from a version, with DB mutex not acquired. Test Plan: Add several checks of this new property in existing codes for various cases. Reviewers: yhchiang, ljin Reviewed By: ljin Subscribers: xjin, igor, leveldb Differential Revision: https://reviews.facebook.net/D20733	2014-08-06 11:39:46 -07:00
Radheshyam Balasundaram	606a126703	Changing implementaiton of CuckooTableBuilder to not take file_size, key_length, value_length. Summary: - Maintain a list of key-value pairs as vectors during Add operation. - Start building hash table only when Finish() is called. - This approach takes more time and space but avoids taking file_size, key and value lengths. - Rewrote cuckoo_table_builder_test I did not know about IterKey while writing this diff. I shall change places where IterKey could be used instead of std::string tomorrow. Please review rest of the logic. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test valgrind_check asan_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20907	2014-08-05 20:55:46 -07:00
Radheshyam Balasundaram	2124c85cc6	Implementing CuckooTableReader::NewIterator Summary: - Reads key-value pairs from file and builds an in-memory index of key-to-bucket id map in sorted order of key. - Assumes bytewise comparator for sorting keys. - Test changes Test Plan: cuckoo_table_reader_test --enable_perf valgrind_check asan_check Reviewers: yhchiang, sdong, ljin Reviewed By: ljin Subscribers: leveldb, igor Differential Revision: https://reviews.facebook.net/D20721	2014-08-05 16:35:02 -07:00
sdong	02c4023666	Remove port::MemoryBarrier() from table_reader_bench Summary: port::MemoryBarrier() is not recommended to use outside of port. Remove it. Test Plan: run table_reader_bench Reviewers: ljin, yhchiang, igor Reviewed By: igor Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21075	2014-08-05 11:33:56 -07:00
Feng Zhu	1129921e9b	logging_when_create_and_delete_manifest Summary: 1. logging when create and delete manifest file 2. fix formating in table/format.cc Test Plan: make all check run db_bench, track the LOG file. Reviewers: ljin, yhchiang, igor, yufei.zhu, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D21009	2014-08-04 11:25:42 -07:00
Radheshyam Balasundaram	0c9d03ba10	Fixing broken Mac build Summary: Made some small changes to fix the broken mac build Test Plan: make check all in both linux and mac. All tests pass. Reviewers: sdong, igor, ljin, yhchiang Reviewed By: ljin, yhchiang Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20895	2014-07-31 20:52:13 -07:00
Feng Zhu	b0999011e2	use stack instead of heap memory in ReadBlockContents in some case Summary: When compression is enabled, and blocksize is not too big, use the space in stack to hold bytes read from block. Bencmark: base version: commit `8f09d53fd1` malloc: 1.30% -> 0.98% free: 1.49% -> 1.07% Test Plan: make all check Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20679	2014-07-30 23:11:59 -07:00
Feng Zhu	8f09d53fd1	remove malloc when create data and index iterator in Get Summary: Define Block::Iter to be an independent class to be used by block_based_table_reader When creating data and index iterator, update an existing iterator rather than new one Thus malloc and free could be reduced Benchmark, Base: commit `76286ee67e` commands: --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=2621440 --use_hash_search=1 --block_size=1024 --block_restart_interval=1 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 malloc: 3.30% -> 1.42% free: 3.59%->1.61% Test Plan: make all check run db_stress valgrind ./db_test ./table_test Reviewers: ljin, yhchiang, dhruba, igor, sdong Reviewed By: sdong Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20655	2014-07-30 16:34:35 -07:00
Radheshyam Balasundaram	91c01485d1	Minor changes to CuckooTableBuilder Summary: - Copy the key and value to in-memory hash table during Add operation. Also modified cuckoo_table_reader_test to use this. - Store only the user_key in in-memory hash table if it is last level file. - Handle Carryover while chosing unused key in Finish() method in case unused key was never found before Finish() call. Test Plan: cuckoo_table_reader_test --enable_perf cuckoo_table_builder_test valgrind_check asan_check Reviewers: sdong, yhchiang, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20715	2014-07-28 17:14:25 -07:00
Lei Jin	40fa8a4cd5	make statistics forward-able Summary: Make StatisticsImpl being able to forward stats to provided statistics implementation. The main purpose is to allow us to collect internal stats in the future even when user supplies custom statistics implementation. It avoids intrumenting 2 sets of stats collection code. One immediate use case is tuning advisor, which needs to collect some internal stats, users may not be interested. Test Plan: ran db_bench and see stats show up at the end of run Will run make all check since some tests rely on statistics Reviewers: yhchiang, sdong, igor Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20145	2014-07-28 12:05:36 -07:00
sdong	4a8f0c957c	Block::Iter::PrefixSeek() to have an extra check to filter out some false matches Summary: In block based table's hash index checking, when looking for a key that doesn't exist, there is a high chance that a false block is returned because of hash bucket conflicts. In this revision, another check is done to filter out some of those cases: comparing previous key of the block boundary to see whether the target block is what we are looking for. In a favored test setting (bloom filter disabled, 8 L0 files), I saw about 80% improvements. In a non-favored test setting (bloom filter enabled, files are all in L1, files are all cached), I see the performance penalty is less than 3%. Test Plan: make all check Reviewers: haobo, ljin Reviewed By: ljin Subscribers: wuj, leveldb, zagfox, yhchiang Differential Revision: https://reviews.facebook.net/D20595	2014-07-25 17:27:57 -07:00
Radheshyam Balasundaram	62f9b071ff	Implementation of CuckooTableReader Summary: Contains: - Implementation of TableReader based on Cuckoo Hashing - Unittests for CuckooTableReader - Performance test for TableReader Test Plan: make cuckoo_table_reader_test ./cuckoo_table_reader_test make valgrind_check make asan_check Reviewers: yhchiang, sdong, igor, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20511	2014-07-25 16:37:32 -07:00
Radheshyam Balasundaram	07a7d870b8	Addressing TODOs in CuckooTableBuilder Summary: Contains the following changes in CuckooTableBuilder: - Take an extra parameter in constructor to identify last level file. - Implement a better way to identify if a bucket has been inserted into the tree already during BFS search. - Minor typos Test Plan: make cuckoo_table_builder ./cuckoo_table_builder make valgrind_check Reviewers: sdong, igor, yhchiang, ljin Reviewed By: ljin Subscribers: leveldb Differential Revision: https://reviews.facebook.net/D20445	2014-07-24 10:07:41 -07:00
Feng Zhu	da9274574f	Use IterKey instead of string in Block::Iter to reduce malloc Summary: Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test Use IterKey in block::Iter to replace std::string to reduce malloc. Evaluate it using perf record. malloc: 4.26% -> 2.91% free: 3.61% -> 3.08% Test Plan: make all check ./valgrind db_test dbformat_test Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong Reviewed By: sdong Differential Revision: https://reviews.facebook.net/D20433	2014-07-23 12:31:11 -07:00
Igor Canadi	2d3d63597a	Fix signed-unsigned compare error	2014-07-22 15:35:07 -04:00

1 2 3 4 5 ...

506 Commits