Summary:
The public API depends on port/port.h, which is wrong. Fix it.
Also, the build was broken with gcc 4.8.1 because MAX_INT32 was not recognized. Fix it by using ::max on Linux.
Test Plan: Build it and try to build an external project on top of it.
Reviewers: anthony, yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41745
Summary: Add a perf context counter to help users figure out the time spent reading index and bloom filter blocks.
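For reference, reading perf context timers around a Get() looks roughly like the sketch below. It uses the perf context interface of this era; `block_read_time` is the pre-existing aggregate counter covering all block reads, and the exact name of the new index/filter counter is deliberately not spelled out here.
```lang=c++
// Sketch only: enable timing, run a Get(), and read perf context counters.
// block_read_time is the pre-existing aggregate counter; the counter added by
// this diff breaks out the index/filter portion (field name omitted here).
#include <cstdio>
#include <string>
#include "rocksdb/db.h"
#include "rocksdb/perf_context.h"
#include "rocksdb/perf_level.h"

void TimedGet(rocksdb::DB* db, const std::string& key) {
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kEnableTime);
  rocksdb::perf_context.Reset();

  std::string value;
  db->Get(rocksdb::ReadOptions(), key, &value);

  std::printf("time spent reading blocks: %llu\n",
              static_cast<unsigned long long>(
                  rocksdb::perf_context.block_read_time));
  rocksdb::SetPerfLevel(rocksdb::PerfLevel::kDisable);
}
```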
Test Plan: Will write a unit test
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41433
Summary:
Remove assert(current_ == CurrentReverse()) in MergingIterator::Prev()
because it is possible to have some keys larger than the seek-key
inserted between Seek() and SeekToLast(), which makes current_ not
equal to CurrentReverse().
Test Plan: db_stress
Reviewers: igor, sdong, IslamAbdelRahman, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41331
Summary:
This diff reverts the following two previous diffs related to
DBIter::FindPrevUserKey(), which made db_stress unstable.
We should bake a better fix for this.
* "Fix a comparison in DBIter::FindPrevUserKey()"
ec70fea4c4.
* "Fixed endless loop in DBIter::FindPrevUserKey()"
acee2b08a2.
Test Plan: db_stress
Reviewers: anthony, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41301
Summary:
While profiling compaction in our service I noticed a lot of CPU (~15% of compaction) being spent in MergingIterator and key comparison. Looking at the code I found MergingIterator was (understandably) using std::priority_queue for the multiway merge.
Keys in our dataset include sequence numbers that increase with time. Adjacent keys in an L0 file are very likely to be adjacent in the full database. Consequently, compaction will often pick a chunk of rows from the same L0 file before switching to another one. It would be great to avoid the O(log K) operation per row while compacting.
This diff replaces std::priority_queue with a custom binary heap implementation. It has a "replace top" operation that is cheap when the new top is the same as the old one (i.e. the priority of the top entry is decreased but it still stays on top).
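Not the RocksDB implementation itself, but a minimal sketch of the idea (class and member names are illustrative): when the replaced top still belongs at the root, the sift-down terminates immediately, so the common case described above costs O(1) comparisons instead of a pop()+push() pair.
```lang=c++
// Minimal sketch of a binary min-heap with a "replace top" operation.
// Illustrative only; not taken from the RocksDB sources.
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

template <typename T, typename Compare = std::less<T>>
class BinaryHeap {
 public:
  bool empty() const { return data_.empty(); }
  const T& top() const { return data_.front(); }

  void push(T item) {
    data_.push_back(std::move(item));
    SiftUp(data_.size() - 1);
  }

  void pop() {
    assert(!empty());
    if (data_.size() > 1) {
      data_.front() = std::move(data_.back());
    }
    data_.pop_back();
    if (!empty()) SiftDown(0);
  }

  // Replace the top element in place. If the new value still belongs at the
  // root (the common case during compaction when consecutive keys come from
  // the same child iterator), SiftDown returns immediately, avoiding the
  // pop()+push() pair that std::priority_queue would need.
  void replace_top(T item) {
    assert(!empty());
    data_.front() = std::move(item);
    SiftDown(0);
  }

 private:
  void SiftUp(size_t i) {
    while (i > 0) {
      size_t parent = (i - 1) / 2;
      if (!cmp_(data_[i], data_[parent])) break;
      std::swap(data_[i], data_[parent]);
      i = parent;
    }
  }

  void SiftDown(size_t i) {
    const size_t n = data_.size();
    while (true) {
      size_t best = i;
      size_t l = 2 * i + 1, r = 2 * i + 2;
      if (l < n && cmp_(data_[l], data_[best])) best = l;
      if (r < n && cmp_(data_[r], data_[best])) best = r;
      if (best == i) return;  // heap property already holds
      std::swap(data_[i], data_[best]);
      i = best;
    }
  }

  Compare cmp_;
  std::vector<T> data_;
};
```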
Test Plan:
make check
To test the effect on performance, I generated databases with data patterns that mimic what I describe in the summary (rows have a mostly increasing sequence number). I see a 10-15% CPU decrease for compaction (and a matching throughput improvement on tmpfs). The exact improvement depends on the number of L0 files and the amount of locality. Performance on randomly distributed keys seems on par with the old code.
Reviewers: kailiu, sdong, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, tnovak
Differential Revision: https://reviews.facebook.net/D29133
Summary: Make RocksDB build and run on Windows to be functionally
complete and performant. All existing test cases run with no
regressions. Performance numbers are in the pull-request.
Test plan: make all of the existing unit tests pass, obtain perf numbers.
Co-authored-by: Praveen Rao praveensinghrao@outlook.com
Co-authored-by: Sherlock Huang baihan.huang@gmail.com
Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
Summary: Try to allocate LevelFileIteratorState and LevelFileNumIterator from DB iterator's arena, instead of calling malloc and free.
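For illustration, the general arena pattern looks like the sketch below. Arena::AllocateAligned is the internal arena interface; the struct here is made up, and destructor handling is glossed over.
```lang=c++
// Sketch: placement-new into the arena instead of malloc/free. Objects built
// this way must not be deleted individually; the arena reclaims memory in
// bulk, and any non-trivial destructor has to be invoked explicitly first.
#include <new>
#include "util/arena.h"  // internal RocksDB header

struct IterState {  // stand-in for LevelFileIteratorState etc.
  explicit IterState(int level) : level(level) {}
  int level;
};

IterState* NewIterStateFromArena(rocksdb::Arena* arena, int level) {
  char* mem = arena->AllocateAligned(sizeof(IterState));
  return new (mem) IterState(level);  // no matching delete/free
}
```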
Test Plan: valgrind check
Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D40929
Summary:
Currently, when we insert something into the block cache, we charge the block cache capacity by the size of the block. However, the size of the block might be less than the actual memory used by this object. For example, a 4.5KB block will actually use 8KB of memory. So even if we configure the block cache to 10GB, our actual memory usage of the block cache could be 20GB!
This problem showed up a lot in testing and just recently also showed up in MongoRocks production where we were using 30GB more memory than expected.
This diff will fix the problem. Instead of counting the block size, we will count memory used by the block. That way, a block cache configured to be 10GB will actually use only 10GB of memory.
I'm using a non-portable function and I couldn't find information on its portability on Google. However, it seems to work on Linux, which covers the majority of our use-cases.
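The gap between a block's logical size and the allocator's real footprint can be seen with a Linux-only call like the one below. Whether this is the exact function used in the diff is an assumption; the point is charging the cache by what the allocator actually handed out.
```lang=c++
// Sketch: charge the block cache by actual allocated memory rather than by
// the block's logical size. malloc_usable_size(3) is glibc-specific, hence
// the portability caveat above.
#include <malloc.h>  // malloc_usable_size (Linux/glibc)
#include <cstdio>
#include <cstdlib>

int main() {
  const size_t block_size = 4608;              // a 4.5KB block
  void* block = malloc(block_size);
  size_t charged = malloc_usable_size(block);  // what the allocator really used
  printf("logical size: %zu, charged to cache: %zu\n", block_size, charged);
  free(block);
  return 0;
}
```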
Test Plan:
1. fill up mongo instance with 80GB of data
2. restart mongo with block cache size configured to 10GB
3. do a table scan in mongo
4. memory usage before the diff: 12GB. memory usage after the diff: 10.5GB
Reviewers: sdong, MarkCallaghan, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D40635
Summary:
Implementation of a table-level row cache.
It only caches point queries done through the `DB::Get` interface; queries done through the `Iterator` interface will completely skip the cache.
Supports snapshots and merge operations.
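A minimal configuration sketch; the `row_cache` option name matches how the feature is exposed in the public Options in later releases, so treat the exact spelling here as an assumption.
```lang=c++
// Sketch: enable the row cache and serve point lookups from it.
#include <string>
#include "rocksdb/cache.h"
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.row_cache = rocksdb::NewLRUCache(64 << 20);  // 64MB of cached rows

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rowcache_example", &db);
  if (s.ok()) {
    // Point lookups through DB::Get can now hit the row cache; iterator
    // reads bypass it entirely, as described above.
    std::string value;
    db->Get(rocksdb::ReadOptions(), "some_key", &value);
    delete db;
  }
  return 0;
}
```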
Test Plan: Ran `make valgrind_check commit-prereq`
Reviewers: igor, philipp, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D39849
Summary:
This is experimental. Allow users to indicate, via the callback function TablePropertiesCollector::NeedCompact(), that a file needs compaction based on the data in the file.
It can be used to let users suggest that the DB clear up delete tombstones faster.
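A sketch of what a user-defined collector driving this could look like. The virtual methods follow the public TablePropertiesCollector interface; the class name, the 10% threshold, and the omission of the factory wiring (ColumnFamilyOptions::table_properties_collector_factories) are all illustrative choices.
```lang=c++
// Sketch of a collector that asks for compaction when a file is mostly
// tombstones.
#include "rocksdb/table_properties.h"

class DeleteHeavyCollector : public rocksdb::TablePropertiesCollector {
 public:
  rocksdb::Status AddUserKey(const rocksdb::Slice& /*key*/,
                             const rocksdb::Slice& /*value*/,
                             rocksdb::EntryType type,
                             rocksdb::SequenceNumber /*seq*/,
                             uint64_t /*file_size*/) override {
    ++total_;
    if (type == rocksdb::kEntryDelete) {
      ++deletes_;
    }
    return rocksdb::Status::OK();
  }

  // Returning true suggests to the DB that the resulting file should be
  // compacted soon, e.g. to clear delete tombstones faster.
  bool NeedCompact() const override {
    return total_ > 0 && deletes_ * 10 >= total_;  // >=10% tombstones
  }

  rocksdb::Status Finish(
      rocksdb::UserCollectedProperties* /*props*/) override {
    return rocksdb::Status::OK();
  }
  rocksdb::UserCollectedProperties GetReadableProperties() const override {
    return {};
  }
  const char* Name() const override { return "DeleteHeavyCollector"; }

 private:
  uint64_t total_ = 0;
  uint64_t deletes_ = 0;
};
```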
Test Plan: Add a unit test.
Reviewers: igor, yhchiang, kradhakrishnan, rven
Reviewed By: rven
Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D39585
Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently this relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented is a way of ensuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported.
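A usage sketch. The header path and class names follow the public API as it later stabilized; the summary suggests starting from transaction.h for the exact interface in this diff.
```lang=c++
// Sketch of begin/commit/rollback with optimistic transactions. Conflicts are
// detected at Commit() time by checking the memtable, as described above.
#include <string>
#include "rocksdb/utilities/optimistic_transaction_db.h"
#include "rocksdb/utilities/transaction.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  rocksdb::OptimisticTransactionDB* txn_db = nullptr;
  rocksdb::Status s = rocksdb::OptimisticTransactionDB::Open(
      options, "/tmp/optimistic_txn_example", &txn_db);
  if (!s.ok()) return 1;

  rocksdb::Transaction* txn =
      txn_db->BeginTransaction(rocksdb::WriteOptions());
  txn->Put("key", "value");
  s = txn->Commit();  // fails if another writer touched "key" concurrently
  if (!s.ok()) {
    txn->Rollback();
  }
  delete txn;
  delete txn_db;
  return 0;
}
```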
Test Plan: Added a new test, but still need to look into stress testing.
Reviewers: yhchiang, igor, rven, sdong
Reviewed By: sdong
Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33435
Summary:
Make it build for CYGWIN.
Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions.
Test Plan: Build it and run some unit tests in CYGWIN
Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37605
Summary:
Allow users to supply a callback function with a parameter using sync points, so more complicated verification can be done in tests.
Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be easier to debug.
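For reference, a callback with a parameter hooked onto a sync point looks roughly like this. SyncPoint is an internal test-only utility, and the point name below is a placeholder, not a real marker.
```lang=c++
// Sketch: the code under test fires a callback sync point with a pointer
// argument, and the test inspects or mutates that pointer here.
#include <cassert>
#include "util/sync_point.h"  // internal test utility, not public API

void InstallCallback() {
  rocksdb::SyncPoint::GetInstance()->SetCallBack(
      "Some::Marker", [](void* arg) {
        int* value = static_cast<int*>(arg);
        assert(value != nullptr);
        *value = 42;  // tests can also steer the code under test
      });
  rocksdb::SyncPoint::GetInstance()->EnableProcessing();
}
```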
Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check.
Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D36999
Summary: Now we add warnings when the user configures a compression type that is not supported.
Test Plan:
Configured compression to unsupported values and observed these messages in my log:
2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2.
2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data.
Reviewers: rven, sdong, yhchiang
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35979
Summary:
Fix build break on the Travis build:
$ OPT=-DTRAVIS V=1 make unity && make clean && OPT=-DTRAVIS V=1 make db_test && ./db_test
......
In file included from unity.cc:65:0:
./table/plain_table_key_coding.cc: In member function ‘rocksdb::Status rocksdb::PlainTableKeyDecoder::NextPrefixEncodingKey(const char*, const char*, rocksdb::ParsedInternalKey*, rocksdb::Slice*, size_t*, bool*)’:
./table/plain_table_key_coding.cc:224:3: error: reference to ‘EntryType’ is ambiguous
EntryType entry_type;
^
In file included from ./db/table_properties_collector.h:9:0,
from ./db/builder.h:11,
from ./db/builder.cc:10,
from unity.cc:1:
./include/rocksdb/table_properties.h:81:6: note: candidates are: enum rocksdb::EntryType
enum EntryType {
^
In file included from unity.cc:65:0:
./table/plain_table_key_coding.cc:16:6: note: enum rocksdb::{anonymous}::EntryType
enum EntryType : unsigned char {
^
./table/plain_table_key_coding.cc:231:51: error: ‘entry_type’ was not declared in this scope
const char* pos = DecodeSize(key_ptr, limit, &entry_type, &size);
^
make: *** [unity.o] Error 1
Test Plan:
OPT=-DTRAVIS V=1 make unity
And make sure it doesn't break anymore.
Reviewers: yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D36549
Summary:
Currently users have no way to tell whether a key is an add, a delete, or a merge from the TablePropertiesCollector callback. Add a new function to expose it.
Also refactor the code so that
(1) the table property collector and the internal table property collector are two separate data structures, with the latter one now exposed
(2) table builders only receive internal table properties
Test Plan: Add cases in table_properties_collector_test to cover both the old and new ways of using TablePropertiesCollector.
Reviewers: yhchiang, igor.sugak, rven, igor
Reviewed By: rven, igor
Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D35373
Summary: Fixing issues with get context function.
Test Plan: Run make commit-prereq
Reviewers: sdong, meyering, yhchiang
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35853
Summary:
We have added new stats and perf_context counters for measuring the time spent in merge and filter operations.
We have wrapped all the merge operations within the GUARD statement and collected the total time for these operations in the DB.
Test Plan: WIP
Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34377
Summary:
Our existing test notation is very similar to what is used in gtest, which makes it easy to adopt the parts that are different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also, for unit tests that use a fixture class, `TEST` is replaced with `TEST_F` as required by gtest (see the example after the script below).
There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to follow gtest notation. Eventually we can remove them and use the implementation of `main` that gtest provides.
```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
if grep -q "rocksdb::test::RunAllTests()" $file
then
if grep -Eq '^class \w+Test {' $file
then
perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
perl -pi -e 's/^(TEST)/${1}_F/g' $file
fi
perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
fi
done
% sh ~/transform
% make format
```
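For reference, the shape the script produces (illustrative fixture and test names):
```lang=c++
// After the transform, tests follow the standard gtest shape: the fixture
// inherits from testing::Test, TEST becomes TEST_F, and main calls
// InitGoogleTest + RUN_ALL_TESTS.
#include <gtest/gtest.h>

class FooTest : public testing::Test {
 protected:
  void SetUp() override { value_ = 42; }  // shared per-test setup
  int value_ = 0;
};

TEST_F(FooTest, UsesFixtureState) {  // TEST_F, since FooTest is a fixture
  ASSERT_EQ(42, value_);
}

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}
```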
The second iteration of this diff contains only scripted changes.
The third iteration contains manual changes to fix the last errors and make it compile.
Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still running.
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35157
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and we have to fix our existing code. This diff does this in a generic way, with no manual changes.
In order to detect all existing `ASSERT*` uses in functions that don't return void, I changed the code to generate compile errors for such cases.
In `util/testharness.h` I defined `EXPECT*` assertions the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:
```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted the change to `ASSERT*` in `util/testharness.h`, but preserved the introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once we have switched to gtest.
This diff is independent and contains manual changes only in `util/testharness.h`.
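The placement rule being enforced, reduced to a small illustrative example:
```lang=c++
// gtest ASSERT_* expands to a bare `return;` on failure, so it only compiles
// in functions returning void. Helpers with a non-void return type have to
// use EXPECT_* (or be refactored). Names here are illustrative.
#include <gtest/gtest.h>

bool OpenedCleanly(int rc) {
  // ASSERT_EQ(0, rc);   // would not compile: this function returns bool
  EXPECT_EQ(0, rc);      // records a non-fatal failure and keeps going
  return rc == 0;
}

TEST(PlacementExample, AssertOnlyInVoid) {
  ASSERT_TRUE(OpenedCleanly(0));  // fine: the TEST body returns void
}
```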
Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```
Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33333
Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest.
Test Plan:
Make sure there are no build errors and all tests are passing.
```
% make check
```
Reviewers: igor, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35145
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] for the level after that, etc.
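A configuration sketch of the new interpretation; the particular codecs are arbitrary, and the assumption here is that index 0 continues to cover L0.
```lang=c++
// Sketch: with dynamic level bytes, index 1 applies to the level L0 files are
// merged into, index 2 to the level after that, and so on, regardless of
// which physical levels those happen to be.
#include "rocksdb/options.h"

rocksdb::Options MakeOptions() {
  rocksdb::Options options;
  options.level_compaction_dynamic_level_bytes = true;
  options.compression_per_level = {
      rocksdb::kNoCompression,      // index 0 (assumed to still cover L0)
      rocksdb::kNoCompression,      // the level L0 is merged into
      rocksdb::kSnappyCompression,  // the level after that
      rocksdb::kZlibCompression};   // and beyond
  return options;
}
```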
Test Plan: run all tests
Reviewers: rven, yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34431
Summary:
Pre-fetching is a common operation performed by data stores for
disk/flash based systems as part of database startup.
This is part of task 5197184.
Test Plan: Run the newly added unit test
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33933
Summary:
Added a new option to ColumnFamilyOptions - optimize_filters_for_hits. This option can be used in the case where most accesses to the store are key hits and we don't need to optimize performance for key misses.
This is useful when you have a very large database and most of your lookups succeed. The option allows the store to not store and use filters in the last level (the largest level, which contains data). These filters can take a large amount of space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not for key hits. If we are not optimizing for key misses, we can choose not to store these filters for that level.
This option is only provided for BlockBasedTable. We skip the filters when we are compacting.
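Enabling it is a one-line option change (sketch):
```lang=c++
// Sketch: skip storing/loading filters for the bottommost level when the
// workload is dominated by key hits. Worth it mainly for very large DBs
// where last-level filters dominate filter memory.
#include "rocksdb/options.h"

rocksdb::Options MakeOptions() {
  rocksdb::Options options;
  options.optimize_filters_for_hits = true;  // filters skipped on last level
  return options;
}
```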
Test Plan:
1. Modified db_test to also run tests with an additional option (skip_filters_on_last_level)
2. Added another unit test to db_test which specifically tests that filters are being skipped
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33717
Summary:
When using the latest clang (3.6 or 3.7/trunk), rocksdb fails with many errors. Almost all of them are missing-override errors. This diff adds the missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify fewer errors and no missing-override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
BlockBasedTable pre-fetches the filter and index blocks on Open call.
This is an optimistic optimization targeted at the runtime scenario. The optimization is unnecessary for sst_dump_tool.
- Added a provision to disable pre-fetching of index and filter blocks
in BlockBasedTable
- Disabled pre-fetching for the sst_dump tool
Stack for reference :
#01 0x00000000005ed944 in snappy::InternalUncompress<snappy::SnappyArrayWriter> () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:148
#02 0x00000000005edeee in snappy::RawUncompress () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:947
#03 0x00000000004e0b4d in rocksdb::UncompressBlockContents () from /data/users/paultuckfield/rocksdb/./util/compression.h:69
#04 0x00000000004e145c in rocksdb::ReadBlockContents () from /data/users/paultuckfield/rocksdb/table/format.cc:334
#05 0x00000000004ca424 in rocksdb::(anonymous namespace)::ReadBlockFromFile () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:70
#06 0x00000000004cccad in rocksdb::BlockBasedTable::CreateIndexReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:173
#07 0x00000000004d17e5 in rocksdb::BlockBasedTable::Open () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:553
#08 0x00000000004c8184 in rocksdb::BlockBasedTableFactory::NewTableReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_factory.cc:51
#09 0x0000000000598463 in rocksdb::SstFileReader::NewTableReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:69
#10 0x00000000005986c2 in rocksdb::SstFileReader::SstFileReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:26
#11 0x0000000000599047 in rocksdb::SSTDumpTool::Run () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:332
#12 0x0000000000409b06 in main () from /data/users/paultuckfield/rocksdb/tools/sst_dump.cc:12
Test Plan:
- Added a unit test to trigger the code.
- Also did some manual verification.
- Passed all unit tests
task #6296048
Reviewers: igor, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34041
Summary:
This diff contains trivial fixes for 6 scan-build warnings:
**db/c_test.c**
`db` variable is never read. Removed assignment.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath
**db/db_iter.cc**
The local variable `skipping` is assigned false. Then in the next switch block the only non-return case assigns `skipping` to true; the rest of the cases don't use it and all return.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath
**db/log_reader.cc**
In `bool Reader::SkipToInitialBlock()` the local variable `offset_in_block` is assigned 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed the variable to `initial_offset_in_block` to avoid confusion.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath
In `bool Reader::ReadRecord(Slice* record, std::string* scratch)`, the local variable `in_fragmented_record` in the `kFullType` switch case is assigned false and the function then returns without using it. In the `kFirstType` switch case the same `in_fragmented_record` is assigned false, but later assigned true without prior use. Removed the assignment in both cases.
scan-build reports:
http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPath
http://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath
**table/plain_table_key_coding.cc**
The local variable `user_key_size` is assigned when declared, but in both places where it is used it is first reassigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value at declaration.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath
**tools/db_stress.cc**
Missing `break` in switch case block. This seems to be a bug. Added missing `break`.
Test Plan:
Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs.
```lang=bash
% make check
% make analyze
```
Reviewers: meyering, igor, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33795
Summary:
Adding -W and -Wextra to CXXFLAGS provoked this failure:
table/table_test.cc:1854:56: error: missing initializer for member ‘rocksdb::TestArgs::format_version’ [-Werror=missing-field-initializers]
TestArgs args = { DB_TEST, false, 16, kNoCompression };
^
Add the missing 5th value (format_version).
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33765
Summary:
Remove some always-true assertions.
They provoke these compilation failures:
table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
* table/plain_table_key_coding.cc (rocksdb): Remove assertion that
unsigned type variable is >= 0.
* db/version_set.cc (DoGenerateLevelFilesBrief): Likewise.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33747
Summary:
scan-build is reporting two memory leak bugs in `table/block_based_table_reader.cc`. They are both false positives. In both cases we allocate memory in `ReadBlockFromFile` if `s.ok()`. Then, after `ReadBlockFromFile` returns, we check the same variable and use the allocated memory only if `s.ok()`. The bug scan-build reports is the case where `ReadBlockFromFile` allocates memory and returns, but for some reason the status `s` is not the same and `s.ok() != true`.
In that case scan-build is concerned that the transfer of memory ownership is not explicit. I modified `ReadBlockFromFile` to accept a `std::unique_ptr<Block>*` as a parameter instead of a raw pointer.
scan-build reports:
http://home.fburl.com/~sugak/latest2/report-a4b3fa.html#EndPath
http://home.fburl.com/~sugak/latest2/report-29adbf.html#EndPath
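The ownership-transfer pattern, reduced to a sketch with simplified names and signatures (not the real ones):
```lang=c++
// Sketch: instead of returning a raw pointer that is only meaningful when the
// status is ok, the callee fills a std::unique_ptr, so allocated memory always
// has an explicit owner and the analyzer can see no path leaks it.
#include <memory>

struct Status {
  bool ok() const { return ok_; }
  bool ok_ = true;
};
struct Block { /* decoded block contents */ };

Status ReadBlockFromFile(std::unique_ptr<Block>* result) {
  result->reset(new Block());  // ownership lives in *result from here on
  return Status();
}

void Example() {
  std::unique_ptr<Block> block;
  Status s = ReadBlockFromFile(&block);
  if (s.ok()) {
    // use block.get(); no leak even if s were not ok, since unique_ptr owns it
  }
}
```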
Test Plan:
Make sure scan-build does not report these bugs and all tests are passing.
```lang=bash
% make check
% make analyze
```
Reviewers: sdong, lgalanis, meyering, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33681
Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter.
Test Plan: Add a unit test for it
Reviewers: yhchiang, igor, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32889
Summary:
Get() now doesn't make use of bloom filter if it is prefix based. Add the check.
Didn't touch the block based bloom filter. I can't fully reason about whether it is correct to do that there, but it's straightforward to do for the full bloom filter.
Test Plan:
make all check
Add a test case in DBTest
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31941
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t, encode the size as varint32. memcpy is very bad because the DB is not portable across big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
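Opting in (sketch):
```lang=c++
// Sketch: opt into the new block format. Databases written with
// format_version = 2 cannot be opened by RocksDB releases before 3.10.
#include "rocksdb/options.h"
#include "rocksdb/table.h"

rocksdb::Options MakeOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.format_version = 2;  // older releases keep the old format
  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```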
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Summary:
In this diff I add another parameter to BlockBasedTableOptions that will let users specify block based table's format. This will greatly simplify block based table's format changes in the future.
First format change that this will support is encoding decompressed size in Zlib and BZip2 blocks. This diff is blocking https://reviews.facebook.net/D31311.
Test Plan: Added a unit tests. More tests to come as part of https://reviews.facebook.net/D31311.
Reviewers: dhruba, MarkCallaghan, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31383
Summary: We keep checksum functions in util/; there is no reason for compression to be in port/.
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31281
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
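Illustrative sketch of the idea (not the RocksDB class): a single byte budget shared across column families, consulted when deciding whether a memtable must be flushed.
```lang=c++
// Sketch of a write-buffer budget shared by all column families.
#include <atomic>
#include <cstddef>

class SharedWriteBuffer {
 public:
  explicit SharedWriteBuffer(size_t limit) : limit_(limit) {}

  void Reserve(size_t bytes) { used_.fetch_add(bytes); }
  void Free(size_t bytes) { used_.fetch_sub(bytes); }

  // All column families share this check, so one write-heavy family can
  // force flushes even if the others are idle.
  bool ShouldFlush() const { return used_.load() >= limit_; }

 private:
  const size_t limit_;
  std::atomic<size_t> used_{0};
};
```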
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581
Summary: Block plain_table_index.cc in ROCKSDB_LITE
Test Plan:
make clean
make OPT=-DROCKSDB_LITE shared_lib -j32
make clean
make shared_lib -j32
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29535
Summary:
In some environments, such as Android, the C++ library does not have
std::to_string. This patch adds rocksdb::ToString(), which wraps std::to_string
when it is available and implements the equivalent conversion otherwise.
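The wrapper's shape, roughly (simplified sketch; the real helper covers more integer types):
```lang=c++
// Simplified sketch of a ToString() that falls back to snprintf where the
// C++ library lacks std::to_string (e.g. when building with -DOS_ANDROID,
// as in the test plan below).
#include <cstdio>
#include <string>

namespace rocksdb {

#ifdef OS_ANDROID
inline std::string ToString(int value) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "%d", value);
  return std::string(buf);
}
#else
inline std::string ToString(int value) { return std::to_string(value); }
#endif

}  // namespace rocksdb
```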
Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29181
Summary: Created a CompatibleOptions object that can be used as a LevelDB Options object and then converted to a RocksDB Options object using the ConvertOptions() method.
Test Plan: Unit test included in diff.
Reviewers: ljin
Reviewed By: ljin
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28893
Summary:
The very last reference happens in DBImpl::GetOptions()
I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out
Test Plan: make all check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29073
Summary:
This is just a simple test that passes two files through a compaction. It shows the framework so that people can continue building new compaction *unit* tests.
In the future we might want to move some Compaction* tests from DBTest here. For example, CompactBetweenSnapshot seems a good candidate.
Hopefully this test can be simpler when we mock out VersionSet.
Test Plan: this is a test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28449
Summary:
Make db_stress build for ROCKSDB_LITE.
The test doesn't pass, though. It seg faults quickly. But I took a look and it doesn't seem to be related to the lite version; it is likely a bug inside RocksDB.
Test Plan: make db_stress
Reviewers: yhchiang, rven, ljin, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D28797
Summary: iOS size_t is 32-bit, so we need to static_cast<size_t> any uint64_t :(
Test Plan: TARGET_OS=IOS make static_lib
Reviewers: dhruba, ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28743
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`.
Test Plan: make all
Reviewers: yhchiang, rven, ljin, sdong
Reviewed By: ljin, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28497
Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc.
Test Plan: compiles
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28131
Summary:
...and fix all the errors :)
Jim suggested turning on -Wshadow because it helped him fix a number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
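The class of issue -Wshadow catches, for reference (illustrative names):
```lang=c++
// Typical -Wshadow finding: an inner declaration hides an outer one, so later
// code silently operates on the wrong variable.
struct Compactor {
  void Run(int level) {
    for (int i = 0; i < 3; ++i) {
      int level = i * 2;  // -Wshadow: declaration shadows the parameter `level`
      Use(level);         // uses the inner variable, possibly unintentionally
    }
  }
  void Use(int) {}
};
```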
Test Plan: compiles
Reviewers: yhchiang, rven, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27711
Summary:
Apply InfoLogLevel to the logs in table/block_based_table_reader.cc
Also, add missing checks for the returned status in BlockBasedTable::Open
Test Plan: make
Reviewers: sdong, ljin, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D28005
Summary: Apply InfoLogLevel to the logs in table/meta_blocks.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27903
Summary: Apply InfoLogLevel to the logs in table/plain_table_index.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27909
Summary: Apply InfoLogLevel to the logs in table/block_based_table_builder.cc
Test Plan: make
Reviewers: igor, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27921
Summary:
This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on fewer things and making it closer to a unit test than an integration test.
It also introduces a framework to compile mock classes -- any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. That way we can mock out most other classes: Version, VersionSet, DBImpl, etc.
Test Plan: flush_job_test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27681
Summary:
Make inplace_update_support and inplace_update_num_locks dynamic.
inplace_callback becomes immutable
We are almost free of references to cfd->options() in db_impl
Test Plan: unit test
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25293
Summary:
This fixes the case where the filter policy is missing in the SST file, but we open the table with the filter policy on and cache_index_and_filter_blocks = false. The current behavior is that we try to load it on every Get() but fail.
Test Plan: unit test
Reviewers: yhchiang, igor, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25455
Summary:
Make compaction-related options changeable. Most of the changes are tedious, following the same convention: grab MutableCFOptions at the beginning of compaction under the mutex, then pass it throughout the job and register it in the SuperVersion at the end.
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23349
Fix for:
[table/cuckoo_table_reader.cc:196]: (performance) Function
parameter 'target' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Use %zu instead of %zd since size_t and uint32_t are unsigned.
Fix for:
[table/plain_table_factory.cc:55]: (warning) %zd in format string (no. 1)
requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'.
[table/plain_table_factory.cc:58]: (warning) %zd in format string (no. 1)
requires 'ssize_t' but the argument type is 'size_t {aka unsigned long}'.
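The corresponding fix pattern (illustrative value):
```lang=c++
// size_t is unsigned, so %zd (signed ssize_t) is the wrong conversion; %zu is
// the matching one. A uint32_t argument would use PRIu32 or a cast instead.
#include <cstddef>
#include <cstdio>

int main() {
  size_t user_key_len = 16;
  std::printf("user key length is %zu\n", user_key_len);  // was %zd
  return 0;
}
```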
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[table/table_test.cc:1218]: (performance) Function parameter
'prefix' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Fix for:
[table/cuckoo_table_reader.cc:198]: (performance) Function
parameter 'file_data' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
[table/bloom_block.h:29]: (performance) Function parameter
'keys_hashes' should be passed by reference.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>
Summary:
Instead of passing a callback function pointer and its arg on the Table::Get() interface, pass a GetContext. This makes the interface cleaner and possibly gives better perf. Also adds a fast path for SaveValue().
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24057
Summary:
When I changed std::vector<std::pair<std::string, std::string>> to std::string to store key/value pairs in the builder, I missed the handling for the kDeletion type. As a result, value_size_ can be wrong if the first added key is a deletion.
This is caught by ./cuckoo_table_db_test
Test Plan:
./cuckoo_table_db_test
./cuckoo_table_reader_test
./cuckoo_table_builder_test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24045
Summary:
Cuckoo table iterator creation is quite expensive since it needs to load all the data and sort it. After compaction, RocksDB creates a new iterator on the new file to make sure it is in a good state. That makes DB creation quite slow. Delay the iterator's sort until seek time to speed it up.
Test Plan: db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23775
Summary:
The builder currently buffers all key/value pairs as a vector of pair<string, string>. That is too much due to std::string overhead; it wasn't able to fit 1B key/values (12 bytes total) in 100GB of RAM. Switch to using a plain string to store the key/value sequence, which uses only 12GB of RAM as a result.
Test Plan: db_bench
Reviewers: igor, sdong, yhchiang
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23763
Summary:
When creating a new iterator, instead of storing a mapping from key to bucket id for sorting, store only the bucket id and read the key from the mmapped file based on that id. This reduces memory from 20 bytes per entry to only 4 bytes.
Test Plan: db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23757
Summary:
Using modulo to calculate the hash makes lookup ~8% slower. But it has its benefits: the file size is more predictable, and the file is more space efficient.
Test Plan: db_bench
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23691
Summary:
Add a CompactedDBImpl that is enabled when calling OpenForReadOnly() and the DB only has one level (>0) of files. As a performance comparison, CuckooTable performs 2.1M/s with CompactedDBImpl vs. 1.78M/s with ReadOnlyDBImpl.
Test Plan: db_bench
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23553
Summary:
It was commented out in D22545 by accident. Keep the option in
ImmutableOptions for now. I can make it dynamic in
https://reviews.facebook.net/D23349
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23865
Summary:
MurmurHash becomes expensive when we do millions of Get()s a second in one thread. Add this option to allow the first hash function to use the identity function as the hash function. It results in a QPS increase from 3.7M/s to ~4.3M/s. I did not observe an improvement in end-to-end RocksDB performance. This may be caused by other bottlenecks that I will address in a separate diff.
Test Plan:
```
[ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=0
==== Test CuckooReaderTest.WhenKeyExists
==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator
==== Test CuckooReaderTest.CheckIterator
==== Test CuckooReaderTest.CheckIteratorUint64
==== Test CuckooReaderTest.WhenKeyNotFound
==== Test CuckooReaderTest.TestReadPerformance
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.272us (3.7 Mqps) with batch size of 0, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.138us (7.2 Mqps) with batch size of 10, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.142us (7.1 Mqps) with batch size of 25, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.142us (7.0 Mqps) with batch size of 50, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.144us (6.9 Mqps) with batch size of 100, # of found keys 125829120
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.201us (5.0 Mqps) with batch size of 0, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.121us (8.3 Mqps) with batch size of 10, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.123us (8.1 Mqps) with batch size of 25, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.121us (8.3 Mqps) with batch size of 50, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.112us (8.9 Mqps) with batch size of 100, # of found keys 104857600
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.251us (4.0 Mqps) with batch size of 0, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.107us (9.4 Mqps) with batch size of 10, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.099us (10.1 Mqps) with batch size of 25, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.100us (10.0 Mqps) with batch size of 50, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.116us (8.6 Mqps) with batch size of 100, # of found keys 83886080
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.189us (5.3 Mqps) with batch size of 0, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.095us (10.5 Mqps) with batch size of 10, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.096us (10.4 Mqps) with batch size of 25, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.098us (10.2 Mqps) with batch size of 50, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.105us (9.5 Mqps) with batch size of 100, # of found keys 73400320
[ljin@dev1964 rocksdb] ./cuckoo_table_reader_test --enable_perf --file_dir=/dev/shm --write --identity_as_first_hash=1
==== Test CuckooReaderTest.WhenKeyExists
==== Test CuckooReaderTest.WhenKeyExistsWithUint64Comparator
==== Test CuckooReaderTest.CheckIterator
==== Test CuckooReaderTest.CheckIteratorUint64
==== Test CuckooReaderTest.WhenKeyNotFound
==== Test CuckooReaderTest.TestReadPerformance
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.230us (4.3 Mqps) with batch size of 0, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.086us (11.7 Mqps) with batch size of 10, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.088us (11.3 Mqps) with batch size of 25, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.083us (12.1 Mqps) with batch size of 50, # of found keys 125829120
With 125829120 items, utilization is 93.75%, number of hash functions: 2.
Time taken per op is 0.083us (12.1 Mqps) with batch size of 100, # of found keys 125829120
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.159us (6.3 Mqps) with batch size of 0, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 10, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.080us (12.6 Mqps) with batch size of 25, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 50, # of found keys 104857600
With 104857600 items, utilization is 78.12%, number of hash functions: 2.
Time taken per op is 0.082us (12.2 Mqps) with batch size of 100, # of found keys 104857600
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.154us (6.5 Mqps) with batch size of 0, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 10, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.077us (12.9 Mqps) with batch size of 25, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 50, # of found keys 83886080
With 83886080 items, utilization is 62.50%, number of hash functions: 2.
Time taken per op is 0.079us (12.6 Mqps) with batch size of 100, # of found keys 83886080
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.218us (4.6 Mqps) with batch size of 0, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.083us (12.0 Mqps) with batch size of 10, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.085us (11.7 Mqps) with batch size of 25, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.086us (11.6 Mqps) with batch size of 50, # of found keys 73400320
With 73400320 items, utilization is 54.69%, number of hash functions: 2.
Time taken per op is 0.078us (12.8 Mqps) with batch size of 100, # of found keys 73400320
```
Reviewers: sdong, igor, yhchiang
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23451
Summary:
This is continuing the work done by 27b22f13a3
It's just cleaning up some unnecessary constructors. The most important change is removing Block::Block(const BlockContents& contents) constructor. It was only used from the unit test.
Test Plan: compiles
Reviewers: sdong, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23547
This replaces a mishmash of pointers in the Block and BlockContents classes with
std::unique_ptr. It also changes the semantics of BlockContents to be limited to
use as a constructor parameter for Block objects, as it owns any block buffers
handed to it.
Summary:
This is to avoid cutting a file prematurely, which results in a file size of half the specified size.
Test Plan: db_bench
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23541
Summary: as title
Test Plan:
make all check
I will think of a way to set up a stress test for this
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23055