Summary:
A recent commit e37dd216f9 makes sure hash index can be used when reading existing files. This patch uses another way to achieve the approach:
(1) Currently, always writing kBinarySearch to files, despite of BlockBasedTableOptions.IndexType setting.
(2) When reading a file, read out the field, and make sure it is kBinarySearch, while always use index type by users.
The reason for doing it is, to reserve kHashSearch property on disk to future. If now we write out binary index for both of kHashSearch and kBinarySearch. We have to use a new flag in the future for hash index on disk, otherwise compatibility would break. Also, we want the real index type and type shown in properties block to be consistent.
Test Plan: make all check
Reviewers: haobo, kailiu
Reviewed By: kailiu
CC: igor, ljin, yhchiang, xjin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18009
Conflicts:
table/block_based_table_reader.cc
Summary:
From 2.6 to 2.7, property block name is renamed from rocksdb.stats to rocksdb.properties. Older properties were not able to be loaded. In 2.8, we seem to have added some logic that uses property block without checking null pointers, which create segment faults.
In this patch, we fix it by:
(1) try rocksdb.stats if rocksdb.properties is not found
(2) add some null checking before consuming rep->table_properties
Test Plan: make sure a file generated in 2.7 couldn't be opened now can be opened.
Reviewers: haobo, igor, yhchiang
Reviewed By: igor
CC: ljin, xjin, dhruba, kailiu, leveldb
Differential Revision: https://reviews.facebook.net/D17961
Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index.
Test Plan: Wrote several new unit tests.
Reviewers: sdong, haobo, dhruba
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16539
Use shorten index key for hash-index
Summary:
I was wrong about the "index builder", right now since we create index
by scanning both whole table and index, there is not need to preserve
the whole key as the index key.
I switch back to original way index which is both space efficient and
able to supprot in-fly construction of hash index.
IN this patch, I made minimal change since I'm not sure if we still need
the "pluggable index builder", under current circumstance it is of no use
and kind of over-engineered. But I'm not sure if we can still exploit its
usefulness in the future; otherwise I think I can just burn them with great
vengeance.
Test Plan: unit tests
Reviewers: sdong, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17745
Conflicts:
table/block_based_table_reader.cc
table/block_based_table_reader.h
Summary:
This patch introduced a new table options that allows us to make
block-based table's index pluggable.
To support that new features:
* Code has been refacotred to be more flexible and supports this option well.
* More documentation is added for the existing obsecure functionalities.
* Big surgeon on DataBlockReader(), where the logic was really convoluted.
* Other small code cleanups.
The pluggablility will mostly affect development of internal modules
and won't change frequently, as a result I intentionally avoid
heavy-weight patterns (like factory) and try to make it simple.
Test Plan: make all check
Reviewers: haobo, sdong
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16395
Conflicts:
table/block_based_table_reader.cc
table/block_based_table_reader.h
Summary: Our profile shows that in one of the applications, 5% of the CPU costs of PlainTableBuilder::Add() are spent on std::string stacks. By this simple change, we avoid this global reusable string. Also, we avoid another call of file appending, which probably gives another 2%.
Test Plan: make all check
Reviewers: haobo, ljin
Reviewed By: haobo
CC: igor, yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D17601
Summary:
Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h.
This patch improves iterator performance significantly. Running this benchmark:
./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond
The average latency is improved to about 750 nanoseconds from 1100 nanoseconds.
Test Plan:
Add a unit test.
make all check
Reviewers: haobo, ljin
Reviewed By: haobo
CC: igor, yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D17547
Summary:
In total order mode, iterator's seek() shouldn't check total order.
Also some cleaning up about checking null for shared pointers. I don't know the behavior before it.
This bug was reported by @igor.
Test Plan: test plain_table_db_test
Reviewers: ljin, haobo, igor
Reviewed By: igor
CC: yhchiang, dhruba, igor, leveldb
Differential Revision: https://reviews.facebook.net/D17391
Summary:
By constraining the probes within cache line(s), we can improve the
cache miss rate thus performance. This probably only makes sense for
in-memory workload so defaults the option to off.
Numbers and comparision can be found in wiki:
https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study
Test Plan: benchmarked this change substantially. Will run make all check as well
Reviewers: haobo, igor, dhruba, sdong, yhchiang
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17133
Summary:
Previous code had two bugs:
* didn't initialize the table_magic_number_ explicitly -- as a
result a random junk number is stored for table_magic_number_, making
HasInitializedMagicNumber() always return true.
* if condition is inconrrect in set_table_magic_number(), and the return value is not checked.
I replace if-else by a stronger requirement enforced by assert().
Test Plan:
Previous sst_dump failed to work.
After the fix, things back to normal.
Reviewers: yhchiang
CC: haobo, sdong, igor, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D17055
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.
Test Plan: db_bench && make asan_check
Reviewers: haobo, igor, sdong
Reviewed By: igor
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D16587
Summary:
Previous we did rough estimation of subindex size, which in worst case may result in array reallocation.
This patch aims to get the exact size and avoid any reallocation.
Test Plan: make all check
Reviewers: sdong, dhruba, haobo
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16125
Summary: The assert was pointless since if if prefix is the same as the whole key, assertion will surely fail. Reason behind is when performing the internal key comparison, if user keys are the same, *key with smaller transaction id* wins.
Test Plan: make -j32 && make check
Reviewers: sdong, dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16551
Summary:
this is the key component extracted from diff: https://reviews.facebook.net/D14271
I separate it to a dedicated patch to make the review easier.
Test Plan: added a unit test and passed it.
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16245
Summary:
Due to a bad merge of D14163 and D14001 before checking in D14001, "direction_ = kForward;" in MergeIterator::Seek() was deleted my mistake (in commit b135d01e7b ). It will generate wrong results or assert failure after the sequence of Prev() (or SeekToLast()), Seek() and Prev().
Fix it
Test Plan: make all check
Reviewers: igor, haobo, dhruba
Reviewed By: igor
CC: yhchiang, i.am.jin.lei, ljin, leveldb
Differential Revision: https://reviews.facebook.net/D16527
Summary:
BinarySearchIndex didn't use unique_ptr to guard the block object nor
delete it in destructor, leading to valgrind failure for "definite
memory leak".
Test Plan:
re-ran the failed valgrind test cases
Summary:
My last diff was developed in MacOS but in devserver environment error occurs.
I dug into the problem and found the way we calcuate approximate data size is pretty out-of-date. We can use table properties to get more accurate results.
Test Plan: ran ./table_test and passed
Reviewers: igor, dhruba, haobo, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16509
Summary:
This patch introduced a new table options that allows us to make
block-based table's index pluggable.
To support that new features:
* Code has been refacotred to be more flexible and supports this option well.
* More documentation is added for the existing obsecure functionalities.
* Big surgeon on DataBlockReader(), where the logic was really convoluted.
* Other small code cleanups.
The pluggablility will mostly affect development of internal modules
and won't change frequently, as a result I intentionally avoid
heavy-weight patterns (like factory) and try to make it simple.
Test Plan: make all check
Reviewers: haobo, sdong
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16395
Summary:
Previous code is too convoluted and I must be drunk for letting
such code to be written without a second thought.
Thanks to the discussion with @sdong, I added the `Options` when
generating the flusher, thus avoiding the tricks.
Just FYI: I resisted to add Options in flush_block_policy.h since I
wanted to avoid cyclic dependencies: FlushBlockPolicy dpends on Options
and Options also depends FlushBlockPolicy... While I appreciate my
effort to prevent it, the old design turns out creating more troubles than
it tried to avoid.
Test Plan: ran ./table_test
Reviewers: sdong
Reviewed By: sdong
CC: sdong, leveldb
Differential Revision: https://reviews.facebook.net/D16503
Summary:
Found some function follows camel style. When naming funciton, we have two styles:
Trivially expose internal data in readonly mode: `all_lower_case()`
Regular function: `CapitalizeFirstLetter()`
I renames these functions.
Test Plan: make -j32
Reviewers: haobo, sdong, dhruba, igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16383
Summary:
PlainTable::Next() should pass the error message from ReadKey(). Now it would return a wrong error message.
Also improve the messages of status when failing to read
Test Plan: make all check
Reviewers: ljin, kailiu, haobo
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16365
Summary: This bug caused server crash issues because the filter block is too big and kept purging out of cache.
Test Plan: Wrote a new unit tests to make sure it works.
Reviewers: dhruba, haobo, igor, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16221
Summary: Provide a public API for users to access the table properties for each SSTable.
Test Plan: Added a unit tests to test the function correctness under differnet conditions.
Reviewers: haobo, dhruba, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16083
Summary: Fix table_reader_bench after some interface changes. Add it to make to avoid future breaking
Test Plan: make table_reader_bench and run it with different options.
Reviewers: kailiu, haobo
Reviewed By: haobo
CC: igor, leveldb
Differential Revision: https://reviews.facebook.net/D16107
Summary:
1. Add some more implementation-aware tests for PlainTable
2. move from a hard-coded one index per 16 rows in one prefix to a configurable number. Also, make hash table ratio = 0 means binary search only. Also fixes some divide 0 risks.
3. Explicitly support total order (only use binary search)
4. some code cleaning up.
Test Plan: make all check
Reviewers: haobo, kailiu
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16023
Summary: We'll need the prefix seek support for property aggregation.
Test Plan: make all check
Reviewers: haobo, sdong, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15963
Summary:
In MacOS, I got issue with `Footer`'s default constructor, which initialized the magic number with some random number instead of 0.
With investigation, I found we forgot to make the kInvalidTableMagicNumber to be static. As a result, kInvalidTableMagicNumber was assgined to `table_magic_number_` before it is initialized (which will be populated with random number).
Test Plan: passed current unit tests; also passed the unit tests for the incoming diff which used the default footer.
Reviewers: yhchiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16077
Summary:
* Fixed the compression state array size bug.
* Temporarily disable running `DoCompressionTest()` against bzip, which will fail the test.
Test Plan: make && ./table_test
Reviewers: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16065
Summary:
We are going to expose properties of all tables to end users through "some" db interface.
However, current design doesn't naturally fit for this need, which is because:
1. If a table presents in table cache, we cannot simply return the reference to its table properties, because the table may be destroy after compaction (and we don't want to hold the ref of the version).
2. Copy table properties is OK, but it's slow.
Thus in this diff, I change the table reader's interface to return a shared pointer (for const table properties), instead a const refernce.
Test Plan: `make check` passed
Reviewers: haobo, sdong, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15999
Summary:
This diff enables the command line tool `sst_dump` to work for sst files
under plain table format. Changes include:
* In tools/sst_dump.cc:
- add support for plain table format
- display prefix_extractor information when --show_properties is on
* In table/format.cc
- Now the table magic number of a Footer can be later initialized
via ReadFooterFromFile().
* In table/meta_bocks:
- add function ReadTableMagicNumber() that reads the magic number of
the specified file.
Minor fixes:
- remove a duplicate #include in table/table_test.cc
- fix a commentary typo in include/rocksdb/memtablerep.h
- fix lint errors.
Test Plan:
Runs sst_dump with both block-based and plain-table format files with
different arguments, specifically those with --show-properties and --from.
* sample output:
https://reviews.facebook.net/P261
Reviewers: kailiu, sdong, xjin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15903
Summary:
Previous I am too ambitious to hide every detail about table factory
to internal api. However, we cannot pass the compilatoin for external
users since we use table factory as the shared_ptr, which requires
the definition of table factory's destructor.
Test Plan: make check;
Reviewers: sdong, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15861
Summary: RocksDB doesn't compile on 32-bit architecture apparently. This is attempt to fix some of 32-bit errors. They are reported here: https://gist.github.com/paxos/8789697
Test Plan: RocksDB still compiles on 64-bit :)
Reviewers: kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15825
Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases.
Test Plan: make all check
Reviewers: haobo, dhruba, kailiu
Reviewed By: haobo
CC: igor, leveldb
Differential Revision: https://reviews.facebook.net/D15489
Summary:
Easy thing goes first. This patch moves arena to internal dir; based
on which, the coming patch will deal with memtable_rep.
Test Plan: make check
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15615
Summary:
Plain table has been working well and this is just a nit-picking patch,
which is generated during my coding reading. No real functional changes.
only some changes regarding:
* Improve some comments from the perspective a "new" code reader.
* Change some magic number to constant, which can help us to parameterize them
in the future.
* Did some style, naming, C++ convention changes.
* Fix warnings from new "arc lint"
Test Plan: make check
Reviewers: sdong, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15429
Summary:
We'll divide the table tests into 3 buckets, plain table test, block-based table test and general table feature test.
This diff does no real change and only does the rename and reorg.
Test Plan: run table_test
Reviewers: sdong, haobo, igor, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15417
Summary:
Mixing index/filter blocks with data blocks resulted in some known
issues. To make sure in next release our users won't be affected,
we added a new option in BlockBasedTableFactory::TableOption to
conceal this functionality for now.
This patch also introduced a BlockBasedTableReader::OpenOptions,
which avoids the "infinite" growth of parameters in
BlockBasedTableReader::Open().
Test Plan: make check
Reviewers: haobo, sdong, igor, dhruba
Reviewed By: igor
CC: leveldb, tnovak
Differential Revision: https://reviews.facebook.net/D15327