Commit Graph

210 Commits

Author SHA1 Message Date
Pratyush Seth
ff1b5df4c6 Added benchmark functionality on the lines of folly/Benchmark.h
Summary: Added benchmark functionality on the lines of folly/Benchmark.h

Test Plan: Added unit tests

Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17973
2014-04-21 12:29:55 -07:00
sdong
27d3bc184e Use a different approach to make sure BlockBasedTableReader can use hash index on older files
Summary:
A recent commit e37dd216f9 makes sure hash index can be used when reading existing files. This patch uses another way to achieve the approach:
(1) Currently, always writing kBinarySearch to files, despite of BlockBasedTableOptions.IndexType setting.
(2) When reading a file, read out the field, and make sure it is kBinarySearch, while always use index type by users.

The reason for doing it is, to reserve kHashSearch property on disk to future. If now we write out binary index for both of kHashSearch and kBinarySearch. We have to use a new flag in the future for hash index on disk, otherwise compatibility would break. Also, we want the real index type and type shown in properties block to be consistent.

Test Plan: make all check

Reviewers: haobo, kailiu

Reviewed By: kailiu

CC: igor, ljin, yhchiang, xjin, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18009
2014-04-18 14:09:21 -07:00
Kai Liu
e37dd216f9 Index type doesn't have to be persisted
Summary:

With the recent changes, there is no need to check the property block about the index block type.
If user want to use it, they don't really need any disk format change; everything happens in the fly.

Also another team encountered an error while reading the index type from properties.

Test Plan:

ran all the tests

Reviewers: sdong

CC:

Task ID: #

Blame Rev:
2014-04-17 11:08:12 -07:00
sdong
5cef458a2c RocksDB 2.8 to be able to read files generated by 2.6
Summary:
From 2.6 to 2.7, property block name is renamed from rocksdb.stats to rocksdb.properties. Older properties were not able to be loaded. In 2.8, we seem to have added some logic that uses property block without checking null pointers, which create segment faults.

In this patch, we fix it by:
(1) try rocksdb.stats if rocksdb.properties is not found
(2) add some null checking before consuming rep->table_properties

Test Plan: make sure a file generated in 2.7 couldn't be opened now can be opened.

Reviewers: haobo, igor, yhchiang

Reviewed By: igor

CC: ljin, xjin, dhruba, kailiu, leveldb

Differential Revision: https://reviews.facebook.net/D17961
2014-04-17 09:51:43 -07:00
Igor Canadi
588bca2020 RocksDBLite
Summary:
Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile.

Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping)

Test Plan: compiles :)

Reviewers: dhruba, haobo, ljin, sdong, yhchiang

Reviewed By: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17835
2014-04-15 13:39:26 -07:00
Kai Liu
e23e73e67c Use shorten index key for hash-index
Summary:
I was wrong about the "index builder", right now since we create index
by scanning both whole table and index, there is not need to preserve
the whole key as the index key.

I switch back to original way index which is both space efficient and
able to supprot in-fly construction of hash index.

IN this patch, I made minimal change since I'm not sure if we still need
the "pluggable index builder", under current circumstance it is of no use
and kind of over-engineered. But I'm not sure if we can still exploit its
usefulness in the future; otherwise I think I can just burn them with great
vengeance.

Test Plan: unit tests

Reviewers: sdong, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17745
2014-04-10 22:45:25 -07:00
Kai Liu
75b59d5146 Enable hash index for block-based table
Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index.

Test Plan: Wrote several new unit tests.

Reviewers: sdong, haobo, dhruba

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16539
2014-04-10 14:19:43 -07:00
Igor Canadi
4daea66343 Turn on -Wmissing-prototypes
Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes.

Test Plan: compiles

Reviewers: dhruba, haobo, ljin, sdong

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17649
2014-04-09 21:17:14 -07:00
sdong
e9ed28f9c8 PlainTableBuilder::Add() to use local char array instead of reused std::string as tmp buffer
Summary: Our profile shows that in one of the applications, 5% of the CPU costs of PlainTableBuilder::Add() are spent on std::string stacks. By this simple change, we avoid this global reusable string. Also, we avoid another call of file appending, which probably gives another 2%.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17601
2014-04-09 10:17:32 -07:00
Igor Canadi
34455deb06 Fix Mac OS compile issues 2014-04-08 14:05:53 -07:00
Igor Canadi
5b345b76cb Remove env_ from MergingIterator
Summary: env_ is not used. Compiling for iOS complains.

Test Plan: compiles now

Reviewers: ljin, haobo, sdong, dhruba

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17589
2014-04-08 13:40:42 -07:00
Lei Jin
92c1eb0291 macros for perf_context
Summary: This will allow us to disable them completely for iOS or for better performance

Test Plan: will run make all check

Reviewers: igor, haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17511
2014-04-08 10:58:07 -07:00
sdong
5e2db3b434 PlainTableIterator not to store copied key in std::string
Summary:
Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h.

This patch improves iterator performance significantly. Running this benchmark:

./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond

The average latency is improved to about 750 nanoseconds from 1100 nanoseconds.

Test Plan:
Add a unit test.
make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17547
2014-04-07 19:06:09 -07:00
Igor Canadi
3d2fe844ab Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/memtable_list.cc
	db/version_set.cc
2014-04-07 11:31:11 -07:00
Igor Canadi
bcd1f15b60 Remove -Wno-unused-const-variable 2014-04-04 16:15:47 -07:00
Igor Canadi
8555ce2dec Merge branch 'master' into columnfamilies 2014-04-02 10:48:05 -07:00
sdong
d50619a559 PlainTableIterator::Seek() shouldn't check bloom filter in total order mode
Summary:
In total order mode, iterator's seek() shouldn't check total order.

Also some cleaning up about checking null for shared pointers. I don't know the behavior before it.

This bug was reported by @igor.

Test Plan: test plain_table_db_test

Reviewers: ljin, haobo, igor

Reviewed By: igor

CC: yhchiang, dhruba, igor, leveldb

Differential Revision: https://reviews.facebook.net/D17391
2014-04-01 15:05:16 -07:00
Igor Canadi
ddbd1ece88 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	db/internal_stats.cc
	db/internal_stats.h
	db/version_edit.cc
	db/version_edit.h
	db/version_set.cc
	include/rocksdb/options.h
	util/options.cc
2014-03-31 13:39:24 -07:00
Lei Jin
0d755fff14 cache friendly blocked bloomfilter
Summary:
By constraining the probes within cache line(s), we can improve the
cache miss rate thus performance. This probably only makes sense for
in-memory workload so defaults the option to off.

Numbers and comparision can be found in wiki:
https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study

Test Plan: benchmarked this change substantially. Will run make all check as well

Reviewers: haobo, igor, dhruba, sdong, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17133
2014-03-28 09:21:20 -07:00
Igor Canadi
e20fa3f8a4 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/internal_stats.cc
	db/internal_stats.h
	db/version_set.cc
2014-03-19 17:22:20 -07:00
Kai Liu
69f6cf431d Fix two bugs in talbe format
Summary:
Previous code had two bugs:

* didn't initialize the table_magic_number_ explicitly -- as a
  result a random junk number is stored for table_magic_number_, making
  HasInitializedMagicNumber() always return true.
* if condition is inconrrect in set_table_magic_number(), and the return value is not checked.
  I replace if-else by a stronger requirement enforced by assert().

Test Plan:
Previous sst_dump failed to work.
After the fix, things back to normal.

Reviewers: yhchiang

CC: haobo, sdong, igor, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17055
2014-03-19 16:18:33 -07:00
Igor Canadi
fb2346fc1f [CF] Code cleanup part 1
Summary:
I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.

Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch.

This diff also fixes race condition when dropping column family.

Test Plan: Ran db_stress with variety of parameters

Reviewers: dhruba, haobo

Differential Revision: https://reviews.facebook.net/D16833
2014-03-12 09:56:53 -07:00
Igor Canadi
9634ba42ac Merge branch 'master' into columnfamilies
Conflicts:
	db/compaction_picker.cc
	db/db_impl.cc
	db/db_impl.h
	db/tailing_iter.cc
	db/version_set.h
	include/rocksdb/options.h
	util/options.cc
2014-03-10 17:26:09 -07:00
Lei Jin
8d007b4aaf Consolidate SliceTransform object ownership
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.

Test Plan: db_bench && make asan_check

Reviewers: haobo, igor, sdong

Reviewed By: igor

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D16587
2014-03-10 12:56:46 -07:00
Igor Canadi
1e0d47276c Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
2014-03-07 16:59:47 -08:00
Kai Liu
566f18e6ad More precise calculation of sub_index_size
Summary:
Previous we did rough estimation of subindex size, which in worst case may result in array reallocation.
This patch aims to get the exact size and avoid any reallocation.

Test Plan: make all check

Reviewers: sdong, dhruba, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16125
2014-03-06 17:30:46 -08:00
Igor Canadi
0738ae6dc9 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
2014-03-05 12:25:05 -08:00
kailiu
a01bda0997 Fix a buggy assert
Summary: The assert was pointless since if if prefix is the same as the whole key, assertion will surely fail. Reason behind is when performing the internal key comparison, if user keys are the same, *key with smaller transaction id* wins.

Test Plan: make -j32 && make check

Reviewers: sdong, dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16551
2014-03-04 18:08:23 -08:00
Igor Canadi
fa34697237 Merge branch 'master' into columnfamilies 2014-03-04 09:39:14 -08:00
kailiu
a1d56e73ec Uncomment the unit tests in table test 2014-03-03 22:19:49 -08:00
kailiu
906f3dca72 Add a hash-index component for block
Summary:
this is the key component extracted from diff: https://reviews.facebook.net/D14271
I separate it to a dedicated patch to make the review easier.

Test Plan: added a unit test and passed it.

Reviewers: haobo, sdong, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16245
2014-03-03 21:11:49 -08:00
Igor Canadi
9d0577a6be Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/transaction_log_impl.cc
	db/transaction_log_impl.h
	include/rocksdb/options.h
	util/env.cc
	util/options.cc
2014-03-03 18:29:03 -08:00
sdong
f0ee2356af Fix issue with iterator operations in this order: Prev(), Seek(), Prev()
Summary:
Due to a bad merge of D14163 and D14001 before checking in D14001, "direction_ = kForward;" in MergeIterator::Seek() was deleted my mistake (in commit b135d01e7b ). It will generate wrong results or assert failure after the sequence of Prev() (or SeekToLast()), Seek() and Prev().

Fix it

Test Plan: make all check

Reviewers: igor, haobo, dhruba

Reviewed By: igor

CC: yhchiang, i.am.jin.lei, ljin, leveldb

Differential Revision: https://reviews.facebook.net/D16527
2014-03-03 17:50:27 -08:00
kailiu
1aeafeccac Make the Create() function comform the convention
Summary:

Moved "Return multiple values" a more conventional way.
2014-03-01 23:43:03 -08:00
Kai Liu
16d4e45c12 Fix the memory leak in table index
Summary:

BinarySearchIndex didn't use unique_ptr to guard the block object nor
delete it in destructor, leading to valgrind failure for "definite
memory leak".

Test Plan:
re-ran the failed valgrind test cases
2014-03-01 11:50:35 -08:00
Kai Liu
ff151132b3 Fix the unit test failure in devbox
Summary:
My last diff was developed in MacOS but in devserver environment error occurs.

I dug into the problem and found the way we calcuate approximate data size is pretty out-of-date. We can use table properties to get more accurate results.

Test Plan: ran ./table_test and passed

Reviewers: igor, dhruba, haobo, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16509
2014-02-28 20:40:05 -08:00
kailiu
74939a9e13 Make the block-based table's index pluggable
Summary:
This patch introduced a new table options that allows us to make
block-based table's index pluggable.

To support that new features:

* Code has been refacotred to be more flexible and supports this option well.
* More documentation is added for the existing obsecure functionalities.
* Big surgeon on DataBlockReader(), where the logic was really convoluted.
* Other small code cleanups.

The pluggablility will mostly affect development of internal modules
and won't change frequently, as a result I intentionally avoid
heavy-weight patterns (like factory) and try to make it simple.

Test Plan: make all check

Reviewers: haobo, sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16395
2014-02-28 18:19:07 -08:00
kailiu
bf86af5174 Remove the terrible hack in for flush_block_policy_factory
Summary:
Previous code is too convoluted and I must be drunk for letting
such code to be written without a second thought.

Thanks to the discussion with @sdong, I added the `Options` when
generating the flusher, thus avoiding the tricks.

Just FYI: I resisted to add Options in flush_block_policy.h since I
wanted to avoid cyclic dependencies: FlushBlockPolicy dpends on Options
and Options also depends FlushBlockPolicy... While I appreciate my
effort to prevent it, the old design turns out creating more troubles than
it tried to avoid.

Test Plan: ran ./table_test

Reviewers: sdong

Reviewed By: sdong

CC: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D16503
2014-02-28 16:39:27 -08:00
kailiu
444cafc28c Fix inconsistent code format
Summary:
Found some function follows camel style. When naming funciton, we have two styles:

Trivially expose internal data in readonly mode: `all_lower_case()`
Regular function: `CapitalizeFirstLetter()`

I renames these functions.

Test Plan: make -j32

Reviewers: haobo, sdong, dhruba, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16383
2014-02-26 18:56:39 -08:00
sdong
a04dbf6e49 PlainTable::Next() should pass the error message from ReadKey()
Summary:
PlainTable::Next() should pass the error message from ReadKey(). Now it would return a wrong error message.
Also improve the messages of status when failing to read

Test Plan: make all check

Reviewers: ljin, kailiu, haobo

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16365
2014-02-26 15:12:44 -08:00
Igor Canadi
d39da4b578 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
2014-02-24 17:09:05 -08:00
Kai Liu
2b205b35d8 Disable putting filter block to block cache
Summary: This bug caused server crash issues because the filter block is too big and kept purging out of cache.

Test Plan: Wrote a new unit tests to make sure it works.

Reviewers: dhruba, haobo, igor, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16221
2014-02-19 15:38:57 -08:00
Igor Canadi
76c048183c Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	include/rocksdb/db.h
2014-02-14 16:46:03 -08:00
kailiu
63690625cd Expose the table properties to application
Summary: Provide a public API for users to access the table properties for each SSTable.

Test Plan: Added a unit tests to test the function correctness under differnet conditions.

Reviewers: haobo, dhruba, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16083
2014-02-13 16:28:21 -08:00
Kai Liu
b2e7ee8b41 Followup code refactor on plain table
Summary:
Fixed most comments in https://reviews.facebook.net/D15429.
Still have some remaining comments left.

Test Plan: make all check

Reviewers: sdong, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15885
2014-02-13 15:27:59 -08:00
Kai Liu
59cffe02c4 Benchmark table reader wiht nanoseconds
Summary: nanosecnods gave us better view of the performance, especially when some operations are fast so that micro seconds may only reveal less informative results.

Test Plan:
sample output:

    ./table_reader_bench --plain_table --time_unit=nanosecond
    =======================================================================================================
    InMemoryTableSimpleBenchmark:           PlainTable   num_key1:   4096   num_key2:   512   non_empty
    =======================================================================================================
    Histogram (unit: nanosecond):
    Count: 6291456  Average: 475.3867  StdDev: 556.05
    Min: 135.0000  Median: 400.1817  Max: 33370.0000
    Percentiles: P50: 400.18 P75: 530.02 P99: 887.73 P99.9: 8843.26 P99.99: 9941.21
    ------------------------------------------------------
    [     120,     140 )        2   0.000%   0.000%
    [     140,     160 )      452   0.007%   0.007%
    [     160,     180 )    13683   0.217%   0.225%
    [     180,     200 )    54353   0.864%   1.089%
    [     200,     250 )   101004   1.605%   2.694%
    [     250,     300 )   729791  11.600%  14.294% ##
    [     300,     350 )   616070   9.792%  24.086% ##
    [     350,     400 )  1628021  25.877%  49.963% #####
    [     400,     450 )   647220  10.287%  60.250% ##
    [     450,     500 )   577206   9.174%  69.424% ##
    [     500,     600 )  1168585  18.574%  87.999% ####
    [     600,     700 )   506875   8.057%  96.055% ##
    [     700,     800 )   147878   2.350%  98.406%
    [     800,     900 )    42633   0.678%  99.083%
    [     900,    1000 )    16304   0.259%  99.342%
    [    1000,    1200 )     7811   0.124%  99.466%
    [    1200,    1400 )     1453   0.023%  99.490%
    [    1400,    1600 )      307   0.005%  99.494%
    [    1600,    1800 )       81   0.001%  99.496%
    [    1800,    2000 )       18   0.000%  99.496%
    [    2000,    2500 )        8   0.000%  99.496%
    [    2500,    3000 )        6   0.000%  99.496%
    [    3500,    4000 )        3   0.000%  99.496%
    [    4000,    4500 )      116   0.002%  99.498%
    [    4500,    5000 )     1144   0.018%  99.516%
    [    5000,    6000 )     1087   0.017%  99.534%
    [    6000,    7000 )     2403   0.038%  99.572%
    [    7000,    8000 )     9840   0.156%  99.728%
    [    8000,    9000 )    12820   0.204%  99.932%
    [    9000,   10000 )     3881   0.062%  99.994%
    [   10000,   12000 )      135   0.002%  99.996%
    [   12000,   14000 )      159   0.003%  99.998%
    [   14000,   16000 )       58   0.001%  99.999%
    [   16000,   18000 )       30   0.000% 100.000%
    [   18000,   20000 )       14   0.000% 100.000%
    [   20000,   25000 )        2   0.000% 100.000%
    [   25000,   30000 )        2   0.000% 100.000%
    [   30000,   35000 )        1   0.000% 100.000%

Reviewers: haobo, dhruba, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16113
2014-02-13 13:57:36 -08:00
sdong
b5140a0361 Fix table_reader_bench and add it to "make"
Summary: Fix table_reader_bench after some interface changes. Add it to make to avoid future breaking

Test Plan: make table_reader_bench and run it with different options.

Reviewers: kailiu, haobo

Reviewed By: haobo

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D16107
2014-02-12 18:31:02 -08:00
Siying Dong
f3ae3d07cc Add more black-box tests for PlainTable and explicitly support total order mode
Summary:
1. Add some more implementation-aware tests for PlainTable
2. move from a hard-coded one index per 16 rows in one prefix to a configurable number. Also, make hash table ratio = 0  means binary search only. Also fixes some divide 0 risks.
3. Explicitly support total order (only use binary search)
4. some code cleaning up.

Test Plan: make all check

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16023
2014-02-12 17:37:22 -08:00
Igor Canadi
ccdb93e775 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/memtable_list.cc
	db/memtable_list.h
	db/version_set.cc
	db/version_set.h
2014-02-12 14:01:30 -08:00
Igor Canadi
b06840aa7d [CF] Rethinking ColumnFamilyHandle and fix to dropping column families
Summary:
The change to the public behavior:
* When opening a DB or creating new column family client gets a ColumnFamilyHandle.
* As long as column family handle is alive, client can do whatever he wants with it, even drop it
* Dropped column family can still be read from (using the column family handle)
* Added a new call CloseColumnFamily(). Client has to close all column families that he has opened before deleting the DB
* As soon as column family is closed, any calls to DB using that column family handle will fail (also any outstanding calls)

Internally:
* Ref-counting ColumnFamilyData
* New thread-safety for ColumnFamilySet
* Dropped column families are now completely dropped and their memory cleaned-up

Test Plan: added some tests to column_family_test

Reviewers: dhruba, haobo, kailiu, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16101
2014-02-12 13:47:09 -08:00