Commit Graph

365 Commits

Author SHA1 Message Date
Lei Jin
a062e1f2c4 SetOptions() for memtable related options
Summary: as title

Test Plan:
make all check
I will think a way to set up stress test for this

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23055
2014-09-17 12:49:13 -07:00
Igor Canadi
faad439ac4 Fix #284
Summary: This work on my compiler, but it turns out some compilers don't implicitly add constness, see: https://github.com/facebook/rocksdb/issues/284. This diff adds constness explicitly.

Test Plan: still compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23409
2014-09-16 10:30:32 -07:00
Igor Canadi
059e584dd3 [unit test] CompactRange should fail if we don't have space
Summary:
See t5106397.

Also, few more changes:
1. in unit tests, the assumption is that writes will be dropped when there is no space left on device. I changed the wording around it.
2. InvalidArgument() errors are only when user-provided arguments are invalid. When the file is corrupted, we need to return Status::Corruption

Test Plan: make check

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23145
2014-09-10 17:00:00 -07:00
Igor Canadi
6bb7e3ef25 Merger test
Summary: I abandoned https://reviews.facebook.net/D18789, but I wrote a good unit test there, so let's check it in. :)

Test Plan: this is test

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22827
2014-09-08 22:24:40 -07:00
Lei Jin
52311463e9 MemTableOptions
Summary: removed reference to options in WriteBatch and DBImpl::Get()

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D23049
2014-09-08 18:46:52 -07:00
Feng Zhu
0af157f9bf Implement full filter for block based table.
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.

2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.

3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.

4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.

5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc

Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.

Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh

Reviewers: igor, yhchiang, ljin, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 10:37:05 -07:00
wankai
88a2f44f99 fix comments 2014-09-08 16:34:04 +08:00
wankai
823773837b replace hard-coded number with named variable 2014-09-08 11:10:17 +08:00
wankai
4c2b1f097b Merge remote-tracking branch 'upstream/master' 2014-09-06 23:23:11 +08:00
wankai
a5d2863074 typo improvement 2014-09-06 23:21:26 +08:00
Radheshyam Balasundaram
5cd0576ffe Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method.
Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests.

Test Plan:
make check all
Also ran db_bench to generate multiple files.

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22743
2014-09-05 11:18:01 -07:00
liuhuahang
bb6ae0f80c fix more compile warnings
N/A

Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-05 14:14:37 +08:00
Stanislau Hlebik
45a5e3ede0 Remove path with arena==nullptr from NewInternalIterator
Summary:
Simply code by removing code path which does not use Arena
from NewInternalIterator

Test Plan:
make all check
make valgrind_check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22395
2014-09-04 17:40:41 -07:00
Lei Jin
5665e5e285 introduce ImmutableOptions
Summary:
As a preparation to support updating some options dynamically, I'd like
to first introduce ImmutableOptions, which is a subset of Options that
cannot be changed during the course of a DB lifetime without restart.

ColumnFamily will keep both Options and ImmutableOptions. Any component
below ColumnFamily should only take ImmutableOptions in their
constructor. Other options should be taken from APIs, which will be
allowed to adjust dynamically.

I am yet to make changes to memtable and other related classes to take
ImmutableOptions in their ctor. That can be done in a seprate diff as
this one is already pretty big.

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D22545
2014-09-04 16:18:36 -07:00
Igor Canadi
f7f973d354 Merge pull request #269 from huahang/patch-2
fix a few compile warnings
2014-09-04 09:43:00 -07:00
liuhuahang
ef5b384729 fix a few compile warnings
1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]

2, class HistogramImpl has virtual functions and thus should have a virtual destructor

3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define

Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-04 23:06:23 +08:00
wankai
1785114a6f delete unused Comparator 2014-09-04 09:10:13 +08:00
wankai
19cc588b77 change to filter_block std::unique_ptr support RAII 2014-09-04 00:44:49 +08:00
wankai
5d25a46936 Merge remote-tracking branch 'upstream/master' 2014-09-03 21:57:13 +08:00
Lei Jin
9b58c73c7c call SanitizeDBOptionsByCFOptions() in the right place
Summary: It only covers Open() with default column family right now

Test Plan: make release

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22467
2014-09-02 14:42:23 -07:00
Igor Canadi
7f19bb93c6 Merge pull request #242 from tdfischer/perf-timer-destructors
Refactor PerfStepTimer to automatically stop on destruct
2014-09-02 13:06:40 -07:00
Torrie Fischer
6614a48418 Refactor PerfStepTimer to stop on destruct
This eliminates the need to remember to call PERF_TIMER_STOP when a section has
been timed. This allows more useful design with the perf timers and enables
possible return value optimizations. Simplistic example:

class Foo {
  public:
    Foo(int v) : m_v(v);
  private:
    int m_v;
}

Foo makeFrobbedFoo(int *errno)
{
  *errno = 0;
  return Foo();
}

Foo bar(int *errno)
{
  PERF_TIMER_GUARD(some_timer);

  return makeFrobbedFoo(errno);
}

int main(int argc, char[] argv)
{
  Foo f;
  int errno;

  f = bar(&errno);

  if (errno)
    return -1;
  return 0;
}

After bar() is called, perf_context.some_timer would be incremented as if
Stop(&perf_context.some_timer) was called at the end, and the compiler is still
able to produce optimizations on the return value from makeFrobbedFoo() through
to main().
2014-09-02 12:04:22 -07:00
Igor Canadi
076bd01a29 Fix compile
Summary: gcc on our dev boxes is not happy about __attribute__((unused))

Test Plan: compiles now

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22707
2014-09-02 11:49:38 -07:00
Igor Canadi
990df99a61 Fix ios compile
Summary: We need to set contbuild for this :)

Test Plan: compiles

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22701
2014-09-02 10:50:15 -07:00
Wankai Zhang
dff2b1a8f8 typo improvement 2014-09-02 22:57:03 +08:00
Radheshyam Balasundaram
d20b8cfaa1 Improve Cuckoo Table Reader performance. Inlined hash function and number of buckets a power of two.
Summary:
Use inlined hash functions instead of function pointer. Make number of buckets a power of two and use bitwise and instead of mod.
After these changes, we get almost 50% improvement in performance.

Results:
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.231us (4.3 Mqps) with batch size of 0
Time taken per op is 0.229us (4.4 Mqps) with batch size of 0
Time taken per op is 0.185us (5.4 Mqps) with batch size of 0
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.108us (9.3 Mqps) with batch size of 10
Time taken per op is 0.100us (10.0 Mqps) with batch size of 10
Time taken per op is 0.103us (9.7 Mqps) with batch size of 10
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.101us (9.9 Mqps) with batch size of 25
Time taken per op is 0.098us (10.2 Mqps) with batch size of 25
Time taken per op is 0.097us (10.3 Mqps) with batch size of 25
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.100us (10.0 Mqps) with batch size of 50
Time taken per op is 0.097us (10.3 Mqps) with batch size of 50
Time taken per op is 0.097us (10.3 Mqps) with batch size of 50
With 120000000 items, utilization is 89.41%, number of hash functions: 2.
Time taken per op is 0.102us (9.8 Mqps) with batch size of 100
Time taken per op is 0.098us (10.2 Mqps) with batch size of 100
Time taken per op is 0.115us (8.7 Mqps) with batch size of 100

With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.201us (5.0 Mqps) with batch size of 0
Time taken per op is 0.155us (6.5 Mqps) with batch size of 0
Time taken per op is 0.152us (6.6 Mqps) with batch size of 0
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.089us (11.3 Mqps) with batch size of 10
Time taken per op is 0.084us (11.9 Mqps) with batch size of 10
Time taken per op is 0.086us (11.6 Mqps) with batch size of 10
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.087us (11.5 Mqps) with batch size of 25
Time taken per op is 0.085us (11.7 Mqps) with batch size of 25
Time taken per op is 0.093us (10.8 Mqps) with batch size of 25
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.094us (10.6 Mqps) with batch size of 50
Time taken per op is 0.094us (10.7 Mqps) with batch size of 50
Time taken per op is 0.093us (10.8 Mqps) with batch size of 50
With 100000000 items, utilization is 74.51%, number of hash functions: 2.
Time taken per op is 0.092us (10.9 Mqps) with batch size of 100
Time taken per op is 0.089us (11.2 Mqps) with batch size of 100
Time taken per op is 0.088us (11.3 Mqps) with batch size of 100

With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.154us (6.5 Mqps) with batch size of 0
Time taken per op is 0.168us (6.0 Mqps) with batch size of 0
Time taken per op is 0.190us (5.3 Mqps) with batch size of 0
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.081us (12.4 Mqps) with batch size of 10
Time taken per op is 0.077us (13.0 Mqps) with batch size of 10
Time taken per op is 0.083us (12.1 Mqps) with batch size of 10
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 25
Time taken per op is 0.073us (13.7 Mqps) with batch size of 25
Time taken per op is 0.073us (13.7 Mqps) with batch size of 25
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.076us (13.1 Mqps) with batch size of 50
Time taken per op is 0.072us (13.8 Mqps) with batch size of 50
Time taken per op is 0.072us (13.8 Mqps) with batch size of 50
With 80000000 items, utilization is 59.60%, number of hash functions: 2.
Time taken per op is 0.077us (13.0 Mqps) with batch size of 100
Time taken per op is 0.074us (13.6 Mqps) with batch size of 100
Time taken per op is 0.073us (13.6 Mqps) with batch size of 100

With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.190us (5.3 Mqps) with batch size of 0
Time taken per op is 0.186us (5.4 Mqps) with batch size of 0
Time taken per op is 0.184us (5.4 Mqps) with batch size of 0
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.079us (12.7 Mqps) with batch size of 10
Time taken per op is 0.070us (14.2 Mqps) with batch size of 10
Time taken per op is 0.072us (14.0 Mqps) with batch size of 10
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 25
Time taken per op is 0.072us (14.0 Mqps) with batch size of 25
Time taken per op is 0.071us (14.1 Mqps) with batch size of 25
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.082us (12.1 Mqps) with batch size of 50
Time taken per op is 0.071us (14.1 Mqps) with batch size of 50
Time taken per op is 0.073us (13.6 Mqps) with batch size of 50
With 70000000 items, utilization is 52.15%, number of hash functions: 2.
Time taken per op is 0.080us (12.5 Mqps) with batch size of 100
Time taken per op is 0.077us (13.0 Mqps) with batch size of 100
Time taken per op is 0.078us (12.8 Mqps) with batch size of 100

Test Plan:
make check all
make valgrind_check
make asan_check

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22539
2014-08-29 19:06:15 -07:00
Tomislav Novak
0f9c43ea36 ForwardIterator: reset incomplete iterators on Seek()
Summary:
When reading from kBlockCacheTier, ForwardIterator's internal child iterators
may end up in the incomplete state (read was unable to complete without doing
disk I/O). `ForwardIterator::status()` will correctly report that; however, the
iterator may be stuck in that state until all sub-iterators are rebuilt:

  * `NeedToSeekImmutable()` may return false even if some sub-iterators are
    incomplete
  * one of the child iterators may be an empty iterator without any state other
    that the kIncomplete status (created using `NewErrorIterator()`); seeking on
    any such iterator has no effect -- we need to construct it again

Akin to rebuilding iterators after a superversion bump, this diff makes forward
iterator reset all incomplete child iterators when `Seek()` or `Next()` are
called.

Test Plan: TEST_TMPDIR=/dev/shm/rocksdbtest ROCKSDB_TESTS=TailingIterator ./db_test

Reviewers: igor, sdong, ljin

Reviewed By: ljin

Subscribers: lovro, march, leveldb

Differential Revision: https://reviews.facebook.net/D22575
2014-08-29 16:21:29 -07:00
Igor Canadi
22a0a60dc4 Merge pull request #250 from wankai/master
delete unused struct Options
2014-08-29 09:53:18 -04:00
Wankai Zhang
be25ee44fe delete unused struct Options 2014-08-29 17:31:04 +08:00
Feng Zhu
1d23b5c470 remove_internal_filter_policy
Summary:
1. remove class InternalFilterPolicy in db/dbformat.h
2. Transformation from internal key to user key is done in filter_block.cc
3. This is a preparation for patch D20979

Test Plan:
make all check
valgrind ./db_test

Reviewers: igor, yhchiang, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22509
2014-08-28 17:06:29 -07:00
Radheshyam Balasundaram
7f71448388 Implementing a cache friendly version of Cuckoo Hash
Summary: This implements a cache friendly version of Cuckoo Hash in which, in case of collission, we try to insert in next few locations. The size of the neighborhood to check is taken as an input parameter in builder and stored in the table.

Test Plan:
make check all
cuckoo_table_{db,reader,builder}_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22455
2014-08-28 10:42:23 -07:00
Wankai Zhang
528a11c635 Update block_builder.h
more c++11 way noncopyable and keep parameter's name of constructor consistent
2014-08-28 13:37:10 +08:00
Radheshyam Balasundaram
4142a3e783 Adding a user comparator for comparing Uint64 slices.
Summary:
- New Uint64 comparator
- Modify Reader and Builder to take custom user comparators instead of bytewise comparator
- Modify logic for choosing unused user key in builder
- Modify iterator logic in reader
- test changes

Test Plan:
cuckoo_table_{builder,reader,db}_test
make check all

Reviewers: ljin, sdong

Reviewed By: ljin

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D22377
2014-08-27 10:39:31 -07:00
Lei Jin
23861857c4 ReadOptions.total_order_seek to allow total order seek for block-based table when hash index is enabled
Summary: as title

Test Plan: table_test

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22239
2014-08-25 16:14:30 -07:00
Lei Jin
a98badff16 print table options
Summary: Add a virtual function in table factory that will print table options

Test Plan: make release

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22149
2014-08-25 14:24:09 -07:00
Lei Jin
384400128f move block based table related options BlockBasedTableOptions
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21915
2014-08-25 14:22:05 -07:00
Shao Yu Zhang
f76eda74d6 Fix compilation issue on OSX 2014-08-21 18:11:33 -07:00
Radheshyam Balasundaram
08be7f5266 Implement Prepare method in CuckooTableReader
Summary:
- Implement Prepare method
- Rewrite performance tests in cuckoo_table_reader_test to write new file only if one doesn't already exist.
- Add performance tests for batch lookup along with prefetching.

Test Plan:
./cuckoo_table_reader_test --enable_perf
Results (We get better results if we used int64 comparator instead of string comparator (TBD in future diffs)):
With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
Time taken per op is 0.208us (4.8 Mqps) with batch size of 0
With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
Time taken per op is 0.182us (5.5 Mqps) with batch size of 10
With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
Time taken per op is 0.161us (6.2 Mqps) with batch size of 25
With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
Time taken per op is 0.161us (6.2 Mqps) with batch size of 50
With 100000000 items and hash table ratio 0.500000, number of hash functions used: 2.
Time taken per op is 0.163us (6.1 Mqps) with batch size of 100

With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
Time taken per op is 0.252us (4.0 Mqps) with batch size of 0
With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
Time taken per op is 0.192us (5.2 Mqps) with batch size of 10
With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
Time taken per op is 0.195us (5.1 Mqps) with batch size of 25
With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
Time taken per op is 0.191us (5.2 Mqps) with batch size of 50
With 100000000 items and hash table ratio 0.600000, number of hash functions used: 3.
Time taken per op is 0.194us (5.1 Mqps) with batch size of 100

With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
Time taken per op is 0.228us (4.4 Mqps) with batch size of 0
With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
Time taken per op is 0.185us (5.4 Mqps) with batch size of 10
With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
Time taken per op is 0.186us (5.4 Mqps) with batch size of 25
With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
Time taken per op is 0.189us (5.3 Mqps) with batch size of 50
With 100000000 items and hash table ratio 0.750000, number of hash functions used: 3.
Time taken per op is 0.188us (5.3 Mqps) with batch size of 100

With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
Time taken per op is 0.325us (3.1 Mqps) with batch size of 0
With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
Time taken per op is 0.196us (5.1 Mqps) with batch size of 10
With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
Time taken per op is 0.199us (5.0 Mqps) with batch size of 25
With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
Time taken per op is 0.196us (5.1 Mqps) with batch size of 50
With 100000000 items and hash table ratio 0.900000, number of hash functions used: 3.
Time taken per op is 0.209us (4.8 Mqps) with batch size of 100

Reviewers: sdong, yhchiang, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22167
2014-08-20 18:35:35 -07:00
Yueh-Hsuan Chiang
63a2215c63 Improve Options sanitization and add MmapReadRequired() to TableFactory
Summary:
Currently, PlainTable must use mmap_reads.  When PlainTable is used but
allow_mmap_reads is not set, rocksdb will fail in flush.

This diff improve Options sanitization and add MmapReadRequired() to
TableFactory.

Test Plan:
export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest
make db_test -j32
./db_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: you, leveldb

Differential Revision: https://reviews.facebook.net/D21939
2014-08-20 15:53:39 -07:00
sdong
045575ad0c Add CuckooHash table format to table_reader_bench
Summary: Make table_reader_bench cover all the three table formats.

Test Plan: Run it using three options

Reviewers: radheshyamb, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22137
2014-08-19 14:58:15 -07:00
Yueh-Hsuan Chiang
570ba5aca8 Avoid retrying to read property block from a table when it does not exist.
Summary:
Avoid retrying to read property block from a table when it does not exist
in updating stats for compensating deletion entries.

In addition, ReadTableProperties() now returns Status::NotFound instead
of Status::Corruption when table properties does not exist in the file.

Test Plan:
make db_test -j32
export ROCKSDB_TESTS=CompactionDeleteionTrigger
./db_test

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21867
2014-08-15 12:17:44 -07:00
Radheshyam Balasundaram
9674c11d01 Integrating Cuckoo Hash SST Table format into RocksDB
Summary:
Contains the following changes:
- Implementation of cuckoo_table_factory
- Adding cuckoo table into AdaptiveTableFactory
- Adding cuckoo_table_db_test, similar to lines of plain_table_db_test
- Minor fixes to Reader: When a key is found in the table, return the key found instead of the search key.
- Minor fixes to Builder: Add table properties that are required by Version::UpdateTemporaryStats() during Get operation. Don't define curr_node as a reference variable as the memory locations may get reassigned during tree.push_back operation, leading to invalid memory access.

Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
cuckoo_table_db_test
make check all
make valgrind_check
make asan_check

Reviewers: sdong, igor, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21219
2014-08-11 20:21:07 -07:00
miguelportilla
93e6b5e9d9 Changes to support unity build:
* Script for building the unity.cc file via Makefile
* Unity executable Makefile target for testing builds
* Source code changes to fix compilation of unity build
2014-08-11 13:22:47 -04:00
ZHANG Biao
63d5cc72fa fix various 'comparison of integers of different signs' compiling errors under macosx 2014-08-07 17:06:07 +08:00
sdong
1242bfcad7 Add DB property "rocksdb.estimate-table-readers-mem"
Summary:
Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache.

Refactor the property codes to allow getting property from a version, with DB mutex not acquired.

Test Plan: Add several checks of this new property in existing codes for various cases.

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, leveldb

Differential Revision: https://reviews.facebook.net/D20733
2014-08-06 11:39:46 -07:00
Radheshyam Balasundaram
606a126703 Changing implementaiton of CuckooTableBuilder to not take file_size, key_length, value_length.
Summary:
 - Maintain a list of key-value pairs as vectors during Add operation.
 - Start building hash table only when Finish() is called.
 - This approach takes more time and space but avoids taking file_size, key and value lengths.
 - Rewrote cuckoo_table_builder_test

I did not know about IterKey while writing this diff. I shall change places where IterKey could be used instead of std::string tomorrow. Please review rest of the logic.

Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
valgrind_check
asan_check

Reviewers: sdong, igor, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20907
2014-08-05 20:55:46 -07:00
Radheshyam Balasundaram
2124c85cc6 Implementing CuckooTableReader::NewIterator
Summary:
- Reads key-value pairs from file and builds an in-memory index of key-to-bucket id map in sorted order of key.
- Assumes bytewise comparator for sorting keys.
- Test changes

Test Plan:
cuckoo_table_reader_test --enable_perf
valgrind_check
asan_check

Reviewers: yhchiang, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb, igor

Differential Revision: https://reviews.facebook.net/D20721
2014-08-05 16:35:02 -07:00
sdong
02c4023666 Remove port::MemoryBarrier() from table_reader_bench
Summary: port::MemoryBarrier() is not recommended to use outside of port. Remove it.

Test Plan: run table_reader_bench

Reviewers: ljin, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21075
2014-08-05 11:33:56 -07:00
Feng Zhu
1129921e9b logging_when_create_and_delete_manifest
Summary:
  1. logging when create and delete manifest file
  2. fix formating in table/format.cc

Test Plan:
  make all check
  run db_bench, track the LOG file.

Reviewers: ljin, yhchiang, igor, yufei.zhu, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21009
2014-08-04 11:25:42 -07:00
Radheshyam Balasundaram
0c9d03ba10 Fixing broken Mac build
Summary: Made some small changes to fix the broken mac build

Test Plan: make check all in both linux and mac. All tests pass.

Reviewers: sdong, igor, ljin, yhchiang

Reviewed By: ljin, yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20895
2014-07-31 20:52:13 -07:00
Feng Zhu
b0999011e2 use stack instead of heap memory in ReadBlockContents in some case
Summary:
  When compression is enabled, and blocksize is not too big, use the
  space in stack to hold bytes read from block.

Bencmark:
base version: commit 8f09d53fd1
  malloc: 1.30% -> 0.98%
  free: 1.49% -> 1.07%

Test Plan:
  make all check

Reviewers: ljin, yhchiang, dhruba, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20679
2014-07-30 23:11:59 -07:00
Feng Zhu
8f09d53fd1 remove malloc when create data and index iterator in Get
Summary:
  Define Block::Iter to be an independent class to be used by block_based_table_reader
  When creating data and index iterator, update an existing iterator rather than new one
  Thus malloc and free could be reduced

Benchmark,
Base:
commit 76286ee67e
commands:
--db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=2621440 --use_hash_search=1 --block_size=1024 --block_restart_interval=1 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1

malloc: 3.30% -> 1.42%
free: 3.59%->1.61%

Test Plan:
  make all check
  run db_stress
  valgrind ./db_test ./table_test

Reviewers: ljin, yhchiang, dhruba, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20655
2014-07-30 16:34:35 -07:00
Radheshyam Balasundaram
91c01485d1 Minor changes to CuckooTableBuilder
Summary:
- Copy the key and value to in-memory hash table during Add operation. Also modified cuckoo_table_reader_test to use this.
- Store only the user_key in in-memory hash table if it is last level file.
- Handle Carryover while chosing unused key in Finish() method in case unused key was never found before Finish() call.

Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
valgrind_check
asan_check

Reviewers: sdong, yhchiang, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20715
2014-07-28 17:14:25 -07:00
Lei Jin
40fa8a4cd5 make statistics forward-able
Summary:
Make StatisticsImpl being able to forward stats to provided statistics
implementation. The main purpose is to allow us to collect internal
stats in the future even when user supplies custom statistics
implementation. It avoids intrumenting 2 sets of stats collection code.
One immediate use case is tuning advisor, which needs to collect some
internal stats, users may not be interested.

Test Plan:
ran db_bench and see stats show up at the end of run
Will run make all check since some tests rely on statistics

Reviewers: yhchiang, sdong, igor

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D20145
2014-07-28 12:05:36 -07:00
sdong
4a8f0c957c Block::Iter::PrefixSeek() to have an extra check to filter out some false matches
Summary:
In block based table's hash index checking, when looking for a key that doesn't exist, there is a high chance that a false block is returned because of hash bucket conflicts. In this revision, another check is done to filter out some of those cases: comparing previous key of the block boundary to see whether the target block is what we are looking for.

In a favored test setting (bloom filter disabled, 8 L0 files), I saw about 80% improvements. In a non-favored test setting (bloom filter enabled, files are all in L1, files are all cached), I see the performance penalty is less than 3%.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: wuj, leveldb, zagfox, yhchiang

Differential Revision: https://reviews.facebook.net/D20595
2014-07-25 17:27:57 -07:00
Radheshyam Balasundaram
62f9b071ff Implementation of CuckooTableReader
Summary:
Contains:
- Implementation of TableReader based on Cuckoo Hashing
- Unittests for CuckooTableReader
- Performance test for TableReader

Test Plan:
make cuckoo_table_reader_test
./cuckoo_table_reader_test
make valgrind_check
make asan_check

Reviewers: yhchiang, sdong, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20511
2014-07-25 16:37:32 -07:00
Radheshyam Balasundaram
07a7d870b8 Addressing TODOs in CuckooTableBuilder
Summary:
Contains the following changes in CuckooTableBuilder:
- Take an extra parameter in constructor to identify last level file.
- Implement a better way to identify if a bucket has been inserted into the tree already during BFS search.
- Minor typos

Test Plan:
make cuckoo_table_builder
./cuckoo_table_builder
make valgrind_check

Reviewers: sdong, igor, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20445
2014-07-24 10:07:41 -07:00
Feng Zhu
da9274574f Use IterKey instead of string in Block::Iter to reduce malloc
Summary:
  Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test
  Use IterKey in block::Iter to replace std::string to reduce malloc.

  Evaluate it using perf record.
  malloc: 4.26% -> 2.91%
  free: 3.61% -> 3.08%

Test Plan:
  make all check
  ./valgrind db_test dbformat_test

Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong

Reviewed By: sdong

Differential Revision: https://reviews.facebook.net/D20433
2014-07-23 12:31:11 -07:00
Igor Canadi
2d3d63597a Fix signed-unsigned compare error 2014-07-22 15:35:07 -04:00
Radheshyam Balasundaram
f6272e3055 Fixing memory leaks in cuckoo_table_builder_test
Summary: Fixes some memory leaks in cuckoo_builder_test.cc. This also fixed broken valgrind_check tests

Test Plan:
make valgrind_check
./cuckoo_builder_test
Currently running make check all. I shall update once it is done.

Reviewers: ljin, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20385
2014-07-22 09:49:04 -07:00
Yueh-Hsuan Chiang
bbe2e91d00 Fixed a compile error of cuckoo_table_builder.
Summary:
Fixed the following compile error.

./table/cuckoo_table_builder.h:72:22: error: private field 'key_length_' is not used [-Werror,-Wunused-private-field]
  const unsigned int key_length_;
                     ^
1 error generated.

Test Plan: make

Reviewers: sdong, ljin, radheshyamb, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20349
2014-07-21 15:17:09 -07:00
Radheshyam Balasundaram
cf3da899b0 Adding a new SST table builder based on Cuckoo Hashing
Summary:
Cuckoo Hashing based SST table builder. Contains:
- Cuckoo Hashing logic and file storage logic.
- Unit tests for logic

Test Plan:
make cuckoo_table_builder_test
./cuckoo_table_builder_test
make check all

Reviewers: yhchiang, igor, sdong, ljin

Reviewed By: ljin

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D19545
2014-07-21 13:26:09 -07:00
Stanislau Hlebik
c1a90b0848 Fix db_bench
Summary: Adding check for zero size index

Test Plan: ./build_tools/regression_build_test.sh

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20259
2014-07-21 10:31:33 -07:00
Chilledheart
54f4e2f188 Fix clang compiler warnings 2014-07-20 22:57:20 +08:00
Stanislau Hlebik
9d70cce047 Adding option to save PlainTable index and bloom filter in SST file.
Summary:
Adding option to save PlainTable index and bloom filter in SST file.
If there is no bloom block and/or index block, PlainTableReader builds
new ones. Otherwise PlainTableReader just use these blocks.

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19527
2014-07-18 16:58:13 -07:00
Stanislau Hlebik
92d73cbe78 Add PlainTableOptions
Summary:
Since we have a lot of options for PlainTable, add a struct PlainTableOptions
to manage them

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20175
2014-07-18 00:08:38 -07:00
Igor Canadi
1614284eff Fix compressed cache 2014-07-16 06:45:49 -07:00
Stanislau Hlebik
30c81e7717 Removing NewTotalOrderPlainTableFactory
Summary:
Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect.
Total order mode indicator is prefix_extractor == nullptr,
but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests
in plain_table_db_tests is incorrect.

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19587
2014-07-10 11:32:04 -07:00
Lei Jin
8d9a46fcd1 initialize decoded_internal_key_valid
Summary:
ReadInternalKey() will assign correct value anyway. Initialize it to
true to suppress compiler error reported
https://github.com/facebook/rocksdb/issues/186

Test Plan: I cannot reproduce it but this is obvious

Reviewers: sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19467
2014-07-03 23:13:08 -07:00
Feng Zhu
5656367416 use arena to allocate memtable's bloomfilter and hashskiplist's buckets_
Summary:
    Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena
    DynamicBloom: pass arena via constructor, allocate space in SetTotalBits
    HashSkipListRep: allocate space of buckets_ using arena.
       do not delete it in deconstructor because arena would take care of it.
    Several test files are changed.

Test Plan:
    make all check

Reviewers: ljin, haobo, yhchiang, sdong

Reviewed By: sdong

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D19335
2014-06-30 15:54:31 -07:00
Igor Canadi
9fe87b17aa Fix compile 2014-06-20 10:36:48 +02:00
Igor Canadi
d4a8423334 Remove seek compaction
Summary:
As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases.

This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.

There is one test case that relied on didIO information. I'll try to find another way to implement it.

Test Plan: make check

Reviewers: sdong, haobo, yhchiang, ljin, dhruba

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19161
2014-06-20 10:23:02 +02:00
Haobo Xu
7a9dd5f214 [RocksDB] Make block based table hash index more adaptive
Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is uncessarily harsh. Without a prefix extractor, we could always fallback to the normal binary index.

Test Plan: unit test, also manually veried LOG that fallback did occur.

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19191
2014-06-19 16:40:32 -07:00
Lei Jin
1ec2d1c69d fix make shared_lib compilation error
Summary: s/class ParsedInternalKey/struct ParsedInternalKey

Test Plan: make shared_lib

Reviewers: igor, yhchiang, sdong, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19173
2014-06-19 10:12:26 -07:00
Haobo Xu
167738256f [RocksDB] Fix unit test
Summary: fix a bug in D19047, which caused  DBTest.RecoverDuringMemtableCompaction to fail.

Test Plan: unit test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19155
2014-06-19 01:37:21 -07:00
sdong
edd47c5104 PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key
Summary:
Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes.
The data format is documented in table/plain_table_factory.h

Test Plan: Add unit test coverage in plain_table_db_test

Reviewers: yhchiang, igor, dhruba, ljin, haobo

Reviewed By: haobo

Subscribers: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18735
2014-06-18 20:41:52 -07:00
Haobo Xu
0f0076ed5a [RocksDB] Reduce memory footprint of the blockbased table hash index.
Summary:
Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption.
Just a quick draft, still testing and refining.

Test Plan: unit test and shadow testing

Reviewers: dhruba, kailiu, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19047
2014-06-18 18:16:07 -07:00
Igor Canadi
3525aac9e5 Change order of parameters in adaptive table factory
Summary:
This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation:
1) block based factory
2) plain table factory
3) output factory
4) new format factory

I think it makes more sense to have output as the first parameter.

Also, fixed a NewAdaptiveTableFactory() call in unit test

Test Plan: unit test

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19119
2014-06-18 07:04:37 +02:00
sdong
200e4b4a72 Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it
Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format.

Test Plan: add a unit test

Reviewers: haobo, igor

Reviewed By: igor

Subscribers: dhruba, ljin, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D19017
2014-06-17 11:49:22 -07:00
Lei Jin
c83b085770 prefetch bloom filter data block for L0 files
Summary: as title

Test Plan:
db_bench
the initial result is very promising. I will post results of complete
runs

Reviewers: dhruba, haobo, sdong, igor

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18867
2014-06-12 10:06:18 -07:00
sdong
88a1691a1e BlockBasedTable::PrefixMayMatch() to bloom setting to the beginning of the function
Summary: In BlockBasedTable::PrefixMayMatch() we calculate prefix even if bloom is not config. Move the check before

Test Plan: make all check

Reviewers: igor, ljin

Reviewed By: ljin

Subscribers: wuj, leveldb, haobo, yhchiang, dhruba

Differential Revision: https://reviews.facebook.net/D18993
2014-06-10 11:14:22 -07:00
sdong
80f409ea37 Clean PlainTableReader's variables for better data locality
Summary:
Clean PlainTableReader's data structures:
(1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization)
(2) remove some variables only used when initialization from the class
(3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly
(4) make Options a reference.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18891
2014-06-09 13:53:39 -07:00
Igor Canadi
f43c8262c2 Don't compress block bigger than 2GB
Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446.

Test Plan: make check doesn't complain :)

Reviewers: haobo, ljin, yhchiang, dhruba, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18975
2014-06-09 12:26:09 -07:00
sdong
df9069d23f In DB::NewIterator(), try to allocate the whole iterator tree in an arena
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it.
2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it.

Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.

Test Plan: make all check

Reviewers: ljin, haobo

Reviewed By: haobo

Subscribers: leveldb, dhruba, yhchiang, igor

Differential Revision: https://reviews.facebook.net/D18513
2014-06-02 17:44:57 -07:00
Lei Jin
388d2054c7 forward iterator
Summary:
Forward iterator puts everything together in a flat structure instead of
a hierarchy of nested iterators. this should simplify the code and
provide better performance. It also enables more optimization since all
information are accessiable in one place.
Init evaluation shows about 6% improvement

Test Plan: db_test and db_bench

Reviewers: dhruba, igor, tnovak, sdong, haobo

Reviewed By: haobo

Subscribers: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D18795
2014-05-30 14:31:55 -07:00
Kai Liu
0b3d03d026 Materialize the hash index
Summary:
Materialize the hash index to avoid the soaring cpu/flash usage
when initializing the database.

Test Plan: existing unit tests passed

Reviewers: sdong, haobo

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18339
2014-05-15 14:09:03 -07:00
Igor Canadi
26f5dd9a5a TablePropertiesCollectorFactory
Summary:
This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options.

Here's description of task #4296714:
       I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB.

       For example, this table has 3155 entries and 1391828 deleted keys.

The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for.

Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API.

In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe.

Test Plan:
Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one).
Also, all other tests

Reviewers: sdong, dhruba, haobo, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18579
2014-05-13 12:30:55 -07:00
Igor Canadi
fec4269966 Fix more gflag namespace issues 2014-05-09 08:41:02 -07:00
sdong
ddd41146c4 MergingIterator uses autovector instead of vector
Summary:
Use autovector in MergingIterator so that if there are 4 or less child iterators in it, iterator wrappers are inline, which is more likely to be cache friendly.

Based on one test run with a shadow traffic of one product, it reduces CPU of MergingIterator::Seek() by half.

Test Plan: make all check

Reviewers: haobo, yhchiang, igor, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18531
2014-05-08 15:01:20 -07:00
Igor Canadi
b5616dafd1 Fix iOS compile 2014-05-07 17:48:31 -07:00
sdong
3a171dcb51 Pass logger to memtable rep and TLB page allocation error logged to info logs
Summary:
TLB page allocation errors are now logged to info logs, instead of stderr.
In order to do that, mem table rep's factory functions take a info logger now.

Test Plan: make all check

Reviewers: haobo, igor, yhchiang

Reviewed By: yhchiang

CC: leveldb, yhchiang, dhruba

Differential Revision: https://reviews.facebook.net/D18471
2014-05-05 16:43:37 -07:00
sdong
9b17558311 PlainTableFactory::PlainTableFactory() to have huge TLB turned off by default
Summary: PlainTableFactory::PlainTableFactory() now has Huge TLB page feature turned on by default. Although it is not a public API (which we always turn the feature off now), our unit tests, like db_test sometimes uses it directly, which causes wrong coverage of codes. This patch fix it to allow unit tests to run with the correct setting

Test Plan: Run db_test and make sure this feature is not on any more.

Reviewers: igor, haobo

Reviewed By: igor

CC: yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18483
2014-05-05 11:05:54 -07:00
sdong
4a7c747064 Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB""
And make the default 0 for hash linked list memtable

This reverts commit d69dc64be7.
2014-05-04 13:56:29 -07:00
Igor Canadi
d69dc64be7 Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"
This reverts commit 7dafa3a1d7.
2014-05-04 08:37:09 -07:00
Igor Canadi
d28ed6931f fix release build 2014-05-01 12:42:06 -07:00
Igor Canadi
d29e48bb2e fix compile warning 2014-05-01 14:12:35 -04:00
Igor Canadi
0afc8bc29a xxHash
Summary:
Originally: https://github.com/facebook/rocksdb/pull/87/files

I'm taking over to apply some finishing touches

Test Plan: will add tests

Reviewers: dhruba, haobo, sdong, yhchiang, ljin

Reviewed By: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18315
2014-05-01 14:09:32 -04:00
sdong
7dafa3a1d7 Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB
Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: nkg-, dhruba, leveldb, igor, yhchiang

Differential Revision: https://reviews.facebook.net/D18357
2014-04-30 11:02:26 -07:00
Igor Canadi
c489499a2b Fix OSX compile 2014-04-26 17:15:43 -04:00
Lei Jin
ccaca59bee avoid calling FindFile twice in TwoLevelIterator for PlainTable
Summary:
this is to reclaim the regression introduced in
https://reviews.facebook.net/D17853

Test Plan: make all check

Reviewers: igor, haobo, sdong, dhruba, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17985
2014-04-25 12:23:07 -07:00