Commit Graph

2203 Commits

Author SHA1 Message Date
Igor Canadi
a2bb7c3c33 Push- instead of pull-model for managing Write stalls
Summary:
Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either:
* proceed with all writes without delay
* delay all writes by fixed time
* stop all writes

The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).

When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal.

This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.

Test Plan: make check for now. I'll add some unit tests later. Also, perf test.

Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22791
2014-09-08 11:20:25 -07:00
Feng Zhu
0af157f9bf Implement full filter for block based table.
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.

2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.

3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.

4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.

5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc

Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.

Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh

Reviewers: igor, yhchiang, ljin, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 10:37:05 -07:00
Igor Canadi
9360cc690e Fix valgrind issue 2014-09-08 08:01:25 -07:00
Igor Canadi
02d5bff393 Merge pull request #277 from wankai/master
fix comments
2014-09-08 07:48:58 -07:00
wankai
88a2f44f99 fix comments 2014-09-08 16:34:04 +08:00
Igor Canadi
7c16e39228 Merge pull request #276 from wankai/master
replace hard-coded number with named variable
2014-09-07 22:41:21 -07:00
wankai
823773837b replace hard-coded number with named variable 2014-09-08 11:10:17 +08:00
Igor Canadi
db8ca520b2 Merge pull request #273 from nbougalis/static-analysis
Cleanups from static analysis
2014-09-06 14:14:59 -07:00
Igor Canadi
b7b031f428 Merge pull request #274 from wankai/master
typo improvement
2014-09-06 13:53:42 -07:00
wankai
4c2b1f097b Merge remote-tracking branch 'upstream/master' 2014-09-06 23:23:11 +08:00
wankai
a5d2863074 typo improvement 2014-09-06 23:21:26 +08:00
Nik Bougalis
9f8aa09395 Don't leak data returned by opendir 2014-09-05 20:50:29 -07:00
Nik Bougalis
d1cfb71ec7 Remove unused member(s) 2014-09-05 20:50:29 -07:00
Nik Bougalis
bfee319fb0 sizeof(int*) where sizeof(int) was intended 2014-09-05 20:50:29 -07:00
Nik Bougalis
d40c1f742f Add missing break statement 2014-09-05 20:50:29 -07:00
Nik Bougalis
2e97c38980 Avoid off-by-one error when using readlink 2014-09-05 20:50:29 -07:00
Feng Zhu
40ddc3d6c4 add cache bench
Summary: 1. A benchmark for cache

Test Plan: ./cache_bench

Reviewers: yhchiang, dhruba, sdong, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22809
2014-09-05 15:55:43 -07:00
Igor Canadi
9f1c80b556 Drop column family from write thread
Summary: If we drop column family only from (single) write thread, we can be sure that nobody will drop the column family while we're writing (and our mutex is released). This greatly simplifies my patch that's getting rid of MakeRoomForWrite().

Test Plan: make check, but also running stress test

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22965
2014-09-05 15:20:05 -07:00
Igor Canadi
8de151bb99 Add db_bench with lots of column families to regression tests
Summary:
That way we can see when this graph goes up and be happy.

Couple of changes:
1. title
2. fix db_bench to delete column families before deleting the DB. this was asserting when compiled in debug mode
3. don't sync manifest when disableDataSync. We discussed this offline. I can move it to separate diff if you'd like

Test Plan: ran it

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22815
2014-09-05 14:20:18 -07:00
Lei Jin
c9e419ccb6 rename options_ to db_options_ in DBImpl to avoid confusion
Summary: as title

Test Plan: make release

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22935
2014-09-05 11:48:17 -07:00
Radheshyam Balasundaram
5cd0576ffe Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method.
Summary: Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method. Also added tests.

Test Plan:
make check all
Also ran db_bench to generate multiple files.

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22743
2014-09-05 11:18:01 -07:00
Raghav Pisolkar
0fbb3facc0 fixed memory leak in unit test DBIteratorBoundTest
Summary: fixed memory leak in unit test DBIteratorBoundTest

Test Plan: ran valgrind test on my unit test

Reviewers: sdong

Differential Revision: https://reviews.facebook.net/D22911
2014-09-05 10:35:28 -07:00
Lei Jin
adcd2532ca fix asan check
Summary:
PlainTable takes reference instead of a copy. Keep a copy in the test
code

Test Plan: make asan_check

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22899
2014-09-05 09:53:04 -07:00
Igor Canadi
4092b7a0bd Merge pull request #272 from project-zerus/patch-1
fix more compile warnings
2014-09-04 23:25:25 -07:00
liuhuahang
bb6ae0f80c fix more compile warnings
N/A

Change-Id: I5b6f9c70aea7d3f3489328834fed323d41106d9f
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-05 14:14:37 +08:00
Igor Canadi
6d31441181 Merge pull request #271 from nbougalis/cleanups
Cleanups
2014-09-04 21:46:55 -07:00
Nik Bougalis
0cd0ec4fe0 Plug memory leak during index creation 2014-09-04 20:52:00 -07:00
Nik Bougalis
4329d74e05 Fix swapped variable names to accurately reflect usage 2014-09-04 20:09:45 -07:00
Stanislau Hlebik
45a5e3ede0 Remove path with arena==nullptr from NewInternalIterator
Summary:
Simply code by removing code path which does not use Arena
from NewInternalIterator

Test Plan:
make all check
make valgrind_check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22395
2014-09-04 17:40:41 -07:00
Lei Jin
5665e5e285 introduce ImmutableOptions
Summary:
As a preparation to support updating some options dynamically, I'd like
to first introduce ImmutableOptions, which is a subset of Options that
cannot be changed during the course of a DB lifetime without restart.

ColumnFamily will keep both Options and ImmutableOptions. Any component
below ColumnFamily should only take ImmutableOptions in their
constructor. Other options should be taken from APIs, which will be
allowed to adjust dynamically.

I am yet to make changes to memtable and other related classes to take
ImmutableOptions in their ctor. That can be done in a seprate diff as
this one is already pretty big.

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D22545
2014-09-04 16:18:36 -07:00
Raghav Pisolkar
e0b99d4f5d created a new ReadOptions parameter 'iterate_upper_bound' 2014-09-04 11:00:16 -07:00
Igor Canadi
51ea889002 Fix travis builds
Summary:
Lots of travis builds are failing because on EnvPosixTest.RandomAccessUniqueID: https://travis-ci.org/facebook/rocksdb/builds/34400833

This is the result of their environment and not because of RocksDB's bug.

Also note that RocksDB works correctly even though UniqueID feature is not present in the system (as it's the case with os x)

Test Plan:
OPT=-DTRAVIS make env_test && ./env_test
Observed that offending tests are not being run

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22803
2014-09-04 10:23:45 -07:00
Igor Canadi
a4816269f1 Relax backupable rate limiting test 2014-09-04 10:22:58 -07:00
Igor Canadi
f7f973d354 Merge pull request #269 from huahang/patch-2
fix a few compile warnings
2014-09-04 09:43:00 -07:00
liuhuahang
ef5b384729 fix a few compile warnings
1, const qualifiers on return types make no sense and will trigger a compile warning: warning: type qualifiers ignored on function return type [-Wignored-qualifiers]

2, class HistogramImpl has virtual functions and thus should have a virtual destructor

3, with some toolchain, the macro __STDC_FORMAT_MACROS is predefined and thus should be checked before define

Change-Id: I69747a03bfae88671bfbb2637c80d17600159c99
Signed-off-by: liuhuahang <liuhuahang@zerus.co>
2014-09-04 23:06:23 +08:00
Igor Canadi
2fd3806c88 Merge pull request #263 from wankai/master
delete unused Comparator
2014-09-03 18:17:37 -07:00
wankai
1785114a6f delete unused Comparator 2014-09-04 09:10:13 +08:00
Lei Jin
1b1d9619ff update HISTORY.md
Summary: as title

Test Plan: no

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22761
2014-09-03 17:03:30 -07:00
Lei Jin
703c3eacd9 comments about the BlockBasedTableOptions migration in Options
Summary: as title

Test Plan: none

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22737
2014-09-03 17:01:34 -07:00
Igor Canadi
4b5ad88658 Merge pull request #260 from wankai/master
replace filter_block with std::unique_ptr to support RAII
2014-09-03 10:13:16 -07:00
wankai
19cc588b77 change to filter_block std::unique_ptr support RAII 2014-09-04 00:44:49 +08:00
Igor Canadi
9b976e34f5 Merge pull request #259 from wankai/master
typo improvement
2014-09-03 08:42:25 -07:00
wankai
5d25a46936 Merge remote-tracking branch 'upstream/master' 2014-09-03 21:57:13 +08:00
Lei Jin
9b58c73c7c call SanitizeDBOptionsByCFOptions() in the right place
Summary: It only covers Open() with default column family right now

Test Plan: make release

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22467
2014-09-02 14:42:23 -07:00
Igor Canadi
a84234a61b Ignore missing column families
Summary:
Before this diff, whenever we Write to non-existing column family, Write() would fail.

This diff adds an option to not fail a Write() when WriteBatch points to non-existing column family. MongoDB said this would be useful for them, since they might have a transaction updating an index that was dropped by another thread. This way, they don't have to worry about checking if all indexes are alive on every write. They don't care if they lose writes to dropped index.

Test Plan: added a small unit test

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22143
2014-09-02 13:29:05 -07:00
Feng Zhu
8ed70fc209 add assert to db Put in db_stress test
Summary:
1. assert db->Put to be true in db_stress
2. begin column family with name "1".

Test Plan: 1. ./db_stress

Reviewers: ljin, yhchiang, dhruba, sdong, igor

Reviewed By: sdong, igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22659
2014-09-02 13:21:59 -07:00
Igor Canadi
7f19bb93c6 Merge pull request #242 from tdfischer/perf-timer-destructors
Refactor PerfStepTimer to automatically stop on destruct
2014-09-02 13:06:40 -07:00
Feng Zhu
8438a19360 fix dropping column family bug
Summary: 1. db/db_impl.cc:2324 (DBImpl::BackgroundCompaction) should not raise bg_error_ when column family is dropped during compaction.

Test Plan: 1. db_stress

Reviewers: ljin, yhchiang, dhruba, igor, sdong

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22653
2014-09-02 12:25:58 -07:00
Torrie Fischer
6614a48418 Refactor PerfStepTimer to stop on destruct
This eliminates the need to remember to call PERF_TIMER_STOP when a section has
been timed. This allows more useful design with the perf timers and enables
possible return value optimizations. Simplistic example:

class Foo {
  public:
    Foo(int v) : m_v(v);
  private:
    int m_v;
}

Foo makeFrobbedFoo(int *errno)
{
  *errno = 0;
  return Foo();
}

Foo bar(int *errno)
{
  PERF_TIMER_GUARD(some_timer);

  return makeFrobbedFoo(errno);
}

int main(int argc, char[] argv)
{
  Foo f;
  int errno;

  f = bar(&errno);

  if (errno)
    return -1;
  return 0;
}

After bar() is called, perf_context.some_timer would be incremented as if
Stop(&perf_context.some_timer) was called at the end, and the compiler is still
able to produce optimizations on the return value from makeFrobbedFoo() through
to main().
2014-09-02 12:04:22 -07:00
Igor Canadi
076bd01a29 Fix compile
Summary: gcc on our dev boxes is not happy about __attribute__((unused))

Test Plan: compiles now

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22707
2014-09-02 11:49:38 -07:00