Commit Graph

380 Commits

Author SHA1 Message Date
Raghav Pisolkar
0fbb3facc0 fixed memory leak in unit test DBIteratorBoundTest
Summary: fixed memory leak in unit test DBIteratorBoundTest

Test Plan: ran valgrind test on my unit test

Reviewers: sdong

Differential Revision: https://reviews.facebook.net/D22911
2014-09-05 10:35:28 -07:00
Stanislau Hlebik
45a5e3ede0 Remove path with arena==nullptr from NewInternalIterator
Summary:
Simply code by removing code path which does not use Arena
from NewInternalIterator

Test Plan:
make all check
make valgrind_check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22395
2014-09-04 17:40:41 -07:00
Raghav Pisolkar
e0b99d4f5d created a new ReadOptions parameter 'iterate_upper_bound' 2014-09-04 11:00:16 -07:00
Igor Canadi
0c26e76b28 Merge pull request #237 from tdfischer/tdfischer/faster-timeout-test
test: db: fix test to have a smaller timeout for when it runs on faster ...
2014-08-28 20:40:10 -04:00
Radheshyam Balasundaram
b6fd7811eb Don't do memtable lookup in db_impl_readonly if memtables are empty while opening db.
Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not.

Test Plan:
db_test
make check all

Reviewers: sdong, yhchiang, ljin, igor

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22281
2014-08-26 17:19:03 -07:00
Stanislau Hlebik
9dcb75b6d9 Add is-file-deletions-enabled property
Summary:
Add property 'rocksdb.is-file-deletions-enable'
	 which equals disable_delete_obsole_file_

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22119
2014-08-26 16:26:29 -07:00
Torrie Fischer
0db6b028e7 Update timeout to 50ms instead of 3. 2014-08-26 09:38:45 -07:00
Lei Jin
384400128f move block based table related options BlockBasedTableOptions
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21915
2014-08-25 14:22:05 -07:00
Yueh-Hsuan Chiang
63a2215c63 Improve Options sanitization and add MmapReadRequired() to TableFactory
Summary:
Currently, PlainTable must use mmap_reads.  When PlainTable is used but
allow_mmap_reads is not set, rocksdb will fail in flush.

This diff improve Options sanitization and add MmapReadRequired() to
TableFactory.

Test Plan:
export ROCKSDB_TESTS=PlainTableOptionsSanitizeTest
make db_test -j32
./db_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: you, leveldb

Differential Revision: https://reviews.facebook.net/D21939
2014-08-20 15:53:39 -07:00
sdong
10720a5587 Revert the unintended change that DestroyDB() doesn't clean up info logs.
Summary: A previous change triggered a change by mistake: DestroyDB() will keep info logs under DB directory. Revert the unintended change.

Test Plan: Add a unit test case to verify it.

Reviewers: ljin, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22209
2014-08-20 12:07:32 -07:00
Torrie Fischer
7c5173d27f test: db: fix test to have a smaller timeout for when it runs on faster hardware 2014-08-19 13:45:12 -07:00
sdong
28b5c76004 WriteBatchWithIndex: a wrapper of WriteBatch, with a searchable index
Summary:
Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.

WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset.

Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.

I will add more unit tests if people are OK about this API.

Test Plan:
make all check
Add unit tests.

Reviewers: yhchiang, igor, MarkCallaghan, ljin

Reviewed By: ljin

Subscribers: dhruba, leveldb, xjin

Differential Revision: https://reviews.facebook.net/D21381
2014-08-18 16:37:38 -07:00
sdong
58b0f9d890 Support purging logs from separate log directory
Summary:
1. Support purging info logs from a separate paths from DB path. Refactor the codes of generating info log prefixes so that it can be called when generating new files and scanning log directory.
2. Fix the bug of not scanning multiple DB paths (should only impact multiple DB paths)

Test Plan:
Add unit test for generating and parsing info log files
Add end-to-end test in db_test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb, igor, dhruba

Differential Revision: https://reviews.facebook.net/D21801
2014-08-14 13:22:50 -07:00
Yueh-Hsuan Chiang
1562653ba0 Fixed a signed-unsigned comparison error in db_test
Summary:
Fixed a signed-unsigned comparison error in db_test

Test Plan:
make db_test
2014-08-12 17:26:47 -07:00
Stanislau Hlebik
06a52bda64 Flush only one column family
Summary:
Currently DBImpl::Flush() triggers flushes in all column families.
Instead we need to trigger just the column family specified.

Test Plan: make all check

Reviewers: igor, ljin, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20841
2014-08-11 22:10:32 -07:00
Stanislau Hlebik
7c88249f51 Fix db_test and DBIter
Summary: Fix old issue with DBTest.Randomized with BlockBasedTableWithWholeKeyHashIndex + added printing in DBTest.Randomized.

Test Plan: make all check

Reviewers: zagfox, igor, ljin, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D21003
2014-08-08 09:44:14 -07:00
sdong
1242bfcad7 Add DB property "rocksdb.estimate-table-readers-mem"
Summary:
Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache.

Refactor the property codes to allow getting property from a version, with DB mutex not acquired.

Test Plan: Add several checks of this new property in existing codes for various cases.

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, leveldb

Differential Revision: https://reviews.facebook.net/D20733
2014-08-06 11:39:46 -07:00
Yueh-Hsuan Chiang
1903aa5cc7 Fixed a warning / error in signed and unsigned comparison
Summary:
Fixed the following compilation error detected in mac:
db/db_test.cc:2524:3: note: in instantiation of function template
  specialization 'rocksdb::test::Tester::IsEq<unsigned long long, int>' requested here
    ASSERT_EQ(int_num, 0);
      ^

Test Plan:
make
2014-07-31 14:48:00 -07:00
sdong
f04356e660 Add DB::GetIntProperty() to return integer properties to be returned as integers
Summary: We have quite some properties that are integers and we are adding more. Add a function to directly return them as an integer, instead of a string

Test Plan: Add several unit test checks

Reviewers: yhchiang, igor, dhruba, haobo, ljin

Reviewed By: ljin

Subscribers: yoshinorim, leveldb

Differential Revision: https://reviews.facebook.net/D20637
2014-07-28 16:55:57 -07:00
sdong
f6784766db Add DB property estimated number of keys
Summary: Add a DB property of estimated number of live keys, by adding number of entries of all mem tables and all files, subtracted by all deletions in all files.

Test Plan: Add the case in unit tests

Reviewers: hobbymanyp, ljin

Reviewed By: ljin

Subscribers: MarkCallaghan, yoshinorim, leveldb, igor, dhruba

Differential Revision: https://reviews.facebook.net/D20631
2014-07-28 15:27:58 -07:00
Lei Jin
d650612c4c expose RateLimiter definition
Summary:
User gets undefinied error since the definition is not exposed.
Also re-enable the db test with only upper bound check

Test Plan: db_test, rate_limit_test

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20403
2014-07-25 15:17:06 -07:00
sdong
f6b7e1ed1a Allow user to specify DB path of output file of manual compaction
Summary: Add a parameter path_id to DB::CompactRange(), to indicate where the output file should be placed to.

Test Plan: add a unit test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, dhruba, MarkCallaghan, leveldb

Differential Revision: https://reviews.facebook.net/D20085
2014-07-21 19:06:00 -07:00
Stanislau Hlebik
1c9f190ae3 Fix db_test
Summary: Added deletion of DBIterators in DBIterator's tests

Test Plan: make valgrind_check

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20043
2014-07-16 14:51:43 -07:00
Stanislau Hlebik
d916593ead Add Prev() for merge operator
Summary: Implement Prev() with merge operator for DBIterator. Request from mongoDB. Task 4673663.

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19743
2014-07-15 16:10:18 -07:00
sdong
0abaed2e08 Support multiple DB directories in universal compaction style
Summary:
This patch adds a target size parameter in options.db_paths and universal compaction will base it to determine which DB path to place a new file.
Level-style stays the same.

Test Plan: Add new unit tests

Reviewers: ljin, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, igor, leveldb

Differential Revision: https://reviews.facebook.net/D19869
2014-07-15 12:06:28 -07:00
Tomislav Novak
3b97ee96c4 ForwardIterator seek bugfix
Summary:
If `NeedToSeekImmutable()` returns false, `SeekInternal()` won't reset the
contents of `immutable_min_heap_`. However, since it calls `UpdateCurrent()`
unconditionally, if `current_` is one of immutable iterators (previously popped
from `immutable_min_heap_`), `UpdateCurrent()` will overwrite it. As a result,
if old `current_` in fact pointed to the smallest entry, forward iterator will
skip some records.

Fix implemented in this diff pushes `current_` back to `immutable_min_heap_`
before calling `UpdateCurrent()`.

Test Plan:
New unit test (courtesy of @lovro):
   $ ROCKSDB_TESTS=TailingIteratorSeekToSame ./db_test

Reviewers: igor, dhruba, haobo, ljin

Reviewed By: ljin

Subscribers: lovro, leveldb

Differential Revision: https://reviews.facebook.net/D19653
2014-07-10 16:46:13 -07:00
Tomislav Novak
105c1e099b ForwardIterator::status() checks all child iterators
Summary:
Forward iterator only checked `status_` and `mutable_iter_->status()`, which is
not sufficient. For example, when reading exclusively from cache
(kBlockCacheTier), `mutable_iter_->status()` may return kOk (e.g. there's
nothing in the memtable), but one of immutable iterators could be in
kIncomplete. In this case, `ForwardIterator::status()` ought to return that
status instead of kOk.

This diff changes `status()` to also check `imm_iters_`, `l0_iters_`, and
`level_iters_`.

Test Plan:
  ROCKSDB_TESTS=TailingIteratorIncomplete ./db_test

Reviewers: ljin, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D19581
2014-07-10 12:43:12 -07:00
sdong
36de0e5359 Add a function to return current perf level
Summary: Add a function to return the perf level. It is to allow a wrapper of DB to increase the perf level and restore the original perf level after finishing the function call.

Test Plan: Add a verification in db_test

Reviewers: yhchiang, igor, ljin

Reviewed By: ljin

Subscribers: xjin, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D19551
2014-07-10 11:35:48 -07:00
Lei Jin
8a7d1fe616 disable rate limiter test
Summary:
The test is not stable because it relies on disk and only runs for a
short period of time. So misisng a compaction/flush would greatly affect
the rate. I am disabling it for now. What do you guys think?

Test Plan: make

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19599
2014-07-09 22:46:15 -07:00
Lei Jin
73d7147096 make rate limiter test more reliable
Summary:
Randomize keys so that compaction actually happens.
Change the config so that compaction happens more aggressively.
The test takes longer time, but the results are more stable shown by
iostat

Test Plan: ran it

Reviewers: igor, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19533
2014-07-08 15:15:00 -07:00
Lei Jin
534357ca3a integrate rate limiter into rocksdb
Summary:
Add option and plugin rate limiter for PosixWritableFile. The rate
limiter only applies to flush and compaction. WAL and MANIFEST are
excluded from this enforcement.

Test Plan: db_test

Reviewers: igor, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19425
2014-07-08 12:31:49 -07:00
Yueh-Hsuan Chiang
7b85c1e900 Improve SimpleWriteTimeoutTest to avoid false alarm.
Summary:
SimpleWriteTimeoutTest has two parts: 1) insert two large key/values
to make memtable full and expect both of them are successful; 2) insert
another key / value and expect it to be timed-out.  Previously we also
set a timeout in the first step, but this might sometimes cause
false alarm.

This diff makes the first two writes run without timeout setting.

Test Plan:
export ROCKSDB_TESTS=Time
make db_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19461
2014-07-04 00:02:12 -07:00
Yueh-Hsuan Chiang
d4d338de33 Add timeout_hint_us to WriteOptions and introduce Status::TimeOut.
Summary:
This diff adds timeout_hint_us to WriteOptions.  If it's non-zero, then
1) writes associated with this options MAY be aborted when it has been
  waiting for longer than the specified time.  If an abortion happens,
  associated writes will return Status::TimeOut.
2) the stall time of the associated write caused by flush or compaction
  will be limited by timeout_hint_us.

The default value of timeout_hint_us is 0 (i.e., OFF.)

The statistics of timeout writes will be recorded in WRITE_TIMEDOUT.

Test Plan:
export ROCKSDB_TESTS=WriteTimeoutAndDelayTest
make db_test
./db_test

Reviewers: igor, ljin, haobo, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18837
2014-07-03 15:47:02 -07:00
sdong
2459f7ec4e Support Multiple DB paths (without having an interface to expose to users)
Summary:
In this patch, we allow RocksDB to support multiple DB paths internally.
No user interface is supported yet so this patch is silent to users.

Test Plan: make all check

Reviewers: igor, haobo, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18921
2014-07-02 21:14:44 -07:00
sdong
1d05006740 Re-commit the correct part (WalDir) of the revision:
Commit 6634844dba by sdong
Two small fixes in db_test

Summary:
Two fixes:
(1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other
(2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it.

Test Plan: ./db_test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: nkg-, igor, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D19389
2014-07-01 18:54:50 -07:00
sdong
30b20604db Revert "Two small fixes in db_test"
This reverts commit 6634844dba.
2014-07-01 17:41:38 -07:00
sdong
9c332aa11a HashLinkList memtable switches a bucket to a skip list to reduce performance outliers
Summary:
In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search.

Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list.

The new data structure is documented in comments in the codes.

Test Plan:
make all check
set threshold_use_skiplist in several tests

Reviewers: yhchiang, haobo, ljin

Reviewed By: yhchiang, ljin

Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D19299
2014-07-01 17:14:15 -07:00
sdong
6634844dba Two small fixes in db_test
Summary:
Two fixes:
(1) WalDir to pick a directory under TmpDir to allow two tests running in parallel without impacting each other
(2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it.

Test Plan: ./db_test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: nkg-, igor, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D19389
2014-07-01 16:58:03 -07:00
Feng Zhu
5656367416 use arena to allocate memtable's bloomfilter and hashskiplist's buckets_
Summary:
    Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena
    DynamicBloom: pass arena via constructor, allocate space in SetTotalBits
    HashSkipListRep: allocate space of buckets_ using arena.
       do not delete it in deconstructor because arena would take care of it.
    Several test files are changed.

Test Plan:
    make all check

Reviewers: ljin, haobo, yhchiang, sdong

Reviewed By: sdong

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D19335
2014-06-30 15:54:31 -07:00
Yueh-Hsuan Chiang
e813f5b6d9 Allow compaction to reclaim storage more effectively.
Summary:
This diff allows compaction to reclaim storage more effectively.
In the current design, compactions are mainly triggered based on
the file sizes.  However, since deletion entries does not have
value, files which have many deletion entries are less likely
to be compacted.  As a result, it may took a while to make
deletion entries to be compacted.

This diff address issue by compensating the size of deletion
entries during compaction process: the size of each deletion
entry in the compaction process is augmented by 2x average
value size.  The diff applies to both leveled and universal
compacitons.

Test Plan:
develop CompactionDeletionTrigger
make db_test
./db_test

Reviewers: haobo, igor, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19029
2014-06-24 16:37:06 -06:00
Igor Canadi
d4a8423334 Remove seek compaction
Summary:
As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases.

This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.

There is one test case that relied on didIO information. I'll try to find another way to implement it.

Test Plan: make check

Reviewers: sdong, haobo, yhchiang, ljin, dhruba

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19161
2014-06-20 10:23:02 +02:00
Haobo Xu
7a9dd5f214 [RocksDB] Make block based table hash index more adaptive
Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is uncessarily harsh. Without a prefix extractor, we could always fallback to the normal binary index.

Test Plan: unit test, also manually veried LOG that fallback did occur.

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19191
2014-06-19 16:40:32 -07:00
Lei Jin
77db08f27b fix forward iterator bug
Summary: obvious

Test Plan: db_test

Reviewers: sdong, haobo, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18987
2014-06-10 09:57:26 -07:00
Igor Canadi
6de6a06631 FIFO compaction style
Summary:
Introducing new compaction style -- FIFO.

FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.

FIFO compaction style is suited for storing high-frequency event logs.

Test Plan: Added a unit test

Reviewers: dhruba, haobo, sdong

Reviewed By: dhruba

Subscribers: alberts, leveldb

Differential Revision: https://reviews.facebook.net/D18765
2014-05-21 11:43:35 -07:00
sdong
3e4a9ec241 Arena to inline 2KB of data in it.
Summary:
In order to use arena to a use case that the total allocation size might be small (LogBuffer is already such a case), inline 1KB of data in it, so that it can be mostly in stack or inline in another class.

If always inlining 2KB is a concern, I could make it a template to determine what to inline. However, dependents need to changes. Doesn't go with it for now

Test Plan: make all check.

Reviewers: haobo, igor, yhchiang, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18609
2014-05-14 11:49:01 -07:00
Yueh-Hsuan Chiang
1c7799d8aa Fixed a file-not-found issue when a log file is moved to archive.
Summary:
Fixed a file-not-found issue when a log file is moved to archive
by doing a missing retry.

Test Plan:
make db_test
export ROCKSDB_TEST=TransactionLogIteratorRace
./db_test

Reviewers: sdong, haobo

Reviewed By: sdong

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D18669
2014-05-12 17:50:21 -07:00
Igor Canadi
8e37a29bfb Compaction with zero outputs
Summary: We had a hypothesis in https://reviews.facebook.net/D18507 that empty-string internal keys might have been caused by compaction filter deleting all the entries. I added a unit test for that case. Unforutnately, everything works as expected.

Test Plan: this is a test

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18519
2014-05-08 13:48:39 -07:00
Igor Canadi
d2569fea47 log_and_apply_bench on a new benchmark framework
Summary:
db_test includes Benchmark for LogAndApply. This diff removes it from db_test and puts it into a separate log_and_apply bench. I just wanted to play around with our new benchmark framework and figure out how it works.

I would also like to show you how great it is! I believe right set of microbenchmarks can speed up our productivity a lot and help catch early regressions.

Test Plan: no

Reviewers: dhruba, haobo, sdong, ljin, yhchiang

Reviewed By: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18261
2014-05-05 11:11:48 -07:00
sdong
4a7c747064 Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB""
And make the default 0 for hash linked list memtable

This reverts commit d69dc64be7.
2014-05-04 13:56:29 -07:00
Igor Canadi
d69dc64be7 Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"
This reverts commit 7dafa3a1d7.
2014-05-04 08:37:09 -07:00
Igor Canadi
0afc8bc29a xxHash
Summary:
Originally: https://github.com/facebook/rocksdb/pull/87/files

I'm taking over to apply some finishing touches

Test Plan: will add tests

Reviewers: dhruba, haobo, sdong, yhchiang, ljin

Reviewed By: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18315
2014-05-01 14:09:32 -04:00
sdong
7dafa3a1d7 Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB
Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: nkg-, dhruba, leveldb, igor, yhchiang

Differential Revision: https://reviews.facebook.net/D18357
2014-04-30 11:02:26 -07:00
Yueh-Hsuan Chiang
9d9d2965cb Add a new mem-table representation based on cuckoo hash.
Summary:
= Major Changes =
* Add a new mem-table representation, HashCuckooRep, which is based cuckoo hash.
  Cuckoo hash uses multiple hash functions.  This allows each key to have multiple
  possible locations in the mem-table.

  - Put: When insert a key, it will try to find whether one of its possible
    locations is vacant and store the key.  If none of its possible
    locations are available, then it will kick out a victim key and
    store at that location.  The kicked-out victim key will then be
    stored at a vacant space of its possible locations or kick-out
    another victim.  In this diff, the kick-out path (known as
    cuckoo-path) is found using BFS, which guarantees to be the shortest.

 - Get: Simply tries all possible locations of a key --- this guarantees
   worst-case constant time complexity.

 - Time complexity: O(1) for Get, and average O(1) for Put if the
   fullness of the mem-table is below 80%.

 - Default using two hash functions, the number of hash functions used
   by the cuckoo-hash may dynamically increase if it fails to find a
   short-enough kick-out path.

 - Currently, HashCuckooRep does not support iteration and snapshots,
   as our current main purpose of this is to optimize point access.

= Minor Changes =
* Add IsSnapshotSupported() to DB to indicate whether the current DB
  supports snapshots.  If it returns false, then DB::GetSnapshot() will
  always return nullptr.

Test Plan:
Run existing tests.  Will develop a test specifically for cuckoo hash in
the next diff.

Reviewers: sdong, haobo

Reviewed By: sdong

CC: leveldb, dhruba, igor

Differential Revision: https://reviews.facebook.net/D16155
2014-04-29 17:13:46 -07:00
Igor Canadi
f1c9aa6ebe More unsigned/signed compare fixes 2014-04-29 13:01:06 -07:00
Igor Canadi
dd9eb7a7d5 Cache result of ReadFirstRecord()
Summary:
ReadFirstRecord() reads the actual log file from disk on every call. This diff introduces a cache layer on top of ReadFirstRecord(), which should significantly speed up repeated calls to GetUpdatesSince().

I also cleaned up some stuff, but the whole TransactionLogIterator could use some refactoring, especially if we see increased usage.

Test Plan: make check

Reviewers: haobo, sdong, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18387
2014-04-29 13:27:58 -04:00
Igor Canadi
72ff275e3c Fix TransactionLogIterator EOF caching
Summary:
When TransactionLogIterator comes to EOF, it calls UnmarkEOF and continues reading. However, if glibc cached the EOF status of the file, it will get EOF again, even though the new data might have been written to it.

This has been causing errors in Mac OS.

Test Plan: test passes, was failing before

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18381
2014-04-28 23:30:27 -04:00
Lei Jin
ccaca59bee avoid calling FindFile twice in TwoLevelIterator for PlainTable
Summary:
this is to reclaim the regression introduced in
https://reviews.facebook.net/D17853

Test Plan: make all check

Reviewers: igor, haobo, sdong, dhruba, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17985
2014-04-25 12:23:07 -07:00
Lei Jin
d642c60bdc Check PrefixMayMatch on Seek()
Summary:
As a follow-up diff for https://reviews.facebook.net/D17805, add
optimization to check PrefixMayMatch on Seek()

Test Plan: make all check

Reviewers: igor, haobo, sdong, yhchiang, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17853
2014-04-25 12:22:23 -07:00
Lei Jin
3995e801ab kill ReadOptions.prefix and .prefix_seek
Summary:
also add an override option total_order_iteration if you want to use full
iterator with prefix_extractor

Test Plan: make all check

Reviewers: igor, haobo, sdong, yhchiang

Reviewed By: haobo

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D17805
2014-04-25 12:21:34 -07:00
sdong
a570740727 Expose number of entries in mem tables to users
Summary: In this patch, two new DB properties are defined: rocksdb.num-immutable-mem-table and rocksdb.num-entries-imm-mem-tables, from where number of entries in mem tables can be exposed to users

Test Plan:
Cover the codes in db_test
make all check

Reviewers: haobo, ljin, igor

Reviewed By: igor

CC: nkg-, igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18207
2014-04-22 22:13:21 -07:00
Igor Canadi
1068d2fa60 Revert "Better port::Mutex::AssertHeld() and AssertNotHeld()"
This reverts commit ddafceb6c2.
2014-04-22 18:38:10 -07:00
Igor Canadi
ddafceb6c2 Better port::Mutex::AssertHeld() and AssertNotHeld()
Summary:
Using ThreadLocalPtr as a flag to determine if a mutex is locked or not enables us to implement AssertNotHeld(). It also makes AssertHeld() actually correct.

I had to remove port::Mutex as a dependency for util/thread_local.h, but that's fine since we can just use std::mutex :)

Test Plan: make check

Reviewers: ljin, dhruba, haobo, sdong, yhchiang

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18171
2014-04-22 17:26:21 -07:00
Igor Canadi
f813279da5 Remove TransactionLogIteratorRace when -DNDEBUG 2014-04-21 11:08:30 -07:00
sdong
0f40fe4bc7 When creating a new DB, fail it when wal_dir contains existing log files
Summary: Current behavior of creating new DB is, if there is existing log files, we will go ahead and replay them on top of empty DB. This is a behavior that no user would expect. With this patch, we will fail the creation if a user creates a DB with existing log files.

Test Plan: make all check

Reviewers: haobo, igor, ljin

Reviewed By: haobo

CC: nkg-, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17817
2014-04-15 14:01:57 -07:00
Igor Canadi
dbe0f327ca Set log_empty to false even when options.sync is off [fix tests] 2014-04-15 10:28:34 -07:00
Kai Liu
1405232b6d Temporarily disable a test case in db_test
Summary:

Root cause is still under investigation. Just Disable the troubling use case for now.
2014-04-10 17:17:39 -07:00
Kai Liu
75b59d5146 Enable hash index for block-based table
Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index.

Test Plan: Wrote several new unit tests.

Reviewers: sdong, haobo, dhruba

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16539
2014-04-10 14:19:43 -07:00
Igor Canadi
4daea66343 Turn on -Wmissing-prototypes
Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes.

Test Plan: compiles

Reviewers: dhruba, haobo, ljin, sdong

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17649
2014-04-09 21:17:14 -07:00
Igor Canadi
dc55903293 Improved CompressedCache
Summary:
This is testing behavior that was reported in https://github.com/facebook/rocksdb/issues/111

No issue was found, but it still good to commit this and make CompressedCache more robust.

Test Plan: this is a plan

Reviewers: ljin, dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17625
2014-04-09 11:43:14 -07:00
Igor Canadi
b947fdc89d Column family support for DB::OpenForReadOnly()
Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though)

Test Plan: added a unit test in column_family_test

Reviewers: haobo, sdong, ljin, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17565
2014-04-09 09:56:17 -07:00
Igor Canadi
731e55c01c Fix GetProperty() test
Summary:
GetProperty test is flakey.
Before this diff: P8635927
After: P8635945

We need to make sure the thread is done before we destruct sleeping tasks. Otherwise, bad things happen.

Test Plan: See summary

Reviewers: ljin, sdong, haobo, dhruba

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17595
2014-04-08 14:57:00 -07:00
Igor Canadi
3d2fe844ab Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_impl.h
	db/memtable_list.cc
	db/version_set.cc
2014-04-07 11:31:11 -07:00
Haobo Xu
48bc0c6ad3 [RocksDB] Fix a race condition in GetSortedWalFiles
Summary: This patch fixed a race condition where a log file is moved to archived dir in the middle of GetSortedWalFiles. Without the fix, the log file would be missed in the result, which leads to transaction log iterator gap. A test utility SyncPoint is added to help reproducing the race condition.

Test Plan: TransactionLogIteratorRace; make check

Reviewers: dhruba, ljin

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17121
2014-04-02 22:12:29 -07:00
sdong
4af1954fd6 Compaction Filter V1 to use old context struct to keep backward compatible
Summary: The previous change D15087 changed existing compaction filter, which makes the commonly used class not backward compatible. Revert the older interface. Use a new interface for V2 instead.

Test Plan: make all check

Reviewers: haobo, yhchiang, igor

CC: danguo, dhruba, ljin, igor, leveldb

Differential Revision: https://reviews.facebook.net/D17223
2014-04-02 14:57:51 -07:00
Igor Canadi
8555ce2dec Merge branch 'master' into columnfamilies 2014-04-02 10:48:05 -07:00
sdong
e0a87c4cf1 DBIter to use static allocated char array for saved_key_ (if it is not too long)
Summary: DBIter now uses a std::string for saved_key. Based on some profiling, it could be more expensive than we though. Optimize it with the same technique as LookupKey -- if it is short, we copy it to a static allocated char. Otherwise, dynamically allocate memory for it.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: dhruba, igor, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D17289
2014-04-01 16:43:11 -07:00
Igor Canadi
ddbd1ece88 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	db/internal_stats.cc
	db/internal_stats.h
	db/version_edit.cc
	db/version_edit.h
	db/version_set.cc
	include/rocksdb/options.h
	util/options.cc
2014-03-31 13:39:24 -07:00
Igor Canadi
8a139a054c More valgrind issues!
Summary: Fix some more CompactionFilterV2 valgrind issues. Maybe it would make sense for CompactionFilterV2 to delete its prefix_extractor?

Test Plan: ran CompactionFilterV2* tests with valgrind. issues before patch -> no issues after

Reviewers: haobo, sdong, ljin, dhruba

Reviewed By: dhruba

CC: leveldb, danguo

Differential Revision: https://reviews.facebook.net/D17337
2014-03-29 10:34:47 -07:00
sdong
43a593a6d9 Change default value of some Options
Summary: Since we are optimizing for server workloads, some default values are not optimized any more. We change some of those values that I feel it's less prone to regression bugs.

Test Plan: make all check

Reviewers: dhruba, haobo, ljin, igor, yhchiang

Reviewed By: igor

CC: leveldb, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D16995
2014-03-28 17:09:28 -07:00
Haobo Xu
a92194e5b2 [RocksDB] Add db property "rocksdb.cur-size-active-mem-table"
Summary: as title

Test Plan: db_test

Reviewers: sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17217
2014-03-27 15:14:04 -07:00
sdong
6b2e7a2a01 When Options.max_num_files=-1, non level0 files also by pass table cache
Summary:
This is the part that was not finished when doing the Options.max_num_files=-1 feature. For iterating non level0 SST files (which was done using two level iterator), table cache is not bypassed. With this patch, the leftover feature is done.

Test Plan: make all check; change Options.max_num_files=-1 in one of the tests to cover the codes.

Reviewers: haobo, igor, dhruba, ljin, yhchiang

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17001
2014-03-25 18:40:52 -07:00
Igor Canadi
e86d7dffd7 Merge branch 'master' into columnfamilies 2014-03-25 15:24:02 -07:00
Danny Guo
d9ca83df28 [rocksdb] make init prefix more robust
Summary:
Currently if client uses kNULLString as the prefix, it will confuse
compaction filter v2. This diff added a bool to indicate if the prefix
has been intialized. I also added a unit test to cover this case and
make sure the new code path is hit.

Test Plan: db_test

Reviewers: igor, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17151
2014-03-25 11:59:40 -07:00
Igor Canadi
e8168382c4 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	include/rocksdb/options.h
	util/options.cc
2014-03-25 11:09:40 -07:00
Danny Guo
b47812fba6 [rocksdb] new CompactionFilterV2 API
Summary:
This diff adds a new CompactionFilterV2 API that roll up the
decisions of kv pairs during compactions. These kv pairs must share the
same key prefix. They are buffered inside the db.

    typedef std::vector<Slice> SliceVector;
    virtual std::vector<bool> Filter(int level,
                                 const SliceVector& keys,
                                 const SliceVector& existing_values,
                                 std::vector<std::string>* new_values,
                                 std::vector<bool>* values_changed
                                 ) const = 0;

Application can override the Filter() function to operate
on the buffered kv pairs. More details in the inline documentation.

Test Plan:
make check. Added unit tests to make sure Keep, Delete,
Change all works.

Reviewers: haobo

CCs: leveldb

Differential Revision: https://reviews.facebook.net/D15087
2014-03-24 20:47:53 -07:00
Igor Canadi
ac328a86b9 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
2014-03-20 14:41:37 -07:00
sdong
f681030c80 Fix DBTest.UniversalCompactionTrigger failure caused by D17067
Summary: D17067 breaks DBTest.UniversalCompactionTrigger because of wrong location of the checking. Fix it.

Test Plan: Run the test and make sure it passes.

Reviewers: igor, haobo

Reviewed By: igor

CC: dhruba, ljin, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D17079
2014-03-20 11:10:11 -07:00
sdong
752ec46cd5 Add a unit test to verify compaction filter context
Summary: Add unit tests to make sure CompactionFilterContext::is_manual_compaction_ and CompactionFilterContext::is_full_compaction_ are set correctly.

Test Plan: run the new tests.

Reviewers: haobo, igor, dhruba, yhchiang, ljin

Reviewed By: haobo

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D17067
2014-03-19 18:10:48 -07:00
Igor Canadi
e20fa3f8a4 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/internal_stats.cc
	db/internal_stats.h
	db/version_set.cc
2014-03-19 17:22:20 -07:00
Igor Canadi
22507aff6c Fix compile issue in Mac OS
Summary:
Compile issues are:
* Unused variable env_
* Unused fallocate_with_keep_size_

Test Plan: compiles

Reviewers: dhruba, haobo, sdong

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17043
2014-03-19 15:40:12 -07:00
sdong
71e6a34271 Add a DB property to indicate number of background errors encountered
Summary: Add a property to calculate number of background errors encountered to help users build their monitoring

Test Plan: Add a unit test. make all check

Reviewers: haobo, igor, dhruba

Reviewed By: igor

CC: ljin, nkg-, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D16959
2014-03-18 14:28:30 -07:00
Kai Liu
1ec72b37b1 Several easy-to-add properties related to compaction and flushes
Summary: To partly address the request @nkg- raised, add three easy-to-add properties to compactions and flushes.

Test Plan: run unit tests and add a new unit test to cover new properties.

Reviewers: haobo, dhruba

Reviewed By: dhruba

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D13677
2014-03-18 14:00:09 -07:00
Igor Canadi
e0c1211555 Merge branch 'master' into columnfamilies
Conflicts:
	db/version_set.cc
	tools/db_stress.cc
2014-03-17 12:21:05 -07:00
sdong
c61c9830d4 Fix a bug that Prev() can hang.
Summary: Prev() now can hang when there is a key with more than max_skipped number of appearance internally but all of them are newer than the sequence ID to seek. Add unit tests to confirm the bug and fix it.

Test Plan: make all check

Reviewers: igor, haobo

Reviewed By: igor

CC: ljin, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D16899
2014-03-17 10:00:41 -07:00
Igor Canadi
928ee23567 Change WriteBatch interface 2014-03-14 13:40:06 -07:00
Igor Canadi
e1f56e12cf Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	tools/db_stress.cc
2014-03-13 13:21:20 -07:00
Kai Liu
11da8bc5df A heuristic way to check if a memtable is full
Summary:
This is is based on https://reviews.facebook.net/D15027. It's not finished but I would like to give a prototype to avoid arena over-allocation while making better use of the already allocated memory blocks.

Instead of check approximate memtable size, we will take a deeper look at the arena, which incorporate essential idea that @sdong suggests: flush when arena has allocated its last and the last is "almost full"

Test Plan: N/A

Reviewers: haobo, sdong

Reviewed By: sdong

CC: leveldb, sdong

Differential Revision: https://reviews.facebook.net/D15051
2014-03-12 16:40:14 -07:00
Igor Canadi
25c8a1a20f More bug fixed introduced by code cleanup 2014-03-12 12:28:23 -07:00
Igor Canadi
b5d6ad69fc Bug fixes introduced by code cleanup 2014-03-12 11:10:26 -07:00
Igor Canadi
2b95dc1542 Revert "Fix bad merge of D16791 and D16767"
This reverts commit 839c8ecfcd.
2014-03-12 09:37:43 -07:00