Commit Graph

6466 Commits

Author SHA1 Message Date
sdong
80f409ea37 Clean PlainTableReader's variables for better data locality
Summary:
Clean PlainTableReader's data structures:
(1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization)
(2) remove some variables only used when initialization from the class
(3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly
(4) make Options a reference.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18891
2014-06-09 13:53:39 -07:00
Igor Canadi
f43c8262c2 Don't compress block bigger than 2GB
Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446.

Test Plan: make check doesn't complain :)

Reviewers: haobo, ljin, yhchiang, dhruba, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18975
2014-06-09 12:26:09 -07:00
sdong
ee5a51e6ce sst_dump: still try to print out table properties even if failing to read the file
Summary: Even if the file is corrupted, table properties are usually available to print out. Now sst_dump would just fail without printing table properties. With this patch, table properties are still try to be printed out.

Test Plan: run sst_dump against multiple scenarios

Reviewers: igor, yhchiang, ljin, haobo

Reviewed By: haobo

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18981
2014-06-09 11:21:44 -07:00
Yueh-Hsuan Chiang
3e701a654f [Java] Improve documentation for RocksEnv and its C++ resource.
Summary:
Improve documentation for RocksEnv and its C++ resource.  Specifically,
the result of RocksEnv::Default() is a singleton, and the ownership
of its c++ resource belongs to rocksdb c++.  As a result, calling
dispose() of the return value of RocksDB::Default() will be no-op.

Test Plan: no code change.

Reviewers: haobo, ankgup87

Reviewed By: ankgup87

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18945
2014-06-07 17:27:03 -07:00
Igor Canadi
0365eaf12e remove unnecessary printf 2014-06-06 18:27:44 -07:00
Igor Canadi
a0191c9dfe Create Missing Column Families
Summary: Provide an convenience option to create column families if they are missing from the DB. Task #4460490

Test Plan: added unit test. also, stress test for some time

Reviewers: sdong, haobo, dhruba, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D18951
2014-06-06 18:04:56 -07:00
Igor Canadi
99d3eed2fd Write Fast-path for single column family
Summary: We have a perf regression of Write() even with one column family. Make fast path for single column family to avoid the perf regression. See task #4455480

Test Plan: make check

Reviewers: sdong, ljin

Reviewed By: sdong, ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18963
2014-06-06 17:26:23 -07:00
Yueh-Hsuan Chiang
e72b02e3c2 [Java] Add basic Java binding for rocksdb::Env.
Summary: Add basic Java binding for rocksdb::Env.

Test Plan:
make rocksdbjava
make jtest
cd java
./jdb_bench.sh --max_background_compactions=1
./jdb_bench.sh --max_background_compactions=10

Reviewers: sdong, ankgup87, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18903
2014-06-05 17:09:25 -07:00
sdong
b92a19a431 sst_dump: Set dummy prefix extractor for binary search index in block based table
Summary: Now sst_dump fails in block based tables if binary search index is used, as it requires a prefix extractor. Add it.

Test Plan: Run it against such a file to make sure it fixes the problem.

Reviewers: yhchiang, kailiu

Reviewed By: kailiu

Subscribers: ljin, igor, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D18927
2014-06-05 15:37:23 -07:00
Igor Canadi
5d870717ae Correctly preallocate files in universal compaction
Summary: In universal compaction, MaxFileSizeForLevel is ULLONG_MAX. We've been preallocation files to UULONG_MAX size all these time :)

Test Plan: make check

Reviewers: dhruba, haobo, ljin, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18915
2014-06-05 13:19:35 -07:00
Yueh-Hsuan Chiang
166cc5b456 Merge pull request #157 from ankgup87/master
[Java] Add RestoreBackupableDB
2014-06-05 09:24:55 -07:00
Ankit Gupta
1a6c1a5ddd Fix build 2014-06-05 13:34:38 +01:00
Ankit Gupta
2fa0a993b8 Fix build 2014-06-05 13:25:08 +01:00
Ankit Gupta
08314321fa Merge branch 'master' of https://github.com/facebook/rocksdb 2014-06-05 13:17:53 +01:00
Igor Canadi
457bae6911 Fix regression test
Summary:
388d2054c7
added extra line to db_bench output, breaking regression tests. This diff makes it more robust and fixes the issue

Test Plan: ran it

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D18897
2014-06-04 09:59:44 -07:00
Igor Canadi
552c49f0f4 Remove upper bound for rate limiting unit test 2014-06-03 13:58:44 -07:00
Igor Canadi
fd27001072 Fix compile errors on Mac
Summary: https://phabricator.fb.com/P11372644

Test Plan: compiles

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18873
2014-06-03 12:28:58 -07:00
sdong
df9069d23f In DB::NewIterator(), try to allocate the whole iterator tree in an arena
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it.
2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it.

Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.

Test Plan: make all check

Reviewers: ljin, haobo

Reviewed By: haobo

Subscribers: leveldb, dhruba, yhchiang, igor

Differential Revision: https://reviews.facebook.net/D18513
2014-06-02 17:44:57 -07:00
sdong
462796697c dynamic_bloom: replace some divide (remainder) operations with shifts in locality mode, and other improvements
Summary:
This patch changes meaning of options.bloom_locality: 0 means disable cache line optimization and any positive number means use CACHE_LINE_SIZE as block size (the previous behavior is the block size will be CACHE_LINE_SIZE*options.bloom_locality). By doing it, the divide operations inside a block can be replaced by a shift.

Performance is improved:
https://reviews.facebook.net/P471

Also, improve the basic algorithm in two ways:
(1) make sure num of blocks is an odd number
(2) rotate bytes after every probe in locality mode. Since the divider is 2^n, unless doing it, we are never able to use all the bits.
Improvements of false positive: https://reviews.facebook.net/P459

Test Plan: make all check

Reviewers: ljin, haobo

Reviewed By: haobo

Subscribers: dhruba, yhchiang, igor, leveldb

Differential Revision: https://reviews.facebook.net/D18843
2014-06-02 17:36:38 -07:00
Igor Canadi
91ddd587cc Only signal cond variable if need to
Summary:
At the end of BackgroundCallCompaction(), we call SignalAll(), even though we don't need to. If compaction hasn't done anything and there's another compaction running, there is no need to signal on the condition variable. Doing so creates a tight feedback loop which results in log files like:

   wait for memtable flush
   compaction nothing to do
   wait for memtable flush
   compaction nothing to do

This change eliminates that

Test Plan:
make check
Also:

    icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
    7435
    icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
    372

First version is before the change, second version is after the change.

Reviewers: dhruba, ljin, haobo, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18855
2014-06-02 17:23:55 -07:00
Igor Canadi
8cb7ad83c3 Flush stale column families less aggressively
Summary:
We've seen some production issues where column family is detected as stale, although there is only one column family in the system. This is a quick fix that:
1) doesn't flush stale column families if there's only one of them
2) Use 4 as a coefficient instead of 2 for determening when a column family is stale. This will make flushing less aggressive, while still keep a nice dynamic flushing of very stale CFs.

Test Plan: make check

Reviewers: dhruba, haobo, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18861
2014-06-02 15:33:54 -07:00
sdong
593bb2c40b db_stress to add an option to periodically change background thread pool size.
Summary: Add an option to indicates the variation of background threads. If the flag is not 0, for every 100 milliseconds, adjust thread pool size to a value within the range.

Test Plan: run db_stress

Reviewers: haobo, dhruba, igor, ljin

Reviewed By: ljin

Subscribers: leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D18759
2014-06-02 10:32:09 -07:00
Lei Jin
388d2054c7 forward iterator
Summary:
Forward iterator puts everything together in a flat structure instead of
a hierarchy of nested iterators. this should simplify the code and
provide better performance. It also enables more optimization since all
information are accessiable in one place.
Init evaluation shows about 6% improvement

Test Plan: db_test and db_bench

Reviewers: dhruba, igor, tnovak, sdong, haobo

Reviewed By: haobo

Subscribers: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D18795
2014-05-30 14:31:55 -07:00
Lei Jin
f29c62fc6f add an iterator refresh option for SeekRandom
Summary: One more option to allow iterator refreshing when using normal iterator

Test Plan: ran db_bench

Reviewers: haobo, sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18849
2014-05-30 14:09:22 -07:00
Ankit Gupta
fa1b62cabd Merge remote-tracking branch 'upstream/master' 2014-05-29 18:33:11 -07:00
sdong
9899b12780 ThreadID printed when Thread terminating in the same format as posix_logger
Summary: 220132b65e correctly fixed the issue of thread ID printing when terminating a thread. Nothing wrong with it. This diff prints the ID in the same way as in PosixLogger::logv() so that users can be more easily to correlates them.

Test Plan: run env_test and make sure it prints correctly.

Reviewers: igor, haobo, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18819
2014-05-29 11:11:08 -07:00
Yueh-Hsuan Chiang
ab3e566699 [Java] Generalize dis-own native handle and refine dispose framework.
Summary:
1. Move disOwnNativeHandle() function from RocksDB to RocksObject
to allow other RocksObject to use disOwnNativeHandle() when its
ownership of native handle has been transferred.

2. RocksObject now has an abstract implementation of dispose(),
which does the following two things.  First, it checks whether
both isOwningNativeHandle() and isInitialized() return true.
If so, it will call the protected abstract function dispose0(),
which all the subclasses of RocksObject should implement.  Second,
it sets nativeHandle_ = 0.  This redesign ensure all subclasses
of RocksObject have the same dispose behavior.

3. All subclasses of RocksObject now should implement dispose0()
instead of dispose(), and dispose0() will be called only when
isInitialized() returns true.

Test Plan:
make rocksdbjava
make jtest

Reviewers: dhruba, sdong, ankgup87, rsumbaly, swapnilghike, zzbennett, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18801
2014-05-28 18:16:29 -07:00
Yueh-Hsuan Chiang
663627abf3 Merge pull request #153 from bpot/fix_jni_segfault
Segfault when using BackupableDB with Java bindings
2014-05-27 14:07:22 -07:00
Yueh-Hsuan Chiang
8e41e54e1b [Java] Makes DbBenchmark takes 0 and 1 as boolean values.
Summary:
Originally, the boolean arguments in Java DB benchmark only takes
true and false as valid input.  This diff allows boolean arguments
to be set using 0 or 1.

Test Plan:
make rocksdbjava
make jdb_bench
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=1
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=0
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=1

Reviewers: haobo, dhruba, sdong, ankgup87, rsumbaly, swapnilghike, zzbennett

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18687
2014-05-27 13:52:44 -07:00
Ankit Gupta
84cbafa5ce Merge branch 'master' of https://github.com/facebook/rocksdb 2014-05-27 13:04:16 -07:00
Bob Potter
05dd018353 BackupableDB: don't keep a reference to the RocksDB object now that we have disOwnNativeObject 2014-05-27 13:54:02 -05:00
Igor Canadi
f068d2a94d Move master version to 3.2 2014-05-23 10:27:56 -07:00
Igor Canadi
3a1cf1281b Run FIFO compaction as part of db_crashtest2.py
Summary: As title

Test Plan: ran it

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D18783
2014-05-22 10:24:24 -07:00
Bob Potter
5fe176d8f6 Java Bindings: prevent segfault from double delete
Give BackupableDB sole responsibility of the native
object to prevent RocksDB from also trying to delete
it.
2014-05-21 18:47:18 -05:00
Igor Canadi
1fd4654de5 Update HISTORY.md 2014-05-21 15:25:05 -07:00
Igor Canadi
6de6a06631 FIFO compaction style
Summary:
Introducing new compaction style -- FIFO.

FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.

FIFO compaction style is suited for storing high-frequency event logs.

Test Plan: Added a unit test

Reviewers: dhruba, haobo, sdong

Reviewed By: dhruba

Subscribers: alberts, leveldb

Differential Revision: https://reviews.facebook.net/D18765
2014-05-21 11:43:35 -07:00
Igor Canadi
220132b65e Merge pull request #156 from Chilledheart/print_pthread_info
Print pthread_t in a more safe way
2014-05-21 10:26:09 -07:00
Chilledheart
81b498bc15 Print pthread_t in a more safe way 2014-05-22 01:24:42 +08:00
Igor Canadi
62d2b92075 Merge pull request #154 from mrsqueeze/master
hdfs cleanup and compile test against CDH 4.4.
2014-05-21 09:25:43 -07:00
Ankit Gupta
260842a7f8 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-05-21 07:59:50 -07:00
Ankit Gupta
d271cc5b4e Treat negative values as no limit 2014-05-21 07:54:22 -07:00
Mike Orr
591f71285c cleanup exception text 2014-05-21 07:54:22 -04:00
Mike Orr
d788bb8f71 - hdfs cleanup; fix to NewDirectory to comply with definition in env.h
- fix compile error with env_test; static casts added
2014-05-21 07:50:37 -04:00
Igor Canadi
f725e4fe1f Make RateLimiting unit test less flakey 2014-05-20 17:09:38 -07:00
Igor Canadi
b2cf95fe38 Call EnableFileDeletions with false as argument 2014-05-20 14:28:51 -07:00
Mike Orr
c2fda55cfe hdfs cleanup and compile test against CDH 4.4. 2014-05-20 17:22:12 -04:00
sdong
bd1105aa5a Print out thread ID while thread terminates for decreased pool size.
Summary: Per request from @nkg-, temporarily print thread ID when a thread terminates. It is a temp solution as we try to minimized stderr messages.

Test Plan: env_test

Reviewers: haobo, igor, dhruba

Reviewed By: igor

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18753
2014-05-19 15:18:02 -07:00
sdong
3df07d1703 ThreadPool to allow decrease number of threads and increase of number of threads is to be instantly scheduled
Summary:
Add a feature to decrease the number of threads in thread pool.
Also instantly schedule more threads if number of threads is increased.

Here is the way it is implemented: each background thread needs its thread ID. After decreasing number of threads, all threads are woken up. The thread with the largest thread ID will terminate. If there are more threads to terminate, the thread will wake up all threads again.

Another change is made so that when number of threads is increased, more threads are created and all previous excessive threads are woken up to do the work.

Test Plan: Add a unit test.

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: yhchiang, igor, nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18675
2014-05-19 11:52:12 -07:00
Ankit Gupta
5a1b41cee1 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-05-18 22:41:00 -07:00
Ankit Gupta
e87973cde1 Removing code from portal.h for setting handle of RestoreOptions and RestoreBackupableDB 2014-05-18 22:40:48 -07:00