Commit Graph

1786 Commits

Author SHA1 Message Date
Igor Canadi
91ddd587cc Only signal cond variable if need to
Summary:
At the end of BackgroundCallCompaction(), we call SignalAll(), even though we don't need to. If compaction hasn't done anything and there's another compaction running, there is no need to signal on the condition variable. Doing so creates a tight feedback loop which results in log files like:

   wait for memtable flush
   compaction nothing to do
   wait for memtable flush
   compaction nothing to do

This change eliminates that

Test Plan:
make check
Also:

    icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
    7435
    icanadi@dev1440 ~ $ grep "nothing to do" /fast-rocksdb-tmp/rocksdb_test/column_family_test/LOG | wc -l
    372

First version is before the change, second version is after the change.

Reviewers: dhruba, ljin, haobo, yhchiang, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18855
2014-06-02 17:23:55 -07:00
Igor Canadi
8cb7ad83c3 Flush stale column families less aggressively
Summary:
We've seen some production issues where column family is detected as stale, although there is only one column family in the system. This is a quick fix that:
1) doesn't flush stale column families if there's only one of them
2) Use 4 as a coefficient instead of 2 for determening when a column family is stale. This will make flushing less aggressive, while still keep a nice dynamic flushing of very stale CFs.

Test Plan: make check

Reviewers: dhruba, haobo, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18861
2014-06-02 15:33:54 -07:00
sdong
593bb2c40b db_stress to add an option to periodically change background thread pool size.
Summary: Add an option to indicates the variation of background threads. If the flag is not 0, for every 100 milliseconds, adjust thread pool size to a value within the range.

Test Plan: run db_stress

Reviewers: haobo, dhruba, igor, ljin

Reviewed By: ljin

Subscribers: leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D18759
2014-06-02 10:32:09 -07:00
Lei Jin
388d2054c7 forward iterator
Summary:
Forward iterator puts everything together in a flat structure instead of
a hierarchy of nested iterators. this should simplify the code and
provide better performance. It also enables more optimization since all
information are accessiable in one place.
Init evaluation shows about 6% improvement

Test Plan: db_test and db_bench

Reviewers: dhruba, igor, tnovak, sdong, haobo

Reviewed By: haobo

Subscribers: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D18795
2014-05-30 14:31:55 -07:00
Lei Jin
f29c62fc6f add an iterator refresh option for SeekRandom
Summary: One more option to allow iterator refreshing when using normal iterator

Test Plan: ran db_bench

Reviewers: haobo, sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18849
2014-05-30 14:09:22 -07:00
sdong
9899b12780 ThreadID printed when Thread terminating in the same format as posix_logger
Summary: 220132b65e correctly fixed the issue of thread ID printing when terminating a thread. Nothing wrong with it. This diff prints the ID in the same way as in PosixLogger::logv() so that users can be more easily to correlates them.

Test Plan: run env_test and make sure it prints correctly.

Reviewers: igor, haobo, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18819
2014-05-29 11:11:08 -07:00
Yueh-Hsuan Chiang
ab3e566699 [Java] Generalize dis-own native handle and refine dispose framework.
Summary:
1. Move disOwnNativeHandle() function from RocksDB to RocksObject
to allow other RocksObject to use disOwnNativeHandle() when its
ownership of native handle has been transferred.

2. RocksObject now has an abstract implementation of dispose(),
which does the following two things.  First, it checks whether
both isOwningNativeHandle() and isInitialized() return true.
If so, it will call the protected abstract function dispose0(),
which all the subclasses of RocksObject should implement.  Second,
it sets nativeHandle_ = 0.  This redesign ensure all subclasses
of RocksObject have the same dispose behavior.

3. All subclasses of RocksObject now should implement dispose0()
instead of dispose(), and dispose0() will be called only when
isInitialized() returns true.

Test Plan:
make rocksdbjava
make jtest

Reviewers: dhruba, sdong, ankgup87, rsumbaly, swapnilghike, zzbennett, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18801
2014-05-28 18:16:29 -07:00
Yueh-Hsuan Chiang
663627abf3 Merge pull request #153 from bpot/fix_jni_segfault
Segfault when using BackupableDB with Java bindings
2014-05-27 14:07:22 -07:00
Yueh-Hsuan Chiang
8e41e54e1b [Java] Makes DbBenchmark takes 0 and 1 as boolean values.
Summary:
Originally, the boolean arguments in Java DB benchmark only takes
true and false as valid input.  This diff allows boolean arguments
to be set using 0 or 1.

Test Plan:
make rocksdbjava
make jdb_bench
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=1
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=0
java/jdb_bench.sh --db=/tmp/path-not-exist --use_existing_db=1

Reviewers: haobo, dhruba, sdong, ankgup87, rsumbaly, swapnilghike, zzbennett

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18687
2014-05-27 13:52:44 -07:00
Bob Potter
05dd018353 BackupableDB: don't keep a reference to the RocksDB object now that we have disOwnNativeObject 2014-05-27 13:54:02 -05:00
Igor Canadi
f068d2a94d Move master version to 3.2 2014-05-23 10:27:56 -07:00
Igor Canadi
3a1cf1281b Run FIFO compaction as part of db_crashtest2.py
Summary: As title

Test Plan: ran it

Reviewers: dhruba

Reviewed By: dhruba

Differential Revision: https://reviews.facebook.net/D18783
2014-05-22 10:24:24 -07:00
Bob Potter
5fe176d8f6 Java Bindings: prevent segfault from double delete
Give BackupableDB sole responsibility of the native
object to prevent RocksDB from also trying to delete
it.
2014-05-21 18:47:18 -05:00
Igor Canadi
1fd4654de5 Update HISTORY.md 2014-05-21 15:25:05 -07:00
Igor Canadi
6de6a06631 FIFO compaction style
Summary:
Introducing new compaction style -- FIFO.

FIFO compaction style has write amplification of 1 (+1 for WAL) and it deletes the oldest files when the total DB size exceeds pre-configured values.

FIFO compaction style is suited for storing high-frequency event logs.

Test Plan: Added a unit test

Reviewers: dhruba, haobo, sdong

Reviewed By: dhruba

Subscribers: alberts, leveldb

Differential Revision: https://reviews.facebook.net/D18765
2014-05-21 11:43:35 -07:00
Igor Canadi
220132b65e Merge pull request #156 from Chilledheart/print_pthread_info
Print pthread_t in a more safe way
2014-05-21 10:26:09 -07:00
Chilledheart
81b498bc15 Print pthread_t in a more safe way 2014-05-22 01:24:42 +08:00
Igor Canadi
62d2b92075 Merge pull request #154 from mrsqueeze/master
hdfs cleanup and compile test against CDH 4.4.
2014-05-21 09:25:43 -07:00
Mike Orr
591f71285c cleanup exception text 2014-05-21 07:54:22 -04:00
Mike Orr
d788bb8f71 - hdfs cleanup; fix to NewDirectory to comply with definition in env.h
- fix compile error with env_test; static casts added
2014-05-21 07:50:37 -04:00
Igor Canadi
f725e4fe1f Make RateLimiting unit test less flakey 2014-05-20 17:09:38 -07:00
Igor Canadi
b2cf95fe38 Call EnableFileDeletions with false as argument 2014-05-20 14:28:51 -07:00
Mike Orr
c2fda55cfe hdfs cleanup and compile test against CDH 4.4. 2014-05-20 17:22:12 -04:00
sdong
bd1105aa5a Print out thread ID while thread terminates for decreased pool size.
Summary: Per request from @nkg-, temporarily print thread ID when a thread terminates. It is a temp solution as we try to minimized stderr messages.

Test Plan: env_test

Reviewers: haobo, igor, dhruba

Reviewed By: igor

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18753
2014-05-19 15:18:02 -07:00
sdong
3df07d1703 ThreadPool to allow decrease number of threads and increase of number of threads is to be instantly scheduled
Summary:
Add a feature to decrease the number of threads in thread pool.
Also instantly schedule more threads if number of threads is increased.

Here is the way it is implemented: each background thread needs its thread ID. After decreasing number of threads, all threads are woken up. The thread with the largest thread ID will terminate. If there are more threads to terminate, the thread will wake up all threads again.

Another change is made so that when number of threads is increased, more threads are created and all previous excessive threads are woken up to do the work.

Test Plan: Add a unit test.

Reviewers: haobo, dhruba

Reviewed By: haobo

CC: yhchiang, igor, nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18675
2014-05-19 11:52:12 -07:00
Igor Canadi
1e560459b9 Add .swp to gitignore 2014-05-16 13:17:29 -07:00
Igor Canadi
a0e9ee5965 Add rapidjson to RocksDB
Summary:
This diff adds rapidjson (https://code.google.com/p/rapidjson/) to RocksDB repository. First step to adding JSON is to add a JSON parser :)

I'm not sure if rapidjson is the right choice. I only considered folly as an alternative. Using folly JSON parser has serious downsides. I tried extracting folly::json from the folly library. However, it depends on folly::dynamic, which basically depends on everything else. I would prefer to avoid adding complete folly library as RocksDB dependency. Folly has a lot of its own dependencies (https://github.com/facebook/folly) and also looks like open source world has some trouble installing it (https://groups.google.com/forum/#!forum/facebook-folly -- 60% of the posts are compile errors). We can discuss this if you think otherwise.

RapidJSON does not have any dependencies whatsoever. No boost, no STL. We don't need to compile it since it's all header files. The performance results are also impressive: https://code.google.com/p/rapidjson/wiki/Performance. However, I'll make sure to write code in a way that we can easily switch JSON implementations going forward. Quora thread has some alternatives: http://www.quora.com/What-is-the-best-C-JSON-library

Test Plan: none

Reviewers: dhruba, haobo, yhchiang, sdong, jamesgpearce

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18729
2014-05-15 16:07:05 -07:00
Kai Liu
0b3d03d026 Materialize the hash index
Summary:
Materialize the hash index to avoid the soaring cpu/flash usage
when initializing the database.

Test Plan: existing unit tests passed

Reviewers: sdong, haobo

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18339
2014-05-15 14:09:03 -07:00
sdong
4e0602f941 Remove maximum key_size check in db_bench
Summary: Key size limit doesn't seem to be applicable anymore. Remove it.

Test Plan: run a couple of tests in db_bench

Reviewers: haobo, igor, yhchiang, dhruba

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18723
2014-05-15 11:06:37 -07:00
Igor Canadi
f3ec191fc5 Add build status to README.md 2014-05-15 10:16:58 -07:00
Igor Canadi
2ca7a9886d Merge pull request #150 from Chilledheart/master
Fix errors while building with clang (against libc++) under linux
2014-05-15 00:20:34 -07:00
Chilledheart
fa16ef394c Fix errors while building with clang 2014-05-15 12:34:53 +08:00
Igor Canadi
c07c9606ed Expose Status::code() 2014-05-14 16:23:40 -07:00
Igor Canadi
cdc53dc120 Turn off travis notifications 2014-05-14 16:08:02 -07:00
Igor Canadi
52783c793c declare kInline size in arena.cc 2014-05-14 12:40:49 -07:00
Igor Canadi
a490d2d3a0 Revert "Define kInlineSize in .cc file instead of header"
This reverts commit 35a8873aa6.
2014-05-14 12:36:21 -07:00
Igor Canadi
35a8873aa6 Define kInlineSize in .cc file instead of header 2014-05-14 12:29:46 -07:00
Igor Canadi
eea73226e9 Improve EnvHdfs
Summary: Copy improvements from fbcode's version of EnvHdfs to our open-source version. Some very important bug fixes in there.

Test Plan: compiles

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18711
2014-05-14 12:14:18 -07:00
Igor Canadi
f4574449e9 Clean up compaction logging
Summary: Cleaned up compaction logging a little bit. Now file sizes are easier to read. Also, removed the trailing space.

Test Plan:
verified that i'm happy with logging output:

        files_size[#33(seq=101,sz=98KB,0) #31(seq=81,sz=159KB,0) #26(seq=0,sz=637KB,0)]

Reviewers: sdong, haobo, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18549
2014-05-14 12:13:50 -07:00
sdong
3e4a9ec241 Arena to inline 2KB of data in it.
Summary:
In order to use arena to a use case that the total allocation size might be small (LogBuffer is already such a case), inline 1KB of data in it, so that it can be mostly in stack or inline in another class.

If always inlining 2KB is a concern, I could make it a template to determine what to inline. However, dependents need to changes. Doesn't go with it for now

Test Plan: make all check.

Reviewers: haobo, igor, yhchiang, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18609
2014-05-14 11:49:01 -07:00
Igor Canadi
1ef31b6ca1 Merge pull request #143 from mlin/travis-ci
Travis CI
2014-05-14 10:44:22 -07:00
Yueh-Hsuan Chiang
9a0e3ab7ec Merge branch 'master' of github.com:facebook/rocksdb into show-reads 2014-05-13 16:45:54 -07:00
Yueh-Hsuan Chiang
e883407ead [Java] Refined the output of Java DbBenchmark. 2014-05-13 16:44:53 -07:00
sdong
8c2c4602ee FixedPrefixTransform to include prefix length in its name
Summary: As title

Test Plan: make all check.

Reviewers: haobo, igor, yhchiang

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18705
2014-05-13 16:08:21 -07:00
Yueh-Hsuan Chiang
e30dec938d [Java] Fixed a bug in Java DB Benchmark where random reads does not consider full key range.
Summary: Fixed a bug in Java DB Benchmark where random reads does not consider full key range.

Test Plan:
make rocksdbjava
make jdb_bench
cd java
jdb_bench.sh --db=/tmp/rocksdb-test --benchmarks=fillseq --use_existing_db=false --num=100000
jdb_bench.sh --db=/tmp/rocksdb-test --benchmarks=readrandom --use_existing_db=true --num=100000 --reads=1000000

Reviewers: haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18693
2014-05-13 13:19:50 -07:00
Yueh-Hsuan Chiang
93f2643656 [Java] Make read benchmarks print out found / not-found information.
Summary: Make read benchmarks print out found / not-found information.

Test Plan:
make rocksdbjava
make jdb_bench
cd java
./jdb_bench.sh

Reviewers: haobo, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18699
2014-05-13 13:11:26 -07:00
Igor Canadi
26f5dd9a5a TablePropertiesCollectorFactory
Summary:
This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options.

Here's description of task #4296714:
       I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB.

       For example, this table has 3155 entries and 1391828 deleted keys.

The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for.

Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API.

In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe.

Test Plan:
Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one).
Also, all other tests

Reviewers: sdong, dhruba, haobo, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18579
2014-05-13 12:30:55 -07:00
Yueh-Hsuan Chiang
2082a7d745 [Java] Temporary set the number of BG threads based on the number of BG compactions.
Summary:
Before the Java binding for Env is ready, Java developers have no way to
control the number of background threads.  This diff provides a temporary
solution where RocksDB.setMaxBackgroundCompactions() will affect the
number of background threads.

Note that once Env is ready.  Changes made in this diff should be reverted.

Test Plan:
make rocksdbjava
make jtest
make jdb_bench
java/jdb_bench.sh

Reviewers: haobo, sdong

Reviewed By: sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D18681
2014-05-13 12:28:47 -07:00
Yueh-Hsuan Chiang
1c7799d8aa Fixed a file-not-found issue when a log file is moved to archive.
Summary:
Fixed a file-not-found issue when a log file is moved to archive
by doing a missing retry.

Test Plan:
make db_test
export ROCKSDB_TEST=TransactionLogIteratorRace
./db_test

Reviewers: sdong, haobo

Reviewed By: sdong

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D18669
2014-05-12 17:50:21 -07:00
Yueh-Hsuan Chiang
d14581f936 [Java] Rename org.rocksdb.Iterator to org.rocksdb.RocksIterator.
Summary:
Renamed Iterator to RocksIterator to avoid potential confliction with
the Java built-in Iterator.

Test Plan:
make rocksdbjava
make jtest

Reviewers: haobo, dhruba, sdong, ankgup87

Reviewed By: sdong

CC: leveldb, rsumbaly, swapnilghike, zzbennett

Differential Revision: https://reviews.facebook.net/D18615
2014-05-12 11:02:25 -07:00