Commit Graph

1969 Commits

Author SHA1 Message Date
Evan Shaw
0bf5589c74 Fix compaction_filter.h typos 2014-07-08 08:11:01 +12:00
Ankit Gupta
97bfcd6a16 Update doc of options.cc 2014-07-07 12:26:06 -07:00
Ankit Gupta
c9cfb88256 Remove doc from options class 2014-07-07 12:21:15 -07:00
Ankit Gupta
4efca96bb3 Change enum type from int to byte 2014-07-07 12:18:55 -07:00
Radheshyam Balasundaram
f0660d5253 Adding NUMA support to db_bench tests
Summary:
Changes:
- Adding numa_aware flag to db_bench.cc
- Using numa.h library to bind memory and cpu of threads to a fixed NUMA node
Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of the pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and whether we can achieve better micros/op.
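
For reference, a minimal sketch of the kind of per-thread NUMA binding described above, using libnuma; the helper name and node-selection logic are illustrative, not the actual db_bench code:

    #include <numa.h>   // from libnuma; link with -lnuma
    #include <cstdio>

    // Pin the calling thread's scheduling and preferred memory allocations
    // to a single NUMA node. Returns false if NUMA is unavailable.
    bool BindThreadToNode(int node) {
      if (numa_available() < 0) {
        std::fprintf(stderr, "NUMA is not available on this machine\n");
        return false;
      }
      numa_run_on_node(node);    // restrict scheduling to CPUs of `node`
      numa_set_preferred(node);  // prefer memory allocations from `node`
      return true;
    }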

Test Plan:
Ran db_bench tests using following command:
./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True]

The tests were run on a private devserver with 24 cores, and the db was prepopulated using the filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True.

Reviewers: sdong, yhchiang, ljin, igor

Reviewed By: ljin, igor

Subscribers: igor, leveldb

Differential Revision: https://reviews.facebook.net/D19353
2014-07-07 10:53:31 -07:00
Ankit Gupta
e8c592c625 Add newline 2014-07-07 10:25:27 -07:00
Igor Canadi
0bc5fa9f40 Merge pull request #190 from edsrzf/c-api-writebatch-serialized
C API: support constructing write batch from serialized representation
2014-07-07 10:17:43 -07:00
Ankit Gupta
47f0cf6d38 Add docs 2014-07-07 10:06:28 -07:00
Ankit Gupta
da12f9ec4e Remove .so package with JNI 2014-07-07 10:01:28 -07:00
Ankit Gupta
54d7a2c0c5 Add compression type to options 2014-07-07 09:58:54 -07:00
Igor Canadi
efbb4bd387 Merge pull request #193 from rdallman/columnfamilies
C API: column family support
2014-07-07 09:47:30 -07:00
Ankit Gupta
bc708e0012 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-07-07 09:14:32 -07:00
Reed Allman
1a34aaaef0 C API: column family support 2014-07-07 01:41:01 -07:00
Evan Shaw
9fc23d0c56 C API: support constructing write batch from serialized representation 2014-07-06 10:36:33 +12:00
Yueh-Hsuan Chiang
7b85c1e900 Improve SimpleWriteTimeoutTest to avoid false alarm.
Summary:
SimpleWriteTimeoutTest has two parts: 1) insert two large key/values
to fill the memtable and expect both of them to succeed; 2) insert
another key/value and expect it to time out.  Previously we also
set a timeout in the first step, but this could sometimes cause
a false alarm.

This diff makes the first two writes run without timeout setting.

Test Plan:
export ROCKSDB_TESTS=Time
make db_test

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19461
2014-07-04 00:02:12 -07:00
Lei Jin
8d9a46fcd1 initialize decoded_internal_key_valid
Summary:
ReadInternalKey() will assign the correct value anyway. Initialize it to
true to suppress the compiler error reported in
https://github.com/facebook/rocksdb/issues/186

Test Plan: I cannot reproduce it but this is obvious

Reviewers: sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19467
2014-07-03 23:13:08 -07:00
Yueh-Hsuan Chiang
d33657a4a5 Fixed a warning in release mode.
Summary: Removed a variable that is only used in an assertion check.

Test Plan: make release

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19455
2014-07-03 17:19:17 -07:00
Yueh-Hsuan Chiang
90a6aca48e Finer report I/O stats about Flush and Compaction.
Summary:
This diff allows the I/O stats about Flush and Compaction to be reported
more accurately.  Instead of measuring the size of a file, it
measures I/O cost on a per-read / per-write basis.

Test Plan: make all check

Reviewers: sdong, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19383
2014-07-03 16:28:03 -07:00
Yueh-Hsuan Chiang
a1df6c1fc8 Update HISTORY.md to include TimeOut write API and compaction update.
Summary: Update HISTORY.md to include TimeOut write API and compaction update.

Test Plan: n/a

Reviewers: ljin, sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19449
2014-07-03 15:58:27 -07:00
Yueh-Hsuan Chiang
d4d338de33 Add timeout_hint_us to WriteOptions and introduce Status::TimeOut.
Summary:
This diff adds timeout_hint_us to WriteOptions.  If it's non-zero, then
1) writes associated with these options MAY be aborted when they have been
  waiting longer than the specified time.  If that happens, the
  associated writes will return Status::TimeOut.
2) the stall time of the associated write caused by flush or compaction
  will be limited by timeout_hint_us.

The default value of timeout_hint_us is 0 (i.e., OFF.)

The statistics of timeout writes will be recorded in WRITE_TIMEDOUT.
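
A minimal usage sketch, assuming an already-open rocksdb::DB*; the IsTimedOut() accessor is how the returned status would typically be inspected, and should be treated as an assumption here:

    #include "rocksdb/db.h"

    // Sketch only: `db` is an already-open rocksdb::DB*.
    void TimedPut(rocksdb::DB* db) {
      rocksdb::WriteOptions write_options;
      write_options.timeout_hint_us = 50000;  // give up after ~50ms of stalling

      rocksdb::Status s = db->Put(write_options, "key", "value");
      if (s.IsTimedOut()) {
        // The write waited longer than timeout_hint_us and was aborted.
      }
    }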

Test Plan:
export ROCKSDB_TESTS=WriteTimeoutAndDelayTest
make db_test
./db_test

Reviewers: igor, ljin, haobo, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18837
2014-07-03 15:47:02 -07:00
Ankit Gupta
6b835c6009 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-07-03 14:13:46 -07:00
Ankit Gupta
70aa243fa1 Package .so file with .jar 2014-07-03 14:13:05 -07:00
Igor Canadi
4203431e71 Fix mac os compile error 2014-07-03 23:03:24 +02:00
Yueh-Hsuan Chiang
6580685260 Add TimedWait() API to CondVar.
Summary:
Add TimedWait() API to CondVar, which will be used in the future to
support the TimedOut write API and the rate limiter.
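
The port-level CondVar interface is internal to RocksDB; as a rough analog of what a timed wait looks like, here is a sketch using the standard library (std::condition_variable, not RocksDB's type):

    #include <chrono>
    #include <condition_variable>
    #include <mutex>

    std::mutex mu;
    std::condition_variable cv;
    bool ready = false;

    // Wait up to one second for `ready`; returns false if the wait timed out.
    bool TimedWaitForReady() {
      std::unique_lock<std::mutex> lock(mu);
      return cv.wait_for(lock, std::chrono::seconds(1), [] { return ready; });
    }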

Test Plan: make db_test -j32

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19431
2014-07-03 10:22:08 -07:00
sdong
2459f7ec4e Support Multiple DB paths (without having an interface to expose to users)
Summary:
In this patch, we allow RocksDB to support multiple DB paths internally.
No user interface is exposed yet, so this patch is invisible to users.

Test Plan: make all check

Reviewers: igor, haobo, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18921
2014-07-02 21:14:44 -07:00
Igor Canadi
f146cab261 Centralize compression decision to compaction picker
Summary:
Before this diff, we decide enable_compression in CompactionPicker and then decide the final compression type in DBImpl. This is kind of confusing.

After the diff, the final compression type will be decided in CompactionPicker.

The reason for this is that I want CompactFiles() to specify output compression type, so that people can mix and match compression styles in their compaction algorithms. This diff makes it much easier to do that.

Test Plan: make check

Reviewers: dhruba, haobo, sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19137
2014-07-02 20:40:57 +02:00
Igor Canadi
d3f63f03ad Fix 32-bit errors
Summary: https://www.facebook.com/groups/rocksdb.dev/permalink/590438347721350/

Test Plan: compiles

Reviewers: sdong, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19197
2014-07-02 11:40:16 +02:00
sdong
1d05006740 Re-commit the correct part (WalDir) of the revision:
Commit 6634844dba by sdong
Two small fixes in db_test

Summary:
Two fixes:
(1) Make WalDir pick a directory under TmpDir, allowing two tests to run in parallel without impacting each other
(2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it.

Test Plan: ./db_test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: nkg-, igor, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D19389
2014-07-01 18:54:50 -07:00
sdong
30b20604db Revert "Two small fixes in db_test"
This reverts commit 6634844dba.
2014-07-01 17:41:38 -07:00
sdong
9c332aa11a HashLinkList memtable switches a bucket to a skip list to reduce performance outliers
Summary:
In this patch, we enhance the HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to a skip list in this case to enable binary search.

Add a threshold_use_skiplist parameter to determine when a bucket needs to switch to a skip list.

The new data structure is documented in comments in the code.
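
A conceptual sketch of the bucket behavior described above; std::vector and std::set stand in for the real linked list and skip list, and all names are illustrative rather than the actual memtable code:

    #include <algorithm>
    #include <cstddef>
    #include <set>
    #include <string>
    #include <vector>

    // A bucket starts as a small sorted list and upgrades to an ordered
    // structure once it would hold more than threshold_use_skiplist entries.
    struct Bucket {
      std::vector<std::string> list;  // cheap to scan while the bucket is small
      std::set<std::string> ordered;  // stand-in for the skip list
      bool use_ordered = false;
    };

    void Insert(Bucket& b, const std::string& key,
                std::size_t threshold_use_skiplist) {
      if (!b.use_ordered && b.list.size() + 1 > threshold_use_skiplist) {
        b.ordered.insert(b.list.begin(), b.list.end());  // one-time conversion
        b.list.clear();
        b.use_ordered = true;
      }
      if (b.use_ordered) {
        b.ordered.insert(key);  // O(log n) search/insert
      } else {
        b.list.insert(std::upper_bound(b.list.begin(), b.list.end(), key), key);
      }
    }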

Test Plan:
make all check
set threshold_use_skiplist in several tests

Reviewers: yhchiang, haobo, ljin

Reviewed By: yhchiang, ljin

Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D19299
2014-07-01 17:14:15 -07:00
sdong
6634844dba Two small fixes in db_test
Summary:
Two fixes:
(1) Make WalDir pick a directory under TmpDir, allowing two tests to run in parallel without impacting each other
(2) kBlockBasedTableWithWholeKeyHashIndex is disabled by mistake (I assume). Enable it.

Test Plan: ./db_test

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: nkg-, igor, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D19389
2014-07-01 16:58:03 -07:00
Feng Zhu
c4018e771c In tools/db_stress.cc, set proper value in NewHashSkipListRepFactory's bucket_size
Summary:
    Now that the arena is used to allocate space for the hash skip list's buckets, the bucket size
    needs to be set small enough to avoid the "should_flush_" assertion failure in the memtable.

Test Plan:
    make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: igor

Differential Revision: https://reviews.facebook.net/D19371
2014-07-01 11:02:42 -07:00
Igor Canadi
f5d4df1c02 Fix compile error 2014-07-01 10:55:03 +02:00
Igor Canadi
a2e0d890ed No need for files_by_size_ in universal compaction
Summary: files_by_size_ is sorted by time in the case of universal compaction. However, Version::files_ is also sorted by time, so there is no need for files_by_size_.

Test Plan:
1) make check with the change
2) make check with `assert(last_index == c->input_version_->files_[level].size() - 1);` in compaction picker

Reviewers: dhruba, haobo, yhchiang, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19125
2014-07-01 08:55:04 +02:00
Feng Zhu
5656367416 use arena to allocate memtable's bloomfilter and hashskiplist's buckets_
Summary:
    Bloomfilter and hashskiplist's buckets_ are now allocated by the memtable's arena:
    DynamicBloom: pass the arena via the constructor; allocate space in SetTotalBits.
    HashSkipListRep: allocate space for buckets_ using the arena;
       do not delete it in the destructor because the arena takes care of it.
    Several test files are changed accordingly.
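
A sketch of the ownership pattern being adopted; the Arena and HashRep types here are simplified stand-ins, not RocksDB's classes:

    #include <cstddef>
    #include <memory>
    #include <vector>

    // Simplified arena: every allocation lives until the arena itself is
    // destroyed, so consumers never free individual blocks.
    class Arena {
     public:
      char* Allocate(std::size_t bytes) {
        blocks_.emplace_back(new char[bytes]);
        return blocks_.back().get();
      }

     private:
      std::vector<std::unique_ptr<char[]>> blocks_;
    };

    class HashRep {
     public:
      HashRep(Arena* arena, std::size_t bucket_count)
          : buckets_(reinterpret_cast<void**>(
                arena->Allocate(bucket_count * sizeof(void*)))) {}
      // No delete[] here: the arena owns the buckets_ memory and releases it
      // together with the memtable.
      ~HashRep() = default;

     private:
      void** buckets_;
    };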

Test Plan:
    make all check

Reviewers: ljin, haobo, yhchiang, sdong

Reviewed By: sdong

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D19335
2014-06-30 15:54:31 -07:00
Yueh-Hsuan Chiang
f799c8be5f [Java] Correct the library loading for zlib in RocksJava.
Summary:
Correct the library loading for zlib in RocksJava: zlib should
be loaded by loadLibrary("z") instead of loadLibrary("zlib").

Test Plan:
make rocksdbjava
cd java
make db_bench
./jdb_bench.sh --compression_type=zlib

Reviewers: sdong, ljin, ankgup87

Reviewed By: ankgup87

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19341
2014-06-29 12:35:47 -07:00
Yueh-Hsuan Chiang
918a355e00 Merge pull request #183 from ankgup87/master
[Java] Merge pull request #183 --- Add statistics collector utility

Summary:
1. Add a statistics collector which collects statistics periodically, at a period specified by the caller.
2. Added a unit test for StatsCollector.

make rocksdbjava
make test

Reviewers: haobo, yhchiang, sdong, dhruba, rsumbaly, zzbennett, swapnilghike
Reviewed By: yhchiang
CC: leveldb

Differential Revision: https://reviews.facebook.net/19215
2014-06-29 10:29:26 -07:00
Ankit Gupta
b2625a4cc1 Add java docs for StatsCollector constructor and StatsCollectorCallback thread safety 2014-06-29 07:52:27 -07:00
Ankit Gupta
abf9920386 Merge branch 'master' of https://github.com/facebook/rocksdb 2014-06-28 11:22:33 -07:00
sdong
dd337bc0b2 In logging format, use PRIu64 instead of casting
Summary: Code cleanup: since we are already using __STDC_FORMAT_MACROS when printing uint64_t, change other places as well. Only logging is changed.
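
For reference, the pattern being standardized on (a generic example, not a line from the diff):

    // Some toolchains require this macro before <inttypes.h> in C++ code.
    #define __STDC_FORMAT_MACROS
    #include <inttypes.h>
    #include <cstdio>

    int main() {
      uint64_t file_number = 12345;
      // PRIu64 expands to the correct conversion specifier for uint64_t,
      // so no cast to unsigned long long is needed.
      std::printf("file number: %" PRIu64 "\n", file_number);
      return 0;
    }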

Test Plan: make all check

Reviewers: ljin

Reviewed By: ljin

Subscribers: dhruba, yhchiang, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D19113
2014-06-27 16:34:15 -07:00
Stanislau Hlebik
a3594867ba Cache some conditions for DBImpl::MakeRoomForWrite
Summary:
Task 4580155. Some conditions in DBImpl::MakeRoomForWrite can be cached in
ColumnFamilyData, because their values can change only during compaction,
when adding a new memtable, and/or when the compaction score is recalculated.

These conditions are:

cfd->imm()->size() ==  cfd->options()->max_write_buffer_number - 1
cfd->current()->NumLevelFiles(0) >=  cfd->options()->level0_stop_writes_trigger
cfd->options()->soft_rate_limit > 0.0 &&
    (score = cfd->current()->MaxCompactionScore()) >  cfd->options()->soft_rate_limit
cfd->options()->hard_rate_limit > 1.0 &&
    (score = cfd->current()->MaxCompactionScore()) >  cfd->options()->hard_rate_limit

P.S.
As it's my first diff, Siying suggested adding everybody as reviewers
for this diff. Sorry if I forgot someone or added someone by mistake.

Test Plan: make all check

Reviewers: haobo, xjin, dhruba, yhchiang, zagfox, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19311
2014-06-26 16:45:27 -07:00
Yueh-Hsuan Chiang
81c5d98900 Fixed a comparison between signed and unsigned integers in options.cc
Summary:
Fixed the following warning:

util/options.cc: In constructor ‘rocksdb::ColumnFamilyOptions::ColumnFamilyOptions(const rocksdb::Options&)’:
util/options.cc:157:58: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
   if (max_bytes_for_level_multiplier_additional.size() < num_levels) {
                                                             ^
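
One typical way to silence this class of warning, sketched with illustrative names rather than the actual options.cc code:

    #include <cstddef>
    #include <vector>

    // Cast the signed level count to size_t before comparing it with a
    // container size (assumes num_levels has been validated as non-negative).
    void EnsureLevelEntries(std::vector<int>& per_level, int num_levels) {
      if (num_levels > 0 &&
          per_level.size() < static_cast<std::size_t>(num_levels)) {
        per_level.resize(num_levels, 1);
      }
    }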

Test Plan: make all check

Reviewers: haobo, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19293
2014-06-25 15:31:30 -06:00
sdong
19de6a7aad Remove MemTableRep::GetIterator(const Slice& slice)
Summary: It seems to me that when ever function MemTableRep::GetIterator(const Slice& slice) is used, we can use MemTableRep::GetDynamicPrefixIterator() instead. Just delete it to simplify the codes.

Test Plan: make all check

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D19281
2014-06-25 14:09:29 -07:00
Yueh-Hsuan Chiang
55531fd089 Fixed heap-buffer-overflow issue when Options.num_levels > 7.
Summary:
Currently, when num_levels is changed to > 7,
max_bytes_for_level_multiplier_additional is not resized internally.
As a result, max_bytes_for_level_multiplier_additional.size() will
be smaller than num_levels, which causes a heap-buffer-overflow.

Test Plan: make all check

Reviewers: haobo, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19275
2014-06-25 14:11:12 -06:00
Yueh-Hsuan Chiang
8898a0a0d1 Reorder the member variables of FileMetaData to improve cache locality.
Summary:
Move the stats-related member variables of FileMetaData to the bottom to
improve cache locality for normal DB operations.

Test Plan: make

Reviewers: haobo, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19287
2014-06-24 19:22:11 -06:00
Yueh-Hsuan Chiang
e813f5b6d9 Allow compaction to reclaim storage more effectively.
Summary:
This diff allows compaction to reclaim storage more effectively.
In the current design, compactions are mainly triggered based on
file sizes.  However, since deletion entries do not carry
values, files which have many deletion entries are less likely
to be compacted.  As a result, it may take a while for
deletion entries to be compacted.

This diff addresses the issue by compensating for the size of deletion
entries during the compaction process: the size of each deletion
entry in the compaction process is augmented by 2x the average
value size.  The diff applies to both leveled and universal
compactions.
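
A sketch of the compensation described above; the function and field names are illustrative arithmetic, not RocksDB's actual fields:

    #include <cstdint>

    // A file with many deletion entries looks bigger to the compaction picker,
    // so it gets picked sooner even though tombstones carry no values.
    uint64_t CompensatedFileSize(uint64_t raw_file_size,
                                 uint64_t num_deletions,
                                 uint64_t average_value_size) {
      return raw_file_size + num_deletions * 2 * average_value_size;
    }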

Test Plan:
develop CompactionDeletionTrigger
make db_test
./db_test

Reviewers: haobo, igor, ljin, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19029
2014-06-24 16:37:06 -06:00
Yueh-Hsuan Chiang
faa8d21922 Improve an assertion in RandomGenerator::Generate() in db_bench.
Summary:
RandomGenerator::Generate() currently has an assertion len < data_.size().
However, it is actually fine to have len == data_.size().
This diff changes the assertion to len <= data_.size().

Test Plan:
make db_bench
./db_bench

Reviewers: haobo, sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19269
2014-06-24 15:29:28 -06:00
Yueh-Hsuan Chiang
85f9bb4ef4 [Java] Enable compression_ratio option in DbBenchmark.java
Summary:
Enable the random values in Java DB Bench to be generated based
on the compression_ratio specified in the command-line arguments.

Test Plan:
make rocksdbjava
java/jdb_bench.sh

Reviewers: sdong, ankgup87, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19101
2014-06-24 15:12:36 -06:00
Yueh-Hsuan Chiang
e7c9412a93 Merge pull request #181 from nanwu/master
Corrected the output message in mac-install-gflags.sh
2014-06-23 23:55:24 -06:00
nawu
fb54eef744 escaped the special characters and added a dot 2014-06-24 00:14:02 -05:00