Commit Graph

2929 Commits

Author SHA1 Message Date
Venkatesh Radhakrishnan
153f4f0719 RocksDB: Allow Level-Style Compaction to Place Files in Different Paths
Summary:
Allow Level-style compaction to place files in different paths
This diff provides the code for task 4854591. We now support level-compaction
to place files in different paths by specifying them in db_paths along with
the minimum level for files to store in that path.

Test Plan: ManualLevelCompactionOutputPathId in db_test.cc

Reviewers: yhchiang, MarkCallaghan, dhruba, yoshinorim, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29799
2014-12-15 21:48:16 -08:00
Igor Canadi
06eed650a0 Optimize default compile to compilation platform by default
Summary:
This diff changes the build to optimize for the native platform by default. This automatically turns on crc32 optimizations for modern processors, which greatly improves rocksdb's performance.

I also made some more changes to the compilation documentation.

Test Plan:
compile with `make`, observe -march=native
compile with `PORTABLE=1 make`, observe no -march=native

Reviewers: sdong, rven, yhchiang, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30225
2014-12-15 11:29:41 +01:00
Qiao Yang
cef6f84393 Added 'dump_live_files' command to ldb tool.
Summary:
Preliminary diff to solicit comments.
Given a DB path, dump all SST files (key/value and properties), the WAL file, and the manifest
files. What command options do we need to support for this command? Maybe
output_hex for keys?

Test Plan: Create additional ldb unit tests.

Reviewers: sdong, rven

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29547
2014-12-12 17:50:36 -08:00
sdong
7ab1526c0e Add an assert and avoid std::sort(autovector) to investigate an ASAN issue
Summary:
The ASAN build failed once with this error:

14:04:52 ==== Test DBTest.CompactFilesOnLevelCompaction
14:04:52 db_test: db/version_set.cc:1062: void rocksdb::VersionStorageInfo::AddFile(int, rocksdb::FileMetaData*): Assertion `level <= 0 || level_files->empty() || internal_comparator_->Compare( (*level_files)[level_files->size() - 1]->largest, f->smallest) < 0' failed.

Not able to figure out the reason. We use std::vector for sorting to be safe and add one more assert to help figure out whether the sorting is the problem.

Test Plan: make all check

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D30117
2014-12-12 12:44:00 -08:00
Yueh-Hsuan Chiang
74b3fb6d97 Fix Mac compile errors on util/cache_test.cc
Summary:
Fix Mac compile errors on util/cache_test.cc

Test Plan:
make dbg -j32
./cache_test
2014-12-11 14:15:13 -08:00
sdong
d7a486668c Improve scalability of DB::GetSnapshot()
Summary: Currently DB::GetSnapshot() doesn't scale as column families are added, because it has to go through all the column families to find out whether snapshots are supported. This patch optimizes it.

Test Plan:
Add unit tests to cover negative cases.
make all check

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30093
2014-12-11 13:27:57 -08:00
Alexey Maykov
ee95cae9a4 Modified the LRU cache eviction code so that it doesn't evict blocks which have external references
Summary:
Currently, blocks which have more than one reference (ie referenced by something other than cache itself) are evicted from cache. This doesn't make much sense:
- blocks are still in RAM, so the RAM usage reported by the cache is incorrect
- if the same block is needed by another iterator, it will be loaded and decompressed again

This diff changes the reference counting scheme a bit. Previously, if the cache contained the block, this was accounted for in its refcount. After this change, the refcount is only used to track external references. There is a boolean flag which indicates whether or not the block is contained in the cache.
This diff also changes how LRU list is used. Previously, both hashtable and the LRU list contained all blocks. After this change, the LRU list contains blocks with the refcount==0, ie those which can be evicted from the cache.

Note that this change still allows the cache to grow beyond its capacity. This happens when all blocks are pinned (ie refcount>0). This is consistent with the current behavior. The cache's insert function never fails. I spent lots of time trying to make table_reader and other places work with an insert which might fail. It turned out to be pretty hard. It might really destabilize some customers, so finally, I decided against doing this.

The table_cache_remove_scan_count_limit option will be unneeded after this change, but I will remove it in a following diff if this one gets approved.
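
The scheme described above can be illustrated with a minimal sketch. This is not RocksDB's LRUCache: the class `PinnedLruCache`, its methods, and the string keys are all assumptions made for illustration. The refcount tracks only external references, and the LRU list holds only blocks with refcount == 0. Because the sketch has no explicit erase operation, the "contained in the cache" flag mentioned above is omitted.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch; names are assumptions, not RocksDB's API.
class PinnedLruCache {
 public:
  explicit PinnedLruCache(std::size_t capacity) : capacity_(capacity) {}

  // Adds a block and hands the caller one reference. Insert never fails, so
  // usage can exceed capacity while every block is pinned.
  void Insert(const std::string& key, std::size_t charge) {
    if (entries_.count(key) != 0) return;  // sketch only: no overwrite
    entries_.emplace(key, Entry{charge, 1, lru_.end()});
    usage_ += charge;
    EvictUnpinned();
  }

  // Pins an existing block; a pinned block leaves the LRU list.
  bool Lookup(const std::string& key) {
    auto it = entries_.find(key);
    if (it == entries_.end()) return false;
    if (it->second.refs == 0) lru_.erase(it->second.lru_pos);
    ++it->second.refs;
    return true;
  }

  // Drops one external reference; at zero the block becomes evictable.
  void Release(const std::string& key) {
    auto it = entries_.find(key);
    if (it == entries_.end() || it->second.refs == 0) return;
    if (--it->second.refs == 0) {
      it->second.lru_pos = lru_.insert(lru_.end(), key);
      EvictUnpinned();
    }
  }

  bool Contains(const std::string& key) const {
    return entries_.count(key) != 0;
  }
  std::size_t usage() const { return usage_; }

 private:
  struct Entry {
    std::size_t charge;
    int refs;                                  // external references only
    std::list<std::string>::iterator lru_pos;  // meaningful only if refs == 0
  };

  // Evicts the least recently released blocks until usage fits the capacity
  // or no unpinned block is left.
  void EvictUnpinned() {
    while (usage_ > capacity_ && !lru_.empty()) {
      auto it = entries_.find(lru_.front());
      usage_ -= it->second.charge;
      entries_.erase(it);
      lru_.pop_front();
    }
  }

  std::size_t capacity_;
  std::size_t usage_ = 0;
  std::list<std::string> lru_;  // holds only blocks with refs == 0
  std::unordered_map<std::string, Entry> entries_;
};
```

In this sketch, inserting three one-byte blocks into a cache of capacity 2 leaves usage at 3 until a caller releases one of them, at which point that block is the only eviction candidate.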

Test Plan: Ran tests, made sure they pass

Reviewers: sdong, ljin

Differential Revision: https://reviews.facebook.net/D25503
2014-12-10 22:28:53 -08:00
sdong
0ab0242f37 VersionBuilder to use unordered set and map to store added and deleted files
Summary: Set operations in VersionBuilder are shown to be a performance bottleneck when restarting a DB with lots of files. Make both added_files and deleted_files use an unordered set or map. The added files are sorted only when they are added.
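
A rough sketch of the idea, assuming a simplified file record: edits accumulate in hash containers with O(1) average insert and erase, and ordering is paid for once when the result is materialized. `FileMeta` and `LevelEditBuilder` are hypothetical names and do not reflect the actual VersionBuilder code.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a file's metadata.
struct FileMeta {
  std::uint64_t number;
  std::string smallest;  // smallest key in the file
};

// Hypothetical builder for one level's file list.
class LevelEditBuilder {
 public:
  void AddFile(const FileMeta& f) {
    deleted_.erase(f.number);
    added_[f.number] = f;
  }

  void DeleteFile(std::uint64_t number) {
    deleted_.insert(number);
    added_.erase(number);
  }

  // Applies the accumulated edits to `base`. Sorting happens only here.
  std::vector<FileMeta> SaveTo(const std::vector<FileMeta>& base) const {
    std::vector<FileMeta> out;
    for (const auto& f : base) {
      if (deleted_.count(f.number) == 0) out.push_back(f);
    }
    for (const auto& kv : added_) out.push_back(kv.second);
    std::sort(out.begin(), out.end(),
              [](const FileMeta& a, const FileMeta& b) {
                return a.smallest < b.smallest;
              });
    return out;
  }

 private:
  std::unordered_map<std::uint64_t, FileMeta> added_;
  std::unordered_set<std::uint64_t> deleted_;
};
```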

Test Plan: make all check

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: hermanlee4, leveldb, dhruba, ljin

Differential Revision: https://reviews.facebook.net/D30051
2014-12-10 18:53:30 -08:00
Lei Jin
e93f044d99 add range scan test to benchmark script
Summary: as title

Test Plan: ran it

Reviewers: yhchiang, igor, sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D25563
2014-12-10 13:04:58 -08:00
Igor Canadi
cb82d7b081 Fix #434
Summary: Why do we assert here? This doesn't seem like a user-friendly thing to do :)

Test Plan: none

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30027
2014-12-09 10:22:07 -08:00
sdong
046ba7d47c Fix calculation of max_total_wal_size in db_options_.max_total_wal_size == 0 case
Summary: This is a regression bug introduced by https://reviews.facebook.net/D24729 . max_total_wal_size would drift further and further from its target when a user holds the current super version after a flush or compaction. This patch fixes it.

Test Plan: make all check

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: ljin, yoshinorim, MarkCallaghan, hermanlee4, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29961
2014-12-08 15:26:35 -08:00
Yueh-Hsuan Chiang
1b7fbb9e82 Update HISTORY.md for release 3.9 2014-12-08 15:19:48 -08:00
Leonidas Galanis
635c61fd3b Fix problem with create_if_missing option when wal_dir is used
Summary: When wal_dir is used, DestroyDB is not passed the wal_dir option and so we get a Corruption exception.

Test Plan:
Verified manually that the following command line works now:
./db_bench --db=/mnt/db/rocksdb ... --disable_wal=0 --wal_dir=/data/users/rocksdb/WAL... --benchmarks=filluniquerandom --use_existing_db=0...

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29859
2014-12-08 12:53:24 -08:00
Yueh-Hsuan Chiang
2871bc7bc8 Merge pull request #422 from fyrz/RocksJava-Quality-Improvements
Rocks java quality improvements
2014-12-05 21:38:05 -08:00
Yueh-Hsuan Chiang
8c5781666e Add -fno-exceptions flag to ROCKSDB_LITE.
Summary: Add -fno-exceptions flag to ROCKSDB_LITE.

Test Plan: make OPT=-DROCKSDB_LITE shared_lib -j32

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29901
2014-12-05 21:34:20 -08:00
sdong
1f04066cab Add DBProperty to return number of snapshots and time for oldest snapshot
Summary:
Add a counter in SnapshotList to show the number of snapshots, and a Unix timestamp in every snapshot.
Add two DB properties to return the number of snapshots and the timestamp of the oldest one.
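
As an illustration only, the bookkeeping described above could look like the following. `SnapshotInfo` and `SnapshotTracker` are hypothetical names, and the sketch uses `std::list` rather than whatever list structure SnapshotList actually uses; it keeps an explicit counter only to mirror the description.

```cpp
#include <cstdint>
#include <list>

// Hypothetical snapshot record: a sequence number plus a creation time.
struct SnapshotInfo {
  std::uint64_t sequence;
  std::int64_t unix_time;
};

// Hypothetical tracker; snapshots are appended in creation order, so the
// oldest live snapshot is always at the front.
class SnapshotTracker {
 public:
  using Handle = std::list<SnapshotInfo>::iterator;

  Handle New(std::uint64_t sequence, std::int64_t unix_time) {
    ++count_;
    return list_.insert(list_.end(), SnapshotInfo{sequence, unix_time});
  }

  void Delete(Handle h) {
    list_.erase(h);
    --count_;
  }

  std::uint64_t count() const { return count_; }

  // Returns 0 when there is no live snapshot (a choice made for this sketch).
  std::int64_t OldestTime() const {
    return list_.empty() ? 0 : list_.front().unix_time;
  }

 private:
  std::list<SnapshotInfo> list_;
  std::uint64_t count_ = 0;
};
```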

Test Plan: Add unit test checking

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, MarkCallaghan

Differential Revision: https://reviews.facebook.net/D29919
2014-12-05 17:07:49 -08:00
Venkatesh Radhakrishnan
6436ba6b06 Provide mechanism to restart tests from previous error
Summary:
While running rocksdb tests, we sometimes encounter errors and
the test run stops. We now provide a new make target called check_some
which restarts the test run from a specific test and continues from
there depending on the value of the environment variable ROCKSDBTESTS_START.

Test Plan:
Run make check_some with different values of
ROCKSDBTESTS_START.

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29913
2014-12-05 16:16:56 -08:00
Yueh-Hsuan Chiang
d84b2badeb Replace exception by abort() in dummy HdfsEnv implementation.
Summary: Replace exception by abort() in dummy HdfsEnv implementation.

Test Plan: make dbg -j32

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29895
2014-12-05 13:30:57 -08:00
Igor Canadi
9260e1ad74 Bump version to 3.9 2014-12-05 11:05:24 -08:00
Yueh-Hsuan Chiang
8f4e1c1c9a Remove the compatibility check on log2 OS_ANDROID as it's already blocked by ROCKSDB_LITE
Summary:
Remove the compatibility check on log2 OS_ANDROID as it's already blocked by ROCKSDB_LITE

Test Plan:
make OPT="-DROCKSDB_LITE -DOS_ANDROID" shared_lib -j32
make shared_lib -j32
2014-12-04 13:56:14 -08:00
Yueh-Hsuan Chiang
c4a7423c1d Replace runtime_error exception by abort() in thread_local
Summary: Replace runtime_error exception by abort() in thread_local

Test Plan: make dbg -j32

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29853
2014-12-04 13:35:31 -08:00
Yueh-Hsuan Chiang
a94d54aa47 Remove the use of exception in WriteBatch::Handler
Summary:
Remove the use of exception in WriteBatch::Handler. Now the default
implementations of Put, Merge, and Delete in WriteBatch::Handler are no-op.
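
To show what no-op defaults mean for a subclass, here is a self-contained sketch. `BatchHandler` is a hypothetical stand-in with `std::string` arguments, not the real WriteBatch::Handler declaration, and `ReplaySample` is an invented helper that plays the role of iterating over a batch.

```cpp
#include <string>

// Hypothetical handler interface. The defaults do nothing, so a subclass
// only overrides the operations it cares about and nothing throws.
class BatchHandler {
 public:
  virtual ~BatchHandler() {}
  virtual void Put(const std::string& /*key*/, const std::string& /*value*/) {}
  virtual void Merge(const std::string& /*key*/,
                     const std::string& /*value*/) {}
  virtual void Delete(const std::string& /*key*/) {}
};

// Counts Put operations and silently ignores Merge and Delete.
class PutCounter : public BatchHandler {
 public:
  void Put(const std::string&, const std::string&) override { ++puts; }
  int puts = 0;
};

// Invented helper: replays a fixed sequence of operations into a handler.
inline void ReplaySample(BatchHandler* handler) {
  handler->Put("k1", "v1");
  handler->Merge("k1", "v2");
  handler->Delete("k1");
  handler->Put("k2", "v3");
}
```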

Test Plan:
Add three test cases in write_batch_test
./write_batch_test

Reviewers: sdong, igor

Reviewed By: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29835
2014-12-04 12:01:55 -08:00
Yueh-Hsuan Chiang
a5d4fc0a25 Fix compile warning in db_stress
Summary:
Fix compile warning in db_stress

Test Plan:
make db_stress
2014-12-04 11:59:29 -08:00
Yueh-Hsuan Chiang
1a8f4821a7 Replace exception by assertion in autovector
Summary: Replace exception by assertion in autovector

Test Plan: autovector_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29847
2014-12-04 11:41:56 -08:00
Yueh-Hsuan Chiang
97c1940882 Fix compile warning in db_stress.cc on Mac
Summary:
Fix the following compile warning in db_stress.cc on Mac
tools/db_stress.cc:1688:52: error: format specifies type 'unsigned long' but the argument has type '::google::uint64' (aka 'unsigned long long') [-Werror,-Wformat]
    fprintf(stdout, "DB-write-buffer-size: %lu\n", FLAGS_db_write_buffer_size);
                                           ~~~     ^~~~~~~~~~~~~~~~~~~~~~~~~~
                                           %llu
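
The commit itself is not shown here, so the following is only one common way to avoid this class of warning, not necessarily the fix that was applied: format 64-bit values with the `PRIu64` macro from `<cinttypes>`, which expands to the right conversion on each platform. `FormatBufferSize` is a hypothetical helper.

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a 64-bit size portably. PRIu64 expands to the
// correct printf conversion for std::uint64_t on the target platform.
std::string FormatBufferSize(std::uint64_t bytes) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "DB-write-buffer-size: %" PRIu64 "\n",
                bytes);
  return std::string(buf);
}
```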

Test Plan:
make
2014-12-04 11:19:12 -08:00
Yueh-Hsuan Chiang
5f719d7202 Replace exception by setting valid_ = false in DBIter::MergeValuesNewToOld()
Summary: Replace exception by setting valid_ = false in DBIter::MergeValuesNewToOld().

Test Plan:
Not sure if I am right about this, but it seems we currently don't have a good
way to test that code path, as it requires dynamically setting merge_operator = nullptr
while Merge() is being called.

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29811
2014-12-04 11:11:11 -08:00
Mark Callaghan
c0dee851c3 Improve formatting, add missing newlines
Summary:
Improve formatting

Test Plan:
make

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29829
2014-12-04 10:34:06 -08:00
Igor Canadi
815f638cd0 Fix java build 2014-12-03 19:06:57 -08:00
Mark Callaghan
32a0a03844 Add Moved(GB) to Compaction IO stats
Summary:
Adds a counter for bytes moved (files pushed down a level rather than compacted) to compaction
IO stats as Moved(GB). Removed these infrequently used columns from the output: RW-Amp, Rn(cnt), Rnp1(cnt),
Wnp1(cnt), Wnew(cnt).
Example old output:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) RW-Amp W-Amp Rd(MB/s) Wr(MB/s)  Rn(cnt) Rnp1(cnt) Wnp1(cnt) Wnew(cnt)  Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms) RecordIn RecordDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     0/0          0   0.0      0.0     0.0      0.0    2130.8   2130.8    0.0   0.0      0.0    109.1        0         0         0         0      20002     25068    0.798      28.75     182059    0.16       0          0
  L1   142/0        509   1.0   4618.5  2036.5   2582.0    4602.1   2020.2    4.5   2.3     88.5     88.1    24220    701246   1215528    514282      53466      4229   12.643       0.00          0    0.002032745988  300688729

Example new output:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(sec) Stall(cnt) Avg(ms)     RecordIn   RecordDrop
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     7/0         13   1.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     14.7        44       353    0.124       0.03        626    0.05            0            0
  L1     9/0         16   1.6      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000       0.00          0    0.00            0            0

Test Plan:
make check, run db_bench --fillseq --stats_per_interval --stats_interval and look at output

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29787
2014-12-03 18:28:39 -08:00
Jonah Cohen
a14b7873ee Enforce write buffer memory limit across column families
Summary:
Introduces a new class for managing write buffer memory across column
families.  We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
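
The sharing pattern can be sketched as follows. Everything here is an assumption for illustration: `SharedWriteBudget` and `ColumnFamilySketch` are invented names, and the real WriteBuffer class may track and enforce its limit differently. The point is only that several column families hold a `std::shared_ptr` to one accounting object and consult it to decide when to flush.

```cpp
#include <atomic>
#include <cstddef>
#include <memory>

// Hypothetical accounting object shared by several column families.
class SharedWriteBudget {
 public:
  explicit SharedWriteBudget(std::size_t limit) : limit_(limit), used_(0) {}

  void Reserve(std::size_t bytes) {
    used_.fetch_add(bytes, std::memory_order_relaxed);
  }
  void Free(std::size_t bytes) {
    used_.fetch_sub(bytes, std::memory_order_relaxed);
  }

  // True once the combined memtable memory reaches the limit.
  bool ShouldFlush() const {
    return limit_ > 0 && used_.load(std::memory_order_relaxed) >= limit_;
  }
  std::size_t used() const { return used_.load(std::memory_order_relaxed); }

 private:
  const std::size_t limit_;
  std::atomic<std::size_t> used_;
};

// Hypothetical column family that charges its memtable to the shared budget.
class ColumnFamilySketch {
 public:
  explicit ColumnFamilySketch(std::shared_ptr<SharedWriteBudget> budget)
      : budget_(std::move(budget)) {}

  void Write(std::size_t bytes) {
    memtable_bytes_ += bytes;
    budget_->Reserve(bytes);
  }

  // Flushing releases this family's share of the budget.
  void Flush() {
    budget_->Free(memtable_bytes_);
    memtable_bytes_ = 0;
  }

 private:
  std::shared_ptr<SharedWriteBudget> budget_;
  std::size_t memtable_bytes_ = 0;
};
```

With a limit of 100 bytes, a 60-byte write to one family followed by a 50-byte write to another trips `ShouldFlush()` even though neither family alone exceeds the limit.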

Test Plan: Added SharedWriteBuffer unit test to db_test.cc

Reviewers: sdong, rven, ljin, igor

Reviewed By: igor

Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim

Differential Revision: https://reviews.facebook.net/D22581
2014-12-02 12:09:20 -08:00
fyrz
3e684aa685 Integrated changes from D29571 2014-12-02 19:56:45 +01:00
Igor Canadi
37d73d597e Fix linters
Summary:
Two fixes:
1. if cpplint is not present on the system, don't return a confusing error in the linter
2. Add include_alpha, which means our includes should be sorted lexicographically

Test Plan: Tried unsorting our includes, lint complained

Reviewers: rven, ljin, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28845
2014-12-02 13:53:39 -05:00
fyrz
a15169f2e9 Fixed a Lint problem 2014-12-02 09:58:20 +01:00
fyrz
b7f9e644cc [RocksJava] Quality improvements
Summary:
- Addressed some FindBugs issues.
- Remove obsolete dbFolder cleanup
- Comparator tests for CF
 - Added AbstractComparatorTest.
 - Fixed a bug in the JNI Part about Java comparators
- Minor test improvements

Test Plan:
make rocksdbjava
make jtest
mvn -f rocksjni.pom package

Reviewers: adamretter, yhchiang, ankgup87

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D29571
2014-12-02 09:58:19 +01:00
fyrz
e002a6122f [RocksJava] Comparator tests for CF
- Added AbstractComparatorTest.
- Fixed a bug in the JNI Part about Java comparators
2014-12-02 09:58:19 +01:00
fyrz
335e6ad5cd [RocksJava] Remove obsolete dbFolder cleanup 2014-12-02 09:58:18 +01:00
fyrz
b036804ac1 RocksJava - FindBugs issues
Addressed some FindBugs issues.
2014-12-02 09:58:17 +01:00
Igor Canadi
9a632b4a92 Merge pull request #429 from fyrz/RocksJava-MacOSX-strip-fix
[RocksJava] MacOSX strip support
2014-12-01 13:13:02 -05:00
fyrz
b426675061 [RocksJava] MacOSX strip support 2014-12-01 19:01:29 +01:00
Igor Canadi
e463cb0bcf Merge pull request #424 from eile/master
Tweak Makefile for building on BG/Q
2014-12-01 10:20:12 -05:00
Stefan Eilemann
91d8981639 Tweak Makefile for building on BG/Q 2014-12-01 09:01:54 +01:00
Haneef Mubarak
c6f31a2893 minor memory leak in C example 2014-11-29 21:42:42 -08:00
Igor Canadi
703ef66a86 Merge pull request #426 from fyrz/RocksJava-Restore-PrecisionFix
[RocksJava] Fixed MacOS build of RocksJava
2014-11-27 20:39:39 -05:00
Haneef Mubarak
ac4ed1e305 fix examples/makefile for C example 2014-11-27 15:20:55 -08:00
Haneef Mubarak
d7f5ccb0c2 add c example to makefile and fix "make clean" 2014-11-27 15:06:12 -08:00
Haneef Mubarak
9c34d5e361 fix type in C simple example 2014-11-27 13:53:04 -08:00
Haneef Mubarak
0a9a7e753c added C version of simple_example 2014-11-27 13:49:19 -08:00
Yueh-Hsuan Chiang
bcf9086899 Block Universal and FIFO compactions in ROCKSDB_LITE
Summary: Block Universal and FIFO compactions in ROCKSDB_LITE

Test Plan:
make shared_lib -j32
make OPT=-DROCKSDB_LITE shared_lib

Reviewers: ljin, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29589
2014-11-26 15:45:11 -08:00
fyrz
67cb7ca758 [RocksJava] Fixed MacOS build of RocksJava
There were still some precision loss problems
remaining in RocksJava. This pull request resolves
them.
2014-11-26 20:53:23 +01:00
Yueh-Hsuan Chiang
b8136a7d27 Merge pull request #398 from fyrz/RocksJava-CreateCheckPoint
[RocksJava] Support for stored snapshots
2014-11-26 11:40:41 -08:00