Commit Graph

1650 Commits

Author SHA1 Message Date
Igor Canadi
aa14670b27 Add an assertion in CompactionPicker
Summary: Reading CompactionPicker I noticed this dangerous substraction of two unsigned integers. We should assert to mark this as safe.

Test Plan: make check

Reviewers: anthony, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37041
2015-04-23 17:46:15 -07:00
Giuseppe Ottaviano
2dc421df48 Implement DB::PromoteL0 method
Summary:
This diff implements a new `DB` method `PromoteL0` which moves all files in L0
to a given level skipping compaction, provided that the files have disjoint
ranges and all levels up to the target level are empty.

This method provides finer-grain control for trivial compactions, and it is
useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change
the semantics of an existing operation, which can impact existing code.

PromoteL0 is designed to work well in combination with the proposed
`GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure
by populating one level at a time. Such fine-grained control can be very useful
for static or mostly-static databases.

Test Plan: `make check`

Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37107
2015-04-23 12:10:36 -07:00
sdong
9bf40b64d0 Print max score in level summary
Summary: Add more logging to help debugging issues.

Test Plan: Run test suites

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37401
2015-04-23 11:34:36 -07:00
sdong
397b6588bd options.paranoid_file_checks to read all rows after writing to a file.
Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows.

Test Plan: Add a new unit test for it

Reviewers: rven, igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37335
2015-04-23 11:34:35 -07:00
Venkatesh Radhakrishnan
618d07b068 Making PreShutdown tests more reliable.
Summary:
A couple of times on Travis, we have had the thread status say that there were no compactions done and since we assert for it, the test failed.
We now fix this by waiting till compaction started.

Test Plan:
run DBTEST::*PreShutdown*

d=/tmp/j; rm -rf $d; seq 200 | parallel --gnu --eta 'd=/tmp/j/d-{}; mkdir -p $d; TEST_TMPDIR=$d ./db_test --gtest_filter=DBTest.PreShutdown* >& '$d'/log-{}'

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37545
2015-04-23 08:35:02 -07:00
Igor Canadi
6059bdf86a Add experimental API MarkForCompaction()
Summary:
Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).

All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.

MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.

Test Plan: added a new unit test

Reviewers: yhchiang, rven, MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37083
2015-04-17 16:44:45 -07:00
Jim Meyering
acf8a4141d maint: use ASSERT_TRUE, not ASSERT_EQ(true; same for false
Summary:
The usage I'm fixing here caused trouble on Fedora 21 when
compiling with the current gcc version 4.9.2 20150212 (Red Hat 4.9.2-6) (GCC):

  db/write_controller_test.cc: In member function ‘virtual void rocksdb::WriteControllerTest_SanityTest_Test::TestBody()’:
  db/write_controller_test.cc:23:165: error: converting ‘false’ to pointer type for argument 1 of ‘char testing::internal::IsNullLiteralHelper(testing::internal::Secret*)’ [-Werror=conversion-null]
     ASSERT_EQ(false, controller.IsStopped());
                                                                                                                                                                          ^

This change was induced mechanically via:

  git grep -l -E 'ASSERT_EQ\(false'|xargs perl -pi -e 's/ASSERT_EQ\(false, /ASSERT_FALSE(/'
  git grep -l -E 'ASSERT_EQ\(true'|xargs perl -pi -e 's/ASSERT_EQ\(true, /ASSERT_TRUE(/'

Except for the three in utilities/backupable/backupable_db_test.cc for which
I ended up reformatting (joining lines) in the result.

As for why this problem is exhibited with that version of gcc, and none
of the others I've used (from 4.8.1 through gcc-5.0.0 and newer), I suspect
it's a bug in F21's gcc that has been fixed in gcc-5.0.0.

Test Plan:
  "make" now succeed on Fedora 21

Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37329
2015-04-17 14:54:17 -07:00
Igor Canadi
b5400f90fe Kill dead code
Summary: this is not used anywhere

Test Plan: compiles

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37053
2015-04-17 12:07:47 -07:00
Igor Canadi
00c2afcd38 Fix bug in ExpandWhileOverlapping()
Summary: If ExpandWhileOverlapping() we don't clear inputs. That's a bug introduced by my recent patch https://reviews.facebook.net/D36687. However, we have no tests covering ExpandWhileOverlapping(). I created a task t6771252 to add ExpandWhileOverlapping() tests.

Test Plan: make check

Reviewers: sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37077
2015-04-16 19:31:10 -07:00
sdong
debaf85ef5 Bug of trivial move of dynamic level
Summary: D36669 introduces a bug that trivial moved data is not going to specific level but the next level, which will incorrectly be level 1 for level 0 compaciton if base level is not level 1. Fixing it by appreciating the output level

Test Plan: Run all tests

Reviewers: MarkCallaghan, rven, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37119
2015-04-14 21:42:08 -07:00
sdong
12d7d3d28d Fix and Improve DBTest.DynamicLevelCompressionPerLevel2
Summary:
Recent change of DBTest.DynamicLevelCompressionPerLevel2 has a bug that the second sync point is not enabled. Fix it. Also add an assert for that.
Also, flush compression is not tracked in the test. Add it.

Test Plan: Build everything

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37101
2015-04-14 21:42:08 -07:00
sdong
a1271c6c6f Fix build break introduced by new SyncPoint interface change
Summary: When commiting the sync point interface change, didn't resolve the new occurance of the old interface in rebase. Fix it.

Test Plan: Build and see it pass

Reviewers: igor, yhchiang, rven, anthony, kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37095
2015-04-14 16:42:37 -07:00
sdong
fcb206b667 SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward
Summary:
Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests.
Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug.

Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36999
2015-04-14 16:18:50 -07:00
Igor Canadi
281db8bb62 Temporarily disable test CompactFilesOnLevelCompaction
Summary: https://reviews.facebook.net/D36963 made the debug build much faster and that triggered failures of CompactFilesOnLevelCompaction test. 3 out of 4 last tests on Jenkins failed. I'm disabling this test temporarily, since we likely know the reason why it's failing and there's already work in progress to address it -- https://reviews.facebook.net/D36225

Test Plan: none

Reviewers: sdong, rven, yhchiang, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36993
2015-04-13 19:30:40 -07:00
Igor Canadi
9b983befa8 Fix flakiness of WalManagerTest
Summary: We should use mocked-out env for these tests to make it more realiable. Added benefit is that instead of actually sleeping for 3 seconds, we can instead pretend to sleep and just increase time counters.

Test Plan: for i in `seq 100`; do ./wal_manager_test --gtest_filter=WalManagerTest.WALArchivalTtl ;done

Reviewers: rven, meyering

Reviewed By: meyering

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36951
2015-04-13 16:15:05 -07:00
Igor Canadi
e7ad14926a Fix flakiness in FIFOCompaction test (github issue #573)
Summary:
The problem is that sometimes two memtables will be compacted together into a single file. In that case, our assertion

        ASSERT_EQ(NumTableFilesAtLevel(0), 5);

fails because same amount of data is in 4 files instead of 5. We should wait for flush so that we prevent two memtables merging into a single file.

Test Plan: `for i in `seq 20`; do mrtest FIFOCompactionTest; done` -- fails at least once before. fails zero times after.

Reviewers: rven

Reviewed By: rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36939
2015-04-13 11:39:45 -07:00
Igor Canadi
abb4052278 Kill benchharness
Summary:
1. it doesn't work
2. we're not using it

In the future, if we need general benchmark framework, we should probably use https://github.com/google/benchmark

Test Plan: make all

Reviewers: yhchiang, rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36777
2015-04-13 10:17:42 -07:00
Igor Canadi
590fadc407 Fix compile warning on CLANG
Summary: oops

Test Plan: compiles now

Reviewers: sdong, yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36867
2015-04-10 15:14:57 -07:00
Igor Canadi
47b8743984 Make Compaction class easier to use
Summary:
The goal of this diff is to make Compaction class easier to use. This should also make new compaction algorithms easier to write (like CompactFiles from @yhchiang and dynamic leveled and multi-leveled universal from @sdong).

Here are couple of things demonstrating that Compaction class is hard to use:
1. we have two constructors of Compaction class
2. there's this thing called grandparents_, but it appears to only be setup for leveled compaction and not compactfiles
3. it's easy to introduce a subtle and dangerous bug like this: D36225
4. SetupBottomMostLevel() is hard to understand and it shouldn't be. See this comment: afbafeaeae/db/compaction.cc (L236-L241). It also made it harder for @yhchiang to write CompactFiles, as evidenced by this: afbafeaeae/db/compaction_picker.cc (L204-L210)

The problem is that we create Compaction object, which holds a lot of state, and then pass it around to some functions. After those functions are done mutating, then we call couple of functions on Compaction object, like SetupBottommostLevel() and MarkFilesBeingCompacted(). It is very hard to see what's happening with all that Compaction's state while it's travelling across different functions. If you're writing a new PickCompaction() function you need to try really hard to understand what are all the functions you need to run on Compaction object and what state you need to setup.

My proposed solution is to make important parts of Compaction immutable after construction. PickCompaction() should calculate compaction inputs and then pass them onto Compaction object once they are finalized. That makes it easy to create a new compaction -- just provide all the parameters to the constructor and you're done. No need to call confusing functions after you created your object.

This diff doesn't fully achieve that goal, but it comes pretty close. Here are some of the changes:
* have one Compaction constructor instead of two.
* inputs_ is constant after construction
* MarkFilesBeingCompacted() is now private to Compaction class and automatically called on construction/destruction.
* SetupBottommostLevel() is gone. Compaction figures it out on its own based on the input.
* CompactionPicker's functions are not passing around Compaction object anymore. They are only passing around the state that they need.

Test Plan:
make check
make asan_check
make valgrind_check

Reviewers: rven, anthony, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: sdong, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36687
2015-04-10 15:01:54 -07:00
agiardullo
753dd1fdd0 Fix valgrind issues in memtable_list_test
Summary: Need to remember to unref MemTableList->current() before deleting.

Test Plan: ran test with valgrind

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36855
2015-04-10 14:16:03 -07:00
krad
697380f3d7 Repairer documentation improvement.
Summary: Adding verbosity to existing comments.

Test Plan: None

Reviewers: sdong

CC: leveldb

Task ID: #6718960

Blame Rev:
2015-04-10 12:35:28 -07:00
agiardullo
0feeee6433 Fix memtable_list_test
Summary:
Test failing due to a missing directory caused by a simple bug (did not run into this on my dev box since the path already existed).

We should look into deleting test::TmpDir() before each test run.

Test Plan: ran test

Reviewers: igor, yhchiang, meyering, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36831
2015-04-09 22:11:35 -07:00
Yueh-Hsuan Chiang
7b9581bc3b Fixed xfunc related compile errors in ROCKSDB_LITE
Summary:
Fixed xfunc related compile errors in ROCKSDB_LITE

Now make OPT=-DROCKSDB_LITE shared_lib -j32 would work

Test Plan:
make clean
make OPT=-DROCKSDB_LITE shared_lib -j32
make clean
make OPT=-DROCKSDB_LITE static_lib -j32

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36825
2015-04-09 21:05:18 -07:00
agiardullo
fabc115690 MemTableList tests
Summary: Add tests for MemTableList

Test Plan: run test

Reviewers: yhchiang, kradhakrishnan, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36735
2015-04-09 18:01:11 -07:00
Yueh-Hsuan Chiang
9741dec0e5 Fix a compile error in ROCKSDB_LITE in db/db_impl.cc
Summary:
Fix a compile error in ROCKSDB_LITE in db/db_impl.cc
related to internal_stats.

Test Plan: make OPT=-DROCKSDB_LITE shared_lib

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36819
2015-04-09 17:07:29 -07:00
Yueh-Hsuan Chiang
d2a056241a Fix a compilation error in ROCKSDB_LITE in db/internal_stats.h
Summary:
Fix a compilation error in ROCKSDB_LITE in db/internal_stats.h

Other compilation errors will be fixed in a separate diff.

Test Plan: make OPT=-DROCKSDB_LITE

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36807
2015-04-09 16:43:54 -07:00
sdong
316ec80bf8 fault_injection_test: add a test case to cover log syncing after a log roll
Summary:
Add a test case:
Write some keys without sync, flush, write other keys and do sync. Before flush finishes, host crashes and unsync data is dropped.
Tag the new test as disabled since it is not passing.

Test Plan: Run the test

Reviewers: MarkCallaghan, rven, anthony, igor, kradhakrishnan

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36741
2015-04-09 16:15:42 -07:00
Mark Callaghan
ed229a0dee Fixes for readcache-flashcache
Summary:
This fixes two problems:
1) the env should not be created twice when use_existing_db is false
2) the env dtor should run before cachedev_fd_ is closed.

Task ID: #

Blame Rev:

Test Plan:
Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36795
2015-04-09 15:51:34 -07:00
agiardullo
84c5bd7eb9 Add thread-safety documentation to MemTable and related classes
Summary: Other than making some class members private, this is a documentation-only change

Test Plan: unit tests

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36567
2015-04-08 21:10:35 -07:00
krad
2b019a1512 Enabling checksum in repair db as it should have been.
Summary: I think the checksum was turned off by mistake.

Test Plan: Run make check

Reviewers: igor sdong chip

CC:

Task ID:

Blame Rev:
2015-04-08 15:52:02 -07:00
sdong
b1bbdd7919 Create EnvOptions using sanitized DB Options
Summary: Now EnvOptions uses unsanitized DB options. bytes_per_sync is tuned off when rate_limiter is used, but this change doesn't take effort.

Test Plan: See different I/O pattern in db_bench running fillseq.

Reviewers: yhchiang, kradhakrishnan, rven, anthony, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36723
2015-04-08 14:40:42 -07:00
sdong
b118238a57 Trivial move to cover multiple input levels
Summary: Now trivial move is only triggered when moving from level n to n+1. With dynamic level base, it is possible that file is moved from level 0 to level n, while levels from 1 to n-1 are empty. Extend trivial move to this case.

Test Plan: Add a more unit test of sequential loading. Non-trivial compaction happened without the patch and now doesn't happen.

Reviewers: rven, yhchiang, MarkCallaghan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, IslamAbdelRahman

Differential Revision: https://reviews.facebook.net/D36669
2015-04-08 09:26:40 -07:00
krad
58346b9e29 Log writer record format doc.
Summary: Added a ASCII doodle to represent the log writer format.

Test Plan: None

Reviewers: sdong

CC: leveldb

Task ID: 6179896

Blame Rev:
2015-04-07 16:25:56 -07:00
Yoshinori Matsunobu
f12614070f Fix TSAN build error of D36447
Summary:
D36447 caused build error when using COMPILE_WITH_TSAN=1.
This diff fixes the error.

Test Plan: jenkins

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36579
2015-04-06 17:37:36 -07:00
Yoshinori Matsunobu
824e646341 Adding another NewFlashcacheAwareEnv function to support pre-opened fd
Summary:
There are some cases when flachcache file descriptor was
already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an
error at open() because fd was already assigned. This diff adds another
function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd.

Test Plan: Tested with MyRocks using this function, then worked

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, MarkCallaghan, rven

Differential Revision: https://reviews.facebook.net/D36447
2015-04-06 16:50:36 -07:00
Igor Canadi
5e067a7b19 Clean up compression logging
Summary: Now we add warnings when user configures compression and the compression is not supported.

Test Plan:
Configured compression to non-supported values. Observed messages in my log:

    2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2.

    2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data.

Reviewers: rven, sdong, yhchiang

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35979
2015-04-06 12:50:44 -07:00
sdong
953a885ebf A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge
Summary:
Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it.

Also refactor the codes so that
(1) make table property collector and internal table property collector two separate data structures with the later one now exposed
(2) table builders only receive internal table properties

Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.

Reviewers: yhchiang, igor.sugak, rven, igor

Reviewed By: rven, igor

Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D35373
2015-04-06 10:27:21 -07:00
Jim Meyering
d2a92c13bc avoid returning a number-of-active-keys estimate of nearly 2^64
Summary:
If accumulated_num_non_deletions_ were ever smaller than
accumulated_num_deletions_, the computation of
"accumulated_num_non_deletions_ - accumulated_num_deletions_"
would result in a logically "negative" value, but since
the two operands are unsigned (uint64_t), the result corresponding
to e.g., -1 would 2^64-1.

Instead, return 0 in that case.

Test Plan:
  - ensure "make check" still passes
  - temporarily add an "abort();" call in the new "if"-block, and
      observe that it fails in some test cases.  However, note that
      this case is triggered only when the two numbers are equal.
      Thus, no test case triggers the erroneous behavior this
      change is designed to avoid. If anyone can construct a
      scenario in which that bug would be triggered, I'll be
      happy to add a test case.

Reviewers: ljin, igor, rven, igor.sugak, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36489
2015-04-03 14:46:35 -07:00
sdong
a7ac6cef1f Fix level size overflow for options_.level_compaction_dynamic_level_bytes=true
Summary: Int is used for level size targets when options_.level_compaction_dynamic_level_bytes=true, which will cause overflow when database grows big. Fix it.

Test Plan: Add a new unit test which fails without the fix.

Reviewers: rven, yhchiang, MarkCallaghan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D36453
2015-04-03 09:04:35 -07:00
sdong
089509b847 db_test: clean up sync points in test cleaning up
Summary: In some db_test tests sync points are not cleared which will cause unexpected results in the next tests. Clean them up in test cleaning up.

Test Plan:
Run the same tests that used to fail:

build using USE_CLANG=1 and run
./db_test --gtest_filter="DBTest.CompressLevelCompaction:*DBTestUniversalCompactionParallel*"

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36429
2015-04-02 16:17:58 -07:00
Venkatesh Radhakrishnan
afbafeaeae Disallow trivial move if compression level is different
Summary:
Check compression level of start_level with output_compression
before allowing trivial move

Test Plan: New DBTest CompressLevelCompactionThirdPath added

Reviewers: igor, yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36213
2015-04-02 11:06:30 -07:00
Venkatesh Radhakrishnan
d0695f3e26 Fix crash caused by opening an empty DB in readonly mode
Summary:
This diff fixes a crash found when an empty database is opened in readonly mode.
We now check the number of levels before we open the DB as a compacted DB.

Test Plan: DBTest.EmptyCompactedDB

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36327
2015-04-01 16:55:08 -07:00
Herman Lee
51c8133a72 Fix make unity build compiler warning about "stats" shadowing global variable
Summary:
Fix the make unity build. The local stats variable name was shadowing a
global stats variable.

Test Plan:
Run the build
OPT=-DTRAVIS V=1 make unity

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36285
2015-04-01 10:48:42 -07:00
sdong
76d63b4525 Fix one non-determinism of DBTest.DynamicCompactionOptions
Summary:
After recent change of DBTest.DynamicCompactionOptions, occasionally hit another non-deterministic case where L0 showdown is triggered while timeout should not triggered for hard limit.
Fix it by increasing L0 slowdown trigger at the same time.

Test Plan: Run the failed test.

Reviewers: igor, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36219
2015-03-30 15:53:44 -07:00
sdong
b23bbaa82a Universal Compactions with Small Files
Summary:
With this change, we use L1 and up to store compaction outputs in universal compaction.
The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.

If options.num_levels=1, it behaves all the same as now.

Test Plan:
1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels.
2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
3) add unit test to migrate from multiple level setting back to one level setting
4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.

Reviewers: rven, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: meyering, leveldb, MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D34539
2015-03-30 15:12:02 -07:00
Igor Canadi
2511b7d947 Makefile minor cleanup
Summary:
Just couple of small changes:
1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check`
2. moved perf_context_test to TESTS instead of PROGRAMS
3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there.

This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week.

Test Plan: make release

Reviewers: anthony, rven, yhchiang, sdong, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36171
2015-03-30 16:05:35 -04:00
Mark Callaghan
1bd70fb54a Add --stats_interval_seconds to db_bench
Summary:
The --stats_interval_seconds determines interval for stats reporting
and overrides --stats_interval when set. I also changed tools/benchmark.sh
to report stats every 60 seconds so I can avoid trying to figure out a
good value for --stats_interval per test and per storage device.

Task ID: #6631621

Blame Rev:

Test Plan:
run tools/run_flash_bench, look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36189
2015-03-30 12:58:32 -07:00
Igor Canadi
fd3dbef22b Clean up old log files in background threads
Summary:
Cleaning up log files can do heavy IO, since we call ftruncate() in the destructor. We don't want to call ftruncate() in user threads.

This diff moves cleaning to background threads (flush and compaction)

Test Plan: make check, will also run valgrind

Reviewers: yhchiang, rven, MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36177
2015-03-30 15:04:10 -04:00
Mark Callaghan
99ec2412e5 Make the benchmark scripts configurable and add tests
Summary:
This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
makes the duration per-test configurable and refactors the test scripts.

Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
the writer thread does Merge rather than Put.

Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
Sometimes automation and humans want different format.

Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
stats in the output at test end.

Adds the average ingest rate to compaction IO stats. Output now looks like:
https://gist.github.com/mdcallag/2bd64d18be1b93adc494

More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1

For benchmark.sh changes default RocksDB configuration to reduce stalls:
* min_level_to_compress from 2 to 3
* hard_rate_limit from 2 to 3
* max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
* L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)

Task ID: #6596829

Blame Rev:

Test Plan:
run tools/run_flash_bench.sh

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36075
2015-03-30 11:28:25 -07:00
Igor Canadi
d61cb0b9de db_bench can now disable flashcache for background threads
Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code.

Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35391
2015-03-30 09:51:11 -07:00
Herman Lee
e018892bb6 Formalize the DB properties string definitions.
Summary:
Assign the string properties to const string variables under the
DB::Properties namespace. This helps catch typos during compilation and
also consolidates the property definition in one place.

Test Plan: Run rocksdb unit tests

Reviewers: sdong, yoshinorim, igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35991
2015-03-27 14:50:20 -07:00
Igor Canadi
030859eb5d Dump compression info on startup
Summary: It's useful to know if we have compression support or no

Test Plan:
Observed this in my LOG:

      2015/03/26-10:34:35.460681 7f5b322b7840 Snappy supported
      2015/03/26-10:34:35.460682 7f5b322b7840 Zlib supported
      2015/03/26-10:34:35.460686 7f5b322b7840 Bzip supported
      2015/03/26-10:34:35.460687 7f5b322b7840 LZ4 NOT supported

Reviewers: sdong, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35955
2015-03-26 11:22:20 -07:00
Alexander.Mikhaylov
a3e4b32483 fix compilation error (same as fix #284)
[maa@srv2-nskb-devg2 rocksdb-master]$ CXX=/usr/local/CC/gcc-4.7.4/bin/g++ EXTRA_CXXFLAGS=-std=c++11 DISABLE_WARNING_AS_ERROR=1  make db_bench
  CC       db/db_bench.o
db/db_bench.cc: In member function 'rocksdb::Slice rocksdb::Benchmark::AllocateKey(std::unique_ptr<const char []>*)':
db/db_bench.cc:1434:41: error: use of deleted function 'void std::unique_ptr<_Tp [], _Dp>::reset(_Up) [with _Up = char*; _Tp = const char; _Dp = std::default_delete<const char []>]'
In file included from /usr/local/CC/gcc-4.7.4/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/../../../../include/c++/4.7.4/memory:86:0,
                 from ./include/rocksdb/db.h:14,
                 from ./db/dbformat.h:14,
                 from ./db/db_impl.h:21,
                 from db/db_bench.cc:33:
2015-03-26 14:53:42 +06:00
Anurag Indu
3d1a924ff3 Adding stats for the merge and filter operation
Summary:
We have addded new stats and perf_context for measuring the merge and filter operation time consumption.
We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB.

Test Plan: WIP

Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34377
2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang
248c063ba1 Report elapsed time in micros in ThreadStatus instead of start time.
Summary:
Report elapsed time of a thread operation in micros in ThreadStatus
instead of start time of a thread operation in seconds since the
Epoch, 1970-01-01 00:00:00 (UTC).

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10

Sample Output:
            ThreadID ThreadType                    cfName    Operation  ElapsedTime                                         Stage        State
     140667724562496   High Pri column_family_name_000002        Flush   772.419 ms                    FlushJob::WriteLevel0Table
     140667728756800   High Pri                   default        Flush   617.845 ms                    FlushJob::WriteLevel0Table
     140667732951104   High Pri column_family_name_000005        Flush   772.078 ms                    FlushJob::WriteLevel0Table
     140667875557440    Low Pri column_family_name_000008   Compaction  1409.216 ms                        CompactionJob::Install
     140667737145408    Low Pri
     140667749728320    Low Pri
     140667816837184    Low Pri column_family_name_000007   Compaction  1071.815 ms      CompactionJob::ProcessKeyValueCompaction
     140667787477056    Low Pri column_family_name_000009   Compaction   772.516 ms      CompactionJob::ProcessKeyValueCompaction
     140667741339712    Low Pri
     140667758116928    Low Pri column_family_name_000004   Compaction   620.739 ms      CompactionJob::ProcessKeyValueCompaction
     140667753922624    Low Pri
     140667842003008    Low Pri column_family_name_000006   Compaction  1260.079 ms      CompactionJob::ProcessKeyValueCompaction
     140667745534016    Low Pri

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35769
2015-03-24 11:32:25 -07:00
Yueh-Hsuan Chiang
a057bb2a8e Improve ThreadStatusSingleCompaction
Summary:
Improve ThreadStatusSingleCompaction in two ways:
1. Use SYNC_POINT to ensure compaction won't happen
   before the test finishes its "Put Phase" instead of
   using sleep.
2. In Put Phase, it continues until we have sufficient
   number of L0 files.  Note that during the put phase,
   there won't be any compaction that consumes L0 files
   because of item 1.

Test Plan: ./db_test  --gtest_filter="*ThreadStatusSingleCompaction*"

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35727
2015-03-23 15:30:45 -07:00
sdong
38d286f146 Clean-up WAL directory before running db_test
Summary: DBTest doesn't clean up wal directory. It might cause failure after a failure test run. Fix it.

Test Plan:
Run unit tests
Try open DB with non-empty db_path/wal.

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D35559
2015-03-20 12:34:24 -07:00
Igor Sugak
9405b5ef8f rocksdb: Remove #include "util/string_util.h" from util/testharness.h
Summary:
1. Manually deleted #include "util/string_util.h" from util/testharness.h
2.
```
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -E 'say $F[0] if /: error:/' build.log | sort -u | xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"'
```

Test Plan:
Make sure make all completes with no errors.
```
% make all -j55
```

Reviewers: meyering, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35493
2015-03-19 17:29:37 -07:00
Igor Canadi
b088c83e6e Don't delete files when column family is dropped
Summary:
To understand the bug read t5943287 and check out the new test in column_family_test (ReadDroppedColumnFamily), iter 0.

RocksDB contract allowes you to read a drop column family as long as there is a live reference. However, since our iteration ignores dropped column families, AddLiveFiles() didn't mark files of a dropped column families as live. So we deleted them.

In this patch I no longer ignore dropped column families in the iteration. I think this behavior was confusing and it also led to this bug. Now if an iterator client wants to ignore dropped column families, he needs to do it explicitly.

Test Plan: Added a new unit test that is failing on master. Unit test succeeds now.

Reviewers: sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32535
2015-03-19 17:04:29 -07:00
Igor Canadi
52e0f3353f Clean up compactions_in_progress_
Summary: Suprisingly, the only way we use this vector is to keep track of level0 compactions. Thus, I simplified it.

Test Plan: make check

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35313
2015-03-18 18:25:15 -07:00
Igor Sugak
6b626ff24c rocksdb: change db_test::MultiThreadedDBTest as value parameterized test.
Summary: This is a simple change to make db_test::MultiThreadedDBTest as value parameterized test. There is a value of creating a separate set of such tests later.

Test Plan:
```lang=bash
% make db_test
% ./make db_test
```

Also with the following command I can execute all db_test in 2:37.87 on my box
```
% ./db_test --gtest_list_tests | sed 's/\# GetParam.*//' | tr -d ' ' | env time parallel --gnu --eta --joblog=LOG -- 'TEST_TMPDIR=/dev/shm/rocksdb-{} ./db_test --gtest_filter="*{}"'
```

Reviewers: igor, rven, meyering, sdong

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35361
2015-03-18 18:18:12 -07:00
sdong
0831a35994 Add a DB Property For Number of Deletions in Memtables
Summary: Add a DB property for number of deletions in memtables. It can sometimes help people debug slowness because of too many deletes.

Test Plan: Add test cases.

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D35247
2015-03-18 17:03:59 -07:00
Mark Callaghan
dfccc7b4e2 Add readwhilemerging benchmark
Summary:
This is like readwhilewriting but uses Merge rather than Put in the writer thread.
I am using it for in-progress benchmarks. I don't think the other benchmarks for Merge
cover this behavior. The purpose for this test is to measure read performance when
readers might have to merge results. This will also benefit from work-in-progress
to add skewed key generation.

Task ID: #

Blame Rev:

Test Plan:
Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35115
2015-03-18 13:50:52 -07:00
agiardullo
81345b90f9 Create an abstract interface for write batches
Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class.  This makes it easier to write code that is agnostic toward the implementation of the particular write batch.  In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch.

Test Plan: modified existing WriteBatchWithIndex tests to test new functions.  Running all tests.

Reviewers: igor, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34017
2015-03-17 19:23:08 -07:00
Igor Canadi
c88ff4ca76 Deprecate removeScanCountLimit in NewLRUCache
Summary: It is no longer used by the implementation, so we should also remove it from the public API.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34971
2015-03-17 15:04:37 -07:00
Igor Sugak
b4b69e4f77 rocksdb: switch to gtest
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.

There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.

```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
  if grep -q "rocksdb::test::RunAllTests()" $file
  then
    if grep -Eq '^class \w+Test {' $file
    then
      perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
      perl -pi -e 's/^(TEST)/${1}_F/g' $file
    fi
    perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
    perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
  fi
done
% sh ~/transform
% make format
```

Second iteration of this diff contains only scripted changes.

Third iteration contains manual changes to fix last errors and make it compilable.

Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.

Reviewers: meyering, sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 14:08:00 -07:00
Venkatesh Radhakrishnan
98c37fda5d Remove unused parameter in CancelAllBackgroundWork
Summary: Some suggestions for cleanup from Igor.

Test Plan: Regression tests.

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35169
2015-03-16 21:07:54 -07:00
Igor Sugak
9fd6edf81c rocksdb: Replace ASSERT* with EXPECT* in functions that does not return void value
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes.

In order to detect all existing `ASSERT*` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases.

In `util/testharness.h` I defined `EXPECT*` assertions, the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:

```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if  /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted back change to `ASSERT*` in `util/testharness.h`. But preserved introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once switched to gtest.

This diff is independent and contains manual changes only in `util/testharness.h`.

Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```

Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33333
2015-03-16 20:52:32 -07:00
Venkatesh Radhakrishnan
b2b3086524 Speed up rocksDB close call.
Summary:
On RocksDB, when there are multiple instances doing
flushes/compactions in the background, the close call takes a long time
because the flushes/compactions need to complete before the database can
shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up.

Test Plan: DB Test added.

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33741
2015-03-16 18:49:14 -07:00
Igor Sugak
95344346af rocksdb: Small refactoring before migrating to gtest
Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest.

Test Plan:
Make sure no build errors, and all test are passing.
```
% make check
```

Reviewers: igor, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35145
2015-03-16 18:08:59 -07:00
Mark Callaghan
56337faf3e Fix compaction IO stats to handle large file counts
Summary:
The output did not have space for 6-digit file counts or for 3-digit
counts of files being compacted. This adds space for that while preserving
existing alignment. See https://gist.github.com/mdcallag/0a61c6a18dd467224c11

Task ID: #

Blame Rev:

Test Plan:
run db_bench, look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35091
2015-03-16 11:50:23 -07:00
Igor Canadi
c6967a1a5e Make RecordIn/RecordOut human readable
Summary: I had hard time understanding these big numbers. Here's how the output looks like now: https://gist.github.com/igorcanadi/4c39c17685049584a992

Test Plan: db_bench

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35073
2015-03-14 15:12:41 -07:00
Mark Callaghan
c8da670325 Stop printing per-level stall times.
Summary:
Per-level stall times are the suggested stall time, not the actual stall time so this change stops printing them
both in the per-level output lines and in the summary. Also changed output for total stall time to include units
in all cases. The new output looks like:
Level   Files   Size(MB) Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) Stall(cnt)    RecordIn   RecordDrop
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0     4/1          7   0.8      0.0     0.0      0.0       0.6      0.6       0.0   0.0      0.0     12.9        50       352    0.141        882            0            0
  L1     5/0          9   0.9      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
  L2    54/0         99   1.0      0.0     0.0      0.0       0.0      0.0       0.6   0.0      0.0      0.0         0         0    0.000          0            0            0
  L3   289/0        527   0.5      0.0     0.0      0.0       0.0      0.0       0.5   0.0      0.0      0.0         0         0    0.000          0            0            0
 Sum   352/1        642   0.0      0.0     0.0      0.0       0.6      0.6       1.7   1.0      0.0     12.9        50       352    0.141        882            0            0
 Int     0/0          0   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     15.5         0         3    0.118          7            0            0
Flush(GB): accumulative 0.627, interval 0.005
Stalls(count): 0 level0_slowdown, 0 level0_numfiles, 882 memtable_compaction, 0 leveln_slowdown_soft, 0 leveln_slowdown_hard

Task ID: #6493861

Blame Rev:

Test Plan:
run db_bench, look at output

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35085
2015-03-14 15:01:43 -07:00
Yueh-Hsuan Chiang
12134139e3 Fixed the unit-test issue in PreShutdownCompactionMiddle
Summary: Fixed the unit-test issue in PreShutdownCompactionMiddle

Test Plan: export ROCKSDB_TESTS=PreShutdownCompactionMiddle

Reviewers: rven, sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35061
2015-03-14 08:25:27 -07:00
Yueh-Hsuan Chiang
fd1b3f385a Fix the issue in PreShutdownMultipleCompaction
Summary: Fix the issue in PreShutdownMultipleCompaction

Test Plan:
export ROCKSDB_TESTS=PreShutdownMultipleCompaction
./db_test

Reviewers: rven, sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35055
2015-03-14 08:03:02 -07:00
Igor Canadi
417367c42d Fix SIGSEGV when not using cache 2015-03-13 16:41:00 -07:00
Venkatesh Radhakrishnan
e25ff039c8 Prevent slowdowns and stalls in PreShutdown tests
Summary:
The preshutdown tests check for stopped compactions/flushes.
Removing stalls on the write path.

Test Plan: DBTests.PreShutdown*

Reviewers: yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35037
2015-03-13 14:51:40 -07:00
Igor Canadi
f690712652 Speed up db_bench shutdown
Summary: See t6489044

Test Plan: compiles

Reviewers: MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34977
2015-03-13 14:45:15 -07:00
Yueh-Hsuan Chiang
c1b3cde18a Improve the robustness of ThreadStatusSingleCompaction
Summary:
Improve the robustness of ThreadStatusSingleCompaction
by ensuring the number of files flushed in the test.

Test Plan:
export ROCKSDB_TESTS=ThreadStatus
./db_test

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35019
2015-03-13 13:16:53 -07:00
Yueh-Hsuan Chiang
8c12426c93 Fix the deadlock issue in ThreadStatusSingleCompaction.
Summary:
Fix the deadlock issue in ThreadStatusSingleCompaction.

In the previous version of ThreadStatusSingleCompaction, the compaction
thread will wait for a SYNC_POINT while its db_mutex is held.  However,
if the test hasn't finished its Put cycle while a compaction is running,
a deadlock will happen in the test.

Test Plan:
export ROCKSDB_TESTS=ThreadStatus
./db_test

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35001
2015-03-13 12:53:00 -07:00
sdong
b16ead531d DBTest.DynamicLevelCompressionPerLevel should not run without snappy support
Summary: The test depends on snappy to be used. Skip the test if it is not supported.

Test Plan: Run the test

Reviewers: meyering, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34995
2015-03-13 11:26:17 -07:00
Yueh-Hsuan Chiang
a5e60bafc2 Fix a typo / test failure in ThreadStatusSingleCompaction
Summary:
Fix a typo / test failure in ThreadStatusSingleCompaction

Test Plan:
export ROCKSDB_TESTS=ThreadStatus
./db_test
2015-03-13 11:20:17 -07:00
Igor Canadi
cb2c91850c Don't run some tests is snappy is not present
Summary: Currently, we have `ifdef SNAPPY` around bunch of db_test code. Some tests that don't even use compression are also blocked when running system doesn't have snappy. This also causes hard-to-catch bugs, like D34983. We should dynamically figure out if compression is supported or not.

Test Plan: compiles

Reviewers: sdong, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34989
2015-03-13 11:08:50 -07:00
Yueh-Hsuan Chiang
c594b0e89d Allow GetThreadList() to report operation stage.
Summary: Allow GetThreadList() to report operation stage.

Test Plan:
  ./thread_list_test
  ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
    --max_background_compactions=10 --max_background_flushes=3 \
    --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
    --num_column_families=10

  export ROCKSDB_TESTS=ThreadStatus
  ./db_test

Sample output
          ThreadID ThreadType                    cfName    Operation        OP_StartTime    ElapsedTime                                         Stage        State
   140116265861184    Low Pri
   140116270055488    Low Pri
   140116274249792   High Pri column_family_name_000005        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116400078912    Low Pri column_family_name_000004   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116358135872    Low Pri column_family_name_000006   Compaction 2015/03/10-14:58:10           1 us     CompactionJob::FinishCompactionOutputFile
   140116341358656    Low Pri
   140116295221312   High Pri                   default        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116324581440    Low Pri column_family_name_000009   Compaction 2015/03/10-14:58:11           0 us      CompactionJob::ProcessKeyValueCompaction
   140116278444096    Low Pri
   140116299415616    Low Pri column_family_name_000008   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116291027008   High Pri column_family_name_000001        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116286832704    Low Pri column_family_name_000002   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116282638400    Low Pri

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34683
2015-03-13 10:45:40 -07:00
Igor Canadi
52d8347a91 EventLogger
Summary:
Here's my proposal for making our LOGs easier to read by machines.

The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize.

I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc)

Test Plan:
Ran db_bench. Observed:
2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699}
2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12}

Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31647
2015-03-13 10:15:54 -07:00
Islam AbdelRahman
9d22a1f136 Allow negative Wnew
Summary:
we are using uint64_t for Wnew this is not correct since this value can be negative
https://github.com/facebook/rocksdb/issues/535

Test Plan: run db_bench and check what happens when Wnew is -ve

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34935
2015-03-12 20:53:18 -07:00
Venkatesh Radhakrishnan
b411d06031 Prevent stalls in preshutdown tests
Summary:
The tests using sync_point for intent to shutdown stop
compaction and this results in stalls if too many rows are written. We
now limit the number of rows written to prevent stalls, since the focus
of the test is to cancel background work, which is being correctly
tested. This fixes a Jenkins issue.

Test Plan: DBTest.PreShutdown*

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34893
2015-03-12 10:49:06 -07:00
Islam AbdelRahman
1d43bc41fb Fixing segmentation fault in db_bench
Summary:
Fixing segmentation fault when running db_bench

This seg fault happens because num_created is used without being initialized

Test Plan:
running db_bench using these arguments
bpl=10485760;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4; delay=8; stop=12; wbn=3; mbc=20; mb=67108864;wbs=134217728; dds=0; sync=0; r=1000000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=overwrite --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/home/tec/koko/ --sync=$sync --disable_wal=1 --compression_type=zlib --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=1

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34881
2015-03-11 17:57:16 -07:00
sdong
e9de8b65a6 Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc.

Test Plan: run all tests

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34431
2015-03-11 13:14:52 -07:00
Yueh-Hsuan Chiang
2b785d76b8 Fixed a bug where CompactFiles won't delete obsolete files until flush.
Summary: Fixed a bug where CompactFiles won't delete obsolete files until flush.

Test Plan:
./compact_files_test
export ROCKSDB_TESTS=CompactFiles
./db_test

Reviewers: rven, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34671
2015-03-11 13:06:59 -07:00
sdong
2884b100ba db_bench: Better way to randomize repeated read keys in -read_random_exp_range
Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected.

Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values>

Reviewers: MarkCallaghan, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34761
2015-03-11 11:46:14 -07:00
Venkatesh Radhakrishnan
284be570c8 Provide a mechanism to inform Rocksdb that it is shutting down
Summary:
Provide an API which enables users to infor Rocksdb that it is
shutting down.

Test Plan: db_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34617
2015-03-11 10:31:02 -07:00
Igor Canadi
2ddf53b2ca Get OptimizeFilterForHits work on Mac
Summary: Got it working by some voodoo programming

Test Plan: works!

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34611
2015-03-10 17:53:22 -07:00
Yueh-Hsuan Chiang
89597bb66b Allow GetThreadList() to report the start time of the current operation.
Summary: Allow GetThreadList() to report the start time of the current operation.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
  --max_background_compactions=10 --max_background_flushes=3 \
  --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
  --num_column_families=10

Sample output:
          ThreadID ThreadType                    cfName    Operation        OP_StartTime         State
   140338840797248   High Pri column_family_name_000003        Flush 2015/03/09-17:49:59
   140338844991552   High Pri column_family_name_000004        Flush 2015/03/09-17:49:59
   140338849185856    Low Pri
   140338983403584    Low Pri
   140339008569408    Low Pri
   140338861768768    Low Pri
   140338924683328    Low Pri
   140338899517504    Low Pri
   140338853380160    Low Pri
   140338882740288    Low Pri
   140338865963072   High Pri column_family_name_000006        Flush 2015/03/09-17:49:59
   140338954043456    Low Pri
   140338857574464    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: lgalanis, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34689
2015-03-10 14:51:28 -07:00
sdong
37921b4997 db_bench: Add Option -read_random_exp_range to allow read skewness.
Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better.

Test Plan:
Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB):

./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0
rocksdb.block.cache.data.miss COUNT : 958418
rocksdb.block.cache.data.hit COUNT : 41582

./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0
rocksdb.block.cache.data.miss COUNT : 819518
rocksdb.block.cache.data.hit COUNT : 180482

./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0
rocksdb.block.cache.data.miss COUNT : 450479
rocksdb.block.cache.data.hit COUNT : 549521

./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0
rocksdb.block.cache.data.miss COUNT : 223192
rocksdb.block.cache.data.hit COUNT : 776808

Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34629
2015-03-09 11:34:52 -07:00
Igor Canadi
485ac0dbd0 Add rate_limiter to string options
Summary: I want to be able to set this through mongo config.

Test Plan: added unit test

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34599
2015-03-06 14:21:15 -08:00
Yueh-Hsuan Chiang
dc4532c497 Add --thread_status_per_interval to db_bench
Summary:
Add --thread_status_per_interval to db_bench, which allows
db_bench to optionally enable print the current thread status
periodically.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10

Sample output:
          ThreadID ThreadType                         dbName                     cfName       Operation           State
   140281571770432    Low Pri
   140281575964736   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000001           Flush
   140281710182464    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000008      Compaction
   140281638879296    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000007      Compaction
   140281592741952    Low Pri
   140281580159040   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000002           Flush
   140281676628032    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000006      Compaction
   140281584353344    Low Pri
   140281622102080    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000009      Compaction
   140281605324864    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000004      Compaction
   140281601130560   High Pri  /tmp/rocksdbtest-5297/dbbench                    default           Flush
   140281596936256    Low Pri
   140281588547648    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34515
2015-03-06 11:22:06 -08:00
Yueh-Hsuan Chiang
694988b627 Fix a bug in stall time counter. Improve its output format.
Summary: Fix a bug in stall time counter.  Improve its output format.

Test Plan:
export ROCKSDB_TESTS=Timeout
./db_test

./db_bench --benchmarks=fillrandom --stats_interval=10000 --statistics=true --stats_per_interval=1 --num=1000000 --threads=4 --level0_stop_writes_trigger=3 --level0_slowdown_writes_trigger=2

sample output:
    Uptime(secs): 35.8 total, 0.0 interval
    Cumulative writes: 359590 writes, 359589 keys, 183047 batches, 2.0 writes per batch, 0.04 GB user ingest, stall seconds: 1786.008 ms
    Cumulative WAL: 359591 writes, 183046 syncs, 1.96 writes per sync, 0.04 GB written
    Interval writes: 253 writes, 253 keys, 128 batches, 2.0 writes per batch, 0.0 MB user ingest, stall time: 0 us
    Interval WAL: 253 writes, 128 syncs, 1.96 writes per sync, 0.00 MB written

Reviewers: MarkCallaghan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34275
2015-03-03 12:48:12 -08:00
Igor Canadi
db03739340 options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.

In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.

Test Plan: New unit tests and pass tests suites including valgrind.

Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo

Reviewed By: ikabiljo

Subscribers: yoshinorim, ikabiljo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31437
2015-03-02 22:40:41 -08:00
Mark Callaghan
c4bd03a97e Fix typo in log message
Summary:
fix typo

Task ID: #

Blame Rev:

Test Plan:
Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34251
2015-03-02 09:35:50 -08:00