Commit Graph

3625 Commits

Author SHA1 Message Date
sdong
6df589b446 Add TablePropertiesCollector::NeedCompact() to suggest DB to further compact output files
Summary:
It is experimental. Allow users to return from a call back function TablePropertiesCollector::NeedCompact(), based on the data in the file.
It can be used to allow users to suggest DB to clear up delete tombstones faster.

Test Plan: Add a unit test.

Reviewers: igor, yhchiang, kradhakrishnan, rven

Reviewed By: rven

Subscribers: yoshinorim, MarkCallaghan, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39585
2015-06-05 20:18:21 -07:00
Yueh-Hsuan Chiang
2e764f06ea [API Change] Improve EventListener::OnFlushCompleted interface
Summary:
EventListener::OnFlushCompleted() now passes a structure instead
of a list of parameters.  This minimizes the API change in the
future.

Test Plan:
listener_test
compact_files_test
example/compact_files_example

Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor

Reviewed By: rven, igor

Subscribers: IslamAbdelRahman, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39543
2015-06-05 12:28:51 -07:00
Yueh-Hsuan Chiang
7322c74012 Revert incorrect commit
Summary: Revert incorrect commit

Test Plan: db_test

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39651
2015-06-05 11:23:09 -07:00
Islam AbdelRahman
31e60e2a77 Unlock mutex in ReFitLevel
Summary: I encountered an issue where the database hang, it looks like the mutex is not unlocked on return in ReFitLevel function

Test Plan: make -j64 check

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D39609
2015-06-05 11:06:14 -07:00
Yueh-Hsuan Chiang
7647df8f9e Fixed the tsan failure in util/compaction_job_stats_impl.cc
Summary:
The type of smallest_output_key_prefix and largest_output_key_prefix
have been changed to std::string in https://reviews.facebook.net/D39537.
As a result, we shouldn't do smallest_output_key_prefix[0] = 0 in the
initialization.

Test Plan: compile db_test with tsan enabled and repeat DBTest.CompactionDeletionTrigger test to verify the tsan issue has been gone.

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39645
2015-06-05 11:05:35 -07:00
Igor Canadi
b2785472c8 Fix compile
Summary:
This commit broke the compile: 3ce3bb3da2
As evidenced here: https://evergreen.mongodb.com/task/mongodb_mongo_master_ubuntu1404_rocksdb_compile_ce2b1d11d42de93f7b375f7e6c41fb709f66e969_15_06_04_23_09_36

This should fix it

Test Plan: make check

Reviewers: IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39627
2015-06-05 09:41:45 -04:00
Islam AbdelRahman
3ce3bb3da2 Allowing L0 -> L1 trivial move on sorted data
Summary:
This diff updates the logic of how we do trivial move, now trivial move can run on any number of files in input level as long as they are not overlapping

The conditions for trivial move have been updated

Introduced conditions:
  - Trivial move cannot happen if we have a compaction filter (except if the compaction is not manual)
  - Input level files cannot be overlapping

Removed conditions:
  - Trivial move only run when the compaction is not manual
  - Input level should can contain only 1 file

More context on what tests failed because of Trivial move
```
DBTest.CompactionsGenerateMultipleFiles
This test is expecting compaction on a file in L0 to generate multiple files in L1, this test will fail with trivial move because we end up with one file in L1
```

```
DBTest.NoSpaceCompactRange
This test expect compaction to fail when we force environment to report running out of space, of course this is not valid in trivial move situation
because trivial move does not need any extra space, and did not check for that
```

```
DBTest.DropWrites
Similar to DBTest.NoSpaceCompactRange
```

```
DBTest.DeleteObsoleteFilesPendingOutputs
This test expect that a file in L2 is deleted after it's moved to L3, this is not valid with trivial move because although the file was moved it is now used by L3
```

```
CuckooTableDBTest.CompactionIntoMultipleFiles
Same as DBTest.CompactionsGenerateMultipleFiles
```

This diff is based on a work by @sdong https://reviews.facebook.net/D34149

Test Plan: make -j64 check

Reviewers: rven, sdong, igor

Reviewed By: igor

Subscribers: yhchiang, ott, march, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D34797
2015-06-04 16:51:25 -07:00
Yueh-Hsuan Chiang
bb808eaddb Changed the CompactionJobStats::output_key_prefix type from char[] to string.
Summary:
Keys in RocksDB can be arbitrary byte strings.  However, in the current
CompactionJobStats, smallest_output_key_prefix and largest_output_key_prefix
are of type char[] without having a length, which is insufficient to handle
non-null terminated strings.

This patch change their type to std::string.

Test Plan: compaction_job_stats_test

Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39537
2015-06-04 12:31:12 -07:00
Yueh-Hsuan Chiang
0b3172d071 Add EventListener::OnTableFileDeletion()
Summary:
Add EventListener::OnTableFileDeletion(), which will be
called when a table file is deleted.

Test Plan: Extend three existing tests in db_test to verify the deleted files.

Reviewers: rven, anthony, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38931
2015-06-03 19:57:01 -07:00
Reed Allman
211a195d41 C: add MultiGet support 2015-06-03 17:57:42 -07:00
Reed Allman
5dc174e11a C: add support for WriteBatch SliceParts params 2015-06-03 17:08:00 -07:00
Igor Canadi
2d0b9e5f0a Fix compile on darwin
Summary: We need to start doing some CI on Macs.

Test Plan: works now

Reviewers: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39489
2015-06-03 16:52:51 -04:00
sdong
3af668ed17 Fix DBTest.MigrateToDynamicLevelMaxBytesBase slowness with valgrind
Summary:
DBTest.MigrateToDynamicLevelMaxBytesBase with valgrind test is
extremely slow. Work it around by not having both threads running
everything non-stop.

Test Plan: Run the test with valgrind which used to take too long to finish and see it finish in reasonable time.

Reviewers: yhchiang, anthony, rven, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39477
2015-06-03 12:08:37 -07:00
Igor Canadi
408cc4b8e0 Revert "Merge pull request #621 from rdallman/c-slice-parts-support"
This reverts commit 78382d4ba7, reversing
changes made to ca8b85ac04.
2015-06-03 13:34:07 -04:00
Igor Canadi
78382d4ba7 Merge pull request #621 from rdallman/c-slice-parts-support
C: add support for WriteBatch SliceParts params
2015-06-03 13:16:14 -04:00
agiardullo
ca8b85ac04 better document max_write_buffer_number_to_maintain
Test Plan: compile

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39285
2015-06-02 21:24:19 -07:00
Yueh-Hsuan Chiang
0483dab2ab Remove a TODO that has been done
Summary: Remove a TODO that has been done

Test Plan: make

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39429
2015-06-02 18:38:57 -07:00
Yueh-Hsuan Chiang
8afafc2783 Fix compile warning in db/db_impl
Summary:
Fix the following compile warning in db/db_impl

  db/db_impl.cc:1603:19: error: implicit conversion loses integer precision: 'const uint64_t' (aka 'const unsigned long') to 'int' [-Werror,-Wshorten-64-to-32]
     info.job_id = job_id;
                 ~ ^~~~~~

Test Plan: db_test

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39423
2015-06-02 17:36:45 -07:00
Yueh-Hsuan Chiang
fe5c6321cb Allow EventListener::OnCompactionCompleted to return CompactionJobStats.
Summary:
Allow EventListener::OnCompactionCompleted to return CompactionJobStats,
which contains useful information about a compaction.

Example CompactionJobStats returned by OnCompactionCompleted():
    smallest_output_key_prefix 05000000
    largest_output_key_prefix 06990000
    elapsed_time 42419
    num_input_records 300
    num_input_files 3
    num_input_files_at_output_level 2
    num_output_records 200
    num_output_files 1
    actual_bytes_input 167200
    actual_bytes_output 110688
    total_input_raw_key_bytes 5400
    total_input_raw_value_bytes 300000
    num_records_replaced 100
    is_manual_compaction 1

Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats.

Reviewers: rven, igor, anthony, sdong

Reviewed By: sdong

Subscribers: tnovak, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38463
2015-06-02 17:07:16 -07:00
Yueh-Hsuan Chiang
3083ed2129 Fixed heap-use-after-free error in compaction_job_test.cc
Summary: Fixed heap-use-after-free error in compaction_job_test.cc

Test Plan: compaction_job_test

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39411
2015-06-02 16:23:01 -07:00
Yueh-Hsuan Chiang
8d8d4e45ba Fixed ROCKSDB_LITE compile error due to the missing of TableFileCreationInfo
Summary: Fixed ROCKSDB_LITE compile error due to the missing of TableFileCreationInfo

Test Plan: make OPT=-DROCKSDB_LITE shared_lib -j32

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39405
2015-06-02 14:27:46 -07:00
Yueh-Hsuan Chiang
ab946af08a Fix a compile warning in listener_test.cc
Summary:
Fixed the following compile warning in listener_test.cc:
db/listener_test.cc:214:8: error: 'OnTableFileCreated' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
14:16:46   void OnTableFileCreated(

Test Plan:
make listener_test

Reviewers: sdong, igor

Subscribers: leveldb
2015-06-02 14:20:27 -07:00
Yueh-Hsuan Chiang
fc83821270 Add EventListener::OnTableFileCreated()
Summary:
Add EventListener::OnTableFileCreated(), which will be called
when a table file is created.  This patch is part of the
EventLogger and EventListener integration.

Test Plan: Augment existing test in db/listener_test.cc

Reviewers: anthony, kradhakrishnan, rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38865
2015-06-02 14:12:23 -07:00
Yueh-Hsuan Chiang
898e803fc5 Add a stats counter for DB_WRITE back which was mistakenly removed.
Summary: Add a stats counter for DB_WRITE back which was mistakenly removed.

Test Plan: augment GroupCommitTest

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39399
2015-06-02 12:35:12 -07:00
sdong
ac81130fa0 Fix Bug: CompactRange() doesn't change to correct level caused by using wrong level
Summary: In previous change https://reviews.facebook.net/D39099 , while renaming parameters, use a wrong parameter, causing CompactRange() to compact not wrong level.

Test Plan: Run "DBTest.MigrateToDynamicLevelMaxBytesBase" which failed with the patch.

Reviewers: rven, yhchiang, kradhakrishnan, igor, anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39393
2015-06-02 10:00:58 -07:00
Mike Kolupaev
ec7a944360 more times in perf_context and iostats_context
Summary:
We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls:
 - perf_context.db_mutex_lock_nanos
 - perf_context.db_condition_wait_nanos
 - iostats_context.open_time
 - iostats_context.allocate_time
 - iostats_context.write_time
 - iostats_context.range_sync_time
 - iostats_context.logger_time

In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long.

Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took.

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: march, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D39177
2015-06-02 02:07:58 -07:00
sdong
4266d4fd90 Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange()
Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true.

Test Plan: Add a unit test for it.

Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39099
2015-06-01 18:21:14 -07:00
Yueh-Hsuan Chiang
d333820bad Removed DBImpl::notifying_events_
Summary:
DBImpl::notifying_events_ is a internal counter in DBImpl which is
used to prevent DB close when DB is notifying events.  However, as
the current events all rely on either compaction or flush which
already have similar counters to prevent DB close, it is safe to
remove notifying_events_.

Test Plan:
listener_test
examples/compact_files_example

Reviewers: igor, anthony, kradhakrishnan, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39315
2015-06-01 15:32:23 -07:00
Yueh-Hsuan Chiang
495ce6018a Fixed compile warning in compact_files_example.cc
Summary: Fixed compile warning in compact_files_example.cc

Test Plan: compact_files_example

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39309
2015-06-01 14:59:51 -07:00
Mike Kolupaev
2ecac9f96d add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper
Summary: It used to be no good (known to me) non-intrusive way to wrap WritableFile - you can't call protected virtual methods of the wrapped pointer to WritableFile. This diff adds a convenience class WritableFileWrapper that makes wrapping WritableFile both possible and easy.

Test Plan: `make clean; make -j release`, `make clean; OPT=-DROCKSDB_LITE make release`, `make clean; USE_CLANG=1 make -j all`.

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, tnovak, march

Differential Revision: https://reviews.facebook.net/D39147
2015-06-01 11:22:36 -07:00
Igor Canadi
a187e66ad0 Merge pull request #617 from rdallman/wb-merge-sliceparts
WriteBatch.Merge w/ SliceParts support
2015-05-31 13:34:34 -04:00
Yueh-Hsuan Chiang
16c197627a Fixed db_stress
Summary:
Fixed db_stress by correcting the verification of column family
names in the Listener of db_stress

Test Plan: db_stress

Reviewers: igor, sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39255
2015-05-30 14:26:00 -07:00
Igor Canadi
4c181f08bc Fix compile on darwin
Summary: As title

Test Plan: make check

Reviewers: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39243
2015-05-30 12:25:45 -04:00
agiardullo
bc7a7a400c fix LITE build
Summary: Broken by optimistic transaction diff.  (I only built 'release' not 'static_lib' when testing).

Test Plan: build

Reviewers: yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39219
2015-05-29 15:22:00 -07:00
Yueh-Hsuan Chiang
832271f6b1 Fixed a compile warning in db_stress in NDEBUG mode.
Summary: Fixed a compile warning in db_stress in NDEBUG mode.

Test Plan: make OPT=-DNDEBUG db_stress

Reviewers: sdong, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39213
2015-05-29 15:00:25 -07:00
agiardullo
dc9d70de65 Optimistic Transactions
Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.

Test Plan: Added a new test, but still need to look into stress testing.

Reviewers: yhchiang, igor, rven, sdong

Reviewed By: sdong

Subscribers: adamretter, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33435
2015-05-29 14:36:35 -07:00
Yueh-Hsuan Chiang
d5a0c0e69b Fixed a compile warning in db_stress
Summary:
Fixed the following compile warning in db_stress:
error: 'OnCompactionCompleted' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]

Test Plan: make db_stress

Reviewers: sdong, igor, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39207
2015-05-29 13:37:59 -07:00
Yueh-Hsuan Chiang
ebfdb3c7f6 Fixed a compile error in ROCKSDB_LITE
Summary: Fixed a compile error in ROCKSDB_LITE

Test Plan: make db_stress OPT=-DROCKSDB_LITE -j32

Reviewers: sdong, igor, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39201
2015-05-29 13:21:09 -07:00
Yueh-Hsuan Chiang
9ffc8ba024 Include EventListener in stress test.
Summary: Include EventListener in stress test.

Test Plan: make blackbox_crash_test whitebox_crash_test

Reviewers: anthony, igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39105
2015-05-29 13:17:49 -07:00
Igor Canadi
a3da590226 Decrease number of jobs in make release
Summary: as title

Test Plan: make release

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: sdong, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38853
2015-05-29 13:39:33 -04:00
Reed Allman
21cd6b7ad8 C: add support for WriteBatch SliceParts params 2015-05-29 10:23:43 -07:00
Reed Allman
a0635ba3f6 WriteBatch.Merge w/ SliceParts support
also hooked up WriteBatchInternal
2015-05-29 04:30:03 -07:00
agiardullo
c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang
ec4ff4e99c Rename EventLoggerHelpers EventHelpers
Summary:
Rename EventLoggerHelpers EventHelpers, as it's going to include
all event-related helper functions instead of EventLogger only stuffs.

Test Plan: make

Reviewers: sdong, rven, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39093
2015-05-28 13:37:47 -07:00
Yueh-Hsuan Chiang
672dda9b3b [API Change] Move listeners from ColumnFamilyOptions to DBOptions
Summary: Move listeners from ColumnFamilyOptions to DBOptions

Test Plan:
listener_test
compact_files_test

Reviewers: rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39087
2015-05-28 13:21:39 -07:00
Yueh-Hsuan Chiang
3ab8ffd4dd Compaction now conditionally boosts the size of deletion entries.
Summary:
Compaction now boosts the size of deletion entries of a file only when
the number of deletion entries is greater than the number of non-deletion
entries in the file.  The motivation here is that in a stable workload,
the number of deletion entries should be roughly equal to the number of
non-deletion entries.  If we compensate the size of deletion entries in a
stable workload, the deletion compensation logic might introduce unwanted
effet which changes the shape of LSM tree.

Test Plan: db_test --gtest_filter="*Deletion*"

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38703
2015-05-26 14:05:38 -07:00
Igor Canadi
a81ac24127 Merge pull request #615 from rdallman/master
C: add more block based table stuff, some aux slice transform/merge ops
2015-05-26 14:19:31 -04:00
Yueh-Hsuan Chiang
6d299b70b8 Fixed a bug in EventLoggerHelpers::LogTableFileCreation
Summary:
Fixed a missing "}" at the end of the generated JSON Log
in EventLoggerHelpers::LogTableFileCreation.

Test Plan: db_bench

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38919
2015-05-26 10:55:46 -07:00
Yueh-Hsuan Chiang
a0580205c8 Removed an unused private variable in db_impl.h
Summary: Removed an unused private variable in db_impl.h

Test Plan: make db_test

Reviewers: sdong, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38925
2015-05-26 10:46:26 -07:00
Reed Allman
328ad902ab update an import path to fit in with the rest of the kids 2015-05-22 22:56:32 -07:00