Commit Graph

3599 Commits

Author SHA1 Message Date
sdong
ac81130fa0 Fix Bug: CompactRange() doesn't change to correct level caused by using wrong level
Summary: In previous change https://reviews.facebook.net/D39099 , while renaming parameters, use a wrong parameter, causing CompactRange() to compact not wrong level.

Test Plan: Run "DBTest.MigrateToDynamicLevelMaxBytesBase" which failed with the patch.

Reviewers: rven, yhchiang, kradhakrishnan, igor, anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39393
2015-06-02 10:00:58 -07:00
Mike Kolupaev
ec7a944360 more times in perf_context and iostats_context
Summary:
We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls:
 - perf_context.db_mutex_lock_nanos
 - perf_context.db_condition_wait_nanos
 - iostats_context.open_time
 - iostats_context.allocate_time
 - iostats_context.write_time
 - iostats_context.range_sync_time
 - iostats_context.logger_time

In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long.

Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took.

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: march, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D39177
2015-06-02 02:07:58 -07:00
sdong
4266d4fd90 Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange()
Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true.

Test Plan: Add a unit test for it.

Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39099
2015-06-01 18:21:14 -07:00
Yueh-Hsuan Chiang
d333820bad Removed DBImpl::notifying_events_
Summary:
DBImpl::notifying_events_ is a internal counter in DBImpl which is
used to prevent DB close when DB is notifying events.  However, as
the current events all rely on either compaction or flush which
already have similar counters to prevent DB close, it is safe to
remove notifying_events_.

Test Plan:
listener_test
examples/compact_files_example

Reviewers: igor, anthony, kradhakrishnan, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39315
2015-06-01 15:32:23 -07:00
Yueh-Hsuan Chiang
495ce6018a Fixed compile warning in compact_files_example.cc
Summary: Fixed compile warning in compact_files_example.cc

Test Plan: compact_files_example

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39309
2015-06-01 14:59:51 -07:00
Mike Kolupaev
2ecac9f96d add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper
Summary: It used to be no good (known to me) non-intrusive way to wrap WritableFile - you can't call protected virtual methods of the wrapped pointer to WritableFile. This diff adds a convenience class WritableFileWrapper that makes wrapping WritableFile both possible and easy.

Test Plan: `make clean; make -j release`, `make clean; OPT=-DROCKSDB_LITE make release`, `make clean; USE_CLANG=1 make -j all`.

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, tnovak, march

Differential Revision: https://reviews.facebook.net/D39147
2015-06-01 11:22:36 -07:00
Igor Canadi
a187e66ad0 Merge pull request #617 from rdallman/wb-merge-sliceparts
WriteBatch.Merge w/ SliceParts support
2015-05-31 13:34:34 -04:00
Yueh-Hsuan Chiang
16c197627a Fixed db_stress
Summary:
Fixed db_stress by correcting the verification of column family
names in the Listener of db_stress

Test Plan: db_stress

Reviewers: igor, sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39255
2015-05-30 14:26:00 -07:00
Igor Canadi
4c181f08bc Fix compile on darwin
Summary: As title

Test Plan: make check

Reviewers: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39243
2015-05-30 12:25:45 -04:00
agiardullo
bc7a7a400c fix LITE build
Summary: Broken by optimistic transaction diff.  (I only built 'release' not 'static_lib' when testing).

Test Plan: build

Reviewers: yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39219
2015-05-29 15:22:00 -07:00
Yueh-Hsuan Chiang
832271f6b1 Fixed a compile warning in db_stress in NDEBUG mode.
Summary: Fixed a compile warning in db_stress in NDEBUG mode.

Test Plan: make OPT=-DNDEBUG db_stress

Reviewers: sdong, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39213
2015-05-29 15:00:25 -07:00
agiardullo
dc9d70de65 Optimistic Transactions
Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.

Test Plan: Added a new test, but still need to look into stress testing.

Reviewers: yhchiang, igor, rven, sdong

Reviewed By: sdong

Subscribers: adamretter, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33435
2015-05-29 14:36:35 -07:00
Yueh-Hsuan Chiang
d5a0c0e69b Fixed a compile warning in db_stress
Summary:
Fixed the following compile warning in db_stress:
error: 'OnCompactionCompleted' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]

Test Plan: make db_stress

Reviewers: sdong, igor, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39207
2015-05-29 13:37:59 -07:00
Yueh-Hsuan Chiang
ebfdb3c7f6 Fixed a compile error in ROCKSDB_LITE
Summary: Fixed a compile error in ROCKSDB_LITE

Test Plan: make db_stress OPT=-DROCKSDB_LITE -j32

Reviewers: sdong, igor, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39201
2015-05-29 13:21:09 -07:00
Yueh-Hsuan Chiang
9ffc8ba024 Include EventListener in stress test.
Summary: Include EventListener in stress test.

Test Plan: make blackbox_crash_test whitebox_crash_test

Reviewers: anthony, igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39105
2015-05-29 13:17:49 -07:00
Igor Canadi
a3da590226 Decrease number of jobs in make release
Summary: as title

Test Plan: make release

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: sdong, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38853
2015-05-29 13:39:33 -04:00
Reed Allman
21cd6b7ad8 C: add support for WriteBatch SliceParts params 2015-05-29 10:23:43 -07:00
Reed Allman
a0635ba3f6 WriteBatch.Merge w/ SliceParts support
also hooked up WriteBatchInternal
2015-05-29 04:30:03 -07:00
agiardullo
c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang
ec4ff4e99c Rename EventLoggerHelpers EventHelpers
Summary:
Rename EventLoggerHelpers EventHelpers, as it's going to include
all event-related helper functions instead of EventLogger only stuffs.

Test Plan: make

Reviewers: sdong, rven, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39093
2015-05-28 13:37:47 -07:00
Yueh-Hsuan Chiang
672dda9b3b [API Change] Move listeners from ColumnFamilyOptions to DBOptions
Summary: Move listeners from ColumnFamilyOptions to DBOptions

Test Plan:
listener_test
compact_files_test

Reviewers: rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39087
2015-05-28 13:21:39 -07:00
Yueh-Hsuan Chiang
3ab8ffd4dd Compaction now conditionally boosts the size of deletion entries.
Summary:
Compaction now boosts the size of deletion entries of a file only when
the number of deletion entries is greater than the number of non-deletion
entries in the file.  The motivation here is that in a stable workload,
the number of deletion entries should be roughly equal to the number of
non-deletion entries.  If we compensate the size of deletion entries in a
stable workload, the deletion compensation logic might introduce unwanted
effet which changes the shape of LSM tree.

Test Plan: db_test --gtest_filter="*Deletion*"

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38703
2015-05-26 14:05:38 -07:00
Igor Canadi
a81ac24127 Merge pull request #615 from rdallman/master
C: add more block based table stuff, some aux slice transform/merge ops
2015-05-26 14:19:31 -04:00
Yueh-Hsuan Chiang
6d299b70b8 Fixed a bug in EventLoggerHelpers::LogTableFileCreation
Summary:
Fixed a missing "}" at the end of the generated JSON Log
in EventLoggerHelpers::LogTableFileCreation.

Test Plan: db_bench

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38919
2015-05-26 10:55:46 -07:00
Yueh-Hsuan Chiang
a0580205c8 Removed an unused private variable in db_impl.h
Summary: Removed an unused private variable in db_impl.h

Test Plan: make db_test

Reviewers: sdong, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38925
2015-05-26 10:46:26 -07:00
Reed Allman
328ad902ab update an import path to fit in with the rest of the kids 2015-05-22 22:56:32 -07:00
Reed Allman
9c38ce1d02 C: extra bbto / noop slice transform 2015-05-22 22:56:28 -07:00
Igor Canadi
8d26799fef Merge pull request #614 from arschles/docker
adding docker build script and dockerfile for tools
2015-05-22 19:21:57 -04:00
agiardullo
32198343ff fix typo in c_simple_example
Summary: fix typo

Test Plan: none

Reviewers: tfarina, igor

Reviewed By: tfarina, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37347
2015-05-22 16:13:11 -07:00
Aaron Schlesinger
6116ccc232 moving dockerfile to root 2015-05-22 16:06:53 -07:00
Aaron Schlesinger
d90cee9fd3 adding docker build script and dockerfile 2015-05-22 16:03:39 -07:00
Igor Canadi
ea6d3a8ac0 Don't skip last level when calculating compaction stats
Summary: We have a bug where we don't report the last level's files as being compacted. This fixes it.

Test Plan: See the fix in action here: https://phabricator.fb.com/P19845738

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38727
2015-05-22 15:30:43 -04:00
Yueh-Hsuan Chiang
5c224d1b70 Fixed two bugs on logging file deletion.
Summary:
This patch fixes the following two bugs on logging file deletion.

1.  Previously, file deletion failure was only logged in INFO_LEVEL.
    This patch changes it to ERROR_LEVEL and does some code clean.

2.  EventLogger previously will always generate the same log on
    table file deletion even when file deletion is not successful.
    Now the resulting status of file deletion will also be logged.

Test Plan: make all check

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38817
2015-05-22 12:10:51 -07:00
Yueh-Hsuan Chiang
dc81efe415 Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL
Summary: Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL

Test Plan:
Use db_bench to verify the log level.

Sample output:
    2015/05/22-00:20:39.778064 7fff75b41300 [WARN] RocksDB version: 3.11.0
    2015/05/22-00:20:39.778095 7fff75b41300 [WARN] Git sha rocksdb_build_git_sha:7fee8775a459134c4cb04baae5bd1687e268f2a0
    2015/05/22-00:20:39.778099 7fff75b41300 [WARN] Compile date May 22 2015
    2015/05/22-00:20:39.778101 7fff75b41300 [WARN] DB SUMMARY
    2015/05/22-00:20:39.778145 7fff75b41300 [WARN] SST files in /tmp/rocksdbtest-691931916/dbbench dir, Total Num: 0, files:
    2015/05/22-00:20:39.778148 7fff75b41300 [WARN] Write Ahead Log file in /tmp/rocksdbtest-691931916/dbbench:
    2015/05/22-00:20:39.778150 7fff75b41300 [WARN]          Options.error_if_exists: 0
    2015/05/22-00:20:39.778152 7fff75b41300 [WARN]        Options.create_if_missing: 1
    2015/05/22-00:20:39.778153 7fff75b41300 [WARN]          Options.paranoid_checks: 1

Reviewers: MarkCallaghan, igor, kradhakrishnan

Reviewed By: igor

Subscribers: sdong, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38835
2015-05-22 11:54:59 -07:00
Yueh-Hsuan Chiang
687214f878 Ensure ColumnFamilyOptions.num_levels >= 2 when level compaction is used.
Summary: Ensure ColumnFamilyOptions.num_levels >= 2 when level compaction is used.

Test Plan: Extend SanitizeOptions test in column_family_test

Reviewers: sdong, rven, anthony, krishnanm86, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38829
2015-05-22 11:35:40 -07:00
Yueh-Hsuan Chiang
2abb592688 Avoid logging under mutex in DBImpl::WriteLevel0TableForRecovery().
Summary: Avoid logging under mutex in DBImpl::WriteLevel0TableForRecovery().

Test Plan: make all check

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38823
2015-05-22 11:24:12 -07:00
Igor Canadi
309a9d0760 Run tests sequentally if J=1
Summary: Sometimes we want to run tests sequentially. J=1 gives us that option

Test Plan:
make J=1 check -- sequential
make J=2 check -- parallel

Reviewers: sdong, yhchiang, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38805
2015-05-22 09:11:29 -07:00
Yueh-Hsuan Chiang
7fee8775a4 Allow EventLogger to directly log from a JSONWriter.
Summary:
Allow EventLogger to directly log from a JSONWriter.  This allows
the JSONWriter to be shared by EventLogger and potentially EventListener,
which is an important step to integrate EventLogger and EventListener.

This patch also rewrites EventLoggerHelpers::LogTableFileCreation(),
which uses the new API to generate identical log.

Test Plan:
Run db_bench in debug mode and make sure the log is correct and no
assertions fail.

Reviewers: sdong, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38709
2015-05-21 15:39:30 -07:00
Igor Canadi
7a3577519f Don't artificially inflate L0 score
Summary:
This turns out to be pretty bad because if we prioritize L0->L1 then L1 can grow artificially large, which makes L0->L1 more and more expensive. For example:
256MB @ L0 + 256MB @ L1 --> 512MB @ L1
256MB @ L0 + 512MB @ L1 --> 768MB @ L1
256MB @ L0 + 768MB @ L1 --> 1GB @ L1

....

256MB @ L0 + 10GB @ L1 --> 10.2GB @ L1

At some point we need to start compacting L1->L2 to speed up L0->L1.

Test Plan:
The performance improvement is massive for heavy write workload. This is the benchmark I ran: https://phabricator.fb.com/P19842671. Before this change, the benchmark took 47 minutes to complete. After, the benchmark finished in 2minutes. You can see full results here: https://phabricator.fb.com/P19842674

Also, we ran this diff on MongoDB on RocksDB on one replicaset. Before the change, our initial sync was so slow that it couldn't keep up with primary writes. After the change, the import finished without any issues

Reviewers: dynamike, MarkCallaghan, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38637
2015-05-21 11:40:48 -07:00
Igor Canadi
4cb4d546cd Set stats_dump_period_sec to 600 by default
Summary: Having stats in our LOG more often will help a lot with perf debugging.

Test Plan: none

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38781
2015-05-21 14:22:16 -04:00
Yueh-Hsuan Chiang
e2c1d4b57f [Public API Change] Make DB::GetDbIdentity() be const function.
Summary: Make DB::GetDbIdentity() be const function.

Test Plan: make db_test

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38745
2015-05-21 11:01:48 -07:00
Karthikeyan Radhakrishnan
eaf61ba9f3 Minor text correction
New features title was repeated twice. Fixed it.
2015-05-21 10:55:58 -07:00
Yueh-Hsuan Chiang
f16c0b289c Merge pull request #613 from DerekSchenk/DerekSchenk-patch-issue-606
Add LDFLAGS to Java static library
2015-05-21 10:48:48 -07:00
Yueh-Hsuan Chiang
d1a978ae3d Rename JSONWritter to JSONWriter
Summary: Rename JSONWritter to JSONWriter

Test Plan: make

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38733
2015-05-20 12:11:57 -07:00
DerekSchenk
3e08175418 Add LDFLAGS to Java static library
Includes the LDFLAGS so that the correct libraries will be linked.  This links rt to resolve the issue https://github.com/facebook/rocksdb/issues/606.
2015-05-19 23:04:02 -04:00
Yueh-Hsuan Chiang
812c461c96 Dump db stats in WARN level
Summary: Dump db stats in WARN level

Test Plan: run db_bench and verify the LOG

Reviewers: igor, MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38691
2015-05-19 18:42:17 -07:00
Yueh-Hsuan Chiang
b588505a7f Update HISTORY.md for GetThreadList() update.
Summary: Update HISTORY.md for GetThreadList() update.

Test Plan: no code change

Reviewers: sdong, rven, anthony, krishnanm86, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38685
2015-05-19 18:41:57 -07:00
Mark Callaghan
944043d683 Add --wal_bytes_per_sync for db_bench and more IO stats
Summary:
See https://gist.github.com/mdcallag/89ebb2b8cbd331854865 for the IO stats.
I added "Cumulative compaction:" and "Interval compaction:" lines. The IO rates
can be confusing. Rates fro per-level stats lines, Wr(MB/s) & Rd(MB/s), are computed
using the duration of the compaction job. If the job reads 10MB, writes 9MB and the job
(IO & merging) takes 1 second then the rates are 10MB/s for read and 9MB/s for writes.
The IO rates in the Cumulative compaction line uses the total uptime. The IO rates in the
Interval compaction line uses the interval uptime. So these Cumalative & Interval
compaction IO rates cannot be compared to the per-level IO rates. But both forms of
the rates are useful for debugging perf.

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D38667
2015-05-19 16:19:30 -07:00
Karthikeyan Radhakrishnan
d5de04d20e Update history for 3.11
Flipped the unreleased section to 3.11
2015-05-19 14:19:11 -07:00
Igor Canadi
08b6b3796e FORCE_GIT_SHA
Summary: In third-party2 build we need to force git sha because we're compiling from a different git repositry.

Test Plan: `FORCE_GIT_SHA=igor make`

Reviewers: kradhakrishnan, sdong

Reviewed By: kradhakrishnan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38679
2015-05-19 11:45:01 -07:00