Commit Graph

862 Commits

Author SHA1 Message Date
Yueh-Hsuan Chiang
fe5c6321cb Allow EventListener::OnCompactionCompleted to return CompactionJobStats.
Summary:
Allow EventListener::OnCompactionCompleted to return CompactionJobStats,
which contains useful information about a compaction.

Example CompactionJobStats returned by OnCompactionCompleted():
    smallest_output_key_prefix 05000000
    largest_output_key_prefix 06990000
    elapsed_time 42419
    num_input_records 300
    num_input_files 3
    num_input_files_at_output_level 2
    num_output_records 200
    num_output_files 1
    actual_bytes_input 167200
    actual_bytes_output 110688
    total_input_raw_key_bytes 5400
    total_input_raw_value_bytes 300000
    num_records_replaced 100
    is_manual_compaction 1

Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats.

Reviewers: rven, igor, anthony, sdong

Reviewed By: sdong

Subscribers: tnovak, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38463
2015-06-02 17:07:16 -07:00
Mike Kolupaev
ec7a944360 more times in perf_context and iostats_context
Summary:
We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls:
 - perf_context.db_mutex_lock_nanos
 - perf_context.db_condition_wait_nanos
 - iostats_context.open_time
 - iostats_context.allocate_time
 - iostats_context.write_time
 - iostats_context.range_sync_time
 - iostats_context.logger_time

In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long.

Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took.

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: march, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D39177
2015-06-02 02:07:58 -07:00
Mike Kolupaev
2ecac9f96d add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper
Summary: It used to be no good (known to me) non-intrusive way to wrap WritableFile - you can't call protected virtual methods of the wrapped pointer to WritableFile. This diff adds a convenience class WritableFileWrapper that makes wrapping WritableFile both possible and easy.

Test Plan: `make clean; make -j release`, `make clean; OPT=-DROCKSDB_LITE make release`, `make clean; USE_CLANG=1 make -j all`.

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, tnovak, march

Differential Revision: https://reviews.facebook.net/D39147
2015-06-01 11:22:36 -07:00
agiardullo
dc9d70de65 Optimistic Transactions
Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.

Test Plan: Added a new test, but still need to look into stress testing.

Reviewers: yhchiang, igor, rven, sdong

Reviewed By: sdong

Subscribers: adamretter, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33435
2015-05-29 14:36:35 -07:00
Yueh-Hsuan Chiang
9ffc8ba024 Include EventListener in stress test.
Summary: Include EventListener in stress test.

Test Plan: make blackbox_crash_test whitebox_crash_test

Reviewers: anthony, igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39105
2015-05-29 13:17:49 -07:00
agiardullo
c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang
672dda9b3b [API Change] Move listeners from ColumnFamilyOptions to DBOptions
Summary: Move listeners from ColumnFamilyOptions to DBOptions

Test Plan:
listener_test
compact_files_test

Reviewers: rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39087
2015-05-28 13:21:39 -07:00
Reed Allman
328ad902ab update an import path to fit in with the rest of the kids 2015-05-22 22:56:32 -07:00
Yueh-Hsuan Chiang
dc81efe415 Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL
Summary: Change the log-level of DB summary and options from INFO_LEVEL to WARN_LEVEL

Test Plan:
Use db_bench to verify the log level.

Sample output:
    2015/05/22-00:20:39.778064 7fff75b41300 [WARN] RocksDB version: 3.11.0
    2015/05/22-00:20:39.778095 7fff75b41300 [WARN] Git sha rocksdb_build_git_sha:7fee8775a459134c4cb04baae5bd1687e268f2a0
    2015/05/22-00:20:39.778099 7fff75b41300 [WARN] Compile date May 22 2015
    2015/05/22-00:20:39.778101 7fff75b41300 [WARN] DB SUMMARY
    2015/05/22-00:20:39.778145 7fff75b41300 [WARN] SST files in /tmp/rocksdbtest-691931916/dbbench dir, Total Num: 0, files:
    2015/05/22-00:20:39.778148 7fff75b41300 [WARN] Write Ahead Log file in /tmp/rocksdbtest-691931916/dbbench:
    2015/05/22-00:20:39.778150 7fff75b41300 [WARN]          Options.error_if_exists: 0
    2015/05/22-00:20:39.778152 7fff75b41300 [WARN]        Options.create_if_missing: 1
    2015/05/22-00:20:39.778153 7fff75b41300 [WARN]          Options.paranoid_checks: 1

Reviewers: MarkCallaghan, igor, kradhakrishnan

Reviewed By: igor

Subscribers: sdong, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38835
2015-05-22 11:54:59 -07:00
Yueh-Hsuan Chiang
7fee8775a4 Allow EventLogger to directly log from a JSONWriter.
Summary:
Allow EventLogger to directly log from a JSONWriter.  This allows
the JSONWriter to be shared by EventLogger and potentially EventListener,
which is an important step to integrate EventLogger and EventListener.

This patch also rewrites EventLoggerHelpers::LogTableFileCreation(),
which uses the new API to generate identical log.

Test Plan:
Run db_bench in debug mode and make sure the log is correct and no
assertions fail.

Reviewers: sdong, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38709
2015-05-21 15:39:30 -07:00
Igor Canadi
4cb4d546cd Set stats_dump_period_sec to 600 by default
Summary: Having stats in our LOG more often will help a lot with perf debugging.

Test Plan: none

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38781
2015-05-21 14:22:16 -04:00
Yueh-Hsuan Chiang
d1a978ae3d Rename JSONWritter to JSONWriter
Summary: Rename JSONWritter to JSONWriter

Test Plan: make

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38733
2015-05-20 12:11:57 -07:00
Igor Canadi
4a855c0799 Add an option wal_bytes_per_sync to control sync_file_range for WAL files
Summary:
sync_file_range is not always asyncronous and thus can block writes if we do this for WAL in the foreground thread. See more here: http://yoshinorimatsunobu.blogspot.com/2014/03/how-syncfilerange-really-works.html

Some users don't want us to call sync_file_range on WALs. Some other do.
Thus, I'm adding a separate option wal_bytes_per_sync to control calling
sync_file_range on WAL files. bytes_per_sync will apply only to table
files now.

Test Plan: no more sync_file_range for WAL as evidenced by strace

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38253
2015-05-18 17:03:59 -07:00
Yueh-Hsuan Chiang
74f3832d85 Fixed compile errors due to some gcc does not have std::map::emplace
Summary:
Fixed the following compile errors due to some gcc does not have std::map::emplace

util/thread_status_impl.cc: In static member function ‘static std::map<std::basic_string<char>, long unsigned int> rocksdb::ThreadStatus::InterpretOperationProperties(rocksdb::ThreadStatus::OperationType, const uint64_t*)’:
util/thread_status_impl.cc:88:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’
util/thread_status_impl.cc:90:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’
util/thread_status_impl.cc:94:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’
util/thread_status_impl.cc:96:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’
util/thread_status_impl.cc:98:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’
util/thread_status_impl.cc:101:20: error: ‘class std::map<std::basic_string<char>, long unsigned int>’ has no member named ‘emplace’
make: *** [util/thread_status_impl.o] Error 1

Test Plan: make db_bench

Reviewers: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38643
2015-05-18 13:48:56 -07:00
stash93
0c8017dbae Remove duplicated code
Summary: Call Flush() function instead

Test Plan: make all check

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D38583
2015-05-18 23:44:52 +03:00
Yueh-Hsuan Chiang
3f0867c0fe Allow GetThreadList to report Flush properties.
Summary:
Allow GetThreadList to report Flush properties, which includes:
* job id
* number of bytes that has been written since flush started.
* total size of input mem-tables

Test Plan:
./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000

Sample output from db_bench which tracks same flush job

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140213879898240   High Pri      default                Flush       5789 us                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 577104 | JobID 8 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140213879898240   High Pri      default                Flush     30.634 ms                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 1734865 | JobID 8 |

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38505
2015-05-15 23:22:22 -07:00
Yueh-Hsuan Chiang
714fcc067d Make ThreadStatus::InterpretOperationProperties take const uint64_t*
Summary: Make ThreadStatus::InterpretOperationProperties take const uint64_t*

Test Plan:
make
make OPT=-DROCKSDB_LITE shared_lib

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38445
2015-05-13 12:26:07 -07:00
Igor Canadi
dbd95b7532 Add more table properties to EventLogger
Summary:
Example output:

    {"time_micros": 1431463794310521, "job": 353, "event": "table_file_creation", "file_number": 387, "file_size": 86937, "table_info": {"data_size": "81801", "index_size": "9751", "filter_size": "0", "raw_key_size": "23448", "raw_average_key_size": "24.000000", "raw_value_size": "990571", "raw_average_value_size": "1013.890481", "num_data_blocks": "245", "num_entries": "977", "filter_policy_name": "", "kDeletedKeys": "0"}}

Also fixed a bug where BuildTable() in recovery was passing Env::IOHigh argument into paranoid_checks_file parameter.

Test Plan: make check + check out the output in the log

Reviewers: sdong, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38343
2015-05-12 15:53:55 -07:00
Yueh-Hsuan Chiang
77a5a543a5 Allow GetThreadList() to report basic compaction operation properties.
Summary:
Now we're able to show more details about a compaction in
GetThreadList() :)

This patch allows GetThreadList() to report basic compaction
operation properties.  Basic compaction properties include:
    1. job id
    2. compaction input / output level
    3. compaction property flags (is_manual, is_deletion, .. etc)
    4. total input bytes
    5. the number of bytes has been read currently.
    6. the number of bytes has been written currently.

Flush operation properties will be done in a seperate diff.

Test Plan:
/db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1

Sample output of tracking same job:

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction     31.357 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction     59.440 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction    226.375 ms                        CompactionJob::Install              BaseInputLevel 1 | BytesRead 3958013 | BytesWritten 3621940 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37653
2015-05-06 22:51:06 -07:00
Igor Canadi
7f47ba0e26 Fix possible SIGSEGV in CompactRange (github issue #596)
Summary: For very detailed explanation of what's happening read this: https://github.com/facebook/rocksdb/issues/596

Test Plan: make check + new unit test

Reviewers: yhchiang, anthony, rven

Reviewed By: rven

Subscribers: adamretter, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37779
2015-04-29 10:52:31 -07:00
Igor Canadi
1bb4928da9 Include bunch of more events into EventLogger
Summary:
Added these events:
* Recovery start, finish and also when recovery creates a file
* Trivial move
* Compaction start, finish and when compaction creates a file
* Flush start, finish

Also includes small fix to EventLogger

Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files.

Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff.

TODO:
* Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec
* Write tools that parse and analyze LOGs. This can be in python or go. Good intern task.

Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976

Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37521
2015-04-27 15:20:02 -07:00
Aashish Pant
3db81d535a Fix memory leak in cache_test introduced in the previous commit
Test Plan: Verified that valgrind build passes for cache_test

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37665
2015-04-26 21:47:30 -07:00
clark.kang
6ede020dc4 fix typos 2015-04-25 18:14:27 +09:00
Aashish Pant
242f9b4c26 Fix CLANG build issue introduced in previous commit
Summary: Added keyword override for SetCapacity()

Test Plan: Fixes build

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37647
2015-04-24 14:45:12 -07:00
Aashish Pant
794ccfde89 Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible
Summary:
When new capacity is larger than existing capacity, simply update the capacity to the new valie
When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value
When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity

Test Plan: Created unit tests in cache_test.cc

Reviewers: sdong, rven, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37527
2015-04-24 14:12:58 -07:00
sdong
98a44559d5 Build for CYGWIN
Summary:
Make it build for CYGWIN.
Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions.

Test Plan: Build it and run some unit tests in CYGWIN

Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37605
2015-04-23 21:33:44 -07:00
Igor Canadi
e003d3864c Abstract out SetMaxPossibleForUserKey() and SetMinPossibleForUserKey
Summary:
Based on feedback from D37083.

Are all of these correct? In some spaces it seems like we're doing SetMaxPossibleForUserKey() although we want the smallest possible internal key for user key.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37341
2015-04-23 18:08:37 -07:00
sdong
397b6588bd options.paranoid_file_checks to read all rows after writing to a file.
Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows.

Test Plan: Add a new unit test for it

Reviewers: rven, igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37335
2015-04-23 11:34:35 -07:00
Jim Meyering
0a91bca5db test: avoid vuln-inducing use of temporary directory
Summary:
Without this change, someone on the machine on which
I run "make check" could cause me to overwrite arbitrary
files owned by me, via a symlink attack.

Instead of using a predictable temporary directory and
accepting to use a preexisting one, always create a new
one using mkdtemp.  If $TEST_IOCTL_FRIENDLY_TMPDIR is
set and usable, attempt first to find a usable
temporary directory therein.  If not, or if unusable,
then try /var/tmp and /tmp.  If none of those is usable
abort with a diagnostic.

To do that, I added a new class.
Its constructor finds a suitable directory or aborts,
the sole member prints that directory's name, and the
destructor unlinks what should be an empty directory.

Note that while the code before this did not remove
its temporary directory, there was only one per $UID.
Now, there would be at least one per run or one per
test, depending on implementation, so it is important
to remove them.

Test Plan:
  Run this on a fedora rawhide system, where /tmp
  is a tmpfs file system, and /var/tmp is ext4.

  # This gives a diagnostic that /dev/shm is not suitable
  # and ends up using /var/tmp.
  TEST_IOCTL_FRIENDLY_TMPDIR=/dev/shm ./env_test

  # Uses /var/tmp; same as when envvar not set.
  TEST_IOCTL_FRIENDLY_TMPDIR=/var/tmp ./env_test

  # Uses /tmp unless it's tmpfs, in which case it gives
  # a diagnostic and uses /var/tmp.
  TEST_IOCTL_FRIENDLY_TMPDIR=/tmp ./env_test

Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37287
2015-04-23 08:00:56 -07:00
Jim Meyering
79c21ec0c4 skip ioctl-using tests when not supported
Summary:
[NB: this is a prerequisite for the /tmp-abuse-fixing patch]
This avoids spurious test failure on Linux systems
like Fedora for which /tmp is a tmpfs file system.

On a devtmpfs file
system, ioctl(fd, FS_IOC_GETVERSION, &version) returns -1 with
errno == ENOTTTY, indicating that that ioctl is not supported
on such a file system.  Do not let this cause test failures, e.g.,
where env_test would assert that file->GetUniqueId(...) > 0.

Before this change, ./env_test would fail these three tests
on a fedora rawhide system:

  [  FAILED  ] 3 tests, listed below:
  [  FAILED  ] EnvPosixTest.RandomAccessUniqueID
  [  FAILED  ] EnvPosixTest.RandomAccessUniqueIDConcurrent
  [  FAILED  ] EnvPosixTest.RandomAccessUniqueIDDeletes
   3 FAILED TESTS

The fix:
  When support for that ioctl is lacking, skip each affected test.
  Could be improved by noting which sub-tests are being skipped.

Test Plan:
run these on F21 and note that they now pass.

  TEST_TMPDIR=/dev/shm/rdb ./env_test
  ./env_test

Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37323
2015-04-17 20:39:02 -07:00
Igor Canadi
48b0a045da Speed up reduce_levels_test
Summary: For some reason reduce_levels is opening the databse with 65.000 levels. This makes ComputeCompactionScore() function terribly slow and the tests is also very slow (20seconds).

Test Plan: mr reduce_levels_test now takes 20ms

Reviewers: sdong, rven, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37059
2015-04-16 19:31:34 -07:00
sdong
fcb206b667 SyncPoint to allow a callback with an argument and use it to get DBTest.DynamicLevelCompressionPerLevel2 more straight-forward
Summary:
Allow users to give a callback function with parameter using sync point, so more complicated verification can be done in tests.
Use it in DBTest.DynamicLevelCompressionPerLevel2 so that failures will be more easy to debug.

Test Plan: Run all tests. Run DBTest.DynamicLevelCompressionPerLevel2 with valgrind check.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D36999
2015-04-14 16:18:50 -07:00
Igor Canadi
1983fadcbc assert(sorted) in vector rep
Summary: based on discussion on https://reviews.facebook.net/D36969

Test Plan: will let jenkins do its job

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36975
2015-04-13 17:33:24 -07:00
Igor Canadi
9b983befa8 Fix flakiness of WalManagerTest
Summary: We should use mocked-out env for these tests to make it more realiable. Added benefit is that instead of actually sleeping for 3 seconds, we can instead pretend to sleep and just increase time counters.

Test Plan: for i in `seq 100`; do ./wal_manager_test --gtest_filter=WalManagerTest.WALArchivalTtl ;done

Reviewers: rven, meyering

Reviewed By: meyering

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36951
2015-04-13 16:15:05 -07:00
Igor Canadi
d41a565a4a Don't do O(N^2) operations in debug mode for vector memtable
Summary: As title. For every operation we're asserting Valid(), which sorts the data. That's pretty terrible. We have to be careful to have decent performance even with DEBUG builds.

Test Plan: make check

Reviewers: sdong, rven, yhchiang, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36969
2015-04-13 16:11:47 -07:00
Igor Canadi
08be1803ee Fix bad performance in debug mode
Summary:
See github issue 574: https://github.com/facebook/rocksdb/issues/574

Basically when we're running in DEBUG mode we're calling `usleep(0)` on
every mutex lock. I bisected the issue to
https://reviews.facebook.net/D36963. Instead of calling sleep(0), this
diff just avoids calling SleepForMicroseconds() when delay is not set.

Test Plan:
    bpl=10485760;overlap=10;mcz=2;del=300000000;levels=2;ctrig=10000000; delay=10000000; stop=10000000; wbn=30; mbc=20; mb=1073741824;wbs=268435456; dds=1; sync=0; r=100000; t=1; vs=800; bs=65536; cs=1048576; of=500000; si=1000000; ./db_bench --benchmarks=fillrandom --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$r --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=4 --open_files=$of --verify_checksum=1 --db=/tmp/rdb10test --sync=$sync --disable_wal=1 --compression_type=snappy --stats_interval=$si --compression_ratio=0.5 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --memtablerep=vector --use_existing_db=0 --disable_auto_compactions=1 --source_compaction_factor=10000000 | grep ops

Before:
fillrandom   :     117.525 micros/op 8508 ops/sec;    6.6 MB/s
After:
fillrandom   :       1.283 micros/op 779502 ops/sec;  606.6 MB/s

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36963
2015-04-13 15:58:45 -07:00
Igor Canadi
abb4052278 Kill benchharness
Summary:
1. it doesn't work
2. we're not using it

In the future, if we need general benchmark framework, we should probably use https://github.com/google/benchmark

Test Plan: make all

Reviewers: yhchiang, rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36777
2015-04-13 10:17:42 -07:00
Yueh-Hsuan Chiang
7b9581bc3b Fixed xfunc related compile errors in ROCKSDB_LITE
Summary:
Fixed xfunc related compile errors in ROCKSDB_LITE

Now make OPT=-DROCKSDB_LITE shared_lib -j32 would work

Test Plan:
make clean
make OPT=-DROCKSDB_LITE shared_lib -j32
make clean
make OPT=-DROCKSDB_LITE static_lib -j32

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36825
2015-04-09 21:05:18 -07:00
Igor Canadi
5e067a7b19 Clean up compression logging
Summary: Now we add warnings when user configures compression and the compression is not supported.

Test Plan:
Configured compression to non-supported values. Observed messages in my log:

    2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2.

    2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data.

Reviewers: rven, sdong, yhchiang

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35979
2015-04-06 12:50:44 -07:00
sdong
953a885ebf A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge
Summary:
Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it.

Also refactor the codes so that
(1) make table property collector and internal table property collector two separate data structures with the later one now exposed
(2) table builders only receive internal table properties

Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.

Reviewers: yhchiang, igor.sugak, rven, igor

Reviewed By: rven, igor

Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D35373
2015-04-06 10:27:21 -07:00
sdong
b23bbaa82a Universal Compactions with Small Files
Summary:
With this change, we use L1 and up to store compaction outputs in universal compaction.
The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.

If options.num_levels=1, it behaves all the same as now.

Test Plan:
1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels.
2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
3) add unit test to migrate from multiple level setting back to one level setting
4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.

Reviewers: rven, kradhakrishnan, yhchiang, igor

Reviewed By: igor

Subscribers: meyering, leveldb, MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D34539
2015-03-30 15:12:02 -07:00
Igor Canadi
2511b7d947 Makefile minor cleanup
Summary:
Just couple of small changes:
1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check`
2. moved perf_context_test to TESTS instead of PROGRAMS
3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there.

This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week.

Test Plan: make release

Reviewers: anthony, rven, yhchiang, sdong, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36171
2015-03-30 16:05:35 -04:00
Mark Callaghan
99ec2412e5 Make the benchmark scripts configurable and add tests
Summary:
This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
makes the duration per-test configurable and refactors the test scripts.

Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
the writer thread does Merge rather than Put.

Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
Sometimes automation and humans want different format.

Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
stats in the output at test end.

Adds the average ingest rate to compaction IO stats. Output now looks like:
https://gist.github.com/mdcallag/2bd64d18be1b93adc494

More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1

For benchmark.sh changes default RocksDB configuration to reduce stalls:
* min_level_to_compress from 2 to 3
* hard_rate_limit from 2 to 3
* max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
* L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)

Task ID: #6596829

Blame Rev:

Test Plan:
run tools/run_flash_bench.sh

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D36075
2015-03-30 11:28:25 -07:00
xiaoxichen
bcd8a71a28 Fix interger overflow on i386 arch
The error was:

util/logging.cc: In function 'int rocksdb::AppendHumanMicros(uint64_t, char*, int)':
error: util/logging.cc:41:39: integer overflow in expression [-Werror=overflow]
} else if (micros < 1000000l * 60 * 60) {
^
error: util/logging.cc:41:39: comparison between signed and unsigned integer expressions [-Werror=sign-compare]
2015-03-27 08:38:53 +08:00
Yueh-Hsuan Chiang
cd987c383a Fix compile error when NROCKSDB_THREAD_STATUS is not used.
Summary: Fix compile error when NROCKSDB_THREAD_STATUS is not used.

Test Plan: make dbg OPT=-DNROCKSDB_THREAD_STATUS -j32

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35847
2015-03-24 14:52:59 -07:00
Anurag Indu
3d1a924ff3 Adding stats for the merge and filter operation
Summary:
We have addded new stats and perf_context for measuring the merge and filter operation time consumption.
We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB.

Test Plan: WIP

Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34377
2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang
248c063ba1 Report elapsed time in micros in ThreadStatus instead of start time.
Summary:
Report elapsed time of a thread operation in micros in ThreadStatus
instead of start time of a thread operation in seconds since the
Epoch, 1970-01-01 00:00:00 (UTC).

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10

Sample Output:
            ThreadID ThreadType                    cfName    Operation  ElapsedTime                                         Stage        State
     140667724562496   High Pri column_family_name_000002        Flush   772.419 ms                    FlushJob::WriteLevel0Table
     140667728756800   High Pri                   default        Flush   617.845 ms                    FlushJob::WriteLevel0Table
     140667732951104   High Pri column_family_name_000005        Flush   772.078 ms                    FlushJob::WriteLevel0Table
     140667875557440    Low Pri column_family_name_000008   Compaction  1409.216 ms                        CompactionJob::Install
     140667737145408    Low Pri
     140667749728320    Low Pri
     140667816837184    Low Pri column_family_name_000007   Compaction  1071.815 ms      CompactionJob::ProcessKeyValueCompaction
     140667787477056    Low Pri column_family_name_000009   Compaction   772.516 ms      CompactionJob::ProcessKeyValueCompaction
     140667741339712    Low Pri
     140667758116928    Low Pri column_family_name_000004   Compaction   620.739 ms      CompactionJob::ProcessKeyValueCompaction
     140667753922624    Low Pri
     140667842003008    Low Pri column_family_name_000006   Compaction  1260.079 ms      CompactionJob::ProcessKeyValueCompaction
     140667745534016    Low Pri

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35769
2015-03-24 11:32:25 -07:00
krad
689391406a Make SSTDumpTest.GetProperties less noisy
Summary:
Limiting verbose printing to "command=scan"

Test Plan:
Run make check and manual testing of sst_dump_test

Reviewers: sdong

CC: leveldb

Task ID: #6575982

Blame Rev:
2015-03-23 14:30:11 -07:00
Igor Sugak
28bc6de989 rocksdb: print status error message when (ASSERT|EXPECT)_OK fails
Summary: Modified rocksdb status assertions ASSERT_OK and EXPECT_OK to print error message from Status::ToString() when failed.

Test Plan: Modify a test to fail status assertions ASSERT_OK and EXPECT_OK and notice an error message that came from Status::ToString()

Reviewers: meyering, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35469
2015-03-19 17:32:43 -07:00
Igor Sugak
9405b5ef8f rocksdb: Remove #include "util/string_util.h" from util/testharness.h
Summary:
1. Manually deleted #include "util/string_util.h" from util/testharness.h
2.
```
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -E 'say $F[0] if /: error:/' build.log | sort -u | xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"'
```

Test Plan:
Make sure make all completes with no errors.
```
% make all -j55
```

Reviewers: meyering, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35493
2015-03-19 17:29:37 -07:00