Commit Graph

769 Commits

Author SHA1 Message Date
Yueh-Hsuan Chiang
bb808eaddb Changed the CompactionJobStats::output_key_prefix type from char[] to string.
Summary:
Keys in RocksDB can be arbitrary byte strings.  However, in the current
CompactionJobStats, smallest_output_key_prefix and largest_output_key_prefix
are of type char[] without having a length, which is insufficient to handle
non-null terminated strings.

This patch change their type to std::string.

Test Plan: compaction_job_stats_test

Reviewers: igor, rven, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39537
2015-06-04 12:31:12 -07:00
Yueh-Hsuan Chiang
0b3172d071 Add EventListener::OnTableFileDeletion()
Summary:
Add EventListener::OnTableFileDeletion(), which will be
called when a table file is deleted.

Test Plan: Extend three existing tests in db_test to verify the deleted files.

Reviewers: rven, anthony, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38931
2015-06-03 19:57:01 -07:00
Reed Allman
211a195d41 C: add MultiGet support 2015-06-03 17:57:42 -07:00
Reed Allman
5dc174e11a C: add support for WriteBatch SliceParts params 2015-06-03 17:08:00 -07:00
Igor Canadi
408cc4b8e0 Revert "Merge pull request #621 from rdallman/c-slice-parts-support"
This reverts commit 78382d4ba7, reversing
changes made to ca8b85ac04.
2015-06-03 13:34:07 -04:00
Igor Canadi
78382d4ba7 Merge pull request #621 from rdallman/c-slice-parts-support
C: add support for WriteBatch SliceParts params
2015-06-03 13:16:14 -04:00
agiardullo
ca8b85ac04 better document max_write_buffer_number_to_maintain
Test Plan: compile

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39285
2015-06-02 21:24:19 -07:00
Yueh-Hsuan Chiang
fe5c6321cb Allow EventListener::OnCompactionCompleted to return CompactionJobStats.
Summary:
Allow EventListener::OnCompactionCompleted to return CompactionJobStats,
which contains useful information about a compaction.

Example CompactionJobStats returned by OnCompactionCompleted():
    smallest_output_key_prefix 05000000
    largest_output_key_prefix 06990000
    elapsed_time 42419
    num_input_records 300
    num_input_files 3
    num_input_files_at_output_level 2
    num_output_records 200
    num_output_files 1
    actual_bytes_input 167200
    actual_bytes_output 110688
    total_input_raw_key_bytes 5400
    total_input_raw_value_bytes 300000
    num_records_replaced 100
    is_manual_compaction 1

Test Plan: Developed a mega test in db_test which covers 20 variables in CompactionJobStats.

Reviewers: rven, igor, anthony, sdong

Reviewed By: sdong

Subscribers: tnovak, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38463
2015-06-02 17:07:16 -07:00
Yueh-Hsuan Chiang
8d8d4e45ba Fixed ROCKSDB_LITE compile error due to the missing of TableFileCreationInfo
Summary: Fixed ROCKSDB_LITE compile error due to the missing of TableFileCreationInfo

Test Plan: make OPT=-DROCKSDB_LITE shared_lib -j32

Reviewers: sdong, igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39405
2015-06-02 14:27:46 -07:00
Yueh-Hsuan Chiang
fc83821270 Add EventListener::OnTableFileCreated()
Summary:
Add EventListener::OnTableFileCreated(), which will be called
when a table file is created.  This patch is part of the
EventLogger and EventListener integration.

Test Plan: Augment existing test in db/listener_test.cc

Reviewers: anthony, kradhakrishnan, rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38865
2015-06-02 14:12:23 -07:00
sdong
ac81130fa0 Fix Bug: CompactRange() doesn't change to correct level caused by using wrong level
Summary: In previous change https://reviews.facebook.net/D39099 , while renaming parameters, use a wrong parameter, causing CompactRange() to compact not wrong level.

Test Plan: Run "DBTest.MigrateToDynamicLevelMaxBytesBase" which failed with the patch.

Reviewers: rven, yhchiang, kradhakrishnan, igor, anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39393
2015-06-02 10:00:58 -07:00
Mike Kolupaev
ec7a944360 more times in perf_context and iostats_context
Summary:
We occasionally get write stalls (>1s Write() calls) on HDD under read load. The following timers explain almost all of the stalls:
 - perf_context.db_mutex_lock_nanos
 - perf_context.db_condition_wait_nanos
 - iostats_context.open_time
 - iostats_context.allocate_time
 - iostats_context.write_time
 - iostats_context.range_sync_time
 - iostats_context.logger_time

In my experiments each of these occasionally takes >1s on write path under some workload. There are rare cases when Write() takes long but none of these takes long.

Test Plan: Added code to our application to write the listed timings to log for slow writes. They usually add up to almost exactly the time Write() call took.

Reviewers: rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: march, dhruba, tnovak

Differential Revision: https://reviews.facebook.net/D39177
2015-06-02 02:07:58 -07:00
sdong
4266d4fd90 Allow users to migrate to options.level_compaction_dynamic_level_bytes=true using CompactRange()
Summary: In DB::CompactRange(), change parameter "reduce_level" to "change_level". Users can compact all data to the last level if needed. By doing it, users can migrate the DB to options.level_compaction_dynamic_level_bytes=true.

Test Plan: Add a unit test for it.

Reviewers: yhchiang, anthony, kradhakrishnan, igor, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D39099
2015-06-01 18:21:14 -07:00
Mike Kolupaev
2ecac9f96d add rocksdb::WritableFileWrapper similar to rocksdb::EnvWrapper
Summary: It used to be no good (known to me) non-intrusive way to wrap WritableFile - you can't call protected virtual methods of the wrapped pointer to WritableFile. This diff adds a convenience class WritableFileWrapper that makes wrapping WritableFile both possible and easy.

Test Plan: `make clean; make -j release`, `make clean; OPT=-DROCKSDB_LITE make release`, `make clean; USE_CLANG=1 make -j all`.

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, tnovak, march

Differential Revision: https://reviews.facebook.net/D39147
2015-06-01 11:22:36 -07:00
Igor Canadi
a187e66ad0 Merge pull request #617 from rdallman/wb-merge-sliceparts
WriteBatch.Merge w/ SliceParts support
2015-05-31 13:34:34 -04:00
agiardullo
dc9d70de65 Optimistic Transactions
Summary: Optimistic transactions supporting begin/commit/rollback semantics.  Currently relies on checking the memtable to determine if there are any collisions at commit time.  Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty.  You should probably start with transaction.h to get an overview of what is currently supported.

Test Plan: Added a new test, but still need to look into stress testing.

Reviewers: yhchiang, igor, rven, sdong

Reviewed By: sdong

Subscribers: adamretter, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33435
2015-05-29 14:36:35 -07:00
Yueh-Hsuan Chiang
ebfdb3c7f6 Fixed a compile error in ROCKSDB_LITE
Summary: Fixed a compile error in ROCKSDB_LITE

Test Plan: make db_stress OPT=-DROCKSDB_LITE -j32

Reviewers: sdong, igor, anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39201
2015-05-29 13:21:09 -07:00
Yueh-Hsuan Chiang
9ffc8ba024 Include EventListener in stress test.
Summary: Include EventListener in stress test.

Test Plan: make blackbox_crash_test whitebox_crash_test

Reviewers: anthony, igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39105
2015-05-29 13:17:49 -07:00
Reed Allman
21cd6b7ad8 C: add support for WriteBatch SliceParts params 2015-05-29 10:23:43 -07:00
Reed Allman
a0635ba3f6 WriteBatch.Merge w/ SliceParts support
also hooked up WriteBatchInternal
2015-05-29 04:30:03 -07:00
agiardullo
c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to someone keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
Yueh-Hsuan Chiang
672dda9b3b [API Change] Move listeners from ColumnFamilyOptions to DBOptions
Summary: Move listeners from ColumnFamilyOptions to DBOptions

Test Plan:
listener_test
compact_files_test

Reviewers: rven, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D39087
2015-05-28 13:21:39 -07:00
Reed Allman
9c38ce1d02 C: extra bbto / noop slice transform 2015-05-22 22:56:28 -07:00
Igor Canadi
4cb4d546cd Set stats_dump_period_sec to 600 by default
Summary: Having stats in our LOG more often will help a lot with perf debugging.

Test Plan: none

Reviewers: sdong, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38781
2015-05-21 14:22:16 -04:00
Yueh-Hsuan Chiang
e2c1d4b57f [Public API Change] Make DB::GetDbIdentity() be const function.
Summary: Make DB::GetDbIdentity() be const function.

Test Plan: make db_test

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38745
2015-05-21 11:01:48 -07:00
Igor Canadi
4a855c0799 Add an option wal_bytes_per_sync to control sync_file_range for WAL files
Summary:
sync_file_range is not always asyncronous and thus can block writes if we do this for WAL in the foreground thread. See more here: http://yoshinorimatsunobu.blogspot.com/2014/03/how-syncfilerange-really-works.html

Some users don't want us to call sync_file_range on WALs. Some other do.
Thus, I'm adding a separate option wal_bytes_per_sync to control calling
sync_file_range on WAL files. bytes_per_sync will apply only to table
files now.

Test Plan: no more sync_file_range for WAL as evidenced by strace

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38253
2015-05-18 17:03:59 -07:00
sdong
6fa7085121 CompactRange skips levels 1 to base_level -1 for dynamic level base size
Summary: CompactRange() now is much more expensive for dynamic level base size as it goes through all the levels. Skip those not used levels between level 0 an base level.

Test Plan: Run all unit tests

Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37125
2015-05-18 10:54:11 -07:00
Yueh-Hsuan Chiang
3f0867c0fe Allow GetThreadList to report Flush properties.
Summary:
Allow GetThreadList to report Flush properties, which includes:
* job id
* number of bytes that has been written since flush started.
* total size of input mem-tables

Test Plan:
./db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=100 --value_size=1000

Sample output from db_bench which tracks same flush job

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140213879898240   High Pri      default                Flush       5789 us                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 577104 | JobID 8 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140213879898240   High Pri      default                Flush     30.634 ms                    FlushJob::WriteLevel0Table              BytesMemtables 4112835 | BytesWritten 1734865 | JobID 8 |

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38505
2015-05-15 23:22:22 -07:00
Yueh-Hsuan Chiang
a66f643e97 Use a better way to initialize ThreadStatus::kNumOperationProperties.
Summary: Use a better way to initialize ThreadStatus::kNumOperationProperties.

Test Plan: make

Reviewers: sdong, rven, anthony, krishnanm86, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38547
2015-05-15 15:55:20 -07:00
sdong
ec43a8b9fb Universal Compaction with multiple levels won't allocate up to output size
Summary: Universal compactions with multiple levels should use file preallocation size based on file size if output level is not level 0

Test Plan: Run all tests.

Reviewers: igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D38439
2015-05-13 14:15:46 -07:00
Yueh-Hsuan Chiang
714fcc067d Make ThreadStatus::InterpretOperationProperties take const uint64_t*
Summary: Make ThreadStatus::InterpretOperationProperties take const uint64_t*

Test Plan:
make
make OPT=-DROCKSDB_LITE shared_lib

Reviewers: igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38445
2015-05-13 12:26:07 -07:00
Yueh-Hsuan Chiang
14431e971d Fixed a bug in EventListener::OnCompactionCompleted().
Summary:
Fixed a bug in EventListener::OnCompactionCompleted() that returns
incorrect list of input / output file names.

Test Plan: Extend existing test in listener_test.cc

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38349
2015-05-12 16:10:23 -07:00
agiardullo
711465ccec API to fetch from both a WriteBatchWithIndex and the db
Summary:
Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch.  (This is needed for transactions).

I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep).

Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc.  WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)).

Test Plan: Added new unit tests.

Reviewers: rven, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38037
2015-05-11 14:51:51 -07:00
Igor Canadi
962f8ba332 Bump to 3.11
Summary: as title

Test Plan: none

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D38175
2015-05-07 13:27:14 -07:00
Yueh-Hsuan Chiang
77a5a543a5 Allow GetThreadList() to report basic compaction operation properties.
Summary:
Now we're able to show more details about a compaction in
GetThreadList() :)

This patch allows GetThreadList() to report basic compaction
operation properties.  Basic compaction properties include:
    1. job id
    2. compaction input / output level
    3. compaction property flags (is_manual, is_deletion, .. etc)
    4. total input bytes
    5. the number of bytes has been read currently.
    6. the number of bytes has been written currently.

Flush operation properties will be done in a seperate diff.

Test Plan:
/db_bench --threads=30 --num=1000000 --benchmarks=fillrandom --thread_status_per_interval=1

Sample output of tracking same job:

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction     31.357 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction     59.440 ms     CompactionJob::FinishCompactionOutputFile              BaseInputLevel 1 | BytesRead 2264663 | BytesWritten 1934241 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

          ThreadID ThreadType       cfName            Operation   ElapsedTime                                         Stage        State OperationProperties
   140664171987072    Low Pri      default           Compaction    226.375 ms                        CompactionJob::Install              BaseInputLevel 1 | BytesRead 3958013 | BytesWritten 3621940 | IsDeletion 0 | IsManual 0 | IsTrivialMove 0 | JobID 277 | OutputLevel 2 | TotalInputBytes 3964158 |

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37653
2015-05-06 22:51:06 -07:00
Laurent Demailly
df4130ad85 fix crashes in stats and compaction filter for db_ttl_impl
Summary: fix crashes in stats and compaction filter for db_ttl_impl

Test Plan:
Ran build with lots of debugging
https://reviews.facebook.net/differential/diff/194175/

Reviewers: yhchiang, igor, rven

Reviewed By: igor

Subscribers: rven, dhruba

Differential Revision: https://reviews.facebook.net/D38001
2015-05-05 16:54:47 -07:00
Igor Canadi
fd96b55402 Making GetOptions() comment better (#597) 2015-04-30 09:29:51 -07:00
Aashish Pant
794ccfde89 Task 6532943: Rocksdb - SetCapacity() can dynamically change cache capacity if feasible
Summary:
When new capacity is larger than existing capacity, simply update the capacity to the new valie
When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value
When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity

Test Plan: Created unit tests in cache_test.cc

Reviewers: sdong, rven, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37527
2015-04-24 14:12:58 -07:00
sdong
98a44559d5 Build for CYGWIN
Summary:
Make it build for CYGWIN.
Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions.

Test Plan: Build it and run some unit tests in CYGWIN

Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37605
2015-04-23 21:33:44 -07:00
Giuseppe Ottaviano
2dc421df48 Implement DB::PromoteL0 method
Summary:
This diff implements a new `DB` method `PromoteL0` which moves all files in L0
to a given level skipping compaction, provided that the files have disjoint
ranges and all levels up to the target level are empty.

This method provides finer-grain control for trivial compactions, and it is
useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change
the semantics of an existing operation, which can impact existing code.

PromoteL0 is designed to work well in combination with the proposed
`GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure
by populating one level at a time. Such fine-grained control can be very useful
for static or mostly-static databases.

Test Plan: `make check`

Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D37107
2015-04-23 12:10:36 -07:00
sdong
397b6588bd options.paranoid_file_checks to read all rows after writing to a file.
Summary: To further distinguish the corruption cases were caused by storage media or in memory states when writing it, add a paranoid check after writing the file to iterate all the rows.

Test Plan: Add a new unit test for it

Reviewers: rven, igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D37335
2015-04-23 11:34:35 -07:00
Igor Canadi
6059bdf86a Add experimental API MarkForCompaction()
Summary:
Some Mongo+Rocks datasets in Parse's environment are not doing compactions very frequently. During the quiet period (with no IO), we'd like to schedule compactions so that our reads become faster. Also, aggressively compacting during quiet periods helps when write bursts happen. In addition, we also want to compact files that are containing deleted key ranges (like old oplog keys).

All of this is currently not possible with CompactRange() because it's single-threaded and blocks all other compactions from happening. Running CompactRange() risks an issue of blocking writes because we generate too much Level 0 files before the compaction is over. Stopping writes is very dangerous because they hold transaction locks. We tried running manual compaction once on Mongo+Rocks and everything fell apart.

MarkForCompaction() solves all of those problems. This is very light-weight manual compaction. It is lower priority than automatic compactions, which means it shouldn't interfere with background process keeping the LSM tree clean. However, if no automatic compactions need to be run (or we have extra background threads available), we will start compacting files that are marked for compaction.

Test Plan: added a new unit test

Reviewers: yhchiang, rven, MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37083
2015-04-17 16:44:45 -07:00
agiardullo
84c5bd7eb9 Add thread-safety documentation to MemTable and related classes
Summary: Other than making some class members private, this is a documentation-only change

Test Plan: unit tests

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D36567
2015-04-08 21:10:35 -07:00
Yoshinori Matsunobu
824e646341 Adding another NewFlashcacheAwareEnv function to support pre-opened fd
Summary:
There are some cases when flachcache file descriptor was
already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an
error at open() because fd was already assigned. This diff adds another
function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd.

Test Plan: Tested with MyRocks using this function, then worked

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, MarkCallaghan, rven

Differential Revision: https://reviews.facebook.net/D36447
2015-04-06 16:50:36 -07:00
Igor Canadi
5e067a7b19 Clean up compression logging
Summary: Now we add warnings when user configures compression and the compression is not supported.

Test Plan:
Configured compression to non-supported values. Observed messages in my log:

    2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2.

    2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data.

Reviewers: rven, sdong, yhchiang

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35979
2015-04-06 12:50:44 -07:00
sdong
953a885ebf A new call back to TablePropertiesCollector to allow users know the entry is add, delete or merge
Summary:
Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it.

Also refactor the codes so that
(1) make table property collector and internal table property collector two separate data structures with the later one now exposed
(2) table builders only receive internal table properties

Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.

Reviewers: yhchiang, igor.sugak, rven, igor

Reviewed By: rven, igor

Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D35373
2015-04-06 10:27:21 -07:00
Igor Canadi
d61cb0b9de db_bench can now disable flashcache for background threads
Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code.

Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35391
2015-03-30 09:51:11 -07:00
Herman Lee
e018892bb6 Formalize the DB properties string definitions.
Summary:
Assign the string properties to const string variables under the
DB::Properties namespace. This helps catch typos during compilation and
also consolidates the property definition in one place.

Test Plan: Run rocksdb unit tests

Reviewers: sdong, yoshinorim, igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D35991
2015-03-27 14:50:20 -07:00
Anurag Indu
3d1a924ff3 Adding stats for the merge and filter operation
Summary:
We have addded new stats and perf_context for measuring the merge and filter operation time consumption.
We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB.

Test Plan: WIP

Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34377
2015-03-24 14:42:04 -07:00
Yueh-Hsuan Chiang
248c063ba1 Report elapsed time in micros in ThreadStatus instead of start time.
Summary:
Report elapsed time of a thread operation in micros in ThreadStatus
instead of start time of a thread operation in seconds since the
Epoch, 1970-01-01 00:00:00 (UTC).

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10

Sample Output:
            ThreadID ThreadType                    cfName    Operation  ElapsedTime                                         Stage        State
     140667724562496   High Pri column_family_name_000002        Flush   772.419 ms                    FlushJob::WriteLevel0Table
     140667728756800   High Pri                   default        Flush   617.845 ms                    FlushJob::WriteLevel0Table
     140667732951104   High Pri column_family_name_000005        Flush   772.078 ms                    FlushJob::WriteLevel0Table
     140667875557440    Low Pri column_family_name_000008   Compaction  1409.216 ms                        CompactionJob::Install
     140667737145408    Low Pri
     140667749728320    Low Pri
     140667816837184    Low Pri column_family_name_000007   Compaction  1071.815 ms      CompactionJob::ProcessKeyValueCompaction
     140667787477056    Low Pri column_family_name_000009   Compaction   772.516 ms      CompactionJob::ProcessKeyValueCompaction
     140667741339712    Low Pri
     140667758116928    Low Pri column_family_name_000004   Compaction   620.739 ms      CompactionJob::ProcessKeyValueCompaction
     140667753922624    Low Pri
     140667842003008    Low Pri column_family_name_000006   Compaction  1260.079 ms      CompactionJob::ProcessKeyValueCompaction
     140667745534016    Low Pri

Reviewers: sdong, igor, rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35769
2015-03-24 11:32:25 -07:00
Igor Canadi
315abac945 Undeprecate GetLiveFiles()
Summary: There is no alternative to GetLiveFiles() function

Test Plan: none

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35805
2015-03-24 09:42:38 -07:00
sdong
0831a35994 Add a DB Property For Number of Deletions in Memtables
Summary: Add a DB property for number of deletions in memtables. It can sometimes help people debug slowness because of too many deletes.

Test Plan: Add test cases.

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D35247
2015-03-18 17:03:59 -07:00
Igor Canadi
51301b869f Enable dynamic changing of rate limiter's bytes_per_second
Summary: This feature is going to be useful for mongodb+rocksdb. I'll expose it through mongo's API.

Test Plan: added new unit test. also will run TSAN on the new unit test

Reviewers: meyering, sdong

Reviewed By: meyering, sdong

Subscribers: meyering, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35307
2015-03-18 15:35:55 -07:00
agiardullo
81345b90f9 Create an abstract interface for write batches
Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class.  This makes it easier to write code that is agnostic toward the implementation of the particular write batch.  In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch.

Test Plan: modified existing WriteBatchWithIndex tests to test new functions.  Running all tests.

Reviewers: igor, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34017
2015-03-17 19:23:08 -07:00
Igor Canadi
c88ff4ca76 Deprecate removeScanCountLimit in NewLRUCache
Summary: It is no longer used by the implementation, so we should also remove it from the public API.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34971
2015-03-17 15:04:37 -07:00
Venkatesh Radhakrishnan
d4d42c02ea Fixed clang build in env.h
Summary: Mark function as override.

Test Plan: USE_CLANG=1 make -j32 check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35163
2015-03-16 19:51:25 -07:00
Venkatesh Radhakrishnan
b2b3086524 Speed up rocksDB close call.
Summary:
On RocksDB, when there are multiple instances doing
flushes/compactions in the background, the close call takes a long time
because the flushes/compactions need to complete before the database can
shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up.

Test Plan: DB Test added.

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33741
2015-03-16 18:49:14 -07:00
Yueh-Hsuan Chiang
c594b0e89d Allow GetThreadList() to report operation stage.
Summary: Allow GetThreadList() to report operation stage.

Test Plan:
  ./thread_list_test
  ./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
    --max_background_compactions=10 --max_background_flushes=3 \
    --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
    --num_column_families=10

  export ROCKSDB_TESTS=ThreadStatus
  ./db_test

Sample output
          ThreadID ThreadType                    cfName    Operation        OP_StartTime    ElapsedTime                                         Stage        State
   140116265861184    Low Pri
   140116270055488    Low Pri
   140116274249792   High Pri column_family_name_000005        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116400078912    Low Pri column_family_name_000004   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116358135872    Low Pri column_family_name_000006   Compaction 2015/03/10-14:58:10           1 us     CompactionJob::FinishCompactionOutputFile
   140116341358656    Low Pri
   140116295221312   High Pri                   default        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116324581440    Low Pri column_family_name_000009   Compaction 2015/03/10-14:58:11           0 us      CompactionJob::ProcessKeyValueCompaction
   140116278444096    Low Pri
   140116299415616    Low Pri column_family_name_000008   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116291027008   High Pri column_family_name_000001        Flush 2015/03/10-14:58:11           0 us                    FlushJob::WriteLevel0Table
   140116286832704    Low Pri column_family_name_000002   Compaction 2015/03/10-14:58:11           0 us     CompactionJob::FinishCompactionOutputFile
   140116282638400    Low Pri

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34683
2015-03-13 10:45:40 -07:00
sdong
e9de8b65a6 Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc.

Test Plan: run all tests

Reviewers: rven, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D34431
2015-03-11 13:14:52 -07:00
Venkatesh Radhakrishnan
284be570c8 Provide a mechanism to inform Rocksdb that it is shutting down
Summary:
Provide an API which enables users to infor Rocksdb that it is
shutting down.

Test Plan: db_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34617
2015-03-11 10:31:02 -07:00
Yueh-Hsuan Chiang
89597bb66b Allow GetThreadList() to report the start time of the current operation.
Summary: Allow GetThreadList() to report the start time of the current operation.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
  --max_background_compactions=10 --max_background_flushes=3 \
  --thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
  --num_column_families=10

Sample output:
          ThreadID ThreadType                    cfName    Operation        OP_StartTime         State
   140338840797248   High Pri column_family_name_000003        Flush 2015/03/09-17:49:59
   140338844991552   High Pri column_family_name_000004        Flush 2015/03/09-17:49:59
   140338849185856    Low Pri
   140338983403584    Low Pri
   140339008569408    Low Pri
   140338861768768    Low Pri
   140338924683328    Low Pri
   140338899517504    Low Pri
   140338853380160    Low Pri
   140338882740288    Low Pri
   140338865963072   High Pri column_family_name_000006        Flush 2015/03/09-17:49:59
   140338954043456    Low Pri
   140338857574464    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: sdong

Subscribers: lgalanis, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34689
2015-03-10 14:51:28 -07:00
Yueh-Hsuan Chiang
dc4532c497 Add --thread_status_per_interval to db_bench
Summary:
Add --thread_status_per_interval to db_bench, which allows
db_bench to optionally enable print the current thread status
periodically.

Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10

Sample output:
          ThreadID ThreadType                         dbName                     cfName       Operation           State
   140281571770432    Low Pri
   140281575964736   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000001           Flush
   140281710182464    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000008      Compaction
   140281638879296    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000007      Compaction
   140281592741952    Low Pri
   140281580159040   High Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000002           Flush
   140281676628032    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000006      Compaction
   140281584353344    Low Pri
   140281622102080    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000009      Compaction
   140281605324864    Low Pri  /tmp/rocksdbtest-5297/dbbench  column_family_name_000004      Compaction
   140281601130560   High Pri  /tmp/rocksdbtest-5297/dbbench                    default           Flush
   140281596936256    Low Pri
   140281588547648    Low Pri

Reviewers: igor, rven, sdong

Reviewed By: rven

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D34515
2015-03-06 11:22:06 -08:00
Igor Canadi
db03739340 options.level_compaction_dynamic_level_bytes to allow RocksDB to pick size bases of levels dynamically.
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.

In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.

Test Plan: New unit tests and pass tests suites including valgrind.

Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo

Reviewed By: ikabiljo

Subscribers: yoshinorim, ikabiljo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31437
2015-03-02 22:40:41 -08:00
Igor Canadi
3cf7f353d9 Instrument memtable seeks
Summary: As title

Test Plan: compiles

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34191
2015-02-27 17:06:06 -08:00
Igor Canadi
b9ff6b050d Fix a bug in ReadOnlyBackupEngine
Summary:
This diff fixes a bug introduced by D28521. Read-only backup engine can delete a backup that is later than the latest -- we never check the condition.

I also added a bunch of logging that will help with debugging cases like this in the future.

See more discussion at t6218248.

Test Plan: Added a unit test that was failing before the change. Also, see new LOG file contents: https://phabricator.fb.com/P19738984

Reviewers: benj, sanketh, sumeet, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33897
2015-02-27 14:03:56 -08:00
Sameet Agarwal
e7c434c364 Add columnfamily option optimize_filters_for_hits to optimize for key hits only
Summary:
    Summary:
    Added a new option to ColumnFamllyOptions  - optimize_filters_for_hits. This option can be used in the case where most
    accesses to the store are key hits and we dont need to optimize performance for key misses.
    This is useful when you have a very large database and most of your lookups succeed.  The option allows the store to
     not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
     space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
     for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.

    This option is only provided for BlockBasedTable. We skip the filters when we are compacting

Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
 2. Added another unit test to db_test which specifically tests that filters are being skipped

Reviewers: rven, igor, sdong

Reviewed By: sdong

Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33717
2015-02-26 16:25:56 -08:00
stash93
03bbf718cb Return fbson
Summary: mac compile is fixed in fbson, so it can be returned back from 7ce1b2c

Test Plan:
make all check
make valgrind_check

Reviewers: golovachalexander, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33855
2015-02-27 00:04:14 +03:00
Igor Sugak
62247ffa3b rocksdb: Add missing override
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.

Prerequisites: bear and clang 3.5 build with extra tools

```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```

Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```

Reviewers: igor, kradhakrishnan, rven, meyering, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D34077
2015-02-26 11:28:41 -08:00
Jim Meyering
c37937a9ce maint: remove extraneous "const" attribute from return type
Summary:
The "const" attribute does not make sense on a return type,
and provokes a warning/error from gcc -W -Wextra.

Test Plan:
  Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.

Reviewers: ljin, sdong, igor.sugak, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D33759
2015-02-20 11:07:07 -08:00
Igor Canadi
96ab15d306 GetOptionsFromString + fixes to block_based_table_options
Summary:
In mongo, we currently have a single column family and I'd like to support setting rocksdb::Options from string. This diff provides an option to GetOptionsFromString()

There's one more problem. Currently GetColumnFamilyOptionsFromString() overwrites block_based options. In mongo I set default values for block_cache and some other values of BlockBasedTableOptions and I don't want them reset to default with GetOptionsFromString().

Test Plan: added unit test

Reviewers: yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33729
2015-02-19 19:11:14 -08:00
sdong
d45a6a4002 Add rocksdb.num-live-versions: number of live versions
Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases.

Test Plan: make all check

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33327
2015-02-19 13:10:37 -08:00
Igor Canadi
b8ac71ba18 Revert "Fbson to Json"
This reverts commit 7ce1b2c19c.
2015-02-18 13:14:53 -08:00
stash93
7ce1b2c19c Fbson to Json
Summary: Replaced rapidjson with fbson

Test Plan:
make all check
make valgrind_check

Reviewers: golovachalexander, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32733
2015-02-18 23:58:56 +03:00
Venkatesh Radhakrishnan
7d817268b9 Managed iterator
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.

Test Plan: Managed* tests in db_test and XF tests for managed iterator

Reviewers: igor, yhchiang, anthony, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31401
2015-02-18 11:49:31 -08:00
sdong
68af7811ea Remember whole key/prefix filtering on/off in SST file
Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter.

Test Plan: Add a unit test for it

Reviewers: yhchiang, igor, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D32889
2015-02-11 11:20:04 -08:00
sdong
6d6305dd7d Perf Context to report DB mutex waiting time
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex

Test Plan: Add a test and run it.

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D33177
2015-02-09 17:55:12 -08:00
Grace Law
1851f977c2 Added RocksDB stats GET_HIT_L0 and GET_HIT_L1
Summary:
  - In statistics.h , added tickers.
  - In version_set.cc,
  -- Added a getter method for hit_file_level_ in the class FilePicker
  -- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively.

Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818
Personal fork: 0c3f2e3600

Test Plan:
In terminal,
```
make -j32 db_test
ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test
```

Or to use debugger,
```
make -j32 db_test
export ROCKSDB_TESTS=L0L1L2AndUpHitCounter
gdb db_test
```

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32205
2015-02-09 14:53:58 -08:00
fyrz
cfe8837e43 Switch logv with loglevel to virtual 2015-02-09 20:59:29 +01:00
Igor Canadi
ee4aa9a0ee Merge pull request #481 from mkevac/backupable
Allow creating and restoring backups from C
2015-02-09 09:51:19 -08:00
Marko Kevac
7e50ed8c24 Added some more wrappers and wrote a test for backup in C 2015-02-07 14:25:10 +03:00
Karthikeyan Radhakrishnan
da9cbce731 Add Header to logging to capture application level information
Summary:
This change adds LogHeader provision to the logger. For the rolling logger
implementation, the headers are copied over to the new log file every time
there is a log roll over.

Test Plan: Added a unit test to test the rolling log case.

Reviewers: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32817
2015-02-06 10:37:45 -08:00
Jonah Cohen
9a52e06a02 Add GetID to ColumnFamilyHandle
Summary:
Expose GetID to ColumnFamilyHandle interface so that we can save column
family data by id instead of name.

Test Plan: Testing in MySQL on Rocks.

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D32943
2015-02-05 08:26:33 -08:00
Yueh-Hsuan Chiang
181191a1e4 Add a counter for collecting the wait time on db mutex.
Summary:
Add a counter for collecting the wait time on db mutex.
Also add MutexWrapper and CondVarWrapper for measuring wait time.

Test Plan:
./db_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test

verify stats output using db_bench
make clean
make release
./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10

Sample output:
    rocksdb.db.mutex.wait.micros COUNT : 7546866

Reviewers: MarkCallaghan, rven, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32787
2015-02-04 21:39:45 -08:00
Yueh-Hsuan Chiang
678503ebcf Add utility functions for interpreting ThreadStatus
Summary:
Add ThreadStatus::GetOperationName() and ThreadStatus::GetStateName(),
two utility functions that help interpreting ThreadStatus.

Test Plan: ./thread_list_test

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32793
2015-02-04 01:47:32 -08:00
Igor Canadi
8d3819369f NewIteratorWithBase() for default column family
Summary: I'm moving mongo to a single column family, so I need DeltaBase iterator with default column family.

Test Plan: Added unit test

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32589
2015-02-02 22:29:43 -08:00
Venkatesh Radhakrishnan
0b8dec7172 Cross functional test infrastructure for RocksDB.
Summary:
This Diff provides the implementation of the cross functional
test infrastructure. This provides the ability to test a single feature
with every existing regression test in order to identify issues with
interoperability between features.

Test Plan:
Reference implementation of inplace update support cross
functional test. Able to find interoperability issues with inplace
support and ran all of db_test. Will add separate diff for those changes.

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32247
2015-02-02 14:49:22 -08:00
Marko Kevac
86e2a1eeea Allow creating backups from C 2015-01-31 15:47:49 +03:00
sdong
db9ed5fdb4 Unaddressed comment in previous diff. Change only in code comments. 2015-01-30 16:07:35 -08:00
sdong
5917de0bae CappedFixTransform: return fixed length prefix, or full key if key is shorter than the fixed length
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.

Test Plan:
Add a test case for
db_test
options_test
and a new test

Reviewers: yhchiang, rven, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D31887
2015-01-30 16:04:30 -08:00
Igor Canadi
6c6037f60c Expose Snapshot's SequenceNumber
Summary:
Requested here: https://www.facebook.com/groups/rocksdb.dev/permalink/705524519546065/

It might also help with mongo. I don't see a reason why we shouldn't expose this info.

Test Plan: make check

Reviewers: sdong, yhchiang, rven

Reviewed By: rven

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D32547
2015-01-29 18:22:43 -08:00
Ori Bernstein
f9758e0129 Add compaction listener.
Summary: This adds a listener for compactions, and gives some useful statistics on each compaction pass.

Test Plan: Unit tests.

Reviewers: sdong, igor, rven, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D31641
2015-01-27 14:44:02 -08:00
Igor Canadi
e5df90f5d0 Fix comment (minor) 2015-01-22 14:59:36 -08:00
Igor Canadi
3d628f8f22 Update format_version comment
Summary: We added a new format version. Reflect that in the comments.

Test Plan: none

Reviewers: sdong, rven, yhchiang, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31629
2015-01-16 09:18:45 -08:00
Igor Canadi
9ab5adfc59 New BlockBasedTable version -- better compressed block format
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.

It does not affect format for snappy.

If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.

Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB

Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31461
2015-01-14 16:24:24 -08:00
Igor Canadi
2ccc54301d Merge pull request #460 from neutronsharc/master
Remove duplicated method declarations in C header.
2015-01-13 18:19:26 -08:00
Xiangyong Ouyang
2a7bd0ea45 Remove duplicated method declarations in C header. 2015-01-14 01:20:30 +00:00
Igor Canadi
96b8240bc5 Support footer versions bigger than 1
Summary:
In this diff I add another parameter to BlockBasedTableOptions that will let users specify block based table's format. This will greatly simplify block based table's format changes in the future.

First format change that this will support is encoding decompressed size in Zlib and BZip2 blocks. This diff is blocking https://reviews.facebook.net/D31311.

Test Plan: Added a unit tests. More tests to come as part of https://reviews.facebook.net/D31311.

Reviewers: dhruba, MarkCallaghan, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31383
2015-01-13 14:33:04 -08:00
sdong
9132e52ea4 DB Stats Dump to print total stall time
Summary:
Add printing of stall time in DB Stats:

Sample outputs:

** DB Stats **
Uptime(secs): 53.2 total, 1.7 interval
Cumulative writes: 625940 writes, 625939 keys, 625940 batches, 1.0 writes per batch, 0.49 GB user ingest, stall micros: 50691070
Cumulative WAL: 625940 writes, 625939 syncs, 1.00 writes per sync, 0.49 GB written
Interval writes: 10859 writes, 10859 keys, 10859 batches, 1.0 writes per batch, 8.7 MB user ingest, stall micros: 1692319
Interval WAL: 10859 writes, 10859 syncs, 1.00 writes per sync, 0.01 MB written

Test Plan:
make all check
verify printing using db_bench

Reviewers: igor, yhchiang, rven, MarkCallaghan

Reviewed By: MarkCallaghan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D31239
2015-01-09 11:44:19 -08:00
sdong
73ee4febab Add comments about properties supported by DB::GetProperty() and DB::GetIntProperty()
Summary: Add comments in db.h to help users discover their options.

Test Plan: Compile

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D31077
2015-01-07 15:09:35 -08:00
Igor Canadi
62ad0a9b19 Deprecating skip_log_error_on_recovery
Summary:
Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery.

The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings.

Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery.

Reviewers: yhchiang, rven, dhruba, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D30603
2015-01-05 13:35:56 -08:00