Commit Graph

1089 Commits

Author SHA1 Message Date
Siying Dong
9b51af5a17 [RocksDB Performance Branch] DBImpl.NewInternalIterator() to reduce works inside mutex
Summary: To reduce mutex contention caused by DBImpl.NewInternalIterator(), in this function, move all the iteration creation works out of mutex, only leaving object ref and get.

Test Plan:
make all check
will run db_stress for a while too to make sure no problem.

Reviewers: haobo, dhruba, kailiu

Reviewed By: haobo

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D14589

Conflicts:
	db/db_impl.cc
2014-01-14 17:41:44 -08:00
Igor Canadi
d9cd7a063f Fix CompactRange to apply filter to every key
Summary:
When doing CompactRange(), we should first flush the memtable and then calculate max_level_with_files. Also, we want to compact all the levels that have files, including level `max_level_with_files`.

This patch fixed the unit test.

Test Plan: Added a failing unit test and a fix, so it's not failing anymore.

Reviewers: dhruba, haobo, sdong

Reviewed By: haobo

CC: leveldb, xjin

Differential Revision: https://reviews.facebook.net/D14421
2014-01-14 16:19:09 -08:00
Igor Canadi
1ed2404f27 Wrong number of levels is Invalid argument now, not corruption 2014-01-14 15:54:11 -08:00
Igor Canadi
6291020284 Fix test 2014-01-14 15:41:30 -08:00
Igor Canadi
7f3e417f59 Fix memtable construction in tests 2014-01-14 15:36:12 -08:00
Igor Canadi
055e6df45b VersionEdit not to take NumLevels()
Summary:
I will submit a sequence of diffs that are preparing master branch for column families. There are a lot of implicit assumptions in the code that are making column family implementation hard. If I make the change only in column family branch, it will make merging back to master impossible.

Most of the diffs will be simple code refactorings, so I hope we can have fast turnaround time. Feel free to grab me in person to discuss any of them.

This diff removes number of level check from VersionEdit. It is used only when VersionEdit is read, not written, but has to be set when it is written. I believe it is a right thing to make VersionEdit dumb and check consistency on the caller side. This will also make it much easier to implement Column Families, since different column families can have different number of levels.

Test Plan: make check

Reviewers: dhruba, haobo, sdong, kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15159
2014-01-14 15:27:09 -08:00
Igor Canadi
7d9f21cf23 BuildBatchGroup -- memcpy outside of lock
Summary: When building batch group, don't actually build a new batch since it requires heavy-weight mem copy and malloc. Only store references to the batches and build the batch group without lock held.

Test Plan:
`make check`

I am also planning to run performance tests. The workload that will benefit from this change is readwhilewriting. I will post the results once I have them.

Reviewers: dhruba, haobo, kailiu

Reviewed By: haobo

CC: leveldb, xjin

Differential Revision: https://reviews.facebook.net/D15063
2014-01-14 14:49:31 -08:00
kailiu
481c77e526 Move the compilation of the shared libraries to "make release"
Compiling the shared libraries took a long time. Thus to speed up the development speed, it still makes sense to be separated from regular compilation.
2014-01-14 13:54:33 -08:00
Naman Gupta
78ee22508b Merge branch 'master' of github.com:facebook/rocksdb into sanitizedOptions 2014-01-14 12:41:29 -08:00
Kai Liu
d702d8073e A script that automatically reformat affected lines
Summary:
Added a script that reformat only the affected lines in a given diff.

I planned to make that file as pre-commit hook but looks it's a little bit more difficult than I thought. Since I don't want to spend too much time on this task right now, I eventually added a "make command" to achieve this with a few additional key strokes.

Also make the clang-format solely inherited from Google's style -- there are still debates on some of the style issues, but we can address them later once we reach a consensus.

Test Plan: Did some ugly format change and ran "make format", all affected lines are formatted as expected.

Reviewers: igor, sdong, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15147
2014-01-14 12:21:24 -08:00
Naman Gupta
1d9bac4d7f Use sanitized options while opening db
Summary: We use SanitizeOptions() to set appropriate values for some options, based on other options. So we should use the sanitized options by default. Luckily it hasn't caused a bug yet, but can result in a bug in the fugture.

Test Plan: make check

Reviewers: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14103
2014-01-14 11:46:24 -08:00
Siying Dong
fbbf0d1456 Pre-calculate whether to slow down for too many level 0 files
Summary: Currently in DBImpl::MakeRoomForWrite(), we do  "versions_->NumLevelFiles(0) >= options_.level0_slowdown_writes_trigger" to check whether the writer thread needs to slow down. However, versions_->NumLevelFiles(0) is slightly more expensive than we expected. By caching the result of the comparison when installing a new version, we can avoid this function call every time.

Test Plan:
make all check
Manually trigger this behavior by applying universal compaction style and make sure inserts are made slow after there are certain number of files.

Reviewers: haobo, kailiu, igor

Reviewed By: kailiu

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D15141
2014-01-14 11:23:02 -08:00
Siying Dong
51dd21926c DB::Put() to estimate write batch data size needed and pre-allocate buffer
Summary:
In one of CPU profiles, we see some CPU costs of string::reserve() inside Batch.Put(). This patch should be able to reduce some of the costs by allocating sufficient buffer before hand.

Since it is a trivial percentage of CPU costs, I didn't find a way to show the improvement in one of the benchmarks. I'll deploy it to same application and do the same CPU profiling to make sure those CPU costs are reduced.

Test Plan: make all check

Reviewers: haobo, kailiu, igor

Reviewed By: haobo

CC: leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D15135
2014-01-14 10:53:16 -08:00
kailiu
ac2fe72832 Compile dynamic library by default
Summary:
Per request, some users need to use dynamic rocksdb library instead of static one.

However currently the dynamic libraries have to be manually compiled by default, which is inconvenient. I made dymamic libraries to be compiled by default.

Test Plan: make clean; make; make clean;

Reviewers: haobo, sdong, dhruba, igor

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15117
2014-01-14 00:28:10 -08:00
Siying Dong
c4548d5f1f WriteBatch to provide a way for user to query data size directly and only return constant reference of data in Data()
Summary:
WriteBatch::Data() now is easily to be misuse by users. Also, there is no cheap way for user of WriteBatch to know the data size accumulated. This patch fix the problem by:
(1) return a constant reference to Data() so it's obvious to caller what it means.
(2) add a function to return data size directly

Test Plan: make all check

Reviewers: haobo, igor, kailiu

Reviewed By: kailiu

CC: zshao, leveldb

Differential Revision: https://reviews.facebook.net/D15123
2014-01-13 16:52:14 -08:00
Igor Canadi
00065d0d5d Fix merge test
Copy max_successive_merges from options to column family options
2014-01-13 13:27:47 -08:00
Igor Canadi
a107691711 [column families] Implement refcounting ColumnFamilyData
Summary: We don't want to delete ColumnFamilyData object if somebody has references to it.

Test Plan: `make check` for now, but will need to implement bigger column family test case

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15111
2014-01-13 09:22:44 -08:00
Igor Canadi
151f9e144f Merge branch 'master' into columnfamilies 2014-01-13 09:09:51 -08:00
Igor Canadi
d076cef347 [column families] Get rid of VersionSet::current_ and keep current Version for each column family
Summary:
The biggest change here is getting rid of current_ Version and adding a column_family_data->current Version to each column family.

I have also fixed some smaller things in VersionSet that made it easier to implement Column family support.

Test Plan: make check

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15105
2014-01-13 08:59:15 -08:00
Igor Canadi
dd6ecdf342 Use ASSERT_EQ() instead of assert() in merge_test 2014-01-11 09:25:47 -08:00
Schalk-Willem Kruger
a09ee1069d Improve RocksDB "get" performance by computing merge result in memtable
Summary:
Added an option (max_successive_merges) that can be used to specify the
maximum number of successive merge operations on a key in the memtable.
This can be used to improve performance of the "get" operation. If many
successive merge operations are performed on a key, the performance of "get"
operations on the key deteriorates, as the value has to be computed for each
"get" operation by applying all the successive merge operations.

FB Task ID: #3428853

Test Plan:
make all check
db_bench --benchmarks=readrandommergerandom
counter_stress_test

Reviewers: haobo, vamsi, dhruba, sdong

Reviewed By: haobo

CC: zshao

Differential Revision: https://reviews.facebook.net/D14991
2014-01-10 17:33:56 -08:00
Igor Canadi
62197d28b6 Merge pull request #62 from matope/fix-BackupableDBTest-NoDoubleCopy-test-fail
Fix share_table_files condition in BackupEngine constructor.
2014-01-10 13:40:46 -08:00
ono_matope
f8642dacde Fix share_table_files condition in BackupEngine constructor.
That makes BackupableDBTest.NoDoubleCopy test error.
2014-01-11 05:12:07 +09:00
Kai Liu
9996e2d21c Merge pull request #61 from Yancey1989/master
fix compile warning
2014-01-10 10:28:50 -08:00
Yancey
afdd2d1a46 fix compile warning 2014-01-10 17:56:35 +08:00
Igor Canadi
cb37ddf229 Feature requests for BackupableDB
Summary:
This diff introduces some features that were requested by two internal customers:
* Ability for backups not to share table files, because we can't guarantee that equal filename means equal content accross replicas
* Ability for two threads to call EnableFileDeletions() and DisableFileDeletions()
* Ability to stop backup from another thread and not slow down the DB close
* Copy the files to the temporary folder first and then atomically rename

Test Plan: Added some tests to backupable_db_test

Reviewers: dhruba, sanketh, muthu, sdong, haobo

Reviewed By: haobo

CC: leveldb, sanketh, muthu

Differential Revision: https://reviews.facebook.net/D14769
2014-01-09 12:24:28 -08:00
Igor Canadi
d0406675c2 readwhilewriting benchmark
Summary:
Added readwhilewriting benchmark to our regression tests.
Changed block cache shards from 16 to 64, as Mark found that cache mutex contention is a big bottleneck.

Test Plan: Ran it.

Reviewers: dhruba, haobo, MarkCallaghan, xjin

Reviewed By: MarkCallaghan

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15075
2014-01-08 17:44:58 -08:00
Siying Dong
5575316350 StopWatch not to get time if it is created for statistics and it is disabled
Summary: Currently, even if statistics is not enabled, StopWatch only for the stats still gets the time of the day, which is wasteful. This patch adds a new option to StopWatch to disable this get in this case.

Test Plan: make all check

Reviewers: dhruba, haobo, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14703
2014-01-08 16:05:36 -08:00
kailiu
12b6d2b839 Separate the aligned and unaligned memory allocation
Summary: Use two vectors for different types of memory allocation.

Test Plan: run all unit tests.

Reviewers: haobo, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15027
2014-01-08 15:11:42 -08:00
Igor Canadi
19e3ee64ac Add column family information to WAL
Summary:
I have added three new value types:
* kTypeColumnFamilyDeletion
* kTypeColumnFamilyValue
* kTypeColumnFamilyMerge
which include column family Varint32 before the data (value, deletion and merge). These values are used only in WAL (not in memtables yet).

This endeavour required changing some WriteBatch internals.

Test Plan: Added a unittest

Reviewers: dhruba, haobo, sdong, kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15045
2014-01-08 12:53:33 -08:00
Mark Callaghan
50994bf699 Don't always compress L0 files written by memtable flush
Summary:
Code was always compressing L0 files written by a memtable flush
when compression was enabled. Now this is done when
min_level_to_compress=0 for leveled compaction and when
universal_compaction_size_percent=-1 for universal compaction.

Task ID: #3416472

Blame Rev:

Test Plan:
ran db_bench with compression options

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: dhruba, igor, sdong

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14757
2014-01-07 21:50:26 -08:00
Igor Canadi
a45b7d83ba Merge pull request #59 from mlin/more-c-bindings
C API: add rocksdb_env_set_high_priority_background_threads
2014-01-07 16:33:03 -08:00
Igor Canadi
72918efffe [column families] Implement DB::OpenWithColumnFamilies()
Summary:
In addition to implementing OpenWithColumnFamilies, this diff also includes some minor changes:
* Changed all column family names from Slice() to std::string. The performance of column family name handling is not critical, and it's more convenient and cleaner to have names as std::strings
* Implemented ColumnFamilyOptions(const Options&) and DBOptions(const Options&)
* Added ColumnFamilyOptions to VersionSet::ColumnFamilyData. ColumnFamilyOptions are specified on OpenWithColumnFamilies() and CreateColumnFamily()

I will keep the diff in the Phabricator for a day or two and will push to the branch then. Feel free to comment even after the diff has been pushed.

Test Plan: Added a simple unit test

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15033
2014-01-07 11:05:50 -08:00
Igor Canadi
d3a2ba9c64 Merge branch 'master' into columnfamilies 2014-01-07 11:05:03 -08:00
Mike Lin
4c75e21c20 Eliminate stdout message when launching a posix thread.
This seems out of place as it's the only time RocksDB prints to stdout in the
normal course of operations. Thread IDs can still be retrieved from the LOG
file: cut -d ' ' -f2 LOG | sort | uniq | egrep -x '[0-9a-f]+'
2014-01-07 10:44:02 -08:00
Tomislav Novak
9f690ec62c Fix a deadlock in CompactRange()
Summary:
The way DBImpl::TEST_CompactRange() throttles down the number of bg compactions
can cause it to deadlock when CompactRange() is called concurrently from
multiple threads. Imagine a following scenario with only two threads
(max_background_compactions is 10 and bg_compaction_scheduled_ is initially 0):

   1. Thread #1 increments bg_compaction_scheduled_ (to LargeNumber), sets
      bg_compaction_scheduled_ to 9 (newvalue), schedules the compaction
      (bg_compaction_scheduled_ is now 10) and waits for it to complete.
   2. Thread #2 calls TEST_CompactRange(), increments bg_compaction_scheduled_
      (now LargeNumber + 10) and waits on a cv for bg_compaction_scheduled_ to
      drop to LargeNumber.
   3. BG thread completes the first manual compaction, decrements
      bg_compaction_scheduled_ and wakes up all threads waiting on bg_cv_.
      Thread #1 runs, increments bg_compaction_scheduled_ by LargeNumber again
      (now 2*LargeNumber + 9). Since that's more than LargeNumber + newvalue,
      thread #2 also goes to sleep (waiting on bg_cv_), without resetting
      bg_compaction_scheduled_.

This diff attempts to address the problem by introducing a new counter
bg_manual_only_ (when positive, MaybeScheduleFlushOrCompaction() will only
schedule manual compactions).

Test Plan:
I could pretty much consistently reproduce the deadlock with a program that
calls CompactRange(nullptr, nullptr) immediately after Write() from multiple
threads. This no longer happens with this patch.

Tests (make check) pass.

Reviewers: dhruba, igor, sdong, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14799
2014-01-07 10:37:34 -08:00
Igor Canadi
fff5c7e817 Merge branch 'master' into columnfamilies 2014-01-06 13:31:41 -08:00
kailiu
c370f5597a Revert change in 8f6e319. 2014-01-06 11:53:19 -08:00
Kai Liu
be271c3357 Merge pull request #56 from sepeth/refactor-detect-version
Refactor build_tools/build_detect_version
2014-01-06 11:50:24 -08:00
kailiu
7e70ff63d6 Fix issue #57 2014-01-06 11:11:19 -08:00
Doğan Çeçen
d800dc567a Refactor build_tools/build_detect_version 2014-01-06 08:44:43 +02:00
Kai Liu
8f6e31951e Add a hack to build_detect_platform so it works in all types of fb-servers 2014-01-04 23:47:44 -08:00
Igor Canadi
ef6ad1708d [column families] Support to create and drop column families
Summary:
This diff provides basic implementations of CreateColumnFamily(), DropColumnFamily() and ListColumnFamilies(). It builds on top of https://reviews.facebook.net/D14733

It also includes a bug fix for DBImplReadOnly, where Get implementation would be redirected to DBImpl instead of DBImplReadOnly.

Test Plan: Added unit test

Reviewers: dhruba, haobo, kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15021
2014-01-03 01:12:16 -08:00
Kai Liu
463086bce8 Add clang-format rules
Summary:
The rule file is forked from that in Facebook's repo.

I'll add format file for now and team members can tune the rules later.

In this patch, I made only two changes in order to be consistent with existing coding style

`SpacesBeforeTrailingComments: 2`

`ColumnLimit:     80`

Test Plan: N/A

Reviewers: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15015
2014-01-02 14:32:15 -08:00
Kai Liu
46950597d0 Automate the preparation step for a new release
Summary: Added a script that prepares the repo for facebook's new rocksdb release, which will automatically do some necessary work to make sure this repo is ready for 3rdparty release.

Test Plan:
Run this script and observed:

* new version was created (both in local and remote repo) as a git tag.
* build_version.cc was updated
* build_detect_platform was changed so that it won't create any new change.

Reviewers: haobo, dhruba, sdong, igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15003
2014-01-02 11:35:33 -08:00
kailiu
9281a826f1 Hotfix the bug in table cache's GetSliceForFileNumber
Forgot to fix this problem in master branch. Already fixed it in performance branch.
2014-01-02 10:30:42 -08:00
Igor Canadi
7535443083 [RocksDB] Support for column families in manifest
Summary:
<This diff is for Column Family branch>

Added fields in manifest file to support adding and deleting column families.

Pretty simple change, each version edit record can be:
1. add column family
2. drop column family
3. add and delete N files from a single column family (compactions and flushes will generate such records)

Test Plan: make check works, the code is backward compatible

Reviewers: dhruba, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14733
2014-01-02 04:18:28 -08:00
Igor Canadi
6de1b5b83e Merge branch 'master' into columnfamilies 2014-01-02 04:18:07 -08:00
Igor Canadi
b60c14f6ee Support multi-threaded DisableFileDeletions() and EnableFileDeletions()
Summary:
We don't want two threads to clash if they concurrently call DisableFileDeletions() and EnableFileDeletions(). I'm adding a counter that will enable file deletions only after all DisableFileDeletions() calls have been negated with EnableFileDeletions().

However, we also don't want to break the old behavior, so I added a parameter force to EnableFileDeletions(). If force is true, we will still enable file deletions after every call to EnableFileDeletions(), which is what is happening now.

Test Plan: make check

Reviewers: dhruba, haobo, sanketh

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14781
2014-01-02 03:33:42 -08:00
Igor Canadi
345fb94d26 moving autovector_test after db_test 2014-01-02 03:30:29 -08:00