2016-02-09 15:12:00 -08:00
|
|
|
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
|
2017-07-15 16:03:42 -07:00
|
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
|
|
// (found in the LICENSE.Apache file in the root directory).
|
2013-10-16 14:59:46 -07:00
|
|
|
//
|
2013-10-04 22:32:05 -07:00
|
|
|
#pragma once
|
2014-01-02 11:26:57 -08:00
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
#include <string>
|
|
|
|
#include <list>
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
#include <vector>
|
2014-01-27 13:55:47 -08:00
|
|
|
#include <set>
|
[RocksDB] [MergeOperator] The new Merge Interface! Uses merge sequences.
Summary:
Here are the major changes to the Merge Interface. It has been expanded
to handle cases where the MergeOperator is not associative. It does so by stacking
up merge operations while scanning through the key history (i.e.: during Get() or
Compaction), until a valid Put/Delete/end-of-history is encountered; it then
applies all of the merge operations in the correct sequence starting with the
base/sentinel value.
I have also introduced an "AssociativeMerge" function which allows the user to
take advantage of associative merge operations (such as in the case of counters).
The implementation will always attempt to merge the operations/operands themselves
together when they are encountered, and will resort to the "stacking" method if
and only if the "associative-merge" fails.
This implementation is conjectured to allow MergeOperator to handle the general
case, while still providing the user with the ability to take advantage of certain
efficiencies in their own merge-operator / data-structure.
NOTE: This is a preliminary diff. This must still go through a lot of review,
revision, and testing. Feedback welcome!
Test Plan:
-This is a preliminary diff. I have only just begun testing/debugging it.
-I will be testing this with the existing MergeOperator use-cases and unit-tests
(counters, string-append, and redis-lists)
-I will be "desk-checking" and walking through the code with the help gdb.
-I will find a way of stress-testing the new interface / implementation using
db_bench, db_test, merge_test, and/or db_stress.
-I will ensure that my tests cover all cases: Get-Memtable,
Get-Immutable-Memtable, Get-from-Disk, Iterator-Range-Scan, Flush-Memtable-to-L0,
Compaction-L0-L1, Compaction-Ln-L(n+1), Put/Delete found, Put/Delete not-found,
end-of-history, end-of-file, etc.
-A lot of feedback from the reviewers.
Reviewers: haobo, dhruba, zshao, emayanke
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11499
2013-08-05 20:14:32 -07:00
|
|
|
#include <deque>
|
2014-01-02 11:26:57 -08:00
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
#include "db/dbformat.h"
|
2014-01-24 14:30:28 -08:00
|
|
|
#include "db/memtable.h"
|
2016-11-03 18:40:23 -07:00
|
|
|
#include "db/range_del_aggregator.h"
|
2017-04-05 19:02:00 -07:00
|
|
|
#include "monitoring/instrumented_mutex.h"
|
2014-01-28 10:35:48 -08:00
|
|
|
#include "rocksdb/db.h"
|
|
|
|
#include "rocksdb/iterator.h"
|
|
|
|
#include "rocksdb/options.h"
|
2015-05-29 14:36:35 -07:00
|
|
|
#include "rocksdb/types.h"
|
2014-01-02 11:26:57 -08:00
|
|
|
#include "util/autovector.h"
|
2017-04-03 18:27:24 -07:00
|
|
|
#include "util/filename.h"
|
2014-04-07 11:29:48 -07:00
|
|
|
#include "util/log_buffer.h"
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2013-10-03 21:49:15 -07:00
|
|
|
namespace rocksdb {
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2014-01-27 13:55:47 -08:00
|
|
|
class ColumnFamilyData;
|
2012-10-19 14:00:53 -07:00
|
|
|
class InternalKeyComparator;
|
2015-02-04 21:39:45 -08:00
|
|
|
class InstrumentedMutex;
|
In DB::NewIterator(), try to allocate the whole iterator tree in an arena
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it.
2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it.
Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.
Test Plan: make all check
Reviewers: ljin, haobo
Reviewed By: haobo
Subscribers: leveldb, dhruba, yhchiang, igor
Differential Revision: https://reviews.facebook.net/D18513
2014-06-02 16:38:00 -07:00
|
|
|
class MergeIteratorBuilder;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
// keeps a list of immutable memtables in a vector. the list is immutable
|
|
|
|
// if refcount is bigger than one. It is used as a state for Get() and
|
|
|
|
// Iterator code paths
|
2015-04-08 21:10:35 -07:00
|
|
|
//
|
|
|
|
// This class is not thread-safe. External synchronization is required
|
|
|
|
// (such as holding the db mutex or being on the write thread).
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
class MemTableListVersion {
|
|
|
|
public:
|
2015-08-19 13:32:09 -07:00
|
|
|
explicit MemTableListVersion(size_t* parent_memtable_list_memory_usage,
|
|
|
|
MemTableListVersion* old = nullptr);
|
|
|
|
explicit MemTableListVersion(size_t* parent_memtable_list_memory_usage,
|
|
|
|
int max_write_buffer_number_to_maintain);
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
|
|
|
|
void Ref();
|
2014-01-28 10:35:48 -08:00
|
|
|
void Unref(autovector<MemTable*>* to_delete = nullptr);
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
|
|
|
|
// Search all the memtables starting from the most recent one.
|
|
|
|
// Return the most recent value found, if any.
|
2015-05-29 14:36:35 -07:00
|
|
|
//
|
|
|
|
// If any operation was found for this key, its most recent sequence number
|
|
|
|
// will be stored in *seq on success (regardless of whether true/false is
|
|
|
|
// returned). Otherwise, *seq will be set to kMaxSequenceNumber.
|
2017-01-08 13:49:15 -08:00
|
|
|
bool Get(const LookupKey& key, std::string* value, Status* s,
|
|
|
|
MergeContext* merge_context, RangeDelAggregator* range_del_agg,
|
2017-09-11 08:58:52 -07:00
|
|
|
SequenceNumber* seq, const ReadOptions& read_opts,
|
2017-10-03 09:08:07 -07:00
|
|
|
ReadCallback* callback = nullptr, bool* is_blob_index = nullptr);
|
2017-01-08 14:08:51 -08:00
|
|
|
|
2017-01-08 13:49:15 -08:00
|
|
|
bool Get(const LookupKey& key, std::string* value, Status* s,
|
|
|
|
MergeContext* merge_context, RangeDelAggregator* range_del_agg,
|
2017-10-03 09:08:07 -07:00
|
|
|
const ReadOptions& read_opts, ReadCallback* callback = nullptr,
|
|
|
|
bool* is_blob_index = nullptr) {
|
2017-01-08 14:08:51 -08:00
|
|
|
SequenceNumber seq;
|
2017-09-11 08:58:52 -07:00
|
|
|
return Get(key, value, s, merge_context, range_del_agg, &seq, read_opts,
|
2017-10-03 09:08:07 -07:00
|
|
|
callback, is_blob_index);
|
2017-01-08 14:08:51 -08:00
|
|
|
}
|
2017-01-08 13:49:15 -08:00
|
|
|
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
// Similar to Get(), but searches the Memtable history of memtables that
|
|
|
|
// have already been flushed. Should only be used from in-memory only
|
|
|
|
// queries (such as Transaction validation) as the history may contain
|
|
|
|
// writes that are also present in the SST files.
|
|
|
|
bool GetFromHistory(const LookupKey& key, std::string* value, Status* s,
|
2016-11-03 18:40:23 -07:00
|
|
|
MergeContext* merge_context,
|
|
|
|
RangeDelAggregator* range_del_agg, SequenceNumber* seq,
|
2017-10-17 17:24:25 -07:00
|
|
|
const ReadOptions& read_opts,
|
|
|
|
bool* is_blob_index = nullptr);
|
2017-01-08 14:08:51 -08:00
|
|
|
bool GetFromHistory(const LookupKey& key, std::string* value, Status* s,
|
|
|
|
MergeContext* merge_context,
|
|
|
|
RangeDelAggregator* range_del_agg,
|
2017-10-17 17:24:25 -07:00
|
|
|
const ReadOptions& read_opts,
|
|
|
|
bool* is_blob_index = nullptr) {
|
2015-05-29 14:36:35 -07:00
|
|
|
SequenceNumber seq;
|
2016-11-03 18:40:23 -07:00
|
|
|
return GetFromHistory(key, value, s, merge_context, range_del_agg, &seq,
|
2017-10-17 17:24:25 -07:00
|
|
|
read_opts, is_blob_index);
|
2015-05-29 14:36:35 -07:00
|
|
|
}
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
|
2016-11-04 11:53:38 -07:00
|
|
|
Status AddRangeTombstoneIterators(const ReadOptions& read_opts, Arena* arena,
|
|
|
|
RangeDelAggregator* range_del_agg);
|
2017-06-01 22:14:27 -07:00
|
|
|
Status AddRangeTombstoneIterators(
|
|
|
|
const ReadOptions& read_opts,
|
|
|
|
std::vector<InternalIterator*>* range_del_iters);
|
2016-11-04 11:53:38 -07:00
|
|
|
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
void AddIterators(const ReadOptions& options,
|
2015-10-12 15:06:38 -07:00
|
|
|
std::vector<InternalIterator*>* iterator_list,
|
|
|
|
Arena* arena);
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
|
In DB::NewIterator(), try to allocate the whole iterator tree in an arena
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it.
2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it.
Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.
Test Plan: make all check
Reviewers: ljin, haobo
Reviewed By: haobo
Subscribers: leveldb, dhruba, yhchiang, igor
Differential Revision: https://reviews.facebook.net/D18513
2014-06-02 16:38:00 -07:00
|
|
|
void AddIterators(const ReadOptions& options,
|
|
|
|
MergeIteratorBuilder* merge_iter_builder);
|
|
|
|
|
2014-04-22 17:17:33 -07:00
|
|
|
uint64_t GetTotalNumEntries() const;
|
|
|
|
|
2015-03-18 16:11:02 -07:00
|
|
|
uint64_t GetTotalNumDeletes() const;
|
|
|
|
|
2017-02-06 14:42:38 -08:00
|
|
|
MemTable::MemTableStats ApproximateStats(const Slice& start_ikey,
|
|
|
|
const Slice& end_ikey);
|
2015-06-12 18:04:30 -07:00
|
|
|
|
2015-05-29 14:36:35 -07:00
|
|
|
// Returns the value of MemTable::GetEarliestSequenceNumber() on the most
|
|
|
|
// recent MemTable in this list or kMaxSequenceNumber if the list is empty.
|
|
|
|
// If include_history=true, will also search Memtables in MemTableList
|
|
|
|
// History.
|
|
|
|
SequenceNumber GetEarliestSequenceNumber(bool include_history = false) const;
|
|
|
|
|
2014-01-26 17:40:43 -08:00
|
|
|
private:
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
// REQUIRE: m is an immutable memtable
|
|
|
|
void Add(MemTable* m, autovector<MemTable*>* to_delete);
|
|
|
|
// REQUIRE: m is an immutable memtable
|
|
|
|
void Remove(MemTable* m, autovector<MemTable*>* to_delete);
|
|
|
|
|
|
|
|
void TrimHistory(autovector<MemTable*>* to_delete);
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
|
2015-05-29 14:36:35 -07:00
|
|
|
bool GetFromList(std::list<MemTable*>* list, const LookupKey& key,
|
2017-01-08 14:08:51 -08:00
|
|
|
std::string* value, Status* s, MergeContext* merge_context,
|
2016-11-03 18:40:23 -07:00
|
|
|
RangeDelAggregator* range_del_agg, SequenceNumber* seq,
|
2017-09-11 08:58:52 -07:00
|
|
|
const ReadOptions& read_opts,
|
2017-10-03 09:08:07 -07:00
|
|
|
ReadCallback* callback = nullptr,
|
|
|
|
bool* is_blob_index = nullptr);
|
2015-05-29 14:36:35 -07:00
|
|
|
|
2015-08-19 13:32:09 -07:00
|
|
|
void AddMemTable(MemTable* m);
|
|
|
|
|
|
|
|
void UnrefMemTable(autovector<MemTable*>* to_delete, MemTable* m);
|
|
|
|
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
friend class MemTableList;
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
|
|
|
|
// Immutable MemTables that have not yet been flushed.
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
std::list<MemTable*> memlist_;
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
|
|
|
|
// MemTables that have already been flushed
|
|
|
|
// (used during Transaction validation)
|
|
|
|
std::list<MemTable*> memlist_history_;
|
|
|
|
|
|
|
|
// Maximum number of MemTables to keep in memory (including both flushed
|
|
|
|
// and not-yet-flushed tables).
|
|
|
|
const int max_write_buffer_number_to_maintain_;
|
|
|
|
|
2014-01-25 13:50:30 -08:00
|
|
|
int refs_ = 0;
|
2015-08-19 13:32:09 -07:00
|
|
|
|
|
|
|
size_t* parent_memtable_list_memory_usage_;
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
};
|
|
|
|
|
2013-06-11 14:23:58 -07:00
|
|
|
// This class stores references to all the immutable memtables.
|
2012-11-28 16:42:36 -08:00
|
|
|
// The memtables are flushed to L0 as soon as possible and in
|
2012-10-19 14:00:53 -07:00
|
|
|
// any order. If there are more than one immutable memtable, their
|
2012-11-28 16:42:36 -08:00
|
|
|
// flushes can occur concurrently. However, they are 'committed'
|
2012-10-19 14:00:53 -07:00
|
|
|
// to the manifest in FIFO order to maintain correctness and
|
|
|
|
// recoverability from a crash.
|
2015-04-08 21:10:35 -07:00
|
|
|
//
|
|
|
|
//
|
|
|
|
// Other than imm_flush_needed, this class is not thread-safe and requires
|
|
|
|
// external synchronization (such as holding the db mutex or being on the
|
|
|
|
// write thread.)
|
2012-10-19 14:00:53 -07:00
|
|
|
class MemTableList {
|
|
|
|
public:
|
2012-11-28 16:42:36 -08:00
|
|
|
// A list of memtables.
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
explicit MemTableList(int min_write_buffer_number_to_merge,
|
|
|
|
int max_write_buffer_number_to_maintain)
|
2014-10-27 14:50:21 -07:00
|
|
|
: imm_flush_needed(false),
|
|
|
|
min_write_buffer_number_to_merge_(min_write_buffer_number_to_merge),
|
2015-08-19 13:32:09 -07:00
|
|
|
current_(new MemTableListVersion(¤t_memory_usage_,
|
|
|
|
max_write_buffer_number_to_maintain)),
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
num_flush_not_started_(0),
|
|
|
|
commit_in_progress_(false),
|
|
|
|
flush_requested_(false) {
|
|
|
|
current_->Ref();
|
2015-08-19 13:32:09 -07:00
|
|
|
current_memory_usage_ = 0;
|
2012-10-19 14:00:53 -07:00
|
|
|
}
|
2015-04-10 14:16:03 -07:00
|
|
|
|
|
|
|
// Should not delete MemTableList without making sure MemTableList::current()
|
|
|
|
// is Unref()'d.
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
~MemTableList() {}
|
|
|
|
|
|
|
|
MemTableListVersion* current() { return current_; }
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2014-02-12 11:42:54 -08:00
|
|
|
// so that background threads can detect non-nullptr pointer to
|
|
|
|
// determine whether there is anything more to start flushing.
|
2014-10-27 14:50:21 -07:00
|
|
|
std::atomic<bool> imm_flush_needed;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
// Returns the total number of memtables in the list that haven't yet
|
|
|
|
// been flushed and logged.
|
|
|
|
int NumNotFlushed() const;
|
|
|
|
|
|
|
|
// Returns total number of memtables in the list that have been
|
|
|
|
// completely flushed and logged.
|
|
|
|
int NumFlushed() const;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
|
|
|
// Returns true if there is at least one memtable on which flush has
|
|
|
|
// not yet started.
|
2014-03-18 12:37:42 -07:00
|
|
|
bool IsFlushPending() const;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2013-10-24 15:58:00 -07:00
|
|
|
// Returns the earliest memtables that needs to be flushed. The returned
|
|
|
|
// memtables are guaranteed to be in the ascending order of created time.
|
2014-01-02 11:26:57 -08:00
|
|
|
void PickMemtablesToFlush(autovector<MemTable*>* mems);
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2014-02-12 11:42:54 -08:00
|
|
|
// Reset status of the given memtable list back to pending state so that
|
|
|
|
// they can get picked up again on the next round of flush.
|
|
|
|
void RollbackMemtableFlush(const autovector<MemTable*>& mems,
|
2014-11-07 11:50:34 -08:00
|
|
|
uint64_t file_number);
|
2014-02-12 11:42:54 -08:00
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
// Commit a successful flush in the manifest file
|
2014-07-02 09:54:20 -07:00
|
|
|
Status InstallMemtableFlushResults(
|
2014-10-01 16:19:16 -07:00
|
|
|
ColumnFamilyData* cfd, const MutableCFOptions& mutable_cf_options,
|
2015-02-04 21:39:45 -08:00
|
|
|
const autovector<MemTable*>& m, VersionSet* vset, InstrumentedMutex* mu,
|
2014-11-07 11:50:34 -08:00
|
|
|
uint64_t file_number, autovector<MemTable*>* to_delete,
|
|
|
|
Directory* db_directory, LogBuffer* log_buffer);
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2012-11-28 16:42:36 -08:00
|
|
|
// New memtables are inserted at the front of the list.
|
2013-01-03 17:13:56 -08:00
|
|
|
// Takes ownership of the referenced held on *m by the caller of Add().
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
void Add(MemTable* m, autovector<MemTable*>* to_delete);
|
2012-10-19 14:00:53 -07:00
|
|
|
|
|
|
|
// Returns an estimate of the number of bytes of data in use.
|
|
|
|
size_t ApproximateMemoryUsage();
|
|
|
|
|
2015-08-19 13:32:09 -07:00
|
|
|
// Returns an estimate of the number of bytes of data used by
|
|
|
|
// the unflushed mem-tables.
|
|
|
|
size_t ApproximateUnflushedMemTablesMemoryUsage();
|
|
|
|
|
2017-10-23 15:22:05 -07:00
|
|
|
// Returns an estimate of the timestamp of the earliest key.
|
|
|
|
uint64_t ApproximateOldestKeyTime() const;
|
|
|
|
|
2015-04-09 18:01:11 -07:00
|
|
|
// Request a flush of all existing memtables to storage. This will
|
|
|
|
// cause future calls to IsFlushPending() to return true if this list is
|
|
|
|
// non-empty (regardless of the min_write_buffer_number_to_merge
|
|
|
|
// parameter). This flush request will persist until the next time
|
|
|
|
// PickMemtablesToFlush() is called.
|
2013-09-04 17:24:35 -07:00
|
|
|
void FlushRequested() { flush_requested_ = true; }
|
|
|
|
|
2017-01-19 15:21:07 -08:00
|
|
|
bool HasFlushRequested() { return flush_requested_; }
|
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
// Copying allowed
|
|
|
|
// MemTableList(const MemTableList&);
|
|
|
|
// void operator=(const MemTableList&);
|
|
|
|
|
2015-08-19 13:32:09 -07:00
|
|
|
size_t* current_memory_usage() { return ¤t_memory_usage_; }
|
|
|
|
|
2016-04-18 11:11:51 -07:00
|
|
|
uint64_t GetMinLogContainingPrepSection();
|
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
private:
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
// DB mutex held
|
|
|
|
void InstallNewVersion();
|
|
|
|
|
Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00
|
|
|
const int min_write_buffer_number_to_merge_;
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
|
|
|
|
MemTableListVersion* current_;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
|
|
|
// the number of elements that still need flushing
|
|
|
|
int num_flush_not_started_;
|
|
|
|
|
|
|
|
// committing in progress
|
|
|
|
bool commit_in_progress_;
|
|
|
|
|
2013-09-04 17:24:35 -07:00
|
|
|
// Requested a flush of all memtables to storage
|
|
|
|
bool flush_requested_;
|
|
|
|
|
2015-08-19 13:32:09 -07:00
|
|
|
// The current memory usage.
|
|
|
|
size_t current_memory_usage_;
|
2012-10-19 14:00:53 -07:00
|
|
|
};
|
|
|
|
|
2013-10-03 21:49:15 -07:00
|
|
|
} // namespace rocksdb
|