2013-10-16 14:59:46 -07:00
|
|
|
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
|
|
|
|
// This source code is licensed under the BSD-style license found in the
|
|
|
|
// LICENSE file in the root directory of this source tree. An additional grant
|
|
|
|
// of patent rights can be found in the PATENTS file in the same directory.
|
|
|
|
//
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2013-10-04 22:32:05 -07:00
|
|
|
#pragma once
|
2012-10-19 14:00:53 -07:00
|
|
|
#include <string>
|
|
|
|
#include <list>
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
#include <vector>
|
2014-01-27 13:55:47 -08:00
|
|
|
#include <set>
|
2013-08-23 08:38:13 -07:00
|
|
|
#include "rocksdb/db.h"
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
#include "rocksdb/options.h"
|
|
|
|
#include "rocksdb/iterator.h"
|
2012-10-19 14:00:53 -07:00
|
|
|
#include "db/dbformat.h"
|
|
|
|
#include "db/skiplist.h"
|
2014-01-24 14:30:28 -08:00
|
|
|
#include "db/memtable.h"
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2013-10-03 21:49:15 -07:00
|
|
|
namespace rocksdb {
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2014-01-27 13:55:47 -08:00
|
|
|
class ColumnFamilyData;
|
2012-10-19 14:00:53 -07:00
|
|
|
class InternalKeyComparator;
|
|
|
|
class Mutex;
|
|
|
|
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
// keeps a list of immutable memtables in a vector. the list is immutable
|
|
|
|
// if refcount is bigger than one. It is used as a state for Get() and
|
|
|
|
// Iterator code paths
|
|
|
|
class MemTableListVersion {
|
|
|
|
public:
|
|
|
|
explicit MemTableListVersion(MemTableListVersion* old = nullptr);
|
|
|
|
|
|
|
|
void Ref();
|
|
|
|
void Unref(std::vector<MemTable*>* to_delete = nullptr);
|
|
|
|
|
|
|
|
int size() const;
|
|
|
|
|
|
|
|
// Search all the memtables starting from the most recent one.
|
|
|
|
// Return the most recent value found, if any.
|
|
|
|
bool Get(const LookupKey& key, std::string* value, Status* s,
|
|
|
|
MergeContext& merge_context, const Options& options);
|
|
|
|
|
|
|
|
void AddIterators(const ReadOptions& options,
|
|
|
|
std::vector<Iterator*>* iterator_list);
|
|
|
|
|
2014-01-26 17:40:43 -08:00
|
|
|
private:
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
// REQUIRE: m is mutable memtable
|
|
|
|
void Add(MemTable* m);
|
|
|
|
// REQUIRE: m is mutable memtable
|
|
|
|
void Remove(MemTable* m);
|
|
|
|
|
|
|
|
friend class MemTableList;
|
|
|
|
std::list<MemTable*> memlist_;
|
|
|
|
int size_ = 0;
|
2014-01-25 13:50:30 -08:00
|
|
|
int refs_ = 0;
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
};
|
|
|
|
|
2013-06-11 14:23:58 -07:00
|
|
|
// This class stores references to all the immutable memtables.
|
2012-11-28 16:42:36 -08:00
|
|
|
// The memtables are flushed to L0 as soon as possible and in
|
2012-10-19 14:00:53 -07:00
|
|
|
// any order. If there are more than one immutable memtable, their
|
2012-11-28 16:42:36 -08:00
|
|
|
// flushes can occur concurrently. However, they are 'committed'
|
2012-10-19 14:00:53 -07:00
|
|
|
// to the manifest in FIFO order to maintain correctness and
|
|
|
|
// recoverability from a crash.
|
|
|
|
class MemTableList {
|
|
|
|
public:
|
2012-11-28 16:42:36 -08:00
|
|
|
// A list of memtables.
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
explicit MemTableList(int min_write_buffer_number_to_merge)
|
|
|
|
: min_write_buffer_number_to_merge_(min_write_buffer_number_to_merge),
|
|
|
|
current_(new MemTableListVersion()),
|
|
|
|
num_flush_not_started_(0),
|
|
|
|
commit_in_progress_(false),
|
|
|
|
flush_requested_(false) {
|
2013-02-28 18:04:58 -08:00
|
|
|
imm_flush_needed.Release_Store(nullptr);
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
current_->Ref();
|
2012-10-19 14:00:53 -07:00
|
|
|
}
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
~MemTableList() {}
|
|
|
|
|
|
|
|
MemTableListVersion* current() { return current_; }
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2013-02-28 18:04:58 -08:00
|
|
|
// so that backgrund threads can detect non-nullptr pointer to
|
2012-10-19 14:00:53 -07:00
|
|
|
// determine whether this is anything more to start flushing.
|
|
|
|
port::AtomicPointer imm_flush_needed;
|
|
|
|
|
|
|
|
// Returns the total number of memtables in the list
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
int size() const;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
|
|
|
// Returns true if there is at least one memtable on which flush has
|
|
|
|
// not yet started.
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
bool IsFlushPending();
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2013-10-24 15:58:00 -07:00
|
|
|
// Returns the earliest memtables that needs to be flushed. The returned
|
|
|
|
// memtables are guaranteed to be in the ascending order of created time.
|
2013-06-11 14:23:58 -07:00
|
|
|
void PickMemtablesToFlush(std::vector<MemTable*>* mems);
|
2012-10-19 14:00:53 -07:00
|
|
|
|
|
|
|
// Commit a successful flush in the manifest file
|
2014-01-27 13:55:47 -08:00
|
|
|
Status InstallMemtableFlushResults(
|
|
|
|
ColumnFamilyData* cfd, const std::vector<MemTable*>& m, VersionSet* vset,
|
|
|
|
Status flushStatus, port::Mutex* mu, Logger* info_log,
|
|
|
|
uint64_t file_number, std::set<uint64_t>& pending_outputs,
|
|
|
|
std::vector<MemTable*>* to_delete, Directory* db_directory);
|
2012-10-19 14:00:53 -07:00
|
|
|
|
2012-11-28 16:42:36 -08:00
|
|
|
// New memtables are inserted at the front of the list.
|
2013-01-03 17:13:56 -08:00
|
|
|
// Takes ownership of the referenced held on *m by the caller of Add().
|
2012-10-19 14:00:53 -07:00
|
|
|
void Add(MemTable* m);
|
|
|
|
|
|
|
|
// Returns an estimate of the number of bytes of data in use.
|
|
|
|
size_t ApproximateMemoryUsage();
|
|
|
|
|
2013-09-04 17:24:35 -07:00
|
|
|
// Request a flush of all existing memtables to storage
|
|
|
|
void FlushRequested() { flush_requested_ = true; }
|
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
// Copying allowed
|
|
|
|
// MemTableList(const MemTableList&);
|
|
|
|
// void operator=(const MemTableList&);
|
|
|
|
|
|
|
|
private:
|
MemTableListVersion
Summary:
MemTableListVersion is to MemTableList what Version is to VersionSet. I took almost the same ideas to develop MemTableListVersion. The reason is to have copying std::list done in background, while flushing, rather than in foreground (MultiGet() and NewIterator()) under a mutex! Also, whenever we copied MemTableList, we copied also some MemTableList metadata (flush_requested_, commit_in_progress_, etc.), which was wasteful.
This diff avoids std::list copy under a mutex in both MultiGet() and NewIterator(). I created a small database with some number of immutable memtables, and creating 100.000 iterators in a single-thread (!) decreased from {188739, 215703, 198028} to {154352, 164035, 159817}. A lot of the savings come from code under a mutex, so we should see much higher savings with multiple threads. Creating new iterator is very important to LogDevice team.
I also think this diff will make SuperVersion obsolete for performance reasons. I will try it in the next diff. SuperVersion gave us huge savings on Get() code path, but I think that most of the savings came from copying MemTableList under a mutex. If we had MemTableListVersion, we would never need to copy the entire object (like we still do in NewIterator() and MultiGet())
Test Plan: `make check` works. I will also do `make valgrind_check` before commit
Reviewers: dhruba, haobo, kailiu, sdong, emayanke, tnovak
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15255
2014-01-24 14:52:08 -08:00
|
|
|
// DB mutex held
|
|
|
|
void InstallNewVersion();
|
|
|
|
|
|
|
|
int min_write_buffer_number_to_merge_;
|
|
|
|
|
|
|
|
MemTableListVersion* current_;
|
2012-10-19 14:00:53 -07:00
|
|
|
|
|
|
|
// the number of elements that still need flushing
|
|
|
|
int num_flush_not_started_;
|
|
|
|
|
|
|
|
// committing in progress
|
|
|
|
bool commit_in_progress_;
|
|
|
|
|
2013-09-04 17:24:35 -07:00
|
|
|
// Requested a flush of all memtables to storage
|
|
|
|
bool flush_requested_;
|
|
|
|
|
2012-10-19 14:00:53 -07:00
|
|
|
};
|
|
|
|
|
2013-10-03 21:49:15 -07:00
|
|
|
} // namespace rocksdb
|