rocksdb/db/memtable_list.h
agiardullo c815351038 Support saving history in memtable_list
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts.  But after flushing, we don't have any memtables, and transactions could fail to commit.  So we want to somehow keep around some extra history to use for conflict checking.  In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.

After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure).  It seems like the best place for this is abstracted inside the memtable_list.  I decided to create a separate list in MemTableListVersion, as using the same list complicated the flush/InstallMemtableFlushResults logic too much.

This diff adds a new parameter to control how much memtable history to keep around after flushing.  However, it sounds like people aren't too fond of adding new parameters.  So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers.  This should not change the maximum amount of memory used, but makes it more likely that we're using closer to the limit.  (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached).  So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
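
The retention policy described above can be sketched roughly as follows. This is a toy model, not the code from this diff: ToyMemtableLists, MarkFlushed and max_to_maintain are hypothetical names, and memtables are represented by plain strings. It only illustrates the idea that flushed memtables are kept until the combined count of flushed and not-yet-flushed memtables exceeds the configured limit, at which point the oldest flushed ones are dropped.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in for the two lists kept by MemTableListVersion.
// Memtables are modeled as strings purely for illustration.
struct ToyMemtableLists {
  std::list<std::string> unflushed;  // not yet flushed, newest at the front
  std::list<std::string> history;    // already flushed, newest at the front
  std::size_t max_to_maintain;       // budget for unflushed + history

  explicit ToyMemtableLists(std::size_t max) : max_to_maintain(max) {}

  // Drop the oldest flushed memtables until the combined count fits the
  // budget. Unflushed memtables are never dropped here.
  void TrimHistory(std::vector<std::string>* to_delete) {
    while (!history.empty() &&
           unflushed.size() + history.size() > max_to_maintain) {
      to_delete->push_back(history.back());
      history.pop_back();
    }
  }

  // A new immutable memtable arrives; older history may have to go.
  void Add(const std::string& m, std::vector<std::string>* to_delete) {
    unflushed.push_front(m);
    TrimHistory(to_delete);
  }

  // A flush finished: keep the memtable as history instead of deleting it.
  void MarkFlushed(const std::string& m, std::vector<std::string>* to_delete) {
    unflushed.remove(m);
    history.push_front(m);
    TrimHistory(to_delete);
  }
};
```

With a budget of 2, flushing one memtable keeps it in the history, and it is only dropped once two newer unflushed memtables have been added. With a budget of 0, flushed memtables are dropped immediately, which corresponds to keeping no history.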

However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.

Test Plan: Added an xfunc test to play around with setting different values of this parameter in all tests.  Added testing in memtablelist_test and planning on adding more testing here.

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D37443
2015-05-28 16:34:24 -07:00


// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
//
#pragma once
#include <atomic>
#include <string>
#include <list>
#include <vector>
#include <set>
#include <deque>
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/iterator.h"
#include "db/dbformat.h"
#include "db/filename.h"
#include "db/skiplist.h"
#include "db/memtable.h"
#include "util/autovector.h"
#include "util/instrumented_mutex.h"
#include "util/log_buffer.h"
namespace rocksdb {
class ColumnFamilyData;
class InternalKeyComparator;
class InstrumentedMutex;
class MergeIteratorBuilder;
// Keeps a list of immutable memtables in a std::list. The list is immutable
// if the refcount is greater than one. It is used as state for the Get() and
// Iterator code paths.
//
// This class is not thread-safe. External synchronization is required
// (such as holding the db mutex or being on the write thread).
class MemTableListVersion {
public:
explicit MemTableListVersion(MemTableListVersion* old = nullptr);
explicit MemTableListVersion(int max_write_buffer_number_to_maintain);
void Ref();
void Unref(autovector<MemTable*>* to_delete = nullptr);
// Search all the memtables starting from the most recent one.
// Return the most recent value found, if any.
bool Get(const LookupKey& key, std::string* value, Status* s,
MergeContext* merge_context);
// Similar to Get(), but searches the history of memtables that have
// already been flushed. Should only be used for in-memory-only
// queries (such as Transaction validation), as the history may contain
// writes that are also present in the SST files.
bool GetFromHistory(const LookupKey& key, std::string* value, Status* s,
MergeContext* merge_context);
void AddIterators(const ReadOptions& options,
std::vector<Iterator*>* iterator_list, Arena* arena);
void AddIterators(const ReadOptions& options,
MergeIteratorBuilder* merge_iter_builder);
uint64_t GetTotalNumEntries() const;
uint64_t GetTotalNumDeletes() const;
private:
// REQUIRE: m is an immutable memtable
void Add(MemTable* m, autovector<MemTable*>* to_delete);
// REQUIRE: m is an immutable memtable
void Remove(MemTable* m, autovector<MemTable*>* to_delete);
void TrimHistory(autovector<MemTable*>* to_delete);
friend class MemTableList;
// Immutable MemTables that have not yet been flushed.
std::list<MemTable*> memlist_;
// MemTables that have already been flushed
// (used during Transaction validation)
std::list<MemTable*> memlist_history_;
// Maximum number of MemTables to keep in memory (including both flushed
// and not-yet-flushed tables).
const int max_write_buffer_number_to_maintain_;
int refs_ = 0;
};
// This class stores references to all the immutable memtables.
// The memtables are flushed to L0 as soon as possible and in
// any order. If there is more than one immutable memtable, their
// flushes can occur concurrently. However, they are 'committed'
// to the manifest in FIFO order to maintain correctness and
// recoverability from a crash.
//
// Other than imm_flush_needed, this class is not thread-safe and requires
// external synchronization (such as holding the db mutex or being on the
// write thread.)
class MemTableList {
public:
// A list of memtables.
explicit MemTableList(int min_write_buffer_number_to_merge,
int max_write_buffer_number_to_maintain)
: imm_flush_needed(false),
min_write_buffer_number_to_merge_(min_write_buffer_number_to_merge),
current_(new MemTableListVersion(max_write_buffer_number_to_maintain)),
num_flush_not_started_(0),
commit_in_progress_(false),
flush_requested_(false) {
current_->Ref();
}
// Should not delete MemTableList without making sure MemTableList::current()
// is Unref()'d.
~MemTableList() {}
MemTableListVersion* current() { return current_; }
// Atomic flag that background threads can check to determine whether
// there is anything more to start flushing.
std::atomic<bool> imm_flush_needed;
// Returns the total number of memtables in the list that haven't yet
// been flushed and logged.
int NumNotFlushed() const;
// Returns total number of memtables in the list that have been
// completely flushed and logged.
int NumFlushed() const;
// Returns true if there is at least one memtable on which flush has
// not yet started.
bool IsFlushPending() const;
// Returns the earliest memtables that need to be flushed. The returned
// memtables are guaranteed to be in ascending order of creation time.
void PickMemtablesToFlush(autovector<MemTable*>* mems);
// Reset status of the given memtable list back to pending state so that
// they can get picked up again on the next round of flush.
void RollbackMemtableFlush(const autovector<MemTable*>& mems,
uint64_t file_number);
// Commit a successful flush in the manifest file
Status InstallMemtableFlushResults(
ColumnFamilyData* cfd, const MutableCFOptions& mutable_cf_options,
const autovector<MemTable*>& m, VersionSet* vset, InstrumentedMutex* mu,
uint64_t file_number, autovector<MemTable*>* to_delete,
Directory* db_directory, LogBuffer* log_buffer);
// New memtables are inserted at the front of the list.
// Takes ownership of the reference held on *m by the caller of Add().
void Add(MemTable* m, autovector<MemTable*>* to_delete);
// Returns an estimate of the number of bytes of data in use.
size_t ApproximateMemoryUsage();
// Request a flush of all existing memtables to storage. This will
// cause future calls to IsFlushPending() to return true if this list is
// non-empty (regardless of the min_write_buffer_number_to_merge
// parameter). This flush request will persist until the next time
// PickMemtablesToFlush() is called.
void FlushRequested() { flush_requested_ = true; }
// Copying allowed
// MemTableList(const MemTableList&);
// void operator=(const MemTableList&);
private:
// DB mutex held
void InstallNewVersion();
const int min_write_buffer_number_to_merge_;
MemTableListVersion* current_;
// the number of elements that still need flushing
int num_flush_not_started_;
// committing in progress
bool commit_in_progress_;
// Requested a flush of all memtables to storage
bool flush_requested_;
};
} // namespace rocksdb
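
To illustrate why a history of flushed memtables helps transaction validation, here is a minimal sketch of an in-memory conflict check. It is not how RocksDB implements validation: ToyMemtable, Validation, ValidateKey and earliest_seq are hypothetical names, and each memtable is reduced to a map from key to the sequence number of its latest write. The assumption is that every write with a sequence number at or above earliest_seq is still present in one of the supplied memtables.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy memtable: key -> sequence number of its latest write.
using ToyMemtable = std::map<std::string, std::uint64_t>;

enum class Validation { kOk, kConflict, kHistoryTooShort };

// Checks whether `key` was written after `snapshot_seq`.
// `newest_first` holds the unflushed memtables followed by the flushed
// history, ordered from newest to oldest.
// `earliest_seq` is assumed to be a sequence number such that every write
// at or above it is still present in `newest_first`.
Validation ValidateKey(const std::vector<ToyMemtable>& newest_first,
                       std::uint64_t earliest_seq, const std::string& key,
                       std::uint64_t snapshot_seq) {
  for (const ToyMemtable& mem : newest_first) {
    auto it = mem.find(key);
    if (it != mem.end()) {
      // The first hit is the most recent write to this key.
      return it->second > snapshot_seq ? Validation::kConflict
                                       : Validation::kOk;
    }
  }
  // Key not found in memory. That only proves there was no conflict if
  // the memtables reach back at least as far as the snapshot
  // (a deliberately conservative comparison).
  return earliest_seq <= snapshot_seq ? Validation::kOk
                                      : Validation::kHistoryTooShort;
}
```

The kHistoryTooShort case is the situation the commit message describes: once flushed memtables are gone, the check can no longer be answered from memory, and keeping more history makes that outcome less likely.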