[RocksDB] BackupableDB
Summary:
In this diff I present BackupableDB v1. You can easily use it to back up your DB, and it will do incremental snapshots for you.
Let's first describe how you would use BackupableDB. It inherits the StackableDB interface, so you can easily construct it with your DB object -- it will add a method RollTheSnapshot() to the DB object. When you call RollTheSnapshot(), the current snapshot of the DB will be stored in the backup dir. To restore, you can just call RestoreDBFromBackup() on a BackupableDB (which is a static method) and it will restore all files from the backup dir. In the next version, it will even support automatic backups every X minutes.
There are multiple things you can configure:
1. backup_env and db_env can be different, which is awesome because then you can easily back up to HDFS or wherever you feel like.
2. sync - if true, it *guarantees* backup consistency on machine reboot
3. number of snapshots to keep - this will keep the last N snapshots around if you want, for some reason, to be able to restore from an earlier snapshot. All the backups are done in incremental fashion - if we already have 00010.sst, we will not copy it again. *IMPORTANT* -- This is based on the assumption that 00010.sst never changes - two files named 00010.sst from the same DB will always be exactly the same. Is this true? I always copy manifest, current and log files.
4. You can decide if you want to flush the memtables before you back up, or you're fine with backing up the log files -- either way, you get a complete and consistent view of the database at the time of backup.
5. More things you can find in BackupableDBOptions
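To make the incremental idea in item 3 concrete, here is a rough sketch. It is a toy model, not the BackupableDB code: `ToyIncrementalStore` and its methods are hypothetical names, and it leans on the same assumption as above, that a file name seen before always means identical content.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of the incremental copy step; not the BackupableDB code.
class ToyIncrementalStore {
 public:
  // Records the sst files a snapshot needs and returns how many were copied.
  // A file already stored under the same name is assumed identical and skipped.
  std::size_t Backup(const std::vector<std::string>& live_sst_files) {
    std::size_t copied = 0;
    for (const std::string& name : live_sst_files) {
      if (stored_.insert(name).second) {
        ++copied;  // first time this name is seen, so it would be copied
      }
    }
    return copied;
  }

  std::size_t StoredFileCount() const { return stored_.size(); }

 private:
  std::set<std::string> stored_;  // names of files already in the backup dir
};
```

If the assumption about file names were false, this scheme would silently keep a stale copy, which is why the question in item 3 matters.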
Here is the directory structure I use:
backup_dir/CURRENT_SNAPSHOT - just 4 bytes holding the latest snapshot
0, 1, 2, ... - files containing serialized version of each snapshot - containing a list of files
files/*.sst - sst files shared between snapshots - if one snapshot references 00010.sst and another one needs to back it up from the DB, it will just reference the same file
files/ 0/, 1/, 2/, ... - snapshot directories containing private snapshot files - current, manifest and log files
All the files are ref counted and deleted immediately when they get out of scope.
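The ref counting of shared files can be sketched roughly as follows. This is a toy model under my own assumptions, not the code in this diff: `ToyRefCountedFiles`, `AddSnapshot`, `DeleteSnapshot` and `HasFile` are hypothetical names, and "deleting" a file just means dropping it from an in-memory map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of ref-counted backup files; names are illustrative.
class ToyRefCountedFiles {
 public:
  // Registers a snapshot and takes one reference on each file it lists.
  void AddSnapshot(uint32_t id, const std::vector<std::string>& files) {
    assert(snapshots_.count(id) == 0);
    std::set<std::string> unique(files.begin(), files.end());
    for (const std::string& f : unique) ++refs_[f];
    snapshots_[id] = unique;
  }

  // Drops a snapshot and returns the files whose count reached zero.
  std::vector<std::string> DeleteSnapshot(uint32_t id) {
    std::vector<std::string> deleted;
    auto it = snapshots_.find(id);
    if (it == snapshots_.end()) return deleted;
    for (const std::string& f : it->second) {
      auto ref = refs_.find(f);
      assert(ref != refs_.end() && ref->second > 0);
      if (--ref->second == 0) {
        refs_.erase(ref);  // no snapshot needs the file any more
        deleted.push_back(f);
      }
    }
    snapshots_.erase(it);
    return deleted;
  }

  bool HasFile(const std::string& f) const { return refs_.count(f) > 0; }

 private:
  std::map<uint32_t, std::set<std::string>> snapshots_;  // snapshot -> files
  std::map<std::string, int> refs_;                      // file -> ref count
};
```

With two snapshots that both reference 00010.sst, deleting the first one removes only its private files; 00010.sst goes away when the second snapshot is deleted too.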
Some other stuff in this diff:
1. Added GetEnv() method to the DB. Discussed with @haobo and we agreed that it seems like the right thing to do.
2. Fixed StackableDB interface. The way it was set up before, I was not able to implement BackupableDB.
Test Plan:
I have a unittest, but please don't look at this yet. I just hacked it up to help me with debugging. I will write a lot of good tests and update the diff.
Also, `make asan_check`
Reviewers: dhruba, haobo, emayanke
Reviewed By: dhruba
CC: leveldb, haobo
Differential Revision: https://reviews.facebook.net/D14295
2013-12-09 23:06:52 +01:00
// Copyright (c) 2013, Facebook, Inc. All rights reserved.
// This source code is licensed under the BSD-style license found in the
// LICENSE file in the root directory of this source tree. An additional grant
// of patent rights can be found in the PATENTS file in the same directory.
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#pragma once
#include "utilities/stackable_db.h"
#include "rocksdb/env.h"
#include "rocksdb/status.h"

#include <string>
#include <map>
#include <vector>

namespace rocksdb {

struct BackupableDBOptions {
  // Where to keep the backup files. Has to be different from dbname_.
  // Best to set this to dbname_ + "/backups"
  // Required
  std::string backup_dir;

  // Backup Env object. It will be used for backup file I/O. If it's
  // nullptr, backups will be written out using the DB's Env. If it's
  // non-nullptr, backup I/O will be performed using this object.
  // If you want to have backups on HDFS, use HDFS Env here!
  // Default: nullptr
  Env* backup_env;

  // Backup info and error messages will be written to info_log
  // if non-nullptr.
  // Default: nullptr
  Logger* info_log;

  // If sync == true, we can guarantee you'll get a consistent backup even
  // on a machine crash/reboot. Backup process is slower with sync enabled.
  // If sync == false, we don't guarantee anything on machine reboot. However,
  // chances are some of the backups are consistent.
  // Default: true
  bool sync;

  // If true, it will delete whatever backups there are already
  // Default: false
  bool destroy_old_data;

  explicit BackupableDBOptions(const std::string& _backup_dir,
                               Env* _backup_env = nullptr,
                               Logger* _info_log = nullptr,
                               bool _sync = true,
                               bool _destroy_old_data = false) :
      backup_dir(_backup_dir),
      backup_env(_backup_env),
      info_log(_info_log),
      sync(_sync),
      destroy_old_data(_destroy_old_data) { }
};

class BackupEngine;

typedef uint32_t BackupID;

struct BackupInfo {
  BackupID backup_id;
  int64_t timestamp;
  uint64_t size;

  BackupInfo() {}
  BackupInfo(BackupID _backup_id, int64_t _timestamp, uint64_t _size)
      : backup_id(_backup_id), timestamp(_timestamp), size(_size) {}
};

// Stack your DB with BackupableDB to be able to back up the DB
class BackupableDB : public StackableDB {
 public:
  // BackupableDBOptions have to be the same as the ones used in a previous
  // incarnation of the DB
  //
  // BackupableDB owns the pointer `DB* db` now. You should not delete it or
  // use it after the invocation of BackupableDB
  BackupableDB(DB* db, const BackupableDBOptions& options);
  virtual ~BackupableDB();

  // Captures the state of the database in the latest backup
  // NOT a thread safe call
  Status CreateNewBackup(bool flush_before_backup = false);
  // Returns info about backups in backup_info
  void GetBackupInfo(std::vector<BackupInfo>* backup_info);
  // deletes old backups, keeping latest num_backups_to_keep alive
  Status PurgeOldBackups(uint32_t num_backups_to_keep);
  // deletes a specific backup
  Status DeleteBackup(BackupID backup_id);

 private:
  BackupEngine* backup_engine_;
};

// Use this class to access information about backups and restore from them
class RestoreBackupableDB {
 public:
  RestoreBackupableDB(Env* db_env, const BackupableDBOptions& options);
  ~RestoreBackupableDB();

  // Returns info about backups in backup_info
  void GetBackupInfo(std::vector<BackupInfo>* backup_info);

  // restore from backup with backup_id
  // IMPORTANT -- if you restore from some backup that is not the latest,
  // and you start creating new backups from the new DB, all the backups
  // that were newer than the backup you restored from will be deleted
  //
  // Example: Let's say you have backups 1, 2, 3, 4, 5 and you restore 3.
  // If you try creating a new backup now, old backups 4 and 5 will be deleted
  // and new backup with ID 4 will be created.
  Status RestoreDBFromBackup(BackupID backup_id, const std::string& db_dir,
                             const std::string& wal_dir);

  // restore from the latest backup
  Status RestoreDBFromLatestBackup(const std::string& db_dir,
                                   const std::string& wal_dir);
  // deletes old backups, keeping latest num_backups_to_keep alive
  Status PurgeOldBackups(uint32_t num_backups_to_keep);
  // deletes a specific backup
  Status DeleteBackup(BackupID backup_id);

 private:
  BackupEngine* backup_engine_;
};

}  // rocksdb namespace
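
The ID bookkeeping described in the RestoreDBFromBackup comment (restore 3 out of 1..5, then the next backup drops 4 and 5 and becomes the new 4) and the "keep latest N" rule of PurgeOldBackups can be sketched as below. This is only a toy model under stated assumptions: `ToyBackupIds` and its members are hypothetical names, IDs are assumed to start at 1 as in the example, and nothing here touches real files or the BackupEngine.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical toy model of backup ID bookkeeping; not the BackupEngine code.
class ToyBackupIds {
 public:
  // Creates a backup with the ID right after the current position. Backups
  // newer than the position (left over from before a restore) go first.
  uint32_t CreateNewBackup() {
    ids_.erase(ids_.upper_bound(position_), ids_.end());
    ++position_;
    ids_.insert(position_);
    return position_;
  }

  // Moves the position to an existing backup; returns false if it is unknown.
  // Newer backups are kept until the next CreateNewBackup() call.
  bool Restore(uint32_t id) {
    if (ids_.count(id) == 0) return false;
    position_ = id;
    return true;
  }

  // Keeps only the latest num_to_keep backups.
  void PurgeOldBackups(std::size_t num_to_keep) {
    while (ids_.size() > num_to_keep) ids_.erase(ids_.begin());
  }

  const std::set<uint32_t>& ids() const { return ids_; }

 private:
  std::set<uint32_t> ids_;
  uint32_t position_ = 0;  // 0 means nothing was created or restored yet
};
```

Creating five backups, restoring 3 and creating one more leaves IDs 1, 2, 3, 4 in this model; purging with num_to_keep = 2 then leaves 3 and 4.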