Define WAL related classes to be used in VersionEdit and VersionSet (#7164)
Summary:
`WalAddition` and `WalDeletion` are defined in `wal_edit.h` and used in `VersionEdit`.
`WalAddition` represents either the creation of a new WAL (log number only, no size) or the closing of a WAL (log number plus size).
`WalDeletion` represents the deletion or archiving of a WAL; it means the WAL is no longer alive and will not be replayed during recovery.
`WalSet` is the set of alive WALs kept in `VersionSet`.
1. Why use `WalDeletion` instead of relying on `MinLogNumber` to identify outdated WALs
On recovery, we can compute `MinLogNumber()` from the log numbers kept in the MANIFEST; any log with a number < `MinLogNumber()` can be ignored. So it may seem unnecessary to persist `WalDeletion` to the MANIFEST, since outdated WALs could be ignored based on `MinLogNumber()` alone.
But `MinLogNumber()` is only a lower bound: it does not guarantee that every log numbered at or above it still exists. The reason is a corner case: when a column family is empty and has never been flushed, its log number is set to the largest log number, but this is not persisted in the MANIFEST. For example, suppose there are 2 column families. When the DB is created, the first WAL has log number 1, which is persisted to the MANIFEST for both column families. Then CF 0 stays empty and is never flushed, while CF 1 is updated and flushed, so a new WAL with log number 2 is created and persisted to the MANIFEST for CF 1. CF 0's log number in the MANIFEST, however, is still 1. So on recovery, `MinLogNumber()` is 1. But log 1 only contains data for CF 1, and CF 1 has been flushed, so log 1 may already have been deleted from disk.
We could make `MinLogNumber()` exactly the minimum log number that must exist by persisting the most recent log number for empty column families that have not been flushed. But if there are N such column families, then every time a new WAL is created, N records would have to be added to the MANIFEST.
In the current design, a record is persisted to the MANIFEST only when a WAL is created, closed, or deleted/archived, so the number of WAL-related records is bounded by 3x the number of WALs.
2. Why keep `WalSet` in `VersionSet` instead of applying the `VersionEdit`s to `VersionStorageInfo`
`VersionEdit`s were originally designed to track the addition and deletion of SST files. SST files belong to column families: each column family has a list of `Version`s, and each `Version` keeps the set of active SST files in `VersionStorageInfo`.
But WALs are a DB-level concept; they are not bound to specific column families. So logically it does not make sense to store WALs in a column family's `Version`s.
Also, the purpose of a `Version` is to keep references to SST / blob files so that they are not deleted while any version still references them. But a WAL is deleted regardless of version references.
So we keep the WALs in `VersionSet`, from which they can be written out as part of the DB state snapshot when a new MANIFEST is created.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7164
Test Plan:
make version_edit_test && ./version_edit_test
make wal_edit_test && ./wal_edit_test
Reviewed By: ltamasi
Differential Revision: D22677936
Pulled By: cheng-chang
fbshipit-source-id: 5a3b6890140e572ffd79eb37e6e4c3c32361a859
2020-08-05 16:32:26 -07:00
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#include "db/wal_edit.h"

#include "port/port.h"
#include "port/stack_trace.h"
#include "test_util/testharness.h"
#include "test_util/testutil.h"

namespace ROCKSDB_NAMESPACE {

TEST(WalSet, AddDeleteReset) {
  WalSet wals;
  ASSERT_TRUE(wals.GetWals().empty());

  // Create WAL 1 - 10.
  for (WalNumber log_number = 1; log_number <= 10; log_number++) {
    ASSERT_OK(wals.AddWal(WalAddition(log_number)));
  }
  ASSERT_EQ(wals.GetWals().size(), 10);

  // Close WAL 1 - 5.
  for (WalNumber log_number = 1; log_number <= 5; log_number++) {
    ASSERT_OK(wals.AddWal(WalAddition(log_number, WalMetadata(100))));
  }
  ASSERT_EQ(wals.GetWals().size(), 10);

  // Delete WAL 1 - 5.
  for (WalNumber log_number = 1; log_number <= 5; log_number++) {
    ASSERT_OK(wals.DeleteWal(WalDeletion(log_number)));
  }
  ASSERT_EQ(wals.GetWals().size(), 5);

  // The remaining WALs should be 6 - 10, in ascending order.
  WalNumber expected_log_number = 6;
  for (const auto& wal : wals.GetWals()) {
    WalNumber log_number = wal.first;
    ASSERT_EQ(log_number, expected_log_number++);
  }

  wals.Reset();
  ASSERT_TRUE(wals.GetWals().empty());
}

TEST(WalSet, Overwrite) {
  constexpr WalNumber kNumber = 100;
  constexpr uint64_t kBytes = 200;
  WalSet wals;
  ASSERT_OK(wals.AddWal(WalAddition(kNumber)));
  ASSERT_FALSE(wals.GetWals().at(kNumber).HasSize());
  ASSERT_OK(wals.AddWal(WalAddition(kNumber, WalMetadata(kBytes))));
  ASSERT_TRUE(wals.GetWals().at(kNumber).HasSize());
  ASSERT_EQ(wals.GetWals().at(kNumber).GetSizeInBytes(), kBytes);
}

TEST(WalSet, CreateTwice) {
  constexpr WalNumber kNumber = 100;
  WalSet wals;
  ASSERT_OK(wals.AddWal(WalAddition(kNumber)));
  Status s = wals.AddWal(WalAddition(kNumber));
  ASSERT_TRUE(s.IsCorruption());
  ASSERT_TRUE(s.ToString().find("WAL 100 is created more than once") !=
              std::string::npos);
}

TEST(WalSet, CloseTwice) {
  constexpr WalNumber kNumber = 100;
  constexpr uint64_t kBytes = 200;
  WalSet wals;
  ASSERT_OK(wals.AddWal(WalAddition(kNumber)));
  ASSERT_OK(wals.AddWal(WalAddition(kNumber, WalMetadata(kBytes))));
  Status s = wals.AddWal(WalAddition(kNumber, WalMetadata(kBytes)));
  ASSERT_TRUE(s.IsCorruption());
  ASSERT_TRUE(s.ToString().find("WAL 100 is closed more than once") !=
              std::string::npos);
}

TEST(WalSet, CloseBeforeCreate) {
  constexpr WalNumber kNumber = 100;
  constexpr uint64_t kBytes = 200;
  WalSet wals;
  Status s = wals.AddWal(WalAddition(kNumber, WalMetadata(kBytes)));
  ASSERT_TRUE(s.IsCorruption());
  ASSERT_TRUE(s.ToString().find("WAL 100 is not created before closing") !=
              std::string::npos);
}

TEST(WalSet, CreateAfterClose) {
  constexpr WalNumber kNumber = 100;
  constexpr uint64_t kBytes = 200;
  WalSet wals;
  ASSERT_OK(wals.AddWal(WalAddition(kNumber)));
  ASSERT_OK(wals.AddWal(WalAddition(kNumber, WalMetadata(kBytes))));
  Status s = wals.AddWal(WalAddition(kNumber));
  ASSERT_TRUE(s.IsCorruption());
  ASSERT_TRUE(s.ToString().find("WAL 100 is created more than once") !=
              std::string::npos);
}

TEST(WalSet, DeleteNonExistingWal) {
  constexpr WalNumber kNonExistingNumber = 100;
  WalSet wals;
  Status s = wals.DeleteWal(WalDeletion(kNonExistingNumber));
  ASSERT_TRUE(s.IsCorruption());
  ASSERT_TRUE(s.ToString().find("WAL 100 must exist before deletion") !=
              std::string::npos);
}

TEST(WalSet, DeleteNonClosedWal) {
  constexpr WalNumber kNonClosedWalNumber = 100;
  WalSet wals;
  ASSERT_OK(wals.AddWal(WalAddition(kNonClosedWalNumber)));
  Status s = wals.DeleteWal(WalDeletion(kNonClosedWalNumber));
  ASSERT_TRUE(s.IsCorruption());
  ASSERT_TRUE(s.ToString().find("WAL 100 must be closed before deletion") !=
              std::string::npos);
}

}  // namespace ROCKSDB_NAMESPACE

int main(int argc, char** argv) {
  ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}