Compare commits
20 Commits
Author | SHA1 | Date | |
---|---|---|---|
|
b705f808e4 | ||
|
cc4c9e80e4 | ||
|
9d95050ca7 | ||
|
9857d45f37 | ||
|
5884cf5d5b | ||
|
e32a64aa54 | ||
|
031368f67a | ||
|
5105db8d7b | ||
|
ba7c46a3d8 | ||
|
be40c99dda | ||
|
89dd231f6a | ||
|
e931bbfec0 | ||
|
67f8189e68 | ||
|
0a91c691a9 | ||
|
dff9219df6 | ||
|
eeea27a048 | ||
|
2b64cddf99 | ||
|
79a21d67cb | ||
|
2eaad9e1b1 | ||
|
2a2d4d0f37 |
25
HISTORY.md
25
HISTORY.md
@ -1,7 +1,29 @@
|
||||
# Rocksdb Change Log
|
||||
## 6.16.5 (09/08/2021)
|
||||
### Bug Fixes
|
||||
* Fixed a bug affecting the batched `MultiGet` API when used with keys spanning multiple column families and `sorted_input == false`.
|
||||
|
||||
## 6.16.4 (03/30/2021)
|
||||
### Bug Fixes
|
||||
* Fix build on ppc64 and musl build.
|
||||
|
||||
## 6.16.3 (02/05/2021)
|
||||
### Bug Fixes
|
||||
* Since 6.15.0, `TransactionDB` returns error `Status`es from calls to `DeleteRange()` and calls to `Write()` where the `WriteBatch` contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees. There are certain cases where range deletion can still be used on such DBs; see the API doc on `TransactionDB::DeleteRange()` for details.
|
||||
* `OptimisticTransactionDB` now returns error `Status`es from calls to `DeleteRange()` and calls to `Write()` where the `WriteBatch` contains a range deletion. Previously such operations may have succeeded while not providing the expected transactional guarantees.
|
||||
|
||||
## 6.16.2 (1/21/2021)
|
||||
### Bug Fixes
|
||||
* Fix a race condition between DB startups and shutdowns in managing the periodic background worker threads. One effect of this race condition could be the process being terminated.
|
||||
|
||||
## 6.16.1 (1/20/2021)
|
||||
### Bug Fixes
|
||||
* Version older than 6.15 cannot decode VersionEdits `WalAddition` and `WalDeletion`, fixed this by changing the encoded format of them to be ignorable by older versions.
|
||||
|
||||
## 6.16.0 (12/18/2020)
|
||||
### Behavior Changes
|
||||
* Attempting to write a merge operand without explicitly configuring `merge_operator` now fails immediately, causing the DB to enter read-only mode. Previously, failure was deferred until the `merge_operator` was needed by a user read or a background operation.
|
||||
* Since RocksDB does not continue write the same file if a file write fails for any reason, the file scope write IO error is treated the same as retryable IO error. More information about error handling of file scope IO error is included in `ErrorHandler::SetBGError`.
|
||||
|
||||
### Bug Fixes
|
||||
* Truncated WALs ending in incomplete records can no longer produce gaps in the recovered data when `WALRecoveryMode::kPointInTimeRecovery` is used. Gaps are still possible when WALs are truncated exactly on record boundaries; for complete protection, users should enable `track_and_verify_wals_in_manifest`.
|
||||
@ -10,6 +32,7 @@
|
||||
* Fixed the logic of populating native data structure for `read_amp_bytes_per_bit` during OPTIONS file parsing on big-endian architecture. Without this fix, original code introduced in PR7659, when running on big-endian machine, can mistakenly store read_amp_bytes_per_bit (an uint32) in little endian format. Future access to `read_amp_bytes_per_bit` will give wrong values. Little endian architecture is not affected.
|
||||
* Fixed prefix extractor with timestamp issues.
|
||||
* Fixed a bug in atomic flush: in two-phase commit mode, the minimum WAL log number to keep is incorrect.
|
||||
* Fixed a bug related to checkpoint in PR7789: if there are multiple column families, and the checkpoint is not opened as read only, then in rare cases, data loss may happen in the checkpoint. Since backup engine relies on checkpoint, it may also be affected.
|
||||
|
||||
### New Features
|
||||
* User defined timestamp feature supports `CompactRange` and `GetApproximateSizes`.
|
||||
@ -18,6 +41,7 @@
|
||||
|
||||
### Public API Change
|
||||
* Deprecated public but rarely-used FilterBitsBuilder::CalculateNumEntry, which is replaced with ApproximateNumEntries taking a size_t parameter and returning size_t.
|
||||
* Added a new option `track_and_verify_wals_in_manifest`. If `true`, the log numbers and sizes of the synced WALs are tracked in MANIFEST, then during DB recovery, if a synced WAL is missing from disk, or the WAL's size does not match the recorded size in MANIFEST, an error will be reported and the recovery will be aborted. Note that this option does not work with secondary instance.
|
||||
|
||||
## 6.15.0 (11/13/2020)
|
||||
### Bug Fixes
|
||||
@ -39,7 +63,6 @@
|
||||
### Public API Change
|
||||
* Deprecate `BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`. These options still take effect until users migrate to the replacement APIs in `BlockBasedTableOptions::metadata_cache_options`. Migration guidance can be found in the API comments on the deprecated options.
|
||||
* Add new API `DB::VerifyFileChecksums` to verify SST file checksum with corresponding entries in the MANIFEST if present. Current implementation requires scanning and recomputing file checksums.
|
||||
* Added a new option `track_and_verify_wals_in_manifest`. If `true`, the log numbers and sizes of the synced WALs are tracked in MANIFEST, then during DB recovery, if a synced WAL is missing from disk, or the WAL's size does not match the recorded size in MANIFEST, an error will be reported and the recovery will be aborted. Note that this option does not work with secondary instance.
|
||||
|
||||
### Behavior Changes
|
||||
* The dictionary compression settings specified in `ColumnFamilyOptions::compression_opts` now additionally affect files generated by flush and compaction to non-bottommost level. Previously those settings at most affected files generated by compaction to bottommost level, depending on whether `ColumnFamilyOptions::bottommost_compression_opts` overrode them. Users who relied on dictionary compression settings in `ColumnFamilyOptions::compression_opts` affecting only the bottommost level can keep the behavior by moving their dictionary settings to `ColumnFamilyOptions::bottommost_compression_opts` and setting its `enabled` flag.
|
||||
|
22
Makefile
22
Makefile
@ -510,6 +510,12 @@ ifeq ($(USE_FOLLY_DISTRIBUTED_MUTEX),1)
|
||||
LIB_OBJECTS += $(patsubst %.cpp, $(OBJ_DIR)/%.o, $(FOLLY_SOURCES))
|
||||
endif
|
||||
|
||||
# range_tree is not compatible with non GNU libc on ppc64
|
||||
# see https://jira.percona.com/browse/PS-7559
|
||||
ifneq ($(PPC_LIBC_IS_GNU),0)
|
||||
LIB_OBJECTS += $(patsubst %.cc, $(OBJ_DIR)/%.o, $(RANGE_TREE_SOURCES))
|
||||
endif
|
||||
|
||||
GTEST = $(OBJ_DIR)/$(GTEST_DIR)/gtest/gtest-all.o
|
||||
TESTUTIL = $(OBJ_DIR)/test_util/testutil.o
|
||||
TESTHARNESS = $(OBJ_DIR)/test_util/testharness.o $(TESTUTIL) $(GTEST)
|
||||
@ -2206,9 +2212,10 @@ endif
|
||||
|
||||
JAVA_STATIC_FLAGS = -DZLIB -DBZIP2 -DSNAPPY -DLZ4 -DZSTD
|
||||
JAVA_STATIC_INCLUDES = -I./zlib-$(ZLIB_VER) -I./bzip2-$(BZIP2_VER) -I./snappy-$(SNAPPY_VER) -I./snappy-$(SNAPPY_VER)/build -I./lz4-$(LZ4_VER)/lib -I./zstd-$(ZSTD_VER)/lib -I./zstd-$(ZSTD_VER)/lib/dictBuilder
|
||||
ifneq ($(findstring rocksdbjavastatic, $(MAKECMDGOALS)),)
|
||||
|
||||
ifneq ($(findstring rocksdbjavastatic, $(filter-out rocksdbjavastatic_deps, $(MAKECMDGOALS))),)
|
||||
CXXFLAGS += $(JAVA_STATIC_FLAGS) $(JAVA_STATIC_INCLUDES)
|
||||
CFLAGS += $(JAVA_STATIC_FLAGS) $(JAVA_STATIC_INCLUDES)
|
||||
CFLAGS += $(JAVA_STATIC_FLAGS) $(JAVA_STATIC_INCLUDES)
|
||||
endif
|
||||
rocksdbjavastatic:
|
||||
ifeq ($(JAVA_HOME),)
|
||||
@ -2216,8 +2223,11 @@ ifeq ($(JAVA_HOME),)
|
||||
endif
|
||||
$(MAKE) rocksdbjavastatic_deps
|
||||
$(MAKE) rocksdbjavastatic_libobjects
|
||||
cd java;$(MAKE) javalib;
|
||||
rm -f ./java/target/$(ROCKSDBJNILIB)
|
||||
$(MAKE) rocksdbjavastatic_javalib
|
||||
|
||||
rocksdbjavastatic_javalib:
|
||||
cd java;$(MAKE) javalib
|
||||
rm -f java/target/$(ROCKSDBJNILIB)
|
||||
$(CXX) $(CXXFLAGS) -I./java/. $(JAVA_INCLUDE) -shared -fPIC \
|
||||
-o ./java/target/$(ROCKSDBJNILIB) $(JNI_NATIVE_SOURCES) \
|
||||
$(LIB_OBJECTS) $(COVERAGEFLAGS) \
|
||||
@ -2440,6 +2450,8 @@ ifneq ($(MAKECMDGOALS),clean)
|
||||
ifneq ($(MAKECMDGOALS),format)
|
||||
ifneq ($(MAKECMDGOALS),jclean)
|
||||
ifneq ($(MAKECMDGOALS),jtest)
|
||||
ifneq ($(MAKECMDGOALS),rocksdbjavastatic)
|
||||
ifneq ($(MAKECMDGOALS),rocksdbjavastatic_deps)
|
||||
ifneq ($(MAKECMDGOALS),package)
|
||||
ifneq ($(MAKECMDGOALS),analyze)
|
||||
-include $(DEPFILES)
|
||||
@ -2449,3 +2461,5 @@ endif
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
endif
|
||||
|
@ -135,11 +135,15 @@ def generate_targets(repo_path, deps_map):
|
||||
TARGETS.add_library(
|
||||
"rocksdb_lib",
|
||||
src_mk["LIB_SOURCES"] +
|
||||
# always add range_tree, it's only excluded on ppc64, which we don't use internally
|
||||
src_mk["RANGE_TREE_SOURCES"] +
|
||||
src_mk["TOOL_LIB_SOURCES"])
|
||||
# rocksdb_whole_archive_lib
|
||||
TARGETS.add_library(
|
||||
"rocksdb_whole_archive_lib",
|
||||
src_mk["LIB_SOURCES"] +
|
||||
# always add range_tree, it's only excluded on ppc64, which we don't use internally
|
||||
src_mk["RANGE_TREE_SOURCES"] +
|
||||
src_mk["TOOL_LIB_SOURCES"],
|
||||
deps=None,
|
||||
headers=None,
|
||||
|
@ -663,6 +663,23 @@ else
|
||||
fi
|
||||
fi
|
||||
|
||||
if test -n "`echo $TARGET_ARCHITECTURE | grep ^ppc64`"; then
|
||||
# check for GNU libc on ppc64
|
||||
$CXX -x c++ - -o /dev/null 2>/dev/null <<EOF
|
||||
#include <stdio.h>
|
||||
#include <stdlib.h>
|
||||
#include <gnu/libc-version.h>
|
||||
|
||||
int main(int argc, char *argv[]) {
|
||||
printf("GNU libc version: %s\n", gnu_get_libc_version());
|
||||
return 0;
|
||||
}
|
||||
EOF
|
||||
if [ "$?" != 0 ]; then
|
||||
PPC_LIBC_IS_GNU=0
|
||||
fi
|
||||
fi
|
||||
|
||||
if test "$TRY_SSE_ETC"; then
|
||||
# The USE_SSE flag now means "attempt to compile with widely-available
|
||||
# Intel architecture extensions utilized by specific optimizations in the
|
||||
@ -856,3 +873,6 @@ echo "LUA_PATH=$LUA_PATH" >> "$OUTPUT"
|
||||
if test -n "$USE_FOLLY_DISTRIBUTED_MUTEX"; then
|
||||
echo "USE_FOLLY_DISTRIBUTED_MUTEX=$USE_FOLLY_DISTRIBUTED_MUTEX" >> "$OUTPUT"
|
||||
fi
|
||||
if test -n "$PPC_LIBC_IS_GNU"; then
|
||||
echo "PPC_LIBC_IS_GNU=$PPC_LIBC_IS_GNU" >> "$OUTPUT"
|
||||
fi
|
||||
|
@ -1361,6 +1361,28 @@ TEST_P(DBMultiGetTestWithParam, MultiGetMultiCFSnapshot) {
|
||||
}
|
||||
}
|
||||
|
||||
TEST_P(DBMultiGetTestWithParam, MultiGetMultiCFUnsorted) {
|
||||
Options options = CurrentOptions();
|
||||
CreateAndReopenWithCF({"one", "two"}, options);
|
||||
|
||||
ASSERT_OK(Put(1, "foo", "bar"));
|
||||
ASSERT_OK(Put(2, "baz", "xyz"));
|
||||
ASSERT_OK(Put(1, "abc", "def"));
|
||||
|
||||
// Note: keys for the same CF do not form a consecutive range
|
||||
std::vector<int> cfs{1, 2, 1};
|
||||
std::vector<std::string> keys{"foo", "baz", "abc"};
|
||||
std::vector<std::string> values;
|
||||
|
||||
values =
|
||||
MultiGet(cfs, keys, /* snapshot */ nullptr, /* batched */ GetParam());
|
||||
|
||||
ASSERT_EQ(values.size(), 3);
|
||||
ASSERT_EQ(values[0], "bar");
|
||||
ASSERT_EQ(values[1], "xyz");
|
||||
ASSERT_EQ(values[2], "def");
|
||||
}
|
||||
|
||||
INSTANTIATE_TEST_CASE_P(DBMultiGetTestWithParam, DBMultiGetTestWithParam,
|
||||
testing::Bool());
|
||||
|
||||
|
@ -2164,20 +2164,18 @@ void DBImpl::MultiGet(const ReadOptions& read_options, const size_t num_keys,
|
||||
multiget_cf_data;
|
||||
size_t cf_start = 0;
|
||||
ColumnFamilyHandle* cf = sorted_keys[0]->column_family;
|
||||
|
||||
for (size_t i = 0; i < num_keys; ++i) {
|
||||
KeyContext* key_ctx = sorted_keys[i];
|
||||
if (key_ctx->column_family != cf) {
|
||||
multiget_cf_data.emplace_back(
|
||||
MultiGetColumnFamilyData(cf, cf_start, i - cf_start, nullptr));
|
||||
multiget_cf_data.emplace_back(cf, cf_start, i - cf_start, nullptr);
|
||||
cf_start = i;
|
||||
cf = key_ctx->column_family;
|
||||
}
|
||||
}
|
||||
{
|
||||
// multiget_cf_data.emplace_back(
|
||||
// MultiGetColumnFamilyData(cf, cf_start, num_keys - cf_start, nullptr));
|
||||
multiget_cf_data.emplace_back(cf, cf_start, num_keys - cf_start, nullptr);
|
||||
}
|
||||
|
||||
multiget_cf_data.emplace_back(cf, cf_start, num_keys - cf_start, nullptr);
|
||||
|
||||
std::function<MultiGetColumnFamilyData*(
|
||||
autovector<MultiGetColumnFamilyData,
|
||||
MultiGetContext::MAX_BATCH_SIZE>::iterator&)>
|
||||
@ -2237,7 +2235,7 @@ struct CompareKeyContext {
|
||||
static_cast<ColumnFamilyHandleImpl*>(lhs->column_family);
|
||||
uint32_t cfd_id1 = cfh->cfd()->GetID();
|
||||
const Comparator* comparator = cfh->cfd()->user_comparator();
|
||||
cfh = static_cast<ColumnFamilyHandleImpl*>(lhs->column_family);
|
||||
cfh = static_cast<ColumnFamilyHandleImpl*>(rhs->column_family);
|
||||
uint32_t cfd_id2 = cfh->cfd()->GetID();
|
||||
|
||||
if (cfd_id1 < cfd_id2) {
|
||||
@ -2261,39 +2259,24 @@ struct CompareKeyContext {
|
||||
void DBImpl::PrepareMultiGetKeys(
|
||||
size_t num_keys, bool sorted_input,
|
||||
autovector<KeyContext*, MultiGetContext::MAX_BATCH_SIZE>* sorted_keys) {
|
||||
#ifndef NDEBUG
|
||||
if (sorted_input) {
|
||||
for (size_t index = 0; index < sorted_keys->size(); ++index) {
|
||||
if (index > 0) {
|
||||
KeyContext* lhs = (*sorted_keys)[index - 1];
|
||||
KeyContext* rhs = (*sorted_keys)[index];
|
||||
ColumnFamilyHandleImpl* cfh =
|
||||
static_cast_with_check<ColumnFamilyHandleImpl>(lhs->column_family);
|
||||
uint32_t cfd_id1 = cfh->cfd()->GetID();
|
||||
const Comparator* comparator = cfh->cfd()->user_comparator();
|
||||
cfh =
|
||||
static_cast_with_check<ColumnFamilyHandleImpl>(lhs->column_family);
|
||||
uint32_t cfd_id2 = cfh->cfd()->GetID();
|
||||
#ifndef NDEBUG
|
||||
CompareKeyContext key_context_less;
|
||||
|
||||
assert(cfd_id1 <= cfd_id2);
|
||||
if (cfd_id1 < cfd_id2) {
|
||||
continue;
|
||||
}
|
||||
for (size_t index = 1; index < sorted_keys->size(); ++index) {
|
||||
const KeyContext* const lhs = (*sorted_keys)[index - 1];
|
||||
const KeyContext* const rhs = (*sorted_keys)[index];
|
||||
|
||||
// Both keys are from the same column family
|
||||
int cmp = comparator->CompareWithoutTimestamp(
|
||||
*(lhs->key), /*a_has_ts=*/false, *(rhs->key), /*b_has_ts=*/false);
|
||||
assert(cmp <= 0);
|
||||
}
|
||||
index++;
|
||||
// lhs should be <= rhs, or in other words, rhs should NOT be < lhs
|
||||
assert(!key_context_less(rhs, lhs));
|
||||
}
|
||||
}
|
||||
#endif
|
||||
if (!sorted_input) {
|
||||
CompareKeyContext sort_comparator;
|
||||
std::sort(sorted_keys->begin(), sorted_keys->begin() + num_keys,
|
||||
sort_comparator);
|
||||
|
||||
return;
|
||||
}
|
||||
|
||||
std::sort(sorted_keys->begin(), sorted_keys->begin() + num_keys,
|
||||
CompareKeyContext());
|
||||
}
|
||||
|
||||
void DBImpl::MultiGet(const ReadOptions& read_options,
|
||||
|
@ -350,12 +350,17 @@ const Status& ErrorHandler::SetBGError(const Status& bg_err,
|
||||
|
||||
// This is the main function for looking at IO related error during the
|
||||
// background operations. The main logic is:
|
||||
// 1) File scope IO error is treated as retryable IO error in the write
|
||||
// path. In RocksDB, If a file has write IO error and it is at file scope,
|
||||
// RocksDB never write to the same file again. RocksDB will create a new
|
||||
// file and rewrite the whole content. Thus, it is retryable.
|
||||
// 1) if the error is caused by data loss, the error is mapped to
|
||||
// unrecoverable error. Application/user must take action to handle
|
||||
// this situation.
|
||||
// 2) if the error is a Retryable IO error, auto resume will be called and the
|
||||
// auto resume can be controlled by resume count and resume interval
|
||||
// options. There are three sub-cases:
|
||||
// this situation (File scope case is excluded).
|
||||
// 2) if the error is a Retryable IO error (i.e., it is a file scope IO error,
|
||||
// or its retryable flag is set and not a data loss error), auto resume
|
||||
// will be called and the auto resume can be controlled by resume count
|
||||
// and resume interval options. There are three sub-cases:
|
||||
// a) if the error happens during compaction, it is mapped to a soft error.
|
||||
// the compaction thread will reschedule a new compaction.
|
||||
// b) if the error happens during flush and also WAL is empty, it is mapped
|
||||
@ -384,9 +389,10 @@ const Status& ErrorHandler::SetBGError(const IOStatus& bg_io_err,
|
||||
|
||||
Status new_bg_io_err = bg_io_err;
|
||||
DBRecoverContext context;
|
||||
if (bg_io_err.GetDataLoss()) {
|
||||
// First, data loss is treated as unrecoverable error. So it can directly
|
||||
// overwrite any existing bg_error_.
|
||||
if (bg_io_err.GetScope() != IOStatus::IOErrorScope::kIOErrorScopeFile &&
|
||||
bg_io_err.GetDataLoss()) {
|
||||
// First, data loss (non file scope) is treated as unrecoverable error. So
|
||||
// it can directly overwrite any existing bg_error_.
|
||||
bool auto_recovery = false;
|
||||
Status bg_err(new_bg_io_err, Status::Severity::kUnrecoverableError);
|
||||
bg_error_ = bg_err;
|
||||
@ -397,13 +403,15 @@ const Status& ErrorHandler::SetBGError(const IOStatus& bg_io_err,
|
||||
&bg_err, db_mutex_, &auto_recovery);
|
||||
recover_context_ = context;
|
||||
return bg_error_;
|
||||
} else if (bg_io_err.GetRetryable()) {
|
||||
// Second, check if the error is a retryable IO error or not. if it is
|
||||
// retryable error and its severity is higher than bg_error_, overwrite
|
||||
// the bg_error_ with new error.
|
||||
// In current stage, for retryable IO error of compaction, treat it as
|
||||
// soft error. In other cases, treat the retryable IO error as hard
|
||||
// error.
|
||||
} else if (bg_io_err.GetScope() ==
|
||||
IOStatus::IOErrorScope::kIOErrorScopeFile ||
|
||||
bg_io_err.GetRetryable()) {
|
||||
// Second, check if the error is a retryable IO error (file scope IO error
|
||||
// is also treated as retryable IO error in RocksDB write path). if it is
|
||||
// retryable error and its severity is higher than bg_error_, overwrite the
|
||||
// bg_error_ with new error. In current stage, for retryable IO error of
|
||||
// compaction, treat it as soft error. In other cases, treat the retryable
|
||||
// IO error as hard error.
|
||||
bool auto_recovery = false;
|
||||
EventHelpers::NotifyOnBackgroundError(db_options_.listeners, reason,
|
||||
&new_bg_io_err, db_mutex_,
|
||||
|
@ -241,6 +241,90 @@ TEST_F(DBErrorHandlingFSTest, FLushWritRetryableError) {
|
||||
Destroy(options);
|
||||
}
|
||||
|
||||
TEST_F(DBErrorHandlingFSTest, FLushWritFileScopeError) {
|
||||
std::shared_ptr<ErrorHandlerFSListener> listener(
|
||||
new ErrorHandlerFSListener());
|
||||
Options options = GetDefaultOptions();
|
||||
options.env = fault_env_.get();
|
||||
options.create_if_missing = true;
|
||||
options.listeners.emplace_back(listener);
|
||||
options.max_bgerror_resume_count = 0;
|
||||
Status s;
|
||||
|
||||
listener->EnableAutoRecovery(false);
|
||||
DestroyAndReopen(options);
|
||||
|
||||
IOStatus error_msg = IOStatus::IOError("File Scope Data Loss Error");
|
||||
error_msg.SetDataLoss(true);
|
||||
error_msg.SetScope(
|
||||
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFile);
|
||||
error_msg.SetRetryable(false);
|
||||
|
||||
ASSERT_OK(Put(Key(1), "val1"));
|
||||
SyncPoint::GetInstance()->SetCallBack(
|
||||
"BuildTable:BeforeFinishBuildTable",
|
||||
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
|
||||
SyncPoint::GetInstance()->EnableProcessing();
|
||||
s = Flush();
|
||||
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
fault_fs_->SetFilesystemActive(true);
|
||||
s = dbfull()->Resume();
|
||||
ASSERT_OK(s);
|
||||
Reopen(options);
|
||||
ASSERT_EQ("val1", Get(Key(1)));
|
||||
|
||||
ASSERT_OK(Put(Key(2), "val2"));
|
||||
SyncPoint::GetInstance()->SetCallBack(
|
||||
"BuildTable:BeforeSyncTable",
|
||||
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
|
||||
SyncPoint::GetInstance()->EnableProcessing();
|
||||
s = Flush();
|
||||
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
fault_fs_->SetFilesystemActive(true);
|
||||
s = dbfull()->Resume();
|
||||
ASSERT_OK(s);
|
||||
Reopen(options);
|
||||
ASSERT_EQ("val2", Get(Key(2)));
|
||||
|
||||
ASSERT_OK(Put(Key(3), "val3"));
|
||||
SyncPoint::GetInstance()->SetCallBack(
|
||||
"BuildTable:BeforeCloseTableFile",
|
||||
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
|
||||
SyncPoint::GetInstance()->EnableProcessing();
|
||||
s = Flush();
|
||||
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
fault_fs_->SetFilesystemActive(true);
|
||||
s = dbfull()->Resume();
|
||||
ASSERT_OK(s);
|
||||
Reopen(options);
|
||||
ASSERT_EQ("val3", Get(Key(3)));
|
||||
|
||||
// not file scope, but retyrable set
|
||||
error_msg.SetDataLoss(false);
|
||||
error_msg.SetScope(
|
||||
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFileSystem);
|
||||
error_msg.SetRetryable(true);
|
||||
|
||||
ASSERT_OK(Put(Key(3), "val3"));
|
||||
SyncPoint::GetInstance()->SetCallBack(
|
||||
"BuildTable:BeforeCloseTableFile",
|
||||
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
|
||||
SyncPoint::GetInstance()->EnableProcessing();
|
||||
s = Flush();
|
||||
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
fault_fs_->SetFilesystemActive(true);
|
||||
s = dbfull()->Resume();
|
||||
ASSERT_OK(s);
|
||||
Reopen(options);
|
||||
ASSERT_EQ("val3", Get(Key(3)));
|
||||
|
||||
Destroy(options);
|
||||
}
|
||||
|
||||
TEST_F(DBErrorHandlingFSTest, FLushWritNoWALRetryableError1) {
|
||||
std::shared_ptr<ErrorHandlerFSListener> listener(
|
||||
new ErrorHandlerFSListener());
|
||||
@ -453,6 +537,52 @@ TEST_F(DBErrorHandlingFSTest, ManifestWriteRetryableError) {
|
||||
Close();
|
||||
}
|
||||
|
||||
TEST_F(DBErrorHandlingFSTest, ManifestWriteFileScopeError) {
|
||||
std::shared_ptr<ErrorHandlerFSListener> listener(
|
||||
new ErrorHandlerFSListener());
|
||||
Options options = GetDefaultOptions();
|
||||
options.env = fault_env_.get();
|
||||
options.create_if_missing = true;
|
||||
options.listeners.emplace_back(listener);
|
||||
options.max_bgerror_resume_count = 0;
|
||||
Status s;
|
||||
std::string old_manifest;
|
||||
std::string new_manifest;
|
||||
|
||||
listener->EnableAutoRecovery(false);
|
||||
DestroyAndReopen(options);
|
||||
old_manifest = GetManifestNameFromLiveFiles();
|
||||
|
||||
IOStatus error_msg = IOStatus::IOError("File Scope Data Loss Error");
|
||||
error_msg.SetDataLoss(true);
|
||||
error_msg.SetScope(
|
||||
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFile);
|
||||
error_msg.SetRetryable(false);
|
||||
|
||||
ASSERT_OK(Put(Key(0), "val"));
|
||||
ASSERT_OK(Flush());
|
||||
ASSERT_OK(Put(Key(1), "val"));
|
||||
SyncPoint::GetInstance()->SetCallBack(
|
||||
"VersionSet::LogAndApply:WriteManifest",
|
||||
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
|
||||
SyncPoint::GetInstance()->EnableProcessing();
|
||||
s = Flush();
|
||||
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
|
||||
SyncPoint::GetInstance()->ClearAllCallBacks();
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
fault_fs_->SetFilesystemActive(true);
|
||||
s = dbfull()->Resume();
|
||||
ASSERT_OK(s);
|
||||
|
||||
new_manifest = GetManifestNameFromLiveFiles();
|
||||
ASSERT_NE(new_manifest, old_manifest);
|
||||
|
||||
Reopen(options);
|
||||
ASSERT_EQ("val", Get(Key(0)));
|
||||
ASSERT_EQ("val", Get(Key(1)));
|
||||
Close();
|
||||
}
|
||||
|
||||
TEST_F(DBErrorHandlingFSTest, ManifestWriteNoWALRetryableError) {
|
||||
std::shared_ptr<ErrorHandlerFSListener> listener(
|
||||
new ErrorHandlerFSListener());
|
||||
@ -779,6 +909,54 @@ TEST_F(DBErrorHandlingFSTest, CompactionWriteRetryableError) {
|
||||
Destroy(options);
|
||||
}
|
||||
|
||||
TEST_F(DBErrorHandlingFSTest, CompactionWriteFileScopeError) {
|
||||
std::shared_ptr<ErrorHandlerFSListener> listener(
|
||||
new ErrorHandlerFSListener());
|
||||
Options options = GetDefaultOptions();
|
||||
options.env = fault_env_.get();
|
||||
options.create_if_missing = true;
|
||||
options.level0_file_num_compaction_trigger = 2;
|
||||
options.listeners.emplace_back(listener);
|
||||
options.max_bgerror_resume_count = 0;
|
||||
Status s;
|
||||
DestroyAndReopen(options);
|
||||
|
||||
IOStatus error_msg = IOStatus::IOError("File Scope Data Loss Error");
|
||||
error_msg.SetDataLoss(true);
|
||||
error_msg.SetScope(
|
||||
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFile);
|
||||
error_msg.SetRetryable(false);
|
||||
|
||||
ASSERT_OK(Put(Key(0), "va;"));
|
||||
ASSERT_OK(Put(Key(2), "va;"));
|
||||
s = Flush();
|
||||
ASSERT_OK(s);
|
||||
|
||||
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
|
||||
listener->EnableAutoRecovery(false);
|
||||
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
|
||||
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
|
||||
"BackgroundCallCompaction:0"}});
|
||||
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
|
||||
"CompactionJob::OpenCompactionOutputFile",
|
||||
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
|
||||
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
|
||||
|
||||
ASSERT_OK(Put(Key(1), "val"));
|
||||
s = Flush();
|
||||
ASSERT_OK(s);
|
||||
|
||||
s = dbfull()->TEST_WaitForCompact();
|
||||
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
|
||||
|
||||
fault_fs_->SetFilesystemActive(true);
|
||||
SyncPoint::GetInstance()->ClearAllCallBacks();
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
s = dbfull()->Resume();
|
||||
ASSERT_OK(s);
|
||||
Destroy(options);
|
||||
}
|
||||
|
||||
TEST_F(DBErrorHandlingFSTest, CorruptionError) {
|
||||
Options options = GetDefaultOptions();
|
||||
options.env = fault_env_.get();
|
||||
|
@ -10,13 +10,14 @@
|
||||
#ifndef ROCKSDB_LITE
|
||||
namespace ROCKSDB_NAMESPACE {
|
||||
|
||||
PeriodicWorkScheduler::PeriodicWorkScheduler(Env* env) {
|
||||
PeriodicWorkScheduler::PeriodicWorkScheduler(Env* env) : timer_mu_(env) {
|
||||
timer = std::unique_ptr<Timer>(new Timer(env));
|
||||
}
|
||||
|
||||
void PeriodicWorkScheduler::Register(DBImpl* dbi,
|
||||
unsigned int stats_dump_period_sec,
|
||||
unsigned int stats_persist_period_sec) {
|
||||
MutexLock l(&timer_mu_);
|
||||
static std::atomic<uint64_t> initial_delay(0);
|
||||
timer->Start();
|
||||
if (stats_dump_period_sec > 0) {
|
||||
@ -41,6 +42,7 @@ void PeriodicWorkScheduler::Register(DBImpl* dbi,
|
||||
}
|
||||
|
||||
void PeriodicWorkScheduler::Unregister(DBImpl* dbi) {
|
||||
MutexLock l(&timer_mu_);
|
||||
timer->Cancel(GetTaskName(dbi, "dump_st"));
|
||||
timer->Cancel(GetTaskName(dbi, "pst_st"));
|
||||
timer->Cancel(GetTaskName(dbi, "flush_info_log"));
|
||||
@ -78,7 +80,10 @@ PeriodicWorkTestScheduler* PeriodicWorkTestScheduler::Default(Env* env) {
|
||||
MutexLock l(&mutex);
|
||||
if (scheduler.timer.get() != nullptr &&
|
||||
scheduler.timer->TEST_GetPendingTaskNum() == 0) {
|
||||
scheduler.timer->Shutdown();
|
||||
{
|
||||
MutexLock timer_mu_guard(&scheduler.timer_mu_);
|
||||
scheduler.timer->Shutdown();
|
||||
}
|
||||
scheduler.timer.reset(new Timer(env));
|
||||
}
|
||||
}
|
||||
|
@ -42,6 +42,12 @@ class PeriodicWorkScheduler {
|
||||
|
||||
protected:
|
||||
std::unique_ptr<Timer> timer;
|
||||
// `timer_mu_` serves two purposes currently:
|
||||
// (1) to ensure calls to `Start()` and `Shutdown()` are serialized, as
|
||||
// they are currently not implemented in a thread-safe way; and
|
||||
// (2) to ensure the `Timer::Add()`s and `Timer::Start()` run atomically, and
|
||||
// the `Timer::Cancel()`s and `Timer::Shutdown()` run atomically.
|
||||
port::Mutex timer_mu_;
|
||||
|
||||
explicit PeriodicWorkScheduler(Env* env);
|
||||
|
||||
|
@ -226,13 +226,17 @@ bool VersionEdit::EncodeTo(std::string* dst) const {
|
||||
}
|
||||
|
||||
for (const auto& wal_addition : wal_additions_) {
|
||||
PutVarint32(dst, kWalAddition);
|
||||
wal_addition.EncodeTo(dst);
|
||||
PutVarint32(dst, kWalAddition2);
|
||||
std::string encoded;
|
||||
wal_addition.EncodeTo(&encoded);
|
||||
PutLengthPrefixedSlice(dst, encoded);
|
||||
}
|
||||
|
||||
if (!wal_deletion_.IsEmpty()) {
|
||||
PutVarint32(dst, kWalDeletion);
|
||||
wal_deletion_.EncodeTo(dst);
|
||||
PutVarint32(dst, kWalDeletion2);
|
||||
std::string encoded;
|
||||
wal_deletion_.EncodeTo(&encoded);
|
||||
PutLengthPrefixedSlice(dst, encoded);
|
||||
}
|
||||
|
||||
// 0 is default and does not need to be explicitly written
|
||||
@ -375,6 +379,11 @@ const char* VersionEdit::DecodeNewFile4From(Slice* input) {
|
||||
|
||||
Status VersionEdit::DecodeFrom(const Slice& src) {
|
||||
Clear();
|
||||
#ifndef NDEBUG
|
||||
bool ignore_ignorable_tags = false;
|
||||
TEST_SYNC_POINT_CALLBACK("VersionEdit::EncodeTo:IgnoreIgnorableTags",
|
||||
&ignore_ignorable_tags);
|
||||
#endif
|
||||
Slice input = src;
|
||||
const char* msg = nullptr;
|
||||
uint32_t tag = 0;
|
||||
@ -385,6 +394,11 @@ Status VersionEdit::DecodeFrom(const Slice& src) {
|
||||
Slice str;
|
||||
InternalKey key;
|
||||
while (msg == nullptr && GetVarint32(&input, &tag)) {
|
||||
#ifndef NDEBUG
|
||||
if (ignore_ignorable_tags && tag > kTagSafeIgnoreMask) {
|
||||
tag = kTagSafeIgnoreMask;
|
||||
}
|
||||
#endif
|
||||
switch (tag) {
|
||||
case kDbId:
|
||||
if (GetLengthPrefixedSlice(&input, &str)) {
|
||||
@ -575,6 +589,23 @@ Status VersionEdit::DecodeFrom(const Slice& src) {
|
||||
break;
|
||||
}
|
||||
|
||||
case kWalAddition2: {
|
||||
Slice encoded;
|
||||
if (!GetLengthPrefixedSlice(&input, &encoded)) {
|
||||
msg = "WalAddition not prefixed by length";
|
||||
break;
|
||||
}
|
||||
|
||||
WalAddition wal_addition;
|
||||
const Status s = wal_addition.DecodeFrom(&encoded);
|
||||
if (!s.ok()) {
|
||||
return s;
|
||||
}
|
||||
|
||||
wal_additions_.emplace_back(std::move(wal_addition));
|
||||
break;
|
||||
}
|
||||
|
||||
case kWalDeletion: {
|
||||
WalDeletion wal_deletion;
|
||||
const Status s = wal_deletion.DecodeFrom(&input);
|
||||
@ -586,6 +617,23 @@ Status VersionEdit::DecodeFrom(const Slice& src) {
|
||||
break;
|
||||
}
|
||||
|
||||
case kWalDeletion2: {
|
||||
Slice encoded;
|
||||
if (!GetLengthPrefixedSlice(&input, &encoded)) {
|
||||
msg = "WalDeletion not prefixed by length";
|
||||
break;
|
||||
}
|
||||
|
||||
WalDeletion wal_deletion;
|
||||
const Status s = wal_deletion.DecodeFrom(&encoded);
|
||||
if (!s.ok()) {
|
||||
return s;
|
||||
}
|
||||
|
||||
wal_deletion_ = std::move(wal_deletion);
|
||||
break;
|
||||
}
|
||||
|
||||
case kColumnFamily:
|
||||
if (!GetVarint32(&input, &column_family_)) {
|
||||
if (!msg) {
|
||||
|
@ -62,6 +62,8 @@ enum Tag : uint32_t {
|
||||
kWalAddition,
|
||||
kWalDeletion,
|
||||
kFullHistoryTsLow,
|
||||
kWalAddition2,
|
||||
kWalDeletion2,
|
||||
};
|
||||
|
||||
enum NewFileCustomTag : uint32_t {
|
||||
|
@ -324,14 +324,22 @@ TEST_F(VersionEditTest, AddWalEncodeDecode) {
|
||||
TestEncodeDecode(edit);
|
||||
}
|
||||
|
||||
static std::string PrefixEncodedWalAdditionWithLength(
|
||||
const std::string& encoded) {
|
||||
std::string ret;
|
||||
PutVarint32(&ret, Tag::kWalAddition2);
|
||||
PutLengthPrefixedSlice(&ret, encoded);
|
||||
return ret;
|
||||
}
|
||||
|
||||
TEST_F(VersionEditTest, AddWalDecodeBadLogNumber) {
|
||||
std::string encoded;
|
||||
PutVarint32(&encoded, Tag::kWalAddition);
|
||||
|
||||
{
|
||||
// No log number.
|
||||
std::string encoded_edit = PrefixEncodedWalAdditionWithLength(encoded);
|
||||
VersionEdit edit;
|
||||
Status s = edit.DecodeFrom(encoded);
|
||||
Status s = edit.DecodeFrom(encoded_edit);
|
||||
ASSERT_TRUE(s.IsCorruption());
|
||||
ASSERT_TRUE(s.ToString().find("Error decoding WAL log number") !=
|
||||
std::string::npos)
|
||||
@ -345,8 +353,10 @@ TEST_F(VersionEditTest, AddWalDecodeBadLogNumber) {
|
||||
unsigned char* ptr = reinterpret_cast<unsigned char*>(&c);
|
||||
*ptr = 128;
|
||||
encoded.append(1, c);
|
||||
|
||||
std::string encoded_edit = PrefixEncodedWalAdditionWithLength(encoded);
|
||||
VersionEdit edit;
|
||||
Status s = edit.DecodeFrom(encoded);
|
||||
Status s = edit.DecodeFrom(encoded_edit);
|
||||
ASSERT_TRUE(s.IsCorruption());
|
||||
ASSERT_TRUE(s.ToString().find("Error decoding WAL log number") !=
|
||||
std::string::npos)
|
||||
@ -358,14 +368,14 @@ TEST_F(VersionEditTest, AddWalDecodeBadTag) {
|
||||
constexpr WalNumber kLogNumber = 100;
|
||||
constexpr uint64_t kSizeInBytes = 100;
|
||||
|
||||
std::string encoded_without_tag;
|
||||
PutVarint32(&encoded_without_tag, Tag::kWalAddition);
|
||||
PutVarint64(&encoded_without_tag, kLogNumber);
|
||||
std::string encoded;
|
||||
PutVarint64(&encoded, kLogNumber);
|
||||
|
||||
{
|
||||
// No tag.
|
||||
std::string encoded_edit = PrefixEncodedWalAdditionWithLength(encoded);
|
||||
VersionEdit edit;
|
||||
Status s = edit.DecodeFrom(encoded_without_tag);
|
||||
Status s = edit.DecodeFrom(encoded_edit);
|
||||
ASSERT_TRUE(s.IsCorruption());
|
||||
ASSERT_TRUE(s.ToString().find("Error decoding tag") != std::string::npos)
|
||||
<< s.ToString();
|
||||
@ -373,12 +383,15 @@ TEST_F(VersionEditTest, AddWalDecodeBadTag) {
|
||||
|
||||
{
|
||||
// Only has size tag, no terminate tag.
|
||||
std::string encoded_with_size = encoded_without_tag;
|
||||
std::string encoded_with_size = encoded;
|
||||
PutVarint32(&encoded_with_size,
|
||||
static_cast<uint32_t>(WalAdditionTag::kSyncedSize));
|
||||
PutVarint64(&encoded_with_size, kSizeInBytes);
|
||||
|
||||
std::string encoded_edit =
|
||||
PrefixEncodedWalAdditionWithLength(encoded_with_size);
|
||||
VersionEdit edit;
|
||||
Status s = edit.DecodeFrom(encoded_with_size);
|
||||
Status s = edit.DecodeFrom(encoded_edit);
|
||||
ASSERT_TRUE(s.IsCorruption());
|
||||
ASSERT_TRUE(s.ToString().find("Error decoding tag") != std::string::npos)
|
||||
<< s.ToString();
|
||||
@ -386,11 +399,14 @@ TEST_F(VersionEditTest, AddWalDecodeBadTag) {
|
||||
|
||||
{
|
||||
// Only has terminate tag.
|
||||
std::string encoded_with_terminate = encoded_without_tag;
|
||||
std::string encoded_with_terminate = encoded;
|
||||
PutVarint32(&encoded_with_terminate,
|
||||
static_cast<uint32_t>(WalAdditionTag::kTerminate));
|
||||
|
||||
std::string encoded_edit =
|
||||
PrefixEncodedWalAdditionWithLength(encoded_with_terminate);
|
||||
VersionEdit edit;
|
||||
ASSERT_OK(edit.DecodeFrom(encoded_with_terminate));
|
||||
ASSERT_OK(edit.DecodeFrom(encoded_edit));
|
||||
auto& wal_addition = edit.GetWalAdditions()[0];
|
||||
ASSERT_EQ(wal_addition.GetLogNumber(), kLogNumber);
|
||||
ASSERT_FALSE(wal_addition.GetMetadata().HasSyncedSize());
|
||||
@ -401,15 +417,15 @@ TEST_F(VersionEditTest, AddWalDecodeNoSize) {
|
||||
constexpr WalNumber kLogNumber = 100;
|
||||
|
||||
std::string encoded;
|
||||
PutVarint32(&encoded, Tag::kWalAddition);
|
||||
PutVarint64(&encoded, kLogNumber);
|
||||
PutVarint32(&encoded, static_cast<uint32_t>(WalAdditionTag::kSyncedSize));
|
||||
// No real size after the size tag.
|
||||
|
||||
{
|
||||
// Without terminate tag.
|
||||
std::string encoded_edit = PrefixEncodedWalAdditionWithLength(encoded);
|
||||
VersionEdit edit;
|
||||
Status s = edit.DecodeFrom(encoded);
|
||||
Status s = edit.DecodeFrom(encoded_edit);
|
||||
ASSERT_TRUE(s.IsCorruption());
|
||||
ASSERT_TRUE(s.ToString().find("Error decoding WAL file size") !=
|
||||
std::string::npos)
|
||||
@ -419,8 +435,10 @@ TEST_F(VersionEditTest, AddWalDecodeNoSize) {
|
||||
{
|
||||
// With terminate tag.
|
||||
PutVarint32(&encoded, static_cast<uint32_t>(WalAdditionTag::kTerminate));
|
||||
|
||||
std::string encoded_edit = PrefixEncodedWalAdditionWithLength(encoded);
|
||||
VersionEdit edit;
|
||||
Status s = edit.DecodeFrom(encoded);
|
||||
Status s = edit.DecodeFrom(encoded_edit);
|
||||
ASSERT_TRUE(s.IsCorruption());
|
||||
// The terminate tag is misunderstood as the size.
|
||||
ASSERT_TRUE(s.ToString().find("Error decoding tag") != std::string::npos)
|
||||
@ -515,6 +533,62 @@ TEST_F(VersionEditTest, FullHistoryTsLow) {
|
||||
TestEncodeDecode(edit);
|
||||
}
|
||||
|
||||
// Tests that if RocksDB is downgraded, the new types of VersionEdits
|
||||
// that have a tag larger than kTagSafeIgnoreMask can be safely ignored.
|
||||
TEST_F(VersionEditTest, IgnorableTags) {
|
||||
SyncPoint::GetInstance()->SetCallBack(
|
||||
"VersionEdit::EncodeTo:IgnoreIgnorableTags", [&](void* arg) {
|
||||
bool* ignore = static_cast<bool*>(arg);
|
||||
*ignore = true;
|
||||
});
|
||||
SyncPoint::GetInstance()->EnableProcessing();
|
||||
|
||||
constexpr uint64_t kPrevLogNumber = 100;
|
||||
constexpr uint64_t kLogNumber = 200;
|
||||
constexpr uint64_t kNextFileNumber = 300;
|
||||
constexpr uint64_t kColumnFamilyId = 400;
|
||||
|
||||
VersionEdit edit;
|
||||
// Add some ignorable entries.
|
||||
for (int i = 0; i < 2; i++) {
|
||||
edit.AddWal(i + 1, WalMetadata(i + 2));
|
||||
}
|
||||
edit.SetDBId("db_id");
|
||||
// Add unignorable entries.
|
||||
edit.SetPrevLogNumber(kPrevLogNumber);
|
||||
edit.SetLogNumber(kLogNumber);
|
||||
// Add more ignorable entries.
|
||||
edit.DeleteWalsBefore(100);
|
||||
// Add unignorable entry.
|
||||
edit.SetNextFile(kNextFileNumber);
|
||||
// Add more ignorable entries.
|
||||
edit.SetFullHistoryTsLow("ts");
|
||||
// Add unignorable entry.
|
||||
edit.SetColumnFamily(kColumnFamilyId);
|
||||
|
||||
std::string encoded;
|
||||
ASSERT_TRUE(edit.EncodeTo(&encoded));
|
||||
|
||||
VersionEdit decoded;
|
||||
ASSERT_OK(decoded.DecodeFrom(encoded));
|
||||
|
||||
// Check that all ignorable entries are ignored.
|
||||
ASSERT_FALSE(decoded.HasDbId());
|
||||
ASSERT_FALSE(decoded.HasFullHistoryTsLow());
|
||||
ASSERT_FALSE(decoded.IsWalAddition());
|
||||
ASSERT_FALSE(decoded.IsWalDeletion());
|
||||
ASSERT_TRUE(decoded.GetWalAdditions().empty());
|
||||
ASSERT_TRUE(decoded.GetWalDeletion().IsEmpty());
|
||||
|
||||
// Check that unignorable entries are still present.
|
||||
ASSERT_EQ(edit.GetPrevLogNumber(), kPrevLogNumber);
|
||||
ASSERT_EQ(edit.GetLogNumber(), kLogNumber);
|
||||
ASSERT_EQ(edit.GetNextFile(), kNextFileNumber);
|
||||
ASSERT_EQ(edit.GetColumnFamily(), kColumnFamilyId);
|
||||
|
||||
SyncPoint::GetInstance()->DisableProcessing();
|
||||
}
|
||||
|
||||
} // namespace ROCKSDB_NAMESPACE
|
||||
|
||||
int main(int argc, char** argv) {
|
||||
|
@ -51,6 +51,8 @@ struct OptimisticTransactionDBOptions {
|
||||
uint32_t occ_lock_buckets = (1 << 20);
|
||||
};
|
||||
|
||||
// Range deletions (including those in `WriteBatch`es passed to `Write()`) are
|
||||
// incompatible with `OptimisticTransactionDB` and will return a non-OK `Status`
|
||||
class OptimisticTransactionDB : public StackableDB {
|
||||
public:
|
||||
// Open an OptimisticTransactionDB similar to DB::Open().
|
||||
|
@ -340,6 +340,17 @@ class TransactionDB : public StackableDB {
|
||||
// falls back to the un-optimized version of ::Write
|
||||
return Write(opts, updates);
|
||||
}
|
||||
// Transactional `DeleteRange()` is not yet supported.
|
||||
// However, users who know their deleted range does not conflict with
|
||||
// anything can still use it via the `Write()` API. In all cases, the
|
||||
// `Write()` overload specifying `TransactionDBWriteOptimizations` must be
|
||||
// used and `skip_concurrency_control` must be set. When using either
|
||||
// WRITE_PREPARED or WRITE_UNPREPARED , `skip_duplicate_key_check` must
|
||||
// additionally be set.
|
||||
virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*,
|
||||
const Slice&, const Slice&) override {
|
||||
return Status::NotSupported();
|
||||
}
|
||||
// Open a TransactionDB similar to DB::Open().
|
||||
// Internally call PrepareWrap() and WrapDB()
|
||||
// If the return status is not ok, then dbptr is set to nullptr.
|
||||
|
@ -6,7 +6,7 @@
|
||||
|
||||
#define ROCKSDB_MAJOR 6
|
||||
#define ROCKSDB_MINOR 16
|
||||
#define ROCKSDB_PATCH 0
|
||||
#define ROCKSDB_PATCH 5
|
||||
|
||||
// Do not use these. We made the mistake of declaring macros starting with
|
||||
// double underscore. Now we have to live with our choice. We'll deprecate these
|
||||
|
26
src.mk
26
src.mk
@ -255,18 +255,6 @@ LIB_SOURCES = \
|
||||
utilities/transactions/lock/lock_manager.cc \
|
||||
utilities/transactions/lock/point/point_lock_tracker.cc \
|
||||
utilities/transactions/lock/point/point_lock_manager.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/concurrent_tree.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/keyrange.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/lock_request.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/locktree.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/manager.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/range_buffer.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/treenode.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/txnid_set.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/wfg.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/standalone_port.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/util/dbt.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/util/memarena.cc \
|
||||
utilities/transactions/optimistic_transaction.cc \
|
||||
utilities/transactions/optimistic_transaction_db_impl.cc \
|
||||
utilities/transactions/pessimistic_transaction.cc \
|
||||
@ -298,6 +286,20 @@ LIB_SOURCES_ASM =
|
||||
LIB_SOURCES_C =
|
||||
endif
|
||||
|
||||
RANGE_TREE_SOURCES =\
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/concurrent_tree.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/keyrange.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/lock_request.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/locktree.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/manager.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/range_buffer.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/treenode.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/txnid_set.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/locktree/wfg.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/standalone_port.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/util/dbt.cc \
|
||||
utilities/transactions/lock/range/range_tree/lib/util/memarena.cc
|
||||
|
||||
TOOL_LIB_SOURCES = \
|
||||
tools/io_tracer_parser_tool.cc \
|
||||
tools/ldb_cmd.cc \
|
||||
|
@ -22,6 +22,9 @@ namespace ROCKSDB_NAMESPACE {
|
||||
|
||||
// A Timer class to handle repeated work.
|
||||
//
|
||||
// `Start()` and `Shutdown()` are currently not thread-safe. The client must
|
||||
// serialize calls to these two member functions.
|
||||
//
|
||||
// A single timer instance can handle multiple functions via a single thread.
|
||||
// It is better to leave long running work to a dedicated thread pool.
|
||||
//
|
||||
|
@ -70,6 +70,16 @@ class DummyDB : public StackableDB {
|
||||
|
||||
DBOptions GetDBOptions() const override { return DBOptions(options_); }
|
||||
|
||||
using StackableDB::GetIntProperty;
|
||||
bool GetIntProperty(ColumnFamilyHandle*, const Slice& property,
|
||||
uint64_t* value) override {
|
||||
if (property == DB::Properties::kMinLogNumberToKeep) {
|
||||
*value = 1;
|
||||
return true;
|
||||
}
|
||||
return false;
|
||||
}
|
||||
|
||||
Status EnableFileDeletions(bool /*force*/) override {
|
||||
EXPECT_TRUE(!deletions_enabled_);
|
||||
deletions_enabled_ = true;
|
||||
|
@ -232,32 +232,22 @@ Status CheckpointImpl::CreateCustomCheckpoint(
|
||||
// this will return live_files prefixed with "/"
|
||||
s = db_->GetLiveFiles(live_files, &manifest_file_size, flush_memtable);
|
||||
|
||||
if (s.ok() && db_options.allow_2pc) {
|
||||
// If 2PC is enabled, we need to get minimum log number after the flush.
|
||||
// Need to refetch the live files to recapture the snapshot.
|
||||
if (!db_->GetIntProperty(DB::Properties::kMinLogNumberToKeep,
|
||||
&min_log_num)) {
|
||||
return Status::InvalidArgument(
|
||||
"2PC enabled but cannot fine the min log number to keep.");
|
||||
}
|
||||
// We need to refetch live files with flush to handle this case:
|
||||
// A previous 000001.log contains the prepare record of transaction tnx1.
|
||||
// The current log file is 000002.log, and sequence_number points to this
|
||||
// file.
|
||||
// After calling GetLiveFiles(), 000003.log is created.
|
||||
// Then tnx1 is committed. The commit record is written to 000003.log.
|
||||
// Now we fetch min_log_num, which will be 3.
|
||||
// Then only 000002.log and 000003.log will be copied, and 000001.log will
|
||||
// be skipped. 000003.log contains commit message of tnx1, but we don't
|
||||
// have respective prepare record for it.
|
||||
// In order to avoid this situation, we need to force flush to make sure
|
||||
// all transactions committed before getting min_log_num will be flushed
|
||||
// to SST files.
|
||||
// We cannot get min_log_num before calling the GetLiveFiles() for the
|
||||
// first time, because if we do that, all the logs files will be included,
|
||||
// far more than needed.
|
||||
s = db_->GetLiveFiles(live_files, &manifest_file_size, flush_memtable);
|
||||
if (!db_->GetIntProperty(DB::Properties::kMinLogNumberToKeep, &min_log_num)) {
|
||||
return Status::InvalidArgument("cannot get the min log number to keep.");
|
||||
}
|
||||
// Between GetLiveFiles and getting min_log_num, flush might happen
|
||||
// concurrently, so new WAL deletions might be tracked in MANIFEST. If we do
|
||||
// not get the new MANIFEST size, the deleted WALs might not be reflected in
|
||||
// the checkpoint's MANIFEST.
|
||||
//
|
||||
// If we get min_log_num before the above GetLiveFiles, then there might
|
||||
// be too many unnecessary WALs to be included in the checkpoint.
|
||||
//
|
||||
// Ideally, min_log_num should be got together with manifest_file_size in
|
||||
// GetLiveFiles atomically. But that needs changes to GetLiveFiles' signature
|
||||
// which is a public API.
|
||||
s = db_->GetLiveFiles(live_files, &manifest_file_size, flush_memtable);
|
||||
TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:FlushDone");
|
||||
|
||||
TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles1");
|
||||
TEST_SYNC_POINT("CheckpointImpl::CreateCheckpoint:SavedLiveFiles2");
|
||||
@ -385,7 +375,6 @@ Status CheckpointImpl::CreateCustomCheckpoint(
|
||||
for (size_t i = 0; s.ok() && i < wal_size; ++i) {
|
||||
if ((live_wal_files[i]->Type() == kAliveLogFile) &&
|
||||
(!flush_memtable ||
|
||||
live_wal_files[i]->StartSequence() >= *sequence_number ||
|
||||
live_wal_files[i]->LogNumber() >= min_log_num)) {
|
||||
if (i + 1 == wal_size) {
|
||||
s = copy_file_cb(db_options.wal_dir, live_wal_files[i]->PathName(),
|
||||
|
@ -549,7 +549,7 @@ TEST_F(CheckpointTest, CurrentFileModifiedWhileCheckpointing) {
|
||||
{// Get past the flush in the checkpoint thread before adding any keys to
|
||||
// the db so the checkpoint thread won't hit the WriteManifest
|
||||
// syncpoints.
|
||||
{"DBImpl::GetLiveFiles:1",
|
||||
{"CheckpointImpl::CreateCheckpoint:FlushDone",
|
||||
"CheckpointTest::CurrentFileModifiedWhileCheckpointing:PrePut"},
|
||||
// Roll the manifest during checkpointing right after live files are
|
||||
// snapshotted.
|
||||
|
@ -31,6 +31,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -47,6 +47,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -1,3 +1,49 @@
|
||||
/*======
|
||||
This file is part of PerconaFT.
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU General Public License, version 2,
|
||||
as published by the Free Software Foundation.
|
||||
|
||||
PerconaFT is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify
|
||||
it under the terms of the GNU Affero General Public License, version 3,
|
||||
as published by the Free Software Foundation.
|
||||
|
||||
PerconaFT is distributed in the hope that it will be useful,
|
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
|
||||
GNU Affero General Public License for more details.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#pragma once
|
||||
|
||||
#include <stdio.h> // FILE
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
@ -139,7 +153,12 @@ typedef struct toku_mutex_aligned {
|
||||
{ .pmutex = PTHREAD_MUTEX_INITIALIZER, .psi_mutex = nullptr }
|
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
#else // __FreeBSD__, __linux__, at least
|
||||
#if defined(__GLIBC__)
|
||||
#define TOKU_MUTEX_ADAPTIVE PTHREAD_MUTEX_ADAPTIVE_NP
|
||||
#else
|
||||
// not all libc (e.g. musl) implement NP (Non-POSIX) attributes
|
||||
#define TOKU_MUTEX_ADAPTIVE PTHREAD_MUTEX_DEFAULT
|
||||
#endif
|
||||
#if defined(TOKU_PTHREAD_DEBUG)
|
||||
#define TOKU_ADAPTIVE_MUTEX_INITIALIZER \
|
||||
{ \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -34,6 +34,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -45,6 +45,7 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -32,6 +32,20 @@ Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved.
|
||||
|
||||
You should have received a copy of the GNU Affero General Public License
|
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
----------------------------------------
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
======= */
|
||||
|
||||
#ident \
|
||||
|
@ -46,6 +46,22 @@ class OptimisticTransactionDBImpl : public OptimisticTransactionDB {
|
||||
const OptimisticTransactionOptions& txn_options,
|
||||
Transaction* old_txn) override;
|
||||
|
||||
// Transactional `DeleteRange()` is not yet supported.
|
||||
virtual Status DeleteRange(const WriteOptions&, ColumnFamilyHandle*,
|
||||
const Slice&, const Slice&) override {
|
||||
return Status::NotSupported();
|
||||
}
|
||||
|
||||
// Range deletions also must not be snuck into `WriteBatch`es as they are
|
||||
// incompatible with `OptimisticTransactionDB`.
|
||||
virtual Status Write(const WriteOptions& write_opts,
|
||||
WriteBatch* batch) override {
|
||||
if (batch->HasDeleteRange()) {
|
||||
return Status::NotSupported();
|
||||
}
|
||||
return OptimisticTransactionDB::Write(write_opts, batch);
|
||||
}
|
||||
|
||||
size_t GetLockBucketsSize() const { return bucketed_locks_.size(); }
|
||||
|
||||
OccValidationPolicy GetValidatePolicy() const { return validate_policy_; }
|
||||
|
@ -1033,6 +1033,17 @@ TEST_P(OptimisticTransactionTest, IteratorTest) {
|
||||
delete txn;
|
||||
}
|
||||
|
||||
TEST_P(OptimisticTransactionTest, DeleteRangeSupportTest) {
|
||||
// `OptimisticTransactionDB` does not allow range deletion in any API.
|
||||
ASSERT_TRUE(
|
||||
txn_db
|
||||
->DeleteRange(WriteOptions(), txn_db->DefaultColumnFamily(), "a", "b")
|
||||
.IsNotSupported());
|
||||
WriteBatch wb;
|
||||
ASSERT_OK(wb.DeleteRange("a", "b"));
|
||||
ASSERT_NOK(txn_db->Write(WriteOptions(), &wb));
|
||||
}
|
||||
|
||||
TEST_P(OptimisticTransactionTest, SavepointTest) {
|
||||
WriteOptions write_options;
|
||||
ReadOptions read_options, snapshot_read_options;
|
||||
|
@ -4835,6 +4835,56 @@ TEST_P(TransactionTest, MergeTest) {
|
||||
ASSERT_EQ("a,3", value);
|
||||
}
|
||||
|
||||
TEST_P(TransactionTest, DeleteRangeSupportTest) {
|
||||
// The `DeleteRange()` API is banned everywhere.
|
||||
ASSERT_TRUE(
|
||||
db->DeleteRange(WriteOptions(), db->DefaultColumnFamily(), "a", "b")
|
||||
.IsNotSupported());
|
||||
|
||||
// But range deletions can be added via the `Write()` API by specifying the
|
||||
// proper flags to promise there are no conflicts according to the DB type
|
||||
// (see `TransactionDB::DeleteRange()` API doc for details).
|
||||
for (bool skip_concurrency_control : {false, true}) {
|
||||
for (bool skip_duplicate_key_check : {false, true}) {
|
||||
ASSERT_OK(db->Put(WriteOptions(), "a", "val"));
|
||||
WriteBatch wb;
|
||||
ASSERT_OK(wb.DeleteRange("a", "b"));
|
||||
TransactionDBWriteOptimizations flags;
|
||||
flags.skip_concurrency_control = skip_concurrency_control;
|
||||
flags.skip_duplicate_key_check = skip_duplicate_key_check;
|
||||
Status s = db->Write(WriteOptions(), flags, &wb);
|
||||
std::string value;
|
||||
switch (txn_db_options.write_policy) {
|
||||
case WRITE_COMMITTED:
|
||||
if (skip_concurrency_control) {
|
||||
ASSERT_OK(s);
|
||||
ASSERT_TRUE(db->Get(ReadOptions(), "a", &value).IsNotFound());
|
||||
} else {
|
||||
ASSERT_NOK(s);
|
||||
ASSERT_OK(db->Get(ReadOptions(), "a", &value));
|
||||
}
|
||||
break;
|
||||
case WRITE_PREPARED:
|
||||
// Intentional fall-through
|
||||
case WRITE_UNPREPARED:
|
||||
if (skip_concurrency_control && skip_duplicate_key_check) {
|
||||
ASSERT_OK(s);
|
||||
ASSERT_TRUE(db->Get(ReadOptions(), "a", &value).IsNotFound());
|
||||
} else {
|
||||
ASSERT_NOK(s);
|
||||
ASSERT_OK(db->Get(ReadOptions(), "a", &value));
|
||||
}
|
||||
break;
|
||||
}
|
||||
// Without any promises from the user, range deletion via other `Write()`
|
||||
// APIs are still banned.
|
||||
ASSERT_OK(db->Put(WriteOptions(), "a", "val"));
|
||||
ASSERT_NOK(db->Write(WriteOptions(), &wb));
|
||||
ASSERT_OK(db->Get(ReadOptions(), "a", &value));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
TEST_P(TransactionTest, DeferSnapshotTest) {
|
||||
WriteOptions write_options;
|
||||
ReadOptions read_options;
|
||||
|
@ -157,7 +157,9 @@ Status WritePreparedTxnDB::WriteInternal(const WriteOptions& write_options_orig,
|
||||
// TODO(myabandeh): add an option to allow user skipping this cost
|
||||
SubBatchCounter counter(*GetCFComparatorMap());
|
||||
auto s = batch->Iterate(&counter);
|
||||
assert(s.ok());
|
||||
if (!s.ok()) {
|
||||
return s;
|
||||
}
|
||||
batch_cnt = counter.BatchCount();
|
||||
WPRecordTick(TXN_DUPLICATE_KEY_OVERHEAD);
|
||||
ROCKS_LOG_DETAILS(info_log_, "Duplicate key overhead: %" PRIu64 " batches",
|
||||
|
Loading…
Reference in New Issue
Block a user