Fix some typos in comments and docs.
Summary: Closes https://github.com/facebook/rocksdb/pull/3568 Differential Revision: D7170953 Pulled By: siying fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c
This commit is contained in:
parent
a277b0f2b7
commit
a3a3f5497c
@ -56,7 +56,7 @@ if(MSVC)
|
|||||||
include(${CMAKE_CURRENT_SOURCE_DIR}/thirdparty.inc)
|
include(${CMAKE_CURRENT_SOURCE_DIR}/thirdparty.inc)
|
||||||
else()
|
else()
|
||||||
if(CMAKE_SYSTEM_NAME MATCHES "FreeBSD")
|
if(CMAKE_SYSTEM_NAME MATCHES "FreeBSD")
|
||||||
# FreeBSD has jemaloc as default malloc
|
# FreeBSD has jemalloc as default malloc
|
||||||
# but it does not have all the jemalloc files in include/...
|
# but it does not have all the jemalloc files in include/...
|
||||||
set(WITH_JEMALLOC ON)
|
set(WITH_JEMALLOC ON)
|
||||||
else()
|
else()
|
||||||
|
@ -72,7 +72,7 @@
|
|||||||
* `BackupableDBOptions::max_valid_backups_to_open == 0` now means no backups will be opened during BackupEngine initialization. Previously this condition disabled limiting backups opened.
|
* `BackupableDBOptions::max_valid_backups_to_open == 0` now means no backups will be opened during BackupEngine initialization. Previously this condition disabled limiting backups opened.
|
||||||
* `DBOptions::preserve_deletes` is a new option that allows one to specify that DB should not drop tombstones for regular deletes if they have sequence number larger than what was set by the new API call `DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum)`. Disabled by default.
|
* `DBOptions::preserve_deletes` is a new option that allows one to specify that DB should not drop tombstones for regular deletes if they have sequence number larger than what was set by the new API call `DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum)`. Disabled by default.
|
||||||
* API call `DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum)` was added, users who wish to preserve deletes are expected to periodically call this function to advance the cutoff seqnum (all deletes made before this seqnum can be dropped by DB). It's user responsibility to figure out how to advance the seqnum in the way so the tombstones are kept for the desired period of time, yet are eventually processed in time and don't eat up too much space.
|
* API call `DB::SetPreserveDeletesSequenceNumber(SequenceNumber seqnum)` was added, users who wish to preserve deletes are expected to periodically call this function to advance the cutoff seqnum (all deletes made before this seqnum can be dropped by DB). It's user responsibility to figure out how to advance the seqnum in the way so the tombstones are kept for the desired period of time, yet are eventually processed in time and don't eat up too much space.
|
||||||
* `ReadOptions::iter_start_seqnum` was added; if set to something > 0 user will see 2 changes in iterators behavior 1) only keys written with sequence larger than this parameter would be returned and 2) the `Slice` returned by iter->key() now points to the the memory that keep User-oriented representation of the internal key, rather than user key. New struct `FullKey` was added to represent internal keys, along with a new helper function `ParseFullKey(const Slice& internal_key, FullKey* result);`.
|
* `ReadOptions::iter_start_seqnum` was added; if set to something > 0 user will see 2 changes in iterators behavior 1) only keys written with sequence larger than this parameter would be returned and 2) the `Slice` returned by iter->key() now points to the memory that keep User-oriented representation of the internal key, rather than user key. New struct `FullKey` was added to represent internal keys, along with a new helper function `ParseFullKey(const Slice& internal_key, FullKey* result);`.
|
||||||
* Deprecate trash_dir param in NewSstFileManager, right now we will rename deleted files to <name>.trash instead of moving them to trash directory
|
* Deprecate trash_dir param in NewSstFileManager, right now we will rename deleted files to <name>.trash instead of moving them to trash directory
|
||||||
* Allow setting a custom trash/DB size ratio limit in the SstFileManager, after which files that are to be scheduled for deletion are deleted immediately, regardless of any delete ratelimit.
|
* Allow setting a custom trash/DB size ratio limit in the SstFileManager, after which files that are to be scheduled for deletion are deleted immediately, regardless of any delete ratelimit.
|
||||||
* Return an error on write if write_options.sync = true and write_options.disableWAL = true to warn user of inconsistent options. Previously we will not write to WAL and not respecting the sync options in this case.
|
* Return an error on write if write_options.sync = true and write_options.disableWAL = true to warn user of inconsistent options. Previously we will not write to WAL and not respecting the sync options in this case.
|
||||||
|
@ -5,7 +5,7 @@ RocksDBLite is a project focused on mobile use cases, which don't need a lot of
|
|||||||
Some examples of the features disabled by ROCKSDB_LITE:
|
Some examples of the features disabled by ROCKSDB_LITE:
|
||||||
* compiled-in support for LDB tool
|
* compiled-in support for LDB tool
|
||||||
* No backupable DB
|
* No backupable DB
|
||||||
* No support for replication (which we provide in form of TrasactionalIterator)
|
* No support for replication (which we provide in form of TransactionalIterator)
|
||||||
* No advanced monitoring tools
|
* No advanced monitoring tools
|
||||||
* No special-purpose memtables that are highly optimized for specific use cases
|
* No special-purpose memtables that are highly optimized for specific use cases
|
||||||
* No Transactions
|
* No Transactions
|
||||||
|
@ -469,7 +469,7 @@ class DBImpl : public DB {
|
|||||||
bool no_full_scan = false);
|
bool no_full_scan = false);
|
||||||
|
|
||||||
// Diffs the files listed in filenames and those that do not
|
// Diffs the files listed in filenames and those that do not
|
||||||
// belong to live files are posibly removed. Also, removes all the
|
// belong to live files are possibly removed. Also, removes all the
|
||||||
// files in sst_delete_files and log_delete_files.
|
// files in sst_delete_files and log_delete_files.
|
||||||
// It is not necessary to hold the mutex when invoking this method.
|
// It is not necessary to hold the mutex when invoking this method.
|
||||||
// If FindObsoleteFiles() was run, we need to also run
|
// If FindObsoleteFiles() was run, we need to also run
|
||||||
|
@ -65,7 +65,7 @@ void DBImpl::MarkLogAsContainingPrepSection(uint64_t log) {
|
|||||||
|
|
||||||
auto rit = logs_with_prep_.rbegin();
|
auto rit = logs_with_prep_.rbegin();
|
||||||
bool updated = false;
|
bool updated = false;
|
||||||
// Most probabely the last log is the one that is being marked for
|
// Most probably the last log is the one that is being marked for
|
||||||
// having a prepare section; so search from the end.
|
// having a prepare section; so search from the end.
|
||||||
for (; rit != logs_with_prep_.rend() && rit->log >= log; ++rit) {
|
for (; rit != logs_with_prep_.rend() && rit->log >= log; ++rit) {
|
||||||
if (rit->log == log) {
|
if (rit->log == log) {
|
||||||
@ -97,7 +97,7 @@ uint64_t DBImpl::FindMinLogContainingOutstandingPrep() {
|
|||||||
completed_it->second == it->cnt);
|
completed_it->second == it->cnt);
|
||||||
prepared_section_completed_.erase(completed_it);
|
prepared_section_completed_.erase(completed_it);
|
||||||
}
|
}
|
||||||
// erase from beigning in vector is not efficient but this function is not
|
// erase from beginning in vector is not efficient but this function is not
|
||||||
// on the fast path.
|
// on the fast path.
|
||||||
it = logs_with_prep_.erase(it);
|
it = logs_with_prep_.erase(it);
|
||||||
}
|
}
|
||||||
@ -113,11 +113,11 @@ uint64_t DBImpl::MinLogNumberToKeep() {
|
|||||||
// sections of outstanding transactions.
|
// sections of outstanding transactions.
|
||||||
//
|
//
|
||||||
// We must check min logs with outstanding prep before we check
|
// We must check min logs with outstanding prep before we check
|
||||||
// logs referneces by memtables because a log referenced by the
|
// logs references by memtables because a log referenced by the
|
||||||
// first data structure could transition to the second under us.
|
// first data structure could transition to the second under us.
|
||||||
//
|
//
|
||||||
// TODO(horuff): iterating over all column families under db mutex.
|
// TODO(horuff): iterating over all column families under db mutex.
|
||||||
// should find more optimial solution
|
// should find more optimal solution
|
||||||
auto min_log_in_prep_heap = FindMinLogContainingOutstandingPrep();
|
auto min_log_in_prep_heap = FindMinLogContainingOutstandingPrep();
|
||||||
|
|
||||||
if (min_log_in_prep_heap != 0 && min_log_in_prep_heap < log_number) {
|
if (min_log_in_prep_heap != 0 && min_log_in_prep_heap < log_number) {
|
||||||
@ -153,7 +153,7 @@ void DBImpl::FindObsoleteFiles(JobContext* job_context, bool force,
|
|||||||
|
|
||||||
bool doing_the_full_scan = false;
|
bool doing_the_full_scan = false;
|
||||||
|
|
||||||
// logic for figurint out if we're doing the full scan
|
// logic for figuring out if we're doing the full scan
|
||||||
if (no_full_scan) {
|
if (no_full_scan) {
|
||||||
doing_the_full_scan = false;
|
doing_the_full_scan = false;
|
||||||
} else if (force ||
|
} else if (force ||
|
||||||
@ -173,7 +173,7 @@ void DBImpl::FindObsoleteFiles(JobContext* job_context, bool force,
|
|||||||
// threads
|
// threads
|
||||||
// Since job_context->min_pending_output is set, until file scan finishes,
|
// Since job_context->min_pending_output is set, until file scan finishes,
|
||||||
// mutex_ cannot be released. Otherwise, we might see no min_pending_output
|
// mutex_ cannot be released. Otherwise, we might see no min_pending_output
|
||||||
// here but later find newer generated unfinalized files while scannint.
|
// here but later find newer generated unfinalized files while scanning.
|
||||||
if (!pending_outputs_.empty()) {
|
if (!pending_outputs_.empty()) {
|
||||||
job_context->min_pending_output = *pending_outputs_.begin();
|
job_context->min_pending_output = *pending_outputs_.begin();
|
||||||
} else {
|
} else {
|
||||||
@ -344,7 +344,7 @@ void DBImpl::DeleteObsoleteFileImpl(int job_id, const std::string& fname,
|
|||||||
}
|
}
|
||||||
|
|
||||||
// Diffs the files listed in filenames and those that do not
|
// Diffs the files listed in filenames and those that do not
|
||||||
// belong to live files are posibly removed. Also, removes all the
|
// belong to live files are possibly removed. Also, removes all the
|
||||||
// files in sst_delete_files and log_delete_files.
|
// files in sst_delete_files and log_delete_files.
|
||||||
// It is not necessary to hold the mutex when invoking this method.
|
// It is not necessary to hold the mutex when invoking this method.
|
||||||
void DBImpl::PurgeObsoleteFiles(const JobContext& state, bool schedule_only) {
|
void DBImpl::PurgeObsoleteFiles(const JobContext& state, bool schedule_only) {
|
||||||
|
@ -420,7 +420,7 @@ class VersionBuilder::Rep {
|
|||||||
|
|
||||||
void MaybeAddFile(VersionStorageInfo* vstorage, int level, FileMetaData* f) {
|
void MaybeAddFile(VersionStorageInfo* vstorage, int level, FileMetaData* f) {
|
||||||
if (levels_[level].deleted_files.count(f->fd.GetNumber()) > 0) {
|
if (levels_[level].deleted_files.count(f->fd.GetNumber()) > 0) {
|
||||||
// f is to-be-delected table file
|
// f is to-be-deleted table file
|
||||||
vstorage->RemoveCurrentStats(f);
|
vstorage->RemoveCurrentStats(f);
|
||||||
} else {
|
} else {
|
||||||
vstorage->AddFile(level, f, info_log_);
|
vstorage->AddFile(level, f, info_log_);
|
||||||
|
@ -115,7 +115,7 @@ class VersionStorageInfo {
|
|||||||
// Update the accumulated stats from a file-meta.
|
// Update the accumulated stats from a file-meta.
|
||||||
void UpdateAccumulatedStats(FileMetaData* file_meta);
|
void UpdateAccumulatedStats(FileMetaData* file_meta);
|
||||||
|
|
||||||
// Decrease the current stat form a to-be-delected file-meta
|
// Decrease the current stat from a to-be-deleted file-meta
|
||||||
void RemoveCurrentStats(FileMetaData* file_meta);
|
void RemoveCurrentStats(FileMetaData* file_meta);
|
||||||
|
|
||||||
void ComputeCompensatedSizes();
|
void ComputeCompensatedSizes();
|
||||||
@ -491,7 +491,7 @@ class VersionStorageInfo {
|
|||||||
uint64_t accumulated_num_deletions_;
|
uint64_t accumulated_num_deletions_;
|
||||||
// current number of non_deletion entries
|
// current number of non_deletion entries
|
||||||
uint64_t current_num_non_deletions_;
|
uint64_t current_num_non_deletions_;
|
||||||
// current number of delection entries
|
// current number of deletion entries
|
||||||
uint64_t current_num_deletions_;
|
uint64_t current_num_deletions_;
|
||||||
// current number of file samples
|
// current number of file samples
|
||||||
uint64_t current_num_samples_;
|
uint64_t current_num_samples_;
|
||||||
@ -565,13 +565,13 @@ class Version {
|
|||||||
// Return a human readable string that describes this version's contents.
|
// Return a human readable string that describes this version's contents.
|
||||||
std::string DebugString(bool hex = false, bool print_stats = false) const;
|
std::string DebugString(bool hex = false, bool print_stats = false) const;
|
||||||
|
|
||||||
// Returns the version nuber of this version
|
// Returns the version number of this version
|
||||||
uint64_t GetVersionNumber() const { return version_number_; }
|
uint64_t GetVersionNumber() const { return version_number_; }
|
||||||
|
|
||||||
// REQUIRES: lock is held
|
// REQUIRES: lock is held
|
||||||
// On success, "tp" will contains the table properties of the file
|
// On success, "tp" will contains the table properties of the file
|
||||||
// specified in "file_meta". If the file name of "file_meta" is
|
// specified in "file_meta". If the file name of "file_meta" is
|
||||||
// known ahread, passing it by a non-null "fname" can save a
|
// known ahead, passing it by a non-null "fname" can save a
|
||||||
// file-name conversion.
|
// file-name conversion.
|
||||||
Status GetTableProperties(std::shared_ptr<const TableProperties>* tp,
|
Status GetTableProperties(std::shared_ptr<const TableProperties>* tp,
|
||||||
const FileMetaData* file_meta,
|
const FileMetaData* file_meta,
|
||||||
@ -580,14 +580,14 @@ class Version {
|
|||||||
// REQUIRES: lock is held
|
// REQUIRES: lock is held
|
||||||
// On success, *props will be populated with all SSTables' table properties.
|
// On success, *props will be populated with all SSTables' table properties.
|
||||||
// The keys of `props` are the sst file name, the values of `props` are the
|
// The keys of `props` are the sst file name, the values of `props` are the
|
||||||
// tables' propertis, represented as shared_ptr.
|
// tables' properties, represented as shared_ptr.
|
||||||
Status GetPropertiesOfAllTables(TablePropertiesCollection* props);
|
Status GetPropertiesOfAllTables(TablePropertiesCollection* props);
|
||||||
Status GetPropertiesOfAllTables(TablePropertiesCollection* props, int level);
|
Status GetPropertiesOfAllTables(TablePropertiesCollection* props, int level);
|
||||||
Status GetPropertiesOfTablesInRange(const Range* range, std::size_t n,
|
Status GetPropertiesOfTablesInRange(const Range* range, std::size_t n,
|
||||||
TablePropertiesCollection* props) const;
|
TablePropertiesCollection* props) const;
|
||||||
|
|
||||||
// REQUIRES: lock is held
|
// REQUIRES: lock is held
|
||||||
// On success, "tp" will contains the aggregated table property amoug
|
// On success, "tp" will contains the aggregated table property among
|
||||||
// the table properties of all sst files in this version.
|
// the table properties of all sst files in this version.
|
||||||
Status GetAggregatedTableProperties(
|
Status GetAggregatedTableProperties(
|
||||||
std::shared_ptr<const TableProperties>* tp, int level = -1);
|
std::shared_ptr<const TableProperties>* tp, int level = -1);
|
||||||
@ -637,7 +637,7 @@ class Version {
|
|||||||
bool IsFilterSkipped(int level, bool is_file_last_in_level = false);
|
bool IsFilterSkipped(int level, bool is_file_last_in_level = false);
|
||||||
|
|
||||||
// The helper function of UpdateAccumulatedStats, which may fill the missing
|
// The helper function of UpdateAccumulatedStats, which may fill the missing
|
||||||
// fields of file_mata from its associated TableProperties.
|
// fields of file_meta from its associated TableProperties.
|
||||||
// Returns true if it does initialize FileMetaData.
|
// Returns true if it does initialize FileMetaData.
|
||||||
bool MaybeInitializeFileMetaData(FileMetaData* file_meta);
|
bool MaybeInitializeFileMetaData(FileMetaData* file_meta);
|
||||||
|
|
||||||
@ -775,7 +775,7 @@ class VersionSet {
|
|||||||
// Set the last sequence number to s.
|
// Set the last sequence number to s.
|
||||||
void SetLastSequence(uint64_t s) {
|
void SetLastSequence(uint64_t s) {
|
||||||
assert(s >= last_sequence_);
|
assert(s >= last_sequence_);
|
||||||
// Last visible seqeunce must always be less than last written seq
|
// Last visible sequence must always be less than last written seq
|
||||||
assert(!db_options_->two_write_queues || s <= last_allocated_sequence_);
|
assert(!db_options_->two_write_queues || s <= last_allocated_sequence_);
|
||||||
last_sequence_.store(s, std::memory_order_release);
|
last_sequence_.store(s, std::memory_order_release);
|
||||||
}
|
}
|
||||||
@ -913,7 +913,7 @@ class VersionSet {
|
|||||||
// The last allocated sequence that is also published to the readers. This is
|
// The last allocated sequence that is also published to the readers. This is
|
||||||
// applicable only when last_seq_same_as_publish_seq_ is not set. Otherwise
|
// applicable only when last_seq_same_as_publish_seq_ is not set. Otherwise
|
||||||
// last_sequence_ also indicates the last published seq.
|
// last_sequence_ also indicates the last published seq.
|
||||||
// We have last_sequence <= last_published_seqeunce_ <=
|
// We have last_sequence <= last_published_sequence_ <=
|
||||||
// last_allocated_sequence_
|
// last_allocated_sequence_
|
||||||
std::atomic<uint64_t> last_published_sequence_;
|
std::atomic<uint64_t> last_published_sequence_;
|
||||||
uint64_t prev_log_number_; // 0 or backing store for memtable being compacted
|
uint64_t prev_log_number_; // 0 or backing store for memtable being compacted
|
||||||
|
@ -397,7 +397,7 @@ Status WriteBatch::Iterate(Handler* handler) const {
|
|||||||
input.remove_prefix(WriteBatchInternal::kHeader);
|
input.remove_prefix(WriteBatchInternal::kHeader);
|
||||||
Slice key, value, blob, xid;
|
Slice key, value, blob, xid;
|
||||||
// Sometimes a sub-batch starts with a Noop. We want to exclude such Noops as
|
// Sometimes a sub-batch starts with a Noop. We want to exclude such Noops as
|
||||||
// the batch boundry sybmols otherwise we would mis-count the number of
|
// the batch boundary symbols otherwise we would mis-count the number of
|
||||||
// batches. We do that by checking whether the accumulated batch is empty
|
// batches. We do that by checking whether the accumulated batch is empty
|
||||||
// before seeing the next Noop.
|
// before seeing the next Noop.
|
||||||
bool empty_batch = true;
|
bool empty_batch = true;
|
||||||
@ -1070,11 +1070,11 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
// is set when a batch, which is tagged with seq, is read from the WAL.
|
// is set when a batch, which is tagged with seq, is read from the WAL.
|
||||||
// Within a sequenced batch, which could be a merge of multiple batches, we
|
// Within a sequenced batch, which could be a merge of multiple batches, we
|
||||||
// have two policies to advance the seq: i) seq_per_key (default) and ii)
|
// have two policies to advance the seq: i) seq_per_key (default) and ii)
|
||||||
// seq_per_batch. To implement the latter we need to mark the boundry between
|
// seq_per_batch. To implement the latter we need to mark the boundary between
|
||||||
// the individual batches. The approach is this: 1) Use the terminating
|
// the individual batches. The approach is this: 1) Use the terminating
|
||||||
// markers to indicate the boundry (kTypeEndPrepareXID, kTypeCommitXID,
|
// markers to indicate the boundary (kTypeEndPrepareXID, kTypeCommitXID,
|
||||||
// kTypeRollbackXID) 2) Terminate a batch with kTypeNoop in the absense of a
|
// kTypeRollbackXID) 2) Terminate a batch with kTypeNoop in the absence of a
|
||||||
// natural boundy marker.
|
// natural boundary marker.
|
||||||
void MaybeAdvanceSeq(bool batch_boundry = false) {
|
void MaybeAdvanceSeq(bool batch_boundry = false) {
|
||||||
if (batch_boundry == seq_per_batch_) {
|
if (batch_boundry == seq_per_batch_) {
|
||||||
sequence_++;
|
sequence_++;
|
||||||
@ -1150,7 +1150,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
bool batch_boundry = false;
|
bool batch_boundry = false;
|
||||||
if (rebuilding_trx_ != nullptr) {
|
if (rebuilding_trx_ != nullptr) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// The CF is probabely flushed and hence no need for insert but we still
|
// The CF is probably flushed and hence no need for insert but we still
|
||||||
// need to keep track of the keys for upcoming rollback/commit.
|
// need to keep track of the keys for upcoming rollback/commit.
|
||||||
WriteBatchInternal::Put(rebuilding_trx_, column_family_id, key, value);
|
WriteBatchInternal::Put(rebuilding_trx_, column_family_id, key, value);
|
||||||
batch_boundry = duplicate_detector_.IsDuplicateKeySeq(column_family_id,
|
batch_boundry = duplicate_detector_.IsDuplicateKeySeq(column_family_id,
|
||||||
@ -1230,10 +1230,10 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// If the ret_status is TryAgain then let the next try to add the ky to
|
// If the ret_status is TryAgain then let the next try to add the ky to
|
||||||
// the the rebuilding transaction object.
|
// the rebuilding transaction object.
|
||||||
WriteBatchInternal::Put(rebuilding_trx_, column_family_id, key, value);
|
WriteBatchInternal::Put(rebuilding_trx_, column_family_id, key, value);
|
||||||
}
|
}
|
||||||
// Since all Puts are logged in trasaction logs (if enabled), always bump
|
// Since all Puts are logged in transaction logs (if enabled), always bump
|
||||||
// sequence number. Even if the update eventually fails and does not result
|
// sequence number. Even if the update eventually fails and does not result
|
||||||
// in memtable add/update.
|
// in memtable add/update.
|
||||||
MaybeAdvanceSeq();
|
MaybeAdvanceSeq();
|
||||||
@ -1278,7 +1278,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
bool batch_boundry = false;
|
bool batch_boundry = false;
|
||||||
if (rebuilding_trx_ != nullptr) {
|
if (rebuilding_trx_ != nullptr) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// The CF is probabely flushed and hence no need for insert but we still
|
// The CF is probably flushed and hence no need for insert but we still
|
||||||
// need to keep track of the keys for upcoming rollback/commit.
|
// need to keep track of the keys for upcoming rollback/commit.
|
||||||
WriteBatchInternal::Delete(rebuilding_trx_, column_family_id, key);
|
WriteBatchInternal::Delete(rebuilding_trx_, column_family_id, key);
|
||||||
batch_boundry = duplicate_detector_.IsDuplicateKeySeq(column_family_id,
|
batch_boundry = duplicate_detector_.IsDuplicateKeySeq(column_family_id,
|
||||||
@ -1293,7 +1293,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// If the ret_status is TryAgain then let the next try to add the ky to
|
// If the ret_status is TryAgain then let the next try to add the ky to
|
||||||
// the the rebuilding transaction object.
|
// the rebuilding transaction object.
|
||||||
WriteBatchInternal::Delete(rebuilding_trx_, column_family_id, key);
|
WriteBatchInternal::Delete(rebuilding_trx_, column_family_id, key);
|
||||||
}
|
}
|
||||||
return ret_status;
|
return ret_status;
|
||||||
@ -1313,7 +1313,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
bool batch_boundry = false;
|
bool batch_boundry = false;
|
||||||
if (rebuilding_trx_ != nullptr) {
|
if (rebuilding_trx_ != nullptr) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// The CF is probabely flushed and hence no need for insert but we still
|
// The CF is probably flushed and hence no need for insert but we still
|
||||||
// need to keep track of the keys for upcoming rollback/commit.
|
// need to keep track of the keys for upcoming rollback/commit.
|
||||||
WriteBatchInternal::SingleDelete(rebuilding_trx_, column_family_id,
|
WriteBatchInternal::SingleDelete(rebuilding_trx_, column_family_id,
|
||||||
key);
|
key);
|
||||||
@ -1330,7 +1330,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// If the ret_status is TryAgain then let the next try to add the ky to
|
// If the ret_status is TryAgain then let the next try to add the ky to
|
||||||
// the the rebuilding transaction object.
|
// the rebuilding transaction object.
|
||||||
WriteBatchInternal::SingleDelete(rebuilding_trx_, column_family_id, key);
|
WriteBatchInternal::SingleDelete(rebuilding_trx_, column_family_id, key);
|
||||||
}
|
}
|
||||||
return ret_status;
|
return ret_status;
|
||||||
@ -1352,11 +1352,11 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
bool batch_boundry = false;
|
bool batch_boundry = false;
|
||||||
if (rebuilding_trx_ != nullptr) {
|
if (rebuilding_trx_ != nullptr) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// The CF is probabely flushed and hence no need for insert but we still
|
// The CF is probably flushed and hence no need for insert but we still
|
||||||
// need to keep track of the keys for upcoming rollback/commit.
|
// need to keep track of the keys for upcoming rollback/commit.
|
||||||
WriteBatchInternal::DeleteRange(rebuilding_trx_, column_family_id,
|
WriteBatchInternal::DeleteRange(rebuilding_trx_, column_family_id,
|
||||||
begin_key, end_key);
|
begin_key, end_key);
|
||||||
// TODO(myabandeh): when transctional DeleteRange support is added,
|
// TODO(myabandeh): when transactional DeleteRange support is added,
|
||||||
// check if end_key must also be added.
|
// check if end_key must also be added.
|
||||||
batch_boundry = duplicate_detector_.IsDuplicateKeySeq(
|
batch_boundry = duplicate_detector_.IsDuplicateKeySeq(
|
||||||
column_family_id, begin_key, sequence_);
|
column_family_id, begin_key, sequence_);
|
||||||
@ -1384,7 +1384,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// If the ret_status is TryAgain then let the next try to add the ky to
|
// If the ret_status is TryAgain then let the next try to add the ky to
|
||||||
// the the rebuilding transaction object.
|
// the rebuilding transaction object.
|
||||||
WriteBatchInternal::DeleteRange(rebuilding_trx_, column_family_id,
|
WriteBatchInternal::DeleteRange(rebuilding_trx_, column_family_id,
|
||||||
begin_key, end_key);
|
begin_key, end_key);
|
||||||
}
|
}
|
||||||
@ -1406,7 +1406,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
bool batch_boundry = false;
|
bool batch_boundry = false;
|
||||||
if (rebuilding_trx_ != nullptr) {
|
if (rebuilding_trx_ != nullptr) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// The CF is probabely flushed and hence no need for insert but we still
|
// The CF is probably flushed and hence no need for insert but we still
|
||||||
// need to keep track of the keys for upcoming rollback/commit.
|
// need to keep track of the keys for upcoming rollback/commit.
|
||||||
WriteBatchInternal::Merge(rebuilding_trx_, column_family_id, key,
|
WriteBatchInternal::Merge(rebuilding_trx_, column_family_id, key,
|
||||||
value);
|
value);
|
||||||
@ -1498,7 +1498,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
if (UNLIKELY(!ret_status.IsTryAgain() && rebuilding_trx_ != nullptr)) {
|
||||||
assert(!write_after_commit_);
|
assert(!write_after_commit_);
|
||||||
// If the ret_status is TryAgain then let the next try to add the ky to
|
// If the ret_status is TryAgain then let the next try to add the ky to
|
||||||
// the the rebuilding transaction object.
|
// the rebuilding transaction object.
|
||||||
WriteBatchInternal::Merge(rebuilding_trx_, column_family_id, key, value);
|
WriteBatchInternal::Merge(rebuilding_trx_, column_family_id, key, value);
|
||||||
}
|
}
|
||||||
MaybeAdvanceSeq();
|
MaybeAdvanceSeq();
|
||||||
@ -1596,7 +1596,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
// and commit.
|
// and commit.
|
||||||
auto trx = db_->GetRecoveredTransaction(name.ToString());
|
auto trx = db_->GetRecoveredTransaction(name.ToString());
|
||||||
|
|
||||||
// the log contaiting the prepared section may have
|
// the log containing the prepared section may have
|
||||||
// been released in the last incarnation because the
|
// been released in the last incarnation because the
|
||||||
// data was flushed to L0
|
// data was flushed to L0
|
||||||
if (trx != nullptr) {
|
if (trx != nullptr) {
|
||||||
@ -1604,7 +1604,7 @@ class MemTableInserter : public WriteBatch::Handler {
|
|||||||
// duplicate re-insertion of values.
|
// duplicate re-insertion of values.
|
||||||
assert(log_number_ref_ == 0);
|
assert(log_number_ref_ == 0);
|
||||||
if (write_after_commit_) {
|
if (write_after_commit_) {
|
||||||
// all insertes must reference this trx log number
|
// all inserts must reference this trx log number
|
||||||
log_number_ref_ = trx->log_number_;
|
log_number_ref_ = trx->log_number_;
|
||||||
s = trx->batch_->Iterate(this);
|
s = trx->batch_->Iterate(this);
|
||||||
log_number_ref_ = 0;
|
log_number_ref_ = 0;
|
||||||
|
@ -117,10 +117,10 @@ class WriteBatchInternal {
|
|||||||
// Set the count for the number of entries in the batch.
|
// Set the count for the number of entries in the batch.
|
||||||
static void SetCount(WriteBatch* batch, int n);
|
static void SetCount(WriteBatch* batch, int n);
|
||||||
|
|
||||||
// Return the seqeunce number for the start of this batch.
|
// Return the sequence number for the start of this batch.
|
||||||
static SequenceNumber Sequence(const WriteBatch* batch);
|
static SequenceNumber Sequence(const WriteBatch* batch);
|
||||||
|
|
||||||
// Store the specified number as the seqeunce number for the start of
|
// Store the specified number as the sequence number for the start of
|
||||||
// this batch.
|
// this batch.
|
||||||
static void SetSequence(WriteBatch* batch, SequenceNumber seq);
|
static void SetSequence(WriteBatch* batch, SequenceNumber seq);
|
||||||
|
|
||||||
@ -168,7 +168,7 @@ class WriteBatchInternal {
|
|||||||
bool seq_per_batch = false);
|
bool seq_per_batch = false);
|
||||||
|
|
||||||
// Convenience form of InsertInto when you have only one batch
|
// Convenience form of InsertInto when you have only one batch
|
||||||
// next_seq returns the seq after last sequnce number used in MemTable insert
|
// next_seq returns the seq after last sequence number used in MemTable insert
|
||||||
static Status InsertInto(const WriteBatch* batch,
|
static Status InsertInto(const WriteBatch* batch,
|
||||||
ColumnFamilyMemTables* memtables,
|
ColumnFamilyMemTables* memtables,
|
||||||
FlushScheduler* flush_scheduler,
|
FlushScheduler* flush_scheduler,
|
||||||
|
6
env/env_encryption.cc
vendored
6
env/env_encryption.cc
vendored
@ -150,7 +150,7 @@ class EncryptedRandomAccessFile : public RandomAccessFile {
|
|||||||
// may not have been modified.
|
// may not have been modified.
|
||||||
//
|
//
|
||||||
// This function guarantees, for IDs from a given environment, two unique ids
|
// This function guarantees, for IDs from a given environment, two unique ids
|
||||||
// cannot be made equal to eachother by adding arbitrary bytes to one of
|
// cannot be made equal to each other by adding arbitrary bytes to one of
|
||||||
// them. That is, no unique ID is the prefix of another.
|
// them. That is, no unique ID is the prefix of another.
|
||||||
//
|
//
|
||||||
// This function guarantees that the returned ID will not be interpretable as
|
// This function guarantees that the returned ID will not be interpretable as
|
||||||
@ -584,7 +584,7 @@ class EncryptedEnv : public EnvWrapper {
|
|||||||
return Status::OK();
|
return Status::OK();
|
||||||
}
|
}
|
||||||
|
|
||||||
// Open `fname` for random read and write, if file dont exist the file
|
// Open `fname` for random read and write, if file doesn't exist the file
|
||||||
// will be created. On success, stores a pointer to the new file in
|
// will be created. On success, stores a pointer to the new file in
|
||||||
// *result and returns OK. On failure returns non-OK.
|
// *result and returns OK. On failure returns non-OK.
|
||||||
//
|
//
|
||||||
@ -828,7 +828,7 @@ Status CTRCipherStream::DecryptBlock(uint64_t blockIndex, char *data, char* scra
|
|||||||
// GetPrefixLength returns the length of the prefix that is added to every file
|
// GetPrefixLength returns the length of the prefix that is added to every file
|
||||||
// and used for storing encryption options.
|
// and used for storing encryption options.
|
||||||
// For optimal performance, the prefix length should be a multiple of
|
// For optimal performance, the prefix length should be a multiple of
|
||||||
// the a page size.
|
// the page size.
|
||||||
size_t CTREncryptionProvider::GetPrefixLength() {
|
size_t CTREncryptionProvider::GetPrefixLength() {
|
||||||
return defaultPrefixLength;
|
return defaultPrefixLength;
|
||||||
}
|
}
|
||||||
|
@ -253,7 +253,7 @@ struct AdvancedColumnFamilyOptions {
|
|||||||
// if prefix_extractor is set and memtable_prefix_bloom_size_ratio is not 0,
|
// if prefix_extractor is set and memtable_prefix_bloom_size_ratio is not 0,
|
||||||
// create prefix bloom for memtable with the size of
|
// create prefix bloom for memtable with the size of
|
||||||
// write_buffer_size * memtable_prefix_bloom_size_ratio.
|
// write_buffer_size * memtable_prefix_bloom_size_ratio.
|
||||||
// If it is larger than 0.25, it is santinized to 0.25.
|
// If it is larger than 0.25, it is sanitized to 0.25.
|
||||||
//
|
//
|
||||||
// Default: 0 (disable)
|
// Default: 0 (disable)
|
||||||
//
|
//
|
||||||
@ -560,7 +560,7 @@ struct AdvancedColumnFamilyOptions {
|
|||||||
// Default: false
|
// Default: false
|
||||||
bool paranoid_file_checks = false;
|
bool paranoid_file_checks = false;
|
||||||
|
|
||||||
// In debug mode, RocksDB run consistency checks on the LSM everytime the LSM
|
// In debug mode, RocksDB run consistency checks on the LSM every time the LSM
|
||||||
// change (Flush, Compaction, AddFile). These checks are disabled in release
|
// change (Flush, Compaction, AddFile). These checks are disabled in release
|
||||||
// mode, use this option to enable them in release mode as well.
|
// mode, use this option to enable them in release mode as well.
|
||||||
// Default: false
|
// Default: false
|
||||||
|
@ -30,7 +30,7 @@ class Cleanable {
|
|||||||
Cleanable(Cleanable&) = delete;
|
Cleanable(Cleanable&) = delete;
|
||||||
Cleanable& operator=(Cleanable&) = delete;
|
Cleanable& operator=(Cleanable&) = delete;
|
||||||
|
|
||||||
// Move consturctor and move assignment is allowed.
|
// Move constructor and move assignment is allowed.
|
||||||
Cleanable(Cleanable&&);
|
Cleanable(Cleanable&&);
|
||||||
Cleanable& operator=(Cleanable&&);
|
Cleanable& operator=(Cleanable&&);
|
||||||
|
|
||||||
|
@ -72,7 +72,7 @@ struct CompactionJobStats {
|
|||||||
// Time spent on file fsync.
|
// Time spent on file fsync.
|
||||||
uint64_t file_fsync_nanos;
|
uint64_t file_fsync_nanos;
|
||||||
|
|
||||||
// Time spent on preparing file write (falocate, etc)
|
// Time spent on preparing file write (fallocate, etc)
|
||||||
uint64_t file_prepare_write_nanos;
|
uint64_t file_prepare_write_nanos;
|
||||||
|
|
||||||
// 0-terminated strings storing the first 8 bytes of the smallest and
|
// 0-terminated strings storing the first 8 bytes of the smallest and
|
||||||
|
@ -172,7 +172,7 @@ class DB {
|
|||||||
std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);
|
std::vector<ColumnFamilyHandle*>* handles, DB** dbptr);
|
||||||
|
|
||||||
// Close the DB by releasing resources, closing files etc. This should be
|
// Close the DB by releasing resources, closing files etc. This should be
|
||||||
// called before calling the desctructor so that the caller can get back a
|
// called before calling the destructor so that the caller can get back a
|
||||||
// status in case there are any errors. This will not fsync the WAL files.
|
// status in case there are any errors. This will not fsync the WAL files.
|
||||||
// If syncing is required, the caller must first call SyncWAL(), or Write()
|
// If syncing is required, the caller must first call SyncWAL(), or Write()
|
||||||
// using an empty write batch with WriteOptions.sync=true.
|
// using an empty write batch with WriteOptions.sync=true.
|
||||||
|
@ -17,7 +17,7 @@ struct DumpOptions {
|
|||||||
std::string db_path;
|
std::string db_path;
|
||||||
// File location that will contain dump output
|
// File location that will contain dump output
|
||||||
std::string dump_location;
|
std::string dump_location;
|
||||||
// Dont include db information header in the dump
|
// Don't include db information header in the dump
|
||||||
bool anonymous = false;
|
bool anonymous = false;
|
||||||
};
|
};
|
||||||
|
|
||||||
|
@ -363,7 +363,7 @@ class Env {
|
|||||||
return NowMicros() * 1000;
|
return NowMicros() * 1000;
|
||||||
}
|
}
|
||||||
|
|
||||||
// Sleep/delay the thread for the perscribed number of micro-seconds.
|
// Sleep/delay the thread for the prescribed number of micro-seconds.
|
||||||
virtual void SleepForMicroseconds(int micros) = 0;
|
virtual void SleepForMicroseconds(int micros) = 0;
|
||||||
|
|
||||||
// Get the current host name.
|
// Get the current host name.
|
||||||
|
@ -133,7 +133,7 @@ class EncryptionProvider {
|
|||||||
// GetPrefixLength returns the length of the prefix that is added to every file
|
// GetPrefixLength returns the length of the prefix that is added to every file
|
||||||
// and used for storing encryption options.
|
// and used for storing encryption options.
|
||||||
// For optimal performance, the prefix length should be a multiple of
|
// For optimal performance, the prefix length should be a multiple of
|
||||||
// the a page size.
|
// the page size.
|
||||||
virtual size_t GetPrefixLength() = 0;
|
virtual size_t GetPrefixLength() = 0;
|
||||||
|
|
||||||
// CreateNewPrefix initialized an allocated block of prefix memory
|
// CreateNewPrefix initialized an allocated block of prefix memory
|
||||||
@ -165,7 +165,7 @@ class CTREncryptionProvider : public EncryptionProvider {
|
|||||||
// GetPrefixLength returns the length of the prefix that is added to every file
|
// GetPrefixLength returns the length of the prefix that is added to every file
|
||||||
// and used for storing encryption options.
|
// and used for storing encryption options.
|
||||||
// For optimal performance, the prefix length should be a multiple of
|
// For optimal performance, the prefix length should be a multiple of
|
||||||
// the a page size.
|
// the page size.
|
||||||
virtual size_t GetPrefixLength() override;
|
virtual size_t GetPrefixLength() override;
|
||||||
|
|
||||||
// CreateNewPrefix initialized an allocated block of prefix memory
|
// CreateNewPrefix initialized an allocated block of prefix memory
|
||||||
|
@ -346,7 +346,7 @@ extern MemTableRepFactory* NewHashLinkListRepFactory(
|
|||||||
|
|
||||||
// This factory creates a cuckoo-hashing based mem-table representation.
|
// This factory creates a cuckoo-hashing based mem-table representation.
|
||||||
// Cuckoo-hash is a closed-hash strategy, in which all key/value pairs
|
// Cuckoo-hash is a closed-hash strategy, in which all key/value pairs
|
||||||
// are stored in the bucket array itself intead of in some data structures
|
// are stored in the bucket array itself instead of in some data structures
|
||||||
// external to the bucket array. In addition, each key in cuckoo hash
|
// external to the bucket array. In addition, each key in cuckoo hash
|
||||||
// has a constant number of possible buckets in the bucket array. These
|
// has a constant number of possible buckets in the bucket array. These
|
||||||
// two properties together makes cuckoo hash more memory efficient and
|
// two properties together makes cuckoo hash more memory efficient and
|
||||||
|
@ -87,7 +87,7 @@ class MergeOperator {
|
|||||||
// The key associated with the merge operation.
|
// The key associated with the merge operation.
|
||||||
const Slice& key;
|
const Slice& key;
|
||||||
// The existing value of the current key, nullptr means that the
|
// The existing value of the current key, nullptr means that the
|
||||||
// value dont exist.
|
// value doesn't exist.
|
||||||
const Slice* existing_value;
|
const Slice* existing_value;
|
||||||
// A list of operands to apply.
|
// A list of operands to apply.
|
||||||
const std::vector<Slice>& operand_list;
|
const std::vector<Slice>& operand_list;
|
||||||
|
@ -121,7 +121,7 @@ class Slice {
|
|||||||
/**
|
/**
|
||||||
* A Slice that can be pinned with some cleanup tasks, which will be run upon
|
* A Slice that can be pinned with some cleanup tasks, which will be run upon
|
||||||
* ::Reset() or object destruction, whichever is invoked first. This can be used
|
* ::Reset() or object destruction, whichever is invoked first. This can be used
|
||||||
* to avoid memcpy by having the PinnsableSlice object referring to the data
|
* to avoid memcpy by having the PinnableSlice object referring to the data
|
||||||
* that is locked in the memory and release them after the data is consumed.
|
* that is locked in the memory and release them after the data is consumed.
|
||||||
*/
|
*/
|
||||||
class PinnableSlice : public Slice, public Cleanable {
|
class PinnableSlice : public Slice, public Cleanable {
|
||||||
|
@ -22,7 +22,7 @@ namespace rocksdb {
|
|||||||
class Slice;
|
class Slice;
|
||||||
|
|
||||||
/*
|
/*
|
||||||
* A SliceTranform is a generic pluggable way of transforming one string
|
* A SliceTransform is a generic pluggable way of transforming one string
|
||||||
* to another. Its primary use-case is in configuring rocksdb
|
* to another. Its primary use-case is in configuring rocksdb
|
||||||
* to store prefix blooms by setting prefix_extractor in
|
* to store prefix blooms by setting prefix_extractor in
|
||||||
* ColumnFamilyOptions.
|
* ColumnFamilyOptions.
|
||||||
@ -72,7 +72,7 @@ class SliceTransform {
|
|||||||
// by setting ReadOptions.total_order_seek = true.
|
// by setting ReadOptions.total_order_seek = true.
|
||||||
//
|
//
|
||||||
// Here is an example: Suppose we implement a slice transform that returns
|
// Here is an example: Suppose we implement a slice transform that returns
|
||||||
// the first part of the string after spliting it using delimiter ",":
|
// the first part of the string after splitting it using delimiter ",":
|
||||||
// 1. SameResultWhenAppended("abc,") should return true. If applying prefix
|
// 1. SameResultWhenAppended("abc,") should return true. If applying prefix
|
||||||
// bloom filter using it, all slices matching "abc:.*" will be extracted
|
// bloom filter using it, all slices matching "abc:.*" will be extracted
|
||||||
// to "abc,", so any SST file or memtable containing any of those key
|
// to "abc,", so any SST file or memtable containing any of those key
|
||||||
|
@ -67,7 +67,7 @@ class SstFileWriter {
|
|||||||
// be ingested into this column_family, note that passing nullptr means that
|
// be ingested into this column_family, note that passing nullptr means that
|
||||||
// the column_family is unknown.
|
// the column_family is unknown.
|
||||||
// If invalidate_page_cache is set to true, SstFileWriter will give the OS a
|
// If invalidate_page_cache is set to true, SstFileWriter will give the OS a
|
||||||
// hint that this file pages is not needed everytime we write 1MB to the file.
|
// hint that this file pages is not needed every time we write 1MB to the file.
|
||||||
// To use the rate limiter an io_priority smaller than IO_TOTAL can be passed.
|
// To use the rate limiter an io_priority smaller than IO_TOTAL can be passed.
|
||||||
SstFileWriter(const EnvOptions& env_options, const Options& options,
|
SstFileWriter(const EnvOptions& env_options, const Options& options,
|
||||||
ColumnFamilyHandle* column_family = nullptr,
|
ColumnFamilyHandle* column_family = nullptr,
|
||||||
|
@ -156,7 +156,7 @@ struct BlockBasedTableOptions {
|
|||||||
// well.
|
// well.
|
||||||
// TODO(myabandeh): remove the note above once the limitation is lifted
|
// TODO(myabandeh): remove the note above once the limitation is lifted
|
||||||
// Use partitioned full filters for each SST file. This option is
|
// Use partitioned full filters for each SST file. This option is
|
||||||
// incompatibile with block-based filters.
|
// incompatible with block-based filters.
|
||||||
bool partition_filters = false;
|
bool partition_filters = false;
|
||||||
|
|
||||||
// Use delta encoding to compress keys in blocks.
|
// Use delta encoding to compress keys in blocks.
|
||||||
@ -214,7 +214,7 @@ struct BlockBasedTableOptions {
|
|||||||
// encode compressed blocks with LZ4, BZip2 and Zlib compression. If you
|
// encode compressed blocks with LZ4, BZip2 and Zlib compression. If you
|
||||||
// don't plan to run RocksDB before version 3.10, you should probably use
|
// don't plan to run RocksDB before version 3.10, you should probably use
|
||||||
// this.
|
// this.
|
||||||
// This option only affects newly written tables. When reading exising tables,
|
// This option only affects newly written tables. When reading existing tables,
|
||||||
// the information about version is read from the footer.
|
// the information about version is read from the footer.
|
||||||
uint32_t format_version = 2;
|
uint32_t format_version = 2;
|
||||||
|
|
||||||
@ -226,7 +226,7 @@ struct BlockBasedTableOptions {
|
|||||||
|
|
||||||
// Table Properties that are specific to block-based table properties.
|
// Table Properties that are specific to block-based table properties.
|
||||||
struct BlockBasedTablePropertyNames {
|
struct BlockBasedTablePropertyNames {
|
||||||
// value of this propertis is a fixed int32 number.
|
// value of this properties is a fixed int32 number.
|
||||||
static const std::string kIndexType;
|
static const std::string kIndexType;
|
||||||
// value is "1" for true and "0" for false.
|
// value is "1" for true and "0" for false.
|
||||||
static const std::string kWholeKeyFiltering;
|
static const std::string kWholeKeyFiltering;
|
||||||
@ -319,7 +319,7 @@ struct PlainTableOptions {
|
|||||||
};
|
};
|
||||||
|
|
||||||
// -- Plain Table with prefix-only seek
|
// -- Plain Table with prefix-only seek
|
||||||
// For this factory, you need to set Options.prefix_extrator properly to make it
|
// For this factory, you need to set Options.prefix_extractor properly to make it
|
||||||
// work. Look-up will starts with prefix hash lookup for key prefix. Inside the
|
// work. Look-up will starts with prefix hash lookup for key prefix. Inside the
|
||||||
// hash bucket found, a binary search is executed for hash conflicts. Finally,
|
// hash bucket found, a binary search is executed for hash conflicts. Finally,
|
||||||
// a linear search is used.
|
// a linear search is used.
|
||||||
@ -382,7 +382,7 @@ struct CuckooTableOptions {
|
|||||||
bool identity_as_first_hash = false;
|
bool identity_as_first_hash = false;
|
||||||
// If this option is set to true, module is used during hash calculation.
|
// If this option is set to true, module is used during hash calculation.
|
||||||
// This often yields better space efficiency at the cost of performance.
|
// This often yields better space efficiency at the cost of performance.
|
||||||
// If this optino is set to false, # of entries in table is constrained to be
|
// If this option is set to false, # of entries in table is constrained to be
|
||||||
// power of two, and bit and is used to calculate hash, which is faster in
|
// power of two, and bit and is used to calculate hash, which is faster in
|
||||||
// general.
|
// general.
|
||||||
bool use_module_hash = true;
|
bool use_module_hash = true;
|
||||||
|
@ -15,7 +15,7 @@ namespace rocksdb {
|
|||||||
// Other than basic table properties, each table may also have the user
|
// Other than basic table properties, each table may also have the user
|
||||||
// collected properties.
|
// collected properties.
|
||||||
// The value of the user-collected properties are encoded as raw bytes --
|
// The value of the user-collected properties are encoded as raw bytes --
|
||||||
// users have to interprete these values by themselves.
|
// users have to interpret these values by themselves.
|
||||||
// Note: To do prefix seek/scan in `UserCollectedProperties`, you can do
|
// Note: To do prefix seek/scan in `UserCollectedProperties`, you can do
|
||||||
// something similar to:
|
// something similar to:
|
||||||
//
|
//
|
||||||
@ -59,7 +59,7 @@ extern const std::string kRangeDelBlock;
|
|||||||
// `TablePropertiesCollector` provides the mechanism for users to collect
|
// `TablePropertiesCollector` provides the mechanism for users to collect
|
||||||
// their own properties that they are interested in. This class is essentially
|
// their own properties that they are interested in. This class is essentially
|
||||||
// a collection of callback functions that will be invoked during table
|
// a collection of callback functions that will be invoked during table
|
||||||
// building. It is construced with TablePropertiesCollectorFactory. The methods
|
// building. It is constructed with TablePropertiesCollectorFactory. The methods
|
||||||
// don't need to be thread-safe, as we will create exactly one
|
// don't need to be thread-safe, as we will create exactly one
|
||||||
// TablePropertiesCollector object per table and then call it sequentially
|
// TablePropertiesCollector object per table and then call it sequentially
|
||||||
class TablePropertiesCollector {
|
class TablePropertiesCollector {
|
||||||
|
@ -118,7 +118,7 @@ class Transaction {
|
|||||||
// longer be valid and should be discarded after a call to ClearSnapshot().
|
// longer be valid and should be discarded after a call to ClearSnapshot().
|
||||||
virtual void ClearSnapshot() = 0;
|
virtual void ClearSnapshot() = 0;
|
||||||
|
|
||||||
// Prepare the current transation for 2PC
|
// Prepare the current transaction for 2PC
|
||||||
virtual Status Prepare() = 0;
|
virtual Status Prepare() = 0;
|
||||||
|
|
||||||
// Write all batched keys to the db atomically.
|
// Write all batched keys to the db atomically.
|
||||||
@ -169,8 +169,8 @@ class Transaction {
|
|||||||
ColumnFamilyHandle* column_family, const Slice& key,
|
ColumnFamilyHandle* column_family, const Slice& key,
|
||||||
std::string* value) = 0;
|
std::string* value) = 0;
|
||||||
|
|
||||||
// An overload of the the above method that receives a PinnableSlice
|
// An overload of the above method that receives a PinnableSlice
|
||||||
// For backward compatiblity a default implementation is provided
|
// For backward compatibility a default implementation is provided
|
||||||
virtual Status Get(const ReadOptions& options,
|
virtual Status Get(const ReadOptions& options,
|
||||||
ColumnFamilyHandle* column_family, const Slice& key,
|
ColumnFamilyHandle* column_family, const Slice& key,
|
||||||
PinnableSlice* pinnable_val) {
|
PinnableSlice* pinnable_val) {
|
||||||
@ -230,8 +230,8 @@ class Transaction {
|
|||||||
const Slice& key, std::string* value,
|
const Slice& key, std::string* value,
|
||||||
bool exclusive = true) = 0;
|
bool exclusive = true) = 0;
|
||||||
|
|
||||||
// An overload of the the above method that receives a PinnableSlice
|
// An overload of the above method that receives a PinnableSlice
|
||||||
// For backward compatiblity a default implementation is provided
|
// For backward compatibility a default implementation is provided
|
||||||
virtual Status GetForUpdate(const ReadOptions& options,
|
virtual Status GetForUpdate(const ReadOptions& options,
|
||||||
ColumnFamilyHandle* /*column_family*/,
|
ColumnFamilyHandle* /*column_family*/,
|
||||||
const Slice& key, PinnableSlice* pinnable_val,
|
const Slice& key, PinnableSlice* pinnable_val,
|
||||||
@ -368,7 +368,7 @@ class Transaction {
|
|||||||
virtual void EnableIndexing() = 0;
|
virtual void EnableIndexing() = 0;
|
||||||
|
|
||||||
// Returns the number of distinct Keys being tracked by this transaction.
|
// Returns the number of distinct Keys being tracked by this transaction.
|
||||||
// If this transaction was created by a TransactinDB, this is the number of
|
// If this transaction was created by a TransactionDB, this is the number of
|
||||||
// keys that are currently locked by this transaction.
|
// keys that are currently locked by this transaction.
|
||||||
// If this transaction was created by an OptimisticTransactionDB, this is the
|
// If this transaction was created by an OptimisticTransactionDB, this is the
|
||||||
// number of keys that need to be checked for conflicts at commit time.
|
// number of keys that need to be checked for conflicts at commit time.
|
||||||
|
@ -75,7 +75,7 @@ struct TransactionDBOptions {
|
|||||||
// expiration set.
|
// expiration set.
|
||||||
int64_t default_lock_timeout = 1000; // 1 second
|
int64_t default_lock_timeout = 1000; // 1 second
|
||||||
|
|
||||||
// If set, the TransactionDB will use this implemenation of a mutex and
|
// If set, the TransactionDB will use this implementation of a mutex and
|
||||||
// condition variable for all transaction locking instead of the default
|
// condition variable for all transaction locking instead of the default
|
||||||
// mutex/condvar implementation.
|
// mutex/condvar implementation.
|
||||||
std::shared_ptr<TransactionDBMutexFactory> custom_mutex_factory;
|
std::shared_ptr<TransactionDBMutexFactory> custom_mutex_factory;
|
||||||
@ -100,7 +100,7 @@ struct TransactionOptions {
|
|||||||
// If set, it states that the CommitTimeWriteBatch represents the latest state
|
// If set, it states that the CommitTimeWriteBatch represents the latest state
|
||||||
// of the application and meant to be used later during recovery. It enables
|
// of the application and meant to be used later during recovery. It enables
|
||||||
// an optimization to postpone updating the memtable with CommitTimeWriteBatch
|
// an optimization to postpone updating the memtable with CommitTimeWriteBatch
|
||||||
// to only SwithcMamtable or recovery.
|
// to only SwitchMemtable or recovery.
|
||||||
bool use_only_the_last_commit_time_batch_for_recovery = false;
|
bool use_only_the_last_commit_time_batch_for_recovery = false;
|
||||||
|
|
||||||
// TODO(agiardullo): TransactionDB does not yet support comparators that allow
|
// TODO(agiardullo): TransactionDB does not yet support comparators that allow
|
||||||
@ -131,15 +131,15 @@ struct TransactionOptions {
|
|||||||
};
|
};
|
||||||
|
|
||||||
// The per-write optimizations that do not involve transactions. TransactionDB
|
// The per-write optimizations that do not involve transactions. TransactionDB
|
||||||
// implemenation might or might not make use of the specified optimizations.
|
// implementation might or might not make use of the specified optimizations.
|
||||||
struct TransactionDBWriteOptimizations {
|
struct TransactionDBWriteOptimizations {
|
||||||
// If it is true it means that the applicatinn guratnees that the
|
// If it is true it means that the application guarantees that the
|
||||||
// key-set in the write batch do not conflict with any concurrent transaction
|
// key-set in the write batch do not conflict with any concurrent transaction
|
||||||
// and hence the concurrency control mechanism could be skipped for this
|
// and hence the concurrency control mechanism could be skipped for this
|
||||||
// write.
|
// write.
|
||||||
bool skip_concurrency_control = false;
|
bool skip_concurrency_control = false;
|
||||||
// If true, the application guarantees that there is no duplicate <column
|
// If true, the application guarantees that there is no duplicate <column
|
||||||
// family, key> in the write batch and any employed mechanism to hanlde
|
// family, key> in the write batch and any employed mechanism to handle
|
||||||
// duplicate keys could be skipped.
|
// duplicate keys could be skipped.
|
||||||
bool skip_duplicate_key_check = false;
|
bool skip_duplicate_key_check = false;
|
||||||
};
|
};
|
||||||
|
@ -188,7 +188,7 @@ class WriteBatchWithIndex : public WriteBatchBase {
|
|||||||
Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
|
Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
|
||||||
const Slice& key, std::string* value);
|
const Slice& key, std::string* value);
|
||||||
|
|
||||||
// An overload of the the above method that receives a PinnableSlice
|
// An overload of the above method that receives a PinnableSlice
|
||||||
Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
|
Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
|
||||||
const Slice& key, PinnableSlice* value);
|
const Slice& key, PinnableSlice* value);
|
||||||
|
|
||||||
@ -196,7 +196,7 @@ class WriteBatchWithIndex : public WriteBatchBase {
|
|||||||
ColumnFamilyHandle* column_family, const Slice& key,
|
ColumnFamilyHandle* column_family, const Slice& key,
|
||||||
std::string* value);
|
std::string* value);
|
||||||
|
|
||||||
// An overload of the the above method that receives a PinnableSlice
|
// An overload of the above method that receives a PinnableSlice
|
||||||
Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
|
Status GetFromBatchAndDB(DB* db, const ReadOptions& read_options,
|
||||||
ColumnFamilyHandle* column_family, const Slice& key,
|
ColumnFamilyHandle* column_family, const Slice& key,
|
||||||
PinnableSlice* value);
|
PinnableSlice* value);
|
||||||
|
@ -334,7 +334,7 @@ class WriteBatch : public WriteBatchBase {
|
|||||||
friend class WriteBatchInternal;
|
friend class WriteBatchInternal;
|
||||||
friend class LocalSavePoint;
|
friend class LocalSavePoint;
|
||||||
// TODO(myabandeh): this is needed for a hack to collapse the write batch and
|
// TODO(myabandeh): this is needed for a hack to collapse the write batch and
|
||||||
// remove duplicate keys. Remove it when the hack is replaced with a propper
|
// remove duplicate keys. Remove it when the hack is replaced with a proper
|
||||||
// solution.
|
// solution.
|
||||||
friend class WriteBatchWithIndex;
|
friend class WriteBatchWithIndex;
|
||||||
SavePoints* save_points_;
|
SavePoints* save_points_;
|
||||||
|
@ -20,7 +20,7 @@ struct SliceParts;
|
|||||||
|
|
||||||
// Abstract base class that defines the basic interface for a write batch.
|
// Abstract base class that defines the basic interface for a write batch.
|
||||||
// See WriteBatch for a basic implementation and WrithBatchWithIndex for an
|
// See WriteBatch for a basic implementation and WrithBatchWithIndex for an
|
||||||
// indexed implemenation.
|
// indexed implementation.
|
||||||
class WriteBatchBase {
|
class WriteBatchBase {
|
||||||
public:
|
public:
|
||||||
virtual ~WriteBatchBase() {}
|
virtual ~WriteBatchBase() {}
|
||||||
|
@ -441,7 +441,7 @@ public interface AdvancedColumnFamilyOptionsInterface
|
|||||||
boolean optimizeFiltersForHits();
|
boolean optimizeFiltersForHits();
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* In debug mode, RocksDB run consistency checks on the LSM everytime the LSM
|
* In debug mode, RocksDB run consistency checks on the LSM every time the LSM
|
||||||
* change (Flush, Compaction, AddFile). These checks are disabled in release
|
* change (Flush, Compaction, AddFile). These checks are disabled in release
|
||||||
* mode, use this option to enable them in release mode as well.
|
* mode, use this option to enable them in release mode as well.
|
||||||
*
|
*
|
||||||
@ -455,7 +455,7 @@ public interface AdvancedColumnFamilyOptionsInterface
|
|||||||
boolean forceConsistencyChecks);
|
boolean forceConsistencyChecks);
|
||||||
|
|
||||||
/**
|
/**
|
||||||
* In debug mode, RocksDB run consistency checks on the LSM everytime the LSM
|
* In debug mode, RocksDB run consistency checks on the LSM every time the LSM
|
||||||
* change (Flush, Compaction, AddFile). These checks are disabled in release
|
* change (Flush, Compaction, AddFile). These checks are disabled in release
|
||||||
* mode.
|
* mode.
|
||||||
*
|
*
|
||||||
|
@ -59,7 +59,7 @@ public class TransactionOptions extends RocksObject
|
|||||||
* If negative, {@link TransactionDBOptions#getTransactionLockTimeout(long)}
|
* If negative, {@link TransactionDBOptions#getTransactionLockTimeout(long)}
|
||||||
* will be used
|
* will be used
|
||||||
*
|
*
|
||||||
* @return the lock tiemout in milliseconds
|
* @return the lock timeout in milliseconds
|
||||||
*/
|
*/
|
||||||
public long getLockTimeout() {
|
public long getLockTimeout() {
|
||||||
assert(isOwningHandle());
|
assert(isOwningHandle());
|
||||||
@ -76,7 +76,7 @@ public class TransactionOptions extends RocksObject
|
|||||||
*
|
*
|
||||||
* Default: -1
|
* Default: -1
|
||||||
*
|
*
|
||||||
* @param lockTimeout the lock tiemout in milliseconds
|
* @param lockTimeout the lock timeout in milliseconds
|
||||||
*
|
*
|
||||||
* @return this TransactionOptions instance
|
* @return this TransactionOptions instance
|
||||||
*/
|
*/
|
||||||
@ -136,7 +136,7 @@ public class TransactionOptions extends RocksObject
|
|||||||
*
|
*
|
||||||
* Default: 50
|
* Default: 50
|
||||||
*
|
*
|
||||||
* @param deadlockDetectDepth the the number of traversals to make during
|
* @param deadlockDetectDepth the number of traversals to make during
|
||||||
* deadlock detection
|
* deadlock detection
|
||||||
*
|
*
|
||||||
* @return this TransactionOptions instance
|
* @return this TransactionOptions instance
|
||||||
|
@ -50,7 +50,7 @@ struct SstFileWriter::Rep {
|
|||||||
std::string column_family_name;
|
std::string column_family_name;
|
||||||
ColumnFamilyHandle* cfh;
|
ColumnFamilyHandle* cfh;
|
||||||
// If true, We will give the OS a hint that this file pages is not needed
|
// If true, We will give the OS a hint that this file pages is not needed
|
||||||
// everytime we write 1MB to the file.
|
// every time we write 1MB to the file.
|
||||||
bool invalidate_page_cache;
|
bool invalidate_page_cache;
|
||||||
// The size of the file during the last time we called Fadvise to remove
|
// The size of the file during the last time we called Fadvise to remove
|
||||||
// cached pages from page cache.
|
// cached pages from page cache.
|
||||||
|
@ -997,11 +997,11 @@ struct ThreadState {
|
|||||||
Stats stats;
|
Stats stats;
|
||||||
struct SnapshotState {
|
struct SnapshotState {
|
||||||
const Snapshot* snapshot;
|
const Snapshot* snapshot;
|
||||||
// The cf from which we did a Get at this stapshot
|
// The cf from which we did a Get at this snapshot
|
||||||
int cf_at;
|
int cf_at;
|
||||||
// The name of the cf at the the time that we did a read
|
// The name of the cf at the time that we did a read
|
||||||
std::string cf_at_name;
|
std::string cf_at_name;
|
||||||
// The key with which we did a Get at this stapshot
|
// The key with which we did a Get at this snapshot
|
||||||
std::string key;
|
std::string key;
|
||||||
// The status of the Get
|
// The status of the Get
|
||||||
Status status;
|
Status status;
|
||||||
|
@ -26,7 +26,7 @@ extern std::vector<std::string> rocksdb_kill_prefix_blacklist;
|
|||||||
#else
|
#else
|
||||||
|
|
||||||
namespace rocksdb {
|
namespace rocksdb {
|
||||||
// Kill the process with probablity 1/odds for testing.
|
// Kill the process with probability 1/odds for testing.
|
||||||
extern void TestKillRandom(std::string kill_point, int odds,
|
extern void TestKillRandom(std::string kill_point, int odds,
|
||||||
const std::string& srcfile, int srcline);
|
const std::string& srcfile, int srcline);
|
||||||
|
|
||||||
@ -105,7 +105,7 @@ class SyncPoint {
|
|||||||
|
|
||||||
// triggered by TEST_SYNC_POINT, blocking execution until all predecessors
|
// triggered by TEST_SYNC_POINT, blocking execution until all predecessors
|
||||||
// are executed.
|
// are executed.
|
||||||
// And/or call registered callback functionn, with argument `cb_arg`
|
// And/or call registered callback function, with argument `cb_arg`
|
||||||
void Process(const std::string& point, void* cb_arg = nullptr);
|
void Process(const std::string& point, void* cb_arg = nullptr);
|
||||||
|
|
||||||
// TODO: it might be useful to provide a function that blocks until all
|
// TODO: it might be useful to provide a function that blocks until all
|
||||||
@ -133,7 +133,7 @@ class SyncPoint {
|
|||||||
} // namespace rocksdb
|
} // namespace rocksdb
|
||||||
|
|
||||||
// Use TEST_SYNC_POINT to specify sync points inside code base.
|
// Use TEST_SYNC_POINT to specify sync points inside code base.
|
||||||
// Sync points can have happens-after depedency on other sync points,
|
// Sync points can have happens-after dependency on other sync points,
|
||||||
// configured at runtime via SyncPoint::LoadDependency. This could be
|
// configured at runtime via SyncPoint::LoadDependency. This could be
|
||||||
// utilized to re-produce race conditions between threads.
|
// utilized to re-produce race conditions between threads.
|
||||||
// See TransactionLogIteratorRace in db_test.cc for an example use case.
|
// See TransactionLogIteratorRace in db_test.cc for an example use case.
|
||||||
|
@ -373,7 +373,7 @@ Transaction* PessimisticTransactionDB::BeginInternalTransaction(
|
|||||||
//
|
//
|
||||||
// Put(), Merge(), and Delete() only lock a single key per call. Write() will
|
// Put(), Merge(), and Delete() only lock a single key per call. Write() will
|
||||||
// sort its keys before locking them. This guarantees that TransactionDB write
|
// sort its keys before locking them. This guarantees that TransactionDB write
|
||||||
// methods cannot deadlock with eachother (but still could deadlock with a
|
// methods cannot deadlock with each other (but still could deadlock with a
|
||||||
// Transaction).
|
// Transaction).
|
||||||
Status PessimisticTransactionDB::Put(const WriteOptions& options,
|
Status PessimisticTransactionDB::Put(const WriteOptions& options,
|
||||||
ColumnFamilyHandle* column_family,
|
ColumnFamilyHandle* column_family,
|
||||||
|
@ -101,7 +101,7 @@ TEST(PreparedHeap, BasicsTest) {
|
|||||||
heap.erase(89l);
|
heap.erase(89l);
|
||||||
heap.erase(86l);
|
heap.erase(86l);
|
||||||
heap.erase(88l);
|
heap.erase(88l);
|
||||||
// Test top remians the same after a ranodm order of many erases
|
// Test top remains the same after a random order of many erases
|
||||||
ASSERT_EQ(64l, heap.top());
|
ASSERT_EQ(64l, heap.top());
|
||||||
heap.pop();
|
heap.pop();
|
||||||
// Test that pop works with a series of random pending erases
|
// Test that pop works with a series of random pending erases
|
||||||
@ -240,7 +240,7 @@ TEST(WriteBatchWithIndex, SubBatchCnt) {
|
|||||||
ASSERT_EQ(batch_cnt, counter.BatchCount());
|
ASSERT_EQ(batch_cnt, counter.BatchCount());
|
||||||
|
|
||||||
// Test that RollbackToSavePoint will properly resets the number of
|
// Test that RollbackToSavePoint will properly resets the number of
|
||||||
// sub-bathces
|
// sub-batches
|
||||||
for (size_t i = save_points; i > 0; i--) {
|
for (size_t i = save_points; i > 0; i--) {
|
||||||
batch.RollbackToSavePoint();
|
batch.RollbackToSavePoint();
|
||||||
ASSERT_EQ(batch_cnt_at[i - 1], batch.SubBatchCnt());
|
ASSERT_EQ(batch_cnt_at[i - 1], batch.SubBatchCnt());
|
||||||
@ -280,7 +280,7 @@ TEST(CommitEntry64b, BasicTest) {
|
|||||||
const size_t INDEX_SIZE = static_cast<size_t>(1ull << INDEX_BITS);
|
const size_t INDEX_SIZE = static_cast<size_t>(1ull << INDEX_BITS);
|
||||||
const CommitEntry64bFormat FORMAT(static_cast<size_t>(INDEX_BITS));
|
const CommitEntry64bFormat FORMAT(static_cast<size_t>(INDEX_BITS));
|
||||||
|
|
||||||
// zero-initialized CommitEntry64b should inidcate an empty entry
|
// zero-initialized CommitEntry64b should indicate an empty entry
|
||||||
CommitEntry64b empty_entry64b;
|
CommitEntry64b empty_entry64b;
|
||||||
uint64_t empty_index = 11ul;
|
uint64_t empty_index = 11ul;
|
||||||
CommitEntry empty_entry;
|
CommitEntry empty_entry;
|
||||||
@ -353,7 +353,7 @@ class WritePreparedTransactionTestBase : public TransactionTestBase {
|
|||||||
protected:
|
protected:
|
||||||
// If expect_update is set, check if it actually updated old_commit_map_. If
|
// If expect_update is set, check if it actually updated old_commit_map_. If
|
||||||
// it did not and yet suggested not to check the next snapshot, do the
|
// it did not and yet suggested not to check the next snapshot, do the
|
||||||
// opposite to check if it was not a bad suggstion.
|
// opposite to check if it was not a bad suggestion.
|
||||||
void MaybeUpdateOldCommitMapTestWithNext(uint64_t prepare, uint64_t commit,
|
void MaybeUpdateOldCommitMapTestWithNext(uint64_t prepare, uint64_t commit,
|
||||||
uint64_t snapshot,
|
uint64_t snapshot,
|
||||||
uint64_t next_snapshot,
|
uint64_t next_snapshot,
|
||||||
@ -371,7 +371,7 @@ class WritePreparedTransactionTestBase : public TransactionTestBase {
|
|||||||
}
|
}
|
||||||
EXPECT_EQ(!expect_update, wp_db->old_commit_map_empty_);
|
EXPECT_EQ(!expect_update, wp_db->old_commit_map_empty_);
|
||||||
if (!check_next && wp_db->old_commit_map_empty_) {
|
if (!check_next && wp_db->old_commit_map_empty_) {
|
||||||
// do the oppotisite to make sure it was not a bad suggestion
|
// do the opposite to make sure it was not a bad suggestion
|
||||||
const bool dont_care_bool = true;
|
const bool dont_care_bool = true;
|
||||||
wp_db->MaybeUpdateOldCommitMap(prepare, commit, next_snapshot,
|
wp_db->MaybeUpdateOldCommitMap(prepare, commit, next_snapshot,
|
||||||
dont_care_bool);
|
dont_care_bool);
|
||||||
@ -772,7 +772,7 @@ TEST_P(WritePreparedTransactionTest, CheckAgainstSnapshotsTest) {
|
|||||||
wp_db->UpdateSnapshots(snapshots, version);
|
wp_db->UpdateSnapshots(snapshots, version);
|
||||||
ASSERT_EQ(snapshots.size(), wp_db->snapshots_total_);
|
ASSERT_EQ(snapshots.size(), wp_db->snapshots_total_);
|
||||||
// seq numbers are chosen so that we have two of them between each two
|
// seq numbers are chosen so that we have two of them between each two
|
||||||
// snapshots. If the diff of two consecuitive seq is more than 5, there is a
|
// snapshots. If the diff of two consecutive seq is more than 5, there is a
|
||||||
// snapshot between them.
|
// snapshot between them.
|
||||||
std::vector<SequenceNumber> seqs = {50l, 55l, 150l, 155l, 250l, 255l, 350l,
|
std::vector<SequenceNumber> seqs = {50l, 55l, 150l, 155l, 250l, 255l, 350l,
|
||||||
355l, 450l, 455l, 550l, 555l, 650l, 655l,
|
355l, 450l, 455l, 550l, 555l, 650l, 655l,
|
||||||
@ -904,7 +904,7 @@ TEST_P(WritePreparedTransactionTest, AdvanceMaxEvictedSeqBasicTest) {
|
|||||||
// a. max should be updated to new_max
|
// a. max should be updated to new_max
|
||||||
ASSERT_EQ(wp_db->max_evicted_seq_, new_max);
|
ASSERT_EQ(wp_db->max_evicted_seq_, new_max);
|
||||||
// b. delayed prepared should contain every txn <= max and prepared should
|
// b. delayed prepared should contain every txn <= max and prepared should
|
||||||
// only contian txns > max
|
// only contain txns > max
|
||||||
auto it = initial_prepared.begin();
|
auto it = initial_prepared.begin();
|
||||||
for (; it != initial_prepared.end() && *it <= new_max; it++) {
|
for (; it != initial_prepared.end() && *it <= new_max; it++) {
|
||||||
ASSERT_EQ(1, wp_db->delayed_prepared_.erase(*it));
|
ASSERT_EQ(1, wp_db->delayed_prepared_.erase(*it));
|
||||||
@ -1034,9 +1034,9 @@ TEST_P(WritePreparedTransactionTest, SeqAdvanceConcurrentTest) {
|
|||||||
}
|
}
|
||||||
if (options.two_write_queues) {
|
if (options.two_write_queues) {
|
||||||
// In this case none of the above scheduling tricks to deterministically
|
// In this case none of the above scheduling tricks to deterministically
|
||||||
// form merged bactches works because the writes go to saparte queues.
|
// form merged batches works because the writes go to separate queues.
|
||||||
// This would result in different write groups in each run of the test. We
|
// This would result in different write groups in each run of the test. We
|
||||||
// still keep the test since althgouh non-deterministic and hard to debug,
|
// still keep the test since although non-deterministic and hard to debug,
|
||||||
// it is still useful to have.
|
// it is still useful to have.
|
||||||
// TODO(myabandeh): Add a deterministic unit test for two_write_queues
|
// TODO(myabandeh): Add a deterministic unit test for two_write_queues
|
||||||
}
|
}
|
||||||
@ -1069,7 +1069,7 @@ TEST_P(WritePreparedTransactionTest, SeqAdvanceConcurrentTest) {
|
|||||||
}
|
}
|
||||||
}
|
}
|
||||||
|
|
||||||
// Run a couple of differnet txns among them some uncommitted. Restart the db at
|
// Run a couple of different txns among them some uncommitted. Restart the db at
|
||||||
// a couple points to check whether the list of uncommitted txns are recovered
|
// a couple points to check whether the list of uncommitted txns are recovered
|
||||||
// properly.
|
// properly.
|
||||||
TEST_P(WritePreparedTransactionTest, BasicRecoveryTest) {
|
TEST_P(WritePreparedTransactionTest, BasicRecoveryTest) {
|
||||||
@ -1284,16 +1284,16 @@ TEST_P(WritePreparedTransactionTest, IsInSnapshotTest) {
|
|||||||
// only a few snapshots are below the max_evicted_seq_.
|
// only a few snapshots are below the max_evicted_seq_.
|
||||||
for (int max_snapshots = 1; max_snapshots < 20; max_snapshots++) {
|
for (int max_snapshots = 1; max_snapshots < 20; max_snapshots++) {
|
||||||
// Leave some gap between the preliminary snapshots and the final snapshot
|
// Leave some gap between the preliminary snapshots and the final snapshot
|
||||||
// that we check. This should test for also different overlapping scnearios
|
// that we check. This should test for also different overlapping scenarios
|
||||||
// between the last snapshot and the commits.
|
// between the last snapshot and the commits.
|
||||||
for (int max_gap = 1; max_gap < 10; max_gap++) {
|
for (int max_gap = 1; max_gap < 10; max_gap++) {
|
||||||
// Since we do not actually write to db, we mock the seq as it would be
|
// Since we do not actually write to db, we mock the seq as it would be
|
||||||
// increaased by the db. The only exception is that we need db seq to
|
// increased by the db. The only exception is that we need db seq to
|
||||||
// advance for our snapshots. for which we apply a dummy put each time we
|
// advance for our snapshots. for which we apply a dummy put each time we
|
||||||
// increase our mock of seq.
|
// increase our mock of seq.
|
||||||
uint64_t seq = 0;
|
uint64_t seq = 0;
|
||||||
// At each step we prepare a txn and then we commit it in the next txn.
|
// At each step we prepare a txn and then we commit it in the next txn.
|
||||||
// This emulates the consecuitive transactions that write to the same key
|
// This emulates the consecutive transactions that write to the same key
|
||||||
uint64_t cur_txn = 0;
|
uint64_t cur_txn = 0;
|
||||||
// Number of snapshots taken so far
|
// Number of snapshots taken so far
|
||||||
int num_snapshots = 0;
|
int num_snapshots = 0;
|
||||||
@ -1306,7 +1306,7 @@ TEST_P(WritePreparedTransactionTest, IsInSnapshotTest) {
|
|||||||
// we add a new prepare txn. These do not mean to be committed for
|
// we add a new prepare txn. These do not mean to be committed for
|
||||||
// snapshot inspection.
|
// snapshot inspection.
|
||||||
std::set<uint64_t> prepared;
|
std::set<uint64_t> prepared;
|
||||||
// We keep the list of txns comitted before we take the last snaphot.
|
// We keep the list of txns committed before we take the last snapshot.
|
||||||
// These should be the only seq numbers that will be found in the snapshot
|
// These should be the only seq numbers that will be found in the snapshot
|
||||||
std::set<uint64_t> committed_before;
|
std::set<uint64_t> committed_before;
|
||||||
// The set of commit seq numbers to be excluded from IsInSnapshot queries
|
// The set of commit seq numbers to be excluded from IsInSnapshot queries
|
||||||
|
@ -34,8 +34,8 @@ namespace rocksdb {
|
|||||||
|
|
||||||
class WritePreparedTxnDB;
|
class WritePreparedTxnDB;
|
||||||
|
|
||||||
// This impl could write to DB also uncomitted data and then later tell apart
|
// This impl could write to DB also uncommitted data and then later tell apart
|
||||||
// committed data from uncomitted data. Uncommitted data could be after the
|
// committed data from uncommitted data. Uncommitted data could be after the
|
||||||
// Prepare phase in 2PC (WritePreparedTxn) or before that
|
// Prepare phase in 2PC (WritePreparedTxn) or before that
|
||||||
// (WriteUnpreparedTxnImpl).
|
// (WriteUnpreparedTxnImpl).
|
||||||
class WritePreparedTxn : public PessimisticTransaction {
|
class WritePreparedTxn : public PessimisticTransaction {
|
||||||
|
@ -109,17 +109,17 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
|
|
||||||
virtual void ReleaseSnapshot(const Snapshot* snapshot) override;
|
virtual void ReleaseSnapshot(const Snapshot* snapshot) override;
|
||||||
|
|
||||||
// Check whether the transaction that wrote the value with seqeunce number seq
|
// Check whether the transaction that wrote the value with sequence number seq
|
||||||
// is visible to the snapshot with sequence number snapshot_seq
|
// is visible to the snapshot with sequence number snapshot_seq
|
||||||
bool IsInSnapshot(uint64_t seq, uint64_t snapshot_seq) const;
|
bool IsInSnapshot(uint64_t seq, uint64_t snapshot_seq) const;
|
||||||
// Add the trasnaction with prepare sequence seq to the prepared list
|
// Add the transaction with prepare sequence seq to the prepared list
|
||||||
void AddPrepared(uint64_t seq);
|
void AddPrepared(uint64_t seq);
|
||||||
// Rollback a prepared txn identified with prep_seq. rollback_seq is the seq
|
// Rollback a prepared txn identified with prep_seq. rollback_seq is the seq
|
||||||
// with which the additional data is written to cancel the txn effect. It can
|
// with which the additional data is written to cancel the txn effect. It can
|
||||||
// be used to idenitfy the snapshots that overlap with the rolled back txn.
|
// be used to identify the snapshots that overlap with the rolled back txn.
|
||||||
void RollbackPrepared(uint64_t prep_seq, uint64_t rollback_seq);
|
void RollbackPrepared(uint64_t prep_seq, uint64_t rollback_seq);
|
||||||
// Add the transaction with prepare sequence prepare_seq and commit sequence
|
// Add the transaction with prepare sequence prepare_seq and commit sequence
|
||||||
// commit_seq to the commit map. prepare_skipped is set if the prpeare phase
|
// commit_seq to the commit map. prepare_skipped is set if the prepare phase
|
||||||
// is skipped for this commit. loop_cnt is to detect infinite loops.
|
// is skipped for this commit. loop_cnt is to detect infinite loops.
|
||||||
void AddCommitted(uint64_t prepare_seq, uint64_t commit_seq,
|
void AddCommitted(uint64_t prepare_seq, uint64_t commit_seq,
|
||||||
bool prepare_skipped = false, uint8_t loop_cnt = 0);
|
bool prepare_skipped = false, uint8_t loop_cnt = 0);
|
||||||
@ -158,7 +158,7 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
};
|
};
|
||||||
|
|
||||||
// Prepare Seq (64 bits) = PAD ... PAD PREP PREP ... PREP INDEX INDEX ...
|
// Prepare Seq (64 bits) = PAD ... PAD PREP PREP ... PREP INDEX INDEX ...
|
||||||
// INDEX Detal Seq (64 bits) = 0 0 0 0 0 0 0 0 0 0 0 0 DELTA DELTA ...
|
// INDEX Delta Seq (64 bits) = 0 0 0 0 0 0 0 0 0 0 0 0 DELTA DELTA ...
|
||||||
// DELTA DELTA Encoded Value = PREP PREP .... PREP PREP DELTA DELTA
|
// DELTA DELTA Encoded Value = PREP PREP .... PREP PREP DELTA DELTA
|
||||||
// ... DELTA DELTA PAD: first bits of a seq that is reserved for tagging and
|
// ... DELTA DELTA PAD: first bits of a seq that is reserved for tagging and
|
||||||
// hence ignored PREP/INDEX: the used bits in a prepare seq number INDEX: the
|
// hence ignored PREP/INDEX: the used bits in a prepare seq number INDEX: the
|
||||||
@ -274,7 +274,7 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
while (!heap_.empty() && !erased_heap_.empty() &&
|
while (!heap_.empty() && !erased_heap_.empty() &&
|
||||||
// heap_.top() > erased_heap_.top() could happen if we have erased
|
// heap_.top() > erased_heap_.top() could happen if we have erased
|
||||||
// a non-existent entry. Ideally the user should not do that but we
|
// a non-existent entry. Ideally the user should not do that but we
|
||||||
// should be resiliant againt it.
|
// should be resilient against it.
|
||||||
heap_.top() >= erased_heap_.top()) {
|
heap_.top() >= erased_heap_.top()) {
|
||||||
if (heap_.top() == erased_heap_.top()) {
|
if (heap_.top() == erased_heap_.top()) {
|
||||||
heap_.pop();
|
heap_.pop();
|
||||||
@ -330,7 +330,7 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
// the time of updating the max. Thread-safety: this function can be called
|
// the time of updating the max. Thread-safety: this function can be called
|
||||||
// concurrently. The concurrent invocations of this function is equivalent to
|
// concurrently. The concurrent invocations of this function is equivalent to
|
||||||
// a serial invocation in which the last invocation is the one with the
|
// a serial invocation in which the last invocation is the one with the
|
||||||
// largetst new_max value.
|
// largest new_max value.
|
||||||
void AdvanceMaxEvictedSeq(const SequenceNumber& prev_max,
|
void AdvanceMaxEvictedSeq(const SequenceNumber& prev_max,
|
||||||
const SequenceNumber& new_max);
|
const SequenceNumber& new_max);
|
||||||
|
|
||||||
@ -342,9 +342,9 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
void ReleaseSnapshotInternal(const SequenceNumber snap_seq);
|
void ReleaseSnapshotInternal(const SequenceNumber snap_seq);
|
||||||
|
|
||||||
// Update the list of snapshots corresponding to the soon-to-be-updated
|
// Update the list of snapshots corresponding to the soon-to-be-updated
|
||||||
// max_eviceted_seq_. Thread-safety: this function can be called concurrently.
|
// max_evicted_seq_. Thread-safety: this function can be called concurrently.
|
||||||
// The concurrent invocations of this function is equivalent to a serial
|
// The concurrent invocations of this function is equivalent to a serial
|
||||||
// invocation in which the last invocation is the one with the largetst
|
// invocation in which the last invocation is the one with the largest
|
||||||
// version value.
|
// version value.
|
||||||
void UpdateSnapshots(const std::vector<SequenceNumber>& snapshots,
|
void UpdateSnapshots(const std::vector<SequenceNumber>& snapshots,
|
||||||
const SequenceNumber& version);
|
const SequenceNumber& version);
|
||||||
@ -383,7 +383,7 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
// Thread-safety is provided with snapshots_mutex_.
|
// Thread-safety is provided with snapshots_mutex_.
|
||||||
std::vector<SequenceNumber> snapshots_;
|
std::vector<SequenceNumber> snapshots_;
|
||||||
// The version of the latest list of snapshots. This can be used to avoid
|
// The version of the latest list of snapshots. This can be used to avoid
|
||||||
// rewrittiing a list that is concurrently updated with a more recent version.
|
// rewriting a list that is concurrently updated with a more recent version.
|
||||||
SequenceNumber snapshots_version_ = 0;
|
SequenceNumber snapshots_version_ = 0;
|
||||||
|
|
||||||
// A heap of prepared transactions. Thread-safety is provided with
|
// A heap of prepared transactions. Thread-safety is provided with
|
||||||
@ -408,7 +408,7 @@ class WritePreparedTxnDB : public PessimisticTransactionDB {
|
|||||||
// maintenance work under the lock.
|
// maintenance work under the lock.
|
||||||
size_t INC_STEP_FOR_MAX_EVICTED = 1;
|
size_t INC_STEP_FOR_MAX_EVICTED = 1;
|
||||||
// A map from old snapshots (expected to be used by a few read-only txns) to
|
// A map from old snapshots (expected to be used by a few read-only txns) to
|
||||||
// prpared sequence number of the evicted entries from commit_cache_ that
|
// prepared sequence number of the evicted entries from commit_cache_ that
|
||||||
// overlaps with such snapshot. These are the prepared sequence numbers that
|
// overlaps with such snapshot. These are the prepared sequence numbers that
|
||||||
// the snapshot, to which they are mapped, cannot assume to be committed just
|
// the snapshot, to which they are mapped, cannot assume to be committed just
|
||||||
// because it is no longer in the commit_cache_. The vector must be sorted
|
// because it is no longer in the commit_cache_. The vector must be sorted
|
||||||
@ -483,7 +483,7 @@ class WritePreparedCommitEntryPreReleaseCallback : public PreReleaseCallback {
|
|||||||
} // else there was no prepare phase
|
} // else there was no prepare phase
|
||||||
if (includes_data_) {
|
if (includes_data_) {
|
||||||
assert(data_batch_cnt_);
|
assert(data_batch_cnt_);
|
||||||
// Commit the data that is accompnaied with the commit request
|
// Commit the data that is accompanied with the commit request
|
||||||
const bool PREPARE_SKIPPED = true;
|
const bool PREPARE_SKIPPED = true;
|
||||||
for (size_t i = 0; i < data_batch_cnt_; i++) {
|
for (size_t i = 0; i < data_batch_cnt_; i++) {
|
||||||
// For commit seq of each batch use the commit seq of the last batch.
|
// For commit seq of each batch use the commit seq of the last batch.
|
||||||
|
Loading…
Reference in New Issue
Block a user