Improve / clean up meta block code & integrity (#9163)

Summary:
* Checksums are now checked on meta blocks unless specifically
suppressed or not applicable (e.g. plain table). (Was other way around.)
This means a number of cases that were not checking checksums now are,
including direct read TableProperties in Version::GetTableProperties
(fixed in meta_blocks ReadTableProperties), reading any block from
PersistentCache (fixed in BlockFetcher), read TableProperties in
SstFileDumper (ldb/sst_dump/BackupEngine) before table reader open,
maybe more.
* For that to work, I moved the global_seqno+TableProperties checksum
logic to the shared table/ code, because that is used by many utilies
such as SstFileDumper.
* Also for that to work, we have to know when we're dealing with a block
that has a checksum (trailer), so added that capability to Footer based
on magic number, and from there BlockFetcher.
* Knowledge of trailer presence has also fixed a problem where other
table formats were reading blocks including bytes for a non-existant
trailer--and awkwardly kind-of not using them, e.g. no shared code
checking checksums. (BlockFetcher compression type was populated
incorrectly.) Now we only read what is needed.
* Minimized code duplication and differing/incompatible/awkward
abstractions in meta_blocks.{cc,h} (e.g. SeekTo in metaindex block
without parsing block handle)
* Moved some meta block handling code from table_properties*.*
* Moved some code specific to block-based table from shared table/ code
to BlockBasedTable class. The checksum stuff means we can't completely
separate it, but things that don't need to be in shared table/ code
should not be.
* Use unique_ptr rather than raw ptr in more places. (Note: you can
std::move from unique_ptr to shared_ptr.)

Without enhancements to GetPropertiesOfAllTablesTest (see below),
net reduction of roughly 100 lines of code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9163

Test Plan:
existing tests and
* Enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to verify that
checksums are now checked on direct read of table properties by TableCache
(new test would fail before this change)
* Also enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to test
putting table properties under old meta name
* Also generally enhanced that same test to actually test what it was
supposed to be testing already, by kicking things out of table cache when
we don't want them there.

Reviewed By: ajkr, mrambacher

Differential Revision: D32514757

Pulled By: pdillinger

fbshipit-source-id: 507964b9311d186ae8d1131182290cbd97a99fa9
This commit is contained in:
Peter Dillinger 2021-11-18 11:42:12 -08:00 committed by Facebook GitHub Bot
parent f4294669e0
commit 230660be73
36 changed files with 507 additions and 517 deletions

View File

@ -17,6 +17,7 @@
* Added input sanitization on negative bytes passed into `GenericRateLimiter::Request`.
* Fixed an assertion failure in CompactionIterator when write-prepared transaction is used. We prove that certain operations can lead to a Delete being followed by a SingleDelete (same user key). We can drop the SingleDelete.
* Fixed a bug of timestamp-based GC which can cause all versions of a key under full_history_ts_low to be dropped. This bug will be triggered when some of the ikeys' timestamps are lower than full_history_ts_low, while others are newer.
* In some cases outside of the DB read and compaction paths, SST block checksums are now checked where they were not before.
* Explicitly check for and disallow the `BlockBasedTableOptions` if insertion into one of {`block_cache`, `block_cache_compressed`, `persistent_cache`} can show up in another of these. (RocksDB expects to be able to use the same key for different physical data among tiers.)
### Behavior Changes
@ -29,6 +30,7 @@
* Made FileSystem extend the Customizable class and added a CreateFromString method. Implementations need to be registered with the ObjectRegistry and to implement a Name() method in order to be created via this method.
* Clarified in API comments that RocksDB is not exception safe for callbacks and custom extensions. An exception propagating into RocksDB can lead to undefined behavior, including data loss, unreported corruption, deadlocks, and more.
* Marked `WriteBufferManager` as `final` because it is not intended for extension.
* Removed unimportant implementation details from table_properties.h
* Add API `FSDirectory::FsyncWithDirOptions()`, which provides extra information like directory fsync reason in `DirFsyncOptions`. File system like btrfs is using that to skip directory fsync for creating a new file, or when renaming a file, fsync the target file instead of the directory, which improves the `DB::Open()` speed by ~20%.
* `DB::Open()` is not going be blocked by obsolete file purge if `DBOptions::avoid_unnecessary_blocking_io` is set to true.
* In builds where glibc provides `gettid()`, info log ("LOG" file) lines now print a system-wide thread ID from `gettid()` instead of the process-local `pthread_self()`. For all users, the thread ID format is changed from hexadecimal to decimal integer.

View File

@ -306,6 +306,13 @@ ifdef COMPILE_WITH_ASAN
EXEC_LDFLAGS += -fsanitize=address
PLATFORM_CCFLAGS += -fsanitize=address
PLATFORM_CXXFLAGS += -fsanitize=address
ifeq ($(LIB_MODE),shared)
ifdef USE_CLANG
# Fix false ODR violation; see https://github.com/google/sanitizers/issues/1017
EXEC_LDFLAGS += -mllvm -asan-use-private-alias=1
PLATFORM_CXXFLAGS += -mllvm -asan-use-private-alias=1
endif
endif
endif
# TSAN doesn't work well with jemalloc. If we're compiling with TSAN, we should use regular malloc.

View File

@ -544,7 +544,7 @@ TEST_F(CorruptionTest, RangeDeletionCorrupted) {
fs->GetFileSize(filename, file_opts.io_options, &file_size, nullptr));
BlockHandle range_del_handle;
ASSERT_OK(FindMetaBlock(
ASSERT_OK(FindMetaBlockInFile(
file_reader.get(), file_size, kBlockBasedTableMagicNumber,
ImmutableOptions(options_), kRangeDelBlock, &range_del_handle));

View File

@ -14,8 +14,10 @@
#include "port/port.h"
#include "port/stack_trace.h"
#include "rocksdb/db.h"
#include "rocksdb/types.h"
#include "rocksdb/utilities/table_properties_collectors.h"
#include "table/format.h"
#include "table/meta_blocks.h"
#include "test_util/testharness.h"
#include "test_util/testutil.h"
#include "util/random.h"
@ -63,21 +65,49 @@ class DBTablePropertiesTest : public DBTestBase,
TEST_F(DBTablePropertiesTest, GetPropertiesOfAllTablesTest) {
Options options = CurrentOptions();
options.level0_file_num_compaction_trigger = 8;
// Part of strategy to prevent pinning table files
options.max_open_files = 42;
Reopen(options);
// Create 4 tables
for (int table = 0; table < 4; ++table) {
// Use old meta name for table properties for one file
if (table == 3) {
SyncPoint::GetInstance()->SetCallBack(
"BlockBasedTableBuilder::WritePropertiesBlock:Meta", [&](void* meta) {
*reinterpret_cast<const std::string**>(meta) =
&kPropertiesBlockOldName;
});
SyncPoint::GetInstance()->EnableProcessing();
}
// Build file
for (int i = 0; i < 10 + table; ++i) {
ASSERT_OK(db_->Put(WriteOptions(), ToString(table * 100 + i), "val"));
}
ASSERT_OK(db_->Flush(FlushOptions()));
}
SyncPoint::GetInstance()->DisableProcessing();
std::string original_session_id;
ASSERT_OK(db_->GetDbSessionId(original_session_id));
// Part of strategy to prevent pinning table files
SyncPoint::GetInstance()->SetCallBack(
"VersionEditHandler::LoadTables:skip_load_table_files",
[&](void* skip_load) { *reinterpret_cast<bool*>(skip_load) = true; });
SyncPoint::GetInstance()->EnableProcessing();
// 1. Read table properties directly from file
Reopen(options);
// Clear out auto-opened files
dbfull()->TEST_table_cache()->EraseUnRefEntries();
ASSERT_EQ(dbfull()->TEST_table_cache()->GetUsage(), 0U);
VerifyTableProperties(db_, 10 + 11 + 12 + 13);
// 2. Put two tables to table cache and
Reopen(options);
// Clear out auto-opened files
dbfull()->TEST_table_cache()->EraseUnRefEntries();
ASSERT_EQ(dbfull()->TEST_table_cache()->GetUsage(), 0U);
// fetch key from 1st and 2nd table, which will internally place that table to
// the table cache.
for (int i = 0; i < 2; ++i) {
@ -88,12 +118,57 @@ TEST_F(DBTablePropertiesTest, GetPropertiesOfAllTablesTest) {
// 3. Put all tables to table cache
Reopen(options);
// fetch key from 1st and 2nd table, which will internally place that table to
// the table cache.
// fetch key from all tables, which will place them in table cache.
for (int i = 0; i < 4; ++i) {
Get(ToString(i * 100 + 0));
}
VerifyTableProperties(db_, 10 + 11 + 12 + 13);
// 4. Try to read CORRUPT properties (a) directly from file, and (b)
// through reader on Get
// It's not practical to prevent table file read on Open, so we
// corrupt after open and after purging table cache.
for (bool direct : {true, false}) {
Reopen(options);
// Clear out auto-opened files
dbfull()->TEST_table_cache()->EraseUnRefEntries();
ASSERT_EQ(dbfull()->TEST_table_cache()->GetUsage(), 0U);
TablePropertiesCollection props;
ASSERT_OK(db_->GetPropertiesOfAllTables(&props));
std::string sst_file = props.begin()->first;
// Corrupt the file's TableProperties using session id
std::string contents;
ASSERT_OK(
ReadFileToString(env_->GetFileSystem().get(), sst_file, &contents));
size_t pos = contents.find(original_session_id);
ASSERT_NE(pos, std::string::npos);
ASSERT_OK(test::CorruptFile(env_, sst_file, static_cast<int>(pos), 1,
/*verify checksum fails*/ false));
// Try to read CORRUPT properties
if (direct) {
ASSERT_TRUE(db_->GetPropertiesOfAllTables(&props).IsCorruption());
} else {
bool found_corruption = false;
for (int i = 0; i < 4; ++i) {
std::string result = Get(ToString(i * 100 + 0));
if (result.find_first_of("Corruption: block checksum mismatch") !=
std::string::npos) {
found_corruption = true;
}
}
ASSERT_TRUE(found_corruption);
}
// UN-corrupt file for next iteration
ASSERT_OK(test::CorruptFile(env_, sst_file, static_cast<int>(pos), 1,
/*verify checksum fails*/ false));
}
SyncPoint::GetInstance()->DisableProcessing();
}
TEST_F(DBTablePropertiesTest, CreateOnDeletionCollectorFactory) {

View File

@ -266,24 +266,22 @@ class TestPlainTableReader : public PlainTableReader {
const EnvOptions& env_options, const InternalKeyComparator& icomparator,
EncodingType encoding_type, uint64_t file_size, int bloom_bits_per_key,
double hash_table_ratio, size_t index_sparseness,
const TableProperties* table_properties,
std::unique_ptr<TableProperties>&& props,
std::unique_ptr<RandomAccessFileReader>&& file,
const ImmutableOptions& ioptions, const SliceTransform* prefix_extractor,
bool* expect_bloom_not_match, bool store_index_in_file,
uint32_t column_family_id, const std::string& column_family_name)
: PlainTableReader(ioptions, std::move(file), env_options, icomparator,
encoding_type, file_size, table_properties,
encoding_type, file_size, props.get(),
prefix_extractor),
expect_bloom_not_match_(expect_bloom_not_match) {
Status s = MmapDataIfNeeded();
EXPECT_TRUE(s.ok());
s = PopulateIndex(const_cast<TableProperties*>(table_properties),
bloom_bits_per_key, hash_table_ratio, index_sparseness,
2 * 1024 * 1024);
s = PopulateIndex(props.get(), bloom_bits_per_key, hash_table_ratio,
index_sparseness, 2 * 1024 * 1024);
EXPECT_TRUE(s.ok());
TableProperties* props = const_cast<TableProperties*>(table_properties);
EXPECT_EQ(column_family_id, static_cast<uint32_t>(props->column_family_id));
EXPECT_EQ(column_family_name, props->column_family_name);
if (store_index_in_file) {
@ -297,7 +295,7 @@ class TestPlainTableReader : public PlainTableReader {
EXPECT_TRUE(num_blocks_ptr != props->user_collected_properties.end());
}
}
table_properties_.reset(props);
table_properties_ = std::move(props);
}
~TestPlainTableReader() override {}
@ -337,26 +335,24 @@ class TestPlainTableFactory : public PlainTableFactory {
std::unique_ptr<RandomAccessFileReader>&& file, uint64_t file_size,
std::unique_ptr<TableReader>* table,
bool /*prefetch_index_and_filter_in_cache*/) const override {
TableProperties* props = nullptr;
auto s =
ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions, &props,
true /* compression_type_missing */);
std::unique_ptr<TableProperties> props;
auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions, &props);
EXPECT_TRUE(s.ok());
if (store_index_in_file_) {
BlockHandle bloom_block_handle;
s = FindMetaBlock(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions,
BloomBlockBuilder::kBloomBlock, &bloom_block_handle,
/* compression_type_missing */ true);
s = FindMetaBlockInFile(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions,
BloomBlockBuilder::kBloomBlock,
&bloom_block_handle);
EXPECT_TRUE(s.ok());
BlockHandle index_block_handle;
s = FindMetaBlock(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions,
PlainTableIndexBuilder::kPlainTableIndexBlock,
&index_block_handle, /* compression_type_missing */ true);
s = FindMetaBlockInFile(file.get(), file_size, kPlainTableMagicNumber,
table_reader_options.ioptions,
PlainTableIndexBuilder::kPlainTableIndexBlock,
&index_block_handle);
EXPECT_TRUE(s.ok());
}
@ -370,8 +366,8 @@ class TestPlainTableFactory : public PlainTableFactory {
std::unique_ptr<PlainTableReader> new_reader(new TestPlainTableReader(
table_reader_options.env_options,
table_reader_options.internal_comparator, encoding_type, file_size,
bloom_bits_per_key_, hash_table_ratio_, index_sparseness_, props,
std::move(file), table_reader_options.ioptions,
bloom_bits_per_key_, hash_table_ratio_, index_sparseness_,
std::move(props), std::move(file), table_reader_options.ioptions,
table_reader_options.prefix_extractor, expect_bloom_not_match_,
store_index_in_file_, column_family_id_, column_family_name_));

View File

@ -655,6 +655,16 @@ size_t TableCache::GetMemoryUsageByTableReader(
return ret;
}
bool TableCache::HasEntry(Cache* cache, uint64_t file_number) {
Cache::Handle* handle = cache->Lookup(GetSliceForFileNumber(&file_number));
if (handle) {
cache->Release(handle);
return true;
} else {
return false;
}
}
void TableCache::Evict(Cache* cache, uint64_t file_number) {
cache->Erase(GetSliceForFileNumber(&file_number));
}

View File

@ -125,6 +125,9 @@ class TableCache {
// Evict any entry for the specified file number
static void Evict(Cache* cache, uint64_t file_number);
// Query whether specified file number is currently in cache
static bool HasEntry(Cache* cache, uint64_t file_number);
// Clean table handle and erase it from the table cache
// Used in DB close, or the file is not live anymore.
void EraseHandle(const FileDescriptor& fd, Cache::Handle* handle);

View File

@ -291,11 +291,9 @@ void TestCustomizedTablePropertiesCollector(
std::unique_ptr<RandomAccessFileReader> fake_file_reader(
new RandomAccessFileReader(std::move(source), "test"));
TableProperties* props;
std::unique_ptr<TableProperties> props;
Status s = ReadTableProperties(fake_file_reader.get(), fwf->contents().size(),
magic_number, ioptions, &props,
true /* compression_type_missing */);
std::unique_ptr<TableProperties> props_guard(props);
magic_number, ioptions, &props);
ASSERT_OK(s);
auto user_collected = props->user_collected_properties;
@ -432,13 +430,11 @@ void TestInternalKeyPropertiesCollector(
std::unique_ptr<RandomAccessFileReader> reader(
new RandomAccessFileReader(std::move(source), "test"));
TableProperties* props;
Status s =
ReadTableProperties(reader.get(), fwf->contents().size(), magic_number,
ioptions, &props, true /* compression_type_missing */);
std::unique_ptr<TableProperties> props;
Status s = ReadTableProperties(reader.get(), fwf->contents().size(),
magic_number, ioptions, &props);
ASSERT_OK(s);
std::unique_ptr<TableProperties> props_guard(props);
auto user_collected = props->user_collected_properties;
uint64_t deleted = GetDeletedKeys(user_collected);
ASSERT_EQ(5u, deleted); // deletes + single-deletes

View File

@ -517,7 +517,11 @@ Status VersionEditHandler::MaybeCreateVersion(const VersionEdit& /*edit*/,
Status VersionEditHandler::LoadTables(ColumnFamilyData* cfd,
bool prefetch_index_and_filter_in_cache,
bool is_initial_load) {
if (skip_load_table_files_) {
bool skip_load_table_files = skip_load_table_files_;
TEST_SYNC_POINT_CALLBACK(
"VersionEditHandler::LoadTables:skip_load_table_files",
&skip_load_table_files);
if (skip_load_table_files) {
return Status::OK();
}
assert(cfd != nullptr);

View File

@ -1271,7 +1271,6 @@ Status Version::GetTableProperties(std::shared_ptr<const TableProperties>* tp,
return s;
}
TableProperties* raw_table_properties;
// By setting the magic number to kInvalidTableMagicNumber, we can by
// pass the magic number check in the footer.
std::unique_ptr<RandomAccessFileReader> file_reader(
@ -1279,16 +1278,16 @@ Status Version::GetTableProperties(std::shared_ptr<const TableProperties>* tp,
std::move(file), file_name, nullptr /* env */, io_tracer_,
nullptr /* stats */, 0 /* hist_type */, nullptr /* file_read_hist */,
nullptr /* rate_limiter */, ioptions->listeners));
std::unique_ptr<TableProperties> props;
s = ReadTableProperties(
file_reader.get(), file_meta->fd.GetFileSize(),
Footer::kInvalidTableMagicNumber /* table's magic number */, *ioptions,
&raw_table_properties, false /* compression_type_missing */);
&props);
if (!s.ok()) {
return s;
}
*tp = std::move(props);
RecordTick(ioptions->stats, NUMBER_DIRECT_LOAD_TABLE_PROPERTIES);
*tp = std::shared_ptr<const TableProperties>(raw_table_properties);
return s;
}

View File

@ -71,10 +71,6 @@ struct TablePropertiesNames {
static const std::string kFastCompressionEstimatedDataSize;
};
extern const std::string kPropertiesBlock;
extern const std::string kCompressionDictBlock;
extern const std::string kRangeDelBlock;
// `TablePropertiesCollector` provides the mechanism for users to collect
// their own properties that they are interested in. This class is essentially
// a collection of callback functions that will be invoked during table

View File

@ -46,6 +46,7 @@
#include "table/block_based/full_filter_block.h"
#include "table/block_based/partitioned_filter_block.h"
#include "table/format.h"
#include "table/meta_blocks.h"
#include "table/table_builder.h"
#include "util/coding.h"
#include "util/compression.h"
@ -62,6 +63,8 @@ extern const std::string kHashIndexPrefixesMetadataBlock;
// Without anonymous namespace here, we fail the warning -Wmissing-prototypes
namespace {
constexpr size_t kBlockTrailerSize = BlockBasedTable::kBlockTrailerSize;
// Create a filter block builder based on its type.
FilterBlockBuilder* CreateFilterBlockBuilder(
const ImmutableCFOptions& /*opt*/, const MutableCFOptions& mopt,
@ -1722,7 +1725,12 @@ void BlockBasedTableBuilder::WritePropertiesBlock(
&props_block_size);
}
#endif // !NDEBUG
meta_index_builder->Add(kPropertiesBlock, properties_block_handle);
const std::string* properties_block_meta = &kPropertiesBlock;
TEST_SYNC_POINT_CALLBACK(
"BlockBasedTableBuilder::WritePropertiesBlock:Meta",
&properties_block_meta);
meta_index_builder->Add(*properties_block_meta, properties_block_handle);
}
}

View File

@ -17,6 +17,7 @@
#include "cache/cache_entry_roles.h"
#include "cache/sharded_cache.h"
#include "db/compaction/compaction_picker.h"
#include "db/dbformat.h"
#include "db/pinned_iterators_manager.h"
#include "file/file_prefetch_buffer.h"
@ -747,75 +748,27 @@ Status BlockBasedTable::PrefetchTail(
return s;
}
Status BlockBasedTable::TryReadPropertiesWithGlobalSeqno(
const ReadOptions& ro, FilePrefetchBuffer* prefetch_buffer,
const Slice& handle_value, TableProperties** table_properties) {
assert(table_properties != nullptr);
// If this is an external SST file ingested with write_global_seqno set to
// true, then we expect the checksum mismatch because checksum was written
// by SstFileWriter, but its global seqno in the properties block may have
// been changed during ingestion. In this case, we read the properties
// block, copy it to a memory buffer, change the global seqno to its
// original value, i.e. 0, and verify the checksum again.
BlockHandle props_block_handle;
CacheAllocationPtr tmp_buf;
Status s = ReadProperties(ro, handle_value, rep_->file.get(), prefetch_buffer,
rep_->footer, rep_->ioptions, table_properties,
false /* verify_checksum */, &props_block_handle,
&tmp_buf, false /* compression_type_missing */,
nullptr /* memory_allocator */);
if (s.ok() && tmp_buf) {
const auto seqno_pos_iter =
(*table_properties)
->properties_offsets.find(
ExternalSstFilePropertyNames::kGlobalSeqno);
size_t block_size = static_cast<size_t>(props_block_handle.size());
if (seqno_pos_iter != (*table_properties)->properties_offsets.end()) {
uint64_t global_seqno_offset = seqno_pos_iter->second;
EncodeFixed64(
tmp_buf.get() + global_seqno_offset - props_block_handle.offset(), 0);
}
s = ROCKSDB_NAMESPACE::VerifyBlockChecksum(
rep_->footer.checksum(), tmp_buf.get(), block_size,
rep_->file->file_name(), props_block_handle.offset());
}
return s;
}
Status BlockBasedTable::ReadPropertiesBlock(
const ReadOptions& ro, FilePrefetchBuffer* prefetch_buffer,
InternalIterator* meta_iter, const SequenceNumber largest_seqno) {
bool found_properties_block = true;
Status s;
s = SeekToPropertiesBlock(meta_iter, &found_properties_block);
BlockHandle handle;
s = FindOptionalMetaBlock(meta_iter, kPropertiesBlock, &handle);
if (!s.ok()) {
ROCKS_LOG_WARN(rep_->ioptions.logger,
"Error when seeking to properties block from file: %s",
s.ToString().c_str());
} else if (found_properties_block) {
} else if (!handle.IsNull()) {
s = meta_iter->status();
TableProperties* table_properties = nullptr;
std::unique_ptr<TableProperties> table_properties;
if (s.ok()) {
s = ReadProperties(
ro, meta_iter->value(), rep_->file.get(), prefetch_buffer,
rep_->footer, rep_->ioptions, &table_properties,
true /* verify_checksum */, nullptr /* ret_block_handle */,
nullptr /* ret_block_contents */,
false /* compression_type_missing */, nullptr /* memory_allocator */);
s = ReadTablePropertiesHelper(
ro, handle, rep_->file.get(), prefetch_buffer, rep_->footer,
rep_->ioptions, &table_properties, nullptr /* memory_allocator */);
}
IGNORE_STATUS_IF_ERROR(s);
if (s.IsCorruption()) {
s = TryReadPropertiesWithGlobalSeqno(
ro, prefetch_buffer, meta_iter->value(), &table_properties);
IGNORE_STATUS_IF_ERROR(s);
}
std::unique_ptr<TableProperties> props_guard;
if (table_properties != nullptr) {
props_guard.reset(table_properties);
}
if (!s.ok()) {
ROCKS_LOG_WARN(rep_->ioptions.logger,
"Encountered error while reading data from properties "
@ -823,7 +776,7 @@ Status BlockBasedTable::ReadPropertiesBlock(
s.ToString().c_str());
} else {
assert(table_properties != nullptr);
rep_->table_properties.reset(props_guard.release());
rep_->table_properties = std::move(table_properties);
rep_->blocks_maybe_compressed =
rep_->table_properties->compression_name !=
CompressionTypeToString(kNoCompression);
@ -898,15 +851,14 @@ Status BlockBasedTable::ReadRangeDelBlock(
const InternalKeyComparator& internal_comparator,
BlockCacheLookupContext* lookup_context) {
Status s;
bool found_range_del_block;
BlockHandle range_del_handle;
s = SeekToRangeDelBlock(meta_iter, &found_range_del_block, &range_del_handle);
s = FindOptionalMetaBlock(meta_iter, kRangeDelBlock, &range_del_handle);
if (!s.ok()) {
ROCKS_LOG_WARN(
rep_->ioptions.logger,
"Error when seeking to range delete tombstones block from file: %s",
s.ToString().c_str());
} else if (found_range_del_block && !range_del_handle.IsNull()) {
} else if (!range_del_handle.IsNull()) {
std::unique_ptr<InternalIterator> iter(NewDataBlockIterator<DataBlockIter>(
read_options, range_del_handle,
/*input_iter=*/nullptr, BlockType::kRangeDeletion,
@ -969,9 +921,8 @@ Status BlockBasedTable::PrefetchIndexAndFilterBlocks(
rep_->index_type == BlockBasedTableOptions::kTwoLevelIndexSearch);
// Find compression dictionary handle
bool found_compression_dict = false;
s = SeekToCompressionDictBlock(meta_iter, &found_compression_dict,
&rep_->compression_dict_handle);
s = FindOptionalMetaBlock(meta_iter, kCompressionDictBlock,
&rep_->compression_dict_handle);
if (!s.ok()) {
return s;
}
@ -1243,7 +1194,7 @@ Status BlockBasedTable::GetDataBlockFromCache(
RecordTick(statistics, BLOCK_CACHE_COMPRESSED_HIT);
compressed_block = reinterpret_cast<BlockContents*>(
block_cache_compressed->Value(block_cache_compressed_handle));
CompressionType compression_type = compressed_block->get_compression_type();
CompressionType compression_type = GetBlockCompressionType(*compressed_block);
assert(compression_type != kNoCompression);
// Retrieve the uncompressed contents into a new buffer
@ -1523,8 +1474,9 @@ Status BlockBasedTable::MaybeReadBlockAndLoadToCache(
// Update the block details so that PrefetchBuffer can use the read
// pattern to determine if reads are sequential or not for
// prefetching. It should also take in account blocks read from cache.
prefetch_buffer->UpdateReadPattern(
handle.offset(), block_size(handle), ro.adaptive_readahead);
prefetch_buffer->UpdateReadPattern(handle.offset(),
BlockSizeWithTrailer(handle),
ro.adaptive_readahead);
}
}
}
@ -1569,7 +1521,7 @@ Status BlockBasedTable::MaybeReadBlockAndLoadToCache(
}
}
} else {
raw_block_comp_type = contents->get_compression_type();
raw_block_comp_type = GetBlockCompressionType(*contents);
}
if (s.ok()) {
@ -1731,7 +1683,7 @@ void BlockBasedTable::RetrieveMultipleBlocks(
if (use_shared_buffer && !file->use_direct_io() &&
prev_end == handle.offset()) {
req_offset_for_block.emplace_back(prev_len);
prev_len += block_size(handle);
prev_len += BlockSizeWithTrailer(handle);
} else {
// No compression or current block and previous one is not adjacent:
// Step 1, create a new request for previous blocks
@ -1752,13 +1704,13 @@ void BlockBasedTable::RetrieveMultipleBlocks(
// Step 2, remeber the previous block info
prev_offset = handle.offset();
prev_len = block_size(handle);
prev_len = BlockSizeWithTrailer(handle);
req_offset_for_block.emplace_back(0);
}
req_idx_for_block.emplace_back(read_reqs.size());
PERF_COUNTER_ADD(block_read_count, 1);
PERF_COUNTER_ADD(block_read_byte, block_size(handle));
PERF_COUNTER_ADD(block_read_byte, BlockSizeWithTrailer(handle));
}
// Handle the last block and process the pending last request
if (prev_len != 0) {
@ -1815,7 +1767,7 @@ void BlockBasedTable::RetrieveMultipleBlocks(
Status s = req.status;
if (s.ok()) {
if ((req.result.size() != req.len) ||
(req_offset + block_size(handle) > req.result.size())) {
(req_offset + BlockSizeWithTrailer(handle) > req.result.size())) {
s = Status::Corruption(
"truncated block read from " + rep_->file->file_name() +
" offset " + ToString(handle.offset()) + ", expected " +
@ -1829,7 +1781,7 @@ void BlockBasedTable::RetrieveMultipleBlocks(
// We allocated a buffer for this block. Give ownership of it to
// BlockContents so it can free the memory
assert(req.result.data() == req.scratch);
assert(req.result.size() == block_size(handle));
assert(req.result.size() == BlockSizeWithTrailer(handle));
assert(req_offset == 0);
std::unique_ptr<char[]> raw_block(req.scratch);
raw_block_contents = BlockContents(std::move(raw_block), handle.size());
@ -1852,9 +1804,9 @@ void BlockBasedTable::RetrieveMultipleBlocks(
// begin address of each read request, we need to add the offset
// in each read request. Checksum is stored in the block trailer,
// beyond the payload size.
s = ROCKSDB_NAMESPACE::VerifyBlockChecksum(
footer.checksum(), data + req_offset, handle.size(),
rep_->file->file_name(), handle.offset());
s = VerifyBlockChecksum(footer.checksum(), data + req_offset,
handle.size(), rep_->file->file_name(),
handle.offset());
TEST_SYNC_POINT_CALLBACK("RetrieveMultipleBlocks:VerifyChecksum", &s);
}
} else if (!use_shared_buffer) {
@ -1875,11 +1827,12 @@ void BlockBasedTable::RetrieveMultipleBlocks(
// In all other cases, the raw block is either uncompressed into a heap
// buffer or there is no cache at all.
CompressionType compression_type =
raw_block_contents.get_compression_type();
GetBlockCompressionType(raw_block_contents);
if (use_shared_buffer && (compression_type == kNoCompression ||
(compression_type != kNoCompression &&
rep_->table_options.block_cache_compressed))) {
Slice raw = Slice(req.result.data() + req_offset, block_size(handle));
Slice raw =
Slice(req.result.data() + req_offset, BlockSizeWithTrailer(handle));
raw_block_contents = BlockContents(
CopyBufferToHeap(GetMemoryAllocator(rep_->table_options), raw),
handle.size());
@ -1913,7 +1866,7 @@ void BlockBasedTable::RetrieveMultipleBlocks(
}
CompressionType compression_type =
raw_block_contents.get_compression_type();
GetBlockCompressionType(raw_block_contents);
BlockContents contents;
if (compression_type != kNoCompression) {
UncompressionContext context(compression_type);
@ -2669,7 +2622,7 @@ void BlockBasedTable::MultiGet(const ReadOptions& read_options,
}
} else {
block_handles.emplace_back(handle);
total_len += block_size(handle);
total_len += BlockSizeWithTrailer(handle);
}
}
@ -2690,7 +2643,7 @@ void BlockBasedTable::MultiGet(const ReadOptions& read_options,
// or a false positive. We need to read the data block from
// the SST file
results[i].Reset();
total_len += block_size(block_handles[i]);
total_len += BlockSizeWithTrailer(block_handles[i]);
} else {
block_handles[i] = BlockHandle::NullBlockHandle();
}
@ -3088,20 +3041,22 @@ Status BlockBasedTable::VerifyChecksumInMetaBlocks(
s = handle.DecodeFrom(&input);
BlockContents contents;
const Slice meta_block_name = index_iter->key();
BlockFetcher block_fetcher(
rep_->file.get(), nullptr /* prefetch buffer */, rep_->footer,
ReadOptions(), handle, &contents, rep_->ioptions,
false /* decompress */, false /*maybe_compressed*/,
GetBlockTypeForMetaBlockByName(meta_block_name),
UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options);
s = block_fetcher.ReadBlockContents();
if (s.IsCorruption() && meta_block_name == kPropertiesBlock) {
TableProperties* table_properties;
ReadOptions ro;
s = TryReadPropertiesWithGlobalSeqno(ro, nullptr /* prefetch_buffer */,
index_iter->value(),
&table_properties);
delete table_properties;
if (meta_block_name == kPropertiesBlock) {
// Unfortunate special handling for properties block checksum w/
// global seqno
std::unique_ptr<TableProperties> table_properties;
s = ReadTablePropertiesHelper(ReadOptions(), handle, rep_->file.get(),
nullptr /* prefetch_buffer */, rep_->footer,
rep_->ioptions, &table_properties,
nullptr /* memory_allocator */);
} else {
s = BlockFetcher(
rep_->file.get(), nullptr /* prefetch buffer */, rep_->footer,
ReadOptions(), handle, &contents, rep_->ioptions,
false /* decompress */, false /*maybe_compressed*/,
GetBlockTypeForMetaBlockByName(meta_block_name),
UncompressionDict::GetEmptyDict(), rep_->persistent_cache_options)
.ReadBlockContents();
}
if (!s.ok()) {
break;

View File

@ -19,6 +19,7 @@
#include "table/block_based/cachable_entry.h"
#include "table/block_based/filter_block.h"
#include "table/block_based/uncompression_dict_reader.h"
#include "table/format.h"
#include "table/table_properties_internal.h"
#include "table/table_reader.h"
#include "table/two_level_iterator.h"
@ -68,6 +69,9 @@ class BlockBasedTable : public TableReader {
static const size_t kInitAutoReadaheadSize = 8 * 1024;
static const int kMinNumFileReadsToStartAutoReadahead = 2;
// 1-byte compression type + 32-bit checksum
static constexpr size_t kBlockTrailerSize = 5;
// Attempt to open the table that is stored in bytes [0..file_size)
// of "file", and read the metadata entries necessary to allow
// retrieving data from the table.
@ -224,6 +228,25 @@ class BlockBasedTable : public TableReader {
bool redundant,
Statistics* const statistics);
// Get the size to read from storage for a BlockHandle. size_t because we
// are about to load into memory.
static inline size_t BlockSizeWithTrailer(const BlockHandle& handle) {
return static_cast<size_t>(handle.size() + kBlockTrailerSize);
}
// It's the caller's responsibility to make sure that this is
// for raw block contents, which contains the compression
// byte in the end.
static inline CompressionType GetBlockCompressionType(const char* block_data,
size_t block_size) {
return static_cast<CompressionType>(block_data[block_size]);
}
static inline CompressionType GetBlockCompressionType(
const BlockContents& contents) {
assert(contents.is_raw_block);
return GetBlockCompressionType(contents.data.data(), contents.data.size());
}
// Retrieve all key value pairs from data blocks in the table.
// The key retrieved are internal keys.
Status GetKVPairsFromDataBlocks(std::vector<KVPairBlock>* kv_pair_blocks);
@ -431,10 +454,6 @@ class BlockBasedTable : public TableReader {
FilePrefetchBuffer* prefetch_buffer,
std::unique_ptr<Block>* metaindex_block,
std::unique_ptr<InternalIterator>* iter);
Status TryReadPropertiesWithGlobalSeqno(const ReadOptions& ro,
FilePrefetchBuffer* prefetch_buffer,
const Slice& handle_value,
TableProperties** table_properties);
Status ReadPropertiesBlock(const ReadOptions& ro,
FilePrefetchBuffer* prefetch_buffer,
InternalIterator* meta_iter,

View File

@ -8,6 +8,8 @@
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#include "table/block_based/block_prefetcher.h"
#include "table/block_based/block_based_table_reader.h"
namespace ROCKSDB_NAMESPACE {
void BlockPrefetcher::PrefetchIfNeeded(const BlockBasedTable::Rep* rep,
const BlockHandle& handle,
@ -36,7 +38,7 @@ void BlockPrefetcher::PrefetchIfNeeded(const BlockBasedTable::Rep* rep,
return;
}
size_t len = static_cast<size_t>(block_size(handle));
size_t len = BlockBasedTable::BlockSizeWithTrailer(handle);
size_t offset = handle.offset();
// If FS supports prefetching (readahead_limit_ will be non zero in that case)
@ -80,8 +82,9 @@ void BlockPrefetcher::PrefetchIfNeeded(const BlockBasedTable::Rep* rep,
// If prefetch is not supported, fall back to use internal prefetch buffer.
// Discarding other return status of Prefetch calls intentionally, as
// we can fallback to reading from disk if Prefetch fails.
Status s = rep->file->Prefetch(handle.offset(),
block_size(handle) + readahead_size_);
Status s = rep->file->Prefetch(
handle.offset(),
BlockBasedTable::BlockSizeWithTrailer(handle) + readahead_size_);
if (s.IsNotSupported()) {
rep->CreateFilePrefetchBufferIfNotExists(initial_auto_readahead_size_,
max_auto_readahead_size,

View File

@ -4,7 +4,10 @@
// (found in the LICENSE.Apache file in the root directory).
//
#include "table/block_based/block.h"
#include <stdio.h>
#include <algorithm>
#include <set>
#include <string>
@ -20,7 +23,7 @@
#include "rocksdb/iterator.h"
#include "rocksdb/slice_transform.h"
#include "rocksdb/table.h"
#include "table/block_based/block.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/block_builder.h"
#include "table/format.h"
#include "test_util/testharness.h"
@ -520,7 +523,7 @@ void GenerateRandomIndexEntries(std::vector<std::string> *separators,
separators->emplace_back(*it++);
uint64_t size = rnd.Uniform(1024 * 16);
BlockHandle handle(offset, size);
offset += size + kBlockTrailerSize;
offset += size + BlockBasedTable::kBlockTrailerSize;
block_handles->emplace_back(handle);
}
}

View File

@ -19,6 +19,7 @@
#include "logging/logging.h"
#include "rocksdb/slice.h"
#include "table/block_based/block_based_filter_block.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/filter_policy_internal.h"
#include "table/block_based/full_filter_block.h"
#include "third-party/folly/folly/ConstexprMath.h"
@ -158,7 +159,7 @@ class XXPH3FilterBitsBuilder : public BuiltinFilterBitsBuilder {
// Filter blocks are loaded into block cache with their block trailer.
// We need to make sure that's accounted for in choosing a
// fragmentation-friendly size.
const size_t kExtraPadding = kBlockTrailerSize;
const size_t kExtraPadding = BlockBasedTable::kBlockTrailerSize;
size_t requested = rv + kExtraPadding;
// Allocate and get usable size

View File

@ -11,6 +11,7 @@
#include "rocksdb/options.h"
#include "rocksdb/slice.h"
#include "rocksdb/utilities/customizable_util.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/block_builder.h"
#include "table/block_based/flush_block_policy.h"
#include "table/format.h"
@ -62,7 +63,7 @@ class FlushBlockBySizePolicy : public FlushBlockPolicy {
data_block_builder_.EstimateSizeAfterKV(key, value);
if (align_) {
estimated_size_after += kBlockTrailerSize;
estimated_size_after += BlockBasedTable::kBlockTrailerSize;
return estimated_size_after > block_size_;
}

View File

@ -475,7 +475,8 @@ Status PartitionedFilterBlockReader::CacheDependencies(const ReadOptions& ro,
// Read the last block's offset
biter.SeekToLast();
handle = biter.value().handle;
uint64_t last_off = handle.offset() + handle.size() + kBlockTrailerSize;
uint64_t last_off =
handle.offset() + handle.size() + BlockBasedTable::kBlockTrailerSize;
uint64_t prefetch_len = last_off - prefetch_off;
std::unique_ptr<FilePrefetchBuffer> prefetch_buffer;
rep->CreateFilePrefetchBuffer(0, 0, &prefetch_buffer,

View File

@ -9,6 +9,7 @@
#include "table/block_based/partitioned_index_reader.h"
#include "file/random_access_file_reader.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/partitioned_index_iterator.h"
namespace ROCKSDB_NAMESPACE {
@ -146,7 +147,8 @@ Status PartitionIndexReader::CacheDependencies(const ReadOptions& ro,
return biter.status();
}
handle = biter.value().handle;
uint64_t last_off = handle.offset() + block_size(handle);
uint64_t last_off =
handle.offset() + BlockBasedTable::BlockSizeWithTrailer(handle);
uint64_t prefetch_len = last_off - prefetch_off;
std::unique_ptr<FilePrefetchBuffer> prefetch_buffer;
rep->CreateFilePrefetchBuffer(0, 0, &prefetch_buffer,

View File

@ -22,6 +22,7 @@ void ForceReleaseCachedEntry(void* arg, void* h) {
cache->Release(handle, true /* force_erase */);
}
// WART: this is specific to block-based table
Status VerifyBlockChecksum(ChecksumType type, const char* data,
size_t block_size, const std::string& file_name,
uint64_t offset) {

View File

@ -15,9 +15,11 @@
#include "logging/logging.h"
#include "memory/memory_allocator.h"
#include "monitoring/perf_context_imp.h"
#include "rocksdb/compression_type.h"
#include "rocksdb/env.h"
#include "table/block_based/block.h"
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/block_type.h"
#include "table/block_based/reader_common.h"
#include "table/format.h"
#include "table/persistent_cache_helper.h"
@ -26,12 +28,19 @@
namespace ROCKSDB_NAMESPACE {
inline void BlockFetcher::CheckBlockChecksum() {
// Check the crc of the type and the block contents
if (read_options_.verify_checksums) {
io_status_ = status_to_io_status(ROCKSDB_NAMESPACE::VerifyBlockChecksum(
footer_.checksum(), slice_.data(), block_size_, file_->file_name(),
handle_.offset()));
inline void BlockFetcher::ProcessTrailerIfPresent() {
if (footer_.GetBlockTrailerSize() > 0) {
assert(footer_.GetBlockTrailerSize() == BlockBasedTable::kBlockTrailerSize);
if (read_options_.verify_checksums) {
io_status_ = status_to_io_status(
VerifyBlockChecksum(footer_.checksum(), slice_.data(), block_size_,
file_->file_name(), handle_.offset()));
}
compression_type_ =
BlockBasedTable::GetBlockCompressionType(slice_.data(), block_size_);
} else {
// E.g. plain table or cuckoo table
compression_type_ = kNoCompression;
}
}
@ -63,7 +72,7 @@ inline bool BlockFetcher::TryGetFromPrefetchBuffer() {
if (io_s.ok() && prefetch_buffer_->TryReadFromCache(
opts, handle_.offset(), block_size_with_trailer_,
&slice_, &io_s, for_compaction_)) {
CheckBlockChecksum();
ProcessTrailerIfPresent();
if (!io_status_.ok()) {
return true;
}
@ -88,6 +97,7 @@ inline bool BlockFetcher::TryGetCompressedBlockFromPersistentCache() {
heap_buf_ = CacheAllocationPtr(raw_data.release());
used_buf_ = heap_buf_.get();
slice_ = Slice(heap_buf_.get(), block_size_);
ProcessTrailerIfPresent();
return true;
} else if (!io_status_.IsNotFound() && ioptions_.logger) {
assert(!io_status_.ok());
@ -290,7 +300,7 @@ IOStatus BlockFetcher::ReadBlockContents() {
" bytes, got " + ToString(slice_.size()));
}
CheckBlockChecksum();
ProcessTrailerIfPresent();
if (io_status_.ok()) {
InsertCompressedBlockToPersistentCacheIfNeeded();
} else {
@ -298,8 +308,6 @@ IOStatus BlockFetcher::ReadBlockContents() {
}
}
compression_type_ = get_block_compression_type(slice_.data(), block_size_);
if (do_uncompress_ && compression_type_ != kNoCompression) {
PERF_TIMER_GUARD(block_decompress_time);
// compressed page, uncompress, update cache

View File

@ -37,12 +37,15 @@ namespace ROCKSDB_NAMESPACE {
class BlockFetcher {
public:
BlockFetcher(RandomAccessFileReader* file,
FilePrefetchBuffer* prefetch_buffer, const Footer& footer,
const ReadOptions& read_options, const BlockHandle& handle,
BlockContents* contents, const ImmutableOptions& ioptions,
FilePrefetchBuffer* prefetch_buffer,
const Footer& footer /* ref retained */,
const ReadOptions& read_options,
const BlockHandle& handle /* ref retained */,
BlockContents* contents,
const ImmutableOptions& ioptions /* ref retained */,
bool do_uncompress, bool maybe_compressed, BlockType block_type,
const UncompressionDict& uncompression_dict,
const PersistentCacheOptions& cache_options,
const UncompressionDict& uncompression_dict /* ref retained */,
const PersistentCacheOptions& cache_options /* ref retained */,
MemoryAllocator* memory_allocator = nullptr,
MemoryAllocator* memory_allocator_compressed = nullptr,
bool for_compaction = false)
@ -57,7 +60,7 @@ class BlockFetcher {
maybe_compressed_(maybe_compressed),
block_type_(block_type),
block_size_(static_cast<size_t>(handle_.size())),
block_size_with_trailer_(block_size(handle_)),
block_size_with_trailer_(block_size_ + footer.GetBlockTrailerSize()),
uncompression_dict_(uncompression_dict),
cache_options_(cache_options),
memory_allocator_(memory_allocator),
@ -67,7 +70,12 @@ class BlockFetcher {
}
IOStatus ReadBlockContents();
CompressionType get_compression_type() const { return compression_type_; }
inline CompressionType get_compression_type() const {
return compression_type_;
}
inline size_t GetBlockSizeWithTrailer() const {
return block_size_with_trailer_;
}
#ifndef NDEBUG
int TEST_GetNumStackBufMemcpy() const { return num_stack_buf_memcpy_; }
@ -126,6 +134,6 @@ class BlockFetcher {
void GetBlockContents();
void InsertCompressedBlockToPersistentCacheIfNeeded();
void InsertUncompressedBlockToPersistentCacheIfNeeded();
void CheckBlockChecksum();
void ProcessTrailerIfPresent();
};
} // namespace ROCKSDB_NAMESPACE

View File

@ -68,10 +68,9 @@ class CuckooBuilderTest : public testing::Test {
ImmutableOptions ioptions(options);
// Assert Table Properties.
TableProperties* props = nullptr;
std::unique_ptr<TableProperties> props;
ASSERT_OK(ReadTableProperties(file_reader.get(), read_file_size,
kCuckooTableMagicNumber, ioptions,
&props, true /* compression_type_missing */));
kCuckooTableMagicNumber, ioptions, &props));
// Check unused bucket.
std::string unused_key = props->user_collected_properties[
CuckooTablePropertyNames::kEmptyKey];
@ -108,7 +107,6 @@ class CuckooBuilderTest : public testing::Test {
ASSERT_EQ(props->raw_key_size, keys.size()*props->fixed_key_len);
ASSERT_EQ(props->column_family_id, 0);
ASSERT_EQ(props->column_family_name, kDefaultColumnFamilyName);
delete props;
// Check contents of the bucket.
std::vector<bool> keys_found(keys.size(), false);

View File

@ -58,14 +58,16 @@ CuckooTableReader::CuckooTableReader(
status_ = Status::InvalidArgument("File is not mmaped");
return;
}
TableProperties* props = nullptr;
status_ = ReadTableProperties(file_.get(), file_size, kCuckooTableMagicNumber,
ioptions, &props, true /* compression_type_missing */);
if (!status_.ok()) {
return;
{
std::unique_ptr<TableProperties> props;
status_ = ReadTableProperties(file_.get(), file_size,
kCuckooTableMagicNumber, ioptions, &props);
if (!status_.ok()) {
return;
}
table_props_ = std::move(props);
}
table_props_.reset(props);
auto& user_props = props->user_collected_properties;
auto& user_props = table_props_->user_collected_properties;
auto hash_funs = user_props.find(CuckooTablePropertyNames::kNumHashFunc);
if (hash_funs == user_props.end()) {
status_ = Status::Corruption("Number of hash functions not found");
@ -79,7 +81,7 @@ CuckooTableReader::CuckooTableReader(
}
unused_key_ = unused_key->second;
key_length_ = static_cast<uint32_t>(props->fixed_key_len);
key_length_ = static_cast<uint32_t>(table_props_->fixed_key_len);
auto user_key_len = user_props.find(CuckooTablePropertyNames::kUserKeyLength);
if (user_key_len == user_props.end()) {
status_ = Status::Corruption("User key length not found");

View File

@ -97,8 +97,10 @@ const BlockHandle BlockHandle::kNullBlockHandle(0, 0);
void IndexValue::EncodeTo(std::string* dst, bool have_first_key,
const BlockHandle* previous_handle) const {
if (previous_handle) {
// WART: this is specific to Block-based table
assert(handle.offset() == previous_handle->offset() +
previous_handle->size() + kBlockTrailerSize);
previous_handle->size() +
BlockBasedTable::kBlockTrailerSize);
PutVarsignedint64(dst, handle.size() - previous_handle->size());
} else {
handle.EncodeTo(dst);
@ -117,9 +119,10 @@ Status IndexValue::DecodeFrom(Slice* input, bool have_first_key,
if (!GetVarsignedint64(input, &delta)) {
return Status::Corruption("bad delta-encoded index value");
}
handle = BlockHandle(
previous_handle->offset() + previous_handle->size() + kBlockTrailerSize,
previous_handle->size() + delta);
// WART: this is specific to Block-based table
handle = BlockHandle(previous_handle->offset() + previous_handle->size() +
BlockBasedTable::kBlockTrailerSize,
previous_handle->size() + delta);
} else {
Status s = handle.DecodeFrom(input);
if (!s.ok()) {
@ -163,6 +166,18 @@ inline uint64_t UpconvertLegacyFooterFormat(uint64_t magic_number) {
}
} // namespace
void Footer::set_table_magic_number(uint64_t magic_number) {
assert(!HasInitializedTableMagicNumber());
table_magic_number_ = magic_number;
if (magic_number == kBlockBasedTableMagicNumber ||
magic_number == kLegacyBlockBasedTableMagicNumber) {
block_trailer_size_ =
static_cast<uint8_t>(BlockBasedTable::kBlockTrailerSize);
} else {
block_trailer_size_ = 0;
}
}
// legacy footer format:
// metaindex handle (varint64 offset, varint64 size)
// index handle (varint64 offset, varint64 size)

View File

@ -185,12 +185,13 @@ class Footer {
// convert this object to a human readable form
std::string ToString() const;
// Block trailer size used by file with this footer (e.g. 5 for block-based
// table and 0 for plain table)
inline size_t GetBlockTrailerSize() const { return block_trailer_size_; }
private:
// REQUIRES: magic number wasn't initialized.
void set_table_magic_number(uint64_t magic_number) {
assert(!HasInitializedTableMagicNumber());
table_magic_number_ = magic_number;
}
void set_table_magic_number(uint64_t magic_number);
// return true if @table_magic_number_ is set to a value different
// from @kInvalidTableMagicNumber.
@ -200,6 +201,7 @@ class Footer {
uint32_t version_;
ChecksumType checksum_;
uint8_t block_trailer_size_ = 0; // set based on magic number
BlockHandle metaindex_handle_;
BlockHandle index_handle_;
uint64_t table_magic_number_ = 0;
@ -213,19 +215,6 @@ Status ReadFooterFromFile(const IOOptions& opts, RandomAccessFileReader* file,
uint64_t file_size, Footer* footer,
uint64_t enforce_table_magic_number = 0);
// 1-byte compression type + 32-bit checksum
static const size_t kBlockTrailerSize = 5;
// Make block size calculation for IO less error prone
inline uint64_t block_size(const BlockHandle& handle) {
return handle.size() + kBlockTrailerSize;
}
inline CompressionType get_block_compression_type(const char* block_data,
size_t block_size) {
return static_cast<CompressionType>(block_data[block_size]);
}
// Computes a checksum using the given ChecksumType. Sometimes we need to
// include one more input byte logically at the end but not part of the main
// data buffer. If data_size >= 1, then
@ -242,12 +231,13 @@ uint32_t ComputeBuiltinChecksumWithLastByte(ChecksumType type, const char* data,
// BlockContents objects representing data read from mmapped files only point
// into the mmapped region.
struct BlockContents {
Slice data; // Actual contents of data
// Points to block payload (without trailer)
Slice data;
CacheAllocationPtr allocation;
#ifndef NDEBUG
// Whether the block is a raw block, which contains compression type
// byte. It is only used for assertion.
// Whether there is a known trailer after what is pointed to by `data`.
// See BlockBasedTable::GetCompressionType.
bool is_raw_block = false;
#endif // NDEBUG
@ -269,14 +259,6 @@ struct BlockContents {
// Returns whether the object has ownership of the underlying data bytes.
bool own_bytes() const { return allocation.get() != nullptr; }
// It's the caller's responsibility to make sure that this is
// for raw block contents, which contains the compression
// byte in the end.
CompressionType get_compression_type() const {
assert(is_raw_block);
return get_block_compression_type(data.data(), data.size());
}
// The additional memory space taken by the block data.
size_t usable_size() const {
if (allocation.get() != nullptr) {

View File

@ -11,18 +11,27 @@
#include "db/table_properties_collector.h"
#include "file/random_access_file_reader.h"
#include "logging/logging.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "rocksdb/table_properties.h"
#include "table/block_based/block.h"
#include "table/block_based/reader_common.h"
#include "table/format.h"
#include "table/internal_iterator.h"
#include "table/persistent_cache_helper.h"
#include "table/sst_file_writer_collectors.h"
#include "table/table_properties_internal.h"
#include "test_util/sync_point.h"
#include "util/coding.h"
namespace ROCKSDB_NAMESPACE {
const std::string kPropertiesBlock = "rocksdb.properties";
// Old property block name for backward compatibility
const std::string kPropertiesBlockOldName = "rocksdb.stats";
const std::string kCompressionDictBlock = "rocksdb.compression_dict";
const std::string kRangeDelBlock = "rocksdb.range_del";
MetaIndexBuilder::MetaIndexBuilder()
: meta_index_block_(new BlockBuilder(1 /* restart interval */)) {}
@ -211,40 +220,33 @@ bool NotifyCollectTableCollectorsOnFinish(
return all_succeeded;
}
Status ReadProperties(const ReadOptions& read_options,
const Slice& handle_value, RandomAccessFileReader* file,
FilePrefetchBuffer* prefetch_buffer, const Footer& footer,
const ImmutableOptions& ioptions,
TableProperties** table_properties, bool verify_checksum,
BlockHandle* ret_block_handle,
CacheAllocationPtr* verification_buf,
bool /*compression_type_missing*/,
MemoryAllocator* memory_allocator) {
// FIXME: should be a parameter for reading table properties to use persistent
// cache?
Status ReadTablePropertiesHelper(
const ReadOptions& ro, const BlockHandle& handle,
RandomAccessFileReader* file, FilePrefetchBuffer* prefetch_buffer,
const Footer& footer, const ImmutableOptions& ioptions,
std::unique_ptr<TableProperties>* table_properties,
MemoryAllocator* memory_allocator) {
assert(table_properties);
Slice v = handle_value;
BlockHandle handle;
if (!handle.DecodeFrom(&v).ok()) {
return Status::InvalidArgument("Failed to decode properties block handle");
}
// If this is an external SST file ingested with write_global_seqno set to
// true, then we expect the checksum mismatch because checksum was written
// by SstFileWriter, but its global seqno in the properties block may have
// been changed during ingestion. For this reason, we initially read
// and process without checksum verification, then later try checksum
// verification so that if it fails, we can copy to a temporary buffer with
// global seqno set to its original value, i.e. 0, and attempt checksum
// verification again.
ReadOptions modified_ro = ro;
modified_ro.verify_checksums = false;
BlockContents block_contents;
Status s;
// FIXME: should be a parameter for reading table properties to use persistent
// cache
PersistentCacheOptions cache_options;
ReadOptions ro = read_options;
ro.verify_checksums = verify_checksum;
BlockFetcher block_fetcher(file, prefetch_buffer, footer, ro, handle,
BlockFetcher block_fetcher(file, prefetch_buffer, footer, modified_ro, handle,
&block_contents, ioptions, false /* decompress */,
false /*maybe_compressed*/, BlockType::kProperties,
UncompressionDict::GetEmptyDict(), cache_options,
memory_allocator);
s = block_fetcher.ReadBlockContents();
// property block is never compressed. Need to add uncompress logic if we are
// to compress it..
UncompressionDict::GetEmptyDict(),
PersistentCacheOptions::kEmpty, memory_allocator);
Status s = block_fetcher.ReadBlockContents();
if (!s.ok()) {
return s;
}
@ -254,7 +256,7 @@ Status ReadProperties(const ReadOptions& read_options,
properties_block.NewDataIterator(BytewiseComparator(),
kDisableGlobalSequenceNumber, &iter);
auto new_table_properties = new TableProperties();
std::unique_ptr<TableProperties> new_table_properties{new TableProperties};
// All pre-defined properties of type uint64_t
std::unordered_map<std::string, uint64_t*> predefined_uint64_properties = {
{TablePropertiesNames::kOriginalFileNumber,
@ -370,21 +372,30 @@ Status ReadProperties(const ReadOptions& read_options,
{key, raw_val.ToString()});
}
}
if (s.ok()) {
*table_properties = new_table_properties;
if (ret_block_handle != nullptr) {
*ret_block_handle = handle;
}
if (verification_buf != nullptr) {
size_t len = static_cast<size_t>(handle.size() + kBlockTrailerSize);
*verification_buf =
ROCKSDB_NAMESPACE::AllocateBlock(len, memory_allocator);
if (verification_buf->get() != nullptr) {
memcpy(verification_buf->get(), block_contents.data.data(), len);
// Modified version of BlockFetcher checksum verification
// (See write_global_seqno comment above)
if (s.ok() && footer.GetBlockTrailerSize() > 0) {
s = VerifyBlockChecksum(footer.checksum(), properties_block.data(),
properties_block.size(), file->file_name(),
handle.offset());
if (s.IsCorruption()) {
const auto seqno_pos_iter = new_table_properties->properties_offsets.find(
ExternalSstFilePropertyNames::kGlobalSeqno);
if (seqno_pos_iter != new_table_properties->properties_offsets.end()) {
std::string tmp_buf(properties_block.data(),
block_fetcher.GetBlockSizeWithTrailer());
uint64_t global_seqno_offset = seqno_pos_iter->second - handle.offset();
EncodeFixed64(&tmp_buf[static_cast<size_t>(global_seqno_offset)], 0);
s = VerifyBlockChecksum(footer.checksum(), tmp_buf.data(),
properties_block.size(), file->file_name(),
handle.offset());
}
}
} else {
delete new_table_properties;
}
if (s.ok()) {
*table_properties = std::move(new_table_properties);
}
return s;
@ -393,11 +404,72 @@ Status ReadProperties(const ReadOptions& read_options,
Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
TableProperties** properties,
bool compression_type_missing,
std::unique_ptr<TableProperties>* properties,
MemoryAllocator* memory_allocator,
FilePrefetchBuffer* prefetch_buffer) {
// -- Read metaindex block
BlockHandle block_handle;
Footer footer;
Status s = FindMetaBlockInFile(file, file_size, table_magic_number, ioptions,
kPropertiesBlock, &block_handle,
memory_allocator, prefetch_buffer, &footer);
if (!s.ok()) {
return s;
}
if (!block_handle.IsNull()) {
s = ReadTablePropertiesHelper(ReadOptions(), block_handle, file,
prefetch_buffer, footer, ioptions, properties,
memory_allocator);
} else {
s = Status::NotFound();
}
return s;
}
Status FindOptionalMetaBlock(InternalIterator* meta_index_iter,
const std::string& meta_block_name,
BlockHandle* block_handle) {
assert(block_handle != nullptr);
meta_index_iter->Seek(meta_block_name);
if (meta_index_iter->status().ok()) {
if (meta_index_iter->Valid() && meta_index_iter->key() == meta_block_name) {
Slice v = meta_index_iter->value();
return block_handle->DecodeFrom(&v);
} else if (meta_block_name == kPropertiesBlock) {
// Have to try old name for compatibility
meta_index_iter->Seek(kPropertiesBlockOldName);
if (meta_index_iter->status().ok() && meta_index_iter->Valid() &&
meta_index_iter->key() == kPropertiesBlockOldName) {
Slice v = meta_index_iter->value();
return block_handle->DecodeFrom(&v);
}
}
}
// else
*block_handle = BlockHandle::NullBlockHandle();
return meta_index_iter->status();
}
Status FindMetaBlock(InternalIterator* meta_index_iter,
const std::string& meta_block_name,
BlockHandle* block_handle) {
Status s =
FindOptionalMetaBlock(meta_index_iter, meta_block_name, block_handle);
if (s.ok() && block_handle->IsNull()) {
return Status::Corruption("Cannot find the meta block", meta_block_name);
} else {
return s;
}
}
Status FindMetaBlockInFile(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
const std::string& meta_block_name,
BlockHandle* block_handle,
MemoryAllocator* memory_allocator,
FilePrefetchBuffer* prefetch_buffer,
Footer* footer_out) {
Footer footer;
IOOptions opts;
auto s = ReadFooterFromFile(opts, file, prefetch_buffer, file_size, &footer,
@ -405,89 +477,18 @@ Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size,
if (!s.ok()) {
return s;
}
auto metaindex_handle = footer.metaindex_handle();
BlockContents metaindex_contents;
ReadOptions read_options;
read_options.verify_checksums = false;
PersistentCacheOptions cache_options;
BlockFetcher block_fetcher(
file, prefetch_buffer, footer, read_options, metaindex_handle,
&metaindex_contents, ioptions, false /* decompress */,
false /*maybe_compressed*/, BlockType::kMetaIndex,
UncompressionDict::GetEmptyDict(), cache_options, memory_allocator);
s = block_fetcher.ReadBlockContents();
if (!s.ok()) {
return s;
}
// property blocks are never compressed. Need to add uncompress logic if we
// are to compress it.
Block metaindex_block(std::move(metaindex_contents));
std::unique_ptr<InternalIterator> meta_iter(metaindex_block.NewDataIterator(
BytewiseComparator(), kDisableGlobalSequenceNumber));
// -- Read property block
bool found_properties_block = true;
s = SeekToPropertiesBlock(meta_iter.get(), &found_properties_block);
if (!s.ok()) {
return s;
}
TableProperties table_properties;
if (found_properties_block == true) {
s = ReadProperties(
read_options, meta_iter->value(), file, prefetch_buffer, footer,
ioptions, properties, false /* verify_checksum */,
nullptr /* ret_block_hanel */, nullptr /* ret_block_contents */,
compression_type_missing, memory_allocator);
} else {
s = Status::NotFound();
}
return s;
}
Status FindMetaBlock(InternalIterator* meta_index_iter,
const std::string& meta_block_name,
BlockHandle* block_handle) {
meta_index_iter->Seek(meta_block_name);
if (meta_index_iter->status().ok() && meta_index_iter->Valid() &&
meta_index_iter->key() == meta_block_name) {
Slice v = meta_index_iter->value();
return block_handle->DecodeFrom(&v);
} else {
return Status::Corruption("Cannot find the meta block", meta_block_name);
}
}
Status FindMetaBlock(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
const std::string& meta_block_name,
BlockHandle* block_handle,
bool /*compression_type_missing*/,
MemoryAllocator* memory_allocator) {
Footer footer;
IOOptions opts;
auto s = ReadFooterFromFile(opts, file, nullptr /* prefetch_buffer */,
file_size, &footer, table_magic_number);
if (!s.ok()) {
return s;
if (footer_out) {
*footer_out = footer;
}
auto metaindex_handle = footer.metaindex_handle();
BlockContents metaindex_contents;
ReadOptions read_options;
read_options.verify_checksums = false;
PersistentCacheOptions cache_options;
BlockFetcher block_fetcher(
file, nullptr /* prefetch_buffer */, footer, read_options,
metaindex_handle, &metaindex_contents, ioptions,
false /* do decompression */, false /*maybe_compressed*/,
BlockType::kMetaIndex, UncompressionDict::GetEmptyDict(), cache_options,
memory_allocator);
s = block_fetcher.ReadBlockContents();
s = BlockFetcher(file, prefetch_buffer, footer, ReadOptions(),
metaindex_handle, &metaindex_contents, ioptions,
false /* do decompression */, false /*maybe_compressed*/,
BlockType::kMetaIndex, UncompressionDict::GetEmptyDict(),
PersistentCacheOptions::kEmpty, memory_allocator)
.ReadBlockContents();
if (!s.ok()) {
return s;
}
@ -507,56 +508,27 @@ Status ReadMetaBlock(RandomAccessFileReader* file,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
const std::string& meta_block_name, BlockType block_type,
BlockContents* contents, bool /*compression_type_missing*/,
BlockContents* contents,
MemoryAllocator* memory_allocator) {
Status status;
Footer footer;
IOOptions opts;
status = ReadFooterFromFile(opts, file, prefetch_buffer, file_size, &footer,
table_magic_number);
if (!status.ok()) {
return status;
}
// Reading metaindex block
auto metaindex_handle = footer.metaindex_handle();
BlockContents metaindex_contents;
ReadOptions read_options;
read_options.verify_checksums = false;
PersistentCacheOptions cache_options;
BlockFetcher block_fetcher(
file, prefetch_buffer, footer, read_options, metaindex_handle,
&metaindex_contents, ioptions, false /* decompress */,
false /*maybe_compressed*/, BlockType::kMetaIndex,
UncompressionDict::GetEmptyDict(), cache_options, memory_allocator);
status = block_fetcher.ReadBlockContents();
if (!status.ok()) {
return status;
}
// meta block is never compressed. Need to add uncompress logic if we are to
// compress it.
// Finding metablock
Block metaindex_block(std::move(metaindex_contents));
std::unique_ptr<InternalIterator> meta_iter;
meta_iter.reset(metaindex_block.NewDataIterator(
BytewiseComparator(), kDisableGlobalSequenceNumber));
// TableProperties requires special handling because of checksum issues.
// Call ReadTableProperties instead for that case.
assert(block_type != BlockType::kProperties);
BlockHandle block_handle;
status = FindMetaBlock(meta_iter.get(), meta_block_name, &block_handle);
Footer footer;
Status status = FindMetaBlockInFile(
file, file_size, table_magic_number, ioptions, meta_block_name,
&block_handle, memory_allocator, prefetch_buffer, &footer);
if (!status.ok()) {
return status;
}
// Reading metablock
BlockFetcher block_fetcher2(
file, prefetch_buffer, footer, read_options, block_handle, contents,
ioptions, false /* decompress */, false /*maybe_compressed*/, block_type,
UncompressionDict::GetEmptyDict(), cache_options, memory_allocator);
return block_fetcher2.ReadBlockContents();
return BlockFetcher(file, prefetch_buffer, footer, ReadOptions(),
block_handle, contents, ioptions, false /* decompress */,
false /*maybe_compressed*/, block_type,
UncompressionDict::GetEmptyDict(),
PersistentCacheOptions::kEmpty, memory_allocator)
.ReadBlockContents();
}
} // namespace ROCKSDB_NAMESPACE

View File

@ -30,6 +30,12 @@ class Logger;
class RandomAccessFile;
struct TableProperties;
// Meta block names for metaindex
extern const std::string kPropertiesBlock;
extern const std::string kPropertiesBlockOldName;
extern const std::string kCompressionDictBlock;
extern const std::string kRangeDelBlock;
class MetaIndexBuilder {
public:
MetaIndexBuilder(const MetaIndexBuilder&) = delete;
@ -95,49 +101,49 @@ bool NotifyCollectTableCollectorsOnFinish(
const std::vector<std::unique_ptr<IntTblPropCollector>>& collectors,
Logger* info_log, PropertyBlockBuilder* builder);
// Read the properties from the table.
// Read table properties from a file using known BlockHandle.
// @returns a status to indicate if the operation succeeded. On success,
// *table_properties will point to a heap-allocated TableProperties
// object, otherwise value of `table_properties` will not be modified.
Status ReadProperties(const ReadOptions& ro, const Slice& handle_value,
RandomAccessFileReader* file,
FilePrefetchBuffer* prefetch_buffer, const Footer& footer,
const ImmutableOptions& ioptions,
TableProperties** table_properties, bool verify_checksum,
BlockHandle* block_handle,
CacheAllocationPtr* verification_buf,
bool compression_type_missing = false,
MemoryAllocator* memory_allocator = nullptr);
Status ReadTablePropertiesHelper(
const ReadOptions& ro, const BlockHandle& handle,
RandomAccessFileReader* file, FilePrefetchBuffer* prefetch_buffer,
const Footer& footer, const ImmutableOptions& ioptions,
std::unique_ptr<TableProperties>* table_properties,
MemoryAllocator* memory_allocator = nullptr);
// Directly read the properties from the properties block of a plain table.
// Read table properties from the properties block of a plain table.
// @returns a status to indicate if the operation succeeded. On success,
// *table_properties will point to a heap-allocated TableProperties
// object, otherwise value of `table_properties` will not be modified.
// certain tables do not have compression_type byte setup properly for
// uncompressed blocks, caller can request to reset compression type by
// passing compression_type_missing = true, the same applies to
// `ReadProperties`, `FindMetaBlock`, and `ReadMetaBlock`
Status ReadTableProperties(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
TableProperties** properties,
bool compression_type_missing = false,
std::unique_ptr<TableProperties>* properties,
MemoryAllocator* memory_allocator = nullptr,
FilePrefetchBuffer* prefetch_buffer = nullptr);
// Find the meta block from the meta index block.
// Find the meta block from the meta index block. Returns OK and
// block_handle->IsNull() if not found.
Status FindOptionalMetaBlock(InternalIterator* meta_index_iter,
const std::string& meta_block_name,
BlockHandle* block_handle);
// Find the meta block from the meta index block. Returns Corruption if not
// found.
Status FindMetaBlock(InternalIterator* meta_index_iter,
const std::string& meta_block_name,
BlockHandle* block_handle);
// Find the meta block
Status FindMetaBlock(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
const std::string& meta_block_name,
BlockHandle* block_handle,
bool compression_type_missing = false,
MemoryAllocator* memory_allocator = nullptr);
Status FindMetaBlockInFile(RandomAccessFileReader* file, uint64_t file_size,
uint64_t table_magic_number,
const ImmutableOptions& ioptions,
const std::string& meta_block_name,
BlockHandle* block_handle,
MemoryAllocator* memory_allocator = nullptr,
FilePrefetchBuffer* prefetch_buffer = nullptr,
Footer* footer_out = nullptr);
// Read the specified meta block with name meta_block_name
// from `file` and initialize `contents` with contents of this block.
@ -148,7 +154,6 @@ Status ReadMetaBlock(RandomAccessFileReader* file,
const ImmutableOptions& ioptions,
const std::string& meta_block_name, BlockType block_type,
BlockContents* contents,
bool compression_type_missing = false,
MemoryAllocator* memory_allocator = nullptr);
} // namespace ROCKSDB_NAMESPACE

View File

@ -9,6 +9,8 @@
namespace ROCKSDB_NAMESPACE {
const PersistentCacheOptions PersistentCacheOptions::kEmpty;
void PersistentCacheHelper::InsertRawPage(
const PersistentCacheOptions& cache_options, const BlockHandle& handle,
const char* data, const size_t size) {
@ -70,7 +72,8 @@ Status PersistentCacheHelper::LookupRawPage(
}
// cache hit
assert(raw_data_size == handle.size() + kBlockTrailerSize);
// Block-based table is assumed
assert(raw_data_size == handle.size() + BlockBasedTable::kBlockTrailerSize);
assert(size == raw_data_size);
RecordTick(cache_options.statistics, PERSISTENT_CACHE_HIT);
return Status::OK();

View File

@ -29,6 +29,8 @@ struct PersistentCacheOptions {
std::shared_ptr<PersistentCache> persistent_cache;
std::string key_prefix;
Statistics* statistics = nullptr;
static const PersistentCacheOptions kEmpty;
};
} // namespace ROCKSDB_NAMESPACE

View File

@ -129,11 +129,9 @@ Status PlainTableReader::Open(
return Status::NotSupported("File is too large for PlainTableReader!");
}
TableProperties* props_ptr = nullptr;
std::unique_ptr<TableProperties> props;
auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber,
ioptions, &props_ptr,
true /* compression_type_missing */);
std::shared_ptr<TableProperties> props(props_ptr);
ioptions, &props);
if (!s.ok()) {
return s;
}
@ -186,7 +184,7 @@ Status PlainTableReader::Open(
new_reader->full_scan_mode_ = true;
}
// PopulateIndex can add to the props, so don't store them until now
new_reader->table_properties_ = props;
new_reader->table_properties_ = std::move(props);
if (immortal_table && new_reader->file_info_.is_mmap_mode) {
new_reader->dummy_cleanable_.reset(new Cleanable());
@ -308,8 +306,7 @@ Status PlainTableReader::PopulateIndex(TableProperties* props,
Status s = ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */,
file_size_, kPlainTableMagicNumber, ioptions_,
PlainTableIndexBuilder::kPlainTableIndexBlock,
BlockType::kIndex, &index_block_contents,
true /* compression_type_missing */);
BlockType::kIndex, &index_block_contents);
bool index_in_file = s.ok();
@ -320,8 +317,7 @@ Status PlainTableReader::PopulateIndex(TableProperties* props,
s = ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */,
file_size_, kPlainTableMagicNumber, ioptions_,
BloomBlockBuilder::kBloomBlock, BlockType::kFilter,
&bloom_block_contents,
true /* compression_type_missing */);
&bloom_block_contents);
bloom_in_file = s.ok() && bloom_block_contents.data.size() > 0;
}

View File

@ -332,18 +332,16 @@ Status SstFileDumper::ShowCompressionSize(
return Status::OK();
}
// Reads TableProperties prior to opening table reader in order to set up
// options.
Status SstFileDumper::ReadTableProperties(uint64_t table_magic_number,
RandomAccessFileReader* file,
uint64_t file_size,
FilePrefetchBuffer* prefetch_buffer) {
TableProperties* table_properties = nullptr;
Status s = ROCKSDB_NAMESPACE::ReadTableProperties(
file, file_size, table_magic_number, ioptions_, &table_properties,
/* compression_type_missing= */ false,
file, file_size, table_magic_number, ioptions_, &table_properties_,
/* memory_allocator= */ nullptr, prefetch_buffer);
if (s.ok()) {
table_properties_.reset(table_properties);
} else {
if (!s.ok()) {
if (!silent_) {
fprintf(stdout, "Not able to read table properties\n");
}
@ -488,6 +486,7 @@ Status SstFileDumper::ReadSequential(bool print_kv, uint64_t read_num,
return ret;
}
// Provides TableProperties to API user
Status SstFileDumper::ReadTableProperties(
std::shared_ptr<const TableProperties>* table_properties) {
if (!table_reader_) {

View File

@ -7,12 +7,10 @@
#include "port/port.h"
#include "rocksdb/env.h"
#include "rocksdb/iterator.h"
#include "rocksdb/unique_id.h"
#include "table/block_based/block.h"
#include "table/internal_iterator.h"
#include "table/table_properties_internal.h"
#include "table/unique_id_impl.h"
#include "util/random.h"
#include "util/string_util.h"
namespace ROCKSDB_NAMESPACE {
@ -44,31 +42,6 @@ namespace {
props, key, ToString(value), prop_delim, kv_delim
);
}
// Seek to the specified meta block.
// Return true if it successfully seeks to that block.
Status SeekToMetaBlock(InternalIterator* meta_iter,
const std::string& block_name, bool* is_found,
BlockHandle* block_handle = nullptr) {
if (block_handle != nullptr) {
*block_handle = BlockHandle::NullBlockHandle();
}
*is_found = true;
meta_iter->Seek(block_name);
if (meta_iter->status().ok()) {
if (meta_iter->Valid() && meta_iter->key() == block_name) {
*is_found = true;
if (block_handle) {
Slice v = meta_iter->value();
return block_handle->DecodeFrom(&v);
}
} else {
*is_found = false;
return Status::OK();
}
}
return meta_iter->status();
}
}
std::string TableProperties::ToString(
@ -306,12 +279,6 @@ const std::string TablePropertiesNames::kSlowCompressionEstimatedDataSize =
const std::string TablePropertiesNames::kFastCompressionEstimatedDataSize =
"rocksdb.sample_for_compression.fast.data.size";
extern const std::string kPropertiesBlock = "rocksdb.properties";
// Old property block name for backward compatibility
extern const std::string kPropertiesBlockOldName = "rocksdb.stats";
extern const std::string kCompressionDictBlock = "rocksdb.compression_dict";
extern const std::string kRangeDelBlock = "rocksdb.range_del";
#ifndef NDEBUG
void TEST_SetRandomTableProperties(TableProperties* props) {
Random* r = Random::GetTLSInstance();
@ -335,26 +302,4 @@ void TEST_SetRandomTableProperties(TableProperties* props) {
}
#endif
// Seek to the properties block.
// Return true if it successfully seeks to the properties block.
Status SeekToPropertiesBlock(InternalIterator* meta_iter, bool* is_found) {
Status status = SeekToMetaBlock(meta_iter, kPropertiesBlock, is_found);
if (!*is_found && status.ok()) {
status = SeekToMetaBlock(meta_iter, kPropertiesBlockOldName, is_found);
}
return status;
}
// Seek to the compression dictionary block.
// Return true if it successfully seeks to that block.
Status SeekToCompressionDictBlock(InternalIterator* meta_iter, bool* is_found,
BlockHandle* block_handle) {
return SeekToMetaBlock(meta_iter, kCompressionDictBlock, is_found, block_handle);
}
Status SeekToRangeDelBlock(InternalIterator* meta_iter, bool* is_found,
BlockHandle* block_handle = nullptr) {
return SeekToMetaBlock(meta_iter, kRangeDelBlock, is_found, block_handle);
}
} // namespace ROCKSDB_NAMESPACE

View File

@ -5,29 +5,9 @@
#pragma once
#include "rocksdb/status.h"
#include "rocksdb/table_properties.h"
#include "table/internal_iterator.h"
namespace ROCKSDB_NAMESPACE {
class BlockHandle;
// Seek to the properties block.
// If it successfully seeks to the properties block, "is_found" will be
// set to true.
Status SeekToPropertiesBlock(InternalIterator* meta_iter, bool* is_found);
// Seek to the compression dictionary block.
// If it successfully seeks to the properties block, "is_found" will be
// set to true.
Status SeekToCompressionDictBlock(InternalIterator* meta_iter, bool* is_found,
BlockHandle* block_handle);
// TODO(andrewkr) should not put all meta block in table_properties.h/cc
Status SeekToRangeDelBlock(InternalIterator* meta_iter, bool* is_found,
BlockHandle* block_handle);
#ifndef NDEBUG
void TEST_SetRandomTableProperties(TableProperties* props);
#endif

View File

@ -1698,7 +1698,8 @@ TEST_P(BlockBasedTableTest, BasicBlockBasedTableProperties) {
block_builder.Add(item.first, item.second);
}
Slice content = block_builder.Finish();
ASSERT_EQ(content.size() + kBlockTrailerSize + diff_internal_user_bytes,
ASSERT_EQ(content.size() + BlockBasedTable::kBlockTrailerSize +
diff_internal_user_bytes,
props.data_size);
c.ResetTableReader();
}
@ -3836,11 +3837,9 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) {
std::unique_ptr<RandomAccessFileReader> file_reader(
new RandomAccessFileReader(std::move(source), "test"));
TableProperties* props = nullptr;
std::unique_ptr<TableProperties> props;
auto s = ReadTableProperties(file_reader.get(), ss->contents().size(),
kPlainTableMagicNumber, ioptions,
&props, true /* compression_type_missing */);
std::unique_ptr<TableProperties> props_guard(props);
kPlainTableMagicNumber, ioptions, &props);
ASSERT_OK(s);
ASSERT_EQ(0ul, props->index_size);
@ -4481,10 +4480,10 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
std::unique_ptr<RandomAccessFileReader> file_reader(
new RandomAccessFileReader(std::move(source), ""));
TableProperties* props = nullptr;
std::unique_ptr<TableProperties> props;
ASSERT_OK(ReadTableProperties(file_reader.get(), ss_rw.contents().size(),
kBlockBasedTableMagicNumber, ioptions,
&props, true /* compression_type_missing */));
&props));
UserCollectedProperties user_props = props->user_collected_properties;
version = DecodeFixed32(
@ -4493,8 +4492,6 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
user_props[ExternalSstFilePropertyNames::kGlobalSeqno].c_str());
global_seqno_offset =
props->properties_offsets[ExternalSstFilePropertyNames::kGlobalSeqno];
delete props;
};
// Helper function to update the value of the global seqno in the file
@ -4661,15 +4658,14 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) {
new RandomAccessFileReader(std::move(source), "test"));
// Helper function to get version, global_seqno, global_seqno_offset
std::function<void()> VerifyBlockAlignment = [&]() {
TableProperties* props = nullptr;
std::unique_ptr<TableProperties> props;
ASSERT_OK(ReadTableProperties(file_reader.get(), sink->contents().size(),
kBlockBasedTableMagicNumber, ioptions, &props,
true /* compression_type_missing */));
kBlockBasedTableMagicNumber, ioptions,
&props));
uint64_t data_block_size = props->data_size / props->num_data_blocks;
ASSERT_EQ(data_block_size, 4096);
ASSERT_EQ(props->data_size, data_block_size * props->num_data_blocks);
delete props;
};
VerifyBlockAlignment();
@ -4788,16 +4784,13 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) {
std::unique_ptr<InternalIterator> meta_iter(metaindex_block.NewDataIterator(
BytewiseComparator(), kDisableGlobalSequenceNumber));
bool found_properties_block = true;
ASSERT_OK(SeekToPropertiesBlock(meta_iter.get(), &found_properties_block));
ASSERT_TRUE(found_properties_block);
// -- Read properties block
Slice v = meta_iter->value();
BlockHandle properties_handle;
ASSERT_OK(properties_handle.DecodeFrom(&v));
ASSERT_OK(FindOptionalMetaBlock(meta_iter.get(), kPropertiesBlock,
&properties_handle));
ASSERT_FALSE(properties_handle.IsNull());
BlockContents properties_contents;
BlockFetchHelper(properties_handle, BlockType::kProperties,
&properties_contents);
Block properties_block(std::move(properties_contents));