// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#pragma once
#ifndef ROCKSDB_LITE

#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "db/dbformat.h"
#include "file/writable_file_writer.h"
#include "options/cf_options.h"
#include "rocksdb/advanced_options.h"

namespace ROCKSDB_NAMESPACE {

class SstFileDumper {
 public:
  explicit SstFileDumper(const Options& options, const std::string& file_name,
                         Temperature file_temp, size_t readahead_size,
                         bool verify_checksum, bool output_hex,
                         bool decode_blob_index,
                         const EnvOptions& soptions = EnvOptions(),
                         bool silent = false);

  Status ReadSequential(bool print_kv, uint64_t read_num, bool has_from,
                        const std::string& from_key, bool has_to,
                        const std::string& to_key,
                        bool use_from_as_prefix = false);

  Status ReadTableProperties(
      std::shared_ptr<const TableProperties>* table_properties);
  uint64_t GetReadNumber() { return read_num_; }
  TableProperties* GetInitTableProperties() { return table_properties_.get(); }

  Status VerifyChecksum();
  Status DumpTable(const std::string& out_filename);
  Status getStatus() { return init_result_; }

  Status ShowAllCompressionSizes(
      size_t block_size,
      const std::vector<std::pair<CompressionType, const char*>>&
          compression_types,
      int32_t compress_level_from, int32_t compress_level_to,
      uint32_t max_dict_bytes, uint32_t zstd_max_train_bytes,
      uint64_t max_dict_buffer_bytes);

  Status ShowCompressionSize(size_t block_size, CompressionType compress_type,
                             const CompressionOptions& compress_opt);

 private:
  // Get the TableReader implementation for the sst file
  Status GetTableReader(const std::string& file_path);
  Status ReadTableProperties(uint64_t table_magic_number,
                             RandomAccessFileReader* file, uint64_t file_size,
                             FilePrefetchBuffer* prefetch_buffer);

  Status CalculateCompressedTableSize(const TableBuilderOptions& tb_options,
                                      size_t block_size,
                                      uint64_t* num_data_blocks,
                                      uint64_t* compressed_table_size);

  Status SetTableOptionsByMagicNumber(uint64_t table_magic_number);
  Status SetOldTableOptions();

  // Helper function to call the factory with settings specific to the
  // factory implementation
  Status NewTableReader(const ImmutableOptions& ioptions,
                        const EnvOptions& soptions,
                        const InternalKeyComparator& internal_comparator,
                        uint64_t file_size,
                        std::unique_ptr<TableReader>* table_reader);

  std::string file_name_;
  uint64_t read_num_;
  Temperature file_temp_;
  bool output_hex_;
  bool decode_blob_index_;
  EnvOptions soptions_;
  // less verbose in stdout/stderr
  bool silent_;

  // options_ and internal_comparator_ will also be used in
  // ReadSequential internally (specifically, seek-related operations)
  Options options_;

  Status init_result_;
  std::unique_ptr<TableReader> table_reader_;
  std::unique_ptr<RandomAccessFileReader> file_;

  const ImmutableOptions ioptions_;
  const MutableCFOptions moptions_;
  ReadOptions read_options_;
  InternalKeyComparator internal_comparator_;
  std::unique_ptr<TableProperties> table_properties_;
};

}  // namespace ROCKSDB_NAMESPACE

#endif  // ROCKSDB_LITE