Summary:
The primary goal of this change is to add support for backing up and
restoring (applying on restore) file temperature metadata, without
committing to either the DB manifest or the FS reported "current"
temperatures being exclusive "source of truth".
To achieve this goal, we need to add temperature information to backup
metadata, which requires updated backup meta schema. Fortunately I
prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version
6.19.0 for this kind of schema update. (Previously, backup meta schema
was not extensible! Making this schema update public will allow some
other "nice to have" features like taking backups with hard links, and
avoiding crc32c checksum computation when another checksum is already
available.) While schema version 2 is newly public, the default schema
version is still 1. Until we change the default, users will need to set
to 2 to enable features like temperature data backup+restore. New
metadata like temperature information will be ignored with a warning
in versions before this change and since 6.19.0. The metadata is
considered ignorable because a functioning DB can be restored without
it.
Some detail:
* Some renaming because "future schema" is now just public schema 2.
* Initialize some atomics in TestFs (linter reported)
* Add temperature hint support to SstFileDumper (used by BackupEngine)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660
Test Plan:
related unit test majorly updated for the new functionality,
including some shared testing support for tracking temperatures in a FS.
Some other tests and testing hooks into production code also updated for
making the backup meta schema change public.
Reviewed By: ajkr
Differential Revision: D34686968
Pulled By: pdillinger
fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a
Summary:
The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions. This change cleans that up by introducing an ImmutableOptions struct. Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).
Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR. All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.
Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262
Reviewed By: pdillinger
Differential Revision: D28226540
Pulled By: mrambacher
fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
Summary:
For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.
Related changes include:
- Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
- Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
- Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
Test Plan:
- updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
- looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
Reviewed By: pdillinger
Differential Revision: D26467994
Pulled By: ajkr
fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7315
Test Plan:
`ASSERT_STATUS_CHECKED=1 make sst_dump_test && ./sst_dump_test`
And manually run `./sst_dump --file=*.sst` before and after the change.
Reviewed By: pdillinger
Differential Revision: D23361669
Pulled By: jay-zhuang
fbshipit-source-id: 5bf51a2a90ee35c8c679e5f604732ec2aef5949a
Summary:
In block-based table builder, the cut-over from buffered to unbuffered
mode involves sampling the buffered blocks and generating a dictionary.
There was a bug where `SstFileWriter` passed zero as the `target_file_size`
causing the cutover to happen immediately, so there were no samples
available for generating the dictionary.
This PR changes the meaning of `target_file_size == 0` to mean buffer
the whole file before cutting over. It also adds dictionary compression
support to `sst_dump --command=recompress` for easy evaluation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7323
Reviewed By: cheng-chang
Differential Revision: D23412158
Pulled By: ajkr
fbshipit-source-id: 3b232050e70ef3c2ee85a4b5f6fadb139c569873
Summary:
Although PR https://github.com/facebook/rocksdb/issues/7032 fixes the construction of the `SstFileDumper` in `GetFileDbIdentities` by setting a proper `Env` of the `Options` passed in the constructor, the file path was not corrected accordingly. This actually disables backup engine to use db session ids in the file names since the `db_session_id` is always empty.
Now it is fixed by setting the correct path in the construction of `SstFileDumper`. Furthermore, to preserve the Direct IO property that backup engine already has, parameter `EnvOptions` is added to `GetFileDbIdentities` and `SstFileDumper`.
The `BackupUsingDirectIO` test is updated accordingly.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7104
Test Plan: backupable_db_test and some manual tests.
Reviewed By: ajkr
Differential Revision: D22443245
Pulled By: gg814
fbshipit-source-id: 056a9bb8b82947c5e73d7c3fbb62bfe23af5e562
Summary:
`BackupableDBOptions::new_naming_for_backup_files` is added. This option is false by default. When it is true, backup table filenames under directory shared_checksum are of the form `<file_number>_<crc32c>_<db_session_id>.sst`.
Note that when this option is true, it comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.
Three new test cases are added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6997
Test Plan: Passed make check.
Reviewed By: ajkr
Differential Revision: D22098895
Pulled By: gg814
fbshipit-source-id: a1d9145e7fe562d71cde7ac995e17cb24fd42e76