rocksdb/db
Andrew Kryczka 1ba2b8a568 Add sample_for_compression results to table properties (#8139)
Summary:
Added `TableProperties::{fast,slow}_compression_estimated_data_size`.
These properties are present in block-based tables when
`ColumnFamilyOptions::sample_for_compression > 0` and the necessary
compression library is supported when the file is generated. They
contain estimates of what `TableProperties::data_size` would be if the
"fast"/"slow" compression library had been used instead. One
limitation is we do not record exactly which "fast" (ZSTD or Zlib)
or "slow" (LZ4 or Snappy) compression library produced the result.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8139

Test Plan:
- new unit test
- ran `db_bench` with `sample_for_compression=1`; verified the `data_size` property matches the `{slow,fast}_compression_estimated_data_size` when the same compression type is used for the output file compression and the sampled compression

Reviewed By: riversand963

Differential Revision: D27454338

Pulled By: ajkr

fbshipit-source-id: 9529293de93ddac7f03b2e149d746e9f634abac4
2021-03-31 18:21:50 -07:00
..
blob Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
compaction Compaction should not move data to up level (#8116) 2021-03-29 17:10:42 -07:00
db_impl Fix possible hang issue in ~DBImpl() when flush is scheduled in LOW pool (#8125) 2021-03-30 18:35:20 -07:00
arena_wrapped_db_iter.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
arena_wrapped_db_iter.h Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
builder.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
builder.h Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
c_test.c make:Fix c header prototypes (#7994) 2021-03-09 20:44:23 -08:00
c.cc Use malloc in rocksdb_transaction_get_snapshot (#8114) 2021-03-26 15:51:34 -07:00
column_family_test.cc Using emplace_back replace push_back (#7568) 2021-01-15 16:56:41 -08:00
column_family.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
column_family.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
compact_files_test.cc Compaction should not move data to up level (#8116) 2021-03-29 17:10:42 -07:00
comparator_db_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
convenience.cc sst_dump to reduce number of file reads (#6836) 2020-05-12 18:23:33 -07:00
corruption_test.cc Extend VerifyFileChecksums API for blob files (#7979) 2021-02-22 22:09:22 -08:00
cuckoo_table_db_test.cc Revert "Turn on memtable bloom filter by default. (#6584)" (#7939) 2021-02-06 22:34:30 -08:00
db_basic_test.cc Create a CustomEnv class; Add WinFileSystem; Make LegacyFileSystemWrapper private (#7703) 2021-01-06 10:49:32 -08:00
db_block_cache_test.cc Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) 2020-12-09 21:21:16 -08:00
db_bloom_filter_test.cc Revert "Turn on memtable bloom filter by default. (#6584)" (#7939) 2021-02-06 22:34:30 -08:00
db_compaction_filter_test.cc Add more tests to ASSERT_STATUS_CHECKED (3), API change (#7715) 2021-01-06 14:15:02 -08:00
db_compaction_test.cc Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
db_dynamic_level_test.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
db_encryption_test.cc Improvements to Env::GetChildren (#7819) 2021-01-09 09:44:34 -08:00
db_filesnapshot.cc Fix checkpoint file deletion race with avoid_unnecessary_blocking_io (#7369) 2020-09-10 22:35:25 -07:00
db_flush_test.cc Fix possible hang issue in ~DBImpl() when flush is scheduled in LOW pool (#8125) 2021-03-30 18:35:20 -07:00
db_info_dumper.cc Fix write-ahead log file size overflow (#7870) 2021-01-19 13:47:48 -08:00
db_info_dumper.h Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_inplace_update_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_io_failure_test.cc Add more tests to ASSERT_STATUS_CHECKED (3), API change (#7715) 2021-01-06 14:15:02 -08:00
db_iter_stress_test.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
db_iter_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
db_iter.cc Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
db_iter.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
db_iterator_test.cc Instantiate tests DBIteratorTestForPinnedData (#8051) 2021-03-12 12:31:29 -08:00
db_kv_checksum_test.cc Integrity protection for live updates to WriteBatch (#7748) 2021-01-29 12:18:58 -08:00
db_log_iter_test.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
db_logical_block_size_cache_test.cc Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) 2020-12-09 21:21:16 -08:00
db_memtable_test.cc Integrity protection for live updates to WriteBatch (#7748) 2021-01-29 12:18:58 -08:00
db_merge_operand_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_merge_operator_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_options_test.cc Fix handling of Mutable options; Allow DB::SetOptions to update mutable TableFactory Options (#7936) 2021-02-19 10:29:02 -08:00
db_properties_test.cc Add sample_for_compression results to table properties (#8139) 2021-03-31 18:21:50 -07:00
db_range_del_test.cc Add more tests to ASSERT_STATUS_CHECKED (4) (#7718) 2020-12-22 15:09:39 -08:00
db_secondary_test.cc Move a test file to a better location (#8054) 2021-03-15 15:03:27 -07:00
db_sst_test.cc Fix Race condition in db_sst_test (#8092) 2021-03-23 17:38:52 -07:00
db_statistics_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
db_table_properties_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
db_tailing_iter_test.cc Add more tests to ASSERT_STATUS_CHECKED (3), API change (#7715) 2021-01-06 14:15:02 -08:00
db_test2.cc Deflake DBTest2.PartitionedIndexUserToInternalKey on ppc64le (#8044) 2021-03-08 14:47:56 -08:00
db_test_util.cc Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
db_test_util.h Add new Append API with DataVerificationInfo to Env WritableFile (#8071) 2021-03-19 11:44:13 -07:00
db_test.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
db_universal_compaction_test.cc Revert "Turn on memtable bloom filter by default. (#6584)" (#7939) 2021-02-06 22:34:30 -08:00
db_wal_test.cc Always truncate the latest WAL file on DB Open (#8122) 2021-03-28 10:00:08 -07:00
db_with_timestamp_basic_test.cc Fix a bug in key comparison when index type is kBinarySearchWithFirstKey (#8062) 2021-03-15 17:44:52 -07:00
db_with_timestamp_compaction_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_write_test.cc Add more tests to ASSERT_STATUS_CHECKED (4) (#7718) 2020-12-22 15:09:39 -08:00
dbformat_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
dbformat.cc Make CompactRange and GetApproximateSizes work with timestamp (#7684) 2020-12-02 13:00:53 -08:00
dbformat.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
deletefile_test.cc Add more tests to ASSERT_STATUS_CHECKED (4) (#7718) 2020-12-22 15:09:39 -08:00
error_handler_fs_test.cc Fix error_handler_fs_test failure due to statistics (#8136) 2021-03-30 21:44:44 -07:00
error_handler.cc Add the statistics and info log for Error handler (#8050) 2021-03-17 22:38:13 -07:00
error_handler.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
event_helpers.cc Add sample_for_compression results to table properties (#8139) 2021-03-31 18:21:50 -07:00
event_helpers.h Pass SST file checksum information through OnTableFileCreated (#7108) 2020-08-25 10:46:11 -07:00
experimental.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
external_sst_file_basic_test.cc Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) 2020-12-09 21:21:16 -08:00
external_sst_file_ingestion_job.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
external_sst_file_ingestion_job.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
external_sst_file_test.cc Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
fault_injection_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
file_indexer_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
file_indexer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
file_indexer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
filename_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
flush_job_test.cc Fix testcase failures on windows (#7992) 2021-02-23 14:35:06 -08:00
flush_job.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
flush_job.h Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
flush_scheduler.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
flush_scheduler.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
forward_iterator_bench.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
forward_iterator.cc Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
forward_iterator.h Properly report IO errors when IndexType::kBinarySearchWithFirstKey is used (#6621) 2020-04-15 17:40:44 -07:00
import_column_family_job.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
import_column_family_job.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
import_column_family_test.cc No elide constructors (#7798) 2020-12-23 16:55:53 -08:00
internal_stats.cc Update compaction statistics to include the amount of data read from blob files (#8022) 2021-03-04 00:43:48 -08:00
internal_stats.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
job_context.h Expose the set of live blob files from Version/VersionSet (#6785) 2020-05-04 15:08:13 -07:00
kv_checksum.h Integrity protection for live updates to WriteBatch (#7748) 2021-01-29 12:18:58 -08:00
listener_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
log_format.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
log_reader.cc Fix kPointInTimeRecovery handling of truncated WAL (#7701) 2020-11-30 18:11:38 -08:00
log_reader.h Real fix for race in backup custom checksum checking (#7309) 2020-08-26 10:39:20 -07:00
log_test.cc Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
log_writer.cc No elide constructors (#7798) 2020-12-23 16:55:53 -08:00
log_writer.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
logs_with_prep_tracker.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
logs_with_prep_tracker.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
lookup_key.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
malloc_stats.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
malloc_stats.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
manual_compaction_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
memtable_list_test.cc Revert "Turn on memtable bloom filter by default. (#6584)" (#7939) 2021-02-06 22:34:30 -08:00
memtable_list.cc Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
memtable_list.h Add initial blob support to batched MultiGet (#7766) 2020-12-14 13:48:22 -08:00
memtable.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
memtable.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
merge_context.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_helper_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_helper.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
merge_helper.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
merge_operator.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
merge_test.cc MergeHelper::FilterMerge() calling ElapsedNanosSafe() upon exit even … (#7867) 2021-01-21 13:13:02 -08:00
obsolete_files_test.cc Fix a harmless data race affecting two test cases (#8055) 2021-03-12 16:44:35 -08:00
options_file_test.cc No elide constructors (#7798) 2020-12-23 16:55:53 -08:00
output_validator.cc Use NPHash64 in more places (#7632) 2020-11-10 23:42:13 -08:00
output_validator.h Use NPHash64 in more places (#7632) 2020-11-10 23:42:13 -08:00
perf_context_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
periodic_work_scheduler_test.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
periodic_work_scheduler.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
periodic_work_scheduler.h Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
pinned_iterators_manager.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
plain_table_db_test.cc Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
pre_release_callback.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
prefix_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
range_del_aggregator_bench.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
range_del_aggregator_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_del_aggregator.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
range_del_aggregator.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
range_tombstone_fragmenter_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
range_tombstone_fragmenter.cc Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
range_tombstone_fragmenter.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
read_callback.h Get() with timestamp should respect snapshot (#7227) 2020-08-14 19:20:58 -07:00
repair_test.cc Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
repair.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
snapshot_checker.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_impl.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
snapshot_impl.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
table_cache.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
table_cache.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
table_properties_collector_test.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
table_properties_collector.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
table_properties_collector.h Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
transaction_log_impl.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
transaction_log_impl.h Store FileSystemPtr object that contains FileSystem ptr (#7180) 2020-08-12 17:31:23 -07:00
trim_history_scheduler.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
trim_history_scheduler.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
version_builder_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
version_builder.cc Log sst number in Corruption status (#7767) 2020-12-14 14:07:52 -08:00
version_builder.h make L0 index/filter pinned memory usage predictable (#6911) 2020-06-09 16:51:23 -07:00
version_edit_handler.cc Make secondary instance use ManifestTailer (#7998) 2021-03-10 10:59:44 -08:00
version_edit_handler.h Make secondary instance use ManifestTailer (#7998) 2021-03-10 10:59:44 -08:00
version_edit_test.cc Make it able to ignore WAL related VersionEdits in older versions (#7873) 2021-01-19 19:27:53 -08:00
version_edit.cc Make blob related VersionEdit tags unignorable (#7886) 2021-01-20 20:29:04 -08:00
version_edit.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
version_set_test.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
version_set.cc Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
version_set.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
wal_edit_test.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.h Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_manager_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
wal_manager.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
wal_manager.h Store FileSystemPtr object that contains FileSystem ptr (#7180) 2020-08-12 17:31:23 -07:00
write_batch_base.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_batch_internal.h Integrity protection for live updates to WriteBatch (#7748) 2021-01-29 12:18:58 -08:00
write_batch_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
write_batch.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
write_callback_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
write_callback.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
write_controller_test.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.h Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_thread.cc Integrity protection for live updates to WriteBatch (#7748) 2021-01-29 12:18:58 -08:00
write_thread.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00