rocksdb/tools at 5af9446ee6becbf71061074d9dd464a4b05b4afa - rocksdb - iGNUranza Git

andreacavalli/rocksdb

Andrew Kryczka 62f70f6d14 Reduce scope of compression dictionary to single SST (#4952)

Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary and then applying it to subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference in compression ratio.

So, this PR changes the compression dictionary to be scoped per SST. It accepts the tradeoff of using more memory and CPU during table building. Important changes include:

- The `BlockBasedTableBuilder` has a new state when dictionary compression is in use: `kBuffered`. In that state, it accumulates uncompressed data in memory whenever `Add` is called.
- After accumulating the target file size in bytes, or when `BlockBasedTableBuilder::Finish` is called, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before: blocks are compressed and written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except that the final sample may be a partial data block so that we don't exceed the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when it is given real units of compression. Previously, we passed 64-byte KV samples, which was not realistic.
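The buffered/unbuffered transition and the sampling rule above can be modeled with a small, self-contained C++ sketch. It is not RocksDB's `BlockBasedTableBuilder`: `SketchTableBuilder`, `TrainDictionaryStub`, and `WriteBlock` are hypothetical names; `Add` takes pre-formed data blocks rather than key-value pairs; samples are taken in order starting from the first buffered block (a simplification); the "training" step merely keeps the first `max_dict_bytes` of the raw samples instead of running zstd's trainer; and "writing" only records each block together with the size of the dictionary it would have been compressed with.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model of the kBuffered -> kUnbuffered transition.
// Only the state names and option names come from the commit summary.
class SketchTableBuilder {
 public:
  enum class State { kBuffered, kUnbuffered };

  SketchTableBuilder(size_t target_file_size, size_t max_dict_bytes,
                     size_t zstd_max_train_bytes)
      : target_file_size_(target_file_size),
        max_dict_bytes_(max_dict_bytes),
        zstd_max_train_bytes_(zstd_max_train_bytes) {}

  // Adds one full, uncompressed data block (simplification: real builders
  // accept key-value pairs and cut blocks themselves).
  void Add(const std::string& block) {
    if (state_ == State::kBuffered) {
      buffered_.push_back(block);
      buffered_bytes_ += block.size();
      // Leave the buffered state once the target file size is reached.
      if (buffered_bytes_ >= target_file_size_) EnterUnbuffered();
    } else {
      WriteBlock(block);  // Unbuffered: compress and write immediately.
    }
  }

  // Finishing while still buffered also triggers the transition.
  void Finish() {
    if (state_ == State::kBuffered) EnterUnbuffered();
  }

  State state() const { return state_; }
  const std::string& dictionary() const { return dict_; }
  const std::vector<std::string>& written() const { return written_; }

 private:
  void EnterUnbuffered() {
    // Sample budget: the training limit if set, else the dictionary limit.
    const size_t budget =
        zstd_max_train_bytes_ > 0 ? zstd_max_train_bytes_ : max_dict_bytes_;
    std::string samples;
    for (const std::string& block : buffered_) {
      if (samples.size() >= budget) break;
      const size_t room = budget - samples.size();
      // Appends the whole block if it fits; otherwise only a prefix, so the
      // final sample may be a partial block and the budget is never exceeded.
      samples.append(block, 0, room);
    }
    dict_ = TrainDictionaryStub(samples);
    state_ = State::kUnbuffered;
    // Flush everything that was buffered, now using the dictionary.
    for (const std::string& block : buffered_) WriteBlock(block);
    buffered_.clear();
    buffered_bytes_ = 0;
  }

  // Placeholder for real dictionary training (not zstd): keeps at most
  // max_dict_bytes of the raw samples.
  std::string TrainDictionaryStub(const std::string& samples) const {
    return samples.substr(0, max_dict_bytes_);
  }

  // Placeholder for "compress with dictionary, then write to the file".
  void WriteBlock(const std::string& block) {
    written_.push_back("dict" + std::to_string(dict_.size()) + ":" + block);
  }

  const size_t target_file_size_;
  const size_t max_dict_bytes_;
  const size_t zstd_max_train_bytes_;
  State state_ = State::kBuffered;
  std::vector<std::string> buffered_;
  size_t buffered_bytes_ = 0;
  std::string dict_;
  std::vector<std::string> written_;
};
```

For example, with a 10-byte target file size, `max_dict_bytes` of 4, and `zstd_max_train_bytes` of 0, adding the blocks `"aaa"`, `"bbb"`, and `"cccc"` triggers the transition on the third block. The samples are then one whole block plus a 1-byte partial block (`"aaab"`), and all three buffered blocks are written out only after the dictionary exists. Keeping whole blocks as samples, as the commit describes, means the trainer sees the same units that are later compressed.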
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952

Differential Revision: D13967980

Pulled By: ajkr

fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f

2019-02-11 19:47:32 -08:00

Rules Advisor: some fixes to support fetching stats from ODS (#4223)

2018-08-02 15:42:42 -07:00

fix gflags namespace

2017-12-01 10:42:05 -08:00

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

auto_sanity_test.sh

Suppress lint in old files

2018-01-29 12:56:42 -08:00

benchmark_leveldb.sh

Suppress lint in old files

2018-01-29 12:56:42 -08:00

benchmark.sh

Updated benchmark script (#4134)

2018-12-17 16:34:30 -08:00

blob_dump.cc

comment unused parameters to turn on -Wunused-parameter flag

2018-04-12 17:59:16 -07:00

check_format_compatible.sh

Include newer RocksDB versions in compat test (#4634)

2018-11-06 14:25:39 -08:00

CMakeLists.txt

cmake support for linux and osx (#1358)

2016-09-28 11:53:15 -07:00

db_bench_tool_test.cc

Update all unique/shared_ptr instances to be qualified with namespace std (#4638)

2018-11-09 11:19:58 -08:00

db_bench_tool.cc

Remove cuckoo hash memtable (#4953)

2019-02-07 16:15:27 -08:00

db_bench.cc

Change RocksDB License

2017-07-15 16:11:23 -07:00

db_crashtest.py

Fix compression_zstd_max_train_bytes coverage in stress test (#4957)

2019-02-11 14:56:39 -08:00

db_repl_stress.cc

Update all unique/shared_ptr instances to be qualified with namespace std (#4638)

2018-11-09 11:19:58 -08:00

db_sanity_test.cc

Change RocksDB License

2017-07-15 16:11:23 -07:00

db_stress.cc

Free memory after use

2019-01-08 17:19:09 -08:00

dbench_monitor

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

Dockerfile

adding docker build script and dockerfile

2015-05-22 16:03:39 -07:00

generate_random_db.sh

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

ingest_external_sst.sh

Add compatibility test of SST ingestion (#4310)

2018-08-24 14:27:43 -07:00

ldb_cmd_impl.h

Add SST ingestion to ldb (#4205)

2018-08-09 14:29:11 -07:00

ldb_cmd_test.cc

tools: use provided options instead of the default (#4839)

2019-01-03 11:23:49 -08:00

ldb_cmd.cc

With ldb --try_load_options and wal_dir doesn't exist, ignore it (#4875)

2019-01-11 16:48:32 -08:00

ldb_test.py

Add SST ingestion to ldb (#4205)

2018-08-09 14:29:11 -07:00

ldb_tool.cc

Add SST ingestion to ldb (#4205)

2018-08-09 14:29:11 -07:00

ldb.cc

comment unused parameters to turn on -Wunused-parameter flag

2018-04-12 17:59:16 -07:00

pflag

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

reduce_levels_test.cc

Per-thread unique test db names (#4135)

2018-07-13 17:27:39 -07:00

regression_test.sh

Suppress lint in old files

2018-01-29 12:56:42 -08:00

report_lite_binary_size.sh

Legocastle job to report lite build binary size to scuba

2018-02-15 17:27:24 -08:00

rocksdb_dump_test.sh

Suppress lint in old files

2018-01-29 12:56:42 -08:00

run_flash_bench.sh

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

run_leveldb.sh

Fix /bin/bash shebangs

2017-08-03 15:56:46 -07:00

sample-dump.dmp

First version of rocksdb_dump and rocksdb_undump.

2015-06-19 16:24:36 -07:00

sst_dump_test.cc

Reduce scope of compression dictionary to single SST (#4952)

2019-02-11 19:47:32 -08:00

sst_dump_tool_imp.h

tools: use provided options instead of the default (#4839)

2019-01-03 11:23:49 -08:00

sst_dump_tool.cc

Reduce scope of compression dictionary to single SST (#4952)

2019-02-11 19:47:32 -08:00

sst_dump.cc

comment unused parameters to turn on -Wunused-parameter flag

2018-04-12 17:59:16 -07:00

trace_analyzer_test.cc

Add the unit test of Iterator to trace_analyzer_test (#4282)

2018-08-23 17:28:32 -07:00

trace_analyzer_tool.cc

Add unique key number changing statistics to Trace_analyzer (#4646)

2018-11-12 08:26:50 -08:00

trace_analyzer_tool.h

Add unique key number changing statistics to Trace_analyzer (#4646)

2018-11-12 08:26:50 -08:00

trace_analyzer.cc

RocksDB Trace Analyzer (#4091)

2018-08-13 11:44:02 -07:00

verify_random_db.sh

tools/check_format_compatible.sh to cover forward option reading too (#3994)

2018-06-15 11:12:29 -07:00

write_external_sst.sh

correct mistyped msg. (#4341)

2018-09-13 14:57:38 -07:00

write_stress_runner.py

Suppress lint in old files

2018-01-29 12:56:42 -08:00

write_stress.cc

Compilation fixes for powerpc build, -Wparentheses-equality error and missing header guards

2018-02-09 14:12:43 -08:00