Update documentation
Summary: Added more options for compaction settings + thread pools. Please check if thread pool description is correct.
Test Plan: -
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14043
parent 9df2b217e9
commit c3dda7276c
131
doc/index.html
@@ -387,7 +387,8 @@ of point reads of small values may wish to switch to a smaller block
 size if performance measurements indicate an improvement. There isn't
 much benefit in using blocks smaller than one kilobyte, or larger than
 a few megabytes. Also note that compression will be more effective
-with larger block sizes.
+with larger block sizes. To change block size parameter, use
+<code>Options::block_size</code>.
 <p>
 <h2>Write buffer</h2>
 <p>
@@ -434,7 +435,7 @@ filesystem and each file stores a sequence of compressed blocks. If
 used uncompressed block contents. If <code>options.block_cache_compressed</code>
 is non-NULL, it is used to cache frequently used compressed blocks. Compressed
 cache is an alternative to OS cache, which also caches compressed blocks. If
-compressed cache is used, you should disable OS cache by setting
+compressed cache is used, the OS cache will be disabled automatically by setting
 <code>options.allow_os_buffer</code> to false.
 <p>
 <pre>
@@ -588,7 +589,7 @@ Here we give overview of the options that impact behavior of Compactions:
 <ul>
 <p>
 <li><code>Options::compaction_style</code> - RocksDB currently supports two
-compaction algorithms - Compaction style and Level style. This option switches
+compaction algorithms - Universal style and Level style. This option switches
 between the two. Can be kCompactionStyleUniversal or kCompactionStyleLevel.
 If this is kCompactionStyleUniversal, then you can configure universal style
 parameters with <code>Options::compaction_options_universal</code>.
@@ -608,16 +609,126 @@ key-value during background compaction.
 </ul>
 <p>
 Other options impacting performance of compactions and when they get triggered
-are: <code>access_hint_on_compaction_start</code>,
-<code>level0_file_num_compaction_trigger</code>,
-<code>max_mem_compaction_level</code>, <code>target_file_size_base</code>,
-<code>target_file_size_multiplier</code>,
-<code>expanded_compaction_factor</code>, <code>source_compaction_factor</code>,
-<code>max_grandparent_overlap_factor</code>,
-<code>disable_seek_compaction</code>, <code>max_background_compactions</code>.
+are:
+<ul>
+<p>
+<li> <code>Options::access_hint_on_compaction_start</code> - Specify the file access
+pattern once a compaction is started. It will be applied to all input files of a compaction. Default: NORMAL
+<p>
+<li> <code>Options::level0_file_num_compaction_trigger</code> - Number of files to trigger level-0 compaction.
+A negative value means that level-0 compaction will not be triggered by number of files at all.
+<p>
+<li> <code>Options::max_mem_compaction_level</code> - Maximum level to which a new compacted memtable is pushed if it
+does not create overlap. We try to push to level 2 to avoid the relatively expensive level 0=>1 compactions and to avoid some
+expensive manifest file operations. We do not push all the way to the largest level since that can generate a lot of wasted disk
+space if the same key space is being repeatedly overwritten.
+<p>
+<li> <code>Options::target_file_size_base</code> and <code>Options::target_file_size_multiplier</code> -
+Target file size for compaction. target_file_size_base is per-file size for level-1.
+Target file size for level L can be calculated by target_file_size_base * (target_file_size_multiplier ^ (L-1))
+For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level-1 will
+be 2MB, and each file on level 2 will be 20MB, and each file on level-3 will be 200MB. Default target_file_size_base is 2MB
+and default target_file_size_multiplier is 1.
+<p>
+<li> <code>Options::expanded_compaction_factor</code> - Maximum number of bytes in all compacted files. We avoid expanding
+the lower level file set of a compaction if it would make the total compaction cover more than
+(expanded_compaction_factor * targetFileSizeLevel()) many bytes.
+<p>
+<li> <code>Options::source_compaction_factor</code> - Maximum number of bytes in all source files to be compacted in a
+single compaction run. We avoid picking too many files in the source level so that we do not exceed the total source bytes
+for compaction to exceed (source_compaction_factor * targetFileSizeLevel()) many bytes.
+Default:1, i.e. pick maxfilesize amount of data as the source of a compaction.
+<p>
+<li> <code>Options::max_grandparent_overlap_factor</code> - Control maximum bytes of overlaps in grandparent (i.e., level+2) before we
+stop building a single file in a level->level+1 compaction.
+<p>
+<li> <code>Options::disable_seek_compaction</code> - Disable compaction triggered by seek.
+With bloomfilter and fast storage, a miss on one level is very cheap if the file handle is cached in table cache
+(which is true if max_open_files is large).
+<p>
+<li> <code>Options::max_background_compactions</code> - Maximum number of concurrent background jobs, submitted to
+the default LOW priority thread pool
+</ul>
+
 <p>
 You can learn more about all of those options in <code>rocksdb/options.h</code>
 
+<h2> Universal style compaction specific settings</h2>
+<p>
+If you're using Universal style compaction, there is an object <code>CompactionOptionsUniversal</code>
+that hold all the different options for that compaction. The exact definition is in
+<code>rocksdb/universal_compaction.h</code> and you can set it in <code>Options::compaction_options_universal</code>.
+Here we give short overview of options in <code>CompactionOptionsUniversal</code>:
+<ul>
+<p>
+<li> <code>CompactionOptionsUniversal::size_ratio</code> - Percentage flexibilty while comparing file size. If the candidate file(s)
+size is 1% smaller than the next file's size, then include next file into
+this candidate set. Default: 1
+<p>
+<li> <code>CompactionOptionsUniversal::min_merge_width</code> - The minimum number of files in a single compaction run. Default: 2
+<p>
+<li> <code>CompactionOptionsUniversal::max_merge_width</code> - The maximum number of files in a single compaction run. Default: UINT_MAX
+<p>
+<li> <code>CompactionOptionsUniversal::max_size_amplification_percent</code> - The size amplification is defined as the amount (in percentage) of
+additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that
+contains 100 bytes of user-data may occupy upto 102 bytes of physical storage. By this definition, a fully compacted database has
+a size amplification of 0%. Rocksdb uses the following heuristic to calculate size amplification: it assumes that all files excluding
+the earliest file contribute to the size amplification. Default: 200, which means that a 100 byte database could require upto
+300 bytes of storage.
+<p>
+<li> <code>CompactionOptionsUniversal::compression_size_percent</code> - If this option is set to be -1 (the default value), all the output files
+will follow compression type specified. If this option is not negative, we will try to make sure compressed
+size is just above this value. In normal cases, at least this percentage
+of data will be compressed.
+When we are compacting to a new file, here is the criteria whether
+it needs to be compressed: assuming here are the list of files sorted
+by generation time: [ A1...An B1...Bm C1...Ct ],
+where A1 is the newest and Ct is the oldest, and we are going to compact
+B1...Bm, we calculate the total size of all the files as total_size, as
+well as the total size of C1...Ct as total_C, the compaction output file
+will be compressed iff total_C / total_size < this percentage
+<p>
+<li> <code>CompactionOptionsUniversal::stop_style</code> - The algorithm used to stop picking files into a single compaction run.
+Can be kCompactionStopStyleSimilarSize (pick files of similar size) or kCompactionStopStyleTotalSize (total size of picked files > next file).
+Default: kCompactionStopStyleTotalSize
+</ul>
+
+<h1>Thread pools</h1>
+<p>
+A thread pool is associated with Env environment object. The client has to create a thread pool by setting the number of background
+threads using method <code>Env::SetBackgroundThreads()</code> defined in <code>rocksdb/env.h</code>.
+We use the thread pool for compactions and memtable flushes.
+Since memtable flushes are in critical code path (stalling memtable flush can stall writes, increasing p99), we suggest
+having two thread pools - with priorities HIGH and LOW. Memtable flushes can be set up to be scheduled on HIGH thread pool.
+There are two options available for configuration of background compactions and flushes:
+<ul>
+<p>
+<li> <code>Options::max_background_compactions</code> - Maximum number of concurrent background jobs,
+submitted to the default LOW priority thread pool
+<p>
+<li> <code>Options::max_background_flushes</code> - Maximum number of concurrent background memtable flush jobs, submitted to
+the HIGH priority thread pool. By default, all background jobs (major compaction and memtable flush) go
+to the LOW priority pool. If this option is set to a positive number, memtable flush jobs will be submitted to the HIGH priority pool.
+It is important when the same Env is shared by multiple db instances. Without a separate pool, long running major compaction jobs could
+potentially block memtable flush jobs of other db instances, leading to unnecessary Put stalls.
+</ul>
+<p>
+<pre>
+#include "rocksdb/env.h"
+#include "rocksdb/db.h"
+
+auto env = rocksdb::Env::Default();
+env->SetBackgroundThreads(2, rocksdb::Env::LOW);
+env->SetBackgroundThreads(1, rocksdb::Env::HIGH);
+rocksdb::DB* db;
+rocksdb::Options options;
+options.env = env;
+options.max_background_compactions = 2;
+options.max_background_flushes = 1;
+rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &db);
+assert(status.ok());
+...
+</pre>
 <h1>Approximate Sizes</h1>
 <p>
 The <code>GetApproximateSizes</code> method can used to get the approximate
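
The target file size rule documented in this change (target_file_size_base * target_file_size_multiplier ^ (L-1)) can be checked with a small standalone sketch. This is only an illustration of the arithmetic in the text: `TargetFileSizeForLevel` and `kMB` are hypothetical names introduced here, not RocksDB functions, and the sketch assumes 1MB means 1024 * 1024 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Assumption for this sketch: 1MB is 1024 * 1024 bytes.
constexpr std::uint64_t kMB = 1024 * 1024;

// Hypothetical helper (not part of the RocksDB API): per-file target size
// for level L >= 1, following the formula quoted in the documentation:
//   target_file_size_base * (target_file_size_multiplier ^ (L - 1))
std::uint64_t TargetFileSizeForLevel(std::uint64_t base,
                                     std::uint64_t multiplier,
                                     int level) {
  assert(level >= 1);  // the text states the formula for level-1 and above
  std::uint64_t size = base;
  for (int l = 1; l < level; ++l) {
    size *= multiplier;  // one multiplication per level above level-1
  }
  return size;
}
```

With a base of 2MB and a multiplier of 10 this reproduces the 2MB, 20MB and 200MB figures from the documentation example; with the documented default multiplier of 1, every level gets the base size.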
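
Two of the universal compaction settings described in this change are stated as arithmetic rules, so a rough standalone sketch may help reviewers check the wording. Both functions below are hypothetical helpers written from the prose only; they are not RocksDB code, and they assume that file sizes are listed newest first and that the "percentage" in the compression rule is compared on a 0 to 100 scale.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: size amplification estimate using the heuristic in
// the text, where every file except the earliest (oldest) one is counted
// as extra storage. Sizes are assumed to be ordered newest first.
double EstimatedSizeAmplificationPercent(
    const std::vector<std::uint64_t>& sizes_newest_first) {
  assert(!sizes_newest_first.empty());
  const std::uint64_t earliest = sizes_newest_first.back();
  assert(earliest > 0);
  std::uint64_t newer = 0;
  for (std::size_t i = 0; i + 1 < sizes_newest_first.size(); ++i) {
    newer += sizes_newest_first[i];
  }
  return 100.0 * static_cast<double>(newer) / static_cast<double>(earliest);
}

// Hypothetical helper: the compression criterion from the text. total_c is
// the combined size of the files older than the ones being compacted
// (C1...Ct), total_size is the combined size of all files.
bool ShouldCompressOutput(int compression_size_percent,
                          std::uint64_t total_size,
                          std::uint64_t total_c) {
  if (compression_size_percent < 0) {
    return true;  // -1 (default): output follows the configured compression type
  }
  // Compress iff total_C / total_size < percentage, written without division.
  return total_c * 100 <
         total_size * static_cast<std::uint64_t>(compression_size_percent);
}
```

For instance, files of 150, 50 and 100 bytes (newest first) give an estimate of 200%, which matches the documented default limit under which a 100 byte database could need up to 300 bytes of storage.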