Update documentation

Summary:
Added more options for compaction settings + thread pools.

Please check if thread pool description is correct.

Test Plan: -

Reviewers: dhruba

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14043
Igor Canadi 2013-11-12 16:09:57 -08:00
parent 9df2b217e9
commit c3dda7276c


@ -387,7 +387,8 @@ of point reads of small values may wish to switch to a smaller block
size if performance measurements indicate an improvement. There isn't size if performance measurements indicate an improvement. There isn't
much benefit in using blocks smaller than one kilobyte, or larger than much benefit in using blocks smaller than one kilobyte, or larger than
a few megabytes. Also note that compression will be more effective a few megabytes. Also note that compression will be more effective
with larger block sizes. with larger block sizes. To change block size parameter, use
<code>Options::block_size</code>.
<p> <p>
<h2>Write buffer</h2> <h2>Write buffer</h2>
<p> <p>
@ -434,7 +435,7 @@ filesystem and each file stores a sequence of compressed blocks. If
used uncompressed block contents. If <code>options.block_cache_compressed</code> used uncompressed block contents. If <code>options.block_cache_compressed</code>
is non-NULL, it is used to cache frequently used compressed blocks. Compressed is non-NULL, it is used to cache frequently used compressed blocks. Compressed
cache is an alternative to OS cache, which also caches compressed blocks. If cache is an alternative to OS cache, which also caches compressed blocks. If
compressed cache is used, you should disable OS cache by setting compressed cache is used, the OS cache will be disabled automatically by setting
<code>options.allow_os_buffer</code> to false. <code>options.allow_os_buffer</code> to false.
<p> <p>
<pre> <pre>
@ -588,7 +589,7 @@ Here we give overview of the options that impact behavior of Compactions:
<ul> <ul>
<p> <p>
<li><code>Options::compaction_style</code> - RocksDB currently supports two <li><code>Options::compaction_style</code> - RocksDB currently supports two
compaction algorithms - Compaction style and Level style. This option switches compaction algorithms - Universal style and Level style. This option switches
between the two. Can be kCompactionStyleUniversal or kCompactionStyleLevel. between the two. Can be kCompactionStyleUniversal or kCompactionStyleLevel.
If this is kCompactionStyleUniversal, then you can configure universal style If this is kCompactionStyleUniversal, then you can configure universal style
parameters with <code>Options::compaction_options_universal</code>. parameters with <code>Options::compaction_options_universal</code>.
@ -608,16 +609,126 @@ key-value during background compaction.
</ul> </ul>
<p> <p>
Other options impacting performance of compactions and when they get triggered Other options impacting performance of compactions and when they get triggered
are: <code>access_hint_on_compaction_start</code>, are:
<code>level0_file_num_compaction_trigger</code>, <ul>
<code>max_mem_compaction_level</code>, <code>target_file_size_base</code>, <p>
<code>target_file_size_multiplier</code>, <li> <code>Options::access_hint_on_compaction_start</code> - Specify the file access
<code>expanded_compaction_factor</code>, <code>source_compaction_factor</code>, pattern once a compaction is started. It will be applied to all input files of a compaction. Default: NORMAL
<code>max_grandparent_overlap_factor</code>, <p>
<code>disable_seek_compaction</code>, <code>max_background_compactions</code>. <li> <code>Options::level0_file_num_compaction_trigger</code> - Number of files to trigger level-0 compaction.
A negative value means that level-0 compaction will not be triggered by number of files at all.
<p>
<li> <code>Options::max_mem_compaction_level</code> - Maximum level to which a new compacted memtable is pushed if it
does not create overlap. We try to push to level 2 to avoid the relatively expensive level 0=>1 compactions and to avoid some
expensive manifest file operations. We do not push all the way to the largest level since that can generate a lot of wasted disk
space if the same key space is being repeatedly overwritten.
<p>
<li> <code>Options::target_file_size_base</code> and <code>Options::target_file_size_multiplier</code> -
Target file size for compaction. target_file_size_base is the per-file size for level 1.
The target file size for level L is target_file_size_base * (target_file_size_multiplier ^ (L-1)).
For example, if target_file_size_base is 2MB and target_file_size_multiplier is 10, then each file on level 1 will
be 2MB, each file on level 2 will be 20MB, and each file on level 3 will be 200MB. Default target_file_size_base is 2MB
and default target_file_size_multiplier is 1.
<p>
<li> <code>Options::expanded_compaction_factor</code> - Maximum number of bytes in all compacted files. We avoid expanding
the lower level file set of a compaction if it would make the total compaction cover more than
(expanded_compaction_factor * targetFileSizeLevel()) many bytes.
<p>
<li> <code>Options::source_compaction_factor</code> - Maximum number of bytes in all source files to be compacted in a
single compaction run. We avoid picking too many files in the source level so that the total source bytes
for a compaction do not exceed (source_compaction_factor * targetFileSizeLevel()) bytes.
Default: 1, i.e. pick maxfilesize amount of data as the source of a compaction.
<p>
<li> <code>Options::max_grandparent_overlap_factor</code> - Controls the maximum number of bytes of overlap with the grandparent level (i.e., level+2) before we
stop building a single file in a level->level+1 compaction.
<p>
<li> <code>Options::disable_seek_compaction</code> - Disable compaction triggered by seek.
With bloomfilter and fast storage, a miss on one level is very cheap if the file handle is cached in table cache
(which is true if max_open_files is large).
<p>
<li> <code>Options::max_background_compactions</code> - Maximum number of concurrent background jobs, submitted to
the default LOW priority thread pool
</ul>
<p> <p>
You can learn more about all of those options in <code>rocksdb/options.h</code> You can learn more about all of those options in <code>rocksdb/options.h</code>
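The target file size formula given for <code>target_file_size_base</code> and <code>target_file_size_multiplier</code> can be sketched as a small standalone function. This is only an illustration of the arithmetic: <code>TargetFileSizeForLevel</code> is a hypothetical helper name, not a RocksDB function, and it assumes levels are numbered from 1 as in the text.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration; not part of the RocksDB API.
// Computes target_file_size_base * (target_file_size_multiplier ^ (level - 1)),
// the per-file target size for `level`, where level 1 is the first level
// the formula applies to.
inline uint64_t TargetFileSizeForLevel(uint64_t target_file_size_base,
                                       uint64_t target_file_size_multiplier,
                                       int level) {
  assert(level >= 1);
  uint64_t size = target_file_size_base;
  // Apply the multiplier once for every level above level 1.
  for (int i = 1; i < level; ++i) {
    size *= target_file_size_multiplier;
  }
  return size;
}
```

With a 2MB base and a multiplier of 10, this gives 2MB, 20MB and 200MB for levels 1, 2 and 3. With the default multiplier of 1, every level has the same target size as level 1.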
<h2> Universal style compaction specific settings</h2>
<p>
If you're using Universal style compaction, there is an object <code>CompactionOptionsUniversal</code>
that holds all the different options for that compaction. The exact definition is in
<code>rocksdb/universal_compaction.h</code> and you can set it in <code>Options::compaction_options_universal</code>.
Here we give a short overview of the options in <code>CompactionOptionsUniversal</code>:
<ul>
<p>
<li> <code>CompactionOptionsUniversal::size_ratio</code> - Percentage flexibility while comparing file size. If the candidate file(s)
size is 1% smaller than the next file's size, then include next file into
this candidate set. Default: 1
<p>
<li> <code>CompactionOptionsUniversal::min_merge_width</code> - The minimum number of files in a single compaction run. Default: 2
<p>
<li> <code>CompactionOptionsUniversal::max_merge_width</code> - The maximum number of files in a single compaction run. Default: UINT_MAX
<p>
<li> <code>CompactionOptionsUniversal::max_size_amplification_percent</code> - The size amplification is defined as the amount (in percentage) of
additional storage needed to store a single byte of data in the database. For example, a size amplification of 2% means that a database that
contains 100 bytes of user-data may occupy up to 102 bytes of physical storage. By this definition, a fully compacted database has
a size amplification of 0%. RocksDB uses the following heuristic to calculate size amplification: it assumes that all files excluding
the earliest file contribute to the size amplification. Default: 200, which means that a 100 byte database could require up to
300 bytes of storage.
<p>
<li> <code>CompactionOptionsUniversal::compression_size_percent</code> - If this option is set to -1 (the default value), all the output files
will follow the compression type specified. If this option is not negative, we will try to make sure the compressed
size is just above this value. In normal cases, at least this percentage
of data will be compressed.
When we are compacting to a new file, this is the criterion for whether
it needs to be compressed: assume the files, sorted
by generation time, are [ A1...An B1...Bm C1...Ct ],
where A1 is the newest and Ct is the oldest, and we are going to compact
B1...Bm. We calculate the total size of all the files as total_size, as
well as the total size of C1...Ct as total_C. The compaction output file
will be compressed iff total_C / total_size < this percentage.
<p>
<li> <code>CompactionOptionsUniversal::stop_style</code> - The algorithm used to stop picking files into a single compaction run.
Can be kCompactionStopStyleSimilarSize (pick files of similar size) or kCompactionStopStyleTotalSize (total size of picked files > next file).
Default: kCompactionStopStyleTotalSize
</ul>
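The two percentage-based rules above (the size amplification heuristic and the <code>compression_size_percent</code> criterion) can be sketched as plain arithmetic. This is one reading of the descriptions in the list, not the RocksDB implementation: both function names are hypothetical, and the amplification estimate assumes that "all files excluding the earliest file" is compared against the size of the earliest file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers for illustration only; not RocksDB functions.

// Estimated size amplification in percent. `sizes_newest_first` holds file
// sizes ordered from newest to oldest, so the last entry is the earliest
// file. Every file except the earliest is counted as amplification.
inline uint64_t SizeAmplificationPercent(
    const std::vector<uint64_t>& sizes_newest_first) {
  if (sizes_newest_first.empty() || sizes_newest_first.back() == 0) return 0;
  uint64_t newer_bytes = 0;
  for (std::size_t i = 0; i + 1 < sizes_newest_first.size(); ++i) {
    newer_bytes += sizes_newest_first[i];
  }
  return newer_bytes * 100 / sizes_newest_first.back();
}

// Returns true if the compaction output should use the configured
// compression type. `total_c` is the total size of the files older than the
// ones being compacted (C1...Ct); `total_size` is the size of all files.
inline bool OutputUsesCompression(uint64_t total_c, uint64_t total_size,
                                  int compression_size_percent) {
  assert(total_c <= total_size);
  // A negative value (the default, -1) means all outputs follow the
  // configured compression type.
  if (compression_size_percent < 0) return true;
  // total_C / total_size < percentage, written without division.
  return total_c * 100 <
         static_cast<uint64_t>(compression_size_percent) * total_size;
}
```

For files of 10, 20 and 100 bytes (newest first), the estimate is 30%, well under the default limit of 200. For the compression rule, with total_C at half of total_size, the output is compressed when <code>compression_size_percent</code> is 60 but not when it is 40.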
<h1>Thread pools</h1>
<p>
A thread pool is associated with an Env environment object. The client has to create a thread pool by setting the number of background
threads using the method <code>Env::SetBackgroundThreads()</code> defined in <code>rocksdb/env.h</code>.
We use the thread pool for compactions and memtable flushes.
Since memtable flushes are on the critical code path (stalling a memtable flush can stall writes, increasing p99), we suggest
having two thread pools - with priorities HIGH and LOW. Memtable flushes can be set up to be scheduled on the HIGH thread pool.
There are two options available for configuration of background compactions and flushes:
<ul>
<p>
<li> <code>Options::max_background_compactions</code> - Maximum number of concurrent background jobs,
submitted to the default LOW priority thread pool
<p>
<li> <code>Options::max_background_flushes</code> - Maximum number of concurrent background memtable flush jobs, submitted to
the HIGH priority thread pool. By default, all background jobs (major compaction and memtable flush) go
to the LOW priority pool. If this option is set to a positive number, memtable flush jobs will be submitted to the HIGH priority pool.
This is important when the same Env is shared by multiple db instances. Without a separate pool, long running major compaction jobs could
potentially block memtable flush jobs of other db instances, leading to unnecessary Put stalls.
</ul>
<p>
<pre>
#include &lt;cassert&gt;
#include "rocksdb/env.h"
#include "rocksdb/db.h"
// Size the two pools: 2 threads for compactions (LOW), 1 for flushes (HIGH).
auto env = rocksdb::Env::Default();
env->SetBackgroundThreads(2, rocksdb::Env::LOW);
env->SetBackgroundThreads(1, rocksdb::Env::HIGH);
rocksdb::DB* db;
rocksdb::Options options;
options.env = env;
// Allow up to 2 concurrent compactions and 1 concurrent memtable flush.
options.max_background_compactions = 2;
options.max_background_flushes = 1;
rocksdb::Status status = rocksdb::DB::Open(options, "/tmp/testdb", &amp;db);
assert(status.ok());
...
</pre>
<h1>Approximate Sizes</h1> <h1>Approximate Sizes</h1>
<p> <p>
The <code>GetApproximateSizes</code> method can used to get the approximate The <code>GetApproximateSizes</code> method can used to get the approximate