rocksdb/db
Dhruba Borthakur 321dfdc3ae Allow having different compression algorithms on different levels.
Summary:
The leveldb API is enhanced to support different compression algorithms at
different levels.

This adds the option min_level_to_compress to db_bench that specifies
the minimum level for which compression should be done when
compression is enabled. This can be used to disable compression for levels
0 and 1, which are likely to suffer from stalls because of the CPU load
for memtable flushes and (L0,L1) compaction.  Level 0 is special as it
gets frequent memtable flushes. Level 1 is special as it frequently
gets all:all file compactions between it and level 0. But all other levels
could be the same. For any level N where N > 1, the rate of sequential
IO for that level should be the same. The last level is the
exception because it might not be full and because files from it are
not read to compact with the next larger level.
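
The effect of min_level_to_compress can be sketched as a mapping from level
number to compression setting. This is a minimal illustration, not the code
from this change: the enum `Compression` and the function
`BuildCompressionPerLevel` are hypothetical names invented for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for the library's compression enum.
enum class Compression { kNone, kSnappy, kZlib };

// Returns one entry per level. Levels below min_level_to_compress stay
// uncompressed; every other level uses `type`.
// (Assumed helper for illustration; not part of the leveldb API.)
std::vector<Compression> BuildCompressionPerLevel(int num_levels,
                                                  int min_level_to_compress,
                                                  Compression type) {
  assert(num_levels > 0);
  std::vector<Compression> per_level(num_levels, Compression::kNone);
  for (int level = 0; level < num_levels; ++level) {
    if (level >= min_level_to_compress) {
      per_level[level] = type;
    }
  }
  return per_level;
}
```

With num_levels=4 and min_level_to_compress=2, levels 0 and 1 are left
uncompressed and levels 2 and 3 use the chosen algorithm.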

The same amount of time will be spent doing compaction at any
level N excluding N=0, 1 or the last level. By this standard all
of those levels should use the same compression. The difference is that
the loss (using more disk space) from a faster compression algorithm
is less significant for N=2 than for N=3. So we might be willing to
trade disk space for faster write rates with no compression
for L0 and L1, snappy for L2, zlib for L3. Using a faster compression
algorithm for the mid levels also allows us to reclaim some CPU
without giving up much disk space.
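
The mixed scheme described above (no compression for L0 and L1, snappy for L2,
zlib for L3) can be written as a per-level table. This is a sketch under
assumptions: `Compression`, `ExampleScheme` and `CompressionForLevel` are
hypothetical names, and reusing the last entry for deeper levels is a choice
made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the library's compression enum.
enum class Compression { kNone, kSnappy, kZlib };

// The per-level scheme from the text: L0 and L1 uncompressed,
// L2 snappy, L3 zlib.
std::vector<Compression> ExampleScheme() {
  return {Compression::kNone, Compression::kNone,
          Compression::kSnappy, Compression::kZlib};
}

// Looks up the compression for `level`. Levels past the end of the table
// reuse the last entry (an assumption of this sketch).
Compression CompressionForLevel(const std::vector<Compression>& per_level,
                                int level) {
  assert(level >= 0);
  if (per_level.empty()) return Compression::kNone;
  std::size_t idx = static_cast<std::size_t>(level);
  if (idx >= per_level.size()) idx = per_level.size() - 1;
  return per_level[idx];
}
```

A table keeps the fast path (flushes and L0/L1 compaction) free of compression
cost while the larger, colder levels get the stronger algorithm.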

Also note that little is to be gained by compressing levels 0 and 1. For
a 4-level tree they account for 10% of the data. For a 5-level tree they
account for 1% of the data.

With compression enabled:
* memtable flush rate is ~18MB/second
* (L0,L1) compaction rate is ~30MB/second

With compression enabled but min_level_to_compress=2:
* memtable flush rate is ~320MB/second
* (L0,L1) compaction rate is ~560MB/second

This takes practically the same code from https://reviews.facebook.net/D6225
but makes the leveldb API more general purpose with a few additional
lines of code.

Test Plan: make check

Differential Revision: https://reviews.facebook.net/D6261
2012-10-29 11:48:09 -07:00
builder.cc Allow having different compression algorithms on different levels. 2012-10-29 11:48:09 -07:00
builder.h A number of fixes: 2011-10-31 17:22:06 +00:00
c_test.c merge 1.5 2012-08-28 11:43:33 -07:00
c.cc put log in a seperate dir 2012-09-06 17:52:08 -07:00
corruption_test.cc Make some variables configurable for each db instance 2012-06-27 14:36:31 -07:00
db_bench.cc Allow having different compression algorithms on different levels. 2012-10-29 11:48:09 -07:00
db_filesnapshot.cc The BackupAPI should also list the length of the manifest file. 2012-09-25 03:13:25 -07:00
db_impl.cc Allow having different compression algorithms on different levels. 2012-10-29 11:48:09 -07:00
db_impl.h Adds DB::GetNextCompaction and then uses that for rate limiting db_bench 2012-10-29 10:17:43 -07:00
db_iter.cc A number of fixes: 2011-10-31 17:22:06 +00:00
db_iter.h A number of fixes: 2011-10-31 17:22:06 +00:00
db_statistics.h Fix table-cache size bug, gather table-cache statistics and prevent readahead done by fs. Summary: 2012-05-30 16:42:45 -07:00
db_stats_logger.cc remove boost 2012-09-16 19:33:43 -07:00
db_test.cc Allow having different compression algorithms on different levels. 2012-10-29 11:48:09 -07:00
dbformat_test.cc A number of fixes: 2011-10-31 17:22:06 +00:00
dbformat.cc Added bloom filter support. 2012-04-17 08:36:46 -07:00
dbformat.h Make some variables configurable for each db instance 2012-06-27 14:36:31 -07:00
filename_test.cc A number of fixes: 2011-10-31 17:22:06 +00:00
filename.cc put log in a seperate dir 2012-09-06 17:52:08 -07:00
filename.h put log in a seperate dir 2012-09-06 17:52:08 -07:00
log_format.h A number of fixes: 2011-10-31 17:22:06 +00:00
log_reader.cc A number of fixes: 2011-10-31 17:22:06 +00:00
log_reader.h A number of fixes: 2011-10-31 17:22:06 +00:00
log_test.cc A number of fixes: 2011-10-31 17:22:06 +00:00
log_writer.cc A number of fixes: 2011-10-31 17:22:06 +00:00
log_writer.h A number of fixes: 2011-10-31 17:22:06 +00:00
memtable.cc A number of fixes: 2011-10-31 17:22:06 +00:00
memtable.h A number of fixes: 2011-10-31 17:22:06 +00:00
repair.cc Make some variables configurable for each db instance 2012-06-27 14:36:31 -07:00
skiplist_test.cc A number of fixes: 2011-10-31 17:22:06 +00:00
skiplist.h skiplist: optimize for sequential insert pattern 2012-05-11 09:57:40 -07:00
snapshot.h A number of fixes: 2011-10-31 17:22:06 +00:00
table_cache.cc Trigger read compaction only if seeks to storage are incurred. 2012-09-28 11:10:52 -07:00
table_cache.h Trigger read compaction only if seeks to storage are incurred. 2012-09-28 11:10:52 -07:00
version_edit_test.cc Make some variables configurable for each db instance 2012-06-27 14:36:31 -07:00
version_edit.cc Clean up compiler warnings generated by -Wall option. 2012-08-29 14:24:51 -07:00
version_edit.h Make some variables configurable for each db instance 2012-06-27 14:36:31 -07:00
version_set_test.cc A number of fixes: 2011-10-31 17:22:06 +00:00
version_set.cc add "seek_compaction" to log for better debug Summary: 2012-10-22 10:00:25 -07:00
version_set.h Adds DB::GetNextCompaction and then uses that for rate limiting db_bench 2012-10-29 10:17:43 -07:00
write_batch_internal.h added group commit; drastically speeds up mult-threaded synchronous write workloads 2012-03-08 16:23:21 -08:00
write_batch_test.cc added group commit; drastically speeds up mult-threaded synchronous write workloads 2012-03-08 16:23:21 -08:00
write_batch.cc added group commit; drastically speeds up mult-threaded synchronous write workloads 2012-03-08 16:23:21 -08:00