ee95cae9a4
Summary: Currently, blocks which have more than one reference (ie referenced by something other than cache itself) are evicted from cache. This doesn't make much sense: - blocks are still in RAM, so the RAM usage reported by the cache is incorrect - if the same block is needed by another iterator, it will be loaded and decompressed again This diff changes the reference counting scheme a bit. Previously, if the cache contained the block, this was accounted for in its refcount. After this change, the refcount is only used to track external references. There is a boolean flag which indicates whether or not the block is contained in the cache. This diff also changes how LRU list is used. Previously, both hashtable and the LRU list contained all blocks. After this change, the LRU list contains blocks with the refcount==0, ie those which can be evicted from the cache. Note that this change still allows for cache to grow beyond its capacity. This happens when all blocks are pinned (ie refcount>0). This is consistent with the current behavior. The cache's insert function never fails. I spent lots of time trying to make table_reader and other places work with the insert which might failed. It turned out to be pretty hard. It might really destabilize some customers, so finally, I decided against doing this. table_cache_remove_scan_count_limit option will be unneeded after this change, but I will remove it in the following diff, if this one gets approved Test Plan: Ran tests, made sure they pass Reviewers: sdong, ljin Differential Revision: https://reviews.facebook.net/D25503 |
||
---|---|---|
build_tools | ||
coverage | ||
db | ||
doc | ||
examples | ||
hdfs | ||
helpers/memenv | ||
include | ||
java | ||
linters | ||
port | ||
table | ||
third-party/rapidjson | ||
tools | ||
util | ||
utilities | ||
.arcconfig | ||
.clang-format | ||
.gitignore | ||
.travis.yml | ||
AUTHORS | ||
CONTRIBUTING.md | ||
HISTORY.md | ||
INSTALL.md | ||
LICENSE | ||
Makefile | ||
PATENTS | ||
README.md | ||
ROCKSDB_LITE.md | ||
Vagrantfile |
RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)
This code is a library that forms the core building block for a fast key value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it specially suitable for storing multiple terabytes of data in a single database.
Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples
See the github wiki for more explanation.
The public interface is in include/
. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.
Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/