Summary:
Summary:
Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most
accesses to the store are key hits and we dont need to optimize performance for key misses.
This is useful when you have a very large database and most of your lookups succeed. The option allows the store to
not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.
This option is only provided for BlockBasedTable. We skip the filters when we are compacting
Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
2. Added another unit test to db_test which specifically tests that filters are being skipped
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33717
Summary:
This patch will update the Makefile and source code so that we can build RocksDB successfully on FreeBSD 10 and 11 (64-bit and 32-bit)
I have also encountered some problems when running tests on FreeBSD, I will try to fix them individually in different diffs
Notes:
- FreeBSD uses clang as it's default compiler (http://lists.freebsd.org/pipermail/freebsd-current/2012-September/036480.html)
- GNU C++ compiler have C++ 11 problems on FreeBSD (https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=193528)
- make is not gmake on FreeBSD (http://www.khmere.com/freebsd_book/html/ch01.html)
Test Plan:
Using VMWare Fusion Create 4 VM machines (FreeBSD 11 64-bit, FreeBSD 11 32-bit, FreeBSD 10 64-bit, FreeBSD 10 32-bit)
- pkg install git gmake gflags archivers/snappy
- git clone https://github.com/facebook/rocksdb.git
- apply this patch
- setenv CXX c++
- setenv CPATH /usr/local/include/
- setenv LIBRARY_PATH /usr/local/lib/
- gmake db_bench
- make sure compilation is successful and db_bench is running
- gmake all
- make sure compilation is successful
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33891
Summary:
The LoadDependency function does not take a lock when it runs
and it could be modifying data structures while other threads are
accessing it.
Test Plan: Run TSAN.
Reviewers: igor, sdong
Reviewed By: igor, sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34095
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Some errors are uninitialized use errors.
```
...
CC db/log_test.o
util/ldb_cmd.cc:394:16: error: base class 'rocksdb::LDBCommand' is uninitialized when used here to access 'rocksdb::LDBCommand::BuildCmdLineOptions' [-Werror,-Wuninitialized]
BuildCmdLineOptions({ARG_FROM, ARG_TO, ARG_HEX, ARG_KEY_HEX,
^
...
```
```lang=c++
CompactorCommand::CompactorCommand(const vector<string>& params,
const map<string, string>& options, const vector<string>& flags) :
LDBCommand(options, flags, false,
BuildCmdLineOptions({ARG_FROM, ARG_TO, ARG_HEX, ARG_KEY_HEX,
ARG_VALUE_HEX, ARG_TTL})),
null_from_(true), null_to_(true) {
. . .
}
```
For the fourth parameter of the base constructor (`LDBCommand`) we call `BuildCmdLineOptions`, which is a private non-static method of `LDBCommand` base class.
This diff adds missing `static` keyword for `LDBCommand::BuildCmdLineOptions` method.
Test Plan:
Build with trunk clang and make sure all tests are passing.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make check
``
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34083
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
BlockBasedTable pre-fetches the filter and index blocks on Open call.
This is an optimistic optimization targeted for runtime scenario. The
optimization is unnecessary for sst_dump_tool
- Added a provision to disable pre-fetching of index and filter blocks
in BlockBasedTable
- Disabled pre-fetching for the sst_dump tool
Stack for reference :
#01 0x00000000005ed944 in snappy::InternalUncompress<snappy::SnappyArrayWriter> () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:148
#02 0x00000000005edeee in snappy::RawUncompress () from /home/engshare/third-party2/snappy/1.0.3/src/snappy-1.0.3/snappy.cc:947
#03 0x00000000004e0b4d in rocksdb::UncompressBlockContents () from /data/users/paultuckfield/rocksdb/./util/compression.h:69
#04 0x00000000004e145c in rocksdb::ReadBlockContents () from /data/users/paultuckfield/rocksdb/table/format.cc:334
#05 0x00000000004ca424 in rocksdb::(anonymous namespace)::ReadBlockFromFile () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:70
#06 0x00000000004cccad in rocksdb::BlockBasedTable::CreateIndexReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:173
#07 0x00000000004d17e5 in rocksdb::BlockBasedTable::Open () from /data/users/paultuckfield/rocksdb/table/block_based_table_reader.cc:553
#08 0x00000000004c8184 in rocksdb::BlockBasedTableFactory::NewTableReader () from /data/users/paultuckfield/rocksdb/table/block_based_table_factory.cc:51
#09 0x0000000000598463 in rocksdb::SstFileReader::NewTableReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:69
#10 0x00000000005986c2 in rocksdb::SstFileReader::SstFileReader () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:26
#11 0x0000000000599047 in rocksdb::SSTDumpTool::Run () from /data/users/paultuckfield/rocksdb/util/sst_dump_tool.cc:332
#12 0x0000000000409b06 in main () from /data/users/paultuckfield/rocksdb/tools/sst_dump.cc:12
Test Plan:
- Added a unit test to trigger the code.
- Also did some manual verification.
- Passed all unit tests
task #6296048
Reviewers: igor, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34041
Summary:
The unit test was supposed to check that the old file and the new file contains
the header message.
Test Plan: Run the unit test.
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33705
Summary: as above
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33753
Summary:
During running AutoRollLoggerTest on FreeBSD we have found out that AutoRollLogger is not flushing correctly (test fails on FreeBSD)
This diff add Flush to AutoRollLogger to fix this problem
Test Plan:
[My machine] make all check ( all tests pass)
[FreeBSD VM] running AutoRollLoggerTest ( all tests pass )
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33633
Summary:
In mongo, we currently have a single column family and I'd like to support setting rocksdb::Options from string. This diff provides an option to GetOptionsFromString()
There's one more problem. Currently GetColumnFamilyOptionsFromString() overwrites block_based options. In mongo I set default values for block_cache and some other values of BlockBasedTableOptions and I don't want them reset to default with GetOptionsFromString().
Test Plan: added unit test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33729
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.
This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version. Its functionality is
now in the Makefile.
* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration. No longer used.
Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:
rocksdb_build_git_date:2015-02-19
rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33591
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.
Test Plan: Managed* tests in db_test and XF tests for managed iterator
Reviewers: igor, yhchiang, anthony, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31401
Summary:
scan-build complains with division by zero warning in a test. Added an assertion to prevent this.
scan-build report: http://home.fburl.com/~sugak/latest6/report-c61be9.html#EndPath
Test Plan:
Make sure scan-build does not report 'Division by zero' and all tests are passing.
```lang=bash
% make analyze
% make check
```
Reviewers: igor, meyering
Reviewed By: meyering
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33495
Summary:
Add thread_status_util_debug.cc back as InstrumentedMutex related tests
are using it to produce wait that can be reflected in the counter.
Test Plan:
./perf_context_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33525
Summary: Allow GetThreadList to reflect flush activity.
Test Plan:
Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test.
./db_test ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32871
Summary:
In the existing implementation of `ASSERT*`, test termination happens in `~Tester`, which is called when instance of `Tester` goes out of scope. This is the cause of many scan-build bugs.
This diff changes `ASSERT*` to terminate the test immediately. Also added one suppression in `util/signal_test.cc`
scan-build bugs
before: http://home.fburl.com/~sugak/latest/index.html
after: http://home.fburl.com/~sugak/latest2/index.html
Test Plan:
Modify some test to fail an assertion and make sure that `ASSERT*` terminated the test.
Run `make analyze` and make sure no 'Called C++ object pointer is null' and 'Dereference of null pointer' bugs reported.
Run tests and make sure no failing tests:
```lang=bash
% make check
% USE_CLANG=1 make check
```
Reviewers: meyering, lgalanis, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33381
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex
Test Plan: Add a test and run it.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33177
Summary:
This change adds LogHeader provision to the logger. For the rolling logger
implementation, the headers are copied over to the new log file every time
there is a log roll over.
Test Plan: Added a unit test to test the rolling log case.
Reviewers: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32817
Summary:
This diff basically reverts D30249 and also adds a unit test that was failing before this patch.
I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :(
I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas.
BTW this diff is also a regression when running lots of column families. I plan to revisit this separately.
Test Plan: added a unit test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33045
Summary:
Add a counter for collecting the wait time on db mutex.
Also add MutexWrapper and CondVarWrapper for measuring wait time.
Test Plan:
./db_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test
verify stats output using db_bench
make clean
make release
./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10
Sample output:
rocksdb.db.mutex.wait.micros COUNT : 7546866
Reviewers: MarkCallaghan, rven, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32787
Summary: A bug in MockEnv causes fault_injestion_test to fail. I don't know why it doesn't fail every time but it doesn't seem to be right.
Test Plan:
Run fault_injestion_test
Also run db_test with MEM_ENV=1 until the first failure.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32877
Summary:
Add ThreadStatus::GetOperationName() and ThreadStatus::GetStateName(),
two utility functions that help interpreting ThreadStatus.
Test Plan: ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32793
Summary: Increasing parallelism of flushes will help bulk load throughput.
Test Plan: Compile it.
Reviewers: MarkCallaghan, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32685
Summary: I broke it with 2fd8f750ab
Test Plan: make unity
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32577
Summary:
./util/xfunc.h:31:1: error: class 'Options' was previously declared as a struct [-Werror,-Wmismatched-tags]
class Options;
^
Test Plan:
make dbg -j32
Summary:
This Diff provides the implementation of the cross functional
test infrastructure. This provides the ability to test a single feature
with every existing regression test in order to identify issues with
interoperability between features.
Test Plan:
Reference implementation of inplace update support cross
functional test. Able to find interoperability issues with inplace
support and ran all of db_test. Will add separate diff for those changes.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32247
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.
Test Plan:
Add a test case for
db_test
options_test
and a new test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31887
Summary: This was a feature request by osquery. See task t5617758
Test Plan: compiles and memenv_test runs
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32115
Summary: This bug fails DBTest.CheckLock
Test Plan: DBTest.CheckLock now passes with MEM_ENV=1.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32451
Summary:
1) need to do acquire load when read the first entry in the bucket.
2) Make num_entries atomic
Test Plan: Ran DBTest.MultiThreaded with TSAN
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32361
Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files.
Test Plan: Run ./fault_injection_test, including valgrind
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32415
Summary: TSAN complained that these are non-atomic reads and writes from different threads.
Test Plan: TSAN no longer complains
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32409
Summary: We don't have plans to work on this in the short term. If we ever resurrect the project, we can find the code in the history. No need for it to linger around
Test Plan: no test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32349
Summary:
ThreadSanitizer complains data race of super version and version's destructor with Get(). This patch will fix those warning.
The warning is likely from ColumnFamilyData::ReturnThreadLocalSuperVersion(). With relaxed consistency of CAS, reading the data of the super version can technically happen after swapping it in, enabling the background thread to clean it up.
Test Plan: make all check
Reviewers: rven, igor, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32265
Summary:
Make options_test runnable on ROCKSDB_LITE by blocking
those tests that require non-ROCKSDB_LITE feature.
Test Plan:
make options_test OPT=-DROCKSDB_LITE -j32
./options_test
make clean
make options_test -j32
./options_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32025
Summary:
This diff enables to configure prefix_extractor
string parameter as a CF option.
Test Plan: make all check, ./options_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31653
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Summary: We keep checksum functions in util/, there is no reason for compression to be in port/
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31281
Summary:
Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery.
The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings.
Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery.
Reviewers: yhchiang, rven, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30603
Summary:
Add structures for exposing events and operations. Event describes
high-level action about a thread such as doing compaciton or
doing flush, while an operation describes lower-level action
of a thread such as reading / writing a SST table, waiting for
mutex. Events and operations are designed to be independent.
One thread would typically involve in one event and one operation.
Code instrument will be in a separate diff.
Test Plan:
Add unit-tests in thread_list_test
make dbg -j32
./thread_list_test
export ROCKSDB_TESTS=ThreadList
./db_test
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: rven, jonahcohen, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29781
Summary:
Add support to allow nested config for block-based table factory. The format looks like this:
"write_buffer_size=1024;block_based_table_factory={block_size=4k};max_write_buffer_num=2"
Test Plan: unit test
Reviewers: yhchiang, rven, igor, ljin, jonahcohen
Reviewed By: jonahcohen
Subscribers: jonahcohen, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29223
Summary:
GetThreadList() feature depends on the thread creation and destruction, which is currently handled under Env.
This patch moves GetThreadList() feature under Env to better manage the dependency of GetThreadList() feature
on thread creation and destruction.
Renamed ThreadStatusImpl to ThreadStatusUpdater. Add ThreadStatusUtil, which is a static class contains
utility functions for ThreadStatusUpdater.
Test Plan: run db_test, thread_list_test and db_bench and verify the life cycle of Env and ThreadStatusUpdater is properly managed.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: ljin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30057
Summary:
Priliminary diff to solicit comments.
Given DB path, dump all SST files (key/value and properties), WAL file and manifest
files. What command options do we need to support for this command? Maybe
output_hex for keys?
Test Plan: Create additional ldb unit tests.
Reviewers: sdong, rven
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D29547
Summary:
Currently, blocks which have more than one reference (ie referenced by something other than cache itself) are evicted from cache. This doesn't make much sense:
- blocks are still in RAM, so the RAM usage reported by the cache is incorrect
- if the same block is needed by another iterator, it will be loaded and decompressed again
This diff changes the reference counting scheme a bit. Previously, if the cache contained the block, this was accounted for in its refcount. After this change, the refcount is only used to track external references. There is a boolean flag which indicates whether or not the block is contained in the cache.
This diff also changes how LRU list is used. Previously, both hashtable and the LRU list contained all blocks. After this change, the LRU list contains blocks with the refcount==0, ie those which can be evicted from the cache.
Note that this change still allows for cache to grow beyond its capacity. This happens when all blocks are pinned (ie refcount>0). This is consistent with the current behavior. The cache's insert function never fails. I spent lots of time trying to make table_reader and other places work with the insert which might failed. It turned out to be pretty hard. It might really destabilize some customers, so finally, I decided against doing this.
table_cache_remove_scan_count_limit option will be unneeded after this change, but I will remove it in the following diff, if this one gets approved
Test Plan: Ran tests, made sure they pass
Reviewers: sdong, ljin
Differential Revision: https://reviews.facebook.net/D25503
Summary: Why do we assert here? This doesn't seem like user friendly thing to do :)
Test Plan: none
Reviewers: sdong, yhchiang, rven
Reviewed By: rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30027
Summary:
Remove the compability check on log2 OS_ANDROID as it's already blocked by ROCKSDB_LITE
Test Plan:
make OPT="-DROCKSDB_LITE -DOS_ANDROID" shared_lib -j32
make shared_lib -j32
Summary: Replace runtime_error exception by abort() in thread_local
Test Plan: make dbg -j32
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29853
Summary: Replace exception by assertion in autovector
Test Plan: autovector_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29847
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581
Summary: Block Universal and FIFO compactions in ROCKSDB_LITE
Test Plan:
make shared_lib -j32
make OPT=-DROCKSDB_LITE shared_lib
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29589
Summary:
log2 function is only used in options_builder, and this function
is not available under certain platform such as android.
This patch implements Log2 by log(n) / log(2).
Test Plan:
make
Summary:
In some environment such as android, the c++ library does not have
std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.
Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29181
Summary: We want to make sure people without gflags can compile RocksDB.
Test Plan: remove gflags, make all
Reviewers: sdong, rven, yhchiang, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29469
Summary:
An entry of ConstantColumnFamilyInfo is created when:
1. DB::Open
2. CreateColumnFamily.
However, there are cases that DB::Open could also call CreateColumnFamily
when create_missing_column_families=true. As a result, it will create
duplicate ConstantColumnFamilyInfo and one of them would be leaked.
Test Plan: ./deletefile_test
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29307
Summary:
arena doesn't use huge page by default. This change will make it happen
if possible. A new paramerter is added for Arena(). If it's set, Arena
will use huge page always. If huge page allocation fails, Arena
allocation will fallback to malloc().
Test Plan:
Change util/arena_test to support huge page allocation.
Run below tests:
1. normal regression test:
make check
2. Check if huge page allocation works
echo 50 > /proc/sys/vm/nr_hugepages
make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D28647
Summary: stringSplit is not how we name our functions. Also, we had two StringSplit's in the codebase
Test Plan: make check
Reviewers: yhchiang, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29361
Summary:
Add enable_thread_tracking to DBOptions to allow
tracking thread status related to the DB. Default is off.
Test Plan:
export ROCKSDB_TESTS=ThreadList
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29289
Summary: Created a CompatibleOptions object that can be used as a LevelDB Options object and then converted to a RocksDB Options object using the ConvertOptions() method.
Test Plan: Unit test included in diff.
Reviewers: ljin
Reviewed By: ljin
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28893
Summary:
Add GetThreadList API, which allows developer to track the
status of each process. Currently, calling GetThreadList will
only get the list of background threads in RocksDB with their
thread-id and thread-type (priority) set. Will add more support
on this in the later diffs.
ThreadStatus currently has the following properties:
// An unique ID for the thread.
const uint64_t thread_id;
// The type of the thread, it could be ROCKSDB_HIGH_PRIORITY,
// ROCKSDB_LOW_PRIORITY, and USER_THREAD
const ThreadType thread_type;
// The name of the DB instance where the thread is currently
// involved with. It would be set to empty string if the thread
// does not involve in any DB operation.
const std::string db_name;
// The name of the column family where the thread is currently
// It would be set to empty string if the thread does not involve
// in any column family.
const std::string cf_name;
// The event that the current thread is involved.
// It would be set to empty string if the information about event
// is not currently available.
Test Plan:
./thread_list_test
export ROCKSDB_TESTS=GetThreadList
./db_test
Reviewers: rven, igor, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25047
Summary:
The very last reference happens in DBImpl::GetOptions()
I built with both DBImpl::GetOptions() and ColumnFamilyData::options() commented out
Test Plan: make all check
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29073
Summary:
Add unit support in options helper so we can specify, e.g., 10m for
10 megabytes.
Test Plan: Updated options_test
Reviewers: sdong, igor, ljin
Reviewed By: ljin
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D28977
Summary: Store links to live files in directory on same disk
Test Plan:
Take snapshot and open it. Added a test GetSnapshotLink in
db_test.
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28713
Summary:
Make db_stress built for ROCKSDB_LITE.
The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB.
Test Plan: make db_stress
Reviewers: yhchiang, rven, ljin, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D28797
It seems that on some FS we get more blocks than we ask for. This is
already handled when checking the allocated number of blocks, but
after the file is closed it checks for an exact number of blocks,
which fails on my machine.
I changed the test to add one full page to the size, then calculate
the expected number of blocks and check if the actual number of blocks
is less or equal to that.
Summary: So iOS size_t is 32-bit, so we need to static_cast<size_t> any uint64_t :(
Test Plan: TARGET_OS=IOS make static_lib
Reviewers: dhruba, ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28743
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary:
Refactor sst_dump to follow the same structure as ldb. Introduce a
SSTDump interface.
Test Plan: built sst_dump and tried it manually.
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28485
Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`.
Test Plan: make all
Reviewers: yhchiang, rven, ljin, sdong
Reviewed By: ljin, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28497
Summary:
This diff adds three sets of APIs to RocksDB.
= GetColumnFamilyMetaData =
* This APIs allow users to obtain the current state of a RocksDB instance on one column family.
* See GetColumnFamilyMetaData in include/rocksdb/db.h
= EventListener =
* A virtual class that allows users to implement a set of
call-back functions which will be called when specific
events of a RocksDB instance happens.
* To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners
= CompactFiles =
* CompactFiles API inputs a set of file numbers and an output level, and RocksDB
will try to compact those files into the specified level.
= Example =
* Example code can be found in example/compact_files_example.cc, which implements
a simple external compactor using EventListener, GetColumnFamilyMetaData, and
CompactFiles API.
Test Plan:
listener_test
compactor_test
example/compact_files_example
export ROCKSDB_TESTS=CompactFiles
db_test
export ROCKSDB_TESTS=MetaData
db_test
Reviewers: ljin, igor, rven, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24705
Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc.
Test Plan: compiles
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28131
Summary: Apply InfoLogLevel to the logs in util/env_hdfs.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D28011
Summary: as title
Test Plan: ./db_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28269
Summary:
With the patch, thread pool size will be automatically increased if DB's options ask for more parallelism of compactions or flushes.
Too many users have been confused by the API. Change it to make it harder for users to make mistakes
Test Plan: Add two unit tests to cover the function.
Reviewers: yhchiang, rven, igor, MarkCallaghan, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27555
Summary: as title
Test Plan: env_mem_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28077
Summary:
TestMemEnv simulates all Env APIs using in-memory data structures.
We can use it to speed up db_test run, which is now reduced ~7mins when it is
enabled.
We can also add features to simulate power/disk failures in the next
step
TestMemEnv is derived from helper/mem_env
mem_env can not be used for rocksdb since some of its APIs do not give
the same results as env_posix. And its file read/write is not thread safe
Test Plan:
make all -j32
./db_test
./env_mem_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28035
Summary: Fix lint errors and coding style of ldb related codes.
Test Plan: ./ldb
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28125
Summary:
...and fix all the errors :)
Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27711
Summary:
Rename Version::Builder to VersionBuilder and expose its definition to a header.
Make VerisonBuilder not reference Version or ColumnFamilyData, only working with VersionStorageInfo.
Add version_builder_test which has a simple test.
Test Plan: make all check
Reviewers: rven, yhchiang, igor, ljin
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27969
Summary:
* Rename util/db_info_dummper.cc to util/db_info_dumper.cc
* Apply InfoLogLevel to the logs in util/db_info_dumper.cc
Test Plan: make
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27915
Summary:
Add some helper functions to make sure DB works well for non-default comparators.
Add a test for SimpleSuffixReverseComparator.
Test Plan: Run the new test
Reviewers: ljin, rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27831
Summary: Revmoed this in D25641, causing compiler complain. put it back
Test Plan: make release
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27891
Summary:
Make compaction picker easier to test.
The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.
It now passes most tests. Still two things need to be done:
(1) deal with the FIFO compaction's file size.
(2) write an example test to make sure the interface can do the job.
Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.
Test Plan:
Pass all unit tests and compaction_picker_test
Reviewers: yhchiang, rven, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27639
Summary:
ftruncate does not always free preallocated unused space at the end of file.
In some cases, we pin too much disk space than it should
Test Plan: env_test
Reviewers: sdong, rven, yhchiang, igor
Reviewed By: igor
Subscribers: nkg-, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25641
Summary:
Sometimes, I got a test failure. After fixing that, I want to resume
db_test from that test. ROCKSDB_TESTS_FROM is for this purpose.
Test Plan: as title
Reviewers: yhchiang, rven, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27807
Summary: RocksDB already depends on C++11, so we might as well all the goodness that C++11 provides. This means that we don't need AtomicPointer anymore. The less things in port/, the easier it will be to port to other platforms.
Test Plan: make check + careful visual review verifying that NoBarried got memory_order_relaxed, while Acquire/Release methods got memory_order_acquire and memory_order_release
Reviewers: rven, yhchiang, ljin, sdong
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D27543
Summary:
Make inplace_update_support and inplace_update_num_locks dynamic.
inplace_callback becomes immutable
We are almost free of references to cfd->options() in db_impl
Test Plan: unit test
Reviewers: igor, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25293
Summary: Since we depend on C++11, we might as well use it for timing, instead of this platform-depended code.
Test Plan: Ran autovector_test, which reports time and confirmed that output is similar to master
Reviewers: ljin, sdong, yhchiang, rven, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25587
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25347
Summary:
This is not a critical options. Making it dynamic so that we can remove
more reference to cfd->options()
Test Plan: unit test
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24957
Summary: as title
Test Plan: make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24903
Summary: as title
Test Plan:
unit test
I am only able to build the test case for hard_rate_limit.
soft_rate_limit is essentially the same thing as hard_rate_limit
Reviewers: igor, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24759
Summary: Add more tests as well
Test Plan: unit test
Reviewers: igor, sdong, yhchiang
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24747
Summary: as title
Test Plan: unit test
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24729
Summary:
ldb to support --fix_prefix_len to allow us to verify more cases.
Also fix a small issue that --bloom_bits might not be applied if --block_size is not given.
Test Plan: run ldb tool against an example DB.
Reviewers: ljin, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24819
Summary:
Fixed the following error in Mac:
./util/testharness.h:93:19: error: comparison of integers of different signs: 'const unsigned long' and 'const int' [-Werror,-Wsign-compare]
BINARY_OP(IsEq, ==)
~~~~~~~~~~~~~~~~^~~
./util/testharness.h:86:14: note: expanded from macro 'BINARY_OP'
if (! (x op y)) { \
^
util/options_test.cc:269:3: note: in instantiation of function template specialization 'rocksdb::test::Tester::IsEq<unsigned long, int>' requested here
ASSERT_EQ(new_cf_opt.write_buffer_size, 5);
^
Test Plan:
options_test
Summary: Allow accepting Options as a string of key/value pairs
Test Plan: unit test
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24597
Summary:
This diff introduces the `lookahead` argument to `SkipListFactory()`. This is an
optimization for the tailing use case which includes many seeks. E.g. consider
the following operations on a skip list iterator:
Seek(x), Next(), Next(), Seek(x+2), Next(), Seek(x+3), Next(), Next(), ...
If `lookahead` is positive, `SkipListRep` will return an iterator which also
keeps track of the previously visited node. Seek() then first does a linear
search starting from that node (up to `lookahead` steps). As in the tailing
example above, this may require fewer than ~log(n) comparisons as with regular
skip list search.
Test Plan:
Added a new benchmark (`fillseekseq`) which simulates the usage pattern. It
first writes N records (with consecutive keys), then measures how much time it
takes to read them by calling `Seek()` and `Next()`.
$ time ./db_bench -num 10000000 -benchmarks fillseekseq -prefix_size 1 \
-key_size 8 -write_buffer_size $[1024*1024*1024] -value_size 50 \
-seekseq_next 2 -skip_list_lookahead=0
[...]
DB path: [/dev/shm/rocksdbtest/dbbench]
fillseekseq : 0.389 micros/op 2569047 ops/sec;
real 0m21.806s
user 0m12.106s
sys 0m9.672s
$ time ./db_bench [...] -skip_list_lookahead=2
[...]
DB path: [/dev/shm/rocksdbtest/dbbench]
fillseekseq : 0.153 micros/op 6540684 ops/sec;
real 0m19.469s
user 0m10.192s
sys 0m9.252s
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb, march, lovro
Differential Revision: https://reviews.facebook.net/D23997
Summary:
make compaction related options changeable. Most of changes are tedious,
following the same convention: grabs MutableCFOptions at the beginning
of compaction under mutex, then pass it throughout the job and register
it in SuperVersion at the end.
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23349
Prefer prefix ++operator for non-primitive types like iterators for
performance reasons. Prefix ++/-- operators avoid creating a temporary
copy.
Signed-off-by: Danny Al-Gaaf <danny.al-gaaf@bisect.de>