Summary:
Here's my proposal for making our LOGs easier to read by machines.
The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize.
I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc)
Test Plan:
Ran db_bench. Observed:
2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699}
2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12}
Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31647
Summary:
The tests using sync_point for intent to shutdown stop
compaction and this results in stalls if too many rows are written. We
now limit the number of rows written to prevent stalls, since the focus
of the test is to cancel background work, which is being correctly
tested. This fixes a Jenkins issue.
Test Plan: DBTest.PreShutdown*
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34893
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc.
Test Plan: run all tests
Reviewers: rven, yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34431
Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected.
Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values>
Reviewers: MarkCallaghan, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34761
Summary:
Provide an API which enables users to infor Rocksdb that it is
shutting down.
Test Plan: db_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34617
Summary: Got it working by some voodoo programming
Test Plan: works!
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34611
Summary: Allow GetThreadList() to report the start time of the current operation.
Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10
Sample output:
ThreadID ThreadType cfName Operation OP_StartTime State
140338840797248 High Pri column_family_name_000003 Flush 2015/03/09-17:49:59
140338844991552 High Pri column_family_name_000004 Flush 2015/03/09-17:49:59
140338849185856 Low Pri
140338983403584 Low Pri
140339008569408 Low Pri
140338861768768 Low Pri
140338924683328 Low Pri
140338899517504 Low Pri
140338853380160 Low Pri
140338882740288 Low Pri
140338865963072 High Pri column_family_name_000006 Flush 2015/03/09-17:49:59
140338954043456 Low Pri
140338857574464 Low Pri
Reviewers: igor, rven, sdong
Reviewed By: sdong
Subscribers: lgalanis, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34689
Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better.
Test Plan:
Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB):
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0
rocksdb.block.cache.data.miss COUNT : 958418
rocksdb.block.cache.data.hit COUNT : 41582
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0
rocksdb.block.cache.data.miss COUNT : 819518
rocksdb.block.cache.data.hit COUNT : 180482
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0
rocksdb.block.cache.data.miss COUNT : 450479
rocksdb.block.cache.data.hit COUNT : 549521
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0
rocksdb.block.cache.data.miss COUNT : 223192
rocksdb.block.cache.data.hit COUNT : 776808
Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34629
Summary: I want to be able to set this through mongo config.
Test Plan: added unit test
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34599
Summary:
Add --thread_status_per_interval to db_bench, which allows
db_bench to optionally enable print the current thread status
periodically.
Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10
Sample output:
ThreadID ThreadType dbName cfName Operation State
140281571770432 Low Pri
140281575964736 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000001 Flush
140281710182464 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000008 Compaction
140281638879296 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000007 Compaction
140281592741952 Low Pri
140281580159040 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000002 Flush
140281676628032 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000006 Compaction
140281584353344 Low Pri
140281622102080 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000009 Compaction
140281605324864 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000004 Compaction
140281601130560 High Pri /tmp/rocksdbtest-5297/dbbench default Flush
140281596936256 Low Pri
140281588547648 Low Pri
Reviewers: igor, rven, sdong
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34515
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
Test Plan: New unit tests and pass tests suites including valgrind.
Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
Reviewed By: ikabiljo
Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31437
Summary:
Summary:
Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most
accesses to the store are key hits and we dont need to optimize performance for key misses.
This is useful when you have a very large database and most of your lookups succeed. The option allows the store to
not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.
This option is only provided for BlockBasedTable. We skip the filters when we are compacting
Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
2. Added another unit test to db_test which specifically tests that filters are being skipped
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33717
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
An old commit (482401) changed DoWrite to use the value of --writes rather
than --num to determine the range for keys. This restores the old and correct
behavior which is to limit it using --num.
Task ID: #6353043
Blame Rev:
Test Plan:
run db_bench
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34065
Summary:
In a release build, a member was not being accessed. This
member was only being accessed in a debug build. We now add an accessor
function for this member and the buid succeeds.
Test Plan: build release/unity/debug on linux/mac
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34035
Summary: This causes warnings on OS X
Test Plan: compiles
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33969
Summary:
This diff contains trivial fixes for 6 scan-build warnings:
**db/c_test.c**
`db` variable is never read. Removed assignment.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath
**db/db_iter.cc**
`skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath
**db/log_reader.cc**
In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath
In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases.
scan-build reprots:
http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPathhttp://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath
**table/plain_table_key_coding.cc**
Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath
**tools/db_stress.cc**
Missing `break` in switch case block. This seems to be a bug. Added missing `break`.
Test Plan:
Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs.
```lang=bash
% make check
% make analyze
```
Reviewers: meyering, igor, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33795
Summary:
The "const" attribute applies to the type, and placing it
before that return type retains the desired semantics,
yet avoids the compiler error/warning.
Test Plan: Run make
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33789
Summary:
Otherwise, we would assert that an unsigned expression is always >= 0.
The intent was to form a possibly negative number, and to assert that
that value is always >= 0, but since one variable in the computation
was unsigned, the result was guaranteed to be unsigned, too, rendering
the assertion useless.
Cast that unsigned variable to "int", so that all operands
are signed, and thus so that the result can be negative.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33771
Summary:
The "const" attribute does not make sense on a return type,
and provokes a warning/error from gcc -W -Wextra.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33759
Summary:
Remove some always-true assertions.
They provoke these compilation failures:
table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
* table/plain_table_key_coding.cc (rocksdb): Remove assertion that
unsigned type variable is >= 0.
* db/version_set.cc (DoGenerateLevelFilesBrief): Likewise.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33747
Summary:
The bug is detected by scan-build.
In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`.
Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management.
scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath
Test Plan:
Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing.
```lang=bash
% make analyze
% make check
```
Reviewers: lgalanis, igor, meyering, sdong
Reviewed By: meyering, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33501
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.
This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version. Its functionality is
now in the Makefile.
* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration. No longer used.
Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:
rocksdb_build_git_date:2015-02-19
rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33591
Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33327
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.
Test Plan: Managed* tests in db_test and XF tests for managed iterator
Reviewers: igor, yhchiang, anthony, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31401
Summary:
Remove ThreadStatusMultiCompaction test as it's currently written
in a way that depends on some randomness, while the flush / compaction
status of a single thread is also covered in ThreadStatusFlush
and ThreadStatusSingleCompaction tests.
Test Plan: ./db_test
Reviewers: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33537
Summary: Allow GetThreadList to reflect flush activity.
Test Plan:
Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test.
./db_test ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32871
Summary:
It would be good to assing background job their IDs. Two benefits:
1) makes LOGs more readable
2) I might use it in my EventLogger, which will try to make our LOG easier to read/query/visualize
Test Plan: ran rocksdb, read the LOG
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31617
Summary: DBTest.DestroyDBMetaDatabase occasionally fails on my dev host, for file not existing. Always create directories to avoid that.
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33321
Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter.
Test Plan: Add a unit test for it
Reviewers: yhchiang, igor, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32889
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex
Test Plan: Add a test and run it.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33177
Summary: For description of the bug, see comment in db_test. The fix is pretty straight forward.
Test Plan: added unit test. eventually we need better testing of FOF/POF process.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33081
Summary:
- In statistics.h , added tickers.
- In version_set.cc,
-- Added a getter method for hit_file_level_ in the class FilePicker
-- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively.
Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818
Personal fork: 0c3f2e3600
Test Plan:
In terminal,
```
make -j32 db_test
ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test
```
Or to use debugger,
```
make -j32 db_test
export ROCKSDB_TESTS=L0L1L2AndUpHitCounter
gdb db_test
```
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32205
Summary: Having a pointer for DB will be helpful to debug when GDB or working on a dump. If the client process doesn't have any thread actively working on RocksDB, it can be hard to find out.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33159
Summary:
This diff basically reverts D30249 and also adds a unit test that was failing before this patch.
I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :(
I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas.
BTW this diff is also a regression when running lots of column families. I plan to revisit this separately.
Test Plan: added a unit test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33045
Summary:
When DestroyDB() finds a wal file in the DB directory, it assumes it is actually in WAL directory. This can lead to confusion, since it reports IO error when it tries to delete wal file from DB directory. For example: https://ci-builds.fb.com/job/rocksdb_clang_build/296/console
This change will fix our unit tests.
Test Plan: unit tests work
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32907
Summary:
Add a counter for collecting the wait time on db mutex.
Also add MutexWrapper and CondVarWrapper for measuring wait time.
Test Plan:
./db_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test
verify stats output using db_bench
make clean
make release
./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10
Sample output:
rocksdb.db.mutex.wait.micros COUNT : 7546866
Reviewers: MarkCallaghan, rven, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32787
Summary: db_test's test class SpecialEnv has a thread unsafe variable rnd_ but it can be accessed by multiple threads. It is complained by TSAN. Protect it by a mutex.
Test Plan: Run the test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32895
Summary: Added requirement that ComputeCompactionScore() be executed in mutex, since it's accessing being_compacted bool, which can be mutated by other threads. Also added more comments about thread safety of FileMetaData, since it was a bit confusing. However, it seems that FileMetaData doesn't have data races (except being_compacted)
Test Plan: Ran 100 ConvertCompactionStyle tests with thread sanitizer. On master -- some failures. With this patch -- none.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32283
Summary:
Get() now doesn't make use of bloom filter if it is prefix based. Add the check.
Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter.
Test Plan:
make all check
Add a test case in DBTest
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31941
Summary:
This diff containes the changes to the code and db_test
for supporting cross functional tests for inplace_update
Test Plan: Run XF with inplace_test and also without
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32367
Summary:
When building on my host, I saw warning:
In file included from db/db_iter_test.cc:17:0:
db/db_iter_test.cc: In member function ‘void rocksdb::_Test_DBIterator::_Run()’:
./util/testharness.h:147:14: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without
void TCONCAT(_Test_,name)::_Run()
^
./util/testharness.h:134:23: note: in definition of macro ‘TCONCAT1’
#define TCONCAT1(a,b) a##b
^
./util/testharness.h:147:6: note: in expansion of macro ‘TCONCAT’
void TCONCAT(_Test_,name)::_Run()
^
db/db_iter_test.cc:589:1: note: in expansion of macro ‘TEST’
TEST(DBIteratorTest, DBIterator) {
^
By dividing the test into small tests, it should fix the problem
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32679
Summary:
This Diff provides the implementation of the cross functional
test infrastructure. This provides the ability to test a single feature
with every existing regression test in order to identify issues with
interoperability between features.
Test Plan:
Reference implementation of inplace update support cross
functional test. Able to find interoperability issues with inplace
support and ran all of db_test. Will add separate diff for those changes.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32247
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.
Test Plan:
Add a test case for
db_test
options_test
and a new test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31887
Summary: DBTest.SharedWriteBuffer uses an Options that doesn't pass CurrentOptions(), so that it doesn't use MockEnv. However, DBTest's constructor uses MockEnv to call DestoryDB() to clean up, causing uncleaned state before it runs.
Test Plan: Run the test modified to make sure they pass default Env and SharedWriteBuffer now passes MockEnv.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32475
Summary:
There's a bug in TSAN (or libstdc++?) with std::shared_ptr<> for some reason. In db_test, only FlushSchedule is affected.
See more: https://groups.google.com/forum/#!topic/thread-sanitizer/vz_s-t226Vg
With this change and all other @sdong's and mine diffs, our db_test should be TSAN-clean. I'll move to other tests.
Test Plan: no more flush schedule when running TSAN
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32469
Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files.
Test Plan: Run ./fault_injection_test, including valgrind
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32415
Summary: fault_injection_test occasionally fails because file closing can happen after deletion. Improve the test to support it.
Test Plan: I have a new test case I'm working on, where the issue appears almost every time. With the patch, the problem goes away.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32373
Summary: Fixed a compile warning in clang in db/listener_test.cc
Test Plan: make listener_test
Reviewers: oridb
Reviewed By: oridb
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32337
Summary: This adds a listener for compactions, and gives some useful statistics on each compaction pass.
Test Plan: Unit tests.
Reviewers: sdong, igor, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D31641
Summary: I'm not sure the expected results of std::atomic::fetch_sub() when using memory_order_relaxed, and I suspect TSAN complains.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32259
Summary:
We see failure of the test in travis but I can't repro it.
Add more logging in failure cases to help us figure out which failure it is.
Also makes synchronization slightly stronger, though there isn't seem to be a problem without it
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32319
Summary: log_dir_unsynced_ is a confusing name. Rename it to log_dir_synced_ and flip the value.
Test Plan: Run ./fault_injection_test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32235
Summary: Currently fault_injection_test has a test case to drop all the unsynced data. Add one more case to take a randomized bytes from it.
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32229
Summary:
1. If WAL directory is different from db directory. Sync the directory after creating a log file under it.
2. After creating an SST file, sync its parent directory instead of DB directory.
3. change the check of kResetDeleteUnsyncedFiles in fault_injection_test. Since we changed the behavior to sync log files' parent directory after first WAL sync, instead of creating, kResetDeleteUnsyncedFiles will not guarantee to show post sync updates.
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32067
Summary:
This is first in a series of diffs that fixes data races detected by thread sanitizer.
Here the problem is that we call Ref() on a column family during a single-threaded write, without holding a mutex.
Test Plan: TSAN is no longer complaining about LevelLimitReopen.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32121
Summary: We should not be calling InternalStats methods outside of the mutex.
Test Plan:
COMPILE_WITH_TSAN=1 m db_test && ROCKSDB_TESTS=CompactionTrigger ./db_test
failing before the diff, works now
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32127
Summary: More race condition bugs with our archive WAL files. I do believe this caused t5988326, but can't reproduce the failure unfortunately.
Test Plan: make check
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32103
Summary: 3 iterations were disabled by mistake by one recent commit, causing CLANG build error. Fix it
Test Plan:
USE_CLANG=1 make fault_injection_test
and run the test
Reviewers: igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32109
Summary: fault_injection_test.cc has two variable names not following the convention fix it.
Test Plan: run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32097
Summary:
Wrapper classes in fault_injection_test doesn't simulate RocksDB Env behavior close enough. Improve it by:
(1) when fsync, don't sync parent
(2) support directory fsync
(3) support multiple directories
Add test cases of
(1) persisting by WAL fsync, not just compact range
(2) different WAL dir
(3) combination of (1) and (2)
(4) data directory is not the same as db name.
Test Plan: Run the test and make sure it passes.
Reviewers: rven, yhchiang, igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32031
Summary: Now we don't sync manifest file when initializing it, so DB cannot be safely reopened before the first mem table flush. Fix it by syncing it. This fixes fault_injection_test.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32001
Summary:
I saw this when running readrandom benchmark with corrupted database -- benchmark worked!
If a Get() returns corruption we should probably abort.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31701
Summary: GetLiveFilesMetaData() already adds a leading "/" in file name. No need to add one extra "/" in DBImpl::CheckConsistency()
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D31779
Summary:
This patch remove the unnecessary Compaction::ReleaseInputs().
Compaction::ReleaseInputs() tries to unref its input_version
and column_family. However, such unref is always done in
~Compaction(), and all current ReleaseInputs() calls are
right before the destructor.
Test Plan: ./db_test
Reviewers: igor
Reviewed By: igor
Subscribers: igor, rven, dhruba, sdong
Differential Revision: https://reviews.facebook.net/D31605
Summary:
This is a port of [[ https://github.com/google/leveldb/blob/master/db/fault_injection_test.cc | LevelDB's fault_injection_test ]] to RocksDB. Unfortunately it fails with:
```
==== Test FaultInjectionTest.FaultTest
db/fault_injection_test.cc:491: Corruption: no meta-nextfile entry in descriptor
#0 ./fault_injection_test() [0x41477a] rocksdb::FaultInjectionTest::PartialCompactTestReopenWithFault(rocksdb::FaultInjectionTest::ResetMethod, int, int) /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:491
#1 ./fault_injection_test() [0x40a38a] rocksdb::_Test_FaultTest::_Run() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:517
#2 ./fault_injection_test() [0x415bea] rocksdb::_Test_FaultTest::_RunIt() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:507
#3 ./fault_injection_test() [0x584367] rocksdb::test::RunAllTests() /data/users/tomdzk/rocksdb/util/testharness.cc:70
#4 /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6(__libc_start_main+0x10e) [0x7f7a40857efe] ?? ??:0
#5 ./fault_injection_test() [0x408bb8] _start ??:0
```
so I commented out the test invocation in the source code for now (lines 514-520) so it can be merged.
Test Plan: This is a new test.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31587
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Test Plan: Compile. Run it.
Reviewers: yhchiang, dhruba, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D31479
Summary: We keep checksum functions in util/, there is no reason for compression to be in port/
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31281
Summary:
Create a benchmark for testing memtablereps. This diff is a bit rough, but it should do the trick until other bootcampers can clean it up.
Addressing comments
Removed the mutexes
Changed ReadWriteBenchmark to fix number of reads and count the number of writes we can perform in that time.
Test Plan:
Run it.
Below runs pass
./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep skiplist
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep skiplist
./memtablerep_bench --benchmarks readwrite,seqreadwrite --memtablerep skiplist --num_operations 200 --num_threads 5
./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep hashskiplist
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep hashskiplist
--num_scans 2
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep vector
Reviewers: jpaton, ikabiljo, sdong
Reviewed By: sdong
Subscribers: dhruba, ameyag
Differential Revision: https://reviews.facebook.net/D22683
Summary: Add comments in db.h to help users discover their options.
Test Plan: Compile
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31077
Summary: Add an extra assert to make sure current version is included in VersionSet::AddLiveFiles().
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, hermanlee4, leveldb
Differential Revision: https://reviews.facebook.net/D30819
Summary: During recovery, VersionBuilder::Apply() was called multiple times. If the DB is open for long enough, most of files added earlier will be deleted by later deletes. In current solution, sorting added file happens first and then deletes are applied. In this patch, deletes are applied when possible inside Apply(), which can significantly reduce the sorting time in some cases.
Test Plan:
Add unit tests in version_builder
valgrind_check
Open a manifest of 50MB, with 9K live files. The manifest read time reduced from 1.6 seconds to 0.7 seconds.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30765
Summary:
This patch changes concurrency guarantees around ColumnFamilySet::column_families_ and ColumnFamilySet::column_families_data_.
Before:
* When mutating: lock DB mutex and spin lock
* When reading: lock DB mutex OR spin lock
After:
* When mutating: lock DB mutex and be in write thread
* When reading: lock DB mutex or be in write thread
That way, we eliminate the spin lock that protects these hash maps and simplify concurrency. That means we don't need to lock the spin lock during writing, since writing is mutually exclusive with column family create/drop (the only operations that mutate those hash maps).
With these new restrictions, I also needed to move column family create to the write thread (column family drop was already in the write thread).
Even though we don't need to lock the spin lock during write, impact on performance should be minimal -- the spin lock is almost never busy, so locking it is almost free.
This addresses task t5116919.
Test Plan:
make check
Stress test with lots and lots of column family drop and create:
time ./db_stress --threads=30 --ops_per_thread=5000000 --max_key=5000 --column_families=200 --clear_column_family_one_in=100000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress/
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30651
Summary: When trivial move commit is done, we log the summary of the input version instead of current. This is inconsistent with other log messages and confusing.
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30939
Summary:
A command line like this to run all the tests:
source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting'
where
benchmark.config.sh is:
export DB_DIR=/data/mysql/rocksdata
export WAL_DIR=/txlogs/rockswal
export OUTPUT_DIR=/root/rocks_benchmarking/output
Will fail for the tests that need a new DB .
Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly
Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty.
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D30909
Summary:
Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery.
The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings.
Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery.
Reviewers: yhchiang, rven, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30603
Summary:
This is a serious bug. If paranod_check == true and WAL is corrupted, we don't fail DB::Open(). I tried going into history and it seems we've been doing this for a long long time.
I found this when investigating t5852041.
Test Plan: Added unit test to verify correct behavior.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30597
Summary: CLANG was broken for a recent change in db_ench. Fix it.
Test Plan: Build db_bench using CLANG.
Reviewers: rven, igor, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30801
Summary:
Add structures for exposing events and operations. Event describes
high-level action about a thread such as doing compaciton or
doing flush, while an operation describes lower-level action
of a thread such as reading / writing a SST table, waiting for
mutex. Events and operations are designed to be independent.
One thread would typically involve in one event and one operation.
Code instrument will be in a separate diff.
Test Plan:
Add unit-tests in thread_list_test
make dbg -j32
./thread_list_test
export ROCKSDB_TESTS=ThreadList
./db_test
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: rven, jonahcohen, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29781
Summary: Having --num_hot_column_families default on fails some existing regression tests. By default turn it off
Test Plan: Run db_bench to make sure it is default off.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D30705
Summary:
Add option --num_hot_column_families in db_bench. If it is set, write options will first write to that number of column families, and then move on to next set of hot column families. The working set of column families can be smaller than total number of CFs.
It is to test how RocksDB can handle cold column families
Test Plan: Run db_bench with --num_hot_column_families set and not set.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30663
Dike out the body of VerifyCompactionResult. With assert() compiled out, the
loop index variable in the inner loop was unused, breaking the build when
-Werror is enabled.
Summary:
GetThreadList() feature depends on the thread creation and destruction, which is currently handled under Env.
This patch moves GetThreadList() feature under Env to better manage the dependency of GetThreadList() feature
on thread creation and destruction.
Renamed ThreadStatusImpl to ThreadStatusUpdater. Add ThreadStatusUtil, which is a static class contains
utility functions for ThreadStatusUpdater.
Test Plan: run db_test, thread_list_test and db_bench and verify the life cycle of Env and ThreadStatusUpdater is properly managed.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: ljin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30057
Summary: As title. We shouldn't need to execute flush from compaction if there are dedicated threads doing flushes.
Test Plan: make check
Reviewers: rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30579
Summary:
There are two versions of FindObsoleteFiles():
* full scan, which is executed every 6 hours (and it's terribly slow)
* no full scan, which is executed every time a background process finishes and iterator is deleted
This diff is optimizing the second case (no full scan). Here's what we do before the diff:
* Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live.
* Get the list of live files to avoid deleting files that are live.
* Delete files that are in obsolete_files and not in live_files.
After this diff:
* The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files.
* Get the list of obsolete files (which exclude moved files).
* No need to get the list of live files, since all files in obsolete_files need to be deleted.
I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123
This depends on D30123.
P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff.
Test Plan:
One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version.
make check
Big number of compactions and flushes:
./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30249
Summary:
This one wasn't easy to find :)
What happens is we go through all cfds on flush_queue_ and find no cfds to flush, *but* the cfd is set to the last CF we looped through and following code assumes we want it flushed.
BTW @sdong do you think we should also make BackgroundFlush() only check a single cfd for flushing instead of doing this `while (!flush_queue_.empty())`?
Test Plan: regression test no longer fails
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30591
Summary:
When scaling to higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue.
The approach is similar to our earlier efforts: instead of a pull-model, where we do something for every column family, we can do a push-based model -- when we detect that column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction.
Here are the performance results:
Command:
./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000 --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333
Before the patch:
fillrandom : 26.950 micros/op 37105 ops/sec; 4.1 MB/s
After the patch:
fillrandom : 17.404 micros/op 57456 ops/sec; 6.4 MB/s
Next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. This is coming in the next patch, but when I removed that code, here's what I got:
fillrandom : 7.590 micros/op 131758 ops/sec; 14.6 MB/s
Test Plan:
make check
two stress tests:
Big number of compactions and flushes:
./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
max_background_flushes=0, to verify that this case also works correctly
./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30123
Summary: Avoid unnecessary unlock and lock mutex when notifying events.
Test Plan: ./listener_test
Reviewers: igor
Reviewed By: igor
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30267
Summary:
We now release the mutex before copying the files in the case
of the trivial move. This path does not use the compaction job.
Test Plan: DBTest.LevelCompactionThirdPath
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30381
Summary:
Allow Level-style compaction to place files in different paths
This diff provides the code for task 4854591. We now support level-compaction
to place files in different paths by specifying them in db_paths along with
the minimum level for files to store in that path.
Test Plan: ManualLevelCompactionOutputPathId in db_test.cc
Reviewers: yhchiang, MarkCallaghan, dhruba, yoshinorim, sdong
Reviewed By: sdong
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29799
Summary:
ASAN build fails once for this error:
14:04:52 ==== Test DBTest.CompactFilesOnLevelCompaction
14:04:52 db_test: db/version_set.cc:1062: void rocksdb::VersionStorageInfo::AddFile(int, rocksdb::FileMetaData*): Assertion `level <= 0 || level_files->empty() || internal_comparator_->Compare( (*level_files)[level_files->size() - 1]->largest, f->smallest) < 0' failed.
Not abling figure out reason. We use std:vector for sorting for save and add one more assert to help figure out whether it is the sorting's problem.
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D30117
Summary: Now DB::GetSnapshot() doesn't scale to more column families, as it needs to go through all the column families to find whether snapshot is supported. This patch optimizes it.
Test Plan:
Add unit tests to cover negative cases.
make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30093
Summary: Set operations in VerisonBuilder is shown as a performance bottleneck of restarting DB when there are lots of files. Make both of added_files and deleted_files use unordered set or map. Only when adding the files, sort the added files.
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: hermanlee4, leveldb, dhruba, ljin
Differential Revision: https://reviews.facebook.net/D30051
Summary: This is a regression bug introduced by https://reviews.facebook.net/D24729 . max_total_wal_size would be off the target it should be more and more in the case that the a user holds the current super version after flush or compaction. This patch fixes it
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: ljin, yoshinorim, MarkCallaghan, hermanlee4, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29961
Summary: When wal_dir is used, DestroyDB is not passed the wal_dir option and so we get a Corruption exception.
Test Plan:
Verified manually that the following command line works now:
./db_bench --db=/mnt/db/rocksdb ... --disable_wal=0 --wal_dir=/data/users/rocksdb/WAL... --benchmarks=filluniquerandom --use_existing_db=0...
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29859
Summary:
Add a counter in SnapshotList to show number of snapshots. Also a unix timestamp in every snapshot.
Add two DB Properties to return number of snapshots and timestamp of the oldest one.
Test Plan: Add unit test checking
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D29919
Summary:
Remove the use of exception in WriteBatch::Handler. Now the default
implementations of Put, Merge, and Delete in WriteBatch::Handler are no-op.
Test Plan:
Add three test cases in write_batch_test
./write_batch_test
Reviewers: sdong, igor
Reviewed By: sdong, igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29835
Summary: Replace exception by setting valid_ = false in DBIter::MergeValuesNewToOld().
Test Plan:
Not sure if I am right at this, but it seems we currently don't have a good
way to test that code path as it requires dynamically set merge_operator = nullptr
at the time while Merge() is calling.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29811
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581
Summary:
Two fixes:
1. if cpplint is not present on the system, don't return a confusing error in the linter
2. Add include_alpha, which means our includes should be sorted lexicographically
Test Plan: Tried unsorting our includes, lint complained
Reviewers: rven, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28845
Summary: Block Universal and FIFO compactions in ROCKSDB_LITE
Test Plan:
make shared_lib -j32
make OPT=-DROCKSDB_LITE shared_lib
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29589
Summary:
db_imp_readonly.o is one of the big obj file. If it's not a necessary
feature, we should probably block it in ROCKSDB_LITE.
1322704 Nov 24 16:55 db/db_impl_readonly.o
Test Plan:
make shared_lib -j32
make ROCKSDB_LITE shared_lib -j32
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29583
Summary: Block internal_stats in ROCKSDB_LITE.
Test Plan: make OPT=-DROCKSDB_LITE shared_lib
Reviewers: ljin, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29541
Summary:
In some environment such as android, the c++ library does not have
std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.
Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29181
Summary:
This compaction trigger does not seem to test any thing specific to
cuckoo table. Remove it.
Test Plan: make all check
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29523
Summary:
Reported by bootcamper
This causes ldb tool to fail the assertion in ~ColumnFamilyData()
Test Plan:
./ldb --db=/tmp/test_db1 --create_if_missing put a1 b1
./ldb manifest_dump --path=/tmp/test_db1/MANIFEST-000001
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29517
Summary: Free checkpoint after its directory is removed.
Test Plan: Run valgrind with GetSnapshotLink.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29493
Summary:
An entry of ConstantColumnFamilyInfo is created when:
1. DB::Open
2. CreateColumnFamily.
However, there are cases that DB::Open could also call CreateColumnFamily
when create_missing_column_families=true. As a result, it will create
duplicate ConstantColumnFamilyInfo and one of them would be leaked.
Test Plan: ./deletefile_test
Reviewers: igor, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29307
Summary: "build all" breaks in Clang mode with db_bench. Fix it.
Test Plan: USE_CLANG=1 make all
Reviewers: ljin, rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D29379
Summary: Improve listener_test by ensuring flushes are completed before assert.
Test Plan: listener_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29319
Summary: stringSplit is not how we name our functions. Also, we had two StringSplit's in the codebase
Test Plan: make check
Reviewers: yhchiang, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29361
Summary:
Add enable_thread_tracking to DBOptions to allow
tracking thread status related to the DB. Default is off.
Test Plan:
export ROCKSDB_TESTS=ThreadList
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29289