Summary:
An old commit (482401) changed DoWrite to use the value of --writes rather
than --num to determine the range for keys. This restores the old and correct
behavior which is to limit it using --num.
Task ID: #6353043
Blame Rev:
Test Plan:
run db_bench
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34065
Summary:
In a release build, a member was not being accessed. This
member was only being accessed in a debug build. We now add an accessor
function for this member and the buid succeeds.
Test Plan: build release/unity/debug on linux/mac
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34035
Summary: This causes warnings on OS X
Test Plan: compiles
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33969
Summary:
This diff contains trivial fixes for 6 scan-build warnings:
**db/c_test.c**
`db` variable is never read. Removed assignment.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath
**db/db_iter.cc**
`skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath
**db/log_reader.cc**
In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath
In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases.
scan-build reprots:
http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPathhttp://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath
**table/plain_table_key_coding.cc**
Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath
**tools/db_stress.cc**
Missing `break` in switch case block. This seems to be a bug. Added missing `break`.
Test Plan:
Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs.
```lang=bash
% make check
% make analyze
```
Reviewers: meyering, igor, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33795
Summary:
The "const" attribute applies to the type, and placing it
before that return type retains the desired semantics,
yet avoids the compiler error/warning.
Test Plan: Run make
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33789
Summary:
Otherwise, we would assert that an unsigned expression is always >= 0.
The intent was to form a possibly negative number, and to assert that
that value is always >= 0, but since one variable in the computation
was unsigned, the result was guaranteed to be unsigned, too, rendering
the assertion useless.
Cast that unsigned variable to "int", so that all operands
are signed, and thus so that the result can be negative.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33771
Summary:
The "const" attribute does not make sense on a return type,
and provokes a warning/error from gcc -W -Wextra.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33759
Summary:
Remove some always-true assertions.
They provoke these compilation failures:
table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
* table/plain_table_key_coding.cc (rocksdb): Remove assertion that
unsigned type variable is >= 0.
* db/version_set.cc (DoGenerateLevelFilesBrief): Likewise.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33747
Summary:
The bug is detected by scan-build.
In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`.
Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management.
scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath
Test Plan:
Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing.
```lang=bash
% make analyze
% make check
```
Reviewers: lgalanis, igor, meyering, sdong
Reviewed By: meyering, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33501
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.
This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version. Its functionality is
now in the Makefile.
* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration. No longer used.
Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:
rocksdb_build_git_date:2015-02-19
rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33591
Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33327
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.
Test Plan: Managed* tests in db_test and XF tests for managed iterator
Reviewers: igor, yhchiang, anthony, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31401
Summary:
Remove ThreadStatusMultiCompaction test as it's currently written
in a way that depends on some randomness, while the flush / compaction
status of a single thread is also covered in ThreadStatusFlush
and ThreadStatusSingleCompaction tests.
Test Plan: ./db_test
Reviewers: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33537
Summary: Allow GetThreadList to reflect flush activity.
Test Plan:
Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test.
./db_test ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32871
Summary:
It would be good to assing background job their IDs. Two benefits:
1) makes LOGs more readable
2) I might use it in my EventLogger, which will try to make our LOG easier to read/query/visualize
Test Plan: ran rocksdb, read the LOG
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31617
Summary: DBTest.DestroyDBMetaDatabase occasionally fails on my dev host, for file not existing. Always create directories to avoid that.
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33321
Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter.
Test Plan: Add a unit test for it
Reviewers: yhchiang, igor, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32889
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex
Test Plan: Add a test and run it.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33177
Summary: For description of the bug, see comment in db_test. The fix is pretty straight forward.
Test Plan: added unit test. eventually we need better testing of FOF/POF process.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33081
Summary:
- In statistics.h , added tickers.
- In version_set.cc,
-- Added a getter method for hit_file_level_ in the class FilePicker
-- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively.
Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818
Personal fork: 0c3f2e3600
Test Plan:
In terminal,
```
make -j32 db_test
ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test
```
Or to use debugger,
```
make -j32 db_test
export ROCKSDB_TESTS=L0L1L2AndUpHitCounter
gdb db_test
```
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32205
Summary: Having a pointer for DB will be helpful to debug when GDB or working on a dump. If the client process doesn't have any thread actively working on RocksDB, it can be hard to find out.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33159
Summary:
This diff basically reverts D30249 and also adds a unit test that was failing before this patch.
I have no idea how I didn't catch this terrible bug when writing a diff, sorry about that :(
I think we should redesign our system of keeping track of and deleting files. This is already a second bug in this critical piece of code. I'll think of few ideas.
BTW this diff is also a regression when running lots of column families. I plan to revisit this separately.
Test Plan: added a unit test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33045
Summary:
When DestroyDB() finds a wal file in the DB directory, it assumes it is actually in WAL directory. This can lead to confusion, since it reports IO error when it tries to delete wal file from DB directory. For example: https://ci-builds.fb.com/job/rocksdb_clang_build/296/console
This change will fix our unit tests.
Test Plan: unit tests work
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32907
Summary:
Add a counter for collecting the wait time on db mutex.
Also add MutexWrapper and CondVarWrapper for measuring wait time.
Test Plan:
./db_test
export ROCKSDB_TESTS=MutexWaitStats
./db_test
verify stats output using db_bench
make clean
make release
./db_bench --statistics=1 --benchmarks=fillseq,readwhilewriting --num=10000 --threads=10
Sample output:
rocksdb.db.mutex.wait.micros COUNT : 7546866
Reviewers: MarkCallaghan, rven, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32787
Summary: db_test's test class SpecialEnv has a thread unsafe variable rnd_ but it can be accessed by multiple threads. It is complained by TSAN. Protect it by a mutex.
Test Plan: Run the test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32895
Summary: Added requirement that ComputeCompactionScore() be executed in mutex, since it's accessing being_compacted bool, which can be mutated by other threads. Also added more comments about thread safety of FileMetaData, since it was a bit confusing. However, it seems that FileMetaData doesn't have data races (except being_compacted)
Test Plan: Ran 100 ConvertCompactionStyle tests with thread sanitizer. On master -- some failures. With this patch -- none.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32283
Summary:
Get() now doesn't make use of bloom filter if it is prefix based. Add the check.
Didn't touch block based bloom filter. I can't fully reason whether it is correct to do that. But it's straight-forward to for full bloom filter.
Test Plan:
make all check
Add a test case in DBTest
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31941
Summary:
This diff containes the changes to the code and db_test
for supporting cross functional tests for inplace_update
Test Plan: Run XF with inplace_test and also without
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32367
Summary:
When building on my host, I saw warning:
In file included from db/db_iter_test.cc:17:0:
db/db_iter_test.cc: In member function ‘void rocksdb::_Test_DBIterator::_Run()’:
./util/testharness.h:147:14: note: variable tracking size limit exceeded with -fvar-tracking-assignments, retrying without
void TCONCAT(_Test_,name)::_Run()
^
./util/testharness.h:134:23: note: in definition of macro ‘TCONCAT1’
#define TCONCAT1(a,b) a##b
^
./util/testharness.h:147:6: note: in expansion of macro ‘TCONCAT’
void TCONCAT(_Test_,name)::_Run()
^
db/db_iter_test.cc:589:1: note: in expansion of macro ‘TEST’
TEST(DBIteratorTest, DBIterator) {
^
By dividing the test into small tests, it should fix the problem
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32679
Summary:
This Diff provides the implementation of the cross functional
test infrastructure. This provides the ability to test a single feature
with every existing regression test in order to identify issues with
interoperability between features.
Test Plan:
Reference implementation of inplace update support cross
functional test. Able to find interoperability issues with inplace
support and ran all of db_test. Will add separate diff for those changes.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32247
Summary: Add CappedFixTransform, which is the same as fixed length prefix extractor, except that when slice is shorter than the fixed length, it will use the full key.
Test Plan:
Add a test case for
db_test
options_test
and a new test
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: MarkCallaghan, leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D31887
Summary: DBTest.SharedWriteBuffer uses an Options that doesn't pass CurrentOptions(), so that it doesn't use MockEnv. However, DBTest's constructor uses MockEnv to call DestoryDB() to clean up, causing uncleaned state before it runs.
Test Plan: Run the test modified to make sure they pass default Env and SharedWriteBuffer now passes MockEnv.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32475
Summary:
There's a bug in TSAN (or libstdc++?) with std::shared_ptr<> for some reason. In db_test, only FlushSchedule is affected.
See more: https://groups.google.com/forum/#!topic/thread-sanitizer/vz_s-t226Vg
With this change and all other @sdong's and mine diffs, our db_test should be TSAN-clean. I'll move to other tests.
Test Plan: no more flush schedule when running TSAN
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32469
Summary: Add a new test case in fault_injection_test, which covers parallel compactions and multiple levels. Use MockEnv to run the new test case to speed it up. Improve MockEnv to avoid DestoryDB(), previously failed when deleting lock files.
Test Plan: Run ./fault_injection_test, including valgrind
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32415
Summary: fault_injection_test occasionally fails because file closing can happen after deletion. Improve the test to support it.
Test Plan: I have a new test case I'm working on, where the issue appears almost every time. With the patch, the problem goes away.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32373
Summary: Fixed a compile warning in clang in db/listener_test.cc
Test Plan: make listener_test
Reviewers: oridb
Reviewed By: oridb
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32337
Summary: This adds a listener for compactions, and gives some useful statistics on each compaction pass.
Test Plan: Unit tests.
Reviewers: sdong, igor, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D31641
Summary: I'm not sure the expected results of std::atomic::fetch_sub() when using memory_order_relaxed, and I suspect TSAN complains.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32259
Summary:
We see failure of the test in travis but I can't repro it.
Add more logging in failure cases to help us figure out which failure it is.
Also makes synchronization slightly stronger, though there isn't seem to be a problem without it
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32319
Summary: log_dir_unsynced_ is a confusing name. Rename it to log_dir_synced_ and flip the value.
Test Plan: Run ./fault_injection_test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32235
Summary: Currently fault_injection_test has a test case to drop all the unsynced data. Add one more case to take a randomized bytes from it.
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32229
Summary:
1. If WAL directory is different from db directory. Sync the directory after creating a log file under it.
2. After creating an SST file, sync its parent directory instead of DB directory.
3. change the check of kResetDeleteUnsyncedFiles in fault_injection_test. Since we changed the behavior to sync log files' parent directory after first WAL sync, instead of creating, kResetDeleteUnsyncedFiles will not guarantee to show post sync updates.
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32067
Summary:
This is first in a series of diffs that fixes data races detected by thread sanitizer.
Here the problem is that we call Ref() on a column family during a single-threaded write, without holding a mutex.
Test Plan: TSAN is no longer complaining about LevelLimitReopen.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32121
Summary: We should not be calling InternalStats methods outside of the mutex.
Test Plan:
COMPILE_WITH_TSAN=1 m db_test && ROCKSDB_TESTS=CompactionTrigger ./db_test
failing before the diff, works now
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32127
Summary: More race condition bugs with our archive WAL files. I do believe this caused t5988326, but can't reproduce the failure unfortunately.
Test Plan: make check
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32103
Summary: 3 iterations were disabled by mistake by one recent commit, causing CLANG build error. Fix it
Test Plan:
USE_CLANG=1 make fault_injection_test
and run the test
Reviewers: igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32109
Summary: fault_injection_test.cc has two variable names not following the convention fix it.
Test Plan: run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32097
Summary:
Wrapper classes in fault_injection_test doesn't simulate RocksDB Env behavior close enough. Improve it by:
(1) when fsync, don't sync parent
(2) support directory fsync
(3) support multiple directories
Add test cases of
(1) persisting by WAL fsync, not just compact range
(2) different WAL dir
(3) combination of (1) and (2)
(4) data directory is not the same as db name.
Test Plan: Run the test and make sure it passes.
Reviewers: rven, yhchiang, igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32031
Summary: Now we don't sync manifest file when initializing it, so DB cannot be safely reopened before the first mem table flush. Fix it by syncing it. This fixes fault_injection_test.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32001
Summary:
I saw this when running readrandom benchmark with corrupted database -- benchmark worked!
If a Get() returns corruption we should probably abort.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31701
Summary: GetLiveFilesMetaData() already adds a leading "/" in file name. No need to add one extra "/" in DBImpl::CheckConsistency()
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D31779
Summary:
This patch remove the unnecessary Compaction::ReleaseInputs().
Compaction::ReleaseInputs() tries to unref its input_version
and column_family. However, such unref is always done in
~Compaction(), and all current ReleaseInputs() calls are
right before the destructor.
Test Plan: ./db_test
Reviewers: igor
Reviewed By: igor
Subscribers: igor, rven, dhruba, sdong
Differential Revision: https://reviews.facebook.net/D31605
Summary:
This is a port of [[ https://github.com/google/leveldb/blob/master/db/fault_injection_test.cc | LevelDB's fault_injection_test ]] to RocksDB. Unfortunately it fails with:
```
==== Test FaultInjectionTest.FaultTest
db/fault_injection_test.cc:491: Corruption: no meta-nextfile entry in descriptor
#0 ./fault_injection_test() [0x41477a] rocksdb::FaultInjectionTest::PartialCompactTestReopenWithFault(rocksdb::FaultInjectionTest::ResetMethod, int, int) /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:491
#1 ./fault_injection_test() [0x40a38a] rocksdb::_Test_FaultTest::_Run() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:517
#2 ./fault_injection_test() [0x415bea] rocksdb::_Test_FaultTest::_RunIt() /data/users/tomdzk/rocksdb/db/fault_injection_test.cc:507
#3 ./fault_injection_test() [0x584367] rocksdb::test::RunAllTests() /data/users/tomdzk/rocksdb/util/testharness.cc:70
#4 /usr/local/fbcode/gcc-4.8.1-glibc-2.17/lib/libc.so.6(__libc_start_main+0x10e) [0x7f7a40857efe] ?? ??:0
#5 ./fault_injection_test() [0x408bb8] _start ??:0
```
so I commented out the test invocation in the source code for now (lines 514-520) so it can be merged.
Test Plan: This is a new test.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31587
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Test Plan: Compile. Run it.
Reviewers: yhchiang, dhruba, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D31479
Summary: We keep checksum functions in util/, there is no reason for compression to be in port/
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31281
Summary:
Create a benchmark for testing memtablereps. This diff is a bit rough, but it should do the trick until other bootcampers can clean it up.
Addressing comments
Removed the mutexes
Changed ReadWriteBenchmark to fix number of reads and count the number of writes we can perform in that time.
Test Plan:
Run it.
Below runs pass
./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep skiplist
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep skiplist
./memtablerep_bench --benchmarks readwrite,seqreadwrite --memtablerep skiplist --num_operations 200 --num_threads 5
./memtablerep_bench --benchmarks fillrandom,readrandom --memtablerep hashskiplist
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep hashskiplist
--num_scans 2
./memtablerep_bench --benchmarks fillseq,readseq --memtablerep vector
Reviewers: jpaton, ikabiljo, sdong
Reviewed By: sdong
Subscribers: dhruba, ameyag
Differential Revision: https://reviews.facebook.net/D22683
Summary: Add comments in db.h to help users discover their options.
Test Plan: Compile
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: MarkCallaghan, yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31077
Summary: Add an extra assert to make sure current version is included in VersionSet::AddLiveFiles().
Test Plan: make all check
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, hermanlee4, leveldb
Differential Revision: https://reviews.facebook.net/D30819
Summary: During recovery, VersionBuilder::Apply() was called multiple times. If the DB is open for long enough, most of files added earlier will be deleted by later deletes. In current solution, sorting added file happens first and then deletes are applied. In this patch, deletes are applied when possible inside Apply(), which can significantly reduce the sorting time in some cases.
Test Plan:
Add unit tests in version_builder
valgrind_check
Open a manifest of 50MB, with 9K live files. The manifest read time reduced from 1.6 seconds to 0.7 seconds.
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30765
Summary:
This patch changes concurrency guarantees around ColumnFamilySet::column_families_ and ColumnFamilySet::column_families_data_.
Before:
* When mutating: lock DB mutex and spin lock
* When reading: lock DB mutex OR spin lock
After:
* When mutating: lock DB mutex and be in write thread
* When reading: lock DB mutex or be in write thread
That way, we eliminate the spin lock that protects these hash maps and simplify concurrency. That means we don't need to lock the spin lock during writing, since writing is mutually exclusive with column family create/drop (the only operations that mutate those hash maps).
With these new restrictions, I also needed to move column family create to the write thread (column family drop was already in the write thread).
Even though we don't need to lock the spin lock during write, impact on performance should be minimal -- the spin lock is almost never busy, so locking it is almost free.
This addresses task t5116919.
Test Plan:
make check
Stress test with lots and lots of column family drop and create:
time ./db_stress --threads=30 --ops_per_thread=5000000 --max_key=5000 --column_families=200 --clear_column_family_one_in=100000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress/
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30651
Summary: When trivial move commit is done, we log the summary of the input version instead of current. This is inconsistent with other log messages and confusing.
Test Plan: compiles
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30939
Summary:
A command line like this to run all the tests:
source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting'
where
benchmark.config.sh is:
export DB_DIR=/data/mysql/rocksdata
export WAL_DIR=/txlogs/rockswal
export OUTPUT_DIR=/root/rocks_benchmarking/output
Will fail for the tests that need a new DB .
Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly
Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty.
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D30909
Summary:
Since https://reviews.facebook.net/D16119, we ignore partial tailing writes. Because of that, we no longer need skip_log_error_on_recovery.
The documentation says "Skip log corruption error on recovery (If client is ok with losing most recent changes)", while the option actually ignores any corruption of the WAL (not only just the most recent changes). This is very dangerous and can lead to DB inconsistencies. This was originally set up to ignore partial tailing writes, which we now do automatically (after D16119). I have digged up old task t2416297 which confirms my findings.
Test Plan: There was actually no tests that verified correct behavior of skip_log_error_on_recovery.
Reviewers: yhchiang, rven, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30603
Summary:
This is a serious bug. If paranod_check == true and WAL is corrupted, we don't fail DB::Open(). I tried going into history and it seems we've been doing this for a long long time.
I found this when investigating t5852041.
Test Plan: Added unit test to verify correct behavior.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30597
Summary: CLANG was broken for a recent change in db_ench. Fix it.
Test Plan: Build db_bench using CLANG.
Reviewers: rven, igor, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30801
Summary:
Add structures for exposing events and operations. Event describes
high-level action about a thread such as doing compaciton or
doing flush, while an operation describes lower-level action
of a thread such as reading / writing a SST table, waiting for
mutex. Events and operations are designed to be independent.
One thread would typically involve in one event and one operation.
Code instrument will be in a separate diff.
Test Plan:
Add unit-tests in thread_list_test
make dbg -j32
./thread_list_test
export ROCKSDB_TESTS=ThreadList
./db_test
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: rven, jonahcohen, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29781
Summary: Having --num_hot_column_families default on fails some existing regression tests. By default turn it off
Test Plan: Run db_bench to make sure it is default off.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D30705
Summary:
Add option --num_hot_column_families in db_bench. If it is set, write options will first write to that number of column families, and then move on to next set of hot column families. The working set of column families can be smaller than total number of CFs.
It is to test how RocksDB can handle cold column families
Test Plan: Run db_bench with --num_hot_column_families set and not set.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30663
Dike out the body of VerifyCompactionResult. With assert() compiled out, the
loop index variable in the inner loop was unused, breaking the build when
-Werror is enabled.
Summary:
GetThreadList() feature depends on the thread creation and destruction, which is currently handled under Env.
This patch moves GetThreadList() feature under Env to better manage the dependency of GetThreadList() feature
on thread creation and destruction.
Renamed ThreadStatusImpl to ThreadStatusUpdater. Add ThreadStatusUtil, which is a static class contains
utility functions for ThreadStatusUpdater.
Test Plan: run db_test, thread_list_test and db_bench and verify the life cycle of Env and ThreadStatusUpdater is properly managed.
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: ljin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30057
Summary: As title. We shouldn't need to execute flush from compaction if there are dedicated threads doing flushes.
Test Plan: make check
Reviewers: rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30579
Summary:
There are two versions of FindObsoleteFiles():
* full scan, which is executed every 6 hours (and it's terribly slow)
* no full scan, which is executed every time a background process finishes and iterator is deleted
This diff is optimizing the second case (no full scan). Here's what we do before the diff:
* Get the list of obsolete files (files with ref==0). Some files in obsolete_files set might actually be live.
* Get the list of live files to avoid deleting files that are live.
* Delete files that are in obsolete_files and not in live_files.
After this diff:
* The only files with ref==0 that are still live are files that have been part of move compaction. Don't include moved files in obsolete_files.
* Get the list of obsolete files (which exclude moved files).
* No need to get the list of live files, since all files in obsolete_files need to be deleted.
I'll post the benchmark results, but you can get the feel of it here: https://reviews.facebook.net/D30123
This depends on D30123.
P.S. We should do full scan only in failure scenarios, not every 6 hours. I'll do this in a follow-up diff.
Test Plan:
One new unit test. Made sure that unit test fails if we don't have a `if (!f->moved)` safeguard in ~Version.
make check
Big number of compactions and flushes:
./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30249
Summary:
This one wasn't easy to find :)
What happens is we go through all cfds on flush_queue_ and find no cfds to flush, *but* the cfd is set to the last CF we looped through and following code assumes we want it flushed.
BTW @sdong do you think we should also make BackgroundFlush() only check a single cfd for flushing instead of doing this `while (!flush_queue_.empty())`?
Test Plan: regression test no longer fails
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: sdong, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30591
Summary:
When scaling to higher number of column families, the worst bottleneck was MaybeScheduleFlushOrCompaction(), which did a for loop over all column families while holding a mutex. This patch addresses the issue.
The approach is similar to our earlier efforts: instead of a pull-model, where we do something for every column family, we can do a push-based model -- when we detect that column family is ready to be flushed/compacted, we add it to the flush_queue_/compaction_queue_. That way we don't need to loop over every column family in MaybeScheduleFlushOrCompaction.
Here are the performance results:
Command:
./db_bench --write_buffer_size=268435456 --db_write_buffer_size=268435456 --db=/fast-rocksdb-tmp/rocks_lots_of_cf --use_existing_db=0 --open_files=55000 --statistics=1 --histogram=1 --disable_data_sync=1 --max_write_buffer_number=2 --sync=0 --benchmarks=fillrandom --threads=16 --num_column_families=5000 --disable_wal=1 --max_background_flushes=16 --max_background_compactions=16 --level0_file_num_compaction_trigger=2 --level0_slowdown_writes_trigger=2 --level0_stop_writes_trigger=3 --hard_rate_limit=1 --num=33333333 --writes=33333333
Before the patch:
fillrandom : 26.950 micros/op 37105 ops/sec; 4.1 MB/s
After the patch:
fillrandom : 17.404 micros/op 57456 ops/sec; 6.4 MB/s
Next bottleneck is VersionSet::AddLiveFiles, which is painfully slow when we have a lot of files. This is coming in the next patch, but when I removed that code, here's what I got:
fillrandom : 7.590 micros/op 131758 ops/sec; 14.6 MB/s
Test Plan:
make check
two stress tests:
Big number of compactions and flushes:
./db_stress --threads=30 --ops_per_thread=20000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=15 --max_background_compactions=10 --max_background_flushes=10 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
max_background_flushes=0, to verify that this case also works correctly
./db_stress --threads=30 --ops_per_thread=2000000 --max_key=10000 --column_families=20 --clear_column_family_one_in=10000000 --verify_before_write=0 --reopen=3 --max_background_compactions=3 --max_background_flushes=0 --db=/fast-rocksdb-tmp/db_stress --prefixpercent=0 --iterpercent=0 --writepercent=75 --db_write_buffer_size=2000000
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D30123