Summary: Add --rate_limiter_bytes_per_sec to db_bench to allow rater limit to disk
Test Plan:
Run
./db_bench --benchmarks=fillseq --num=30000000 --rate_limiter_bytes_per_sec=3000000 --num_multi_db=8 -disable_wal
And see io_stats to have the rate limited.
Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D38385
Summary:
Fixed the following compile error in db/column_family.cc
db/column_family.cc:633:33: error: ‘ASSERT_GT’ was not declared in this scope
16:14:45 ASSERT_GT(listeners.size(), 0U);
Test Plan: make db_test
Reviewers: igor, sdong, rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38367
Summary:
Fixed a bug in EventListener::OnCompactionCompleted() that returns
incorrect list of input / output file names.
Test Plan: Extend existing test in listener_test.cc
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38349
Summary: This caused a crash of our MongoDB + RocksDB instance. PickCompactionBySize() sets its own parent_index. We never reset this parent_index when picking PickFilesMarkedForCompactionExperimental(). So we might end up doing SetupOtherInputs() with parent_index that was set by PickCompactionBySize, although we're using compaction calculated using PickFilesMarkedForCompactionExperimental.
Test Plan: Added a unit test that fails with assertion on master.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38337
Summary:
Added a couple functions to WriteBatchWithIndex to make it easier to query the value of a key including reading pending writes from a batch. (This is needed for transactions).
I created write_batch_with_index_internal.h to use to store an internal-only helper function since there wasn't a good place in the existing class hierarchy to store this function (and it didn't seem right to stick this function inside WriteBatchInternal::Rep).
Since I needed to access the WriteBatchEntryComparator, I moved some helper classes from write_batch_with_index.cc into write_batch_with_index_internal.h/.cc. WriteBatchIndexEntry, ReadableWriteBatch, and WriteBatchEntryComparator are all unchanged (just moved to a different file(s)).
Test Plan: Added new unit tests.
Reviewers: rven, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38037
Summary: In new clang we need to add override to every overriden function
Test Plan: none
Reviewers: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D38259
Summary: When reporting compaction that was started because of SuggestCompactRange() we should treat it as manual compaction.
Test Plan: none
Reviewers: yhchiang, rven
Reviewed By: rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38139
Summary: Don't treat warnings as error when building rocksdbjavastatic
Test Plan: make rocksdbjavastatic -j32
Reviewers: rven, fyrz, adamretter, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38187
Summary: Without this I get bunch of questions when I run `make clean`
Test Plan: no more questions!
Reviewers: rven, yhchiang, meyering, anthony
Reviewed By: meyering, anthony
Subscribers: meyering, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38145
Summary:
In D28521 we removed GarbageCollect() from BackupEngine's constructor. The reason was that opening BackupEngine on HDFS was very slow and in most cases we didn't have any garbage. We allowed the user to call GarbageCollect() when it detects some garbage files in his backup directory.
Unfortunately, this left us vulnerable to an interesting issue. Let's say we started a backup and copied files {1, 3} but the backup failed. On another host, we restore DB from backup and generate {1, 3, 5}. Since {1, 3} is already there, we will not overwrite. However, these files might be from a different database so their contents might be different. See internal task t6781803 for more info.
Now, when we're copying files and we discover a file already there, we check:
1. if the file is not referenced from any backups, we overwrite the file.
2. if the file is referenced from other backups AND the checksums don't match, we fail the backup. This will only happen if user is using a single backup directory for backing up two different databases.
3. if the file is referenced from other backups AND the checksums match, it's all good. We skip the copy and go copy the next file.
Test Plan: Added new test to backupable_db_test. The test fails before this patch.
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37599
Summary:
Couple changes:
1. instead of SnapshotList, just take a vector of snapshots
2. don't take a separate parameter is_snapshots_supported. If there are snapshots in the list, that means they are supported. I actually think we should get rid of this notion of snapshots not being supported.
3. don't pass in mutable_cf_options as a parameter. Lifetime of mutable_cf_options is a bit tricky to maintain, so it's better to not pass it in for the whole compaction job. We only really need it when we install the compaction results.
Test Plan: make check
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36627
Summary:
Fixes#6840824, running "make check" on centos6 hits
a deadlock in column_family_test
Test Plan:
seq 10000 | parallel --gnu --eta 't=/dev/shm/rdb-{}; rm -rf
$t; mkdir $t && export TEST_TMPDIR=$t; ./column_family_test > $t/log-{}'
Made the test deterministic by narrrowing the window for the flush.
Reviewers: igor, meyering
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38079
Summary: Optimize GetRange Function by checking the level of the files
Test Plan: pass make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37977
Summary:
[noticed a new warning when building with the very latest gcc]
* db/memtablerep_bench.cc (FLAGS_env): Remove declaration
of unused varaible, to avoid this warning/error:
db/memtablerep_bench.cc:135:22: error: ‘FLAGS_env’ defined but not\
used [-Werror=unused-variable]
static rocksdb::Env* FLAGS_env = rocksdb::Env::Default();
^
Test Plan: compile
Reviewers: ljin, rven, igor.sugak, yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37983
Summary:
The default, use one iter for the whole test, isn't good. This cost me
a few hours of debugging and a few days of tessting. For readonly
that isn't realistic and for read-write that keeps a lot of old sst files around.
I remove the option because nothing uses it and not calling gettimeofday per
loop iteration adds about 3% to QPS at 20 threads.
Task ID: #
Blame Rev:
Test Plan:
run db_bench
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37965
Summary:
CPU profiling reveals GetApproximateSizes as a bottleneck for performance. The current implementation is sub-optimal, it scans every file in every level to compute the result.
We can take advantage of the fact that all levels above 0 are sorted in the increasing order of key ranges and use binary search to locate the starting index. This can reduce the number of comparisons required to compute the result.
Test Plan: We have good test coverage. Run the tests.
Reviewers: sdong, igor, rven, dynamike
Subscribers: dynamike, maykov, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37755
Summary: Before the fix we also marked the bottommost level for compaction. This is wrong because then RocksDB has N+1 levels instead of N as before the compaction.
Test Plan: SuggestCompactRangeTest in db_test
Reviewers: yhchiang, rven
Reviewed By: rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37869
Summary: Remove duplicate code. If this diff looks good, I will cleanup other call sites as well.
Test Plan: unit tests
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37761
Summary:
This runs a benchmark for LevelDB similar to what we have
in tools/run_flash_bench.sh. It requires changes to db_bench that I published
in a LevelDB fork on github. Some results are at:
http://smalldatum.blogspot.com/2015/04/comparing-leveldb-and-rocksdb-take-2.html
Sample output:
ops/sec mb/sec usec/op avg p50 Test
525 16.4 1904.5 1904.5 111.0 fillseq.v32768
75187 15.5 13.3 13.3 4.4 fillseq.v200
28328 5.8 35.3 35.3 4.7 overwrite.t1.s0
175438 0.0 5.7 5.7 4.4 readrandom.t1
28490 5.9 35.1 35.1 4.7 overwrite.t1.s0
121951 0.0 8.2 8.2 5.7 readwhilewriting.t1
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37749
Summary:
Added these events:
* Recovery start, finish and also when recovery creates a file
* Trivial move
* Compaction start, finish and when compaction creates a file
* Flush start, finish
Also includes small fix to EventLogger
Also added option ROCKSDB_PRINT_EVENTS_TO_STDOUT which is useful when we debug things. I've spent far too much time chasing LOG files.
Still didn't get sst table properties in JSON. They are written very deeply into the stack. I'll address in separate diff.
TODO:
* Write specification. Let's first use this for a while and figure out what's good data to put here, too. After that we'll write spec
* Write tools that parse and analyze LOGs. This can be in python or go. Good intern task.
Test Plan: Ran db_bench with ROCKSDB_PRINT_EVENTS_TO_STDOUT. Here's the output: https://phabricator.fb.com/P19811976
Reviewers: sdong, yhchiang, rven, MarkCallaghan, kradhakrishnan, anthony
Reviewed By: anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37521
Test Plan: Verified that valgrind build passes for cache_test
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37665
Summary: Since we enabled jemalloc for open source builds, Travis looks like it's dying. Don't install jemalloc when running in travis
Test Plan: none
Reviewers: yhchiang
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37659
Summary: Added keyword override for SetCapacity()
Test Plan: Fixes build
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37647
Summary:
When new capacity is larger than existing capacity, simply update the capacity to the new valie
When new capacity is less than existing capacity, but more than the usage, simply update the capacity to new value
When new capacity is less than the existing capacity and existing usage both, try to purge entries in LRU if feasible to make usage < capacity
Test Plan: Created unit tests in cache_test.cc
Reviewers: sdong, rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37527
Summary:
Make it build for CYGWIN.
Need to define "-std=gnu++11" instead of "-std=c++11" and use some replacement functions.
Test Plan: Build it and run some unit tests in CYGWIN
Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37605
Summary:
CompactRange for universal compaction with num_levels > 1 seems to have a bug. The unit test also has a bug so it doesn't capture the problem.
Fix it. Revert the compact range to the logic equivalent to num_levels=1. Always compact all files together.
It should also fix DBTest.IncreaseUniversalCompactionNumLevels. The issue was that options.write_buffer_size = 100 << 10 and options.write_buffer_size = 100 << 10 are not used in later test scenarios. So write_buffer_size of 4MB was used. The compaction trigger condition is not anymore obvious as expected.
Test Plan: Run the new test and all test suites
Reviewers: yhchiang, rven, kradhakrishnan, anthony, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37551
Summary:
Based on feedback from D37083.
Are all of these correct? In some spaces it seems like we're doing SetMaxPossibleForUserKey() although we want the smallest possible internal key for user key.
Test Plan: make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37341
Summary: Currently open source rocksdb only builds with tcmalloc. This diff first checks if jemalloc is available. If it is, it compiles with jemalloc. If it isn't, it checks for tcmalloc.
Test Plan: Tried this out on my Ubuntu virtual machine and confirms that jemalloc is correctly detected and compiled.
Reviewers: MarkCallaghan, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: adamretter, meyering, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36789
Summary: Reading CompactionPicker I noticed this dangerous substraction of two unsigned integers. We should assert to mark this as safe.
Test Plan: make check
Reviewers: anthony, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37041
Summary:
This diff implements a new `DB` method `PromoteL0` which moves all files in L0
to a given level skipping compaction, provided that the files have disjoint
ranges and all levels up to the target level are empty.
This method provides finer-grain control for trivial compactions, and it is
useful for bulk-loading pre-sorted keys. Compared to D34797, it does not change
the semantics of an existing operation, which can impact existing code.
PromoteL0 is designed to work well in combination with the proposed
`GetSstFileWriter`/`AddFile` interface, enabling to "design" the level structure
by populating one level at a time. Such fine-grained control can be very useful
for static or mostly-static databases.
Test Plan: `make check`
Reviewers: IslamAbdelRahman, philipp, MarkCallaghan, yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37107
Summary: Add more logging to help debugging issues.
Test Plan: Run test suites
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D37401