Summary: Now EnvOptions uses unsanitized DB options. bytes_per_sync is tuned off when rate_limiter is used, but this change doesn't take effort.
Test Plan: See different I/O pattern in db_bench running fillseq.
Reviewers: yhchiang, kradhakrishnan, rven, anthony, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D36723
Summary: Now trivial move is only triggered when moving from level n to n+1. With dynamic level base, it is possible that file is moved from level 0 to level n, while levels from 1 to n-1 are empty. Extend trivial move to this case.
Test Plan: Add a more unit test of sequential loading. Non-trivial compaction happened without the patch and now doesn't happen.
Reviewers: rven, yhchiang, MarkCallaghan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba, IslamAbdelRahman
Differential Revision: https://reviews.facebook.net/D36669
Summary:
There are some cases when flachcache file descriptor was
already allocated (i.e. fb-MySQL). Then NewFlashcacheAwareEnv returns an
error at open() because fd was already assigned. This diff adds another
function to instantiate FlashcacheAwareEnv, with pre-allocated fd cachedev_fd.
Test Plan: Tested with MyRocks using this function, then worked
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, MarkCallaghan, rven
Differential Revision: https://reviews.facebook.net/D36447
Summary: Now we add warnings when user configures compression and the compression is not supported.
Test Plan:
Configured compression to non-supported values. Observed messages in my log:
2015/03/26-12:17:57.586341 7ffb8a496840 [WARN] Compression type chosen for level 2 is not supported: LZ4. RocksDB will not compress data on level 2.
2015/03/26-12:19:10.768045 7f36f15c5840 [WARN] Compression type chosen is not supported: LZ4. RocksDB will not compress data.
Reviewers: rven, sdong, yhchiang
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35979
Summary:
Currently users have no idea a key is add, delete or merge from TablePropertiesCollector call back. Add a new function to add it.
Also refactor the codes so that
(1) make table property collector and internal table property collector two separate data structures with the later one now exposed
(2) table builders only receive internal table properties
Test Plan: Add cases in table_properties_collector_test to cover both of old and new ways of using TablePropertiesCollector.
Reviewers: yhchiang, igor.sugak, rven, igor
Reviewed By: rven, igor
Subscribers: meyering, yoshinorim, maykov, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D35373
Summary:
If accumulated_num_non_deletions_ were ever smaller than
accumulated_num_deletions_, the computation of
"accumulated_num_non_deletions_ - accumulated_num_deletions_"
would result in a logically "negative" value, but since
the two operands are unsigned (uint64_t), the result corresponding
to e.g., -1 would 2^64-1.
Instead, return 0 in that case.
Test Plan:
- ensure "make check" still passes
- temporarily add an "abort();" call in the new "if"-block, and
observe that it fails in some test cases. However, note that
this case is triggered only when the two numbers are equal.
Thus, no test case triggers the erroneous behavior this
change is designed to avoid. If anyone can construct a
scenario in which that bug would be triggered, I'll be
happy to add a test case.
Reviewers: ljin, igor, rven, igor.sugak, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36489
Summary: Int is used for level size targets when options_.level_compaction_dynamic_level_bytes=true, which will cause overflow when database grows big. Fix it.
Test Plan: Add a new unit test which fails without the fix.
Reviewers: rven, yhchiang, MarkCallaghan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D36453
Summary: In some db_test tests sync points are not cleared which will cause unexpected results in the next tests. Clean them up in test cleaning up.
Test Plan:
Run the same tests that used to fail:
build using USE_CLANG=1 and run
./db_test --gtest_filter="DBTest.CompressLevelCompaction:*DBTestUniversalCompactionParallel*"
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D36429
Summary:
This diff fixes a crash found when an empty database is opened in readonly mode.
We now check the number of levels before we open the DB as a compacted DB.
Test Plan: DBTest.EmptyCompactedDB
Reviewers: igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36327
Summary:
Fix the make unity build. The local stats variable name was shadowing a
global stats variable.
Test Plan:
Run the build
OPT=-DTRAVIS V=1 make unity
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36285
Summary:
After recent change of DBTest.DynamicCompactionOptions, occasionally hit another non-deterministic case where L0 showdown is triggered while timeout should not triggered for hard limit.
Fix it by increasing L0 slowdown trigger at the same time.
Test Plan: Run the failed test.
Reviewers: igor, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D36219
Summary:
With this change, we use L1 and up to store compaction outputs in universal compaction.
The compaction pick logic stays the same. Outputs are stored in the largest "level" as possible.
If options.num_levels=1, it behaves all the same as now.
Test Plan:
1) convert most of existing unit tests for universal comapaction to include the option of one level and multiple levels.
2) add a unit test to cover parallel compaction in universal compaction and run it in one level and multiple levels
3) add unit test to migrate from multiple level setting back to one level setting
4) add a unit test to insert keys to trigger multiple rounds of compactions and verify results.
Reviewers: rven, kradhakrishnan, yhchiang, igor
Reviewed By: igor
Subscribers: meyering, leveldb, MarkCallaghan, dhruba
Differential Revision: https://reviews.facebook.net/D34539
Summary:
Just couple of small changes:
1. removed signal_test, since it doesn't seem useful and we don't even run it as part of `make check`
2. moved perf_context_test to TESTS instead of PROGRAMS
3. `make release` probably shouldn't compile benchmarks. We currently rely on `make release` building db_bench (via Jenkins), so I left db_bench there.
This is just a minor cleanup. We need to rethink our targets since they are a bit messy right now. We can do this during our tech debt week.
Test Plan: make release
Reviewers: anthony, rven, yhchiang, sdong, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36171
Summary:
The --stats_interval_seconds determines interval for stats reporting
and overrides --stats_interval when set. I also changed tools/benchmark.sh
to report stats every 60 seconds so I can avoid trying to figure out a
good value for --stats_interval per test and per storage device.
Task ID: #6631621
Blame Rev:
Test Plan:
run tools/run_flash_bench, look at output
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36189
Summary:
Cleaning up log files can do heavy IO, since we call ftruncate() in the destructor. We don't want to call ftruncate() in user threads.
This diff moves cleaning to background threads (flush and compaction)
Test Plan: make check, will also run valgrind
Reviewers: yhchiang, rven, MarkCallaghan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D36177
Summary:
This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
makes the duration per-test configurable and refactors the test scripts.
Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
the writer thread does Merge rather than Put.
Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
Sometimes automation and humans want different format.
Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
stats in the output at test end.
Adds the average ingest rate to compaction IO stats. Output now looks like:
https://gist.github.com/mdcallag/2bd64d18be1b93adc494
More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1
For benchmark.sh changes default RocksDB configuration to reduce stalls:
* min_level_to_compress from 2 to 3
* hard_rate_limit from 2 to 3
* max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
* L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)
Task ID: #6596829
Blame Rev:
Test Plan:
run tools/run_flash_bench.sh
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36075
Summary: Most of the approach is copied from WebSQL's MySQL branch. It's nice that we can do this without touching core RocksDB code.
Test Plan: Compiles and runs. Didn't test flashback code, as I don't have flashback device and most if it is c/p
Reviewers: MarkCallaghan, sdong
Reviewed By: sdong
Subscribers: rven, lgalanis, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35391
Summary:
Assign the string properties to const string variables under the
DB::Properties namespace. This helps catch typos during compilation and
also consolidates the property definition in one place.
Test Plan: Run rocksdb unit tests
Reviewers: sdong, yoshinorim, igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35991
Summary: It's useful to know if we have compression support or no
Test Plan:
Observed this in my LOG:
2015/03/26-10:34:35.460681 7f5b322b7840 Snappy supported
2015/03/26-10:34:35.460682 7f5b322b7840 Zlib supported
2015/03/26-10:34:35.460686 7f5b322b7840 Bzip supported
2015/03/26-10:34:35.460687 7f5b322b7840 LZ4 NOT supported
Reviewers: sdong, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35955
[maa@srv2-nskb-devg2 rocksdb-master]$ CXX=/usr/local/CC/gcc-4.7.4/bin/g++ EXTRA_CXXFLAGS=-std=c++11 DISABLE_WARNING_AS_ERROR=1 make db_bench
CC db/db_bench.o
db/db_bench.cc: In member function 'rocksdb::Slice rocksdb::Benchmark::AllocateKey(std::unique_ptr<const char []>*)':
db/db_bench.cc:1434:41: error: use of deleted function 'void std::unique_ptr<_Tp [], _Dp>::reset(_Up) [with _Up = char*; _Tp = const char; _Dp = std::default_delete<const char []>]'
In file included from /usr/local/CC/gcc-4.7.4/lib/gcc/x86_64-unknown-linux-gnu/4.7.4/../../../../include/c++/4.7.4/memory:86:0,
from ./include/rocksdb/db.h:14,
from ./db/dbformat.h:14,
from ./db/db_impl.h:21,
from db/db_bench.cc:33:
Summary:
We have addded new stats and perf_context for measuring the merge and filter operation time consumption.
We have bounded all the merge operations within the GUARD statment and collected the total time for these operations in the DB.
Test Plan: WIP
Reviewers: rven, yhchiang, kradhakrishnan, igor, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34377
Summary:
Report elapsed time of a thread operation in micros in ThreadStatus
instead of start time of a thread operation in seconds since the
Epoch, 1970-01-01 00:00:00 (UTC).
Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10
Sample Output:
ThreadID ThreadType cfName Operation ElapsedTime Stage State
140667724562496 High Pri column_family_name_000002 Flush 772.419 ms FlushJob::WriteLevel0Table
140667728756800 High Pri default Flush 617.845 ms FlushJob::WriteLevel0Table
140667732951104 High Pri column_family_name_000005 Flush 772.078 ms FlushJob::WriteLevel0Table
140667875557440 Low Pri column_family_name_000008 Compaction 1409.216 ms CompactionJob::Install
140667737145408 Low Pri
140667749728320 Low Pri
140667816837184 Low Pri column_family_name_000007 Compaction 1071.815 ms CompactionJob::ProcessKeyValueCompaction
140667787477056 Low Pri column_family_name_000009 Compaction 772.516 ms CompactionJob::ProcessKeyValueCompaction
140667741339712 Low Pri
140667758116928 Low Pri column_family_name_000004 Compaction 620.739 ms CompactionJob::ProcessKeyValueCompaction
140667753922624 Low Pri
140667842003008 Low Pri column_family_name_000006 Compaction 1260.079 ms CompactionJob::ProcessKeyValueCompaction
140667745534016 Low Pri
Reviewers: sdong, igor, rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35769
Summary:
Improve ThreadStatusSingleCompaction in two ways:
1. Use SYNC_POINT to ensure compaction won't happen
before the test finishes its "Put Phase" instead of
using sleep.
2. In Put Phase, it continues until we have sufficient
number of L0 files. Note that during the put phase,
there won't be any compaction that consumes L0 files
because of item 1.
Test Plan: ./db_test --gtest_filter="*ThreadStatusSingleCompaction*"
Reviewers: sdong, igor, rven
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35727
Summary: DBTest doesn't clean up wal directory. It might cause failure after a failure test run. Fix it.
Test Plan:
Run unit tests
Try open DB with non-empty db_path/wal.
Reviewers: rven, yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D35559
Summary:
To understand the bug read t5943287 and check out the new test in column_family_test (ReadDroppedColumnFamily), iter 0.
RocksDB contract allowes you to read a drop column family as long as there is a live reference. However, since our iteration ignores dropped column families, AddLiveFiles() didn't mark files of a dropped column families as live. So we deleted them.
In this patch I no longer ignore dropped column families in the iteration. I think this behavior was confusing and it also led to this bug. Now if an iterator client wants to ignore dropped column families, he needs to do it explicitly.
Test Plan: Added a new unit test that is failing on master. Unit test succeeds now.
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32535
Summary: Suprisingly, the only way we use this vector is to keep track of level0 compactions. Thus, I simplified it.
Test Plan: make check
Reviewers: rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35313
Summary: This is a simple change to make db_test::MultiThreadedDBTest as value parameterized test. There is a value of creating a separate set of such tests later.
Test Plan:
```lang=bash
% make db_test
% ./make db_test
```
Also with the following command I can execute all db_test in 2:37.87 on my box
```
% ./db_test --gtest_list_tests | sed 's/\# GetParam.*//' | tr -d ' ' | env time parallel --gnu --eta --joblog=LOG -- 'TEST_TMPDIR=/dev/shm/rocksdb-{} ./db_test --gtest_filter="*{}"'
```
Reviewers: igor, rven, meyering, sdong
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35361
Summary: Add a DB property for number of deletions in memtables. It can sometimes help people debug slowness because of too many deletes.
Test Plan: Add test cases.
Reviewers: rven, yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba, yoshinorim
Differential Revision: https://reviews.facebook.net/D35247
Summary:
This is like readwhilewriting but uses Merge rather than Put in the writer thread.
I am using it for in-progress benchmarks. I don't think the other benchmarks for Merge
cover this behavior. The purpose for this test is to measure read performance when
readers might have to merge results. This will also benefit from work-in-progress
to add skewed key generation.
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35115
Summary: WriteBatch and WriteBatchWithIndex now both inherit from a common abstract base class. This makes it easier to write code that is agnostic toward the implementation of the particular write batch. In particular, I plan on utilizing this abstraction to allow transactions to support using either implementation of a write batch.
Test Plan: modified existing WriteBatchWithIndex tests to test new functions. Running all tests.
Reviewers: igor, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34017
Summary: It is no longer used by the implementation, so we should also remove it from the public API.
Test Plan: make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34971
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.
There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.
```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
if grep -q "rocksdb::test::RunAllTests()" $file
then
if grep -Eq '^class \w+Test {' $file
then
perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
perl -pi -e 's/^(TEST)/${1}_F/g' $file
fi
perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
fi
done
% sh ~/transform
% make format
```
Second iteration of this diff contains only scripted changes.
Third iteration contains manual changes to fix last errors and make it compilable.
Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35157
Summary: Some suggestions for cleanup from Igor.
Test Plan: Regression tests.
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35169
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes.
In order to detect all existing `ASSERT*` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases.
In `util/testharness.h` I defined `EXPECT*` assertions, the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:
```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted back change to `ASSERT*` in `util/testharness.h`. But preserved introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once switched to gtest.
This diff is independent and contains manual changes only in `util/testharness.h`.
Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```
Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33333
Summary:
On RocksDB, when there are multiple instances doing
flushes/compactions in the background, the close call takes a long time
because the flushes/compactions need to complete before the database can
shut down. If another instance is using the background threads and the compaction for this instance is in the queue since it has been scheduled, we still cannot shutdown. We now remove the scheduled background tasks which have not yet started running, so that shutdown is speeded up.
Test Plan: DB Test added.
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33741
Summary: These changes are necessary to make tests look more generic, and avoid feature conflicts with gtest.
Test Plan:
Make sure no build errors, and all test are passing.
```
% make check
```
Reviewers: igor, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35145
Summary:
The output did not have space for 6-digit file counts or for 3-digit
counts of files being compacted. This adds space for that while preserving
existing alignment. See https://gist.github.com/mdcallag/0a61c6a18dd467224c11
Task ID: #
Blame Rev:
Test Plan:
run db_bench, look at output
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35091
Summary:
The preshutdown tests check for stopped compactions/flushes.
Removing stalls on the write path.
Test Plan: DBTests.PreShutdown*
Reviewers: yhchiang, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35037
Summary:
Improve the robustness of ThreadStatusSingleCompaction
by ensuring the number of files flushed in the test.
Test Plan:
export ROCKSDB_TESTS=ThreadStatus
./db_test
Reviewers: sdong, igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35019
Summary:
Fix the deadlock issue in ThreadStatusSingleCompaction.
In the previous version of ThreadStatusSingleCompaction, the compaction
thread will wait for a SYNC_POINT while its db_mutex is held. However,
if the test hasn't finished its Put cycle while a compaction is running,
a deadlock will happen in the test.
Test Plan:
export ROCKSDB_TESTS=ThreadStatus
./db_test
Reviewers: sdong, igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35001
Summary: The test depends on snappy to be used. Skip the test if it is not supported.
Test Plan: Run the test
Reviewers: meyering, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34995
Summary: Currently, we have `ifdef SNAPPY` around bunch of db_test code. Some tests that don't even use compression are also blocked when running system doesn't have snappy. This also causes hard-to-catch bugs, like D34983. We should dynamically figure out if compression is supported or not.
Test Plan: compiles
Reviewers: sdong, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34989
Summary:
Here's my proposal for making our LOGs easier to read by machines.
The idea is to dump all events as JSON objects. JSON is easy to read by humans, but more importantly, it's easy to read by machines. That way, we can parse this, load into SQLite/mongo and then query or visualize.
I started with table_create and table_delete events, but if everybody agrees, I'll continue by adding more events (flush/compaction/etc etc)
Test Plan:
Ran db_bench. Observed:
2015/01/15-14:13:25.788019 1105ef000 EVENT_LOG_v1 {"time_micros": 1421360005788015, "event": "table_file_creation", "file_number": 12, "file_size": 1909699}
2015/01/15-14:13:25.956500 110740000 EVENT_LOG_v1 {"time_micros": 1421360005956498, "event": "table_file_deletion", "file_number": 12}
Reviewers: yhchiang, rven, dhruba, MarkCallaghan, lgalanis, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31647
Summary:
The tests using sync_point for intent to shutdown stop
compaction and this results in stalls if too many rows are written. We
now limit the number of rows written to prevent stalls, since the focus
of the test is to cancel background work, which is being correctly
tested. This fixes a Jenkins issue.
Test Plan: DBTest.PreShutdown*
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34893
Summary:
Change the way options.compression_per_level is used when options.level_compaction_dynamic_level_bytes=true so that options.compression_per_level[1] determines compression for the level L0 is merged to, options.compression_per_level[2] to the level after that, etc.
Test Plan: run all tests
Reviewers: rven, yhchiang, kradhakrishnan, igor
Reviewed By: igor
Subscribers: yoshinorim, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34431
Summary: Use a better way to map from a key with locality to a random location. Now with the same -read_random_exp_range setting, hit rate drops, which it is expected.
Test Plan: ./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=<multiple_values>
Reviewers: MarkCallaghan, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34761
Summary:
Provide an API which enables users to infor Rocksdb that it is
shutting down.
Test Plan: db_test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34617
Summary: Got it working by some voodoo programming
Test Plan: works!
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34611
Summary: Allow GetThreadList() to report the start time of the current operation.
Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 \
--max_background_compactions=10 --max_background_flushes=3 \
--thread_status_per_interval=1000 --key_size=16 --value_size=1000 \
--num_column_families=10
Sample output:
ThreadID ThreadType cfName Operation OP_StartTime State
140338840797248 High Pri column_family_name_000003 Flush 2015/03/09-17:49:59
140338844991552 High Pri column_family_name_000004 Flush 2015/03/09-17:49:59
140338849185856 Low Pri
140338983403584 Low Pri
140339008569408 Low Pri
140338861768768 Low Pri
140338924683328 Low Pri
140338899517504 Low Pri
140338853380160 Low Pri
140338882740288 Low Pri
140338865963072 High Pri column_family_name_000006 Flush 2015/03/09-17:49:59
140338954043456 Low Pri
140338857574464 Low Pri
Reviewers: igor, rven, sdong
Reviewed By: sdong
Subscribers: lgalanis, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34689
Summary: Introduce parameter -read_random_exp_range in db_bench to provide some key skewness in readrandom and multireadrandom benchmarks. It will helpful to cover block cache better.
Test Plan:
Run benchmarks with this new parameter. I can clearly see block cache hit rate change while I increase this value (DB size is about 66MB):
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=0.0
rocksdb.block.cache.data.miss COUNT : 958418
rocksdb.block.cache.data.hit COUNT : 41582
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=5.0
rocksdb.block.cache.data.miss COUNT : 819518
rocksdb.block.cache.data.hit COUNT : 180482
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=10.0
rocksdb.block.cache.data.miss COUNT : 450479
rocksdb.block.cache.data.hit COUNT : 549521
./db_bench --benchmarks=readrandom -statistics -use_existing_db -cache_size=5000000 --read_random_exp_range=20.0
rocksdb.block.cache.data.miss COUNT : 223192
rocksdb.block.cache.data.hit COUNT : 776808
Reviewers: MarkCallaghan, kradhakrishnan, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D34629
Summary: I want to be able to set this through mongo config.
Test Plan: added unit test
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34599
Summary:
Add --thread_status_per_interval to db_bench, which allows
db_bench to optionally enable print the current thread status
periodically.
Test Plan:
./db_bench --benchmarks=fillrandom --num=100000 --threads=40 --max_background_compactions=10 --max_background_flushes=3 --thread_status_per_interval=1000 --key_size=16 --value_size=1000 --num_column_families=10
Sample output:
ThreadID ThreadType dbName cfName Operation State
140281571770432 Low Pri
140281575964736 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000001 Flush
140281710182464 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000008 Compaction
140281638879296 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000007 Compaction
140281592741952 Low Pri
140281580159040 High Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000002 Flush
140281676628032 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000006 Compaction
140281584353344 Low Pri
140281622102080 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000009 Compaction
140281605324864 Low Pri /tmp/rocksdbtest-5297/dbbench column_family_name_000004 Compaction
140281601130560 High Pri /tmp/rocksdbtest-5297/dbbench default Flush
140281596936256 Low Pri
140281588547648 Low Pri
Reviewers: igor, rven, sdong
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34515
Summary:
When having fixed max_bytes_for_level_base, the ratio of size of largest level and the second one can range from 0 to the multiplier. This makes LSM tree frequently irregular and unpredictable. It can also cause poor space amplification in some cases.
In this improvement (proposed by Igor Kabiljo), we introduce a parameter option.level_compaction_use_dynamic_max_bytes. When turning it on, RocksDB is free to pick a level base in the range of (options.max_bytes_for_level_base/options.max_bytes_for_level_multiplier, options.max_bytes_for_level_base] so that real level ratios are close to options.max_bytes_for_level_multiplier.
Test Plan: New unit tests and pass tests suites including valgrind.
Reviewers: MarkCallaghan, rven, yhchiang, igor, ikabiljo
Reviewed By: ikabiljo
Subscribers: yoshinorim, ikabiljo, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31437
Summary:
Summary:
Added a new option to ColumnFamllyOptions - optimize_filters_for_hits. This option can be used in the case where most
accesses to the store are key hits and we dont need to optimize performance for key misses.
This is useful when you have a very large database and most of your lookups succeed. The option allows the store to
not store and use filters in the last level (the largest level which contains data). These filters can take a large amount of
space for large databases (in memory and on-disk). For the last level, these filters are only useful for key misses and not
for key hits. If we are not optimizing for key misses, we can choose to not store these filters for that level.
This option is only provided for BlockBasedTable. We skip the filters when we are compacting
Test Plan:
1. Modified db_test toalso run tests with an additonal option (skip_filters_on_last_level)
2. Added another unit test to db_test which specifically tests that filters are being skipped
Reviewers: rven, igor, sdong
Reviewed By: sdong
Subscribers: lgalanis, yoshinorim, MarkCallaghan, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33717
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
An old commit (482401) changed DoWrite to use the value of --writes rather
than --num to determine the range for keys. This restores the old and correct
behavior which is to limit it using --num.
Task ID: #6353043
Blame Rev:
Test Plan:
run db_bench
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34065
Summary:
In a release build, a member was not being accessed. This
member was only being accessed in a debug build. We now add an accessor
function for this member and the buid succeeds.
Test Plan: build release/unity/debug on linux/mac
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34035
Summary: This causes warnings on OS X
Test Plan: compiles
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33969
Summary:
This diff contains trivial fixes for 6 scan-build warnings:
**db/c_test.c**
`db` variable is never read. Removed assignment.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath
**db/db_iter.cc**
`skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath
**db/log_reader.cc**
In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath
In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases.
scan-build reprots:
http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPathhttp://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath
**table/plain_table_key_coding.cc**
Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath
**tools/db_stress.cc**
Missing `break` in switch case block. This seems to be a bug. Added missing `break`.
Test Plan:
Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs.
```lang=bash
% make check
% make analyze
```
Reviewers: meyering, igor, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33795
Summary:
The "const" attribute applies to the type, and placing it
before that return type retains the desired semantics,
yet avoids the compiler error/warning.
Test Plan: Run make
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33789
Summary:
Otherwise, we would assert that an unsigned expression is always >= 0.
The intent was to form a possibly negative number, and to assert that
that value is always >= 0, but since one variable in the computation
was unsigned, the result was guaranteed to be unsigned, too, rendering
the assertion useless.
Cast that unsigned variable to "int", so that all operands
are signed, and thus so that the result can be negative.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33771
Summary:
The "const" attribute does not make sense on a return type,
and provokes a warning/error from gcc -W -Wextra.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33759
Summary:
Remove some always-true assertions.
They provoke these compilation failures:
table/plain_table_key_coding.cc:279:20: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
db/version_set.cc:336:15: error: comparison of unsigned expression >= 0 is always true [-Werror=type-limits]
* table/plain_table_key_coding.cc (rocksdb): Remove assertion that
unsigned type variable is >= 0.
* db/version_set.cc (DoGenerateLevelFilesBrief): Likewise.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33747
Summary:
The bug is detected by scan-build.
In `void WriteSeqSeekSeq(ThreadState* thread)` memory is allocated in line 3118 `Slice key = AllocateKey();` but `Slice` is not responsible deleting `Slice::data()`.
Added `std::unique_ptr<const char[]>*` parameter to ` AllocateKey()`, so that it requires caller to not forget about Slice::data() management.
scan-build bug report: http://home.fburl.com/~sugak/latest6/report-6e9754.html#EndPath
Test Plan:
Make sure scan-build does not report 'Memory leak' in db/db_bench.cc and all tests are passing.
```lang=bash
% make analyze
% make check
```
Reviewers: lgalanis, igor, meyering, sdong
Reviewed By: meyering, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33501
Summary:
Prior to this change, "make check" would always waste a lot of
time relinking 60+ binaries. With this change, it does that
only when the generated file, util/build_version.cc, changes,
and that happens only when the date changes or when the
current git SHA changes.
This change makes some other improvements: before, there was no
rule to build a deleted util/build_version.cc. If it was somehow
removed, any attempt to link a program would fail.
There is no longer any need for the separate file,
build_tools/build_detect_version. Its functionality is
now in the Makefile.
* Makefile (DEPFILES): Don't filter-out util/build_version.cc.
No need, and besides, removing that dependency was wrong.
(date, git_sha, gen_build_version): New helper variables.
(util/build_version.cc): New rule, to create this file
and update it only if it would contain new information.
* build_tools/build_detect_platform: Remove file.
* db/db_impl.cc: Now, print only date (not the time).
* util/build_version.h (rocksdb_build_compile_time): Remove
declaration. No longer used.
Test Plan:
- Run "make check" twice, and note that the second time no linking is performed.
- Remove util/build_version.cc and ensure that any "make"
command regenerates it before doing anything else.
- Run this: strings librocksdb.a|grep _build_.
That prints output including the following:
rocksdb_build_git_date:2015-02-19
rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33591
Summary: Add a DB property about live versions. It can be helpful to figure out whether there are files not live but not yet deleted, in some use cases.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33327
Summary:
This is a diff for managed iterator. A managed iterator
is a wrapper around an iterator which saves the options for that
iterator as well as the current key/value so that the underlying iterator
and its associated memory can be released when it is aged out
automatically or on the request of the user. Will provide the automatic release as a follow-up diff.
Test Plan: Managed* tests in db_test and XF tests for managed iterator
Reviewers: igor, yhchiang, anthony, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31401
Summary:
Remove ThreadStatusMultiCompaction test as it's currently written
in a way that depends on some randomness, while the flush / compaction
status of a single thread is also covered in ThreadStatusFlush
and ThreadStatusSingleCompaction tests.
Test Plan: ./db_test
Reviewers: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33537
Summary: Allow GetThreadList to reflect flush activity.
Test Plan:
Developed ThreadStatusFlush test and updated ThreadStatusMultiCompaction test.
./db_test ./thread_list_test
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32871
Summary:
It would be good to assing background job their IDs. Two benefits:
1) makes LOGs more readable
2) I might use it in my EventLogger, which will try to make our LOG easier to read/query/visualize
Test Plan: ran rocksdb, read the LOG
Reviewers: sdong, rven, yhchiang
Reviewed By: yhchiang
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31617
Summary: DBTest.DestroyDBMetaDatabase occasionally fails on my dev host, for file not existing. Always create directories to avoid that.
Test Plan: Run the test
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33321
Summary: Remember whole key or prefix filtering on/off in SST files. If user opens the DB with a different setting that cannot be satisfied while reading the SST file, ignore the bloom filter.
Test Plan: Add a unit test for it
Reviewers: yhchiang, igor, rven
Reviewed By: rven
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D32889
Summary: Add counters in perf context to allow users to figure out how time spent on waiting for DB mutex
Test Plan: Add a test and run it.
Reviewers: yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33177
Summary: For description of the bug, see comment in db_test. The fix is pretty straight forward.
Test Plan: added unit test. eventually we need better testing of FOF/POF process.
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33081
Summary:
- In statistics.h , added tickers.
- In version_set.cc,
-- Added a getter method for hit_file_level_ in the class FilePicker
-- Added a line in the Get() method in case of a found, increment the corresponding counters based on the level of the file respectively.
Corresponding task: https://our.intern.facebook.com/intern/tasks/?s=506100481&t=5952818
Personal fork: 0c3f2e3600
Test Plan:
In terminal,
```
make -j32 db_test
ROCKSDB_TESTS=L0L1L2AndUpHitCounter ./db_test
```
Or to use debugger,
```
make -j32 db_test
export ROCKSDB_TESTS=L0L1L2AndUpHitCounter
gdb db_test
```
Reviewers: rven, sdong
Reviewed By: sdong
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D32205
Summary: Having a pointer for DB will be helpful to debug when GDB or working on a dump. If the client process doesn't have any thread actively working on RocksDB, it can be hard to find out.
Test Plan: make all check
Reviewers: rven, yhchiang, igor
Reviewed By: igor
Subscribers: yoshinorim, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33159