Summary: Make RocksDb build and run on Windows to be functionally
complete and performant. All existing test cases run with no
regressions. Performance numbers are in the pull-request.
Test plan: make all of the existing unit tests pass, obtain perf numbers.
Co-authored-by: Praveen Rao praveensinghrao@outlook.com
Co-authored-by: Sherlock Huang baihan.huang@gmail.com
Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
Summary: Hack up rocksdb_dump and rocksdb_undump utilities to get this task rolling/promote discussion.
Test Plan: Dump/undump databases recursively to see if nothing is lost.
Reviewers: sdong, yhchiang, rven, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37269
Summary:
EventListener::OnFlushCompleted() now passes a structure instead
of a list of parameters. This minimizes the API change in the
future.
Test Plan:
listener_test
compact_files_test
example/compact_files_example
Reviewers: kradhakrishnan, sdong, IslamAbdelRahman, rven, igor
Reviewed By: rven, igor
Subscribers: IslamAbdelRahman, rven, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39543
Summary:
Add EventListener::OnTableFileCreated(), which will be called
when a table file is created. This patch is part of the
EventLogger and EventListener integration.
Test Plan: Augment existing test in db/listener_test.cc
Reviewers: anthony, kradhakrishnan, rven, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D38865
Summary:
Fixed db_stress by correcting the verification of column family
names in the Listener of db_stress
Test Plan: db_stress
Reviewers: igor, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39255
Summary: Fixed a compile warning in db_stress in NDEBUG mode.
Test Plan: make OPT=-DNDEBUG db_stress
Reviewers: sdong, anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39213
Summary: Optimistic transactions supporting begin/commit/rollback semantics. Currently relies on checking the memtable to determine if there are any collisions at commit time. Not yet implemented would be a way of enuring the memtable has some minimum amount of history so that we won't fail to commit when the memtable is empty. You should probably start with transaction.h to get an overview of what is currently supported.
Test Plan: Added a new test, but still need to look into stress testing.
Reviewers: yhchiang, igor, rven, sdong
Reviewed By: sdong
Subscribers: adamretter, MarkCallaghan, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D33435
Summary:
Fixed the following compile warning in db_stress:
error: 'OnCompactionCompleted' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
Test Plan: make db_stress
Reviewers: sdong, igor, anthony
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D39207
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
Summary:
This runs a benchmark for LevelDB similar to what we have
in tools/run_flash_bench.sh. It requires changes to db_bench that I published
in a LevelDB fork on github. Some results are at:
http://smalldatum.blogspot.com/2015/04/comparing-leveldb-and-rocksdb-take-2.html
Sample output:
ops/sec mb/sec usec/op avg p50 Test
525 16.4 1904.5 1904.5 111.0 fillseq.v32768
75187 15.5 13.3 13.3 4.4 fillseq.v200
28328 5.8 35.3 35.3 4.7 overwrite.t1.s0
175438 0.0 5.7 5.7 4.4 readrandom.t1
28490 5.9 35.1 35.1 4.7 overwrite.t1.s0
121951 0.0 8.2 8.2 5.7 readwhilewriting.t1
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37749
Summary:
This is done to avoid having each thread use the same seed between runs
of db_bench. Without this we can inflate the OS filesystem cache hit rate on
reads for read heavy tests and generally see the same key sequences get generated
between teste runs.
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37563
Summary:
This adds:
1) use of --level_compaction_dynamic_level_bytes=true
2) use of --bytes_per_sync=2M
The second is a big win for disks. The first helps in general.
This also adds a new test, fillseq with 32kb values to increase the peak
ingest and make it more likely that storage limits throughput.
Sample outpout from the first 3 tests - https://gist.github.com/mdcallag/e793bd3038e367b05d6f
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D37509
Summary:
This changes loads to use vector memtable and disable the WAL. This also
increases the chance we will see IO bottlenecks during loads which is good to stress
test HW. But I also think it is a good way to load data quickly as this is a bulk
operation and the WAL isn't needed.
The two numbers below are the MB/sec rates for fillseq, bulkload using a skiplist
or vector memtable and the WAL enabled or disabled. There is a big benefit from
using the vector memtable and WAL disabled. Alas there is also a perf bug in
the use of std::sort for ordered input when the vector is flushed. Task is open
for that.
112, 66 - skiplist with wal
250, 116 - skiplist without wal
110, 108 - vector with wal
232, 370 - vector without wal
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36957
Summary: Add a script, which checks out changes from a list of tags, build them and load the same data into it. In the last, checkout the target build and make sure it can successfully open DB and read all the data. It is implemented through ldb tool, because ldb tool is available from all previous builds so that we don't have to cross build anything.
Test Plan: Run the script.
Reviewers: yhchiang, rven, anthony, kradhakrishnan, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D36639
Summary:
This adds p99.9 and p99.99 response times to the benchmark report and
adds a second report, report2.txt that has tests listed in test order rather
than the time in which they were run, so overwrite tests are listed for
all thread counts, then update etc.
Also changes fillseq to compress all levels to avoid write-amp from rewriting
uncompressed files when they reach the first level to compress.
Increase max_write_buffer_number to avoid stalls during fillseq and make
max_background_flushes agree with max_write_buffer_number.
See https://gist.github.com/mdcallag/297ff4316a25cb2988f7 for an example
of the new report (report2.txt)
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36537
Summary:
The --stats_interval_seconds determines interval for stats reporting
and overrides --stats_interval when set. I also changed tools/benchmark.sh
to report stats every 60 seconds so I can avoid trying to figure out a
good value for --stats_interval per test and per storage device.
Task ID: #6631621
Blame Rev:
Test Plan:
run tools/run_flash_bench, look at output
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36189
Summary:
This makes run_flash_bench.sh configurable. Previously it was hardwired for 1B keys and tests
ran for 12 hours each. That kept me from using it. This makes it configuable, adds more tests,
makes the duration per-test configurable and refactors the test scripts.
Adds the seekrandomwhilemerging test to db_bench which is the same as seekrandomwhilewriting except
the writer thread does Merge rather than Put.
Forces the stall-time column in compaction IO stats to use a fixed format (H:M:S) which makes
it easier to scrape and parse. Also adds an option to AppendHumanMicros to force a fixed format.
Sometimes automation and humans want different format.
Calls thread->stats.AddBytes(bytes); in db_bench for more tests to get the MB/sec summary
stats in the output at test end.
Adds the average ingest rate to compaction IO stats. Output now looks like:
https://gist.github.com/mdcallag/2bd64d18be1b93adc494
More information on the benchmark output is at https://gist.github.com/mdcallag/db43a58bd5ac624f01e1
For benchmark.sh changes default RocksDB configuration to reduce stalls:
* min_level_to_compress from 2 to 3
* hard_rate_limit from 2 to 3
* max_grandparent_overlap_factor and max_bytes_for_level_multiplier from 10 to 8
* L0 file count triggers from 4,8,12 to 4,12,20 for (start,stall,stop)
Task ID: #6596829
Blame Rev:
Test Plan:
run tools/run_flash_bench.sh
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D36075
Summary:
Whenever we add new tests in db_sanity_test.cc, the verification test
will fail since the old version db_sanity_test.cc does not have the
newly added test. This patch makes auto_sanity_test.sh always use
the db_sanity_test.cc of the newer commit.
As a result, a macro guard is added to allow db_sanity_test.cc to be
backward compatible.
Test Plan: tools/auto_sanity_check.sh
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35997
Summary:
This is like readwhilewriting but uses Merge rather than Put in the writer thread.
I am using it for in-progress benchmarks. I don't think the other benchmarks for Merge
cover this behavior. The purpose for this test is to measure read performance when
readers might have to merge results. This will also benefit from work-in-progress
to add skewed key generation.
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35115
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.
There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.
```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
if grep -q "rocksdb::test::RunAllTests()" $file
then
if grep -Eq '^class \w+Test {' $file
then
perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
perl -pi -e 's/^(TEST)/${1}_F/g' $file
fi
perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
fi
done
% sh ~/transform
% make format
```
Second iteration of this diff contains only scripted changes.
Third iteration contains manual changes to fix last errors and make it compilable.
Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35157
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes.
In order to detect all existing `ASSERT*` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases.
In `util/testharness.h` I defined `EXPECT*` assertions, the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:
```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted back change to `ASSERT*` in `util/testharness.h`. But preserved introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once switched to gtest.
This diff is independent and contains manual changes only in `util/testharness.h`.
Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```
Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering
Reviewed By: meyering
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33333
Summary:
Without this change about half of the updaterandom reads and merge puts will be for keys that don't exist.
I think it is better for these tests to start with a full database and use fillseq to fill it.
Task ID: #
Blame Rev:
Test Plan:
Revert Plan:
Database Impact:
Memcache Impact:
Other Notes:
EImportant:
- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D35043
Summary:
Single threaded tests -> sync=0 Multi threaded tests -> sync=1 by default unless DB_BENCH_NO_SYNC is defined.
Also added updaterandom and mergerandom with putOperator. I am waiting for some results from udb on this.
Test Plan:
DB_BENCH_NO_SYNC=1 WAL_DIR=/tmp OUTPUT_DIR=/tmp/b DB_DIR=/tmp ./tools/benchmark.sh debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting,updaterandom,mergerandom
WAL_DIR=/tmp OUTPUT_DIR=/tmp/b DB_DIR=/tmp ./tools/benchmark.sh debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting,updaterandom,mergerandom
Verify sync settings
Reviewers: sdong, MarkCallaghan, igor, rven
Reviewed By: igor, rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D34185
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
This diff contains trivial fixes for 6 scan-build warnings:
**db/c_test.c**
`db` variable is never read. Removed assignment.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9b77d2.html#EndPath
**db/db_iter.cc**
`skipping` local variable is assigned to false. Then in the next switch block the only "non return" case assign `skipping` to true, the rest cases don't use it and all do return.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-13fca7.html#EndPath
**db/log_reader.cc**
In `bool Reader::SkipToInitialBlock()` `offset_in_block` local variable is assigned to 0 `if (offset_in_block > kBlockSize - 6)` and then never used. Removed the assignment and renamed it to `initial_offset_in_block` to avoid confusion.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-a618dd.html#EndPath
In `bool Reader::ReadRecord(Slice* record, std::string* scratch)` local variable `in_fragmented_record` in switch case `kFullType` block is assigned to false and then does `return` without use. In the other switch case `kFirstType` block the same `in_fragmented_record` is assigned to false, but later assigned to true without prior use. Removed assignment for both cases.
scan-build reprots:
http://home.fburl.com/~sugak/latest20/report-bb86b0.html#EndPathhttp://home.fburl.com/~sugak/latest20/report-a975be.html#EndPath
**table/plain_table_key_coding.cc**
Local variable `user_key_size` is assigned when declared. But then in both places where it is used assigned to `static_cast<uint32_t>(key.size() - 8)`. Changed to initialize the variable to the proper value in declaration.
scan-build report:
http://home.fburl.com/~sugak/latest20/report-9e6b86.html#EndPath
**tools/db_stress.cc**
Missing `break` in switch case block. This seems to be a bug. Added missing `break`.
Test Plan:
Make sure all tests are passing and scan-build does not report 'Dead assignment' and 'Dead initialization' bugs.
```lang=bash
% make check
% make analyze
```
Reviewers: meyering, igor, kradhakrishnan, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D33795
Summary: rockuse more memory that asked to. Monitor and report.
Test Plan: run the pro with conditions to simulate the overusage. It should report that the process is using more memory than needed.
Reviewers: yhchiang, rven, sdong, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33249
Summary: We don't have plans to work on this in the short term. If we ever resurrect the project, we can find the code in the history. No need for it to linger around
Test Plan: no test
Reviewers: yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D32349
Summary:
This diff adds BlockBasedTable format_version = 2. New format version brings better compressed block format for these compressions:
1) Zlib -- encode decompressed size in compressed block header
2) BZip2 -- encode decompressed size in compressed block header
3) LZ4 and LZ4HC -- instead of doing memcpy of size_t encode size as varint32. memcpy is very bad because the DB is not portable accross big/little endian machines or even platforms where size_t might be 8 or 4 bytes.
It does not affect format for snappy.
If you write a new database with format_version = 2, it will not be readable by RocksDB versions before 3.10. DB::Open() will return corruption in that case.
Test Plan:
Added a new test in db_test.
I will also run db_bench and verify VSIZE when block_cache == 1GB
Reviewers: yhchiang, rven, MarkCallaghan, dhruba, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D31461
Summary:
A command line like this to run all the tests:
source benchmark.config.sh && nohup ./benchmark.sh 'bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting'
where
benchmark.config.sh is:
export DB_DIR=/data/mysql/rocksdata
export WAL_DIR=/txlogs/rockswal
export OUTPUT_DIR=/root/rocks_benchmarking/output
Will fail for the tests that need a new DB .
Also 1) set disable_data_sync=0 and 2) add debug mode to run through all the tests more quickly
Test Plan: run ./benchmark.sh 'debug,bulkload,fillseq,overwrite,filluniquerandom,readrandom,readwhilewriting' and verify that there are no complaints about WAL dir not being empty.
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D30909
Summary:
Priliminary diff to solicit comments.
Given DB path, dump all SST files (key/value and properties), WAL file and manifest
files. What command options do we need to support for this command? Maybe
output_hex for keys?
Test Plan: Create additional ldb unit tests.
Reviewers: sdong, rven
Reviewed By: rven
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D29547
Summary: as title
Test Plan: ran it
Reviewers: yhchiang, igor, sdong, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D25563
Summary:
Fix the following compile warning in db_stress.cc on Mac
tools/db_stress.cc:1688:52: error: format specifies type 'unsigned long' but the argument has type '::google::uint64' (aka 'unsigned long long') [-Werror,-Wformat]
fprintf(stdout, "DB-write-buffer-size: %lu\n", FLAGS_db_write_buffer_size);
~~~ ^~~~~~~~~~~~~~~~~~~~~~~~~~
%llu
Test Plan:
make
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581
Summary:
In some environment such as android, the c++ library does not have
std::to_string. This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.
Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D29181
Summary: First commit for rdb shell
Test Plan: unit_test.js does simple assertions on most of the main functionality; will update with rest of tests
Reviewers: igor, rven, lijn, yhciang, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28749
Summary:
Make db_stress built for ROCKSDB_LITE.
The test doesn't pass tough. It seg fault quickly. But I took a look and it doesn't seem to be related to lite version. Likely to be a bug inside RocksDB.
Test Plan: make db_stress
Reviewers: yhchiang, rven, ljin, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D28797
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.
This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.
Test Plan: compiles
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: yhchiang
Subscribers: bobbaldwin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28689
Summary:
Refactor sst_dump to follow the same structure as ldb. Introduce a
SSTDump interface.
Test Plan: built sst_dump and tried it manually.
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28485
Summary: Previously I made `make check` work with -Wshadow, but there are some tools that are not compiled using `make check`.
Test Plan: make all
Reviewers: yhchiang, rven, ljin, sdong
Reviewed By: ljin, sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28497
Summary: as title
Test Plan: ./db_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28269
Summary: Fix lint errors and coding style of ldb related codes.
Test Plan: ./ldb
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28125
Summary:
Added two flags which operate as follows:
in_place_update: enable in_place_update for default column family
set_in_place_one_in: toggles the value of the option inplace_update_support with a probability of 1/N
Test Plan:
Run db_stress with the two flags above set.
Specifically tried in_place_update set to true and set_in_place_one_in set to 10,000.
Reviewers: ljin, igor, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28029
Summary: RocksDB already depends on C++11, so we might as well all the goodness that C++11 provides. This means that we don't need AtomicPointer anymore. The less things in port/, the easier it will be to port to other platforms.
Test Plan: make check + careful visual review verifying that NoBarried got memory_order_relaxed, while Acquire/Release methods got memory_order_acquire and memory_order_release
Reviewers: rven, yhchiang, ljin, sdong
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D27543
Summary: Allow SetOptions() during db_stress test
Test Plan: make crash_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D25497
Summary: Try to match some parameters from Dhruba's benchmarks on github
Test Plan: ran it
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24687
Summary: Those were introduced with 2fb1fea30f because the flushing behavior changed when max_background_flushes is > 0.
Test Plan: make check
Reviewers: ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23577
Summary:
Hope these scripts will allow people to run/repro benchmark easily
I think it is time to re-run flash benchmarks and report results
Please comment if any other benchmark runs are needed
Test Plan: ran it
Reviewers: yhchiang, igor, sdong
Reviewed By: igor
Subscribers: dhruba, MarkCallaghan, leveldb
Differential Revision: https://reviews.facebook.net/D23139
Summary:
The compilers we use treat char as signed. However, this is not guarantee of C standard and some compilers (for ARM platform for example), treat char as unsigned. Code that assumes that char is either signed or unsigned is wrong.
This change explicitly casts the char to signed version. This will not break any of our use cases on x86, which, I believe are all of them. In case somebody out there is using RocksDB on ARM AND using bloom filters, they're going to have a bad time. However, it is very unlikely that this is the case.
Test Plan: sanity test with previous commit (with new sanity test)
Reviewers: yhchiang, ljin, sdong
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D22767
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.
2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.
3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.
4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
Summary:
As a preparation to support updating some options dynamically, I'd like
to first introduce ImmutableOptions, which is a subset of Options that
cannot be changed during the course of a DB lifetime without restart.
ColumnFamily will keep both Options and ImmutableOptions. Any component
below ColumnFamily should only take ImmutableOptions in their
constructor. Other options should be taken from APIs, which will be
allowed to adjust dynamically.
I am yet to make changes to memtable and other related classes to take
ImmutableOptions in their ctor. That can be done in a seprate diff as
this one is already pretty big.
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D22545
Summary:
1. assert db->Put to be true in db_stress
2. begin column family with name "1".
Test Plan: 1. ./db_stress
Reviewers: ljin, yhchiang, dhruba, sdong, igor
Reviewed By: sdong, igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22659
Summary:
I will move compression related options in a separate diff since this
diff is already pretty lengthy.
I guess I will also need to change JNI accordingly :(
Test Plan: make all check
Reviewers: yhchiang, igor, sdong
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21915
Summary:
1. fix segment error when dumping old sst format (no properties nor stats)
2. Enable dumpping old sst format
Test Plan:
Generate block based sst file with "properties", and one with "stats" and one without neither.
Read it using sst_dump
Reviewers: ljin, igor, yhchiang, dhruba, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21837
Summary:
All public headers need to be under `include/rocksdb` directory. Otherwise, clients include our header files like this:
#include <rocksdb/db.h>
#include <utilities/backupable_db.h> // still our public header!
Also, internally, we include:
#include "utilities/backupable/backupable_db.h" // internal header
#include "utilities/backupable_db.h" // public header
which is confusing.
This way, when we install rocksdb as a system library, we can just copy `include/rocksdb` directory to system's header files. We can't really copy `utilities` directory to system's header files.
Test Plan: compiles
Reviewers: dhruba, ljin, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20409
Summary:
Since we have a lot of options for PlainTable, add a struct PlainTableOptions
to manage them
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20175
Summary:
Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect.
Total order mode indicator is prefix_extractor == nullptr,
but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests
in plain_table_db_tests is incorrect.
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19587
Summary:
Now that the arena is used to allocate space for hashskiplist's bucket. The bucket size
need to be set small enough to avoid "should_flush_" failure in memtable's assertion.
Test Plan:
make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: igor
Differential Revision: https://reviews.facebook.net/D19371
Summary:
Fixed the following compile error.
tools/reduce_levels_test.cc:89:31: error: no matching function for call to 'InitFromCmdLineArgs'
LDBCommand* level_reducer = LDBCommand::InitFromCmdLineArgs(args);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
./util/ldb_cmd.h:56:22: note: candidate function not viable: requires 3 arguments, but 1 was provided
static LDBCommand* InitFromCmdLineArgs(
^
./util/ldb_cmd.h:62:22: note: candidate function not viable: requires 4 arguments, but 1 was provided
static LDBCommand* InitFromCmdLineArgs(
^
1 error generated.
Test Plan:
make reduce_levels_test
./reduce_levels_test
Reviewers: haobo, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19251
Summary:
As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases.
This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.
There is one test case that relied on didIO information. I'll try to find another way to implement it.
Test Plan: make check
Reviewers: sdong, haobo, yhchiang, ljin, dhruba
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19161
Summary:
Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes.
The data format is documented in table/plain_table_factory.h
Test Plan: Add unit test coverage in plain_table_db_test
Reviewers: yhchiang, igor, dhruba, ljin, haobo
Reviewed By: haobo
Subscribers: nkg-, leveldb
Differential Revision: https://reviews.facebook.net/D18735
Summary: sst_dump now doesn't work well for PlainTable. Not sure when it started, but this should fix it.
Test Plan: Run sst_dump against a file that used to fail.
Reviewers: yhchiang, haobo, igor
Reviewed By: igor
Subscribers: dhruba, ljin, leveldb
Differential Revision: https://reviews.facebook.net/D19023
Summary: Even if the file is corrupted, table properties are usually available to print out. Now sst_dump would just fail without printing table properties. With this patch, table properties are still try to be printed out.
Test Plan: run sst_dump against multiple scenarios
Reviewers: igor, yhchiang, ljin, haobo
Reviewed By: haobo
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18981
Summary: Provide an convenience option to create column families if they are missing from the DB. Task #4460490
Test Plan: added unit test. also, stress test for some time
Reviewers: sdong, haobo, dhruba, ljin, yhchiang
Reviewed By: yhchiang
Subscribers: yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D18951
Summary: Now sst_dump fails in block based tables if binary search index is used, as it requires a prefix extractor. Add it.
Test Plan: Run it against such a file to make sure it fixes the problem.
Reviewers: yhchiang, kailiu
Reviewed By: kailiu
Subscribers: ljin, igor, dhruba, haobo, leveldb
Differential Revision: https://reviews.facebook.net/D18927
Summary: Add an option to indicates the variation of background threads. If the flag is not 0, for every 100 milliseconds, adjust thread pool size to a value within the range.
Test Plan: run db_stress
Reviewers: haobo, dhruba, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D18759
Summary:
Newer gflags switched from `google` namespace to `gflags` namespace. See: https://github.com/facebook/rocksdb/issues/139 and https://github.com/facebook/rocksdb/issues/102
Unfortunately, they don't define any macro with their namespace, so we need to actually try to compile gflags with two different namespace to figure out which one is the correct one.
Test Plan: works in fbcode environemnt. I'll also try in ubutnu with newer gflags
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18537
Summary:
also add an override option total_order_iteration if you want to use full
iterator with prefix_extractor
Test Plan: make all check
Reviewers: igor, haobo, sdong, yhchiang
Reviewed By: haobo
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D17805
Summary:
Currently, whenever DB Verification fails we bail out by calling `exit(1)`. This is kind of bad since it causes unclean shutdown and spew of error log messages like:
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
05:03:27 pthread lock: Invalid argument
This diff adds a new parameter that is set to true when verification fails. It can then use the parameter to bail out safely.
Test Plan: Casued artificail failure. Verified that exit was clean.
Reviewers: dhruba, haobo, ljin
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18243
Summary: Added benchmark functionality on the lines of folly/Benchmark.h
Test Plan: Added unit tests
Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17973
Summary:
The file tree structure in Version is prebuilt and the range of each file is known.
On the Get() code path, we do binary search in FindFile() by comparing
target key with each file's largest key and also check the range for each L0 file.
With some pre-calculated knowledge, each key comparision that has been done can serve
as a hint to narrow down further searches:
(1) If a key falls within a L0 file's range, we can safely skip the next
file if its range does not overlap with the current one.
(2) If a key falls within a file's range in level L0 - Ln-1, we should only
need to binary search in the next level for files that overlap with the current one.
(1) will be able to skip some files depending one the key distribution.
(2) can greatly reduce the range of binary search, especially for bottom
levels, given that one file most likely only overlaps with N files from
the level below (where N is max_bytes_for_level_multiplier). So on level
L, we will only look at ~N files instead of N^L files.
Some inital results: measured with 500M key DB, when write is light (10k/s = 1.2M/s), this
improves QPS ~7% on top of blocked bloom. When write is heavier (80k/s =
9.6M/s), it gives us ~13% improvement.
Test Plan: make all check
Reviewers: haobo, igor, dhruba, sdong, yhchiang
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17205
Summary: We don't use or build this code
Test Plan: builds
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17979
Summary:
This is first step of my effort to reduce size of librocksdb.a for use in mobile.
ldb object files are huge and are ment to be used as a command line tool. I moved them to `tools/` directory and include them only when compiling `ldb`
This diff reduced librocksdb.a from 42MB to 39MB on my mac (not stripped).
Test Plan: ran ldb
Reviewers: dhruba, haobo, sdong, ljin, yhchiang
Reviewed By: yhchiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17823
Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes.
Test Plan: compiles
Reviewers: dhruba, haobo, ljin, sdong
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17649
Summary: When opening DB in read-only mode, client can choose to only specify a subset of column families ("default" column family can't be omitted, though)
Test Plan: added a unit test in column_family_test
Reviewers: haobo, sdong, ljin, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17565
Summary:
Add script auto_sanity_test.sh to perform auto sanity test
usage: auto_sanity_test.sh [new_commit] [old_commit]
Running without commit parameter will do the sanity test with the latest
and the latest 10 commit.
Test Plan: ./auto_sanity_test.sh
Reviewers: haobo, igor
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17397
Summary: Added a function/command to check the consistency of live files' meta data
Test Plan:
Manual test (size mismatch, file not exist).
Command test script.
Reviewers: haobo
Reviewed By: haobo
CC: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D16935
Summary:
We should clean up after ourselves. Last crashtest failed because the disk on the cibuild machine was full.
Also, db_crashtest is supposed to run the test on the same database in every iteration. This isn't the case currently, because every run creates a new tmp directory. It's fixed with this diff.
Test Plan: ran it
Reviewers: ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17073
Summary: We switched to prefix_seek method of seeking. This means that anytime we check Valid(), we also need to check starts_with(prefix)
Test Plan: ran db_stress
Reviewers: ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16953
Summary: We're trying to deprecate prefix iterators, so no need to test them in db_stress
Test Plan: ran it
Reviewers: ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16917
Summary:
this should fix the hash_skip_list issue, but I still see seqno
assertion failure in the last run. Will continue investigating and
address that in a different diff
Test Plan: make whitebox_crash_test
Reviewers: igor
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16851
Summary:
Hash skip list has issues, causing db_stress to fail badly.
For now, switching back to skip_list by default before we figure out root cause.
Test Plan: db_stress is happy(ier)
Reviewers: ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16845
Summary: Syncing in stress test makes it run much much much slower. It also doesn't add much value IMO.
Test Plan: no
Reviewers: ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16839
Summary:
*) fixed the comment
*) constant 1 was not casted to 64-bit, which (I think) might cause overflow if we shift it too much
*) default prefix size to be 7, like it was before
Test Plan: compiled
Reviewers: ljin
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16827
Summary: A bad Auto-Merge caused log buffer is flushed twice. Remove the unintended one.
Test Plan: Should already be tested (the code looks the same as when I ran unit tests).
Reviewers: haobo, igor
Reviewed By: haobo
CC: ljin, yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D16821
Summary: as title
Test Plan: running python tools/db_crashtest.py
Reviewers: igor
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16803
Summary: Fix the db_stress test, let is run with HashSkipList for real
Test Plan:
python tools/db_crashtest.py
python tools/db_crashtest2.py
Reviewers: igor, haobo
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16773
Summary:
I had this diff for a while to test column families implementation. Last night, I ran it sucessfully for 10 hours with the command:
time ./db_stress --threads=30 --ops_per_thread=200000000 --max_key=5000 --column_families=20 --clear_column_family_one_in=3000000 --verify_before_write=1 --reopen=50 --max_background_compactions=10 --max_background_flushes=10 --db=/tmp/db_stress
It is ready to be committed :)
Test Plan: Ran it for 10 hours
Reviewers: dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16797
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.
Test Plan: db_bench && make asan_check
Reviewers: haobo, igor, sdong
Reviewed By: igor
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D16587
Summary:
If the last byte of the timestamp is 0, the output value of the ttl db
can cut off earlier. Change to compare hex format instead.
Test Plan:
this does not happen consistently, but it did not fail after the last
100 runs
for i in {1..100}; do python ./tools/ldb_test.py; done
Reviewers: igor, haobo
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16725
Summary:
@kailiu mentioned on meeting yesterday that we sometimes have trouble opening DB created by old version with the new version. This will be very important to test for column families, since I'm changing disk format for the MANIFEST.
I added a tool that can help us test that. Usage:
./db_sanity_test <path> create
will create a bunch of DBs under <path>
<change RocksDB version>
./db_sanity_test <path> verify
will verify consistency of DBs created under <path>
Test Plan: ran the db_sanity_test
Reviewers: kailiu, dhruba, haobo
Reviewed By: kailiu
CC: leveldb, kailiu, xjin
Differential Revision: https://reviews.facebook.net/D16605
Summary:
Why don't we automatically reopen DB when running crash test (running in our nightly build)? If I understand correctly, crashtest is manually reopenning the DB, but then the DB does not check its consistency when you kill db_stress process and then re-run it again.
Does this make sense?
Test Plan: not reall
Reviewers: dhruba, haobo, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16167
Summary:
We are going to expose properties of all tables to end users through "some" db interface.
However, current design doesn't naturally fit for this need, which is because:
1. If a table presents in table cache, we cannot simply return the reference to its table properties, because the table may be destroy after compaction (and we don't want to hold the ref of the version).
2. Copy table properties is OK, but it's slow.
Thus in this diff, I change the table reader's interface to return a shared pointer (for const table properties), instead a const refernce.
Test Plan: `make check` passed
Reviewers: haobo, sdong, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15999
Summary:
This diff enables the command line tool `sst_dump` to work for sst files
under plain table format. Changes include:
* In tools/sst_dump.cc:
- add support for plain table format
- display prefix_extractor information when --show_properties is on
* In table/format.cc
- Now the table magic number of a Footer can be later initialized
via ReadFooterFromFile().
* In table/meta_bocks:
- add function ReadTableMagicNumber() that reads the magic number of
the specified file.
Minor fixes:
- remove a duplicate #include in table/table_test.cc
- fix a commentary typo in include/rocksdb/memtablerep.h
- fix lint errors.
Test Plan:
Runs sst_dump with both block-based and plain-table format files with
different arguments, specifically those with --show-properties and --from.
* sample output:
https://reviews.facebook.net/P261
Reviewers: kailiu, sdong, xjin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15903
Summary: RocksDB doesn't compile on 32-bit architecture apparently. This is attempt to fix some of 32-bit errors. They are reported here: https://gist.github.com/paxos/8789697
Test Plan: RocksDB still compiles on 64-bit :)
Reviewers: kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15825
Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases.
Test Plan: make all check
Reviewers: haobo, dhruba, kailiu
Reviewed By: haobo
CC: igor, leveldb
Differential Revision: https://reviews.facebook.net/D15489
Summary:
Easy thing goes first. This patch moves arena to internal dir; based
on which, the coming patch will deal with memtable_rep.
Test Plan: make check
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15615
Summary:
Add option '--show_properties' to sst_dump tool to allow displaying
property block of the specified files.
Test Plan:
Run sst_dump with the following arguments, which covers cases affected by
this diff:
1. with only --file
2. with both --file and --show_properties
3. with --file, --show_properties, and --from
Reviewers: kailiu, xjin
Differential Revision: https://reviews.facebook.net/D15453
Summary: By removing some includes form options.h and reply on forward declaration, we can more easily reason the dependencies.
Test Plan: make all check
Reviewers: kailiu, haobo, igor, dhruba
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15411
Summary:
It looks like we might have some trouble when building the new release with 4.8, since fbcode is using glibc2.17-fb by default and we are using glibc2.17. It was reported by Benjamin Renard in our internal group.
This diff moves our fbcode build to use glibc2.17-fb by default. I got some linker errors when compiling, complaining that `google::SetUsageMessage()` was undefined. After deleting all offending lines, the compile was successful and everything works.
Test Plan:
Compiled
Ran ./db_bench ./db_stress ./db_repl_stress
Reviewers: kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15405
Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.
Test Plan: make check
Reviewers: haobo, kailiu, sdong, dhruba, tnovak
Reviewed By: tnovak
CC: leveldb
Differential Revision: https://reviews.facebook.net/D15099
Summary:
Let's get rid of TransformRep and it's children. We have confirmed that HashSkipListRep works better with multifeed, so there is no benefit to keeping this around.
This diff is mostly just deleting references to obsoleted functions. I also have a diff for fbcode that we'll need to push when we switch to new release.
I had to expose HashSkipListRepFactory in the client header files because db_impl.cc needs access to GetTransform() function for SanitizeOptions.
Test Plan: make check
Reviewers: dhruba, haobo, kailiu, sdong
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14397
Summary:
These bugs were caught by ASAN crash test.
1. The first one, in table/filter_block.cc is very nasty. We first reference entries_ and store the reference to Slice prev. Then, we call entries_.append(), which can change the reference. The Slice prev now points to junk.
2. The second one is a bug in a test, so it's not very serious. Once we set read_opts.prefix, we never clear it, so some other function might still reference it.
Test Plan: asan crash test now runs more than 5 mins. Before, it failed immediately. I will run the full one, but the full one takes quite some time (5 hours)
Reviewers: dhruba, haobo, kailiu
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14223
Summary: This diff invoves some more complicated issues in the posix environment.
Test Plan: works under mac os. will need to verify dev box.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D14061
Summary: This diff leverage the existing block cache and extend it to cache index/filter block.
Test Plan:
Added new tests in db_test and table_test
The correctness is checked by:
1. make check
2. make valgrind_check
Performance is test by:
1. 10 times of build_tools/regression_build_test.sh on two versions of rocksdb before/after the code change. Test results suggests no significant difference between them. For the two key operatons `overwrite` and `readrandom`, the average iops are both 20k and ~260k, with very small variance).
2. db_stress.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb, haobo, xjin
Differential Revision: https://reviews.facebook.net/D13167
Summary:
Archive cleaning will still happen every WAL_ttl seconds
but archived logs will be deleted only if archive size
is greater then a WAL_size_limit value.
Empty archived logs will be deleted evety WAL_ttl.
Test Plan:
1. Unit tests pass.
2. Benchmark.
Reviewers: emayanke, dhruba, haobo, sdong, kailiu, igor
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13869
Summary:
Rocksdb can now support a uncompressed block cache, or a compressed
block cache or both. Lookups first look for a block in the
uncompressed cache, if it is not found only then it is looked up
in the compressed cache. If it is found in the compressed cache,
then it is uncompressed and inserted into the uncompressed cache.
It is possible that the same block resides in the compressed cache
as well as the uncompressed cache at the same time. Both caches
have their own individual LRU policy.
Test Plan: Unit test case attached.
Reviewers: kailiu, sdong, haobo, leveldb
Reviewed By: haobo
CC: xjin, haobo
Differential Revision: https://reviews.facebook.net/D12675
Summary: Added an option --count_delim=<char> which takes the given character as delimiter ('.' by default) and reports count of each row type found in the db
Test Plan:
1. Created test in file (for DBDumperCommand) rocksdb/tools/ldb_test.py which puts various key value pair in db and checks the output using dump --count_delim ,--count_delim="." and --count_delim=",".
2. Created test in file (for InternalDumperCommand) rocksdb/tools/ldb_test.py which puts various key value pair in db and checks the output using dump --count_delim ,--count_delim="." and --count_delim=",".
3. Manually created a database with several keys of several type and verified by running the command
./ldb db=<path> dump --count_delim="<char>"
./ldb db=<path> idump --count_delim="<char>"
Reviewers: vamsi, dhruba, emayanke, kailiu
Reviewed By: vamsi
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13815
Summary: Don't define if already defined.
Test Plan: Running make release in parallel, will not commit if it fails.
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13833
Summary:
This patch is to address @haobo's comments on D13521:
1. rename Table to be TableReader and make its factory function to be GetTableReader
2. move the compression type selection logic out of TableBuilder but to compaction logic
3. more accurate comments
4. Move stat name constants into BlockBasedTable implementation.
5. remove some uncleaned codes in simple_table_db_test
Test Plan: pass test suites.
Reviewers: haobo, dhruba, kailiu
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13785
Summary: This patch makes Table and TableBuilder a abstract class and make all the implementation of the current table into BlockedBasedTable and BlockedBasedTable Builder.
Test Plan: Make db_test.cc to work with block based table. Add a new test simple_table_db_test.cc where a different simple table format is implemented.
Reviewers: dhruba, haobo, kailiu, emayanke, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13521
Summary: assert(Overlap) significantly slows down the benchmark. Ignore assertions when executing blob_store_bench.
Test Plan: Ran the benchmark
Reviewers: dhruba, haobo, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13737
Summary: Apparently C++ doesn't like it if you copy around its atomic<> variables. When running a benchmark for a longer time, benchmark used to stall. Changed WorkerThread in config to WorkerThread*. It works now.
Test Plan: Ran benchmark
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13731
Summary: Converted db_stress, db_repl_stress and db_bench to use gflags
Test Plan: I tested by printing out all the flags from old and new versions. Tried defaults, + various combinations with "interesting flags". Also, tested by running db_crashtest.py and db_crashtest2.py.
Reviewers: emayanke, dhruba, haobo, kailiu, sdong
Reviewed By: emayanke
CC: leveldb, xjin
Differential Revision: https://reviews.facebook.net/D13581
Summary:
Finally, arc diff works again! This has been sitting in my repo for a while.
I would like some comments on my BlobStore benchmark. We don't have to check this in.
Also, I don't do any fsync in the BlobStore, so this is all extremely fast. I'm not sure what durability guarantees we need from the BlobStore.
Test Plan: Nope
Reviewers: dhruba, haobo, kailiu, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13527
Summary: When I debug the unit test failures when enabling background flush thread, I feel the function names can be made clearer for people to understand. Also, if the names are fixed, in many places, some tests' bugs are obvious (and some of those tests are failing). This patch is to clean it up for future maintenance.
Test Plan: Run test suites.
Reviewers: haobo, dhruba, xjin
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13431
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary: Will use iterators to verify keys in the db for half of its keys and Gets for the other half.
Test Plan: ./db_stress --max_key=1000 --ops_per_thread=100
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13227
Summary: Using an iterator instead of the Get method, each thread goes through a portion of the database and verifies values by comparing to the shared state.
Test Plan:
./db_stress --db=/tmp/tmppp --max_key=10000 --ops_per_thread=10000
To test some basic cases, the following lines can be added (each set in turn) to the verifyDb method with the following expected results:
// Should abort with "Unexpected value found"
shared.Delete(start);
// Should abort with "Value not found"
WriteOptions write_opts;
db_->Delete(write_opts, Key(start));
// Should succeed
WriteOptions write_opts;
shared.Delete(start);
db_->Delete(write_opts, Key(start));
// Should abort with "Value not found"
WriteOptions write_opts;
db_->Delete(write_opts, Key(start + (end-start)/2));
// Should abort with "Value not found"
db_->Delete(write_opts, Key(end-1));
// Should abort with "Unexpected value"
shared.Delete(end-1);
// Should abort with "Unexpected value"
shared.Delete(start + (end-start)/2);
// Should abort with "Value not found"
db_->Delete(write_opts, Key(start));
shared.Delete(start);
db_->Delete(write_opts, Key(end-1));
db_->Delete(write_opts, Key(end-2));
To test the out of range abort, change the key in the for loop to Key(i+1), so that the key defined by the index i is now outside of the supposed range of the database.
Reviewers: emayanke
Reviewed By: emayanke
CC: dhruba, xjin
Differential Revision: https://reviews.facebook.net/D13071
Summary: Adding in the iterpercent flag to tests.
Test Plan: make crash_test
Reviewers: emayanke
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D13035
Summary:
Added MultiIterate() which does a seek and some Next/Prev
calls. Iterator status is checked only, no data integrity check
Test Plan:
make db_stress
./db_stress --iterpercent=<nonzero value> --readpercent=, etc.
Reviewers: emayanke, dhruba, xjin
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12915
Summary:
Added a new field called max_size_amplification_ratio in the
CompactionOptionsUniversal structure. This determines the maximum
percentage overhead of space amplification.
The size amplification is defined to be the ratio between the size of
the oldest file to the sum of the sizes of all other files. If the
size amplification exceeds the specified value, then min_merge_width
and max_merge_width are ignored and a full compaction of all files is done.
A value of 10 means that the size a database that stores 100 bytes
of user data could occupy 110 bytes of physical storage.
Test Plan: Unit test DBTest.UniversalCompactionSpaceAmplification added.
Reviewers: haobo, emayanke, xjin
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12825
Summary: Replace include/leveldb with include/rocksdb.
Test Plan:
make clean; make check
make clean; make release
Differential Revision: https://reviews.facebook.net/D12489
Summary:
This patch adds three new MemTableRep's: UnsortedRep, PrefixHashRep, and VectorRep.
UnsortedRep stores keys in an std::unordered_map of std::sets. When an iterator is requested, it dumps the keys into an std::set and iterates over that.
VectorRep stores keys in an std::vector. When an iterator is requested, it creates a copy of the vector and sorts it using std::sort. The iterator accesses that new vector.
PrefixHashRep stores keys in an unordered_map mapping prefixes to ordered sets.
I also added one API change. I added a function MemTableRep::MarkImmutable. This function is called when the rep is added to the immutable list. It doesn't do anything yet, but it seems like that could be useful. In particular, for the vectorrep, it means we could elide the extra copy and just sort in place. The only reason I haven't done that yet is because the use of the ArenaAllocator complicates things (I can elaborate on this if needed).
Test Plan:
make -j32 check
./db_stress --memtablerep=vector
./db_stress --memtablerep=unsorted
./db_stress --memtablerep=prefixhash --prefix_size=10
Reviewers: dhruba, haobo, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12117
Summary:
Most code change in this diff is code cleanup/rewrite. The logic changes include:
(1) add universal compaction to db_crashtest2.py
(2) randomly set --test_batches_snapshots to be 0 or 1 in db_crashtest2.py. Old codes always use 1.
(3) use different tmp directory as db directory in different runs. I saw some intermittent errors in my local tests. Use of different tmp directory seems to be able to solve the issue.
Test Plan: Have run "make crashtest" for multiple times. Also run "make all check"
Reviewers: emayanke, dhruba, haobo
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D12369
Test Plan:
- make all check;
- make release;
- make stringappend_test; ./stringappend_test
Reviewers: haobo, emayanke
Reviewed By: haobo
CC: leveldb, kailiu
Differential Revision: https://reviews.facebook.net/D12381
Summary: Also expanded class LogFile to have startSequene and FileSize and exposed it publicly
Test Plan: make all check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12087
Summary:
Updated db_bench and utilities/merge_operators.h to allow for dynamic benchmarking
of merge operators in db_bench. Added a new test (--benchmarks=mergerandom), which performs
a bunch of random Merge() operations over random keys. Also added a "--merge_operator=" flag
so that the tester can easily benchmark different merge operators. Currently supports
the PutOperator and UInt64Add operator. Support for stringappend or list append may come later.
Test Plan:
1. make db_bench
2. Test the PutOperator (simulating Put) as follows:
./db_bench --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom --merge_operator=put
--threads=2
3. Test the UInt64AddOperator (simulating numeric addition) similarly:
./db_bench --value_size=8 --benchmarks=fillrandom,readrandom,updaterandom,readrandom,mergerandom,readrandom
--merge_operator=uint64add --threads=2
Reviewers: haobo, dhruba, zshao, MarkCallaghan
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11535
Summary: I changed the db_stress configs, but forgot to update the scripts using the old configs.
Test Plan: 'make blackbox_crash_test' and 'make whitebox_crash_test' start running normally now (I haven't run them til the end, though).
Reviewers: vamsi
Reviewed By: vamsi
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D12303
Summary:
Minor fix to current codes, including: coding style, output format,
comments. No major logic change. There are only 2 real changes, please see my inline comments.
Test Plan: make all check
Reviewers: haobo, dhruba, emayanke
Differential Revision: https://reviews.facebook.net/D12297
Summary: The ability to dump internal keys associated with certain user keys, directly from sst files, is very useful for diagnosis. Will incorporate it directly into ldb later.
Test Plan: run it
Reviewers: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D12075
Summary: rocksdb replicaiton will need this when writing value+TS from master to slave 'as is'
Test Plan: make
Reviewers: dhruba, vamsi, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11919
Summary: Removed KeyMayExistImpl because KeyMayExist demanded Get like semantics now. Removed no_io from memtable and imm because we need the proper value now and shouldn't just stop when we see Merge in memtable. Added checks to block_cache. Updated documentation and unit-test
Test Plan: make all check;db_stress for 1 hour
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11853
Summary: filter_deletes options introduced in db_stress makes it drop Deletes on key if KeyMayExist(key) returns false on the key. code change was simple and tested so not wasting reviewer's time.
Test Plan: maek crash_test; python tools/db_crashtest[1|2].py
CC: dhruba, vamsi
Differential Revision: https://reviews.facebook.net/D11769
Summary:
Introduced KeyMayExist checking during writebatch-delete and removed from Outer Delete API because it uses writebatch-delete.
Added code to skip getting Table from disk if not already present in table_cache.
Some renaming of variables.
Introduced KeyMayExistImpl which allows checking since specified sequence number in GetImpl useful to check partially written writebatch.
Changed KeyMayExist to not be pure virtual and provided a default implementation.
Expanded unit-tests in db_test to check appropriately.
Ran db_stress for 1 hour with ./db_stress --max_key=100000 --ops_per_thread=10000000 --delpercent=50 --filter_deletes=1 --statistics=1.
Test Plan: db_stress;make check
Reviewers: dhruba, haobo
Reviewed By: dhruba
CC: leveldb, xjin
Differential Revision: https://reviews.facebook.net/D11745
Summary:
Wrote a new function in db_impl.c-CheckKeyMayExist that calls Get but with a new parameter turned on which makes Get return false only if bloom filters can guarantee that key is not in database. Delete calls this function and if the option- deletes_use_filter is turned on and CheckKeyMayExist returns false, the delete will be dropped saving:
1. Put of delete type
2. Space in the db,and
3. Compaction time
Test Plan:
make all check;
will run db_stress and db_bench and enhance unit-test once the basic design gets approved
Reviewers: dhruba, haobo, vamsi
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11607
Summary: db_stress should alos print complete statistics like db_bench. Needed this when I wanted to measure number of delete-IOs dropped due to CheckKeyMayExist to be introduced to rocksdb codebase later- to make deltes in rocksdb faster
Test Plan: make db_stress;./db_stress --max_key=100 --ops_per_thread=1000 --statistics=1
Reviewers: sheki, dhruba, vamsi, haobo
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D11655
Summary:
There is a new option called hybrid_mode which, when switched on,
causes HBase style compactions. Files from L0 are
compacted back into L0. This meat of this compaction algorithm
is in PickCompactionHybrid().
All files reside in L0. That means all files have overlapping
keys. Each file has a time-bound, i.e. each file contains a
range of keys that were inserted around the same time. The
start-seqno and the end-seqno refers to the timeframe when
these keys were inserted. Files that have contiguous seqno
are compacted together into a larger file. All files are
ordered from most recent to the oldest.
The current compaction algorithm starts to look for
candidate files starting from the most recent file. It continues to
add more files to the same compaction run as long as the
sum of the files chosen till now is smaller than the next
candidate file size. This logic needs to be debated
and validated.
The above logic should reduce write amplification to a
large extent... will publish numbers shortly.
Test Plan: dbstress runs for 6 hours with no data corruption (tested so far).
Differential Revision: https://reviews.facebook.net/D11289
Summary: As title, now that db_stress supports --map_read properly
Test Plan: make crash_test
Reviewers: vamsi, emayanke, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11391
Summary: as title, also removed an incorrect assertion
Test Plan: make check; db_stress --mmap_read=1; db_stress --mmap_read=0
Reviewers: dhruba, emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11367
Summary:
Rocksdb allos specifying the number of files in L0 that triggers
compactions. Expose this api as a command line parameter for
running db_stress.
Test Plan: Run test
Reviewers: sheki, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11343
Summary:
This diff simplifies EnvOptions by treating it as POD, similar to Options.
- virtual functions are removed and member fields are accessed directly.
- StorageOptions is removed.
- Options.allow_readahead and Options.allow_readahead_compactions are deprecated.
- Unused global variables are removed: useOsBuffer, useFsReadAhead, useMmapRead, useMmapWrite
Test Plan: make check; db_stress
Reviewers: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11175
Summary: These extra options caught some bugs. Will be run via Jenkins now with the crash_test
Test Plan: ./make crashtest
Reviewers: dhruba, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D11151
Summary:
I think the check for "error" that I added had caused
false alarm. Fixed that.
Test Plan:
Revert Plan: OK
Task ID: #
Reviewers: emayanke, dhruba
Reviewed By: emayanke
Differential Revision: https://reviews.facebook.net/D11139
Summary: Make Statistics usable by client
Test Plan: make check; db_bench
Reviewers: dhruba
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D10899
Summary:
This is initial version. A few ways in which this could
be extended in the future are:
(a) Killing from more places in source code
(b) Hashing stack and using that hash in determining whether to crash.
This is to avoid crashing more often at source lines that are executed
more often.
(c) Raising exceptions or returning errors instead of killing
Test Plan:
This whole thing is for testing.
Here is part of output:
python2.7 tools/db_crashtest2.py -d 600
Running db_stress
db_stress retncode -15 output LevelDB version : 1.5
Number of threads : 32
Ops per thread : 10000000
Read percentage : 50
Write-buffer-size : 4194304
Delete percentage : 30
Max key : 1000
Ratio #ops/#keys : 320000
Num times DB reopens: 0
Batches/snapshots : 1
Purge redundant % : 50
Num keys per lock : 4
Compression : snappy
------------------------------------------------
No lock creation because test_batches_snapshots set
2013/04/26-17:55:17 Starting database operations
Created bg thread 0x7fc1f07ff700
... finished 60000 ops
Running db_stress
db_stress retncode -15 output LevelDB version : 1.5
Number of threads : 32
Ops per thread : 10000000
Read percentage : 50
Write-buffer-size : 4194304
Delete percentage : 30
Max key : 1000
Ratio #ops/#keys : 320000
Num times DB reopens: 0
Batches/snapshots : 1
Purge redundant % : 50
Num keys per lock : 4
Compression : snappy
------------------------------------------------
Created bg thread 0x7ff0137ff700
No lock creation because test_batches_snapshots set
2013/04/26-17:56:15 Starting database operations
... finished 90000 ops
Revert Plan: OK
Task ID: #2252691
Reviewers: dhruba, emayanke
Reviewed By: emayanke
CC: leveldb, haobo
Differential Revision: https://reviews.facebook.net/D10581
Summary: db can't reopen safely with disable_wal set!
Test Plan: make db_stress; run db_stress with disable_wal and reopens set and see error
Reviewers: dhruba, vamsi
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10857
Summary: Will help while debugging if the generated value is truncated at proper length.
Test Plan: make db_stress;/db_stress --max_key=10000 --db=/tmp/mcr --threads=1 --ops_per_thread=10000
Reviewers: dhruba, vamsi
Reviewed By: vamsi
Differential Revision: https://reviews.facebook.net/D10845
Summary: ldb works with raw data from the database and needs to be aware of ttl-database to work with it meaningfully. '-ttl' option now tells it that. Also added onto the ldb_test.py test. This option may be specified alongwith put, get, scan or dump. There is no support to provide a ttl-value and it uses default forever because there is no use-case for this currently.
Test Plan: make ldb_test; python tools/ldb_test.py
Reviewers: dhruba, sheki, haobo, vamsi
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10797
Summary:
When opened with DBTimestamp::Open call, timestamps are prepended to and stripped from the value during subsequent Put and Get calls respectively. The Timestamp is used to discard values in Get and custom compaction filter which have exceeded their TTL which is specified during Open.
Have made a temporary change to Makefile to let us test with the temporary file TestTime.cc. Have also changed the private members of db_impl.h to protected to let them be inherited by the new class DBTimestamp
Test Plan: make db_timestamp; TestTime.cc(will not check it in) shows how to use the apis currently, but I will write unit-tests shortly
Reviewers: dhruba, vamsi, haobo, sheki, heyongqiang, vkrest
Reviewed By: vamsi
CC: zshao, xjin, vkrest, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D10311
Summary:
- don't see a point exposing table.h to the public.
- fixed make clean to remove also *.d files.
Test Plan: make check; db_stress
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10479
Summary: Primarily a refactor. Introduced LDBTool interface to which customers can plug in their options and this will create their own version of ldb tool.
Test Plan: made ldb tool and tried it.
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10191
Summary: To know which options the crashtest was run with. Also changed print to sys.stdout.write which is more standard.
Test Plan: python tools/db_crashtest.py
Reviewers: vamsi, akushner, dhruba
Reviewed By: akushner
Differential Revision: https://reviews.facebook.net/D10119
Summary: make crash_test will now invoke the crash_test. Also some cleanup in the db_crashtest.py file
Test Plan: make crash_test
Reviewers: akushner, vamsi, sheki, dhruba
Reviewed By: vamsi
Differential Revision: https://reviews.facebook.net/D9987
Summary: The crash_test depends on db_stress to work with pre-existing dir
Test Plan: make db_stress; Run db_stress with 'destroy_db_initially=0'
Reviewers: vamsi, dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10041
Summary: For sanity w.r.t. the way we split up the reopens equally among the ops/thread
Test Plan: make db_stress; db_stress --ops_per_thread=10 --reopens=10 => error
Reviewers: vamsi, dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D10023
Summary:
When I run db_crashtest, I am seeing lot of warnings that say db_stress completed
before it was killed. To fix that I made ops per thread a very large value so that it keeps
running until it is killed.
I also set #reopens to 0. Since we are killing the process anyway, the 'simulated crash'
that happens during reopen may not add additional value.
I usually see 10-25K ops happening before the kill. So I increased max_key from 100 to
1000 so that we use more distinct keys.
Test Plan:
Ran a few times.
Revert Plan: OK
Task ID: #
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9909
Summary: The script runs and kills the stress test periodically. Default values have been used in the script now. Should I make this a part of the Makefile or automated rocksdb build? The values can be easily changed in the script right now, but should I add some support for variable values or input to the script? I believe the script achieves its objective of unsafe crashes and reopening to expect sanity in the database.
Test Plan: python tools/db_crashtest.py
Reviewers: dhruba, vamsi, MarkCallaghan
Reviewed By: vamsi
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9369
Summary:
Earlier Statistics object was a raw pointer. This meant the user had to clear up
the Statistics object after creating the database. In most use cases the database is created in a function and the statistics pointer is out of scope. Hence the statistics object would never be deleted.
Now Using a shared_ptr to manage this.
Want this in before the next release.
Test Plan: make all check.
Reviewers: dhruba, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9735
Summary: simple sed command to replace NULL in tools directory. Was missed by the previous codemod.
Test Plan: it compiles
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9621
Summary:
This patch allows an application to specify whether to use bufferedio,
reads-via-mmaps and writes-via-mmaps per database. Earlier, there
was a global static variable that was used to configure this functionality.
The default setting remains the same (and is backward compatible):
1. use bufferedio
2. do not use mmaps for reads
3. use mmap for writes
4. use readaheads for reads needed for compaction
I also added a parameter to db_bench to be able to explicitly specify
whether to do readaheads for compactions or not.
Test Plan: make check
Reviewers: sheki, heyongqiang, MarkCallaghan
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9429
Summary: ldb_test.py did a lot of assertFalse checks and displayed all the failed messages on the std output making it confusing to tell a successful from a failed run. Also many empty lines used to be needlessly printed. Also added some progression-"feel-good" lines in the tests
Test Plan: python ldb_test.py
Reviewers: dhruba, sheki, dilipj, chip
Reviewed By: dilipj
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9297
Summary:
Also added some comments and fixed some bugs in
stats reporting. Now the stats seem to match what is expected.
Test Plan:
[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --test_batches_snapshots=1 --ops_per_thread=1000 --threads=1 --max_key=320
LevelDB version : 1.5
Number of threads : 1
Ops per thread : 1000
Read percentage : 10
Delete percentage : 30
Max key : 320
Ratio #ops/#keys : 3
Num times DB reopens: 10
Batches/snapshots : 1
Num keys per lock : 4
Compression : snappy
------------------------------------------------
No lock creation because test_batches_snapshots set
2013/03/04-15:58:56 Starting database operations
2013/03/04-15:58:56 Reopening database for the 1th time
2013/03/04-15:58:56 Reopening database for the 2th time
2013/03/04-15:58:56 Reopening database for the 3th time
2013/03/04-15:58:56 Reopening database for the 4th time
Created bg thread 0x7f4542bff700
2013/03/04-15:58:56 Reopening database for the 5th time
2013/03/04-15:58:56 Reopening database for the 6th time
2013/03/04-15:58:56 Reopening database for the 7th time
2013/03/04-15:58:57 Reopening database for the 8th time
2013/03/04-15:58:57 Reopening database for the 9th time
2013/03/04-15:58:57 Reopening database for the 10th time
2013/03/04-15:58:57 Reopening database for the 11th time
2013/03/04-15:58:57 Limited verification already done during gets
Stress Test : 1811.551 micros/op 552 ops/sec
: Wrote 0.10 MB (0.05 MB/sec) (598% of 1011 ops)
: Wrote 6050 times
: Deleted 3050 times
: 500/900 gets found the key
: Got errors 0 times
[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=1000 --threads=1 --max_key=320
LevelDB version : 1.5
Number of threads : 1
Ops per thread : 1000
Read percentage : 10
Delete percentage : 30
Max key : 320
Ratio #ops/#keys : 3
Num times DB reopens: 10
Batches/snapshots : 0
Num keys per lock : 4
Compression : snappy
------------------------------------------------
Creating 80 locks
2013/03/04-15:58:17 Starting database operations
2013/03/04-15:58:17 Reopening database for the 1th time
2013/03/04-15:58:17 Reopening database for the 2th time
2013/03/04-15:58:17 Reopening database for the 3th time
2013/03/04-15:58:17 Reopening database for the 4th time
Created bg thread 0x7fc0f5bff700
2013/03/04-15:58:17 Reopening database for the 5th time
2013/03/04-15:58:17 Reopening database for the 6th time
2013/03/04-15:58:18 Reopening database for the 7th time
2013/03/04-15:58:18 Reopening database for the 8th time
2013/03/04-15:58:18 Reopening database for the 9th time
2013/03/04-15:58:18 Reopening database for the 10th time
2013/03/04-15:58:18 Reopening database for the 11th time
2013/03/04-15:58:18 Starting verification
Stress Test : 1836.258 micros/op 544 ops/sec
: Wrote 0.01 MB (0.01 MB/sec) (59% of 1011 ops)
: Wrote 605 times
: Deleted 305 times
: 50/90 gets found the key
: Got errors 0 times
2013/03/04-15:58:18 Verification successful
Revert Plan: OK
Task ID: #
Reviewers: emayanke, dhruba
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9081
Summary: In light of the new option introduced by commit 806e264350 where the database has an option to compact before flushing to disk, we want the stress test to test both sides of the option. Have made it to 'deterministically' and configurably change that option for reopens.
Test Plan: make db_stress; ./db_stress with some differnet options
Reviewers: dhruba, vamsi
Reviewed By: dhruba
CC: leveldb, sheki
Differential Revision: https://reviews.facebook.net/D9165
Summary:
Store the last flushed, seq no. in db_impl. Check against it in
transaction Log iterator. Do not attempt to read ahead if we do not know
if the data is flushed completely.
Does not work if flush is disabled. Any ideas on fixing that?
* Minor change, iter->Next is called the first time automatically for
* the first time.
Test Plan:
existing test pass.
More ideas on testing this?
Planning to run some stress test.
Reviewers: dhruba, heyongqiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D9087
Summary:
Use only the counter mechanism. Do away with
incNumFileOpens, incNumFileClose, incNumFileErrors
s/NULL/nullptr/g in db/table_cache.cc
Test Plan: make clean check
Reviewers: dhruba, heyongqiang, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8841
Summary:
Currently the test tracks all writes in memory and
uses it for verification at the end. This has 4 problems:
(a) It needs mutex for each write to ensure in-memory update
and leveldb update are done atomically. This slows down the
benchmark.
(b) Verification phase at the end is time consuming as well
(c) Does not test batch writes or snapshots
(d) We cannot kill the test and restart multiple times in a
loop because in-memory state will be lost.
I am adding a FLAGS_multi that does MultiGet/MultiPut/MultiDelete
instead of get/put/delete to get/put/delete a group of related
keys with same values atomically. Every get retrieves the group
of keys and checks that their values are same. This does not have
the above problems but the downside is that it does less amount
of validation than the other approach.
Test Plan:
This whole this is a test! Here is a small run. I am doing larger run now.
[nponnekanti@dev902 /data/users/nponnekanti/rocksdb] ./db_stress --ops_per_thread=10000 --multi=1 --ops_per_key=25
LevelDB version : 1.5
Number of threads : 32
Ops per thread : 10000
Read percentage : 10
Delete percentage : 30
Max key : 2147483648
Num times DB reopens: 10
Num keys per lock : 4
Compression : snappy
------------------------------------------------
Creating 536870912 locks
2013/02/20-16:59:32 Starting database operations
Created bg thread 0x7f9ebcfff700
2013/02/20-16:59:37 Reopening database for the 1th time
2013/02/20-16:59:46 Reopening database for the 2th time
2013/02/20-16:59:57 Reopening database for the 3th time
2013/02/20-17:00:11 Reopening database for the 4th time
2013/02/20-17:00:25 Reopening database for the 5th time
2013/02/20-17:00:36 Reopening database for the 6th time
2013/02/20-17:00:47 Reopening database for the 7th time
2013/02/20-17:00:59 Reopening database for the 8th time
2013/02/20-17:01:10 Reopening database for the 9th time
2013/02/20-17:01:20 Reopening database for the 10th time
2013/02/20-17:01:31 Reopening database for the 11th time
2013/02/20-17:01:31 Starting verification
Stress Test : 109.125 micros/op 22191 ops/sec
: Wrote 0.00 MB (0.23 MB/sec) (59% of 32 ops)
: Deleted 10 times
2013/02/20-17:01:31 Verification successful
Revert Plan: OK
Task ID: #
Reviewers: dhruba, emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8733
Summary:
Fixed a bug in the stress-test where the correct size was not being
passed to GenerateValue. This bug was there since the beginning but assertions
were switched on in our code-base only recently.
Added comments on the top detailing how the stress test works and how to
quicken/slow it down after investigation.
Test Plan: make all check. ./db_stress
Reviewers: dhruba, asad
Reviewed By: dhruba
CC: vamsi, sheki, heyongqiang, zshao
Differential Revision: https://reviews.facebook.net/D8727
Summary:
* Introduce is histogram in statistics.h
* stop watch to measure time.
* introduce two timers as a poc.
Replaced NULL with nullptr to fight some lint errors
Should be useful for google.
Test Plan:
ran db_bench and check stats.
make all check
Reviewers: dhruba, heyongqiang
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8637
Summary:
Previously, if you opened a db with num_levels set lower than
the database, you received the unhelpful message "Corruption:
VersionEdit: new-file entry." Now you get a more verbose message
describing the issue.
Also, fix handling of compression_levels (both the run-over-the-end
issue and the memory management of it).
Lastly, unique_ptr'ify a couple of minor calls.
Test Plan: make check
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8151
Summary:
Replace manual memory management with std::unique_ptr in a
number of places; not exhaustive, but this fixes a few leaks with file
handles as well as clarifies semantics of the ownership of file handles
with log classes.
Test Plan: db_stress, make check
Reviewers: dhruba
Reviewed By: dhruba
CC: zshao, leveldb, heyongqiang
Differential Revision: https://reviews.facebook.net/D8043
Summary:
Found issues with `db_test` and `db_stress` when running valgrind.
`DBImpl` had an issue where if an compaction failed then it will use the uninitialised file size of an output file is used. This manifested as the final call to output to the log in `DoCompactionWork()` branching on uninitialized memory (all the way down in printf's innards).
Test Plan:
Ran `valgrind --track_origins=yes ./db_test` and `valgrind ./db_stress` to see if issues disappeared.
Ran `make check` to see if there were no regressions.
Reviewers: vamsi, dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D8001
Summary:
Check in LogAndApply if the file size is more than the limit set in
Options.
Things to consider : will this be expensive?
Test Plan: make all check. Inputs on a new unit test?
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7701
Summary: The queries will come from stdin. One key per line. The output will be in stdout, in the format of "<key> ==> <value>" if found, or "<key>" if not found. "--hex" uses HEX-encoded keys and values in both input and output.
Test Plan: ldb query --db=leveldb_db --hex
Reviewers: dhruba, emayanke, sheki
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7617
Summary: The current parsing logic ignores any additional chars after the arg.
Test Plan: "./sst_dump --verify_checksumAAA" now outputs error.
Reviewers: dhruba, sheki, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7611
Summary:
Create an Executable with
* A thread to do Put's
* A thread to use GetUpdatesSince. Check if we miss any sequence
Numbers.
Test Plan: It runs.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7383
Summary: Now sst_dump has the same option --output_hex as "ldb dump" and also share the same output format. So we can do "sst_dump ... | ldb load ..." for an experiment.
Test Plan:
[zshao@dev485 ~/git/rocksdb] ./sst_dump --file=/data/users/zshao/test_leveldb/000005.sst --output_hex | head -n 2
0000027F4FBE00000101000000000000 ==> D901000000000000000057596F7520726563656976656420746F6461792773207370656369616C20676966742120436C69636B2041636365707420746F207669657720796F75722047696674206265666F72652069742064697361707065617273210000000000000000
000007F9C2D400000102000000000000 ==> D1010000000000000000544974277320676F6F6420746F206265204B696E67E280A6206F7220517565656E212048657265277320796F75722054756573646179204D79737465727920476966742066726F6D20436173746C6556696C6C65219B7B227A636F6465223A22633566306531633039663764222C227470223A22613275222C227A6B223A22663638663061343262666264303966383435666239626235366365396536643024246234506C345139382E734C4D33522169482D4F31315A64794C4B7A4F4653766D7863746534625F2A3968684E3433786521776C427636504A414355795F70222C227473223A313334343936313737357D0000000000000000
Reviewers: dhruba, sheki, emayanke
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7587
Summary: This command accepts key-value pairs from stdin with the same format of "ldb dump" command. This allows us to try out different compression algorithms/block sizes easily.
Test Plan: dump, load, dump, verify the data is the same.
Reviewers: dhruba
Reviewed By: dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D7443
Summary:
The manifest file contains a series of edits. If the verbose
option is switched on, then print each individual edit in the
manifest file. This helps in debugging.
Test Plan: make clean manifest_dump
Reviewers: emayanke, sheki
Reviewed By: sheki
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6807
Summary:
dbstress has an option to reopen the database. Make it such that the
previous handle is not closed before we reopen, this simulates a
situation similar to a process crash.
Added new api to DMImpl to remove the lock file.
Test Plan: run db_stress
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D6777
Summary:
Create more than one background compaction thread if specified.
This code peice is similar to what exists in db_bench.
Test Plan: make check
Differential Revision: https://reviews.facebook.net/D6753
Summary: FLAGS_reopen (configurable) specifies the number of times the databse is to be reopened. FLAGS_ops_per_thread is divided into points based on that reopen field. At these points all threads come together to wait for the databse to reopen. Each thread "votes" for the database to reopen and when all have voted, the database reopens.
Test Plan: make all;./db_stress
Reviewers: dhruba, MarkCallaghan, sheki, asad, heyongqiang
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D6627
Summary:
I changed the reduce_num_levels logic to avoid "compactRange()" call if the current number of levels in use (levels that contain files) is smaller than the new num of levels.
And that change breaks the assert in reduce_levels_test
Test Plan: run reduce_levels_test
Reviewers: dhruba, MarkCallaghan
Reviewed By: dhruba
CC: emayanke, sheki
Differential Revision: https://reviews.facebook.net/D6651
Summary:
disable size compaction in ldb reduce_levels, this will avoid compactions rather than the manual comapction,
added --compression=none|snappy|zlib|bzip2 and --file_size= per-file size to ldb reduce_levels command
Test Plan: run ldb
Reviewers: dhruba, MarkCallaghan
Reviewed By: dhruba
CC: sheki, emayanke
Differential Revision: https://reviews.facebook.net/D6597
Summary: Stress test modified to do deletes and later verify them
Test Plan: running the test: db_stress
Reviewers: dhruba, heyongqiang, asad, sheki, MarkCallaghan
Reviewed By: dhruba
Differential Revision: https://reviews.facebook.net/D6567
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.
Test Plan: make all check
Reviewers: heyongqiang, emayanke, sheki
Reviewed By: heyongqiang
CC: MarkCallaghan
Differential Revision: https://reviews.facebook.net/D6525
Summary: as subject.
Test Plan: manually test it, will add a testcase
Reviewers: dhruba, MarkCallaghan
Differential Revision: https://reviews.facebook.net/D6345