Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.
Test Plan: single unit test in write_batch_test.
Reviewers: hermanlee4, sdong, anthony
Subscribers: leveldb, dhruba, vasilep, andrewkr
Differential Revision: https://reviews.facebook.net/D57867
Summary:
This diff includes an initial script running a set of benchmarks for
regression test. The script does the following things:
checkout the specified rocksdb commit (or origin/master as default)
make clean && DEBUG_LEVEL=0 make db_bench
setup test directories
run set of benchmarks and store results
Currently, the script will run couple benchmarks, store all the benchmark
output, extract micros per op and percentile information for each benchmark
and store them in a single SUMMARY.csv file. The SUMMARY.csv will make the
follow-up regression detection easier.
In addition, the current script only takes env arguments to set important
attributes of db_bench. Will follow-up with a patch that allows db_bench
to construct options from an options file.
Test Plan:
NUM_KEYS=100 ./tools/regression_test.sh
Sample SUMMARY.csv file:
commit id, benchmark, ms-per-op, p50, p75, p99, p99.9, p99.99
7e23ddf575890510e7d2fc7a79b31a1bbf317917, fillseq, 15.28, 54.66, 77.14, 5000.00, 17900.00, 18483.00
7e23ddf575890510e7d2fc7a79b31a1bbf317917, overwrite, 13.54, 57.69, 86.39, 3000.00, 15600.00, 17013.00
7e23ddf575890510e7d2fc7a79b31a1bbf317917, readrandom, 1.04, 0.80, 1.67, 293.33, 395.00, 504.00
7e23ddf575890510e7d2fc7a79b31a1bbf317917, readwhilewriting, 2.75, 1.01, 1.87, 200.00, 460.00, 485.00
7e23ddf575890510e7d2fc7a79b31a1bbf317917, deleterandom, 3.64, 48.12, 70.09, 200.00, 336.67, 347.00
7e23ddf575890510e7d2fc7a79b31a1bbf317917, seekrandom, 24.31, 391.87, 513.69, 872.73, 990.00, 1048.00
7e23ddf575890510e7d2fc7a79b31a1bbf317917, seekrandomwhilewriting, 14.02, 185.14, 294.15, 700.00, 1440.00, 1527.00
Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr, gunnarku
Reviewed By: gunnarku
Subscribers: gunnarku, MarkCallaghan, andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57597
Summary:
This is needed so that rocksdb users can add more
commands to the included ldb tool by adding more custom
commands.
Test Plan: make -j ldb
Reviewers: sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57243
Summary:
crash_test grep for 'fail' string in the output and if found it consider that we failed.
Update the output to use something else
Test Plan: make crash_test (still running)
Reviewers: yhchiang, sdong, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57381
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
Summary:
As db_stress with CompactFiles possibly catches a previous bug currently,
temporarily disable CompactFiles in db_stress in its default setting
to allows new bug to be detected while investigating the bug in CompactFiles.
Test Plan: crash test
Reviewers: sdong, kradhakrishnan, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57333
Summary:
Introduced option to dump malloc statistics using new option flag.
Added new command line option to db_bench tool to enable this
funtionality.
Also extended build to support environments with/without jemalloc.
Test Plan:
1) Build rocksdb using `make` command. Launch the following command
`./db_bench --benchmarks=fillrandom --dump_malloc_stats=true
--num=10000000` end verified that jemalloc dump is present in LOG file.
2) Build rocksdb using `DISABLE_JEMALLOC=1 make db_bench -j32` and ran
the same db_bench tool and found the following message in LOG file:
"Please compile with jemalloc to enable malloc dump".
3) Also built rocksdb using `make` command on MacOS to verify behavior
in non-FB environment.
Also to debug build configuration change temporary changed
AM_DEFAULT_VERBOSITY = 1 in Makefile to see compiler and build
tools output. For case 1) -DROCKSDB_JEMALLOC was present in compiler
command line. For both 2) and 3) this flag was not present.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D57321
Comparable with Snappy on comp ratio.
Implemented using Windows API, does not require external package.
Avaiable since Windows 8 and server 2012.
Use -DXPRESS=1 with CMake to enable.
Summary:
Enable testing CompactFiles in db_stress by adding flag test_compact_files
to db_stress.
Test Plan:
./db_stress --test_compact_files=1 --compaction_style=0 --allow_concurrent_memtable_write=false --ops_per_thread=100000
./db_stress --test_compact_files=1 --compaction_style=1 --allow_concurrent_memtable_write=false --ops_per_thread=100000
Sample output (note that it's normal to have some CompactFiles() failed):
Stress Test : 491.891 micros/op 65054 ops/sec
: Wrote 21.98 MB (0.45 MB/sec) (45% of 3200352 ops)
: Wrote 1440728 times
: Deleted 441616 times
: Single deleted 38181 times
: 319251 read and 19025 found the key
: Prefix scanned 640520 times
: Iterator size sum is 9691415
: Iterated 319704 times
: Got errors 0 times
: 1323 CompactFiles() succeed
: 32 CompactFiles() failed
2016/04/11-15:50:58 Verification successful
Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr
Reviewed By: andrewkr
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D56565
Summary:
This is taken from the "Write(GB)" column in compaction stats, so the
units should be GB, not MB.
Test Plan: none
Reviewers: sdong, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56889
Summary: It is useful to print out IO stats in flush jobs too. Extend options.compaction_measure_io_stats to flush jobs and raname it.
Test Plan: Try db_bench and see the stats are printed out.
Reviewers: yhchiang
Reviewed By: yhchiang
Subscribers: kradhakrishnan, yiwu, IslamAbdelRahman, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56769
* Update db_bench_tool.cc
I fixed the wrong letters, LevelDB -> rocksDB, because I thought of LevelDB as the wrong presentation.
the following show my fix :
fprintf(stderr, "LevelDB: version %d.%d\n",
kMajorVersion, kMinorVersion);
----------------->
fprintf(stderr, "rocksDB: version %d.%d\n",
kMajorVersion, kMinorVersion);
* Update db_bench_tool.cc
* Update db_bench_tool.cc
Summary:
strings.h header does not exist on Windows. So, we can try another way
to compare strings ignoring case.
Test Plan:
built and ran:
$ ./ldb_cmd_test
Reviewers: sdong, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D56535
Summary:
Current Message
```
sst_dump [--command=check|scan|none|raw] [--verify_checksum] --file=data_dir_OR_sst_file [--output_hex] [--input_key_hex] [--from=<user_key>] [--to=<user_key>] [--read_num=NUM] [--show_properties] [--show_compression_sizes] [--show_compression_sizes [--set_block_size=<block_size>]]
```
New message
```
sst_dump --file=<data_dir_OR_sst_file> [--command=check|scan|raw]
--file=<data_dir_OR_sst_file>
Path to SST file or directory containing SST files
--command=check|scan|raw
check: Iterate over entries in files but dont print anything except if an error is encounterd (default command)
scan: Iterate over entries in files and print them to screen
raw: Dump all the table contents to <file_name>_dump.txt
--output_hex
Can be combined with scan command to print the keys and values in Hex
--from=<user_key>
Key to start reading from when executing check|scan
--to=<user_key>
Key to stop reading at when executing check|scan
--read_num=<num>
Maximum number of entries to read when executing check|scan
--verify_checksum
Verify file checksum when executing check|scan
--input_key_hex
Can be combined with --from and --to to indicate that these values are encoded in Hex
--show_properties
Print table properties after iterating over the file
--show_compression_sizes
Independent command that will recreate the SST file using 16K block size with different
compressions and report the size of the file using such compression
--set_block_size=<block_size>
Can be combined with --show_compression_sizes to set the block size that will be used
when trying different compression algorithms
```
Test Plan: none
Reviewers: yhchiang, andrewkr, kradhakrishnan, yiwu, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56325
Summary:
Added the column family name to the properties block. This property
is omitted only if the property is unavailable, such as when RepairDB()
writes SST files.
In a next diff, I will change RepairDB to use this new property for
deciding to which column family an existing SST file belongs. If this
property is missing, it will add it to the "unknown" column family (same
as its existing behavior).
Test Plan:
New unit test:
$ ./db_table_properties_test --gtest_filter=DBTablePropertiesTest.GetColumnFamilyNameProperty
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D55605
Summary:
Adapted a stderr logger from the option tests. Moved it to a separate
header so we can reuse it, e.g., from ldb subcommands for faster debugging. This
is especially useful to make errors/warnings more visible when running
"ldb repair", which involves potential data loss.
Test Plan:
ran options_test and "ldb repair"
$ ./ldb repair --db=./tmp/
[WARN] **** Repaired rocksdb ./tmp/; recovered 1 files; 588bytes. Some data may have been lost. ****
OK
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D56151
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
Test Plan:
'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
I didn't run the Java tests, I don't have Java set up on my devserver.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56133
Summary:
Undefined Behavior Sanitizer (ubsan) is //a good thing// which will help us to find sneaky bugs with low cost. Please see http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/ for more details and official GCC documentation for more context: https://gcc.gnu.org/onlinedocs/gcc/Instrumentation-Options.html.
Changes itself are quite simple and pretty much imitating whatever is implemented for ASan.
Hooking the UBsan validation build to Sandcastle is a separate step and will be dealt as separate diff because code is in internal repository.
Test Plan: Make sure that that there no regressions when it comes to builds and test pass rate.
Reviewers: leveldb, sdong, IslamAbdelRahman, yhchiang
Reviewed By: yhchiang
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56049
Summary:
Expose the inverse of ToString(hex=true) on Slice: Slice::DecodeHex
Refactor the other implementation of to/from hex in ldb_cmd.h to use the Slice
version
(Difference between the 2 is whether 0x is expected/produced in front of the hex
string or not)
Eliminated support for invalid odd length hex string - this is now invalid
instead of having 1/2 byte set
Added (inverse of HexToString) test for LDBCommand::StringToHex which also
indirectly tests Slice::ToString(true)
After moving the original implementation from ldb_cmd.h, updated it to much simpler/efficient version
(originally/inspired from https://github.com/facebook/wdt/blob/master/util/EncryptionUtils.cpp#L140-L169 )
Test Plan: run tests
Reviewers: uddipta, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56121
record.size() should not be less than 12.
This "magic number" seems to be the WriteBatch header (8 byte sequence
and 4 byte count). Replaced all the places where "12" was used
by WriteBatchInternal::kHeader.
Summary: Test seems to fail if we don't use consistent version between testing forward and backward compatibility.
Test Plan: Run the script (with some version removed manually to make it shorter)
Reviewers: IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55773
Summary: After introducing a less forward-compatible change, update the backward compatible checking tool.
Test Plan: Run it.
Reviewers: IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: IslamAbdelRahman, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55695
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce a TableReader::Close(), to guarantee the order of destruction.
Test Plan:
Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).
DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
Mac: OK.
Linux: with D55287 patched in it's OK.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D54801
Summary: For compactions benchmarks (both level and universal) we'll use `--max_write_buffer_number=4`. For all the other benchmarks which don't customize the value of `--max_background_flushes` we'll continue using `--max_write_buffer_number=8`.
Test Plan:
To validate basic correctness and command-line options:
```
cd ~/rocksdb
NKEYS=10000000 ./tools/run_flash_bench.sh
```
Reviewers: MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55497
Summary: Set of updates to the subcompaction benchmark script which are based on our internal discussions. The intent behind the changes is to make sure that the scripts will correctly reflect how we're doing the actual benchmarking.
Test Plan: Tested by exercising the full set of compaction benchmarks and validating the execution and consistency of results.
Reviewers: MarkCallaghan, sdong, yhchiang
Reviewed By: yhchiang
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55461
Summary: This will make it easier for admins and devs to use RepairDB.
Test Plan:
Tried deleting the manifest and verified it recovers:
$ ldb --create_if_missing --db=/tmp/test_db put ok ok
$ rm -f /tmp/test_db/MANIFEST-000001
$ ./ldb --db=/tmp/test_db repair
$ ldb --db=/tmp/test_db get ok
ok
Reviewers: yhchiang, sdong, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D55359
Summary: Refactored db_bench transaction stress tests so that they can be called from unit tests as well.
Test Plan: run new unit test as well as db_bench
Reviewers: yhchiang, IslamAbdelRahman, sdong
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D55203
Summary: After closely working with Mark, Siying, and Yueh-Hsuan this set of changes reflects the updates needed to measure RocksDB subcompaction performance in a correct manner. The essence of the benchmark is executing `fillrandom` followed by `compact` with the correct set of options for various number of subcompactions specified.
Test Plan: Tested internally to verify correctness and reliability.
Reviewers: sdong, yhchiang, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D55089
Test Plan:
Built and ran with gflags:
% ./db_bench
LevelDB: version 4.5
Date: Tue Feb 16 12:04:23 2016
CPU: 40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
...
And without gflags:
% ./db_bench
Please install gflags to run rocksdb tools
%
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: igor, dhruba
Differential Revision: https://reviews.facebook.net/D54243
Summary:
Implement a benchmark for universal compaction based on the feature description (see below), in-person discussions, and reading source code:
https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guidehttps://github.com/facebook/rocksdb/wiki/Universal-Compactionhttps://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#universal-compaction
Universal compaction benchmark is based on `overwrite` benchmark, adding compaction specific options to it, and executing it for different values of subcompaction to understand the impact of scaling out subcompactions for a particular scenario.
Test Plan:
- Execute the benchmark on various machines for multiple iterations to verify the reliability.
- Observe the output to make sure that compaction is taking place.
- Observe the execution to make sure that arguments passed to `db_bench` are correct.
Reviewers: sdong, MarkCallaghan
Reviewed By: MarkCallaghan
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D54045
Summary: Default crash test uses prefix hash memtable, which is not compatible to concurrent memtable. Allow prefix test run with skip list and use skip list memtable when concurrent insert is used.
Test Plan: Run "python -u tools/db_crashtest.py whitebox" and watch sometimes skip list is used.
Reviewers: anthony, yhchiang, kradhakrishnan, andrewkr, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D53907