Commit Graph

6492 Commits

Author SHA1 Message Date
Reid Horuff
1b8a2e8fdd [rocksdb] Memtable Log Referencing and Prepared Batch Recovery
Summary:
This diff is built on top of WriteBatch modification: https://reviews.facebook.net/D54093 and adds the required functionality to rocksdb core necessary for rocksdb to support 2PC.

modfication of DBImpl::WriteImpl()
- added two arguments *uint64_t log_used = nullptr, uint64_t log_ref = 0;
- *log_used is an output argument which will return the log number which the incoming batch was inserted into, 0 if no WAL insert took place.
-  log_ref is a supplied log_number which all memtables inserted into will reference after the batch insert takes place. This number will reside in 'FindMinPrepLogReferencedByMemTable()' until all Memtables insertinto have flushed.

- Recovery/writepath is now aware of prepared batches and commit and rollback markers.

Test Plan: There is currently no test on this diff. All testing of this functionality takes place in the Transaction layer/diff but I will add some testing.

Reviewers: IslamAbdelRahman, sdong

Subscribers: leveldb, santoshb, andrewkr, vasilep, dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D56919
2016-05-10 14:06:07 -07:00
Reid Horuff
0460e9dcce Modification of WriteBatch to support two phase commit
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.

Test Plan: single unit test in write_batch_test.

Reviewers: hermanlee4, sdong, anthony

Subscribers: leveldb, dhruba, vasilep, andrewkr

Differential Revision: https://reviews.facebook.net/D57867
2016-05-10 14:06:07 -07:00
Andrew Kryczka
f548da33e8 Follow symlinks in chroot directory
Summary:
On Mac OS X, the chroot directory we typically use ("/tmp") is actually
a symlink for "/private/tmp". Since we dereference symlinks in user-defined
paths, we must also dereference symlinks in chroot_dir_ such that we can perform
string comparisons on those paths.

Test Plan: ran env_test on Mac OS X and devserver

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57957
2016-05-10 09:53:52 -07:00
Islam AbdelRahman
d86f9b9c3f Fix lite build
Summary: Fix lite build

Test Plan: run under lite

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57945
2016-05-09 16:08:30 -07:00
Islam AbdelRahman
4b31723433 Add bottommost_compression option
Summary:
Add a new option that can be used to set a specific compression algorithm for bottommost level.
This option will only affect levels larger than base level.

I have also updated CompactionJobInfo to include the compression algorithm used in compaction

Test Plan:
added new unittest
existing unittests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: lightmark, andrewkr, dhruba, yoshinorim

Differential Revision: https://reviews.facebook.net/D57669
2016-05-09 15:57:19 -07:00
sdong
bfb6b1b8a8 Estimate pending compaction bytes more accurately
Summary: Currently we estimate bytes needed for compaction by assuming fanout value to be level multiplier. It overestimates when size of a level exceeds the target by large. We estimate by the ratio of actual sizes in levels instead.

Test Plan: Fix existing test cases and add a new one.

Reviewers: IslamAbdelRahman, igor, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57789
2016-05-09 15:30:02 -07:00
Andrew Kryczka
258459ed54 Properly destroy ChrootEnv in env_test
Summary: see title

Test Plan:
  $ /mnt/gvfs/third-party2/valgrind/af85c56f424cd5edfc2c97588299b44ecdec96bb/3.10.0/gcc-4.9-glibc-2.20/e9936bf/bin/valgrind --error-exitcode=2 --leak-check=full ./env_test

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57897
2016-05-09 14:38:50 -07:00
Yueh-Hsuan Chiang
fca5aa6fcc Initial script for the new regression test
Summary:
This diff includes an initial script running a set of benchmarks for
regression test.  The script does the following things:

  checkout the specified rocksdb commit (or origin/master as default)
  make clean && DEBUG_LEVEL=0 make db_bench
  setup test directories
  run set of benchmarks and store results

Currently, the script will run couple benchmarks, store all the benchmark
output, extract micros per op and percentile information for each benchmark
and store them in a single SUMMARY.csv file.  The SUMMARY.csv will make the
follow-up regression detection easier.

In addition, the current script only takes env arguments to set important
attributes of db_bench.  Will follow-up with a patch that allows db_bench
to construct options from an options file.

Test Plan:
NUM_KEYS=100 ./tools/regression_test.sh

  Sample SUMMARY.csv file:

                                     commit id,                      benchmark,  ms-per-op,        p50,        p75,        p99,      p99.9,     p99.99
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                        fillseq,      15.28,      54.66,      77.14,    5000.00,   17900.00,   18483.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                      overwrite,      13.54,      57.69,      86.39,    3000.00,   15600.00,   17013.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     readrandom,       1.04,       0.80,       1.67,     293.33,     395.00,     504.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,               readwhilewriting,       2.75,       1.01,       1.87,     200.00,     460.00,     485.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                   deleterandom,       3.64,      48.12,      70.09,     200.00,     336.67,     347.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     seekrandom,      24.31,     391.87,     513.69,     872.73,     990.00,    1048.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,         seekrandomwhilewriting,      14.02,     185.14,     294.15,     700.00,    1440.00,    1527.00

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr, gunnarku

Reviewed By: gunnarku

Subscribers: gunnarku, MarkCallaghan, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57597
2016-05-09 13:32:57 -07:00
Islam AbdelRahman
e1951b6f28 Add --index_block_restart_interval option in db_bench
Summary:
Pass --index_block_restart_interval flag to block_based_options in db_bench tool.

Test Plan: none

Reviewers: sdong, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57699
2016-05-09 12:09:05 -07:00
Yi Wu
730f7e2e21 Fix win build
Summary: Fixing error with win build where we compare int64_t with size_t.

Test Plan: make check

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57885
2016-05-09 11:52:28 -07:00
Andrew Kryczka
a9b3c47c8e Fix includes for clang on OS X
Summary:
Fix below error:

  use of undeclared identifier 'errno'

Test Plan: doitlive

Reviewers: IslamAbdelRahman, sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57849
2016-05-06 18:32:54 -07:00
Andrew Kryczka
3f16a836a4 Introduce chroot Env
Summary:
For testing backups, we needed an Env that is fully isolated from other
Envs on the same machine. Our in-memory Envs (MockEnv and InMemoryEnv) were
insufficient because they don't implement most directory operations.

This diff introduces a new Env, "ChrootEnv", that translates paths such that the
chroot directory appears to be the root directory. This way, multiple Envs can
be isolated in the filesystem by using different chroot directories. Since we
use the filesystem, all directory operations are trivially supported.

Test Plan:
I parameterized the existing EnvPosixTest so it runs tests on ChrootEnv
except the ioctl-related cases.

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57543
2016-05-06 17:42:50 -07:00
Andrew Kryczka
269f6b2e2d Revert "Modification of WriteBatch to support two phase commit"
Summary: Revert D54093 and D57453

Test Plan: running make check

Reviewers: horuff, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57819
2016-05-06 16:58:24 -07:00
Arun Sharma
04dec2a359 [ldb] Export ldb_cmd*.h
Summary:
This is needed so that rocksdb users can add more
commands to the included ldb tool by adding more custom
commands.

Test Plan: make -j ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57243
2016-05-06 16:09:09 -07:00
Adam Retter
72c73cdc8b Java API - Add missing HEADER_LEVEL logging (#1104) 2016-05-06 15:06:12 -07:00
Adam Retter
4d02bfa3a6 Add support for PauseBackgroundWork and ContinueBackgroundWork to the Java API (#1087)
Closes https://github.com/facebook/rocksdb/issues/1071
2016-05-06 15:04:13 -07:00
Yi Wu
8f65feafc0 Have sandcastle run lite_test for every diff
Summary: Have sandcastle run unit test in lite mode for every diff.

Test Plan: seems sandcastle picked up changes here and running lite_test for this diff.

Reviewers: kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57741
2016-05-06 14:51:20 -07:00
Andrew Kryczka
0d590d9991 Make max_dict_bytes optional in options string
Summary:
For backwards compatibility with older option strings, the parser needs
to treat this argument as optional.

Test Plan:
Updated unit test to cover case where compression_opts is present but
max_dict_bytes is omitted.

Reviewers: MarkCallaghan, sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57759
2016-05-06 11:27:28 -07:00
sdong
7ccb8d6ef3 BlockBasedTable::Get() not to use prefix bloom if read_options.total_order_seek = true
Summary: This is to provide a way for users to skip prefix bloom in point look-up.

Test Plan: Add a new unit test scenario.

Reviewers: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57747
2016-05-06 10:16:11 -07:00
sdong
e3c6ba37dd OptimizeForSmallDb(): revert some options whose defaults were just changed
Summary: We changed default options of max_open_files and max_file_opening_threads but didn't revert it in OptimizeForSmallDb().

Test Plan: Add a unit test

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57675
2016-05-05 16:50:53 -07:00
Islam AbdelRahman
967476eaee Fix valgrind (DBIteratorTest.ReadAhead)
Summary: This test is failing under valgrind because we dont delete the Env that we allocated

Test Plan: run the test under valgrind

Reviewers: andrewkr, yhchiang, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57693
2016-05-05 11:24:08 -07:00
Mark Callaghan
9790b94c92 Add optimize_filters_for_hits option to db_bench
Summary:
Add optimize_filters_for_hits option to db_bench

Task ID: #

Blame Rev:

Test Plan:
run db_bench

Revert Plan:

Database Impact:

Memcache Impact:

Other Notes:

EImportant:

- begin *PUBLIC* platform impact section -
Bugzilla: #
- end platform impact -

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57621
2016-05-05 07:32:10 -07:00
Yi Wu
a4ea345b04 Fixing lite build
Summary: Fixing lite build broke in unit test. `FilesPerLevel()` depends on `DB::GetProperty()`, which lite build doesn't support.

Test Plan: OPT=-DROCKSDB_LITE make check -j64

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57651
2016-05-04 17:20:52 -07:00
Yi Wu
24a24f013d Enable configurable readahead for iterators
Summary:
Add an option `iterator_readahead_size` to `ReadOptions` to enable
configurable readahead for iterators similar to the corresponding
option for compaction.

Test Plan:
```
make commit_prereq
```

Reviewers: kumar.rangarajan, ott, igor, sdong

Reviewed By: sdong

Subscribers: yiwu, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55419
2016-05-04 15:25:58 -07:00
Islam AbdelRahman
ff4b3fb5b4 Fix Iterator::Prev memory pinning bug
Summary: We should not use IterKey::SetKey with copy = false except if we are pinning the iterator thru it's life time, otherwise we may release the temporarily pinned blocks and in this case the IterKey will be pointing to freed memory

Test Plan: added a new test

Reviewers: sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57561
2016-05-03 16:50:01 -07:00
Patrick Chan
cba752d588 sst_dump won't print size for unsupported compression type 2016-05-03 08:46:24 -07:00
Islam AbdelRahman
6e801b0bd1 Eliminate memcpy in Iterator::Prev() by pinning blocks for keys spanning multiple blocks
Summary:
This diff is stacked on top of this diff https://reviews.facebook.net/D56493
The current Iterator::Prev() implementation need to copy every value since the underlying Iterator may move after reading the value.
This can be optimized by making sure that the block containing the value is pinned until the Iterator move. which will improve the throughput by up to 1.5X

master
```
==> 1000000_Keys_100Byte.txt <==
readreverse  :       0.449 micros/op 2225887 ops/sec;  246.2 MB/s
readreverse  :       0.433 micros/op 2311508 ops/sec;  255.7 MB/s
readreverse  :       0.436 micros/op 2294335 ops/sec;  253.8 MB/s
readreverse  :       0.471 micros/op 2121295 ops/sec;  234.7 MB/s
readreverse  :       0.465 micros/op 2152227 ops/sec;  238.1 MB/s
readreverse  :       0.454 micros/op 2203011 ops/sec;  243.7 MB/s
readreverse  :       0.451 micros/op 2216095 ops/sec;  245.2 MB/s
readreverse  :       0.462 micros/op 2162447 ops/sec;  239.2 MB/s
readreverse  :       0.476 micros/op 2099151 ops/sec;  232.2 MB/s
readreverse  :       0.472 micros/op 2120710 ops/sec;  234.6 MB/s

avg : 242.34 MB/s

==> 1000000_Keys_1KB.txt <==
readreverse  :       1.013 micros/op 986793 ops/sec;  978.7 MB/s
readreverse  :       0.942 micros/op 1061136 ops/sec; 1052.5 MB/s
readreverse  :       0.951 micros/op 1051901 ops/sec; 1043.3 MB/s
readreverse  :       0.932 micros/op 1072894 ops/sec; 1064.1 MB/s
readreverse  :       1.024 micros/op 976720 ops/sec;  968.7 MB/s
readreverse  :       0.935 micros/op 1069169 ops/sec; 1060.4 MB/s
readreverse  :       1.012 micros/op 988132 ops/sec;  980.1 MB/s
readreverse  :       0.962 micros/op 1039579 ops/sec; 1031.1 MB/s
readreverse  :       0.991 micros/op 1008924 ops/sec; 1000.7 MB/s
readreverse  :       1.004 micros/op 996144 ops/sec;  988.0 MB/s

avg : 1016.76 MB/s

==> 1000000_Keys_10KB.txt <==
readreverse  :       4.167 micros/op 239952 ops/sec; 2346.9 MB/s
readreverse  :       4.070 micros/op 245713 ops/sec; 2403.3 MB/s
readreverse  :       4.572 micros/op 218733 ops/sec; 2139.4 MB/s
readreverse  :       4.497 micros/op 222388 ops/sec; 2175.2 MB/s
readreverse  :       4.203 micros/op 237920 ops/sec; 2327.1 MB/s
readreverse  :       4.206 micros/op 237756 ops/sec; 2325.5 MB/s
readreverse  :       4.181 micros/op 239149 ops/sec; 2339.1 MB/s
readreverse  :       4.157 micros/op 240552 ops/sec; 2352.8 MB/s
readreverse  :       4.187 micros/op 238848 ops/sec; 2336.1 MB/s
readreverse  :       4.106 micros/op 243575 ops/sec; 2382.4 MB/s

avg : 2312.78 MB/s

==> 100000_Keys_100KB.txt <==
readreverse  :      41.281 micros/op 24224 ops/sec; 2366.0 MB/s
readreverse  :      39.722 micros/op 25175 ops/sec; 2458.9 MB/s
readreverse  :      40.319 micros/op 24802 ops/sec; 2422.5 MB/s
readreverse  :      39.762 micros/op 25149 ops/sec; 2456.4 MB/s
readreverse  :      40.916 micros/op 24440 ops/sec; 2387.1 MB/s
readreverse  :      41.188 micros/op 24278 ops/sec; 2371.4 MB/s
readreverse  :      40.061 micros/op 24962 ops/sec; 2438.1 MB/s
readreverse  :      40.221 micros/op 24862 ops/sec; 2428.4 MB/s
readreverse  :      40.084 micros/op 24947 ops/sec; 2436.7 MB/s
readreverse  :      40.655 micros/op 24597 ops/sec; 2402.4 MB/s

avg : 2416.79 MB/s

==> 10000_Keys_1MB.txt <==
readreverse  :     298.038 micros/op 3355 ops/sec; 3355.3 MB/s
readreverse  :     335.001 micros/op 2985 ops/sec; 2985.1 MB/s
readreverse  :     286.956 micros/op 3484 ops/sec; 3484.9 MB/s
readreverse  :     329.954 micros/op 3030 ops/sec; 3030.8 MB/s
readreverse  :     306.428 micros/op 3263 ops/sec; 3263.5 MB/s
readreverse  :     330.749 micros/op 3023 ops/sec; 3023.5 MB/s
readreverse  :     328.903 micros/op 3040 ops/sec; 3040.5 MB/s
readreverse  :     324.853 micros/op 3078 ops/sec; 3078.4 MB/s
readreverse  :     320.488 micros/op 3120 ops/sec; 3120.3 MB/s
readreverse  :     320.536 micros/op 3119 ops/sec; 3119.8 MB/s

avg : 3150.21 MB/s
```

After memcpy elimination
```

==> 1000000_Keys_100Byte.txt <==
readreverse  :       0.395 micros/op 2529890 ops/sec;  279.9 MB/s
readreverse  :       0.368 micros/op 2715922 ops/sec;  300.5 MB/s
readreverse  :       0.384 micros/op 2603929 ops/sec;  288.1 MB/s
readreverse  :       0.375 micros/op 2663286 ops/sec;  294.6 MB/s
readreverse  :       0.357 micros/op 2802180 ops/sec;  310.0 MB/s
readreverse  :       0.363 micros/op 2757684 ops/sec;  305.1 MB/s
readreverse  :       0.372 micros/op 2689603 ops/sec;  297.5 MB/s
readreverse  :       0.379 micros/op 2638599 ops/sec;  291.9 MB/s
readreverse  :       0.375 micros/op 2663803 ops/sec;  294.7 MB/s
readreverse  :       0.375 micros/op 2665579 ops/sec;  294.9 MB/s

avg: 295.72 MB/s (1.22 X)

==> 1000000_Keys_1KB.txt <==
readreverse  :       0.879 micros/op 1138112 ops/sec; 1128.8 MB/s
readreverse  :       0.842 micros/op 1187998 ops/sec; 1178.3 MB/s
readreverse  :       0.837 micros/op 1194915 ops/sec; 1185.1 MB/s
readreverse  :       0.845 micros/op 1182983 ops/sec; 1173.3 MB/s
readreverse  :       0.877 micros/op 1140308 ops/sec; 1131.0 MB/s
readreverse  :       0.849 micros/op 1177581 ops/sec; 1168.0 MB/s
readreverse  :       0.915 micros/op 1093284 ops/sec; 1084.3 MB/s
readreverse  :       0.863 micros/op 1159418 ops/sec; 1149.9 MB/s
readreverse  :       0.895 micros/op 1117670 ops/sec; 1108.5 MB/s
readreverse  :       0.852 micros/op 1174116 ops/sec; 1164.5 MB/s

avg: 1147.17 MB/s (1.12 X)

==> 1000000_Keys_10KB.txt <==
readreverse  :       3.870 micros/op 258386 ops/sec; 2527.2 MB/s
readreverse  :       3.568 micros/op 280296 ops/sec; 2741.5 MB/s
readreverse  :       4.005 micros/op 249694 ops/sec; 2442.2 MB/s
readreverse  :       3.550 micros/op 281719 ops/sec; 2755.5 MB/s
readreverse  :       3.562 micros/op 280758 ops/sec; 2746.1 MB/s
readreverse  :       3.507 micros/op 285125 ops/sec; 2788.8 MB/s
readreverse  :       3.463 micros/op 288739 ops/sec; 2824.1 MB/s
readreverse  :       3.428 micros/op 291734 ops/sec; 2853.4 MB/s
readreverse  :       3.553 micros/op 281491 ops/sec; 2753.2 MB/s
readreverse  :       3.535 micros/op 282885 ops/sec; 2766.9 MB/s

avg : 2719.89 MB/s (1.17 X)

==> 100000_Keys_100KB.txt <==
readreverse  :      22.815 micros/op 43830 ops/sec; 4281.0 MB/s
readreverse  :      29.957 micros/op 33381 ops/sec; 3260.4 MB/s
readreverse  :      25.334 micros/op 39473 ops/sec; 3855.4 MB/s
readreverse  :      23.037 micros/op 43409 ops/sec; 4239.8 MB/s
readreverse  :      27.810 micros/op 35958 ops/sec; 3512.1 MB/s
readreverse  :      30.327 micros/op 32973 ops/sec; 3220.6 MB/s
readreverse  :      29.704 micros/op 33665 ops/sec; 3288.2 MB/s
readreverse  :      29.423 micros/op 33987 ops/sec; 3319.6 MB/s
readreverse  :      23.334 micros/op 42856 ops/sec; 4185.9 MB/s
readreverse  :      29.969 micros/op 33368 ops/sec; 3259.1 MB/s

avg : 3642.21 MB/s (1.5 X)

==> 10000_Keys_1MB.txt <==
readreverse  :     244.748 micros/op 4085 ops/sec; 4085.9 MB/s
readreverse  :     230.208 micros/op 4343 ops/sec; 4344.0 MB/s
readreverse  :     235.655 micros/op 4243 ops/sec; 4243.6 MB/s
readreverse  :     235.730 micros/op 4242 ops/sec; 4242.2 MB/s
readreverse  :     237.346 micros/op 4213 ops/sec; 4213.3 MB/s
readreverse  :     227.306 micros/op 4399 ops/sec; 4399.4 MB/s
readreverse  :     194.957 micros/op 5129 ops/sec; 5129.4 MB/s
readreverse  :     238.359 micros/op 4195 ops/sec; 4195.4 MB/s
readreverse  :     221.588 micros/op 4512 ops/sec; 4513.0 MB/s
readreverse  :     235.911 micros/op 4238 ops/sec; 4239.0 MB/s

avg : 4360.52 MB/s (1.38 X)
```

Test Plan: COMPILE_WITH_ASAN=1 make check -j64

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56511
2016-05-02 21:46:30 -07:00
Yi Wu
1b166928c7 Release RocksDB 4.8.0
Summary: Release RocksDB 4.8.0

Test Plan: N/A

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57525
2016-05-02 14:38:04 -07:00
Warren Falk
b8cf9130f8 Fix #1110, 32-bit build failure on Mac OSX (#1112)
Using explicit 64-bit type in conditional in platforms above 32-bits
This appears to be necessary on Mac OSX as std::conditional does not appear to short circuit and evaluates the third template arg
Making the third template arg be 64 bits explicitly works around this problem and will work on both 32 bit and 64+ bit platforms.
2016-05-02 10:04:37 -07:00
Islam AbdelRahman
21441c09bd Fix calling GetCurrentMutableCFOptions in CompactionJob::ProcessKeyValueCompaction()
Summary: GetCurrentMutableCFOptions() can only be called when DB mutex is held so we cannot call it in CompactionJob::ProcessKeyValueCompaction() since it's not holding the db mutex

Test Plan: make check -j64

Reviewers: sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57471
2016-04-29 17:00:50 -07:00
Dmitri Smirnov
4ea6e051ee Fix multiple issues with WinMmapFile fo sequential writing (#1108)
make preallocation inline with other writable files
  make sure that we map no more than pre-allocated size.
2016-04-29 16:43:13 -07:00
Islam AbdelRahman
f3bb024fd6 Fix clang build
Summary: fix clang build

Test Plan: USE_CLANG make all -j64

Reviewers: horuff, sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57453
2016-04-29 15:19:19 -07:00
Reid Horuff
6e56a114be Modification of WriteBatch to support two phase commit
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.

Test Plan: single unit test in write_batch_test.

Reviewers: hermanlee4, sdong, anthony

Subscribers: andrewkr, vasilep, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54093
2016-04-29 11:50:30 -07:00
Reid Horuff
1d2e4ef747 ldb support new WAL records 2016-04-29 11:47:24 -07:00
Yi Wu
a92049e3e7 Added EventListener::OnTableFileCreationStarted() callback
Summary: Added EventListener::OnTableFileCreationStarted. EventListener::OnTableFileCreated will be called on failure case. User can check creation status via TableFileCreationInfo::status.

Test Plan: unit test.

Reviewers: dhruba, yhchiang, ott, sdong

Reviewed By: sdong

Subscribers: sdong, kradhakrishnan, IslamAbdelRahman, andrewkr, yhchiang, leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D56337
2016-04-29 11:35:00 -07:00
PraveenSinghRao
e8115cea45 Revert "Use async file handle for better parallelism (#1049)" (#1105)
This reverts commit b54c347424.

Revert async file handle change as it causes failures with appveyor
2016-04-28 22:50:26 -07:00
sdong
6a14f7a976 Change several option defaults
Summary:
Changing several option defaults:
 options.max_open_files changes from 5000 to -1
 options.base_background_compactions changes from max_background_compactions to 1
 options.wal_recovery_mode changes from kTolerateCorruptedTailRecords to kTolerateCorruptedTailRecords
 options.compaction_pri changes from kByCompensatedSize to kByCompensatedSize

Test Plan: Write unit tests to see OldDefaults() works as expected.

Reviewers: IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, yiwu, kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56427
2016-04-28 17:50:58 -07:00
Peter Mattis
c6c770a1ac Use prefix_same_as_start to avoid iteration in FindNextUserEntryInternal. (#1102)
This avoids excessive iteration in tombstone fields.
2016-04-28 16:48:03 -07:00
sdong
992a8f83b7 Not enable jemalloc status printing if USE_CLANG=1
Summary: Warning is printed out with USE_CLANG=1 when including jemalloc.h. Disable it in that case.

Test Plan: Run db_bench with USE_CLANG=1 and not. Make sure they can all build and jemalloc status is printed out in the case where USE_CLANG is not set.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57399
2016-04-28 16:16:14 -07:00
Islam AbdelRahman
029022b0f1 Fix crash_test
Summary:
crash_test grep for 'fail' string in the output and if found it consider that we failed.
Update the output to use something else

Test Plan: make crash_test (still running)

Reviewers: yhchiang, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57381
2016-04-28 15:59:33 -07:00
Andrew Kryczka
a06faa6327 Skip PresetCompressionDict test for lite
Summary:
This test relies on "rocksdb.num-files-at-levelN" property that isn't
implemented in rocksdb lite. So we will compile it only for non-lite builds.

Test Plan:
  $ make -j40 check 'OPT=-g -DROCKSDB_LITE'

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57387
2016-04-28 15:11:28 -07:00
Dmitri Smirnov
e7899c6618 Fix build issue. (#1103) 2016-04-28 11:39:12 -07:00
Andrew Kryczka
0f428c5619 Fix compression dictionary clang osx error
Summary:
There was one narrowing conversion in D52287 that only showed up with
clang on osx.

Test Plan:
  $ make clean && USE_CLANG=1 DISABLE_JEMALLOC=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j32 check

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57357
2016-04-28 10:42:10 -07:00
Li Peng
6d4832a998 Merge pull request #1101 from flyd1005/wip-fix-typo
fix typos and remove duplicated words
2016-04-28 02:30:44 -07:00
Islam AbdelRahman
af70f9ac6d Fix typo in build_tools/fbcode_config.sh
Summary: Fix typo in build_tools/fbcode_config.sh

Test Plan: none

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57363
2016-04-27 21:00:22 -07:00
Andrew Kryczka
54de13abac Fix compression dictionary clang errors
Summary: There were a few narrowing conversions that clang didn't like.

Test Plan:
  $ make clean && USE_CLANG=1 DISABLE_JEMALLOC=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j32 check

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57351
2016-04-27 18:30:04 -07:00
Islam AbdelRahman
0850bc5147 Fix build on machines without jemalloc
Summary: It looks like we mistakenly enable JEMALLOC even if it's not available on the machine, that's why travis is failing

Test Plan:
check on my devserver
check on my mac

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57345
2016-04-27 18:25:19 -07:00
Andrew Kryczka
4032145adc Configurable compression in db_bench
Summary:
Made compression type and dictionary size configurable via environment
variables.

Depends on D52287.

Test Plan:
check these options are passed to the db.

  $ COMPRESSION_MAX_DICT_BYTES=65536 COMPRESSION_TYPE=LZ4 NUM_KEYS=10000000 DB_DIR=./tmp/ WAL_DIR=./tmp/ ./tools/benchmark.sh filluniquerandom
  ...
  $ grep Options.compression tmp/LOG
  2016/04/22-19:11:30.397829 7f5f263a2980          Options.compression: LZ4
  ...
  2016/04/22-19:11:30.397837 7f5f263a2980         Options.compression_opts.max_dict_bytes: 65536

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57141
2016-04-27 17:39:18 -07:00
Andrew Kryczka
843d2e3137 Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.

During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.

On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.

Test Plan: new unit test

Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong

Reviewed By: sdong

Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52287
2016-04-27 17:36:03 -07:00
Yueh-Hsuan Chiang
ad573b9027 Temporarily disable CompactFiles in db_stress in its default setting
Summary:
As db_stress with CompactFiles possibly catches a previous bug currently,
temporarily disable CompactFiles in db_stress in its default setting
to allows new bug to be detected while investigating the bug in CompactFiles.

Test Plan: crash test

Reviewers: sdong, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57333
2016-04-27 16:50:51 -07:00