Commit Graph

437 Commits

Author SHA1 Message Date
Andrew Kryczka
fa50fffaf5 Option to expand range tombstones in db_bench
Summary:
When enabled, this option replaces range tombstones with a sequence of
point tombstones covering the same range. This can be used to A/B test perf of
range tombstones vs sequential point tombstones, and help us find the cross-over
point, i.e., the size of the range above which range tombstones outperform point
tombstones.
Closes https://github.com/facebook/rocksdb/pull/1594

Differential Revision: D4246312

Pulled By: ajkr

fbshipit-source-id: 3b00b23
2016-12-06 18:09:14 -08:00
Yi Wu
56281f3a97 Add memtable_insert_with_hint_prefix_size option to db_bench
Summary:
Add memtable_insert_with_hint_prefix_size option to db_bench
Closes https://github.com/facebook/rocksdb/pull/1604

Differential Revision: D4260549

Pulled By: yiwu-arbug

fbshipit-source-id: cee5ef7
2016-12-01 16:54:16 -08:00
Islam AbdelRahman
4a21b1402c Cache heap::downheap() root comparison (optimize heap cmp call)
Summary:
Reduce number of comparisons in heap by caching which child node in the first level is smallest (left_child or right_child)
So next time we can compare directly against the smallest child

I see that the total number of calls to comparator drops significantly when using this optimization

Before caching (~2mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000  --perf_level=2
readseq      :       0.338 micros/op 2959201 ops/sec;  327.4 MB/s user_key_comparison_count = 2000008
```
After caching (~1mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2
readseq      :       0.309 micros/op 3236801 ops/sec;  358.1 MB/s user_key_comparison_count = 1000011
```

It also improves
Closes https://github.com/facebook/rocksdb/pull/1600

Differential Revision: D4256027

Pulled By: IslamAbdelRahman

fbshipit-source-id: 76fcc66
2016-12-01 13:39:14 -08:00
Igor Canadi
3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Andrew Kryczka
fd43ee09da Range deletion microoptimizations
Summary:
- Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter.
- Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly.
Closes https://github.com/facebook/rocksdb/pull/1548

Differential Revision: D4208169

Pulled By: ajkr

fbshipit-source-id: 2fd65cf
2016-11-21 12:24:13 -08:00
Andrew Kryczka
018bb2ebf5 DeleteRange support for db_bench
Summary:
Added a few options to configure when to add range tombstones during
any benchmark involving writes.
Closes https://github.com/facebook/rocksdb/pull/1522

Differential Revision: D4187388

Pulled By: ajkr

fbshipit-source-id: 2c8a473
2016-11-15 17:39:47 -08:00
Andrew Kryczka
53b693f5fe ldb support for range delete
Summary:
Add a subcommand to ldb with which we can delete a range of keys.
Closes https://github.com/facebook/rocksdb/pull/1521

Differential Revision: D4186338

Pulled By: ajkr

fbshipit-source-id: b8e9861
2016-11-15 15:54:20 -08:00
Lijun Tang
adb665e0bf Allowed delayed_write_rate option to be dynamically set.
Summary: Closes https://github.com/facebook/rocksdb/pull/1488

Differential Revision: D4157784

Pulled By: siying

fbshipit-source-id: f150081
2016-11-12 15:54:11 -08:00
Anirban Rahut
14c0380e78 Convenience option to parse an internal key on command line
Summary:
enhancing sst_dump to be able to parse internal key
Closes https://github.com/facebook/rocksdb/pull/1482

Differential Revision: D4154175

Pulled By: siying

fbshipit-source-id: b0e28b1
2016-11-10 10:09:21 -08:00
Peter (Stig) Edwards
144cdb8f16 16384 as e.g .value for compression_max_dict_bytes
Summary:
Use 16384 as e.g .value for ldb the --compression_max_dict_bytes option.
I think 14 was copy and pasted from the options in the lines above.
Closes https://github.com/facebook/rocksdb/pull/1483

Differential Revision: D4154393

Pulled By: siying

fbshipit-source-id: ef53a69
2016-11-09 11:24:20 -08:00
Andrew Kryczka
9e7cf3469b DeleteRange user iterator support
Summary:
Note: reviewed in  https://reviews.facebook.net/D65115

- DBIter maintains a range tombstone accumulator. We don't cleanup obsolete tombstones yet, so if the user seeks back and forth, the same tombstones would be added to the accumulator multiple times.
- DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones, L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205)
- DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones
Closes https://github.com/facebook/rocksdb/pull/1464

Differential Revision: D4131753

Pulled By: ajkr

fbshipit-source-id: be86559
2016-11-04 12:09:22 -07:00
Benoit Girard
2b16d664cb Change max_bytes_for_level_multiplier to double
Summary: Closes https://github.com/facebook/rocksdb/pull/1427

Differential Revision: D4094732

Pulled By: yiwu-arbug

fbshipit-source-id: b9b79e9
2016-11-01 21:09:23 -07:00
Siying Dong
f9eb56791a db_bench: --dump_malloc_stats takes no effect
Summary:
Fix the bug that --dump_malloc_stats is set before opening the DB.
Closes https://github.com/facebook/rocksdb/pull/1447

Differential Revision: D4106001

Pulled By: siying

fbshipit-source-id: 4e746da
2016-10-31 14:54:26 -07:00
Kien-hung Li
eeb27e1bbd Add handy option to turn on direct I/O in db_bench (#1424) 2016-10-28 10:36:05 -07:00
Yueh-Hsuan Chiang
f83cd64c0b Fix a bug that mistakenly disable regression_test.sh to update commit (#1415)
Summary:
Fix a bug that mistakenly disable regression_test.sh to update commit

Test Plan:
regression_test.sh
2016-10-21 17:26:24 -07:00
Aaron Gao
08616b4933 [db_bench] add filldeterministic (Universal+level compaction)
Summary:
in db_bench, we can dynamically create a rocksdb database that guarantees the shape of its LSM.
universal + level compaction
no fifo compaction
no multi db support

Test Plan:
./db_bench -benchmarks=fillseqdeterministic -compaction_style=1 -num_levels=3 --disable_auto_compactions -num=1000000 -value_size=1000
```
---------------------- LSM ---------------------
Level[0]: /000480.sst(size: 35060275 bytes)
Level[0]: /000479.sst(size: 70443197 bytes)
Level[0]: /000478.sst(size: 141600383 bytes)
Level[1]: /000341.sst - /000475.sst(total size: 284726629 bytes)
Level[2]: /000071.sst - /000340.sst(total size: 568649806 bytes)
fillseqdeterministic :      60.447 micros/op 16543 ops/sec;   16.0 MB/s
```

Reviewers: sdong, andrewkr, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63111
2016-10-18 16:30:57 -07:00
dhruba borthakur
5027dd17a7 Fix a minor bug in the ldb tool that was not selecting the specified (#1399)
column family for compaction.
2016-10-17 10:40:30 -07:00
Siying Dong
a249a0b75b check_format_compatible.sh to use some branch which allows to run with GCC 4.8 (#1393)
Summary:
Some older tags don't run GCC 4.8 with FB internal setting. Fixed them and created branches. Change the format compatible script accordingly.

Also add more releases to check format compatibility.
2016-10-13 16:15:55 -07:00
Kefu Chai
21f4bb5a89 cmake support for linux and osx (#1358)
* enable cmake to work on linux and osx also

* port part of build_detect_platform not covered by thirdparty.inc
  to cmake.
  - detect fallocate()
  - detect malloc_usable_size()
  - detect JeMalloc
  - detect snappy
* check for asan,tsan,ubsan
* create 'build_version.cc' in build directory.
* add `check` target to support 'make check'.
* add `tools` target to match its counterpart in Makefile.
* use `date` on non-win32 platforms.
* pass different cflags on non-win32 platforms
* detect pthead library using FindThread cmake module.
* enable CMP0042 to silence the cmake warning on osx
* reorder the linked libraries. because testutillib references gtest, to
  enable the linker to find the referenced symbols, we need to put gtest
  after testutillib.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>

* hash_table_bench.cc: fix build without gflags

Signed-off-by: Kefu Chai <kchai@redhat.com>

* remove gtest from librocksdb linkage

testharness.cc is included in librocksdb sources, and it uses gtest. but
gtest is not supposed to be part of the public API of librocksdb. so, in
this change, the testharness.cc is moved out out librocksdb, and is
built as an object target, then linked with the tools and tests instead.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-09-28 11:53:15 -07:00
Yi Wu
9ed928e7a9 Split DBOptions into ImmutableDBOptions and MutableDBOptions
Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs.

Test Plan:
  make all check

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64065
2016-09-23 16:34:04 -07:00
rockeet
4c3f4496b5 Add TableBuilderOptions::level and relevant changes (#1335) 2016-09-17 22:30:43 -07:00
sdong
607628d349 Support ZSTD with finalized format
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.

Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63141
2016-09-06 12:22:16 -07:00
Yi Wu
a88677d2cf Remove ImmutableCFOptions from public API
Summary: There's no reference to ImmutableCFOptions elsewhere in /include/rocksdb. ImmutableCFOptions was introduced in this commit (5665e5e285) but later its reference in /include/rocksdb/table.h is removed.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63177
2016-09-02 14:16:31 -07:00
Islam AbdelRahman
5051755e35 Fix db_bench memory use after free (detected by clang_analyze)
Summary: Fix using `arg[i].thread` after deleting it

Test Plan: run clang_analyze

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63171
2016-09-02 10:58:08 -07:00
sdong
32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
Islam AbdelRahman
b49b92cf28 Introduce Read amplification bitmap (read amp statistics)
Summary:
Add ReadOptions::read_amp_bytes_per_bit option which allow us to create a bitmap for every data block we read
the bitmap will contain (block_size / read_amp_bytes_per_bit) bits.

We will use this bitmap to mark which bytes have been used of the block so we can calculate the read amplification

Test Plan: added new tests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: yiwu, leveldb, march, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58707
2016-08-26 18:55:58 -07:00
Islam AbdelRahman
cce702a6e4 [db_bench] Support single benchmark arguments (Repeat for X times, Warm up for X times), Support CombinedStats (AVG / MEDIAN)
Summary:
This diff allow us to run a single benchmark X times and warm it up for Y times. and see the AVG & MEDIAN throughput of these X runs
for example

```
$ ./db_bench --benchmarks="fillseq,readseq[X5-W2]"
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 4.12
Date:       Wed Aug 24 10:45:26 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-8616/dbbench]
fillseq      :       4.695 micros/op 212971 ops/sec;   23.6 MB/s
DB path: [/tmp/rocksdbtest-8616/dbbench]
Warming up benchmark by running 2 times
readseq      :       0.214 micros/op 4677005 ops/sec;  517.4 MB/s
readseq      :       0.212 micros/op 4706834 ops/sec;  520.7 MB/s
Running benchmark for 5 times
readseq      :       0.218 micros/op 4588187 ops/sec;  507.6 MB/s
readseq      :       0.208 micros/op 4816538 ops/sec;  532.8 MB/s
readseq      :       0.213 micros/op 4685376 ops/sec;  518.3 MB/s
readseq      :       0.214 micros/op 4676787 ops/sec;  517.4 MB/s
readseq      :       0.217 micros/op 4618532 ops/sec;  510.9 MB/s
readseq [AVG    5 runs] : 4677084 ops/sec;  517.4 MB/sec
readseq [MEDIAN 5 runs] : 4676787 ops/sec;  517.4 MB/sec
```

Test Plan: run db_bench

Reviewers: sdong, andrewkr, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62235
2016-08-25 12:57:35 -07:00
Yi Wu
d76ddf327d Disable ClockCache db_crashtest
Summary: Tempororily disable clock cache in db_crashtest while we investigate data race issue with clock cache.

Test Plan:
    python ./tools/db_crashtest.py blackbox

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62481
2016-08-24 14:19:05 -07:00
Yi Wu
4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
Mark Callaghan
4fe12baa66 Make db_bench less space for --stats_per_interval
Summary:
Changes compaction IO stats to be printed once per interval rather
than once per interval per thread. https://github.com/facebook/rocksdb/issues/1276

Task ID: #12698508

Test Plan: run db_bench

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62067
2016-08-15 11:07:05 -07:00
sdong
8b79422b52 [Proof-Of-Concept] RocksDB Blob Storage with a blob log file.
Summary:
This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB.

The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc.

Test Plan: Add unit tests.

Reviewers: arahut

Reviewed By: arahut

Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61485
2016-08-10 17:05:17 -07:00
sdong
f35b16f246 db_bench add an option of --base_background_compactions
Summary: As title.

Test Plan: Run the benchmark with and without the parameter.

Reviewers: yiwu, andrewkr, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61491
2016-08-04 11:06:00 -07:00
omegaga
64046e581c Write a benchmark to emulate time series data
Summary: Add a benchmark to `db_bench`. In this benchmark, a write thread will populate time series data in the format of 'id | timestamp', and multiple read threads will randomly retrieve all data from one id at a time.

Test Plan: Run the benchmark: `num=134217728;bpl=536870912;mb=67108864;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4;delay=8;stop=12;wbn=3;mbc=20;wbs=134217728;dds=0;sync=0;t=32;vs=800;bs=4096;cs=17179869184;of=500000;wps=0;si=10000000; kir=100000; dir=/data/users/jhli/test/; ./db_bench --benchmarks=timeseries --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$num --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=6 --open_files=$of --verify_checksum=1 --db=$dir --sync=$sync --disable_wal=0 --compression_type=none --stats_interval=$si --compression_ratio=1 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=0 --key_id_range=$kir`

Reviewers: andrewkr, sdong

Reviewed By: sdong

Subscribers: lgalanis, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60651
2016-08-01 12:28:51 -07:00
Islam AbdelRahman
557748ff7b Fix db_stress failure (pass merge_operator even if not used)
Summary:
db_stress test is now failing because of this scenario
- run db_stress with merge_operator enabled (now we have a db with merge operands)
- run db_stress with merge_operator disabled (now when we fail to open the db)

the solution is to pass the merge_operator to the DB even if we are not going to do any merge operations

Test Plan: Check the failure

Reviewers: andrewkr, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61311
2016-07-29 11:02:03 -07:00
sdong
d4c45428af db_stress shouldn't assert file size 0 if file creation fails
Summary: OnTableFileCreated() now is also called when the file creaion fails. In that case, we shouldn't assert the file size is not 0.

Test Plan: Run crash test

Reviewers: yiwu, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61137
2016-07-27 12:08:16 -07:00
sdong
e5b5f12b81 Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60513
2016-07-26 18:15:11 -07:00
Andrew Kryczka
bbd6a5a184 ldb restore subcommand
Summary:
- Added a new subcommand, 'ldb restore', that restores from backup
- Made backup_env_uri optional (also for 'ldb backup') because it can use db_env when backup_env isn't provided

Test Plan:
verify backup and restore commands work:

  $ ./ldb backup --db=/data/users/andrewkr/rocksdb/db_bench-out/ --num_threads 1 --backup_dir=/data/users/andrewkr/rocksdb/db_bench-out/backup/
  $ ./ldb restore --db=/data/users/andrewkr/rocksdb/db_bench-out/restore/ --num_threads 1 --backup_dir=/data/users/andrewkr/rocksdb/db_bench-out/backup/

Reviewers: sdong, wanning

Reviewed By: wanning

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60849
2016-07-26 11:13:26 -07:00
zhang-tong
b2a8016df1 Update db_bench_tool.cc (#1239)
* Update db_bench_tool.cc

* Update db_bench_tool.cc
2016-07-25 14:50:44 -07:00
sdong
f85df120f2 Re-enable tsan crash white-box test with reduced killing odds
Summary:
Tsan crash white-box test was disabled because it never ends. Re-enable it with reduced killing odds.
Add a parameter in crash test script to allow we pass it through an environment variable.

Test Plan: Run it manually.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61053
2016-07-22 12:51:27 -07:00
Peter (Stig) Edwards
b06ca5f860 ldb load, prefer ifsteam(/dev/stdin) to std::cin (#1207)
getline on std::cin can be very inefficient when ldb is loading large values, with high CPU usage in libc _IO_(un)getc, this is because of the performance penalty that comes from synchronizing stdio and iostream buffers.
See the reproducers and tests in #1133 .
If an ifstream on /dev/stdin is used (when available) then using ldb to load large values can be much more efficient.
I thought for ldb load, that this approach is preferable to using <cstdio> or std::ios_base::sync_with_stdio(false).
I couldn't think of a use case where ldb load would need to support reading unbuffered input, an alternative approach would be to add support for passing --input_file=/dev/stdin.
I have a CLA in place, thanks.

The CI tests were failing at the time of https://github.com/facebook/rocksdb/pull/1156, so this change and PR will supersede it.
2016-07-22 11:46:40 -07:00
Islam AbdelRahman
68a8e6b8fa Introduce FullMergeV2 (eliminate memcpy from merge operators)
Summary:
This diff update the code to pin the merge operator operands while the merge operation is done, so that we can eliminate the memcpy cost, to do that we need a new public API for FullMerge that replace the std::deque<std::string> with std::vector<Slice>

This diff is stacked on top of D56493 and D56511

In this diff we
- Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
- Replace std::deque<std::string> with std::vector<Slice> to pass operands
- Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
- Allow FullMergeV2 output to be an existing operand

```
[Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s

[master]
readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
```

```
[Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s

[master]
readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s

```

```
[Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]

[FullMergeV2]
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s

[master]
readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
```

```
[Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions

[FullMergeV2]
readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s

[master]
readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
```

Test Plan: COMPILE_WITH_ASAN=1 make check -j64

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: lovro, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57075
2016-07-20 09:49:03 -07:00
John Alexander
9430333f84 New Statistics to track Compression/Decompression (#1197)
* Added new statistics and refactored to allow ioptions to be passed around as required to access environment and statistics pointers (and, as a convenient side effect, info_log pointer).

* Prevent incrementing compression counter when compression is turned off in options.

* Prevent incrementing compression counter when compression is turned off in options.

* Added two more supported compression types to test code in db_test.cc

* Prevent incrementing compression counter when compression is turned off in options.

* Added new StatsLevel that excludes compression timing.

* Fixed casting error in coding.h

* Fixed CompressionStatsTest for new StatsLevel.

* Removed unused variable that was breaking the Linux build
2016-07-19 09:44:03 -07:00
Wanning Jiang
880ee363eb ldb backup support
Summary: add backup support for ldb tool, and use it to run load test for backup on two HDFS envs

Test Plan: first generate some db, then compile against load test in fbcode, run load_test --db=<db path> backup --backup_env_uri=<URI of backup env> --backup_dir=<backup directory> --num_threads=<number of thread>

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60633
2016-07-14 14:09:31 -07:00
Yi Wu
296545a2c7 Fix clang analyzer errors
Summary:
Fixing erros reported by clang static analyzer.
* Removing some unused variables.
* Adding assertions to fix false positives reported by clang analyzer.
* Adding `__clang_analyzer__` macro to suppress false positive warnings.

Test Plan:
    USE_CLANG=1 OPT=-g make analyze -j64

Reviewers: andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60549
2016-07-08 17:50:51 -07:00
sdong
32df9733d1 Add options.write_buffer_manager: control total memtable size across DB instances
Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances.

Test Plan: Add a new unit test.

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59925
2016-07-05 18:11:25 -07:00
John Alexander
43692793e3 Fixed Minor Bug on Windows Build and db_bench_tool.cc (#1189)
* Fixed Windows build error in CMakeLists.txt and perf_level error in db_bench_tool.cc

* Changed hard-coded perf levels in db_bench_tool.cc to enum values from perf_level.h

* Replaced remaining FLAGS_perf_level > 0
2016-06-29 18:44:22 -07:00
Yueh-Hsuan Chiang
faa7eb3b99 Improve regression_test.sh
Summary:
This diff makes the following improvement in regression_test.sh:

1. Add NUM_OPS and DELETE_TEST_PATH to regression_test.sh:

  * NUM_OPS:  The number of operations that will be issued
    in EACH thread.
    Default: $NUM_KEYS / $NUM_THREADS

  * DELETE_TEST_PATH: If true, then the test directory
    will be deleted after the script ends.
    Default: 0
2. Add more information in SUMMARY.csv
3. Fix a bug in regression_test.sh where each thread in fillseq will all issue $NUM_KEYS writes.
4. Add --deletes in db_bench, which allows us to control the number of deletes instead of must using FLAGS_num.

Test Plan: run regression test with and without DELETE_TEST_PATH and NUM_OPS

Reviewers: yiwu, sdong, IslamAbdelRahman, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60039
2016-06-28 14:10:24 -07:00
Islam AbdelRahman
d6b79e2fd0 Remove filter_deletes from crash_test
Summary: filter_deletes option was removed, remove it from crash_test to fix it

Test Plan: make crash_test

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59901
2016-06-21 17:30:21 -07:00
sdong
7b79238b65 Deprectate filter_deletes
Summary: filter_deltes is not a frequently used feature. Remove it.

Test Plan: Run all test suites.

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59427
2016-06-17 10:30:47 -07:00
Andrew Kryczka
ca3db54788 Fetch branches from github for format compatibility test
Summary:
The sandcastle setup doesn't provide a remote with our branches. Need
to fetch them directly from github.

Test Plan:
ran script on devserver and the relevant commands on a sandcastle
host.

Reviewers: IslamAbdelRahman, lightmark, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59511
2016-06-10 13:17:14 -07:00