Commit Graph

518 Commits

Author SHA1 Message Date
Aaron Gao
78cb195595 add checkpoint to ldb
Summary: Closes https://github.com/facebook/rocksdb/pull/2017

Differential Revision: D4747656

Pulled By: lightmark

fbshipit-source-id: c52f160
2017-03-21 11:54:11 -07:00
Aaron Gao
93c68b642e change regression bash file with debug mode
Summary:
add debug mode for better debugging
refactor some regex
change micros/op to ops/sec
Closes https://github.com/facebook/rocksdb/pull/1999

Differential Revision: D4742806

Pulled By: lightmark

fbshipit-source-id: 0fe3ae6
2017-03-20 17:39:17 -07:00
Andrew Kryczka
e66221add4 fix db_bench rate limiter callsites
Summary:
pass nullptr as stats object for db_bench-specific rate limiters since its stats are intended to capture background write activity only.
Closes https://github.com/facebook/rocksdb/pull/1997

Differential Revision: D4726806

Pulled By: ajkr

fbshipit-source-id: 8e4b225
2017-03-16 17:54:12 -07:00
Maysam Yabandeh
11526252cc Pinnableslice (2nd attempt)
Summary:
PinnableSlice

    Summary:
    Currently the point lookup values are copied to a string provided by the
    user. This incures an extra memcpy cost. This patch allows doing point lookup
    via a PinnableSlice which pins the source memory location (instead of
    copying their content) and releases them after the content is consumed
    by the user. The old API of Get(string) is translated to the new API
    underneath.

    Here is the summary for improvements:

    value 100 byte: 1.8% regular, 1.2% merge values
    value 1k byte: 11.5% regular, 7.5% merge values
    value 10k byte: 26% regular, 29.9% merge values
    The improvement for merge could be more if we extend this approach to
    pin the merge output and delay the full merge operation until the user
    actually needs it. We have put that for future work.

    PS:
    Sometimes we observe a small decrease in performance when switching from
    t5452014 to this patch but with the old Get(string) API. The d
Closes https://github.com/facebook/rocksdb/pull/1756

Differential Revision: D4391738

Pulled By: maysamyabandeh

fbshipit-source-id: 6f3edd3
2017-03-13 11:54:10 -07:00
Reid Horuff
ebd5639b6d Add ability to search for key prefix in sst_dump tool
Summary:
Add the flag --prefix to the sst_dump tool
This flag is similar to, and exclusive from, the --from flag.

--prefix=0x00FF will return all rows prefixed with 0x00FF.
The --to flag may also be specified and will work as expected.

These changes were used to help in debugging the power cycle corruption issue and theses changes were tested by scanning through a udb.
Closes https://github.com/facebook/rocksdb/pull/1984

Differential Revision: D4691814

Pulled By: reidHoruff

fbshipit-source-id: 027f261
2017-03-13 10:39:12 -07:00
Maysam Yabandeh
5dae019477 Revert "Report cpu usage using time command"
Summary:
This reverts commit d43adf21bb.

The patch has caused problems in regression tests. Will revert it for now until we figure how to debug the problems regression tests.
Closes https://github.com/facebook/rocksdb/pull/1975

Differential Revision: D4682880

Pulled By: maysamyabandeh

fbshipit-source-id: 84df83a
2017-03-09 11:09:13 -08:00
Maysam Yabandeh
d43adf21bb Report cpu usage using time command
Summary:
It augments the regression benchmarks with a time command, parses the output, and print them to the SUMMARY.csv file.

I tested a variation of the script locally. Any idea how to do run a test that also involves writing to scuba tables?
Closes https://github.com/facebook/rocksdb/pull/1967

Differential Revision: D4679470

Pulled By: maysamyabandeh

fbshipit-source-id: 44dac30
2017-03-08 17:54:11 -08:00
Andrew Kryczka
2a5daa06f0 Add stderr log level for ldb backup commands
Summary:
Also extracted the common logic into a base class, BackupableCommand.
Closes https://github.com/facebook/rocksdb/pull/1939

Differential Revision: D4630121

Pulled By: ajkr

fbshipit-source-id: 04bb067
2017-03-03 13:24:15 -08:00
Islam AbdelRahman
f89b3893c0 Remove skip_table_builder_flush and default it to true
Summary:
This option is needed to be enabled for Direct IO
and I cannot think of a reason where we need to disable it

remove it and default it to true
Closes https://github.com/facebook/rocksdb/pull/1944

Differential Revision: D4641088

Pulled By: IslamAbdelRahman

fbshipit-source-id: d7085b9
2017-03-02 16:54:10 -08:00
Andrew Kryczka
cc253982dd Use more default options in db_bench
Summary:
The default behavior was too weird because, previously, we got the L0 file size limit (64MB) from Options default and L1+ file size limit (2MB) from the hardcoded value. We should get both from Options default.
Closes https://github.com/facebook/rocksdb/pull/1943

Differential Revision: D4640301

Pulled By: ajkr

fbshipit-source-id: fd8c0fd
2017-03-02 10:54:11 -08:00
Islam AbdelRahman
be3e5568be Fix unaligned reads in read cache
Summary:
- Fix unaligned reads in read cache by using RandomAccessFileReader
- Allow read cache flags in db_bench
Closes https://github.com/facebook/rocksdb/pull/1916

Differential Revision: D4610885

Pulled By: IslamAbdelRahman

fbshipit-source-id: 2aa1dc8
2017-02-27 13:09:12 -08:00
Aaron Gao
e7d902e693 add direct_io and compaction_readahead_size in db_stress
Summary:
add direct_io and compaction_readahead_size in db_stress
test direct_io under db_stress with compaction_readahead_size enabled to capture bugs found in production.
`./db_stress --allow_concurrent_memtable_write=0 --use_direct_reads --use_direct_writes --compaction_readahead_size=4096`
Closes https://github.com/facebook/rocksdb/pull/1906

Differential Revision: D4604514

Pulled By: IslamAbdelRahman

fbshipit-source-id: ebbf0ee
2017-02-23 11:39:14 -08:00
Sagar Vemuri
eb912a927e Remove disableDataSync option
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859

Differential Revision: D4541292

Pulled By: sagar0

fbshipit-source-id: 5b3a6ca
2017-02-13 11:09:13 -08:00
Andrew Kryczka
3b4ac8076b Clarify ldb column family argument
Summary:
move the argument description to the right section, make it clearer that the '=' sign is required, and use it in one subcommand where it seemed forgotten before.
Closes https://github.com/facebook/rocksdb/pull/1840

Differential Revision: D4515096

Pulled By: ajkr

fbshipit-source-id: 7b5b1c1
2017-02-07 11:54:10 -08:00
Andrew Kryczka
d70ce7ee0b Move db_bench flags out of unnamed namespace
Summary:
I want to be able to, e.g., DECLARE_string(statistics_string); in my application such that I can override the default value of statistics_string. For this to work, we need to remove the unnamed namespace containing all the flags, and make sure all variables/functions covered by that namespace are static.

Replaces #1828 due to internal tool issues.
Closes https://github.com/facebook/rocksdb/pull/1844

Differential Revision: D4515124

Pulled By: ajkr

fbshipit-source-id: 23b695e
2017-02-07 11:39:12 -08:00
Dmitri Smirnov
0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread.  On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Andrew Kryczka
37d4a79e99 Deserialize custom Statistics object in db_bench
Summary:
Added -statistics_string to deserialize a Statistics object using the factory functions registered by applications.
Closes https://github.com/facebook/rocksdb/pull/1812

Differential Revision: D4469811

Pulled By: ajkr

fbshipit-source-id: 2d80862
2017-01-26 15:24:16 -08:00
Andrew Kryczka
17c1180603 Generalize Env registration framework
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.

Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.

This diff generalizes the Env registration logic to work with arbitrary types.

- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file as that's Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
Closes https://github.com/facebook/rocksdb/pull/1776

Differential Revision: D4421933

Pulled By: ajkr

fbshipit-source-id: 34647d1
2017-01-25 16:09:14 -08:00
Maysam Yabandeh
d0ba8ec8f9 Revert "PinnableSlice"
Summary:
This reverts commit 54d94e9c2c.

The pull request was landed by mistake.
Closes https://github.com/facebook/rocksdb/pull/1755

Differential Revision: D4391678

Pulled By: maysamyabandeh

fbshipit-source-id: 36d5149
2017-01-08 14:24:12 -08:00
Maysam Yabandeh
54d94e9c2c PinnableSlice
Summary:
Currently the point lookup values are copied to a string provided by the user.
This incures an extra memcpy cost. This patch allows doing point lookup
via a PinnableSlice which pins the source memory location (instead of
copying their content) and releases them after the content is consumed
by the user. The old API of Get(string) is translated to the new API
underneath.

 Here is the summary for improvements:
 1. value 100 byte: 1.8%  regular, 1.2% merge values
 2. value 1k   byte: 11.5% regular, 7.5% merge values
 3. value 10k byte: 26% regular,    29.9% merge values

 The improvement for merge could be more if we extend this approach to
 pin the merge output and delay the full merge operation until the user
 actually needs it. We have put that for future work.

PS:
Sometimes we observe a small decrease in performance when switching from
t5452014 to this patch but with the old Get(string) API. The difference
is a little and could be noise. More importantly it is safely
cancelled
Closes https://github.com/facebook/rocksdb/pull/1732

Differential Revision: D4374613

Pulled By: maysamyabandeh

fbshipit-source-id: a077f1a
2017-01-08 13:54:13 -08:00
Dmitri Smirnov
e04480faed Fix MS warnings. Use ROCKSDB_Prsz for size_t.
Summary: Closes https://github.com/facebook/rocksdb/pull/1737

Differential Revision: D4378852

Pulled By: yiwu-arbug

fbshipit-source-id: ba8b02d
2017-01-06 16:24:14 -08:00
Yi Wu
640d724808 Update db_bench and sst_dump to test with block cache mid-point inser?
Summary:
?tion

Add flags in db_bench to test with block cache mid-point insertion.
Also update sst_dump to dump total block sizes of each type. I find it
useful to look at these test db stats and I don't know if we have them
elsewhere.
Closes https://github.com/facebook/rocksdb/pull/1706

Differential Revision: D4355812

Pulled By: yiwu-arbug

fbshipit-source-id: 3e4a348
2017-01-03 18:39:14 -08:00
Aaron Gao
972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
Siying Dong
7bd725e962 db_bench: introduce --benchmark_read_rate_limit
Summary:
Add the parameter in db_bench to help users to measure latency histogram with constant read rate.
Closes https://github.com/facebook/rocksdb/pull/1683

Differential Revision: D4341387

Pulled By: siying

fbshipit-source-id: 1b4b276
2016-12-19 12:39:11 -08:00
Daniel Black
816c1e30ca gcc-7 requires include <functional> for std::function
Summary:
Fixes compile error:

In file included from ./util/statistics.h:17:0,
                 from ./util/stop_watch.h:8,
                 from ./util/perf_step_timer.h:9,
                 from ./util/iostats_context_imp.h:8,
                 from ./util/posix_logger.h:27,
                 from ./port/util_logger.h:18,
                 from ./db/auto_roll_logger.h:15,
                 from db/auto_roll_logger.cc:6:
./util/thread_local.h:65:16: error: 'function' in namespace 'std' does not name a template type
   typedef std::function<void(void*, void*)> FoldFunc;
Closes https://github.com/facebook/rocksdb/pull/1656

Differential Revision: D4318702

Pulled By: yiwu-arbug

fbshipit-source-id: 8c5d17a
2016-12-16 11:24:18 -08:00
Daniel Black
0ab6fc167f Gcc-7 buffer size insufficient
Summary:
Bunch of commits related to insufficient buffer size. Errors in individual commits.
Closes https://github.com/facebook/rocksdb/pull/1673

Differential Revision: D4332127

Pulled By: IslamAbdelRahman

fbshipit-source-id: 878f73c
2016-12-14 19:24:26 -08:00
Islam AbdelRahman
d71e728c7a Print user collected properties in sst_dump
Summary:
Include a dump of user_collected_properties in sst_dump
Closes https://github.com/facebook/rocksdb/pull/1668

Differential Revision: D4325078

Pulled By: IslamAbdelRahman

fbshipit-source-id: 226b6d6
2016-12-14 11:24:11 -08:00
Daniel Black
1f6f7e3e89 cast to signed char in ldb_cmd_test for ppc64le
Summary:
char is unsigned on power by default causing this test to fail with the FF case. ppc64 return 255 while x86 returned -1. Casting works on both platforms.
Closes https://github.com/facebook/rocksdb/pull/1500

Differential Revision: D4308775

Pulled By: yiwu-arbug

fbshipit-source-id: db3e6e0
2016-12-12 14:39:18 -08:00
Andrew Kryczka
dc64f46b1c Add db_bench option for stderr logging
Summary:
The info log file ("LOG") is stored in the db directory by default. When the db is on a distributed env, this is unnecessarily slow. So, I added an option to db_bench to just print the info log messages to stderr.
Closes https://github.com/facebook/rocksdb/pull/1641

Differential Revision: D4309348

Pulled By: ajkr

fbshipit-source-id: 1e6f851
2016-12-09 16:54:14 -08:00
Andrew Kryczka
5241e0dbfc fix db_bench argument type
Summary: Closes https://github.com/facebook/rocksdb/pull/1633

Differential Revision: D4298161

Pulled By: yiwu-arbug

fbshipit-source-id: 2c7af35
2016-12-08 11:09:14 -08:00
Andrew Kryczka
2c2ba68247 db_stress support for range deletions
Summary:
made db_stress capable of adding range deletions to its db and verifying their correctness. i'll make db_crashtest.py use this option later once the collapsing optimization (https://github.com/facebook/rocksdb/pull/1614) is committed because currently it slows down the test too much.
Closes https://github.com/facebook/rocksdb/pull/1625

Differential Revision: D4293939

Pulled By: ajkr

fbshipit-source-id: d3beb3a
2016-12-07 13:09:24 -08:00
Andrew Kryczka
fa50fffaf5 Option to expand range tombstones in db_bench
Summary:
When enabled, this option replaces range tombstones with a sequence of
point tombstones covering the same range. This can be used to A/B test perf of
range tombstones vs sequential point tombstones, and help us find the cross-over
point, i.e., the size of the range above which range tombstones outperform point
tombstones.
Closes https://github.com/facebook/rocksdb/pull/1594

Differential Revision: D4246312

Pulled By: ajkr

fbshipit-source-id: 3b00b23
2016-12-06 18:09:14 -08:00
Yi Wu
56281f3a97 Add memtable_insert_with_hint_prefix_size option to db_bench
Summary:
Add memtable_insert_with_hint_prefix_size option to db_bench
Closes https://github.com/facebook/rocksdb/pull/1604

Differential Revision: D4260549

Pulled By: yiwu-arbug

fbshipit-source-id: cee5ef7
2016-12-01 16:54:16 -08:00
Islam AbdelRahman
4a21b1402c Cache heap::downheap() root comparison (optimize heap cmp call)
Summary:
Reduce number of comparisons in heap by caching which child node in the first level is smallest (left_child or right_child)
So next time we can compare directly against the smallest child

I see that the total number of calls to comparator drops significantly when using this optimization

Before caching (~2mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000  --perf_level=2
readseq      :       0.338 micros/op 2959201 ops/sec;  327.4 MB/s user_key_comparison_count = 2000008
```
After caching (~1mil key comparison for iterating the DB)
```
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq" --db="/dev/shm/heap_opt" --use_existing_db --disable_auto_compactions --cache_size=1000000000 --perf_level=2
readseq      :       0.309 micros/op 3236801 ops/sec;  358.1 MB/s user_key_comparison_count = 1000011
```

It also improves
Closes https://github.com/facebook/rocksdb/pull/1600

Differential Revision: D4256027

Pulled By: IslamAbdelRahman

fbshipit-source-id: 76fcc66
2016-12-01 13:39:14 -08:00
Igor Canadi
3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Andrew Kryczka
fd43ee09da Range deletion microoptimizations
Summary:
- Made RangeDelAggregator's InternalKeyComparator member a reference-to-const so we don't need to copy-construct it. Also added InternalKeyComparator to ImmutableCFOptions so we don't need to construct one for each DBIter.
- Made MemTable::NewRangeTombstoneIterator and the table readers' NewRangeTombstoneIterator() functions return nullptr instead of NewEmptyInternalIterator to avoid the allocation. Updated callers accordingly.
Closes https://github.com/facebook/rocksdb/pull/1548

Differential Revision: D4208169

Pulled By: ajkr

fbshipit-source-id: 2fd65cf
2016-11-21 12:24:13 -08:00
Andrew Kryczka
018bb2ebf5 DeleteRange support for db_bench
Summary:
Added a few options to configure when to add range tombstones during
any benchmark involving writes.
Closes https://github.com/facebook/rocksdb/pull/1522

Differential Revision: D4187388

Pulled By: ajkr

fbshipit-source-id: 2c8a473
2016-11-15 17:39:47 -08:00
Andrew Kryczka
53b693f5fe ldb support for range delete
Summary:
Add a subcommand to ldb with which we can delete a range of keys.
Closes https://github.com/facebook/rocksdb/pull/1521

Differential Revision: D4186338

Pulled By: ajkr

fbshipit-source-id: b8e9861
2016-11-15 15:54:20 -08:00
Lijun Tang
adb665e0bf Allowed delayed_write_rate option to be dynamically set.
Summary: Closes https://github.com/facebook/rocksdb/pull/1488

Differential Revision: D4157784

Pulled By: siying

fbshipit-source-id: f150081
2016-11-12 15:54:11 -08:00
Anirban Rahut
14c0380e78 Convenience option to parse an internal key on command line
Summary:
enhancing sst_dump to be able to parse internal key
Closes https://github.com/facebook/rocksdb/pull/1482

Differential Revision: D4154175

Pulled By: siying

fbshipit-source-id: b0e28b1
2016-11-10 10:09:21 -08:00
Peter (Stig) Edwards
144cdb8f16 16384 as e.g .value for compression_max_dict_bytes
Summary:
Use 16384 as e.g .value for ldb the --compression_max_dict_bytes option.
I think 14 was copy and pasted from the options in the lines above.
Closes https://github.com/facebook/rocksdb/pull/1483

Differential Revision: D4154393

Pulled By: siying

fbshipit-source-id: ef53a69
2016-11-09 11:24:20 -08:00
Andrew Kryczka
9e7cf3469b DeleteRange user iterator support
Summary:
Note: reviewed in  https://reviews.facebook.net/D65115

- DBIter maintains a range tombstone accumulator. We don't cleanup obsolete tombstones yet, so if the user seeks back and forth, the same tombstones would be added to the accumulator multiple times.
- DBImpl::NewInternalIterator() (used to make DBIter's underlying iterator) adds memtable/L0 range tombstones, L1+ range tombstones are added on-demand during NewSecondaryIterator() (see D62205)
- DBIter uses ShouldDelete() when advancing to check whether keys are covered by range tombstones
Closes https://github.com/facebook/rocksdb/pull/1464

Differential Revision: D4131753

Pulled By: ajkr

fbshipit-source-id: be86559
2016-11-04 12:09:22 -07:00
Benoit Girard
2b16d664cb Change max_bytes_for_level_multiplier to double
Summary: Closes https://github.com/facebook/rocksdb/pull/1427

Differential Revision: D4094732

Pulled By: yiwu-arbug

fbshipit-source-id: b9b79e9
2016-11-01 21:09:23 -07:00
Siying Dong
f9eb56791a db_bench: --dump_malloc_stats takes no effect
Summary:
Fix the bug that --dump_malloc_stats is set before opening the DB.
Closes https://github.com/facebook/rocksdb/pull/1447

Differential Revision: D4106001

Pulled By: siying

fbshipit-source-id: 4e746da
2016-10-31 14:54:26 -07:00
Kien-hung Li
eeb27e1bbd Add handy option to turn on direct I/O in db_bench (#1424) 2016-10-28 10:36:05 -07:00
Yueh-Hsuan Chiang
f83cd64c0b Fix a bug that mistakenly disable regression_test.sh to update commit (#1415)
Summary:
Fix a bug that mistakenly disable regression_test.sh to update commit

Test Plan:
regression_test.sh
2016-10-21 17:26:24 -07:00
Aaron Gao
08616b4933 [db_bench] add filldeterministic (Universal+level compaction)
Summary:
in db_bench, we can dynamically create a rocksdb database that guarantees the shape of its LSM.
universal + level compaction
no fifo compaction
no multi db support

Test Plan:
./db_bench -benchmarks=fillseqdeterministic -compaction_style=1 -num_levels=3 --disable_auto_compactions -num=1000000 -value_size=1000
```
---------------------- LSM ---------------------
Level[0]: /000480.sst(size: 35060275 bytes)
Level[0]: /000479.sst(size: 70443197 bytes)
Level[0]: /000478.sst(size: 141600383 bytes)
Level[1]: /000341.sst - /000475.sst(total size: 284726629 bytes)
Level[2]: /000071.sst - /000340.sst(total size: 568649806 bytes)
fillseqdeterministic :      60.447 micros/op 16543 ops/sec;   16.0 MB/s
```

Reviewers: sdong, andrewkr, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63111
2016-10-18 16:30:57 -07:00
dhruba borthakur
5027dd17a7 Fix a minor bug in the ldb tool that was not selecting the specified (#1399)
column family for compaction.
2016-10-17 10:40:30 -07:00
Siying Dong
a249a0b75b check_format_compatible.sh to use some branch which allows to run with GCC 4.8 (#1393)
Summary:
Some older tags don't run GCC 4.8 with FB internal setting. Fixed them and created branches. Change the format compatible script accordingly.

Also add more releases to check format compatibility.
2016-10-13 16:15:55 -07:00
Kefu Chai
21f4bb5a89 cmake support for linux and osx (#1358)
* enable cmake to work on linux and osx also

* port part of build_detect_platform not covered by thirdparty.inc
  to cmake.
  - detect fallocate()
  - detect malloc_usable_size()
  - detect JeMalloc
  - detect snappy
* check for asan,tsan,ubsan
* create 'build_version.cc' in build directory.
* add `check` target to support 'make check'.
* add `tools` target to match its counterpart in Makefile.
* use `date` on non-win32 platforms.
* pass different cflags on non-win32 platforms
* detect pthead library using FindThread cmake module.
* enable CMP0042 to silence the cmake warning on osx
* reorder the linked libraries. because testutillib references gtest, to
  enable the linker to find the referenced symbols, we need to put gtest
  after testutillib.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>

* hash_table_bench.cc: fix build without gflags

Signed-off-by: Kefu Chai <kchai@redhat.com>

* remove gtest from librocksdb linkage

testharness.cc is included in librocksdb sources, and it uses gtest. but
gtest is not supposed to be part of the public API of librocksdb. so, in
this change, the testharness.cc is moved out out librocksdb, and is
built as an object target, then linked with the tools and tests instead.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-09-28 11:53:15 -07:00
Yi Wu
9ed928e7a9 Split DBOptions into ImmutableDBOptions and MutableDBOptions
Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs.

Test Plan:
  make all check

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64065
2016-09-23 16:34:04 -07:00
rockeet
4c3f4496b5 Add TableBuilderOptions::level and relevant changes (#1335) 2016-09-17 22:30:43 -07:00
sdong
607628d349 Support ZSTD with finalized format
Summary:
ZSTD 1.0.0 is coming. We can finally add a support of ZSTD without worrying about compatibility.
Still keep ZSTDNotFinal for compatibility reason.

Test Plan: Run all tests. Run db_bench with ZSTD version with RocksDB built with ZSTD 1.0 and older.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: cyan, igor, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63141
2016-09-06 12:22:16 -07:00
Yi Wu
a88677d2cf Remove ImmutableCFOptions from public API
Summary: There's no reference to ImmutableCFOptions elsewhere in /include/rocksdb. ImmutableCFOptions was introduced in this commit (5665e5e285) but later its reference in /include/rocksdb/table.h is removed.

Test Plan:
  make all check

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63177
2016-09-02 14:16:31 -07:00
Islam AbdelRahman
5051755e35 Fix db_bench memory use after free (detected by clang_analyze)
Summary: Fix using `arg[i].thread` after deleting it

Test Plan: run clang_analyze

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D63171
2016-09-02 10:58:08 -07:00
sdong
32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
Islam AbdelRahman
b49b92cf28 Introduce Read amplification bitmap (read amp statistics)
Summary:
Add ReadOptions::read_amp_bytes_per_bit option which allow us to create a bitmap for every data block we read
the bitmap will contain (block_size / read_amp_bytes_per_bit) bits.

We will use this bitmap to mark which bytes have been used of the block so we can calculate the read amplification

Test Plan: added new tests

Reviewers: andrewkr, yhchiang, sdong

Reviewed By: sdong

Subscribers: yiwu, leveldb, march, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58707
2016-08-26 18:55:58 -07:00
Islam AbdelRahman
cce702a6e4 [db_bench] Support single benchmark arguments (Repeat for X times, Warm up for X times), Support CombinedStats (AVG / MEDIAN)
Summary:
This diff allow us to run a single benchmark X times and warm it up for Y times. and see the AVG & MEDIAN throughput of these X runs
for example

```
$ ./db_bench --benchmarks="fillseq,readseq[X5-W2]"
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 4.12
Date:       Wed Aug 24 10:45:26 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-8616/dbbench]
fillseq      :       4.695 micros/op 212971 ops/sec;   23.6 MB/s
DB path: [/tmp/rocksdbtest-8616/dbbench]
Warming up benchmark by running 2 times
readseq      :       0.214 micros/op 4677005 ops/sec;  517.4 MB/s
readseq      :       0.212 micros/op 4706834 ops/sec;  520.7 MB/s
Running benchmark for 5 times
readseq      :       0.218 micros/op 4588187 ops/sec;  507.6 MB/s
readseq      :       0.208 micros/op 4816538 ops/sec;  532.8 MB/s
readseq      :       0.213 micros/op 4685376 ops/sec;  518.3 MB/s
readseq      :       0.214 micros/op 4676787 ops/sec;  517.4 MB/s
readseq      :       0.217 micros/op 4618532 ops/sec;  510.9 MB/s
readseq [AVG    5 runs] : 4677084 ops/sec;  517.4 MB/sec
readseq [MEDIAN 5 runs] : 4676787 ops/sec;  517.4 MB/sec
```

Test Plan: run db_bench

Reviewers: sdong, andrewkr, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62235
2016-08-25 12:57:35 -07:00
Yi Wu
d76ddf327d Disable ClockCache db_crashtest
Summary: Tempororily disable clock cache in db_crashtest while we investigate data race issue with clock cache.

Test Plan:
    python ./tools/db_crashtest.py blackbox

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62481
2016-08-24 14:19:05 -07:00
Yi Wu
4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
Mark Callaghan
4fe12baa66 Make db_bench less space for --stats_per_interval
Summary:
Changes compaction IO stats to be printed once per interval rather
than once per interval per thread. https://github.com/facebook/rocksdb/issues/1276

Task ID: #12698508

Test Plan: run db_bench

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62067
2016-08-15 11:07:05 -07:00
sdong
8b79422b52 [Proof-Of-Concept] RocksDB Blob Storage with a blob log file.
Summary:
This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB.

The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc.

Test Plan: Add unit tests.

Reviewers: arahut

Reviewed By: arahut

Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61485
2016-08-10 17:05:17 -07:00
sdong
f35b16f246 db_bench add an option of --base_background_compactions
Summary: As title.

Test Plan: Run the benchmark with and without the parameter.

Reviewers: yiwu, andrewkr, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61491
2016-08-04 11:06:00 -07:00
omegaga
64046e581c Write a benchmark to emulate time series data
Summary: Add a benchmark to `db_bench`. In this benchmark, a write thread will populate time series data in the format of 'id | timestamp', and multiple read threads will randomly retrieve all data from one id at a time.

Test Plan: Run the benchmark: `num=134217728;bpl=536870912;mb=67108864;overlap=10;mcz=2;del=300000000;levels=6;ctrig=4;delay=8;stop=12;wbn=3;mbc=20;wbs=134217728;dds=0;sync=0;t=32;vs=800;bs=4096;cs=17179869184;of=500000;wps=0;si=10000000; kir=100000; dir=/data/users/jhli/test/; ./db_bench --benchmarks=timeseries --disable_seek_compaction=1 --mmap_read=0 --statistics=1 --histogram=1 --num=$num --threads=$t --value_size=$vs --block_size=$bs --cache_size=$cs --bloom_bits=10 --cache_numshardbits=6 --open_files=$of --verify_checksum=1 --db=$dir --sync=$sync --disable_wal=0 --compression_type=none --stats_interval=$si --compression_ratio=1 --disable_data_sync=$dds --write_buffer_size=$wbs --target_file_size_base=$mb --max_write_buffer_number=$wbn --max_background_compactions=$mbc --level0_file_num_compaction_trigger=$ctrig --level0_slowdown_writes_trigger=$delay --level0_stop_writes_trigger=$stop --num_levels=$levels --delete_obsolete_files_period_micros=$del --min_level_to_compress=$mcz --max_grandparent_overlap_factor=$overlap --stats_per_interval=1 --max_bytes_for_level_base=$bpl --use_existing_db=0 --key_id_range=$kir`

Reviewers: andrewkr, sdong

Reviewed By: sdong

Subscribers: lgalanis, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60651
2016-08-01 12:28:51 -07:00
Islam AbdelRahman
557748ff7b Fix db_stress failure (pass merge_operator even if not used)
Summary:
db_stress test is now failing because of this scenario
- run db_stress with merge_operator enabled (now we have a db with merge operands)
- run db_stress with merge_operator disabled (now when we fail to open the db)

the solution is to pass the merge_operator to the DB even if we are not going to do any merge operations

Test Plan: Check the failure

Reviewers: andrewkr, yiwu, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61311
2016-07-29 11:02:03 -07:00
sdong
d4c45428af db_stress shouldn't assert file size 0 if file creation fails
Summary: OnTableFileCreated() now is also called when the file creaion fails. In that case, we shouldn't assert the file size is not 0.

Test Plan: Run crash test

Reviewers: yiwu, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61137
2016-07-27 12:08:16 -07:00
sdong
e5b5f12b81 Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60513
2016-07-26 18:15:11 -07:00
Andrew Kryczka
bbd6a5a184 ldb restore subcommand
Summary:
- Added a new subcommand, 'ldb restore', that restores from backup
- Made backup_env_uri optional (also for 'ldb backup') because it can use db_env when backup_env isn't provided

Test Plan:
verify backup and restore commands work:

  $ ./ldb backup --db=/data/users/andrewkr/rocksdb/db_bench-out/ --num_threads 1 --backup_dir=/data/users/andrewkr/rocksdb/db_bench-out/backup/
  $ ./ldb restore --db=/data/users/andrewkr/rocksdb/db_bench-out/restore/ --num_threads 1 --backup_dir=/data/users/andrewkr/rocksdb/db_bench-out/backup/

Reviewers: sdong, wanning

Reviewed By: wanning

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60849
2016-07-26 11:13:26 -07:00
zhang-tong
b2a8016df1 Update db_bench_tool.cc (#1239)
* Update db_bench_tool.cc

* Update db_bench_tool.cc
2016-07-25 14:50:44 -07:00
sdong
f85df120f2 Re-enable tsan crash white-box test with reduced killing odds
Summary:
Tsan crash white-box test was disabled because it never ends. Re-enable it with reduced killing odds.
Add a parameter in crash test script to allow we pass it through an environment variable.

Test Plan: Run it manually.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61053
2016-07-22 12:51:27 -07:00
Peter (Stig) Edwards
b06ca5f860 ldb load, prefer ifsteam(/dev/stdin) to std::cin (#1207)
getline on std::cin can be very inefficient when ldb is loading large values, with high CPU usage in libc _IO_(un)getc, this is because of the performance penalty that comes from synchronizing stdio and iostream buffers.
See the reproducers and tests in #1133 .
If an ifstream on /dev/stdin is used (when available) then using ldb to load large values can be much more efficient.
I thought for ldb load, that this approach is preferable to using <cstdio> or std::ios_base::sync_with_stdio(false).
I couldn't think of a use case where ldb load would need to support reading unbuffered input, an alternative approach would be to add support for passing --input_file=/dev/stdin.
I have a CLA in place, thanks.

The CI tests were failing at the time of https://github.com/facebook/rocksdb/pull/1156, so this change and PR will supersede it.
2016-07-22 11:46:40 -07:00
Islam AbdelRahman
68a8e6b8fa Introduce FullMergeV2 (eliminate memcpy from merge operators)
Summary:
This diff update the code to pin the merge operator operands while the merge operation is done, so that we can eliminate the memcpy cost, to do that we need a new public API for FullMerge that replace the std::deque<std::string> with std::vector<Slice>

This diff is stacked on top of D56493 and D56511

In this diff we
- Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
- Replace std::deque<std::string> with std::vector<Slice> to pass operands
- Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
- Allow FullMergeV2 output to be an existing operand

```
[Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s

[master]
readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
```

```
[Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s

[master]
readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s

```

```
[Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]

[FullMergeV2]
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s

[master]
readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
```

```
[Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions

[FullMergeV2]
readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s

[master]
readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
```

Test Plan: COMPILE_WITH_ASAN=1 make check -j64

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: lovro, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57075
2016-07-20 09:49:03 -07:00
John Alexander
9430333f84 New Statistics to track Compression/Decompression (#1197)
* Added new statistics and refactored to allow ioptions to be passed around as required to access environment and statistics pointers (and, as a convenient side effect, info_log pointer).

* Prevent incrementing compression counter when compression is turned off in options.

* Prevent incrementing compression counter when compression is turned off in options.

* Added two more supported compression types to test code in db_test.cc

* Prevent incrementing compression counter when compression is turned off in options.

* Added new StatsLevel that excludes compression timing.

* Fixed casting error in coding.h

* Fixed CompressionStatsTest for new StatsLevel.

* Removed unused variable that was breaking the Linux build
2016-07-19 09:44:03 -07:00
Wanning Jiang
880ee363eb ldb backup support
Summary: add backup support for ldb tool, and use it to run load test for backup on two HDFS envs

Test Plan: first generate some db, then compile against load test in fbcode, run load_test --db=<db path> backup --backup_env_uri=<URI of backup env> --backup_dir=<backup directory> --num_threads=<number of thread>

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60633
2016-07-14 14:09:31 -07:00
Yi Wu
296545a2c7 Fix clang analyzer errors
Summary:
Fixing erros reported by clang static analyzer.
* Removing some unused variables.
* Adding assertions to fix false positives reported by clang analyzer.
* Adding `__clang_analyzer__` macro to suppress false positive warnings.

Test Plan:
    USE_CLANG=1 OPT=-g make analyze -j64

Reviewers: andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60549
2016-07-08 17:50:51 -07:00
sdong
32df9733d1 Add options.write_buffer_manager: control total memtable size across DB instances
Summary: Add option write_buffer_manager to help users control total memory spent on memtables across multiple DB instances.

Test Plan: Add a new unit test.

Reviewers: yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adela, benj, sumeet, muthu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59925
2016-07-05 18:11:25 -07:00
John Alexander
43692793e3 Fixed Minor Bug on Windows Build and db_bench_tool.cc (#1189)
* Fixed Windows build error in CMakeLists.txt and perf_level error in db_bench_tool.cc

* Changed hard-coded perf levels in db_bench_tool.cc to enum values from perf_level.h

* Replaced remaining FLAGS_perf_level > 0
2016-06-29 18:44:22 -07:00
Yueh-Hsuan Chiang
faa7eb3b99 Improve regression_test.sh
Summary:
This diff makes the following improvement in regression_test.sh:

1. Add NUM_OPS and DELETE_TEST_PATH to regression_test.sh:

  * NUM_OPS:  The number of operations that will be issued
    in EACH thread.
    Default: $NUM_KEYS / $NUM_THREADS

  * DELETE_TEST_PATH: If true, then the test directory
    will be deleted after the script ends.
    Default: 0
2. Add more information in SUMMARY.csv
3. Fix a bug in regression_test.sh where each thread in fillseq will all issue $NUM_KEYS writes.
4. Add --deletes in db_bench, which allows us to control the number of deletes instead of must using FLAGS_num.

Test Plan: run regression test with and without DELETE_TEST_PATH and NUM_OPS

Reviewers: yiwu, sdong, IslamAbdelRahman, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60039
2016-06-28 14:10:24 -07:00
Islam AbdelRahman
d6b79e2fd0 Remove filter_deletes from crash_test
Summary: filter_deletes option was removed, remove it from crash_test to fix it

Test Plan: make crash_test

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59901
2016-06-21 17:30:21 -07:00
sdong
7b79238b65 Deprectate filter_deletes
Summary: filter_deltes is not a frequently used feature. Remove it.

Test Plan: Run all test suites.

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59427
2016-06-17 10:30:47 -07:00
Andrew Kryczka
ca3db54788 Fetch branches from github for format compatibility test
Summary:
The sandcastle setup doesn't provide a remote with our branches. Need
to fetch them directly from github.

Test Plan:
ran script on devserver and the relevant commands on a sandcastle
host.

Reviewers: IslamAbdelRahman, lightmark, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59511
2016-06-10 13:17:14 -07:00
sdong
20699df843 memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio

Test Plan: Run all existing tests

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59199
2016-06-10 12:12:10 -07:00
Andrew Kryczka
a683d4aba9 URI-based Env selection for db_bench
Summary:
Added an option, --env_uri. When provided, it is used as an argument to
NewEnvFromUri(), which instantiates an Env based on it.

Test Plan:
built a simple binary that registers ChrootEnv for prefix "/", then
ran:

  $ ./tmp --env_uri /tmp/ --db /abcde

/tmp/ is the chroot directory and /abcde is the db_name. Then I verified
db_bench uses /tmp/abcde

Reviewers: sdong, kradhakrishnan, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59325
2016-06-09 17:53:03 -07:00
sdong
2ae15b2d50 Format compatible test should cover forward compatibility up to 4.8.
Test Plan: Run the compatibility test

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59385
2016-06-09 10:22:31 -07:00
Andrew Kryczka
5091dfc1ed use branch names in format compatibility test
Summary:
We had to go back and update the g++ path for 4.4.fb-4.8.fb. So the
path is now fixed on the branches, but can't be fixed on the tags since they're
immutable. By making format compatibility tests use branch names (when
available), backported fixes like this will be used without having to re-release.

Also removed v1.5.7 and v2.1 because make fails.

Test Plan:
  $ build_tools/rocksdb-lego-determinator run_format_compatible

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59355
2016-06-08 15:38:04 -07:00
Yueh-Hsuan Chiang
fda098461b Allow regression_test.sh to specify OPTIONS_FILE. Add header comments.
Summary:
This patch does the following improvement for regression_test.sh
* Allow regression_test.sh to specify OPTIONS_FILE.
* Add header comments that includes examples on how to run the script
  and introduce all configurable parameters.
* bug fix.

Test Plan: Run the example commands in the header comments of regression_test.sh

Reviewers: sdong, yiwu, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59175
2016-06-06 22:57:46 -07:00
Yueh-Hsuan Chiang
88acd932f6 Allows db_bench to take an options file
Summary:
This patch allows db_bench to initialize it's RocksDB Options via a
options file, specified by the --options_file flag.  Note that if
--options_file flag is set, then it has higher priority than the
command-line argument.

Test Plan: db_bench_tool_test

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D58533
2016-06-02 16:24:14 -07:00
Yueh-Hsuan Chiang
8cf0f86d39 Allow regression test to run db_bench at a remost host
Summary:
This patch does the following improvement on the regression_test.sh
* allows db_bench being executed at a remost host while storing the
  benchmark results locally.
* kills all db_bench related processes before running db_bench
* better error handling.

Test Plan:
1. Run regression_test.sh both locally and remotely
2. Run multiple regression_test.sh at the same time and make sure
   i. Only one runs successfully.
   ii. The one that runs successfully will kill all other db_bench
       processes before it runs any benchmark.

Reviewers: sdong, yiwu, gunnarku

Reviewed By: gunnarku

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D58611
2016-06-02 02:53:39 -07:00
sdong
c40c4cae14 LDBCommand::SelectCommand to use a struct as the parameter
Summary: The function wrapper for LDBCommand::SelectCommand is too long so that Windows build fails with warning "decorated name length exceeded, name was truncated". Shrink the length by using a struct.

Test Plan: Build on both of Linux and Windows and make sure the warning doesn't show in either platform.

Reviewers: andrewkr, adsharma, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58965
2016-05-31 10:26:53 -07:00
sdong
0e20000171 LDBCommand::InitFromCmdLineArgs() to move from template to function wrapper
Summary:
Build failure with some compiler setting with

tools/reduce_levels_test.cc:97: undefined reference to `rocksdb::LDBCommand* rocksdb::LDBCommand::InitFromCmdLineArgs<rocksdb::LDBCommand* (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, std::vector<std::string, std::allocator<std::string> > const&)>(std::vector<std::string, std::allocator<std::string> > const&, rocksdb::Options const&, rocksdb::LDBOptions const&, std::vector<rocksdb::ColumnFamilyDescriptor, std::allocator<rocksdb::ColumnFamilyDescriptor> > const*, rocksdb::LDBCommand* (*)(std::string const&, std::vector<std::string, std::allocator<std::string> > const&, std::map<std::string, std::string, std::less<std::string>, std::allocator<std::pair<std::string const, std::string> > > const&, std::vector<std::string, std::allocator<std::string> > const&))'

Fix it by changing to function pointer instead

Test Plan: Run all existing tests

Reviewers: andrewkr, kradhakrishnan, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: adsharma, lightmark, yiwu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58905
2016-05-27 11:07:48 -07:00
Islam AbdelRahman
9dd50d9902 Fix db_bench
Summary: Fix simple issue with FLAGS_simcache_size condition

Test Plan: run db_bench

Reviewers: lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58743
2016-05-24 17:27:44 -07:00
Aaron Gao
5d660258e7 add simulator Cache as class SimCache/SimLRUCache(with test)
Summary: add class SimCache(base class with instrumentation api) and SimLRUCache(derived class with detailed implementation) which is used as an instrumented block cache that can predict hit rate for different cache size

Test Plan:
Add a test case in `db_block_cache_test.cc` called `SimCacheTest` to test basic logic of SimCache.
Also add option `-simcache_size` in db_bench. if set with a value other than -1, then the benchmark will use this value as the size of the simulator cache and finally output the simulation result.
```
[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 1000000
RocksDB:    version 4.8
Date:       Tue May 17 16:56:16 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       6.809 micros/op 146874 ops/sec;   16.2 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.343 micros/op 157665 ops/sec;   17.4 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 986559
SimCache HITs:    264760
SimCache HITRATE: 26.84%

[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 10000000
RocksDB:    version 4.8
Date:       Tue May 17 16:57:10 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       5.066 micros/op 197394 ops/sec;   21.8 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.457 micros/op 154870 ops/sec;   17.1 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 1059764
SimCache HITs:    374501
SimCache HITRATE: 35.34%

[gzh@dev9927.prn1 ~/local/rocksdb] ./db_bench -benchmarks "fillseq,readrandom" -cache_size 1000000 -simcache_size 100000000
RocksDB:    version 4.8
Date:       Tue May 17 16:57:32 2016
CPU:        32 * Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz
CPUCache:   20480 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 0
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       5.632 micros/op 177572 ops/sec;   19.6 MB/s
DB path: [/tmp/rocksdbtest-112628/dbbench]
readrandom   :       6.892 micros/op 145094 ops/sec;   16.1 MB/s (1000000 of 1000000 found)

SIMULATOR CACHE STATISTICS:
SimCache LOOKUPs: 1150767
SimCache HITs:    1034535
SimCache HITRATE: 89.90%
```

Reviewers: IslamAbdelRahman, andrewkr, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57999
2016-05-23 23:35:23 -07:00
Aaron Orenstein
2073cf3775 Eliminate use of 'using namespace std'. Also remove a number of ADL references to std functions.
Summary: Reduce use of argument-dependent name lookup in RocksDB.

Test Plan: 'make check' passed.

Reviewers: andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58203
2016-05-20 07:42:18 -07:00
Richard Cairns Jr
f6e404c20a Added "number of merge operands" to statistics in ssts.
Summary:
A couple of notes from the diff:
  - The namespace block I added at the top of table_properties_collector.cc was in reaction to an issue i was having with PutVarint64 and reusing the "val" string.  I'm not sure this is the cleanest way of doing this, but abstracting this out at least results in the correct behavior.
  - I chose "rocksdb.merge.operands" as the property name.  I am open to suggestions for better names.
  - The change to sst_dump_tool.cc seems a bit inelegant to me.  Is there a better way to do the if-else block?

Test Plan:
I added a test case in table_properties_collector_test.cc.  It adds two merge operands and checks to make sure that both of them are reflected by GetMergeOperands.  It also checks to make sure the wasPropertyPresent bool is properly set in the method.

Running both of these tests should pass:
./table_properties_collector_test
./sst_dump_test

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58119
2016-05-19 14:24:48 -07:00
krad
a08c8c851a Added PersistentCache abstraction
Summary:
Added a new abstraction to cache page to RocksDB designed for the read
cache use.

RocksDB current block cache is more of an object cache. For the persistent read cache
project, what we need is a page cache equivalent. This changes adds a cache
abstraction to RocksDB to cache pages called PersistentCache. PersistentCache can cache
uncompressed pages or raw pages (content as in filesystem). The user can
choose to operate PersistentCache either in  COMPRESSED or UNCOMPRESSED mode.

Blame Rev:

Test Plan: Run unit tests

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55707
2016-05-15 22:17:18 -07:00
Arun Sharma
5c06e0814c [ldb] Templatize the Selector
Summary:
So a customized ldb tool can pass it's own Selector.
Such a selector is expected to call LDBCommand::SelectCommand
and then add some of its own customized commands

Test Plan: make ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57249
2016-05-13 12:12:39 -07:00
Arun Sharma
49815e3841 [ldb] Export LDBCommandRunner
Summary:
The implementation remains where it is. Only the
header is exported. This is so that a customized
ldb tool can print help along with its own
extra commands

Test Plan: make ldb

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57255
2016-05-11 13:08:45 -07:00
Andrew Kryczka
5c1c904877 ldb option for compression dictionary size
Summary:
Expose the option so it's easy to run offline tests of compression
dictionary feature.

Test Plan:
verified compression dictionary is loaded into lz4 for below command:

  $ ./ldb compact --compression_type=lz4 --compression_max_dict_bytes=16384 --db=/tmp/feed-compression-test/

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57441
2016-05-10 16:33:47 -07:00
Reid Horuff
0460e9dcce Modification of WriteBatch to support two phase commit
Summary: Adds three new WriteBatch data types: Prepare(xid), Commit(xid), Rollback(xid). Prepare(xid) should precede the (single) operation to which is applies. There can obviously be multiple Prepare(xid) markers. There should only be one Rollback(xid) or Commit(xid) marker yet not both. None of this logic is currently enforced and will most likely be implemented further up such as in the memtableinserter. All three markers are similar to PutLogData in that they are writebatch meta-data, ie stored but not counted. All three markers differ from PutLogData in that they will actually be written to disk. As for WriteBatchWithIndex, Prepare, Commit, Rollback are all implemented just as PutLogData and none are tested just as PutLogData.

Test Plan: single unit test in write_batch_test.

Reviewers: hermanlee4, sdong, anthony

Subscribers: leveldb, dhruba, vasilep, andrewkr

Differential Revision: https://reviews.facebook.net/D57867
2016-05-10 14:06:07 -07:00
Yueh-Hsuan Chiang
fca5aa6fcc Initial script for the new regression test
Summary:
This diff includes an initial script running a set of benchmarks for
regression test.  The script does the following things:

  checkout the specified rocksdb commit (or origin/master as default)
  make clean && DEBUG_LEVEL=0 make db_bench
  setup test directories
  run set of benchmarks and store results

Currently, the script will run couple benchmarks, store all the benchmark
output, extract micros per op and percentile information for each benchmark
and store them in a single SUMMARY.csv file.  The SUMMARY.csv will make the
follow-up regression detection easier.

In addition, the current script only takes env arguments to set important
attributes of db_bench.  Will follow-up with a patch that allows db_bench
to construct options from an options file.

Test Plan:
NUM_KEYS=100 ./tools/regression_test.sh

  Sample SUMMARY.csv file:

                                     commit id,                      benchmark,  ms-per-op,        p50,        p75,        p99,      p99.9,     p99.99
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                        fillseq,      15.28,      54.66,      77.14,    5000.00,   17900.00,   18483.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                      overwrite,      13.54,      57.69,      86.39,    3000.00,   15600.00,   17013.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     readrandom,       1.04,       0.80,       1.67,     293.33,     395.00,     504.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,               readwhilewriting,       2.75,       1.01,       1.87,     200.00,     460.00,     485.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                   deleterandom,       3.64,      48.12,      70.09,     200.00,     336.67,     347.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,                     seekrandom,      24.31,     391.87,     513.69,     872.73,     990.00,    1048.00
      7e23ddf575890510e7d2fc7a79b31a1bbf317917,         seekrandomwhilewriting,      14.02,     185.14,     294.15,     700.00,    1440.00,    1527.00

Reviewers: sdong, IslamAbdelRahman, kradhakrishnan, yiwu, andrewkr, gunnarku

Reviewed By: gunnarku

Subscribers: gunnarku, MarkCallaghan, andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57597
2016-05-09 13:32:57 -07:00