Commit Graph

37 Commits

Author SHA1 Message Date
Maysam Yabandeh
11526252cc Pinnableslice (2nd attempt)
Summary:
PinnableSlice

    Summary:
    Currently the point lookup values are copied to a string provided by the
    user. This incures an extra memcpy cost. This patch allows doing point lookup
    via a PinnableSlice which pins the source memory location (instead of
    copying their content) and releases them after the content is consumed
    by the user. The old API of Get(string) is translated to the new API
    underneath.

    Here is the summary for improvements:

    value 100 byte: 1.8% regular, 1.2% merge values
    value 1k byte: 11.5% regular, 7.5% merge values
    value 10k byte: 26% regular, 29.9% merge values
    The improvement for merge could be more if we extend this approach to
    pin the merge output and delay the full merge operation until the user
    actually needs it. We have put that for future work.

    PS:
    Sometimes we observe a small decrease in performance when switching from
    t5452014 to this patch but with the old Get(string) API. The d
Closes https://github.com/facebook/rocksdb/pull/1756

Differential Revision: D4391738

Pulled By: maysamyabandeh

fbshipit-source-id: 6f3edd3
2017-03-13 11:54:10 -07:00
Siying Dong
8efb5ffa2a [rocksdb][PR] Remove option min_partial_merge_operands and verify_checksums_in_comp…
Summary:
…action

 The two options, min_partial_merge_operands and verify_checksums_in_compaction, are not seldom used. Remove them to reduce the total number of options. Also remove them from Java and C interface.
Closes https://github.com/facebook/rocksdb/pull/1902

Differential Revision: D4601219

Pulled By: siying

fbshipit-source-id: aad4cb2
2017-02-23 15:09:12 -08:00
Sagar Vemuri
eb912a927e Remove disableDataSync option
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859

Differential Revision: D4541292

Pulled By: sagar0

fbshipit-source-id: 5b3a6ca
2017-02-13 11:09:13 -08:00
Maysam Yabandeh
d0ba8ec8f9 Revert "PinnableSlice"
Summary:
This reverts commit 54d94e9c2c.

The pull request was landed by mistake.
Closes https://github.com/facebook/rocksdb/pull/1755

Differential Revision: D4391678

Pulled By: maysamyabandeh

fbshipit-source-id: 36d5149
2017-01-08 14:24:12 -08:00
Maysam Yabandeh
54d94e9c2c PinnableSlice
Summary:
Currently the point lookup values are copied to a string provided by the user.
This incures an extra memcpy cost. This patch allows doing point lookup
via a PinnableSlice which pins the source memory location (instead of
copying their content) and releases them after the content is consumed
by the user. The old API of Get(string) is translated to the new API
underneath.

 Here is the summary for improvements:
 1. value 100 byte: 1.8%  regular, 1.2% merge values
 2. value 1k   byte: 11.5% regular, 7.5% merge values
 3. value 10k byte: 26% regular,    29.9% merge values

 The improvement for merge could be more if we extend this approach to
 pin the merge output and delay the full merge operation until the user
 actually needs it. We have put that for future work.

PS:
Sometimes we observe a small decrease in performance when switching from
t5452014 to this patch but with the old Get(string) API. The difference
is a little and could be noise. More importantly it is safely
cancelled
Closes https://github.com/facebook/rocksdb/pull/1732

Differential Revision: D4374613

Pulled By: maysamyabandeh

fbshipit-source-id: a077f1a
2017-01-08 13:54:13 -08:00
Aaron Gao
972f96b3fb direct io write support
Summary:
rocksdb direct io support

```
[gzh@dev11575.prn2 ~/rocksdb] ./db_bench -benchmarks=fillseq --num=1000000
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
RocksDB:    version 5.0
Date:       Wed Nov 23 13:17:43 2016
CPU:        40 * Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz
CPUCache:   25600 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000
Prefix:    0 bytes
Keys per prefix:    0
RawSize:    110.6 MB (estimated)
FileSize:   62.9 MB (estimated)
Write rate: 0 bytes/second
Compression: Snappy
Memtablerep: skip_list
Perf Level: 1
WARNING: Assertions are enabled; benchmarks unnecessarily slow
------------------------------------------------
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
DB path: [/tmp/rocksdbtest-112628/dbbench]
fillseq      :       4.393 micros/op 227639 ops/sec;   25.2 MB/s

[gzh@dev11575.prn2 ~/roc
Closes https://github.com/facebook/rocksdb/pull/1564

Differential Revision: D4241093

Pulled By: lightmark

fbshipit-source-id: 98c29e3
2016-12-22 13:09:19 -08:00
Yi Wu
437942e481 Add avoid_flush_during_shutdown DB option
Summary:
Add avoid_flush_during_shutdown DB option.
Closes https://github.com/facebook/rocksdb/pull/1451

Differential Revision: D4108643

Pulled By: yiwu-arbug

fbshipit-source-id: abdaf4d
2016-11-02 15:39:18 -07:00
Islam AbdelRahman
2ad68b971a Support running consistency checks in release mode
Summary:
We always run consistency checks when compiling in debug mode
allow users to set Options::force_consistency_checks to true to be able to run such checks even when compiling in release mode

Test Plan:
make check -j64
make release

Reviewers: lightmark, sdong, yiwu

Reviewed By: yiwu

Subscribers: hermanlee4, andrewkr, yoshinorim, jkedgar, dhruba

Differential Revision: https://reviews.facebook.net/D64701
2016-10-07 17:21:45 -07:00
Yi Wu
8e061f9740 Refactor GetMutableOptionsFromStrings
Summary: Add mutable options info into `OptionsTypeInfo` and use it to parse mutable options map. Also support `max_bytes_for_level_multiplier_additional` in option file.

Test Plan: unit test

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D63843
2016-09-13 21:12:43 -07:00
Islam AbdelRahman
52ee07b021 Move AddFile() tests to external_sst_file_test.cc
Summary: Simply move the tests

Test Plan: make check -j64

Reviewers: andrewkr, lightmark, yiwu, yhchiang, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62529
2016-09-07 15:41:54 -07:00
sdong
32149059f9 Merge options source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.

Test Plan: Add two new unit tests. Run all existing tests, including jtest.

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59829
2016-09-01 14:33:24 -07:00
sdong
e5b5f12b81 Change options memtable_prefix_bloom_huge_page_tlb_size => memtable_huge_page_size and cover huge page to memtable too
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60513
2016-07-26 18:15:11 -07:00
sdong
7b79238b65 Deprectate filter_deletes
Summary: filter_deltes is not a frequently used feature. Remove it.

Test Plan: Run all test suites.

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59427
2016-06-17 10:30:47 -07:00
sdong
20699df843 memtable_prefix_bloom_bits -> memtable_prefix_bloom_bits_ratio and deprecate memtable_prefix_bloom_probes
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio

Test Plan: Run all existing tests

Reviewers: yhchiang, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59199
2016-06-10 12:12:10 -07:00
sdong
4b6833aec1 Rename options.compaction_measure_io_stats to options.report_bg_io_stats and include flush too.
Summary: It is useful to print out IO stats in flush jobs too. Extend options.compaction_measure_io_stats to flush jobs and raname it.

Test Plan: Try db_bench and see the stats are printed out.

Reviewers: yhchiang

Reviewed By: yhchiang

Subscribers: kradhakrishnan, yiwu, IslamAbdelRahman, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56769
2016-04-15 10:22:18 -07:00
Marton Trencseni
9b51987521 Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes.
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.

Test Plan:
'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
I didn't run the Java tests, I don't have Java set up on my devserver.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56133
2016-04-01 10:42:39 -07:00
sdong
b1fafcaca6 Revert "Adding pin_l0_filter_and_index_blocks_in_cache feature."
This reverts commit 522de4f59e.

It has bug of index block cleaning up.
2016-03-21 11:50:42 -07:00
Marton Trencseni
522de4f59e Adding pin_l0_filter_and_index_blocks_in_cache feature.
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce a TableReader::Close(), to guarantee the order of destruction.

Test Plan:
Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).

DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
  Mac: OK.
  Linux: with D55287 patched in it's OK.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54801
2016-03-17 22:40:01 +00:00
Edouard A
02e62ebbc8 Fixes warnings and ensure correct int behavior on 32-bit platforms. 2016-03-16 22:57:57 +01:00
Baraa Hamodi
21e95811d1 Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
Islam AbdelRahman
8e6172bc57 Add BlockBasedTableOptions::index_block_restart_interval
Summary: Add a new option to BlockBasedTableOptions that will allow us to change the restart interval for the index block

Test Plan: unit tests

Reviewers: yhchiang, anthony, andrewkr, sdong

Reviewed By: sdong

Subscribers: march, dhruba

Differential Revision: https://reviews.facebook.net/D53721
2016-02-05 10:22:37 -08:00
Yueh-Hsuan Chiang
e11f676e34 Add OptionsUtil::LoadOptionsFromFile() API
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file.  Note that most pointer-typed options such as
merge_operator will not be constructed.

With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:

  DBOptions db_options;
  std::vector<std::string> cf_names;
  std::vector<ColumnFamilyOptions> cf_opts;

  // Load primitive-typed options from an existing DB
  OptionsUtil::LoadLatestOptionsFromDB(
      dbname, &db_options, &cf_names, &cf_opts);

  // Initialize necessary pointer-typed options
  cf_opts[0].merge_operator.reset(new MyMergeOperator());
  ...

  // Construct the vector of ColumnFamilyDescriptor
  std::vector<ColumnFamilyDescriptor> cf_descs;
  for (size_t i = 0; i < cf_opts.size(); ++i) {
    cf_descs.emplace_back(cf_names[i], cf_opts[i]);
  }

  // Open the DB
  DB* db = nullptr;
  std::vector<ColumnFamilyHandle*> cf_handles;
  auto s = DB::Open(db_options, dbname, cf_descs,
                    &handles, &db);

Test Plan:
Augment existing tests in column_family_test
options_test
db_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49095
2015-11-12 06:52:43 -08:00
Alexey Maykov
e1a09a7703 Implementation for GetPropertiesOfTablesInRange
Summary: In MyRocks, it is sometimes important to get propeties only for the subset of the database. This diff implements the API in RocksDB.

Test Plan: ran the GetPropertiesOfTablesInRange

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48651
2015-10-17 13:34:43 -07:00
Andres Noetzli
014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
Andres Notzli
f32a572099 Simplify querying of merge results
Summary:
While working on supporting mixing merge operators with
single deletes ( https://reviews.facebook.net/D43179 ),
I realized that returning and dealing with merge results
can be made simpler. Submitting this as a separate diff
because it is not directly related to single deletes.

Before, callers of merge helper had to retrieve the merge
result in one of two ways depending on whether the merge
was successful or not (success = result of merge was single
kTypeValue). For successful merges, the caller could query
the resulting key/value pair and for unsuccessful merges,
the result could be retrieved in the form of two deques of
keys and values. However, with single deletes, a successful merge
does not return a single key/value pair (if merge
operands are merged with a single delete, we have to generate
a value and keep the original single delete around to make
sure that we are not accidentially producing a key overwrite).
In addition, the two existing call sites of the merge
helper were taking the same actions independently from whether
the merge was successful or not, so this patch simplifies that.

Test Plan: make clean all check

Reviewers: rven, sdong, yhchiang, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43353
2015-08-17 17:34:38 -07:00
sdong
6e9fbeb27c Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.

Test Plan: Run all existing unit tests.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:58:18 -07:00
Igor Canadi
6bb7e3ef25 Merger test
Summary: I abandoned https://reviews.facebook.net/D18789, but I wrote a good unit test there, so let's check it in. :)

Test Plan: this is test

Reviewers: sdong, yhchiang, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D22827
2014-09-08 22:24:40 -07:00
Radheshyam Balasundaram
4142a3e783 Adding a user comparator for comparing Uint64 slices.
Summary:
- New Uint64 comparator
- Modify Reader and Builder to take custom user comparators instead of bytewise comparator
- Modify logic for choosing unused user key in builder
- Modify iterator logic in reader
- test changes

Test Plan:
cuckoo_table_{builder,reader,db}_test
make check all

Reviewers: ljin, sdong

Reviewed By: ljin

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D22377
2014-08-27 10:39:31 -07:00
Dhruba Borthakur
9cd221094c Add appropriate LICENSE and Copyright message.
Summary:
Add appropriate LICENSE and Copyright message.

Test Plan:
make check

Reviewers:

CC:

Task ID: #

Blame Rev:
2013-10-16 17:48:41 -07:00
Dhruba Borthakur
a143ef9b38 Change namespace from leveldb to rocksdb
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.

Test Plan: compile rocksdb

Reviewers: emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D13287
2013-10-04 11:59:26 -07:00
Dhruba Borthakur
aa42c66814 Fix all warnings generated by -Wall option to the compiler.
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.

Test Plan: make all check

Reviewers: heyongqiang, emayanke, sheki

Reviewed By: heyongqiang

CC: MarkCallaghan

Differential Revision: https://reviews.facebook.net/D6525
2012-11-06 14:07:31 -08:00
Hans Wennborg
36a5f8ed7f A number of fixes:
- Replace raw slice comparison with a call to user comparator.
  Added test for custom comparators.

- Fix end of namespace comments.

- Fixed bug in picking inputs for a level-0 compaction.

  When finding overlapping files, the covered range may expand
  as files are added to the input set.  We now correctly expand
  the range when this happens instead of continuing to use the
  old range.  For example, suppose L0 contains files with the
  following ranges:

      F1: a .. d
      F2:    c .. g
      F3:       f .. j

  and the initial compaction target is F3.  We used to search
  for range f..j which yielded {F2,F3}.  However we now expand
  the range as soon as another file is added.  In this case,
  when F2 is added, we expand the range to c..j and restart the
  search.  That picks up file F1 as well.

  This change fixes a bug related to deleted keys showing up
  incorrectly after a compaction as described in Issue 44.

(Sync with upstream @25072954)
2011-10-31 17:22:06 +00:00
dgrogan@chromium.org
69c6d38342 reverting disastrous MOE commit, returning to r21
git-svn-id: https://leveldb.googlecode.com/svn/trunk@23 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-19 23:11:15 +00:00
dgrogan@chromium.org
b743906eea Revision created by MOE tool push_codebase.
MOE_MIGRATION=


git-svn-id: https://leveldb.googlecode.com/svn/trunk@22 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-19 23:01:25 +00:00
dgrogan@chromium.org
b409afe968 chmod a-x
git-svn-id: https://leveldb.googlecode.com/svn/trunk@21 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-18 23:15:58 +00:00
dgrogan@chromium.org
f779e7a5d8 @20602303. Default file permission is now 755.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@20 62dab493-f737-651d-591e-8d6aee1b9529
2011-04-12 19:38:58 +00:00
jorlow@chromium.org
f67e15e50f Initial checkin.
git-svn-id: https://leveldb.googlecode.com/svn/trunk@2 62dab493-f737-651d-591e-8d6aee1b9529
2011-03-18 22:37:00 +00:00