Commit Graph

1111 Commits

Author SHA1 Message Date
Islam AbdelRahman
72d6e758b4 Fix WritableFileWriter::Append() return
Summary: It looks like WritableFileWriter::Append() was returning OK() even when there is an error

Test Plan: make check

Reviewers: sdong, yhchiang, anthony, rven, kradhakrishnan, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D49569
2015-10-27 21:04:00 -07:00
sdong
c37223c083 Avoid to reply on ROCKSDB_FALLOCATE_PRESENT in include/posix/io_posix.h
Summary: include/posix/io_posix.h should not depend on ROCKSDB_FALLOCATE_PRESENT. Remove it.

Test Plan: Build it with both of ROCKSDB_FALLOCATE_PRESENT defined and not defined.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49563
2015-10-27 16:48:40 -07:00
Dmitri Smirnov
6fbc4f9f3e Implement smart buffer management.
introduce a new DBOption random_access_max_buffer_size to limit
  the size of the random access buffer used for unbuffered access.
  Implement read ahead buffering when enabled.
  To that effect propagate compaction_readahead_size and the new option
  to the env options to make it available for the implementation.
  Add Hint() override so SetupForCompaction() call would call Hint()
  readahead can now be setup from both Hint() and EnableReadAhead()
  Add new option random_access_max_buffer_size support
  db_bench, options_helper to make it string parsable
  and the unit test.
2015-10-27 14:44:16 -07:00
sdong
d6219e4d9b Mac build break caused by include/posix/io_posix.h not declearing errno,
Summary: Mac build breaks as include/posix/io_posix.h doesn't include errno. Move the exact function declaration to io_posix.cc

Test Plan: Run all test. Will run on Mac

Reviewers: rven, anthony, yhchiang, IslamAbdelRahman, igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49551
2015-10-27 14:16:19 -07:00
Siying Dong
beb69d4511 Merge pull request #765 from PraveenSinghRao/wal_filter
Adding wal filter to inspect and filter wal records on recovery
2015-10-27 12:13:01 -07:00
sdong
ab0f3b964f crash_test to trigger some less frequent crash point more frequently
Summary: crash_test still has a very low chance to hit some crash point. Have another mode for covering them more likely.

Test Plan: Run crash_test and see db_stress is called with expected prameters.

Reviewers: kradhakrishnan, igor, anthony, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49473
2015-10-27 12:06:06 -07:00
Praveen Rao
4ce117c4d5 Merge branch 'master' into wal_filter 2015-10-26 19:03:34 -07:00
sdong
44d4057d78 Avoid some includes in io_posix.h
Summary: IO Posix depends on too many .h files. Move most of them to .cc files.

Test Plan: make all

Reviewers: anthony, rven, IslamAbdelRahman, yhchiang, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49479
2015-10-26 17:00:25 -07:00
Siying Dong
138876a62c Merge pull request #746 from ceph/wip-recycle
Add Options.recycle_log_file_num for Recycling WAL Files
2015-10-26 15:01:28 -07:00
Javier González
6e6dd5f6f9 Split posix storage backend into Env and library
Summary: This patch splits the posix storage backend into Env and
the actual *File implementations. The motivation is to allow other Envs
to use posix as a library. This enables a storage backend different from
posix to split its secondary storage between a normal file system
partition managed by posix, and it own media.

Test Plan: No new functionality is added to posix Env or the library,
thus the current tests should suffice.
2015-10-22 17:31:31 +02:00
Praveen Rao
2938c5c137 merge upstream changes 2015-10-19 15:21:33 -07:00
Igor Canadi
4e07c99a9a Fix iOS build
Summary: We don't yet have a CI build for iOS, so our iOS compile gets broken sometimes. Most of the errors are from assumption that size_t is 64-bit, while it's actually 32-bit on some (all?) iOS platforms. This diff fixes the compile.

Test Plan:
TARGET_OS=IOS make static_lib

Observe there are no warnings

Reviewers: sdong, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49029
2015-10-19 13:40:44 -07:00
Praveen Rao
0c59691dde Handle multiple batches in single log record - allow app to return a new batch + allow app to return corrupted record status 2015-10-19 13:27:40 -07:00
Sage Weil
543c12ab06 options: add recycle_log_file_num option
Signed-off-by: Sage Weil <sage@redhat.com>
2015-10-18 21:21:24 -04:00
Sage Weil
1bcafb62f4 env: add ReuseWritableFile
Add an environment method to reuse an existing file.  Provide a generic
implementation that does a simple rename + open (writeable), and also a
posix variant that is more careful about error handling (if we fail to
open, do not rename, etc.).

Signed-off-by: Sage Weil <sage@redhat.com>
2015-10-18 21:21:24 -04:00
Alexey Maykov
e1a09a7703 Implementation for GetPropertiesOfTablesInRange
Summary: In MyRocks, it is sometimes important to get propeties only for the subset of the database. This diff implements the API in RocksDB.

Test Plan: ran the GetPropertiesOfTablesInRange

Reviewers: rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48651
2015-10-17 13:34:43 -07:00
sdong
277dea78f0 Add more kill points
Summary:
Add kill points in:
1. after creating a file
2. before writing a manifest record
3. before syncing manifest
4. before creating a new current file
5. after creating a new current file

Test Plan: Run all current tests.

Reviewers: yhchiang, igor, anthony, IslamAbdelRahman, rven, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48855
2015-10-16 14:35:12 -07:00
Venkatesh Radhakrishnan
a98fbacfa0 Moving memtable related files from util to a new directory memtable
Summary:
We are cleaning up dependencies.
This diff takes a first step at moving memtable files to their own
directory called memtable. In future diffs, we will move other memtable
files from db to memtable.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48915
2015-10-16 14:10:33 -07:00
sdong
e1a5ff857b Allow users to disable some kill points in db_stress
Summary:
Give a name for every kill point, and allow users to disable some kill points based on prefixes. The kill points can be passed by db_stress through a command line paramter. This provides a way for users to boost the chance of triggering low frequency kill points
This allow follow up changes in crash test scripts to improve crash test coverage.

Test Plan:
Manually run db_stress with variable values of --kill_random_test and --kill_prefix_blacklist. Like this:
 --kill_random_test=2 --kill_prefix_blacklist=Posix,WritableFileWriter::Append,WritableFileWriter::WriteBuffered,WritableFileWriter::Sync

Reviewers: igor, kradhakrishnan, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48735
2015-10-15 14:33:13 -07:00
Venkatesh Radhakrishnan
63e507c59c Move ldb and sst_dump from utils to tools.
Summary: As part of cleaning up dependencies for tech debt week, we are moving ldb and sst_dump tools from util to tools, since they are tools.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48747
2015-10-14 17:08:28 -07:00
sdong
dae49e829e Make DBTest.ReadLatencyHistogramByLevel more robust
Summary:
Two fixes:
1. Wait compaction after generating each L0 file so that we are sure there are one L0 file left.
2. https://reviews.facebook.net/D48423 increased from 500 keys to 700 keys but in verification phase we are still querying the first 500 keys. It is a bug to fix.

Test Plan: Run the test in the same environment that fails by chance of one in tens of times. It doesn't fail after 1000 times.

Reviewers: yhchiang, IslamAbdelRahman, igor, rven, kradhakrishnan

Reviewed By: rven, kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48759
2015-10-14 16:08:55 -07:00
Venkatesh Radhakrishnan
e587dbe03a Move manual_compaction_test.cc from util to db
Summary: manual_compaction_test.cc incorrectly in util. Moved to db.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48687
2015-10-14 11:06:27 -07:00
sdong
35ad531be3 Seperate InternalIterator from Iterator
Summary:
Separate a new class InternalIterator from class Iterator, when the look-up is done internally, which also means they operate on key with sequence ID and type.

This change will enable potential future optimizations but for now InternalIterator's functions are still the same as Iterator's.
At the same time, separate the cleanup function to a separate class and let both of InternalIterator and Iterator inherit from it.

Test Plan: Run all existing tests.

Reviewers: igor, yhchiang, anthony, kradhakrishnan, IslamAbdelRahman, rven

Reviewed By: rven

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48549
2015-10-13 15:32:13 -07:00
Praveen Rao
cc4d13e0a8 Put wal_filter under #ifndef ROCKSDB_LITE 2015-10-13 11:10:14 -07:00
Yueh-Hsuan Chiang
7062d0ea68 Make perf_context.db_mutex_lock_nanos and db_condition_wait_nanos only measures DB Mutex
Summary:
In the current implementation, perf_context.db_mutex_lock_nanos and
perf_context.db_condition_wait_nanos also include the mutex-wait time
other than DB Mutex.

This patch fix this issue by incrementing the counters only when it detects
a DB mutex.

Test Plan: perf_context_test

Reviewers: anthony, IslamAbdelRahman, sdong, igor

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48555
2015-10-13 10:41:48 -07:00
Praveen Rao
eb24178553 merge from master 2015-10-12 17:24:21 -07:00
Islam AbdelRahman
c64ae05b1c Move TEST_NewInternalIterator to NewInternalIterator
Summary:
Long time ago we add InternalDumpCommand to ldb_tool https://reviews.facebook.net/D11517
This command is using TEST_NewInternalIterator although it's not a test. This patch move TEST_NewInternalIterator outside of db_impl_debug.cc

Test Plan:
make check
make static_lib

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48561
2015-10-12 17:22:37 -07:00
Praveen Rao
59a0c219bb Adding log filter to inspect and filter log records on recovery 2015-10-12 17:03:03 -07:00
Venkatesh Radhakrishnan
f1fdf5205b Clean up dependency: Move db_test_util.* to db directory
Summary:
As part of tech debt week, we are cleaning up dependencies.
This diff moves db_test_util.[h,cc] from util to db directory.

Test Plan: make check

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48543
2015-10-12 13:05:42 -07:00
Yueh-Hsuan Chiang
2379944093 Fixed an incorrect replace of const value in util/options_helper.cc
Summary: Fixed an incorrect replace of const value in util/options_helper.cc

Test Plan: options_test

Reviewers: igor, sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48513
2015-10-11 17:42:22 -07:00
Yueh-Hsuan Chiang
0bb8ea56be [RocksDB Options File] Add TableOptions section and support BlockBasedTable
Summary:
Introduce TableOptions section and support BlockBasedTable in RocksDB
options file.  A TableOptions section has the following format:

  [TableOptions/<FactoryClassName> "<ColumnFamily Name>"]

which includes information about its TableFactory class and belonging
column family.  Below is an example TableOptions section of a
BlockBasedTableOptions that belongs to the default column family:

  [TableOptions/BlockBasedTable "default"]
    format_version=0
    whole_key_filtering=true
    block_size_deviation=10
    block_size=4096
    block_restart_interval=16
    filter_policy=nullptr
    no_block_cache=false
    checksum=kCRC32c
    cache_index_and_filter_blocks=false
    index_type=kBinarySearch
    hash_index_allow_collision=true
    flush_block_policy_factory=FlushBlockBySizePolicyFactory

Currently, Cache-type options (i.e., block_cache and block_cache_compressed)
are not supported.

Test Plan: options_test

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48435
2015-10-11 12:17:42 -07:00
sdong
776bd8d5eb Pass column family ID to table property collector
Summary: Pass column family ID through TablePropertiesCollectorFactory::CreateTablePropertiesCollector() so that users can identify which column family this file is for and handle it differently.

Test Plan: Add unit test scenarios in tests related to table properties collectors to verify the information passed in is correct.

Reviewers: rven, yhchiang, anthony, kradhakrishnan, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48411
2015-10-09 14:36:51 -07:00
Islam AbdelRahman
51fa7ecec5 Bytes read/written from cache statistics
Summary: Add 2 new counters BLOCK_CACHE_BYTES_WRITE, BLOCK_CACHE_BYTES_READ to keep track of how many bytes were written to the cache and how many bytes that we read from cache

Test Plan: make check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48195
2015-10-07 15:17:20 -07:00
dyniusz
a065cdb388 bloom hit/miss stats for SST and memtable
Summary:
	hit and miss bloom filter stats for memtable and SST
	stats added to perf_context struct
	key matches and prefix matches combined into one stat

Test Plan: unit test veryfing the functionality added, see BloomStatsTest in db_test.cc for details

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47859
2015-10-07 11:23:20 -07:00
Igor Canadi
40cdf797d2 Fix compile error on platforms without fallocate()
Summary:
If a platform doesn't have ROCKSDB_FALLOCATE_PRESENT, then compiler complains:

util/env_posix.cc:354:8: error: private field 'allow_fallocate_' is not used [-Werror,-Wunused-private-field]

This was caught by travis.

Test Plan: compiles with ROCKSDB_FALLOCATE_PRESENT.

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48327
2015-10-07 11:02:23 -07:00
Lakshmi Narayanan
4049bcde39 Added boolean variable to guard fallocate() calls
Summary:
Added boolean variable to guard fallocate() calls.
Set to false to prevent space leaks when tests fail.

Test Plan:
Compliles
Set to false and ran log device tests

Reviewers: sdong, lovro, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48027
2015-10-07 10:04:05 -07:00
Igor Canadi
d80ce7f99a Compaction filter on merge operands
Summary:
Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit.

The behavior in this diff is a bit simpler than in D42555:
* only merge operators are passed through FilterMergeValue(). If fitler function returns true, the merge operator is ignored
* compaction filter is *not* called on: 1) results of merge operations and 2) base values that are getting merged with merge operands (the second case was also true in previous diff)

Do we also need a compaction filter to get called on merge results?

Test Plan: make && make check

Reviewers: lovro, tnovak, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: noetzli, kolmike, leveldb, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D47847
2015-10-07 09:30:03 -07:00
dyniusz
0267502655 Support for LevelDB SST with .ldb suffix
Summary:
	Handle SST files with both ".sst" and ".ldb" suffix.
	This enables user to migrate from leveldb to rocksdb.

Test Plan:
        Added unit test with DB operating on SSTs with names schema.
        See db/dc_test.cc:SSTsWithLdbSuffixHandling for details

Reviewers: yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48003
2015-10-06 17:46:22 -07:00
Yueh-Hsuan Chiang
5c7bf56d35 [RocksDB Options] Support more options in RocksDBOptionParser for sanity check.
Summary:
RocksDBOptionsParser now supports CompressionType and the following
pointer-typed options in RocksDBOptionParser
for sanity check:
  prefix_extractor
  table_factory
  comparator
  compaction_filter
  compaction_filter_factory
  merge_operator
  memtable_factory

In the RocksDB Options file, only high level information about pointer-typed
options are serialized, and those information is only used for verification
/ sanity check purpose.

Test Plan: added more tests in options_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47925
2015-10-02 15:35:32 -07:00
Evan Shaw
7a23e4d8ca New amalgamation target
This commit adds two new targets to the Makefile: rocksdb.cc and rocksdb.h

These files, when combined with the c.h header, are a self-contained RocksDB
source distribution called an amalgamation. (The name comes from SQLite's, which
is similar in concept.)

The main benefit of an amalgamation is that it's very easy to drop into a
new project. It also compiles faster compared to compiling individual source
files and potentially gives the compiler more opportunity to make optimizations
since it can see all functions at once.

rocksdb.cc and rocksdb.h are generated by a new script, amalgamate.py.
A detailed description of how amalgamate.py works is in a comment at the top of
the file.

There are also some small changes to existing files to enable the amalgamation:
* Use quotes for includes in unity build
* Fix an old header inclusion in util/xfunc.cc
* Move some includes outside ifdef in util/env_hdfs.cc
* Separate out tool sources in Makefile so they won't be included in unity.cc
* Unity build now produces a static library

Closes #733
2015-10-01 08:29:31 +13:00
Dmitri Smirnov
9320ffd67a Improve CI build and fix Windows build breakage
Is there a way to enforce CMake additions for internal changes that seem to come
  w/o a PR?
2015-09-30 11:20:23 -07:00
Yueh-Hsuan Chiang
da1cf8a9bc Add a missing check for deprecated options in options_helper.cc
Summary: Add a missing check for deprecated options in options_helper.cc

Test Plan: options_test

Reviewers: sdong, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47793
2015-09-29 17:58:00 -07:00
Yueh-Hsuan Chiang
1e73b11af7 Better handling of deprecated options in RocksDBOptionsParser
Summary:
Previously, we treat deprecated options as normal options in
RocksDBOptionsParser.  However, these deprecated options should
not be verified and serialized.

Test Plan: options_test

Reviewers: igor, sdong, IslamAbdelRahman, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47775
2015-09-29 17:13:02 -07:00
Yueh-Hsuan Chiang
a8b8295d18 Fixed a compile warning in options_test.cc under clang
Summary:
Fixed the following compile warning in options_test.cc under clang
util/options_test.cc:94:12: error: 'Skip' overrides a member function but is not marked 'override' [-Werror,-Winconsistent-missing-override]
    Status Skip(uint64_t n) {
           ^
./include/rocksdb/env.h:368:18: note: overridden virtual function is here
  virtual Status Skip(uint64_t n) = 0;
                 ^

Test Plan: options_test

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47763
2015-09-29 14:59:02 -07:00
Yueh-Hsuan Chiang
74b100ac17 RocksDB Options file format and its serialization / deserialization.
Summary:
This patch defines the format of RocksDB options file, which
follows the INI file format, and implements functions for its
serialization and deserialization.  An example RocksDB options
file can be found in examples/rocksdb_option_file_example.ini.

A typical RocksDB options file has three sections, which are
Version, DBOptions, and more than one CFOptions.  The RocksDB
options file in general follows the basic INI file format
with the following extensions / modifications:
 * Escaped characters
   We escaped the following characters:
    - \n -- line feed - new line
    - \r -- carriage return
    - \\ -- backslash \
    - \: -- colon symbol :
    - \# -- hash tag #
 * Comments
   We support # style comments.  Comments can appear at the ending
   part of a line.
 * Statements
   A statement is of the form option_name = value.
   Each statement contains a '=', where extra white-spaces
   are supported. However, we don't support multi-lined statement.
   Furthermore, each line can only contain at most one statement.
 * Section
   Sections are of the form [SecitonTitle "SectionArgument"],
   where section argument is optional.
 * List
   We use colon-separated string to represent a list.
   For instance, n1:n2:n3:n4 is a list containing four values.

Below is an example of a RocksDB options file:

[Version]
  rocksdb_version=4.0.0
  options_file_version=1.0
[DBOptions]
  max_open_files=12345
  max_background_flushes=301
[CFOptions "default"]
[CFOptions "the second column family"]
[CFOptions "the third column family"]

Test Plan: Added many tests in options_test.cc

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46059
2015-09-29 14:42:40 -07:00
Islam AbdelRahman
16d1ba7001 Clear SyncPoint Trace in DeleteSchedulerTests
Summary: DeleteSchedulerTests is running the same test with different rates, After the first iteraton sync points become useless because ClearTrace was not being called

Test Plan: Run the test

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D47709
2015-09-28 15:58:50 -07:00
Yueh-Hsuan Chiang
e4861e7d68 Fixed a compile error in util/arena.h
Summary:
Fixed the compile error in util/arena.h caused by not
including TLB related header.

Test Plan: make db_stress

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47697
2015-09-28 11:51:32 -07:00
Yueh-Hsuan Chiang
0fdb4f1688 Fixed a compile warning in util/arena.cc when hugetlb is not supported.
Summary:
Fixed the following compile warning when hugetlb is not supported.

    ./util/arena.h:102:10: error: private field 'hugetlb_size_' is not used [-Werror,-Wunused-private-field]
      size_t hugetlb_size_ = 0;
             ^
    1 error generated.

Test Plan: make db_stress

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47691
2015-09-28 11:43:04 -07:00
Igor Canadi
dac3f22b77 Fix the test failure
Summary: AllocateFromHugePage() can return nullptr, and then we need to try to allocate the block with AllocateNewBlock()

Test Plan: arena_test

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47607
2015-09-25 13:55:11 -07:00
Igor Canadi
7ee445dd68 Fix the compile warning
Summary: clang is a bit confused, see here: https://travis-ci.org/facebook/rocksdb/jobs/82214750

Test Plan: travis CI

Reviewers: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47601
2015-09-25 13:17:19 -07:00
jsteemann
aa58958d38 prevent potential memleaks in Arena::Allocate*()
The previous memory allocation procedures tried to allocate memory
via `new` or `mmap` and inserted the pointer to the memory into an
std::vector afterwards. In case `new` or `mmap` threw or returned
a nullptr, no memory was leaking. If `new` or `mmap` worked ok, the
following `vector::push_back` could still fail and throw an exception.
In this case, the memory just allocated was leaked.

The fix is to reserve space in the target memory pointer block
beforehand. If this throws, then no memory is allocated nor leaked.
If the reserve works but the actual allocation fails, still no
memory is leaked, only the target vector will have space for at
least one more element than actually required (but this may be
reused for the next allocation)
2015-09-25 21:23:40 +02:00
Igor Canadi
7b7b5d9f18 [minor] Reuse SleepingBackgroundTask
Summary: As title

Test Plan: make check

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46983
2015-09-25 10:29:44 -07:00
Assaf Sela
4805fa0eae Remove ldb HexToString method's usage of sscanf
Summary:
Fix hex2String performance issues by removing sscanf dependency.
Also fixed some edge case handling (odd length, bad input).

Test Plan: Created a test file which called old and new implementation, and validated results are the same. I'll paste results in the phabricator diff.

Reviewers: igor, rven, anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, sdong

Reviewed By: sdong

Subscribers: thatsafunnyname, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D46785
2015-09-23 14:25:46 -07:00
sdong
df34aea331 PlainTableReader to support non-mmap mode
Summary:
PlainTableReader now only allows mmap-mode. Add the support to non-mmap mode for more flexibility.
Refactor the codes to move all logic of reading data to PlainTableKeyDecoder, and consolidate the calls to Read() call and ReadVarint32() call. Implement the calls for both of mmap and non-mmap case seperately. For non-mmap mode, make copy of keys in several places when we need to move the buffer after reading the keys.

Test Plan: Add the mode of non-mmap case in plain_table_db_test. Run it in valgrind mode too.

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D47187
2015-09-23 11:41:07 -07:00
sdong
d746eaad5e RandomAccessFileReader should not inherit RandomAccessFile
Summary: RandomAccessFileReader unnecessarily inherited RandomAccessFile, which can introduce unnecessarily extra costs. Remove it.

Test Plan: Run all existing tests

Reviewers: yhchiang, anthony, igor, kradhakrishnan, rven, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D47409
2015-09-23 11:00:41 -07:00
sdong
f1b9f804e9 Add a mode to always pick the oldest file to compact for each level
Summary:
Add options.compaction_pri, which specifies the policy about which file to compact first.
kCompactionPriByLargestSeq will compact oldest files first.
Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable.

Test Plan: Will write unit tests

Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45951
2015-09-21 17:21:59 -07:00
jsteemann
bbb18c8278 removed unused variable of type Status, fixed indentation 2015-09-18 20:23:50 +02:00
Andres Noetzli
014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
Andres Noetzli
16934d4959 Fix wrong constants in db_test_util
Summary: Fix two constants in WaitFor() that multiply a value with 000 instead of 1000.

Test Plan: make clean all check

Reviewers: rven, anthony, yhchiang, aekmekji, sdong, MarkCallaghan, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47091
2015-09-16 13:04:05 -07:00
Yueh-Hsuan Chiang
f21c7415a7 Change the log level of DB start-up log from Warn to Header.
Summary: Change the log level of DB start-up log from Warn to Header.

Test Plan: db_bench and observe the LOG header

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47067
2015-09-16 11:31:45 -07:00
Igor Canadi
1b7ea8ce81 Skipped tests shouldn't be failures
Summary: If we skip a test, we shouldn't mark `make check` as failure. This fixes travis CI test.

Test Plan: Travis CI

Reviewers: noetzli, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47031
2015-09-15 18:10:36 -07:00
Ari Ekmekji
2b683d4972 Add DBOption.max_subcompaction to option dump
Summary:
RocksDB options can be dumped to the log file, and
up to this point the max_subcompactions option was not included
in this dump. This fixes that.

Test Plan: makek all && make check

Reviewers: MarkCallaghan, igor, noetzli, anthony, yhchiang, sdong

Reviewed By: yhchiang, sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46971
2015-09-15 11:41:46 -07:00
Igor Canadi
a7e80379b0 LogAndApply() should fail if the column family has been dropped
Summary:
This patch finally fixes the ColumnFamilyTest.ReadDroppedColumnFamily test. The test has been failing very sporadically and it was hard to repro. However, I managed to write a new tests that reproes the failure deterministically.

Here's what happens:
1. We start the flush for the column family
2. We check if the column family was dropped here: a3fc49bfdd/db/flush_job.cc (L149)
3. This check goes through, ends up in InstallMemtableFlushResults() and it goes into LogAndApply()
4. At about this time, we start dropping the column family. Dropping the column family process gets to LogAndApply() at about the same time as LogAndApply() from flush process
5. Drop column family goes through LogAndApply() first, marking the column family as dropped.
6. Flush process gets woken up and gets a chance to write to the MANIFEST. However, this is where it gets stuck: a3fc49bfdd/db/version_set.cc (L1975)
7. We see that the column family was dropped, so there is no need to write to the MANIFEST. We return OK.
8. Flush gets OK back from LogAndApply() and it deletes the memtable, thinking that the data is now safely persisted to sst file.

The fix is pretty simple. Instead of OK, we return ShutdownInProgress. This is not really true, but we have been using this status code to also mean "this operation was canceled because the column family has been dropped".

The fix is only one LOC. All other code is related to tests. I added a new test that reproes the failure. I also moved SleepingBackgroundTask to util/testutil.h (because I needed it in column_family_test for my new test). There's plenty of other places where we reimplement SleepingBackgroundTask, but I'll address that in a separate commit.

Test Plan:
1. new test
2. make check
3. Make sure the ColumnFamilyTest.ReadDroppedColumnFamily doesn't fail on Travis: https://travis-ci.org/facebook/rocksdb/jobs/79952386

Reviewers: yhchiang, anthony, IslamAbdelRahman, kradhakrishnan, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46773
2015-09-15 11:28:44 -07:00
Andres Noetzli
df22e2fb71 Relax memory order for faster tickers
Summary:
The default behavior for atomic operations is sequentially consistent ordering
which is not needed for simple counters (see:
http://en.cppreference.com/w/cpp/atomic/memory_order). Change the memory order
to std::memory_order_relaxed for better performance.

Test Plan: make clean all check

Reviewers: rven, anthony, yhchiang, aekmekji, sdong, MarkCallaghan, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46953
2015-09-15 10:52:00 -07:00
Islam AbdelRahman
e467bf0de0 Fix valgrind error
Summary:
Valgrind is complaining because we are using hard_rate_limit (when serializing the options) without being initialized
http://our.intern.facebook.com/intern/sandcastle/3962140295/77533971/

Test Plan: run the test under valgrind

Reviewers: kradhakrishnan, yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46929
2015-09-14 17:41:40 -07:00
sdong
5de807ac16 Add options.hard_pending_compaction_bytes_limit to stop writes if compaction lagging behind
Summary: Add an option to stop writes if compaction lefts behind. If estimated pending compaction bytes is more than threshold specified by options.hard_pending_compaction_bytes_liimt, writes will stop until compactions are cleared to under the threshold.

Test Plan: Add unit test DBTest.HardLimit

Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45999
2015-09-14 12:51:16 -07:00
Islam AbdelRahman
7143242d12 Fix compaction_job_stats under ROCKSDB_LITE
Summary: Fix compaction_job_stats under ROCKSDB_LITE

Test Plan: compile using ROCKSDB_LITE

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46887
2015-09-14 12:42:06 -07:00
Siying Dong
592f6bf782 Merge pull request #716 from yuslepukhin/refactor_file_reader_writer_win
Refactor to support file_reader_writer on Windows.
2015-09-14 12:29:01 -07:00
Andres Noetzli
ad0d70ca1e Relax asserts in arena_test
Summary:
Commit c67d206898 did not fix all test conditions
which use Arena::MemoryAllocatedBytes() (see Travis failure
https://travis-ci.org/facebook/rocksdb/jobs/79957700). The assumption of that
commit was that aligned allocations do not call Arena::AllocateNewBlock(), so
malloc_usable_block_size() would not be used for Arena::MemoryAllocatedBytes().
However, there is a code path where Arena::AllocateAligned() calls
AllocateFallback() which in turn calls Arena::AllocateNewBlock(), so
Arena::MemoryAllocatedBytes() may return a greater value than expected even for
aligned requests.

Test Plan: make arena_test && ./arena_test

Reviewers: rven, anthony, yhchiang, aekmekji, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46869
2015-09-14 11:54:47 -07:00
Dmitri Smirnov
ddc8b44998 Address code review comments both GH and internal
Fix compilation issues on GCC/CLANG
 Address Windows Release test build issues due to Sync
2015-09-11 17:36:48 -07:00
Andres Noetzli
c67d206898 Fixed arena_test failure due to malloc_usable_size()
Summary:
ArenaTest.MemoryAllocatedBytes on Travis failed:
https://travis-ci.org/facebook/rocksdb/jobs/79887849 . This is probably due to
malloc_usable_size() returning a value greater than the requested size. From
the man page:

   The value returned by malloc_usable_size() may be greater than the requested
   size of the allocation because of alignment and minimum size constraints.
   Although the excess bytes can be overwritten by the application without ill
   effects, this is not good programming practice: the number of excess bytes
   in an allocation depends on the underlying implementation.

Test Plan: make arena_test && ./arena_test

Reviewers: rven, anthony, yhchiang, aekmekji, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46743
2015-09-11 12:32:59 -07:00
Islam AbdelRahman
45e9e4f0bb Refactor NewTableReader to accept TableReaderOptions
Summary:
Refactoring NewTableReader to accept TableReaderOptions
This will make it easier to add new options in the future, for example in this diff https://reviews.facebook.net/D46071

Test Plan: run existing tests

Reviewers: igor, yhchiang, anthony, rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46179
2015-09-11 11:36:33 -07:00
Dmitri Smirnov
30e82d5c41 Refactor to support file_reader_writer on Windows.
Summary. A change https://reviews.facebook.net/differential/diff/224721/
  Has attempted to move common functionality out of platform dependent
  code to a new facility called file_reader_writer.
  This includes:
  - perf counters
  - Buffering
  - RateLimiting

  However, the change did not attempt to refactor Windows code.
  To mitigate, we introduce new quering interfaces such as UseOSBuffer(),
  GetRequiredBufferAlignment() and ReaderWriterForward()
  for pure forwarding where required.
  Introduce WritableFile got a new method Truncate(). This is to communicate
  to the file as to how much data it has on close.
   - When space is pre-allocated on Linux it is filled with zeros implicitly,
    no such thing exist on Windows so we must truncate file on close.
   - When operating in unbuffered mode the last page is filled with zeros but we still want to truncate.

   Previously, Close() would take care of it but now buffer management is shifted to the wrappers and the file has
   no idea about the file true size.

   This means that Close() on the wrapper level must always include
   Truncate() as well as wrapper __dtor should call Close() and
   against double Close().
   Move buffered/unbuffered write logic to the wrapper.
   Utilize Aligned buffer class.
   Adjust tests and implement Truncate() where necessary.
   Come up with reasonable defaults for new virtual interfaces.
   Forward calls for RandomAccessReadAhead class to avoid double
   buffering and locking (double locking in unbuffered mode on WIndows).
2015-09-11 09:57:02 -07:00
Andres Noetzli
c66d53feed Fixed minor issue in CompressionTypeSupported()
Summary:
CompressionTypeSupported was returning LZ4_Supported() for
kZSTDNotFinalCompression. This patch changes it to ZSTD_Supported().

Test Plan: make clean all check

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46521
2015-09-09 16:36:19 -07:00
Andres Notzli
0ccf2db385 Fixed broken build due to format specifier
Summary:
Clang expects %llu for uint64_t, while gcc expects %lu. Replaced the format
specifier with a format macro. This should fix the build on gcc and Clang.

Test Plan: Build on gcc and clang.

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46431
2015-09-08 15:46:16 -07:00
Andres Noetzli
6bdc484fd8 Added Equal method to Comparator interface
Summary:
In some cases, equality comparisons can be done more efficiently than three-way
comparisons. There are quite a few places in the code where we only care about
equality. This patch adds an Equal() method that defaults to using the
Compare() method.

Test Plan: make clean all check

Reviewers: rven, anthony, yhchiang, igor, sdong

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46233
2015-09-08 15:30:49 -07:00
Amit Arya
7a31960ee9 Tests for ManifestDumpCommand and ListColumnFamiliesCommand
Summary:
Added tests for two LDBCommands namely i) ManifestDumpCommand and ii) ListColumnFamiliesCommand.
+ Minor fix in the sscanf formatter (along relace C cast with C++ cast) + replacing localtime with localtime_r which is thread safe.

Test Plan: make all && ./tools/ldb_test.py

Reviewers: anthony, igor, IslamAbdelRahman, kradhakrishnan, lgalanis, rven, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45819
2015-09-08 14:23:42 -07:00
Paul Marinescu
50dc5f0c5a Replace BackupRateLimiter with GenericRateLimiter
Summary: BackupRateLimiter removed and uses replaced with the existing GenericRateLimiter

Test Plan:
make all check

make clean
USE_CLANG=1 make all

make clean
OPT=-DROCKSDB_LITE make release

Reviewers: leveldb, igor

Reviewed By: igor

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D46095
2015-09-03 17:00:09 -07:00
Andres Noetzli
3c9cef1eed Unified maps with Comparator for sorting, other cleanup
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).

Test Plan: make clean check all

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45993
2015-09-02 13:58:22 -07:00
Amit Arya
20c44fefb7 t6913679: Use fallocate on LOG FILESS
Summary: Use fallocate on LOG FILES to

Test Plan:
make check
+
===check with strace===

[arya@devvm1441 ~/rocksdb] strace -e trace=fallocate ./ldb --db=/tmp/test_new scan
fallocate(3, 01, 0, 4194304)            = 0

Reviewers: sdong, anthony, IslamAbdelRahman, kradhakrishnan, lgalanis, rven, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45969
2015-09-02 11:17:02 -07:00
Yueh-Hsuan Chiang
90415cfebe Fixed a compile warning in linux32 environment.
Summary:
Fixed the following compile warning in linux32 environment.

    ==> linux32: util/sst_dump_tool.cc: In member function ‘int
                 rocksdb::SstFileReader::ShowAllCompressionSizes(size_t)’:
    ==> linux32: util/sst_dump_tool.cc:167:50: warning: format ‘%lu’ expects
                 argument of type ‘long unsigned int’, but argument 3 has type
                 ‘size_t {aka unsigned int}’ [-Wformat=]
    ==> linux32:    fprintf(stdout, "Block Size: %lu\n", block_size);

Test Plan: make sst_dump

Reviewers: anthony, IslamAbdelRahman, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45885
2015-08-31 18:35:12 -07:00
sdong
9d6503f88d Fix arena_test test break using glibc-2.17
Summary: arena_test is failing with glibc-2.17. Make it more robust

Test Plan: Run arena_test using both of glibc-2.17 and 2.2 and make sure both passes.

Reviewers: yhchiang, rven, IslamAbdelRahman, kradhakrishnan, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45879
2015-08-31 16:53:23 -07:00
agiardullo
77a28615ec Support static Status messages
Summary: Provide a way to specify a detailed static error message for a Status without incurring a memcpy.  Let me know what people think of this approach.

Test Plan: added simple test

Reviewers: igor, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D44259
2015-08-31 16:13:29 -07:00
Ari Ekmekji
8b689546b6 Add Subcompactions to Universal Compaction Unit Tests
Summary:
Now that the approach to parallelizing L0-L1 level-based
compactions by breaking the compaction job into subcompactions is
being extended to apply to universal compactions as well, the unit
tests need to account for this and run the universal compaction
tests with subcompactions both enabled and disabled.

Test Plan: make all && make check

Reviewers: sdong, igor, noetzli, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D45657
2015-08-31 12:59:02 -07:00
sdong
3d78eb66bb Arena usage to be calculated using malloc_usable_size()
Summary: malloc_usable_size() gets a better estimation of memory usage. It is already used to calculate block cache memory usage. Use it in arena too.

Test Plan: Run all unit tests

Reviewers: anthony, kradhakrishnan, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43317
2015-08-31 09:39:27 -07:00
sdong
7a0dbdf3ac Add ZSTD (not final format) compression type
Summary: Add ZSTD compression type. The same way as adding LZ4.

Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests.

Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor

Reviewed By: igor

Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45747
2015-08-28 11:01:13 -07:00
Javier González
0886f4f66a Helper functions to support direct IO
Summary:
This patch adds the helper functions and variables to allow a backend
implementing WritableFile to support direct IO when persisting a
memtable.

Test Plan:
Since there is no upstream implementation of WritableFile supporting
direct IO, the new behavior is disabled.

Tests should be provided by the backend implementing WritableFile.
2015-08-27 08:36:56 +02:00
Yueh-Hsuan Chiang
1fb2abae2d ColumnFamilyOptions serialization / deserialization.
Summary:
This patch adds GetStringFromColumnFamilyOptions(), the inverse function
of the existing GetColumnFamilyOptionsFromString(), and improves
the implementation of GetColumnFamilyOptionsFromString().

Test Plan: Add a test in options_test.cc

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: noetzli, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45009
2015-08-26 16:13:56 -07:00
Igor Canadi
5f4166c90e ReadaheadRandomAccessFile -- userspace readahead
Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.

We add ReadaheadRandomAccessFile layer only when file is read during compactions.

D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.

Test Plan: make check

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45123
2015-08-26 15:25:59 -07:00
Igor Canadi
16ebe3a2a9 Mmap reads should not return error if reading past file
Summary:
Currently, mmap returns IOError when user tries to read data past the end of the file. This diff changes the behavior. Now, we return just the bytes that we can, and report the size we returned via a Slice result. This is consistent with non-mmap behavior and also pread() system call.

This diff is taken out of D45123.

Test Plan: make check

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45645
2015-08-26 14:51:38 -07:00
Andres Notzli
09d982f9e0 Fix compact_files_example
Summary:
See task #7983654. The example was triggering an assert in compaction job
because the compaction was not marked as manual. With this patch,
CompactionPicker::FormCompaction() marks compactions as manual. This patch
also fixes a couple of typos, adds optimistic_transaction_example to
.gitignore and librocksdb as a dependency for examples. Adding librocksdb as
a dependency makes sure that the examples are built with the latest changes
in librocksdb.

Test Plan: make clean && cd examples && make all && ./compact_files_example

Reviewers: rven, sdong, anthony, igor, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45117
2015-08-25 12:29:44 -07:00
Ari Ekmekji
b6def58f73 Changed 'num_subcompactions' to the more accurate 'max_subcompactions'
Summary:
Up until this point we had DbOptions.num_subcompactions, but
it is semantically more correct to call this max_subcompactions since
we will schedule *up to* DbOptions.max_subcompactions smaller compactions
at a time during a compaction job.

I also added a --subcompactions option to db_bench

Test Plan: make all   make check

Reviewers: sdong, igor, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D45069
2015-08-21 14:25:34 -07:00
sdong
9130873a13 Add options.new_table_reader_for_compaction_inputs
Summary: Currently compaction inputs share the same file descriptor and table reader as other foreground threads. It makes fadvise works less predictable. Add options.new_table_reader_for_compaction_inputs to enforce to create a new file descriptor and new table reader for it.

Test Plan: Add the option.

Reviewers: rven, anthony, kradhakrishnan, IslamAbdelRahman, igor, yhchiang

Reviewed By: igor

Subscribers: igor, MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43311
2015-08-21 08:46:29 -07:00
Islam AbdelRahman
3fd70b05b8 Rate limit deletes issued by DestroyDB
Summary: Update DestroyDB so that all SST files in the first path id go through DeleteScheduler instead of being deleted immediately

Test Plan: added a unittest

Reviewers: igor, yhchiang, anthony, kradhakrishnan, rven, sdong

Reviewed By: sdong

Subscribers: jeanxu2012, dhruba

Differential Revision: https://reviews.facebook.net/D44955
2015-08-19 15:02:17 -07:00
Yueh-Hsuan Chiang
9ec9571593 DBOptions serialization and deserialization
Summary:
This patch implements DBOptions deserialization and improve
the current implementation of DBOptions serialization by
using a static structure that stores the offset of each
DBOptions member variables to perform serialization and
deserialization instead of using tons of if-then-branch
to determine the mapping between string and variables.

Test Plan: Added test in options_test.cc

Reviewers: igor, anthony, sdong, IslamAbdelRahman

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44097
2015-08-18 13:30:18 -07:00
Yueh-Hsuan Chiang
b2df20a890 Make HashCuckooRep::ApproximateMemoryUsage() return reasonable estimation.
Summary:
HashCuckooRep::ApproximateMemoryUsage() previously return
std::numeric_limits<size_t>::max() when it cannot accept more
entries.  This patch makes it return a more reasonable estimation.

This change is necessary in order to make GetIntProperty("rocksdb.cur-size-all-mem-tables")
handles HashCuckooRep properly in diff https://reviews.facebook.net/D44229.

Test Plan: db_test

Reviewers: sdong, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D44241
2015-08-18 13:19:55 -07:00
Ari Ekmekji
f0da6977a3 [Parallel L0-L1 Compaction Prep]: Giving Subcompactions Their Own State
Summary:
In prepration for running multiple threads at the same time during
a compaction job, this patch assigns each subcompaction its own state
(instead of sharing the one global CompactionState). Each subcompaction then
uses this state to update its statistics, keep track of its snapshots, etc.
during the course of execution. Then at the end of all the executions the
statistics are aggregated across the subcompactions so that the final result
is the same as if only one larger compaction had run.

Test Plan: ./db_test  ./db_compaction_test  ./compaction_job_test

Reviewers: sdong, anthony, igor, noetzli, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43239
2015-08-18 11:06:23 -07:00
Andres Notzli
f32a572099 Simplify querying of merge results
Summary:
While working on supporting mixing merge operators with
single deletes ( https://reviews.facebook.net/D43179 ),
I realized that returning and dealing with merge results
can be made simpler. Submitting this as a separate diff
because it is not directly related to single deletes.

Before, callers of merge helper had to retrieve the merge
result in one of two ways depending on whether the merge
was successful or not (success = result of merge was single
kTypeValue). For successful merges, the caller could query
the resulting key/value pair and for unsuccessful merges,
the result could be retrieved in the form of two deques of
keys and values. However, with single deletes, a successful merge
does not return a single key/value pair (if merge
operands are merged with a single delete, we have to generate
a value and keep the original single delete around to make
sure that we are not accidentially producing a key overwrite).
In addition, the two existing call sites of the merge
helper were taking the same actions independently from whether
the merge was successful or not, so this patch simplifies that.

Test Plan: make clean all check

Reviewers: rven, sdong, yhchiang, anthony, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43353
2015-08-17 17:34:38 -07:00
sdong
72613657f0 Measure file read latency histogram per level
Summary: In internal stats, remember read latency histogram, if statistics is enabled. It can be retrieved from DB::GetProperty() with "rocksdb.dbstats" property, if it is enabled.

Test Plan: Manually run db_bench and prints out "rocksdb.dbstats" by hand and make sure it prints out as expected

Reviewers: igor, IslamAbdelRahman, rven, kradhakrishnan, anthony, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44193
2015-08-14 17:32:42 -07:00
sdong
603b6da8b8 Add options.compaction_measure_io_stats to print write I/O stats in compactions
Summary:
Add options.compaction_measure_io_stats to print out / pass to listener accumulated time spent on write calls. Example outputs in info logs:

2015/08/12-16:27:59.463944 7fd428bff700 (Original Log Time 2015/08/12-16:27:59.463922) EVENT_LOG_v1 {"time_micros": 1439422079463897, "job": 6, "event": "compaction_finished", "output_level": 1, "num_output_files": 4, "total_output_size": 6900525, "num_input_records": 111483, "num_output_records": 106877, "file_write_nanos": 15663206, "file_range_sync_nanos": 649588, "file_fsync_nanos": 349614797, "file_prepare_write_nanos": 1505812, "lsm_state": [2, 4, 0, 0, 0, 0, 0]}

Add two more counters in iostats_context.

Also add a parameter of db_bench.

Test Plan: Add a unit test. Also manually verify LOG outputs in db_bench

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D44115
2015-08-13 16:52:26 -07:00