Commit Graph

869 Commits

Author SHA1 Message Date
sdong
000836a880 CompactionFilter::Context to contain column family ID
Summary: Add the column family ID to compaction filter context, so it is easier for compaction filter to apply different logic for different column families.

Test Plan: Add a unit test to verify the column family ID passed is correct.

Reviewers: rven, yhchiang, igor

Reviewed By: igor

Subscribers: yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48357
2015-10-08 11:27:38 -07:00
Yueh-Hsuan Chiang
3a0bf873b5 Change RocksDB version to 4.1
Summary: Change RocksDB version to 4.1

Test Plan: no code change.

Reviewers: sdong, anthony, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D48387
2015-10-08 11:15:18 -07:00
Igor Canadi
9803e0d813 compaction_filter.h cleanup
Summary:
Two changes:
1. remove *V2 filter stuff. we deprecated that a while ago
2. clarify what happens when user sets max_subcompactions to bigger than 1

Test Plan: none

Reviewers: yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47871
2015-10-08 09:32:50 -07:00
Islam AbdelRahman
51fa7ecec5 Bytes read/written from cache statistics
Summary: Add 2 new counters BLOCK_CACHE_BYTES_WRITE, BLOCK_CACHE_BYTES_READ to keep track of how many bytes were written to the cache and how many bytes that we read from cache

Test Plan: make check

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48195
2015-10-07 15:17:20 -07:00
dyniusz
a065cdb388 bloom hit/miss stats for SST and memtable
Summary:
	hit and miss bloom filter stats for memtable and SST
	stats added to perf_context struct
	key matches and prefix matches combined into one stat

Test Plan: unit test veryfing the functionality added, see BloomStatsTest in db_test.cc for details

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47859
2015-10-07 11:23:20 -07:00
Lakshmi Narayanan
4049bcde39 Added boolean variable to guard fallocate() calls
Summary:
Added boolean variable to guard fallocate() calls.
Set to false to prevent space leaks when tests fail.

Test Plan:
Compliles
Set to false and ran log device tests

Reviewers: sdong, lovro, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48027
2015-10-07 10:04:05 -07:00
Igor Canadi
d80ce7f99a Compaction filter on merge operands
Summary:
Since Andres' internship is over, I took over https://reviews.facebook.net/D42555 and rebased and simplified it a bit.

The behavior in this diff is a bit simpler than in D42555:
* only merge operators are passed through FilterMergeValue(). If fitler function returns true, the merge operator is ignored
* compaction filter is *not* called on: 1) results of merge operations and 2) base values that are getting merged with merge operands (the second case was also true in previous diff)

Do we also need a compaction filter to get called on merge results?

Test Plan: make && make check

Reviewers: lovro, tnovak, rven, yhchiang, sdong

Reviewed By: sdong

Subscribers: noetzli, kolmike, leveldb, dhruba, sdong

Differential Revision: https://reviews.facebook.net/D47847
2015-10-07 09:30:03 -07:00
Islam AbdelRahman
9babaeed16 Update dump_tool and undump_tool to accept Options
Summary:
Refactor dump_tool and undump_tool so that it's possible to use them with customized options
for example setting a specific comparator similar to what Dragon is doing with the LdbTool

https://phabricator.fb.com/diffusion/FBCODE/browse/master/dragon/tools/Ldb.cpp

Test Plan:
compiles
used it to dump / undump a dragon shard

Reviewers: sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, adsharma

Differential Revision: https://reviews.facebook.net/D47853
2015-10-05 19:49:48 -07:00
Igor Canadi
115427ef63 Add APIs PauseBackgroundWork() and ContinueBackgroundWork()
Summary:
To support a new MongoDB capability, we need to make sure that we don't do any IO for a short period of time. For background, see:
* https://jira.mongodb.org/browse/SERVER-20704
* https://jira.mongodb.org/browse/SERVER-18899

To implement that, I add a new API calls PauseBackgroundWork() and ContinueBackgroundWork() which reuse the capability we already have in place for RefitLevel() function.

Test Plan: Added a new test in db_test. Made sure that test fails when PauseBackgroundWork() is commented out.

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47901
2015-10-02 13:17:34 -07:00
agiardullo
03b08ba9a9 Return MergeInProgress when fetching from transactions or WBWI with overwrite_key
Summary:
WriteBatchWithIndex::GetFromBatchAndDB only works correctly for overwrite_key=false.  Transactions use overwrite_key=true (since WriteBatchWithIndex::GetIteratorWithBase only works when overwrite_key=true).  So currently, Transactions could return incorrectly merged results when calling Get/GetForUpdate().

Until a permanent fix can be put in place, Transaction::Get[ForUpdate] and WriteBatchWithIndex::GetFromBatch[AndDB] will now return MergeInProgress if the most recent write to a key in the batch is a Merge.

Test Plan: more tests

Reviewers: sdong, yhchiang, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47817
2015-09-30 11:14:42 -07:00
Yueh-Hsuan Chiang
74b100ac17 RocksDB Options file format and its serialization / deserialization.
Summary:
This patch defines the format of RocksDB options file, which
follows the INI file format, and implements functions for its
serialization and deserialization.  An example RocksDB options
file can be found in examples/rocksdb_option_file_example.ini.

A typical RocksDB options file has three sections, which are
Version, DBOptions, and more than one CFOptions.  The RocksDB
options file in general follows the basic INI file format
with the following extensions / modifications:
 * Escaped characters
   We escaped the following characters:
    - \n -- line feed - new line
    - \r -- carriage return
    - \\ -- backslash \
    - \: -- colon symbol :
    - \# -- hash tag #
 * Comments
   We support # style comments.  Comments can appear at the ending
   part of a line.
 * Statements
   A statement is of the form option_name = value.
   Each statement contains a '=', where extra white-spaces
   are supported. However, we don't support multi-lined statement.
   Furthermore, each line can only contain at most one statement.
 * Section
   Sections are of the form [SecitonTitle "SectionArgument"],
   where section argument is optional.
 * List
   We use colon-separated string to represent a list.
   For instance, n1:n2:n3:n4 is a list containing four values.

Below is an example of a RocksDB options file:

[Version]
  rocksdb_version=4.0.0
  options_file_version=1.0
[DBOptions]
  max_open_files=12345
  max_background_flushes=301
[CFOptions "default"]
[CFOptions "the second column family"]
[CFOptions "the third column family"]

Test Plan: Added many tests in options_test.cc

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46059
2015-09-29 14:42:40 -07:00
Igor Canadi
1eff1834b2 Remove non-existing functions. Closes #680
Summary: See https://github.com/facebook/rocksdb/issues/680

Test Plan: compiles

Reviewers: sdong, yhchiang, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47721
2015-09-29 09:34:42 -07:00
agiardullo
afe0dc539b SingleDelete support for Transactions
Summary: Transactional SingleDelete is needed for MyRocks.  Note: This diff requires D47529.

Test Plan: Added some new tests in this diff as well as more tests added in D47529

Reviewers: rven, sdong, igor, yhchiang

Reviewed By: yhchiang

Subscribers: yoshinorim, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47535
2015-09-28 12:14:26 -07:00
agiardullo
25fd743d75 Fix SingleDelete support in WriteBatchWithIndex
Summary: Fixed some  bugs in using SingleDelete on a WriteBatchWithIndex and added some tests.

Test Plan: new tests

Reviewers: sdong, yhchiang, rven, kradhakrishnan, IslamAbdelRahman, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D47529
2015-09-25 12:23:07 -07:00
Islam AbdelRahman
f03b5c987b Add experimental DB::AddFile() to plug sst files into empty DB
Summary:
This is an initial version of bulk load feature

This diff allow us to create sst files, and then bulk load them later, right now the restrictions for loading an sst file are
(1) Memtables are empty
(2) Added sst files have sequence number = 0, and existing values in database have sequence number = 0
(3) Added sst files values are not overlapping

Test Plan: unit testing

Reviewers: igor, ott, sdong

Reviewed By: sdong

Subscribers: leveldb, ott, dhruba

Differential Revision: https://reviews.facebook.net/D39081
2015-09-23 12:42:43 -07:00
sdong
f1b9f804e9 Add a mode to always pick the oldest file to compact for each level
Summary:
Add options.compaction_pri, which specifies the policy about which file to compact first.
kCompactionPriByLargestSeq will compact oldest files first.
Verified the behavior in db_bench but did not write unit tests yet. Also need to make it settable through option string and dynamically changeable.

Test Plan: Will write unit tests

Reviewers: igor, rven, anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, MarkCallaghan

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45951
2015-09-21 17:21:59 -07:00
Igor Canadi
199744f4c4 Merge pull request #728 from jsteemann/fix-missing-include-header
add missing header required for std::function
2015-09-21 19:50:00 +02:00
jsteemann
669b892f97 add missing header required for std::function
otherwise Visual Studio will have trouble compiling this file
2015-09-18 22:18:40 +02:00
jsteemann
624ef456dd fixed formatting. thanks @4tXJ7f for pointing me at make format 2015-09-18 22:03:47 +02:00
jsteemann
4704833357 pass input string to WriteBatch() by const reference
this may lead to copying less data (in case compilers don't
optimize away copying the string by themselves)
2015-09-18 20:20:32 +02:00
Andres Noetzli
014fd55adc Support for SingleDelete()
Summary:
This patch fixes #7460559. It introduces SingleDelete as a new database
operation. This operation can be used to delete keys that were never
overwritten (no put following another put of the same key). If an overwritten
key is single deleted the behavior is undefined. Single deletion of a
non-existent key has no effect but multiple consecutive single deletions are
not allowed (see limitations).

In contrast to the conventional Delete() operation, the deletion entry is
removed along with the value when the two are lined up in a compaction. Note:
The semantics are similar to @igor's prototype that allowed to have this
behavior on the granularity of a column family (
https://reviews.facebook.net/D42093 ). This new patch, however, is more
aggressive when it comes to removing tombstones: It removes the SingleDelete
together with the value whenever there is no snapshot between them while the
older patch only did this when the sequence number of the deletion was older
than the earliest snapshot.

Most of the complex additions are in the Compaction Iterator, all other changes
should be relatively straightforward. The patch also includes basic support for
single deletions in db_stress and db_bench.

Limitations:
- Not compatible with cuckoo hash tables
- Single deletions cannot be used in combination with merges and normal
  deletions on the same key (other keys are not affected by this)
- Consecutive single deletions are currently not allowed (and older version of
  this patch supported this so it could be resurrected if needed)

Test Plan: make all check

Reviewers: yhchiang, sdong, rven, anthony, yoshinorim, igor

Reviewed By: igor

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43179
2015-09-17 11:42:56 -07:00
jsteemann
f8b770a942 fixed typos 2015-09-17 19:58:09 +02:00
Dmytro Okhonko
31a27a3606 Callback for informing backup downloading added
Summary:
In case of huge db backup infromation about progress of downloading would help.
New callback parameter in CreateNewBackup() function will trigger whenever a some amount of data downloaded.
Task: 8057631

Test Plan:
ProgressCallbackDuringBackup test that cover new functionality added to BackupableDBTest tests.
other test succeed as well.

Reviewers: Guenena, benj, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46575
2015-09-15 17:08:30 -07:00
Ari Ekmekji
2b683d4972 Add DBOption.max_subcompaction to option dump
Summary:
RocksDB options can be dumped to the log file, and
up to this point the max_subcompactions option was not included
in this dump. This fixes that.

Test Plan: makek all && make check

Reviewers: MarkCallaghan, igor, noetzli, anthony, yhchiang, sdong

Reviewed By: yhchiang, sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46971
2015-09-15 11:41:46 -07:00
Yoshinori Matsunobu
4886073174 Adding Slice::difference_offset() function
Summary:
There are some use cases in MyRocks to compare two slices
and to return the first byte where they differ. It may be
useful to add it as a RocksDB Slice function.

Test Plan: db_test

Reviewers: sdong, rven, igor

Reviewed By: igor

Subscribers: jkedgar, dhruba

Differential Revision: https://reviews.facebook.net/D46935
2015-09-15 10:32:42 -07:00
sdong
5de807ac16 Add options.hard_pending_compaction_bytes_limit to stop writes if compaction lagging behind
Summary: Add an option to stop writes if compaction lefts behind. If estimated pending compaction bytes is more than threshold specified by options.hard_pending_compaction_bytes_liimt, writes will stop until compactions are cleared to under the threshold.

Test Plan: Add unit test DBTest.HardLimit

Reviewers: rven, kradhakrishnan, anthony, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: MarkCallaghan, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45999
2015-09-14 12:51:16 -07:00
Siying Dong
592f6bf782 Merge pull request #716 from yuslepukhin/refactor_file_reader_writer_win
Refactor to support file_reader_writer on Windows.
2015-09-14 12:29:01 -07:00
Ari Ekmekji
03ddce9a01 Add counters for L0 stall while L0-L1 compaction is taking place
Summary:
Although there are currently counters to keep track of the
stall caused by having too many L0 files, there is no distinction as
to whether when that stall occurs either (A) L0-L1 compaction is taking
place to try and mitigate it, or (B) no L0-L1 compaction has been scheduled
at the moment. This diff adds a counter for (A) so that the nature of L0
stalls can be better understood.

Test Plan: make all && make check

Reviewers: sdong, igor, anthony, noetzli, yhchiang

Reviewed By: yhchiang

Subscribers: MarkCallaghan, dhruba

Differential Revision: https://reviews.facebook.net/D46749
2015-09-14 11:03:37 -07:00
Dmitri Smirnov
ddc8b44998 Address code review comments both GH and internal
Fix compilation issues on GCC/CLANG
 Address Windows Release test build issues due to Sync
2015-09-11 17:36:48 -07:00
Manuel Ung
aeb4612685 Add counters for seek/next/prev
Summary:
There are currently no statistics on seeks, only on gets. This adds the following counters:

rocksdb.number.db.seek
rocksdb.number.db.next
rocksdb.number.db.prev
(number of calls)

rocksdb.db.iterate.bytes.read
(number of bytes read from key + value using seek/next/prev)

rocksdb.number.keys.seek.found
rocksdb.number.keys.next.found
rocksdb.number.keys.prev.found
(number of calls where seek/next/prev found a value)

Test Plan:
./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5
./db_bench -statistics -benchmarks fillrandom,seekrandom -seek_nexts 5 -reverse_iterator

Reviewers: yhchiang, rven, kradhakrishnan, IslamAbdelRahman, MarkCallaghan, sdong, igor

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46605
2015-09-11 11:37:44 -07:00
Islam AbdelRahman
45e9e4f0bb Refactor NewTableReader to accept TableReaderOptions
Summary:
Refactoring NewTableReader to accept TableReaderOptions
This will make it easier to add new options in the future, for example in this diff https://reviews.facebook.net/D46071

Test Plan: run existing tests

Reviewers: igor, yhchiang, anthony, rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46179
2015-09-11 11:36:33 -07:00
Dmitri Smirnov
30e82d5c41 Refactor to support file_reader_writer on Windows.
Summary. A change https://reviews.facebook.net/differential/diff/224721/
  Has attempted to move common functionality out of platform dependent
  code to a new facility called file_reader_writer.
  This includes:
  - perf counters
  - Buffering
  - RateLimiting

  However, the change did not attempt to refactor Windows code.
  To mitigate, we introduce new quering interfaces such as UseOSBuffer(),
  GetRequiredBufferAlignment() and ReaderWriterForward()
  for pure forwarding where required.
  Introduce WritableFile got a new method Truncate(). This is to communicate
  to the file as to how much data it has on close.
   - When space is pre-allocated on Linux it is filled with zeros implicitly,
    no such thing exist on Windows so we must truncate file on close.
   - When operating in unbuffered mode the last page is filled with zeros but we still want to truncate.

   Previously, Close() would take care of it but now buffer management is shifted to the wrappers and the file has
   no idea about the file true size.

   This means that Close() on the wrapper level must always include
   Truncate() as well as wrapper __dtor should call Close() and
   against double Close().
   Move buffered/unbuffered write logic to the wrapper.
   Utilize Aligned buffer class.
   Adjust tests and implement Truncate() where necessary.
   Come up with reasonable defaults for new virtual interfaces.
   Forward calls for RandomAccessReadAhead class to avoid double
   buffering and locking (double locking in unbuffered mode on WIndows).
2015-09-11 09:57:02 -07:00
Ari Ekmekji
3c37b3cccd Determine boundaries of subcompactions
Summary:
Up to this point, the subcompactions that make up a compaction
job have been divided based on the key range of the L1 files, and each
subcompaction has handled the key range of only one file. However
DBOption.max_subcompactions allows the user to designate how many
subcompactions at most to perform. This patch updates the
CompactionJob::GetSubcompactionBoundaries() to determine these
divisions accordingly based on that option and other input/system factors.

The current approach orders the starting and/or ending keys of certain
compaction input files and then generates a histogram to approximate the
size covered by the key range between each consecutive pair of keys. Then
it groups these ranges into groups so that the sizes are approximately equal
to one another. The approach has also been adapted to work for universal
compaction as well instead of just for level-based compaction as it was before.

These subcompactions are then executed in parallel by locally spawning
threads, one for each. The results are then aggregated and the compaction
completed.

Test Plan: make all && make check

Reviewers: yhchiang, anthony, igor, noetzli, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43269
2015-09-10 13:50:00 -07:00
Igor Canadi
ac9bcb55ce Set max_open_files based on ulimit
Summary: We should never set max_open_files to be bigger than the system's ulimit. Otherwise we will get "Too many open files" errors. See an example in this Travis run: https://travis-ci.org/facebook/rocksdb/jobs/79591566

Test Plan:
make check

I will also verify that max_max_open_files is reasonable.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46551
2015-09-10 10:49:28 -07:00
Islam AbdelRahman
f3f2032c46 Release RocksDB 4.0.0
Summary: Release RocksDB 4.0.0

Test Plan: no test

Reviewers: sdong, yhchiang, anthony, rven, kradhakrishnan, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46515
2015-09-09 16:01:03 -07:00
agiardullo
44b6e99e1b update max_write_buffer_number_to_maintain docblock
Summary: fix comment

Test Plan: n/a

Reviewers: sdong, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46455
2015-09-09 13:36:38 -07:00
agiardullo
aa6eed0c1e Transaction stats
Summary: Added funtions to fetch the number of locked keys in a transaction, the number of pending puts/merge/deletes, and the elapsed time

Test Plan: unit tests

Reviewers: yoshinorim, jkedgar, rven, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45417
2015-09-09 13:35:53 -07:00
agiardullo
b5b2b75e52 better tuning of arena block size
Summary: Currently, if users didn't set options.arena_block_size, we set "result.arena_block_size = result.write_buffer_size / 10". It makes result.arena_block_size not a multiplier of 4KB, even if options.write_buffer_size is a multiplier of MBs. When calling malloc to arena_block_size, we may waste a small amount of memory for it. We now make the default to be /8 or /16 and align it to 4KB.

Test Plan: unit tests

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46467
2015-09-08 20:53:32 -07:00
agiardullo
5e94f68f35 TransactionDB Custom Locking API
Summary:
Prototype of API to allow MyRocks to override default Mutex/CondVar used by transactions with their own implementations.  They would simply need to pass their own implementations of Mutex/CondVar to the templated TransactionDB::Open().

Default implementation of TransactionDBMutex/TransactionDBCondVar provided (but the code is not currently changed to use this).

Let me know if this API makes sense or if it should be changed

Test Plan: n/a

Reviewers: yhchiang, rven, igor, sdong, spetrunia

Reviewed By: spetrunia

Subscribers: maykov, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D43761
2015-09-08 17:03:57 -07:00
Andres Noetzli
6bdc484fd8 Added Equal method to Comparator interface
Summary:
In some cases, equality comparisons can be done more efficiently than three-way
comparisons. There are quite a few places in the code where we only care about
equality. This patch adds an Equal() method that defaults to using the
Compare() method.

Test Plan: make clean all check

Reviewers: rven, anthony, yhchiang, igor, sdong

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D46233
2015-09-08 15:30:49 -07:00
Mayank Pundir
d9f42aa60d Adding a verifyBackup method to BackupEngine
Summary: This diff adds a verifyBackup method to BackupEngine. The method verifies the name and size of each file in the backup.

Test Plan: Unit test cases created and passing.

Reviewers: igor, benj

Subscribers: zelaine.fong, yhchiang, sdong, lgalanis, dhruba, AaronFeldman

Differential Revision: https://reviews.facebook.net/D46029
2015-09-03 17:27:21 -07:00
Paul Marinescu
50dc5f0c5a Replace BackupRateLimiter with GenericRateLimiter
Summary: BackupRateLimiter removed and uses replaced with the existing GenericRateLimiter

Test Plan:
make all check

make clean
USE_CLANG=1 make all

make clean
OPT=-DROCKSDB_LITE make release

Reviewers: leveldb, igor

Reviewed By: igor

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D46095
2015-09-03 17:00:09 -07:00
agiardullo
0f1aab6c12 Add SetLockTimeout for Transactions
Summary: MyRocks wants to be able to change the lock timeout of a transaction that has already started.  Expose existing SetLockTimeout function to users.

Test Plan: unit test

Reviewers: spetrunia, rven, sdong, yhchiang, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45987
2015-09-02 20:07:19 -07:00
Andres Noetzli
3c9cef1eed Unified maps with Comparator for sorting, other cleanup
Summary:
This diff is a collection of cleanups that were initially part of D43179.
Additionally it adds a unified way of defining key-value maps that use a
Comparator for sorting (this was previously implemented in four different
places).

Test Plan: make clean check all

Reviewers: rven, anthony, yhchiang, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45993
2015-09-02 13:58:22 -07:00
Dmitri Smirnov
f14c3363e1 Make WinEnv::NowMicros return system time
Previous change for the function
  555ca3e7b7 (diff-bdc04e0404c2db4fd3ac5118a63eaa4a)
  made use of the QueryPerformanceCounter to return microseconds values that do not repeat
  as std::chrono::system_clock returned values that made auto_roll_logger_test fail.

 The interface documentation does not state that we need to return
 system time describing the return value as a number of microsecs since some
 moment in time. However, because on Linux it is implemented using gettimeofday
 various pieces of code (such as GenericRateLimiter) took advantage of that
 and make use of NowMicros() as a system timestamp. Thus the previous change
 broke rate_limiter_test on Windows.

 In addition, the interface name NowMicros() suggests that it is actually
 a timestamp so people use it as such.

 This change makes use of the new system call on Windows that returns
 system time with required precision. This change preserves the fix
 for  auto_roll_logger_test and fixes rate_limiter_test.

 Note that DBTest.RateLimitingTest still fails due to a separately reported issue.
2015-09-02 11:12:07 -07:00
agiardullo
77a28615ec Support static Status messages
Summary: Provide a way to specify a detailed static error message for a Status without incurring a memcpy.  Let me know what people think of this approach.

Test Plan: added simple test

Reviewers: igor, yhchiang, rven, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D44259
2015-08-31 16:13:29 -07:00
sdong
7a0dbdf3ac Add ZSTD (not final format) compression type
Summary: Add ZSTD compression type. The same way as adding LZ4.

Test Plan: run all tests. Generate files in db_bench. Make sure reads succeed. But the SST files cannot be opened in older versions. Also some other adhoc tests.

Reviewers: rven, anthony, IslamAbdelRahman, kradhakrishnan, igor

Reviewed By: igor

Subscribers: MarkCallaghan, maykov, yoshinorim, leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45747
2015-08-28 11:01:13 -07:00
Javier González
0886f4f66a Helper functions to support direct IO
Summary:
This patch adds the helper functions and variables to allow a backend
implementing WritableFile to support direct IO when persisting a
memtable.

Test Plan:
Since there is no upstream implementation of WritableFile supporting
direct IO, the new behavior is disabled.

Tests should be provided by the backend implementing WritableFile.
2015-08-27 08:36:56 +02:00
Yueh-Hsuan Chiang
1fb2abae2d ColumnFamilyOptions serialization / deserialization.
Summary:
This patch adds GetStringFromColumnFamilyOptions(), the inverse function
of the existing GetColumnFamilyOptionsFromString(), and improves
the implementation of GetColumnFamilyOptionsFromString().

Test Plan: Add a test in options_test.cc

Reviewers: igor, sdong, anthony, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: noetzli, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D45009
2015-08-26 16:13:56 -07:00
Igor Canadi
5f4166c90e ReadaheadRandomAccessFile -- userspace readahead
Summary:
ReadaheadRandomAccessFile acts as a transparent layer on top of RandomAccessFile. When a Read() request is issued, it issues a much bigger request to the OS and caches the result. When a new request comes in and we already have the data cached, it doesn't have to issue any requests to the OS.

We add ReadaheadRandomAccessFile layer only when file is read during compactions.

D45105 was incorrectly closed by Phabricator because I committed it to a separate branch (not master), so I'm resubmitting the diff.

Test Plan: make check

Reviewers: MarkCallaghan, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D45123
2015-08-26 15:25:59 -07:00