Commit Graph

394 Commits

Author SHA1 Message Date
Sagar Vemuri
12b6cdeed3 Trace and Replay for RocksDB (#3837)
Summary:
A framework for tracing and replaying RocksDB operations.

A binary trace file is created by capturing the DB operations, and it can be replayed back at the same rate using db_bench.

- Column-families are supported
- Multi-threaded tracing is supported.
- TraceReader and TraceWriter are exposed to the user, so that tracing to various destinations can be enabled (say, to other messaging/logging services). By default, a FileTraceReader and FileTraceWriter are implemented to capture to a file and replay from it.
- This is not yet ideal to be enabled in production due to large performance overhead, but it can be safely tried out in a shadow setup, say, for analyzing RocksDB operations.

Currently supported DB operations:
- Writes:
-- Put
-- Merge
-- Delete
-- SingleDelete
-- DeleteRange
-- Write
- Reads:
-- Get (point lookups)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3837

Differential Revision: D7974837

Pulled By: sagar0

fbshipit-source-id: 8ec65aaf336504bc1f6ed0feae67f6ed5ef97a72
2018-08-01 00:27:08 -07:00
Fenggang Wu
8805ec2f49 DataBlockHashIndex: Standalone Implementation with Unit Test (#4139)
Summary:
The first step of the `DataBlockHashIndex` implementation. A string based hash table is implemented and unit-tested.

`DataBlockHashIndexBuilder`: `Add()` takes pairs of `<key, restart_index>`, and formats it into a string when `Finish()` is called.
`DataBlockHashIndex`: initialized by the formatted string, and can interpret it as a hash table. Lookup for a key is supported by iterator operation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139

Reviewed By: sagar0

Differential Revision: D8866764

Pulled By: fgwu

fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10
2018-07-24 11:43:37 -07:00
Chang Su
374c37da5b move static msgs out of Status class (#4144)
Summary:
The member msgs of class Status contains all types of status messages.
When users dump a Status object, msgs will confuse users. So move it out
of class Status by making it as file-local static variable.

Closes #3831 .
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4144

Differential Revision: D8941419

Pulled By: sagar0

fbshipit-source-id: 56b0510258465ff26db15aa6b04e01532e053e3d
2018-07-23 15:44:16 -07:00
Dmitri Smirnov
78ab11cd71 Return new operator for Status allocations for Windows (#4128)
Summary: Windows requires new/delete for memory allocations to be overriden. Refactor to be less intrusive.

Differential Revision: D8878047

Pulled By: siying

fbshipit-source-id: 35f2b5fec2f88ea48c9be926539c6469060aab36
2018-07-19 15:09:06 -07:00
Siying Dong
ddc07b40fc Remove managed iterator
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4124

Differential Revision: D8829910

Pulled By: siying

fbshipit-source-id: f3e952ccf3a631071a5d77c48e327046f8abb560
2018-07-17 14:43:18 -07:00
Tamir Duberstein
7bee48bdbd Add GCC 8 to Travis (#3433)
Summary:
- Avoid `strdup` to use jemalloc on Windows
- Use `size_t` for consistency
- Add GCC 8 to Travis
- Add CMAKE_BUILD_TYPE=Release to Travis
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3433

Differential Revision: D6837948

Pulled By: sagar0

fbshipit-source-id: b8543c3a4da9cd07ee9a33f9f4623188e233261f
2018-07-13 10:58:06 -07:00
Anand Ananthabhotla
52d4c9b7f6 Allow DB resume after background errors (#3997)
Summary:
Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts -
1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
3. Provide an API for the user to clear the error and resume the DB instance

This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
Closes https://github.com/facebook/rocksdb/pull/3997

Differential Revision: D8653831

Pulled By: anand1976

fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
2018-06-28 12:34:40 -07:00
Dmitri Smirnov
f4b72d7056 Provide a way to override windows memory allocator with jemalloc for ZSTD
Summary:
Windows does not have LD_PRELOAD mechanism to override all memory allocation functions and ZSTD makes use of C-tuntime calloc. During flushes and compactions default system allocator fragments and the system slows down considerably.

For builds with jemalloc we employ an advanced ZSTD context creation API that re-directs memory allocation to jemalloc. To reduce the cost of context creation on each block we cache ZSTD context within the block based table builder while a new SST file is being built, this will help all platform builds including those w/o jemalloc. This avoids system allocator fragmentation and improves the performance.

The change does not address random reads and currently on Windows reads with ZSTD regress as compared with SNAPPY compression.
Closes https://github.com/facebook/rocksdb/pull/3838

Differential Revision: D8229794

Pulled By: miasantreble

fbshipit-source-id: 719b622ab7bf4109819bc44f45ec66f0dd3ee80d
2018-06-04 12:12:48 -07:00
Manuel Ung
01e3c30def Extend existing unit tests to run with WriteUnprepared as well
Summary:
As titled.

I have not extended the Compatibility tests because the new WAL markers are still unimplemented.
Closes https://github.com/facebook/rocksdb/pull/3941

Differential Revision: D8238394

Pulled By: lth

fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d
2018-06-01 14:58:41 -07:00
Manuel Ung
aaac6cd16f Add write unprepared classes by inheriting from write prepared
Summary: Closes https://github.com/facebook/rocksdb/pull/3907

Differential Revision: D8218325

Pulled By: lth

fbshipit-source-id: ff32d8dab4a159cd2762876cba4b15e3dc51ff3b
2018-05-31 10:47:42 -07:00
Mike Kolupaev
8bf555f487 Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
Summary:
Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
 * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
 * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.

However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).

This PR changes the convention to:
 * If status() is not ok, Valid() always returns false.
 * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)

This does sacrifice the two use cases listed above, but siying said it's ok.

Overview of the changes:
 * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
 * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
 * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.

To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:

Iterators that didn't need changes:
 * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
 * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.

Iterators with changes (see inline comments for details):
 * DBIter - an overhaul:
    - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
    - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
    - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
    - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
    - It used to not reset status on seek for some types of errors.
    - Some simplifications and better comments.
    - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
 * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
 * ForwardIterator - changed to the new convention, also slightly simplified.
 * ForwardLevelIterator - fixed some bugs and simplified.
 * LevelIterator - simplified.
 * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
 * BlockBasedTableIterator - minor changes.
 * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
 * PlainTableIterator - some seeks used to not reset status.
 * CuckooTableIterator - tiny code cleanup.
 * ManagedIterator - fixed some bugs.
 * BaseDeltaIterator - changed to the new convention and fixed a bug.
 * BlobDBIterator - seeks used to not reset status.
 * KeyConvertingIterator - some small change.
Closes https://github.com/facebook/rocksdb/pull/3810

Differential Revision: D7888019

Pulled By: al13n321

fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
2018-05-17 02:56:56 -07:00
Dmitri Smirnov
f92cd2feb4 Introduce and use the option to disable stall notifications structures
Summary:
and code. Removing this helps with insert performance.
Closes https://github.com/facebook/rocksdb/pull/3830

Differential Revision: D7921030

Pulled By: siying

fbshipit-source-id: 84e80d50a7ef96f5441c51c9a0d089c50217cce2
2018-05-09 10:13:53 -07:00
Tongliang Liao
1d9f24dc9a Link jemalloc
Summary:
Fix undefined reference to `malloc_*` linking errors on Linux.
Closes https://github.com/facebook/rocksdb/pull/3817

Differential Revision: D7899066

Pulled By: ajkr

fbshipit-source-id: 18c46569a59608388d6240f1b8ec20c2d2557dec
2018-05-07 14:28:36 -07:00
Tongliang Liao
9470ee45b8 Allows other cmake-specific "true" for USE_RTTI.
Summary:
People also use ON/OFF, TRUE/FALSE and other switch options that is allowed by cmake.
Closes https://github.com/facebook/rocksdb/pull/3814

Differential Revision: D7899032

Pulled By: ajkr

fbshipit-source-id: b71511af59e0a78eedafb639b5002c47050bf3c2
2018-05-07 14:28:36 -07:00
Fosco Marotto
4d40b10e0f Add USE_RTTI and default behavior to CMakeLists
Summary:
Proposed fix for #3701
Closes https://github.com/facebook/rocksdb/pull/3801

Differential Revision: D7868264

Pulled By: gfosco

fbshipit-source-id: 013963ed3d172c8dc2abd1dd5982580082ca5d2d
2018-05-04 15:13:03 -07:00
Siying Dong
d59549298f Skip deleted WALs during recovery
Summary:
This patch record min log number to keep to the manifest while flushing SST files to ignore them and any WAL older than them during recovery. This is to avoid scenarios when we have a gap between the WAL files are fed to the recovery procedure. The gap could happen by for example out-of-order WAL deletion. Such gap could cause problems in 2PC recovery where the prepared and commit entry are placed into two separate WAL and gap in the WALs could result into not processing the WAL with the commit entry and hence breaking the 2PC recovery logic.

Before the commit, for 2PC case, we determined which log number to keep in FindObsoleteFiles(). We looked at the earliest logs with outstanding prepare entries, or prepare entries whose respective commit or abort are in memtable. With the commit, the same calculation is done while we apply the SST flush. Just before installing the flush file, we precompute the earliest log file to keep after the flush finishes using the same logic (but skipping the memtables just flushed), record this information to the manifest entry for this new flushed SST file. This pre-computed value is also remembered in memory, and will later be used to determine whether a log file can be deleted. This value is unlikely to change until next flush because the commit entry will stay in memtable. (In WritePrepared, we could have removed the older log files as soon as all prepared entries are committed. It's not yet done anyway. Even if we do it, the only thing we loss with this new approach is earlier log deletion between two flushes, which does not guarantee to happen anyway because the obsolete file clean-up function is only executed after flush or compaction)

This min log number to keep is stored in the manifest using the safely-ignore customized field of AddFile entry, in order to guarantee that the DB generated using newer release can be opened by previous releases no older than 4.2.
Closes https://github.com/facebook/rocksdb/pull/3765

Differential Revision: D7747618

Pulled By: siying

fbshipit-source-id: d00c92105b4f83852e9754a1b70d6b64cb590729
2018-05-03 15:43:09 -07:00
Dmitri Smirnov
acb61b7a52 Adjust pread/pwrite to return Status
Summary:
Returning bytes_read causes the caller to call GetLastError()
  to report failure but the lasterror may be overwritten by then
  so we lose the error code.
  Fix up CMake file to include xpress source code only when needed.
  Fix warning for the uninitialized var.
Closes https://github.com/facebook/rocksdb/pull/3795

Differential Revision: D7832935

Pulled By: anand1976

fbshipit-source-id: 4be21affb9b85d361b96244f4ef459f492b7cb2b
2018-05-01 13:42:46 -07:00
Kefu Chai
13a0bd90ce cmake: add options for enabling TBB and NUMA support
Summary:
see also https://github.com/facebook/rocksdb/issues/3036

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/3750

Differential Revision: D7765170

Pulled By: ajkr

fbshipit-source-id: 455788b3131bf62a4987a65684b757e68473eed9
2018-04-25 14:26:55 -07:00
Paweł Bylica
69faddb32e CMake: Read rocksdb version from version.h header file
Summary:
This replaces reading the rocksdb version by external shell script. This does not work reliably on Windows (I wander how it works on AppVeyor).
Closes https://github.com/facebook/rocksdb/pull/3737

Differential Revision: D7703106

Pulled By: ajkr

fbshipit-source-id: 4079c7c77431757e9ddc801363ed896b18fdbf23
2018-04-19 17:42:11 -07:00
Kefu Chai
9fcd82e987 cmake: append rados to THIRDPARTY_LIBS before appending it to LIBS
Summary:
otherwise the env_librados_test executable will fail to link against
librados.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/3724

Differential Revision: D7631542

Pulled By: ajkr

fbshipit-source-id: 38afbf21f9aeb7dedfb840aba8b2f8b421f9edb0
2018-04-15 13:27:54 -07:00
Koby Kahane
df14424410 Fix 3-way SSE4.2 crc32c usage in MSVC with CMake
Summary:
The introduction of the 3-way SSE4.2 optimized crc32c implementation in commit f54d7f5fea added the `HAVE_PCLMUL` definition when the compiler supports intrinsics for that instruction, but did not modify CMakeLists.txt to set that definition on MSVC when appropriate. As a result, 3-way SSE4.2 is not used in MSVC builds with CMake although it could be.

Since the existing test program in CMakeLists.txt for `HAVE_SSE42` already uses `_mm_clmulepi64_si128` which is a PCLMUL instruction, this PR sets `HAVE_PCLMUL` as well if that program builds successfully, fixing the problem.
Closes https://github.com/facebook/rocksdb/pull/3673

Differential Revision: D7473975

Pulled By: miasantreble

fbshipit-source-id: bc346b9eb38920e427aa1a253e6dd9811efa269e
2018-04-02 20:42:26 -07:00
Yanqin Jin
1f5def1653 Fix race condition causing double deletion of ssts
Summary:
Possible interleaved execution of background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to race condition on which RocksDB attempts to delete a file twice. The second attempt will fail and return `IO error`. This may occur to other files,  but this PR targets sst.
Also add a unit test to verify that this PR fixes the issue.

The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving the `user_thread` and background compaction thread. They execute as follows
```
timeline              user_thread                background_compaction thread
t1   |                                          FindObsoleteFiles(full_scan=false)
t2   |     FindObsoleteFiles(full_scan=true)
t3   |                                          PurgeObsoleteFiles
t4   |     PurgeObsoleteFiles
     V
```
When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because background compaction thread has already deleted the file in `PurgeObsoleteFiles`.
To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files.

ajkr could you take a look and comment? Thanks!
Closes https://github.com/facebook/rocksdb/pull/3638

Differential Revision: D7384372

Pulled By: riversand963

fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
2018-03-28 10:29:59 -07:00
Dmitri Smirnov
53d66df0c4 Refactor sync_point to make implementation either customizable or replaceable
Summary: Closes https://github.com/facebook/rocksdb/pull/3637

Differential Revision: D7354373

Pulled By: ajkr

fbshipit-source-id: 6816c7bbc192ed0fb944942b11c7074bf24eddf1
2018-03-23 12:56:52 -07:00
Bruce Mitchener
a3a3f5497c Fix some typos in comments and docs.
Summary: Closes https://github.com/facebook/rocksdb/pull/3568

Differential Revision: D7170953

Pulled By: siying

fbshipit-source-id: 9cfb8dd88b7266da920c0e0c1e10fb2c5af0641c
2018-03-08 10:27:25 -08:00
Yi Wu
b864bc9b5b Blob DB: Improve FIFO eviction
Summary:
Improving blob db FIFO eviction with the following changes,
* Change blob_dir_size to max_db_size. Take into account SST file size when computing DB size.
* FIFO now only take into account live sst files and live blob files. It is normal for disk usage to go over max_db_size because there are obsolete sst files and blob files pending deletion.
* FIFO eviction now also evict TTL blob files that's still open. It doesn't evict non-TTL blob files.
* If FIFO is triggered, it will pass an expiration and the current sequence number to compaction filter. Compaction filter will then filter inlined keys to evict those with an earlier expiration and smaller sequence number. So call LSM FIFO.
* Compaction filter also filter those blob indexes where corresponding blob file is gone.
* Add an event listener to listen compaction/flush event and update sst file size.
* Implement DB::Close() to make sure base db, as well as event listener and compaction filter, destruct before blob db.
* More blob db statistics around FIFO.
* Fix some locking issue when accessing a blob file.
Closes https://github.com/facebook/rocksdb/pull/3556

Differential Revision: D7139328

Pulled By: yiwu-arbug

fbshipit-source-id: ea5edb07b33dfceacb2682f4789bea61de28bbfa
2018-03-06 11:57:42 -08:00
Pooya Shareghi
0a2354ca8f Added bytes XOR merge operator
Summary:
Closes https://github.com/facebook/rocksdb/pull/575

I fixed the merge conflicts etc.
Closes https://github.com/facebook/rocksdb/pull/3065

Differential Revision: D7128233

Pulled By: sagar0

fbshipit-source-id: 2c23a48c9f0432c290b0cd16a12fb691bb37820c
2018-03-06 10:27:36 -08:00
Kamalalochana Subbaiah
d6fdd59c63 CMake changes for CRC32 Optimization on PowerPC
Summary: Closes https://github.com/facebook/rocksdb/pull/2869

Differential Revision: D6791359

Pulled By: ajkr

fbshipit-source-id: fdd38df603d84bbcce8d85dd1729d5caa256e6be
2018-01-23 16:57:11 -08:00
topilski
b9873162f0 Fixed get version on windows, moved throwing exceptions into cc file.
Summary:
Fixes for msys2 and mingw, hide exceptions into cpp  file.
Closes https://github.com/facebook/rocksdb/pull/3377

Differential Revision: D6746707

Pulled By: yiwu-arbug

fbshipit-source-id: 456b38df80bc48b8386a2cf87f669b5a4f9999a4
2018-01-18 14:56:56 -08:00
Bartek Wrona
bf6f03f3cd Issue #3370 Broken CMakeLists.txt
Summary:
Issue #3370 Simple fixes to make RocksDB project working also as a submodule of other bigger one.
Closes https://github.com/facebook/rocksdb/pull/3372

Differential Revision: D6729595

Pulled By: ajkr

fbshipit-source-id: eee2589e7a7c4322873dff8510eebd050301c54c
2018-01-16 14:26:50 -08:00
Dmitri Smirnov
84ddbd186a Make Windows dep switches compatible with other builds
Summary:
Make dependacies switches compatible with other OS builds
  TODO: Make find_package work for Windows.
Closes https://github.com/facebook/rocksdb/pull/3322

Differential Revision: D6667637

Pulled By: sagar0

fbshipit-source-id: 5afcd7bbfe69465310a4fbc8e589f01e506b95f5
2018-01-05 14:56:54 -08:00
yingsu00
f54d7f5fea Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**

RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.

This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.

**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.

Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.

1) ReadRandom in db_bench overall metrics

    PER RUN
    Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
    3-way      |  1   | 4.143   | 241387 | 26.7
    3-way      |  2   | 3.775   | 264872 | 29.3
    3-way      | 3    | 4.116   | 242929 | 26.9
    FastCrc32c|1  | 4.037   | 247727 | 27.4
    FastCrc32c|2  | 4.648   | 215166 | 23.8
    FastCrc32c|3  | 4.352   | 229799 | 25.4

     AVG
    Algorithm     |    Average of micros/op |   Average of ops/sec |    Average of Throughput (MB/s)
    3-way           |     4.01                               |      249,729                 |      27.63
    FastCrc32c  |     4.35                              |     230,897                  |      25.53

 2)   Crc32c computation CPU cost (inclusive samples percentage)
    PER RUN
    Implementation | run |  TotalSamples   | Crc32c percentage
    3-way                 |  1    |  4,572,250,000 | 4.37%
    3-way                 |  2    |  3,779,250,000 | 4.62%
    3-way                 |  3    |  4,129,500,000 | 4.48%
    FastCrc32c       |  1    |  4,663,500,000 | 11.24%
    FastCrc32c       |  2    |  4,047,500,000 | 12.34%
    FastCrc32c       |  3    |  4,366,750,000 | 11.68%

 **# Test Plan**
     make -j64 corruption_test && ./corruption_test
      By default it uses 3-way SSE algorithm

     NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

    make clean && DEBUG_LEVEL=0 make -j64 db_bench
    make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173

Differential Revision: D6330882

Pulled By: yingsu00

fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-19 18:26:49 -08:00
Siying Dong
2f1a3a4d74 Refactor ReadBlockContents()
Summary:
Divide ReadBlockContents() to multiple sub-functions. Maintaining the input and intermediate data in a new class BlockFetcher.
I hope in general it makes the code easier to maintain.
Another motivation to do it is to clearly divide the logic before file reading and after file reading. The refactor will help us evaluate how can we make I/O async in the future.
Closes https://github.com/facebook/rocksdb/pull/3244

Differential Revision: D6520983

Pulled By: siying

fbshipit-source-id: 338d90bc0338472d46be7a7682028dc9114b12e9
2017-12-11 15:27:32 -08:00
Andrew Kryczka
57056bb606 gflags in cmake on linux
Summary:
We should use it if available otherwise the tools builds never work. Thanks to #3212, we can set -DGFLAGS=1 and it'll be independent of the namespace with which gflags was compiled.
Closes https://github.com/facebook/rocksdb/pull/3214

Differential Revision: D6462214

Pulled By: ajkr

fbshipit-source-id: db4e5f1b905322e3119554a9d01b57532c499384
2017-12-01 18:28:24 -08:00
Nikhil Benesch
3e40a5e832 add missing config checks to CMakeLists.txt
Summary:
Bring CMakeLists.txt back up to parity with build_detect_platform.
Closes https://github.com/facebook/rocksdb/pull/3211

Differential Revision: D6452908

Pulled By: ajkr

fbshipit-source-id: 93f5f336ad7eff6ecf65dec47bfaf114dd24cfb2
2017-11-30 22:57:00 -08:00
Yi Wu
42564ada53 Blob DB: not using PinnableSlice move assignment
Summary:
The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky.
Closes https://github.com/facebook/rocksdb/pull/3164

Differential Revision: D6319201

Pulled By: yiwu-arbug

fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48
2017-11-13 18:12:20 -08:00
Maysam Yabandeh
60d83df23d WritePrepared Txn: Move DB class to its own file
Summary:
Move  WritePreparedTxnDB from pessimistic_transaction_db.h to its own header, write_prepared_txn_db.h
Closes https://github.com/facebook/rocksdb/pull/3114

Differential Revision: D6220987

Pulled By: maysamyabandeh

fbshipit-source-id: 18893fb4fdc6b809fe117dabb544080f9b4a301b
2017-11-02 11:14:30 -07:00
Dmitri Smirnov
682db81385 Enable cacheline_aligned_alloc() to allocate from jemalloc if enabled.
Summary:
Reuse WITH_JEMALLOC option in preparation for module search unification.
  Move jemalloc overrides into a separate .cc
  Remote obsolete JEMALLOC_NOINIT option.
Closes https://github.com/facebook/rocksdb/pull/3078

Differential Revision: D6174826

Pulled By: yiwu-arbug

fbshipit-source-id: 9970a0289b4490272d15853920d9d7531af91140
2017-10-27 13:27:12 -07:00
Dmitri Smirnov
ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Yi Wu
31d3e41810 PinnableSlice move assignment
Summary:
Allow `std::move(pinnable_slice)`.
Closes https://github.com/facebook/rocksdb/pull/2997

Differential Revision: D6036782

Pulled By: yiwu-arbug

fbshipit-source-id: 583fb0419a97e437ff530f4305822341cd3381fa
2017-10-12 18:28:24 -07:00
Kefu Chai
019aa7074c cmake: pass "-msse4.2" to when building crc32c.cc if HAVE_SSE42
Summary:
it turns out that, with older GCC shipped from centos7, the SSE42
intrinsics are not available even with "target" specified. so we
need to pass "-msse42" for checking compiler's sse4.2 support and
for building crc32c.cc which uses sse4.2 intrinsics for crc32.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/2950

Differential Revision: D6032298

Pulled By: siying

fbshipit-source-id: 124c946321043661b3fb0a70b6cdf4c9c5126ab4
2017-10-11 12:26:46 -07:00
Yi Wu
d1b74b0c82 WritePrepared Txn: Compaction/Flush
Summary:
Update Compaction/Flush to support WritePreparedTxnDB: Add SnapshotChecker which is a proxy to query WritePreparedTxnDB::IsInSnapshot. Pass SnapshotChecker to DBImpl on WritePreparedTxnDB open. CompactionIterator use it to check if a key has been committed and if it is visible to a snapshot. In CompactionIterator:
* check if key has been committed. If not, output uncommitted keys AS-IS.
* use SnapshotChecker to check if key is visible to a snapshot when in need.
* do not output key with seq = 0 if the key is not committed.
Closes https://github.com/facebook/rocksdb/pull/2926

Differential Revision: D5902907

Pulled By: yiwu-arbug

fbshipit-source-id: 945e037fdf0aa652dc5ba0ad879461040baa0320
2017-10-06 10:41:53 -07:00
Yi Wu
d1cab2b64e Add ValueType::kTypeBlobIndex
Summary:
Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
1. Make it possible to open existing rocksdb instance as blob db. Existing value will be of kTypeIndex type, while value inserted by blob db will be of kTypeBlobIndex.
2. Make rocksdb able to detect if the db contains value written by blob db, if so return error.
3. Make it possible to have blob db optionally store value in SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).

The root db (DBImpl) basically pretended kTypeBlobIndex are normal value on write. On Get if is_blob is provided, return whether the value read is of kTypeBlobIndex type, or return Status::NotSupported() status if is_blob is not provided. On scan allow_blob flag is pass and if the flag is true, return wether the value is of kTypeBlobIndex type via iter->IsBlob().

Changes on blob db side will be in a separate patch.
Closes https://github.com/facebook/rocksdb/pull/2886

Differential Revision: D5838431

Pulled By: yiwu-arbug

fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
2017-10-03 09:11:23 -07:00
Yao Zongyou
8ae81684e9 Update cmake_minimum_required to 2.8.12.
Summary:
Hello,

current master branch declares cmake_minimum_required (VERSION 2.8.11)
but cmake gives the following error:

[  6%] CMake Error at CMakeLists.txt:658 (install):
  install TARGETS given unknown argument "INCLUDES".

CMake Error at src/CMakeLists.txt:658 (install): install TARGETS given unknown argument "INCLUDES".

because this argument not supported on CMake versions prior 2.8.12
Closes https://github.com/facebook/rocksdb/pull/2904

Differential Revision: D5863430

Pulled By: yiwu-arbug

fbshipit-source-id: 0f7230e080add472ad4b87836b3104ea0b971a38
2017-09-19 12:01:09 -07:00
Orgad Shaneh
34ebadf930 Fix MinGW build
Summary:
snprintf is defined as _snprintf, which doesn't exist in the std
namespace.
Closes https://github.com/facebook/rocksdb/pull/2298

Differential Revision: D5070457

Pulled By: yiwu-arbug

fbshipit-source-id: 6e1659ac3e86170653b174578da5a8ed16812cbb
2017-09-19 10:28:26 -07:00
Bernhard M. Wiedemann
82860bd55c Use cmake TIMESTAMP function
Summary:
because it is not only platform independent
but also allows to override the build date
This helps to make ceph builds reproducible (that includes a fork of rockdb in a submodule)

Also adds UTC flag, to be independent of timezone.

Requires cmake-2.8.11+ from 2013
Closes https://github.com/facebook/rocksdb/pull/2848

Differential Revision: D5820189

Pulled By: yiwu-arbug

fbshipit-source-id: e3e8c1550e10e238c173f6c5d9ba15f71ad3ce28
2017-09-12 17:17:42 -07:00
Maysam Yabandeh
26ac24f199 Add more unit test to write_prepared txns
Summary: Closes https://github.com/facebook/rocksdb/pull/2798

Differential Revision: D5724173

Pulled By: maysamyabandeh

fbshipit-source-id: fb6b782d933fb4be315b1a231a6a67a66fdc9c96
2017-08-31 09:41:27 -07:00
Paweł Bylica
c41744270a CMake: Fix formatting
Summary:
This if followup of #2195.
Closes https://github.com/facebook/rocksdb/pull/2772

Differential Revision: D5722495

Pulled By: sagar0

fbshipit-source-id: 169d0cef53b03056ea7b9454954a35c707a67d52
2017-08-28 18:41:29 -07:00
Paweł Bylica
c21ea8f7a6 CMake: Add support for CMake packages
Summary:
Adds support for CMake packages: https://cmake.org/cmake/help/v3.9/manual/cmake-packages.7.html#creating-packages.

This allow using RocksDB by other CMake projects this way:

```
cmake_minimum_required(VERSION 3.5)
project(rdbt)

find_package(RocksDB CONFIG)

add_executable(rdbt test.cpp)
target_link_libraries(rdbt PRIVATE RocksDB::rocksdb)
```
Closes https://github.com/facebook/rocksdb/pull/2773

Differential Revision: D5722587

Pulled By: sagar0

fbshipit-source-id: 0d90dc4a77b42a617cdbe1348a370e719c282b87
2017-08-28 17:14:37 -07:00
Paweł Bylica
f004307e9b CMake improvements
Summary:
- Allow setting custom installation prefix.
- Add option to disable building tests.
Closes https://github.com/facebook/rocksdb/pull/2195

Differential Revision: D5054239

Pulled By: sagar0

fbshipit-source-id: 2de6bef8b7eafed60a830e1796b262f9e6f79da0
2017-08-21 14:56:50 -07:00
Nikhil Benesch
279296f4d8 properly set C[XX]FLAGS during CMake configure-time checks
Summary:
Some compilers require `-std=c++11` for the `cstdint` header to be available. We already have logic to add `-std=c++11` to `CXXFLAGS` when the compiler is not MSVC; simply reorder CMakeLists.txt so that logic happens before the calls to `CHECK_CXX_SOURCE_COMPILES`.

Additionally add a missing `set(CMAKE_REQUIRED_FLAGS, ...)` before a call to `CHECK_C_SOURCE_COMPILES`.
Closes https://github.com/facebook/rocksdb/pull/2535

Differential Revision: D5384244

Pulled By: yiwu-arbug

fbshipit-source-id: 2dbae4297c5d8ab4636e08b1457ffb2d3e37aef4
2017-08-13 21:47:45 -07:00
Jay
185ade4c0c cmake: support more compression type
Summary:
This pr enables linking all the supported compression libraries via cmake.
Closes https://github.com/facebook/rocksdb/pull/2552

Differential Revision: D5620607

Pulled By: yiwu-arbug

fbshipit-source-id: b6949181f305bfdf04a98f898c92fd0caba0c45a
2017-08-13 21:47:45 -07:00
Maysam Yabandeh
bdc056f8aa Refactor PessimisticTransaction
Summary:
This patch splits Commit and Prepare into lock-related logic and db-write-related logic. It moves lock-related logic to PessimisticTransaction to be reused by all children classes and movies the existing impl of db-write-related to PrepareInternal, CommitSingleInternal, and CommitInternal in WriteCommittedTxnImpl.
Closes https://github.com/facebook/rocksdb/pull/2691

Differential Revision: D5569464

Pulled By: maysamyabandeh

fbshipit-source-id: d1b8698e69801a4126c7bc211745d05c636f5325
2017-08-07 16:12:29 -07:00
Maysam Yabandeh
c9804e007a Refactor TransactionDBImpl
Summary:
This opens space for the new implementations of TransactionDBImpl such as WritePreparedTxnDBImpl that has a different policy of how to write to DB.
Closes https://github.com/facebook/rocksdb/pull/2689

Differential Revision: D5568918

Pulled By: maysamyabandeh

fbshipit-source-id: f7eac866e175daf3793ae79da108f65cc7dc7b25
2017-08-05 17:26:15 -07:00
Maysam Yabandeh
c3d5c4d38a Refactor TransactionImpl
Summary:
This patch refactors TransactionImpl by separating the logic for pessimistic concurrency control from the implementation of how to write the data to rocksdb. The existing implementation is named WriteCommittedTxnImpl as it writes committed data to the db. A template named WritePreparedTxnImpl is also added which will be later completed to provide a an alternative implementation.
Closes https://github.com/facebook/rocksdb/pull/2676

Differential Revision: D5549998

Pulled By: maysamyabandeh

fbshipit-source-id: 16298e86b43ca4849324c1f35c731913c6d17bec
2017-08-03 08:57:22 -07:00
Yi Wu
1900771bd2 Dump Blob DB options to info log
Summary:
* Dump blob db options to info log
* Remove BlobDBOptionsImpl to disallow dynamic cast *BlobDBOptions into *BlobDBOptionsImpl. Move options there to be constants or into BlobDBOptions. The dynamic cast is broken after #2645
* Change some of the default options
* Remove blob_db_options.min_blob_size, which is unimplemented. Will implement it soon.
Closes https://github.com/facebook/rocksdb/pull/2671

Differential Revision: D5529912

Pulled By: yiwu-arbug

fbshipit-source-id: dcd58ca981db5bcc7f123b65a0d6f6ae0dc703c7
2017-08-01 13:01:47 -07:00
Islam AbdelRahman
50a969131f CacheActivityLogger, component to log cache activity into a file
Summary:
Simple component that will add a new entry in a log file every time we lookup/insert a key in SimCache.
API:
```
SimCache::StartActivityLogging(<file_name>, <env>, <optional_max_size>)
SimCache::StopActivityLogging()
```

Sending for review, Still need to add more comments.

I was thinking about a better approach, but I ended up deciding I will use a mutex to sync the writes to the file, since this feature should not be heavily used and only used to collect info that will be analyzed offline. I think it's okay to hold the mutex every time we lookup/add to the SimCache.
Closes https://github.com/facebook/rocksdb/pull/2295

Differential Revision: D5063826

Pulled By: IslamAbdelRahman

fbshipit-source-id: f3b5daed8b201987c9a071146ddd5c5740a2dd8c
2017-07-28 12:36:48 -07:00
Yi Wu
6083bc79f8 Blob DB TTL extractor
Summary:
Introducing blob_db::TTLExtractor to replace extract_ttl_fn. The TTL
extractor can be use to extract TTL from keys insert with Put or
WriteBatch. Change over existing extract_ttl_fn are:
* If value is changed, it will be return via std::string* (rather than Slice*). With Slice* the new value has to be part of the existing value. With std::string* the limitation is removed.
* It can optionally return TTL or expiration.

Other changes in this PR:
* replace `std::chrono::system_clock` with `Env::NowMicros` so that I can mock time in tests.
* add several TTL tests.
* other minor naming change.
Closes https://github.com/facebook/rocksdb/pull/2659

Differential Revision: D5512627

Pulled By: yiwu-arbug

fbshipit-source-id: 0dfcb00d74d060b8534c6130c808e4d5d0a54440
2017-07-27 23:26:04 -07:00
Pengchao Wang
534c255c7a Cassandra compaction filter for purge expired columns and rows
Summary:
Major changes in this PR:
* Implement CassandraCompactionFilter to remove expired columns and rows (if all column expired)
* Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/*
* Switch to use shared_ptr<> from uniqu_ptr for Column membership management in RowValue. Since columns do have multiple owners in Merge and GC process, use shared_ptr helps make RowValue immutable.
* Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there.
Closes https://github.com/facebook/rocksdb/pull/2588

Differential Revision: D5430010

Pulled By: wpc

fbshipit-source-id: 9566c21e06de17491d486a68c70f52d501f27687
2017-07-21 14:57:44 -07:00
Andrew Kryczka
7ac184c6d7 Revert cmake -DNDEBUG for non-MSVC
Summary:
Unfortunately we can't use -DNDEBUG yet since we don't properly exclude the test libraries/executables from the non-debug builds on non-MSVC platforms. Previously this was failing on Linux for every build type except `CMAKE_BUILD_TYPE=Debug`.

Reverts a48a62d
Closes https://github.com/facebook/rocksdb/pull/2595

Differential Revision: D5436182

Pulled By: ajkr

fbshipit-source-id: 062f07cc9ce06a073b66054722b27bac1890dca3
2017-07-17 15:08:38 -07:00
Giuseppe Ottaviano
8f927e5f75 Fix undefined behavior in Hash
Summary:
Instead of ignoring UBSan checks, fix the negative shifts in
Hash(). Also add test to make sure the hash values are stable over
time. The values were computed before this change, so the test also
verifies the correctness of the change.
Closes https://github.com/facebook/rocksdb/pull/2546

Differential Revision: D5386369

Pulled By: yiwu-arbug

fbshipit-source-id: 6de4b44461a544d6222cc5d72d8cda2c0373d17e
2017-07-10 12:29:24 -07:00
Yi Wu
982cec22af Fix TARGETS file tests list
Summary:
1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
2. add blob_db_test
Closes https://github.com/facebook/rocksdb/pull/2506

Differential Revision: D5331517

Pulled By: yiwu-arbug

fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
2017-06-27 14:12:02 -07:00
Ewout Prangsma
51778612c9 Encryption at rest support
Summary:
This PR adds support for encrypting data stored by RocksDB when written to disk.

It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential&random access files.
The encryption itself is done through a configurable `EncryptionProvider`. This class creates is asked to create `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is being done.
Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!).

The Counter operation mode uses an initial counter & random initialization vector (IV).
Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (nor data, nor in filesize).
The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there.

To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will setup a encrypted instance of the `Env` class to use for all tests.
Typically you would run it like this:

```
ENCRYPTED_ENV=1 make check_some
```

There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings.
Closes https://github.com/facebook/rocksdb/pull/2424

Differential Revision: D5322178

Pulled By: sdwilsh

fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f
2017-06-26 16:56:24 -07:00
Chen Shen
cbd825deea Create a MergeOperator for Cassandra Row Value
Summary:
This PR implements the MergeOperator for Cassandra Row Values.
Closes https://github.com/facebook/rocksdb/pull/2289

Differential Revision: D5055464

Pulled By: scv119

fbshipit-source-id: 45f276ef8cbc4704279202f6a20c64889bc1adef
2017-06-16 14:27:00 -07:00
Siying Dong
95b0e89b5d Improve write buffer manager (and allow the size to be tracked in block cache)
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350

Differential Revision: D5110648

Pulled By: siying

fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
2017-06-02 14:26:56 -07:00
Maysam Yabandeh
5a9b4d7435 Retire memenv https://github.com/facebook/rocksdb/pull/2082
Summary:
This is a manual commit of this PR:
Retire InMemoryEnv in favor of MockEnv #2082
With MockEnv doing the same yet being more mature, InMemoryEnv is redundant.

Reviewed By: IslamAbdelRahman

Differential Revision: D5162323

fbshipit-source-id: 59fd0082a891dc99cc531e4da9d68bf891eae3f5
2017-06-01 15:41:20 -07:00
Tamir Duberstein
0dc3040d54 db: avoid #includeing malloc and jemalloc simultaneously
Summary:
This fixes a compilation failure on Linux when the system libc is not
glibc. jemalloc's configure script incorrectly assumes that glibc is
always used on Linux systems, producing glibc-style signatures; when
the system libc is e.g. musl, the following error is observed:

```
  [  0%] Building CXX object CMakeFiles/rocksdb.dir/db/db_impl.cc.o
  In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/table/block.h:19:0,
                   from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/db/db_impl.cc:77:
  /x-tools/x86_64-unknown-linux-musl/x86_64-unknown-linux-musl/sysroot/usr/include/malloc.h:19:8: error: declaration of 'size_t malloc_usable_size(void*)' has a different exception specifier
   size_t malloc_usable_size(void *);
          ^~~~~~~~~~~~~~~~~~
  In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/db/db_impl.cc:20:0:
  /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:78:33: note: from previous declaration 'size_t malloc_usable_size(void*) throw ()'
   #  define je_malloc_usable_size malloc_usable_size
                                   ^
  /go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:239:41: note: in expansion of macro 'je_malloc_usable_size'
   JEMALLOC_EXPORT size_t JEMALLOC_NOTHROW je_malloc_usable_size(
                                           ^~~~~~~~~~~~~~~~~~~~~
  CMakeFiles/rocksdb.dir/build.make:350: recipe for target 'CMakeFiles/rocksdb.dir/db/db_impl.cc.o' failed
```

This works around the issue by rearranging the sources such that
jemalloc's headers are never in the same scope as the system's malloc
header. The jemalloc issue has been reported as well, see:
https://github.com/jemalloc/jemalloc/issues/778.

cc tschottdorf
Closes https://github.com/facebook/rocksdb/pull/2188

Differential Revision: D5163048

Pulled By: siying

fbshipit-source-id: c553125458892def175c1be5682b0330d80b2a0d
2017-05-31 22:43:02 -07:00
Yi Wu
ad19eb8686 Fixing blob db sequence number handling
Summary:
Blob db rely on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Writ()e no longer return sequence number in some cases. Fixing it by have WriteBatchInternal::InsertInto() always encode sequence number into write batch.

Stacking on #2375.
Closes https://github.com/facebook/rocksdb/pull/2385

Differential Revision: D5148358

Pulled By: yiwu-arbug

fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33
2017-05-31 10:56:45 -07:00
Yi Wu
345878a7fb update blob_db_test
Summary:
Re-enable blob_db_test with some update:
* Commented out delay at the end of GC tests. Will update the logic later with sync point to properly trigger GC.
* Added some helper functions.

Also update make files to include blob_dump tool.
Closes https://github.com/facebook/rocksdb/pull/2375

Differential Revision: D5133793

Pulled By: yiwu-arbug

fbshipit-source-id: 95470b26d0c1f9592ba4b7637e027fdd263f425c
2017-05-30 22:26:13 -07:00
Tamir Duberstein
103d0692ea Avoid unsupported attributes when not building with UBSAN
Summary:
yiwu-arbug see individual commits.
Closes https://github.com/facebook/rocksdb/pull/2318

Differential Revision: D5141520

Pulled By: yiwu-arbug

fbshipit-source-id: 7987c92ab4461eef36afce5a133d3a0ee0c96300
2017-05-30 11:13:01 -07:00
Nikhil Benesch
a48a62d5ba define NDEBUG in CMake non-debug builds
Summary:
This brings CMake builds further in line with builds that go through
the normal Makefile.
Closes https://github.com/facebook/rocksdb/pull/2300

Differential Revision: D5064631

Pulled By: yiwu-arbug

fbshipit-source-id: 7b2b2d5299f575f87badcf590cc95e040f14d52d
2017-05-16 11:07:58 -07:00
Nikhil Benesch
11c5d4741a cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.

See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.

I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199

Differential Revision: D5054042

Pulled By: yiwu-arbug

fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 16:15:38 -07:00
Yi Wu
86d5492530 Fix build error with blob DB.
Summary:
snprintf is in <stdio.h> and not in namespace std.
Closes https://github.com/facebook/rocksdb/pull/2287

Reviewed By: anirbanr-fb

Differential Revision: D5054752

Pulled By: yiwu-arbug

fbshipit-source-id: 356807ec38f3c7d95951cdb41f31a3d3ae0714d4
2017-05-15 14:05:46 -07:00
Andrew Kryczka
3fa9a39c68 Add GetAllKeyVersions API
Summary:
- Introduced an include/ file dedicated to db-related debug functions to avoid making db.h more complex
- Added debugging function, `GetAllKeyVersions()`, to return a listing of internal data for a range of user keys. The new `struct KeyVersion` exposes data similar to internal key without exposing any internal type.
- Migrated the "ldb idump" subcommand to use this function
- The API takes an inclusive-exclusive range to match behavior of "ldb idump". This will be quite annoying for users who want to query a single user key's versions :(.
Closes https://github.com/facebook/rocksdb/pull/2232

Differential Revision: D4976007

Pulled By: ajkr

fbshipit-source-id: cab375da53a7595d6575af2b7e3b776aa3ad793e
2017-05-12 15:54:06 -07:00
Yi Wu
c61e72c421 Add missing files of blob_db to CMake file
Summary:
Some of the file from #2269 didn't add to CMake file. Adding them to fix window build.
Closes https://github.com/facebook/rocksdb/pull/2276

Differential Revision: D5043487

Pulled By: yiwu-arbug

fbshipit-source-id: 4eba853e9d92574353abce21d77d30e47ce43d3d
2017-05-11 09:52:22 -07:00
Anirban Rahut
d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
Tamir Duberstein
fdaefa0309 travis: add Windows cross-compilation
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms

With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.

cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107

Differential Revision: D4849784

Pulled By: siying

fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
2017-05-05 23:20:01 -07:00
Andrew Kryczka
f6a27d0bce Extract statistics tests into separate file
Summary:
I'm going to add more DB tests for statistics as currently we have very few. I started a file dedicated to this purpose and moved the existing stats-specific tests there.
Closes https://github.com/facebook/rocksdb/pull/2211

Differential Revision: D4951558

Pulled By: ajkr

fbshipit-source-id: 05d11c35079c40ecabdfd2cf5556ccb761f694a4
2017-04-26 14:47:23 -07:00
Andrew Kryczka
e5e545a021 Reunite checkpoint and backup core logic
Summary:
These code paths forked when checkpoint was introduced by copy/pasting the core backup logic. Over time they diverged and bug fixes were sometimes applied to one but not the other (like fix to include all relevant WALs for 2PC), or it required extra effort to fix both (like fix to forge CURRENT file). This diff reunites the code paths by extracting the core logic into a function, CreateCustomCheckpoint(), that is customizable via callbacks to implement both checkpoint and backup.

Related changes:

- flush_before_backup is now forcibly enabled when 2PC is enabled
- Extracted CheckpointImpl class definition into a header file. This is so the function, CreateCustomCheckpoint(), can be called by internal rocksdb code but not exposed to users.
- Implemented more functions in DummyDB/DummyLogFile (in backupable_db_test.cc) that are used by CreateCustomCheckpoint().
Closes https://github.com/facebook/rocksdb/pull/1932

Differential Revision: D4622986

Pulled By: ajkr

fbshipit-source-id: 157723884236ee3999a682673b64f7457a7a0d87
2017-04-24 15:06:46 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Nick Anderson
066cfbacc0 Adding -noprofile to CMakeLists for Windows
Summary:
In the off chance you have a `Microsoft.PowerShell_profile.ps1` profile the `execute_process` will load one's profile during running `cmake`, which will cause garbage output to be written to the `build_version.cc` file. If you add in a `-noprofile` flag, or you suppress output in your pshell profile, this is mitigated.
Closes https://github.com/facebook/rocksdb/pull/2168

Differential Revision: D4927003

Pulled By: siying

fbshipit-source-id: 83861752d6cf2627dd864eedd2acaa8aa8a6232e
2017-04-21 14:11:49 -07:00
Tamir Duberstein
a6439d797e CMake: compile with -O2
Summary:
Seems pretty important.
Closes https://github.com/facebook/rocksdb/pull/2161

Differential Revision: D4891097

Pulled By: yiwu-arbug

fbshipit-source-id: 16dc0149dc89d426407fabab4a5ba1d6d1562269
2017-04-17 23:56:46 -07:00
Tamir Duberstein
bc3973259c CMake: add support for SSE4.2
Summary: Closes https://github.com/facebook/rocksdb/pull/2159

Differential Revision: D4894483

Pulled By: yiwu-arbug

fbshipit-source-id: 607e17e8ef5d30dce02c27be9d1de7d9f823b4ae
2017-04-16 11:47:17 -07:00
Willem Jan Withagen
04abb2b2dd FreeBSD only requires WITH_JEMALLOC, not the rest
Summary:
The compiler error:
```
/home/jenkins/workspace/ceph-master/src/rocksdb/db/db_impl.cc:20:10: fatal error: 'jemalloc/jemalloc.h' file not found
         ^
1 error generated.
```

But is does compile with the `WITH_JEMALLOC` set.
So ignore all the other settings.
Closes https://github.com/facebook/rocksdb/pull/2118

Differential Revision: D4858387

Pulled By: yiwu-arbug

fbshipit-source-id: 05b982969dcab53669a73a903641e71641c714e7
2017-04-09 11:26:50 -07:00
Siying Dong
ff97287016 Refactor compaction picker code
Summary:
1. Move universal compaction picker to separate files compaction_picker_universal.cc and compaction_picker_universal.h.
2. Rename some functions to make the code easier to understand.
3. Move leveled compaction picking code to a dedicated class, so that we we don't need to pass some common variable around when calling functions. It also allowed us to break down LevelCompactionPicker::PickCompaction() to smaller functions.
Closes https://github.com/facebook/rocksdb/pull/2100

Differential Revision: D4845948

Pulled By: siying

fbshipit-source-id: efa0ab4
2017-04-06 20:09:34 -07:00
Sagar Vemuri
343b59d6ee Move various string utility functions into string_util
Summary:
This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.

Check the sub-commits for easier reviewing.
Closes https://github.com/facebook/rocksdb/pull/2094

Differential Revision: D4837730

Pulled By: sagar0

fbshipit-source-id: 344278a
2017-04-06 14:54:12 -07:00
Yi Wu
df6f5a3772 Move memtable related files into memtable directory
Summary:
Move memtable related files into memtable directory.
Closes https://github.com/facebook/rocksdb/pull/2087

Differential Revision: D4829242

Pulled By: yiwu-arbug

fbshipit-source-id: ca70ab6
2017-04-06 14:09:13 -07:00
Tamir Duberstein
107c5f6a60 CMake: more MinGW fixes
Summary:
siying this is a resubmission of #2081 with the 4th commit fixed. From that commit message:

> Note that the previous use of quotes in PLATFORM_{CC,CXX}FLAGS was
incorrect and caused GCC to produce the incorrect define:
>
>  #define ROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE 1
>
> This was the cause of the Linux build failure on the previous version
of this change.

I've tested this locally, and the Linux build succeeds now.
Closes https://github.com/facebook/rocksdb/pull/2097

Differential Revision: D4839964

Pulled By: siying

fbshipit-source-id: cc51322
2017-04-06 14:09:13 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Siying Dong
ce64b8b719 Divide db/db_impl.cc
Summary:
db_impl.cc is too large to manage. Divide db_impl.cc into db/db_impl.cc, db/db_impl_compaction_flush.cc, db/db_impl_files.cc, db/db_impl_open.cc and db/db_impl_write.cc.
Closes https://github.com/facebook/rocksdb/pull/2095

Differential Revision: D4838188

Pulled By: siying

fbshipit-source-id: c5f3059
2017-04-05 17:24:19 -07:00
Siying Dong
43010a929f Revert "[rocksdb][PR] CMake: more MinGW fixes"
fbshipit-source-id: 43b4529
2017-04-04 16:24:26 -07:00
Tamir Duberstein
3450ac8c1b CMake: more MinGW fixes
Summary:
See individual commits.

yuslepukhin siying
Closes https://github.com/facebook/rocksdb/pull/2081

Differential Revision: D4824639

Pulled By: IslamAbdelRahman

fbshipit-source-id: 2fc2b00
2017-04-04 15:09:17 -07:00
Andrew Kryczka
e2c6c06366 add TimedEnv
Summary:
I've needed Env timing measurements a few times now, so finally built something for it.
Closes https://github.com/facebook/rocksdb/pull/2073

Differential Revision: D4811231

Pulled By: ajkr

fbshipit-source-id: 218a249
2017-04-04 11:24:12 -07:00
Siying Dong
6ef8c620d3 Move auto_roll_logger and filename out of db/
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080

Differential Revision: D4821141

Pulled By: siying

fbshipit-source-id: ca7d768
2017-04-03 18:39:14 -07:00
Tamir Duberstein
b6d6090630 CMake: support AVX2 in MinGW
Summary:
yuslepukhin

See individual commits.
Closes https://github.com/facebook/rocksdb/pull/2051

Differential Revision: D4819697

Pulled By: siying

fbshipit-source-id: 75c1de4
2017-04-03 10:11:46 -07:00
Orgad Shaneh
6401a8b76b Fix build with MinGW
Summary:
There still are many warnings (most of them about invalid printf format
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052

Differential Revision: D4807355

Pulled By: siying

fbshipit-source-id: ef03786
2017-03-30 16:54:52 -07:00
Min Wei
8a8c967460 Enable Fast CRC32 for Win64
Summary:
Currently the fast crc32 path is not enabled on Windows. I am trying to enable it here, hopefully, with the minimum impact to the existing code structure.
Closes https://github.com/facebook/rocksdb/pull/2033

Differential Revision: D4770635

Pulled By: siying

fbshipit-source-id: 676f8b8
2017-03-29 17:39:19 -07:00
Maysam Yabandeh
a2f7a514d1 Refactoring
Summary:
This is the first split of https://github.com/facebook/rocksdb/pull/1891 and will be needed for the upcoming partitioned filter patch.
Closes https://github.com/facebook/rocksdb/pull/1949

Differential Revision: D4652152

Pulled By: maysamyabandeh

fbshipit-source-id: 9801778
2017-03-03 18:24:12 -08:00
Siying Dong
ba4c77bd6b Divide external_sst_file_test
Summary:
Separate the platform dependent tests from external_sst_file_test. Only those tests need to run on platforms like OSX
Closes https://github.com/facebook/rocksdb/pull/1923

Differential Revision: D4622461

Pulled By: siying

fbshipit-source-id: d2d6f04
2017-02-28 14:24:11 -08:00
Siying Dong
8ad0fcdf99 Separate small subset tests in DBTest
Summary:
Separate a smal subset of tests in DBTest to DBBasicTest. Tests in DBTest don't have to run in CI tests on platforms like OSX, as long as they are covered by Linux.
Closes https://github.com/facebook/rocksdb/pull/1924

Differential Revision: D4616702

Pulled By: siying

fbshipit-source-id: 13e6549
2017-02-27 12:24:11 -08:00
Siying Dong
1ba2804b7f Remove XFunc tests
Summary:
Xfunc is hardly used. Remove it to keep the code simple.
Closes https://github.com/facebook/rocksdb/pull/1905

Differential Revision: D4603220

Pulled By: siying

fbshipit-source-id: 731f96d
2017-02-23 12:09:11 -08:00
Dmitri Smirnov
0a4cdde50a Windows thread
Summary:
introduce new methods into a public threadpool interface,
- allow submission of std::functions as they allow greater flexibility.
- add Joining methods to the implementation to join scheduled and submitted jobs with
  an option to cancel jobs that did not start executing.
- Remove ugly `#ifdefs` between pthread and std implementation, make it uniform.
- introduce pimpl for a drop in replacement of the implementation
- Introduce rocksdb::port::Thread typedef which is a replacement for std::thread.  On Posix Thread defaults as before std::thread.
- Implement WindowsThread that allocates memory in a more controllable manner than windows std::thread with a replaceable implementation.
- should be no functionality changes.
Closes https://github.com/facebook/rocksdb/pull/1823

Differential Revision: D4492902

Pulled By: siying

fbshipit-source-id: c74cb11
2017-02-06 14:54:18 -08:00
Islam AbdelRahman
574b543f80 Rename merger.h -> merging_iterator.h
Summary:
merger.h was always a confusing name for me, simply give the file a better name
Closes https://github.com/facebook/rocksdb/pull/1836

Differential Revision: D4505357

Pulled By: IslamAbdelRahman

fbshipit-source-id: 07b28d8
2017-02-02 16:54:19 -08:00
Andrew Kryczka
17c1180603 Generalize Env registration framework
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.

Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.

This diff generalizes the Env registration logic to work with arbitrary types.

- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file as that's Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
Closes https://github.com/facebook/rocksdb/pull/1776

Differential Revision: D4421933

Pulled By: ajkr

fbshipit-source-id: 34647d1
2017-01-25 16:09:14 -08:00
Arun Sharma
fba726e555 Version librocksdb.so
Summary:
After make install, I see a directory hierarchy that looks like

```
./usr
./usr/include
./usr/include/rocksdb
./usr/include/rocksdb/filter_policy.h
[..]
./usr/include/rocksdb/iterator.h
./usr/include/rocksdb/utilities
./usr/include/rocksdb/utilities/ldb_cmd_execute_result.h
./usr/include/rocksdb/utilities/lua
./usr/include/rocksdb/utilities/lua/rocks_lua_custom_library.h
./usr/include/rocksdb/utilities/lua/rocks_lua_util.h
./usr/include/rocksdb/utilities/lua/rocks_lua_compaction_filter.h
./usr/include/rocksdb/utilities/backupable_db.h
[..]
./usr/include/rocksdb/utilities/env_registry.h
[..]
./usr/include/rocksdb/env.h
./usr/lib64
./usr/lib64/librocksdb.so.5
./usr/lib64/librocksdb.so.5.0.0
./usr/lib64/librocksdb.so
./usr/lib64/librocksdb.a
```
Closes https://github.com/facebook/rocksdb/pull/1798

Differential Revision: D4456536

Pulled By: yiwu-arbug

fbshipit-source-id: 5494e91
2017-01-24 11:09:10 -08:00
Arun Sharma
d7ea44f2f9 Fixup a couple of builds errors on Linux.
Summary:
The libraries produced on linux are now named
librocksdb.a
librocksdb.so

Other fixes:

* Also link with -lrt to avoid linker errors.
* Generalize comments at the top to include Linux
* Move -lgtest before -lpthread to avoid linker errors
* move add_subdirectory(tools) to the end so it picks up
  the right libraries
Closes https://github.com/facebook/rocksdb/pull/1364

Differential Revision: D4444138

Pulled By: yiwu-arbug

fbshipit-source-id: f0e2c19
2017-01-20 13:24:13 -08:00
Kefu Chai
4e35ffdfab cmake: check -momit-leaf-frame-pointer before using it
Summary:
because not all archs support this option. see
https://gcc.gnu.org/onlinedocs/gcc/Option-Summary.html.

also do not pass "-fno-omit-frame-pointer" and
"-momit-leaf-frame-pointer" to compiler if ${CMAKE_BUILD_TYPE} is
"Debug". this matches the behaviour of DEBUG_LEVEL=2 in Makefile.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/1762

Differential Revision: D4444036

Pulled By: yiwu-arbug

fbshipit-source-id: 8596fbe
2017-01-20 13:09:11 -08:00
Yi Wu
c270735861 Iterator should be in corrupted status if merge operator return false
Summary:
Iterator should be in corrupted status if merge operator return false.
Also add test to make sure if max_successive_merges is hit during write,
data will not be lost.
Closes https://github.com/facebook/rocksdb/pull/1665

Differential Revision: D4322695

Pulled By: yiwu-arbug

fbshipit-source-id: b327b05
2016-12-16 11:09:16 -08:00
Igor Canadi
3f407b065c Kill flashcache code in RocksDB
Summary:
Now that we have userspace persisted cache, we don't need flashcache anymore.
Closes https://github.com/facebook/rocksdb/pull/1588

Differential Revision: D4245114

Pulled By: igorcanadi

fbshipit-source-id: e2c1c72
2016-12-01 10:09:22 -08:00
Andrew Kryczka
e333528991 DeleteRange write path end-to-end tests
Summary: Closes https://github.com/facebook/rocksdb/pull/1578

Differential Revision: D4241171

Pulled By: ajkr

fbshipit-source-id: ce5fd83
2016-11-29 11:09:22 -08:00
Kefu Chai
41e77b8390 cmake: s/STEQUAL/STREQUAL/
Summary:
Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/1540

Differential Revision: D4207564

Pulled By: siying

fbshipit-source-id: 567415b
2016-11-18 14:54:14 -08:00
Yueh-Hsuan Chiang
647eafdc21 Introduce Lua Extension: RocksLuaCompactionFilter
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write CompactionFilter in Lua.  With this ability, users can
dynamically change compaction filter logic without requiring building
the rocksdb binary and restarting the database.

To compile, WITH_LUA_PATH must be specified to the base directory
of lua.
Closes https://github.com/facebook/rocksdb/pull/1478

Differential Revision: D4150138

Pulled By: yhchiang

fbshipit-source-id: ed84222
2016-11-16 15:39:12 -08:00
Willem Jan Withagen
dc51bd716b CMakeLists.txt: FreeBSD has jemalloc as default malloc
Summary:
This will allow reference to `malloc_stats_print`
Closes https://github.com/facebook/rocksdb/pull/1516

Differential Revision: D4187258

Pulled By: siying

fbshipit-source-id: 34ae9f9
2016-11-15 17:39:47 -08:00
Yi Wu
1ea79a78c9 Optimize sequential insert into memtable - Part 1: Interface
Summary:
Currently our skip-list have an optimization to speedup sequential
inserts from a single stream, by remembering the last insert position.
We extend the idea to support sequential inserts from multiple streams,
and even tolerate small reordering wihtin each stream.

This PR is the interface part adding the following:
- Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key.
- Add `InsertWithHint()` interface to memtable, to allow underlying
  implementation to return a hint of insert position, which can be later
  pass back to optimize inserts.
- Memtable will maintain a map from prefix to hints and pass the hint
  via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null.
Closes https://github.com/facebook/rocksdb/pull/1419

Differential Revision: D4079367

Pulled By: yiwu-arbug

fbshipit-source-id: 3555326
2016-11-13 19:09:18 -08:00
Vladislav Vaintroub
196af035c1 Introduce FAIL_ON_WARNINGS CMake variable (default ON)
Summary:
If FAIL_ON_WARNINGS is OFF, compilation will succeed if there were compile
warnings.
Closes https://github.com/facebook/rocksdb/pull/1446

Differential Revision: D4107825

Pulled By: yiwu-arbug

fbshipit-source-id: 4ce6e05
2016-11-01 01:24:16 -07:00
Kefu Chai
9c0bb7f172 cmake: drop "-march=native" from CXX_FLAGS (#1429)
this breaks the cross-compiling, and we can not assume that the building
machine and the target machine share the same CPU spec.

Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-10-28 10:40:47 -07:00
Kefu Chai
60a2bbba94 Makefile: generate util/build_version.cc from .in file (#1384)
* util/build_verion.cc.in: add this file, so cmake and make can share the
  template file for generating util/build_version.cc.
* CMakeLists.txt: also, cmake v2.8.11 does not support file(GENERATE ...),
  so we are using configure_file() for creating build_version.cc.
* Makefile: use util/build_verion.cc.in for creating build_version.cc.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2016-10-25 11:31:39 -07:00
Islam AbdelRahman
869ae5d786 Support IngestExternalFile (remove AddFile restrictions)
Summary:
Changes in the diff

API changes:
- Introduce IngestExternalFile to replace AddFile (I think this make the API more clear)
- Introduce IngestExternalFileOptions (This struct will encapsulate the options for ingesting the external file)
- Deprecate AddFile() API

Logic changes:
- If our file overlap with the memtable we will flush the memtable
- We will find the first level in the LSM tree that our file key range overlap with the keys in it
- We will find the lowest level in the LSM tree above the the level we found in step 2 that our file can fit in and ingest our file in it
- We will assign a global sequence number to our new file
- Remove AddFile restrictions by using global sequence numbers

Other changes:
- Refactor all AddFile logic to be encapsulated in ExternalSstFileIngestionJob

Test Plan:
unit tests (still need to add more)
addfile_stress (https://reviews.facebook.net/D65037)

Reviewers: yiwu, andrewkr, lightmark, yhchiang, sdong

Reviewed By: sdong

Subscribers: jkedgar, hcz, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65061
2016-10-20 17:05:32 -07:00
Andrew Kryczka
6fbe96baf8 Compaction Support for Range Deletion
Summary:
This diff introduces RangeDelAggregator, which takes ownership of iterators
provided to it via AddTombstones(). The tombstones are organized in a two-level
map (snapshot stripe -> begin key -> tombstone). Tombstone creation avoids data
copy by holding Slices returned by the iterator, which remain valid thanks to pinning.

For compaction, we create a hierarchical range tombstone iterator with structure
matching the iterator over compaction input data. An aggregator based on that
iterator is used by CompactionIterator to determine which keys are covered by
range tombstones. In case of merge operand, the same aggregator is used by
MergeHelper. Upon finishing each file in the compaction, relevant range tombstones
are added to the output file's range tombstone metablock and file boundaries are
updated accordingly.

To check whether a key is covered by range tombstone, RangeDelAggregator::ShouldDelete()
considers tombstones in the key's snapshot stripe. When this function is used outside of
compaction, it also checks newer stripes, which can contain covering tombstones. Currently
the intra-stripe check involves a linear scan; however, in the future we plan to collapse ranges
within a stripe such that binary search can be used.

RangeDelAggregator::AddToBuilder() adds all range tombstones in the table's key-range
to a new table's range tombstone meta-block. Since range tombstones may fall in the gap
between files, we may need to extend some files' key-ranges. The strategy is (1) first file
extends as far left as possible and other files do not extend left, (2) all files extend right
until either the start of the next file or the end of the last range tombstone in the gap,
whichever comes first.

One other notable change is adding release/move semantics to ScopedArenaIterator
such that it can be used to transfer ownership of an arena-allocated iterator, similar to
how unique_ptr is used for malloc'd data.

Depends on D61473

Test Plan: compaction_iterator_test, mock_table, end-to-end tests in D63927

Reviewers: sdong, IslamAbdelRahman, wanning, yhchiang, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62205
2016-10-18 12:04:56 -07:00
Adam Retter
ce4963fdfc [doc] Document that Visual Studio 2015+ is now required for Windows builds (#1389)
Closes https://github.com/facebook/rocksdb/issues/1377
2016-10-12 13:40:20 -07:00
Bassam Tabbara
aea3ce4c83 Avoid string CONCAT which is not supported in cmake 2.6 (#1383)
Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>
2016-10-10 17:32:04 -07:00
Kefu Chai
21f4bb5a89 cmake support for linux and osx (#1358)
* enable cmake to work on linux and osx also

* port part of build_detect_platform not covered by thirdparty.inc
  to cmake.
  - detect fallocate()
  - detect malloc_usable_size()
  - detect JeMalloc
  - detect snappy
* check for asan,tsan,ubsan
* create 'build_version.cc' in build directory.
* add `check` target to support 'make check'.
* add `tools` target to match its counterpart in Makefile.
* use `date` on non-win32 platforms.
* pass different cflags on non-win32 platforms
* detect pthead library using FindThread cmake module.
* enable CMP0042 to silence the cmake warning on osx
* reorder the linked libraries. because testutillib references gtest, to
  enable the linker to find the referenced symbols, we need to put gtest
  after testutillib.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>

* hash_table_bench.cc: fix build without gflags

Signed-off-by: Kefu Chai <kchai@redhat.com>

* remove gtest from librocksdb linkage

testharness.cc is included in librocksdb sources, and it uses gtest. but
gtest is not supposed to be part of the public API of librocksdb. so, in
this change, the testharness.cc is moved out out librocksdb, and is
built as an object target, then linked with the tools and tests instead.

Signed-off-by: Marcus Watts <mwatts@redhat.com>
Signed-off-by: Kefu Chai <kchai@redhat.com>
2016-09-28 11:53:15 -07:00
Yi Wu
9ed928e7a9 Split DBOptions into ImmutableDBOptions and MutableDBOptions
Summary: Use ImmutableDBOptions/MutableDBOptions internally and DBOptions only for user-facing APIs. MutableDBOptions is barely a placeholder for now. I'll start to move options to MutableDBOptions in following diffs.

Test Plan:
  make all check

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64065
2016-09-23 16:34:04 -07:00
Islam AbdelRahman
52ee07b021 Move AddFile() tests to external_sst_file_test.cc
Summary: Simply move the tests

Test Plan: make check -j64

Reviewers: andrewkr, lightmark, yiwu, yhchiang, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62529
2016-09-07 15:41:54 -07:00
Injun Song
ce1be2ce37 Fix build error on Windows (AppVeyor) (#1315)
Add 'cf_options' to source list and db_imple.cc

fix casting
2016-09-06 08:41:43 -07:00
Islam AbdelRahman
e9b2af87f8 Expose ThreadPool under include/rocksdb/threadpool.h
Summary:
This diff split ThreadPool to
-ThreadPool (abstract interface exposed in include/rocksdb/threadpool.h)
-ThreadPoolImpl (actual implementation in util/threadpool_imp.h)

This allow us to expose ThreadPool to the user so we can use it as an option later

Test Plan: existing unit tests

Reviewers: andrewkr, yiwu, yhchiang, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62085
2016-08-26 10:41:35 -07:00
Yi Wu
72f8cc703c LRU cache mid-point insertion
Summary:
Add mid-point insertion functionality to LRU cache. Caller of `Cache::Insert()` can set an additional parameter to make a cache entry have higher priority. The LRU cache will reserve at most `capacity * high_pri_pool_pct` bytes for high-pri cache entries. If `high_pri_pool_pct` is zero, the cache degenerates to normal LRU cache.

Context: If we are to put index and filter blocks into RocksDB block cache, index/filter block can be swap out too early. We want to add an option to RocksDB to reserve some capacity in block cache just for index/filter blocks, to mitigate the issue.

In later diffs I'll update block based table reader to use the interface to cache index/filter blocks at high priority, and expose the option to `DBOptions` and make it dynamic changeable.

Test Plan: unit test.

Reviewers: IslamAbdelRahman, sdong, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, march, leveldb

Differential Revision: https://reviews.facebook.net/D61977
2016-08-19 16:43:31 -07:00
Yi Wu
4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
sdong
8b79422b52 [Proof-Of-Concept] RocksDB Blob Storage with a blob log file.
Summary:
This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB.

The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc.

Test Plan: Add unit tests.

Reviewers: arahut

Reviewed By: arahut

Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61485
2016-08-10 17:05:17 -07:00
omegaga
44f5cc57a5 Add time series database (resubmitted)
Summary: Implement a time series database that supports DateTieredCompactionStrategy. It wraps a db object and separate SST files in different column families (time windows).

Test Plan: Add `date_tiered_test`.

Reviewers: dhruba, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61653
2016-08-05 15:56:22 -07:00
sdong
7c4615cf1f A utility function to help users migrate DB after options change
Summary: Add a utility function that trigger necessary full compaction and put output to the correct level by looking at new options and old options.

Test Plan: Add unit tests for it.

Reviewers: andrewkr, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60783
2016-08-05 15:39:55 -07:00
Dmitri Smirnov
e4609a749b Fix Windows build issues (#1253) 2016-08-02 17:15:18 -07:00
omegaga
d51dc96a79 Experiments on column-aware encodings
Summary:
Experiments on column-aware encodings. Supported features: 1) extract data blocks from SST file and encode with specified encodings; 2) Decode encoded data back into row format; 3) Directly extract data blocks and write in row format (without prefix encoding); 4) Get column distribution statistics for column format; 5) Dump data blocks separated by columns in human-readable format.

There is still on-going work on this diff. More refactoring is necessary.

Test Plan: Wrote tests in `column_aware_encoding_test.cc`. More tests should be added.

Reviewers: sdong

Reviewed By: sdong

Subscribers: arahut, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60027
2016-08-01 14:50:19 -07:00
Alexander Jipa
50b8d29b9c fixes 1230: Error:string sub-command REGEX, mode REPLACE needs at least 6 arguments total to command (#1231)
quoting the las argument of string REGEXP REPLACE function
2016-07-26 20:29:02 -07:00
Islam AbdelRahman
f8061a237e Fix Statistics TickersNameMap miss match with Tickers enum
Summary:
TickersNameMap is not consistent with Tickers enum.
this cause us to report wrong statistics and sometimes to access TickersNameMap outside it's boundary causing crashes (in Fb303 statistics)

Test Plan: added new unit test

Reviewers: sdong, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61083
2016-07-25 16:05:50 -07:00
Yi Wu
32604e6601 Fix flush not being commit while writing manifest
Summary:
Fix flush not being commit while writing manifest, which is a recent bug introduced by D60075.

The issue:
# Options.max_background_flushes > 1
# Background thread A pick up a flush job, flush, then commit to manifest. (Note that mutex is released before writing manifest.)
# Background thread B pick up another flush job, flush. When it gets to `MemTableList::InstallMemtableFlushResults`, it notices another thread is commiting, so it quit.
# After the first commit, thread A doesn't double check if there are more flush result need to commit, leaving the second flush uncommitted.

Test Plan: run the test. Also verify the new test hit deadlock without the fix.

Reviewers: sdong, igor, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, omegaga, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60969
2016-07-21 10:10:41 -07:00
Yi Wu
4b95253587 Refactor cache.cc
Summary: Refactor cache.cc so that I can plugin clock cache (D55581). Mainly move `ShardedCache` to separate file, move `LRUHandle` back to cache.cc and rename it lru_cache.cc.

Test Plan:
    make check -j64

Reviewers: lightmark, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59655
2016-07-15 10:41:36 -07:00
Yi Wu
6ea41f8527 Fix deadlock when trying update options when write stalls
Summary:
When write stalls because of auto compaction is disabled, or stop write trigger is reached,
user may change these two options to unblock writes. Unfortunately we had issue where the write
thread will block the attempt to persist the options, thus creating a deadlock. This diff
fix the issue and add two test cases to detect such deadlock.

Test Plan:
Run unit tests.

Also, revert db_impl.cc to master (but don't revert `DBImpl::BackgroundCompaction:Finish` sync point) and run db_options_test. Both tests should hit deadlock.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D60627
2016-07-12 15:30:38 -07:00
John Alexander
43692793e3 Fixed Minor Bug on Windows Build and db_bench_tool.cc (#1189)
* Fixed Windows build error in CMakeLists.txt and perf_level error in db_bench_tool.cc

* Changed hard-coded perf levels in db_bench_tool.cc to enum values from perf_level.h

* Replaced remaining FLAGS_perf_level > 0
2016-06-29 18:44:22 -07:00
sdong
6faddd7c55 Merge db/slice.cc into util/slice.cc
Summary: It confuses some compilers to have slice.cc under multiple directories. Merge them.

Test Plan: Run existing tests

Reviewers: andrewkr, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59409
2016-06-10 16:37:36 -07:00
sdong
b2973eaaeb Remove options builder
Summary: AFIK, options builder is not used by anyone. Remove it.

Test Plan: Run all existing tests.

Reviewers: IslamAbdelRahman, andrewkr, igor

Reviewed By: igor

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59319
2016-06-09 16:56:17 -07:00
krad
5b197bff41 Enabled Windows build for volatile tier implementation
Summary: Enabled build in Windows and corresponding fixes

Test Plan:
Compile and run persistent_cache_test in Windows and make check in
Linux

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59307
2016-06-09 11:10:35 -07:00
krad
3070ed9021 Persistent Read Cache (4) Interface definitions
Summary:
This diff provides the basic interface definitions of persistent read
cache system

PersistentCacheOptions captures the persistent read cache options used to
configure and control the system
PersistentCacheTier provides the basic building block for constructing tiered
cache
PersistentTieredCache provides a logical abstraction of tiers of cache layered
over one another

Test Plan: Compile

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57051
2016-06-06 13:17:09 -07:00
Andrew Kryczka
6e6622abb9 Create env_basic_test [pluggable Env part 2]
Summary:
Extracted basic Env-related tests from mock_env_test and memenv_test into a
parameterized test for Envs: env_basic_test.

Depends on D58449. (The dependency is here only so I can keep this series of
diffs in a chain -- there is no dependency on that diff's code.)

Test Plan: ran tests

Reviewers: IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D58635
2016-06-03 15:13:03 -07:00
Andrew Kryczka
af0c9ac01d Env registry for URI-based Env selection [pluggable Env part 1]
Summary:
This enables configurable Envs without recompiling. For example, my
next diff will make env_test test an Env created by NewEnvFromUri(). Then,
users can determine which Env is tested simply by providing the URI for
NewEnvFromUri() (e.g., through a CLI argument or environment variable).

The registration process allows us to register any Env that is linked with the
RocksDB library, so we can register our internal Envs as well.

The registration code is inspired by our internal InitRegistry.

Test Plan: new unit test

Reviewers: IslamAbdelRahman, lightmark, ldemailly, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba, andrewkr

Differential Revision: https://reviews.facebook.net/D58449
2016-06-03 08:15:16 -07:00
krad
62d5480984 Add persistent cache to Windows build system
Summary: Add hash table (under persistent cache) to CMake list

Test Plan: Run hash_test in windows and make check in Linux

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D59151
2016-06-02 17:48:25 -07:00
omegaga
d379d110ee Update CMakeLists.txt for added test
Summary: Update CMakeLists.txt for added db_io_failure_test. Depends on D58341

Test Plan: Run make check to see if the tests are working properly.

Reviewers: sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D58671
2016-05-23 17:09:58 -07:00
sdong
1d725ca51d Deprecate BlockBasedTableOptions.hash_index_allow_collision=false.
Summary: Deprecate this one option and delete code and tests that are now superfluous.

Test Plan: all tests pass

Reviewers: igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: msalib, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55317
2016-05-20 17:52:27 -07:00
Dmitri Smirnov
26adaad438 Split WinEnv into separate classes. (#1128)
For ease of reuse and customization as a library
  without wrapping.
  WinEnvThreads is a class for replacement.
  WintEnvIO is a class for reuse and behavior override.
  Added private virtual functions for custom override
  of fallocate pread for io classes.
2016-05-19 16:40:54 -07:00
Islam AbdelRahman
1f2dca0eaa Add MaxOperator to utilities/merge_operators/
Summary:
Introduce MaxOperator a simple merge operator that return the max of all operands.
This merge operand help me in benchmarking

Test Plan: Add new unitttests

Reviewers: sdong, andrewkr, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57873
2016-05-19 15:51:29 -07:00
Dmitri Smirnov
bac3be7c46 Fix build issue. (#1123)
Implement GetUniqueIdFromFile to support new tests and the feature.
2016-05-16 17:01:00 -07:00
krad
a08c8c851a Added PersistentCache abstraction
Summary:
Added a new abstraction to cache page to RocksDB designed for the read
cache use.

RocksDB current block cache is more of an object cache. For the persistent read cache
project, what we need is a page cache equivalent. This changes adds a cache
abstraction to RocksDB to cache pages called PersistentCache. PersistentCache can cache
uncompressed pages or raw pages (content as in filesystem). The user can
choose to operate PersistentCache either in  COMPRESSED or UNCOMPRESSED mode.

Blame Rev:

Test Plan: Run unit tests

Reviewers: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D55707
2016-05-15 22:17:18 -07:00
Dmitri Smirnov
aab91b8d8f Use generic threadpool for Windows environment (#1120)
Conditionally retrofit thread_posix for use with std::thread
  and reuse the same logic. Posix users continue using Posix interfaces.
  Enable XPRESS compression in test runs.
  Fix master introduced signed/unsigned mismatch.
2016-05-12 18:34:04 -07:00
Reid Horuff
a657ee9a9c [rocksdb] Recovery path sequence miscount fix
Summary:
Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert.
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)]
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)]
[4: COMMIT(a)]
[7: COMMIT(b)]

The first two batches do not consume any sequence numbers so are both prefixed with seq=1.
For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL.
We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries.
This is a valid WAL.

Because with 2PC markers one WriteBatch points to another batch containing its inserts a writebatch can consume more or less sequence numbers than the number of sequence consuming entries that it contains.

We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b))

So, now upon recovery we must track the actual consumption of sequence numbers.
In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though. correct?

Test Plan: provided test.

Reviewers: sdong

Subscribers: andrewkr, leveldb, dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D57645
2016-05-10 14:06:07 -07:00
Andrew Kryczka
3f16a836a4 Introduce chroot Env
Summary:
For testing backups, we needed an Env that is fully isolated from other
Envs on the same machine. Our in-memory Envs (MockEnv and InMemoryEnv) were
insufficient because they don't implement most directory operations.

This diff introduces a new Env, "ChrootEnv", that translates paths such that the
chroot directory appears to be the root directory. This way, multiple Envs can
be isolated in the filesystem by using different chroot directories. Since we
use the filesystem, all directory operations are trivially supported.

Test Plan:
I parameterized the existing EnvPosixTest so it runs tests on ChrootEnv
except the ioctl-related cases.

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57543
2016-05-06 17:42:50 -07:00
Dmitri Smirnov
ee221d2de0 Introduce XPRESS compresssion on Windows. (#1081)
Comparable with Snappy on comp ratio.
  Implemented using Windows API, does not require external package.
  Avaiable since Windows 8 and server 2012.
  Use -DXPRESS=1 with CMake to enable.
2016-04-19 22:54:24 -07:00
Yi Wu
792762c42c Split db_test.cc
Summary: Split db_test.cc into several files. Moving several helper functions into DBTestBase.

Test Plan: make check

Reviewers: sdong, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: dhruba, andrewkr, kradhakrishnan, yhchiang, leveldb, sdong

Differential Revision: https://reviews.facebook.net/D56715
2016-04-18 09:42:50 -07:00
Sherlock
80b74a6c64 Include ldb tool in the windows build (#914) 2016-04-14 10:50:34 -07:00
sdong
a23c6052c8 Don't run DBOptionsAllFieldsSettable under valgrind
Summary: Test DBOptionsAllFieldsSettable sometimes fails under valgrind. Move option settable tests to a separate test file and disable it in valgrind..

Test Plan: Run valgrind test and make sure the test doesn't run.

Reviewers: andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: kradhakrishnan, yiwu, yhchiang, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D56529
2016-04-11 14:24:55 -07:00
Andrew Kryczka
e182f03c1e Add unit tests for RepairDB
Summary:
Basic test cases:

- Manifest is lost or corrupt
- Manifest refers to too many or too few SST files
- SST file is corrupt
- Unflushed data is present when RepairDB is called

Depends on D55065 for its CreateFile() function in file_utils

Test Plan: Ran the tests.

Reviewers: IslamAbdelRahman, yhchiang, yoshinorim, sdong

Reviewed By: sdong

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D55485
2016-03-18 15:18:42 -07:00
Siying Dong
774922c680 Merge pull request #1026 from SherlockNoMad/Hist
Histogram Concurrency Improvement and Time-Windowing Support
2016-03-15 11:27:54 -07:00
SherlockNoMad
f11b0df121 Fix AppVeyor build error 2016-03-15 10:57:33 -07:00
SherlockNoMad
54f6b9e162 Histogram Concurrency Improvement and Time-Windowing Support 2016-03-11 16:54:25 -08:00
Yi Wu
f71fc77b7c Cache to have an option to fail Cache::Insert() when full
Summary:
Cache to have an option to fail Cache::Insert() when full. Update call sites to check status and handle error.

I totally have no idea what's correct behavior of all the call sites when they encounter error. Please let me know if you see something wrong or more unit test is needed.

Test Plan: make check -j32, see tests pass.

Reviewers: anthony, yhchiang, andrewkr, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D54705
2016-03-10 17:35:19 -08:00
sdong
e79ad9e184 Add Iterator Property rocksdb.iterator.version_number
Summary: We want to provide a way to detect whether an iterator is stale and needs to be recreated. Add a iterator property to return version number.

Test Plan: Add two unit tests for it.

Reviewers: IslamAbdelRahman, yhchiang, anthony, kradhakrishnan, andrewkr

Reviewed By: andrewkr

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54921
2016-03-02 16:23:59 -08:00
Yi Wu
2568985ab3 IOStatsContext::ToString() add option to exclude zero counters
Summary: similar to D52809 add option to exclude zero counters.

Test Plan:
[yiwu@dev4504.prn1 ~/rocksdb] ./iostats_context_test
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from IOStatsContextTest
[ RUN      ] IOStatsContextTest.ToString
[       OK ] IOStatsContextTest.ToString (0 ms)
[----------] 1 test from IOStatsContextTest (0 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (0 ms total)
[  PASSED  ] 1 test.

Reviewers: anthony, yhchiang, andrewkr, IslamAbdelRahman, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D54591
2016-02-23 10:26:24 -08:00
Jonathan Wiepert
cf38e56f2f Fix broken appveyor build caused by D53991
Test Plan:
pushed remote as D53991_appveyor_test branch. Appveyor run:
https://ci.appveyor.com/project/Facebook/rocksdb/build/1.0.651
shows the file is now found and the run is back to the state before the initial change:
https://ci.appveyor.com/project/Facebook/rocksdb/build/1.0.620

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: igor, dhruba

Differential Revision: https://reviews.facebook.net/D54309
2016-02-17 11:49:52 -08:00
Dmitri Smirnov
9656eab005 This partially addresses issue https://github.com/facebook/rocksdb/issues/935
testutil.cc and testharness.cc could not be moved out at this time
  as they are used by 4 benchmarks in release builds.
2016-02-03 17:06:10 -08:00
Tomas Kolda
e2972803a6 Adding support for Windows JNI build 2016-02-02 09:11:37 +01:00
Islam AbdelRahman
d6c838f1e1 Add SstFileManager (component tracking all SST file in DBs and control the deletion rate)
Summary:
Add a new class SstFileTracker that will be notified whenever a DB add/delete/move and sst file, it will also replace DeleteScheduler
SstFileTracker can be used later to abort writes when we exceed a specific size

Test Plan: unit tests

Reviewers: rven, anthony, yhchiang, sdong

Reviewed By: sdong

Subscribers: igor, lovro, march, dhruba

Differential Revision: https://reviews.facebook.net/D50469
2016-01-28 18:35:01 -08:00
Tomas Kolda
4265f81e87 Remove util/auto_roll_logger.cc (it was moved to different directory) 2016-01-27 12:07:31 +01:00
Tomas Kolda
d7f22b6d25 Fixing generated GenerateBuildVersion.vcxproj when one builds on different
locale than english. The problem is that date and time CLI utilities
generates different format so that REGEX in CMake does not work.
2016-01-27 10:51:20 +01:00
Andrew Kryczka
167bd8856d [directory includes cleanup] Finish removing util->db dependencies 2016-01-26 10:49:24 -08:00
Andrew Kryczka
46f9cd46af [directory includes cleanup] Move cross-function test points
Summary:
I split the db-specific test points out into a separate file under db/
directory. There were also a few bugs to fix in xfunc.{h,cc} that prevented it
from compiling previously; see https://reviews.facebook.net/D36825.

Test Plan:
compilation works now, below command works, will also run "make xfunc".

  $ make check ROCKSDB_XFUNC_TEST='managed_new' tests-regexp='DBTest' -j32

Reviewers: sdong, yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D53343
2016-01-26 10:49:05 -08:00
Andrew Kryczka
eceb5cb1b7 Split db_test.cc (part 1: properties)
Summary:
Moved all the tests that verify property correctness into a separate
file. The goal is to reduce compile time and complexity of db_test. I didn't
add parallelism for db_properties_test, even though these tests were
parallelized in db_test, since the file is small enough that it won't matter.

Some of these moves may be controversial since it's hard to say whether the
test is "verifying property correctness," or "using properties to verify
rocksdb's correctness." I'm interested in any opinions.

Test Plan: ran db_properties_test, also waiting on "make commit-prereq -j32"

Reviewers: yhchiang, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D52995
2016-01-20 15:17:52 -08:00
Siying Dong
22c0ed8a5f Disable Visual Studio Warning C4351
Currently Windows build is broken because of Warning C4351. Disable the warning before figuring out the right way to fix it.
2015-12-28 15:06:34 -08:00
Nathan Bronson
7d87f02799 support for concurrent adds to memtable
Summary:
This diff adds support for concurrent adds to the skiplist memtable
implementations.  Memory allocation is made thread-safe by the addition of
a spinlock, with small per-core buffers to avoid contention.  Concurrent
memtable writes are made via an additional method and don't impose a
performance overhead on the non-concurrent case, so parallelism can be
selected on a per-batch basis.

Write thread synchronization is an increasing bottleneck for higher levels
of concurrency, so this diff adds --enable_write_thread_adaptive_yield
(default off).  This feature causes threads joining a write batch
group to spin for a short time (default 100 usec) using sched_yield,
rather than going to sleep on a mutex.  If the timing of the yield calls
indicates that another thread has actually run during the yield then
spinning is avoided.  This option improves performance for concurrent
situations even without parallel adds, although it has the potential to
increase CPU usage (and the heuristic adaptation is not yet mature).

Parallel writes are not currently compatible with
inplace updates, update callbacks, or delete filtering.
Enable it with --allow_concurrent_memtable_write (and
--enable_write_thread_adaptive_yield).  Parallel memtable writes
are performance neutral when there is no actual parallelism, and in
my experiments (SSD server-class Linux and varying contention and key
sizes for fillrandom) they are always a performance win when there is
more than one thread.

Statistics are updated earlier in the write path, dropping the number
of DB mutex acquisitions from 2 to 1 for almost all cases.

This diff was motivated and inspired by Yahoo's cLSM work.  It is more
conservative than cLSM: RocksDB's write batch group leader role is
preserved (along with all of the existing flush and write throttling
logic) and concurrent writers are blocked until all memtable insertions
have completed and the sequence number has been advanced, to preserve
linearizability.

My test config is "db_bench -benchmarks=fillrandom -threads=$T
-batch_size=1 -memtablerep=skip_list -value_size=100 --num=1000000/$T
-level0_slowdown_writes_trigger=9999 -level0_stop_writes_trigger=9999
-disable_auto_compactions --max_write_buffer_number=8
-max_background_flushes=8 --disable_wal --write_buffer_size=160000000
--block_size=16384 --allow_concurrent_memtable_write" on a two-socket
Xeon E5-2660 @ 2.2Ghz with lots of memory and an SSD hard drive.  With 1
thread I get ~440Kops/sec.  Peak performance for 1 socket (numactl
-N1) is slightly more than 1Mops/sec, at 16 threads.  Peak performance
across both sockets happens at 30 threads, and is ~900Kops/sec, although
with fewer threads there is less performance loss when the system has
background work.

Test Plan:
1. concurrent stress tests for InlineSkipList and DynamicBloom
2. make clean; make check
3. make clean; DISABLE_JEMALLOC=1 make valgrind_check; valgrind db_bench
4. make clean; COMPILE_WITH_TSAN=1 make all check; db_bench
5. make clean; COMPILE_WITH_ASAN=1 make all check; db_bench
6. make clean; OPT=-DROCKSDB_LITE make check
7. verify no perf regressions when disabled

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: MarkCallaghan, IslamAbdelRahman, anthony, yhchiang, rven, sdong, guyg8, kradhakrishnan, dhruba

Differential Revision: https://reviews.facebook.net/D50589
2015-12-25 11:03:40 -08:00
Dmitri Smirnov
3fa68af316 Enable MS compiler warning c4244.
Mostly due to the fact that there are differences in sizes of int,long
  on 64 bit systems vs GNU.
2015-12-11 16:52:41 -08:00
Dmitri Smirnov
236fe21c92 Enable MS compiler warning c4244.
Mostly due to the fact that there are differences in sizes of int,long
  on 64 bit systems vs GNU.
2015-12-11 16:47:34 -08:00
Igor Canadi
64fa43843b Merge pull request #862 from ceph/wip-env
implement EnvMirror
2015-12-10 18:45:07 -08:00
Sage Weil
2074ddd625 env: add EnvMirror
This is an Env implementation that mirrors all storage-related methods on
two different backend Env's and verifies that they return the same
results (return status and read results).  This is useful for implementing
a new Env and verifying its correctness.

Signed-off-by: Sage Weil <sage@redhat.com>
2015-12-10 21:32:45 -05:00
Warren Falk
c6fedf2bf8 Add compaction_iterator and delete_scheduler tests to Windows build 2015-12-09 11:01:02 -05:00
Siying Dong
fa3dbf203f Merge pull request #853 from Vaisman/enable_C4267_warning
Enable C4267 warning
2015-12-08 17:59:24 -08:00
yuslepukhin
78de0c9222 Fix up VS 15 build.
Fix warnings
 Take advantage of native snprintf on VS 15
2015-12-08 08:38:21 -08:00
Nathan Bronson
78812ec6bf InlineSkipList - part 1/3
Summary:
This diff is 1/3 in a sequence that introduces a skip list optimized for
a key that is a freshly-allocated const char*.  The diff is broken into
pieces to make it easier to review.  This piece only introduces the new
type by copying the existing SkipList, with mechanical naming changes
and reformatting.

Test Plan: new unit test

Reviewers: igor, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D51279
2015-11-24 14:30:22 -08:00
Vasili Svirski
41b32c6059 Enable C4267 warning
* conversion from 'size_t' to 'type', by add static_cast

Tested:
* by build solution on Windows, Linux locally,
* run tests
* build CI system successful
2015-11-24 16:33:09 +03:00
yuslepukhin
047bd22aae Build on Visual Studio 2015 Update 1 2015-11-20 15:31:47 -08:00
Dmitri Smirnov
89bacb7e7d Enable MS Warning C4804 : unsafe use of type 'bool' in operation 2015-11-18 16:23:19 -08:00
Dmitri Smirnov
8f01f25414 Remove forward_iter_bench from Win build. 2015-11-16 11:10:58 -08:00
Venkatesh Radhakrishnan
7824444bfc Reuse file iterators in tailing iterator when memtable is flushed
Summary:
Under a tailing workload, there were increased block cache
misses when a memtable was flushed because we were rebuilding iterators
in that case since the version set changed. This was exacerbated in the
case of iterate_upper_bound, since file iterators which were over the
iterate_upper_bound would have been deleted and are now brought back as
part of the Rebuild, only to be deleted again. We now renew the iterators
and only build iterators for files which are added and delete file
iterators for files which are deleted.
Refer to https://reviews.facebook.net/D50463 for previous version

Test Plan: DBTestTailingIterator.TailingIteratorTrimSeekToNext

Reviewers: anthony, IslamAbdelRahman, igor, tnovak, yhchiang, sdong

Reviewed By: sdong

Subscribers: yhchiang, march, dhruba, leveldb, lovro

Differential Revision: https://reviews.facebook.net/D50679
2015-11-13 15:50:59 -08:00
yuslepukhin
2ab3e2df2c Fix a build break so tests can run 2015-11-12 14:53:19 -08:00
Yueh-Hsuan Chiang
e11f676e34 Add OptionsUtil::LoadOptionsFromFile() API
Summary:
This patch adds OptionsUtil::LoadOptionsFromFile() and
OptionsUtil::LoadLatestOptionsFromDB(), which allow developers
to construct DBOptions and ColumnFamilyOptions from a RocksDB
options file.  Note that most pointer-typed options such as
merge_operator will not be constructed.

With this API, developers no longer need to remember all the
options in order to reopen an existing rocksdb instance like
the following:

  DBOptions db_options;
  std::vector<std::string> cf_names;
  std::vector<ColumnFamilyOptions> cf_opts;

  // Load primitive-typed options from an existing DB
  OptionsUtil::LoadLatestOptionsFromDB(
      dbname, &db_options, &cf_names, &cf_opts);

  // Initialize necessary pointer-typed options
  cf_opts[0].merge_operator.reset(new MyMergeOperator());
  ...

  // Construct the vector of ColumnFamilyDescriptor
  std::vector<ColumnFamilyDescriptor> cf_descs;
  for (size_t i = 0; i < cf_opts.size(); ++i) {
    cf_descs.emplace_back(cf_names[i], cf_opts[i]);
  }

  // Open the DB
  DB* db = nullptr;
  std::vector<ColumnFamilyHandle*> cf_handles;
  auto s = DB::Open(db_options, dbname, cf_descs,
                    &handles, &db);

Test Plan:
Augment existing tests in column_family_test
options_test
db_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49095
2015-11-12 06:52:43 -08:00
Yueh-Hsuan Chiang
e114f0abb8 Enable RocksDB to persist Options file.
Summary:
This patch allows rocksdb to persist options into a file on
DB::Open, SetOptions, and Create / Drop ColumnFamily.
Options files are created under the same directory as the rocksdb
instance.

In addition, this patch also adds a fail_if_missing_options_file in DBOptions
that makes any function call return non-ok status when it is not able to
persist options properly.

  // If true, then DB::Open / CreateColumnFamily / DropColumnFamily
  // / SetOptions will fail if options file is not detected or properly
  // persisted.
  //
  // DEFAULT: false
  bool fail_if_missing_options_file;

Options file names are formatted as OPTIONS-<number>, and RocksDB
will always keep the latest two options files.

Test Plan:
Add options_file_test.

options_test
column_family_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48285
2015-11-10 22:58:01 -08:00
Nathan Bronson
b81b430987 Switch to thread-local random for skiplist
Summary:
Using a TLS random instance for skiplist makes it smaller
(useful for hash_skiplist_rep) and prepares skiplist for concurrent
adds.  This diff also modifies the branching factor math to avoid an
unnecessary division.

This diff has the effect of changing the sequence of skip list node
height choices made by tests, so it has the potential to cause unit
test failures for tests that implicitly rely on the exact structure
of the skip list.  Tests that try to exactly trigger a compaction are
likely suspects for this problem (these tests have always been brittle to
changes in the skiplist details).  I've minimizes this risk by reseeding
the main thread's Random at the beginning of each test, increasing the
universal compaction size_ratio limit from 101% to 105% for some tests,
and verifying that the tests pass many times.

Test Plan: for i in `seq 0 9`; do make check; done

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D50439
2015-11-09 19:25:22 -08:00
Dmitri Smirnov
7c86d50497 Enable C4305 'identifier' : truncation from 'type1' to 'type2' 2015-11-06 16:44:39 -08:00
Dmitri Smirnov
85a2ce9c19 Enable C4702 unreachable code 2015-11-06 15:50:11 -08:00
Dmitri Smirnov
62aa1b1b78 Enable C4200 warning
nonstandard extension used : zero-sized array in struct/union
2015-11-06 15:32:45 -08:00
Dmitri Smirnov
20f57b1715 Enable Windows warnings C4307 C4309 C4512 C4701
Enable C4307 'operator' : integral constant overflow
  Longs and ints on Windows are 32-bit hence the overflow
  Enable C4309 'conversion' : truncation of constant value
  Enable C4512 'class' : assignment operator could not be generated
  Enable C4701 Potentially uninitialized local variable 'name' used
2015-11-06 11:34:06 -08:00
Yueh-Hsuan Chiang
183cadfc87 Add OptionsSanityCheckLevel
Summary:
This patch introduces OptionsSanityCheckLevel internally to enable
sanity check rocksdb options.

Utilities API will be added in the follow-up diffs.

Test Plan: Added more tests in options_test

Reviewers: igor, IslamAbdelRahman, sdong, anthony

Reviewed By: anthony

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49515
2015-11-04 18:53:30 -08:00
Yueh-Hsuan Chiang
7d7ee2b654 Add Memory Insight support to utilities
Summary:
This patch introduces utilities/memory, which currently includes
GetApproximateMemoryUsageByType that reports different types of
rocksdb memory usage given a list of input DBs.

The API also take care of the case where Cache could be shared
across multiple column families / multiple db instances.

Currently, it reports memory usage of memtable, table-readers
and cache.

Test Plan: utilities/memory/memory_test.cc

Reviewers: igor, anthony, IslamAbdelRahman, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D49257
2015-11-03 17:52:17 -08:00
Dmitri Smirnov
a0163c0682 Do not disable compiler warnings:
C4101 'identifier' : unreferenced local variable
  C4189 'identifier' : local variable is initialized but not referenced
  C4100 'identifier' : unreferenced formal parameter
  C4296 'operator' : expression is always false
2015-11-02 14:11:28 -08:00