7476 Commits

Author SHA1 Message Date
Yi Wu
91b2c6d1f4 Bump version to 5.17.3 2018-11-19 22:17:46 -08:00
Yi Wu
f445eadba6 BlobDB: handle IO error on write (#4580)
Summary:
A fix similar to #4410 but on the write path. On IO error on `SelectBlobFile()` we didn't return error code properly, but simply a nullptr of `BlobFile`. The `AppendBlob()` method didn't have null check for the pointer and caused crash. The fix make sure we properly return error code in this case.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4580

Differential Revision: D10513849

Pulled By: yiwu-arbug

fbshipit-source-id: 80bca920d1d7a3541149de981015ad83e0aa14b5
2018-11-19 22:17:12 -08:00
Fosco Marotto
f438b98e55 Update history for 5.17.2 release v5.17.2 2018-11-12 11:57:32 -08:00
Yanqin Jin
fb4063749b Bump version 2018-11-02 15:23:50 -07:00
Yanqin Jin
05c9d53a40 Update TARGET file after changing buck template 2018-11-02 15:21:10 -07:00
Philip Jameson
832ab1424d Change BUCK template files (#4624)
Summary:
Slightly changes the format of generated BUCK files for Facebook consumption. Generated targets end up looking like this:
```
cpp_library(
    name = "rocksdb_tools_lib",
    srcs = [
        "tools/db_bench_tool.cc",
        "tools/trace_analyzer_tool.cc",
        "util/testutil.cc",
    ],
    auto_headers = AutoHeaders.RECURSIVE_GLOB,
    arch_preprocessor_flags = rocksdb_arch_preprocessor_flags,
    compiler_flags = rocksdb_compiler_flags,
    preprocessor_flags = rocksdb_preprocessor_flags,
    deps = [":rocksdb_lib"],
    external_deps = rocksdb_external_deps,
)
```
Instead of
```
cpp_library(
    name = "rocksdb_tools_lib",
    srcs = [
        "tools/db_bench_tool.cc",
        "tools/trace_analyzer_tool.cc",
        "util/testutil.cc",
    ],
    headers = AutoHeaders.RECURSIVE_GLOB,
    arch_preprocessor_flags = rocksdb_arch_preprocessor_flags,
    compiler_flags = rocksdb_compiler_flags,
    preprocessor_flags = rocksdb_preprocessor_flags,
    deps = [":rocksdb_lib"],
    external_deps = rocksdb_external_deps,
)
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4624

Reviewed By: riversand963

Differential Revision: D12906711

Pulled By: philipjameson

fbshipit-source-id: 32ab64a3390cdcf2c4043ff77517ac1ad58a5e2b
2018-11-02 15:20:35 -07:00
Fosco Marotto
516c7df05d Revert "Introduce CacheAllocator, a custom allocator for cache blocks (#4437)"
This reverts commit bb3b2eb960a162e10d7730f4e0a08fd8801d6184.
2018-10-31 14:55:11 -07:00
Igor Canadi
bb3b2eb960 Introduce CacheAllocator, a custom allocator for cache blocks (#4437)
Summary:
This is a conceptually simple change, but it touches many files to
pass the allocator through function calls.

We introduce CacheAllocator, which can be used by clients to configure
custom allocator for cache blocks. Our motivation is to hook this up
with folly's `JemallocNodumpAllocator`
(f43ce6d686/folly/experimental/JemallocNodumpAllocator.h),
but there are many other possible use cases.

Additionally, this commit cleans up memory allocation in
`util/compression.h`, making sure that all allocations are wrapped in a
unique_ptr as soon as possible.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437

Differential Revision: D10132814

Pulled By: yiwu-arbug

fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
2018-10-31 13:02:53 -07:00
Yi Wu
ae5305ee70 Fix compile error with jemalloc (#4488)
Summary:
The "je_" prefix of jemalloc APIs presents only when the macro `JEMALLOC_NO_RENAME` from jemalloc.h presents.

With the patch I'm also adding -DROCKSDB_JEMALLOC flag in buck TARGETS.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4488

Differential Revision: D10355971

Pulled By: yiwu-arbug

fbshipit-source-id: 03a2d69790a44ac89219c7525763fa937a63d95a
2018-10-30 12:37:06 -07:00
Anand Ananthabhotla
f37ea82191 Update HISTORY.md
Summary:

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:
2018-10-19 16:38:03 -07:00
Anand Ananthabhotla
c954445ac8 Handle mixed slowdown/no_slowdown writer properly (#4475)
Summary:
There is a bug when the write queue leader is blocked on a write
delay/stop, and the queue has writers with WriteOptions::no_slowdown set
to true. They are not woken up until the write stall is cleared.

The fix introduces a dummy writer inserted at the tail to indicate a
write stall and prevent further inserts into the queue, and a condition
variable that writers who can tolerate slowdown wait on before adding
themselves to the queue. The leader calls WriteThread::BeginWriteStall()
to add the dummy writer and then walk the queue to fail any writers with
no_slowdown set. Once the stall clears, the leader calls
WriteThread::EndWriteStall() to remove the dummy writer and signal the
condition variable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4475

Differential Revision: D10285827

Pulled By: anand1976

fbshipit-source-id: 747465e5e7f07a829b1fb0bc1afcd7b93f4ab1a9
2018-10-19 16:29:57 -07:00
Siying Dong
619f754816 Fix WriteBatchWithIndex's SeekForPrev() (#4559)
Summary:
WriteBatchWithIndex's SeekForPrev() has a bug that we internally place the position just before the seek key rather than after. This makes the iterator to miss the result that is the same as the seek key. Fix it by position the iterator equal or smaller.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4559

Differential Revision: D10468534

Pulled By: siying

fbshipit-source-id: 2fb371ae809c561b60a1c11cef71e1c66fea1f19
2018-10-19 15:31:49 -07:00
Andrew Kryczka
f81fe96eea update HISTORY.md and version number 2018-10-16 15:43:44 -07:00
anand1976
7d3eb006ff Properly determine a truncated CompactRange stop key (#4496)
Summary:
When a CompactRange() call for a level is truncated before the end key
is reached, because it exceeds max_compaction_bytes, we need to properly
set the compaction_end parameter to indicate the stop key. The next
CompactRange will use that as the begin key. We set it to the smallest
key of the next file in the level after expanding inputs to get a clean
cut.

Previously, we were setting it before expanding inputs. So we could end
up recompacting some files. In a pathological case, where a single key
has many entries spanning all the files in the level (possibly due to
merge operands without a partial merge operator, thus resulting in
compaction output identical to the input), this would result in
an endless loop over the same set of files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4496

Differential Revision: D10395026

Pulled By: anand1976

fbshipit-source-id: f0c2f89fee29b4b3be53b6467b53abba8e9146a9
2018-10-16 15:42:06 -07:00
Andrew Kryczka
02d47c1ff4 Avoid per-key linear scan over snapshots in compaction (#4495)
Summary:
`CompactionIterator::snapshots_` is ordered by ascending seqnum, just like `DBImpl`'s linked list of snapshots from which it was copied. This PR exploits this ordering to make `findEarliestVisibleSnapshot` do binary search rather than linear scan. This can make flush/compaction significantly faster when many snapshots exist since that function is called on every single key.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4495

Differential Revision: D10386470

Pulled By: ajkr

fbshipit-source-id: 29734991631227b6b7b677e156ac567690118a8b
2018-10-16 13:11:32 -07:00
Fosco Marotto
0f17fb9622 Update version macro for 5.17 (#4472)
Summary:
Forgot this in previous commit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4472

Differential Revision: D10244227

Pulled By: gfosco

fbshipit-source-id: ba0cf7a2f5271f0d9f9443004e2620887cd5fd11
2018-10-08 16:26:12 -07:00
Fosco Marotto
9552660ecc Update HISTORY.md to current status (#4471)
Summary:
5.16.x status wasn't tracked, and also updated for pending 5.17 release.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4471

Differential Revision: D10240925

Pulled By: gfosco

fbshipit-source-id: 95ab368a04a65b201d2518097af69edf2402f544
2018-10-08 11:21:55 -07:00
Andrew Gallagher
016437d1cd rocksdb: put #pragma once before #ifdef
Summary: Work around upstream bug with modules: https://bugs.llvm.org/show_bug.cgi?id=39184.

Reviewed By: yiwu-arbug

Differential Revision: D10209569

fbshipit-source-id: 696853a02a3869e9c33d0e61168ad4b0436fa3c0
2018-10-05 10:30:11 -07:00
Fosco Marotto
b6b72687c6 Revert "Introduce CacheAllocator, a custom allocator for cache blocks (#4437)"
This reverts commit 1cf5deb8fdecb7f63ce5ce1a0e942222a95f881e.
2018-10-04 15:17:30 -07:00
JiYou
a1f6142f38 VersionSet: GetOverlappingInputs() fix overflow and optimize. (#4385)
Summary:
This fix is for `level == 0` in `GetOverlappingInputs()`:
- In `GetOverlappingInputs()`, if `level == 0`, it has potential
risk of overflow if `i == 0`.
- Optmize process when `expand = true`, the expected complexity
can be reduced to O(n).

Signed-off-by: JiYou <jiyou09@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4385

Differential Revision: D10181001

Pulled By: riversand963

fbshipit-source-id: 46eef8a1d1605c9329c164e6471cd5c5b6de16b5
2018-10-03 18:40:59 -07:00
Igor Canadi
1cf5deb8fd Introduce CacheAllocator, a custom allocator for cache blocks (#4437)
Summary:
This is a conceptually simple change, but it touches many files to
pass the allocator through function calls.

We introduce CacheAllocator, which can be used by clients to configure
custom allocator for cache blocks. Our motivation is to hook this up
with folly's `JemallocNodumpAllocator`
(f43ce6d686/folly/experimental/JemallocNodumpAllocator.h),
but there are many other possible use cases.

Additionally, this commit cleans up memory allocation in
`util/compression.h`, making sure that all allocations are wrapped in a
unique_ptr as soon as possible.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437

Differential Revision: D10132814

Pulled By: yiwu-arbug

fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
2018-10-02 17:24:58 -07:00
Yanqin Jin
4e58b2ea3d Check for compression lib support before test exec (#4443)
Summary:
Before running CompactFilesTest.SentinelCompressionType, we should check
whether zlib and snappy are supported.

CompactFilesTest.SentinelCompressionType is a newly added test. Compilation and
linking with different options, e.g. COMPILE_WITH_TSAN, COMPILE_WITH_ASAN, etc.
lead to generation of different binaries. On the one hand, it's not clear why
zlib or snappy is present under ASAN, but not under TSAN. On the other hand,
changing the compilation flags for TSAN or ASAN seems a bigger change worth much
more attention. To unblock the cont-runs, I suggest that we simply add these
two checks at the beginning of the test, as we did for
GeneralTableTest.ApproximateOffsetOfCompressed in table/table_test.cc.

Future actions include invesigating the absence of zlib and snappy when
compiling with TSAN, i.e. COMPILE_WITH_TSAN=1, if necessary.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4443

Differential Revision: D10140935

Pulled By: riversand963

fbshipit-source-id: 62f96d1e685386accd2ef0b98f6f754d3fd67b3e
2018-10-02 10:42:01 -07:00
Jakub Cech
d78b2893bc Adding IOTA Foundation to USERS.MD (#4436)
Summary:
Adding IOTA Foundation to USERS.MD
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4436

Differential Revision: D10108142

Pulled By: sagar0

fbshipit-source-id: 948dc9f7169cec5c113ae347f1af765a41355aae
2018-10-02 10:03:46 -07:00
Gihwan Oh
477107d6f9 Add proper newline markdown (#4434)
Summary:
Add newline for readability
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4434

Differential Revision: D10127684

Pulled By: riversand963

fbshipit-source-id: 39f3ed7eaea655b6ff83474bc9f7616c6ad59107
2018-10-01 17:27:13 -07:00
Yanqin Jin
be5cc4c7b8 Remove a race condition between lsdir and rm (#4440)
Summary:
In DBCompactionTestWithParam::ManualLevelCompactionOutputPathId, there is
a race condition between `DBTestBase::GetSstFileCount` and
`DBImpl::PurgeObsoleteFiles`. The following graph explains why.

```
Timeline  db_compact_test_t              bg_flush_t         bg_compact_t
    |  [initiate bg flush and
    |      start waiting]
    |                                     flush
    |                                     DeleteObsoleteFiles
    |  [waken up by bg_flush_t which
    |   signaled in DeleteObsoleteFiles]
    |
    |  [initiate compaction and
    |   start waiting]
    |
    |                                                         [compact,
    |                                                          set manual.done to true]
    |                                   [signal at the end of
    |                                    BackgroundCallFlush]
    |
    |  [waken up by bg_flush_t
    |   which signaled before
    |   returning from
    |   BackgroundCallFlush]
    |
    |  Check manual.done is true
    |
    |  GetSstFileCount    <-- race condition -->           PurgeObsoleteFiles
    V
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4440

Differential Revision: D10122628

Pulled By: riversand963

fbshipit-source-id: 3ede73c39fee6ad804dc6ac1ed84759c7e63977f
2018-10-01 11:57:55 -07:00
Andrew Kryczka
ac6f435a9a Fix CompactFiles support for kDisableCompressionOption (#4438)
Summary:
Previously `CompactFiles` with `CompressionType::kDisableCompressionOption` caused program to crash on assertion failure. This PR fixes the crash by adding support for that setting. Now, that setting will cause RocksDB to choose compression according to the column family's options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4438

Differential Revision: D10115761

Pulled By: ajkr

fbshipit-source-id: a553c6fa76fa5b6f73b0d165d95640da6f454122
2018-10-01 01:18:10 -07:00
Yi Wu
d6f2ecf49c Utility to run task periodically in a thread (#4423)
Summary:
Introduce `RepeatableThread` utility to run task periodically in a separate thread. It is basically the same as the the same class in fbcode, and in addition provide a helper method to let tests mock time and trigger execution one at a time.

We can use this class to replace `TimerQueue` in #4382 and `BlobDB`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4423

Differential Revision: D10020932

Pulled By: yiwu-arbug

fbshipit-source-id: 3616bef108c39a33c92eedb1256de424b7c04087
2018-09-27 15:28:00 -07:00
JiYou
75ca13875c FindFile: use std::lower_bound reduce the repeated code. (#4372)
Summary:
`FindFile()` and  `FindFileInRange()` actually works as the same
of `std::lower_bound()`. Use `std::lower_bound()` to reduce the
repeated code.

- change `FindFile()` and `FindFileInRange()` to use `std::lower_bound()`

Signed-off-by: JiYou <jiyou09@gmail.com>
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4372

Differential Revision: D9919677

Pulled By: ajkr

fbshipit-source-id: f74aaa30e2f80e410e299c5a5bca4eaf2a7a26de
2018-09-27 10:35:00 -07:00
Sagar Vemuri
b1dad4cfcc assert in PosixEnv::FileExists should be based on errno (#4427)
Summary:
The assert in PosixEnv::FileExists is currently based on the return value of `access` syscall. Instead it should be based on errno.

Initially I wanted to remove this assert as [`access`](https://linux.die.net/man/2/access) can error out in a few other cases (like EROFS). But on thinking more it feels like the assert is doing the right thing ...  its good to crash on EROFS, EFAULT, EINVAL, and other major filesystem related problems so that the user is immediately aware of the problems while testing.
(I think it might be ok to crash on EIO as well, but there might be a specific reason why it was decided not to crash for EIO, and I don't have that context. So letting the letting the assert checks remain as is for now).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4427

Differential Revision: D10037200

Pulled By: sagar0

fbshipit-source-id: 5cc96116a2e53cef701f444a8b5290576f311e51
2018-09-26 13:25:15 -07:00
Andrew Kryczka
d56070d875 Fix benchmark script with vector memtable (#4428)
Summary:
I guess we didn't update this script when `--allow_concurrent_memtable_write` became true by default.

Fixes #4413.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4428

Differential Revision: D10036452

Pulled By: ajkr

fbshipit-source-id: f464be0642bd096d9040f82cdc3eae614a902183
2018-09-26 13:22:45 -07:00
Yi Wu
dc813e4b85 Improve log handling when recover without flush (#4405)
Summary:
Improve log handling when avoid_flush_during_recovery=true.
1. restore total_log_size_ after recovery, by summing up existing log sizes. Fixes #4253.
2. truncate the last existing log, since this log can contain preallocated space and it will be a waste to keep the space. It avoids a crash loop of user application cause a lot of log with non-trivial size being created and ultimately take up all disk space.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4405

Differential Revision: D9953933

Pulled By: yiwu-arbug

fbshipit-source-id: 967780fee8acec7f358b6eb65190fb4684f82e56
2018-09-26 10:37:48 -07:00
Nikhil Benesch
17edc82a4b Handle tombstones at the same seqno in the CollapsedRangeDelMap (#4424)
Summary:
The CollapsedRangeDelMap was entirely mishandling tombstones at the same
sequence number when the tombstones did not have identical start and end
keys. Such tombstones are common since 90fc40690, which causes
tombstones to be split during compactions.

For example, if the tombstone [a, c) @ 1 lies across a compaction
boundary at b, it will be split into [a, b) @ 1 and [b, c) @ 1. Without
this patch, the collapsed range deletion map would look like this:

  a -> 1
  b -> 1
  c -> 0

Notice how the b -> 1 entry is redundant. When the tombstones overlap,
the problem is even worse. Consider tombstones [a, c) @ 1 and [b, d) @
1, which produces this map without this patch:

  a -> 1
  b -> 1
  c -> 0
  d -> 0

This map is corrupt, as a map can never contain adjacent sentinel (zero)
entries. When the iterator advances from b to c, it will notice that c
is a sentinel enty and skip to d--but d is also a sentinel entry! Asking
what tombstone this iterator points to will trigger an assertion, as it
is not pointing to a valid tombstone.

/cc ajkr
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4424

Differential Revision: D10039248

Pulled By: abhimadan

fbshipit-source-id: 6d737c1e88d60e80cf27286726627ba44463e7f4
2018-09-25 14:50:31 -07:00
Yi Wu
31d46993cc Update TARGETS file template (#4426)
Summary:
Update template of TARGETS file according to recent changes in #4371 , #4363 and dbf44c314b.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4426

Differential Revision: D10025053

Pulled By: yiwu-arbug

fbshipit-source-id: e6a0a702bfd401fc1af240ee446f5690f0bcd85d
2018-09-25 14:14:01 -07:00
Abhishek Madan
3c350a7cf0 Improve RangeDelAggregator benchmarks (#4395)
Summary:
Improve time measurements for AddTombstones to only include the
call and not the VectorIterator setup. Also add a new
add_tombstones_per_run flag to call AddTombstones multiple times per
aggregator, which will help simulate more realistic workloads.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4395

Differential Revision: D9996811

Pulled By: abhimadan

fbshipit-source-id: 5865a95c323fbd9b3606493013664b4890fe5a02
2018-09-21 16:13:08 -07:00
Yi Wu
04d373b260 BlobDB: handle IO error on read (#4410)
Summary:
Fix IO error on read not being handle and crashing the DB. With the fix we properly return the error.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4410

Differential Revision: D9979246

Pulled By: yiwu-arbug

fbshipit-source-id: 111a85675067a29c03cb60e9a34103f4ff636694
2018-09-20 16:58:45 -07:00
Anand Ananthabhotla
72712f4e28 Allow dynamic modification of window size and deletion trigger (#4403)
Summary:
Make the CompactOnDeletionCollectorFactory class public, and provide
methods to update the window size and deletion trigger params. These
will take effect on subsequent created SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4403

Differential Revision: D9976857

Pulled By: anand1976

fbshipit-source-id: 31dbf0511c12fa2bb9b2a7ba620079e0ee09cf48
2018-09-20 15:15:28 -07:00
Chen, You
02dc074916 add GetAggregatedLongProperty for Java API (#4379)
Summary:
Add Java API `getAggregatedLongProperty(final String property)`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4379

Differential Revision: D9921463

Pulled By: sagar0

fbshipit-source-id: a02512e1b2aff4765a10b77de9a7bf7b1909d954
2018-09-19 17:46:59 -07:00
Abhishek Madan
519f8b145f Generate appropriate number of keys in db_bench (#4404)
Summary:
If range tombstones are generated every few writes, the
KeyGenerator's limit is now extended to account for the additional
Next() calls. This is primarily important for `filluniquerandom`
benchmarks that enforce the call limit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4404

Differential Revision: D9949326

Pulled By: abhimadan

fbshipit-source-id: 0bdfeb2cad2098dc0b8b029236dab5e4bef25e38
2018-09-19 16:28:21 -07:00
Zhongyi Xie
9b3cf908a6 add missing range in random.choice argument (#4397)
Summary:
This will fix the broken asan crash test:
> Traceback (most recent call last):
  File "tools/db_crashtest.py", line 384, in <module>
    main()
  File "tools/db_crashtest.py", line 368, in main
    parser.add_argument("--" + k, type=type(v() if callable(v) else v))
  File "tools/db_crashtest.py", line 59, in <lambda>
    "index_block_restart_interval": lambda: random.choice(1, 16),
TypeError: choice() takes exactly 2 arguments (3 given)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397

Differential Revision: D9933041

Pulled By: miasantreble

fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
2018-09-19 12:13:20 -07:00
Maysam Yabandeh
a0ebec3804 Extend crash test with index_block_restart_interval (#4383)
Summary:
The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383

Differential Revision: D9887304

Pulled By: maysamyabandeh

fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
2018-09-18 15:43:29 -07:00
Fosco Marotto
886766c31d Fix issue with docs/feed.xml validation (#4392)
Summary:
Per #4387 this should address the validation error with the link tag.  This is a quick fix, a future iteration could significantly upgrade the jekyll integration.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4392

Differential Revision: D9923643

Pulled By: gfosco

fbshipit-source-id: e7ed478e55c907add8319290326540e6e44fc0d6
2018-09-18 13:43:32 -07:00
Andrew Kryczka
990b52e95b Unit test for custom comparator RangeDelAggregator (#4388)
Summary:
Add a unit test for range collapsing when non-default comparator is used. This exposes the bug fixed in #4386.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4388

Differential Revision: D9918252

Pulled By: ajkr

fbshipit-source-id: 99501b96b251eab41791a7e33b27055ee36c5c39
2018-09-18 12:13:20 -07:00
jsteemann
27221b0cc2 use specified comparator in CollapsedRangeDelMap (#4386)
Summary:
The Comparator passed to CollapsedRangeDelMap was not used for
operator less of the std::map `rep_` object contained in
CollapsedRangeDelMap. So the map was always sorted using the
default ByteWiseComparator, which seems wrong.

Passing the specified Comparator through for usage in that map
object fixes actual problems we were seeing with RangeDelete operations
that do not delete keys as expected when using a custom Comparator.

I found that the tests in current master crash when I run them locally,
both with and without my patch, at the very same location. I therefore
don't know if the patch breaks something else, but it seems to fix
RangeDeletion issues in our product that uses RocksDB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4386

Differential Revision: D9916506

Pulled By: ajkr

fbshipit-source-id: 27bff8c775831f089dde8c5289df7343d88b2d66
2018-09-18 09:28:30 -07:00
Maysam Yabandeh
65ac72edd9 Fix bug in partition filters with format_version=4 (#4381)
Summary:
Value delta encoding in format_version 4 requires the differences between the size of two consecutive handles to be sent to BlockBuilder::Add. This applies not only to indexes on blocks but also the indexes on indexes and filters in partitioned indexes and filters respectively. The patch fixes a bug where the partitioned filters would encode the entire size of the handle rather than the difference of the size with the last size.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4381

Differential Revision: D9879505

Pulled By: maysamyabandeh

fbshipit-source-id: 27a22e49b482b927fbd5629dc310c46d63d4b6d1
2018-09-17 17:28:15 -07:00
Abhishek Madan
1626f6ab6b Add RangeDelAggregator microbenchmarks (#4363)
Summary:
To measure the results of upcoming DeleteRange v2 work, this commit adds
simple benchmarks for RangeDelAggregator. It measures the average time
for AddTombstones and ShouldDelete calls.

Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results:

Before #4014:
```
=======================
Results:
=======================
AddTombstones:          1356.28 us
ShouldDelete:           0.401732 us
```

Latest master:
```
=======================
Results:
=======================
AddTombstones:          740.82 us
ShouldDelete:           0.383271 us
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363

Differential Revision: D9881676

Pulled By: abhimadan

fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442
2018-09-17 14:58:31 -07:00
Anand Ananthabhotla
30c21df97c Fix regression test failures introduced by PR #4164 (#4375)
Summary:
1. Add override keyword to overridden virtual functions in EventListener
2. Fix a memory corruption that can happen during DB shutdown when in
read-only mode due to a background write error
3. Fix uninitialized buffers in error_handler_test.cc that cause
valgrind to complain
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4375

Differential Revision: D9875779

Pulled By: anand1976

fbshipit-source-id: 022ede1edc01a9f7e21ecf4c61ef7d46545d0640
2018-09-17 13:14:07 -07:00
Andrew Kryczka
8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
Sagar Vemuri
ac46790374 Fix sync-point comment in Block destructor (#4380)
Summary:
This is a follow up to #4370. The earlier comment is not correct.

Thanks to ajkr for pointing this out.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4380

Differential Revision: D9874667

Pulled By: sagar0

fbshipit-source-id: f4e092d86b29c715258210b770643d367e38caae
2018-09-17 11:58:11 -07:00
Anand Ananthabhotla
dfda91027b Remove trace_analyzer_tool.cc from rocksdb_lib buck target (#4371)
Summary:
Including tools/trace_analyzer_tool.cc in rocksdb_lib was causing conflicts in dependent binaries due to duplicate gflag (other_prefix).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4371

Differential Revision: D9846953

Pulled By: anand1976

fbshipit-source-id: 80b4aa36ab8428b8f6dceb896c45532684102709
2018-09-15 19:58:13 -07:00
Anand Ananthabhotla
a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurance of an out of space error during compaction,
subsequent
  compactions will be delayed until the disk free space check indicates
  enough available space. The required space is computed as the sum of
  input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occured
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it in write-only mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00