Summary:
With a regression bug was introduced two years ago, by 6e9fbeb27c , we fail to check return status of fsync call. This can cause we miss the information from the file system and can potentially cause corrupted data which we could have been detected.
Closes https://github.com/facebook/rocksdb/pull/2495
Reviewed By: ajkr
Differential Revision: D5321949
Pulled By: siying
fbshipit-source-id: c68117914bb40700198fc37d0e4c63163a8a1031
Summary:
Even if hard limit hits, flushing more memtable may not help cap the memory usage if already more than half data is scheduled for flush. Not triggering flush instead.
Closes https://github.com/facebook/rocksdb/pull/2469
Differential Revision: D5284249
Pulled By: siying
fbshipit-source-id: 8ab7ba1aba56a634dbe72b318fcab2093063972e
Summary:
Added a flag, `ignore_unknown_options`, to skip unknown options when loading an options file (using `LoadLatestOptions`/`LoadOptionsFromFile`) or while verifying options (using `CheckOptionsCompatibility`). This will help in downgrading the db to an older version.
Also added `--ignore_unknown_options` flag to ldb
**Example Use case:**
In MyRocks, if copying from newer version to older version, it is often impossible to start because of new RocksDB options that don't exist in older version, even though data format is compatible.
MyRocks uses these load and verify functions in [ha_rocksdb.cc::check_rocksdb_options_compatibility](e004fd9f41/storage/rocksdb/ha_rocksdb.cc (L3348-L3401)).
**Test Plan:**
Updated the unit tests.
`make check`
ldb:
$ ./ldb --db=/tmp/test_db --create_if_missing put a1 b1
OK
Now edit /tmp/test_db/<OPTIONS-file> and add an unknown option.
Try loading the options now, and it fails:
$ ./ldb --db=/tmp/test_db --try_load_options get a1
Failed: Invalid argument: Unrecognized option DBOptions:: abcd
Passes with the new --ignore_unknown_options flag
$ ./ldb --db=/tmp/test_db --try_load_options --ignore_unknown_options get a1
b1
Closes https://github.com/facebook/rocksdb/pull/2423
Differential Revision: D5212091
Pulled By: sagar0
fbshipit-source-id: 2ec17636feb47dc0351b53a77e5f15ef7cbf2ca7
Summary:
We had a crash in this code: `fstat()` failed; `file_stats` contained garbage, in particular `file_stats.st_blksize == 6`; the expression `file_stats.st_blocks / (file_stats.st_blksize / 512)` divided by zero.
Closes https://github.com/facebook/rocksdb/pull/2420
Differential Revision: D5216110
Pulled By: al13n321
fbshipit-source-id: 6d8fc5e7c4f98c1139e68c7829ebdbac68b0fce0
Summary:
Adding SSTFileWriter's newly introduced put, merge and delete apis to the Java api. The C++ APIs were first introduced in #2361.
Add is deprecated in favor of Put.
Merge is especially needed to support streaming for Cassandra-on-RocksDB work in https://issues.apache.org/jira/browse/CASSANDRA-13476.
Closes https://github.com/facebook/rocksdb/pull/2392
Differential Revision: D5165091
Pulled By: sagar0
fbshipit-source-id: 6f0ad396a7cbd2e27ca63e702584784dd72acaab
Summary:
UBSAN crashes when it run the test. Disabling it for UBSAN.
Closes https://github.com/facebook/rocksdb/pull/2427
Differential Revision: D5210897
Pulled By: yiwu-arbug
fbshipit-source-id: 2f5a876807c98d8db79ab9581965f7e6b29d4163
Summary:
/home/travis/build/facebook/rocksdb/env/mock_env.cc: In member function ‘virtual void rocksdb::{anonymous}::TestMemLogger::Logv(const char*, va_list)’:
/home/travis/build/facebook/rocksdb/env/mock_env.cc:391:53: error: ‘t.tm::tm_year’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
static_cast<int>(now_tv.tv_usec));
Closes https://github.com/facebook/rocksdb/pull/2418
Differential Revision: D5193597
Pulled By: maysamyabandeh
fbshipit-source-id: 8801a3ef27f33eb419d534f7de747702cdf504a0
Summary:
USE_CLANG=1 make -j32 analyze
The two errors would disappear after the assertion.
Closes https://github.com/facebook/rocksdb/pull/2416
Differential Revision: D5193526
Pulled By: maysamyabandeh
fbshipit-source-id: 16a21f18f68023f862764dd3ab9e00ca60b0eefa
Summary:
filter_block_set_ access must also be protected with mutex.
Closes https://github.com/facebook/rocksdb/pull/2413
Differential Revision: D5193159
Pulled By: maysamyabandeh
fbshipit-source-id: 6987fc219d9a65c20b9c7e52151aef4b8e4882e6
Summary:
Attempt to force travis to build with clang on MacOS
Closes https://github.com/facebook/rocksdb/pull/2408
Differential Revision: D5186635
Pulled By: yiwu-arbug
fbshipit-source-id: dbb779eff07b1cb7dbd2092631303cf946316656
Summary:
There are a couple of warnings while building RocksJava, coming from Javadoc generation.
```
Generating target/apidocs/org/rocksdb/RocksDB.html...
src/main/java/org/rocksdb/RocksDB.java:2139: warning: no throws for org.rocksdb.RocksDBException
public void ingestExternalFile(final List<String> filePathList,
^
src/main/java/org/rocksdb/RocksDB.java:2162: warning: no throws for org.rocksdb.RocksDBException
public void ingestExternalFile(final ColumnFamilyHandle columnFamilyHandle,
^
```
Closes https://github.com/facebook/rocksdb/pull/2396
Differential Revision: D5178388
Pulled By: sagar0
fbshipit-source-id: a0ab6696d6de78d089a9a860a559f64cc320019e
Summary:
If ReadOptions.low_pri=true and compaction is behind, the write will either return immediate or be slowed down based on ReadOptions.no_slowdown.
Closes https://github.com/facebook/rocksdb/pull/2369
Differential Revision: D5127619
Pulled By: siying
fbshipit-source-id: d30e1cff515890af0eff32dfb869d2e4c9545eb0
Summary:
clean up the current test dir if the last regression test is still running.
Closes https://github.com/facebook/rocksdb/pull/2401
Differential Revision: D5177882
Pulled By: lightmark
fbshipit-source-id: 91d899fcc2bde841948eae71af8584d4bdb35468
Summary:
… headers
https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB.
We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term.
make check -j64
Closes https://github.com/facebook/rocksdb/pull/2380
Differential Revision: D5177896
Pulled By: lightmark
fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390
Summary:
Fixes the following scenario:
1. Set prefix extractor. Enable bloom filters, with `whole_key_filtering = false`. Use compaction filter that sometimes returns `kRemoveAndSkipUntil`.
2. Do a compaction.
3. Compaction creates an iterator with `total_order_seek = false`, calls `SeekToFirst()` on it, then repeatedly calls `Next()`.
4. At some point compaction filter returns `kRemoveAndSkipUntil`.
5. Compaction calls `Seek(skip_until)` on the iterator. The key that it seeks to happens to have prefix that doesn't match the bloom filter. Since `total_order_seek = false`, iterator becomes invalid, and compaction thinks that it has reached the end. The rest of the compaction input is silently discarded.
The fix is to make compaction iterator use `total_order_seek = true`.
The implementation for PlainTable is quite awkward. I've made `kRemoveAndSkipUntil` officially incompatible with PlainTable. If you try to use them together, compaction will fail, and DB will enter read-only mode (`bg_error_`). That's not a very graceful way to communicate a misconfiguration, but the alternatives don't seem worth the implementation time and complexity. To be able to check in advance that `kRemoveAndSkipUntil` is not going to be used with PlainTable, we'd need to extend the interface of either `CompactionFilter` or `InternalIterator`. It seems unlikely that anyone will ever want to use `kRemoveAndSkipUntil` with PlainTable: PlainTable probably has very few users, and `kRemoveAndSkipUntil` has only one user so far: us (logdevice).
Closes https://github.com/facebook/rocksdb/pull/2349
Differential Revision: D5110388
Pulled By: lightmark
fbshipit-source-id: ec29101a99d9dcd97db33923b87f72bce56cc17a
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350
Differential Revision: D5110648
Pulled By: siying
fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
Summary:
Some users want to monitor column family activity in their custom memtable implementations. Previously there was no way to figure out with which column family a memtable is associated. This diff:
- adds an overload to MemTableRepFactory::CreateMemTableRep() that provides the CF ID. For compatibility, its default implementation calls the old overload.
- updates MemTable to create MemTableRep's using the new overload.
Closes https://github.com/facebook/rocksdb/pull/2346
Differential Revision: D5108061
Pulled By: ajkr
fbshipit-source-id: 3a1921214a348dd8ea0f54e1cab3b71c3d46d616
Summary:
rocksdb::Random is not thread-safe. Have one Random for each thread instead.
Closes https://github.com/facebook/rocksdb/pull/2400
Differential Revision: D5173919
Pulled By: yiwu-arbug
fbshipit-source-id: 1a99c7b877f3893eb22355af49e321bcad4e53e6
Summary:
The range deletion meta-block iterators weren't getting cleaned up properly since they don't support arena allocation. I didn't implement arena support since, in the general case, each iterator is used only once and separately from all other iterators, so there should be no benefit to data locality.
Anyways, this diff fixes up #2370 by treating range deletion iterators as non-arena-allocated.
Closes https://github.com/facebook/rocksdb/pull/2399
Differential Revision: D5171119
Pulled By: ajkr
fbshipit-source-id: bef6f5c4c5905a124f4993945aed4bd86e2807d8
Summary:
also changed the `>` in the comparison against `level0_file_num_compaction_trigger` into a `>=` since exactly `level0_file_num_compaction_trigger` can trigger a compaction from L0.
Closes https://github.com/facebook/rocksdb/pull/2179
Differential Revision: D4915772
Pulled By: ajkr
fbshipit-source-id: e38fec6253de6f9a40e61734615c6670d84038aa
Summary:
This is a manual commit of this PR:
Retire InMemoryEnv in favor of MockEnv #2082
With MockEnv doing the same yet being more mature, InMemoryEnv is redundant.
Reviewed By: IslamAbdelRahman
Differential Revision: D5162323
fbshipit-source-id: 59fd0082a891dc99cc531e4da9d68bf891eae3f5
Summary:
When the `TwoLevelIndexSearch` was introduced, it wasn't added to
the C-API.
Closes https://github.com/facebook/rocksdb/pull/2395
Differential Revision: D5165127
Pulled By: maysamyabandeh
fbshipit-source-id: d077f16ab5646c18158d8202a33b0fd076c6c8ad
Summary:
This fixes a compilation failure on Linux when the system libc is not
glibc. jemalloc's configure script incorrectly assumes that glibc is
always used on Linux systems, producing glibc-style signatures; when
the system libc is e.g. musl, the following error is observed:
```
[ 0%] Building CXX object CMakeFiles/rocksdb.dir/db/db_impl.cc.o
In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/table/block.h:19:0,
from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/db/db_impl.cc:77:
/x-tools/x86_64-unknown-linux-musl/x86_64-unknown-linux-musl/sysroot/usr/include/malloc.h:19:8: error: declaration of 'size_t malloc_usable_size(void*)' has a different exception specifier
size_t malloc_usable_size(void *);
^~~~~~~~~~~~~~~~~~
In file included from /go/src/github.com/cockroachdb/cockroach/c-deps/rocksdb.src/db/db_impl.cc:20:0:
/go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:78:33: note: from previous declaration 'size_t malloc_usable_size(void*) throw ()'
# define je_malloc_usable_size malloc_usable_size
^
/go/native/x86_64-unknown-linux-musl/jemalloc/include/jemalloc/jemalloc.h:239:41: note: in expansion of macro 'je_malloc_usable_size'
JEMALLOC_EXPORT size_t JEMALLOC_NOTHROW je_malloc_usable_size(
^~~~~~~~~~~~~~~~~~~~~
CMakeFiles/rocksdb.dir/build.make:350: recipe for target 'CMakeFiles/rocksdb.dir/db/db_impl.cc.o' failed
```
This works around the issue by rearranging the sources such that
jemalloc's headers are never in the same scope as the system's malloc
header. The jemalloc issue has been reported as well, see:
https://github.com/jemalloc/jemalloc/issues/778.
cc tschottdorf
Closes https://github.com/facebook/rocksdb/pull/2188
Differential Revision: D5163048
Pulled By: siying
fbshipit-source-id: c553125458892def175c1be5682b0330d80b2a0d
Summary:
fix regression test by not reporting stats when building db
Closes https://github.com/facebook/rocksdb/pull/2390
Differential Revision: D5159909
Pulled By: lightmark
fbshipit-source-id: c3f4b9deb9c6799ff84207fd341c529144f8158d
Summary:
Previously we returned NotSupported when ingesting files into a database containing any range deletions. This diff adds the support.
- Flush if any memtable contains range deletions overlapping the to-be-ingested file
- Place to-be-ingested file before any level that contains range deletions overlapping it.
- Added support for `Version` to return iterators over range deletions in a given level. Previously, we piggybacked getting range deletions onto `Version`'s `Get()` / `AddIterator()` functions by passing them a `RangeDelAggregator*`. But file ingestion needs to get iterators over range deletions, not populate an aggregator (since the aggregator does collapsing and doesn't expose the actual ranges).
Closes https://github.com/facebook/rocksdb/pull/2370
Differential Revision: D5127648
Pulled By: ajkr
fbshipit-source-id: 816faeb9708adfa5287962bafdde717db56e3f1a
Summary:
Blob db rely on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Writ()e no longer return sequence number in some cases. Fixing it by have WriteBatchInternal::InsertInto() always encode sequence number into write batch.
Stacking on #2375.
Closes https://github.com/facebook/rocksdb/pull/2385
Differential Revision: D5148358
Pulled By: yiwu-arbug
fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33
Summary:
Add a histogram in statistics to help users understand how many merge operands they merge.
Closes https://github.com/facebook/rocksdb/pull/2373
Differential Revision: D5139983
Pulled By: siying
fbshipit-source-id: 61b9ba8ca83f358530a4833d68f0103b56a0e182
Summary:
Re-enable blob_db_test with some update:
* Commented out delay at the end of GC tests. Will update the logic later with sync point to properly trigger GC.
* Added some helper functions.
Also update make files to include blob_dump tool.
Closes https://github.com/facebook/rocksdb/pull/2375
Differential Revision: D5133793
Pulled By: yiwu-arbug
fbshipit-source-id: 95470b26d0c1f9592ba4b7637e027fdd263f425c
Summary:
This collapses all the "platform dependent" tests into a single travis
builder in an effort to reduce overall CI times. These builds currently
take a combined 21-23 minutes, but each one has to compile the library,
so combining them should yield some time savings (5-10 minutes).
Unfortunately the other builders don't duplicate work, so combining
them is unlikely to provide benefit.
Closes https://github.com/facebook/rocksdb/pull/2306
Differential Revision: D5147850
Pulled By: yiwu-arbug
fbshipit-source-id: d947dc8b9f49639fe22f3c8ab9a82a8d730ddddf
Summary:
Adding my name to the authors list so that I can publish a post to rocksdb blog (rocksdb.org).
Closes https://github.com/facebook/rocksdb/pull/2379
Differential Revision: D5143582
Pulled By: sagar0
fbshipit-source-id: d85163f8b59aaeb07ac2a1cdd776ae335c7062b9
Summary:
Previously sst_file_writer only supports kTypeValue, we need kTypeMerge and kTypeDeletion also as user requested.
Closes https://github.com/facebook/rocksdb/pull/2361
Differential Revision: D5139402
Pulled By: lightmark
fbshipit-source-id: 092a60756d01692539d817a3765ebfd58a8d7f88