Summary:
This patch instruments the read path to verify each read value against an optional ReadCallback class. If the value is rejected, the reader moves on to the next value. The WritePreparedTxn makes use of this feature to skip sequence numbers that are not in the read snapshot.
Closes https://github.com/facebook/rocksdb/pull/2850
Differential Revision: D5787375
Pulled By: maysamyabandeh
fbshipit-source-id: 49d808b3062ab35e7ae98ad388f659757794184c
Summary:
This patch advances the max_evicted_seq_ is larger granularities to reduce the overhead of updating the relevant data structures.
It also refactor the related code and adds testing to that. As part of this patch some of the TODOs for removing usage of non-static const members are also addressed.
Closes https://github.com/facebook/rocksdb/pull/2844
Differential Revision: D5772928
Pulled By: maysamyabandeh
fbshipit-source-id: f4fcc2948be69c034f10812cf922ce5ab82ef98c
Summary:
Some of these names, like `MEMTABLE_COMPACTION`, did not mean anything. Tried to give them descriptive names.
Closes https://github.com/facebook/rocksdb/pull/2852
Differential Revision: D5782822
Pulled By: ajkr
fbshipit-source-id: f2695c4124af4073da4492d7135bae2411220f3a
Summary:
if we enable SSE42 globally when compiling the tree for preparing a
portable binary, which could be running on CPU w/o SSE42 instructions
even the GCC on the building host is able to emit SSE42 code, this leads
to illegal instruction errors on machines not supporting SSE42. to solve
this problem, crc32 detects the supported instruction at runtime, and
selects the supported CRC32 implementation according to the result of
`cpuid`. but intrinics like "_mm_crc32_u64()" will not be available
unless the "target" machine is appropriately specified in the command
line, like "-msse42", or using the "target" attribute.
we could pass "-msse42" only when compiling crc32c.cc, and allow the
compiler to generate the SSE42 instructions, but we are still at the
risk of executing illegal instructions on machines does not support
SSE42 if the compiler emits code that is not guarded by our runtime
detection. and we need to do the change in both Makefile and CMakefile.
or, we can use GCC's "target" attribute to enable the machine specific
instructions on certain function. in this way, we have finer grained
control of the used "target". and no need to change the makefiles. so
we don't need to duplicate the changes on both makefile and cmake as
the previous approach.
this problem surfaces when preparing a package for GNU/Linux distribution,
and we only applies to optimization for SSE42, so using a feature
only available on GCC/Clang is not that formidable.
Closes https://github.com/facebook/rocksdb/pull/2807
Differential Revision: D5786084
Pulled By: siying
fbshipit-source-id: bca5c0f877b8d6fb55f58f8f122254a26422843d
Summary:
TransactionCallback was never used. Remove it to avoid confusion.
Closes https://github.com/facebook/rocksdb/pull/2853
Differential Revision: D5787219
Pulled By: maysamyabandeh
fbshipit-source-id: e2b6a89537e3770a269ad38be71c4b0b160a88ac
Summary:
if we're moving any L0 files down, we need to include older L0 files since they may contain older versions of the keys being moved down.
Closes https://github.com/facebook/rocksdb/pull/2845
Differential Revision: D5773800
Pulled By: ajkr
fbshipit-source-id: 9f0770a8eaaeea4c87df2e7a2a1d65bf9d7f4f7e
Summary:
The patch skips write_prepared_transaction_test from travis as they time out there. They are still covered in daily runs of tests.
Closes https://github.com/facebook/rocksdb/pull/2836
Differential Revision: D5767203
Pulled By: maysamyabandeh
fbshipit-source-id: 51045ef98a745197136e14b2ec02fc6f38081b75
Summary:
I spent too much time thinking about histograms lately and realized boundary values fall into the lower bucket, not the upper bucket. It's because we're using `std::map::lower_bound` here: 867fe92e5e/monitoring/histogram.cc (L53). Fixed histogram's `ToString()` to reflect this.
Closes https://github.com/facebook/rocksdb/pull/2817
Differential Revision: D5751159
Pulled By: ajkr
fbshipit-source-id: 67432bb45849eec9b5bcc0d095551dbc0ee81766
Summary:
Add -DPORTABLE=1
port::cacheline_aligned_alloc() has arguments swapped which prevents every single test from running.
Closes https://github.com/facebook/rocksdb/pull/2815
Differential Revision: D5751661
Pulled By: siying
fbshipit-source-id: e0857d6e138ec46035b3c23d7c3c751901a0a4a0
Summary:
useful when debugging to tell whether a DB has stats enabled, and whether a stats object is shared across DBs.
Closes https://github.com/facebook/rocksdb/pull/2813
Differential Revision: D5741755
Pulled By: ajkr
fbshipit-source-id: 9b9d51dee77d14d415cd5da985d8d61b5b3837c3
Summary:
Backup engine is intentionally openable even when some backups are corrupt. Previously the engine could write new backups as long as the most recent backup wasn't corrupt. This PR makes the backup engine able to create new backups even when the most recent one is corrupt.
We now maintain two ID instance variables:
- `latest_backup_id_` is used when creating backup to choose the new ID
- `latest_valid_backup_id_` is used when restoring latest backup since we want most recent valid one
Closes https://github.com/facebook/rocksdb/pull/2804
Differential Revision: D5734148
Pulled By: ajkr
fbshipit-source-id: db440707b31df2c7b084188aa5f6368449e10bcf
Summary:
Update dependencies.sh. Also update tbb to 4.3, which is the latest available in TP2.
Closes https://github.com/facebook/rocksdb/pull/2812
Differential Revision: D5741394
Pulled By: yiwu-arbug
fbshipit-source-id: cafa0b7179f9a44669e5ccace818a02b42336781
Summary:
Remove cassandra tombstone when reaching the max compaction level (full merge). if all columns collected key will be removed in next compaction via compaction filter
Closes https://github.com/facebook/rocksdb/pull/2791
Reviewed By: sagar0
Differential Revision: D5722465
Pulled By: wpc
fbshipit-source-id: 61e9898a5686551653a16383255aeaab3197e65e
Summary:
**Summary**:
Set defaults for high-pri and low-pri thread pools in regression test script.
**Reason for this change**:
With #2680 , high-pri and low-pri thread pools get different numbers than before if `num_high_pri_threads` and `num_low_pri_threads` options are not explicitly passed to db_bench in regression test script ... leading to a false-positive regression.
**Test Plan**:
REMOTE_HOST=udb1671.prn3 TEST_MODE=1 FBSOURCE=~/fbsource ~/fbsource/fbcode/rocks/tools/debug_regression_test.sh viewstate (with very minor changes to the internals).
Observe P50 and P99 which showed up as regressions in our graphs.
Stats with the commit prior to #2680 , ie. 4f81ab3 :
seekrandomwhilewriting : 75.096 micros/op 13316 ops/sec; 168.6 MB/s (7499074 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1197.7254 StdDev: 33.35
Min: 187 Median: 980.5292 Max: 1816424
Percentiles: **P50: 980.53** P75: 1494.57 **P99: 4185.64** P99.9: 7800.11 P99.99: 15039.64
Stats at #2680, ie. at commit dce6d5a (false-positive regression):
seekrandomwhilewriting : 85.330 micros/op 11719 ops/sec; 148.4 MB/s (7499073 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1362.3261 StdDev: 27.86
Min: 185 Median: 1088.1915 Max: 652760
Percentiles: **P50: 1088.19** P75: 1658.12 **P99: 5361.15** P99.9: 7997.95 P99.99: 11730.07
Stats with the current change on top of dce6d5a :
seekrandomwhilewriting : 77.780 micros/op 12856 ops/sec; 162.8 MB/s (7499102 of 7500000 found)
Microseconds per seek:
Count: 120000000 Average: 1226.6744 StdDev: 17.16
Min: 185 Median: 994.2956 Max: 2553530
Percentiles: **P50: 994.30** P75: 1513.68 **P99: 4284.30** P99.9: 9338.64 P99.99: 23008.86
Closes https://github.com/facebook/rocksdb/pull/2801
Differential Revision: D5742338
Pulled By: sagar0
fbshipit-source-id: cc5d727c1a131f2a7070d1bb892efbe929b976ff
Summary:
This branch extends existing property map which keeps values in doubles to keep values in strings so that it can be used to provide wider range of properties. The immediate need for that is to provide IO stall stats in an easy parseable way to MyRocks which is also part of this branch.
Closes https://github.com/facebook/rocksdb/pull/2794
Differential Revision: D5717676
Pulled By: Tema
fbshipit-source-id: e34ba5b79ba774697f7b97ce1138d8fd55471b8a
Summary:
GCC < 5 + ASAN does not instrument aligned_alloc, which can make ASAN
report false-positive with "free on address which was not malloc" error.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61693
Also suppress leak warning with LRUCache::DisownData().
Closes https://github.com/facebook/rocksdb/pull/2783
Differential Revision: D5696465
Pulled By: yiwu-arbug
fbshipit-source-id: 87c607c002511fa089b18cc35e24909bee0e74b4
Summary:
* db/range_del_aggregator.cc (AddTombstone): Avoid a potential
use-after-move bug. The original code would both use and move
`tombstone` in a context where the order of those operations is
not specified. The fix is to perform the use on a new, preceding
statement.
Author: meyering
Closes https://github.com/facebook/rocksdb/pull/2796
Differential Revision: D5721163
Pulled By: ajkr
fbshipit-source-id: a1d328d6a77a17c6425e8069860a202e615e2f48
Summary:
small edit of the language binding file to add the Erlang binding.
Closes https://github.com/facebook/rocksdb/pull/2797
Differential Revision: D5722235
Pulled By: sagar0
fbshipit-source-id: 8ecd74996dad4cac19666783256cfa4d9ce09160
Summary:
Divide the old snapshots to two lists: a few that fit into a cached array and the rest in a vector, which is expected to be empty in normal cases. The former is to optimize concurrent reads from snapshots without requiring locks. It is done by an array of std::atomic, from which std::memory_order_acquire reads are compiled to simple read instructions in most of the x86_64 architectures.
Closes https://github.com/facebook/rocksdb/pull/2758
Differential Revision: D5660504
Pulled By: maysamyabandeh
fbshipit-source-id: 524fcf9a8e7f90a92324536456912a99aaa6740c
Summary:
Fixing flaky blob_db_test.
To close a blob file, blob db used to add a CloseSeqWrite job to the background thread to close it. Changing file close to be synchronous in order to simplify logic, and fix flaky blob_db_test.
Closes https://github.com/facebook/rocksdb/pull/2787
Differential Revision: D5699387
Pulled By: yiwu-arbug
fbshipit-source-id: dd07a945cd435cd3808fce7ee4ea57817409474a
Summary:
Allow user to reduce number of levels in LSM by issue a full CompactRange() and put the result in a lower level, and then reopen DB with reduced options.num_levels. Previous this will fail on reopen on when recovery replaying the previous MANIFEST and found a historical file was on a higher level than the new options.num_levels. The workaround was after CompactRange(), reopen the DB with old num_levels, which will create a new MANIFEST, and then reopen the DB again with new num_levels.
This patch relax the check of levels during recovery. It allows DB to open if there was a historical file on level > options.num_levels, but was also deleted.
Closes https://github.com/facebook/rocksdb/pull/2740
Differential Revision: D5629354
Pulled By: yiwu-arbug
fbshipit-source-id: 545903f6b36b6083e8cbaf777176aef2f488021d
Summary:
It should hold db mutex while accessing max_total_in_memory_state_.
Closes https://github.com/facebook/rocksdb/pull/2784
Differential Revision: D5696536
Pulled By: yiwu-arbug
fbshipit-source-id: 45430634d7fe11909b38e42e5f169f618681c4ee
Summary:
store a zero as the checksum when disabled since it's easier to keep block trailer a fixed length.
Closes https://github.com/facebook/rocksdb/pull/2781
Differential Revision: D5694702
Pulled By: ajkr
fbshipit-source-id: 69cea9da415778ba2b600dfd9d0dfc8cb5188ecd
Summary:
This is the warning that clang considers a bug and has been causing it to fail:
```
table/block_based_table_reader.cc:240:27: warning: Potential leak of memory pointed to by 'block.value'
for (; biter.Valid(); biter.Next()) {
^~~~~
```
Actually clang just doesn't have enough knowledge to statically determine it's safe. We can teach it using an assert.
Closes https://github.com/facebook/rocksdb/pull/2779
Differential Revision: D5691225
Pulled By: ajkr
fbshipit-source-id: 3f0d545bf44636953b30ee5243c63239e8f16d8e
Summary:
One of the core assumptions of DeleteRange is that files containing portions of the same range tombstone are treated as a single unit from the perspective of compaction picker. Need better tests for this. This PR adds the tests for manual compaction.
Closes https://github.com/facebook/rocksdb/pull/2769
Differential Revision: D5676677
Pulled By: ajkr
fbshipit-source-id: 1b4b3382b300ff7048b872911405fdf900e4fbec
Summary:
Solves #2632
Added OptimisticTransactionDB to the C API.
Added missing merge operations to Transaction.
Added missing get_for_update operation to transaction
If required I will create tests for this another day.
Closes https://github.com/facebook/rocksdb/pull/2633
Differential Revision: D5600906
Pulled By: yiwu-arbug
fbshipit-source-id: da23e4484433d8f59d471f778ff2ae210e3fe4eb
Summary:
I made another rust binding. 👻
* Use C++ API (instead of C API)
* Try to follow [Rust Guidelines](https://aturon.github.io/README.html)
* Working in progress (the APIs are not stable yet)
Closes https://github.com/facebook/rocksdb/pull/2438
Differential Revision: D5690612
Pulled By: siying
fbshipit-source-id: 11d3956c33b5e5366555afbf3786b782be3046e7
Summary:
Allow `Slice` holding nullptr as a sentinel value but not in comparisons. This new restriction eliminates the need for the manual checks in 39ef900551, while still conforming to glibc's `memcmp` API. Thanks siying for the idea. Users may need to migrate, so mentioned it in HISTORY.md.
Closes https://github.com/facebook/rocksdb/pull/2777
Differential Revision: D5686016
Pulled By: ajkr
fbshipit-source-id: 03a2ca3fd9a0ebade9d0d5686c81d59a9534f563
Summary:
The ::Get from DB is not augmented with an overload method that takes a PinnableSlice instead of a string. Transactions however are not yet upgraded to use the new API. As a result, transaction users such as MyRocks cannot benefit from it. This patch updates the transactional API with a PinnableSlice overload.
Closes https://github.com/facebook/rocksdb/pull/2736
Differential Revision: D5645770
Pulled By: maysamyabandeh
fbshipit-source-id: f6af520df902f842de1bcf99bed3e8dfc43ad96d