Commit Graph

8239 Commits

Eli Pozniansky
6b7fcc0d5f Improve CPU Efficiency of ApproximateSize (part 1) (#5613)
Summary:
1. Avoid creating the iterator in order to call BlockBasedTable::ApproximateOffsetOf(). Instead, directly call into it.
2. Optimize BlockBasedTable::ApproximateOffsetOf() to keep the index block iterator on the stack.
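For context, a minimal sketch of the user-facing call whose CPU cost this change targets; the exact overloads and default flags can differ between RocksDB versions, so treat it as illustrative rather than the precise code path touched by the patch:
```
#include <cstdint>
#include <iostream>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/approx_size_demo", &db);
  if (!s.ok()) return 1;

  // Estimate the on-disk size of a key range. Internally this ends up in
  // BlockBasedTable::ApproximateOffsetOf(), which this change calls directly
  // instead of constructing a full iterator first.
  rocksdb::Range range("a", "z");
  uint64_t size = 0;
  db->GetApproximateSizes(&range, 1, &size);
  std::cout << "approximate size: " << size << " bytes\n";

  delete db;
  return 0;
}
```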
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5613

Differential Revision: D16442660

Pulled By: elipoz

fbshipit-source-id: 9320be3e918c139b10e758cbbb684706d172e516
2019-07-23 15:34:33 -07:00
sdong
3782accf7d ldb sometimes specify a string-append merge operator (#5607)
Summary:
Right now, ldb cannot scan a DB with merge operands using the default settings. There is no harm in giving it a general merge operator so that it can at least print out something.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5607

Test Plan: Run ldb against a DB with merge operands and see the outputs.

Differential Revision: D16442634

fbshipit-source-id: c66c414ec07f219cfc6e6ec2cc14c783ee95df54
2019-07-23 14:25:18 -07:00
anand76
112702ac6c Parallelize file_reader_writer_test in order to reduce timeouts
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5608

Test Plan:
make check
buck test mode/dev-tsan internal_repo_rocksdb/repo:file_reader_writer_test -- --run-disabled

Differential Revision: D16441796

Pulled By: anand1976

fbshipit-source-id: afbb88a9fcb1c0ba22215118767e8eab3d1d6a4a
2019-07-23 11:50:10 -07:00
Manuel Ung
eae832740b WriteUnPrepared: improve read your own write functionality (#5573)
Summary:
There are a number of fixes in this PR (with most bugs found via the added stress tests):
1. Re-enable reseek optimization. This was initially disabled to avoid infinite loops in https://github.com/facebook/rocksdb/pull/3955 but this can be resolved by remembering not to reseek after a reseek has already been done. This problem only affects forward iteration in `DBIter::FindNextUserEntryInternal`, as we already disable reseeking in `DBIter::FindValueForCurrentKeyUsingSeek`.
2. Verify that ReadOption.snapshot can be safely used for iterator creation. Some snapshots would not give correct results because snapshot validation would not be enforced, breaking some assumptions in Prev() iteration.
3. In the non-snapshot Get() case, reads done at `LastPublishedSequence` may not be enough, because unprepared sequence numbers are not published. Use `std::max(published_seq, max_visible_seq)` to do lookups instead (see the sketch after this list).
4. Add stress test to test reading own writes.
5. Fix a minor bug in the allow_concurrent_memtable_write case where we forgot to pass in batch_per_txn_.
6. Minor performance optimization in `CalcMaxUnpreparedSequenceNumber` by assigning by reference instead of value.
7. Add some more comments everywhere.
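To make point 3 concrete, here is a small standalone sketch (a hypothetical helper, not the actual WriteUnprepared code) of why the read sequence for a non-snapshot Get() is taken as the max of the two values:
```
#include <algorithm>
#include <cstdint>

using SequenceNumber = uint64_t;

// Unprepared writes of the reading transaction are not "published", so
// LastPublishedSequence alone may hide the transaction's own writes. Taking
// the max with the transaction's max visible sequence lets Get() see them.
SequenceNumber ReadSequenceForOwnWrites(SequenceNumber last_published_seq,
                                        SequenceNumber max_visible_seq) {
  return std::max(last_published_seq, max_visible_seq);
}
```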
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5573

Differential Revision: D16276089

Pulled By: lth

fbshipit-source-id: 18029c944eb427a90a87dee76ac1b23f37ec1ccb
2019-07-23 08:08:19 -07:00
Maysam Yabandeh
327c4807a7 Disable refresh snapshot feature by default (#5606)
Summary:
There are concerns about the correctness of this patch. Disabling by default until the concerns are resolved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5606

Differential Revision: D16428064

Pulled By: maysamyabandeh

fbshipit-source-id: a89280f0ea85796c9c9dfbfd9a8e91dad9b000b3
2019-07-22 20:05:00 -07:00
sdong
66b5613d0c row_cache to share entry for recent snapshots (#5600)
Summary:
Right now, users cannot take advantage of the row cache unless no snapshot is used, or Get() is repeated for the same snapshots. This limits the usage of the row cache.
This change eliminates the restriction in some cases: if the snapshot used is newer than the largest sequence number in the file, and no write callback function is registered, the same row cache key is used as if no snapshot were given. We still need the callback-function restriction for now because the callback may filter out different keys for different snapshots even if the snapshots are new.
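A hedged sketch of the reuse condition described above (hypothetical helper, not the actual RocksDB function names):
```
#include <cstdint>

using SequenceNumber = uint64_t;

// A snapshot read may share the "no snapshot" row cache entry when the
// snapshot is at least as new as everything in the file and no write callback
// could make visibility differ between snapshots.
bool CanReuseNoSnapshotRowCacheEntry(SequenceNumber snapshot_seq,
                                     SequenceNumber file_largest_seq,
                                     bool has_write_callback) {
  return !has_write_callback && snapshot_seq >= file_largest_seq;
}
```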
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5600

Test Plan: Add a unit test.

Differential Revision: D16386616

fbshipit-source-id: 6b7d214bd215d191b03ccf55926ad4b703ec2e53
2019-07-22 18:56:19 -07:00
haoyuhuang
3778470061 Block cache analyzer: Compute correlation of features and human readable trace file. (#5596)
Summary:
- Compute correlation between a few features and predictions, e.g., number of accesses since the last access vs number of accesses till the next access on a block.
- Output human readable trace file so python can consume it.
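As an illustration of the correlation computation mentioned above, a standalone Pearson-correlation sketch over two feature columns (the analyzer's actual implementation may differ):
```
#include <cmath>
#include <vector>

// Pearson correlation between two equally sized feature columns, e.g.
// "accesses since last access" vs. "accesses till next access" per block.
double PearsonCorrelation(const std::vector<double>& x,
                          const std::vector<double>& y) {
  const size_t n = x.size();
  if (n == 0 || y.size() != n) return 0.0;
  double sum_x = 0, sum_y = 0, sum_xx = 0, sum_yy = 0, sum_xy = 0;
  for (size_t i = 0; i < n; ++i) {
    sum_x += x[i];
    sum_y += y[i];
    sum_xx += x[i] * x[i];
    sum_yy += y[i] * y[i];
    sum_xy += x[i] * y[i];
  }
  const double cov = sum_xy - sum_x * sum_y / n;
  const double var_x = sum_xx - sum_x * sum_x / n;
  const double var_y = sum_yy - sum_y * sum_y / n;
  const double denom = std::sqrt(var_x * var_y);
  return denom == 0.0 ? 0.0 : cov / denom;
}
```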
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5596

Test Plan: make clean && USE_CLANG=1 make check -j32

Differential Revision: D16373200

Pulled By: HaoyuHuang

fbshipit-source-id: c848d26bc2e9210461f317d7dbee42d55be5a0cc
2019-07-22 17:51:34 -07:00
Yanqin Jin
a78503bd6c Temporarily disable snapshot list refresh for atomic flush stress test (#5581)
Summary:
The atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix, after
which the same error occurred much less frequently. However, it still occurs
occasionally, and we are not sure what the root cause is. This PR disables the
snapshot list refresh feature, and we should keep an eye on the failure in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581

Differential Revision: D16295985

Pulled By: riversand963

fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4
2019-07-22 14:38:16 -07:00
Eli Pozniansky
0be1feec21 Added .watchmanconfig file to rocksdb repo (#5593)
Summary:
Added a .watchmanconfig file to the rocksdb repo; it is currently .gitignored.
This allows watchman to auto-sync modified files when editing them remotely.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5593

Differential Revision: D16363860

Pulled By: elipoz

fbshipit-source-id: 5ae221e21c6c757ceb08877771550d508f773d55
2019-07-19 15:00:33 -07:00
anand76
4f7ba3aaed Fix tsan and valgrind failures in import_column_family_test
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5598

Test Plan:
tsan_check
valgrind_test

Differential Revision: D16380167

Pulled By: anand1976

fbshipit-source-id: 2d0caea7d2d02a9606457f62811175d762b89d5c
2019-07-19 13:25:36 -07:00
Eli Pozniansky
c129c75fb7 Added log_readahead_size option to control prefetching for Log::Reader (#5592)
Summary:
Added log_readahead_size option to control prefetching for Log::Reader.
This is mostly useful for reading a remotely located log, as it can reduce the number of round-trips when reading it.
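A minimal sketch of how the new option would be set, assuming it is exposed on DBOptions as `log_readahead_size` (the value here is arbitrary):
```
#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Assumed option from this change: prefetch log reads in 2 MB chunks,
  // which reduces round-trips when the log lives on remote storage.
  options.log_readahead_size = 2 * 1024 * 1024;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/log_readahead_demo", &db);
  if (s.ok()) delete db;
  return s.ok() ? 0 : 1;
}
```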
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5592

Differential Revision: D16362989

Pulled By: elipoz

fbshipit-source-id: c5d4d5245a44008cd59879640efff70c091ad3e8
2019-07-19 12:00:19 -07:00
sdong
6bb3b4b567 ldb idump to support non-default column families. (#5594)
Summary:
ldb idump currently only works for the default column family. Extend it to support other column families.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5594

Test Plan: Compile and run the tool against a multiple CF DB.

Differential Revision: D16380684

fbshipit-source-id: bfb8af36fdad1806837c90aaaab492d71528aceb
2019-07-19 11:36:59 -07:00
anand76
abd1fdddef Fix asan_check failures
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5589

Test Plan: TEST_TMPDIR=/dev/shm/rocksdb COMPILE_WITH_ASAN=1 OPT=-g make J=64 -j64 asan_check

Differential Revision: D16361081

Pulled By: anand1976

fbshipit-source-id: 09474832b9cfb318a840d4b633e22dfad105d58c
2019-07-18 14:51:25 -07:00
Venki Pallipadi
3a6e83b56b HISTORY update for export and import column family APIs
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5587

Differential Revision: D16359919

fbshipit-source-id: cfd9c448d79a8b8e7ac1d2b661d10151df269dba
2019-07-18 10:16:38 -07:00
anand76
ec2b996b29 Fix LITE mode build failure
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5588

Test Plan: make LITE=1 all check

Differential Revision: D16354543

Pulled By: anand1976

fbshipit-source-id: 327a171439e183ac3a5e5057c511d6bca445e97d
2019-07-17 22:06:12 -07:00
Eli Pozniansky
9f5cfb8e71 Fix for ReadaheadSequentialFile crash in ldb_cmd_test (#5586)
Summary:
Fixing a corner-case crash where no data was read from the file but the status was still OK.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5586

Differential Revision: D16348117

Pulled By: elipoz

fbshipit-source-id: f97973308024f020d8be79ca3c56466b84d80656
2019-07-17 17:04:39 -07:00
haoyuhuang
8a008d4170 Block access tracing: Trace referenced key for Get on non-data blocks. (#5548)
Summary:
This PR traces the referenced key for Get for all types of blocks. This is useful when evaluating hybrid row-block caches.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5548

Test Plan: make clean && USE_CLANG=1 make check -j32

Differential Revision: D16157979

Pulled By: HaoyuHuang

fbshipit-source-id: f6327411c9deb74e35e22a35f66cdbae09ab9d87
2019-07-17 13:05:58 -07:00
Venki Pallipadi
22ce462450 Export Import sst files (#5495)
Summary:
Refresh of the earlier change here - https://github.com/facebook/rocksdb/issues/5135

This is a review request for code change needed for - https://github.com/facebook/rocksdb/issues/3469
"Add support for taking snapshot of a column family and creating column family from a given CF snapshot"

We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality.

(1) ExportColumnFamily() - This API is modelled after CreateCheckpoint() as below.
// Exports all live SST files of a specified Column Family onto export_dir,
// returning SST files information in metadata.
// - SST files will be created as hard links when the directory specified
//   is in the same partition as the db directory, copied otherwise.
// - export_dir should not already exist and will be created by this API.
// - Always triggers a flush.
virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                  const std::string& export_dir,
                                  ExportImportFilesMetaData** metadata);

Internally, the API will call DisableFileDeletions() and GetColumnFamilyMetaData(), parse through
the metadata, create links/copies of all the SST files, call EnableFileDeletions(), and complete the call by
returning the list of file metadata.

(2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during a CF creation as below.
// CreateColumnFamilyWithImport() will create a new column family with
// column_family_name and import external SST files specified in metadata into
// this column family.
// (1) External SST files can be created using SstFileWriter.
// (2) External SST files can be exported from a particular column family in
//     an existing DB.
// Option in import_options specifies whether the external files are copied or
// moved (default is copy). When option specifies copy, managing files at
// external_file_path is caller's responsibility. When option specifies a
// move, the call ensures that the specified files at external_file_path are
// deleted on successful return and files are not modified on any error
// return.
// On error return, column family handle returned will be nullptr.
// ColumnFamily will be present on successful return and will not be present
// on error return. ColumnFamily may be present on any crash during this call.
virtual Status CreateColumnFamilyWithImport(
    const ColumnFamilyOptions& options, const std::string& column_family_name,
    const ImportColumnFamilyOptions& import_options,
    const ExportImportFilesMetaData& metadata,
    ColumnFamilyHandle** handle);

Internally, this API creates a new CF, parses all the SST files and adds them to the specified column family, at the same level and with the same sequence number as in the metadata. It also performs safety checks for overlaps between the SST files being imported.

If the incoming sequence number is higher than the current local sequence number, the local sequence
number is updated to reflect this.

Note: as the SST files are being moved across column families, the column family name in the SST file
will no longer match the actual column family in the destination DB. The API does not modify the column
family name or id in the SST files being imported.
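A hedged end-to-end usage sketch of the two APIs, assuming ExportColumnFamily() is exposed through the Checkpoint utility (it is modelled after CreateCheckpoint()) and CreateColumnFamilyWithImport() through DB; the paths and column family names are made up:
```
#include "rocksdb/db.h"
#include "rocksdb/utilities/checkpoint.h"

rocksdb::Status ExportAndImport(rocksdb::DB* src_db,
                                rocksdb::ColumnFamilyHandle* src_cf,
                                rocksdb::DB* dst_db) {
  // Export all live SST files of src_cf into a fresh directory.
  rocksdb::Checkpoint* checkpoint = nullptr;
  rocksdb::Status s = rocksdb::Checkpoint::Create(src_db, &checkpoint);
  if (!s.ok()) return s;

  rocksdb::ExportImportFilesMetaData* metadata = nullptr;
  s = checkpoint->ExportColumnFamily(src_cf, "/tmp/cf_export", &metadata);
  delete checkpoint;
  if (!s.ok()) return s;

  // Re-create the column family on the destination DB from the exported files.
  rocksdb::ColumnFamilyHandle* imported_cf = nullptr;
  s = dst_db->CreateColumnFamilyWithImport(
      rocksdb::ColumnFamilyOptions(), "imported_cf",
      rocksdb::ImportColumnFamilyOptions(), *metadata, &imported_cf);

  delete metadata;
  if (s.ok()) {
    // ... use imported_cf, then release the handle when done.
    s = dst_db->DestroyColumnFamilyHandle(imported_cf);
  }
  return s;
}
```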
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5495

Differential Revision: D16018881

fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
2019-07-17 12:27:14 -07:00
Yuqi Gu
a3c1832e86 Arm64 CRC32 parallel computation optimization for RocksDB (#5494)
Summary:
Crc32c parallel computation optimization. The algorithm comes from the Intel whitepaper: [crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf)
- Input data is divided into three equal-sized blocks.
- Three parallel CRC streams (crc0, crc1, crc2) cover each 1024-byte chunk.
- One block: 42 (BLK_LENGTH) * 8 (step length: crc32c_u64) bytes.
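A hedged sketch of the interleaving idea for the 3 * 42 * 8 = 1008-byte body of each chunk, assuming an AArch64 compiler with the CRC extension (e.g. -march=armv8-a+crc); the final merge of the three partial CRCs into one value, done with polynomial-multiply constants in the real patch, is omitted here:
```
#include <arm_acle.h>
#include <cstdint>
#include <cstring>

// Runs three independent crc32c dependency chains over three equal-sized
// blocks so the CPU can pipeline the crc32c instructions instead of waiting
// for each result before starting the next.
void Crc32cThreeWay(const uint8_t* data, uint32_t* crc0, uint32_t* crc1,
                    uint32_t* crc2) {
  constexpr int kBlkLength = 42;  // 42 * 8 = 336 bytes per block
  const uint8_t* p0 = data;
  const uint8_t* p1 = data + kBlkLength * 8;
  const uint8_t* p2 = data + 2 * kBlkLength * 8;
  for (int i = 0; i < kBlkLength; ++i) {
    uint64_t d0, d1, d2;
    std::memcpy(&d0, p0 + i * 8, sizeof(d0));
    std::memcpy(&d1, p1 + i * 8, sizeof(d1));
    std::memcpy(&d2, p2 + i * 8, sizeof(d2));
    *crc0 = __crc32cd(*crc0, d0);
    *crc1 = __crc32cd(*crc1, d1);
    *crc2 = __crc32cd(*crc2, d2);
  }
}
```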

1. crc32c_test:
```
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CRC
[ RUN      ] CRC.StandardResults
[       OK ] CRC.StandardResults (1 ms)
[ RUN      ] CRC.Values
[       OK ] CRC.Values (0 ms)
[ RUN      ] CRC.Extend
[       OK ] CRC.Extend (0 ms)
[ RUN      ] CRC.Mask
[       OK ] CRC.Mask (0 ms)
[----------] 4 tests from CRC (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (1 ms total)
[  PASSED  ] 4 tests.
```

2. RocksDB benchmark: db_bench --benchmarks="crc32c"

```
Linear Arm crc32c:
  crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op)
```

```
Parallel optimization with Armv8 crypto extension:
  crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op)
```

It gets ~2.4x speedup compared to linear Arm crc32c instructions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494

Differential Revision: D16340806

fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945
2019-07-17 11:22:38 -07:00
Eli Pozniansky
74fb7f0ba5 Cleaned up and simplified LRU cache implementation (#5579)
Summary:
The 'refs' field in LRUHandle now counts only external references, since we already have the IN_CACHE flag. This simplifies the reference accounting logic a bit. Also cleaned up a few asserts and comments to make them more readable.
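An illustrative sketch (not the actual LRUHandle) of the simplified accounting: external references are counted in `refs`, cache residency is a separate flag, and an entry can be freed exactly when it is neither referenced nor in the cache:
```
#include <cstdint>

// Simplified stand-in for the handle's reference accounting.
struct HandleSketch {
  uint32_t refs = 0;      // external references only
  bool in_cache = false;  // equivalent of the IN_CACHE flag

  bool HasRefs() const { return refs > 0; }

  void Ref() { ++refs; }

  // Returns true when the entry can be freed: the last external reference is
  // gone and the cache no longer holds it either.
  bool Unref() {
    --refs;
    return refs == 0 && !in_cache;
  }
};
```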
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5579

Differential Revision: D16286747

Pulled By: elipoz

fbshipit-source-id: 7186d88f80f512ce584d0a303437494b5cbefd7f
2019-07-16 19:17:45 -07:00
Eli Pozniansky
0f4d90e6e4 Added support for sequential read-ahead file (#5580)
Summary:
Added support for a sequential read-ahead file that can prefetch data and later serve it from an internal cache buffer.
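A standalone sketch of the buffering idea (not the actual RocksDB class): reads are served from an internal buffer, and the buffer is refilled with a full readahead-sized chunk whenever it runs dry; `read_src` is a hypothetical callback standing in for the underlying sequential file:
```
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

class ReadaheadReader {
 public:
  // `read_src(buf, n)` reads up to n bytes sequentially from the underlying
  // source and returns the number of bytes actually read (0 at EOF).
  ReadaheadReader(std::function<size_t(char*, size_t)> read_src,
                  size_t readahead_size)
      : read_src_(std::move(read_src)), buf_(readahead_size) {}

  size_t Read(char* out, size_t n) {
    size_t copied = 0;
    while (copied < n) {
      if (pos_ == len_) {  // buffer exhausted: prefetch the next chunk
        len_ = read_src_(buf_.data(), buf_.size());
        pos_ = 0;
        if (len_ == 0) break;  // EOF
      }
      const size_t take = std::min(n - copied, len_ - pos_);
      std::memcpy(out + copied, buf_.data() + pos_, take);
      pos_ += take;
      copied += take;
    }
    return copied;
  }

 private:
  std::function<size_t(char*, size_t)> read_src_;
  std::vector<char> buf_;
  size_t pos_ = 0;
  size_t len_ = 0;
};
```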
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5580

Differential Revision: D16287082

Pulled By: elipoz

fbshipit-source-id: a3e7ad9643d377d39352ff63058ce050ec31dcf3
2019-07-16 18:21:18 -07:00
sdong
699a569c52 Remove RandomAccessFileReader.for_compaction_ (#5572)
Summary:
RandomAccessFileReader.for_compaction_ doesn't seem to be used anymore. Remove it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5572

Test Plan: USE_CLANG=1 make all check -j

Differential Revision: D16286178

fbshipit-source-id: aa338049761033dfbe5e8b1707bbb0be2df5be7e
2019-07-16 16:32:18 -07:00
Manuel Ung
0acaa1a846 WriteUnPrepared: use tracked_keys_ to track keys needed for rollback (#5562)
Summary:
Currently, we are tracking keys we need to rollback via a separate structure specific to WriteUnprepared in write_set_keys_.

We already have a data structure called tracked_keys_ used to track which keys to unlock on transaction termination. This is exactly what we want, since we should only rollback keys that we have locked anyway.

Save some memory by reusing that data structure instead of making our own.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5562

Differential Revision: D16206484

Pulled By: lth

fbshipit-source-id: 5894d2b824a4b19062d84adbd6e6e86f00047488
2019-07-16 15:24:56 -07:00
Levi Tamasi
3bde41b5a3 Move the filter readers out of the block cache (#5504)
Summary:
Currently, when the block cache is used for the filter block, it is not
really the block itself that is stored in the cache but a FilterBlockReader
object. Since this object is not pure data (it has, for instance, pointers that
might dangle, including in one case a back pointer to the TableReader), it's not
really sharable. To avoid the issues around this, the current code erases the
cache entries when the TableReader is closed (which, BTW, is not sufficient
since a concurrent TableReader might have picked up the object in the meantime).
Instead of doing this, the patch moves the FilterBlockReader out of the cache
altogether, and decouples the filter reader object from the filter block.
In particular, instead of the TableReader owning, or caching/pinning the
FilterBlockReader (based on the customer's settings), with the change the
TableReader unconditionally owns the FilterBlockReader, which in turn
owns/caches/pins the filter block. This change also enables us to reuse the code
paths historically used for data blocks for filters as well.

Note:
Eviction statistics for filter blocks are temporarily broken. We plan to fix this in a
separate phase.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5504

Test Plan: make asan_check

Differential Revision: D16036974

Pulled By: ltamasi

fbshipit-source-id: 770f543c5fb4ed126fd1e04bfd3809cf4ff9c091
2019-07-16 13:14:58 -07:00
Jim Lin
cd2520361d Fix memory leak in rocksdb_wal_iter_get_batch function (#5515)
Summary:
`wal_batch.writeBatchPtr.release()` gives up ownership of the original `WriteBatch`, but there is no new owner, which causes a memory leak.

The patch is simple: removing `release()` prevents the ownership change, and `std::move` avoids an unnecessary copy.
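A small standalone illustration of the difference (the type is a stand-in, not the actual rocksdb_wal_iter_get_batch code):
```
#include <memory>
#include <utility>

struct Batch {};  // stand-in for rocksdb::WriteBatch

// Buggy pattern: release() detaches the pointer; unless some new owner
// eventually deletes it, the object leaks.
Batch* Leaky(std::unique_ptr<Batch>& src) { return src.release(); }

// Fixed pattern: std::move transfers ownership to the destination unique_ptr,
// which will delete the object, and avoids any copy.
void TransferOwnership(std::unique_ptr<Batch>& src,
                       std::unique_ptr<Batch>& dst) {
  dst = std::move(src);
}
```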
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5515

Differential Revision: D16264281

Pulled By: riversand963

fbshipit-source-id: 51c556b7a1c977325c3aa24acb636303847151fa
2019-07-15 12:59:39 -07:00
Tomas Kolda
6e8a1354a7 Fix regression - 100% CPU - Regression for Windows 7 (#5557)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/5552
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5557

Differential Revision: D16266329

fbshipit-source-id: a8f6b50298a6f7c8d6c7e172bb26dd7eb6bd8a4d
2019-07-15 12:19:49 -07:00
Zhongyi Xie
b0259e45e0 add more tracing for stats history (#5566)
Summary:
Sample info log output from db_bench:
In-memory:
```
2019/07/12-21:39:19.478490 7fa01b3f5700 [_impl/db_impl.cc:702] ------- PERSISTING STATS -------
2019/07/12-21:39:19.478633 7fa01b3f5700 [_impl/db_impl.cc:753] Storing 145 stats with timestamp 1562992759 to in-memory stats history
2019/07/12-21:39:19.478670 7fa01b3f5700 [_impl/db_impl.cc:766] [Pre-GC] In-memory stats history size: 1051218 bytes, slice count: 103
2019/07/12-21:39:19.478704 7fa01b3f5700 [_impl/db_impl.cc:775] [Post-GC] In-memory stats history size: 1051218 bytes, slice count: 102
```
On-disk:
```
2019/07/12-21:48:53.862548 7f24943f5700 [_impl/db_impl.cc:702] ------- PERSISTING STATS -------
2019/07/12-21:48:53.862553 7f24943f5700 [_impl/db_impl.cc:709] Reading 145 stats from statistics
2019/07/12-21:48:53.862852 7f24943f5700 [_impl/db_impl.cc:737] Writing 145 stats with timestamp 1562993333 to persistent stats CF succeeded
```
```
2019/07/12-21:48:51.861711 7f24943f5700 [_impl/db_impl.cc:702] ------- PERSISTING STATS -------
2019/07/12-21:48:51.861729 7f24943f5700 [_impl/db_impl.cc:709] Reading 145 stats from statistics
2019/07/12-21:48:51.861921 7f24943f5700 [_impl/db_impl.cc:732] Writing to persistent stats CF failed -- Result incomplete: Write stall
...
2019/07/12-21:48:51.873032 7f2494bf6700 [WARN] [lumn_family.cc:749] [default] Stopping writes because we have 2 immutable memtables (waiting for flush), max_write_buffer_number is set to 2
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5566

Differential Revision: D16258187

Pulled By: miasantreble

fbshipit-source-id: 292497099b941418590ed4312411bee36e244dc5
2019-07-15 11:49:17 -07:00
Yikun Jiang
f064d74e45 Cleanup the Arm64 CRC32 unused warning (#5565)
Summary:
When 'HAVE_ARM64_CRC' is set, the below methods:

- bool rocksdb::crc32c::isSSE42()
- bool rocksdb::crc32c::isPCLMULQDQ()

are defined but not used, so an unused-function warning is raised
when building rocksdb.

This patch cleans up these warnings by adding an ifndef: when building
with HAVE_ARM64_CRC, we do not define `isSSE42` and `isPCLMULQDQ`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5565

Differential Revision: D16233654

fbshipit-source-id: c32a9dda7465dbf65f9ccafef159124db92cdffd
2019-07-15 11:20:26 -07:00
haoyuhuang
68d43b4d30 A python script to plot graphs for csv files generated by block_cache_trace_analyzer
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5563

Test Plan: Manually run the script on files generated by block_cache_trace_analyzer.

Differential Revision: D16214400

Pulled By: HaoyuHuang

fbshipit-source-id: 94485eed995e9b2b63e197c5dfeb80129fa7897f
2019-07-12 18:56:20 -07:00
Sergei Petrunia
61876614dc Fix MyRocks compile warnings-treated-as-errors on Fedora 30, gcc 9.1.1 (#5553)
Summary:
- Provide assignment operator in CompactionStats
- Provide a copy constructor for FileDescriptor
- Remove std::move from "return std::move(t)" in BoundedQueue
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5553

Differential Revision: D16230170

fbshipit-source-id: fd7c6e52390b2db1be24141e25649cf62424d078
2019-07-12 17:30:51 -07:00
haoyuhuang
3e9c5a3523 Block cache analyzer: Add more stats (#5516)
Summary:
This PR provides more command line options for block cache analyzer to better understand block cache access pattern.
-analyze_bottom_k_access_count_blocks
-analyze_top_k_access_count_blocks
-reuse_lifetime_labels
-reuse_lifetime_buckets
-analyze_callers
-access_count_buckets
-analyze_blocks_reuse_k_reuse_window
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5516

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16037440

Pulled By: HaoyuHuang

fbshipit-source-id: b9a4ac0d4712053fab910732077a4d4b91400bc8
2019-07-12 16:55:34 -07:00
haoyuhuang
1a59b6e2a9 Cache simulator: Add a ghost cache for admission control and a hybrid row-block cache. (#5534)
Summary:
This PR adds a ghost cache for admission control. Specifically, it admits an entry on its second access.
It also adds a hybrid row-block cache that caches the referenced key-value pairs of a Get/MultiGet request instead of its blocks.
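A minimal sketch of the ghost-cache admission rule described above (illustrative only; the simulator's real ghost cache also bounds its own size):
```
#include <string>
#include <unordered_set>

// Admission control via a "ghost" cache: remember keys that have been seen
// once, and only admit a key into the real cache on its second access.
class GhostCache {
 public:
  // Returns true if `key` should be admitted (i.e. it was seen before).
  bool AdmitOnSecondAccess(const std::string& key) {
    if (seen_once_.count(key) > 0) {
      return true;
    }
    seen_once_.insert(key);  // first access: record it, do not admit yet
    return false;
  }

 private:
  std::unordered_set<std::string> seen_once_;
};
```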
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5534

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16101124

Pulled By: HaoyuHuang

fbshipit-source-id: b99edda6418a888e94eb40f71ece45d375e234b1
2019-07-11 12:43:29 -07:00
Yanqin Jin
82d8ca8ade Upload db directory during cleanup for certain tests (#5554)
Summary:
Add an extra cleanup step so that db directory can be saved and uploaded.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5554

Reviewed By: yancouto

Differential Revision: D16168844

Pulled By: riversand963

fbshipit-source-id: ec7b2cee5f11c7d388c36531f8b076d648e2fb19
2019-07-10 11:29:55 -07:00
ggaurav28
60d8b19836 Implemented a file logger that uses WritableFileWriter (#5491)
Summary:
The current PosixLogger performs IO operations using posix calls, so it will
not work for a non-posix Env. Created a new logger class, EnvLogger, that
uses the Env-specific WritableFileWriter for IO operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5491

Test Plan: make check

Differential Revision: D15909002

Pulled By: ggaurav28

fbshipit-source-id: 13a8105176e8e42db0c59798d48cb6a0dbccc965
2019-07-09 16:27:22 -07:00
Yanqin Jin
f786b4a5b4 Improve result print on atomic flush stress test failure (#5549)
Summary:
When atomic flush stress test fails, we print internal keys within the range with mismatched key/values for all column families.

Test plan (on devserver)
Manually hack the code to randomly insert wrong data. Run the test.
```
$make clean && COMPILE_WITH_TSAN=1 make -j32 db_stress
$./db_stress -test_atomic_flush=true -ops_per_thread=10000
```
Check that proper error messages are printed, as follows:
```
2019/07/08-17:40:14  Starting verification
Verification failed
Latest Sequence Number: 190903
[default] 000000000000050B => 56290000525350515E5F5C5D5A5B5859
[3] 0000000000000533 => EE100000EAEBE8E9E6E7E4E5E2E3E0E1FEFFFCFDFAFBF8F9
Internal keys in CF 'default', [000000000000050B, 0000000000000533] (max 8)
  key 000000000000050B seq 139920 type 1
  key 0000000000000533 seq 0 type 1
Internal keys in CF '3', [000000000000050B, 0000000000000533] (max 8)
  key 0000000000000533 seq 0 type 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5549

Differential Revision: D16158709

Pulled By: riversand963

fbshipit-source-id: f07fa87763f87b3bd908da03c956709c6456bcab
2019-07-09 16:27:22 -07:00
sdong
aa0367aabb Allow ldb to open DB as secondary (#5537)
Summary:
Right now ldb can open a running DB as a read-only DB. However, it might leave info log files in the read-only DB directory. Add an option to open the DB as a secondary instance to avoid this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5537

Test Plan:
Run
./ldb scan  --max_keys=10 --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp --no_value --hex
and
./ldb get 0x00000000000000103030303030303030 --hex --db=/tmp/rocksdbtest-2491/dbbench --secondary_path=/tmp
against a normal db_bench run and observe the output changes. Also observe that no new info log files are created under /tmp/rocksdbtest-2491/dbbench.
Run without --secondary_path and observe that new info logs are created under /tmp/rocksdbtest-2491/dbbench.

Differential Revision: D16113886

fbshipit-source-id: 4e09dec47c2528f6ca08a9e7a7894ba2d9daebbb
2019-07-09 12:51:28 -07:00
sdong
cb19e7411f Fix bugs in DBWALTest.kTolerateCorruptedTailRecords triggered by #5520 (#5550)
Summary:
https://github.com/facebook/rocksdb/pull/5520 caused a buffer overflow bug in DBWALTest.kTolerateCorruptedTailRecords. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5550

Test Plan: Run the test in UBSAN. It used to fail; now it succeeds.

Differential Revision: D16165516

fbshipit-source-id: 42c56a6bc64eb091f054b87757fcbef60da825f7
2019-07-09 11:18:32 -07:00
Tim Hatch
a6a9213a36 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:37 -07:00
sdong
872a261ffc db_stress to print some internal keys after verification failure (#5543)
Summary:
Print out some more information when db_stress fails with verification failures, to help debug problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5543

Test Plan:
Manually ingest some failures and observe the outputs are like this:

Verification failed
[default] 0000000000199A5A => 7C3D000078797A7B74757677707172736C6D6E6F68696A6B
[6] 000000000019C8BD => 65380000616063626D6C6F6E69686B6A
internal keys in default CF [0000000000199A5A, 000000000019C8BD] (max 8)
  key 0000000000199A5A seq 179246 type 1
  key 000000000019C8BD seq 163970 type 1
Lastest Sequence Number: 292234

Differential Revision: D16153717

fbshipit-source-id: b33fa50a828c190cbf8249a37955432044f92daf
2019-07-08 13:36:37 -07:00
haoyuhuang
6ca3feed5c Fix -Werror=shadow (#5546)
Summary:
This PR fixes shadow errors.
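For reference, a tiny made-up example of the pattern `-Werror=shadow` rejects and the straightforward fix:
```
#include <cstddef>
#include <vector>

int SumBad(const std::vector<int>& v) {
  int total = 0;
  for (size_t i = 0; i < v.size(); ++i) {
    int total = v[i];  // -Wshadow: declaration shadows the outer 'total'
    (void)total;
  }
  return total;
}

int SumGood(const std::vector<int>& v) {
  int total = 0;
  for (size_t i = 0; i < v.size(); ++i) {
    total += v[i];  // reuse the outer variable instead of shadowing it
  }
  return total;
}
```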
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5546

Test Plan: make clean && make check -j32 && make clean && USE_CLANG=1 make check -j32 && make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16147841

Pulled By: HaoyuHuang

fbshipit-source-id: 1043500d70c134185f537ab4c3900452752f1534
2019-07-08 00:12:43 -07:00
Yanqin Jin
7c76a7fba2 Support GetAllKeyVersions() for non-default cf (#5544)
Summary:
Previously, `GetAllKeyVersions()` supported the default column family only. This PR adds support for other column families.

Test plan (devserver):
```
$make clean && COMPILE_WITH_ASAN=1 make -j32 db_basic_test
$./db_basic_test --gtest_filter=DBBasicTest.GetAllKeyVersions
```
All other unit tests must pass.
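A hedged usage sketch, assuming the new overload takes the column family handle right after the DB pointer (mirroring the existing default-CF overload in rocksdb/utilities/debug.h):
```
#include <iostream>
#include <vector>
#include "rocksdb/utilities/debug.h"

void DumpKeyVersions(rocksdb::DB* db, rocksdb::ColumnFamilyHandle* cfh) {
  std::vector<rocksdb::KeyVersion> versions;
  // Scan all internal versions of keys in ["a", "z") for the given column
  // family, up to 100 internal keys.
  rocksdb::Status s =
      rocksdb::GetAllKeyVersions(db, cfh, "a", "z", 100, &versions);
  if (!s.ok()) return;
  for (const auto& kv : versions) {
    std::cout << kv.user_key << " @ seq " << kv.sequence << "\n";
  }
}
```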
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5544

Differential Revision: D16147551

Pulled By: riversand963

fbshipit-source-id: 5a61aece2a32d789e150226a9b8d53f4a5760168
2019-07-07 22:43:52 -07:00
Zhongyi Xie
8d34806972 setup wal_in_db_path_ for secondary instance (#5545)
Summary:
PR https://github.com/facebook/rocksdb/pull/5520 added DBImpl::wal_in_db_path_ and initialized it in DBImpl::Open; this PR fixes the valgrind error for the secondary instance:
```
==236417== Conditional jump or move depends on uninitialised value(s)
==236417==    at 0x62242A: rocksdb::DeleteDBFile(rocksdb::ImmutableDBOptions const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, bool) (file_util.cc:96)
==236417==    by 0x512432: rocksdb::DBImpl::DeleteObsoleteFileImpl(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, rocksdb::FileType, unsigned long) (db_impl_files.cc:261)
==236417==    by 0x515A7A: rocksdb::DBImpl::PurgeObsoleteFiles(rocksdb::JobContext&, bool) (db_impl_files.cc:492)
==236417==    by 0x499153: rocksdb::ColumnFamilyHandleImpl::~ColumnFamilyHandleImpl() (column_family.cc:75)
==236417==    by 0x499880: rocksdb::ColumnFamilyHandleImpl::~ColumnFamilyHandleImpl() (column_family.cc:84)
==236417==    by 0x4C9AF9: rocksdb::DB::DestroyColumnFamilyHandle(rocksdb::ColumnFamilyHandle*) (db_impl.cc:3105)
==236417==    by 0x44E853: CloseSecondary (db_secondary_test.cc:53)
==236417==    by 0x44E853: rocksdb::DBSecondaryTest::~DBSecondaryTest() (db_secondary_test.cc:31)
==236417==    by 0x44EC77: ~DBSecondaryTest_PrimaryDropColumnFamily_Test (db_secondary_test.cc:443)
==236417==    by 0x44EC77: rocksdb::DBSecondaryTest_PrimaryDropColumnFamily_Test::~DBSecondaryTest_PrimaryDropColumnFamily_Test() (db_secondary_test.cc:443)
==236417==    by 0x83D1D7: HandleSehExceptionsInMethodIfSupported<testing::Test, void> (gtest-all.cc:3824)
==236417==    by 0x83D1D7: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) (gtest-all.cc:3860)
==236417==    by 0x8346DB: testing::TestInfo::Run() [clone .part.486] (gtest-all.cc:4078)
==236417==    by 0x8348D4: Run (gtest-all.cc:4047)
==236417==    by 0x8348D4: testing::TestCase::Run() [clone .part.487] (gtest-all.cc:4190)
==236417==    by 0x834D14: Run (gtest-all.cc:6100)
==236417==    by 0x834D14: testing::internal::UnitTestImpl::RunAllTests() (gtest-all.cc:6062)
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5545

Differential Revision: D16146224

Pulled By: miasantreble

fbshipit-source-id: 184c90e451352951da4e955f054d4b1a1f29ea29
2019-07-07 21:32:50 -07:00
anand76
e0d9d57750 Fix bugs in WAL trash file handling (#5520)
Summary:
1. Cleanup WAL trash files on open
2. Don't apply deletion rate limit if WAL dir is different from db dir
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5520

Test Plan: Add new unit tests and make check

Differential Revision: D16096750

Pulled By: anand1976

fbshipit-source-id: 6f07858ad864b754b711db416f0389c45ede599b
2019-07-06 21:07:32 -07:00
sdong
2de61d9129 Assert get_context not null in BlockBasedTable::Get() (#5542)
Summary:
clang analyze fails after https://github.com/facebook/rocksdb/pull/5514 for this failure:
table/block_based/block_based_table_reader.cc:3450:16: warning: Called C++ object pointer is null
          if (!get_context->SaveValue(
               ^~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.

The reason is that a branch on whether get_context is null was added earlier in the function, so clang analyze thinks it can be null at the point where we make the call without a null check.
Fix the issue by removing the branch and adding an assert.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5542

Test Plan: "make all check" passes and CLANG analyze failure goes away.

Differential Revision: D16133988

fbshipit-source-id: d4627d03c4746254cc11926c523931086ccebcda
2019-07-05 12:34:13 -07:00
Yi Wu
4f66ec977d Fix lower bound check error when iterate across file boundary (#5540)
Summary:
Since https://github.com/facebook/rocksdb/issues/5468, `LevelIterator` compares the lower bound and the file's smallest key in `NewFileIterator` and caches the result to reduce per-key lower bound checks. However, when iterating across a file boundary, it doesn't update the cached result since `Valid()=false`, because `Valid()` still reflects the status of the previous file iterator. Fix it by removing the `Valid()` check from `CheckMayBeOutOfLowerBound()`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5540

Test Plan:
See the new test.

Signed-off-by: Yi Wu <yiwu@pingcap.com>

Differential Revision: D16127653

fbshipit-source-id: a0691e1164658d485c17971aaa97028812f74678
2019-07-04 17:28:30 -07:00
sdong
e4dcf5fd22 db_bench to add a new "benchmark" to print out all stats history (#5532)
Summary:
Sometimes it is helpful to fetch the whole history of stats after benchmark runs. Add such an option.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5532

Test Plan: Run the benchmark manually and observe the output is as expected.

Differential Revision: D16097764

fbshipit-source-id: 10b5b735a22a18be198b8f348be11f11f8806904
2019-07-03 20:03:28 -07:00
haoyuhuang
6edc5d0719 Block cache tracing: Associate a unique id with Get and MultiGet (#5514)
Summary:
This PR associates a unique id with Get and MultiGet. This enables us to track how many blocks a Get/MultiGet request accesses. We can also measure the impact of row cache vs block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5514

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16032681

Pulled By: HaoyuHuang

fbshipit-source-id: 775b05f4440badd58de6667e3ec9f4fc87a0af4c
2019-07-03 19:35:41 -07:00
Sagar Vemuri
84c5c9aab1 Fix a bug in compaction reads causing checksum mismatches and asan errors (#5531)
Summary:
Fixed a bug in compaction reads due to which an incorrect number of bytes was being read/utilized. The bug was introduced in https://github.com/facebook/rocksdb/issues/5498 , resulting in "Corruption: block checksum mismatch" and "heap-buffer-overflow" asan errors in our tests.

https://github.com/facebook/rocksdb/issues/5498 was introduced recently and is not in any released versions.

ASAN:
```
> ==2280939==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6250005e83da at pc 0x000000d57f62 bp 0x7f954f483770 sp 0x7f954f482f20
> === How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
> READ of size 4 at 0x6250005e83da thread T4
> SCARINESS: 27 (4-byte-read-heap-buffer-overflow-far-from-bounds)

>      #0 tests+0xd57f61                           __asan_memcpy
>      https://github.com/facebook/rocksdb/issues/1 rocksdb/src/util/coding.h:124            rocksdb::DecodeFixed32(char const*)
>      https://github.com/facebook/rocksdb/issues/2 rocksdb/src/table/block_fetcher.cc:39    rocksdb::BlockFetcher::CheckBlockChecksum()
>      https://github.com/facebook/rocksdb/issues/3 rocksdb/src/table/block_fetcher.cc:99    rocksdb::BlockFetcher::TryGetFromPrefetchBuffer()
>      https://github.com/facebook/rocksdb/issues/4 rocksdb/src/table/block_fetcher.cc:209   rocksdb::BlockFetcher::ReadBlockContents()
>      https://github.com/facebook/rocksdb/issues/5 rocksdb/src/table/block_based/block_based_table_reader.cc:93 rocksdb::(anonymous namespace)::ReadBlockFromFile(rocksdb::RandomAccessFileReader*, rocksdb::FilePrefetchBuffer*, rocksdb::Footer const&, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, std::unique_ptr<...>*, rocksdb::ImmutableCFOptions const&, bool, bool, rocksdb::UncompressionDict
 const&, rocksdb::PersistentCacheOptions const&, unsigned long, unsigned long, rocksdb::MemoryAllocator*, bool)
>      https://github.com/facebook/rocksdb/issues/6 rocksdb/src/table/block_based/block_based_table_reader.cc:2331 rocksdb::BlockBasedTable::RetrieveBlock(rocksdb::FilePrefetchBuffer*, rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::UncompressionDict const&, rocksdb::CachableEntry<...>*, rocksdb::BlockType, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, bool) const
>      https://github.com/facebook/rocksdb/issues/7 rocksdb/src/table/block_based/block_based_table_reader.cc:2090 rocksdb::DataBlockIter* rocksdb::BlockBasedTable::NewDataBlockIterator<...>(rocksdb::ReadOptions const&, rocksdb::BlockHandle const&, rocksdb::DataBlockIter*, rocksdb::BlockType, bool, bool, rocksdb::GetContext*, rocksdb::BlockCacheLookupContext*, rocksdb::Status, rocksdb::FilePrefetchBuffe
r*, bool) const
>      https://github.com/facebook/rocksdb/issues/8 rocksdb/src/table/block_based/block_based_table_reader.cc:2720 rocksdb::BlockBasedTableIterator<...>::InitDataBlock()
>      https://github.com/facebook/rocksdb/issues/9 rocksdb/src/table/block_based/block_based_table_reader.cc:2607 rocksdb::BlockBasedTableIterator<...>::SeekToFirst()
>     https://github.com/facebook/rocksdb/issues/10 rocksdb/src/table/iterator_wrapper.h:83  rocksdb::IteratorWrapperBase<...>::SeekToFirst()
>     https://github.com/facebook/rocksdb/issues/11 rocksdb/src/table/merging_iterator.cc:100 rocksdb::MergingIterator::SeekToFirst()
>     https://github.com/facebook/rocksdb/issues/12 rocksdb/compaction/compaction_job.cc:877 rocksdb::CompactionJob::ProcessKeyValueCompaction(rocksdb::CompactionJob::SubcompactionState*)
>     https://github.com/facebook/rocksdb/issues/13 rocksdb/compaction/compaction_job.cc:590 rocksdb::CompactionJob::Run()
>     https://github.com/facebook/rocksdb/issues/14 rocksdb/db_impl/db_impl_compaction_flush.cc:2689 rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)
>     https://github.com/facebook/rocksdb/issues/15 rocksdb/db_impl/db_impl_compaction_flush.cc:2248 rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority)
>     https://github.com/facebook/rocksdb/issues/16 rocksdb/db_impl/db_impl_compaction_flush.cc:2024 rocksdb::DBImpl::BGWorkCompaction(void*)
>     https://github.com/facebook/rocksdb/issues/23 rocksdb/src/util/threadpool_imp.cc:266   rocksdb::ThreadPoolImpl::Impl::BGThread(unsigned long)
>     https://github.com/facebook/rocksdb/issues/24 rocksdb/src/util/threadpool_imp.cc:307   rocksdb::ThreadPoolImpl::Impl::BGThreadWrapper(void*)
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5531

Test Plan: Verified that this fixes the fb-internal Logdevice test which caught the issue.

Differential Revision: D16109702

Pulled By: sagar0

fbshipit-source-id: 1fc08549cf7b553e338a133ae11eb9f4d5011914
2019-07-03 19:06:46 -07:00
Andrew Kryczka
09ea5d8944 Fix clang build with jemalloc (#5522)
Summary:
Fixes the below build failure for clang compiler using glibc and jemalloc.

Platform: linux x86-64
Compiler: clang version 6.0.0-1ubuntu2
Build failure:
```
$ CXX=clang++ CC=clang USE_CLANG=1 WITH_JEMALLOC_FLAG=1 JEMALLOC=1 EXTRA_LDFLAGS="-L/home/andrew/jemalloc/lib/" EXTRA_CXXFLAGS="-I/home/andrew/jemalloc/include/" make check -j12
...
  CC       memory/jemalloc_nodump_allocator.o
In file included from memory/jemalloc_nodump_allocator.cc:6:
In file included from ./memory/jemalloc_nodump_allocator.h:11:
In file included from ./port/jemalloc_helper.h:16:
/usr/include/clang/6.0.0/include/mm_malloc.h:39:16: error: 'posix_memalign' is missing exception specification 'throw()'
extern "C" int posix_memalign(void **__memptr, size_t __alignment, size_t __size);
               ^
/home/andrew/jemalloc/include/jemalloc/jemalloc.h:388:26: note: expanded from macro 'posix_memalign'
#  define posix_memalign je_posix_memalign
                         ^
/home/andrew/jemalloc/include/jemalloc/jemalloc.h:77:29: note: expanded from macro 'je_posix_memalign'
#  define je_posix_memalign posix_memalign
                            ^
/home/andrew/jemalloc/include/jemalloc/jemalloc.h:232:38: note: previous declaration is here
JEMALLOC_EXPORT int JEMALLOC_NOTHROW    je_posix_memalign(void **memptr,
                                        ^
/home/andrew/jemalloc/include/jemalloc/jemalloc.h:77:29: note: expanded from macro 'je_posix_memalign'
#  define je_posix_memalign posix_memalign
                            ^
1 error generated.
Makefile:1972: recipe for target 'memory/jemalloc_nodump_allocator.o' failed
make: *** [memory/jemalloc_nodump_allocator.o] Error 1
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5522

Differential Revision: D16069869

Pulled By: miasantreble

fbshipit-source-id: c489bbc993adee194b9a550134c6237a264bc443
2019-07-02 13:02:12 -07:00
Andrew Kryczka
0d57d93a06 Support jemalloc compiled with --with-jemalloc-prefix (#5521)
Summary:
Previously, if the jemalloc was built with nonempty string for
`--with-jemalloc-prefix`, then `HasJemalloc()` would return false on
Linux, so jemalloc would not be used at runtime. On Mac, it would cause
a linker failure due to no definitions found for the weak functions
declared in "port/jemalloc_helper.h". This should be a rare problem
because (1) on Linux the default `--with-jemalloc-prefix` value is the
empty string, and (2) Homebrew's build explicitly sets
`--with-jemalloc-prefix` to the empty string.

However, there are cases where `--with-jemalloc-prefix` is nonempty.
For example, when building jemalloc from source on Mac, the default
setting is `--with-jemalloc-prefix=je_`. Such jemalloc builds should be
usable by RocksDB.

The fix is simple. Defining `JEMALLOC_MANGLE` before including
"jemalloc.h" causes it to define unprefixed symbols that are aliases for
each of the prefixed symbols. Thanks to benesch for figuring this out
and explaining it to me.

Fixes https://github.com/facebook/rocksdb/issues/1462.
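A hedged sketch of the mechanism, assuming a jemalloc built with a prefix (e.g. `--with-jemalloc-prefix=je_`): defining `JEMALLOC_MANGLE` before the include makes jemalloc.h alias the unprefixed names to the prefixed ones, so ordinary allocation calls still bind to jemalloc.
```
// Compile against a prefixed jemalloc build, e.g.:
//   clang++ demo.cc -I/path/to/jemalloc/include -L/path/to/jemalloc/lib -ljemalloc
#define JEMALLOC_MANGLE
#include <jemalloc/jemalloc.h>

int main() {
  // With JEMALLOC_MANGLE, mallocx/dallocx resolve to the prefixed je_* symbols.
  void* p = mallocx(64, 0);
  if (p == nullptr) return 1;
  dallocx(p, 0);
  return 0;
}
```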
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5521

Test Plan:
build jemalloc with prefixed symbols:

```
$ ./configure --with-jemalloc-prefix=lol
$ make
```

compile rocksdb against it:

```
$ WITH_JEMALLOC_FLAG=1 JEMALLOC=1 EXTRA_LDFLAGS="-L/home/andrew/jemalloc/lib/" EXTRA_CXXFLAGS="-I/home/andrew/jemalloc/include/" make -j12 ./db_bench
```

run db_bench and verify jemalloc actually used:

```
$ ./db_bench -benchmarks=fillrandom -statistics=true -dump_malloc_stats=true -stats_dump_period_sec=1
$ grep jemalloc /tmp/rocksdbtest-1000/dbbench/LOG
2019/06/29-12:20:52.088658 7fc5fb7f6700 [_impl/db_impl.cc:837] ___ Begin jemalloc statistics ___
...
```

Differential Revision: D16092758

fbshipit-source-id: c2c358346190ed62ceb2a3547a6c4c180b12f7c4
2019-07-02 12:07:01 -07:00