Commit Graph

7017 Commits

Author SHA1 Message Date
Adam Retter
00e5e1ef7f Fixes the build on Windows
Summary:
As discovered during v5.9.2 release, and forward-ported.
Closes https://github.com/facebook/rocksdb/pull/3323

Differential Revision: D6657209

Pulled By: siying

fbshipit-source-id: b560d9f8ddb89e0ffaff7c895ec80f68ddf7dab4
2018-01-03 12:27:12 -08:00
Siying Dong
ccc095a016 Speed up BlockTest.BlockReadAmpBitmap
Summary:
BlockTest.BlockReadAmpBitmap is too slow and times out in some environments. Speed it up by:
(1) improve the way the verification is done. With this it is 5 times faster
(2) run fewer tests for large blocks. This cut it down by another 10 times.
Now it can finish in similar time as other tests.
Closes https://github.com/facebook/rocksdb/pull/3313

Differential Revision: D6643711

Pulled By: siying

fbshipit-source-id: c2397d666eab5421a78ca87e1e45491e0f832a6d
2018-01-02 10:41:28 -08:00
burtonli
b5c99cc908 Disable onboard cache for compaction output
Summary:
FILE_FLAG_WRITE_THROUGH is for disabling device on-board cache in windows API, which should be disabled if user doesn't need system cache.
There was a perf issue related with this, we found during memtable flush, the high percentile latency jumps significantly. During profiling, we found those high latency (P99.9) read requests got queue-jumped by write requests from memtable flush and takes 80ms or even more time to wait, even when SSD overall IO throughput is relatively low.

After enabling FILE_FLAG_WRITE_THROUGH, we rerun the test found high percentile latency drops a lot without observable impact on writes.

Scenario 1: 40MB/s + 40MB/s  R/W compaction throughput

 Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 56.897 ms | 35.593 ms | -37.4%
P99 | 3.905 ms | 3.896 ms | -2.8%

Scenario 2:  14MB/s + 14MB/s R/W compaction throughput, cohosted with 100+ other rocksdb instances have manually triggered memtable flush operations (memtable is tiny), creating a lot of randomized the small file writes operations during test.

Original | FILE_FLAG_WRITE_THROUGH | Percentage reduction
---------------------------------------------------------------
P99.9 | 86.227   ms | 50.436 ms | -41.5%
P99 | 8.415   ms | 3.356 ms | -60.1%
Closes https://github.com/facebook/rocksdb/pull/3225

Differential Revision: D6624174

Pulled By: miasantreble

fbshipit-source-id: 321b86aee9d74470840c70e5d0d4fa9880660a91
2017-12-21 18:41:34 -08:00
Andrew Kryczka
f00e176c5b fix ForwardIterator reference to temporary object
Summary:
Fixes the following ASAN error:

```
==2108042==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7fc50ae9b868 at pc 0x7fc5112aff55 bp 0x7fff9eb9dc10 sp 0x7fff9eb9dc08
=== How to use this, how to get the raw stack trace, and more: fburl.com/ASAN ===
READ of size 8 at 0x7fc50ae9b868 thread T0
SCARINESS: 23 (8-byte-read-stack-use-after-scope)
     #0 rocksdb/dbformat.h:164                   rocksdb::InternalKeyComparator::user_comparator() const
     #1 librocksdb_src_rocksdb_lib.so+0x1429a7d  rocksdb::RangeDelAggregator::InitRep(std::vector<...> const&)
     #2 librocksdb_src_rocksdb_lib.so+0x142ceae  rocksdb::RangeDelAggregator::AddTombstones(std::unique_ptr<...>)
     #3 librocksdb_src_rocksdb_lib.so+0x1382d88  rocksdb::ForwardIterator::RebuildIterators(bool)
     #4 librocksdb_src_rocksdb_lib.so+0x1382362  rocksdb::ForwardIterator::ForwardIterator(rocksdb::DBImpl*, rocksdb::ReadOptions const&, rocksdb::ColumnFamilyData*, rocksdb::SuperVersion*)
     #5 librocksdb_src_rocksdb_lib.so+0x11f433f  rocksdb::DBImpl::NewIterator(rocksdb::ReadOptions const&, rocksdb::ColumnFamilyHandle*)
     #6 rocksdb/src/include/rocksdb/db.h:382     rocksdb::DB::NewIterator(rocksdb::ReadOptions const&)
     #7 rocksdb/db_range_del_test.cc:807         rocksdb::DBRangeDelTest_TailingIteratorRangeTombstoneUnsupported_Test::TestBody()
    #18 rocksdb/db_range_del_test.cc:1006        main

Address 0x7fc50ae9b868 is located in stack of thread T0 at offset 104 in frame
     #0 librocksdb_src_rocksdb_lib.so+0x13825af  rocksdb::ForwardIterator::RebuildIterators(bool)
```
Closes https://github.com/facebook/rocksdb/pull/3300

Differential Revision: D6612989

Pulled By: ajkr

fbshipit-source-id: e7ea2ed914c1b80a8a29d71d92440a6bd9cbcc80
2017-12-20 16:12:04 -08:00
Maysam Yabandeh
02a2c11732 Blog post for WritePrepared Txn
Summary:
Blog post to introduce the next generation of transaction engine at RocksDB.
Closes https://github.com/facebook/rocksdb/pull/3296

Differential Revision: D6612932

Pulled By: maysamyabandeh

fbshipit-source-id: 5bfa91ce84e937f5e4346bbda5a4725d0a7fd131
2017-12-20 11:42:15 -08:00
Maysam Yabandeh
0ef3fdd732 Disable need_log_sync on bg err
Summary:
When there is a background error PreprocessWrite returns without marking the logs synced. If we keep need_log_sync to true, it would try to sync them at the end, which would break the logic. The patch would unset need_log_sync if the logs end up not being marked for sync in PreprocessWrite.
Closes https://github.com/facebook/rocksdb/pull/3293

Differential Revision: D6602347

Pulled By: maysamyabandeh

fbshipit-source-id: 37ee04209e8dcfd78de891654ce50d0954abeb38
2017-12-20 08:12:24 -08:00
Wouter Beek
58b841b356 FIXED: string buffers potentially too small to fit formatted write
Summary:
This fixes the following warnings when compiled with GCC7:

util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::DBGet(rocksdb::DB*, rocksdb::Transaction*, rocksdb::ReadOptions&, uint16_t, uint64_t, bool, uint64_t*, std::__cxx11::string*, bool*)’:
util/transaction_test_util.cc:75:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
 Status RandomTransactionInserter::DBGet(
        ^~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc:84:11: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
   snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
   ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc: In static member function ‘static rocksdb::Status rocksdb::RandomTransactionInserter::Verify(rocksdb::DB*, uint16_t, uint64_t, bool, rocksdb::Random64*)’:
util/transaction_test_util.cc:245:8: error: ‘snprintf’ output may be truncated before the last format character [-Werror=format-truncation=]
 Status RandomTransactionInserter::Verify(DB* db, uint16_t num_sets,
        ^~~~~~~~~~~~~~~~~~~~~~~~~
util/transaction_test_util.cc:268:13: note: ‘snprintf’ output between 5 and 6 bytes into a destination of size 5
     snprintf(prefix_buf, sizeof(prefix_buf), "%.4u", set_i + 1);
     ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Closes https://github.com/facebook/rocksdb/pull/3295

Differential Revision: D6609411

Pulled By: maysamyabandeh

fbshipit-source-id: 33f0add471056eb59db2f8bd4366e6dfbb1a187d
2017-12-20 08:12:22 -08:00
yingsu00
f54d7f5fea Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**

RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.

This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.

**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.

Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.

1) ReadRandom in db_bench overall metrics

    PER RUN
    Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
    3-way      |  1   | 4.143   | 241387 | 26.7
    3-way      |  2   | 3.775   | 264872 | 29.3
    3-way      | 3    | 4.116   | 242929 | 26.9
    FastCrc32c|1  | 4.037   | 247727 | 27.4
    FastCrc32c|2  | 4.648   | 215166 | 23.8
    FastCrc32c|3  | 4.352   | 229799 | 25.4

     AVG
    Algorithm     |    Average of micros/op |   Average of ops/sec |    Average of Throughput (MB/s)
    3-way           |     4.01                               |      249,729                 |      27.63
    FastCrc32c  |     4.35                              |     230,897                  |      25.53

 2)   Crc32c computation CPU cost (inclusive samples percentage)
    PER RUN
    Implementation | run |  TotalSamples   | Crc32c percentage
    3-way                 |  1    |  4,572,250,000 | 4.37%
    3-way                 |  2    |  3,779,250,000 | 4.62%
    3-way                 |  3    |  4,129,500,000 | 4.48%
    FastCrc32c       |  1    |  4,663,500,000 | 11.24%
    FastCrc32c       |  2    |  4,047,500,000 | 12.34%
    FastCrc32c       |  3    |  4,366,750,000 | 11.68%

 **# Test Plan**
     make -j64 corruption_test && ./corruption_test
      By default it uses 3-way SSE algorithm

     NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

    make clean && DEBUG_LEVEL=0 make -j64 db_bench
    make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173

Differential Revision: D6330882

Pulled By: yingsu00

fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-19 18:26:49 -08:00
Yi Wu
e763e1b623 BlobDB: dump blob db options on open
Summary:
We dump blob db options on blob db open, but it was removed by mistake in #3246. Adding it back.
Closes https://github.com/facebook/rocksdb/pull/3298

Differential Revision: D6607177

Pulled By: yiwu-arbug

fbshipit-source-id: 2a4aacbfa52fd8f1878dc9e1fbb95fe48faf80c0
2017-12-19 16:57:12 -08:00
Yi Wu
48cf8da2bb BlobDB: update blob_db_options.bytes_per_sync behavior
Summary:
Previously, if blob_db_options.bytes_per_sync, there is a background job to call fsync() for every bytes_per_sync bytes written to a blob file. With the change we simply pass bytes_per_sync as env_options_ to blob files so that sync_file_range() will be used instead.
Closes https://github.com/facebook/rocksdb/pull/3297

Differential Revision: D6606994

Pulled By: yiwu-arbug

fbshipit-source-id: 452424be52e32ba92f5ea603b564e9b88929af47
2017-12-19 16:41:41 -08:00
Yi Wu
06149429d9 WritePrepared Txn: Return NotSupported on iterator refresh
Summary:
A proper implementation of Iterator::Refresh() for WritePreparedTxnDB would require release and acquire another snapshot. Since MyRocks don't make use of Iterator::Refresh(), we just simply mark it as not supported.
Closes https://github.com/facebook/rocksdb/pull/3290

Differential Revision: D6599931

Pulled By: yiwu-arbug

fbshipit-source-id: 4e1632d967316431424f6e458254ecf9a97567cf
2017-12-18 22:29:30 -08:00
Andrew Kryczka
1563801bce blog post for auto-tuned rate limiter
Summary:
Wrote the blog post.
Closes https://github.com/facebook/rocksdb/pull/3289

Differential Revision: D6599031

Pulled By: ajkr

fbshipit-source-id: 77ee553196f225f20c56112d2c015b6fa14f1b83
2017-12-18 17:56:50 -08:00
Yi Wu
2190e96727 Remove incorrect comment
Summary:
We actually create individual compaction filter from compaction filter factory per sub-compaction in `CompactionJob::ProcessKeyValueCompaction`: https://github.com/facebook/rocksdb/blob/master/db/compaction_job.cc#L742
The comment seems incorrect.
Closes https://github.com/facebook/rocksdb/pull/3288

Differential Revision: D6598455

Pulled By: yiwu-arbug

fbshipit-source-id: a6bc059a9103b87a73ae6ec4bb01ca33f5d48cf5
2017-12-18 17:56:47 -08:00
Maysam Yabandeh
0faa026db6 WritePrepared Txn: make buck tests parallel
Summary:
The TSAN version of tests could take quite long. Make the buck tests parallel to avoid timeouts.
Closes https://github.com/facebook/rocksdb/pull/3280

Differential Revision: D6581594

Pulled By: maysamyabandeh

fbshipit-source-id: 3f8476d8c69f0183e394fa8a2089dd8d4e90c90c
2017-12-18 14:42:09 -08:00
Maysam Yabandeh
78c2eedb4f fix release order in validateNumberOfEntries
Summary:
ScopedArenaIterator should be defined after range_del_agg so that it destructs the assigned iterator, which depends on range_del_agg, before it range_del_agg is already destructed.
Closes https://github.com/facebook/rocksdb/pull/3281

Differential Revision: D6592332

Pulled By: maysamyabandeh

fbshipit-source-id: 89a15d8ed13d0fc856b0c47dce3d91778738dbac
2017-12-18 14:27:28 -08:00
Guo Xiao
aa6509d8e4 Fix build for linux
Summary:
* Include `unistd.h` for `sleep(3)`
* Include `sys/time.h` for `gettimeofday(3)`
* Include `utils/random.h` for `Random64`

Error messages:

utilities/persistent_cache/hash_table_bench.cc: In constructor ‘rocksdb::HashTableBenchmark::HashTableBenchmark(rocksdb::HashTableImpl<long unsigned int, std::__cxx11::basic_string<char> >*, size_t, size_t, size_t, size_t)’:
utilities/persistent_cache/hash_table_bench.cc:76:28: error: ‘sleep’ was not declared in this scope
       /* sleep override */ sleep(1);
                            ^~~~~
utilities/persistent_cache/hash_table_bench.cc:76:28: note: suggested alternative: ‘strsep’
       /* sleep override */ sleep(1);
                            ^~~~~
                            strsep
utilities/persistent_cache/hash_table_bench.cc: In member function ‘void rocksdb::HashTableBenchmark::RunRead()’:
utilities/persistent_cache/hash_table_bench.cc:107:5: error: ‘Random64’ was not declared in this scope
     Random64 rgen(time(nullptr));
     ^~~~~~~~
utilities/persistent_cache/hash_table_bench.cc:107:5: note: suggested alternative: ‘random_r’
     Random64 rgen(time(nullptr));
     ^~~~~~~~
     random_r
utilities/persistent_cache/hash_table_bench.cc:110:18: error: ‘rgen’ was not declared in this scope
       size_t k = rgen.Next() % max_prepop_key;
                  ^~~~
utilities/persistent_cache/hash_table_bench.cc: In static member function ‘static uint64_t rocksdb::HashTableBenchmark::NowInMillSec()’:
utilities/persistent_cache/hash_table_bench.cc:153:5: error: ‘gettimeofday’ was not declared in this scope
     gettimeofday(&tv, /*tz=*/nullptr);
     ^~~~~~~~~~~~
make[2]: *** [CMakeFiles/hash_table_bench.dir/build.make:63: CMakeFiles/hash_table_bench.dir/utilities/persistent_cache/hash_table_bench.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:3346: CMakeFiles/hash_table_bench.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Closes https://github.com/facebook/rocksdb/pull/3283

Differential Revision: D6594850

Pulled By: ajkr

fbshipit-source-id: fd83957338c210cdfd253763347aafd39476824f
2017-12-18 12:28:03 -08:00
Maysam Yabandeh
a6d3c762df WritePrepared Txn: non-2pc write in one round
Summary:
Currently non-2pc writes do the 2nd dummy write to actually commit the transaction. This was necessary to ensure that publishing the commit sequence number will be done only from one queue (the queue that does not write to memtable). This is however not necessary when we have only one write queue, which is actually the setup that would be used by non-2pc writes. This patch eliminates the 2nd write when two_write_queues are disabled by updating the commit map in the 1st write.
Closes https://github.com/facebook/rocksdb/pull/3277

Differential Revision: D6575392

Pulled By: maysamyabandeh

fbshipit-source-id: 8ab458f7ca506905962f9166026b2ec81e749c46
2017-12-18 08:19:43 -08:00
Anand Ananthabhotla
fccc12f386 Add a histogram stat for memtable flush
Summary:
Add a new histogram stat called rocksdb.db.flush.micros for memtable
flush
Closes https://github.com/facebook/rocksdb/pull/3269

Differential Revision: D6559496

Pulled By: anand1976

fbshipit-source-id: f5c771ba2568630458751795e8c37a493ff9b14d
2017-12-15 18:57:00 -08:00
Maysam Yabandeh
95583e1532 db_stress: skip snapshot check if cf is dropped
Summary:
We added a new verification that ensures a value that snapshot reads when is released is the same as when it was created. This test however fails when the cf is dropped in between. The patch skips the tests if that was the case.
Closes https://github.com/facebook/rocksdb/pull/3279

Differential Revision: D6581584

Pulled By: maysamyabandeh

fbshipit-source-id: afe37d371c0f91818d2e279b3949b810e112e8eb
2017-12-15 16:28:04 -08:00
Yi Wu
237b292515 BlobDB: Remove the need to get sequence number per write
Summary:
Previously we store sequence number range of each blob files, and use the sequence number range to check if the file can be possibly visible by a snapshot. But it adds complexity to the code, since the sequence number is only available after a write. (The current implementation get sequence number by calling GetLatestSequenceNumber(), which is wrong.) With the patch, we are not storing sequence number range, and check if snapshot_sequence < obsolete_sequence to decide if the file is visible by a snapshot (previously we check if first_sequence <= snapshot_sequence < obsolete_sequence).
Closes https://github.com/facebook/rocksdb/pull/3274

Differential Revision: D6571497

Pulled By: yiwu-arbug

fbshipit-source-id: ca06479dc1fcd8782f6525b62b7762cd47d61909
2017-12-15 13:27:30 -08:00
Andrew Kryczka
a79c7c05e8 fix backup meta-file buffer overrun
Summary:
- check most times after calling snprintf that the buffer didn't fill up. Previously we'd proceed and use `buf_size - len` as the length in subsequent calls, which underflowed as those are unsigned size_t.
- replace some memcpys with snprintf for consistency
Closes https://github.com/facebook/rocksdb/pull/3255

Differential Revision: D6541464

Pulled By: ajkr

fbshipit-source-id: 8610ea6a24f38e0a37c6d17bc65b7c712da6d932
2017-12-15 12:29:16 -08:00
Andrew Kryczka
5a7e08468a fix ThreadStatus for bottom-pri compaction threads
Summary:
added `ThreadType::BOTTOM_PRIORITY` which is used in the `ThreadStatus` object to indicate the thread is used for bottom-pri compactions. Previously there was a bug where we mislabeled such threads as `ThreadType::LOW_PRIORITY`.
Closes https://github.com/facebook/rocksdb/pull/3270

Differential Revision: D6559428

Pulled By: ajkr

fbshipit-source-id: 96b1a50a9c19492b1a5fd1b77cf7061a6f9f1d1c
2017-12-14 14:57:49 -08:00
Orvid King
b4d88d7128 Fix the build with MSVC 2017
Summary:
There were a few places where MSVC's implicit truncation warnings were getting triggered, which was causing the MSVC build to fail due to warnings being treated as errors. This resolves the issues by making the truncations in some places explicit, and by making it so there are no truncations of literals.

Fixes #3239
Supersedes #3259
Closes https://github.com/facebook/rocksdb/pull/3273

Reviewed By: yiwu-arbug

Differential Revision: D6569204

Pulled By: Orvid

fbshipit-source-id: c188cf1cf98d9acb6d94b71875041cc81f8ff088
2017-12-14 12:02:22 -08:00
Siying Dong
def6a00740 Print out compression type of new SST files in logging
Summary: Closes https://github.com/facebook/rocksdb/pull/3264

Differential Revision: D6552768

Pulled By: siying

fbshipit-source-id: 6303110aff22f341d5cff41f8d2d4f138a53652d
2017-12-14 10:27:43 -08:00
Siying Dong
6b77c07379 NUMBER_BLOCK_COMPRESSED, etc, shouldn't be treated as timer counter
Summary:
NUMBER_BLOCK_DECOMPRESSED and NUMBER_BLOCK_COMPRESSED are not reported unless the stats level contain detailed timers, which is wrong. They are normal counters. Fix it.
Closes https://github.com/facebook/rocksdb/pull/3263

Differential Revision: D6552519

Pulled By: siying

fbshipit-source-id: 40899ccea7b2856bb39752616657c0bfd432f6f9
2017-12-14 10:27:43 -08:00
Maysam Yabandeh
cd2e5cae7f WritePrepared Txn: make db_stress transactional
Summary:
Add "--use_txn" option to use transactional API in db_stress, default being WRITE_PREPARED policy, which is the main intention of modifying db_stress. It also extend the existing snapshots to verify that before releasing a snapshot a read from it returns the same value as before.
Closes https://github.com/facebook/rocksdb/pull/3243

Differential Revision: D6556912

Pulled By: maysamyabandeh

fbshipit-source-id: 1ae31465be362d44bd06e635e2e9e49a1da11268
2017-12-13 11:57:29 -08:00
Maysam Yabandeh
546a63272f disableWAL with WriteImplWALOnly
Summary:
Currently WriteImplWALOnly simply returns when disableWAL is set. This is an incorrect behavior since it does not allocated the sequence number, which is a side-effect of writing to the WAL. This patch fixes the issue.
Closes https://github.com/facebook/rocksdb/pull/3262

Differential Revision: D6550974

Pulled By: maysamyabandeh

fbshipit-source-id: 745a83ae8f04e7ca6c8ffb247d6ef16c287c52e7
2017-12-13 07:57:44 -08:00
Maysam Yabandeh
35dfbd58dd WritePrepared Txn: GC old_commit_map_
Summary:
Garbage collect entries from old_commit_map_ when the corresponding snapshots are released.
Closes https://github.com/facebook/rocksdb/pull/3247

Differential Revision: D6528478

Pulled By: maysamyabandeh

fbshipit-source-id: 15d1566d85d4ac07036bc0dc47418f6c3228d4bf
2017-12-13 07:57:43 -08:00
Zhongyi Xie
51c2ea0feb Reduce heavy hitter for Get operation
Summary:
This PR addresses the following heavy hitters in `Get` operation by moving calls to `StatisticsImpl::recordTick` from `BlockBasedTable` to `Version::Get`

- rocksdb.block.cache.bytes.write
- rocksdb.block.cache.add
- rocksdb.block.cache.data.miss
- rocksdb.block.cache.data.bytes.insert
- rocksdb.block.cache.data.add
- rocksdb.block.cache.hit
- rocksdb.block.cache.data.hit
- rocksdb.block.cache.bytes.read

The db_bench statistics before and after the change are:

|1GB block read|Children      |Self  |Command          |Shared Object        |Symbol|
|---|---|---|---|---|---|
|master:     |4.22%     |1.31%  |db_bench  |db_bench  |[.] rocksdb::StatisticsImpl::recordTick|
|updated:    |0.51%     |0.21%  |db_bench  |db_bench  |[.] rocksdb::StatisticsImpl::recordTick|
|     	     |0.14%     |0.14%  |db_bench  |db_bench  |[.] rocksdb::GetContext::record_counters|

|1MB block read|Children      |Self  |Command          |Shared Object        |Symbol|
|---|---|---|---|---|---|
|master:    |3.48%     |1.08%  |db_bench  |db_bench  |[.] rocksdb::StatisticsImpl::recordTick|
|updated:    |0.80%     |0.31%  |db_bench  |db_bench  |[.] rocksdb::StatisticsImpl::recordTick|
|    	     |0.35%     |0.35%  |db_bench  |db_bench  |[.] rocksdb::GetContext::record_counters|
Closes https://github.com/facebook/rocksdb/pull/3172

Differential Revision: D6330532

Pulled By: miasantreble

fbshipit-source-id: 2b492959e00a3db29e9437ecdcc5e48ca4ec5741
2017-12-12 21:11:33 -08:00
Islam AbdelRahman
9089373a01 Fix DeleteScheduler::MarkAsTrash() handling existing trash
Summary:
DeleteScheduler::MarkAsTrash() don't handle existing .trash files correctly
This cause rocksdb to not being able to delete existing .trash files on restart
Closes https://github.com/facebook/rocksdb/pull/3261

Differential Revision: D6548003

Pulled By: IslamAbdelRahman

fbshipit-source-id: c3800639412e587a690062c63076a5a08881e0e6
2017-12-12 18:17:13 -08:00
Yi Wu
7393ef779c Fix BlockFetcher ASAN error
Summary:
Some call sites of BlockFetcher create temporary ReadOptions and pass to BlockFetcher. The temporary object will be gone after BlockFetcher construction but BlockFetcher keep its reference, causing stack-use-after-scope. Fixing it.
Closes https://github.com/facebook/rocksdb/pull/3258

Differential Revision: D6547152

Pulled By: yiwu-arbug

fbshipit-source-id: 6b49e9dd46bb72307f5d8f88ea15faacff35b9bc
2017-12-12 12:12:38 -08:00
Souvik Banerjee
4bcb7fb148 Update transaction_test_util.cc
Summary:
Fixes a compile error on gcc 7.2.1 (-Werror=format-truncation=).
Closes https://github.com/facebook/rocksdb/pull/3248

Differential Revision: D6546515

Pulled By: yiwu-arbug

fbshipit-source-id: bd78cca63f2af376faceccb1838d2d4cc9208fef
2017-12-12 12:12:38 -08:00
Yi Wu
e3a06f12d2 WritePrepared Txn: fix compaction filter snapshot checks
Summary:
Add snapshot_checker check whenever we need to check sequence against snapshots and decide what to do with an input key. The changes are related to one of:
* compaction filter
* single delete
* delete at bottom level
* merge
Closes https://github.com/facebook/rocksdb/pull/3251

Differential Revision: D6537850

Pulled By: yiwu-arbug

fbshipit-source-id: 3faba40ed5e37779f4a0cb7ae78af9546659c7f2
2017-12-12 11:12:24 -08:00
Siying Dong
a9c8d4ef15 Fix memory issue introduced by 2f1a3a4d74
Summary: Closes https://github.com/facebook/rocksdb/pull/3256

Differential Revision: D6541714

Pulled By: siying

fbshipit-source-id: 40efd89b68587a9d58cfe6f4eebd771c2d9f1542
2017-12-11 18:27:28 -08:00
Zhongyi Xie
bb5ed4b1d1 exclude DynamicUniversalCompactionOptions from ROCKSDB_LITE
Summary:
since [SetOptions](https://github.com/facebook/rocksdb/blob/master/db/db_impl.cc#L494) is not supported in ROCKSDB_LITE
Right now unit test under lite is broken
Closes https://github.com/facebook/rocksdb/pull/3253

Differential Revision: D6539428

Pulled By: miasantreble

fbshipit-source-id: 13172b8ecbd75682330726498ea198969bc3e637
2017-12-11 16:28:20 -08:00
Siying Dong
0d5692e02b Switch version to 5.10
Summary: Closes https://github.com/facebook/rocksdb/pull/3252

Differential Revision: D6539373

Pulled By: siying

fbshipit-source-id: ce7c3d3fe625852179055295da9cf7bc80755025
2017-12-11 15:42:01 -08:00
Siying Dong
2f1a3a4d74 Refactor ReadBlockContents()
Summary:
Divide ReadBlockContents() to multiple sub-functions. Maintaining the input and intermediate data in a new class BlockFetcher.
I hope in general it makes the code easier to maintain.
Another motivation to do it is to clearly divide the logic before file reading and after file reading. The refactor will help us evaluate how can we make I/O async in the future.
Closes https://github.com/facebook/rocksdb/pull/3244

Differential Revision: D6520983

Pulled By: siying

fbshipit-source-id: 338d90bc0338472d46be7a7682028dc9114b12e9
2017-12-11 15:27:32 -08:00
Yi Wu
9a27ac5d89 Fix drop column family data race
Summary:
A data race is caught by tsan_crash test between compaction and DropColumnFamily:
https://gist.github.com/yiwu-arbug/5a2b4baae05eeb99ae1719b650f30a44 Compaction checks if the column family has been dropped on each key input, while user can issue DropColumnFamily which updates cfd->dropped_, causing the data race. Fixing it by making cfd->dropped_ an atomic.
Closes https://github.com/facebook/rocksdb/pull/3250

Differential Revision: D6535991

Pulled By: yiwu-arbug

fbshipit-source-id: 5571df020beae7fa7db6fff5ad0d598f49962895
2017-12-11 13:57:48 -08:00
Zhongyi Xie
fcc8a6574d Make Universal compaction options dynamic
Summary:
Let me know if more test coverage is needed
Closes https://github.com/facebook/rocksdb/pull/3213

Differential Revision: D6457165

Pulled By: miasantreble

fbshipit-source-id: 3f944abff28aa7775237f1c4f61c64ccbad4eea9
2017-12-11 13:27:06 -08:00
Yi Wu
250a51a3f9 BlobDB: refactor DB open logic
Summary:
Refactor BlobDB open logic. List of changes:

Major:
* On reopen, mark blob files found as immutable, do not use them for writing new keys.
* Not to scan the whole file to find file footer. Instead just seek to the end of the file and try to read footer.

Minor:
* Move most of the real logic from blob_db.cc to blob_db_impl.cc.
* Not to hold shared_ptr of event listeners in global maps in blob_db.cc
* Some changes to BlobFile interface.
* Improve logging and error handling.
Closes https://github.com/facebook/rocksdb/pull/3246

Differential Revision: D6526147

Pulled By: yiwu-arbug

fbshipit-source-id: 9dc4cdd63359a2f9b696af817086949da8d06952
2017-12-11 12:12:38 -08:00
Prashant D
6a183d1ae8 Fix coverity issues compaction_job, compaction_picker
Summary:
db/compaction_job.cc:
  ReportStartedCompaction(compaction);

CID 1419863 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member bottommost_level_ is not initialized in this constructor nor in any functions that it calls.

db/compaction_picker_universal.cc:
7struct InputFileInfo {
   	2. uninit_member: Non-static class member level is not initialized in this constructor nor in any functions that it calls.

CID 1405355 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member index is not initialized in this constructor nor in any functions that it calls.
 38  InputFileInfo() : f(nullptr) {}

db/dbformat.h:
 ParsedInternalKey()
 84      : sequence(kMaxSequenceNumber)  // Make code analyzer happy

CID 1168095 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
2. uninit_member: Non-static class member type is not initialized in this constructor nor in any functions that it calls.
 85  {}  // Intentionally left uninitialized (for speed)
Closes https://github.com/facebook/rocksdb/pull/3091

Differential Revision: D6534558

Pulled By: yiwu-arbug

fbshipit-source-id: 5ada975956196d267b3f149386842af71eda7553
2017-12-11 11:57:15 -08:00
Andrew Kryczka
e3814a8608 revert fbcode build behavior
Summary: Closes https://github.com/facebook/rocksdb/pull/3242

Differential Revision: D6514255

Pulled By: ajkr

fbshipit-source-id: c39fa8e745866b052649d02bf339e794d77e96a3
2017-12-07 16:12:52 -08:00
Dmitri Smirnov
fe608e32ab Fix a race condition in WindowsThread (port::Thread)
Summary:
Fix a race condition when we create a thread and immediately destroy
 This case should be supported.
  What happens is that the thread function needs the Data instance
  to actually run but has no shared ownership and must rely on the
  WindowsThread instance to continue existing.
  To address this we change unique_ptr to shared_ptr and then
  acquire an additional refcount for the threadproc which destroys it
  just before the thread exit.
  We choose to allocate shared_ptr instance on the heap as this allows
  the original thread to continue w/o waiting for the new thread to start
  running.
Closes https://github.com/facebook/rocksdb/pull/3240

Differential Revision: D6511324

Pulled By: yiwu-arbug

fbshipit-source-id: 4633ff7996daf4d287a9fe34f60c1dd28cf4ff36
2017-12-07 13:42:53 -08:00
Prashant D
34aa245dd8 Fix coverity issues version, write_batch
Summary:
db/version_builder.cc:
117        base_vstorage_->InternalComparator();

CID 1351713 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member field level_zero_cmp_.internal_comparator is not initialized in this constructor nor in any functions that it calls.

db/version_edit.h:
145  FdWithKeyRange()
146      : fd(),
147        smallest_key(),
148        largest_key() {

CID 1418254 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member file_metadata is not initialized in this constructor nor in any functions that it calls.
149  }

db/version_set.cc:
120    }

CID 1322789 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
4. uninit_member: Non-static class member curr_file_level_ is not initialized in this constructor nor in any functions that it calls.
121  }

db/write_batch.cc:
 939    assert(cf_mems_);

CID 1419862 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
3. uninit_member: Non-static class member rebuilding_trx_seq_ is not initialized in this constructor nor in any functions that it calls.
 940  }
Closes https://github.com/facebook/rocksdb/pull/3092

Differential Revision: D6505666

Pulled By: yiwu-arbug

fbshipit-source-id: fd2c68948a0280772691a419d72ac7e190951d86
2017-12-07 11:57:36 -08:00
Prashant D
baff91c1ad table: Fix coverity issues
Summary:
table/block.cc:
420  }

CID 1396127 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
7. uninit_member: Non-static class member restart_offset_ is not initialized in this constructor nor in any functions that it calls.
421}

table/block_based_table_builder.cc:

CID 1418259 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
7. uninit_member: Non-static class member compressed_cache_key_prefix_size is not initialized in this constructor nor in any functions that it calls.

table/block_based_table_reader.h:
   	3. uninit_member: Non-static class member index_type is not initialized in this constructor nor in any functions that it calls.

CID 1396147 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
5. uninit_member: Non-static class member hash_index_allow_collision is not initialized in this constructor nor in any functions that it calls.
413        global_seqno(kDisableGlobalSequenceNumber) {}
414

table/cuckoo_table_reader.cc:
 55  if (hash_funs == user_props.end()) {
 56    status_ = Status::Corruption("Number of hash functions not found");
   	5. uninit_member: Non-static class member is_last_level_ is not initialized in this constructor nor in any functions that it calls.
   	7. uninit_member: Non-static class member identity_as_first_hash_ is not initialized in this constructor nor in any functions that it calls.
   	9. uninit_member: Non-static class member use_module_hash_ is not initialized in this constructor nor in any functions that it calls.
   	11. uninit_member: Non-static class member num_hash_func_ is not initialized in this constructor nor in any functions that it calls.
   	13. uninit_member: Non-static class member key_length_ is not initialized in this constructor nor in any functions that it calls.
   	15. uninit_member: Non-static class member user_key_length_ is not initialized in this constructor nor in any functions that it calls.
   	17. uninit_member: Non-static class member value_length_ is not initialized in this constructor nor in any functions that it calls.
   	19. uninit_member: Non-static class member bucket_length_ is not initialized in this constructor nor in any functions that it calls.
   	21. uninit_member: Non-static class member cuckoo_block_size_ is not initialized in this constructor nor in any functions that it calls.
   	23. uninit_member: Non-static class member cuckoo_block_bytes_minus_one_ is not initialized in this constructor nor in any functions that it calls.

CID 1322785 (#2 of 2): Uninitialized scalar field (UNINIT_CTOR)
25. uninit_member: Non-static class member table_size_ is not initialized in this constructor nor in any functions that it calls.
 57    return;

table/plain_table_index.h:
   	2. uninit_member: Non-static class member index_size_ is not initialized in this constructor nor in any functions that it calls.

CID 1322801 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
4. uninit_member: Non-static class member sub_index_size_ is not initialized in this constructor nor in any functions that it calls.
128        huge_page_tlb_size_(huge_page_tlb_size) {}
129
Closes https://github.com/facebook/rocksdb/pull/3113

Differential Revision: D6505719

Pulled By: yiwu-arbug

fbshipit-source-id: 38f44d8f9dfefb4c2e25d83b8df25a5201c75618
2017-12-07 11:57:36 -08:00
Andrew Kryczka
2e3a00987e fix ASAN for DeleteFilesInRange test case
Summary:
error message was

```
==3095==ERROR: AddressSanitizer: stack-use-after-scope on address 0x7ffd18216c40 at pc 0x0000005edda1 bp 0x7ffd18215550 sp 0x7ffd18214d00
...
Address 0x7ffd18216c40 is located in stack of thread T0 at offset 1952 in frame
     #0 internal_repo_rocksdb/db_compaction_test.cc:1520 rocksdb::DBCompactionTest_DeleteFileRangeFileEndpointsOverlapBug_Test::TestBody()
```

It was unsafe to have slices referring to the temporary string objects' buffers, as those strings were destroyed before the slices were used. Fixed it by assigning the strings returned by `Key()` to local variables.
Closes https://github.com/facebook/rocksdb/pull/3238

Differential Revision: D6507864

Pulled By: ajkr

fbshipit-source-id: dd07de1a0070c6748c1ab4f3d7bd31f9a81889d0
2017-12-07 11:12:43 -08:00
Yi Wu
e1c569c324 Fix clang-analyzer false-positive on ldb_cmd.cc
Summary:
clang-analyzer complaint about db_ being nullptr, but it couldn't be because it checks exec_stats before proceed. Add an assert to get around the false-positive.

Test Plan
`make analyze`
Closes https://github.com/facebook/rocksdb/pull/3236

Differential Revision: D6505417

Pulled By: yiwu-arbug

fbshipit-source-id: e5b65764ea994dd9e4bab3e697b97dc70dc22cab
2017-12-06 22:58:46 -08:00
Sagar Vemuri
bbef8c3884 Log GetCurrentTime failures during Flush and Compaction
Summary:
`GetCurrentTime()` is used to populate `creation_time` table property during flushes and compactions. It is safe to ignore `GetCurrentTime()` failures here but they should be logged.

(Note that `creation_time` property was introduced as part of TTL-based FIFO compaction in #2480.)

Tes Plan:
`make check`
Closes https://github.com/facebook/rocksdb/pull/3231

Differential Revision: D6501935

Pulled By: sagar0

fbshipit-source-id: 376adcf4ab801d3a43ec4453894b9a10909c8eb6
2017-12-06 20:56:53 -08:00
Sagar Vemuri
d51fcb21f4 Blob DB: Add db_bench options
Summary:
Adding more BlobDB db_bench options which are needed for benchmarking.
Closes https://github.com/facebook/rocksdb/pull/3230

Differential Revision: D6500711

Pulled By: sagar0

fbshipit-source-id: 91d63122905854ef7c9148a0235568719146e6c5
2017-12-06 20:44:12 -08:00
Andrew Kryczka
53a516ab58 update history for recent commits
Summary:
I browsed through the history since 5.9 was released and found some changes worth mentioning in HISTORY.md.
Closes https://github.com/facebook/rocksdb/pull/3237

Differential Revision: D6506472

Pulled By: ajkr

fbshipit-source-id: 627ce9f94ca33df9f0f231a9c5ced3624b05506c
2017-12-06 19:56:17 -08:00