Commit Graph

1625 Commits

Author SHA1 Message Date
Burton Li
a8b9891f95 Concurrent task limiter for compaction thread control (#4332)
Summary:
The PR is targeting to resolve the issue of:
https://github.com/facebook/rocksdb/issues/3972#issue-330771918

We have a rocksdb created with leveled-compaction with multiple column families (CFs), some of CFs are using HDD to store big and less frequently accessed data and others are using SSD.
When there are continuously write traffics going on to all CFs, the compaction thread pool is mostly occupied by those slow HDD compactions, which blocks fully utilize SSD bandwidth.
Since atomic write and transaction is needed across CFs, so splitting it to multiple rocksdb instance is not an option for us.

With the compaction thread control, we got 30%+ HDD write throughput gain, and also a lot smooth SSD write since less write stall happening.

ConcurrentTaskLimiter can be shared with multi-CFs across rocksdb instances, so the feature does not only work for multi-CFs scenarios, but also for multi-rocksdbs scenarios, who need disk IO resource control per tenant.

The usage is straight forward:
e.g.:

//
// Enable compaction thread limiter thru ColumnFamilyOptions
//
std::shared_ptr<ConcurrentTaskLimiter> ctl(NewConcurrentTaskLimiter("foo_limiter", 4));
Options options;
ColumnFamilyOptions cf_opt(options);
cf_opt.compaction_thread_limiter = ctl;
...

//
// Compaction thread limiter can be tuned or disabled on-the-fly
//
ctl->SetMaxOutstandingTask(12); // enlarge to 12 tasks
...
ctl->ResetMaxOutstandingTask(); // disable (bypass) thread limiter
ctl->SetMaxOutstandingTask(-1); // Same as above
...
ctl->SetMaxOutstandingTask(0);  // full throttle (0 task)

//
// Sharing compaction thread limiter among CFs (to resolve multiple storage perf issue)
//
std::shared_ptr<ConcurrentTaskLimiter> ctl_ssd(NewConcurrentTaskLimiter("ssd_limiter", 8));
std::shared_ptr<ConcurrentTaskLimiter> ctl_hdd(NewConcurrentTaskLimiter("hdd_limiter", 4));
Options options;
ColumnFamilyOptions cf_opt_ssd1(options);
ColumnFamilyOptions cf_opt_ssd2(options);
ColumnFamilyOptions cf_opt_hdd1(options);
ColumnFamilyOptions cf_opt_hdd2(options);
ColumnFamilyOptions cf_opt_hdd3(options);

// SSD CFs
cf_opt_ssd1.compaction_thread_limiter = ctl_ssd;
cf_opt_ssd2.compaction_thread_limiter = ctl_ssd;

// HDD CFs
cf_opt_hdd1.compaction_thread_limiter = ctl_hdd;
cf_opt_hdd2.compaction_thread_limiter = ctl_hdd;
cf_opt_hdd3.compaction_thread_limiter = ctl_hdd;

...

//
// The limiter is disabled by default (or set to nullptr explicitly)
//
Options options;
ColumnFamilyOptions cf_opt(options);
cf_opt.compaction_thread_limiter = nullptr;
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4332

Differential Revision: D13226590

Pulled By: siying

fbshipit-source-id: 14307aec55b8bd59c8223d04aa6db3c03d1b0c1d
2018-12-13 13:18:28 -08:00
DorianZheng
4862720e08 Expose column family id to FlushJobInfo
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4772

Differential Revision: D13428923

Pulled By: ajkr

fbshipit-source-id: e351e9c5eea97816db25429e129357a8af90712a
2018-12-11 20:33:42 -08:00
Maysam Yabandeh
21fca397cc Fix inline comments for assumed_tracked (#4762)
Summary:
Fix the definition of assumed_tracked in Transaction that was introduced in #4680
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4762

Differential Revision: D13399150

Pulled By: maysamyabandeh

fbshipit-source-id: 2a30fe49e3c44adacd7e45cd48eae95023ca9dca
2018-12-10 09:56:21 -08:00
Anand Ananthabhotla
1b01d23be2 Add PerfContext counters for index/filter block cache stats (#4540)
Summary:
Add counters to track block cache index/filter hits and misses. We currently count aggregate hits and misses, which includes index/filter/data blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4540

Differential Revision: D10459652

Pulled By: anand1976

fbshipit-source-id: 0c59eee7f12f5103dcb6686f0e7995babe63d425
2018-12-07 15:07:56 -08:00
Maysam Yabandeh
b878f93c70 Extend Transaction::GetForUpdate with do_validate (#4680)
Summary:
Transaction::GetForUpdate is extended with a do_validate parameter with default value of true. If false it skips validating the snapshot (if there is any) before doing the read. After the read it also returns the latest value (expects the ReadOptions::snapshot to be nullptr). This allows RocksDB applications to use GetForUpdate similarly to how InnoDB does. Similarly ::Merge, ::Put, ::Delete, and ::SingleDelete are extended with assume_exclusive_tracked with default value of false. It true it indicates that call is assumed to be after a ::GetForUpdate(do_validate=false).
The Java APIs are accordingly updated.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4680

Differential Revision: D13068508

Pulled By: maysamyabandeh

fbshipit-source-id: f0b59db28f7f6a078b60844d902057140765e67d
2018-12-06 17:49:00 -08:00
Zhongyi Xie
2f1ca4e838 Revert "BaseDeltaIterator: always check valid() before accessing key(… (#4744)
Summary:
…) (#4702)"

This reverts commit 3a18bb3e15.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4744

Differential Revision: D13311869

Pulled By: miasantreble

fbshipit-source-id: 6300b12cc34828d8b9274e907a3aef1506d5d553
2018-12-03 23:38:27 -08:00
Zhongyi Xie
3a18bb3e15 BaseDeltaIterator: always check valid() before accessing key() (#4702)
Summary:
Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors.
This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702

Differential Revision: D13146643

Pulled By: miasantreble

fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
2018-11-30 15:35:13 -08:00
Siying Dong
6e938c904f Make NewBloomFilterPolicy() use full filter by default (#4735)
Summary:
Full block (use_block_based_builder=false) Bloom filter has clear CPU saving benefits but with limitation of using temp memory when building an SST file proportional to the SST file size. We reduced the chance of having large SST files with multi-level universal compaction. Now we change to a default with better performance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4735

Differential Revision: D13266674

Pulled By: siying

fbshipit-source-id: 7594a4c3e32568a5a2adce22bb0e46553e55c602
2018-11-30 13:13:27 -08:00
Yi Wu
cf1df5d3cb JemallocNodumpAllocator: option to limit tcache memory usage (#4736)
Summary:
Add option to limit tcache usage by allocation size. This is to reduce total tcache size in case there are many user threads accessing the allocator and incur non-trivial memory usage.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4736

Differential Revision: D13269305

Pulled By: yiwu-arbug

fbshipit-source-id: 95a9b7fc67facd66837c849137e30e137112e19d
2018-11-29 17:33:40 -08:00
Zhichao Cao
7125e24619 Add the max trace file size limitation option to Tracing (#4610)
Summary:
If user do not end the trace manually, the tracing will continue which can potential use up all the storage space and cause problem. In this PR, the max trace file size is added to the TraceOptions and user can set the value if they need or the default is 64GB.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4610

Differential Revision: D12893400

Pulled By: zhichao-cao

fbshipit-source-id: acf4b5a6076bb691778bdfbac4864e1006758953
2018-11-27 14:27:05 -08:00
Huachao Huang
5e72bc113a Add SstFileReader to read sst files (#4717)
Summary:
A user friendly sst file reader is useful when we want to access sst
files outside of RocksDB. For example, we can generate an sst file
with SstFileWriter and send it to other places, then use SstFileReader
to read the file and process the entries in other ways.

Also rename the original SstFileReader to SstFileDumper because of
name conflict, and seems SstFileDumper is more appropriate for tools.

TODO: there is only a very simple test now, because I want to get some feedback first.
If the changes look good, I will add more tests soon.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717

Differential Revision: D13212686

Pulled By: ajkr

fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
2018-11-27 13:02:23 -08:00
Abhishek Madan
e76448185c Remove DeleteRange experimental comment (#4709)
Summary:
DeleteRange is now ready for production use. Change the header comment to reflect this, and update HISTORY.md with the feature's status.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4709

Differential Revision: D13209055

Pulled By: abhimadan

fbshipit-source-id: 65423eb1a4927cf593c38254cd87c322f73ae137
2018-11-27 11:11:35 -08:00
Adam Singer
1db4a096d4 Test mapping of Histograms and HistogramsNameMap (#4720)
Summary:
Adding sanity check test for mapping of `Histograms` and `HistogramsNameMap`

```
[==========] Running 2 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 2 tests from StatisticsTest
[ RUN      ] StatisticsTest.SanityTickers
[       OK ] StatisticsTest.SanityTickers (0 ms)
[ RUN      ] StatisticsTest.SanityHistograms
[       OK ] StatisticsTest.SanityHistograms (0 ms)
[----------] 2 tests from StatisticsTest (0 ms total)

[----------] Global test environment tear-down
[==========] 2 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 2 tests.
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4720

Differential Revision: D13217061

Pulled By: ajkr

fbshipit-source-id: 6427f4e684c36b2f3c3440808b74fee86a364683
2018-11-27 10:48:30 -08:00
Soli
f1837595a3 FIX #3278: Move global const object definitions from .h to .cc (#4691)
Summary:
Summary

We should declare constants in headers and define them in source files.
But this commit is only aimed at compound types.

I don't know if it is necessary to do the same thing to fundamental types.

I used this command to find all of the constant definitions in header files.

`find . -name "*.h" | xargs grep -e "^const .*=.*"`

And here is what I found:

```
./db/version_edit.h:const uint64_t kFileNumberMask = 0x3FFFFFFFFFFFFFFF;
./include/rocksdb/env.h:const size_t kDefaultPageSize = 4 * 1024;
./include/rocksdb/statistics.h:const std::vector<std::pair<Tickers, std::string>> TickersNameMap = {
./include/rocksdb/statistics.h:const std::vector<std::pair<Histograms, std::string>> HistogramsNameMap = {
./include/rocksdb/table.h:const uint32_t kPlainTableVariableLength = 0;
./include/rocksdb/utilities/transaction_db.h:const uint32_t kInitialMaxDeadlocks = 5;
./port/port_posix.h:const uint32_t kMaxUint32 = std::numeric_limits<uint32_t>::max();
./port/port_posix.h:const int kMaxInt32 = std::numeric_limits<int32_t>::max();
./port/port_posix.h:const uint64_t kMaxUint64 = std::numeric_limits<uint64_t>::max();
./port/port_posix.h:const int64_t kMaxInt64 = std::numeric_limits<int64_t>::max();
./port/port_posix.h:const size_t kMaxSizet = std::numeric_limits<size_t>::max();
./port/win/port_win.h:const uint32_t kMaxUint32 = UINT32_MAX;
./port/win/port_win.h:const int kMaxInt32 = INT32_MAX;
./port/win/port_win.h:const int64_t kMaxInt64 = INT64_MAX;
./port/win/port_win.h:const uint64_t kMaxUint64 = UINT64_MAX;
./port/win/port_win.h:const size_t kMaxSizet = UINT64_MAX;
./port/win/port_win.h:const size_t kMaxSizet = UINT_MAX;
./port/win/port_win.h:const uint32_t kMaxUint32 = std::numeric_limits<uint32_t>::max();
./port/win/port_win.h:const int kMaxInt32 = std::numeric_limits<int>::max();
./port/win/port_win.h:const uint64_t kMaxUint64 = std::numeric_limits<uint64_t>::max();
./port/win/port_win.h:const int64_t kMaxInt64 = std::numeric_limits<int64_t>::max();
./port/win/port_win.h:const size_t kMaxSizet = std::numeric_limits<size_t>::max();
./port/win/port_win.h:const bool kLittleEndian = true;
./table/cuckoo_table_factory.h:const uint32_t kCuckooMurmurSeedMultiplier = 816922183;
./table/data_block_hash_index.h:const uint8_t kNoEntry = 255;
./table/data_block_hash_index.h:const uint8_t kCollision = 254;
./table/data_block_hash_index.h:const uint8_t kMaxRestartSupportedByHashIndex = 253;
./table/data_block_hash_index.h:const size_t kMaxBlockSizeSupportedByHashIndex = 1u << 16;
./table/data_block_hash_index.h:const double kDefaultUtilRatio = 0.75;
./table/filter_block.h:const uint64_t kNotValid = ULLONG_MAX;
./table/format.h:const int kMagicNumberLengthByte = 8;
./third-party/fbson/FbsonJsonParser.h:const char* const kJsonDelim = " ,]}\t\r\n";
./third-party/fbson/FbsonJsonParser.h:const char* const kWhiteSpace = " \t\n\r";
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const BiggestInt kMaxBiggestInt =
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const char kDeathTestStyleFlag[] = "death_test_style";
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const char kDeathTestUseFork[] = "death_test_use_fork";
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const char kInternalRunDeathTestFlag[] = "internal_run_death_test";
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const char* pets[] = {"cat", "dog"};
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const size_t kProtobufOneLinerMaxLength = 50;
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const int kMaxStackTraceDepth = 100;
./third-party/gtest-1.7.0/fused-src/gtest/gtest.h:const T* WithParamInterface<T>::parameter_ = NULL;
./util/coding.h:const unsigned int kMaxVarint64Length = 10;
./util/filename.h:const size_t kFormatFileNumberBufSize = 38;
./util/testutil.h:const SliceTransform* RandomSliceTransform(Random* rnd, int pre_defined = -1);
./util/trace_replay.h:const std::string kTraceMagic = "feedcafedeadbeef";
./util/trace_replay.h:const unsigned int kTraceTimestampSize = 8;
./util/trace_replay.h:const unsigned int kTraceTypeSize = 1;
./util/trace_replay.h:const unsigned int kTracePayloadLengthSize = 4;
./util/trace_replay.h:const unsigned int kTraceMetadataSize =
./utilities/cassandra/serialize.h:const int64_t kCharMask = 0xFFLL;
./utilities/cassandra/serialize.h:const int32_t kBitsPerByte = 8;
```

And these 3 lines are related to this commit:

```
./include/rocksdb/statistics.h:const std::vector<std::pair<Tickers, std::string>> TickersNameMap = {
./include/rocksdb/statistics.h:const std::vector<std::pair<Histograms, std::string>> HistogramsNameMap = {
./util/trace_replay.h:const std::string kTraceMagic = "feedcafedeadbeef";
```

Any comments would be appreciated.
Thanks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4691

Differential Revision: D13208049

Pulled By: ajkr

fbshipit-source-id: e5ee55fdaec5447fc5798c6721e2821e7cdc0d5b
2018-11-26 21:32:03 -08:00
Zhongyi Xie
a21cb22ee3 Revert "apply ReadOptions.iterate_upper_bound to transaction iterator… (#4705)
Summary:
… (#4656)"

This reverts commit b76398a82b.

Will add test coverage for iterate_upper_bound before re-commit b76398
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4705

Differential Revision: D13148592

Pulled By: miasantreble

fbshipit-source-id: 4d1ce0bfd9f7a5359a7688bd780eb06a66f45b1f
2018-11-24 10:46:28 -08:00
Yi Wu
05d9d82181 Revert "Move MemoryAllocator option from Cache to BlockBasedTableOpti… (#4697)
Summary:
…ons (#4676)"

This reverts commit b32d087dbb.

`MemoryAllocator` needs to be with `Cache`, since cache entry can
outlive DB and block based table. The cache needs to hold reference to
memory allocator when deleting cache entry.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4697

Differential Revision: D13133490

Pulled By: yiwu-arbug

fbshipit-source-id: 8ef7e8a51263bfd929f892fd062665ff4ce9ce5a
2018-11-21 11:29:57 -08:00
Andrew Kryczka
a138e351bc Fix compatibility of public ticker stats (#4701)
Summary:
- Added back the `NO_ITERATORS` that was removed in 5945e16dfc.
- Marked it as deprecated since it is no longer populated, but kept for API compatibility.
- Made sure the new tickers, `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETED`, are appended at the end of the enum, in case people are relying on the int values.

The change where `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETED` were introduced is unreleased so I believe it is ok to change their ordering.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4701

Differential Revision: D13142887

Pulled By: ajkr

fbshipit-source-id: 29a336ce5b46632ce50ad42ccc4a29013f71d6d6
2018-11-20 13:13:16 -08:00
Siying Dong
13579e8c5a WriteBufferManger doens't cost to cache if no limit is set (#4695)
Summary:
WriteBufferManger is not invoked when allocating memory for memtable if the limit is not set even if a cache is passed. It is inconsistent from the comment syas. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4695

Differential Revision: D13112722

Pulled By: siying

fbshipit-source-id: 0b27eef63867f679cd06033ea56907c0569597f4
2018-11-18 16:55:43 -08:00
Andrew Kryczka
9d6d4867ab Fix uninitialized fields in file metadata (#4693)
Summary:
This is a quick fix for the uninitialized bugs in `LiveFileMetaData` and `SstFileMetaData` that were uncovered in #4686.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4693

Differential Revision: D13113189

Pulled By: ajkr

fbshipit-source-id: 18e798d031d2a59d0b55fc010c135e0126f4042d
2018-11-16 20:49:17 -08:00
Maysam Yabandeh
c2a20f1776 Fix ignoring params in default impl of GetForUpdate (#4679)
Summary:
The default implementation of GetForUpdate that receives PinnableSlice was mistakenly dropping column_family and exclusive parameters.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4679

Differential Revision: D13062531

Pulled By: maysamyabandeh

fbshipit-source-id: 7625d0c1ba872a5d894b58ced42147d6c8556a6f
2018-11-14 11:29:38 -08:00
Siying Dong
b82e57d425 Remove two variables from BlockContents class and don't use class Block for compressed block (#4650)
Summary:
We carry compression type and "cachable" variables for every block in the block cache, while they take well-known values. 8-byte is wasted for each block (2-byte for useful information but it takes 8 bytes because of padding). With this change, these two variables are removed.

The cachable information is only useful in the process of reading the block. We use other information to infer from it. For compressed blocks, the compression type is a part of the block content itself so we can get it from there.

Some code is slightly refactored so that the cachable information can flow better.

Another change is to only use class BlockContents for compressed block, and narrow the class Block to only be used for uncompressed blocks, including blocks in compressed block cache. This can make the Block class less confusing. It also saves tens of bytes for each block in compressed block cache.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4650

Differential Revision: D12969070

Pulled By: siying

fbshipit-source-id: 548b62724e9eb66993026429fd9c7c3acd1f95ed
2018-11-13 17:02:55 -08:00
Zhongyi Xie
b76398a82b apply ReadOptions.iterate_upper_bound to transaction iterator (#4656)
Summary:
Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656

Differential Revision: D13039257

Pulled By: miasantreble

fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
2018-11-13 15:44:15 -08:00
Yi Wu
b32d087dbb Move MemoryAllocator option from Cache to BlockBasedTableOptions (#4676)
Summary:
Per offline discussion with siying, `MemoryAllocator` and `Cache` should be decouple. The idea is that memory allocator handles memory allocation, while cache handle cache policy.

It is normal that external cache libraries pack couple the two components for better optimization. If we want to integrate with such library in the future, we can make a wrapper of the library implementing both `Cache` and `MemoryAllocator` interface.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4676

Differential Revision: D13047662

Pulled By: yiwu-arbug

fbshipit-source-id: cd42e246d80ab600b4de47d073f7d2db308ce6dd
2018-11-13 13:48:38 -08:00
QingpingWang
4f0fcb78ae Expose num entries and deletions of sst files (#4623)
Summary:
he ratio of num_deletions to num_entries of a level can be useful to determine if a manual compaction needs to be triggered on a level.
Also refer #3980
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4623

Differential Revision: D13045744

Pulled By: sagar0

fbshipit-source-id: 71f3c8e363a8ffd194ec3bb0ed0b69612231f0b3
2018-11-13 11:52:19 -08:00
Soli Como
5945e16dfc Divide NO_ITERATORS into two counters NO_ITERATOR_CREATED and NO_ITERATOR_DELETE (#4498)
Summary:
Currently, `Statistics` can record tick by `recordTick()` whose second parameter is an `uint64_t`.
That means tick can only increase.
If we want to reduce tick, we have to work around like `RecordTick(statistics_, NO_ITERATORS, uint64_t(-1));`.
That's kind of a hack.

So, this PR divide `NO_ITERATORS` into two counters `NO_ITERATOR_CREATED` and `NO_ITERATOR_DELETE`, making the counters increase only.

Fixes #3013 .
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4498

Differential Revision: D10395010

Pulled By: sagar0

fbshipit-source-id: cfb523b22a37411c794b4e9da090f1ae30293db2
2018-11-13 11:46:32 -08:00
Zhongyi Xie
b313019326 use per-level perfcontext for DB::Get calls (#4617)
Summary:
this PR adds two more per-level perf context counters to track
* number of keys returned in Get call, break down by levels
* total processing time at each level during Get call
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4617

Differential Revision: D12898024

Pulled By: miasantreble

fbshipit-source-id: 6b84ef1c8097c0d9e97bee1a774958f56ab4a6c4
2018-11-13 10:40:49 -08:00
Fosco Marotto
050d73551b Update history and version for future 5.18 release.
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4669

Differential Revision: D13031522

Pulled By: gfosco

fbshipit-source-id: d09e655a9d5556594f195c5d1b786900932145ce
2018-11-12 14:45:47 -08:00
Sagar Vemuri
0c4678fd1b Update documentation about dynamic options (#4653)
Summary:
Updated the comments around all options which are currently dynamic.
Without explicitly specifying in the comments around the options, its hard for RocksDB users to know if an option is dynamically changeable or not unless they dig into the implementation details.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4653

Differential Revision: D12966735

Pulled By: sagar0

fbshipit-source-id: 1a01acbf6fe506b989e71629ce223f9803ebae27
2018-11-09 13:30:49 -08:00
Sagar Vemuri
dc3528077a Update all unique/shared_ptr instances to be qualified with namespace std (#4638)
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638

Differential Revision: D12934992

Pulled By: sagar0

fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
2018-11-09 11:19:58 -08:00
Andrew Kryczka
fffac43cfb Add DB property for SST files kept from deletion (#4618)
Summary:
This property can help debug why SST files aren't being deleted. Previously we only had the property "rocksdb.is-file-deletions-enabled". However, even when that returned true, obsolete SSTs may still not be deleted due to the coarse-grained mechanism we use to prevent newly created SSTs from being accidentally deleted. That coarse-grained mechanism uses a lower bound file number for SSTs that should not be deleted, and this property exposes that lower bound.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4618

Differential Revision: D12898179

Pulled By: ajkr

fbshipit-source-id: fe68acc041ddbcc9276bbd48976524d95aafc776
2018-11-05 20:24:40 -08:00
Bo Hou
cd9404bb77 xxhash 64 support
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4607

Reviewed By: siying

Differential Revision: D12836696

Pulled By: jsjhoubo

fbshipit-source-id: 7122ccb712d0b0f1cd998aa4477e0da1401bd870
2018-11-01 15:44:06 -07:00
Abhishek Madan
eaaf1a6f05 Promote rocksdb.{deleted.keys,merge.operands} to main table properties (#4594)
Summary:
Since the number of range deletions are reported in
TableProperties, it is confusing to not report the number of merge
operands and point deletions as top-level properties; they are
accessible through the public API, but since they are not the "main"
properties, they do not appear in aggregated table properties, or the
string representation of table properties.

This change promotes those two property keys to
`rocksdb/table_properties.h`, adds corresponding uint64 members for
them, deprecates the old access methods `GetDeletedKeys()` and
`GetMergeOperands()` (though they are still usable for now), and removes
`InternalKeyPropertiesCollector`. The property key strings are the same
as before this change, so this should be able to read DBs written from older
versions (though I haven't tested this yet).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4594

Differential Revision: D12826893

Pulled By: abhimadan

fbshipit-source-id: 9e4e4fbdc5b0da161c89582566d184101ba8eb68
2018-10-30 15:34:27 -07:00
Yi Wu
5f5fddabc7 port folly::JemallocNodumpAllocator (#4534)
Summary:
Introduce `JemallocNodumpAllocator`, which allow exclusion of block cache usage from core dump. It utilize custom hook of jemalloc arena, and when jemalloc arena request memory from system, the allocator use the hook to set `MADV_DONTDUMP ` to the memory. The implementation is basically the same as `folly::JemallocNodumpAllocator`, except for some minor difference:
1. It only support jemalloc >= 5.0
2. When the allocator destruct, it explicitly destruct the corresponding arena via `arena.<i>.destroy` via `mallctl`.

Depending on #4502.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4534

Differential Revision: D10435474

Pulled By: yiwu-arbug

fbshipit-source-id: e80edea755d3853182485d2be710376384ce0bb4
2018-10-26 17:29:18 -07:00
Yanqin Jin
5b4c709fad Enable atomic flush (#4023)
Summary:
Adds a DB option `atomic_flush` to control whether to enable this feature. This PR is a subset of [PR 3752](https://github.com/facebook/rocksdb/pull/3752).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4023

Differential Revision: D8518381

Pulled By: riversand963

fbshipit-source-id: 1e3bb33e99bb102876a31b378d93b0138ff6634f
2018-10-26 15:08:43 -07:00
Yi Wu
f560c8f5c8 s/CacheAllocator/MemoryAllocator/g (#4590)
Summary:
Rename the interface, as it is mean to be a generic interface for memory allocation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4590

Differential Revision: D10866340

Pulled By: yiwu-arbug

fbshipit-source-id: 85cb753351a40cb856c046aeaa3f3b369eef3d16
2018-10-26 14:30:30 -07:00
Sagar Vemuri
abb8ecb4cd Add missing methods to WritableFileWrapper (#4584)
Summary:
`WritableFileWrapper` was missing some newer methods that were added to `WritableFile`. Without these functions, the missing wrapper methods would fallback to using the default implementations in WritableFile instead of using the corresponding implementations in, say, `PosixWritableFile` or `WinWritableFile`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4584

Differential Revision: D10559199

Pulled By: sagar0

fbshipit-source-id: 0d0f18a486aee727d5b8eebd3110a41988e27391
2018-10-24 12:19:54 -07:00
jsteemann
d1c0d3f358 Small issues (#4564)
Summary:
Couple of very minor improvements (typos in comments, full qualification of class name, reordering members of a struct to make it smaller)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4564

Differential Revision: D10510183

Pulled By: maysamyabandeh

fbshipit-source-id: c7ddf9bfbf2db08cd31896c3fd93789d3fa68c8b
2018-10-23 10:35:57 -07:00
Siying Dong
7024263682 Dynamic level to adjust level multiplier when write is too heavy (#4338)
Summary:
Level compaction usually performs poorly when the writes so heavy that the level targets can't be guaranteed. With this improvement, we improve level_compaction_dynamic_level_bytes = true so that in the write heavy cases, the level multiplier can be slightly adjusted based on the size of L0.

We keep the behavior the same if number of L0 files is under 2X compaction trigger and the total size is less than options.max_bytes_for_level_base, so that unless write is so heavy that compaction cannot keep up, the behavior doesn't change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4338

Differential Revision: D9636782

Pulled By: siying

fbshipit-source-id: e27fc17a7c29c84b00064cc17536a01dacef7595
2018-10-22 10:21:47 -07:00
Zhongyi Xie
d6ec288703 Add PerfContextByLevel to provide per level perf context information (#4226)
Summary:
Current implementation of perf context is level agnostic. Making it hard to do performance evaluation for the LSM tree. This PR adds `PerfContextByLevel` to decompose the counters by level.
This will be helpful when analyzing point and range query performance as well as tuning bloom filter
Also replaced __thread with thread_local keyword for perf_context
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4226

Differential Revision: D10369509

Pulled By: miasantreble

fbshipit-source-id: f1ced4e0de5fcebdb7f9cff36164516bc6382d82
2018-10-17 11:19:40 -07:00
Yanqin Jin
ce52274640 Replace 'string' with 'const string&' in FileOperationInfo (#4491)
Summary:
Using const string& can avoid one extra string copy. This PR addresses a recent comment made by siying  on #3933.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4491

Differential Revision: D10381211

Pulled By: riversand963

fbshipit-source-id: 27fc2d65d84bc7cd07833c77cdc47f06dcfaeb31
2018-10-15 13:46:01 -07:00
Yanqin Jin
729a617b5b Add listener to sample file io (#3933)
Summary:
We would like to collect file-system-level statistics including file name, offset, length, return code, latency, etc., which requires to add callbacks to intercept file IO function calls when RocksDB is running.
To collect file-system-level statistics, users can inherit the class `EventListener`, as in `TestFileOperationListener `. Note that `TestFileOperationListener::ShouldBeNotifiedOnFileIO()` returns true.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3933

Differential Revision: D10219571

Pulled By: riversand963

fbshipit-source-id: 7acc577a2d31097766a27adb6f78eaf8b1e8ff15
2018-10-12 18:36:11 -07:00
Peter Pei
09814f2cfc support OnCompactionBegin (#4431)
Summary:
fix #4288

Add `OnCompactionBegin` support to `rocksdb::EventListener`.

Currently, we only have these three callbacks:

- OnFlushBegin
- OnFlushCompleted
- OnCompactionCompleted

As paolococchi requested in #4288 , and ajkr agreed, we should also support `OnCompactionBegin`.

This PR is a try to implement the support of `OnCompactionBegin`.

Hope it is useful to you.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4431

Differential Revision: D10055515

Pulled By: yiwu-arbug

fbshipit-source-id: 39c0f95f8e9ff1c7ca3a10787502a17f258d2334
2018-10-10 17:32:27 -07:00
Fosco Marotto
35f26beca5 Update version macro for 5.17 (#4472)
Summary:
Forgot this in previous commit.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4472

Differential Revision: D10244227

Pulled By: gfosco

fbshipit-source-id: ba0cf7a2f5271f0d9f9443004e2620887cd5fd11
2018-10-08 16:22:17 -07:00
DorianZheng
e0f05754ba Expose column family id to OnCompactionCompleted (#4466)
Summary:
The controller you requested could not be found. PTAL
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4466

Differential Revision: D10241358

Pulled By: yiwu-arbug

fbshipit-source-id: 99664eb286860a6c8844d50efeb0ef6f0e10dd1e
2018-10-08 14:24:16 -07:00
Andrew Gallagher
897fe6a4a3 rocksdb: put #pragma once before #ifdef
Summary: Work around upstream bug with modules: https://bugs.llvm.org/show_bug.cgi?id=39184.

Reviewed By: yiwu-arbug

Differential Revision: D10209569

fbshipit-source-id: 696853a02a3869e9c33d0e61168ad4b0436fa3c0
2018-10-04 17:10:21 -07:00
Zhongyi Xie
ce1fc5af09 fix unused param allocator in compression.h (#4453)
Summary:
this should fix currently failing contrun test: rocksdb-contrun-no_compression, rocksdb-contrun-tsan, rocksdb-contrun-tsan_crash
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4453

Differential Revision: D10202626

Pulled By: miasantreble

fbshipit-source-id: 850b07f14f671b5998c22d8239e2a55b2fc1e355
2018-10-04 13:24:22 -07:00
Igor Canadi
1cf5deb8fd Introduce CacheAllocator, a custom allocator for cache blocks (#4437)
Summary:
This is a conceptually simple change, but it touches many files to
pass the allocator through function calls.

We introduce CacheAllocator, which can be used by clients to configure
custom allocator for cache blocks. Our motivation is to hook this up
with folly's `JemallocNodumpAllocator`
(f43ce6d686/folly/experimental/JemallocNodumpAllocator.h),
but there are many other possible use cases.

Additionally, this commit cleans up memory allocation in
`util/compression.h`, making sure that all allocations are wrapped in a
unique_ptr as soon as possible.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4437

Differential Revision: D10132814

Pulled By: yiwu-arbug

fbshipit-source-id: be1343a4b69f6048df127939fea9bbc96969f564
2018-10-02 17:24:58 -07:00
Andrew Kryczka
ac6f435a9a Fix CompactFiles support for kDisableCompressionOption (#4438)
Summary:
Previously `CompactFiles` with `CompressionType::kDisableCompressionOption` caused program to crash on assertion failure. This PR fixes the crash by adding support for that setting. Now, that setting will cause RocksDB to choose compression according to the column family's options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4438

Differential Revision: D10115761

Pulled By: ajkr

fbshipit-source-id: a553c6fa76fa5b6f73b0d165d95640da6f454122
2018-10-01 01:18:10 -07:00
Anand Ananthabhotla
72712f4e28 Allow dynamic modification of window size and deletion trigger (#4403)
Summary:
Make the CompactOnDeletionCollectorFactory class public, and provide
methods to update the window size and deletion trigger params. These
will take effect on subsequent created SST files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4403

Differential Revision: D9976857

Pulled By: anand1976

fbshipit-source-id: 31dbf0511c12fa2bb9b2a7ba620079e0ee09cf48
2018-09-20 15:15:28 -07:00
Anand Ananthabhotla
a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accomodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to persistent low disk space
condition
2. Errors during write callback and flush are treated as hard errors,
i.e the database is put in read-only mode and goes back to read-write
only fater certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurance of an out of space error during compaction,
subsequent
  compactions will be delayed until the disk free space check indicates
  enough available space. The required space is computed as the sum of
  input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occured
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it in write-only mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recov ery in the start
callback, and later initiate it manually by calling DB::Resume()

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00