Commit Graph

1569 Commits

Author SHA1 Message Date
Yi Wu
be410dede8 Fix PinnableSlice move assignment
Summary:
After move assignment, we need to re-initialized the moved PinnableSlice.

Also update blob_db_impl.cc to not reuse the moved PinnableSlice since it is supposed to be in an undefined state after move.
Closes https://github.com/facebook/rocksdb/pull/3127

Differential Revision: D6238585

Pulled By: yiwu-arbug

fbshipit-source-id: bd99f2e37406c4f7de160c7dee6a2e8126bc224e
2017-11-03 18:13:21 -07:00
Prashant D
4c8f336401 util: Fix coverity issues
Summary:
util/concurrent_arena.h:
CID 1396145 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
2. uninit_member: Non-static class member free_begin_ is not initialized in this constructor nor in any functions that it calls.
 94    Shard() : allocated_and_unused_(0) {}

util/dynamic_bloom.cc:
	1. Condition hash_func == NULL, taking true branch.

CID 1322821 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
3. uninit_member: Non-static class member data_ is not initialized in this constructor nor in any functions that it calls.
47      hash_func_(hash_func == nullptr ? &BloomHash : hash_func) {}
48

util/file_reader_writer.h:
204 private:
205  AlignedBuffer buffer_;
   	member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_offset_.
206  uint64_t buffer_offset_;

CID 1418246 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
member_not_init_in_gen_ctor: The compiler-generated constructor for this class does not initialize buffer_len_.
207  size_t buffer_len_;
208};

util/thread_local.cc:
341#endif

CID 1322795 (#1 of 1): Uninitialized scalar field (UNINIT_CTOR)
3. uninit_member: Non-static class member pthread_key_ is not initialized in this constructor nor in any functions that it calls.
342}

40struct ThreadData {
   	2. uninit_member: Non-static class member next is not initialized in this constructor nor in any functions that it calls.

CID 1400668 (#1 of 1): Uninitialized pointer field (UNINIT_CTOR)
4. uninit_member: Non-static class member prev is not initialized in this constructor nor in any functions that it calls.
 41  explicit ThreadData(ThreadLocalPtr::StaticMeta* _inst) : entries(), inst(_inst) {}
 42  std::vector<Entry> entries;
   	1. member_decl: Class member declaration for next.
 43  ThreadData* next;
   	3. member_decl: Class member declaration for prev.
 44  ThreadData* prev;
 45  ThreadLocalPtr::StaticMeta* inst;
 46};
Closes https://github.com/facebook/rocksdb/pull/3123

Differential Revision: D6233566

Pulled By: sagar0

fbshipit-source-id: aa2068790ea69787a0035c0db39d59b0c25108db
2017-11-03 14:42:08 -07:00
Andrew Kryczka
cfb120f737 fix CopyFile status checks
Summary:
copied from internal diff D6156261
Closes https://github.com/facebook/rocksdb/pull/3124

Differential Revision: D6230167

Pulled By: ajkr

fbshipit-source-id: 17926bb1152d607556364e3aacfec0ef3c115748
2017-11-03 11:57:10 -07:00
Yi Wu
d956169563 Fix clang build error
Summary:
Fix cast from size_t to unsigned int.
Closes https://github.com/facebook/rocksdb/pull/3125

Differential Revision: D6232863

Pulled By: yiwu-arbug

fbshipit-source-id: 4c6131168b1faec26f7820b2cf4a09c242d323b7
2017-11-03 11:26:54 -07:00
Andrew Kryczka
24ad430600 pass key/value samples through zstd compression dictionary generator
Summary:
Instead of using samples directly, we now support passing the samples through zstd's dictionary generator when `CompressionOptions::zstd_max_train_bytes` is set to nonzero. If set to zero, we will use the samples directly as the dictionary -- same as before.

Note this is the first step of #2987, extracted into a separate PR per reviewer request.
Closes https://github.com/facebook/rocksdb/pull/3057

Differential Revision: D6116891

Pulled By: ajkr

fbshipit-source-id: 70ab13cc4c734fa02e554180eed0618b75255497
2017-11-02 22:56:36 -07:00
Zhongyi Xie
30e4e01e05 add missing else
Summary: Closes https://github.com/facebook/rocksdb/pull/3121

Differential Revision: D6229415

Pulled By: miasantreble

fbshipit-source-id: 57c7ad2fddf5dd6b8d7e3aaf6f62348151327dfb
2017-11-02 22:28:06 -07:00
Shaohua Li
33c7d4ccd9 Make writable_file_max_buffer_size dynamic
Summary:
The DBOptions::writable_file_max_buffer_size can be changed dynamically.
Closes https://github.com/facebook/rocksdb/pull/3053

Differential Revision: D6152720

Pulled By: shligit

fbshipit-source-id: aa0c0cfcfae6a54eb17faadb148d904797c68681
2017-10-31 13:56:35 -07:00
Islam AbdelRahman
05993155ef Mark files as trash by using .trash extension
Summary:
SstFileManager move files that need to be deleted into a trash directory.
Deprecate this behaviour and instead add ".trash" extension to files that need to be deleted
Closes https://github.com/facebook/rocksdb/pull/2970

Differential Revision: D5976805

Pulled By: IslamAbdelRahman

fbshipit-source-id: 27374ece4315610b2792c30ffcd50232d4c9a343
2017-10-27 13:27:12 -07:00
Sagar Vemuri
f0804db7f7 Make FIFO compaction options dynamically configurable
Summary:
ColumnFamilyOptions::compaction_options_fifo and all its sub-fields can be set dynamically now.

Some of the ways in which the fifo compaction options can be set are:
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024}"}})`
- `SetOptions({{"compaction_options_fifo", "{ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=1024;ttl=600;}"}})`
- `SetOptions({{"compaction_options_fifo", "{max_table_files_size=51;ttl=49;allow_compaction=true;}"}})`

Most of the code has been made generic enough so that it could be reused later to make universal options (and other such nested defined-types) dynamic with very few lines of parsing/serializing code changes.
Introduced a few new functions like `ParseStruct`, `SerializeStruct` and `GetStringFromStruct`.
The duplicate code in `GetStringFromDBOptions` and `GetStringFromColumnFamilyOptions` has been moved into `GetStringFromStruct`. So they become just simple wrappers now.
Closes https://github.com/facebook/rocksdb/pull/3006

Differential Revision: D6058619

Pulled By: sagar0

fbshipit-source-id: 1e8f78b3374ca5249bb4f3be8a6d3bb4cbc52f92
2017-10-19 15:26:36 -07:00
Dmitri Smirnov
ebab2e2d42 Enable MSVC W4 with a few exceptions. Fix warnings and bugs
Summary: Closes https://github.com/facebook/rocksdb/pull/3018

Differential Revision: D6079011

Pulled By: yiwu-arbug

fbshipit-source-id: 988a721e7e7617967859dba71d660fc69f4dff57
2017-10-19 10:57:12 -07:00
Nikhil Benesch
c0208dffbe arena: derive alignment unit from std::max_align_t
Summary:
As raised in #2265, the arena allocator will return memory that is improperly aligned to store a `std::function` on macOS. Oddly, I'm unable to tickle this bug without adding a `std::function` field to `struct ReadOptions`—but my proposal in #2265 does exactly that.

In any case, here's a simple reproduction. Apply this bogus patch to get a `std::function` into `struct ReadOptions`

```
 --- a/include/rocksdb/options.h
+++ b/include/rocksdb/options.h
@@ -1035,6 +1035,8 @@ struct ReadOptions {
   // Default: 0
   uint64_t max_skippable_internal_keys;

+  std::function<void()> foo;
+
   ReadOptions();
   ReadOptions(bool cksum, bool cache);
 };
```

then compile `db_properties_test` *with ubsan* and run `ReadLatencyHistogramByLevel`:

```
$ make COMPILE_WITH_UBSAN=1 db_properties_test
$ ./db_properties_test --gtest_filter=DBPropertiesTest.ReadLatencyHistogramByLevel
```

ubsan will complain about several misaligned accesses:

```
Note: Google Test filter = DBPropertiesTest.ReadLatencyHistogramByLevel
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from DBPropertiesTest
[ RUN      ] DBPropertiesTest.ReadLatencyHistogramByLevel
util/coding.h:372:12: runtime error: load of misaligned address 0x00010d85516c for type 'const unsigned long', which requires 8 byte alignment
0x00010d85516c: note: pointer points here
  01 00 34 57 00 00 00 00  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  78 24 82 0a 01 00 00 00
              ^
util/coding.h:362:3: runtime error: store to misaligned address 0x7fff5733fac4 for type 'unsigned long', which requires 8 byte alignment
0x7fff5733fac4: note: pointer points here
  01 00 00 00 00 00 00 00  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  80 1d 96 0d 01 00 00 00
              ^
util/coding.h:372:12: runtime error: load of misaligned address 0x00010d85516c for type 'const unsigned long', which requires 8 byte alignment
0x00010d85516c: note: pointer points here
  01 00 34 57 00 00 00 00  02 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  78 24 82 0a 01 00 00 00
              ^
version_set.cc:854: runtime error: constructor call on misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
version_set.cc:512: runtime error: constructor call on misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
version_set.cc:505: runtime error: constructor call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
options.h:931: runtime error: constructor call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
options.h:931: runtime error: constructor call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
functional:1583: runtime error: constructor call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1585:9: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1585:9: runtime error: store to misaligned address 0x00010dbfa648 for type '__base *' (aka '__base<void ()> *'), which requires 16 byte alignment
0x00010dbfa648: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
db/version_set.cc:864:29: runtime error: upcast of misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:521:12: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:521:12: runtime error: load of misaligned address 0x00010dbfa5d8 for type 'rocksdb::TableCache *', which requires 16 byte alignment
0x00010dbfa5d8: note: pointer points here
 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00
              ^
db/version_set.cc:522:9: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:522:9: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
db/version_set.cc:522:24: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:522:38: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:522:57: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:522:57: runtime error: load of misaligned address 0x00010dbfa678 for type 'rocksdb::RangeDelAggregator *', which requires 16 byte alignment
0x00010dbfa678: note: pointer points here
 01 00 00 00  d0 a1 bf 0d 01 00 00 00  00 00 00 00 00 00 00 00  f8 db 70 0a 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:523:54: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:523:54: runtime error: load of misaligned address 0x00010dbfa668 for type 'rocksdb::HistogramImpl *', which requires 16 byte alignment
0x00010dbfa668: note: pointer points here
 01 00 00 00  c8 88 a5 0d 01 00 00 00  00 00 00 00 01 00 00 00  d0 a1 bf 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:524:9: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:524:47: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:524:62: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/table_cache.cc:228:33: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
table/block_based_table_reader.cc:1554:41: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
table/block_based_table_reader.cc:1396:21: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
include/rocksdb/options.h:931:8: runtime error: reference binding to misaligned address 0x00010dbfa628 for type 'const std::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1584:13: runtime error: load of misaligned address 0x00010dbfa648 for type '__base *const' (aka '__base<void ()> *const'), which requires 16 byte alignment
0x00010dbfa648: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 a5 97 0d 01 00 00 00  38 36 9b 0d
              ^
table/block_based_table_reader.cc:1555:24: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
db/table_cache.cc:244:54: runtime error: load of misaligned address 0x00010dbfa618 for type 'const bool', which requires 16 byte alignment
0x00010dbfa618: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
db/table_cache.cc:246:49: runtime error: reference binding to misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
db/version_set.cc:532:12: runtime error: member access within misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
db/version_set.cc:532:12: runtime error: member access within misaligned address 0x00010dbfa5e8 for type 'const rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
db/version_set.cc:532:26: runtime error: load of misaligned address 0x00010dbfa5f8 for type 'const rocksdb::Slice *const', which requires 16 byte alignment
0x00010dbfa5f8: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
version_set.cc:493: runtime error: member call on misaligned address 0x00010dbfa5c8 for type 'rocksdb::(anonymous namespace)::LevelFileIteratorState', which requires 16 byte alignment
0x00010dbfa5c8: note: pointer points here
 00 00 00 00  a0 db 70 0a 01 00 00 00  00 00 00 00 00 00 00 00  90 14 98 0d 01 00 00 00  00 00 00 00
              ^
version_set.cc:493: runtime error: member call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
options.h:931: runtime error: member call on misaligned address 0x00010dbfa5e8 for type 'rocksdb::ReadOptions', which requires 16 byte alignment
0x00010dbfa5e8: note: pointer points here
 00 00 00 00  01 01 ff ff ff ff ff ff  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
options.h:931: runtime error: member call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
functional:1765: runtime error: member call on misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1766:9: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1766:9: runtime error: load of misaligned address 0x00010dbfa648 for type '__base *' (aka '__base<void ()> *'), which requires 16 byte alignment
0x00010dbfa648: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 a5 97 0d 01 00 00 00  38 36 9b 0d
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1766:27: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1768:14: runtime error: member access within misaligned address 0x00010dbfa628 for type 'std::__1::function<void ()>', which requires 16 byte alignment
0x00010dbfa628: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00
              ^
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../include/c++/v1/functional:1768:14: runtime error: load of misaligned address 0x00010dbfa648 for type '__base *' (aka '__base<void ()> *'), which requires 16 byte alignment
0x00010dbfa648: note: pointer points here
 00 00 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  c8 a5 97 0d 01 00 00 00  38 36 9b 0d
              ^
[       OK ] DBPropertiesTest.ReadLatencyHistogramByLevel (1599 ms)
[----------] 1 test from DBPropertiesTest (1599 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (1599 ms total)
[  PASSED  ] 1 test.
```

So it seems the root cause is that the internal implementation of `std::function` on macOS (and perhaps with libc++ generally?) requires 16-byte aligned memory, but the arena allocator only guarantees that the returned memory will be `sizeof(void*)` aligned, which is only 8-byte alignment on my machine. This patch solves the problem by adjusting the allocator to derive the necessary alignment from `alignof(std::max_align_t)`, which is properly 16 bytes on my machine.

As I mentioned in #2265, none of RocksDB's tests will cause this unaligned access to actually abort the process, but, on macOS, linking CockroachDB against a version of RocksDB with the above patch and letting it run for just a few seconds will cause a SIGABRT.

```
Process 19792 stopped
* thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
    frame #0: 0x0000000004f5e78f cockroach`DBNewIter + 95
cockroach`DBNewIter:
->  0x4f5e78f <+95>:  callq  *0x28(%rax)
    0x4f5e792 <+98>:  jmp    0x4f5e79e                 ; <+110>
    0x4f5e794 <+100>: movq   -0x50(%rbp), %rcx
    0x4f5e798 <+104>: movq   %rax, %rdi
(lldb) bt
* thread #2, stop reason = EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
  * frame #0: 0x0000000004f5e78f cockroach`DBNewIter + 95
```

I'd get you a backtrace, but [Go doesn't include cgo debug information on macOS](https://github.com/golang/go/issues/6942). I've also tried building against libc++ on Linux, where debug information would be available, but I can't seem to trigger the bug there.

In any case, this PR both fixes the segfault in CockroachDB and fixes the warnings reported by ubsan.
Closes https://github.com/facebook/rocksdb/pull/2347

Differential Revision: D5108596

Pulled By: yiwu-arbug

fbshipit-source-id: bd5e4323b2ce915ed4fe78e123cb8996aec75a00
2017-10-17 11:13:19 -07:00
codeeply
f7843f30a8 Move ~Comparator define to comparator.h
Summary:
When I impl my own comparator, and build in release mode.
The following compile error occurs.
undefined reference to `typeinfo for rocksdb::Comparator'

This fix allows users build with RTTI off when has their own comparator.
Closes https://github.com/facebook/rocksdb/pull/3008

Differential Revision: D6077354

Pulled By: yiwu-arbug

fbshipit-source-id: 914c26dbab72f0ad1f0e15f8666a3fb2f10bfed8
2017-10-17 09:58:13 -07:00
Andrew Kryczka
731895214b db_bench randomtransaction print throughput
Summary:
print throughput in MB/s upon finishing randomtransaction benchmark
Closes https://github.com/facebook/rocksdb/pull/3016

Differential Revision: D6070426

Pulled By: ajkr

fbshipit-source-id: 69df43beed4c374a36d826e761ca3a83e1fdcbf5
2017-10-16 18:42:25 -07:00
Yi Wu
31d3e41810 PinnableSlice move assignment
Summary:
Allow `std::move(pinnable_slice)`.
Closes https://github.com/facebook/rocksdb/pull/2997

Differential Revision: D6036782

Pulled By: yiwu-arbug

fbshipit-source-id: 583fb0419a97e437ff530f4305822341cd3381fa
2017-10-12 18:28:24 -07:00
Kefu Chai
019aa7074c cmake: pass "-msse4.2" to when building crc32c.cc if HAVE_SSE42
Summary:
it turns out that, with older GCC shipped from centos7, the SSE42
intrinsics are not available even with "target" specified. so we
need to pass "-msse42" for checking compiler's sse4.2 support and
for building crc32c.cc which uses sse4.2 intrinsics for crc32.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
Closes https://github.com/facebook/rocksdb/pull/2950

Differential Revision: D6032298

Pulled By: siying

fbshipit-source-id: 124c946321043661b3fb0a70b6cdf4c9c5126ab4
2017-10-11 12:26:46 -07:00
Andrew Kryczka
1026e794a3 rate limit auto-tuning
Summary:
Dynamic adjustment of rate limit according to demand for background I/O. It increases by a factor when limiter is drained too frequently, and decreases by the same factor when limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes:

- make rate limiter's `Env*` configurable for testing
- track num drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter.
Closes https://github.com/facebook/rocksdb/pull/2899

Differential Revision: D5858704

Pulled By: ajkr

fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1
2017-10-04 19:15:01 -07:00
Andrew Kryczka
5b2cb64bfb Prevent threads from respawning during joining
Summary:
Previously the thread pool might be non-empty after joining since concurrent submissions could spawn new threads. This problem didn't affect our background flush/compaction thread pools because the `shutting_down_` flag prevented new jobs from being submitted during/after joining. But I wanted to be able to reuse the `ThreadPool` without such external synchronization.
Closes https://github.com/facebook/rocksdb/pull/2953

Differential Revision: D5951920

Pulled By: ajkr

fbshipit-source-id: 0efec7d0056d36d1338367da75e8b0c089bbc973
2017-10-03 16:27:28 -07:00
Siying Dong
64b6452e0c Make InternalKeyComparator final and directly use it in merging iterator
Summary:
Merging iterator invokes InternalKeyComparator.Compare() frequently to heap merge. By making InternalKeyComparator final and merging iterator to directly use InternalKeyComparator rather than through Iterator interface, we can give compiler a choice to avoid one more virtual function call if possible. I ran readseq benchmark in memory-only use case to make sure the performance at least doesn't regress.

I have to disable the final key word in debug build, as a hack test class depends on overriding the class.
Closes https://github.com/facebook/rocksdb/pull/2860

Differential Revision: D5800461

Pulled By: siying

fbshipit-source-id: ab876f22a09bb5c560740911412336e0e25ccb53
2017-09-11 12:04:21 -07:00
Yi Wu
dcd36a6aee Make it explicit blob db doesn't support CF
Summary:
Blob db doesn't currently support column families. Return NotSupported status explicitly.
Closes https://github.com/facebook/rocksdb/pull/2825

Differential Revision: D5757438

Pulled By: yiwu-arbug

fbshipit-source-id: 44de9408fd032c98e8ae337d4db4ed37169bd9fa
2017-09-08 11:11:04 -07:00
Kefu Chai
ba3c58cab6 specify SSE42 'target' attribute for Fast_CRC32()
Summary:
if we enable SSE42 globally when compiling the tree for preparing a
portable binary, which could be running on CPU w/o SSE42 instructions
even the GCC on the building host is able to emit SSE42 code, this leads
to illegal instruction errors on machines not supporting SSE42. to solve
this problem, crc32 detects the supported instruction at runtime, and
selects the supported CRC32 implementation according to the result of
`cpuid`. but intrinics like "_mm_crc32_u64()" will not be available
unless the "target" machine is appropriately specified in the command
line, like "-msse42", or using the "target" attribute.

we could pass "-msse42" only when compiling crc32c.cc, and allow the
compiler to generate the SSE42 instructions, but we are still at the
risk of executing illegal instructions on machines does not support
SSE42 if the compiler emits code that is not guarded by our runtime
detection. and we need to do the change in both Makefile and CMakefile.

or, we can use GCC's "target" attribute to enable the machine specific
instructions on certain function. in this way, we have finer grained
control of the used "target". and no need to change the makefiles. so
we don't need to duplicate the changes on both makefile and cmake as
the previous approach.

this problem surfaces when preparing a package for GNU/Linux distribution,
and we only applies to optimization for SSE42, so using a feature
only available on GCC/Clang is not that formidable.
Closes https://github.com/facebook/rocksdb/pull/2807

Differential Revision: D5786084

Pulled By: siying

fbshipit-source-id: bca5c0f877b8d6fb55f58f8f122254a26422843d
2017-09-07 12:40:57 -07:00
Kamalalochana Subbaiah
e612e31740 Updated CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2716

Differential Revision: D5606836

Pulled By: siying

fbshipit-source-id: 720262453b1546e5fdbbc668eff56848164113f3
2017-08-31 14:16:30 -07:00
Maysam Yabandeh
26ac24f199 Add more unit test to write_prepared txns
Summary: Closes https://github.com/facebook/rocksdb/pull/2798

Differential Revision: D5724173

Pulled By: maysamyabandeh

fbshipit-source-id: fb6b782d933fb4be315b1a231a6a67a66fdc9c96
2017-08-31 09:41:27 -07:00
Siying Dong
666a005f9b Support prefetch last 512KB with direct I/O in block based file reader
Summary:
Right now, if direct I/O is enabled, prefetching the last 512KB cannot be applied, except compaction inputs or readahead is enabled for iterators. This can create a lot of I/O for HDD cases. To solve the problem, the 512KB is prefetched in block based table if direct I/O is enabled. The prefetched buffer is passed in totegher with random access file reader, so that we try to read from the buffer before reading from the file. This can be extended in the future to support flexible user iterator readahead too.
Closes https://github.com/facebook/rocksdb/pull/2708

Differential Revision: D5593091

Pulled By: siying

fbshipit-source-id: ee36ff6d8af11c312a2622272b21957a7b5c81e7
2017-08-11 12:16:45 -07:00
James Page
36375de76f gcc-7/i386: markup intentional fallthroughs
Summary:
Markup i386 code paths resolving compilation
failure under i386 with gcc-7.

Signed-off-by: James Page <james.page@ubuntu.com>
Closes https://github.com/facebook/rocksdb/pull/2700

Differential Revision: D5583047

Pulled By: maysamyabandeh

fbshipit-source-id: fe31bcfeaf7cd2d3f51b55f5ae0b3b0cb3788fbc
2017-08-08 08:56:52 -07:00
Andrew Kryczka
cc01985db0 Introduce bottom-pri thread pool for large universal compactions
Summary:
When we had a single thread pool for compactions, a thread could be busy for a long time (minutes) executing a compaction involving the bottom level. In multi-instance setups, the entire thread pool could be consumed by such bottom-level compactions. Then, top-level compactions (e.g., a few L0 files) would be blocked for a long time ("head-of-line blocking"). Such top-level compactions are critical to prevent compaction stalls as they can quickly reduce number of L0 files / sorted runs.

This diff introduces a bottom-priority queue for universal compactions including the bottom level. This alleviates the head-of-line blocking situation for fast, top-level compactions.

- Added `Env::Priority::BOTTOM` thread pool. This feature is only enabled if user explicitly configures it to have a positive number of threads.
- Changed `ThreadPoolImpl`'s default thread limit from one to zero. This change is invisible to users as we call `IncBackgroundThreadsIfNeeded` on the low-pri/high-pri pools during `DB::Open` with values of at least one. It is necessary, though, for bottom-pri to start with zero threads so the feature is disabled by default.
- Separated `ManualCompaction` into two parts in `PrepickedCompaction`. `PrepickedCompaction` is used for any compaction that's picked outside of its execution thread, either manual or automatic.
- Forward universal compactions involving last level to the bottom pool (worker thread's entry point is `BGWorkBottomCompaction`).
- Track `bg_bottom_compaction_scheduled_` so we can wait for bottom-level compactions to finish. We don't count them against the background jobs limits. So users of this feature will get an extra compaction for free.
Closes https://github.com/facebook/rocksdb/pull/2580

Differential Revision: D5422916

Pulled By: ajkr

fbshipit-source-id: a74bd11f1ea4933df3739b16808bb21fcd512333
2017-08-03 15:43:29 -07:00
Siying Dong
a84cee8127 Add a missing "once" in .h
Summary: Closes https://github.com/facebook/rocksdb/pull/2670

Differential Revision: D5529018

Pulled By: siying

fbshipit-source-id: 10a378933d509035d2dbe502247dd85fcea09789
2017-07-31 12:12:03 -07:00
Siying Dong
21696ba502 Replace dynamic_cast<>
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645

Differential Revision: D5502723

Pulled By: siying

fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
2017-07-28 16:27:16 -07:00
Mike Kolupaev
e85f2c64cb Prevent empty memtables from using a lot of memory
Summary:
This fixes OOMs that we (logdevice) are currently having in production.

SkipListRep constructor does a couple small allocations from ConcurrentArena (see InlineSkipList constructor). ConcurrentArena would sometimes allocate an entire block for that, which is a few megabytes (we use Options::arena_block_size = 4 MB). So an empty memtable can take take 4 MB of memory. We have ~40k column families (spread across 15 DB instances), so 4 MB per empty memtable easily OOMs a machine for us.

This PR makes ConcurrentArena always allocate from Arena's inline block when possible. So as long as InlineSkipList's initial allocations are below 2 KB there would be no blocks allocated for empty memtables.
Closes https://github.com/facebook/rocksdb/pull/2569

Differential Revision: D5404029

Pulled By: al13n321

fbshipit-source-id: 568ec22a3fd1a485c06123f6b2dfc5e9ef67cd23
2017-07-28 15:58:43 -07:00
Siying Dong
e7697b8ce8 Fix LITE unit tests
Summary: Closes https://github.com/facebook/rocksdb/pull/2649

Differential Revision: D5505778

Pulled By: siying

fbshipit-source-id: 7e935603ede3d958ea087ed6b8cfc4121e8797bc
2017-07-26 21:11:47 -07:00
Siying Dong
c281b44829 Revert "CRC32 Power Optimization Changes"
Summary:
This reverts commit 2289d38115.
Closes https://github.com/facebook/rocksdb/pull/2652

Differential Revision: D5506163

Pulled By: siying

fbshipit-source-id: 105e31dd9d99090453a6b9f32c165206cd3affa3
2017-07-26 19:31:36 -07:00
Kamalalochana Subbaiah
2289d38115 CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2353

Differential Revision: D5210948

Pulled By: siying

fbshipit-source-id: 859a8c063d37697addd89ba2b8a14e5efd5d24bf
2017-07-26 09:42:29 -07:00
Sagar Vemuri
72502cf227 Revert "comment out unused parameters"
Summary:
This reverts the previous commit 1d7048c598, which broke the build.

Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627

Differential Revision: D5476473

Pulled By: sagar0

fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
2017-07-21 18:26:26 -07:00
Victor Gao
1d7048c598 comment out unused parameters
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.

Reviewed By: igorsugak

Differential Revision: D5454343

fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
2017-07-21 14:57:44 -07:00
Islam AbdelRahman
3e5ea29a83 Fix Flaky DeleteSchedulerTest::ImmediateDeleteOn25PercDBSize
Summary:
In this test we are deleting 100 files, and we are expecting DeleteScheduler to delete 26 files in the background and 74 files immediately in the foreground

The main purpose of the test is to make sure that we delete files in foreground thread, which is verified in line 546

But sometimes we may end up with 26 files or 25 files in the trash directory because the background thread may be slow and not be able to delete the first file fast enough, so sometimes this test fail.

Remove
```
ASSERT_EQ(CountFilesInDir(trash_dir_), 25);
```
Since it does not have any benefit any way
Closes https://github.com/facebook/rocksdb/pull/2618

Differential Revision: D5458674

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5556a9edfa049db71dce80b8e6ae0fdd25e1e74e
2017-07-20 11:29:01 -07:00
Yedidya Feldblum
f1a056e005 CodeMod: Prefer ADD_FAILURE() over EXPECT_TRUE(false), et cetera
Summary:
CodeMod: Prefer `ADD_FAILURE()` over `EXPECT_TRUE(false)`, et cetera.

The tautologically-conditioned and tautologically-contradicted boolean expectations/assertions have better alternatives: unconditional passes and failures.

Reviewed By: Orvid

Differential Revision:
D5432398

Tags: codemod, codemod-opensource

fbshipit-source-id: d16b447e8696a6feaa94b41199f5052226ef6914
2017-07-16 21:26:02 -07:00
Siying Dong
3c327ac2d0 Change RocksDB License
Summary: Closes https://github.com/facebook/rocksdb/pull/2589

Differential Revision: D5431502

Pulled By: siying

fbshipit-source-id: 8ebf8c87883daa9daa54b2303d11ce01ab1f6f75
2017-07-15 16:11:23 -07:00
Giuseppe Ottaviano
8f927e5f75 Fix undefined behavior in Hash
Summary:
Instead of ignoring UBSan checks, fix the negative shifts in
Hash(). Also add test to make sure the hash values are stable over
time. The values were computed before this change, so the test also
verifies the correctness of the change.
Closes https://github.com/facebook/rocksdb/pull/2546

Differential Revision: D5386369

Pulled By: yiwu-arbug

fbshipit-source-id: 6de4b44461a544d6222cc5d72d8cda2c0373d17e
2017-07-10 12:29:24 -07:00
Maysam Yabandeh
b5fb85ec51 fix valgrind init complaint
Summary: Closes https://github.com/facebook/rocksdb/pull/2549

Differential Revision: D5386307

Pulled By: maysamyabandeh

fbshipit-source-id: 3032c95c54755053b6450765ec4dacbecb734f9d
2017-07-07 18:27:08 -07:00
Maysam Yabandeh
45b9bb0331 Cut filter partition based on metadata_block_size
Summary:
Currently metadata_block_size controls only index partition size. With this patch a partition is cut after any of index or filter partitions reaches metadata_block_size.
Closes https://github.com/facebook/rocksdb/pull/2452

Differential Revision: D5275651

Pulled By: maysamyabandeh

fbshipit-source-id: 5057e4424b4c8902043782e6bf8c38f0c4f25160
2017-07-02 10:42:12 -07:00
Siying Dong
afbef65187 Bug fix: Fast CRC Support printing is not honest
Summary:
11c5d4741a introduces a bug that IsFastCrc32Supported() returns wrong result. Fix it. Also fix some FB internal scripts.
Closes https://github.com/facebook/rocksdb/pull/2513

Differential Revision: D5343802

Pulled By: yiwu-arbug

fbshipit-source-id: 057dc7ae3b262fe951413d1190ce60afc788cc05
2017-06-28 21:41:42 -07:00
Mike Kolupaev
397ab11152 Improve Status message for block checksum mismatches
Summary:
We've got some DBs where iterators return Status with message "Corruption: block checksum mismatch" all the time. That's not very informative. It would be much easier to investigate if the error message contained the file name - then we would know e.g. how old the corrupted file is, which would be very useful for finding the root cause. This PR adds file name, offset and other stuff to some block corruption-related status messages.

It doesn't improve all the error messages, just a few that were easy to improve. I'm mostly interested in "block checksum mismatch" and "Bad table magic number" since they're the only corruption errors that I've ever seen in the wild.
Closes https://github.com/facebook/rocksdb/pull/2507

Differential Revision: D5345702

Pulled By: al13n321

fbshipit-source-id: fc8023d43f1935ad927cef1b9c55481ab3cb1339
2017-06-28 21:27:01 -07:00
Aaron Gao
0025a36409 revert perf_context and io_stats to __thread
Summary:
https://github.com/facebook/rocksdb/pull/2380 introduces a regression by replacing __thread with ThreadLocalPtr. Revert the thread local implementation back.
Closes https://github.com/facebook/rocksdb/pull/2485

Differential Revision: D5308050

Pulled By: lightmark

fbshipit-source-id: 2676e9c22edf76e8133d3f4c50e2711e11a95480
2017-06-26 15:27:17 -07:00
Maysam Yabandeh
499ebb3ab5 Optimize for serial commits in 2PC
Summary:
Throughput: 46k tps in our sysbench settings (filling the details later)

The idea is to have the simplest change that gives us a reasonable boost
in 2PC throughput.

Major design changes:
1. The WAL file internal buffer is not flushed after each write. Instead
it is flushed before critical operations (WAL copy via fs) or when
FlushWAL is called by MySQL. Flushing the WAL buffer is also protected
via mutex_.
2. Use two sequence numbers: last seq, and last seq for write. Last seq
is the last visible sequence number for reads. Last seq for write is the
next sequence number that should be used to write to WAL/memtable. This
allows to have a memtable write be in parallel to WAL writes.
3. BatchGroup is not used for writes. This means that we can have
parallel writers which changes a major assumption in the code base. To
accommodate for that i) allow only 1 WriteImpl that intends to write to
memtable via mem_mutex_--which is fine since in 2PC almost all of the memtable writes
come via group commit phase which is serial anyway, ii) make all the
parts in the code base that assumed to be the only writer (via
EnterUnbatched) to also acquire mem_mutex_, iii) stat updates are
protected via a stat_mutex_.

Note: the first commit has the approach figured out but is not clean.
Submitting the PR anyway to get the early feedback on the approach. If
we are ok with the approach I will go ahead with this updates:
0) Rebase with Yi's pipelining changes
1) Currently batching is disabled by default to make sure that it will be
consistent with all unit tests. Will make this optional via a config.
2) A couple of unit tests are disabled. They need to be updated with the
serial commit of 2PC taken into account.
3) Replacing BatchGroup with mem_mutex_ got a bit ugly as it requires
releasing mutex_ beforehand (the same way EnterUnbatched does). This
needs to be cleaned up.
Closes https://github.com/facebook/rocksdb/pull/2345

Differential Revision: D5210732

Pulled By: maysamyabandeh

fbshipit-source-id: 78653bd95a35cd1e831e555e0e57bdfd695355a4
2017-06-24 14:11:29 -07:00
Dmitri Smirnov
a21db161c9 Implement ReopenWritibaleFile on Windows and other fixes
Summary:
Make default impl return NoSupported so the db_blob
  tests exist in a meaningful manner.
  Replace std::thread to port::Thread
Closes https://github.com/facebook/rocksdb/pull/2465

Differential Revision: D5275563

Pulled By: yiwu-arbug

fbshipit-source-id: cedf1a18a2c05e20d768c1308b3f3224dbd70ab6
2017-06-20 10:31:13 -07:00
Sagar Vemuri
53dda8797d Do not run RateLimiterTest.Rate test on Travis+Mac OSX.
Summary:
RateLimiterTest.Rate test has been failing continuously since many days on travis in Mac OSX PLATFORM_DEPENDENT test suite.
Check https://travis-ci.org/facebook/rocksdb/pull_requests.

Disabling this test for now, so that we can investigate more in depth.
Closes https://github.com/facebook/rocksdb/pull/2451

Differential Revision: D5250147

Pulled By: sagar0

fbshipit-source-id: d58476a3c2792d20e875754d1516c4bc7174e86c
2017-06-14 14:58:02 -07:00
hyunwoo
6b5a5dc5d8 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2430

Differential Revision: D5242471

Pulled By: IslamAbdelRahman

fbshipit-source-id: 832eb3a4c70221444ccd2ae63217823fec56c748
2017-06-13 16:58:01 -07:00
haoxiang
0f228be3bb fixed typo in util/dynamic_bloom.h
Summary:
fixed a typo in util/dynamic_bloom.h
Closes https://github.com/facebook/rocksdb/pull/2442

Differential Revision: D5242397

Pulled By: IslamAbdelRahman

fbshipit-source-id: c47fd18cc79afff6b022201a0410c0cd47626576
2017-06-13 16:41:36 -07:00
Andrew Kryczka
c217e0b9c7 Call RateLimiter for compaction reads
Summary:
Allow users to rate limit background work based on read bytes, written bytes, or sum of read and written bytes. Support these by changing the RateLimiter API, so no additional options were needed.
Closes https://github.com/facebook/rocksdb/pull/2433

Differential Revision: D5216946

Pulled By: ajkr

fbshipit-source-id: aec57a8357dbb4bfde2003261094d786d94f724e
2017-06-13 14:56:46 -07:00
Siying Dong
0175d58c3c Make direct I/O write use incremental buffer
Summary:
Currently for direct I/O, the large maximum buffer is always allocated. This will be wasteful if users flush the data in much smaller chunks. This diff fix this by changing the behavior of incremental buffer works. When we enlarge buffer, we try to copy the existing data in the buffer to the enlarged buffer, rather than flush the buffer first. This can make sure that no extra I/O is introduced because of buffer enlargement.
Closes https://github.com/facebook/rocksdb/pull/2403

Differential Revision: D5178403

Pulled By: siying

fbshipit-source-id: a8fe1e7304bdb8cab2973340022fe80ff83449fd
2017-06-13 04:41:37 -07:00
Islam AbdelRahman
d713471da8 Limit trash directory to be 25% of total DB
Summary:
Update DeleteScheduler to delete files immediately if trash directory is >= 25% of DB size
Closes https://github.com/facebook/rocksdb/pull/2436

Differential Revision: D5230384

Pulled By: IslamAbdelRahman

fbshipit-source-id: 5cbda8ac536a3cc72c774641621edc02c8202482
2017-06-12 16:57:21 -07:00
Siying Dong
db818d2d1a Fix RocksDB Lite build with CLANG
Summary: Closes https://github.com/facebook/rocksdb/pull/2419

Differential Revision: D5193976

Pulled By: siying

fbshipit-source-id: 62d115edee6043237e9d6ad3c2a05481e162c9eb
2017-06-12 06:41:27 -07:00
Siying Dong
52a7f38b19 WriteOptions.low_pri which can throttle low pri writes if needed
Summary:
If ReadOptions.low_pri=true and compaction is behind, the write will either return immediate or be slowed down based on ReadOptions.no_slowdown.
Closes https://github.com/facebook/rocksdb/pull/2369

Differential Revision: D5127619

Pulled By: siying

fbshipit-source-id: d30e1cff515890af0eff32dfb869d2e4c9545eb0
2017-06-05 15:02:35 -07:00
hyunwoo
c7662a44a4 fixed typo
Summary:
fixed typo
Closes https://github.com/facebook/rocksdb/pull/2376

Differential Revision: D5183630

Pulled By: ajkr

fbshipit-source-id: 133cfd0445959e70aa2cd1a12151bf3c0c5c3ac5
2017-06-05 11:27:34 -07:00
Aaron Gao
7f6c02dda1 using ThreadLocalPtr to hide ROCKSDB_SUPPORT_THREAD_LOCAL from public…
Summary:
… headers

https://github.com/facebook/rocksdb/pull/2199 should not reference RocksDB-specific macros (like ROCKSDB_SUPPORT_THREAD_LOCAL in this case) to public headers, `iostats_context.h` and `perf_context.h`. We shouldn't do that because users have to provide these compiler flags when building their binary with RocksDB.

We should hide the thread local global variable inside our implementation and just expose a function api to retrieve these variables. It may break some users for now but good for long term.

make check -j64
Closes https://github.com/facebook/rocksdb/pull/2380

Differential Revision: D5177896

Pulled By: lightmark

fbshipit-source-id: 6fcdfac57f2e2dcfe60992b7385c5403f6dcb390
2017-06-02 17:26:19 -07:00
Siying Dong
95b0e89b5d Improve write buffer manager (and allow the size to be tracked in block cache)
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350

Differential Revision: D5110648

Pulled By: siying

fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
2017-06-02 14:26:56 -07:00
Tamir Duberstein
103d0692ea Avoid unsupported attributes when not building with UBSAN
Summary:
yiwu-arbug see individual commits.
Closes https://github.com/facebook/rocksdb/pull/2318

Differential Revision: D5141520

Pulled By: yiwu-arbug

fbshipit-source-id: 7987c92ab4461eef36afce5a133d3a0ee0c96300
2017-05-30 11:13:01 -07:00
Siying Dong
41cbb72749 options.delayed_write_rate use the rate of rate_limiter by default.
Summary:
It's hard for RocksDB to come up with a good default of delayed write rate. Use rate given by rate limiter if it is availalbe. This provides the I/O order of magnitude.
Closes https://github.com/facebook/rocksdb/pull/2357

Differential Revision: D5115324

Pulled By: siying

fbshipit-source-id: 341065ad2211c981fc804011c0f0e59a50c7e754
2017-05-24 09:58:24 -07:00
Andrew Kryczka
6cc9aef162 New API for background work in single thread pool
Summary:
Previously users could set `max_background_flushes=0` to force rocksdb to use a single thread pool for both background flushes and compactions. That'll no longer be possible since I'm going to deprecate `max_background_flushes` and `max_background_compactions` in favor of a single option. This diff introduces a new way to force a single thread pool: when high-pri pool has zero threads, all background jobs will be submitted to low-pri pool.

Note the majority of the code change is adding `Env::GetBackgroundThreads()`, which is necessary to check whether the user has provided a zero-sized thread pool.
Closes https://github.com/facebook/rocksdb/pull/2204

Differential Revision: D4936256

Pulled By: ajkr

fbshipit-source-id: 929a07a0c0705f7766f5339cd013ff74e90d6e01
2017-05-23 11:12:27 -07:00
Andrew Kryczka
ac39d6bec5 Core-local statistics
Summary:
This diff changes `StatisticsImpl` from a thread-local approach to a core-local one. The goal is to perform faster aggregations, particularly for applications that have many threads. There should be no behavior change.
Closes https://github.com/facebook/rocksdb/pull/2258

Differential Revision: D5016258

Pulled By: ajkr

fbshipit-source-id: 7d4d165b4a91d8110f0409d113d1be91f22d31a9
2017-05-23 10:42:59 -07:00
Aaron Gao
3e86c0f07c disable direct reads for log and manifest and add direct io to tests
Summary:
Disable direct reads for log and manifest. Direct reads should not affect sequential_file
Also add kDirectIO for option_config_ in db_test_util
Closes https://github.com/facebook/rocksdb/pull/2337

Differential Revision: D5100261

Pulled By: lightmark

fbshipit-source-id: 0ebfd13b93fa1b8f9acae514ac44f8125a05868b
2017-05-22 18:41:28 -07:00
Dmitri Smirnov
15ba4d6c4b Address MS Visual Studio 2017 issue with autovector
Summary:
This addresses https://github.com/facebook/rocksdb/issues/2262
Closes https://github.com/facebook/rocksdb/pull/2333

Differential Revision: D5097941

Pulled By: siying

fbshipit-source-id: fb33582bfe7883ecc3f6da028703982522b5f75f
2017-05-22 10:57:06 -07:00
Aaron Gao
a36220ccfb fix unity test
Summary:
unity test will fail even if we have the same function names in different anonymous namespaces in different files.
Closes https://github.com/facebook/rocksdb/pull/2321

Differential Revision: D5083783

Pulled By: lightmark

fbshipit-source-id: 1347aaf866900af30d23cdd4f29c1b96f17352af
2017-05-17 18:56:55 -07:00
Nikhil Benesch
11c5d4741a cross-platform compatibility improvements
Summary:
We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS.

See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build.

I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream.
Closes https://github.com/facebook/rocksdb/pull/2199

Differential Revision: D5054042

Pulled By: yiwu-arbug

fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
2017-05-15 16:15:38 -07:00
Andrew Kryczka
bbe9ee7dd4 core-local array type conversions
Summary:
try to clean up the type conversions and hope it passes on windows.

one interesting thing I learned is that bitshift operations are special: in `x << y`, the result type depends only on the type of `x`, unlike most arithmetic operations where the result type depends on both operands' types.
Closes https://github.com/facebook/rocksdb/pull/2277

Differential Revision: D5050145

Pulled By: ajkr

fbshipit-source-id: f3309e77526ac9612c632bf93a62d99757af9a29
2017-05-12 09:28:07 -07:00
Andrew Kryczka
cda5fde2d9 CoreLocalArray class
Summary:
Moved the logic for core-local array out of ConcurrentArena and into a separate class because I want to reuse it for core-local stats.
Closes https://github.com/facebook/rocksdb/pull/2256

Differential Revision: D5011518

Pulled By: ajkr

fbshipit-source-id: a75a7b8f7b7a42fd6273489ada405f14c6be196a
2017-05-10 18:25:36 -07:00
Anirban Rahut
d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
Tamir Duberstein
fdaefa0309 travis: add Windows cross-compilation
Summary:
- downcase includes for case-sensitive filesystems
- give targets the same name (librocksdb) on all platforms

With this patch it is possible to cross-compile RocksDB for Windows
from a Linux host using mingw.

cc yuslepukhin orgads
Closes https://github.com/facebook/rocksdb/pull/2107

Differential Revision: D4849784

Pulled By: siying

fbshipit-source-id: ad26ed6b4d393851aa6551e6aa4201faba82ef60
2017-05-05 23:20:01 -07:00
Aaron Gao
a30a696034 do not read next datablock if upperbound is reached
Summary:
Now if we have iterate_upper_bound set, we continue read until get a key >= upper_bound. For a lot of cases that neighboring data blocks have a user key gap between them, our index key will be a user key in the middle to get a shorter size. For example, if we have blocks:
[a b c d][f g h]
Then the index key for the first block will be 'e'.
then if upper bound is any key between 'd' and 'e', for example, d1, d2, ..., d99999999999, we don't have to read the second block and also know that we have done our iteration by reaching the last key that smaller the upper bound already.

This diff can reduce RA in most cases.
Closes https://github.com/facebook/rocksdb/pull/2239

Differential Revision: D4990693

Pulled By: lightmark

fbshipit-source-id: ab30ea2e3c6edf3fddd5efed3c34fcf7739827ff
2017-05-05 23:20:01 -07:00
Aaron Gao
2d42cf5ea9 Roundup read bytes in ReadaheadRandomAccessFile
Summary:
Fix alignment in ReadaheadRandomAccessFile
Closes https://github.com/facebook/rocksdb/pull/2253

Differential Revision: D5012336

Pulled By: lightmark

fbshipit-source-id: 10d2c829520cb787227ef653ef63d5d701725778
2017-05-05 12:14:14 -07:00
Siying Dong
d616ebea23 Add GPLv2 as an alternative license.
Summary: Closes https://github.com/facebook/rocksdb/pull/2226

Differential Revision: D4967547

Pulled By: siying

fbshipit-source-id: dd3b58ae1e7a106ab6bb6f37ab5c88575b125ab4
2017-04-27 18:06:12 -07:00
Dmitri Smirnov
cdad04b051 Remove double buffering on RandomRead on Windows.
Summary:
Remove double buffering on RandomRead on Windows.
  With more logic appear in file reader/write Read no longer
  obeys forwarding calls to Windows implementation.
  Previously direct_io (unbuffered) was only available on Windows
  but now is supported as generic.
  We remove intermediate buffering on Windows.
  Remove random_access_max_buffer_size option which was windows specific.
  Non-zero values for that opton introduced unnecessary lock contention.
  Remove Env::EnableReadAhead(), Env::ShouldForwardRawRequest() that are
  no longer necessary.
  Add aligned buffer reads for cases when requested reads exceed read ahead size.
Closes https://github.com/facebook/rocksdb/pull/2105

Differential Revision: D4847770

Pulled By: siying

fbshipit-source-id: 8ab48f8e854ab498a4fd398a6934859792a2788f
2017-04-27 12:30:05 -07:00
Aaron Gao
2150cc1f3e fix WritableFile buffer size in direct IO
Summary:
�fix the buffer size in case of ppl use buffer size as their block_size.
Closes https://github.com/facebook/rocksdb/pull/2198

Differential Revision: D4956878

Pulled By: lightmark

fbshipit-source-id: 8bb0dc9c133887aadcd625d5261a3d1110b71473
2017-04-26 15:57:02 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Yi Wu
0fcdccc33e Blob storage helper methods
Summary:
Split out interfaces needed for blob storage from #1560, including
* CompactionEventListener and OnFlushBegin listener interfaces.
* Blob filename support.
Closes https://github.com/facebook/rocksdb/pull/2169

Differential Revision: D4905463

Pulled By: yiwu-arbug

fbshipit-source-id: 564e73448f1b7a367e5e46216a521e57ea9011b5
2017-04-18 12:42:38 -07:00
Aaron Gao
6e8d6f429d readahead backwards from sst end
Summary:
prefetch some data from the end of the file for each compaction to reduce IO.
Closes https://github.com/facebook/rocksdb/pull/2149

Differential Revision: D4880576

Pulled By: lightmark

fbshipit-source-id: aa767cd1afc84c541837fbf1ad6c0d45b34d3932
2017-04-14 18:56:14 -07:00
Aaron Gao
44fa8ece9b change use_direct_writes to use_direct_io_for_flush_and_compaction
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117

Differential Revision: D4860912

Pulled By: lightmark

fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
2017-04-13 16:12:04 -07:00
Aaron Gao
10d7546961 set readahead buffer size from roundup(user_size) + 4k to roundup(use…
Summary:
Users usually set readahead buffer to a multiple of 4k, more than that, usually a multiple of blocks.
So previously we set real buffer size 512 * n + 4k, which may introduce an additional block reading.
Closes https://github.com/facebook/rocksdb/pull/2138

Differential Revision: D4871504

Pulled By: lightmark

fbshipit-source-id: b070faa51d92e976e8e8468c00692699e585e243
2017-04-11 17:13:33 -07:00
Manuel Ung
1f8b119ed6 Limit maximum memory used in the WriteBatch representation
Summary:
Extend TransactionOptions to include max_write_batch_size which determines the maximum size of the writebatch representation. If memory limit is exceeded, the operation will abort with subcode kMemoryLimit.
Closes https://github.com/facebook/rocksdb/pull/2124

Differential Revision: D4861842

Pulled By: lth

fbshipit-source-id: 46fd172ea67cc90bbba829bf0d70cfab2261c161
2017-04-10 15:42:26 -07:00
Sagar Vemuri
343b59d6ee Move various string utility functions into string_util
Summary:
This is an effort to club all string related utility functions into one common place, in string_util, so that it is easier for everyone to know what string processing functions are available. Right now they seem to be spread out across multiple modules, like logging and options_helper.

Check the sub-commits for easier reviewing.
Closes https://github.com/facebook/rocksdb/pull/2094

Differential Revision: D4837730

Pulled By: sagar0

fbshipit-source-id: 344278a
2017-04-06 14:54:12 -07:00
Yi Wu
df6f5a3772 Move memtable related files into memtable directory
Summary:
Move memtable related files into memtable directory.
Closes https://github.com/facebook/rocksdb/pull/2087

Differential Revision: D4829242

Pulled By: yiwu-arbug

fbshipit-source-id: ca70ab6
2017-04-06 14:09:13 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Islam AbdelRahman
c50e3750dc Use a human readable size for level report
Summary:
Current
```
** Compaction Stats [default] **
Level    Files   Size(MB} Score Read(GB}  Rn(GB} Rnp1(GB} Write(GB} Wnew(GB} Moved(GB} W-Amp Rd(MB/s} Wr(MB/s} Comp(sec} Comp(cnt} Avg(sec} KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0      49.02   0.5      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0     76.1         1         2    0.322       0      0
 Sum      2/0      49.02   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     76.1         1         2    0.322       0      0
 Int      0/0       0.00   0.0      0.0     0.0      0.0       0.0      0.0       0.0   1.0      0.0     76.1         1         2    0.322       0      0
```

New
```
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn Key
Closes https://github.com/facebook/rocksdb/pull/2055

Differential Revision: D4804576

Pulled By: IslamAbdelRahman

fbshipit-source-id: 719be6a
2017-04-05 17:24:19 -07:00
Aaron Gao
02799ad77a Revert "delete fallocate with punch_hole"
Summary:
This reverts commit 0fd574926c.
It breaks tmpfs on kernel 4.0 or earlier. We will wait for the fix before remove this part
Closes https://github.com/facebook/rocksdb/pull/2096

Differential Revision: D4839661

Pulled By: lightmark

fbshipit-source-id: 574a51f
2017-04-05 16:10:09 -07:00
Andrew Kryczka
e2c6c06366 add TimedEnv
Summary:
I've needed Env timing measurements a few times now, so finally built something for it.
Closes https://github.com/facebook/rocksdb/pull/2073

Differential Revision: D4811231

Pulled By: ajkr

fbshipit-source-id: 218a249
2017-04-04 11:24:12 -07:00
Siying Dong
6ef8c620d3 Move auto_roll_logger and filename out of db/
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080

Differential Revision: D4821141

Pulled By: siying

fbshipit-source-id: ca7d768
2017-04-03 18:39:14 -07:00
Orgad Shaneh
6401a8b76b Fix build with MinGW
Summary:
There still are many warnings (most of them about invalid printf format
for long long), but it builds if FAIL_ON_WARNINGS is disabled.
Closes https://github.com/facebook/rocksdb/pull/2052

Differential Revision: D4807355

Pulled By: siying

fbshipit-source-id: ef03786
2017-03-30 16:54:52 -07:00
Andrew Kryczka
80fe5b3855 disable test: DeleteSchedulerTest.DynamicRateLimiting1
Summary:
temporarily disable since it isn't working on travis.
Closes https://github.com/facebook/rocksdb/pull/2064

Differential Revision: D4807373

Pulled By: ajkr

fbshipit-source-id: f2bb2b0
2017-03-30 16:54:52 -07:00
Sagar Vemuri
c6d04f2ecf Option to fail a request as incomplete when skipping too many internal keys
Summary:
Operations like Seek/Next/Prev sometimes take too long to complete when there are many internal keys to be skipped. Adding an option, max_skippable_internal_keys -- which could be used to set a threshold for the maximum number of keys that can be skipped, will help to address these cases where it is much better to fail a request (as incomplete) than to wait for a considerable time for the request to complete.

This feature -- to fail an iterator seek request as incomplete, is disabled by default when max_skippable_internal_keys = 0. It is enabled only when max_skippable_internal_keys > 0.

This feature is based on the discussion mentioned in the PR https://github.com/facebook/rocksdb/pull/1084.
Closes https://github.com/facebook/rocksdb/pull/2000

Differential Revision: D4753223

Pulled By: sagar0

fbshipit-source-id: 1c973f7
2017-03-30 12:09:21 -07:00
Herman Lee
58179ec4a6 Cleanup of ThreadStatusUtil structures should use the DB's reference
Summary:
instead of thread_local

The cleanup path for the rocksdb database might not have the
thread_updater_local_cache_ pointer initialized because the thread
executing the cleanup is likely not a rocksdb thread. This results in a
memory leak detected by Valgrind. The cleanup code path should use the
thread_status_updater pointer obtained from the DB object instead of a
thread local one.
Closes https://github.com/facebook/rocksdb/pull/2059

Differential Revision: D4801611

Pulled By: hermanlee

fbshipit-source-id: 407d7de
2017-03-30 10:39:13 -07:00
Min Wei
8a8c967460 Enable Fast CRC32 for Win64
Summary:
Currently the fast crc32 path is not enabled on Windows. I am trying to enable it here, hopefully, with the minimum impact to the existing code structure.
Closes https://github.com/facebook/rocksdb/pull/2033

Differential Revision: D4770635

Pulled By: siying

fbshipit-source-id: 676f8b8
2017-03-29 17:39:19 -07:00
Aaron Gao
0fd574926c delete fallocate with punch_hole
Summary:
As discuss in this thread:
https://www.facebook.com/groups/rocksdb.dev/permalink/1218043868294125/

We remove fallocate with FALLOC_FL_PUNCH_HOLE because the recent bug on xfs in kernel 4.x+ that align file size to page size even with FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE.
Closes https://github.com/facebook/rocksdb/pull/2038

Differential Revision: D4779974

Pulled By: siying

fbshipit-source-id: 5f54625
2017-03-28 15:54:12 -07:00
Maysam Yabandeh
e7731d119a Configure index partition size
Summary:
Allow the users to specify the target index partition size.

With this patch an index partition is cut before its estimated in-memory size goes above the configured value for metadata_block_size. The filter partitions are still cut right after an index partition is cut.
Closes https://github.com/facebook/rocksdb/pull/2041

Differential Revision: D4780216

Pulled By: maysamyabandeh

fbshipit-source-id: 95a0831
2017-03-28 12:09:12 -07:00
Shu Zhang
8dee8cad9e Enable fifo compaction benchmark to db_bench
Summary:
Added fifo benchmark to db_bench.
One thing i am not sure is that i am using CompactRange() instead of CompactFiles(). (may cause performance skew because CompactionRange() is not happening in current thread?)  For CompactFiles(), for some reason FIFO compaction doesn't work as expected. More insight is welcomed. I guess FIFO compaction doesn't work with file names? igorcanadi

test cmd:
./db_bench --compaction_style=2 --benchmarks=fillseqdeterministic --disable_auto_compactions --num_levels=1 --fifo_compaction_max_table_files_size_mb=10

---------------------- DB 0 LSM ---------------------
Level[0]: /000014.sst(size: 4211014 bytes)
fillseqdeterministic :       4.731 micros/op 211381 ops/sec;   23.4 MB/s
Closes https://github.com/facebook/rocksdb/pull/1734

Differential Revision: D4774964

Pulled By: siying

fbshipit-source-id: 9d08df6
2017-03-24 17:09:15 -07:00
Raza Hussain
6908e24b56 dynamic setting of stats_dump_period_sec through SetDBOption()
Summary:
Resolved the following issue: https://github.com/facebook/rocksdb/issues/1930
Closes https://github.com/facebook/rocksdb/pull/2004

Differential Revision: D4736764

Pulled By: yiwu-arbug

fbshipit-source-id: 64fe0b7
2017-03-20 22:54:13 -07:00
Aaron Gao
9272e12f19 avoid ftruncate twice in buffered io
Summary:
in buffered io, the filesize_ is the real size.
Closes https://github.com/facebook/rocksdb/pull/1991

Differential Revision: D4711433

Pulled By: lightmark

fbshipit-source-id: ad604b9
2017-03-17 11:39:13 -07:00
Islam AbdelRahman
d52f334cbd Break stalls when no bg work is happening
Summary:
Current stall will keep sleeping even if there is no Flush/Compactions to wait for, I changed the logic to break the stall if we are not flushing or compacting

db_bench command used
```
# fillrandom
# memtable size = 10MB
# value size = 1 MB
# num = 1000
# use /dev/shm
./db_bench --benchmarks="fillrandom,stats" --value_size=1048576 --write_buffer_size=10485760 --num=1000 --delayed_write_rate=XXXXX  --db="/dev/shm/new_stall" | grep "Cumulative stall"
```

```
Current results

# delayed_write_rate = 1000 Kb/sec
Cumulative stall: 00:00:9.031 H:M:S

# delayed_write_rate = 200 Kb/sec
Cumulative stall: 00:00:22.314 H:M:S

# delayed_write_rate = 100 Kb/sec
Cumulative stall: 00:00:42.784 H:M:S

# delayed_write_rate = 50 Kb/sec
Cumulative stall: 00:01:23.785 H:M:S

# delayed_write_rate = 25 Kb/sec
Cumulative stall: 00:02:45.702 H:M:S
```

```
New results

# delayed_write_rate = 1000 Kb/sec
Cumulative stall: 00:00:9.017 H:M:S

# delayed_write_rate = 200 Kb/sec
Cumulative stall: 00
Closes https://github.com/facebook/rocksdb/pull/1884

Differential Revision: D4585439

Pulled By: IslamAbdelRahman

fbshipit-source-id: aed2198
2017-03-16 18:24:17 -07:00
Islam AbdelRahman
995618a821 Support SstFileManager::SetDeleteRateBytesPerSecond()
Summary:
Update DeleteScheduler component to support changing delete rate in runtime by introducing
SstFileManager::SetDeleteRateBytesPerSecond()
Closes https://github.com/facebook/rocksdb/pull/1994

Differential Revision: D4719906

Pulled By: IslamAbdelRahman

fbshipit-source-id: e6b8d9e
2017-03-16 12:09:15 -07:00
Islam AbdelRahman
e19163688b Add macros to include file name and line number during Logging
Summary:
current logging
```
2017/03/14-14:20:30.393432 7fedde9f5700 (Original Log Time 2017/03/14-14:20:30.393414) [default] Level summary: base level 1 max bytes base 268435456 files[1 0 0 0 0 0 0] max score 0.25
2017/03/14-14:20:30.393438 7fedde9f5700 [JOB 2] Try to delete WAL files size 61417909, prev total WAL file size 73820858, number of live WAL files 2.
2017/03/14-14:20:30.393464 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//MANIFEST-000001 type=3 #1 -- OK
2017/03/14-14:20:30.393472 7fedde9f5700 [DEBUG] [JOB 2] Delete /dev/shm/old_logging//000003.log type=0 #3 -- OK
2017/03/14-14:20:31.427103 7fedd49f1700 [default] New memtable created with log file: #9. Immutable memtables: 0.
2017/03/14-14:20:31.427179 7fedde9f5700 [JOB 3] Syncing log #6
2017/03/14-14:20:31.427190 7fedde9f5700 (Original Log Time 2017/03/14-14:20:31.427170) Calling FlushMemTableToOutputFile with column family [default], flush slots available 1, compaction slots allowed 1, compaction slots scheduled 1
2017/03/14-14:20:31.
Closes https://github.com/facebook/rocksdb/pull/1990

Differential Revision: D4708695

Pulled By: IslamAbdelRahman

fbshipit-source-id: cb8968f
2017-03-15 19:39:12 -07:00
Aaron Gao
d525718a93 cleanup direct io flag in WritableFileWriter
Summary:
remove unnecessary field `direct_io_`, use `use_direct_io()` instead.
Closes https://github.com/facebook/rocksdb/pull/1992

Differential Revision: D4712195

Pulled By: lightmark

fbshipit-source-id: 57d34f9
2017-03-14 22:39:09 -07:00
Maysam Yabandeh
11526252cc Pinnableslice (2nd attempt)
Summary:
PinnableSlice

    Summary:
    Currently the point lookup values are copied to a string provided by the
    user. This incures an extra memcpy cost. This patch allows doing point lookup
    via a PinnableSlice which pins the source memory location (instead of
    copying their content) and releases them after the content is consumed
    by the user. The old API of Get(string) is translated to the new API
    underneath.

    Here is the summary for improvements:

    value 100 byte: 1.8% regular, 1.2% merge values
    value 1k byte: 11.5% regular, 7.5% merge values
    value 10k byte: 26% regular, 29.9% merge values
    The improvement for merge could be more if we extend this approach to
    pin the merge output and delay the full merge operation until the user
    actually needs it. We have put that for future work.

    PS:
    Sometimes we observe a small decrease in performance when switching from
    t5452014 to this patch but with the old Get(string) API. The d
Closes https://github.com/facebook/rocksdb/pull/1756

Differential Revision: D4391738

Pulled By: maysamyabandeh

fbshipit-source-id: 6f3edd3
2017-03-13 11:54:10 -07:00