Commit Graph

798 Commits

Author SHA1 Message Date
Peter Dillinger
00d58a370e Abandon use of folly::Optional (#6036)
Summary:
Had complications with LITE build and valgrind test.
Reverts/fixes small parts of PR https://github.com/facebook/rocksdb/issues/6007
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6036

Test Plan:
make LITE=1 all check
and
ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 make -j24 db_bloom_filter_test && ROCKSDB_VALGRIND_RUN=1 DISABLE_JEMALLOC=1 ./db_bloom_filter_test

Differential Revision: D18512238

Pulled By: pdillinger

fbshipit-source-id: 37213cf0d309edf11c483fb4b2fb6c02c2cf2b28
2019-11-14 14:04:15 -08:00
Peter Dillinger
f059c7d9b9 New Bloom filter implementation for full and partitioned filters (#6007)
Summary:
Adds an improved, replacement Bloom filter implementation (FastLocalBloom) for full and partitioned filters in the block-based table. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single filter.

Speed

The improved speed, at least on recent x86_64, comes from
* Using fastrange instead of modulo (%)
* Using our new hash function (XXH3 preview, added in a previous commit), which is much faster for large keys and only *slightly* slower on keys around 12 bytes if hashing the same size many thousands of times in a row.
* Optimizing the Bloom filter queries with AVX2 SIMD operations. (Added AVX2 to the USE_SSE=1 build.) Careful design was required to support (a) SIMD-optimized queries, (b) compatible non-SIMD code that's simple and efficient, (c) flexible choice of number of probes, and (d) essentially maximized accuracy for a cache-local Bloom filter. Probes are made eight at a time, so any number of probes up to 8 is the same speed, then up to 16, etc.
* Prefetching cache lines when building the filter. Although this optimization could be applied to the old structure as well, it seems to balance out the small added cost of accumulating 64 bit hashes for adding to the filter rather than 32 bit hashes.

Here's nominal speed data from filter_bench (200MB in filters, about 10k keys each, 10 bits filter data / key, 6 probes, avg key size 24 bytes, includes hashing time) on Skylake DE (relatively low clock speed):

$ ./filter_bench -quick -impl=2 -net_includes_hashing # New Bloom filter
Build avg ns/key: 47.7135
Mixed inside/outside queries...
  Single filter net ns/op: 26.2825
  Random filter net ns/op: 150.459
    Average FP rate %: 0.954651
$ ./filter_bench -quick -impl=0 -net_includes_hashing # Old Bloom filter
Build avg ns/key: 47.2245
Mixed inside/outside queries...
  Single filter net ns/op: 63.2978
  Random filter net ns/op: 188.038
    Average FP rate %: 1.13823

Similar build time but dramatically faster query times on hot data (63 ns to 26 ns), and somewhat faster on stale data (188 ns to 150 ns). Performance differences on batched and skewed query loads are between these extremes as expected.

The only other interesting thing about speed is "inside" (query key was added to filter) vs. "outside" (query key was not added to filter) query times. The non-SIMD implementations are substantially slower when most queries are "outside" vs. "inside". This goes against what one might expect or would have observed years ago, as "outside" queries only need about two probes on average, due to short-circuiting, while "inside" always have num_probes (say 6). The problem is probably the nastily unpredictable branch. The SIMD implementation has few branches (very predictable) and has pretty consistent running time regardless of query outcome.

Accuracy

The generally improved accuracy (re: Issue https://github.com/facebook/rocksdb/issues/5857) comes from a better design for probing indices
within a cache line (re: Issue https://github.com/facebook/rocksdb/issues/4120) and improved accuracy for millions of keys in a single filter from using a 64-bit hash function (XXH3p). Design details in code comments.

Accuracy data (generalizes, except old impl gets worse with millions of keys):
Memory bits per key: FP rate percent old impl -> FP rate percent new impl
6: 5.70953 -> 5.69888
8: 2.45766 -> 2.29709
10: 1.13977 -> 0.959254
12: 0.662498 -> 0.411593
16: 0.353023 -> 0.0873754
24: 0.261552 -> 0.0060971
50: 0.225453 -> ~0.00003 (less than 1 in a million queries are FP)

Fixes https://github.com/facebook/rocksdb/issues/5857
Fixes https://github.com/facebook/rocksdb/issues/4120

Unlike the old implementation, this implementation has a fixed cache line size (64 bytes). At 10 bits per key, the accuracy of this new implementation is very close to the old implementation with 128-byte cache line size. If there's sufficient demand, this implementation could be generalized.

Compatibility

Although old releases would see the new structure as corrupt filter data and read the table as if there's no filter, we've decided only to enable the new Bloom filter with new format_version=5. This provides a smooth path for automatic adoption over time, with an option for early opt-in.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6007

Test Plan: filter_bench has been used thoroughly to validate speed, accuracy, and correctness. Unit tests have been carefully updated to exercise new and old implementations, as well as the logic to select an implementation based on context (format_version).

Differential Revision: D18294749

Pulled By: pdillinger

fbshipit-source-id: d44c9db3696e4d0a17caaec47075b7755c262c5f
2019-11-13 16:44:01 -08:00
Yun Tang
07a0ad3c29 Download bzip2 packages from sourceforge (#5995)
Summary:
From bzip2's official [download page](http://www.bzip.org/downloads.html), we could download it from sourceforge. This source would be more credible than previous web archive.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5995

Differential Revision: D18377662

fbshipit-source-id: e8353f83d5d6ea6067f78208b7bfb7f0d5b49c05
2019-11-07 12:51:06 -08:00
Yanqin Jin
925250f42f Include db_stress_tool in rocksdb tools lib (#5950)
Summary:
include db_stress_tool in rocksdb tools lib

Test Plan (on devserver):
```
$make db_stress
$./db_stress
$make all && make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5950

Differential Revision: D18044399

Pulled By: riversand963

fbshipit-source-id: 895585abbbdfd8b954965921dba4b1400b7af1b1
2019-10-21 19:40:35 -07:00
Yanqin Jin
e60cc0925c Expose db stress tests (#5937)
Summary:
expose db stress test by providing db_stress_tool.h in public header.
This PR does the following:
- adds a new header, db_stress_tool.h, in include/rocksdb/
- renames db_stress.cc to db_stress_tool.cc
- adds a db_stress.cc which simply invokes a test function.
- update Makefile accordingly.

Test Plan (dev server):
```
make db_stress
./db_stress
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5937

Differential Revision: D17997647

Pulled By: riversand963

fbshipit-source-id: 1a8d9994f89ce198935566756947c518f0052410
2019-10-18 09:46:44 -07:00
Peter Dillinger
46ca51d430 filter_bench - a prelim tool for SST filter benchmarking (#5825)
Summary:
Example: using the tool before and after PR https://github.com/facebook/rocksdb/issues/5784 shows that
the refactoring, presumed performance-neutral, actually sped up SST
filters by about 3% to 8% (repeatable result):

Before:
-  Dry run ns/op: 22.4725
-  Single filter ns/op: 51.1078
-  Random filter ns/op: 120.133

After:
+  Dry run ns/op: 22.2301
+  Single filter run ns/op: 47.4313
+  Random filter ns/op: 115.9

Only tests filters for the block-based table (full filters and
partitioned filters - same implementation; not block-based filters),
which seems to be the recommended format/implementation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5825

Differential Revision: D17804987

Pulled By: pdillinger

fbshipit-source-id: 0f18a9c254c57f7866030d03e7fa4ba503bac3c5
2019-10-07 20:10:53 -07:00
Yanqin Jin
a9c5e8e944 Refactor deletefile_test.cc (#5822)
Summary:
Make DeleteFileTest inherit DBTestBase to avoid code duplication.

Test Plan (on devserver)
```
$make deletefile_test
$./deletefile_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5822

Differential Revision: D17456750

Pulled By: riversand963

fbshipit-source-id: 224e97967da7b98838a98981cd5095d3230a814f
2019-09-18 16:58:21 -07:00
Yanqin Jin
6a279037cf Refactor ObsoleteFilesTest to inherit from DBTestBase (#5820)
Summary:
Make class ObsoleteFilesTest inherit from DBTestBase.

Test plan (on devserver):
```
$COMPILE_WITH_ASAN=1 make obsolete_files_test
$./obsolete_files_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5820

Differential Revision: D17452348

Pulled By: riversand963

fbshipit-source-id: b09f4581a18022ca2bfd79f2836c0bf7083f5f25
2019-09-18 11:52:17 -07:00
Peter Dillinger
68626249c3 Refactor/consolidate legacy Bloom implementation details (#5784)
Summary:
Refactoring to consolidate implementation details of legacy
Bloom filters. This helps to organize and document some related,
obscure code.

Also added make/cpp var TEST_CACHE_LINE_SIZE so that it's easy to
compile and run unit tests for non-native cache line size. (Fixed a
related test failure in db_properties_test.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5784

Test Plan:
make check, including Recently added Bloom schema unit tests
(in ./plain_table_db_test && ./bloom_test), and including with
TEST_CACHE_LINE_SIZE=128U and TEST_CACHE_LINE_SIZE=256U. Tested the
schema tests with temporary fault injection into new implementations.

Some performance testing with modified unit tests suggest a small to moderate
improvement in speed.

Differential Revision: D17381384

Pulled By: pdillinger

fbshipit-source-id: ee42586da996798910fc45ac0b6289147f16d8df
2019-09-16 16:17:09 -07:00
Peter Dillinger
d3a6726f02 Revert changes from PR#5784 accidentally in PR#5780 (#5810)
Summary:
This will allow us to fix history by having the code changes for PR#5784 properly attributed to it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5810

Differential Revision: D17400231

Pulled By: pdillinger

fbshipit-source-id: 2da8b1cdf2533cfedb35b5526eadefb38c291f09
2019-09-16 11:38:53 -07:00
Peter Dillinger
aa2486b23c Refactor some confusing logic in PlainTableReader
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5780

Test Plan: existing plain table unit test

Differential Revision: D17368629

Pulled By: pdillinger

fbshipit-source-id: f25409cdc2f39ebe8d5cbb599cf820270e6b5d26
2019-09-13 10:26:36 -07:00
Wilfried Goesgens
fbab9913e2 upgrade gtest 1.7.0 => 1.8.1 for json result writing
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332

Differential Revision: D17242232

fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64
2019-09-09 11:24:11 -07:00
ENDOH takanao
3f2723a81b fix checking the '-march' flag (#5766)
Summary:
Hi! guys,

I got errors on the ARM machine.

before:

```console
$ make static_lib
...
g++: error: unrecognized argument in option '-march=armv8-a+crc+crypto'
g++: note: valid arguments to '-march=' are: armv2 armv2a armv3 armv3m armv4 armv4t armv5 armv5e armv5t armv5te armv6 armv6-m armv6j armv6k armv6kz armv6s-m armv6t2 armv6z armv6zk armv7 armv7-a armv7-m armv7-r armv7e-m armv7ve armv8-a armv8-a+crc armv8.1-a armv8.1-a+crc iwmmxt iwmmxt2 native
```

Thanks!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5766

Differential Revision: D17191117

fbshipit-source-id: 7a61e3a2a4a06f37faeb8429bd7314da54ec5868
2019-09-04 14:34:28 -07:00
sdong
d8a27d9331 Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729)
Summary:
AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same.
Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729

Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed.

Differential Revision: D16969791

fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163
2019-08-22 16:32:55 -07:00
sdong
e1c468d16f Do readahead in VerifyChecksum() (#5713)
Summary:
Right now VerifyChecksum() doesn't do read-ahead. In some use cases, users won't be able to achieve good performance. With this change, by default, RocksDB will do a default readahead, and users will be able to overwrite the readahead size by passing in a ReadOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5713

Test Plan: Add a new unit test.

Differential Revision: D16860874

fbshipit-source-id: 0cff0fe79ac855d3d068e6ccd770770854a68413
2019-08-16 16:42:56 -07:00
Adam Retter
f2bf0b2d1e Fixes for building RocksJava releases on arm64v8
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5674

Differential Revision: D16870338

fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7
2019-08-16 16:27:50 -07:00
Aaryaman Sagar
77273d4137 Fix TSAN failures in DistributedMutex tests (#5684)
Summary:
TSAN was not able to correctly instrument atomic bts and btr instructions, so
when TSAN is enabled implement those with std::atomic::fetch_or and
std::atomic::fetch_and. Also disable tests that fail on TSAN with false
negatives (we know these are false negatives because this other verifiably
correct program fails with the same TSAN error <link>)

```
make clean
TEST_TMPDIR=/dev/shm/rocksdb OPT=-g COMPILE_WITH_TSAN=1 make J=1 -j56 folly_synchronization_distributed_mutex_test
```

This is the code that fails with the same false-negative with TSAN
```
namespace {
class ExceptionWithConstructionTrack : public std::exception {
 public:
  explicit ExceptionWithConstructionTrack(int id)
      : id_{folly::to<std::string>(id)}, constructionTrack_{id} {}

  const char* what() const noexcept override {
    return id_.c_str();
  }

 private:
  std::string id_;
  TestConstruction constructionTrack_;
};

template <typename Storage, typename Atomic>
void transferCurrentException(Storage& storage, Atomic& produced) {
  assert(std::current_exception());
  new (&storage) std::exception_ptr(std::current_exception());
  produced->store(true, std::memory_order_release);
}

void concurrentExceptionPropagationStress(
    int numThreads,
    std::chrono::milliseconds milliseconds) {
  auto&& stop = std::atomic<bool>{false};
  auto&& exceptions = std::vector<std::aligned_storage<48, 8>::type>{};
  auto&& produced = std::vector<std::unique_ptr<std::atomic<bool>>>{};
  auto&& consumed = std::vector<std::unique_ptr<std::atomic<bool>>>{};
  auto&& consumers = std::vector<std::thread>{};
  for (auto i = 0; i < numThreads; ++i) {
    produced.emplace_back(new std::atomic<bool>{false});
    consumed.emplace_back(new std::atomic<bool>{false});
    exceptions.push_back({});
  }

  auto producer = std::thread{[&]() {
    auto counter = std::vector<int>(numThreads, 0);
    for (auto i = 0; true; i = ((i + 1) % numThreads)) {
      try {
        throw ExceptionWithConstructionTrack{counter.at(i)++};
      } catch (...) {
        transferCurrentException(exceptions.at(i), produced.at(i));
      }

      while (!consumed.at(i)->load(std::memory_order_acquire)) {
        if (stop.load(std::memory_order_acquire)) {
          return;
        }
      }

      consumed.at(i)->store(false, std::memory_order_release);
    }
  }};

  for (auto i = 0; i < numThreads; ++i) {
    consumers.emplace_back([&, i]() {
      auto counter = 0;
      while (true) {
        while (!produced.at(i)->load(std::memory_order_acquire)) {
          if (stop.load(std::memory_order_acquire)) {
            return;
          }
        }
        produced.at(i)->store(false, std::memory_order_release);

        try {
          auto storage = &exceptions.at(i);
          auto exc = folly::launder(
            reinterpret_cast<std::exception_ptr*>(storage));
          auto copy = std::move(*exc);
          exc->std::exception_ptr::~exception_ptr();
          std::rethrow_exception(std::move(copy));
        } catch (std::exception& exc) {
          auto value = std::stoi(exc.what());
          EXPECT_EQ(value, counter++);
        }

        consumed.at(i)->store(true, std::memory_order_release);
      }
    });
  }

  std::this_thread::sleep_for(milliseconds);
  stop.store(true);
  producer.join();
  for (auto& thread : consumers) {
    thread.join();
  }
}
} // namespace
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5684

Differential Revision: D16746077

Pulled By: miasantreble

fbshipit-source-id: 8af88dcf9161c05daec1a76290f577918638f79d
2019-08-14 17:01:31 -07:00
Aaryaman Sagar
38b03c840e Port folly/synchronization/DistributedMutex to rocksdb (#5642)
Summary:
This ports `folly::DistributedMutex` into RocksDB. The PR includes everything else needed to compile and use DistributedMutex as a component within folly. Most files are unchanged except for some portability stuff and includes.

For now, I've put this under `rocksdb/third-party`, but if there is a better folder to put this under, let me know. I also am not sure how or where to put unit tests for third-party stuff like this. It seems like gtest is included already, but I need to link with it from another third-party folder.

This also includes some other common components from folly

- folly/Optional
- folly/ScopeGuard (In particular `SCOPE_EXIT`)
- folly/synchronization/ParkingLot (A portable futex-like interface)
- folly/synchronization/AtomicNotification (The standard C++ interface for futexes)
- folly/Indestructible (For singletons that don't get destroyed without allocations)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5642

Differential Revision: D16544439

fbshipit-source-id: 179b98b5dcddc3075926d31a30f92fd064245731
2019-08-07 14:34:19 -07:00
Vijay Nadimpalli
d150e01474 New API to get all merge operands for a Key (#5604)
Summary:
This is a new API added to db.h to allow for fetching all merge operands associated with a Key. The main motivation for this API is to support use cases where doing a full online merge is not necessary as it is performance sensitive. Example use-cases:
1. Update subset of columns and read subset of columns -
Imagine a SQL Table, a row is encoded as a K/V pair (as it is done in MyRocks). If there are many columns and users only updated one of them, we can use merge operator to reduce write amplification. While users only read one or two columns in the read query, this feature can avoid a full merging of the whole row, and save some CPU.
2. Updating very few attributes in a value which is a JSON-like document -
Updating one attribute can be done efficiently using merge operator, while reading back one attribute can be done more efficiently if we don't need to do a full merge.
----------------------------------------------------------------------------------------------------
API :
Status GetMergeOperands(
      const ReadOptions& options, ColumnFamilyHandle* column_family,
      const Slice& key, PinnableSlice* merge_operands,
      GetMergeOperandsOptions* get_merge_operands_options,
      int* number_of_operands)

Example usage :
int size = 100;
int number_of_operands = 0;
std::vector<PinnableSlice> values(size);
GetMergeOperandsOptions merge_operands_info;
db_->GetMergeOperands(ReadOptions(), db_->DefaultColumnFamily(), "k1", values.data(), merge_operands_info, &number_of_operands);

Description :
Returns all the merge operands corresponding to the key. If the number of merge operands in DB is greater than merge_operands_options.expected_max_number_of_operands no merge operands are returned and status is Incomplete. Merge operands returned are in the order of insertion.
merge_operands-> Points to an array of at-least merge_operands_options.expected_max_number_of_operands and the caller is responsible for allocating it. If the status returned is Incomplete then number_of_operands will contain the total number of merge operands found in DB for key.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5604

Test Plan:
Added unit test and perf test in db_bench that can be run using the command:
./db_bench -benchmarks=getmergeoperands --merge_operator=sortlist

Differential Revision: D16657366

Pulled By: vjnadimpalli

fbshipit-source-id: 0faadd752351745224ee12d4ae9ef3cb529951bf
2019-08-06 14:26:44 -07:00
Yanqin Jin
b1a02ffeab Fix make target 'all' and 'check' (#5672)
Summary:
If a test is one of parallel tests, then it should also be one of the 'tests'.
Otherwise, `make all` won't build the binaries. For examle,
```
$COMPILE_WITH_ASAN=1 make -j32 all
```
Then if you do
```
$make check
```
The second command will invoke the compilation and building for db_bloom_test
and file_reader_writer_test **without** the `COMPILE_WITH_ASAN=1`, causing the
command to fail.

Test plan (on devserver):
```
$make -j32 all
```
Verify all binaries are built so that `make check` won't have to compile any
thing.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5672

Differential Revision: D16655834

Pulled By: riversand963

fbshipit-source-id: 050131412b5313496f85ae3deeeeb8d28af75746
2019-08-05 15:45:56 -07:00
haoyuhuang
70c7302fb5 Block cache simulator: Add pysim to simulate caches using reinforcement learning. (#5610)
Summary:
This PR implements cache eviction using reinforcement learning. It includes two implementations:
1. An implementation of Thompson Sampling for the Bernoulli Bandit [1].
2. An implementation of LinUCB with disjoint linear models [2].

The idea is that a cache uses multiple eviction policies, e.g., MRU, LRU, and LFU. The cache learns which eviction policy is the best and uses it upon a cache miss.
Thompson Sampling is contextless and does not include any features.
LinUCB includes features such as level, block type, caller, column family id to decide which eviction policy to use.

[1] Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen. 2018. A Tutorial on Thompson Sampling. Found. Trends Mach. Learn. 11, 1 (July 2018), 1-96. DOI: https://doi.org/10.1561/2200000070
[2] Lihong Li, Wei Chu, John Langford, and Robert E. Schapire. 2010. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th international conference on World wide web (WWW '10). ACM, New York, NY, USA, 661-670. DOI=http://dx.doi.org/10.1145/1772690.1772758
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5610

Differential Revision: D16435067

Pulled By: HaoyuHuang

fbshipit-source-id: 6549239ae14115c01cb1e70548af9e46d8dc21bb
2019-07-26 14:41:13 -07:00
Levi Tamasi
3617287e0e Parallelize db_bloom_filter_test (#5632)
Summary:
This test frequently times out under TSAN; parallelizing it should fix
this issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5632

Test Plan:
make check
buck test mode/dev-tsan internal_repo_rocksdb/repo:db_bloom_filter_test

Differential Revision: D16519399

Pulled By: ltamasi

fbshipit-source-id: 66e05a644d6f79c6d544255ffcf6de195d2d62fe
2019-07-26 11:48:17 -07:00
Yanqin Jin
74782cec32 Fix target 'clean' to include parallel test binaries (#5629)
Summary:
current `clean` target in Makefile does not remove parallel test
binaries. Fix this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5629

Test Plan:
(on devserver)
Take file_reader_writer_test for instance.
```
$make -j32 file_reader_writer_test
$make clean
```
Verify that binary file 'file_reader_writer_test' is delete by `make clean`.

Differential Revision: D16513176

Pulled By: riversand963

fbshipit-source-id: 70acb9f56c928a494964121b86aacc0090f31ff6
2019-07-26 09:56:09 -07:00
anand76
112702ac6c Parallelize file_reader_writer_test in order to reduce timeouts
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5608

Test Plan:
make check
buck test mode/dev-tsan internal_repo_rocksdb/repo:file_reader_writer_test -- --run-disabled

Differential Revision: D16441796

Pulled By: anand1976

fbshipit-source-id: afbb88a9fcb1c0ba22215118767e8eab3d1d6a4a
2019-07-23 11:50:10 -07:00
Venki Pallipadi
22ce462450 Export Import sst files (#5495)
Summary:
Refresh of the earlier change here - https://github.com/facebook/rocksdb/issues/5135

This is a review request for code change needed for - https://github.com/facebook/rocksdb/issues/3469
"Add support for taking snapshot of a column family and creating column family from a given CF snapshot"

We have an implementation for this that we have been testing internally. We have two new APIs that together provide this functionality.

(1) ExportColumnFamily() - This API is modelled after CreateCheckpoint() as below.
// Exports all live SST files of a specified Column Family onto export_dir,
// returning SST files information in metadata.
// - SST files will be created as hard links when the directory specified
//   is in the same partition as the db directory, copied otherwise.
// - export_dir should not already exist and will be created by this API.
// - Always triggers a flush.
virtual Status ExportColumnFamily(ColumnFamilyHandle* handle,
                                  const std::string& export_dir,
                                  ExportImportFilesMetaData** metadata);

Internally, the API will DisableFileDeletions(), GetColumnFamilyMetaData(), Parse through
metadata, creating links/copies of all the sst files, EnableFileDeletions() and complete the call by
returning the list of file metadata.

(2) CreateColumnFamilyWithImport() - This API is modeled after IngestExternalFile(), but invoked only during a CF creation as below.
// CreateColumnFamilyWithImport() will create a new column family with
// column_family_name and import external SST files specified in metadata into
// this column family.
// (1) External SST files can be created using SstFileWriter.
// (2) External SST files can be exported from a particular column family in
//     an existing DB.
// Option in import_options specifies whether the external files are copied or
// moved (default is copy). When option specifies copy, managing files at
// external_file_path is caller's responsibility. When option specifies a
// move, the call ensures that the specified files at external_file_path are
// deleted on successful return and files are not modified on any error
// return.
// On error return, column family handle returned will be nullptr.
// ColumnFamily will be present on successful return and will not be present
// on error return. ColumnFamily may be present on any crash during this call.
virtual Status CreateColumnFamilyWithImport(
    const ColumnFamilyOptions& options, const std::string& column_family_name,
    const ImportColumnFamilyOptions& import_options,
    const ExportImportFilesMetaData& metadata,
    ColumnFamilyHandle** handle);

Internally, this API creates a new CF, parses all the sst files and adds it to the specified column family, at the same level and with same sequence number as in the metadata. Also performs safety checks with respect to overlaps between the sst files being imported.

If incoming sequence number is higher than current local sequence number, local sequence
number is updated to reflect this.

Note, as the sst files is are being moved across Column Families, Column Family name in sst file
will no longer match the actual column family on destination DB. The API does not modify Column
Family name or id in the sst files being imported.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5495

Differential Revision: D16018881

fbshipit-source-id: 9ae2251025d5916d35a9fc4ea4d6707f6be16ff9
2019-07-17 12:27:14 -07:00
Yuqi Gu
a3c1832e86 Arm64 CRC32 parallel computation optimization for RocksDB (#5494)
Summary:
Crc32c Parallel computation optimization:
Algorithm comes from Intel whitepaper: [crc-iscsi-polynomial-crc32-instruction-paper](https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/crc-iscsi-polynomial-crc32-instruction-paper.pdf)
 Input data is divided into three equal-sized blocks
Three parallel blocks (crc0, crc1, crc2) for 1024 Bytes
One Block: 42(BLK_LENGTH) * 8(step length: crc32c_u64) bytes

1. crc32c_test:
```
[==========] Running 4 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 4 tests from CRC
[ RUN      ] CRC.StandardResults
[       OK ] CRC.StandardResults (1 ms)
[ RUN      ] CRC.Values
[       OK ] CRC.Values (0 ms)
[ RUN      ] CRC.Extend
[       OK ] CRC.Extend (0 ms)
[ RUN      ] CRC.Mask
[       OK ] CRC.Mask (0 ms)
[----------] 4 tests from CRC (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 1 test case ran. (1 ms total)
[  PASSED  ] 4 tests.
```

2. RocksDB benchmark: db_bench --benchmarks="crc32c"

```
Linear Arm crc32c:
  crc32c: 1.005 micros/op 995133 ops/sec; 3887.2 MB/s (4096 per op)
```

```
Parallel optimization with Armv8 crypto extension:
  crc32c: 0.419 micros/op 2385078 ops/sec; 9316.7 MB/s (4096 per op)
```

It gets ~2.4x speedup compared to linear Arm crc32c instructions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5494

Differential Revision: D16340806

fbshipit-source-id: 95dae9a5b646fd20a8303671d82f17b2e162e945
2019-07-17 11:22:38 -07:00
haoyuhuang
1a59b6e2a9 Cache simulator: Add a ghost cache for admission control and a hybrid row-block cache. (#5534)
Summary:
This PR adds a ghost cache for admission control. Specifically, it admits an entry on its second access.
It also adds a hybrid row-block cache that caches the referenced key-value pairs of a Get/MultiGet request instead of its blocks.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5534

Test Plan: make clean && COMPILE_WITH_ASAN=1 make check -j32

Differential Revision: D16101124

Pulled By: HaoyuHuang

fbshipit-source-id: b99edda6418a888e94eb40f71ece45d375e234b1
2019-07-11 12:43:29 -07:00
ggaurav28
60d8b19836 Implemented a file logger that uses WritableFileWriter (#5491)
Summary:
Current PosixLogger performs IO operations using posix calls. Thus the
current implementation will not work for non-posix env. Created a new
logger class EnvLogger that uses env specific WritableFileWriter for IO operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5491

Test Plan: make check

Differential Revision: D15909002

Pulled By: ggaurav28

fbshipit-source-id: 13a8105176e8e42db0c59798d48cb6a0dbccc965
2019-07-09 16:27:22 -07:00
Adam Retter
5dc9fbd117 Update the version of ZStd for the Rocks Java static build
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5228

Differential Revision: D15880451

Pulled By: sagar0

fbshipit-source-id: 84da6f42cac15367d95bffa5336ebd002e7c3308
2019-06-18 11:57:01 -07:00
haoyuhuang
2d1dd5bce7 Support computing miss ratio curves using sim_cache. (#5449)
Summary:
This PR adds a BlockCacheTraceSimulator that reports the miss ratios given different cache configurations. A cache configuration contains "cache_name,num_shard_bits,cache_capacities". For example, "lru, 1, 1K, 2K, 4M, 4G".

When we replay the trace, we also perform lookups and inserts on the simulated caches.
In the end, it reports the miss ratio for each tuple <cache_name, num_shard_bits, cache_capacity> in a output file.

This PR also adds a main source block_cache_trace_analyzer so that we can run the analyzer in command line.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5449

Test Plan:
Added tests for block_cache_trace_analyzer.
COMPILE_WITH_ASAN=1 make check -j32.

Differential Revision: D15797073

Pulled By: HaoyuHuang

fbshipit-source-id: aef0c5c2e7938f3e8b6a10d4a6a50e6928ecf408
2019-06-17 16:41:12 -07:00
Zhongyi Xie
671d15cbdd Persistent Stats: persist stats history to disk (#5046)
Summary:
This PR continues the work in https://github.com/facebook/rocksdb/pull/4748 and https://github.com/facebook/rocksdb/pull/4535 by adding a new DBOption `persist_stats_to_disk` which instructs RocksDB to persist stats history to RocksDB itself. When statistics is enabled, and  both options `stats_persist_period_sec` and `persist_stats_to_disk` are set, RocksDB will periodically write stats to a built-in column family in the following form: key -> (timestamp in microseconds)#(stats name), value -> stats value. The existing API `GetStatsHistory` will detect the current value of `persist_stats_to_disk` and either read from in-memory data structure or from the hidden column family on disk.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5046

Differential Revision: D15863138

Pulled By: miasantreble

fbshipit-source-id: bb82abdb3f2ca581aa42531734ac799f113e931b
2019-06-17 15:21:50 -07:00
Patrick Zhang
5c76ba9dc4 Support rocksdbjava aarch64 build and test (#5258)
Summary:
Verified with an Ampere Computing eMAG aarch64 system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5258

Differential Revision: D15807309

Pulled By: maysamyabandeh

fbshipit-source-id: ab85d2fd3fe40e6094430ab0eba557b1e979510d
2019-06-13 11:48:10 -07:00
haoyuhuang
9bbccda01e First commit for block cache trace analyzer (#5425)
Summary:
This PR contains the first commit for block cache trace analyzer. It reads a block cache trace file and prints statistics of the traces.

We will extend this class to provide more functionalities.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5425

Differential Revision: D15709580

Pulled By: HaoyuHuang

fbshipit-source-id: 2f43bd2311f460ab569880819d95eeae217c20bb
2019-06-11 12:22:44 -07:00
haoyuhuang
aa71718ac3 Add block cache tracer. (#5410)
Summary:
This PR adds a help class block cache tracer to read/write block cache accesses. It uses the trace reader/writer to perform this task.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5410

Differential Revision: D15612843

Pulled By: HaoyuHuang

fbshipit-source-id: f30fd1e1524355ca87db5d533a5c086728b141ea
2019-06-06 11:24:39 -07:00
Siying Dong
000b9ec217 Move some logging related files to logging/ (#5387)
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387

Differential Revision: D15579036

Pulled By: siying

fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
2019-05-31 17:23:59 -07:00
Vijay Nadimpalli
49c5a12dbe Organizing rocksdb/db directory
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5390

Differential Revision: D15579388

Pulled By: vjnadimpalli

fbshipit-source-id: 5bfc95e31554b8ff05b97b76d6534113f527f366
2019-05-31 11:57:01 -07:00
Siying Dong
8843129ece Move some memory related files from util/ to memory/ (#5382)
Summary:
Move arena, allocator, and memory tools under util to a separate memory/ directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382

Differential Revision: D15564655

Pulled By: siying

fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
2019-05-30 17:44:09 -07:00
Vijay Nadimpalli
50e470791d Organizing rocksdb/table directory by format
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5373

Differential Revision: D15559425

Pulled By: vjnadimpalli

fbshipit-source-id: 5d6d6d615582bedd96a4b879bb25d429a6de8b55
2019-05-30 14:51:11 -07:00
Siying Dong
e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Siying Dong
545d206040 Move some file related files outside util/ (#5375)
Summary:
util/ means for lower level libraries, so it's a good idea to move the files which requires knowledge to DB out. Create a file/ and move some files there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5375

Differential Revision: D15550935

Pulled By: siying

fbshipit-source-id: 61a9715dcde5386eebfb43e93f847bba1ae0d3f2
2019-05-29 20:47:06 -07:00
Adam Retter
5882e847aa Allow builds of RocksJava debug releases (#5274)
Summary:
This allows debug releases of RocksJava to be build with the Docker release targets.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5274

Differential Revision: D15185067

Pulled By: sagar0

fbshipit-source-id: f3988e472f281f5844d9a07098344a827b1e7eb1
2019-05-02 14:27:20 -07:00
Yuqi Gu
03c7ae24c2 RocksDB CRC32c optimization with ARMv8 Intrinsic (#5221)
Summary:
1. Add Arm linear crc32c implemtation for RocksDB.
2. Arm runtime check for crc32
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5221

Differential Revision: D15013685

Pulled By: siying

fbshipit-source-id: 2c2983743d26656d93f212dc7c1a3cf66a1acf12
2019-04-30 10:59:05 -07:00
Yanqin Jin
9358178edc Support for single-primary, multi-secondary instances (#4899)
Summary:
This PR allows RocksDB to run in single-primary, multi-secondary process mode.
The writer is a regular RocksDB (e.g. an `DBImpl`) instance playing the role of a primary.
Multiple `DBImplSecondary` processes (secondaries) share the same set of SST files, MANIFEST, WAL files with the primary. Secondaries tail the MANIFEST of the primary and apply updates to their own in-memory state of the file system, e.g. `VersionStorageInfo`.

This PR has several components:
1. (Originally in #4745). Add a `PathNotFound` subcode to `IOError` to denote the failure when a secondary tries to open a file which has been deleted by the primary.

2. (Similar to #4602). Add `FragmentBufferedReader` to handle partially-read, trailing record at the end of a log from where future read can continue.

3. (Originally in #4710 and #4820). Add implementation of the secondary, i.e. `DBImplSecondary`.
3.1 Tail the primary's MANIFEST during recovery.
3.2 Tail the primary's MANIFEST during normal processing by calling `ReadAndApply`.
3.3 Tailing WAL will be in a future PR.

4. Add an example in 'examples/multi_processes_example.cc' to demonstrate the usage of secondary RocksDB instance in a multi-process setting. Instructions to run the example can be found at the beginning of the source code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4899

Differential Revision: D14510945

Pulled By: riversand963

fbshipit-source-id: 4ac1c5693e6012ad23f7b4b42d3c374fecbe8886
2019-03-26 16:45:31 -07:00
Adam Retter
bb474e9a02 Add missing functionality to RocksJava (#4833)
Summary:
This is my latest round of changes to add missing items to RocksJava. More to come in future PRs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4833

Differential Revision: D14152266

Pulled By: sagar0

fbshipit-source-id: d6cff67e26da06c131491b5cf6911a8cd0db0775
2019-02-22 14:46:46 -08:00
Yanqin Jin
7d23210226 Separate crash test with atomic flush (#4945)
Summary:
Currently crash test covers cases with and without atomic flush, but takes too
long to finish. Therefore it may be a better idea to put crash test with atomic
flush in a separate set of tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4945

Differential Revision: D13947548

Pulled By: riversand963

fbshipit-source-id: 177c6de865290fd650b0103408339eaa3f801d8c
2019-02-19 14:08:39 -08:00
Yanqin Jin
5af9446ee6 Remove Lua compaction filter from RocksDB main repo (#4971)
Summary:
as title. For people who continue to need Lua compaction filter, you
can copy the include/rocksdb/utilities/rocks_lua/lua_compaction_filter.h and
utilities/lua/rocks_lua_compaction_filter.cc to your own codebase.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4971

Differential Revision: D14047468

Pulled By: riversand963

fbshipit-source-id: 9ad1a6484a7c94e478f1e108127a3184e4069f70
2019-02-13 12:42:44 -08:00
Yanqin Jin
95604d13e9 Change the command to invoke parallel tests (#4922)
Summary:
We used to call `printf $(t_run)` and later feed the result to GNU parallel in the recipe of target `check_0`. However, this approach is problematic when the length of $(t_run) exceeds the
maximum length of a command and the `printf` command cannot be executed. Instead we use 'find -print' to avoid generating an overly long command.

**This PR is actually the last commit of #4916. Prefer to merge this PR separately.**
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4922

Differential Revision: D13845883

Pulled By: riversand963

fbshipit-source-id: b56de7f7af43337c6ec89b931de843c9667cb679
2019-01-28 15:02:26 -08:00
Yanqin Jin
e1de88c8c7 Escape '.' by adding a '\' to avoid matching any char
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4912

Differential Revision: D13789449

Pulled By: riversand963

fbshipit-source-id: 0639dae82049b7ac977c8f81851f1c9fdc346705
2019-01-24 11:25:27 -08:00
Remington Brasga
1eded07f00 Bug in Regular Expression in Makefile (#4682)
Summary:
False-negative about path not existing. The regex is ignoring the "." in front of a path.
Example: "./path/to/file"
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4682

Differential Revision: D13777110

Pulled By: sagar0

fbshipit-source-id: 9f8173b7581407555fdc055580732aeab37d4ade
2019-01-23 10:24:10 -08:00
Varadharajan
349c7cceff Fix downloaded filename of snappy (#4870)
Summary:
Build failing due to incorrect filename.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4870

Differential Revision: D13637205

Pulled By: sagar0

fbshipit-source-id: 72da45d51b49bce32f696532ba0656ee0dc2b89f
2019-01-11 10:29:40 -08:00
Siying Dong
1fb2e274c5 Remove some components (#4101)
Summary:
Remove some components that we never heard people using them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4101

Differential Revision: D8825431

Pulled By: siying

fbshipit-source-id: 97a12ad3cad4ab12c82741a5ba49669aaa854180
2019-01-10 13:30:09 -08:00
Jakub Tomanik
71a69d9b68 Fix building RocksDB for iOS (#4687)
Summary:
This PR contains the following fixes:

1. Fixing Makefile to support non-default locations of developer tools

2. Fixing compile error using a patch from https://github.com/facebook/rocksdb/pull/4007
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4687

Differential Revision: D13287263

Pulled By: riversand963

fbshipit-source-id: 4525eb42ba7b6f82af5f9bfb8e52fa4024e27ccc
2018-12-19 14:13:55 -08:00
Adam Retter
257b458121 Update the version of the dependencies used by the RocksJava static build (#4761)
Summary:
Note that Snappy now requires CMake to build it, so I added a note about RocksJava to the README.md file.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4761

Differential Revision: D13403811

Pulled By: ajkr

fbshipit-source-id: 8fcd7e3dc7f7152080364a374d3065472f417eff
2018-12-18 20:25:43 -08:00
Abhishek Madan
81b6b09f6b Remove v1 RangeDelAggregator (#4778)
Summary:
Now that v2 is fully functional, the v1 aggregator is removed.
The v2 aggregator has been renamed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4778

Differential Revision: D13495930

Pulled By: abhimadan

fbshipit-source-id: 9d69500a60a283e79b6c4fa938fc68a8aa4d40d6
2018-12-17 17:33:46 -08:00
Adam Retter
84001cfa96 Cache dependencies for static build of RocksJava (#4769)
Summary:
Avoids re-downloading the .tar.gz files for the static build of RocksJava if they are already present.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4769

Differential Revision: D13491919

Pulled By: ajkr

fbshipit-source-id: 9265f577e049838dc40335d54f1ff2b4f972c38c
2018-12-17 13:32:24 -08:00
Huachao Huang
5e72bc113a Add SstFileReader to read sst files (#4717)
Summary:
A user friendly sst file reader is useful when we want to access sst
files outside of RocksDB. For example, we can generate an sst file
with SstFileWriter and send it to other places, then use SstFileReader
to read the file and process the entries in other ways.

Also rename the original SstFileReader to SstFileDumper because of
name conflict, and seems SstFileDumper is more appropriate for tools.

TODO: there is only a very simple test now, because I want to get some feedback first.
If the changes look good, I will add more tests soon.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4717

Differential Revision: D13212686

Pulled By: ajkr

fbshipit-source-id: 737593383264c954b79e63edaf44aaae0d947e56
2018-11-27 13:02:23 -08:00
Abhishek Madan
457f77b9ff Introduce RangeDelAggregatorV2 (#4649)
Summary:
The old RangeDelAggregator did expensive pre-processing work
to create a collapsed, binary-searchable representation of range
tombstones. With FragmentedRangeTombstoneIterator, much of this work is
now unnecessary. RangeDelAggregatorV2 takes advantage of this by seeking
in each iterator to find a covering tombstone in ShouldDelete, while
doing minimal work in AddTombstones. The old RangeDelAggregator is still
used during flush/compaction for now, though RangeDelAggregatorV2 will
support those uses in a future PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4649

Differential Revision: D13146964

Pulled By: abhimadan

fbshipit-source-id: be29a4c020fc440500c137216fcc1cf529571eb3
2018-11-21 10:56:45 -08:00
Sagar Vemuri
dd742e2416 Automatically set LITE=1 on passing OPT="-DROCKSDB_LITE" (#4671)
Summary:
In #4652 we are setting -Os for lite builds only when LITE=1 is specified. But currently almost all the users invoke lite build via OPT="-DROCKSDB_LITE=1". So this diff tries to set LITE=1 when users already pass in -DROCKSDB_LITE=1 via the command line.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4671

Differential Revision: D13033801

Pulled By: sagar0

fbshipit-source-id: e7b506cee574f9e3f42221ee6647915011c78d78
2018-11-12 16:58:54 -08:00
Yi Wu
7a2f98a0fc Fix liblua link error when building shared lib under fbcode (#4651)
Summary:
When running `make shared_lib` under fbcode, there's liblua link error: https://gist.github.com/yiwu-arbug/b796bff6b3d46d90c1ed878d983de50d
This is because we link liblua.a when building shared lib. If we want to link with liblua, we need to link with liblua_pic.a instead. Fixing by simply not link with lua.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4651

Differential Revision: D12964798

Pulled By: yiwu-arbug

fbshipit-source-id: 18d6cee94afe20748068822b76e29ef255cdb04d
2018-11-09 14:13:40 -08:00
Yi Wu
8c2a48742a Use -Os for lite release build (#4652)
Summary:
Set `-Os` for lite release build to minimize binary size.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4652

Differential Revision: D12965427

Pulled By: yiwu-arbug

fbshipit-source-id: c8b647642c24b3e5df6a2cd13112e452a08e8398
2018-11-07 22:10:28 -08:00
Yanqin Jin
912bbbbc72 Enable crash-recovery stress test for atomic flush (#4605)
Summary:
This PR adds test of atomic flush to our continuous stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605

Differential Revision: D12840607

Pulled By: riversand963

fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9
2018-10-30 14:03:36 -07:00
Abhishek Madan
8c78348c77 Use only "local" range tombstones during Get (#4449)
Summary:
Previously, range tombstones were accumulated from every level, which
was necessary if a range tombstone in a higher level covered a key in a lower
level. However, RangeDelAggregator::AddTombstones's complexity is based on
the number of tombstones that are currently stored in it, which is wasteful in
the Get case, where we only need to know the highest sequence number of range
tombstones that cover the key from higher levels, and compute the highest covering
sequence number at the current level. This change introduces this optimization, and
removes the use of RangeDelAggregator from the Get path.

In the benchmark results, the following command was used to initialize the database:
```
./db_bench -db=/dev/shm/5k-rts -use_existing_db=false -benchmarks=filluniquerandom -write_buffer_size=1048576 -compression_type=lz4 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 -value_size=112 -key_size=16 -block_size=4096 -level_compaction_dynamic_level_bytes=true -num=5000000 -max_background_jobs=12 -benchmark_write_rate_limit=20971520 -range_tombstone_width=100 -writes_per_range_tombstone=100 -max_num_range_tombstones=50000 -bloom_bits=8
```

...and the following command was used to measure read throughput:
```
./db_bench -db=/dev/shm/5k-rts/ -use_existing_db=true -benchmarks=readrandom -disable_auto_compactions=true -num=5000000 -reads=100000 -threads=32
```

The filluniquerandom command was only run once, and the resulting database was used
to measure read performance before and after the PR. Both binaries were compiled with
`DEBUG_LEVEL=0`.

Readrandom results before PR:
```
readrandom   :       4.544 micros/op 220090 ops/sec;   16.9 MB/s (63103 of 100000 found)
```

Readrandom results after PR:
```
readrandom   :      11.147 micros/op 89707 ops/sec;    6.9 MB/s (63103 of 100000 found)
```

So it's actually slower right now, but this PR paves the way for future optimizations (see #4493).

----
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4449

Differential Revision: D10370575

Pulled By: abhimadan

fbshipit-source-id: 9a2e152be1ef36969055c0e9eb4beb0d96c11f4d
2018-10-24 12:31:12 -07:00
Yi Wu
742302a1a3 Fix compile error with aligned-new (#4576)
Summary:
In fbcode when we build with clang7++, although -faligned-new is available in compile phase, we link with an older version of libstdc++.a and it doesn't come with aligned-new support (e.g. `nm libstdc++.a | grep align_val_t` return empty). In this case the previous -faligned-new detection can pass but will end up with link error. Fixing it by only have the detection for non-fbcode build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4576

Differential Revision: D10500008

Pulled By: yiwu-arbug

fbshipit-source-id: b375de4fbb61d2a08e54ab709441aa8e7b4b08cf
2018-10-23 10:55:41 -07:00
Yi Wu
d6f2ecf49c Utility to run task periodically in a thread (#4423)
Summary:
Introduce `RepeatableThread` utility to run task periodically in a separate thread. It is basically the same as the the same class in fbcode, and in addition provide a helper method to let tests mock time and trigger execution one at a time.

We can use this class to replace `TimerQueue` in #4382 and `BlobDB`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4423

Differential Revision: D10020932

Pulled By: yiwu-arbug

fbshipit-source-id: 3616bef108c39a33c92eedb1256de424b7c04087
2018-09-27 15:28:00 -07:00
Abhishek Madan
1626f6ab6b Add RangeDelAggregator microbenchmarks (#4363)
Summary:
To measure the results of upcoming DeleteRange v2 work, this commit adds
simple benchmarks for RangeDelAggregator. It measures the average time
for AddTombstones and ShouldDelete calls.

Using this to compare the results before #4014 and on the latest master (using the default arguments) produces the following results:

Before #4014:
```
=======================
Results:
=======================
AddTombstones:          1356.28 us
ShouldDelete:           0.401732 us
```

Latest master:
```
=======================
Results:
=======================
AddTombstones:          740.82 us
ShouldDelete:           0.383271 us
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4363

Differential Revision: D9881676

Pulled By: abhimadan

fbshipit-source-id: 793e7d61aa4b9d47eb917bbcc03f08695b5e5442
2018-09-17 14:58:31 -07:00
Yanqin Jin
3ba3b153ef Fix Makefile target 'jtest' on PowerPC (#4357)
Summary:
Before the fix:
On a PowerPC machine, run the following
```
$ make jtest
```
The command will fail due to "undefined symbol: crc32c_ppc". It was caused by
'rocksdbjava' Makefile target not including crc32c_ppc object files when
generating the shared lib. The fix is simple.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4357

Differential Revision: D9779474

Pulled By: riversand963

fbshipit-source-id: 3c5ec9068c2b9c796e6500f71cd900267064fd51
2018-09-11 16:37:23 -07:00
Zhongyi Xie
1cf17ba53b Rename DecodeCFAndKey to resolve naming conflict in unity test (#4323)
Summary:
Currently unity-test is failing because both trace_replay.cc and trace_analyzer_tool.cc defined `DecodeCFAndKey` under anonymous namespace. It is supposed to be fine except unity test will dump all source files together and now we have a conflict.
Another issue with trace_analyzer_tool.cc is that it is using some utility functions from ldb_cmd which is not included in Makefile for unity_test, I chose to update TESTHARNESS to include LIBOBJECTS. Feel free to comment if there is a less intrusive way to solve this.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4323

Differential Revision: D9599170

Pulled By: miasantreble

fbshipit-source-id: 38765b11f8e7de92b43c63bdcf43ea914abdc029
2018-08-30 18:42:51 -07:00
Sagar Vemuri
2f871bc85e Download bzip2 packages from Internet Archive (#4306)
Summary:
Since bzip.org is no longer maintained, download the bzip2 packages from a snapshot taken by the internet archive until we figure out a more credible source.

Fixes issue: #4305
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4306

Differential Revision: D9514868

Pulled By: sagar0

fbshipit-source-id: 57c6a141a62e652f94377efc7ca9916b458e68d5
2018-08-27 09:58:24 -07:00
Zhichao Cao
9e2d5ab6bf Adjusted the Makefile of trace_analyzer to isolate the Gflags from other (#4290)
Summary:
Previously, the trace_analyzer_tool will be complied with other libobjects, which let the GFLAGS of trace_analyzer appear in other tools (e.g., db_bench, rocksdb_dump, and etc.). When using '--help', the help information of trace_analyzer will appear in other tool help information, which will cause confusion issues.

Currently, trace_analyzer_tool is built and used only by trace_analyzer and trace_analyzer_test to avoid the issues.

Tested with make asan_check.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4290

Differential Revision: D9413163

Pulled By: zhichao-cao

fbshipit-source-id: ed5d20c4575a53ca15ff62a2ffe601d5cf278cc4
2018-08-21 10:47:24 -07:00
Zhichao Cao
999d955e4f RocksDB Trace Analyzer (#4091)
Summary:
A framework of trace analyzing for RocksDB

After collecting the trace by using the tool of [PR #3837](https://github.com/facebook/rocksdb/pull/3837). User can use the Trace Analyzer to interpret, analyze, and characterize the collected workload.
**Input:**
1. trace file
2. Whole keys space file

**Statistics:**
1. Access count of each operation (Get, Put, Delete, SingleDelete, DeleteRange, Merge) in each column family.
2. Key hotness (access count) of each one
3. Key space separation based on given prefix
4. Key size distribution
5. Value size distribution if appliable
6. Top K accessed keys
7. QPS statistics including the average QPS and peak QPS
8. Top K accessed prefix
9. The query correlation analyzing, output the number of X after Y and the corresponding average time
    intervals

**Output:**
1. key access heat map (either in the accessed key space or whole key space)
2. trace sequence file (interpret the raw trace file to line base text file for future use)
3. Time serial (The key space ID and its access time)
4. Key access count distritbution
5. Key size distribution
6. Value size distribution (in each intervals)
7. whole key space separation by the prefix
8. Accessed key space separation by the prefix
9. QPS of each operation and each column family
10. Top K QPS and their accessed prefix range

**Test:**
1. Added the unit test of analyzing Get, Put, Delete, SingleDelete, DeleteRange, Merge
2. Generated the trace and analyze the trace

**Implemented but not tested (due to the limitation of trace_replay):**
1. Analyzing Iterator, supporting Seek() and SeekForPrev() analyzing
2. Analyzing the number of Key found by Get

**Future Work:**
1.  Support execution time analyzing of each requests
2.  Support cache hit situation and block read situation of Get
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4091

Differential Revision: D9256157

Pulled By: zhichao-cao

fbshipit-source-id: f0ceacb7eedbc43a3eee6e85b76087d7832a8fe6
2018-08-13 11:44:02 -07:00
Yanqin Jin
bdc6abd0b4 Enable cscope to exclude test source files (#4190)
Summary:
Usually when using cscope, the query results contain a lot of function calls in test, making it hard to browse. So this PR aims to provide an option to exclude test source files.

Add a new PHONY target, tags0, to exclude test source files while using cscope.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4190

Differential Revision: D9015901

Pulled By: riversand963

fbshipit-source-id: ea9a45756ccff5b26344d37e9ff1c02c5d9736d6
2018-07-26 11:12:29 -07:00
Fenggang Wu
8805ec2f49 DataBlockHashIndex: Standalone Implementation with Unit Test (#4139)
Summary:
The first step of the `DataBlockHashIndex` implementation. A string based hash table is implemented and unit-tested.

`DataBlockHashIndexBuilder`: `Add()` takes pairs of `<key, restart_index>`, and formats it into a string when `Finish()` is called.
`DataBlockHashIndex`: initialized by the formatted string, and can interpret it as a hash table. Lookup for a key is supported by iterator operation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4139

Reviewed By: sagar0

Differential Revision: D8866764

Pulled By: fgwu

fbshipit-source-id: 7f015f0098632c65979a22898a50424384730b10
2018-07-24 11:43:37 -07:00
Adam Retter
c6d2a7f821 Build improvements: Split docker targets and parallelize java builds
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/4165

Differential Revision: D8955531

Pulled By: sagar0

fbshipit-source-id: 97d5a1375e200bde3c6414f94703504a4ed7536a
2018-07-23 13:28:37 -07:00
Maysam Yabandeh
537a233941 Exclude StackableDB from transaction stress tests (#4132)
Summary:
The transactions are currently tested with and without using StackableDB. This is mostly to check that the code path is consistent with stackable db as well. Slow, stress tests however do not benefit from being run again with StackableDB. The patch excludes StackableDB from such tests.
On a single core it reduced the runtime of transaction_test from 199s to 135s.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4132

Differential Revision: D8841655

Pulled By: maysamyabandeh

fbshipit-source-id: 7b9aaba2673b542b195439dfb306cef26bd63b19
2018-07-13 13:59:11 -07:00
Anand Ananthabhotla
52d4c9b7f6 Allow DB resume after background errors (#3997)
Summary:
Currently, if RocksDB encounters errors during a write operation (user requested or BG operations), it sets DBImpl::bg_error_ and fails subsequent writes. This PR allows the DB to be resumed for certain classes of errors. It consists of 3 parts -
1. Introduce Status::Severity in rocksdb::Status to indicate whether a given error can be recovered from or not
2. Refactor the error handling code so that setting bg_error_ and deciding on severity is in one place
3. Provide an API for the user to clear the error and resume the DB instance

This whole change is broken up into multiple PRs. Initially, we only allow clearing the error for Status::NoSpace() errors during background flush/compaction. Subsequent PRs will expand this to include more errors and foreground operations such as Put(), and implement a polling mechanism for out-of-space errors.
Closes https://github.com/facebook/rocksdb/pull/3997

Differential Revision: D8653831

Pulled By: anand1976

fbshipit-source-id: 6dc835c76122443a7668497c0226b4f072bc6afd
2018-06-28 12:34:40 -07:00
Daniel Black
346d1069c3 Align StatisticsImpl / StatisticsData (#4036)
Summary:
Pinned the alignment of StatisticsData to the cacheline size rather than just extending its size (which could go over two cache lines)if unaligned in allocation.

Avoid compile errors in the process as per individual commit messages.

strengthen static_assert to CACHELINE rather than the highest common multiple.
Closes https://github.com/facebook/rocksdb/pull/4036

Differential Revision: D8582844

Pulled By: yiwu-arbug

fbshipit-source-id: 363c37029f28e6093e06c60b987bca9aa204bc71
2018-06-25 22:58:19 -07:00
Adam Retter
64c85d0d97 Set DEBUG_LEVEL=0 for RocksJava Mac Release (#4040)
Summary:
Closes https://github.com/facebook/rocksdb/issues/2717
Closes https://github.com/facebook/rocksdb/pull/4040

Differential Revision: D8592058

Pulled By: sagar0

fbshipit-source-id: d01099a1067aa32659abb0b4bed641d919a3927e
2018-06-22 10:57:48 -07:00
Manuel Ung
01e3c30def Extend existing unit tests to run with WriteUnprepared as well
Summary:
As titled.

I have not extended the Compatibility tests because the new WAL markers are still unimplemented.
Closes https://github.com/facebook/rocksdb/pull/3941

Differential Revision: D8238394

Pulled By: lth

fbshipit-source-id: 980e3d44837bbf2cfa64047f9738f559dfac4b1d
2018-06-01 14:58:41 -07:00
Mike Kolupaev
8bf555f487 Change and clarify the relationship between Valid(), status() and Seek*() for all iterators. Also fix some bugs
Summary:
Before this PR, Iterator/InternalIterator may simultaneously have non-ok status() and Valid() = true. That state means that the last operation failed, but the iterator is nevertheless positioned on some unspecified record. Likely intended uses of that are:
 * If some sst files are corrupted, a normal iterator can be used to read the data from files that are not corrupted.
 * When using read_tier = kBlockCacheTier, read the data that's in block cache, skipping over the data that is not.

However, this behavior wasn't documented well (and until recently the wiki on github had misleading incorrect information). In the code there's a lot of confusion about the relationship between status() and Valid(), and about whether Seek()/SeekToLast()/etc reset the status or not. There were a number of bugs caused by this confusion, both inside rocksdb and in the code that uses rocksdb (including ours).

This PR changes the convention to:
 * If status() is not ok, Valid() always returns false.
 * Any seek operation resets status. (Before the PR, it depended on iterator type and on particular error.)

This does sacrifice the two use cases listed above, but siying said it's ok.

Overview of the changes:
 * A commit that adds missing status checks in MergingIterator. This fixes a bug that actually affects us, and we need it fixed. `DBIteratorTest.NonBlockingIterationBugRepro` explains the scenario.
 * Changes to lots of iterator types to make all of them conform to the new convention. Some bug fixes along the way. By far the biggest changes are in DBIter, which is a big messy piece of code; I tried to make it less big and messy but mostly failed.
 * A stress-test for DBIter, to gain some confidence that I didn't break it. It does a few million random operations on the iterator, while occasionally modifying the underlying data (like ForwardIterator does) and occasionally returning non-ok status from internal iterator.

To find the iterator types that needed changes I searched for "public .*Iterator" in the code. Here's an overview of all 27 iterator types:

Iterators that didn't need changes:
 * status() is always ok(), or Valid() is always false: MemTableIterator, ModelIter, TestIterator, KVIter (2 classes with this name anonymous namespaces), LoggingForwardVectorIterator, VectorIterator, MockTableIterator, EmptyIterator, EmptyInternalIterator.
 * Thin wrappers that always pass through Valid() and status(): ArenaWrappedDBIter, TtlIterator, InternalIteratorFromIterator.

Iterators with changes (see inline comments for details):
 * DBIter - an overhaul:
    - It used to silently skip corrupted keys (`FindParseableKey()`), which seems dangerous. This PR makes it just stop immediately after encountering a corrupted key, just like it would for other kinds of corruption. Let me know if there was actually some deeper meaning in this behavior and I should put it back.
    - It had a few code paths silently discarding subiterator's status. The stress test caught a few.
    - The backwards iteration code path was expecting the internal iterator's set of keys to be immutable. It's probably always true in practice at the moment, since ForwardIterator doesn't support backwards iteration, but this PR fixes it anyway. See added DBIteratorTest.ReverseToForwardBug for an example.
    - Some parts of backwards iteration code path even did things like `assert(iter_->Valid())` after a seek, which is never a safe assumption.
    - It used to not reset status on seek for some types of errors.
    - Some simplifications and better comments.
    - Some things got more complicated from the added error handling. I'm open to ideas for how to make it nicer.
 * MergingIterator - check status after every operation on every subiterator, and in some places assert that valid subiterators have ok status.
 * ForwardIterator - changed to the new convention, also slightly simplified.
 * ForwardLevelIterator - fixed some bugs and simplified.
 * LevelIterator - simplified.
 * TwoLevelIterator - changed to the new convention. Also fixed a bug that would make SeekForPrev() sometimes silently ignore errors from first_level_iter_.
 * BlockBasedTableIterator - minor changes.
 * BlockIter - replaced `SetStatus()` with `Invalidate()` to make sure non-ok BlockIter is always invalid.
 * PlainTableIterator - some seeks used to not reset status.
 * CuckooTableIterator - tiny code cleanup.
 * ManagedIterator - fixed some bugs.
 * BaseDeltaIterator - changed to the new convention and fixed a bug.
 * BlobDBIterator - seeks used to not reset status.
 * KeyConvertingIterator - some small change.
Closes https://github.com/facebook/rocksdb/pull/3810

Differential Revision: D7888019

Pulled By: al13n321

fbshipit-source-id: 4aaf6d3421c545d16722a815b2fa2e7912bc851d
2018-05-17 02:56:56 -07:00
Andrew Kryczka
6fc1bccef5 Fix crash test allocation error under TSAN
Summary:
We were seeing the following error: "ThreadSanitizer: DenseSlabAllocator overflow. Dying."

It is fixable by mmap'ing a smaller region for keys' expected values, which this PR achieves by reducing the number of keys.
Closes https://github.com/facebook/rocksdb/pull/3803

Differential Revision: D7874478

Pulled By: ajkr

fbshipit-source-id: 433939f5cb92410ab4777d540cb0cc2ee0fe6c2e
2018-05-04 13:44:04 -07:00
David Lai
3be9b36453 comment unused parameters to turn on -Wunused-parameter flag
Summary:
This PR comments out the rest of the unused arguments which allow us to turn on the -Wunused-parameter flag. This is the second part of a codemod relating to https://github.com/facebook/rocksdb/pull/3557.
Closes https://github.com/facebook/rocksdb/pull/3662

Differential Revision: D7426121

Pulled By: Dayvedde

fbshipit-source-id: 223994923b42bd4953eb016a0129e47560f7e352
2018-04-12 17:59:16 -07:00
zhsj
6571770030 fix shared libary compile on ppc
Summary:
shared-ppc-objects is missed in $(SHARED4) target
Closes https://github.com/facebook/rocksdb/pull/3619

Differential Revision: D7475767

Pulled By: ajkr

fbshipit-source-id: d957ac7290bab3cd542af504405fb5ff912bfbf1
2018-04-05 19:58:20 -07:00
Adam Retter
3cb591954e Allow rocksdbjavastatic to also be built as debug build
Summary: Closes https://github.com/facebook/rocksdb/pull/3654

Differential Revision: D7417948

Pulled By: sagar0

fbshipit-source-id: 9514df9328181e54a6384764444c0c7ce66e7f5f
2018-03-28 16:30:36 -07:00
Yanqin Jin
1f5def1653 Fix race condition causing double deletion of ssts
Summary:
Possible interleaved execution of background compaction thread calling `FindObsoleteFiles (no full scan) / PurgeObsoleteFiles` and user thread calling `FindObsoleteFiles (full scan) / PurgeObsoleteFiles` can lead to race condition on which RocksDB attempts to delete a file twice. The second attempt will fail and return `IO error`. This may occur to other files,  but this PR targets sst.
Also add a unit test to verify that this PR fixes the issue.

The newly added unit test `obsolete_files_test` has a test case for this scenario, implemented in `ObsoleteFilesTest#RaceForObsoleteFileDeletion`. `TestSyncPoint`s are used to coordinate the interleaving the `user_thread` and background compaction thread. They execute as follows
```
timeline              user_thread                background_compaction thread
t1   |                                          FindObsoleteFiles(full_scan=false)
t2   |     FindObsoleteFiles(full_scan=true)
t3   |                                          PurgeObsoleteFiles
t4   |     PurgeObsoleteFiles
     V
```
When `user_thread` invokes `FindObsoleteFiles` with full scan, it collects ALL files in RocksDB directory, including the ones that background compaction thread have collected in its job context. Then `user_thread` will see an IO error when trying to delete these files in `PurgeObsoleteFiles` because background compaction thread has already deleted the file in `PurgeObsoleteFiles`.
To fix this, we make RocksDB remember which (SST) files have been found by threads after calling `FindObsoleteFiles` (see `DBImpl#files_grabbed_for_purge_`). Therefore, when another thread calls `FindObsoleteFiles` with full scan, it will not collect such files.

ajkr could you take a look and comment? Thanks!
Closes https://github.com/facebook/rocksdb/pull/3638

Differential Revision: D7384372

Pulled By: riversand963

fbshipit-source-id: 01489516d60012e722ee65a80e1449e589ce26d3
2018-03-28 10:29:59 -07:00
Sagar Vemuri
23f9d93f47 Exclude MySQLStyleTransactionTest.TransactionStressTest* from valgrind
Summary:
I found that each instance of MySQLStyleTransactionTest.TransactionStressTest/x is taking more than 10 hours to complete on our continuous testing environment, causing the whole valgrind run to timeout after a day. So excluding these tests.
Closes https://github.com/facebook/rocksdb/pull/3652

Differential Revision: D7400332

Pulled By: sagar0

fbshipit-source-id: 987810574506d01487adf7c2de84d4817ec3d22d
2018-03-26 10:27:47 -07:00
Tobias Tschinkowitz
ccb761364d Enable compilation on OpenBSD
Summary:
I modified the Makefile so that we can compile rocksdb on OpenBSD.
The instructions for building have been added to INSTALL.md.
The whole compilation process works fine like this on OpenBSD-current
Closes https://github.com/facebook/rocksdb/pull/3617

Differential Revision: D7323754

Pulled By: siying

fbshipit-source-id: 990037d1cc69138d22f85bd77ef4dc8c1ba9edea
2018-03-19 12:30:05 -07:00
Yanqin Jin
1139422dfb Fix the command used to generate ctags
Summary:
In original $ROCKSDB_HOME/Makefile, the command used to generate ctags is
```
ctags * -R
```
However, this failed to generate tags for me.
I did some search on the usage of ctags command and found that it should be
```
ctags -R .
```
or
```
ctags -R *
```
After the change, I can find the tags in vim using `:ts <identifier>`.
Closes https://github.com/facebook/rocksdb/pull/3626

Reviewed By: ajkr

Differential Revision: D7320217

Pulled By: riversand963

fbshipit-source-id: e4cd8f8a67842370a2343f0213df3cbd07754111
2018-03-18 22:43:18 -07:00
Zhongyi Xie
09e5d7af8c add 4th test_group in travis
Summary:
to overcome the space limitation
Closes https://github.com/facebook/rocksdb/pull/3605

Differential Revision: D7262607

Pulled By: miasantreble

fbshipit-source-id: 1b1148026f17a7ee4b9f3a17ddc6b4ba9cf7af7f
2018-03-13 18:57:29 -07:00
Pengchao Wang
1ccdc2c337 Fix vagrant build process
Summary:
https://blog.github.com/2018-02-23-weak-cryptographic-standards-removed/
Github dropped supporting some weak cryptographic protocols from their website couple of weeks ago which cause our vagrant build process to fail on curl downloading step.  This diff force curl use tls v1.2 protocol if it is supported so that it does not rely on the default protocol on different systems.
Closes https://github.com/facebook/rocksdb/pull/3561

Differential Revision: D7148575

Pulled By: wpc

fbshipit-source-id: b8cecfdfeb2bc8236de2d0d14f044532befec98c
2018-03-05 11:57:41 -08:00
Siying Dong
74748611a8 Suppress UBSAN error in finer guanularity
Summary:
Now we suppress alignment UBSAN error as a whole. Suppressing 3-way CRC and murmurhash feels a better idea than turning off alignment check as a whole.
Closes https://github.com/facebook/rocksdb/pull/3495

Differential Revision: D6971273

Pulled By: siying

fbshipit-source-id: 080b59fed6df494b9f622ef7cb5d42d39e6a8cdf
2018-02-13 12:18:07 -08:00
Siying Dong
821e0b1683 Disable options_settable_test in UBSAN and fix UBSAN failure in blob_…
Summary:
…db_test

options_settable_test won't pass UBSAN so disable it.
blob_db_test fails in UBSAN as SnapshotList doesn't initialize all the fields in dummy snapshot. Fix it. I don't understand why only blob_db_test fails though.
Closes https://github.com/facebook/rocksdb/pull/3477

Differential Revision: D6928681

Pulled By: siying

fbshipit-source-id: e31dd300fcdecdfd4f6af279a0987fd0cdec5122
2018-02-07 14:42:26 -08:00
Siying Dong
1336a7742d Disable alignment check in UBSAN
Summary:
Disable alignment check in UBSAN for now. Now we can't get signals to meaningful failures. We can reenable it after we figure out how we can suppress failures in finer grain manner.
Closes https://github.com/facebook/rocksdb/pull/3473

Differential Revision: D6925971

Pulled By: siying

fbshipit-source-id: a0f1a242cde866abbc5c1eeee9ff8d1d7d582ac4
2018-02-07 10:58:01 -08:00
Zhongyi Xie
5eccf0b9d5 add -fno-sanitize-recover option to force exit on errors
Summary:
By default if ubsan detects any problem, it outputs a “runtime error:” message, and in most cases continues executing the program.
In order to make test abort on errors, option `-fno-sanitize-recover` is needed. [link](https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html)
Closes https://github.com/facebook/rocksdb/pull/3447

Differential Revision: D6854654

Pulled By: miasantreble

fbshipit-source-id: c48e892b0b38307029df38a67adda0e24257e481
2018-01-31 12:13:00 -08:00
Sagar Vemuri
68829ed89c Revert Snappy version upgrade
Summary:
Java static builds are again broken, this time due Snappy version upgrade introduced in 90c1d81975 (#3331).

This is due to two reasons:
1. The new Snappy packages should now be downloaded from https://github.com/google/snappy/archive/<pkg.tar.gz> instead of https://github.com/google/snappy/releases/download/<pkg.tar.gz> which we are using now.
1. In addition to the the above URL change, Snappy changed its build from using autotools to CMake based : e69d9f8806/README.md (L65-L72)

So more changes are needed if we are going to upgrade to 1.1.7. Hence reverting the version upgrade until we figure them out.
Closes https://github.com/facebook/rocksdb/pull/3363

Differential Revision: D6716983

Pulled By: sagar0

fbshipit-source-id: f451a1bc5eb0bb090f4da07bc3e5ba72cf89aefa
2018-01-12 23:41:43 -08:00
Sagar Vemuri
e446d14093 Fix PowerPC dynamic java build
Summary:
Java build on PPC64le has been broken since a few months, due to #2716. Fixing it with the least amount of changes.
(We should cleanup a little around this code when time permits).

This should fix the build failures seen in http://140.211.168.68:8080/job/Rocksdb/ .
Closes https://github.com/facebook/rocksdb/pull/3359

Differential Revision: D6712938

Pulled By: sagar0

fbshipit-source-id: 3046e8f072180693de2af4762934ec1ace309ca4
2018-01-12 10:57:14 -08:00
Adam Retter
a53c571d2d FreeBSD build support for RocksDB and RocksJava
Summary:
Tested on a clean FreeBSD 11.01 x64.

Closes https://github.com/facebook/rocksdb/pull/1423
Closes https://github.com/facebook/rocksdb/pull/3357

Differential Revision: D6705868

Pulled By: sagar0

fbshipit-source-id: cbccbbdafd4f42922512ca03619a5d5583a425fd
2018-01-11 13:29:55 -08:00
Sagar Vemuri
3e955fad09 Fix zstd/zdict include path for java static build
Summary:
With the ZSTD dictionary generator support added in #3057
`PORTABLE=1 ROCKSDB_NO_FBCODE=1 make rocksdbjavastatic` fails as it can't find zdict.h. Specifically due to:
e3a06f12d2/util/compression.h (L39)
In java static builds zstd code gets directly downloaded from https://github.com/facebook/zstd , and in there zdict.h is under dictBuilder directory. So, I modified libzstd.a target to use `make install` to collect all the header files into a single location and used that as the zstd's include path.
Closes https://github.com/facebook/rocksdb/pull/3260

Differential Revision: D6669850

Pulled By: sagar0

fbshipit-source-id: f8a7562a670e5aed4c4fb6034a921697590d7285
2018-01-05 15:41:46 -08:00
Adam Retter
90c1d81975 Update javastatic dependencies
Summary:
1. Snappy 1.1.7
2. LZ4 1.8.0
3. ZSTD  1.3.3
Closes https://github.com/facebook/rocksdb/pull/3331

Differential Revision: D6667933

Pulled By: ajkr

fbshipit-source-id: 21c526609df7580481195a389d31f733e2695e65
2018-01-05 12:11:44 -08:00
Andrew Kryczka
ea8ccd2267 fix powerpc java static build
Summary:
added support for C and asm files as required for e612e31740.
Closes https://github.com/facebook/rocksdb/pull/3299

Differential Revision: D6612479

Pulled By: ajkr

fbshipit-source-id: 6263ed7c1602f249460421825c76b5721f396163
2018-01-03 12:41:37 -08:00
yingsu00
f54d7f5fea Port 3 way SSE4.2 crc32c implementation from Folly
Summary:
**# Summary**

RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain.

This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used.

**# Performance Test Results**
We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm.

Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s.

1) ReadRandom in db_bench overall metrics

    PER RUN
    Algorithm | run | micros/op | ops/sec |Throughput (MB/s)
    3-way      |  1   | 4.143   | 241387 | 26.7
    3-way      |  2   | 3.775   | 264872 | 29.3
    3-way      | 3    | 4.116   | 242929 | 26.9
    FastCrc32c|1  | 4.037   | 247727 | 27.4
    FastCrc32c|2  | 4.648   | 215166 | 23.8
    FastCrc32c|3  | 4.352   | 229799 | 25.4

     AVG
    Algorithm     |    Average of micros/op |   Average of ops/sec |    Average of Throughput (MB/s)
    3-way           |     4.01                               |      249,729                 |      27.63
    FastCrc32c  |     4.35                              |     230,897                  |      25.53

 2)   Crc32c computation CPU cost (inclusive samples percentage)
    PER RUN
    Implementation | run |  TotalSamples   | Crc32c percentage
    3-way                 |  1    |  4,572,250,000 | 4.37%
    3-way                 |  2    |  3,779,250,000 | 4.62%
    3-way                 |  3    |  4,129,500,000 | 4.48%
    FastCrc32c       |  1    |  4,663,500,000 | 11.24%
    FastCrc32c       |  2    |  4,047,500,000 | 12.34%
    FastCrc32c       |  3    |  4,366,750,000 | 11.68%

 **# Test Plan**
     make -j64 corruption_test && ./corruption_test
      By default it uses 3-way SSE algorithm

     NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test

    make clean && DEBUG_LEVEL=0 make -j64 db_bench
    make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench
Closes https://github.com/facebook/rocksdb/pull/3173

Differential Revision: D6330882

Pulled By: yingsu00

fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
2017-12-19 18:26:49 -08:00
Maysam Yabandeh
0faa026db6 WritePrepared Txn: make buck tests parallel
Summary:
The TSAN version of tests could take quite long. Make the buck tests parallel to avoid timeouts.
Closes https://github.com/facebook/rocksdb/pull/3280

Differential Revision: D6581594

Pulled By: maysamyabandeh

fbshipit-source-id: 3f8476d8c69f0183e394fa8a2089dd8d4e90c90c
2017-12-18 14:42:09 -08:00
Yi Wu
bbcd3b0bd2 Suppress valgrind "unimplemented functionality" error
Summary:
Add ROCKSDB_VALGRIND_RUN macro and suppress false-positive "unimplemented functionality" throw by valgrind for steam hints.

Another approach would be add a valgrind suppress file. Valgrind is suppose to print the suppression when given "--gen-suppressions=all" param, which is suppose to be the content for the suppression file. But it doesn't print.
Closes https://github.com/facebook/rocksdb/pull/3174

Differential Revision: D6338786

Pulled By: yiwu-arbug

fbshipit-source-id: 3559efa5f3b92d40d09ad6ac82bc7b59f86c75aa
2017-11-15 14:28:34 -08:00
Yi Wu
42564ada53 Blob DB: not using PinnableSlice move assignment
Summary:
The current implementation of PinnableSlice move assignment have an issue #3163. We are moving away from it instead of try to get the move assignment right, since it is too tricky.
Closes https://github.com/facebook/rocksdb/pull/3164

Differential Revision: D6319201

Pulled By: yiwu-arbug

fbshipit-source-id: 8f3279021f3710da4a4caa14fd238ed2df902c48
2017-11-13 18:12:20 -08:00
Yi Wu
31d3e41810 PinnableSlice move assignment
Summary:
Allow `std::move(pinnable_slice)`.
Closes https://github.com/facebook/rocksdb/pull/2997

Differential Revision: D6036782

Pulled By: yiwu-arbug

fbshipit-source-id: 583fb0419a97e437ff530f4305822341cd3381fa
2017-10-12 18:28:24 -07:00
Andrew Kryczka
5a6ad9d52a release build treat warnings as errors
Summary:
fixing warnings is important, especially for release code.
Closes https://github.com/facebook/rocksdb/pull/2971

Differential Revision: D5980596

Pulled By: ajkr

fbshipit-source-id: 04f4ea3fb005dcda33d60342e4361e380bc4dfb1
2017-10-05 12:41:52 -07:00
Andrew Kryczka
1026e794a3 rate limit auto-tuning
Summary:
Dynamic adjustment of rate limit according to demand for background I/O. It increases by a factor when limiter is drained too frequently, and decreases by the same factor when limiter is not drained frequently enough. The parameters for this behavior are fixed in `GenericRateLimiter::Tune`. Other changes:

- make rate limiter's `Env*` configurable for testing
- track num drain intervals in RateLimiter so we don't have to rely on stats, which may be shared across different DB instances from the ones that share the RateLimiter.
Closes https://github.com/facebook/rocksdb/pull/2899

Differential Revision: D5858704

Pulled By: ajkr

fbshipit-source-id: cc2bac30f85e7f6fd63655d0a6732ef9ed7403b1
2017-10-04 19:15:01 -07:00
Yi Wu
92ccae7123 speedup 'make check'
Summary:
Make SnapshotConcurrentAccessTest run in the beginning of the queue.

Test Plan
`make all check -j64` on devserver
Closes https://github.com/facebook/rocksdb/pull/2962

Differential Revision: D5965871

Pulled By: yiwu-arbug

fbshipit-source-id: 8cb5a47c2468be0fbbb929226a143ec5848bfaa9
2017-10-03 12:11:49 -07:00
Yi Wu
d1cab2b64e Add ValueType::kTypeBlobIndex
Summary:
Add kTypeBlobIndex value type, which will be used by blob db only, to insert a (key, blob_offset) KV pair. The purpose is to
1. Make it possible to open existing rocksdb instance as blob db. Existing value will be of kTypeIndex type, while value inserted by blob db will be of kTypeBlobIndex.
2. Make rocksdb able to detect if the db contains value written by blob db, if so return error.
3. Make it possible to have blob db optionally store value in SST file (with kTypeValue type) or as a blob value (with kTypeBlobIndex type).

The root db (DBImpl) basically pretended kTypeBlobIndex are normal value on write. On Get if is_blob is provided, return whether the value read is of kTypeBlobIndex type, or return Status::NotSupported() status if is_blob is not provided. On scan allow_blob flag is pass and if the flag is true, return wether the value is of kTypeBlobIndex type via iter->IsBlob().

Changes on blob db side will be in a separate patch.
Closes https://github.com/facebook/rocksdb/pull/2886

Differential Revision: D5838431

Pulled By: yiwu-arbug

fbshipit-source-id: 3c5306c62bc13bb11abc03422ec5cbcea1203cca
2017-10-03 09:11:23 -07:00
Adam Retter
983028f097 RocksJava build target for Docker on ppc64le
Summary:
This enables us to crossbuild pcc64le RocksJava binaries with a suitably old version of glibc (2.17) on CentOS 7.
Closes https://github.com/facebook/rocksdb/pull/2491

Differential Revision: D5955301

Pulled By: sagar0

fbshipit-source-id: 69ef9746f1dc30ffde4063dc764583d8c7ae937e
2017-10-02 11:11:56 -07:00
Kamalalochana Subbaiah
e612e31740 Updated CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2716

Differential Revision: D5606836

Pulled By: siying

fbshipit-source-id: 720262453b1546e5fdbbc668eff56848164113f3
2017-08-31 14:16:30 -07:00
Maysam Yabandeh
26ac24f199 Add more unit test to write_prepared txns
Summary: Closes https://github.com/facebook/rocksdb/pull/2798

Differential Revision: D5724173

Pulled By: maysamyabandeh

fbshipit-source-id: fb6b782d933fb4be315b1a231a6a67a66fdc9c96
2017-08-31 09:41:27 -07:00
yiwu-arbug
e367774d19 Overload new[] to properly align LRUCacheShard
Summary:
Also verify it fixes gcc7 compile failure #2672 (see also #2699)
Closes https://github.com/facebook/rocksdb/pull/2732

Differential Revision: D5620348

Pulled By: yiwu-arbug

fbshipit-source-id: 87db657ab734f23b1bfaaa9db9b9956d10eaef59
2017-08-14 14:41:56 -07:00
yiwu-arbug
ad77ee0ea0 Revert "Makefile: correct faligned-new test"
Summary:
This reverting #2699 to fix clang build.
Closes https://github.com/facebook/rocksdb/pull/2723

Differential Revision: D5610207

Pulled By: yiwu-arbug

fbshipit-source-id: 6857f4556d6d18f17b74cf81fa936d1dc0bd364c
2017-08-10 21:14:46 -07:00
Siying Dong
b87ee6f773 Use more keys per lock in daily TSAN crash test
Summary:
TSAN shows error when we grab too many locks at the same time. In TSAN crash test, make one shard key cover 2^22 keys so that no many keys will be hold at the same time.
Closes https://github.com/facebook/rocksdb/pull/2719

Differential Revision: D5609035

Pulled By: siying

fbshipit-source-id: 930e5d63fff92dbc193dc154c4c615efbdf06c6a
2017-08-10 17:56:57 -07:00
Daniel Black
1fbad84b69 Makefile: correct faligned-new test
Summary:
Commit 4f81ab38bf has the test wrong.

clang doesn't support a -dumpversion option. By lucky coincidence
clang/gcc --version both place a version number at the same output location
when --verison is passed.

Example output (1st line only).

    $ clang --version
    clang version 3.9.1 (tags/RELEASE_391/final)

    $ gcc --version
    gcc (GCC) 6.4.1 20170727 (Red Hat 6.4.1-1)

During the test of the compiler we ensure that a minimum version is met
as Makefile doesn't support patterns.

Also xcode9 doesn't seem affected by https://github.com/facebook/rocksdb/issues/2672
and also doesn't have "clang" as the first part of its output so the
fix implemented here also is Apple clang friendly.

    $ clang --version
    Apple LLVM version 9.0.0 (clang-900.0.31)

Signed-off-by: Daniel Black <daniel.black@au.ibm.com>
Closes https://github.com/facebook/rocksdb/pull/2699

Differential Revision: D5600818

Pulled By: yiwu-arbug

fbshipit-source-id: 3b0f2751becb53c1c35468bf29f3f828e7cf2c2a
2017-08-09 22:42:03 -07:00
Maysam Yabandeh
627c9f1abb Don't add -ljemalloc when DISABLE_JEMALLOC is set
Summary:
fixes #2555
Closes https://github.com/facebook/rocksdb/pull/2684

Differential Revision: D5560527

Pulled By: maysamyabandeh

fbshipit-source-id: 6e1d874ae0b4e699a77203d9d52d0bb8f59013b0
2017-08-04 10:42:32 -07:00
Cholerae Hu
4f81ab38bf Makefile: fix for GCC 7+ and clang 4+
Summary:
maysamyabandeh IslamAbdelRahman PTAL

Fix https://github.com/facebook/rocksdb/issues/2672

Signed-off-by: Cholerae Hu <huyingqian@pingcap.com>
Closes https://github.com/facebook/rocksdb/pull/2681

Differential Revision: D5561515

Pulled By: ajkr

fbshipit-source-id: 676187802ebd8a87a6c051bb565818a1bf89d0a9
2017-08-03 20:58:46 -07:00
Siying Dong
21696ba502 Replace dynamic_cast<>
Summary:
Replace dynamic_cast<> so that users can choose to build with RTTI off, so that they can save several bytes per object, and get tiny more memory available.
Some nontrivial changes:
1. Add Comparator::GetRootComparator() to get around the internal comparator hack
2. Add the two experiemental functions to DB
3. Add TableFactory::GetOptionString() to avoid unnecessary casting to get the option string
4. Since 3 is done, move the parsing option functions for table factory to table factory files too, to be symmetric.
Closes https://github.com/facebook/rocksdb/pull/2645

Differential Revision: D5502723

Pulled By: siying

fbshipit-source-id: fd13cec5601cf68a554d87bfcf056f2ffa5fbf7c
2017-07-28 16:27:16 -07:00
Siying Dong
fca4d6da17 Build fewer tests in Travis platform_dependent tests
Summary:
platform_dependent tests in Travis now builds all tests, which is not needed. Only build those tests we need to run.
Closes https://github.com/facebook/rocksdb/pull/2647

Differential Revision: D5513954

Pulled By: siying

fbshipit-source-id: 4d540b146124e70dd25586c47939d19f93655b0a
2017-07-27 17:29:01 -07:00
Siying Dong
c281b44829 Revert "CRC32 Power Optimization Changes"
Summary:
This reverts commit 2289d38115.
Closes https://github.com/facebook/rocksdb/pull/2652

Differential Revision: D5506163

Pulled By: siying

fbshipit-source-id: 105e31dd9d99090453a6b9f32c165206cd3affa3
2017-07-26 19:31:36 -07:00
Kamalalochana Subbaiah
2289d38115 CRC32 Power Optimization Changes
Summary:
Support for PowerPC Architecture
Detecting AltiVec Support
Closes https://github.com/facebook/rocksdb/pull/2353

Differential Revision: D5210948

Pulled By: siying

fbshipit-source-id: 859a8c063d37697addd89ba2b8a14e5efd5d24bf
2017-07-26 09:42:29 -07:00
Pengchao Wang
534c255c7a Cassandra compaction filter for purge expired columns and rows
Summary:
Major changes in this PR:
* Implement CassandraCompactionFilter to remove expired columns and rows (if all column expired)
* Move cassandra related code from utilities/merge_operators/cassandra to utilities/cassandra/*
* Switch to use shared_ptr<> from uniqu_ptr for Column membership management in RowValue. Since columns do have multiple owners in Merge and GC process, use shared_ptr helps make RowValue immutable.
* Rename cassandra_merge_test to cassandra_functional_test and add two TTL compaction related tests there.
Closes https://github.com/facebook/rocksdb/pull/2588

Differential Revision: D5430010

Pulled By: wpc

fbshipit-source-id: 9566c21e06de17491d486a68c70f52d501f27687
2017-07-21 14:57:44 -07:00
Giuseppe Ottaviano
8f927e5f75 Fix undefined behavior in Hash
Summary:
Instead of ignoring UBSan checks, fix the negative shifts in
Hash(). Also add test to make sure the hash values are stable over
time. The values were computed before this change, so the test also
verifies the correctness of the change.
Closes https://github.com/facebook/rocksdb/pull/2546

Differential Revision: D5386369

Pulled By: yiwu-arbug

fbshipit-source-id: 6de4b44461a544d6222cc5d72d8cda2c0373d17e
2017-07-10 12:29:24 -07:00
Yi Wu
982cec22af Fix TARGETS file tests list
Summary:
1. The buckifier script assume each test "foo" comes with a .cc file of the same name (i.e. foo.cc). Update cassandra tests to follow this pattern so that the buckifier script can recognize them.
2. add blob_db_test
Closes https://github.com/facebook/rocksdb/pull/2506

Differential Revision: D5331517

Pulled By: yiwu-arbug

fbshipit-source-id: 86f3eba471fc621186ab44cbd073b6162cde8e57
2017-06-27 14:12:02 -07:00
Ewout Prangsma
51778612c9 Encryption at rest support
Summary:
This PR adds support for encrypting data stored by RocksDB when written to disk.

It adds an `EncryptedEnv` override of the `Env` class with matching overrides for sequential&random access files.
The encryption itself is done through a configurable `EncryptionProvider`. This class creates is asked to create `BlockAccessCipherStream` for a file. This is where the actual encryption/decryption is being done.
Currently there is a Counter mode implementation of `BlockAccessCipherStream` with a `ROT13` block cipher (NOTE the `ROT13` is for demo purposes only!!).

The Counter operation mode uses an initial counter & random initialization vector (IV).
Both are created randomly for each file and stored in a 4K (default size) block that is prefixed to that file. The `EncryptedEnv` implementation is such that clients of the `Env` class do not see this prefix (nor data, nor in filesize).
The largest part of the prefix block is also encrypted, and there is room left for implementation specific settings/values/keys in there.

To test the encryption, the `DBTestBase` class has been extended to consider a new environment variable called `ENCRYPTED_ENV`. If set, the test will setup a encrypted instance of the `Env` class to use for all tests.
Typically you would run it like this:

```
ENCRYPTED_ENV=1 make check_some
```

There is also an added test that checks that some data inserted into the database is or is not "visible" on disk. With `ENCRYPTED_ENV` active it must not find plain text strings, with `ENCRYPTED_ENV` unset, it must find the plain text strings.
Closes https://github.com/facebook/rocksdb/pull/2424

Differential Revision: D5322178

Pulled By: sdwilsh

fbshipit-source-id: 253b0a9c2c498cc98f580df7f2623cbf7678a27f
2017-06-26 16:56:24 -07:00
Chen Shen
cbd825deea Create a MergeOperator for Cassandra Row Value
Summary:
This PR implements the MergeOperator for Cassandra Row Values.
Closes https://github.com/facebook/rocksdb/pull/2289

Differential Revision: D5055464

Pulled By: scv119

fbshipit-source-id: 45f276ef8cbc4704279202f6a20c64889bc1adef
2017-06-16 14:27:00 -07:00
Adam Retter
26a8a80711 Switch from CentOS 5 to CentOS 6 for crossbuilding RocksJava
Summary:
Updates the statically linked libraries from linking against glibc 2.5, to linking against glibc 2.12.
Closes https://github.com/facebook/rocksdb/pull/2405

Differential Revision: D5184132

Pulled By: sagar0

fbshipit-source-id: 7a8ad4cf7e737ca62f29e58938bd49fa02114541
2017-06-05 12:27:24 -07:00
Siying Dong
95b0e89b5d Improve write buffer manager (and allow the size to be tracked in block cache)
Summary:
Improve write buffer manager in several ways:
1. Size is tracked when arena block is allocated, rather than every allocation, so that it can better track actual memory usage and the tracking overhead is slightly lower.
2. We start to trigger memtable flush when 7/8 of the memory cap hits, instead of 100%, and make 100% much harder to hit.
3. Allow a cache object to be passed into buffer manager and the size allocated by memtable can be costed there. This can help users have one single memory cap across block cache and memtable.
Closes https://github.com/facebook/rocksdb/pull/2350

Differential Revision: D5110648

Pulled By: siying

fbshipit-source-id: b4238113094bf22574001e446b5d88523ba00017
2017-06-02 14:26:56 -07:00
Yi Wu
ad19eb8686 Fixing blob db sequence number handling
Summary:
Blob db rely on base db returning sequence number through write batch after DB::Write(). However after recent changes to the write path, DB::Writ()e no longer return sequence number in some cases. Fixing it by have WriteBatchInternal::InsertInto() always encode sequence number into write batch.

Stacking on #2375.
Closes https://github.com/facebook/rocksdb/pull/2385

Differential Revision: D5148358

Pulled By: yiwu-arbug

fbshipit-source-id: 8bda0aa07b9334ed03ed381548b39d167dc20c33
2017-05-31 10:56:45 -07:00
Yi Wu
345878a7fb update blob_db_test
Summary:
Re-enable blob_db_test with some update:
* Commented out delay at the end of GC tests. Will update the logic later with sync point to properly trigger GC.
* Added some helper functions.

Also update make files to include blob_dump tool.
Closes https://github.com/facebook/rocksdb/pull/2375

Differential Revision: D5133793

Pulled By: yiwu-arbug

fbshipit-source-id: 95470b26d0c1f9592ba4b7637e027fdd263f425c
2017-05-30 22:26:13 -07:00
Tamir Duberstein
103d0692ea Avoid unsupported attributes when not building with UBSAN
Summary:
yiwu-arbug see individual commits.
Closes https://github.com/facebook/rocksdb/pull/2318

Differential Revision: D5141520

Pulled By: yiwu-arbug

fbshipit-source-id: 7987c92ab4461eef36afce5a133d3a0ee0c96300
2017-05-30 11:13:01 -07:00
Sagar Vemuri
6c456ecae7 Clean zstd files
Summary:
zstd files are downloaded and used as part of JNI build, but are left behind even after doing a `make clean`. This PR updates the `clean` target to remove these zstd files as well.
Closes https://github.com/facebook/rocksdb/pull/2365

Differential Revision: D5123537

Pulled By: sagar0

fbshipit-source-id: a8f355da5ba961aa89d5852e35751ffc35de03ea
2017-05-26 09:56:13 -07:00
Yi Wu
578fb0b1dc Simple blob file dumper
Summary:
A simple blob file dumper.
Closes https://github.com/facebook/rocksdb/pull/2242

Differential Revision: D5097553

Pulled By: yiwu-arbug

fbshipit-source-id: c6e00d949fcd3658f9f68da9352f06339fac418d
2017-05-23 10:42:59 -07:00
Aaron Gao
9f839a7f62 keep util/build_version.cc when make clean
Summary:
https://github.com/facebook/rocksdb/pull/2264
adding build_version.cc into clean list which breaks fbcode release.
we need to keep it when `make clean`
Closes https://github.com/facebook/rocksdb/pull/2322

Differential Revision: D5088932

Pulled By: lightmark

fbshipit-source-id: ab001424af596e94a6bc1d4186c39edf6ace484f
2017-05-18 12:26:25 -07:00
Leonidas Galanis
7eecd40a49 add emacs tags file - etags
Summary:
added ctags -e to the tags target in the makefile. It creates an etags file suitable for emacs.
Closes https://github.com/facebook/rocksdb/pull/2193

Differential Revision: D4983535

Pulled By: siying

fbshipit-source-id: 1077ef0676025b8109df37433572533c9e8fe86e
2017-05-18 07:56:28 -07:00
Siying Dong
8032f4cb31 Remove -pie in TSAN
Summary:
-pic seems to be not working in gcc-5 and it is curently broken. Remove it to fix the build.
Closes https://github.com/facebook/rocksdb/pull/2320

Differential Revision: D5082775

Pulled By: siying

fbshipit-source-id: 5055f987353f1417643a394e7ce05905670410a4
2017-05-17 15:56:35 -07:00
Yi Wu
86d5492530 Fix build error with blob DB.
Summary:
snprintf is in <stdio.h> and not in namespace std.
Closes https://github.com/facebook/rocksdb/pull/2287

Reviewed By: anirbanr-fb

Differential Revision: D5054752

Pulled By: yiwu-arbug

fbshipit-source-id: 356807ec38f3c7d95951cdb41f31a3d3ae0714d4
2017-05-15 14:05:46 -07:00
Adam Retter
fa5a15ceb5 Make sure that zstd is statically linked correctly in the Java static build
Summary:
Closes https://github.com/facebook/rocksdb/issues/2280
Closes https://github.com/facebook/rocksdb/pull/2292

Differential Revision: D5061259

Pulled By: sagar0

fbshipit-source-id: eec89111d114c04beee5870a4eb4b51857754783
2017-05-15 11:12:08 -07:00
Adam Retter
a5cc7ecec4 Facility for cross-building RocksJava using Docker
Summary:
As an alternative to Vagrant, we can now also use Docker to cross-build RocksDB. The advantages are:

1. The Docker images are fixed; they include all the latest updates and build tools.
2. The Vagrant image, required scripts that ran for every build that would update CentOS and install the buildtools. This lead to slow repeatable builds, we don't need to do this with Docker as they are already in the provided images.

The Docker images I have used have their Docker build files here: https://github.com/evolvedbinary/docker-rocksjava and the images themselves are available from Docker hub: https://hub.docker.com/r/evolvedbinary/rocksjava/

I have added the following targets to the `Makefile`:
1. `rocksdbjavastaticreleasedocker` this uses Docker to perform the cross-builds. It is basically the Docker version of the existing Vagrant `rocksdbjavastaticrelease` target.
2. `rocksdbjavastaticpublishdocker` delegates to `rocksdbjavastaticreleasedocker` and then `rocksdbjavastaticpublishcentral` to upload the artiacts to Maven Central. Equivalent to the existing Vagrant target: `rocksdbjavastaticpublish`
Closes https://github.com/facebook/rocksdb/pull/2278

Differential Revision: D5048206

Pulled By: yiwu-arbug

fbshipit-source-id: 78fa96ef9d966fe09638ed01de282cd4e31961a9
2017-05-12 11:41:21 -07:00
Adam Retter
c2be434307 Build and link with ZStd when creating the static RocksJava build
Summary: Closes https://github.com/facebook/rocksdb/pull/2279

Differential Revision: D5048161

Pulled By: yiwu-arbug

fbshipit-source-id: 43742ff93137e0a35ea7e855692c9e9a0cd41968
2017-05-11 15:27:10 -07:00
Anirban Rahut
d85ff4953c Blob storage pr
Summary:
The final pull request for Blob Storage.
Closes https://github.com/facebook/rocksdb/pull/2269

Differential Revision: D5033189

Pulled By: yiwu-arbug

fbshipit-source-id: 6356b683ccd58cbf38a1dc55e2ea400feecd5d06
2017-05-10 15:14:44 -07:00
Lovro Puzar
0f559abdb7 Add NO_UPDATE_BUILD_VERSION option to makefile
Summary:
When building rocksdb in fbcode using `make`, util/build_version.cc is always updated (gitignore/hgignore doesn't apply because the file is already checked into fbcode).  To use the rocksdb makefile from our own makefile, I would like an option to prevent the metadata update, which is of no value for us.
Closes https://github.com/facebook/rocksdb/pull/2264

Differential Revision: D5037846

Pulled By: siying

fbshipit-source-id: 9fa005725c5ecb31d9cbe2e738cbee209591f08a
2017-05-10 11:27:40 -07:00
Andrew Kryczka
f6a27d0bce Extract statistics tests into separate file
Summary:
I'm going to add more DB tests for statistics as currently we have very few. I started a file dedicated to this purpose and moved the existing stats-specific tests there.
Closes https://github.com/facebook/rocksdb/pull/2211

Differential Revision: D4951558

Pulled By: ajkr

fbshipit-source-id: 05d11c35079c40ecabdfd2cf5556ccb761f694a4
2017-04-26 14:47:23 -07:00
Tomas Kolda
04d58970cb AIX and Solaris Sparc Support
Summary:
Replacement of #2147

The change was squashed due to a lot of conflicts.
Closes https://github.com/facebook/rocksdb/pull/2194

Differential Revision: D4929799

Pulled By: siying

fbshipit-source-id: 5cd49c254737a1d5ac13f3c035f128e86524c581
2017-04-21 20:48:04 -07:00
Jay Lee
e67f0adf3a enable O2 optimization for lz4
Summary: Closes https://github.com/facebook/rocksdb/pull/2164

Differential Revision: D4897389

Pulled By: yiwu-arbug

fbshipit-source-id: fac15374ae7fef1ece70fd2b9018f2451f3c2f7c
2017-04-16 11:47:17 -07:00
Tudor Bosman
7d5f5aa977 Separate compile and link for shared library
Summary:
Previously, the shared library (make shared_lib) was built with only one
compile line, compiling all .cc files and linking the shared library in
one step. That step would often take 10+ minutes on one machine, and
could not take advantage of multiple CPUs (it's only one invocation of
the compiler).

This commit changes the shared_lib build to compile .o files
individually (placing the resulting .o files in the directory
shared-objects) and then link them into the shared library at the end,
similarly to how the java static build (jls) does it.

Tested by making sure that both static and shared libraries work, and by
making sure that "make clean" cleans up the shared-objects directory.
Closes https://github.com/facebook/rocksdb/pull/2165

Differential Revision: D4897121

Pulled By: yiwu-arbug

fbshipit-source-id: 9811e043d1c01e10503593f3489d186c786ee7d7
2017-04-16 10:48:43 -07:00
Siying Dong
6257837d83 Add ROCKSDB_JAVA_NO_COMPRESSION flag
Summary:
In some CI test environment, compression libraries can't be successfully built. It still helps to build RocksDB there. Provide such an option to skip to download and build compression libraries.
Closes https://github.com/facebook/rocksdb/pull/2135

Differential Revision: D4872617

Pulled By: siying

fbshipit-source-id: bb21ac373bc62a2528cdf1ca4547e05fcae86214
2017-04-11 16:56:59 -07:00
Yi Wu
df6f5a3772 Move memtable related files into memtable directory
Summary:
Move memtable related files into memtable directory.
Closes https://github.com/facebook/rocksdb/pull/2087

Differential Revision: D4829242

Pulled By: yiwu-arbug

fbshipit-source-id: ca70ab6
2017-04-06 14:09:13 -07:00
Tamir Duberstein
107c5f6a60 CMake: more MinGW fixes
Summary:
siying this is a resubmission of #2081 with the 4th commit fixed. From that commit message:

> Note that the previous use of quotes in PLATFORM_{CC,CXX}FLAGS was
incorrect and caused GCC to produce the incorrect define:
>
>  #define ROCKSDB_JEMALLOC -DJEMALLOC_NO_DEMANGLE 1
>
> This was the cause of the Linux build failure on the previous version
of this change.

I've tested this locally, and the Linux build succeeds now.
Closes https://github.com/facebook/rocksdb/pull/2097

Differential Revision: D4839964

Pulled By: siying

fbshipit-source-id: cc51322
2017-04-06 14:09:13 -07:00
Siying Dong
d2dce5611a Move some files under util/ to separate dirs
Summary:
Move some files under util/ to new directories env/, monitoring/ options/ and cache/
Closes https://github.com/facebook/rocksdb/pull/2090

Differential Revision: D4833681

Pulled By: siying

fbshipit-source-id: 2fd8bef
2017-04-05 19:09:16 -07:00
Siying Dong
43010a929f Revert "[rocksdb][PR] CMake: more MinGW fixes"
fbshipit-source-id: 43b4529
2017-04-04 16:24:26 -07:00
Tamir Duberstein
3450ac8c1b CMake: more MinGW fixes
Summary:
See individual commits.

yuslepukhin siying
Closes https://github.com/facebook/rocksdb/pull/2081

Differential Revision: D4824639

Pulled By: IslamAbdelRahman

fbshipit-source-id: 2fc2b00
2017-04-04 15:09:17 -07:00
Andrew Kryczka
e2c6c06366 add TimedEnv
Summary:
I've needed Env timing measurements a few times now, so finally built something for it.
Closes https://github.com/facebook/rocksdb/pull/2073

Differential Revision: D4811231

Pulled By: ajkr

fbshipit-source-id: 218a249
2017-04-04 11:24:12 -07:00
Siying Dong
6ef8c620d3 Move auto_roll_logger and filename out of db/
Summary:
It is confusing to have auto_roll_logger to stay under db/, which has nothing to do with database. Move filename together as it is a dependency.
Closes https://github.com/facebook/rocksdb/pull/2080

Differential Revision: D4821141

Pulled By: siying

fbshipit-source-id: ca7d768
2017-04-03 18:39:14 -07:00
Ayappan
69c8d524a3 Fix jni library name for PowerPC Architecture
Summary:
Right now, building rocksdbjava in PowerPC is broken due to JNI library name. I figured it out that "uname -m" and java's os.arch matches in PowerPC architecture. I made use of this advantage to fix the issue. More info can found from this issue  --> https://github.com/facebook/rocksdb/issues/1317
Closes https://github.com/facebook/rocksdb/pull/2040

Differential Revision: D4779967

Pulled By: siying

fbshipit-source-id: 259f939
2017-03-27 14:09:11 -07:00
Maysam Yabandeh
8b0097b49b Readers for partition filter
Summary:
This is the last split of this pull request: https://github.com/facebook/rocksdb/pull/1891 which includes the reader part as well as the tests.
Closes https://github.com/facebook/rocksdb/pull/1961

Differential Revision: D4672216

Pulled By: maysamyabandeh

fbshipit-source-id: 6a2b829
2017-03-22 09:24:15 -07:00
Siying Dong
f8a4ea0206 Move db_test and external_sst_file_test out of Travis's MAC OS run
Summary:
After we have db_basic_test and external_sst_file_basic_test, we don't need to run db_test and external_sst_file_test in Travis's MAC OS run anymore. Move it out.
Closes https://github.com/facebook/rocksdb/pull/1940

Differential Revision: D4659361

Pulled By: siying

fbshipit-source-id: e64e291
2017-03-06 09:39:15 -08:00
Siying Dong
ba4c77bd6b Divide external_sst_file_test
Summary:
Separate the platform dependent tests from external_sst_file_test. Only those tests need to run on platforms like OSX
Closes https://github.com/facebook/rocksdb/pull/1923

Differential Revision: D4622461

Pulled By: siying

fbshipit-source-id: d2d6f04
2017-02-28 14:24:11 -08:00
Siying Dong
8ad0fcdf99 Separate small subset tests in DBTest
Summary:
Separate a smal subset of tests in DBTest to DBBasicTest. Tests in DBTest don't have to run in CI tests on platforms like OSX, as long as they are covered by Linux.
Closes https://github.com/facebook/rocksdb/pull/1924

Differential Revision: D4616702

Pulled By: siying

fbshipit-source-id: 13e6549
2017-02-27 12:24:11 -08:00
Siying Dong
6c951c43c7 Run fewer tests in OSX
Summary:
Travis is short of OSX resource. Try to move platform independent test suites out of OSX
Closes https://github.com/facebook/rocksdb/pull/1922

Differential Revision: D4616070

Pulled By: siying

fbshipit-source-id: 786342c
2017-02-27 11:09:20 -08:00
Siying Dong
e0b87afc70 Black list some slow valgrind tests
Summary:
valgrind tests always timeout with parallel run. Black list some slowest ones. It is better to run fewer tests than always have the tests timeout.
Closes https://github.com/facebook/rocksdb/pull/1908

Differential Revision: D4607875

Pulled By: siying

fbshipit-source-id: 7062664
2017-02-23 16:09:11 -08:00
Siying Dong
1ba2804b7f Remove XFunc tests
Summary:
Xfunc is hardly used. Remove it to keep the code simple.
Closes https://github.com/facebook/rocksdb/pull/1905

Differential Revision: D4603220

Pulled By: siying

fbshipit-source-id: 731f96d
2017-02-23 12:09:11 -08:00
Adam Retter
0227c16d67 Update static library versions and add checksums
Summary:
The previous version of zlib is no longer available. I have also updated the versions of the other static libraries and added checkum checks for the downloads; This is related to https://github.com/facebook/rocksdb/issues/1769
Closes https://github.com/facebook/rocksdb/pull/1863

Differential Revision: D4550742

Pulled By: yiwu-arbug

fbshipit-source-id: 4414150
2017-02-12 23:09:09 -08:00
Andrew Kryczka
b3aae4d07c Add repair_test to make check
Summary:
needed so we can proactively find issues like #1858
Closes https://github.com/facebook/rocksdb/pull/1862

Differential Revision: D4545854

Pulled By: ajkr

fbshipit-source-id: d77fcb7
2017-02-10 18:09:18 -08:00
Andrew Kryczka
17c1180603 Generalize Env registration framework
Summary:
The Env registration framework supports registering client Envs and selecting which one to instantiate according to a text field. This enabled things like adding the -env_uri argument to db_bench, so the same binary could be reused with different Envs just by changing CLI config.

Now this problem has come up again in a non-Env context, as I want to instantiate a client Statistics implementation from db_bench, which is configured entirely via text parameters. Also, in the future we may wish to use it for deserializing client objects when loading OPTIONS file.

This diff generalizes the Env registration logic to work with arbitrary types.

- Generalized registration and instantiation code by templating them
- The entire implementation is in a header file as that's Google style guide's recommendation for template definitions
- Pattern match with std::regex_match rather than checking prefix, which was the previous behavior
- Rename functions/files to be non-Env-specific
Closes https://github.com/facebook/rocksdb/pull/1776

Differential Revision: D4421933

Pulled By: ajkr

fbshipit-source-id: 34647d1
2017-01-25 16:09:14 -08:00
Adam Retter
e29bb934f7 Zlib 1.2.8 is no longer available, switched to 1.2.10
Summary: Closes https://github.com/facebook/rocksdb/pull/1741

Differential Revision: D4380116

Pulled By: yiwu-arbug

fbshipit-source-id: 7cf09df
2017-01-20 13:24:12 -08:00
Maysam Yabandeh
0712d541d1 Delegate Cleanables
Summary:
Cleanable objects will perform the registered cleanups when
they are destructed. We however rather to delay this cleaning like when
we are gathering the merge operands. Current approach is to create the
Cleanable object on heap (instead of on stack) and delay deleting it.

By allowing Cleanables to delegate their cleanups to another cleanable
object we can delay the cleaning without however the need to craete the
cleanable object on heap and keeping it around. This patch applies this
technique for the cleanups of BlockIter and shows improved performance
for some in-memory benchmarks:
+1.8% for merge worklaod, +6.4% for non-merge workload when the merge
operator is specified.
https://our.intern.facebook.com/intern/tasks?t=15168163

Non-merge benchmark:
TEST_TMPDIR=/dev/shm/v100nocomp/ ./db_bench --benchmarks=fillrandom
--num=1000000 -value_size=100 -compression_type=none

Reading random with no merge operator specified:
TEST_TMPDIR=/dev/shm/v100nocomp/ ./db_bench
--benchmarks="read
Closes https://github.com/facebook/rocksdb/pull/1711

Differential Revision: D4361163

Pulled By: maysamyabandeh

fbshipit-source-id: 9801e07
2016-12-29 15:54:19 -08:00
Andrew Kryczka
50e305de98 Collapse range deletions
Summary:
Added a tombstone-collapsing mode to RangeDelAggregator, which eliminates overlap in the TombstoneMap. In this mode, we can check whether a tombstone covers a user key using upper_bound() (i.e., binary search). However, the tradeoff is the overhead to add tombstones is now higher, so at first I've only enabled it for range scans (compaction/flush/user iterators), where we expect a high number of calls to ShouldDelete() for the same tombstones. Point queries like Get() will still use the linear scan approach.

Also in this diff I changed RangeDelAggregator's TombstoneMap to use multimap with user keys instead of map with internal keys. Callers sometimes provided ParsedInternalKey directly, from which it would've required string copying to derive an internal key Slice with which we could search the map.
Closes https://github.com/facebook/rocksdb/pull/1614

Differential Revision: D4270397

Pulled By: ajkr

fbshipit-source-id: 93092c7
2016-12-19 16:54:12 -08:00
Yi Wu
c270735861 Iterator should be in corrupted status if merge operator return false
Summary:
Iterator should be in corrupted status if merge operator return false.
Also add test to make sure if max_successive_merges is hit during write,
data will not be lost.
Closes https://github.com/facebook/rocksdb/pull/1665

Differential Revision: D4322695

Pulled By: yiwu-arbug

fbshipit-source-id: b327b05
2016-12-16 11:09:16 -08:00
Bassam Tabbara
6653e32ac2 build: make it easier to pass PORTABLE
Summary:
currently when running a portable build we have to do the following

  PORTABLE=1 make ...

this commit adds support for the following

  make PORTABLE=1 ...

this might be seem subtle but it makes PORTABLE like all other
makefile args and simplifies invocation from numerous build systems
including things like ExternalProject_Add in cmake.

Signed-off-by: Bassam Tabbara <bassam.tabbara@quantum.com>
Closes https://github.com/facebook/rocksdb/pull/1643

Differential Revision: D4315870

Pulled By: yiwu-arbug

fbshipit-source-id: ee43755
2016-12-13 14:39:17 -08:00
Jonathan Lee
c04f6a0b4c Specify shell in makefile
Summary:
The second variable "SHELL" simply tells make explicitly which shell to use, instead of allowing it to default to "/bin/sh", which may or may not be Bash.

However, simply defining the second variable by itself causes make to throw an error concerning a circular definition, as it would be attempting to use the "shell" command while simultaneously trying to set which shell to use. Thus, the first variable "BASH_EXISTS" is defined such that make already knows about "/path/to/bash" before trying to use it to set "SHELL".

A more technically correct solution would be to edit the makefile itself to make it compatible with non-bash shells (see the original Issue discussion for details). However, as it seems very few of the people working on this project were building with non-bash shells, I figured this solution would be good enough.
Closes https://github.com/facebook/rocksdb/pull/1631

Differential Revision: D4295689

Pulled By: yiwu-arbug

fbshipit-source-id: e4f9532
2016-12-07 22:24:15 -08:00
Andrew Kryczka
e333528991 DeleteRange write path end-to-end tests
Summary: Closes https://github.com/facebook/rocksdb/pull/1578

Differential Revision: D4241171

Pulled By: ajkr

fbshipit-source-id: ce5fd83
2016-11-29 11:09:22 -08:00
Siying Dong
a13bde39ee Skip ldb test in Travis
Summary:
Travis now is building for ldb tests. Disable for now to unblock other tests while we are investigating.
Closes https://github.com/facebook/rocksdb/pull/1546

Differential Revision: D4209404

Pulled By: siying

fbshipit-source-id: 47edd97
2016-11-18 19:24:13 -08:00
Yueh-Hsuan Chiang
647eafdc21 Introduce Lua Extension: RocksLuaCompactionFilter
Summary:
This diff includes an implementation of CompactionFilter that allows
users to write CompactionFilter in Lua.  With this ability, users can
dynamically change compaction filter logic without requiring building
the rocksdb binary and restarting the database.

To compile, WITH_LUA_PATH must be specified to the base directory
of lua.
Closes https://github.com/facebook/rocksdb/pull/1478

Differential Revision: D4150138

Pulled By: yhchiang

fbshipit-source-id: ed84222
2016-11-16 15:39:12 -08:00
Yi Wu
b952c898b6 Parallize persistent_cache_test and transaction_test
Summary:
Parallize persistent_cache_test and transaction_test
Closes https://github.com/facebook/rocksdb/pull/1506

Differential Revision: D4179392

Pulled By: IslamAbdelRahman

fbshipit-source-id: 05507a1
2016-11-14 20:09:19 -08:00
Yi Wu
1ea79a78c9 Optimize sequential insert into memtable - Part 1: Interface
Summary:
Currently our skip-list have an optimization to speedup sequential
inserts from a single stream, by remembering the last insert position.
We extend the idea to support sequential inserts from multiple streams,
and even tolerate small reordering wihtin each stream.

This PR is the interface part adding the following:
- Add `memtable_insert_prefix_extractor` to allow specifying prefix for each key.
- Add `InsertWithHint()` interface to memtable, to allow underlying
  implementation to return a hint of insert position, which can be later
  pass back to optimize inserts.
- Memtable will maintain a map from prefix to hints and pass the hint
  via `InsertWithHint()` if `memtable_insert_prefix_extractor` is non-null.
Closes https://github.com/facebook/rocksdb/pull/1419

Differential Revision: D4079367

Pulled By: yiwu-arbug

fbshipit-source-id: 3555326
2016-11-13 19:09:18 -08:00
ananclub
6a4faee5cd fix freebsd build include path err and so & jar file name
Summary: Closes https://github.com/facebook/rocksdb/pull/1441

Differential Revision: D4103477

Pulled By: yiwu-arbug

fbshipit-source-id: 071a0dc
2016-10-31 09:39:16 -07:00
Kefu Chai
60a2bbba94 Makefile: generate util/build_version.cc from .in file (#1384)
* util/build_verion.cc.in: add this file, so cmake and make can share the
  template file for generating util/build_version.cc.
* CMakeLists.txt: also, cmake v2.8.11 does not support file(GENERATE ...),
  so we are using configure_file() for creating build_version.cc.
* Makefile: use util/build_verion.cc.in for creating build_version.cc.

Signed-off-by: Kefu Chai <tchaikov@gmail.com>
2016-10-25 11:31:39 -07:00
Anirban Rahut
0e926b84fd Passing DISABLE_JEMALLOC=1 to valgrind_check if run locally
Summary:
Valgrind does not work well with JEMALLOC. If you run
a simple make valgrind_check, you will see lots of issues and
crashes. When precommit runs, this is taken care of. Here we
make sure valgrind_check is passed in DISABLE_JEMALLOC=1

Test Plan: Ran local valgrind_test and noticed the difference

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D65379
2016-10-21 14:57:44 -07:00
Adam Retter
d346ba2468 Minor fixes around Windows 64 Java Artifacts (#1366) 2016-10-03 11:58:08 -07:00
yiwu-arbug
4bc8c88e6b Recover same sequence id from WAL (#1350)
Summary:
Revert the behavior where we don't read sequence id from WAL, but increase it as we replay the log. We still keep the behave for 2PC for now but will fix later.

This change fixes github issue 1339, where some writes come with WAL disabled and we may recover records with wrong sequence id.

Test Plan: Added unit test.

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D64275
2016-09-23 16:15:14 -07:00
Yi Wu
41a9070f84 Fix java makefile dependencies
Summary: Fix dependencies in java makefile, to avoid java build failure.

Test Plan: run "make rocksdbjava" multiple times.

Reviewers: IslamAbdelRahman, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D64023
2016-09-16 10:54:31 -07:00
Islam AbdelRahman
52ee07b021 Move AddFile() tests to external_sst_file_test.cc
Summary: Simply move the tests

Test Plan: make check -j64

Reviewers: andrewkr, lightmark, yiwu, yhchiang, kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62529
2016-09-07 15:41:54 -07:00
Adam Retter
c8513cde0c Update the download location of Snappy (#1304)
* Fix the download location of Snappy, no longer available from Google Code

* Ensure that curl follows any redirect headers when downloading binaries
2016-08-28 20:24:08 -07:00
Islam AbdelRahman
4b3438d2d6 Fix parallel valgrind (valgrind_check)
Summary:
I just realized that when we run parallel valgrind we actually don't run the parallel tests under valgrind (we run the normally)
This patch make sure that we run both parallel and non-parallel tests with valgrind

Test Plan: DISABLE_JEMALLOC=1 make valgrind_check -j64

Reviewers: andrewkr, yiwu, lightmark, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D62469
2016-08-25 11:02:39 -07:00
Yi Wu
72f8cc703c LRU cache mid-point insertion
Summary:
Add mid-point insertion functionality to LRU cache. Caller of `Cache::Insert()` can set an additional parameter to make a cache entry have higher priority. The LRU cache will reserve at most `capacity * high_pri_pool_pct` bytes for high-pri cache entries. If `high_pri_pool_pct` is zero, the cache degenerates to normal LRU cache.

Context: If we are to put index and filter blocks into RocksDB block cache, index/filter block can be swap out too early. We want to add an option to RocksDB to reserve some capacity in block cache just for index/filter blocks, to mitigate the issue.

In later diffs I'll update block based table reader to use the interface to cache index/filter blocks at high priority, and expose the option to `DBOptions` and make it dynamic changeable.

Test Plan: unit test.

Reviewers: IslamAbdelRahman, sdong, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, march, leveldb

Differential Revision: https://reviews.facebook.net/D61977
2016-08-19 16:43:31 -07:00
krad
87c91bd876 Persistent Read Cache (8) Benchmark tooling
Summary:
Adding benchmark tool for persistent read cache.

TODO: Add integration to db_bench

Test Plan: Compile

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D57711
2016-08-10 17:49:22 -07:00
sdong
8b79422b52 [Proof-Of-Concept] RocksDB Blob Storage with a blob log file.
Summary:
This is a proof of concept of a RocksDB blob log file. The actual value of the Put() is appended to a blob log using normal data block format, and the handle of the block is written as the value of the key in RocksDB.

The prototype only supports Put() and Get(). It doesn't support DB restart, garbage collection, Write() call, iterator, snapshots, etc.

Test Plan: Add unit tests.

Reviewers: arahut

Reviewed By: arahut

Subscribers: kradhakrishnan, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61485
2016-08-10 17:05:17 -07:00
sdong
56dd034115 read_options.background_purge_on_iterator_cleanup to cover forward iterator and log file closing too.
Summary: With read_options.background_purge_on_iterator_cleanup=true, File deletion and closing can still happen in forward iterator, or WAL file closing. Cover those cases too.

Test Plan: I am adding unit tests.

Reviewers: andrewkr, IslamAbdelRahman, yiwu

Reviewed By: yiwu

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61503
2016-08-10 13:16:41 -07:00
Andrew Kryczka
1b0069ce2d Remove non-gtest from parallelized tests
Summary:
compact_on_deletion_collector_test does not support --gtest_list_tests
since it isn't gtest, so the full program would run for the target
gen_parallel_tests. This caused gen_parallel_tests to take 8+ minutes for tsan
and prevented compact_on_deletion_collector_test from running during check_0
since no t/run-* script could be generated.

Test Plan:
run make check, verify generating t/run-* scripts is fast and
./compact_on_deletion_collector_test is now run

Reviewers: IslamAbdelRahman, wanning, lightmark, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61695
2016-08-10 11:08:09 -07:00
omegaga
44f5cc57a5 Add time series database (resubmitted)
Summary: Implement a time series database that supports DateTieredCompactionStrategy. It wraps a db object and separate SST files in different column families (time windows).

Test Plan: Add `date_tiered_test`.

Reviewers: dhruba, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61653
2016-08-05 15:56:22 -07:00
sdong
7c4615cf1f A utility function to help users migrate DB after options change
Summary: Add a utility function that trigger necessary full compaction and put output to the correct level by looking at new options and old options.

Test Plan: Add unit tests for it.

Reviewers: andrewkr, igor, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: muthu, sumeet, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60783
2016-08-05 15:39:55 -07:00
Islam AbdelRahman
0155c73dee Fix parallel tests make check -j
Summary:
parallel tests are broken because gnu_parallel is reading deprecated options from `/etc/parallel/config`
Fix this by passing `--plain` to ignore `/etc/parallel/config`

Test Plan: make check -j64

Reviewers: kradhakrishnan, sdong, andrewkr, yiwu, arahut

Reviewed By: arahut

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61359
2016-08-01 17:22:07 -07:00
omegaga
d51dc96a79 Experiments on column-aware encodings
Summary:
Experiments on column-aware encodings. Supported features: 1) extract data blocks from SST file and encode with specified encodings; 2) Decode encoded data back into row format; 3) Directly extract data blocks and write in row format (without prefix encoding); 4) Get column distribution statistics for column format; 5) Dump data blocks separated by columns in human-readable format.

There is still on-going work on this diff. More refactoring is necessary.

Test Plan: Wrote tests in `column_aware_encoding_test.cc`. More tests should be added.

Reviewers: sdong

Reviewed By: sdong

Subscribers: arahut, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60027
2016-08-01 14:50:19 -07:00
Adam Retter
9ae92f50b2 More granular steps in the Makefile, can help with running all or single Java tests (and with ASAN build - https://github.com/facebook/rocksdb/wiki/JNI-Debugging) (#1237) 2016-07-29 12:55:54 -07:00
Anirban Rahut
d3bfd33972 Testing out parallel sandcastle changes
Summary:
Removing moreutils from sandcastle and adding gnu parallel.
Then passing in J= nproc command

Test Plan: Testing on sandcastle

Reviewers: sdong, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61017
2016-07-27 11:58:57 -07:00
Islam AbdelRahman
f8061a237e Fix Statistics TickersNameMap miss match with Tickers enum
Summary:
TickersNameMap is not consistent with Tickers enum.
this cause us to report wrong statistics and sometimes to access TickersNameMap outside it's boundary causing crashes (in Fb303 statistics)

Test Plan: added new unit test

Reviewers: sdong, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61083
2016-07-25 16:05:50 -07:00
sdong
f85df120f2 Re-enable tsan crash white-box test with reduced killing odds
Summary:
Tsan crash white-box test was disabled because it never ends. Re-enable it with reduced killing odds.
Add a parameter in crash test script to allow we pass it through an environment variable.

Test Plan: Run it manually.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61053
2016-07-22 12:51:27 -07:00
sdong
4ea0ab3cc5 Revert "Remove bashism from make check (#1225)"
This reverts commit 08ab1d83ac.
2016-07-22 10:42:56 -07:00
sdong
d5a51d4de3 Need to make sure log file synced before flushing memtable of one column family
Summary: Multiput atomiciy is broken across multiple column families if we don't sync WAL before flushing one column family. The WAL file may contain a write batch containing writes to a key to the CF to be flushed and a key to other CF. If we don't sync WAL before flushing, if machine crashes after flushing, the write batch will only be partial recovered. Data to other CFs are lost.

Test Plan: Add a new unit test which will fail without the diff.

Reviewers: yhchiang, IslamAbdelRahman, igor, yiwu

Reviewed By: yiwu

Subscribers: yiwu, leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D60915
2016-07-21 16:29:06 -07:00