Compare commits

...

39 Commits
main ... 6.6.fb

Author SHA1 Message Date
Adam Retter
01d5fafe08 Add Visual Studio 2015 to AppVeyor (#5446)
Summary:
This is required to compile on Windows with Visual Studio 2015, which is used for creating the RocksJava releases.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5446

Differential Revision: D18924811

fbshipit-source-id: a183a62e79a2af5aaf59cd08235458a172fe7dcb
2020-02-19 13:11:39 -08:00
Peter Dillinger
6367dee267 Don't download from (unreliable) maven.org (#6348)
Summary:
I set up a mirror of our Java deps on github so we can download
them through github URLs rather than maven.org, which is proving
terribly unreliable from Travis builds.

Also sanitized calls to curl, so they are easier to read and
appropriately fail on download failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348

Test Plan: CI

Differential Revision: D19633621

Pulled By: pdillinger

fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3
2020-02-19 13:11:32 -08:00
Adam Retter
524f1958aa Reduce the need to re-download dependencies (#6318)
Summary:
Both changes are related to RocksJava:

1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds.

2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318

Differential Revision: D19608742

Pulled By: pdillinger

fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7
2020-02-19 13:11:25 -08:00
Levi Tamasi
71b3e43f01 Access Maven Central over HTTPS (#6301)
Summary:
As of 1/15/2020, Maven Central does not support plain HTTP. Because of
this, our Travis and AppVeyor builds have started failing during the
assertj download step. This patch will hopefully fix these issues.

See https://blog.sonatype.com/central-repository-moving-to-https
for more info.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6301

Test Plan:
Will monitor the builds. ("I don't always test my changes but when I do,
I do it in production.")

Differential Revision: D19422923

Pulled By: ltamasi

fbshipit-source-id: 76f9a8564a5b66ddc721d705f9cbfc736bf7a97d
2020-02-19 13:10:45 -08:00
Fosco Marotto
551a110918 Update version to 6.6.4 2020-01-31 13:03:51 -08:00
anand76
f5f46ade44 Fix a unit test in error_handler_test.cc 2020-01-31 12:58:44 -08:00
anand76
07786d9d8e Force a new manifest file if append to current one fails (#6331)
Summary:
Fix for issue https://github.com/facebook/rocksdb/issues/6316

When an append/sync of the manifest file fails due to an IO error such
as NoSpace, we don't always put the DB in read-only mode. This is true
for flushes and compactions, as well as foreground operations such as column family
add/drop, CompactFiles etc. Subsequent changes to the DB will be
recorded in the same manifest file, which would have a corrupted record
in the middle due to the previous failure. On next DB::Open(), it will
fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync
failure, which will force a new manifest file to be written on the next
append.
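
A hedged sketch of the idea, with simplified stand-in types rather than the real VersionSet code: once an append or sync fails, the writer is dropped so the next edit opens a brand-new MANIFEST instead of appending after a possibly corrupted record.

```
#include <memory>
#include <string>

// Simplified stand-ins for RocksDB's Status and manifest log writer.
struct Status {
  bool ok_ = true;
  bool ok() const { return ok_; }
};
struct ManifestWriter {
  Status AddRecord(const std::string& /*record*/) { return Status{}; }
  Status Sync() { return Status{}; }
};

struct VersionSetSketch {
  std::unique_ptr<ManifestWriter> descriptor_log_;

  Status AppendVersionEdit(const std::string& record) {
    if (descriptor_log_ == nullptr) {
      // A previous append/sync failed (or this is the first edit), so a
      // brand-new MANIFEST writer is created here.
      descriptor_log_.reset(new ManifestWriter());
    }
    Status s = descriptor_log_->AddRecord(record);
    if (s.ok()) {
      s = descriptor_log_->Sync();
    }
    if (!s.ok()) {
      // The current MANIFEST may now end with a corrupted record (e.g. on
      // ENOSPC); dropping the writer forces a new MANIFEST on the next edit.
      descriptor_log_.reset();
    }
    return s;
  }
};
```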
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6331

Test Plan: Add new unit tests in error_handler_test.cc

Differential Revision: D19632951

Pulled By: anand1976

fbshipit-source-id: 68d527cb6e59a94cbbbf9f5a17a7f464381d51e3
2020-01-31 11:45:01 -08:00
anand76
ac29858b3e Update version to 6.6.3
2020-01-24 14:01:40 -08:00
Maysam Yabandeh
f7619b4177 Implement PinnableSlice::remove_prefix (#6330)
Summary:
The function was left unimplemented. Although we currently don't have a use for it, it was declared with an assert(0) to prevent mistakenly using the remove_prefix of the parent class. A function body containing only assert(0), however, causes issues with some compilers' warning levels. The patch implements the function to avoid the warning.
It also piggybacks a fix for some minor code warnings about unnecessary semicolons after function definitions.
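
A minimal sketch of what such an implementation can look like, assuming the usual Slice layout of a data pointer plus a length (the real PinnableSlice also has to handle its self-pinned buffer):

```
#include <cassert>
#include <cstddef>

// Simplified stand-in for rocksdb::Slice / PinnableSlice.
struct SliceSketch {
  const char* data_ = nullptr;
  size_t size_ = 0;

  // Drop the first n bytes instead of the old assert(0) body.
  void remove_prefix(size_t n) {
    assert(n <= size_);
    data_ += n;
    size_ -= n;
  }
};
```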
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6330

Differential Revision: D19559062

Pulled By: maysamyabandeh

fbshipit-source-id: 3a022484f688c9abd4556e5412bcc2628ab96a00
2020-01-24 13:30:13 -08:00
anand76
19e217815d Fix queue manipulation in WriteThread::BeginWriteStall() (#6322)
Summary:
When there is a write stall, the active write group leader calls ```BeginWriteStall()``` to walk the queue of writers and remove any with the ```no_slowdown``` option set. There was a bug in the code which updated the back pointer but not the forward pointer (```link_newer```), corrupting the list and causing some threads to wait forever. This PR fixes it.
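
A sketch of the gist of the fix, with a hypothetical node type that mirrors the ```link_older```/```link_newer``` naming: when unlinking a writer, both neighbours' pointers must be patched, not just one.

```
// Hypothetical doubly-linked writer node (not the real WriteThread::Writer).
struct WriterNode {
  WriterNode* link_older = nullptr;  // towards the queue tail
  WriterNode* link_newer = nullptr;  // towards the queue head
};

// Remove `w` from the queue. Skipping either assignment corrupts the list,
// which is the kind of bug that left writers waiting forever.
void Unlink(WriterNode* w) {
  if (w->link_older != nullptr) {
    w->link_older->link_newer = w->link_newer;
  }
  if (w->link_newer != nullptr) {
    w->link_newer->link_older = w->link_older;
  }
  w->link_older = nullptr;
  w->link_newer = nullptr;
}
```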
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6322

Test Plan: Add a unit test in db_write_test

Differential Revision: D19538313

Pulled By: anand1976

fbshipit-source-id: 6fbed819e594913f435886606f5d36f74f235c3a
2020-01-24 10:08:01 -08:00
Sagar Vemuri
1fab610a29 Update version to 6.6.2 2020-01-13 12:28:06 -08:00
Sagar Vemuri
4df4e63ee6 Consider all compaction input files to compute the oldest ancestor time (#6279)
Summary:
Look at all compaction input files to compute the oldest ancestor time.

In https://github.com/facebook/rocksdb/issues/5992 we changed how the creation_time (aka oldest-ancestor-time) table property of compaction output files is computed, from max(creation-time-of-all-compaction-inputs) to min(creation-time-of-all-inputs). This exposed a bug where, during compaction, only the creation_time values of the L0 compaction inputs were being looked at, and all other input levels were being ignored. This PR fixes the issue.
Some TTL compactions when using Level-Style compactions might not have run due to this bug.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6279

Test Plan: Enhanced the unit tests to validate that the correct time is propagated to the compaction outputs.

Differential Revision: D19337812

Pulled By: sagar0

fbshipit-source-id: edf8a72f11e405e93032ff5f45590816debe0bb4
2020-01-13 12:20:11 -08:00
Yanqin Jin
beca3c9a41 Update release date 2020-01-02 12:50:59 -08:00
Yanqin Jin
4fc5e6c177 Update HISTORY and bump up version number 2020-01-02 12:38:21 -08:00
Mike Kolupaev
74b01ac2ea Fix use-after-free and double-deleting files in BackgroundCallPurge() (#6193)
Summary:
The bad code was:

```
mutex.Lock(); // `mutex` protects `container`
for (auto& x : container) {
  mutex.Unlock();
  // do stuff to x
  mutex.Lock();
}
```

It's incorrect because both `x` and the iterator may become invalid if another thread modifies the container while this thread is not holding the mutex.

Broken by https://github.com/facebook/rocksdb/pull/5796 - it replaced a `while (!container.empty())` loop with a `for (auto x : container)`.

(RocksDB code does a lot of such unlocking+re-locking of mutexes, and this type of bugs comes up a lot :/ )
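
A hedged sketch of the safer pattern, assuming a std::deque guarded by the mutex: take one element at a time while holding the lock instead of keeping an iterator alive across unlock/lock.

```
#include <deque>
#include <mutex>

std::mutex mu;              // protects `container`
std::deque<int> container;  // placeholder element type

void DrainSafely() {
  std::unique_lock<std::mutex> lock(mu);
  while (!container.empty()) {
    int x = container.front();  // copy out while the mutex is held
    container.pop_front();
    lock.unlock();
    // ... do stuff to x without holding the mutex ...
    (void)x;
    lock.lock();
  }
}
```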
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6193

Test Plan: Ran some logdevice integration tests that were crashing without this fix.

Differential Revision: D19116874

Pulled By: al13n321

fbshipit-source-id: 9672bc4227c1b68f46f7436db2b96811adb8c703
2020-01-02 12:21:53 -08:00
解轶伦
924bc5fb95 delete superversions in BackgroundCallPurge (#6146)
Summary:
I found that CleanupSuperVersion() may block Get() for 30ms+ (when each MemTable is 256MB).

Then I found "delete sv" in ~SuperVersion() takes the time.

The backtrace looks like this

DBImpl::GetImpl() -> DBImpl::ReturnAndCleanupSuperVersion() ->
DBImpl::CleanupSuperVersion() : delete sv; -> ~SuperVersion()

I think it's better to delete it in a background thread; please review.
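
A hedged sketch of the deferred-deletion idea with hypothetical names (the actual change routes the SuperVersion through DBImpl's purge queue, as shown in the diff below): the read path only enqueues, and the expensive destructor runs on a background thread.

```
#include <condition_variable>
#include <deque>
#include <memory>
#include <mutex>

struct BigObject { /* e.g. a SuperVersion owning large memtables */ };

std::mutex mu;
std::condition_variable cv;
std::deque<std::unique_ptr<BigObject>> to_free;

// Foreground (read) path: O(1), no large destructor runs here.
void ScheduleDelete(std::unique_ptr<BigObject> obj) {
  {
    std::lock_guard<std::mutex> lock(mu);
    to_free.push_back(std::move(obj));
  }
  cv.notify_one();
}

// Background purge thread: destruction happens off the read path.
void PurgeLoop() {
  std::unique_lock<std::mutex> lock(mu);
  for (;;) {
    cv.wait(lock, [] { return !to_free.empty(); });
    auto obj = std::move(to_free.front());
    to_free.pop_front();
    lock.unlock();
    obj.reset();  // expensive delete outside the lock
    lock.lock();
  }
}
```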
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6146

Differential Revision: D18972066

fbshipit-source-id: 0f7b0b70b9bb1e27ad6fc1c8a408fbbf237ae08c
2020-01-02 12:20:29 -08:00
Levi Tamasi
7168d16103 BlobDB: only compare CF IDs when checking whether an API call is for the default CF (#6226)
Summary:
BlobDB currently only supports using the default column family. The earlier
code enforces this by comparing the `ColumnFamilyHandle` passed to the
`Get`/`Put`/etc. call with the handle returned by `DefaultColumnFamily`
(which, at the end of the day, comes from `DBImpl::default_cf_handle_`).
Since other `ColumnFamilyHandle`s can also point to the default column
family, this can reject legitimate requests as well. (As an example,
with the earlier code, the handle returned by `BlobDB::Open` cannot
actually be used in API calls.) The patch fixes this by comparing only
the IDs of the column family handles instead of the pointers themselves.
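
A minimal sketch of the check; `GetID()` mirrors the accessor on the real ColumnFamilyHandle, but the type below is a simplified stand-in.

```
#include <cstdint>

// Simplified stand-in for rocksdb::ColumnFamilyHandle.
struct ColumnFamilyHandleSketch {
  uint32_t id = 0;
  uint32_t GetID() const { return id; }
};

// Before: the handle pointers were compared, so a second handle pointing at
// the default column family (e.g. the one returned by BlobDB::Open) was
// wrongly rejected. After: only the column family IDs are compared.
bool IsDefaultColumnFamily(const ColumnFamilyHandleSketch* handle,
                           const ColumnFamilyHandleSketch* default_cf) {
  return handle->GetID() == default_cf->GetID();
}
```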
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6226

Test Plan: `make check`

Differential Revision: D19187461

Pulled By: ltamasi

fbshipit-source-id: 54ce2e12ebb1f07e6d1e70e3b1e0213dfa94bda2
2019-12-19 18:29:39 -08:00
suzanwen
d84805962d Isolate building db_bench from tests with WITH_BENCHMARK_TOOLS option. (#6098)
Summary:
Isolate `db_bench` from building tests, per the related comments.
Building tests now requires both `WITH_TESTS=ON` and `CMAKE_BUILD_TYPE=Debug`,
while building `db_bench` requires `WITH_BENCHMARK_TOOLS=ON`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6098

Test Plan: cmake -DCMAKE_BUILD_TYPE=Debug/Release -DWITH_TESTS=ON/OFF -DWITH_BENCHMARK_TOOLS=ON/OFF -DWITH_TOOLS=ON/OFF && make

Differential Revision: D18856891

Pulled By: riversand963

fbshipit-source-id: addbee8ad6abefb877843a313b4630cfab3ce4f0
2019-12-19 14:05:50 -08:00
Adam Retter
5929ac8834 Env should also load the native library (#6167)
Summary:
Closes https://github.com/facebook/rocksdb/issues/6118
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6167

Differential Revision: D19053577

Pulled By: pdillinger

fbshipit-source-id: 86aca9a5bec0947a641649b515da17b3cb12bdde
2019-12-19 11:38:15 -08:00
Adam Retter
9ea7363d4e Add missing mutable DBOptions to RocksJava (#6152)
Summary:
As requested in https://github.com/facebook/rocksdb/issues/6127
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6152

Differential Revision: D18955608

Pulled By: pdillinger

fbshipit-source-id: 3e1367d944e44d5f1675a422f7dd2451c86feb6f
2019-12-19 11:38:06 -08:00
奏之章
137dfbcab7 Fix RangeDeletion bug (#6062)
Summary:
When reading keys from a snapshot, if a range deletion was added after the snapshot was created and that range deletion is inside an immutable memtable, we will get a wrong key set.
More details are in the code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6062

Differential Revision: D18966785

Pulled By: pdillinger

fbshipit-source-id: 38a60bb1e2d0a1dbfc8ec641617200b6a02b86c3
2019-12-17 17:09:46 -08:00
Levi Tamasi
3ff40125cd Update HISTORY.md with recent BlobDB related changes 2019-12-17 12:32:20 -08:00
Levi Tamasi
df032f5dd0 Do not update SST <-> blob file mapping if compaction failed
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6156

Test Plan: Extended unit tests.

Differential Revision: D18943867

Pulled By: ltamasi

fbshipit-source-id: b3669d2dd6af08e987ad1a59d6712ae2514da0b1
2019-12-17 12:31:10 -08:00
Levi Tamasi
142f00d410 Update HISTORY.md with the recent memtable trimming fixes
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6194

Differential Revision: D19125292

Pulled By: ltamasi

fbshipit-source-id: d41aca2755ec4bec07feedd6b561e8d18606a931
2019-12-17 07:53:12 -08:00
Levi Tamasi
509da20ae5 Fix a data race related to memtable trimming (#6187)
Summary:
https://github.com/facebook/rocksdb/pull/6177 introduced a data race
involving `MemTableList::InstallNewVersion` and `MemTableList::NumFlushed`.
The patch fixes this by caching whether the current version has any
memtable history (i.e. flushed memtables that are kept around for
transaction conflict checking) in an `std::atomic<bool>` member called
`current_has_history_`, similarly to how `current_memory_usage_excluding_last_`
is handled.
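
A sketch of the pattern (simplified; the surrounding MemTableList logic is elided): the flag is recomputed under the mutex whenever a new version is installed and read lock-free elsewhere, so readers never race on the underlying counters.

```
#include <atomic>
#include <cstddef>
#include <mutex>

class MemTableListSketch {
 public:
  // Called with the mutex held whenever the list of flushed memtables changes.
  void InstallNewVersion(size_t num_flushed_kept_in_history) {
    std::lock_guard<std::mutex> lock(mu_);
    num_flushed_ = num_flushed_kept_in_history;
    // Cache the derived fact so readers never touch num_flushed_ directly.
    current_has_history_.store(num_flushed_ > 0, std::memory_order_release);
  }

  // Safe to call without the mutex, e.g. when deciding whether to trim.
  bool HasHistory() const {
    return current_has_history_.load(std::memory_order_acquire);
  }

 private:
  std::mutex mu_;
  size_t num_flushed_ = 0;
  std::atomic<bool> current_has_history_{false};
};
```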
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6187

Test Plan:
```
make clean
COMPILE_WITH_TSAN=1 make db_test -j24
./db_test
```

Differential Revision: D19084059

Pulled By: ltamasi

fbshipit-source-id: 327a5af9700fb7102baea2cc8903c085f69543b9
2019-12-17 07:52:22 -08:00
Levi Tamasi
628786ed14 Do not schedule memtable trimming if there is no history (#6177)
Summary:
We have observed an increase in CPU load caused by frequent calls to
`ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
when using `max_write_buffer_size_to_maintain` to limit the amount of
memtable history maintained for transaction conflict checking. Part of the issue
is that trimming can potentially be scheduled even if there is no memtable
history. The patch adds a check that fixes this.

See also https://github.com/facebook/rocksdb/pull/6169.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6177

Test Plan:
Compared `perf` output for

```
./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
```

before and after the change. There is a significant reduction for the call chain
`rocksdb::DBImpl::TrimMemtableHistory` -> `rocksdb::ColumnFamilyData::InstallSuperVersion` ->
`rocksdb::ThreadLocalPtr::StaticMeta::Scrape` even without https://github.com/facebook/rocksdb/pull/6169.

Differential Revision: D19057445

Pulled By: ltamasi

fbshipit-source-id: dff81882d7b280e17eda7d9b072a2d4882c50f79
2019-12-17 07:52:22 -08:00
Levi Tamasi
80de900464 Do not create/install new SuperVersion if nothing was deleted during memtable trim (#6169)
Summary:
We have observed an increase in CPU load caused by frequent calls to
`ColumnFamilyData::InstallSuperVersion` from `DBImpl::TrimMemtableHistory`
when using `max_write_buffer_size_to_maintain` to limit the amount of
memtable history maintained for transaction conflict checking. As it turns out,
this is caused by the code creating and installing a new `SuperVersion` even if
no memtables were actually trimmed. The patch adds a check to avoid this.
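
A hedged sketch of the guard, with hypothetical helpers: a new SuperVersion is only created and installed when the trim actually removed something.

```
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a flushed memtable kept in history.
struct MemTableSketch {
  size_t approximate_memory_usage = 0;
};

// Trim the history down to max_bytes; return true only if something was
// actually removed.
bool TrimHistory(std::vector<MemTableSketch>* history, size_t max_bytes) {
  size_t total = 0;
  for (const auto& m : *history) {
    total += m.approximate_memory_usage;
  }
  bool trimmed = false;
  while (total > max_bytes && !history->empty()) {
    total -= history->back().approximate_memory_usage;
    history->pop_back();
    trimmed = true;
  }
  return trimmed;
}

void TrimMemtableHistorySketch(std::vector<MemTableSketch>* history,
                               size_t max_bytes) {
  if (TrimHistory(history, max_bytes)) {
    // Only now pay for creating and installing a new SuperVersion;
    // doing this unconditionally is what caused the extra CPU load.
    // InstallSuperVersion(...);  // elided
  }
}
```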
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6169

Test Plan:
Compared `perf` output for

```
./db_bench -benchmarks=randomtransaction -optimistic_transaction_db=1 -statistics -stats_interval_seconds=1 -duration=90 -num=500000 --max_write_buffer_size_to_maintain=16000000 --transaction_set_snapshot=1 --threads=32
```

before and after the change. With the fix, the call chain `rocksdb::DBImpl::TrimMemtableHistory` ->
`rocksdb::ColumnFamilyData::InstallSuperVersion` -> `rocksdb::ThreadLocalPtr::StaticMeta::Scrape`
no longer registers in the `perf` report.

Differential Revision: D19031509

Pulled By: ltamasi

fbshipit-source-id: 02686fce594e5b50eba0710e4b28a9b808c8aa20
2019-12-17 07:52:22 -08:00
Yanqin Jin
1d9eae3f61 Use Env::LoadEnv to create custom Env objects (#6196)
Summary:
As title. The previous assumption was that the underlying lib can always return
a shared_ptr<Env>. This is too strong. Therefore, we use Env::LoadEnv to relax
it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6196

Test Plan: make check

Differential Revision: D19133199

Pulled By: riversand963

fbshipit-source-id: c83a0c02a42610d077054f2de1acfc45126b3a75
2019-12-16 23:00:35 -08:00
anand1976
2ba7f1e574 Fix crash in Transaction::MultiGet() when num_keys > 32
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6192

Test Plan:
Add a unit test that fails without the fix and passes now
make check

Differential Revision: D19124781

Pulled By: anand1976

fbshipit-source-id: 8c8cb6fa16c3fc23ec011e168561a13f76bbd783
2019-12-16 22:17:35 -08:00
Maysam Yabandeh
d6e199016c Fix build breakage from lock_guard error (#6161)
Summary:
This change fixes a source issue that caused a compile-time error which broke the build for many fbcode services in that setup. The size() member function of channel is a const member, so member variables accessed within it are implicitly const as well. This caused an error where clang failed to resolve a constructor that takes std::mutex, because the suitable constructor got rejected due to loss of constness of its argument. The fix is to add the mutable modifier to the lock_ member of channel.
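
A small self-contained illustration of the issue (the channel type is a simplified stand-in): a const member function treats the members it touches as const, so a mutex it locks must be declared mutable.

```
#include <cstddef>
#include <deque>
#include <mutex>

template <typename T>
class ChannelSketch {
 public:
  // size() is const, so without `mutable` below, lock_ would be a
  // `const std::mutex` here and std::lock_guard could not be constructed
  // from it -- the compile error described above.
  size_t size() const {
    std::lock_guard<std::mutex> lock(lock_);
    return buffer_.size();
  }

 private:
  mutable std::mutex lock_;
  std::deque<T> buffer_;
};
```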
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6161

Differential Revision: D18967685

Pulled By: maysamyabandeh

fbshipit-source-id: 698b6a5153c3c92eeacb842c467aa28cc350d432
2019-12-12 13:54:29 -08:00
Peter Dillinger
92453f26ef Disable new Bloom filter assertion (#6128)
Summary:
A longstanding bug in our C interface can trigger this
assertion; see issue https://github.com/facebook/rocksdb/issues/6129. The assertion is disabled for now
(for 6.6.0) and will be re-enabled once that bug is fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6128

Differential Revision: D18854899

Pulled By: pdillinger

fbshipit-source-id: 9eb5294b9f11b208dc1a8cc148aaa31e47ff892b
2019-12-06 10:31:04 -08:00
Jim Meyering
e106a3cf29 build_tools/precommit_checker.py: don't hard-code a platform-afflicted python path (#6124)
Summary:
Use `#!/usr/bin/env python2.7` instead.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6124

Test Plan: `J=8 make commit_prereq`

Differential Revision: D18834668

Pulled By: ltamasi

fbshipit-source-id: cec40266cd5bcae8bf6cbe5a564ae78540deccc4
2019-12-05 12:20:14 -08:00
Yanqin Jin
98c414772c Let DBSecondary close files after catch up (#6114)
Summary:
After the secondary instance replays the logs from the primary, certain files become
obsolete. The secondary should find these files, evict their table readers from the
table cache, and close them. If this is not done, the secondary will hold on to
these files and prevent their space from being freed.

Test plan (devserver):
```
$./db_secondary_test --gtest_filter=DBSecondaryTest.SecondaryCloseFiles
$make check
$./db_stress -ops_per_thread=100000 -enable_secondary=true -threads=32 -secondary_catch_up_one_in=10000 -clear_column_family_one_in=1000 -reopen=100
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6114

Differential Revision: D18769998

Pulled By: riversand963

fbshipit-source-id: 5d1f151567247196164e1b79d8402fa2045b9120
2019-12-02 17:53:24 -08:00
anand76
96da9d7224 Remove key length assertion LRUHandle::CalcTotalCharge (#6115)
Summary:
Inserting an entry with a zero-length key into the block cache is a valid use case. Remove the assertion in ```LRUHandle::CalcTotalCharge```.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6115

Differential Revision: D18769693

Pulled By: anand1976

fbshipit-source-id: 34cc159650300dda6d7273480640478f28392cda
2019-12-02 15:52:55 -08:00
Peter Dillinger
7e8b4f5f69 Update comment on max_valid_backups_to_open (#6105)
Summary:
To reflect changes in PR https://github.com/facebook/rocksdb/issues/6072

This comment also implies that a seemingly valid use-case for
max_valid_backups_to_open is flawed: even if you only want to add a new
backup without trying to delete, you might need to clean up after a
backup creation that never finished. To clean up properly requires
opening all backups to get proper ref counts on shared files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6105

Test Plan: code comment only

Differential Revision: D18736716

Pulled By: pdillinger

fbshipit-source-id: 2447c0000eefe3a4ca606926bfe922a8456b0cb7
2019-11-27 15:17:57 -08:00
Peter Dillinger
ce1abbca73 Update format_version comment for 6.6.0
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6097

Differential Revision: D18729661

Pulled By: pdillinger

fbshipit-source-id: d2e4a9d6803aad8dd61ececd5c2b861e6f2da73b
2019-11-27 15:17:45 -08:00
Adam Retter
4d26e7550a Fix BlobDB compilation on older GCC versions
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6094

Differential Revision: D18731951

Pulled By: ltamasi

fbshipit-source-id: 5b73c6009c748f6a2a48d4d880b1259980d801d4
2019-11-27 14:10:33 -08:00
John Ericson
880e30a8b1 Work around weird unused errors with Mingw (#6075)
Summary:
From the rest of the code, it looks like this attribute can maybe be applied unconditionally? But I couldn't test with MSVC, so I defensively put it under a CPP guard.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6075

Differential Revision: D18723749

fbshipit-source-id: 45fc8732c28dd29aab1644225d68f3c6f39bd69b
2019-11-27 09:51:01 -08:00
sdong
73c1203af1 Support options.max_open_files = -1 with periodic_compaction_seconds (#6090)
Summary:
options.periodic_compaction_seconds isn't supported when options.max_open_files != -1. This is because the file creation time information is stored in table properties, which are not guaranteed to be loaded unless options.max_open_files = -1. Relax this constraint by storing the information in the manifest.
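
For illustration, a hedged snippet of how the two options can now be combined (standard rocksdb::Options members; the path and values are arbitrary):

```
#include <rocksdb/db.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Previously this required max_open_files = -1 so that table properties
  // (and thus file creation times) were always loaded; with the creation
  // time persisted in the MANIFEST, a bounded table cache works as well.
  options.periodic_compaction_seconds = 30 * 24 * 60 * 60;  // ~30 days
  options.max_open_files = 1000;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/rocksdb_periodic", &db);
  delete db;  // no-op if Open failed and db stayed nullptr
  return s.ok() ? 0 : 1;
}
```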
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6090

Test Plan: Pass all existing tests; Modify an existing test to force the manifest value to take 0 to simulate backward compatibility case; manually open the DB generated with the change by release 4.2.

Differential Revision: D18702268

fbshipit-source-id: 13e0bd94f546498a04f3dc5fc0d9dff5125ec9eb
2019-11-27 09:50:44 -08:00
85 changed files with 1721 additions and 384 deletions

View File

@ -70,7 +70,7 @@ install:
CC=gcc-8 && CXX=g++-8;
fi
- if [[ "${JOB_NAME}" == cmake* ]] && [ "${TRAVIS_OS_NAME}" == linux ]; then
mkdir cmake-dist && curl -sfSL https://github.com/Kitware/CMake/releases/download/v3.14.5/cmake-3.14.5-Linux-x86_64.tar.gz | tar --strip-components=1 -C cmake-dist -xz && export PATH=$PWD/cmake-dist/bin:$PATH;
mkdir cmake-dist && curl --silent --fail --show-error --location https://github.com/Kitware/CMake/releases/download/v3.14.5/cmake-3.14.5-Linux-x86_64.tar.gz | tar --strip-components=1 -C cmake-dist -xz && export PATH=$PWD/cmake-dist/bin:$PATH;
fi
- if [[ "${JOB_NAME}" == java_test ]]; then
java -version && echo "JAVA_HOME=${JAVA_HOME}";

View File

@ -14,7 +14,7 @@
# cd build
# 3. Run cmake to generate project files for Windows, add more options to enable required third-party libraries.
# See thirdparty.inc for more information.
# sample command: cmake -G "Visual Studio 15 Win64" -DWITH_GFLAGS=1 -DWITH_SNAPPY=1 -DWITH_JEMALLOC=1 -DWITH_JNI=1 ..
# sample command: cmake -G "Visual Studio 15 Win64" -DCMAKE_BUILD_TYPE=Release -DWITH_GFLAGS=1 -DWITH_SNAPPY=1 -DWITH_JEMALLOC=1 -DWITH_JNI=1 ..
# 4. Then build the project in debug mode (you may want to add /m[:<N>] flag to run msbuild in <N> parallel threads
# or simply /m to use all avail cores)
# msbuild rocksdb.sln
@ -63,7 +63,12 @@ endif()
# third-party/folly is only validated to work on Linux and Windows for now.
# So only turn it on there by default.
if(CMAKE_SYSTEM_NAME MATCHES "Linux" OR CMAKE_SYSTEM_NAME MATCHES "Windows")
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" ON)
if(MSVC AND MSVC_VERSION LESS 1910)
# Folly does not compile with MSVC older than VS2017
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" OFF)
else()
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" ON)
endif()
else()
option(WITH_FOLLY_DISTRIBUTED_MUTEX "build with folly::DistributedMutex" OFF)
endif()
@ -174,7 +179,7 @@ else()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -W -Wextra -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing")
if(MINGW)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-format")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-format -fno-asynchronous-unwind-tables")
add_definitions(-D_POSIX_C_SOURCE=1)
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
@ -885,7 +890,9 @@ if(NOT WIN32 OR ROCKSDB_INSTALL_ON_WINDOWS)
endif()
option(WITH_TESTS "build with tests" ON)
if(WITH_TESTS)
# For test libraries, utilities, and exes that are build iff WITH_TESTS=ON and
# in Debug mode. Add test only code that is not #ifdefed for Release here.
if(WITH_TESTS AND CMAKE_BUILD_TYPE STREQUAL "Debug")
add_subdirectory(third-party/gtest-1.8.1/fused-src/gtest)
set(TESTS
cache/cache_test.cc
@ -1040,24 +1047,6 @@ if(WITH_TESTS)
list(APPEND TESTS third-party/folly/folly/synchronization/test/DistributedMutexTest.cpp)
endif()
set(BENCHMARKS
cache/cache_bench.cc
memtable/memtablerep_bench.cc
db/range_del_aggregator_bench.cc
tools/db_bench.cc
table/table_reader_bench.cc
util/filter_bench.cc
utilities/persistent_cache/hash_table_bench.cc)
add_library(testharness OBJECT test_util/testharness.cc)
foreach(sourcefile ${BENCHMARKS})
get_filename_component(exename ${sourcefile} NAME_WE)
add_executable(${exename}${ARTIFACT_SUFFIX} ${sourcefile}
$<TARGET_OBJECTS:testharness>)
target_link_libraries(${exename}${ARTIFACT_SUFFIX} gtest ${LIBS})
endforeach(sourcefile ${BENCHMARKS})
# For test util library that is build only in DEBUG mode
# and linked to tests. Add test only code that is not #ifdefed for Release here.
set(TESTUTIL_SOURCE
db/db_test_util.cc
monitoring/thread_status_updater_debug.cc
@ -1065,7 +1054,6 @@ if(WITH_TESTS)
test_util/fault_injection_test_env.cc
utilities/cassandra/test_utils.cc
)
# test utilities are only build in debug
enable_testing()
add_custom_target(check COMMAND ${CMAKE_CTEST_COMMAND})
set(TESTUTILLIB testutillib${ARTIFACT_SUFFIX})
@ -1080,9 +1068,7 @@ if(WITH_TESTS)
EXCLUDE_FROM_DEFAULT_BUILD_RELWITHDEBINFO 1
)
# Tests are excluded from Release builds
set(TEST_EXES ${TESTS})
foreach(sourcefile ${TEST_EXES})
get_filename_component(exename ${sourcefile} NAME_WE)
add_executable(${CMAKE_PROJECT_NAME}_${exename}${ARTIFACT_SUFFIX} ${sourcefile}
@ -1119,6 +1105,29 @@ if(WITH_TESTS)
endforeach(sourcefile ${C_TEST_EXES})
endif()
option(WITH_BENCHMARK_TOOLS "build with benchmarks" ON)
if(WITH_BENCHMARK_TOOLS)
if(NOT TARGET gtest)
add_subdirectory(third-party/gtest-1.8.1/fused-src/gtest)
endif()
set(BENCHMARKS
cache/cache_bench.cc
memtable/memtablerep_bench.cc
db/range_del_aggregator_bench.cc
tools/db_bench.cc
table/table_reader_bench.cc
util/filter_bench.cc
utilities/persistent_cache/hash_table_bench.cc
)
add_library(testharness OBJECT test_util/testharness.cc)
foreach(sourcefile ${BENCHMARKS})
get_filename_component(exename ${sourcefile} NAME_WE)
add_executable(${exename}${ARTIFACT_SUFFIX} ${sourcefile}
$<TARGET_OBJECTS:testharness>)
target_link_libraries(${exename}${ARTIFACT_SUFFIX} gtest ${LIBS})
endforeach(sourcefile ${BENCHMARKS})
endif()
option(WITH_TOOLS "build with tools" ON)
if(WITH_TOOLS)
add_subdirectory(tools)

View File

@ -1,4 +1,28 @@
# Rocksdb Change Log
## Unreleased
## 6.6.4 (1/31/2020)
### Bug Fixes
* Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
## 6.6.3 (01/24/2020)
### Bug Fixes
* Fix a bug that can cause write threads to hang when a slowdown/stall happens and there is a mix of writers with WriteOptions::no_slowdown set/unset.
## 6.6.2 (01/13/2020)
### Bug Fixes
* Fixed a bug where non-L0 compaction input files were not considered to compute the `creation_time` of new compaction outputs.
## 6.6.1 (01/02/2020)
### Bug Fixes
* Fix a bug in WriteBatchWithIndex::MultiGetFromBatchAndDB, which is called by Transaction::MultiGet, that causes due to stale pointer access when the number of keys is > 32
* Fixed two performance issues related to memtable history trimming. First, a new SuperVersion is now created only if some memtables were actually trimmed. Second, trimming is only scheduled if there is at least one flushed memtable that is kept in memory for the purposes of transaction conflict checking.
* BlobDB no longer updates the SST to blob file mapping upon failed compactions.
* Fix a bug in which a snapshot read through an iterator could be affected by a DeleteRange after the snapshot (#6062).
* Fixed a bug where BlobDB was comparing the `ColumnFamilyHandle` pointers themselves instead of only the column family IDs when checking whether an API call uses the default column family or not.
* Delete superversions in BackgroundCallPurge.
* Fix use-after-free and double-deleting files in BackgroundCallPurge().
## 6.6.0 (11/25/2019)
### Bug Fixes
* Fix data corruption casued by output of intra-L0 compaction on ingested file not being placed in correct order in L0.
@ -19,8 +43,9 @@
* A batched MultiGet API (DB::MultiGet()) that supports retrieving keys from multiple column families.
* Full and partitioned filters in the block-based table use an improved Bloom filter implementation, enabled with format_version 5 (or above) because previous releases cannot read this filter. This replacement is faster and more accurate, especially for high bits per key or millions of keys in a single (full) filter. For example, the new Bloom filter has the same false postive rate at 9.55 bits per key as the old one at 10 bits per key, and a lower false positive rate at 16 bits per key than the old one at 100 bits per key.
* Added AVX2 instructions to USE_SSE builds to accelerate the new Bloom filter and XXH3-based hash function on compatible x86_64 platforms (Haswell and later, ~2014).
* Support options.ttl with options.max_open_files = -1. File's oldest ancester time will be written to manifest. If it is availalbe, this information will be used instead of creation_time in table properties.
* Support options.ttl or options.periodic_compaction_seconds with options.max_open_files = -1. File's oldest ancester time and file creation time will be written to manifest. If it is availalbe, this information will be used instead of creation_time and file_creation_time in table properties.
* Setting options.ttl for universal compaction now has the same meaning as setting periodic_compaction_seconds.
* SstFileMetaData also returns file creation time and oldest ancester time.
* The `sst_dump` command line tool `recompress` command now displays how many blocks were compressed and how many were not, in particular how many were not compressed because the compression ratio was not met (12.5% threshold for GoodCompressionRatio), as seen in the `number.block.not_compressed` counter stat since version 6.0.0.
* The block cache usage is now takes into account the overhead of metadata per each entry. This results into more accurate managment of memory. A side-effect of this feature is that less items are fit into the block cache of the same size, which would result to higher cache miss rates. This can be remedied by increasing the block cache size or passing kDontChargeCacheMetadata to its constuctor to restore the old behavior.
* When using BlobDB, a mapping is maintained and persisted in the MANIFEST between each SST file and the oldest non-TTL blob file it references.

View File

@ -1103,16 +1103,21 @@ unity_test: db/db_test.o db/db_test_util.o $(TESTHARNESS) $(TOOLLIBOBJECTS) unit
rocksdb.h rocksdb.cc: build_tools/amalgamate.py Makefile $(LIB_SOURCES) unity.cc
build_tools/amalgamate.py -I. -i./include unity.cc -x include/rocksdb/c.h -H rocksdb.h -o rocksdb.cc
clean: clean-ext-libraries-all clean-rocks
clean: clean-ext-libraries-all clean-rocks clean-rocksjava
clean-not-downloaded: clean-ext-libraries-bin clean-rocks
clean-not-downloaded: clean-ext-libraries-bin clean-rocks clean-not-downloaded-rocksjava
clean-rocks:
rm -f $(BENCHMARKS) $(TOOLS) $(TESTS) $(PARALLEL_TEST) $(LIBRARY) $(SHARED)
rm -rf $(CLEAN_FILES) ios-x86 ios-arm scan_build_report
$(FIND) . -name "*.[oda]" -exec rm -f {} \;
$(FIND) . -type f -regex ".*\.\(\(gcda\)\|\(gcno\)\)" -exec rm {} \;
cd java; $(MAKE) clean
clean-rocksjava:
cd java && $(MAKE) clean
clean-not-downloaded-rocksjava:
cd java && $(MAKE) clean-not-downloaded
clean-ext-libraries-all:
rm -rf bzip2* snappy* zlib* lz4* zstd*
@ -1806,7 +1811,7 @@ endif
libz.a:
-rm -rf zlib-$(ZLIB_VER)
ifeq (,$(wildcard ./zlib-$(ZLIB_VER).tar.gz))
curl --output zlib-$(ZLIB_VER).tar.gz -L ${ZLIB_DOWNLOAD_BASE}/zlib-$(ZLIB_VER).tar.gz
curl --fail --output zlib-$(ZLIB_VER).tar.gz --location ${ZLIB_DOWNLOAD_BASE}/zlib-$(ZLIB_VER).tar.gz
endif
ZLIB_SHA256_ACTUAL=`$(SHA256_CMD) zlib-$(ZLIB_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(ZLIB_SHA256)" != "$$ZLIB_SHA256_ACTUAL" ]; then \
@ -1820,7 +1825,7 @@ endif
libbz2.a:
-rm -rf bzip2-$(BZIP2_VER)
ifeq (,$(wildcard ./bzip2-$(BZIP2_VER).tar.gz))
curl --output bzip2-$(BZIP2_VER).tar.gz -L ${CURL_SSL_OPTS} ${BZIP2_DOWNLOAD_BASE}/bzip2-$(BZIP2_VER).tar.gz
curl --fail --output bzip2-$(BZIP2_VER).tar.gz --location ${CURL_SSL_OPTS} ${BZIP2_DOWNLOAD_BASE}/bzip2-$(BZIP2_VER).tar.gz
endif
BZIP2_SHA256_ACTUAL=`$(SHA256_CMD) bzip2-$(BZIP2_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(BZIP2_SHA256)" != "$$BZIP2_SHA256_ACTUAL" ]; then \
@ -1834,7 +1839,7 @@ endif
libsnappy.a:
-rm -rf snappy-$(SNAPPY_VER)
ifeq (,$(wildcard ./snappy-$(SNAPPY_VER).tar.gz))
curl --output snappy-$(SNAPPY_VER).tar.gz -L ${CURL_SSL_OPTS} ${SNAPPY_DOWNLOAD_BASE}/$(SNAPPY_VER).tar.gz
curl --fail --output snappy-$(SNAPPY_VER).tar.gz --location ${CURL_SSL_OPTS} ${SNAPPY_DOWNLOAD_BASE}/$(SNAPPY_VER).tar.gz
endif
SNAPPY_SHA256_ACTUAL=`$(SHA256_CMD) snappy-$(SNAPPY_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(SNAPPY_SHA256)" != "$$SNAPPY_SHA256_ACTUAL" ]; then \
@ -1849,7 +1854,7 @@ endif
liblz4.a:
-rm -rf lz4-$(LZ4_VER)
ifeq (,$(wildcard ./lz4-$(LZ4_VER).tar.gz))
curl --output lz4-$(LZ4_VER).tar.gz -L ${CURL_SSL_OPTS} ${LZ4_DOWNLOAD_BASE}/v$(LZ4_VER).tar.gz
curl --fail --output lz4-$(LZ4_VER).tar.gz --location ${CURL_SSL_OPTS} ${LZ4_DOWNLOAD_BASE}/v$(LZ4_VER).tar.gz
endif
LZ4_SHA256_ACTUAL=`$(SHA256_CMD) lz4-$(LZ4_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(LZ4_SHA256)" != "$$LZ4_SHA256_ACTUAL" ]; then \
@ -1863,7 +1868,7 @@ endif
libzstd.a:
-rm -rf zstd-$(ZSTD_VER)
ifeq (,$(wildcard ./zstd-$(ZSTD_VER).tar.gz))
curl --output zstd-$(ZSTD_VER).tar.gz -L ${CURL_SSL_OPTS} ${ZSTD_DOWNLOAD_BASE}/v$(ZSTD_VER).tar.gz
curl --fail --output zstd-$(ZSTD_VER).tar.gz --location ${CURL_SSL_OPTS} ${ZSTD_DOWNLOAD_BASE}/v$(ZSTD_VER).tar.gz
endif
ZSTD_SHA256_ACTUAL=`$(SHA256_CMD) zstd-$(ZSTD_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(ZSTD_SHA256)" != "$$ZSTD_SHA256_ACTUAL" ]; then \
@ -1933,35 +1938,35 @@ rocksdbjavastaticreleasedocker: rocksdbjavastatic rocksdbjavastaticdockerx86 roc
rocksdbjavastaticdockerx86:
mkdir -p java/target
docker run --rm --name rocksdb_linux_x86-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_x86-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerx86_64:
mkdir -p java/target
docker run --rm --name rocksdb_linux_x64-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_x64-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerppc64le:
mkdir -p java/target
docker run --rm --name rocksdb_linux_ppc64le-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_ppc64le-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerarm64v8:
mkdir -p java/target
docker run --rm --name rocksdb_linux_arm64v8-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_arm64v8-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerx86musl:
mkdir -p java/target
docker run --rm --name rocksdb_linux_x86-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_x86-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerx86_64musl:
mkdir -p java/target
docker run --rm --name rocksdb_linux_x64-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_x64-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerppc64lemusl:
mkdir -p java/target
docker run --rm --name rocksdb_linux_ppc64le-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_ppc64le-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerarm64v8musl:
mkdir -p java/target
docker run --rm --name rocksdb_linux_arm64v8-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
docker run --rm --name rocksdb_linux_arm64v8-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticpublish: rocksdbjavastaticrelease rocksdbjavastaticpublishcentral

View File

@ -17,41 +17,49 @@ environment:
ZSTD_INCLUDE: $(ZSTD_HOME)\lib;$(ZSTD_HOME)\lib\dictBuilder
ZSTD_LIB_DEBUG: $(ZSTD_HOME)\build\VS2010\bin\x64_Debug\libzstd_static.lib
ZSTD_LIB_RELEASE: $(ZSTD_HOME)\build\VS2010\bin\x64_Release\libzstd_static.lib
matrix:
- APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2015
CMAKE_GENERATOR: Visual Studio 14 Win64
DEV_ENV: C:\Program Files (x86)\Microsoft Visual Studio 14.0\Common7\IDE\devenv.com
- APPVEYOR_BUILD_WORKER_IMAGE: Visual Studio 2017
CMAKE_GENERATOR: Visual Studio 15 Win64
DEV_ENV: C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\devenv.com
install:
- md %THIRDPARTY_HOME%
- echo "Building Snappy dependency..."
- cd %THIRDPARTY_HOME%
- curl -fsSL -o snappy-1.1.7.zip https://github.com/google/snappy/archive/1.1.7.zip
- curl --fail --silent --show-error --output snappy-1.1.7.zip --location https://github.com/google/snappy/archive/1.1.7.zip
- unzip snappy-1.1.7.zip
- cd snappy-1.1.7
- mkdir build
- cd build
- cmake -DCMAKE_GENERATOR_PLATFORM=x64 ..
- cmake -G "%CMAKE_GENERATOR%" ..
- msbuild Snappy.sln /p:Configuration=Debug /p:Platform=x64
- msbuild Snappy.sln /p:Configuration=Release /p:Platform=x64
- echo "Building LZ4 dependency..."
- cd %THIRDPARTY_HOME%
- curl -fsSL -o lz4-1.8.3.zip https://github.com/lz4/lz4/archive/v1.8.3.zip
- curl --fail --silent --show-error --output lz4-1.8.3.zip --location https://github.com/lz4/lz4/archive/v1.8.3.zip
- unzip lz4-1.8.3.zip
- cd lz4-1.8.3\visual\VS2010
- ps: $CMD="C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\devenv.com"; & $CMD lz4.sln /upgrade
- ps: $CMD="$Env:DEV_ENV"; & $CMD lz4.sln /upgrade
- msbuild lz4.sln /p:Configuration=Debug /p:Platform=x64
- msbuild lz4.sln /p:Configuration=Release /p:Platform=x64
- echo "Building ZStd dependency..."
- cd %THIRDPARTY_HOME%
- curl -fsSL -o zstd-1.4.0.zip https://github.com/facebook/zstd/archive/v1.4.0.zip
- curl --fail --silent --show-error --output zstd-1.4.0.zip --location https://github.com/facebook/zstd/archive/v1.4.0.zip
- unzip zstd-1.4.0.zip
- cd zstd-1.4.0\build\VS2010
- ps: $CMD="C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\devenv.com"; & $CMD zstd.sln /upgrade
- ps: $CMD="$Env:DEV_ENV"; & $CMD zstd.sln /upgrade
- msbuild zstd.sln /p:Configuration=Debug /p:Platform=x64
- msbuild zstd.sln /p:Configuration=Release /p:Platform=x64
before_build:
- md %APPVEYOR_BUILD_FOLDER%\build
- cd %APPVEYOR_BUILD_FOLDER%\build
- cmake -G "Visual Studio 15 Win64" -DOPTDBG=1 -DPORTABLE=1 -DSNAPPY=1 -DLZ4=1 -DZSTD=1 -DXPRESS=1 -DJNI=1 ..
- cmake -G "%CMAKE_GENERATOR%" -DCMAKE_BUILD_TYPE=Debug -DOPTDBG=1 -DPORTABLE=1 -DSNAPPY=1 -DLZ4=1 -DZSTD=1 -DXPRESS=1 -DJNI=1 ..
- cd ..
build:
project: build\rocksdb.sln
parallel: true

View File

@ -1,4 +1,4 @@
#!/usr/local/fbcode/gcc-4.9-glibc-2.20-fb/bin/python2.7
#!/usr/bin/env python2.7
# Copyright (c) Facebook, Inc. and its affiliates. All Rights Reserved.
from __future__ import absolute_import

View File

@ -378,7 +378,7 @@ function send_to_ods {
echo >&2 "ERROR: Key $key doesn't have a value."
return
fi
curl -s "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build$git_br&key=$key&value=$value" \
curl --silent "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build$git_br&key=$key&value=$value" \
--connect-timeout 60
}

View File

@ -778,7 +778,7 @@ run_regression()
# parameters: $1 -- key, $2 -- value
function send_size_to_ods {
curl -s "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build&key=rocksdb.build_size.$1&value=$2" \
curl --silent "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build&key=rocksdb.build_size.$1&value=$2" \
--connect-timeout 60
}

cache/lru_cache.h vendored
View File

@ -133,7 +133,6 @@ struct LRUHandle {
// Caclculate the memory usage by metadata
inline size_t CalcTotalCharge(
CacheMetadataChargePolicy metadata_charge_policy) {
assert(key_length);
size_t meta_charge = 0;
if (metadata_charge_policy == kFullChargeCacheMetadata) {
#ifdef ROCKSDB_MALLOC_USABLE_SIZE

View File

@ -64,7 +64,7 @@ Status ArenaWrappedDBIter::Refresh() {
arena_.~Arena();
new (&arena_) Arena();
SuperVersion* sv = cfd_->GetReferencedSuperVersion(db_impl_->mutex());
SuperVersion* sv = cfd_->GetReferencedSuperVersion(db_impl_);
if (read_callback_) {
read_callback_->Refresh(latest_seq);
}

View File

@ -1080,9 +1080,8 @@ Compaction* ColumnFamilyData::CompactRange(
return result;
}
SuperVersion* ColumnFamilyData::GetReferencedSuperVersion(
InstrumentedMutex* db_mutex) {
SuperVersion* sv = GetThreadLocalSuperVersion(db_mutex);
SuperVersion* ColumnFamilyData::GetReferencedSuperVersion(DBImpl* db) {
SuperVersion* sv = GetThreadLocalSuperVersion(db);
sv->Ref();
if (!ReturnThreadLocalSuperVersion(sv)) {
// This Unref() corresponds to the Ref() in GetThreadLocalSuperVersion()
@ -1094,8 +1093,7 @@ SuperVersion* ColumnFamilyData::GetReferencedSuperVersion(
return sv;
}
SuperVersion* ColumnFamilyData::GetThreadLocalSuperVersion(
InstrumentedMutex* db_mutex) {
SuperVersion* ColumnFamilyData::GetThreadLocalSuperVersion(DBImpl* db) {
// The SuperVersion is cached in thread local storage to avoid acquiring
// mutex when SuperVersion does not change since the last use. When a new
// SuperVersion is installed, the compaction or flush thread cleans up
@ -1122,16 +1120,21 @@ SuperVersion* ColumnFamilyData::GetThreadLocalSuperVersion(
if (sv && sv->Unref()) {
RecordTick(ioptions_.statistics, NUMBER_SUPERVERSION_CLEANUPS);
db_mutex->Lock();
db->mutex()->Lock();
// NOTE: underlying resources held by superversion (sst files) might
// not be released until the next background job.
sv->Cleanup();
sv_to_delete = sv;
if (db->immutable_db_options().avoid_unnecessary_blocking_io) {
db->AddSuperVersionsToFreeQueue(sv);
db->SchedulePurge();
} else {
sv_to_delete = sv;
}
} else {
db_mutex->Lock();
db->mutex()->Lock();
}
sv = super_version_->Ref();
db_mutex->Unlock();
db->mutex()->Unlock();
delete sv_to_delete;
}

View File

@ -430,11 +430,11 @@ class ColumnFamilyData {
SuperVersion* GetSuperVersion() { return super_version_; }
// thread-safe
// Return a already referenced SuperVersion to be used safely.
SuperVersion* GetReferencedSuperVersion(InstrumentedMutex* db_mutex);
SuperVersion* GetReferencedSuperVersion(DBImpl* db);
// thread-safe
// Get SuperVersion stored in thread local storage. If it does not exist,
// get a reference from a current SuperVersion.
SuperVersion* GetThreadLocalSuperVersion(InstrumentedMutex* db_mutex);
SuperVersion* GetThreadLocalSuperVersion(DBImpl* db);
// Try to return SuperVersion back to thread local storage. Retrun true on
// success and false on failure. It fails when the thread local storage
// contains anything other than SuperVersion::kSVInUse flag.

View File

@ -67,9 +67,9 @@ class ColumnFamilyTestBase : public testing::Test {
#ifndef ROCKSDB_LITE
const char* test_env_uri = getenv("TEST_ENV_URI");
if (test_env_uri) {
Status s = ObjectRegistry::NewInstance()->NewSharedObject(test_env_uri,
&env_guard_);
base_env = env_guard_.get();
Env* test_env = nullptr;
Status s = Env::LoadEnv(test_env_uri, &test_env, &env_guard_);
base_env = test_env;
EXPECT_OK(s);
EXPECT_NE(Env::Default(), base_env);
}

View File

@ -547,11 +547,13 @@ bool Compaction::ShouldFormSubcompactions() const {
uint64_t Compaction::MinInputFileOldestAncesterTime() const {
uint64_t min_oldest_ancester_time = port::kMaxUint64;
for (const auto& file : inputs_[0].files) {
uint64_t oldest_ancester_time = file->TryGetOldestAncesterTime();
if (oldest_ancester_time != 0) {
min_oldest_ancester_time =
std::min(min_oldest_ancester_time, oldest_ancester_time);
for (const auto& level_files : inputs_) {
for (const auto& file : level_files.files) {
uint64_t oldest_ancester_time = file->TryGetOldestAncesterTime();
if (oldest_ancester_time != 0) {
min_oldest_ancester_time =
std::min(min_oldest_ancester_time, oldest_ancester_time);
}
}
}
return min_oldest_ancester_time;

View File

@ -1501,6 +1501,7 @@ Status CompactionJob::OpenCompactionOutputFile(
out.meta.fd = FileDescriptor(file_number,
sub_compact->compaction->output_path_id(), 0);
out.meta.oldest_ancester_time = oldest_ancester_time;
out.meta.file_creation_time = current_time;
out.finished = false;
sub_compact->outputs.push_back(out);
}

View File

@ -184,7 +184,7 @@ class CompactionJobTest : public testing::Test {
VersionEdit edit;
edit.AddFile(level, file_number, 0, 10, smallest_key, largest_key,
smallest_seqno, largest_seqno, false, oldest_blob_file_number,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
mutex_.Lock();
versions_->LogAndApply(versions_->GetColumnFamilySet()->GetDefault(),

View File

@ -95,7 +95,7 @@ class CompactionPickerTest : public testing::Test {
InternalKey(smallest, smallest_seq, kTypeValue),
InternalKey(largest, largest_seq, kTypeValue), smallest_seq,
largest_seq, /* marked_for_compact */ false, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
f->compensated_file_size =
(compensated_file_size != 0) ? compensated_file_size : file_size;
vstorage_->AddFile(level, f);

View File

@ -3576,6 +3576,15 @@ TEST_F(DBCompactionTest, LevelTtlCascadingCompactions) {
ASSERT_OK(Put(Key(i), RandomString(&rnd, kValueSize)));
}
Flush();
// Get the first file's creation time. This will be the oldest file in the
// DB. Compactions inolving this file's descendents should keep getting
// this time.
std::vector<std::vector<FileMetaData>> level_to_files;
dbfull()->TEST_GetFilesMetaData(dbfull()->DefaultColumnFamily(),
&level_to_files);
uint64_t oldest_time = level_to_files[0][0].oldest_ancester_time;
// Add 1 hour and do another flush.
env_->addon_time_.fetch_add(1 * 60 * 60);
for (int i = 101; i <= 200; ++i) {
ASSERT_OK(Put(Key(i), RandomString(&rnd, kValueSize)));
}
@ -3583,11 +3592,13 @@ TEST_F(DBCompactionTest, LevelTtlCascadingCompactions) {
MoveFilesToLevel(6);
ASSERT_EQ("0,0,0,0,0,0,2", FilesPerLevel());
env_->addon_time_.fetch_add(1 * 60 * 60);
// Add two L4 files with key ranges: [1 .. 50], [51 .. 150].
for (int i = 1; i <= 50; ++i) {
ASSERT_OK(Put(Key(i), RandomString(&rnd, kValueSize)));
}
Flush();
env_->addon_time_.fetch_add(1 * 60 * 60);
for (int i = 51; i <= 150; ++i) {
ASSERT_OK(Put(Key(i), RandomString(&rnd, kValueSize)));
}
@ -3595,6 +3606,7 @@ TEST_F(DBCompactionTest, LevelTtlCascadingCompactions) {
MoveFilesToLevel(4);
ASSERT_EQ("0,0,0,0,2,0,2", FilesPerLevel());
env_->addon_time_.fetch_add(1 * 60 * 60);
// Add one L1 file with key range: [26, 75].
for (int i = 26; i <= 75; ++i) {
ASSERT_OK(Put(Key(i), RandomString(&rnd, kValueSize)));
@ -3636,6 +3648,10 @@ TEST_F(DBCompactionTest, LevelTtlCascadingCompactions) {
ASSERT_EQ("1,0,0,0,0,0,1", FilesPerLevel());
ASSERT_EQ(5, ttl_compactions);
dbfull()->TEST_GetFilesMetaData(dbfull()->DefaultColumnFamily(),
&level_to_files);
ASSERT_EQ(oldest_time, level_to_files[6][0].oldest_ancester_time);
env_->addon_time_.fetch_add(25 * 60 * 60);
ASSERT_OK(Put(Key(2), "1"));
if (if_restart) {
@ -3657,71 +3673,103 @@ TEST_F(DBCompactionTest, LevelPeriodicCompaction) {
const int kNumLevelFiles = 2;
const int kValueSize = 100;
Options options = CurrentOptions();
options.periodic_compaction_seconds = 48 * 60 * 60; // 2 days
options.max_open_files = -1; // needed for ttl compaction
env_->time_elapse_only_sleep_ = false;
options.env = env_;
for (bool if_restart : {false, true}) {
for (bool if_open_all_files : {false, true}) {
Options options = CurrentOptions();
options.periodic_compaction_seconds = 48 * 60 * 60; // 2 days
if (if_open_all_files) {
options.max_open_files = -1; // needed for ttl compaction
} else {
options.max_open_files = 20;
}
// RocksDB sanitize max open files to at least 20. Modify it back.
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"SanitizeOptions::AfterChangeMaxOpenFiles", [&](void* arg) {
int* max_open_files = static_cast<int*>(arg);
*max_open_files = 0;
});
// In the case where all files are opened and doing DB restart
// forcing the file creation time in manifest file to be 0 to
// simulate the case of reading from an old version.
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"VersionEdit::EncodeTo:VarintFileCreationTime", [&](void* arg) {
if (if_restart && if_open_all_files) {
std::string* encoded_fieled = static_cast<std::string*>(arg);
*encoded_fieled = "";
PutVarint64(encoded_fieled, 0);
}
});
env_->addon_time_.store(0);
DestroyAndReopen(options);
env_->time_elapse_only_sleep_ = false;
options.env = env_;
int periodic_compactions = 0;
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"LevelCompactionPicker::PickCompaction:Return", [&](void* arg) {
Compaction* compaction = reinterpret_cast<Compaction*>(arg);
auto compaction_reason = compaction->compaction_reason();
if (compaction_reason == CompactionReason::kPeriodicCompaction) {
periodic_compactions++;
env_->addon_time_.store(0);
DestroyAndReopen(options);
int periodic_compactions = 0;
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"LevelCompactionPicker::PickCompaction:Return", [&](void* arg) {
Compaction* compaction = reinterpret_cast<Compaction*>(arg);
auto compaction_reason = compaction->compaction_reason();
if (compaction_reason == CompactionReason::kPeriodicCompaction) {
periodic_compactions++;
}
});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
Random rnd(301);
for (int i = 0; i < kNumLevelFiles; ++i) {
for (int j = 0; j < kNumKeysPerFile; ++j) {
ASSERT_OK(Put(Key(i * kNumKeysPerFile + j),
RandomString(&rnd, kValueSize)));
}
});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
Flush();
}
dbfull()->TEST_WaitForCompact();
Random rnd(301);
for (int i = 0; i < kNumLevelFiles; ++i) {
for (int j = 0; j < kNumKeysPerFile; ++j) {
ASSERT_OK(
Put(Key(i * kNumKeysPerFile + j), RandomString(&rnd, kValueSize)));
ASSERT_EQ("2", FilesPerLevel());
ASSERT_EQ(0, periodic_compactions);
// Add 50 hours and do a write
env_->addon_time_.fetch_add(50 * 60 * 60);
ASSERT_OK(Put("a", "1"));
Flush();
dbfull()->TEST_WaitForCompact();
// Assert that the files stay in the same level
ASSERT_EQ("3", FilesPerLevel());
// The two old files go through the periodic compaction process
ASSERT_EQ(2, periodic_compactions);
MoveFilesToLevel(1);
ASSERT_EQ("0,3", FilesPerLevel());
// Add another 50 hours and do another write
env_->addon_time_.fetch_add(50 * 60 * 60);
ASSERT_OK(Put("b", "2"));
if (if_restart) {
Reopen(options);
} else {
Flush();
}
dbfull()->TEST_WaitForCompact();
ASSERT_EQ("1,3", FilesPerLevel());
// The three old files now go through the periodic compaction process. 2
// + 3.
ASSERT_EQ(5, periodic_compactions);
// Add another 50 hours and do another write
env_->addon_time_.fetch_add(50 * 60 * 60);
ASSERT_OK(Put("c", "3"));
Flush();
dbfull()->TEST_WaitForCompact();
ASSERT_EQ("2,3", FilesPerLevel());
// The four old files now go through the periodic compaction process. 5
// + 4.
ASSERT_EQ(9, periodic_compactions);
rocksdb::SyncPoint::GetInstance()->DisableProcessing();
}
Flush();
}
dbfull()->TEST_WaitForCompact();
ASSERT_EQ("2", FilesPerLevel());
ASSERT_EQ(0, periodic_compactions);
// Add 50 hours and do a write
env_->addon_time_.fetch_add(50 * 60 * 60);
ASSERT_OK(Put("a", "1"));
Flush();
dbfull()->TEST_WaitForCompact();
// Assert that the files stay in the same level
ASSERT_EQ("3", FilesPerLevel());
// The two old files go through the periodic compaction process
ASSERT_EQ(2, periodic_compactions);
MoveFilesToLevel(1);
ASSERT_EQ("0,3", FilesPerLevel());
// Add another 50 hours and do another write
env_->addon_time_.fetch_add(50 * 60 * 60);
ASSERT_OK(Put("b", "2"));
Flush();
dbfull()->TEST_WaitForCompact();
ASSERT_EQ("1,3", FilesPerLevel());
// The three old files now go through the periodic compaction process. 2 + 3.
ASSERT_EQ(5, periodic_compactions);
// Add another 50 hours and do another write
env_->addon_time_.fetch_add(50 * 60 * 60);
ASSERT_OK(Put("c", "3"));
Flush();
dbfull()->TEST_WaitForCompact();
ASSERT_EQ("2,3", FilesPerLevel());
// The four old files now go through the periodic compaction process. 5 + 4.
ASSERT_EQ(9, periodic_compactions);
rocksdb::SyncPoint::GetInstance()->DisableProcessing();
}
TEST_F(DBCompactionTest, LevelPeriodicCompactionWithOldDB) {
@ -3734,7 +3782,6 @@ TEST_F(DBCompactionTest, LevelPeriodicCompactionWithOldDB) {
const int kValueSize = 100;
Options options = CurrentOptions();
options.max_open_files = -1; // needed for ttl compaction
env_->time_elapse_only_sleep_ = false;
options.env = env_;
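The periodic-compaction tests above advance a mock clock to make files age past the configured threshold. As a rough usage sketch, not taken from this diff, enabling the feature on a real database only requires setting the periodic_compaction_seconds option; the path and the two-day value below are illustrative placeholders.

#include <cassert>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Files older than two days become candidates for periodic compaction.
  options.periodic_compaction_seconds = 48 * 60 * 60;
  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/periodic_demo", &db);
  assert(s.ok());
  s = db->Put(rocksdb::WriteOptions(), "key", "value");
  assert(s.ok());
  delete db;
  return 0;
}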

View File

@ -879,7 +879,7 @@ Status DBImpl::TablesRangeTombstoneSummary(ColumnFamilyHandle* column_family,
column_family);
ColumnFamilyData* cfd = cfh->cfd();
SuperVersion* super_version = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(this);
Version* version = super_version->current;
Status s =
@ -895,7 +895,6 @@ void DBImpl::ScheduleBgLogWriterClose(JobContext* job_context) {
AddToLogsToFreeQueue(l);
}
job_context->logs_to_free.clear();
SchedulePurge();
}
}
@ -1322,19 +1321,34 @@ void DBImpl::BackgroundCallPurge() {
delete log_writer;
mutex_.Lock();
}
for (const auto& file : purge_files_) {
const PurgeFileInfo& purge_file = file.second;
while (!superversions_to_free_queue_.empty()) {
assert(!superversions_to_free_queue_.empty());
SuperVersion* sv = superversions_to_free_queue_.front();
superversions_to_free_queue_.pop_front();
mutex_.Unlock();
delete sv;
mutex_.Lock();
}
// Can't use iterator to go over purge_files_ because inside the loop we're
// unlocking the mutex that protects purge_files_.
while (!purge_files_.empty()) {
auto it = purge_files_.begin();
// Need to make a copy of the PurgeFileInfo before unlocking the mutex.
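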
PurgeFileInfo purge_file = it->second;
const std::string& fname = purge_file.fname;
const std::string& dir_to_sync = purge_file.dir_to_sync;
FileType type = purge_file.type;
uint64_t number = purge_file.number;
int job_id = purge_file.job_id;
purge_files_.erase(it);
mutex_.Unlock();
DeleteObsoleteFileImpl(job_id, fname, dir_to_sync, type, number);
mutex_.Lock();
}
purge_files_.clear();
bg_purge_scheduled_--;
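The two loops above follow the same pattern: pop one pending item while holding the DB mutex, release the mutex around the potentially slow delete, then reacquire it; an iterator cannot be used because the container may change while the mutex is dropped. A minimal standalone sketch of that pattern, with std::mutex standing in for the InstrumentedMutex and a hypothetical Item type as the payload, could look like this.

#include <deque>
#include <mutex>

struct Item {
  int payload;
};

std::mutex mu;              // stands in for the DB mutex
std::deque<Item*> pending;  // items scheduled for deferred deletion

// Pop one item at a time and release the lock around the expensive delete,
// mirroring the queue-draining loops in BackgroundCallPurge.
void DrainPending() {
  std::unique_lock<std::mutex> lock(mu);
  while (!pending.empty()) {
    Item* item = pending.front();  // copy out what we need while locked
    pending.pop_front();
    lock.unlock();                 // never hold the mutex across the delete
    delete item;
    lock.lock();                   // reacquire before touching the queue again
  }
}

int main() {
  {
    std::lock_guard<std::mutex> lock(mu);
    pending.push_back(new Item{1});
    pending.push_back(new Item{2});
  }
  DrainPending();
  return 0;
}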
@ -1374,10 +1388,14 @@ static void CleanupIteratorState(void* arg1, void* /*arg2*/) {
state->db->FindObsoleteFiles(&job_context, false, true);
if (state->background_purge) {
state->db->ScheduleBgLogWriterClose(&job_context);
state->db->AddSuperVersionsToFreeQueue(state->super_version);
state->db->SchedulePurge();
}
state->mu->Unlock();
delete state->super_version;
if (!state->background_purge) {
delete state->super_version;
}
if (job_context.HaveSomethingToDelete()) {
if (state->background_purge) {
// PurgeObsoleteFiles here does not delete files. Instead, it adds the
@ -2452,7 +2470,7 @@ Iterator* DBImpl::NewIterator(const ReadOptions& read_options,
result = nullptr;
#else
SuperVersion* sv = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* sv = cfd->GetReferencedSuperVersion(this);
auto iter = new ForwardIterator(this, read_options, cfd, sv);
result = NewDBIterator(
env_, read_options, *cfd->ioptions(), sv->mutable_cf_options,
@ -2478,7 +2496,7 @@ ArenaWrappedDBIter* DBImpl::NewIteratorImpl(const ReadOptions& read_options,
ReadCallback* read_callback,
bool allow_blob,
bool allow_refresh) {
SuperVersion* sv = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* sv = cfd->GetReferencedSuperVersion(this);
// Try to generate a DB iterator tree in continuous memory area to be
// cache friendly. Here is an example of result:
@ -2557,7 +2575,7 @@ Status DBImpl::NewIterators(
#else
for (auto cfh : column_families) {
auto cfd = reinterpret_cast<ColumnFamilyHandleImpl*>(cfh)->cfd();
SuperVersion* sv = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* sv = cfd->GetReferencedSuperVersion(this);
auto iter = new ForwardIterator(this, read_options, cfd, sv);
iterators->push_back(NewDBIterator(
env_, read_options, *cfd->ioptions(), sv->mutable_cf_options,
@ -2884,7 +2902,7 @@ bool DBImpl::GetAggregatedIntProperty(const Slice& property,
SuperVersion* DBImpl::GetAndRefSuperVersion(ColumnFamilyData* cfd) {
// TODO(ljin): consider using GetReferencedSuperVersion() directly
return cfd->GetThreadLocalSuperVersion(&mutex_);
return cfd->GetThreadLocalSuperVersion(this);
}
// REQUIRED: this function should only be called on the write thread or if the
@ -2902,11 +2920,19 @@ SuperVersion* DBImpl::GetAndRefSuperVersion(uint32_t column_family_id) {
void DBImpl::CleanupSuperVersion(SuperVersion* sv) {
// Release SuperVersion
if (sv->Unref()) {
bool defer_purge =
immutable_db_options().avoid_unnecessary_blocking_io;
{
InstrumentedMutexLock l(&mutex_);
sv->Cleanup();
if (defer_purge) {
AddSuperVersionsToFreeQueue(sv);
SchedulePurge();
}
}
if (!defer_purge) {
delete sv;
}
delete sv;
RecordTick(stats_, NUMBER_SUPERVERSION_CLEANUPS);
}
RecordTick(stats_, NUMBER_SUPERVERSION_RELEASES);
@ -3912,7 +3938,7 @@ Status DBImpl::IngestExternalFiles(
start_file_number += args[i - 1].external_files.size();
auto* cfd =
static_cast<ColumnFamilyHandleImpl*>(args[i].column_family)->cfd();
SuperVersion* super_version = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(this);
exec_results[i].second = ingestion_jobs[i].Prepare(
args[i].external_files, start_file_number, super_version);
exec_results[i].first = true;
@ -3923,7 +3949,7 @@ Status DBImpl::IngestExternalFiles(
{
auto* cfd =
static_cast<ColumnFamilyHandleImpl*>(args[0].column_family)->cfd();
SuperVersion* super_version = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(this);
exec_results[0].second = ingestion_jobs[0].Prepare(
args[0].external_files, next_file_number, super_version);
exec_results[0].first = true;
@ -4192,7 +4218,7 @@ Status DBImpl::CreateColumnFamilyWithImport(
dummy_sv_ctx.Clean();
if (status.ok()) {
SuperVersion* sv = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* sv = cfd->GetReferencedSuperVersion(this);
status = import_job.Prepare(next_file_number, sv);
CleanupSuperVersion(sv);
}
@ -4269,7 +4295,7 @@ Status DBImpl::VerifyChecksum(const ReadOptions& read_options) {
}
std::vector<SuperVersion*> sv_list;
for (auto cfd : cfd_list) {
sv_list.push_back(cfd->GetReferencedSuperVersion(&mutex_));
sv_list.push_back(cfd->GetReferencedSuperVersion(this));
}
for (auto& sv : sv_list) {
VersionStorageInfo* vstorage = sv->current->storage_info();
@ -4294,14 +4320,23 @@ Status DBImpl::VerifyChecksum(const ReadOptions& read_options) {
break;
}
}
bool defer_purge =
immutable_db_options().avoid_unnecessary_blocking_io;
{
InstrumentedMutexLock l(&mutex_);
for (auto sv : sv_list) {
if (sv && sv->Unref()) {
sv->Cleanup();
delete sv;
if (defer_purge) {
AddSuperVersionsToFreeQueue(sv);
} else {
delete sv;
}
}
}
if (defer_purge) {
SchedulePurge();
}
for (auto cfd : cfd_list) {
cfd->Unref();
}

View File

@ -798,6 +798,10 @@ class DBImpl : public DB {
logs_to_free_queue_.push_back(log_writer);
}
void AddSuperVersionsToFreeQueue(SuperVersion* sv) {
superversions_to_free_queue_.push_back(sv);
}
void SetSnapshotChecker(SnapshotChecker* snapshot_checker);
// Fill JobContext with snapshot information needed by flush and compaction.
@ -1109,6 +1113,8 @@ class DBImpl : public DB {
bool read_only = false, bool error_if_log_file_exist = false,
bool error_if_data_exists_in_logs = false);
virtual bool OwnTablesAndLogs() const { return true; }
private:
friend class DB;
friend class ErrorHandler;
@ -1193,7 +1199,7 @@ class DBImpl : public DB {
};
// PurgeFileInfo is a structure to hold information of files to be deleted in
// purge_queue_
// purge_files_
struct PurgeFileInfo {
std::string fname;
std::string dir_to_sync;
@ -1888,6 +1894,7 @@ class DBImpl : public DB {
// A queue to store log writers to close
std::deque<log::Writer*> logs_to_free_queue_;
std::deque<SuperVersion*> superversions_to_free_queue_;
int unscheduled_flushes_;
int unscheduled_compactions_;

View File

@ -659,7 +659,7 @@ Status DBImpl::CompactRange(const CompactRangeOptions& options,
// one/both sides of the interval are unbounded. But it requires more
// changes to RangesOverlapWithMemtables.
Range range(*begin, *end);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(this);
cfd->RangesOverlapWithMemtables({range}, super_version, &flush_needed);
CleanupSuperVersion(super_version);
}
@ -1257,7 +1257,7 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
f->fd.GetFileSize(), f->smallest, f->largest,
f->fd.smallest_seqno, f->fd.largest_seqno,
f->marked_for_compaction, f->oldest_blob_file_number,
f->oldest_ancester_time);
f->oldest_ancester_time, f->file_creation_time);
}
ROCKS_LOG_DEBUG(immutable_db_options_.info_log,
"[%s] Apply version edit:\n%s", cfd->GetName().c_str(),
@ -2672,7 +2672,8 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
f->fd.GetPathId(), f->fd.GetFileSize(), f->smallest,
f->largest, f->fd.smallest_seqno,
f->fd.largest_seqno, f->marked_for_compaction,
f->oldest_blob_file_number, f->oldest_ancester_time);
f->oldest_blob_file_number, f->oldest_ancester_time,
f->file_creation_time);
ROCKS_LOG_BUFFER(
log_buffer,

View File

@ -129,7 +129,7 @@ Status DBImpl::PromoteL0(ColumnFamilyHandle* column_family, int target_level) {
f->fd.GetFileSize(), f->smallest, f->largest,
f->fd.smallest_seqno, f->fd.largest_seqno,
f->marked_for_compaction, f->oldest_blob_file_number,
f->oldest_ancester_time);
f->oldest_ancester_time, f->file_creation_time);
}
status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),

View File

@ -385,6 +385,7 @@ void DBImpl::PurgeObsoleteFiles(JobContext& state, bool schedule_only) {
w->Close();
}
bool own_files = OwnTablesAndLogs();
std::unordered_set<uint64_t> files_to_del;
for (const auto& candidate_file : candidate_files) {
const std::string& to_delete = candidate_file.file_name;
@ -484,6 +485,12 @@ void DBImpl::PurgeObsoleteFiles(JobContext& state, bool schedule_only) {
}
#endif // !ROCKSDB_LITE
// If I do not own these files, e.g. secondary instance with max_open_files
// = -1, then no need to delete or schedule delete these files since they
// will be removed by their owner, e.g. the primary instance.
if (!own_files) {
continue;
}
Status file_deletion_status;
if (schedule_only) {
InstrumentedMutexLock guard_lock(&mutex_);
@ -495,7 +502,6 @@ void DBImpl::PurgeObsoleteFiles(JobContext& state, bool schedule_only) {
{
// After purging obsolete files, remove them from files_grabbed_for_purge_.
// Use a temporary vector to perform bulk deletion via swap.
InstrumentedMutexLock guard_lock(&mutex_);
autovector<uint64_t> to_be_removed;
for (auto fn : files_grabbed_for_purge_) {

View File

@ -1226,7 +1226,7 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
meta.fd.GetFileSize(), meta.smallest, meta.largest,
meta.fd.smallest_seqno, meta.fd.largest_seqno,
meta.marked_for_compaction, meta.oldest_blob_file_number,
meta.oldest_ancester_time);
meta.oldest_ancester_time, meta.file_creation_time);
}
InternalStats::CompactionStats stats(CompactionReason::kFlush, 1);

View File

@ -11,6 +11,7 @@
#include "db/merge_context.h"
#include "logging/auto_roll_logger.h"
#include "monitoring/perf_context_imp.h"
#include "util/cast_util.h"
namespace rocksdb {
@ -405,7 +406,7 @@ ArenaWrappedDBIter* DBImplSecondary::NewIteratorImpl(
const ReadOptions& read_options, ColumnFamilyData* cfd,
SequenceNumber snapshot, ReadCallback* read_callback) {
assert(nullptr != cfd);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(&mutex_);
SuperVersion* super_version = cfd->GetReferencedSuperVersion(this);
auto db_iter = NewArenaWrappedDbIterator(
env_, read_options, *cfd->ioptions(), super_version->mutable_cf_options,
snapshot,
@ -497,45 +498,61 @@ Status DBImplSecondary::TryCatchUpWithPrimary() {
// read the manifest and apply new changes to the secondary instance
std::unordered_set<ColumnFamilyData*> cfds_changed;
JobContext job_context(0, true /*create_superversion*/);
InstrumentedMutexLock lock_guard(&mutex_);
s = static_cast<ReactiveVersionSet*>(versions_.get())
->ReadAndApply(&mutex_, &manifest_reader_, &cfds_changed);
{
InstrumentedMutexLock lock_guard(&mutex_);
s = static_cast_with_check<ReactiveVersionSet>(versions_.get())
->ReadAndApply(&mutex_, &manifest_reader_, &cfds_changed);
ROCKS_LOG_INFO(immutable_db_options_.info_log, "Last sequence is %" PRIu64,
static_cast<uint64_t>(versions_->LastSequence()));
for (ColumnFamilyData* cfd : cfds_changed) {
if (cfd->IsDropped()) {
ROCKS_LOG_DEBUG(immutable_db_options_.info_log, "[%s] is dropped\n",
cfd->GetName().c_str());
continue;
ROCKS_LOG_INFO(immutable_db_options_.info_log, "Last sequence is %" PRIu64,
static_cast<uint64_t>(versions_->LastSequence()));
for (ColumnFamilyData* cfd : cfds_changed) {
if (cfd->IsDropped()) {
ROCKS_LOG_DEBUG(immutable_db_options_.info_log, "[%s] is dropped\n",
cfd->GetName().c_str());
continue;
}
VersionStorageInfo::LevelSummaryStorage tmp;
ROCKS_LOG_DEBUG(immutable_db_options_.info_log,
"[%s] Level summary: %s\n", cfd->GetName().c_str(),
cfd->current()->storage_info()->LevelSummary(&tmp));
}
VersionStorageInfo::LevelSummaryStorage tmp;
ROCKS_LOG_DEBUG(immutable_db_options_.info_log, "[%s] Level summary: %s\n",
cfd->GetName().c_str(),
cfd->current()->storage_info()->LevelSummary(&tmp));
}
// list wal_dir to discover new WALs and apply new changes to the secondary
// instance
if (s.ok()) {
s = FindAndRecoverLogFiles(&cfds_changed, &job_context);
}
if (s.IsPathNotFound()) {
ROCKS_LOG_INFO(immutable_db_options_.info_log,
"Secondary tries to read WAL, but WAL file(s) have already "
"been purged by primary.");
s = Status::OK();
}
if (s.ok()) {
for (auto cfd : cfds_changed) {
cfd->imm()->RemoveOldMemTables(cfd->GetLogNumber(),
&job_context.memtables_to_free);
auto& sv_context = job_context.superversion_contexts.back();
cfd->InstallSuperVersion(&sv_context, &mutex_);
sv_context.NewSuperVersion();
// list wal_dir to discover new WALs and apply new changes to the secondary
// instance
if (s.ok()) {
s = FindAndRecoverLogFiles(&cfds_changed, &job_context);
}
if (s.IsPathNotFound()) {
ROCKS_LOG_INFO(
immutable_db_options_.info_log,
"Secondary tries to read WAL, but WAL file(s) have already "
"been purged by primary.");
s = Status::OK();
}
if (s.ok()) {
for (auto cfd : cfds_changed) {
cfd->imm()->RemoveOldMemTables(cfd->GetLogNumber(),
&job_context.memtables_to_free);
auto& sv_context = job_context.superversion_contexts.back();
cfd->InstallSuperVersion(&sv_context, &mutex_);
sv_context.NewSuperVersion();
}
}
job_context.Clean();
}
job_context.Clean();
// Cleanup unused, obsolete files.
JobContext purge_files_job_context(0);
{
InstrumentedMutexLock lock_guard(&mutex_);
// Currently, secondary instance does not own the database files, thus it
// is unnecessary for the secondary to force full scan.
FindObsoleteFiles(&purge_files_job_context, /*force=*/false);
}
if (purge_files_job_context.HaveSomethingToDelete()) {
PurgeObsoleteFiles(purge_files_job_context);
}
purge_files_job_context.Clean();
return s;
}

View File

@ -172,6 +172,24 @@ class DBImplSecondary : public DBImpl {
return Status::NotSupported("Not supported operation in secondary mode.");
}
using DBImpl::SetDBOptions;
Status SetDBOptions(const std::unordered_map<std::string, std::string>&
/*options_map*/) override {
// Currently not supported because changing certain options may cause
// flush/compaction.
return Status::NotSupported("Not supported operation in secondary mode.");
}
using DBImpl::SetOptions;
Status SetOptions(
ColumnFamilyHandle* /*cfd*/,
const std::unordered_map<std::string, std::string>& /*options_map*/)
override {
// Currently not supported because changing certain options may cause
// flush/compaction and/or write to MANIFEST.
return Status::NotSupported("Not supported operation in secondary mode.");
}
using DBImpl::SyncWAL;
Status SyncWAL() override {
return Status::NotSupported("Not supported operation in secondary mode.");
@ -269,6 +287,14 @@ class DBImplSecondary : public DBImpl {
return s;
}
bool OwnTablesAndLogs() const override {
// Currently, the secondary instance does not own the database files. It
// simply opens the files of the primary instance and tracks their file
// descriptors until they become obsolete. In the future, the secondary may
// create links to database files. OwnTablesAndLogs will return true then.
return false;
}
private:
friend class DB;
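For context, a secondary instance is opened read-only against the primary's directory and periodically replays the primary's MANIFEST and WAL changes via TryCatchUpWithPrimary(). A rough usage sketch, with placeholder paths and no error handling beyond the essentials, might be:

#include <cassert>
#include <memory>
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.max_open_files = -1;  // let the secondary keep table files open
  rocksdb::DB* db = nullptr;
  // "/path/to/primary" is the primary's db directory; "/path/to/secondary"
  // holds the secondary's own info log and option files (placeholder paths).
  rocksdb::Status s = rocksdb::DB::OpenAsSecondary(
      options, "/path/to/primary", "/path/to/secondary", &db);
  if (!s.ok()) {
    return 1;
  }
  std::unique_ptr<rocksdb::DB> guard(db);
  // Replay whatever the primary has written since the last catch-up.
  s = db->TryCatchUpWithPrimary();
  assert(s.ok());
  std::string value;
  s = db->Get(rocksdb::ReadOptions(), "some_key", &value);  // read-only access
  return (s.ok() || s.IsNotFound()) ? 0 : 1;
}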

View File

@ -1516,12 +1516,14 @@ Status DBImpl::TrimMemtableHistory(WriteContext* context) {
for (auto& cfd : cfds) {
autovector<MemTable*> to_delete;
cfd->imm()->TrimHistory(&to_delete, cfd->mem()->ApproximateMemoryUsage());
for (auto m : to_delete) {
delete m;
if (!to_delete.empty()) {
for (auto m : to_delete) {
delete m;
}
context->superversion_context.NewSuperVersion();
assert(context->superversion_context.new_superversion.get() != nullptr);
cfd->InstallSuperVersion(&context->superversion_context, &mutex_);
}
context->superversion_context.NewSuperVersion();
assert(context->superversion_context.new_superversion.get() != nullptr);
cfd->InstallSuperVersion(&context->superversion_context, &mutex_);
if (cfd->Unref()) {
delete cfd;

View File

@ -195,6 +195,90 @@ TEST_F(DBSecondaryTest, OpenAsSecondary) {
verify_db_func("new_foo_value", "new_bar_value");
}
namespace {
class TraceFileEnv : public EnvWrapper {
public:
explicit TraceFileEnv(Env* target) : EnvWrapper(target) {}
Status NewRandomAccessFile(const std::string& f,
std::unique_ptr<RandomAccessFile>* r,
const EnvOptions& env_options) override {
class TracedRandomAccessFile : public RandomAccessFile {
public:
TracedRandomAccessFile(std::unique_ptr<RandomAccessFile>&& target,
std::atomic<int>& counter)
: target_(std::move(target)), files_closed_(counter) {}
~TracedRandomAccessFile() override {
files_closed_.fetch_add(1, std::memory_order_relaxed);
}
Status Read(uint64_t offset, size_t n, Slice* result,
char* scratch) const override {
return target_->Read(offset, n, result, scratch);
}
private:
std::unique_ptr<RandomAccessFile> target_;
std::atomic<int>& files_closed_;
};
Status s = target()->NewRandomAccessFile(f, r, env_options);
if (s.ok()) {
r->reset(new TracedRandomAccessFile(std::move(*r), files_closed_));
}
return s;
}
int files_closed() const {
return files_closed_.load(std::memory_order_relaxed);
}
private:
std::atomic<int> files_closed_{0};
};
} // namespace
TEST_F(DBSecondaryTest, SecondaryCloseFiles) {
Options options;
options.env = env_;
options.max_open_files = 1;
options.disable_auto_compactions = true;
Reopen(options);
Options options1;
std::unique_ptr<Env> traced_env(new TraceFileEnv(env_));
options1.env = traced_env.get();
OpenSecondary(options1);
static const auto verify_db = [&]() {
std::unique_ptr<Iterator> iter1(dbfull()->NewIterator(ReadOptions()));
std::unique_ptr<Iterator> iter2(db_secondary_->NewIterator(ReadOptions()));
for (iter1->SeekToFirst(), iter2->SeekToFirst();
iter1->Valid() && iter2->Valid(); iter1->Next(), iter2->Next()) {
ASSERT_EQ(iter1->key(), iter2->key());
ASSERT_EQ(iter1->value(), iter2->value());
}
ASSERT_FALSE(iter1->Valid());
ASSERT_FALSE(iter2->Valid());
};
ASSERT_OK(Put("a", "value"));
ASSERT_OK(Put("c", "value"));
ASSERT_OK(Flush());
ASSERT_OK(db_secondary_->TryCatchUpWithPrimary());
verify_db();
ASSERT_OK(Put("b", "value"));
ASSERT_OK(Put("d", "value"));
ASSERT_OK(Flush());
ASSERT_OK(db_secondary_->TryCatchUpWithPrimary());
verify_db();
ASSERT_OK(dbfull()->CompactRange(CompactRangeOptions(), nullptr, nullptr));
ASSERT_OK(db_secondary_->TryCatchUpWithPrimary());
ASSERT_EQ(2, static_cast<TraceFileEnv*>(traced_env.get())->files_closed());
Status s = db_secondary_->SetDBOptions({{"max_open_files", "-1"}});
ASSERT_TRUE(s.IsNotSupported());
CloseSecondary();
}
TEST_F(DBSecondaryTest, OpenAsSecondaryWALTailing) {
Options options;
options.env = env_;

View File

@ -1427,6 +1427,47 @@ TEST_F(DBRangeDelTest, SnapshotPreventsDroppedKeys) {
db_->ReleaseSnapshot(snapshot);
}
TEST_F(DBRangeDelTest, SnapshotPreventsDroppedKeysInImmMemTables) {
const int kFileBytes = 1 << 20;
Options options = CurrentOptions();
options.compression = kNoCompression;
options.disable_auto_compactions = true;
options.target_file_size_base = kFileBytes;
Reopen(options);
// block flush thread -> pin immtables in memory
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->LoadDependency({
{"SnapshotPreventsDroppedKeysInImmMemTables:AfterNewIterator",
"DBImpl::BGWorkFlush"},
});
SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(0), "a"));
std::unique_ptr<const Snapshot, std::function<void(const Snapshot*)>>
snapshot(db_->GetSnapshot(),
[this](const Snapshot* s) { db_->ReleaseSnapshot(s); });
ASSERT_OK(db_->DeleteRange(WriteOptions(), db_->DefaultColumnFamily(), Key(0),
Key(10)));
ASSERT_OK(dbfull()->TEST_SwitchMemtable());
ReadOptions read_opts;
read_opts.snapshot = snapshot.get();
std::unique_ptr<Iterator> iter(db_->NewIterator(read_opts));
TEST_SYNC_POINT("SnapshotPreventsDroppedKeysInImmMemTables:AfterNewIterator");
iter->SeekToFirst();
ASSERT_TRUE(iter->Valid());
ASSERT_EQ(Key(0), iter->key());
iter->Next();
ASSERT_FALSE(iter->Valid());
}
TEST_F(DBRangeDelTest, RangeTombstoneWrittenToMinimalSsts) {
// Adapted from
// https://github.com/cockroachdb/cockroach/blob/de8b3ea603dd1592d9dc26443c2cc92c356fbc2f/pkg/storage/engine/rocksdb_test.go#L1267-L1398.

View File

@ -1022,7 +1022,8 @@ TEST_F(DBTest, FailMoreDbPaths) {
void CheckColumnFamilyMeta(
const ColumnFamilyMetaData& cf_meta,
const std::vector<std::vector<FileMetaData>>& files_by_level) {
const std::vector<std::vector<FileMetaData>>& files_by_level,
uint64_t start_time, uint64_t end_time) {
ASSERT_EQ(cf_meta.name, kDefaultColumnFamilyName);
ASSERT_EQ(cf_meta.levels.size(), files_by_level.size());
@ -1060,6 +1061,14 @@ void CheckColumnFamilyMeta(
file_meta_from_files.largest.user_key().ToString());
ASSERT_EQ(file_meta_from_cf.oldest_blob_file_number,
file_meta_from_files.oldest_blob_file_number);
ASSERT_EQ(file_meta_from_cf.oldest_ancester_time,
file_meta_from_files.oldest_ancester_time);
ASSERT_EQ(file_meta_from_cf.file_creation_time,
file_meta_from_files.file_creation_time);
ASSERT_GE(file_meta_from_cf.file_creation_time, start_time);
ASSERT_LE(file_meta_from_cf.file_creation_time, end_time);
ASSERT_GE(file_meta_from_cf.oldest_ancester_time, start_time);
ASSERT_LE(file_meta_from_cf.oldest_ancester_time, end_time);
}
ASSERT_EQ(level_meta_from_cf.size, level_size);
@ -1113,6 +1122,11 @@ TEST_F(DBTest, MetaDataTest) {
Options options = CurrentOptions();
options.create_if_missing = true;
options.disable_auto_compactions = true;
int64_t temp_time = 0;
options.env->GetCurrentTime(&temp_time);
uint64_t start_time = static_cast<uint64_t>(temp_time);
DestroyAndReopen(options);
Random rnd(301);
@ -1139,9 +1153,12 @@ TEST_F(DBTest, MetaDataTest) {
std::vector<std::vector<FileMetaData>> files_by_level;
dbfull()->TEST_GetFilesMetaData(db_->DefaultColumnFamily(), &files_by_level);
options.env->GetCurrentTime(&temp_time);
uint64_t end_time = static_cast<uint64_t>(temp_time);
ColumnFamilyMetaData cf_meta;
db_->GetColumnFamilyMetaData(&cf_meta);
CheckColumnFamilyMeta(cf_meta, files_by_level);
CheckColumnFamilyMeta(cf_meta, files_by_level, start_time, end_time);
std::vector<LiveFileMetaData> live_file_meta;
db_->GetLiveFilesMetaData(&live_file_meta);
@ -6420,6 +6437,12 @@ TEST_F(DBTest, CreationTimeOfOldestFile) {
}
}
});
// Set file creation time in manifest all to 0.
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"FileMetaData::FileMetaData", [&](void* arg) {
FileMetaData* meta = static_cast<FileMetaData*>(arg);
meta->file_creation_time = 0;
});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
Random rnd(301);
@ -6431,7 +6454,7 @@ TEST_F(DBTest, CreationTimeOfOldestFile) {
Flush();
}
// At this point there should be 2 files, oen with file_creation_time = 0 and
// At this point there should be 2 files, one with file_creation_time = 0 and
// the other non-zero. GetCreationTimeOfOldestFile API should return 0.
uint64_t creation_time;
Status s1 = dbfull()->GetCreationTimeOfOldestFile(&creation_time);
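As the test above suggests, GetCreationTimeOfOldestFile reports 0 when any live SST predates the new file_creation_time field. A small caller-side sketch, assuming the DB-level API shown in this diff, is below; the helper name is mine.

#include <cstdint>
#include <cstdio>
#include "rocksdb/db.h"

// Sketch: query the creation time of the oldest live SST file. A result of 0
// means at least one file has no recorded creation time (pre-upgrade file).
void PrintOldestFileCreationTime(rocksdb::DB* db) {
  uint64_t creation_time = 0;
  rocksdb::Status s = db->GetCreationTimeOfOldestFile(&creation_time);
  if (s.ok()) {
    std::printf("oldest file creation time (unix secs): %llu\n",
                static_cast<unsigned long long>(creation_time));
  }
}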

View File

@ -4210,6 +4210,42 @@ TEST_F(DBTest2, SeekFileRangeDeleteTail) {
}
db_->ReleaseSnapshot(s1);
}
TEST_F(DBTest2, BackgroundPurgeTest) {
Options options = CurrentOptions();
options.write_buffer_manager = std::make_shared<rocksdb::WriteBufferManager>(1 << 20);
options.avoid_unnecessary_blocking_io = true;
DestroyAndReopen(options);
size_t base_value = options.write_buffer_manager->memory_usage();
ASSERT_OK(Put("a", "a"));
Iterator* iter = db_->NewIterator(ReadOptions());
ASSERT_OK(Flush());
size_t value = options.write_buffer_manager->memory_usage();
ASSERT_GT(value, base_value);
db_->GetEnv()->SetBackgroundThreads(1, Env::Priority::HIGH);
test::SleepingBackgroundTask sleeping_task_after;
db_->GetEnv()->Schedule(&test::SleepingBackgroundTask::DoSleepTask,
&sleeping_task_after, Env::Priority::HIGH);
delete iter;
Env::Default()->SleepForMicroseconds(100000);
value = options.write_buffer_manager->memory_usage();
ASSERT_GT(value, base_value);
sleeping_task_after.WakeUp();
sleeping_task_after.WaitUntilDone();
test::SleepingBackgroundTask sleeping_task_after2;
db_->GetEnv()->Schedule(&test::SleepingBackgroundTask::DoSleepTask,
&sleeping_task_after2, Env::Priority::HIGH);
sleeping_task_after2.WakeUp();
sleeping_task_after2.WaitUntilDone();
value = options.write_buffer_manager->memory_usage();
ASSERT_EQ(base_value, value);
}
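BackgroundPurgeTest checks that memtable memory charged to the WriteBufferManager is only released once the background purge thread runs, because avoid_unnecessary_blocking_io defers the iterator's SuperVersion cleanup instead of doing it in the caller. A minimal options-wiring sketch, with illustrative sizes and a placeholder path, might look like this.

#include <memory>
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/write_buffer_manager.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Cap memtable memory across the DB at 1 MiB (illustrative value).
  options.write_buffer_manager =
      std::make_shared<rocksdb::WriteBufferManager>(1 << 20);
  // Defer blocking cleanup (e.g. freeing SuperVersions when an iterator is
  // destroyed) to the background purge thread instead of the caller.
  options.avoid_unnecessary_blocking_io = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/bg_purge_demo", &db);
  if (!s.ok()) {
    return 1;
  }
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());
  delete it;  // cleanup work is queued for the background purge thread
  delete db;
  return 0;
}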
} // namespace rocksdb
#ifdef ROCKSDB_UNITTESTS_WITH_CUSTOM_OBJECTS_FROM_STATIC_LIBS

View File

@ -53,9 +53,9 @@ DBTestBase::DBTestBase(const std::string path)
#ifndef ROCKSDB_LITE
const char* test_env_uri = getenv("TEST_ENV_URI");
if (test_env_uri) {
Status s = ObjectRegistry::NewInstance()->NewSharedObject(test_env_uri,
&env_guard_);
base_env = env_guard_.get();
Env* test_env = nullptr;
Status s = Env::LoadEnv(test_env_uri, &test_env, &env_guard_);
base_env = test_env;
EXPECT_OK(s);
EXPECT_NE(Env::Default(), base_env);
}

View File

@ -39,6 +39,97 @@ TEST_P(DBWriteTest, SyncAndDisableWAL) {
ASSERT_TRUE(dbfull()->Write(write_options, &batch).IsInvalidArgument());
}
TEST_P(DBWriteTest, WriteThreadHangOnWriteStall) {
Options options = GetOptions();
options.level0_stop_writes_trigger = options.level0_slowdown_writes_trigger = 4;
std::vector<port::Thread> threads;
std::atomic<int> thread_num(0);
port::Mutex mutex;
port::CondVar cv(&mutex);
Reopen(options);
std::function<void()> write_slowdown_func = [&]() {
int a = thread_num.fetch_add(1);
std::string key = "foo" + std::to_string(a);
WriteOptions wo;
wo.no_slowdown = false;
dbfull()->Put(wo, key, "bar");
};
std::function<void()> write_no_slowdown_func = [&]() {
int a = thread_num.fetch_add(1);
std::string key = "foo" + std::to_string(a);
WriteOptions wo;
wo.no_slowdown = true;
dbfull()->Put(wo, key, "bar");
};
std::function<void(void *)> unblock_main_thread_func = [&](void *) {
mutex.Lock();
cv.SignalAll();
mutex.Unlock();
};
// Create 3 L0 files and schedule 4th without waiting
Put("foo" + std::to_string(thread_num.fetch_add(1)), "bar");
Flush();
Put("foo" + std::to_string(thread_num.fetch_add(1)), "bar");
Flush();
Put("foo" + std::to_string(thread_num.fetch_add(1)), "bar");
Flush();
Put("foo" + std::to_string(thread_num.fetch_add(1)), "bar");
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"WriteThread::JoinBatchGroup:Start", unblock_main_thread_func);
rocksdb::SyncPoint::GetInstance()->LoadDependency(
{{"DBWriteTest::WriteThreadHangOnWriteStall:1",
"DBImpl::BackgroundCallFlush:start"},
{"DBWriteTest::WriteThreadHangOnWriteStall:2",
"DBImpl::WriteImpl:BeforeLeaderEnters"},
// Make compaction start wait for the write stall to be detected and
// implemented by a write group leader
{"DBWriteTest::WriteThreadHangOnWriteStall:3",
"BackgroundCallCompaction:0"}});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
// Schedule creation of 4th L0 file without waiting. This will seal the
// memtable and then wait for a sync point before writing the file. We need
// to do it this way because SwitchMemtable() needs to enter the
// write_thread
FlushOptions fopt;
fopt.wait = false;
dbfull()->Flush(fopt);
// Create a mix of slowdown/no_slowdown write threads
mutex.Lock();
// First leader
threads.emplace_back(write_slowdown_func);
cv.Wait();
// Second leader. Will stall writes
threads.emplace_back(write_slowdown_func);
cv.Wait();
threads.emplace_back(write_no_slowdown_func);
cv.Wait();
threads.emplace_back(write_slowdown_func);
cv.Wait();
threads.emplace_back(write_no_slowdown_func);
cv.Wait();
threads.emplace_back(write_slowdown_func);
cv.Wait();
mutex.Unlock();
TEST_SYNC_POINT("DBWriteTest::WriteThreadHangOnWriteStall:1");
dbfull()->TEST_WaitForFlushMemTable(nullptr);
// This would have triggered a write stall. Unblock the write group leader
TEST_SYNC_POINT("DBWriteTest::WriteThreadHangOnWriteStall:2");
// The leader is going to create missing newer links. When the leader finishes,
// the next leader is going to delay writes and fail writers with no_slowdown
TEST_SYNC_POINT("DBWriteTest::WriteThreadHangOnWriteStall:3");
for (auto& t : threads) {
t.join();
}
}
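The test above mixes writers that tolerate stalls with writers that opt out via WriteOptions::no_slowdown; when a stall is in effect, the opted-out writers fail fast instead of blocking. A rough caller-side sketch follows; the helper name is mine, and the exact failure status (Incomplete here) should be verified against the write path in this branch.

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Sketch: a write that refuses to wait out a write stall. With
// no_slowdown = true the write returns an error instead of blocking
// when it would otherwise be delayed or stopped.
rocksdb::Status PutWithoutStalling(rocksdb::DB* db, const rocksdb::Slice& key,
                                   const rocksdb::Slice& value) {
  rocksdb::WriteOptions wo;
  wo.no_slowdown = true;
  rocksdb::Status s = db->Put(wo, key, value);
  if (s.IsIncomplete()) {
    // The DB is currently stalling writes; the caller can retry later or
    // fall back to a blocking write with no_slowdown = false.
  }
  return s;
}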
TEST_P(DBWriteTest, IOErrorOnWALWritePropagateToWriteThreadFollower) {
constexpr int kNumThreads = 5;
std::unique_ptr<FaultInjectionTestEnv> mock_env(

View File

@ -166,12 +166,6 @@ Status ErrorHandler::SetBGError(const Status& bg_err, BackgroundErrorReason reas
return Status::OK();
}
// Check if recovery is currently in progress. If it is, we will save this
// error so we can check it at the end to see if recovery succeeded or not
if (recovery_in_prog_ && recovery_error_.ok()) {
recovery_error_ = bg_err;
}
bool paranoid = db_options_.paranoid_checks;
Status::Severity sev = Status::Severity::kFatalError;
Status new_bg_err;
@ -204,10 +198,15 @@ Status ErrorHandler::SetBGError(const Status& bg_err, BackgroundErrorReason reas
new_bg_err = Status(bg_err, sev);
// Check if recovery is currently in progress. If it is, we will save this
// error so we can check it at the end to see if recovery succeeded or not
if (recovery_in_prog_ && recovery_error_.ok()) {
recovery_error_ = new_bg_err;
}
bool auto_recovery = auto_recovery_;
if (new_bg_err.severity() >= Status::Severity::kFatalError && auto_recovery) {
auto_recovery = false;
;
}
// Allow some error specific overrides

View File

@ -22,6 +22,21 @@ namespace rocksdb {
class DBErrorHandlingTest : public DBTestBase {
public:
DBErrorHandlingTest() : DBTestBase("/db_error_handling_test") {}
std::string GetManifestNameFromLiveFiles() {
std::vector<std::string> live_files;
uint64_t manifest_size;
dbfull()->GetLiveFiles(live_files, &manifest_size, false);
for (auto& file : live_files) {
uint64_t num = 0;
FileType type;
if (ParseFileName(file, &num, &type) && type == kDescriptorFile) {
return file;
}
}
return "";
}
};
class DBErrorHandlingEnv : public EnvWrapper {
@ -161,6 +176,169 @@ TEST_F(DBErrorHandlingTest, FLushWriteError) {
Destroy(options);
}
TEST_F(DBErrorHandlingTest, ManifestWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
Options options = GetDefaultOptions();
options.create_if_missing = true;
options.env = fault_env.get();
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
Put(Key(0), "val");
Flush();
Put(Key(1), "val");
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_env->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_EQ(s, Status::OK());
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
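ManifestWriteError forces a MANIFEST append to fail with NoSpace, which leaves the DB in a hard background error state; once the underlying condition clears, Resume() retries recovery and, with the fix in this branch, rolls over to a fresh manifest file. An application-side sketch of that flow, with a hypothetical helper name and a single retry, is shown below.

#include "rocksdb/db.h"

// Sketch: after a background error (e.g. ENOSPC while appending to the
// MANIFEST), writes fail until the DB recovers. Once the IO problem is fixed
// out of band, Resume() asks the DB to clear the background error.
rocksdb::Status WriteWithRecovery(rocksdb::DB* db, const rocksdb::Slice& key,
                                  const rocksdb::Slice& value) {
  rocksdb::Status s = db->Put(rocksdb::WriteOptions(), key, value);
  if (!s.ok()) {
    // Free disk space / repair the filesystem first, then:
    rocksdb::Status r = db->Resume();
    if (r.ok()) {
      s = db->Put(rocksdb::WriteOptions(), key, value);  // retry once
    }
  }
  return s;
}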
TEST_F(DBErrorHandlingTest, DoubleManifestWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
Options options = GetDefaultOptions();
options.create_if_missing = true;
options.env = fault_env.get();
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
Put(Key(0), "val");
Flush();
Put(Key(1), "val");
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
fault_env->SetFilesystemActive(true);
// This Resume() will attempt to create a new manifest file and fail again
s = dbfull()->Resume();
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
fault_env->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
// A successful Resume() will create a new manifest file
s = dbfull()->Resume();
ASSERT_EQ(s, Status::OK());
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingTest, CompactionManifestWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
Options options = GetDefaultOptions();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.env = fault_env.get();
Status s;
std::string old_manifest;
std::string new_manifest;
std::atomic<bool> fail_manifest(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
Put(Key(0), "val");
Put(Key(2), "val");
s = Flush();
ASSERT_EQ(s, Status::OK());
rocksdb::SyncPoint::GetInstance()->LoadDependency(
// Wait for flush of 2nd L0 file before starting compaction
{{"FlushMemTableFinished",
"BackgroundCallCompaction:0"},
// Wait for compaction to detect manifest write error
{"BackgroundCallCompaction:1",
"CompactionManifestWriteError:0"},
// Make compaction thread wait for error to be cleared
{"CompactionManifestWriteError:1",
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
// Wait for DB instance to clear bg_error before calling
// TEST_WaitForCompact
{"SstFileManagerImpl::ClearError",
"CompactionManifestWriteError:2"}});
// trigger manifest write failure in compaction thread
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void *) {
fail_manifest.store(true);
});
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
if (fail_manifest.load()) {
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
}
});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
Put(Key(1), "val");
// This Flush will trigger a compaction, which will fail when appending to
// the manifest
s = Flush();
ASSERT_EQ(s, Status::OK());
TEST_SYNC_POINT("CompactionManifestWriteError:0");
// Clear all errors so when the compaction is retried, it will succeed
fault_env->SetFilesystemActive(true);
rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("CompactionManifestWriteError:1");
TEST_SYNC_POINT("CompactionManifestWriteError:2");
s = dbfull()->TEST_WaitForCompact();
rocksdb::SyncPoint::GetInstance()->DisableProcessing();
ASSERT_EQ(s, Status::OK());
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
ASSERT_EQ("val", Get(Key(2)));
Close();
}
TEST_F(DBErrorHandlingTest, CompactionWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));

View File

@ -246,16 +246,19 @@ Status ExternalSstFileIngestionJob::Run() {
// We use the import time as the ancester time. This is the time the data
// is written to the database.
uint64_t oldest_ancester_time = 0;
int64_t temp_current_time = 0;
uint64_t current_time = kUnknownFileCreationTime;
uint64_t oldest_ancester_time = kUnknownOldestAncesterTime;
if (env_->GetCurrentTime(&temp_current_time).ok()) {
oldest_ancester_time = static_cast<uint64_t>(temp_current_time);
current_time = oldest_ancester_time =
static_cast<uint64_t>(temp_current_time);
}
edit_.AddFile(f.picked_level, f.fd.GetNumber(), f.fd.GetPathId(),
f.fd.GetFileSize(), f.smallest_internal_key,
f.largest_internal_key, f.assigned_seqno, f.assigned_seqno,
false, kInvalidBlobFileNumber, oldest_ancester_time);
false, kInvalidBlobFileNumber, oldest_ancester_time,
current_time);
}
return status;
}

View File

@ -368,6 +368,7 @@ Status FlushJob::WriteLevel0Table() {
// It's not clear whether oldest_key_time is always available. In case
// it is not available, use current_time.
meta_.oldest_ancester_time = std::min(current_time, oldest_key_time);
meta_.file_creation_time = current_time;
s = BuildTable(
dbname_, db_options_.env, *cfd_->ioptions(), mutable_cf_options_,
@ -413,7 +414,7 @@ Status FlushJob::WriteLevel0Table() {
meta_.fd.GetFileSize(), meta_.smallest, meta_.largest,
meta_.fd.smallest_seqno, meta_.fd.largest_seqno,
meta_.marked_for_compaction, meta_.oldest_blob_file_number,
meta_.oldest_ancester_time);
meta_.oldest_ancester_time, meta_.file_creation_time);
}
#ifndef ROCKSDB_LITE
// Piggyback FlushJobInfo on the first first flushed memtable.

View File

@ -239,9 +239,13 @@ void ForwardIterator::SVCleanup(DBImpl* db, SuperVersion* sv,
db->FindObsoleteFiles(&job_context, false, true);
if (background_purge_on_iterator_cleanup) {
db->ScheduleBgLogWriterClose(&job_context);
db->AddSuperVersionsToFreeQueue(sv);
db->SchedulePurge();
}
db->mutex_.Unlock();
delete sv;
if (!background_purge_on_iterator_cleanup) {
delete sv;
}
if (job_context.HaveSomethingToDelete()) {
db->PurgeObsoleteFiles(job_context, background_purge_on_iterator_cleanup);
}
@ -614,7 +618,7 @@ void ForwardIterator::RebuildIterators(bool refresh_sv) {
Cleanup(refresh_sv);
if (refresh_sv) {
// New
sv_ = cfd_->GetReferencedSuperVersion(&(db_->mutex_));
sv_ = cfd_->GetReferencedSuperVersion(db_);
}
ReadRangeDelAggregator range_del_agg(&cfd_->internal_comparator(),
kMaxSequenceNumber /* upper_bound */);
@ -668,7 +672,7 @@ void ForwardIterator::RebuildIterators(bool refresh_sv) {
void ForwardIterator::RenewIterators() {
SuperVersion* svnew;
assert(sv_);
svnew = cfd_->GetReferencedSuperVersion(&(db_->mutex_));
svnew = cfd_->GetReferencedSuperVersion(db_);
if (mutable_iter_ != nullptr) {
DeleteIterator(mutable_iter_, true /* is_arena */);

View File

@ -135,10 +135,12 @@ Status ImportColumnFamilyJob::Run() {
// We use the import time as the ancester time. This is the time the data
// is written to the database.
uint64_t oldest_ancester_time = 0;
int64_t temp_current_time = 0;
uint64_t oldest_ancester_time = kUnknownOldestAncesterTime;
uint64_t current_time = kUnknownOldestAncesterTime;
if (env_->GetCurrentTime(&temp_current_time).ok()) {
oldest_ancester_time = static_cast<uint64_t>(temp_current_time);
current_time = oldest_ancester_time =
static_cast<uint64_t>(temp_current_time);
}
for (size_t i = 0; i < files_to_import_.size(); ++i) {
@ -149,7 +151,7 @@ Status ImportColumnFamilyJob::Run() {
f.fd.GetFileSize(), f.smallest_internal_key,
f.largest_internal_key, file_metadata.smallest_seqno,
file_metadata.largest_seqno, false, kInvalidBlobFileNumber,
oldest_ancester_time);
oldest_ancester_time, current_time);
// If incoming sequence number is higher, update local sequence number.
if (file_metadata.largest_seqno > versions_->LastSequence()) {

View File

@ -138,7 +138,7 @@ class MemTable {
// As a cheap version of `ApproximateMemoryUsage()`, this function doesn't
// require external synchronization. The value may be less accurate though
size_t ApproximateMemoryUsageFast() {
size_t ApproximateMemoryUsageFast() const {
return approximate_memory_usage_.load(std::memory_order_relaxed);
}

View File

@ -186,11 +186,14 @@ Status MemTableListVersion::AddRangeTombstoneIterators(
const ReadOptions& read_opts, Arena* /*arena*/,
RangeDelAggregator* range_del_agg) {
assert(range_del_agg != nullptr);
// Except for snapshot read, using kMaxSequenceNumber is OK because these
// are immutable memtables.
SequenceNumber read_seq = read_opts.snapshot != nullptr
? read_opts.snapshot->GetSequenceNumber()
: kMaxSequenceNumber;
for (auto& m : memlist_) {
// Using kMaxSequenceNumber is OK because these are immutable memtables.
std::unique_ptr<FragmentedRangeTombstoneIterator> range_del_iter(
m->NewRangeTombstoneIterator(read_opts,
kMaxSequenceNumber /* read_seq */));
m->NewRangeTombstoneIterator(read_opts, read_seq));
range_del_agg->AddTombstones(std::move(range_del_iter));
}
return Status::OK();
@ -277,7 +280,7 @@ void MemTableListVersion::Remove(MemTable* m,
}
// return the total memory usage assuming the oldest flushed memtable is dropped
size_t MemTableListVersion::ApproximateMemoryUsageExcludingLast() {
size_t MemTableListVersion::ApproximateMemoryUsageExcludingLast() const {
size_t total_memtable_size = 0;
for (auto& memtable : memlist_) {
total_memtable_size += memtable->ApproximateMemoryUsage();
@ -500,7 +503,7 @@ Status MemTableList::TryInstallMemtableFlushResults(
cfd->GetName().c_str(), m->file_number_, mem_id);
assert(m->file_number_ > 0);
current_->Remove(m, to_delete);
UpdateMemoryUsageExcludingLast();
UpdateCachedValuesFromMemTableListVersion();
ResetTrimHistoryNeeded();
++mem_id;
}
@ -541,14 +544,14 @@ void MemTableList::Add(MemTable* m, autovector<MemTable*>* to_delete) {
if (num_flush_not_started_ == 1) {
imm_flush_needed.store(true, std::memory_order_release);
}
UpdateMemoryUsageExcludingLast();
UpdateCachedValuesFromMemTableListVersion();
ResetTrimHistoryNeeded();
}
void MemTableList::TrimHistory(autovector<MemTable*>* to_delete, size_t usage) {
InstallNewVersion();
current_->TrimHistory(to_delete, usage);
UpdateMemoryUsageExcludingLast();
UpdateCachedValuesFromMemTableListVersion();
ResetTrimHistoryNeeded();
}
@ -563,18 +566,25 @@ size_t MemTableList::ApproximateUnflushedMemTablesMemoryUsage() {
size_t MemTableList::ApproximateMemoryUsage() { return current_memory_usage_; }
size_t MemTableList::ApproximateMemoryUsageExcludingLast() {
size_t usage =
size_t MemTableList::ApproximateMemoryUsageExcludingLast() const {
const size_t usage =
current_memory_usage_excluding_last_.load(std::memory_order_relaxed);
return usage;
}
// Update current_memory_usage_excluding_last_, need to call whenever state
// changes for MemtableListVersion (whenever InstallNewVersion() is called)
void MemTableList::UpdateMemoryUsageExcludingLast() {
size_t total_memtable_size = current_->ApproximateMemoryUsageExcludingLast();
bool MemTableList::HasHistory() const {
const bool has_history = current_has_history_.load(std::memory_order_relaxed);
return has_history;
}
void MemTableList::UpdateCachedValuesFromMemTableListVersion() {
const size_t total_memtable_size =
current_->ApproximateMemoryUsageExcludingLast();
current_memory_usage_excluding_last_.store(total_memtable_size,
std::memory_order_relaxed);
const bool has_history = current_->HasHistory();
current_has_history_.store(has_history, std::memory_order_relaxed);
}
uint64_t MemTableList::ApproximateOldestKeyTime() const {
@ -704,7 +714,7 @@ Status InstallMemtableAtomicFlushResults(
cfds[i]->GetName().c_str(), m->GetFileNumber(),
mem_id);
imm->current_->Remove(m, to_delete);
imm->UpdateMemoryUsageExcludingLast();
imm->UpdateCachedValuesFromMemTableListVersion();
imm->ResetTrimHistoryNeeded();
}
}
@ -754,7 +764,7 @@ void MemTableList::RemoveOldMemTables(uint64_t log_number,
}
}
UpdateMemoryUsageExcludingLast();
UpdateCachedValuesFromMemTableListVersion();
ResetTrimHistoryNeeded();
}
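UpdateCachedValuesFromMemTableListVersion() recomputes the derived values under the DB mutex whenever the list changes and publishes them into atomics, so readers such as ApproximateMemoryUsageExcludingLast() and HasHistory() never need the mutex. A self-contained sketch of that "recompute under lock, read lock-free" pattern, using a made-up SizeTracker class rather than RocksDB types, is below.

#include <atomic>
#include <cstddef>
#include <list>
#include <mutex>

// Writers mutate the list and refresh the cached atomic while holding the
// mutex; readers only load the atomic.
class SizeTracker {
 public:
  void Add(size_t item_size) {
    std::lock_guard<std::mutex> lock(mu_);
    sizes_.push_back(item_size);
    RefreshCache();  // must run on every mutation, like InstallNewVersion()
  }
  // Cheap, lock-free read of the last published total.
  size_t ApproximateTotal() const {
    return cached_total_.load(std::memory_order_relaxed);
  }

 private:
  void RefreshCache() {
    size_t total = 0;
    for (size_t s : sizes_) {
      total += s;
    }
    cached_total_.store(total, std::memory_order_relaxed);
  }
  mutable std::mutex mu_;
  std::list<size_t> sizes_;
  std::atomic<size_t> cached_total_{0};
};

int main() {
  SizeTracker tracker;
  tracker.Add(128);
  tracker.Add(256);
  return tracker.ApproximateTotal() == 384 ? 0 : 1;
}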

View File

@ -162,7 +162,11 @@ class MemTableListVersion {
// excluding the last MemTable in memlist_history_. The reason for excluding
// the last MemTable is to see if dropping the last MemTable will keep total
// memory usage above or equal to max_write_buffer_size_to_maintain_
size_t ApproximateMemoryUsageExcludingLast();
size_t ApproximateMemoryUsageExcludingLast() const;
// Whether this version contains flushed memtables that are only kept around
// for transaction conflict checking.
bool HasHistory() const { return !memlist_history_.empty(); }
bool MemtableLimitExceeded(size_t usage);
@ -211,7 +215,8 @@ class MemTableList {
commit_in_progress_(false),
flush_requested_(false),
current_memory_usage_(0),
current_memory_usage_excluding_last_(0) {
current_memory_usage_excluding_last_(0),
current_has_history_(false) {
current_->Ref();
}
@ -266,11 +271,16 @@ class MemTableList {
// Returns an estimate of the number of bytes of data in use.
size_t ApproximateMemoryUsage();
// Returns the cached current_memory_usage_excluding_last_ value
size_t ApproximateMemoryUsageExcludingLast();
// Returns the cached current_memory_usage_excluding_last_ value.
size_t ApproximateMemoryUsageExcludingLast() const;
// Update current_memory_usage_excluding_last_ from MemtableListVersion
void UpdateMemoryUsageExcludingLast();
// Returns the cached current_has_history_ value.
bool HasHistory() const;
// Updates current_memory_usage_excluding_last_ and current_has_history_
// from MemTableListVersion. Must be called whenever InstallNewVersion is
// called.
void UpdateCachedValuesFromMemTableListVersion();
// `usage` is the current size of the mutable Memtable. When
// max_write_buffer_size_to_maintain is used, total size of mutable and
@ -388,7 +398,11 @@ class MemTableList {
// The current memory usage.
size_t current_memory_usage_;
// Cached value of current_->ApproximateMemoryUsageExcludingLast().
std::atomic<size_t> current_memory_usage_excluding_last_;
// Cached value of current_->HasHistory().
std::atomic<bool> current_has_history_;
};
// Installs memtable atomic flush results.

View File

@ -577,13 +577,13 @@ class Repairer {
// TODO(opt): separate out into multiple levels
for (const auto* table : cf_id_and_tables.second) {
edit.AddFile(0, table->meta.fd.GetNumber(), table->meta.fd.GetPathId(),
table->meta.fd.GetFileSize(), table->meta.smallest,
table->meta.largest, table->meta.fd.smallest_seqno,
table->meta.fd.largest_seqno,
table->meta.marked_for_compaction,
table->meta.oldest_blob_file_number,
table->meta.oldest_ancester_time);
edit.AddFile(
0, table->meta.fd.GetNumber(), table->meta.fd.GetPathId(),
table->meta.fd.GetFileSize(), table->meta.smallest,
table->meta.largest, table->meta.fd.smallest_seqno,
table->meta.fd.largest_seqno, table->meta.marked_for_compaction,
table->meta.oldest_blob_file_number,
table->meta.oldest_ancester_time, table->meta.file_creation_time);
}
assert(next_file_number_ > 0);
vset_.MarkFileNumberUsed(next_file_number_ - 1);

View File

@ -63,7 +63,7 @@ class VersionBuilderTest : public testing::Test {
file_number, path_id, file_size, GetInternalKey(smallest, smallest_seq),
GetInternalKey(largest, largest_seq), smallest_seqno, largest_seqno,
/* marked_for_compact */ false, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
f->compensated_file_size = file_size;
f->num_entries = num_entries;
f->num_deletions = num_deletions;
@ -114,7 +114,8 @@ TEST_F(VersionBuilderTest, ApplyAndSaveTo) {
VersionEdit version_edit;
version_edit.AddFile(2, 666, 0, 100U, GetInternalKey("301"),
GetInternalKey("350"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.DeleteFile(3, 27U);
EnvOptions env_options;
@ -149,7 +150,8 @@ TEST_F(VersionBuilderTest, ApplyAndSaveToDynamic) {
VersionEdit version_edit;
version_edit.AddFile(3, 666, 0, 100U, GetInternalKey("301"),
GetInternalKey("350"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.DeleteFile(0, 1U);
version_edit.DeleteFile(0, 88U);
@ -187,7 +189,8 @@ TEST_F(VersionBuilderTest, ApplyAndSaveToDynamic2) {
VersionEdit version_edit;
version_edit.AddFile(4, 666, 0, 100U, GetInternalKey("301"),
GetInternalKey("350"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.DeleteFile(0, 1U);
version_edit.DeleteFile(0, 88U);
version_edit.DeleteFile(4, 6U);
@ -216,19 +219,24 @@ TEST_F(VersionBuilderTest, ApplyMultipleAndSaveTo) {
VersionEdit version_edit;
version_edit.AddFile(2, 666, 0, 100U, GetInternalKey("301"),
GetInternalKey("350"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 676, 0, 100U, GetInternalKey("401"),
GetInternalKey("450"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 636, 0, 100U, GetInternalKey("601"),
GetInternalKey("650"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 616, 0, 100U, GetInternalKey("501"),
GetInternalKey("550"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 606, 0, 100U, GetInternalKey("701"),
GetInternalKey("750"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
EnvOptions env_options;
@ -255,30 +263,37 @@ TEST_F(VersionBuilderTest, ApplyDeleteAndSaveTo) {
VersionEdit version_edit;
version_edit.AddFile(2, 666, 0, 100U, GetInternalKey("301"),
GetInternalKey("350"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 676, 0, 100U, GetInternalKey("401"),
GetInternalKey("450"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 636, 0, 100U, GetInternalKey("601"),
GetInternalKey("650"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 616, 0, 100U, GetInternalKey("501"),
GetInternalKey("550"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit.AddFile(2, 606, 0, 100U, GetInternalKey("701"),
GetInternalKey("750"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_builder.Apply(&version_edit);
VersionEdit version_edit2;
version_edit.AddFile(2, 808, 0, 100U, GetInternalKey("901"),
GetInternalKey("950"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_edit2.DeleteFile(2, 616);
version_edit2.DeleteFile(2, 636);
version_edit.AddFile(2, 806, 0, 100U, GetInternalKey("801"),
GetInternalKey("850"), 200, 200, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
version_builder.Apply(&version_edit2);
version_builder.SaveTo(&new_vstorage);

View File

@ -62,6 +62,7 @@ enum CustomTag : uint32_t {
kMinLogNumberToKeepHack = 3,
kOldestBlobFileNumber = 4,
kOldestAncesterTime = 5,
kFileCreationTime = 6,
kPathId = 65,
};
// If this bit for the custom tag is set, opening DB should fail if
@ -217,6 +218,14 @@ bool VersionEdit::EncodeTo(std::string* dst) const {
TEST_SYNC_POINT_CALLBACK("VersionEdit::EncodeTo:VarintOldestAncesterTime",
&varint_oldest_ancester_time);
PutLengthPrefixedSlice(dst, Slice(varint_oldest_ancester_time));
PutVarint32(dst, CustomTag::kFileCreationTime);
std::string varint_file_creation_time;
PutVarint64(&varint_file_creation_time, f.file_creation_time);
TEST_SYNC_POINT_CALLBACK("VersionEdit::EncodeTo:VarintFileCreationTime",
&varint_file_creation_time);
PutLengthPrefixedSlice(dst, Slice(varint_file_creation_time));
if (f.fd.GetPathId() != 0) {
PutVarint32(dst, CustomTag::kPathId);
char p = static_cast<char>(f.fd.GetPathId());
@ -335,6 +344,11 @@ const char* VersionEdit::DecodeNewFile4From(Slice* input) {
return "invalid oldest ancester time";
}
break;
case kFileCreationTime:
if (!GetVarint64(&field, &f.file_creation_time)) {
return "invalid file creation time";
}
break;
case kNeedCompaction:
if (field.size() != 1) {
return "need_compaction field wrong size";
@ -660,6 +674,8 @@ std::string VersionEdit::DebugString(bool hex_key) const {
}
r.append(" oldest_ancester_time:");
AppendNumberTo(&r, f.oldest_ancester_time);
r.append(" file_creation_time:");
AppendNumberTo(&r, f.file_creation_time);
}
r.append("\n ColumnFamily: ");
AppendNumberTo(&r, column_family_);
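The new kFileCreationTime field is written the same way as the other custom tags: a varint32 tag id followed by a length-prefixed payload whose bytes are a varint64 timestamp, so older readers that do not recognize the tag can skip it using only the length. A self-contained sketch of that framing in plain C++ (reimplementing the varint helpers rather than using RocksDB's coding utilities) is below; the tag value 6 mirrors CustomTag::kFileCreationTime above.

#include <cstdint>
#include <string>

// Append a varint-encoded unsigned integer (LEB128-style, 7 bits per byte).
static void PutVarint(std::string* dst, uint64_t v) {
  while (v >= 0x80) {
    dst->push_back(static_cast<char>((v & 0x7f) | 0x80));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Encode one optional field the way new_files_4 custom tags are framed:
// varint tag id, then a length-prefixed payload. Unknown non-mandatory tags
// can be skipped by readers using just the length prefix.
static void PutCustomTag(std::string* dst, uint32_t tag, uint64_t value) {
  PutVarint(dst, tag);
  std::string payload;
  PutVarint(&payload, value);      // payload itself is a varint64
  PutVarint(dst, payload.size());  // length prefix
  dst->append(payload);            // payload bytes
}

int main() {
  std::string record;
  const uint32_t kFileCreationTimeTag = 6;  // mirrors CustomTag::kFileCreationTime
  PutCustomTag(&record, kFileCreationTimeTag, 1580515200u /* some unix time */);
  return record.empty() ? 1 : 0;
}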

View File

@ -26,6 +26,7 @@ class VersionSet;
constexpr uint64_t kFileNumberMask = 0x3FFFFFFFFFFFFFFF;
constexpr uint64_t kInvalidBlobFileNumber = 0;
constexpr uint64_t kUnknownOldestAncesterTime = 0;
constexpr uint64_t kUnknownFileCreationTime = 0;
extern uint64_t PackFileNumberAndPathId(uint64_t number, uint64_t path_id);
@ -128,7 +129,10 @@ struct FileMetaData {
// in turn be outputs for compact older SST files. We track the memtable
// flush timestamp for the oldest SST file that eventually contributes data
// to this file. 0 means the information is not available.
uint64_t oldest_ancester_time = 0;
uint64_t oldest_ancester_time = kUnknownOldestAncesterTime;
// Unix time when the SST file is created.
uint64_t file_creation_time = kUnknownFileCreationTime;
FileMetaData() = default;
@ -136,13 +140,17 @@ struct FileMetaData {
const InternalKey& smallest_key, const InternalKey& largest_key,
const SequenceNumber& smallest_seq,
const SequenceNumber& largest_seq, bool marked_for_compact,
uint64_t oldest_blob_file, uint64_t _oldest_ancester_time)
uint64_t oldest_blob_file, uint64_t _oldest_ancester_time,
uint64_t _file_creation_time)
: fd(file, file_path_id, file_size, smallest_seq, largest_seq),
smallest(smallest_key),
largest(largest_key),
marked_for_compaction(marked_for_compact),
oldest_blob_file_number(oldest_blob_file),
oldest_ancester_time(_oldest_ancester_time) {}
oldest_ancester_time(_oldest_ancester_time),
file_creation_time(_file_creation_time) {
TEST_SYNC_POINT_CALLBACK("FileMetaData::FileMetaData", this);
}
// REQUIRED: Keys must be given to the function in sorted order (it expects
// the last key to be the largest).
@ -168,13 +176,23 @@ struct FileMetaData {
// if table reader is already pinned.
// 0 means the information is not available.
uint64_t TryGetOldestAncesterTime() {
if (oldest_ancester_time != 0) {
if (oldest_ancester_time != kUnknownOldestAncesterTime) {
return oldest_ancester_time;
} else if (fd.table_reader != nullptr &&
fd.table_reader->GetTableProperties() != nullptr) {
return fd.table_reader->GetTableProperties()->creation_time;
}
return 0;
return kUnknownOldestAncesterTime;
}
uint64_t TryGetFileCreationTime() {
if (file_creation_time != kUnknownFileCreationTime) {
return file_creation_time;
} else if (fd.table_reader != nullptr &&
fd.table_reader->GetTableProperties() != nullptr) {
return fd.table_reader->GetTableProperties()->file_creation_time;
}
return kUnknownFileCreationTime;
}
};
@ -277,14 +295,14 @@ class VersionEdit {
uint64_t file_size, const InternalKey& smallest,
const InternalKey& largest, const SequenceNumber& smallest_seqno,
const SequenceNumber& largest_seqno, bool marked_for_compaction,
uint64_t oldest_blob_file_number,
uint64_t oldest_ancester_time) {
uint64_t oldest_blob_file_number, uint64_t oldest_ancester_time,
uint64_t file_creation_time) {
assert(smallest_seqno <= largest_seqno);
new_files_.emplace_back(
level,
FileMetaData(file, file_path_id, file_size, smallest, largest,
smallest_seqno, largest_seqno, marked_for_compaction,
oldest_blob_file_number, oldest_ancester_time));
level, FileMetaData(file, file_path_id, file_size, smallest, largest,
smallest_seqno, largest_seqno,
marked_for_compaction, oldest_blob_file_number,
oldest_ancester_time, file_creation_time));
}
void AddFile(int level, const FileMetaData& f) {


@ -37,7 +37,7 @@ TEST_F(VersionEditTest, EncodeDecode) {
InternalKey("foo", kBig + 500 + i, kTypeValue),
InternalKey("zoo", kBig + 600 + i, kTypeDeletion),
kBig + 500 + i, kBig + 600 + i, false, kInvalidBlobFileNumber,
888);
888, 678);
edit.DeleteFile(4, kBig + 700 + i);
}
@ -55,17 +55,19 @@ TEST_F(VersionEditTest, EncodeDecodeNewFile4) {
edit.AddFile(3, 300, 3, 100, InternalKey("foo", kBig + 500, kTypeValue),
InternalKey("zoo", kBig + 600, kTypeDeletion), kBig + 500,
kBig + 600, true, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
edit.AddFile(4, 301, 3, 100, InternalKey("foo", kBig + 501, kTypeValue),
InternalKey("zoo", kBig + 601, kTypeDeletion), kBig + 501,
kBig + 601, false, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
edit.AddFile(5, 302, 0, 100, InternalKey("foo", kBig + 502, kTypeValue),
InternalKey("zoo", kBig + 602, kTypeDeletion), kBig + 502,
kBig + 602, true, kInvalidBlobFileNumber, 666);
kBig + 602, true, kInvalidBlobFileNumber, 666, 888);
edit.AddFile(5, 303, 0, 100, InternalKey("foo", kBig + 503, kTypeBlobIndex),
InternalKey("zoo", kBig + 603, kTypeBlobIndex), kBig + 503,
kBig + 603, true, 1001, kUnknownOldestAncesterTime);
kBig + 603, true, 1001, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
;
edit.DeleteFile(4, 700);
@ -104,10 +106,10 @@ TEST_F(VersionEditTest, ForwardCompatibleNewFile4) {
edit.AddFile(3, 300, 3, 100, InternalKey("foo", kBig + 500, kTypeValue),
InternalKey("zoo", kBig + 600, kTypeDeletion), kBig + 500,
kBig + 600, true, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
edit.AddFile(4, 301, 3, 100, InternalKey("foo", kBig + 501, kTypeValue),
InternalKey("zoo", kBig + 601, kTypeDeletion), kBig + 501,
kBig + 601, false, kInvalidBlobFileNumber, 686);
kBig + 601, false, kInvalidBlobFileNumber, 686, 868);
edit.DeleteFile(4, 700);
edit.SetComparatorName("foo");
@ -154,7 +156,7 @@ TEST_F(VersionEditTest, NewFile4NotSupportedField) {
edit.AddFile(3, 300, 3, 100, InternalKey("foo", kBig + 500, kTypeValue),
InternalKey("zoo", kBig + 600, kTypeDeletion), kBig + 500,
kBig + 600, true, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
edit.SetComparatorName("foo");
edit.SetLogNumber(kBig + 100);
@ -182,7 +184,8 @@ TEST_F(VersionEditTest, NewFile4NotSupportedField) {
TEST_F(VersionEditTest, EncodeEmptyFile) {
VersionEdit edit;
edit.AddFile(0, 0, 0, 0, InternalKey(), InternalKey(), 0, 0, false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
std::string buffer;
ASSERT_TRUE(!edit.EncodeTo(&buffer));
}


@ -1459,7 +1459,8 @@ void Version::GetColumnFamilyMetaData(ColumnFamilyMetaData* cf_meta) {
file->fd.largest_seqno, file->smallest.user_key().ToString(),
file->largest.user_key().ToString(),
file->stats.num_reads_sampled.load(std::memory_order_relaxed),
file->being_compacted, file->oldest_blob_file_number});
file->being_compacted, file->oldest_blob_file_number,
file->TryGetOldestAncesterTime(), file->TryGetFileCreationTime()});
files.back().num_entries = file->num_entries;
files.back().num_deletions = file->num_deletions;
level_size += file->fd.GetFileSize();
@ -1485,10 +1486,9 @@ void Version::GetCreationTimeOfOldestFile(uint64_t* creation_time) {
for (int level = 0; level < storage_info_.num_non_empty_levels_; level++) {
for (FileMetaData* meta : storage_info_.LevelFiles(level)) {
assert(meta->fd.table_reader != nullptr);
uint64_t file_creation_time =
meta->fd.table_reader->GetTableProperties()->file_creation_time;
if (file_creation_time == 0) {
*creation_time = file_creation_time;
uint64_t file_creation_time = meta->TryGetFileCreationTime();
if (file_creation_time == kUnknownFileCreationTime) {
*creation_time = 0;
return;
}
if (file_creation_time < oldest_time) {
@ -2501,8 +2501,7 @@ void VersionStorageInfo::ComputeExpiredTtlFiles(
void VersionStorageInfo::ComputeFilesMarkedForPeriodicCompaction(
const ImmutableCFOptions& ioptions,
const uint64_t periodic_compaction_seconds) {
assert(periodic_compaction_seconds > 0 &&
periodic_compaction_seconds < port::kMaxUint64);
assert(periodic_compaction_seconds > 0);
files_marked_for_periodic_compaction_.clear();
@ -2513,8 +2512,8 @@ void VersionStorageInfo::ComputeFilesMarkedForPeriodicCompaction(
}
const uint64_t current_time = static_cast<uint64_t>(temp_current_time);
// If periodic_compaction_seconds > current_time, no file possibly qualifies
// periodic compaction.
// If periodic_compaction_seconds is larger than current time, periodic
// compaction can't possibly be triggered.
if (periodic_compaction_seconds > current_time) {
return;
}
@ -2524,20 +2523,18 @@ void VersionStorageInfo::ComputeFilesMarkedForPeriodicCompaction(
for (int level = 0; level < num_levels(); level++) {
for (auto f : files_[level]) {
if (!f->being_compacted && f->fd.table_reader != nullptr &&
f->fd.table_reader->GetTableProperties() != nullptr) {
if (!f->being_compacted) {
// Compute a file's modification time in the following order:
// 1. Use file_creation_time table property if it is > 0.
// 2. Use creation_time table property if it is > 0.
// 3. Use file's mtime metadata if the above two table properties are 0.
// Don't consider the file at all if the modification time cannot be
// correctly determined based on the above conditions.
uint64_t file_modification_time =
f->fd.table_reader->GetTableProperties()->file_creation_time;
if (file_modification_time == 0) {
uint64_t file_modification_time = f->TryGetFileCreationTime();
if (file_modification_time == kUnknownFileCreationTime) {
file_modification_time = f->TryGetOldestAncesterTime();
}
if (file_modification_time == 0) {
if (file_modification_time == kUnknownOldestAncesterTime) {
auto file_path = TableFileName(ioptions.cf_paths, f->fd.GetNumber(),
f->fd.GetPathId());
status = ioptions.env->GetFileModificationTime(
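The fallback order described in the comment above can be isolated into a small helper. A hypothetical, self-contained sketch (PickModificationTime and kUnknownTime are made-up names, not RocksDB symbols):

#include <cstdint>
#include <functional>

constexpr uint64_t kUnknownTime = 0;  // mirrors kUnknownFileCreationTime etc.

// 1. file_creation_time property, 2. oldest ancestor / creation_time property,
// 3. filesystem mtime reported by the Env.
uint64_t PickModificationTime(uint64_t file_creation_time,
                              uint64_t oldest_ancestor_time,
                              const std::function<uint64_t()>& get_mtime) {
  if (file_creation_time != kUnknownTime) {
    return file_creation_time;
  }
  if (oldest_ancestor_time != kUnknownTime) {
    return oldest_ancestor_time;
  }
  // May still be 0 if the filesystem cannot report a modification time.
  return get_mtime();
}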
@ -3945,12 +3942,15 @@ Status VersionSet::ProcessManifestWrites(
for (auto v : versions) {
delete v;
}
// If manifest append failed for whatever reason, the file could be
// corrupted. So we need to force the next version update to start a
// new manifest file.
descriptor_log_.reset();
if (new_descriptor_log) {
ROCKS_LOG_INFO(db_options_->info_log,
"Deleting manifest %" PRIu64 " current manifest %" PRIu64
"\n",
manifest_file_number_, pending_manifest_file_number_);
descriptor_log_.reset();
env_->DeleteFile(
DescriptorFileName(dbname_, pending_manifest_file_number_));
}
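The reset-on-failure idea can be reduced to a minimal sketch. LogWriterLike and ManifestLike below are hypothetical stand-ins, not the real VersionSet machinery:

#include <memory>
#include <string>

// Stand-in for the MANIFEST log writer; Append() returns false on an IO error
// such as ENOSPC.
struct LogWriterLike {
  bool healthy = true;
  bool Append(const std::string& /*record*/) { return healthy; }
};

struct ManifestLike {
  std::unique_ptr<LogWriterLike> writer;
  bool Write(const std::string& record) {
    if (writer && writer->Append(record)) {
      return true;
    }
    // The current file may now hold a torn record; never append to it again.
    writer.reset();  // forces the next update to start a brand-new file
    return false;
  }
};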
@ -4980,7 +4980,7 @@ Status VersionSet::WriteCurrentStateToManifest(log::Writer* log) {
f->fd.GetFileSize(), f->smallest, f->largest,
f->fd.smallest_seqno, f->fd.largest_seqno,
f->marked_for_compaction, f->oldest_blob_file_number,
f->oldest_ancester_time);
f->oldest_ancester_time, f->file_creation_time);
}
}
edit.SetLogNumber(cfd->GetLogNumber());


@ -40,7 +40,7 @@ class GenerateLevelFilesBriefTest : public testing::Test {
InternalKey(smallest, smallest_seq, kTypeValue),
InternalKey(largest, largest_seq, kTypeValue), smallest_seq,
largest_seq, /* marked_for_compact */ false, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
files_.push_back(f);
}
@ -135,7 +135,7 @@ class VersionStorageInfoTest : public testing::Test {
file_number, 0, file_size, GetInternalKey(smallest, 0),
GetInternalKey(largest, 0), /* smallest_seq */ 0, /* largest_seq */ 0,
/* marked_for_compact */ false, kInvalidBlobFileNumber,
kUnknownOldestAncesterTime);
kUnknownOldestAncesterTime, kUnknownFileCreationTime);
f->compensated_file_size = file_size;
vstorage_.AddFile(level, f);
}
@ -146,7 +146,8 @@ class VersionStorageInfoTest : public testing::Test {
FileMetaData* f = new FileMetaData(
file_number, 0, file_size, smallest, largest, /* smallest_seq */ 0,
/* largest_seq */ 0, /* marked_for_compact */ false,
kInvalidBlobFileNumber, kUnknownOldestAncesterTime);
kInvalidBlobFileNumber, kUnknownOldestAncesterTime,
kUnknownFileCreationTime);
f->compensated_file_size = file_size;
vstorage_.AddFile(level, f);
}


@ -1784,14 +1784,28 @@ class MemTableInserter : public WriteBatch::Handler {
// check if memtable_list size exceeds max_write_buffer_size_to_maintain
if (trim_history_scheduler_ != nullptr) {
auto* cfd = cf_mems_->current();
assert(cfd != nullptr);
if (cfd->ioptions()->max_write_buffer_size_to_maintain > 0 &&
cfd->mem()->ApproximateMemoryUsageFast() +
cfd->imm()->ApproximateMemoryUsageExcludingLast() >=
static_cast<size_t>(
cfd->ioptions()->max_write_buffer_size_to_maintain) &&
cfd->imm()->MarkTrimHistoryNeeded()) {
trim_history_scheduler_->ScheduleWork(cfd);
assert(cfd);
assert(cfd->ioptions());
const size_t size_to_maintain = static_cast<size_t>(
cfd->ioptions()->max_write_buffer_size_to_maintain);
if (size_to_maintain > 0) {
MemTableList* const imm = cfd->imm();
assert(imm);
if (imm->HasHistory()) {
const MemTable* const mem = cfd->mem();
assert(mem);
if (mem->ApproximateMemoryUsageFast() +
imm->ApproximateMemoryUsageExcludingLast() >=
size_to_maintain &&
imm->MarkTrimHistoryNeeded()) {
trim_history_scheduler_->ScheduleWork(cfd);
}
}
}
}
}
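For reference, a hedged sketch of configuring the option this check enforces (MakeCfOptionsWithHistory is a made-up helper; the option itself is the standard ColumnFamilyOptions field):

#include "rocksdb/options.h"

rocksdb::ColumnFamilyOptions MakeCfOptionsWithHistory() {
  rocksdb::ColumnFamilyOptions cf_options;
  // Keep up to 64 MB of flushed memtable history; the inserter above schedules
  // trimming once mem + imm usage reaches this budget.
  cf_options.max_write_buffer_size_to_maintain = 64 << 20;
  return cf_options;
}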


@ -344,6 +344,9 @@ void WriteThread::BeginWriteStall() {
prev->link_older = w->link_older;
w->status = Status::Incomplete("Write stall");
SetState(w, STATE_COMPLETED);
if (prev->link_older) {
prev->link_older->link_newer = prev;
}
w = prev->link_older;
} else {
prev = w;
@ -355,7 +358,11 @@ void WriteThread::BeginWriteStall() {
void WriteThread::EndWriteStall() {
MutexLock lock(&stall_mu_);
// Unlink write_stall_dummy_ from the write queue. This allows pending
// write threads to enqueue themselves.
assert(newest_writer_.load(std::memory_order_relaxed) == &write_stall_dummy_);
assert(write_stall_dummy_.link_older != nullptr);
write_stall_dummy_.link_older->link_newer = write_stall_dummy_.link_newer;
newest_writer_.exchange(write_stall_dummy_.link_older);
// Wake up writers
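The pointer fix-ups above keep both directions of the writer list consistent; a toy sketch of the same invariant with a hypothetical Node type (not the real Writer struct):

struct Node {
  Node* link_older = nullptr;  // towards older writers
  Node* link_newer = nullptr;  // towards newer writers
};

// Unlink w while repairing BOTH neighbours, so later traversals in either
// direction never reach a stale node.
void Unlink(Node* w) {
  if (w->link_newer != nullptr) {
    w->link_newer->link_older = w->link_older;
  }
  if (w->link_older != nullptr) {
    w->link_older->link_newer = w->link_newer;
  }
  w->link_older = w->link_newer = nullptr;
}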


@ -73,7 +73,8 @@ Status DeleteScheduler::DeleteFile(const std::string& file_path,
s = MarkAsTrash(file_path, &trash_file);
if (!s.ok()) {
ROCKS_LOG_ERROR(info_log_, "Failed to mark %s as trash", file_path.c_str());
ROCKS_LOG_ERROR(info_log_, "Failed to mark %s as trash -- %s",
file_path.c_str(), s.ToString().c_str());
s = env_->DeleteFile(file_path);
if (s.ok()) {
sst_file_manager_->OnDeleteFile(file_path);


@ -306,6 +306,7 @@ void SstFileManagerImpl::ClearError() {
// since the ErrorHandler::recovery_in_prog_ flag would be true
cur_instance_ = error_handler;
mu_.Unlock();
TEST_SYNC_POINT("SstFileManagerImpl::ClearError");
s = error_handler->RecoverFromBGError();
mu_.Lock();
// The DB instance might have been deleted while we were


@ -255,7 +255,7 @@ class Cache {
// Always delete the DB object before calling this method!
virtual void DisownData(){
// default implementation is noop
};
}
// Apply callback to all entries in the cache
// If thread_safe is true, it will also lock the accesses. Otherwise, it will


@ -672,7 +672,7 @@ class RandomAccessFile {
virtual size_t GetUniqueId(char* /*id*/, size_t /*max_size*/) const {
return 0; // Default implementation to prevent issues with backwards
// compatibility.
};
}
enum AccessPattern { NORMAL, RANDOM, SEQUENTIAL, WILLNEED, DONTNEED };
@ -1414,7 +1414,7 @@ class RandomAccessFileWrapper : public RandomAccessFile {
}
size_t GetUniqueId(char* id, size_t max_size) const override {
return target_->GetUniqueId(id, max_size);
};
}
void Hint(AccessPattern pattern) override { target_->Hint(pattern); }
bool use_direct_io() const override { return target_->use_direct_io(); }
size_t GetRequiredBufferAlignment() const override {


@ -69,7 +69,8 @@ struct SstFileMetaData {
SequenceNumber _smallest_seqno, SequenceNumber _largest_seqno,
const std::string& _smallestkey,
const std::string& _largestkey, uint64_t _num_reads_sampled,
bool _being_compacted, uint64_t _oldest_blob_file_number)
bool _being_compacted, uint64_t _oldest_blob_file_number,
uint64_t _oldest_ancester_time, uint64_t _file_creation_time)
: size(_size),
name(_file_name),
file_number(_file_number),
@ -82,7 +83,9 @@ struct SstFileMetaData {
being_compacted(_being_compacted),
num_entries(0),
num_deletions(0),
oldest_blob_file_number(_oldest_blob_file_number) {}
oldest_blob_file_number(_oldest_blob_file_number),
oldest_ancester_time(_oldest_ancester_time),
file_creation_time(_file_creation_time) {}
// File size in bytes.
size_t size;
@ -105,6 +108,15 @@ struct SstFileMetaData {
uint64_t oldest_blob_file_number; // The id of the oldest blob file
// referenced by the file.
// An SST file may be generated by compactions whose input files may
// in turn be generated by earlier compactions. The creation time of the
// oldest SST file that is the compaction ancestor of this file.
// The timestamp is provided by Env::GetCurrentTime().
// 0 if the information is not available.
uint64_t oldest_ancester_time;
// Timestamp when the SST file is created, provided by Env::GetCurrentTime().
// 0 if the information is not available.
uint64_t file_creation_time;
};
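A hedged usage sketch for the two new fields, read through the existing DB::GetColumnFamilyMetaData() API (PrintFileAges is a made-up helper name):

#include <cinttypes>
#include <cstdio>

#include "rocksdb/db.h"
#include "rocksdb/metadata.h"

void PrintFileAges(rocksdb::DB* db) {
  rocksdb::ColumnFamilyMetaData cf_meta;
  db->GetColumnFamilyMetaData(&cf_meta);
  for (const auto& level : cf_meta.levels) {
    for (const auto& file : level.files) {
      // Both fields are 0 when the information is not available.
      std::printf("%s creation=%" PRIu64 " oldest_ancestor=%" PRIu64 "\n",
                  file.name.c_str(), file.file_creation_time,
                  file.oldest_ancester_time);
    }
  }
}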
// The full set of metadata associated with each SST file.


@ -1079,10 +1079,10 @@ struct DBOptions {
// independently if the process crashes later and tries to recover.
bool atomic_flush = false;
// If true, ColumnFamilyHandle's and Iterator's destructors won't delete
// obsolete files directly and will instead schedule a background job
// to do it. Use it if you're destroying iterators or ColumnFamilyHandle-s
// from latency-sensitive threads.
// If true, working threads may avoid doing unnecessary and long-latency
// operations (such as deleting obsolete files directly or deleting memtables)
// and will instead schedule a background job to do it.
// Use it if you're latency-sensitive.
// If set to true, takes precedence over
// ReadOptions::background_purge_on_iterator_cleanup.
bool avoid_unnecessary_blocking_io = false;
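A minimal sketch of opting in; OpenWithBackgroundPurge is a hypothetical helper, while DB::Open and the option itself are the standard APIs:

#include <string>

#include "rocksdb/db.h"

rocksdb::Status OpenWithBackgroundPurge(const std::string& path,
                                        rocksdb::DB** db) {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Defer obsolete-file and memtable deletion from handle and iterator
  // destructors to a background job.
  options.avoid_unnecessary_blocking_io = true;
  return rocksdb::DB::Open(options, path, db);
}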


@ -195,8 +195,15 @@ class PinnableSlice : public Slice, public Cleanable {
}
}
void remove_prefix(size_t /*n*/) {
assert(0); // Not implemented
void remove_prefix(size_t n) {
assert(n <= size());
if (pinned_) {
data_ += n;
size_ -= n;
} else {
buf_->erase(0, n);
PinSelf();
}
}
void Reset() {
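A small usage sketch of the newly implemented remove_prefix(), exercising the self-pinned branch shown above (Demo is a made-up function):

#include <cassert>

#include "rocksdb/slice.h"

void Demo() {
  rocksdb::PinnableSlice s;
  *s.GetSelf() = "prefix:payload";  // fill the slice's own buffer
  s.PinSelf();                      // point the slice at that buffer
  s.remove_prefix(7);               // erases "prefix:" and re-pins
  assert(s.ToString() == "payload");
}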


@ -523,7 +523,7 @@ class Statistics {
virtual bool getTickerMap(std::map<std::string, uint64_t>*) const {
// Do nothing by default
return false;
};
}
// Override this function to disable particular histogram collection
virtual bool HistEnabledForType(uint32_t type) const {


@ -269,9 +269,9 @@ struct BlockBasedTableOptions {
// probably use this as it would reduce the index size.
// This option only affects newly written tables. When reading existing
// tables, the information about version is read from the footer.
// 5 -- Can be read by RocksDB's versions since X.X.X (something after 6.4.6)
// Full and partitioned filters use a generally faster and more accurate
// Bloom filter implementation, with a different schema.
// 5 -- Can be read by RocksDB's versions since 6.6.0. Full and partitioned
// filters use a generally faster and more accurate Bloom filter
// implementation, with a different schema.
uint32_t format_version = 2;
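A hedged example of opting into the new format; only do this once every reader of the data runs a release that understands version 5:

#include "rocksdb/options.h"
#include "rocksdb/table.h"

rocksdb::Options MakeOptionsWithNewFilterFormat() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.format_version = 5;  // newer, faster Bloom filter schema
  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}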
// Store index blocks on disk in compressed format. Changing this option to


@ -104,13 +104,13 @@ struct BackupableDBOptions {
// Default: 4194304
uint64_t callback_trigger_interval_size;
// When Open() is called, it will open at most this many of the latest
// non-corrupted backups.
// For BackupEngineReadOnly, Open() will open at most this many of the
// latest non-corrupted backups.
//
// Note setting this to a non-default value prevents old files from being
// deleted in the shared directory, as we can't do proper ref-counting. If
// using this option, make sure to occasionally disable it (by resetting to
// INT_MAX) and run GarbageCollect to clean accumulated stale files.
// Note: this setting is ignored (behaves like INT_MAX) for any kind of
// writable BackupEngine because it would inhibit accounting for shared
// files for proper backup deletion, including purging any incompletely
// created backups on creation of a new backup.
//
// Default: INT_MAX
int max_valid_backups_to_open;
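A sketch of the read-only use case the note refers to; OpenLatestBackupOnly is a hypothetical helper, and the BackupEngineReadOnly::Open signature is assumed from the same header:

#include <string>

#include "rocksdb/env.h"
#include "rocksdb/utilities/backupable_db.h"

rocksdb::Status OpenLatestBackupOnly(const std::string& backup_dir,
                                     rocksdb::BackupEngineReadOnly** engine) {
  rocksdb::BackupableDBOptions options(backup_dir);
  // Honored by read-only engines; writable engines now ignore it.
  options.max_valid_backups_to_open = 1;
  return rocksdb::BackupEngineReadOnly::Open(rocksdb::Env::Default(), options,
                                             engine);
}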


@ -6,7 +6,7 @@
#define ROCKSDB_MAJOR 6
#define ROCKSDB_MINOR 6
#define ROCKSDB_PATCH 0
#define ROCKSDB_PATCH 4
// Do not use these. We made the mistake of declaring macros starting with
// double underscore. Now we have to live with our choice. We'll deprecate these


@ -312,17 +312,19 @@ if(NOT EXISTS ${JAVA_TEST_LIBDIR})
file(MAKE_DIRECTORY mkdir ${JAVA_TEST_LIBDIR})
endif()
if (DEFINED CUSTOM_REPO_URL)
set(SEARCH_REPO_URL ${CUSTOM_REPO_URL}/)
set(CENTRAL_REPO_URL ${CUSTOM_REPO_URL}/)
if (DEFINED CUSTOM_DEPS_URL)
set(DEPS_URL ${CUSTOM_DEPS_URL}/)
else ()
set(SEARCH_REPO_URL "http://search.maven.org/remotecontent?filepath=")
set(CENTRAL_REPO_URL "http://central.maven.org/maven2/")
# This is a URL for artifacts from a "fake" release on pdillinger's fork,
# so as not to put binaries in git (ew). We should move to hosting these
# under the facebook account on github, or something else more reliable
# than maven.org, which has been failing frequently from Travis.
set(DEPS_URL "https://github.com/pdillinger/rocksdb/releases/download/v6.6.x-java-deps")
endif()
if(NOT EXISTS ${JAVA_JUNIT_JAR})
message("Downloading ${JAVA_JUNIT_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}junit/junit/4.12/junit-4.12.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
file(DOWNLOAD ${DEPS_URL}/junit-4.12.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_JUNIT_JAR}")
@ -331,7 +333,7 @@ if(NOT EXISTS ${JAVA_JUNIT_JAR})
endif()
if(NOT EXISTS ${JAVA_HAMCR_JAR})
message("Downloading ${JAVA_HAMCR_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
file(DOWNLOAD ${DEPS_URL}/hamcrest-core-1.3.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_HAMCR_JAR}")
@ -340,7 +342,7 @@ if(NOT EXISTS ${JAVA_HAMCR_JAR})
endif()
if(NOT EXISTS ${JAVA_MOCKITO_JAR})
message("Downloading ${JAVA_MOCKITO_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
file(DOWNLOAD ${DEPS_URL}/mockito-all-1.10.19.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_MOCKITO_JAR}")
@ -349,7 +351,7 @@ if(NOT EXISTS ${JAVA_MOCKITO_JAR})
endif()
if(NOT EXISTS ${JAVA_CGLIB_JAR})
message("Downloading ${JAVA_CGLIB_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}cglib/cglib/2.2.2/cglib-2.2.2.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
file(DOWNLOAD ${DEPS_URL}/cglib-2.2.2.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_CGLIB_JAR}")
@ -358,7 +360,7 @@ if(NOT EXISTS ${JAVA_CGLIB_JAR})
endif()
if(NOT EXISTS ${JAVA_ASSERTJ_JAR})
message("Downloading ${JAVA_ASSERTJ_JAR}")
file(DOWNLOAD ${CENTRAL_REPO_URL}org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
file(DOWNLOAD ${DEPS_URL}/assertj-core-1.7.1.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_ASSERTJ_JAR}")


@ -213,16 +213,23 @@ ifneq ($(DEBUG_LEVEL),0)
JAVAC_ARGS = -Xlint:deprecation -Xlint:unchecked
endif
SEARCH_REPO_URL?=http://search.maven.org/remotecontent?filepath=
CENTRAL_REPO_URL?=http://central.maven.org/maven2/
# This is a URL for artifacts from a "fake" release on pdillinger's fork,
# so as not to put binaries in git (ew). We should move to hosting these
# under the facebook account on github, or something else more reliable
# than maven.org, which has been failing frequently from Travis.
DEPS_URL?=https://github.com/pdillinger/rocksdb/releases/download/v6.6.x-java-deps
clean:
$(AM_V_at)rm -rf include/*
$(AM_V_at)rm -rf test-libs/
clean: clean-not-downloaded clean-downloaded
clean-not-downloaded:
$(AM_V_at)rm -rf $(NATIVE_INCLUDE)
$(AM_V_at)rm -rf $(OUTPUT)
$(AM_V_at)rm -rf $(BENCHMARK_OUTPUT)
$(AM_V_at)rm -rf $(SAMPLES_OUTPUT)
clean-downloaded:
$(AM_V_at)rm -rf $(JAVA_TEST_LIBDIR)
javadocs: java
$(AM_V_GEN)mkdir -p $(JAVADOC)
@ -279,11 +286,11 @@ optimistic_transaction_sample: java
resolve_test_deps:
test -d "$(JAVA_TEST_LIBDIR)" || mkdir -p "$(JAVA_TEST_LIBDIR)"
test -s "$(JAVA_JUNIT_JAR)" || cp $(MVN_LOCAL)/junit/junit/4.12/junit-4.12.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o $(JAVA_JUNIT_JAR) $(SEARCH_REPO_URL)junit/junit/4.12/junit-4.12.jar
test -s "$(JAVA_HAMCR_JAR)" || cp $(MVN_LOCAL)/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o $(JAVA_HAMCR_JAR) $(SEARCH_REPO_URL)org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar
test -s "$(JAVA_MOCKITO_JAR)" || cp $(MVN_LOCAL)/org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o "$(JAVA_MOCKITO_JAR)" $(SEARCH_REPO_URL)org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar
test -s "$(JAVA_CGLIB_JAR)" || cp $(MVN_LOCAL)/cglib/cglib/2.2.2/cglib-2.2.2.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o "$(JAVA_CGLIB_JAR)" $(SEARCH_REPO_URL)cglib/cglib/2.2.2/cglib-2.2.2.jar
test -s "$(JAVA_ASSERTJ_JAR)" || cp $(MVN_LOCAL)/org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o "$(JAVA_ASSERTJ_JAR)" $(CENTRAL_REPO_URL)org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar
test -s "$(JAVA_JUNIT_JAR)" || cp $(MVN_LOCAL)/junit/junit/4.12/junit-4.12.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output $(JAVA_JUNIT_JAR) --location $(DEPS_URL)/junit-4.12.jar
test -s "$(JAVA_HAMCR_JAR)" || cp $(MVN_LOCAL)/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output $(JAVA_HAMCR_JAR) --location $(DEPS_URL)/hamcrest-core-1.3.jar
test -s "$(JAVA_MOCKITO_JAR)" || cp $(MVN_LOCAL)/org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output "$(JAVA_MOCKITO_JAR)" --location $(DEPS_URL)/mockito-all-1.10.19.jar
test -s "$(JAVA_CGLIB_JAR)" || cp $(MVN_LOCAL)/cglib/cglib/2.2.2/cglib-2.2.2.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output "$(JAVA_CGLIB_JAR)" --location $(DEPS_URL)/cglib-2.2.2.jar
test -s "$(JAVA_ASSERTJ_JAR)" || cp $(MVN_LOCAL)/org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output "$(JAVA_ASSERTJ_JAR)" --location $(DEPS_URL)/assertj-core-1.7.1.jar
java_test: java resolve_test_deps
$(AM_V_GEN)mkdir -p $(TEST_CLASSES)


@ -1247,9 +1247,51 @@ jint Java_org_rocksdb_Options_statsDumpPeriodSec(
*/
void Java_org_rocksdb_Options_setStatsDumpPeriodSec(
JNIEnv*, jobject, jlong jhandle,
jint stats_dump_period_sec) {
jint jstats_dump_period_sec) {
reinterpret_cast<rocksdb::Options*>(jhandle)->stats_dump_period_sec =
static_cast<int>(stats_dump_period_sec);
static_cast<unsigned int>(jstats_dump_period_sec);
}
/*
* Class: org_rocksdb_Options
* Method: statsPersistPeriodSec
* Signature: (J)I
*/
jint Java_org_rocksdb_Options_statsPersistPeriodSec(
JNIEnv*, jobject, jlong jhandle) {
return reinterpret_cast<rocksdb::Options*>(jhandle)->stats_persist_period_sec;
}
/*
* Class: org_rocksdb_Options
* Method: setStatsPersistPeriodSec
* Signature: (JI)V
*/
void Java_org_rocksdb_Options_setStatsPersistPeriodSec(
JNIEnv*, jobject, jlong jhandle, jint jstats_persist_period_sec) {
reinterpret_cast<rocksdb::Options*>(jhandle)->stats_persist_period_sec =
static_cast<unsigned int>(jstats_persist_period_sec);
}
/*
* Class: org_rocksdb_Options
* Method: statsHistoryBufferSize
* Signature: (J)J
*/
jlong Java_org_rocksdb_Options_statsHistoryBufferSize(
JNIEnv*, jobject, jlong jhandle) {
return reinterpret_cast<rocksdb::Options*>(jhandle)->stats_history_buffer_size;
}
/*
* Class: org_rocksdb_Options
* Method: setStatsHistoryBufferSize
* Signature: (JJ)V
*/
void Java_org_rocksdb_Options_setStatsHistoryBufferSize(
JNIEnv*, jobject, jlong jhandle, jlong jstats_history_buffer_size) {
reinterpret_cast<rocksdb::Options*>(jhandle)->stats_history_buffer_size =
static_cast<size_t>(jstats_history_buffer_size);
}
/*
@ -1481,6 +1523,28 @@ jlong Java_org_rocksdb_Options_walBytesPerSync(
return static_cast<jlong>(opt->wal_bytes_per_sync);
}
/*
* Class: org_rocksdb_Options
* Method: setStrictBytesPerSync
* Signature: (JZ)V
*/
void Java_org_rocksdb_Options_setStrictBytesPerSync(
JNIEnv*, jobject, jlong jhandle, jboolean jstrict_bytes_per_sync) {
reinterpret_cast<rocksdb::Options*>(jhandle)->strict_bytes_per_sync =
jstrict_bytes_per_sync == JNI_TRUE;
}
/*
* Class: org_rocksdb_Options
* Method: strictBytesPerSync
* Signature: (J)Z
*/
jboolean Java_org_rocksdb_Options_strictBytesPerSync(
JNIEnv*, jobject, jlong jhandle) {
auto* opt = reinterpret_cast<rocksdb::Options*>(jhandle);
return static_cast<jboolean>(opt->strict_bytes_per_sync);
}
/*
* Class: org_rocksdb_Options
* Method: setEnableThreadTracking
@ -5441,9 +5505,9 @@ jboolean Java_org_rocksdb_DBOptions_isFdCloseOnExec(
* Signature: (JI)V
*/
void Java_org_rocksdb_DBOptions_setStatsDumpPeriodSec(
JNIEnv*, jobject, jlong jhandle, jint stats_dump_period_sec) {
JNIEnv*, jobject, jlong jhandle, jint jstats_dump_period_sec) {
reinterpret_cast<rocksdb::DBOptions*>(jhandle)->stats_dump_period_sec =
static_cast<int>(stats_dump_period_sec);
static_cast<unsigned int>(jstats_dump_period_sec);
}
/*
@ -5456,6 +5520,48 @@ jint Java_org_rocksdb_DBOptions_statsDumpPeriodSec(
return reinterpret_cast<rocksdb::DBOptions*>(jhandle)->stats_dump_period_sec;
}
/*
* Class: org_rocksdb_DBOptions
* Method: setStatsPersistPeriodSec
* Signature: (JI)V
*/
void Java_org_rocksdb_DBOptions_setStatsPersistPeriodSec(
JNIEnv*, jobject, jlong jhandle, jint jstats_persist_period_sec) {
reinterpret_cast<rocksdb::DBOptions*>(jhandle)->stats_persist_period_sec =
static_cast<unsigned int>(jstats_persist_period_sec);
}
/*
* Class: org_rocksdb_DBOptions
* Method: statsPersistPeriodSec
* Signature: (J)I
*/
jint Java_org_rocksdb_DBOptions_statsPersistPeriodSec(
JNIEnv*, jobject, jlong jhandle) {
return reinterpret_cast<rocksdb::DBOptions*>(jhandle)->stats_persist_period_sec;
}
/*
* Class: org_rocksdb_DBOptions
* Method: setStatsHistoryBufferSize
* Signature: (JJ)V
*/
void Java_org_rocksdb_DBOptions_setStatsHistoryBufferSize(
JNIEnv*, jobject, jlong jhandle, jlong jstats_history_buffer_size) {
reinterpret_cast<rocksdb::DBOptions*>(jhandle)->stats_history_buffer_size =
static_cast<size_t>(jstats_history_buffer_size);
}
/*
* Class: org_rocksdb_DBOptions
* Method: statsHistoryBufferSize
* Signature: (J)J
*/
jlong Java_org_rocksdb_DBOptions_statsHistoryBufferSize(
JNIEnv*, jobject, jlong jhandle) {
return reinterpret_cast<rocksdb::DBOptions*>(jhandle)->stats_history_buffer_size;
}
/*
* Class: org_rocksdb_DBOptions
* Method: setAdviseRandomOnOpen
@ -5694,6 +5800,28 @@ jlong Java_org_rocksdb_DBOptions_walBytesPerSync(
return static_cast<jlong>(opt->wal_bytes_per_sync);
}
/*
* Class: org_rocksdb_DBOptions
* Method: setStrictBytesPerSync
* Signature: (JZ)V
*/
void Java_org_rocksdb_DBOptions_setStrictBytesPerSync(
JNIEnv*, jobject, jlong jhandle, jboolean jstrict_bytes_per_sync) {
reinterpret_cast<rocksdb::DBOptions*>(jhandle)->strict_bytes_per_sync =
jstrict_bytes_per_sync == JNI_TRUE;
}
/*
* Class: org_rocksdb_DBOptions
* Method: strictBytesPerSync
* Signature: (J)Z
*/
jboolean Java_org_rocksdb_DBOptions_strictBytesPerSync(
JNIEnv*, jobject, jlong jhandle) {
return static_cast<jboolean>(
reinterpret_cast<rocksdb::DBOptions*>(jhandle)->strict_bytes_per_sync);
}
/*
* Class: org_rocksdb_DBOptions
* Method: setDelayedWriteRate


@ -660,6 +660,34 @@ public class DBOptions extends RocksObject
return statsDumpPeriodSec(nativeHandle_);
}
@Override
public DBOptions setStatsPersistPeriodSec(
final int statsPersistPeriodSec) {
assert(isOwningHandle());
setStatsPersistPeriodSec(nativeHandle_, statsPersistPeriodSec);
return this;
}
@Override
public int statsPersistPeriodSec() {
assert(isOwningHandle());
return statsPersistPeriodSec(nativeHandle_);
}
@Override
public DBOptions setStatsHistoryBufferSize(
final long statsHistoryBufferSize) {
assert(isOwningHandle());
setStatsHistoryBufferSize(nativeHandle_, statsHistoryBufferSize);
return this;
}
@Override
public long statsHistoryBufferSize() {
assert(isOwningHandle());
return statsHistoryBufferSize(nativeHandle_);
}
@Override
public DBOptions setAdviseRandomOnOpen(
final boolean adviseRandomOnOpen) {
@ -807,6 +835,19 @@ public class DBOptions extends RocksObject
return walBytesPerSync(nativeHandle_);
}
@Override
public DBOptions setStrictBytesPerSync(final boolean strictBytesPerSync) {
assert(isOwningHandle());
setStrictBytesPerSync(nativeHandle_, strictBytesPerSync);
return this;
}
@Override
public boolean strictBytesPerSync() {
assert(isOwningHandle());
return strictBytesPerSync(nativeHandle_);
}
//TODO(AR) NOW
// @Override
// public DBOptions setListeners(final List<EventListener> listeners) {
@ -1239,6 +1280,14 @@ public class DBOptions extends RocksObject
private native void setStatsDumpPeriodSec(
long handle, int statsDumpPeriodSec);
private native int statsDumpPeriodSec(long handle);
private native void setStatsPersistPeriodSec(
final long handle, final int statsPersistPeriodSec);
private native int statsPersistPeriodSec(
final long handle);
private native void setStatsHistoryBufferSize(
final long handle, final long statsHistoryBufferSize);
private native long statsHistoryBufferSize(
final long handle);
private native void setAdviseRandomOnOpen(
long handle, boolean adviseRandomOnOpen);
private native boolean adviseRandomOnOpen(long handle);
@ -1270,6 +1319,10 @@ public class DBOptions extends RocksObject
private native long bytesPerSync(long handle);
private native void setWalBytesPerSync(long handle, long walBytesPerSync);
private native long walBytesPerSync(long handle);
private native void setStrictBytesPerSync(
final long handle, final boolean strictBytesPerSync);
private native boolean strictBytesPerSync(
final long handle);
private native void setEnableThreadTracking(long handle,
boolean enableThreadTracking);
private native boolean enableThreadTracking(long handle);


@ -13,6 +13,10 @@ import java.util.List;
*/
public abstract class Env extends RocksObject {
static {
RocksDB.loadLibrary();
}
private static final Env DEFAULT_ENV = new RocksEnv(getDefaultEnvInternal());
static {
/**


@ -89,9 +89,12 @@ public class MutableDBOptions extends AbstractMutableOptions {
max_total_wal_size(ValueType.LONG),
delete_obsolete_files_period_micros(ValueType.LONG),
stats_dump_period_sec(ValueType.INT),
stats_persist_period_sec(ValueType.INT),
stats_history_buffer_size(ValueType.LONG),
max_open_files(ValueType.INT),
bytes_per_sync(ValueType.LONG),
wal_bytes_per_sync(ValueType.LONG),
strict_bytes_per_sync(ValueType.BOOLEAN),
compaction_readahead_size(ValueType.LONG);
private final ValueType valueType;
@ -240,6 +243,28 @@ public class MutableDBOptions extends AbstractMutableOptions {
return getInt(DBOption.stats_dump_period_sec);
}
@Override
public MutableDBOptionsBuilder setStatsPersistPeriodSec(
final int statsPersistPeriodSec) {
return setInt(DBOption.stats_persist_period_sec, statsPersistPeriodSec);
}
@Override
public int statsPersistPeriodSec() {
return getInt(DBOption.stats_persist_period_sec);
}
@Override
public MutableDBOptionsBuilder setStatsHistoryBufferSize(
final long statsHistoryBufferSize) {
return setLong(DBOption.stats_history_buffer_size, statsHistoryBufferSize);
}
@Override
public long statsHistoryBufferSize() {
return getLong(DBOption.stats_history_buffer_size);
}
@Override
public MutableDBOptionsBuilder setMaxOpenFiles(final int maxOpenFiles) {
return setInt(DBOption.max_open_files, maxOpenFiles);
@ -271,6 +296,17 @@ public class MutableDBOptions extends AbstractMutableOptions {
return getLong(DBOption.wal_bytes_per_sync);
}
@Override
public MutableDBOptionsBuilder setStrictBytesPerSync(
final boolean strictBytesPerSync) {
return setBoolean(DBOption.strict_bytes_per_sync, strictBytesPerSync);
}
@Override
public boolean strictBytesPerSync() {
return getBoolean(DBOption.strict_bytes_per_sync);
}
@Override
public MutableDBOptionsBuilder setCompactionReadaheadSize(
final long compactionReadaheadSize) {


@ -237,6 +237,44 @@ public interface MutableDBOptionsInterface<T extends MutableDBOptionsInterface<T
*/
int statsDumpPeriodSec();
/**
* If not zero, dump rocksdb.stats to RocksDB every
* {@code statsPersistPeriodSec} seconds.
*
* Default: 600
*
* @param statsPersistPeriodSec time interval in seconds.
* @return the instance of the current object.
*/
T setStatsPersistPeriodSec(int statsPersistPeriodSec);
/**
* If not zero, dump rocksdb.stats to RocksDB every
* {@code statsPersistPeriodSec} seconds.
*
* @return time interval in seconds.
*/
int statsPersistPeriodSec();
/**
* If not zero, periodically take stats snapshots and store them in memory;
* the memory size for stats snapshots is capped at {@code statsHistoryBufferSize}.
*
* Default: 1MB
*
* @param statsHistoryBufferSize the size of the buffer.
* @return the instance of the current object.
*/
T setStatsHistoryBufferSize(long statsHistoryBufferSize);
/**
* If not zero, periodically take stats snapshots and store them in memory;
* the memory size for stats snapshots is capped at {@code statsHistoryBufferSize}.
*
* @return the size of the buffer.
*/
long statsHistoryBufferSize();
/**
* Number of open files that can be used by the DB. You may need to
* increase this if your database has a large working set. Value -1 means
@ -303,6 +341,42 @@ public interface MutableDBOptionsInterface<T extends MutableDBOptionsInterface<T
*/
long walBytesPerSync();
/**
* When true, guarantees WAL files have at most {@link #walBytesPerSync()}
* bytes submitted for writeback at any given time, and SST files have at most
* {@link #bytesPerSync()} bytes pending writeback at any given time. This
* can be used to handle cases where processing speed exceeds I/O speed
* during file generation, which can lead to a huge sync when the file is
* finished, even with {@link #bytesPerSync()} / {@link #walBytesPerSync()}
* properly configured.
*
* - If `sync_file_range` is supported it achieves this by waiting for any
* prior `sync_file_range`s to finish before proceeding. In this way,
* processing (compression, etc.) can proceed uninhibited in the gap
* between `sync_file_range`s, and we block only when I/O falls
* behind.
* - Otherwise the `WritableFile::Sync` method is used. Note this mechanism
* always blocks, thus preventing the interleaving of I/O and processing.
*
* Note: Enabling this option does not provide any additional persistence
* guarantees, as it may use `sync_file_range`, which does not write out
* metadata.
*
* Default: false
*
* @param strictBytesPerSync the bytes per sync
* @return the instance of the current object.
*/
T setStrictBytesPerSync(boolean strictBytesPerSync);
/**
* Return the strict byte limit per sync.
*
* See {@link #setStrictBytesPerSync(boolean)}
*
* @return the limit in bytes.
*/
boolean strictBytesPerSync();
/**
* If non-zero, we perform bigger reads when doing compaction. If you're
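The new Java accessors above mirror fields of the C++ DBOptions; a hedged C++ sketch of the equivalent configuration (field names assumed from include/rocksdb/options.h):

#include "rocksdb/options.h"

rocksdb::DBOptions MakeStatsAndSyncOptions() {
  rocksdb::DBOptions db_options;
  db_options.stats_persist_period_sec = 600;           // persist stats every 10 minutes
  db_options.stats_history_buffer_size = 1024 * 1024;  // cap in-memory history at 1 MB
  db_options.strict_bytes_per_sync = true;             // bound pending writeback
  return db_options;
}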


@ -739,6 +739,34 @@ public class Options extends RocksObject
return this;
}
@Override
public Options setStatsPersistPeriodSec(
final int statsPersistPeriodSec) {
assert(isOwningHandle());
setStatsPersistPeriodSec(nativeHandle_, statsPersistPeriodSec);
return this;
}
@Override
public int statsPersistPeriodSec() {
assert(isOwningHandle());
return statsPersistPeriodSec(nativeHandle_);
}
@Override
public Options setStatsHistoryBufferSize(
final long statsHistoryBufferSize) {
assert(isOwningHandle());
setStatsHistoryBufferSize(nativeHandle_, statsHistoryBufferSize);
return this;
}
@Override
public long statsHistoryBufferSize() {
assert(isOwningHandle());
return statsHistoryBufferSize(nativeHandle_);
}
@Override
public boolean adviseRandomOnOpen() {
return adviseRandomOnOpen(nativeHandle_);
@ -883,6 +911,19 @@ public class Options extends RocksObject
return walBytesPerSync(nativeHandle_);
}
@Override
public Options setStrictBytesPerSync(final boolean strictBytesPerSync) {
assert(isOwningHandle());
setStrictBytesPerSync(nativeHandle_, strictBytesPerSync);
return this;
}
@Override
public boolean strictBytesPerSync() {
assert(isOwningHandle());
return strictBytesPerSync(nativeHandle_);
}
@Override
public Options setEnableThreadTracking(final boolean enableThreadTracking) {
assert(isOwningHandle());
@ -1858,6 +1899,14 @@ public class Options extends RocksObject
private native void setStatsDumpPeriodSec(
long handle, int statsDumpPeriodSec);
private native int statsDumpPeriodSec(long handle);
private native void setStatsPersistPeriodSec(
final long handle, final int statsPersistPeriodSec);
private native int statsPersistPeriodSec(
final long handle);
private native void setStatsHistoryBufferSize(
final long handle, final long statsHistoryBufferSize);
private native long statsHistoryBufferSize(
final long handle);
private native void setAdviseRandomOnOpen(
long handle, boolean adviseRandomOnOpen);
private native boolean adviseRandomOnOpen(long handle);
@ -1889,6 +1938,10 @@ public class Options extends RocksObject
private native long bytesPerSync(long handle);
private native void setWalBytesPerSync(long handle, long walBytesPerSync);
private native long walBytesPerSync(long handle);
private native void setStrictBytesPerSync(
final long handle, final boolean strictBytesPerSync);
private native boolean strictBytesPerSync(
final long handle);
private native void setEnableThreadTracking(long handle,
boolean enableThreadTracking);
private native boolean enableThreadTracking(long handle);


@ -406,6 +406,24 @@ public class DBOptionsTest {
}
}
@Test
public void statsPersistPeriodSec() {
try (final DBOptions opt = new DBOptions()) {
final int intValue = rand.nextInt();
opt.setStatsPersistPeriodSec(intValue);
assertThat(opt.statsPersistPeriodSec()).isEqualTo(intValue);
}
}
@Test
public void statsHistoryBufferSize() {
try (final DBOptions opt = new DBOptions()) {
final long longValue = rand.nextLong();
opt.setStatsHistoryBufferSize(longValue);
assertThat(opt.statsHistoryBufferSize()).isEqualTo(longValue);
}
}
@Test
public void adviseRandomOnOpen() {
try(final DBOptions opt = new DBOptions()) {
@ -516,6 +534,15 @@ public class DBOptionsTest {
}
}
@Test
public void strictBytesPerSync() {
try (final DBOptions opt = new DBOptions()) {
assertThat(opt.strictBytesPerSync()).isFalse();
opt.setStrictBytesPerSync(true);
assertThat(opt.strictBytesPerSync()).isTrue();
}
}
@Test
public void enableThreadTracking() {
try (final DBOptions opt = new DBOptions()) {


@ -56,21 +56,22 @@ public class MutableDBOptionsTest {
}
@Test
public void mutableColumnFamilyOptions_toString() {
public void mutableDBOptions_toString() {
final String str = MutableDBOptions
.builder()
.setMaxOpenFiles(99)
.setDelayedWriteRate(789)
.setAvoidFlushDuringShutdown(true)
.setStrictBytesPerSync(true)
.build()
.toString();
assertThat(str).isEqualTo("max_open_files=99;delayed_write_rate=789;"
+ "avoid_flush_during_shutdown=true");
+ "avoid_flush_during_shutdown=true;strict_bytes_per_sync=true");
}
@Test
public void mutableColumnFamilyOptions_parse() {
public void mutableDBOptions_parse() {
final String str = "max_open_files=99;delayed_write_rate=789;"
+ "avoid_flush_during_shutdown=true";


@ -625,6 +625,24 @@ public class OptionsTest {
}
}
@Test
public void statsPersistPeriodSec() {
try (final Options opt = new Options()) {
final int intValue = rand.nextInt();
opt.setStatsPersistPeriodSec(intValue);
assertThat(opt.statsPersistPeriodSec()).isEqualTo(intValue);
}
}
@Test
public void statsHistoryBufferSize() {
try (final Options opt = new Options()) {
final long longValue = rand.nextLong();
opt.setStatsHistoryBufferSize(longValue);
assertThat(opt.statsHistoryBufferSize()).isEqualTo(longValue);
}
}
@Test
public void adviseRandomOnOpen() {
try (final Options opt = new Options()) {
@ -735,6 +753,15 @@ public class OptionsTest {
}
}
@Test
public void strictBytesPerSync() {
try (final Options opt = new Options()) {
assertThat(opt.strictBytesPerSync()).isFalse();
opt.setStrictBytesPerSync(true);
assertThat(opt.strictBytesPerSync()).isTrue();
}
}
@Test
public void enableThreadTracking() {
try (final Options opt = new Options()) {


@ -1025,7 +1025,12 @@ Status WinRandomRWFile::Close() {
//////////////////////////////////////////////////////////////////////////
/// WinMemoryMappedBufer
WinMemoryMappedBuffer::~WinMemoryMappedBuffer() {
BOOL ret = FALSE;
BOOL ret
#if defined(_MSC_VER)
= FALSE;
#else
__attribute__((__unused__));
#endif
if (base_ != nullptr) {
ret = ::UnmapViewOfFile(base_);
assert(ret);


@ -138,7 +138,12 @@ void WindowsThread::join() {
"WaitForSingleObjectFailed: thread join");
}
BOOL rc;
BOOL rc
#if defined(_MSC_VER)
= FALSE;
#else
__attribute__((__unused__));
#endif
rc = CloseHandle(reinterpret_cast<HANDLE>(data_->handle_));
assert(rc != 0);
data_->handle_ = 0;


@ -443,7 +443,8 @@ void BloomFilterPolicy::CreateFilter(const Slice* keys, int n,
std::string* dst) const {
// We should ideally only be using this deprecated interface for
// appropriately constructed BloomFilterPolicy
assert(mode_ == kDeprecatedBlock);
// FIXME disabled because of bug in C interface; see issue #6129
//assert(mode_ == kDeprecatedBlock);
// Compute bloom filter size (in both bits and bytes)
uint32_t bits = static_cast<uint32_t>(n * whole_bits_per_key_);


@ -60,7 +60,7 @@ class channel {
private:
std::condition_variable cv_;
std::mutex lock_;
mutable std::mutex lock_;
std::queue<T> buffer_;
bool eof_;
};


@ -21,7 +21,12 @@ const char* Status::CopyState(const char* state) {
#ifdef OS_WIN
const size_t cch = std::strlen(state) + 1; // +1 for the null terminator
char* result = new char[cch];
errno_t ret;
errno_t ret
#if defined(_MSC_VER)
;
#else
__attribute__((__unused__));
#endif
ret = strncpy_s(result, cch, state, cch - 1);
result[cch - 1] = '\0';
assert(ret == 0);


@ -87,7 +87,7 @@ class BlobDB : public StackableDB {
virtual Status Put(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value) override {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return Status::NotSupported(
"Blob DB doesn't support non-default column family.");
}
@ -98,7 +98,7 @@ class BlobDB : public StackableDB {
virtual Status Delete(const WriteOptions& options,
ColumnFamilyHandle* column_family,
const Slice& key) override {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return Status::NotSupported(
"Blob DB doesn't support non-default column family.");
}
@ -111,7 +111,7 @@ class BlobDB : public StackableDB {
virtual Status PutWithTTL(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value, uint64_t ttl) {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return Status::NotSupported(
"Blob DB doesn't support non-default column family.");
}
@ -125,7 +125,7 @@ class BlobDB : public StackableDB {
virtual Status PutUntil(const WriteOptions& options,
ColumnFamilyHandle* column_family, const Slice& key,
const Slice& value, uint64_t expiration) {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return Status::NotSupported(
"Blob DB doesn't support non-default column family.");
}
@ -157,7 +157,7 @@ class BlobDB : public StackableDB {
const std::vector<Slice>& keys,
std::vector<std::string>* values) override {
for (auto column_family : column_families) {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return std::vector<Status>(
column_families.size(),
Status::NotSupported(
@ -197,7 +197,7 @@ class BlobDB : public StackableDB {
virtual Iterator* NewIterator(const ReadOptions& options) override = 0;
virtual Iterator* NewIterator(const ReadOptions& options,
ColumnFamilyHandle* column_family) override {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
// Blob DB doesn't support non-default column family.
return nullptr;
}
@ -217,7 +217,7 @@ class BlobDB : public StackableDB {
const int output_path_id = -1,
std::vector<std::string>* const output_file_names = nullptr,
CompactionJobInfo* compaction_job_info = nullptr) override {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return Status::NotSupported(
"Blob DB doesn't support non-default column family.");
}
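The switch from pointer equality to ID equality matters because two distinct ColumnFamilyHandle objects can refer to the same (default) column family. A hypothetical sketch of the check:

#include "rocksdb/db.h"

bool IsDefaultColumnFamily(rocksdb::DB* db,
                           rocksdb::ColumnFamilyHandle* column_family) {
  // Same underlying column family even if the handle objects differ.
  return column_family->GetID() == db->DefaultColumnFamily()->GetID();
}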


@ -459,6 +459,10 @@ void BlobDBImpl::ProcessFlushJobInfo(const FlushJobInfo& info) {
void BlobDBImpl::ProcessCompactionJobInfo(const CompactionJobInfo& info) {
assert(bdb_options_.enable_garbage_collection);
if (!info.status.ok()) {
return;
}
// Note: the same SST file may appear in both the input and the output
// file list in case of a trivial move. We process the inputs first
// to ensure the blob file still has a link after processing all updates.
@ -975,7 +979,7 @@ Status BlobDBImpl::PutBlobValue(const WriteOptions& /*options*/,
}
if (s.ok()) {
assert(blob_file != nullptr);
assert(blob_file->compression() == bdb_options_.compression);
assert(blob_file->GetCompressionType() == bdb_options_.compression);
s = AppendBlob(blob_file, headerbuf, key, value_compressed, expiration,
&index_entry);
}
@ -1364,7 +1368,7 @@ Status BlobDBImpl::GetRawBlobFromFile(const Slice& key, uint64_t file_number,
blob_file = it->second;
}
*compression_type = blob_file->compression();
*compression_type = blob_file->GetCompressionType();
// takes locks when called
std::shared_ptr<RandomAccessFileReader> reader;
@ -1462,7 +1466,7 @@ Status BlobDBImpl::Get(const ReadOptions& read_options,
Status BlobDBImpl::GetImpl(const ReadOptions& read_options,
ColumnFamilyHandle* column_family, const Slice& key,
PinnableSlice* value, uint64_t* expiration) {
if (column_family != DefaultColumnFamily()) {
if (column_family->GetID() != DefaultColumnFamily()->GetID()) {
return Status::NotSupported(
"Blob DB doesn't support non-default column family.");
}


@ -1780,6 +1780,36 @@ TEST_F(BlobDBTest, MaintainBlobFileToSstMapping) {
ASSERT_EQ(obsolete_files[0]->BlobFileNumber(), 1);
}
// Simulate a failed compaction. No mappings should be updated.
{
CompactionJobInfo info{};
info.input_file_infos.emplace_back(CompactionFileInfo{1, 7, 2});
info.input_file_infos.emplace_back(CompactionFileInfo{2, 22, 5});
info.output_file_infos.emplace_back(CompactionFileInfo{2, 25, 3});
info.status = Status::Corruption();
blob_db_impl()->TEST_ProcessCompactionJobInfo(info);
const std::vector<std::unordered_set<uint64_t>> expected_sst_files{
{}, {7}, {3, 8, 23}, {4, 9}, {5, 10, 22}};
const std::vector<bool> expected_obsolete{true, false, false, false, false};
for (size_t i = 0; i < 5; ++i) {
const auto &blob_file = blob_files[i];
ASSERT_EQ(blob_file->GetLinkedSstFiles(), expected_sst_files[i]);
ASSERT_EQ(blob_file->Obsolete(), expected_obsolete[i]);
}
auto live_imm_files = blob_db_impl()->TEST_GetLiveImmNonTTLFiles();
ASSERT_EQ(live_imm_files.size(), 4);
for (size_t i = 0; i < 4; ++i) {
ASSERT_EQ(live_imm_files[i]->BlobFileNumber(), i + 2);
}
auto obsolete_files = blob_db_impl()->TEST_GetObsoleteFiles();
ASSERT_EQ(obsolete_files.size(), 1);
ASSERT_EQ(obsolete_files[0]->BlobFileNumber(), 1);
}
// Simulate another compaction. Blob file 2 loses all its linked SSTs
// but since it got marked immutable at sequence number 300 which hasn't
// been flushed yet, it cannot be marked obsolete at this point.


@ -54,7 +54,7 @@ BlobFile::~BlobFile() {
}
}
uint32_t BlobFile::column_family_id() const { return column_family_id_; }
uint32_t BlobFile::GetColumnFamilyId() const { return column_family_id_; }
std::string BlobFile::PathName() const {
return BlobFileName(path_to_dir_, file_number_);


@ -116,7 +116,7 @@ class BlobFile {
~BlobFile();
uint32_t column_family_id() const;
uint32_t GetColumnFamilyId() const;
// Returns log file's absolute pathname.
std::string PathName() const;
@ -204,7 +204,7 @@ class BlobFile {
void SetHasTTL(bool has_ttl) { has_ttl_ = has_ttl; }
CompressionType compression() const { return compression_; }
CompressionType GetCompressionType() const { return compression_; }
std::shared_ptr<Writer> GetWriter() const { return log_writer_; }


@ -2823,6 +2823,104 @@ TEST_P(TransactionTest, MultiGetBatchedTest) {
}
}
// This test calls WriteBatchWithIndex::MultiGetFromBatchAndDB with a large
// number of keys, i.e. greater than MultiGetContext::MAX_BATCH_SIZE, which
// is 32. This forces autovector allocations in the MultiGet code paths
// to use std::vector in addition to stack allocations. The MultiGet keys
// include Merges, which are handled specially in MultiGetFromBatchAndDB by
// allocating an autovector of MergeContexts.
TEST_P(TransactionTest, MultiGetLargeBatchedTest) {
WriteOptions write_options;
ReadOptions read_options, snapshot_read_options;
string value;
Status s;
ColumnFamilyHandle* cf;
ColumnFamilyOptions cf_options;
std::vector<std::string> key_str;
for (int i = 0; i < 100; ++i) {
key_str.emplace_back(std::to_string(i));
}
// Create a new column family
s = db->CreateColumnFamily(cf_options, "CF", &cf);
ASSERT_OK(s);
delete cf;
delete db;
db = nullptr;
// open DB with two column families (default plus the new one)
std::vector<ColumnFamilyDescriptor> column_families;
// have to open default column family
column_families.push_back(
ColumnFamilyDescriptor(kDefaultColumnFamilyName, ColumnFamilyOptions()));
// open the new column family
cf_options.merge_operator = MergeOperators::CreateStringAppendOperator();
column_families.push_back(ColumnFamilyDescriptor("CF", cf_options));
std::vector<ColumnFamilyHandle*> handles;
options.merge_operator = MergeOperators::CreateStringAppendOperator();
ASSERT_OK(ReOpenNoDelete(column_families, &handles));
assert(db != nullptr);
// Write some data to the db
WriteBatch batch;
for (int i = 0; i < 3 * MultiGetContext::MAX_BATCH_SIZE; ++i) {
std::string val = "val" + std::to_string(i);
batch.Put(handles[1], key_str[i], val);
}
s = db->Write(write_options, &batch);
ASSERT_OK(s);
WriteBatchWithIndex wb;
// Write some updates to the batch
s = wb.Delete(handles[1], std::to_string(1));
ASSERT_OK(s);
s = wb.Put(handles[1], std::to_string(2), "new_val" + std::to_string(2));
ASSERT_OK(s);
// Write a lot of merges so when we call MultiGetFromBatchAndDB later on,
// it is forced to use std::vector in rocksdb::autovector to allocate
// MergeContexts. The number of merges needs to be >
// MultiGetContext::MAX_BATCH_SIZE
for (int i = 8; i < MultiGetContext::MAX_BATCH_SIZE + 24; ++i) {
s = wb.Merge(handles[1], std::to_string(i), "merge");
ASSERT_OK(s);
}
// MultiGet a lot of keys in order to force std::vector reallocations
std::vector<Slice> keys;
for (int i = 0; i < MultiGetContext::MAX_BATCH_SIZE + 32; ++i) {
keys.emplace_back(key_str[i]);
}
std::vector<PinnableSlice> values(keys.size());
std::vector<Status> statuses(keys.size());
wb.MultiGetFromBatchAndDB(db, snapshot_read_options, handles[1], keys.size(), keys.data(),
values.data(), statuses.data(), false);
for (size_t i = 0; i < keys.size(); ++i) {
if (i == 1) {
ASSERT_TRUE(statuses[1].IsNotFound());
} else if (i == 2) {
ASSERT_TRUE(statuses[2].ok());
ASSERT_EQ(values[2], "new_val" + std::to_string(2));
} else if (i >= 8 && i < 56) {
ASSERT_TRUE(statuses[i].ok());
ASSERT_EQ(values[i], "val" + std::to_string(i) + ",merge");
} else {
ASSERT_TRUE(statuses[i].ok());
if (values[i] != "val" + std::to_string(i)) {
ASSERT_EQ(values[i], "val" + std::to_string(i));
}
}
}
for (auto handle : handles) {
delete handle;
}
}
TEST_P(TransactionTest, ColumnFamiliesTest2) {
WriteOptions write_options;
ReadOptions read_options, snapshot_read_options;


@ -1004,10 +1004,13 @@ void WriteBatchWithIndex::MultiGetFromBatchAndDB(
assert(result == WriteBatchWithIndexInternal::Result::kMergeInProgress ||
result == WriteBatchWithIndexInternal::Result::kNotFound);
key_context.emplace_back(column_family, keys[i], &values[i], &statuses[i]);
sorted_keys.emplace_back(&key_context.back());
merges.emplace_back(result, std::move(merge_context));
}
for (KeyContext& key : key_context) {
sorted_keys.emplace_back(&key);
}
// Did not find key in batch OR could not resolve Merges. Try DB.
static_cast_with_check<DBImpl, DB>(db->GetRootDB())
->PrepareMultiGetKeys(key_context.size(), sorted_input, &sorted_keys);
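The reordering above defers taking element addresses until key_context has stopped growing; otherwise emplace_back could reallocate and leave sorted_keys holding dangling pointers. A minimal illustration with std::vector (the same hazard applies to autovector once it spills to the heap):

#include <vector>

void PopulateThenTakeAddresses() {
  std::vector<int> values;
  std::vector<int*> pointers;
  // Unsafe: taking &values.back() inside the same loop that grows `values`
  // leaves earlier pointers dangling whenever push_back reallocates.
  // Safe (the pattern used above): populate first, take addresses afterwards.
  for (int i = 0; i < 100; ++i) {
    values.push_back(i);
  }
  for (int& v : values) {
    pointers.push_back(&v);
  }
}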