Compare commits

...

20 Commits
main ... 6.7.fb

Author SHA1 Message Date
Peter Dillinger
9e3088a66b HISTORY.md update for bzip upgrade (#6767)
Summary:
See https://github.com/facebook/rocksdb/issues/6714 and https://github.com/facebook/rocksdb/issues/6703
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6767

Reviewed By: riversand963

Differential Revision: D21283307

Pulled By: pdillinger

fbshipit-source-id: 8463bec725669d13846c728ad4b5bde43f9a84f8
2020-04-28 13:48:32 -07:00
Adam Retter
f6d4b8a5a0 Update RocksJava static version of bzip2 (#6714)
Summary:
Updates the version of bzip2 used for RocksJava static builds.

Please, can we also get this cherry-picked to:

1. 6.7.fb
2. 6.8.fb
3. 6.9.fb
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6714

Reviewed By: cheng-chang

Differential Revision: D21067233

Pulled By: pdillinger

fbshipit-source-id: 8164b7eb99c5ca7b2021ab8c371ba9ded4cb4f7e
2020-04-16 15:51:14 -07:00
Peter Dillinger
eef6001ea2 Use an Amazon S3 bucket for downloading deps (#6526)
Summary:
After we had a lot of failures with maven.org downloads, we
wanted an alternative location for downloading binary dependencies.
Hosting them through github would have been good in terms of
organizational and network dependencies, but that approach seems to be
awkward (fake releases, so would need a 'rocksdb-deps' repo) and
strangely complicated for Facebook policy on open source repositories.

This commit moves the downloads (that are not officially hosted by
others on github) from my personal rocksdb fork to an S3 bucket owned
by the Facebook RocksDB AWS account. Facebook employees can access
this through an internal tool, and we should be able to grant permission
to outside collaborators.

Assuming this works out, I will back-port to older branches to stabilize
their CI testing as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6526

Test Plan: CI

Differential Revision: D20430130

Pulled By: pdillinger

fbshipit-source-id: df52394a65e0a57942db3039bdaade8a4d520cb2
2020-04-16 15:50:33 -07:00
sdong
0915c99f01 Bump up version to 6.7.3 2020-03-18 18:51:16 -07:00
sdong
4f48a2af7b cmake: add option WITH_CORE_TOOLS to exclude tools except ldb and sst_dump (#6506)
Summary:
ldb and sst_dump are most important tools and they don't dependend on gflags. In cmake, we don't have an way to only build these two tools and exclude other tools. This is inconvenient if the environment has a problem with gflags. Add such an option WITH_CORE_TOOLS.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6506

Test Plan: cmake and build with WITH_TOOLS and without.

Differential Revision: D20473029

fbshipit-source-id: 3d730fd14bbae6eeeae7f9cc9aec50a4e488ad72
2020-03-18 13:34:03 -07:00
Adam Retter
aaabdc85d9
Update to latest Snappy to fix compilation issue on latest MacOS XCode (#6498)
* Update to latest Snappy to fix compilation issue on latest MacOS XCode

* Fix build issues on MSVC with db_stress_tool types
2020-03-10 10:16:37 -07:00
sdong
5929273ea6 Fix data race of GetCreationTimeOfOldestFile() (#6473)
Summary:
When DBImpl::GetCreationTimeOfOldestFile() calls Version::GetCreationTimeOfOldestFile(), the version is not directly or indirectly referenced, so an event like compaction can race with the operation and cause DBImpl::GetCreationTimeOfOldestFile() to access delocated data. This was caught by an ASAN run:

==268==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000b7d198 at pc 0x000018332913 bp 0x7f391510d310 sp 0x7f391510d308
READ of size 8 at 0x612000b7d198 thread T845 (store_load-33)
SCARINESS: 51 (8-byte-read-heap-use-after-free)
    #0 0x18332912 in rocksdb::Version::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/version_set.cc:1488
    https://github.com/facebook/rocksdb/issues/1 0x1803ddaa in rocksdb::DBImpl::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/src/db/db_impl/db_impl.cc:4499
    https://github.com/facebook/rocksdb/issues/2 0xe24ca09 in rocksdb::StackableDB::GetCreationTimeOfOldestFile(unsigned long*) rocksdb/utilities/stackable_db.h:392
    ......
0x612000b7d198 is located 216 bytes inside of 296-byte region [0x612000b7d0c0,0x612000b7d1e8)
freed by thread T28 here:
    ......
    https://github.com/facebook/rocksdb/issues/5 0x1832c73f in std::vector<rocksdb::FileMetaData*, std::allocator<rocksdb::FileMetaData*> >::~vector() third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/stl_vector.h:435
    https://github.com/facebook/rocksdb/issues/6 0x1832c73f in rocksdb::VersionStorageInfo::~VersionStorageInfo() rocksdb/src/db/version_set.cc:734
    https://github.com/facebook/rocksdb/issues/7 0x1832cf42 in rocksdb::Version::~Version() rocksdb/src/db/version_set.cc:758
    https://github.com/facebook/rocksdb/issues/8 0x9d1bb5 in rocksdb::Version::Unref() rocksdb/src/db/version_set.cc:2869
    https://github.com/facebook/rocksdb/issues/9 0x183e7631 in rocksdb::Compaction::~Compaction() rocksdb/src/db/compaction/compaction.cc:275
    https://github.com/facebook/rocksdb/issues/10 0x9e6de6 in std::default_delete<rocksdb::Compaction>::operator()(rocksdb::Compaction*) const third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:78
    https://github.com/facebook/rocksdb/issues/11 0x9e6de6 in std::unique_ptr<rocksdb::Compaction, std::default_delete<rocksdb::Compaction> >::reset(rocksdb::Compaction*) third-party-buck/platform007/build/libgcc/include/c++/trunk/bits/unique_ptr.h:376
    https://github.com/facebook/rocksdb/issues/12 0x9e6de6 in rocksdb::DBImpl::BackgroundCompaction(bool*, rocksdb::JobContext*, rocksdb::LogBuffer*, rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2826
    https://github.com/facebook/rocksdb/issues/13 0x9ac3b8 in rocksdb::DBImpl::BackgroundCallCompaction(rocksdb::DBImpl::PrepickedCompaction*, rocksdb::Env::Priority) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2320
    https://github.com/facebook/rocksdb/issues/14 0x9abff7 in rocksdb::DBImpl::BGWorkCompaction(void*) rocksdb/src/db/db_impl/db_impl_compaction_flush.cc:2096
    ......

Fix the issue by reference the super version and use the referenced version from it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6473

Test Plan: Run ASAN for all existing tests.

Differential Revision: D20196416

fbshipit-source-id: 5f4a7918110fc7b8dd7841932d376bc9d1e59d6f
2020-03-03 11:23:01 -08:00
sdong
b0864639e8 Fix namespace regression 2020-02-25 13:24:25 -08:00
sdong
59d47e8c3e Bump up version to 6.7.2 2020-02-24 18:40:00 -08:00
sdong
b88bb93514 Buck config: Re-enable liburing under Linux (#6451)
Summary:
The known bug of liburing has been fixed. Now we can re-enable liburing under Linux
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6451

Test Plan: Watch internal CI

Differential Revision: D20079009

fbshipit-source-id: 04a6f53a900ff721f9a62a188cf906771b5d68d2
2020-02-24 18:25:13 -08:00
Peter Dillinger
944346d0fa Don't download from (unreliable) maven.org (#6348)
Summary:
I set up a mirror of our Java deps on github so we can download
them through github URLs rather than maven.org, which is proving
terribly unreliable from Travis builds.

Also sanitized calls to curl, so they are easier to read and
appropriately fail on download failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348

Test Plan: CI

Differential Revision: D19633621

Pulled By: pdillinger

fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3
2020-02-24 15:11:44 -08:00
Adam Retter
02fd7ca92a Reduce the need to re-download dependencies (#6318)
Summary:
Both changes are related to RocksJava:

1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds.

2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318

Differential Revision: D19608742

Pulled By: pdillinger

fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7
2020-02-24 15:11:36 -08:00
sdong
0eab26e482 Handle io_uring partial results (#6441)
Summary:
The logic that handles io_uring partial results was wrong. Fix the logic by putting it into a queue and continue reading.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6441

Test Plan: Make sure this patch fixes the application test case where the bug was discovered; in env_test, add a unit test that simulates partial results and make sure the results are still correct.

Differential Revision: D20018616

fbshipit-source-id: 5398a7e34d74c26d52aa69dfd604e93e95d99c62
2020-02-24 15:10:28 -08:00
anand76
25dce8fd7c Update version to 6.7.1 2020-02-13 11:59:07 -08:00
anand76
10c141a3b7 Ensure all MultiGet IO errors are propagated to user (#6403)
Summary:
Unrevert the previous fix to propagate error status, and an additional fix to not treat a memtable lookup MergeInProgress status as an error.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6403

Test Plan:
Unit tests
Tried running stress tests but couldn't repro the stress failure

Differential Revision: D19846721

Pulled By: anand1976

fbshipit-source-id: 7db10cccbdc863d9b559497f0a46b608d2488ca4
2020-02-13 11:55:42 -08:00
sdong
2c62c227ae By default turn IO Uring off. (#6405)
Summary:
We realized bugs related to IO Uring. Turn it off by default.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6405

Test Plan: Manually run build_tools/build_detect_platform and observe outputs.

Differential Revision: D19862792

fbshipit-source-id: 5d5e8e2762997b72a145ae59389ef3d7e4ccd060
2020-02-12 18:27:20 -08:00
anand76
0bc8750e82 Force a new manifest file if append to current one fails (#6331)
Summary:
Fix for issue https://github.com/facebook/rocksdb/issues/6316

When an append/sync of the manifest file fails due to an IO error such
as NoSpace, we don't always put the DB in read-only mode. This is true
for flush and compactions, as well as foreground operatons such as column family
add/drop, CompactFiles etc. Subsequent changes to the DB will be
recorded in the same manifest file, which would have a corrupted record
in the middle due to the previous failure. On next DB::Open(), it will
fail to process the full manifest and data will be lost.

To fix this, we reset VersionSet::descriptor_log_ on append/sync
failure, which will force a new manifest file to be written on the next
append.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6331

Test Plan: Add new unit tests in error_handler_test.cc

Differential Revision: D19632951

Pulled By: anand1976

fbshipit-source-id: 68d527cb6e59a94cbbbf9f5a17a7f464381d51e3
2020-01-31 16:26:22 -08:00
sdong
f6b3de76e5 Revert "Fix kHashSearch bug with SeekForPrev (#6297)"
This reverts commit d2b4d42d4b.
2020-01-27 14:58:30 -08:00
sdong
974dfc3de6 Revert "Fix a bug caused by recent fix of Prefix Hash (#6302)"
This reverts commit f8b5ef85ec.
2020-01-27 14:56:41 -08:00
sdong
cad5db1c3e Revert "Fix another bug caused by recent hash index fix (#6305)"
This reverts commit d87cffaea4.
2020-01-27 14:56:30 -08:00
31 changed files with 602 additions and 322 deletions

View File

@ -81,7 +81,9 @@ install:
sudo apt-get install -y mingw-w64 ; sudo apt-get install -y mingw-w64 ;
fi fi
- if [[ "${JOB_NAME}" == cmake* ]] && [ "${TRAVIS_OS_NAME}" == linux ]; then - if [[ "${JOB_NAME}" == cmake* ]] && [ "${TRAVIS_OS_NAME}" == linux ]; then
mkdir cmake-dist && curl -sfSL https://github.com/Kitware/CMake/releases/download/v3.14.5/cmake-3.14.5-Linux-x86_64.tar.gz | tar --strip-components=1 -C cmake-dist -xz && export PATH=$PWD/cmake-dist/bin:$PATH; CMAKE_DIST_URL="https://rocksdb-deps.s3-us-west-2.amazonaws.com/cmake/cmake-3.14.5-Linux-$(uname -m).tar.bz2";
TAR_OPT="--strip-components=1 -xj";
mkdir cmake-dist && curl --silent --fail --show-error --location "${CMAKE_DIST_URL}" | tar -C cmake-dist ${TAR_OPT} && export PATH=$PWD/cmake-dist/bin:$PATH;
fi fi
- if [[ "${JOB_NAME}" == java_test ]]; then - if [[ "${JOB_NAME}" == java_test ]]; then
java -version && echo "JAVA_HOME=${JAVA_HOME}"; java -version && echo "JAVA_HOME=${JAVA_HOME}";

View File

@ -1166,9 +1166,15 @@ if(WITH_BENCHMARK_TOOLS)
${ROCKSDB_LIB}) ${ROCKSDB_LIB})
endif() endif()
option(WITH_CORE_TOOLS "build with ldb and sst_dump" ON)
option(WITH_TOOLS "build with tools" ON) option(WITH_TOOLS "build with tools" ON)
if(WITH_TOOLS) if(WITH_CORE_TOOLS OR WITH_TOOLS)
add_subdirectory(tools) add_subdirectory(tools)
add_custom_target(core_tools
DEPENDS ${core_tool_deps})
endif()
if(WITH_TOOLS)
add_subdirectory(db_stress_tool) add_subdirectory(db_stress_tool)
add_custom_target(tools add_custom_target(tools
DEPENDS ${tool_deps}) DEPENDS ${tool_deps})

View File

@ -1,5 +1,21 @@
# Rocksdb Change Log # Rocksdb Change Log
## Unreleased ## Unreleased
### Bug Fixes
* Upgraded version of bzip library (1.0.6 -> 1.0.8) used with RocksJava to address potential vulnerabilities if an attacker can manipulate compressed data saved and loaded by RocksDB (not normal). See issue #6703.
## 6.7.3 (03/18/2020)
### Bug Fixes
* Fix a data race that might cause crash when calling DB::GetCreationTimeOfOldestFile() by a small chance. The bug was introduced in 6.6 Release.
## 6.7.2 (02/24/2020)
### Bug Fixes
* Fixed a bug of IO Uring partial result handling introduced in 6.7.0.
## 6.7.1 (02/13/2020)
### Bug Fixes
* Fixed issue #6316 that can cause a corruption of the MANIFEST file in the middle when writing to it fails due to no disk space.
* Batched MultiGet() ignores IO errors while reading data blocks, causing it to potentially continue looking for a key and returning stale results.
## 6.7.0 (01/21/2020) ## 6.7.0 (01/21/2020)
### Public API Change ### Public API Change
@ -14,7 +30,6 @@
* Fix a race condition for cfd->log_number_ between manifest switch and memtable switch (PR 6249) when number of column families is greater than 1. * Fix a race condition for cfd->log_number_ between manifest switch and memtable switch (PR 6249) when number of column families is greater than 1.
* Fix a bug on fractional cascading index when multiple files at the same level contain the same smallest user key, and those user keys are for merge operands. In this case, Get() the exact key may miss some merge operands. * Fix a bug on fractional cascading index when multiple files at the same level contain the same smallest user key, and those user keys are for merge operands. In this case, Get() the exact key may miss some merge operands.
* Delcare kHashSearch index type feature-incompatible with index_block_restart_interval larger than 1. * Delcare kHashSearch index type feature-incompatible with index_block_restart_interval larger than 1.
* Fix incorrect results while block-based table uses kHashSearch, together with Prev()/SeekForPrev().
* Fixed an issue where the thread pools were not resized upon setting `max_background_jobs` dynamically through the `SetDBOptions` interface. * Fixed an issue where the thread pools were not resized upon setting `max_background_jobs` dynamically through the `SetDBOptions` interface.
* Fix a bug that can cause write threads to hang when a slowdown/stall happens and there is a mix of writers with WriteOptions::no_slowdown set/unset. * Fix a bug that can cause write threads to hang when a slowdown/stall happens and there is a mix of writers with WriteOptions::no_slowdown set/unset.
* Fixed an issue where an incorrect "number of input records" value was used to compute the "records dropped" statistics for compactions. * Fixed an issue where an incorrect "number of input records" value was used to compute the "records dropped" statistics for compactions.
@ -23,6 +38,7 @@
* It is now possible to enable periodic compactions for the base DB when using BlobDB. * It is now possible to enable periodic compactions for the base DB when using BlobDB.
* BlobDB now garbage collects non-TTL blobs when `enable_garbage_collection` is set to `true` in `BlobDBOptions`. Garbage collection is performed during compaction: any valid blobs located in the oldest N files (where N is the number of non-TTL blob files multiplied by the value of `BlobDBOptions::garbage_collection_cutoff`) encountered during compaction get relocated to new blob files, and old blob files are dropped once they are no longer needed. Note: we recommend enabling periodic compactions for the base DB when using this feature to deal with the case when some old blob files are kept alive by SSTs that otherwise do not get picked for compaction. * BlobDB now garbage collects non-TTL blobs when `enable_garbage_collection` is set to `true` in `BlobDBOptions`. Garbage collection is performed during compaction: any valid blobs located in the oldest N files (where N is the number of non-TTL blob files multiplied by the value of `BlobDBOptions::garbage_collection_cutoff`) encountered during compaction get relocated to new blob files, and old blob files are dropped once they are no longer needed. Note: we recommend enabling periodic compactions for the base DB when using this feature to deal with the case when some old blob files are kept alive by SSTs that otherwise do not get picked for compaction.
* `db_bench` now supports the `garbage_collection_cutoff` option for BlobDB. * `db_bench` now supports the `garbage_collection_cutoff` option for BlobDB.
* MultiGet() can use IO Uring to parallelize read from the same SST file. This featuer is by default disabled. It can be enabled with environment variable ROCKSDB_USE_IO_URING.
## 6.6.2 (01/13/2020) ## 6.6.2 (01/13/2020)
### Bug Fixes ### Bug Fixes

View File

@ -1128,16 +1128,21 @@ unity_test: db/db_test.o db/db_test_util.o $(TESTHARNESS) $(TOOLLIBOBJECTS) unit
rocksdb.h rocksdb.cc: build_tools/amalgamate.py Makefile $(LIB_SOURCES) unity.cc rocksdb.h rocksdb.cc: build_tools/amalgamate.py Makefile $(LIB_SOURCES) unity.cc
build_tools/amalgamate.py -I. -i./include unity.cc -x include/rocksdb/c.h -H rocksdb.h -o rocksdb.cc build_tools/amalgamate.py -I. -i./include unity.cc -x include/rocksdb/c.h -H rocksdb.h -o rocksdb.cc
clean: clean-ext-libraries-all clean-rocks clean: clean-ext-libraries-all clean-rocks clean-rocksjava
clean-not-downloaded: clean-ext-libraries-bin clean-rocks clean-not-downloaded: clean-ext-libraries-bin clean-rocks clean-not-downloaded-rocksjava
clean-rocks: clean-rocks:
rm -f $(BENCHMARKS) $(TOOLS) $(TESTS) $(PARALLEL_TEST) $(LIBRARY) $(SHARED) rm -f $(BENCHMARKS) $(TOOLS) $(TESTS) $(PARALLEL_TEST) $(LIBRARY) $(SHARED)
rm -rf $(CLEAN_FILES) ios-x86 ios-arm scan_build_report rm -rf $(CLEAN_FILES) ios-x86 ios-arm scan_build_report
$(FIND) . -name "*.[oda]" -exec rm -f {} \; $(FIND) . -name "*.[oda]" -exec rm -f {} \;
$(FIND) . -type f -regex ".*\.\(\(gcda\)\|\(gcno\)\)" -exec rm {} \; $(FIND) . -type f -regex ".*\.\(\(gcda\)\|\(gcno\)\)" -exec rm {} \;
cd java; $(MAKE) clean
clean-rocksjava:
cd java && $(MAKE) clean
clean-not-downloaded-rocksjava:
cd java && $(MAKE) clean-not-downloaded
clean-ext-libraries-all: clean-ext-libraries-all:
rm -rf bzip2* snappy* zlib* lz4* zstd* rm -rf bzip2* snappy* zlib* lz4* zstd*
@ -1784,11 +1789,11 @@ SHA256_CMD = sha256sum
ZLIB_VER ?= 1.2.11 ZLIB_VER ?= 1.2.11
ZLIB_SHA256 ?= c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1 ZLIB_SHA256 ?= c3e5e9fdd5004dcb542feda5ee4f0ff0744628baf8ed2dd5d66f8ca1197cb1a1
ZLIB_DOWNLOAD_BASE ?= http://zlib.net ZLIB_DOWNLOAD_BASE ?= http://zlib.net
BZIP2_VER ?= 1.0.6 BZIP2_VER ?= 1.0.8
BZIP2_SHA256 ?= a2848f34fcd5d6cf47def00461fcb528a0484d8edef8208d6d2e2909dc61d9cd BZIP2_SHA256 ?= ab5a03176ee106d3f0fa90e381da478ddae405918153cca248e682cd0c4a2269
BZIP2_DOWNLOAD_BASE ?= https://downloads.sourceforge.net/project/bzip2 BZIP2_DOWNLOAD_BASE ?= https://sourceware.org/pub/bzip2
SNAPPY_VER ?= 1.1.7 SNAPPY_VER ?= 1.1.8
SNAPPY_SHA256 ?= 3dfa02e873ff51a11ee02b9ca391807f0c8ea0529a4924afa645fbf97163f9d4 SNAPPY_SHA256 ?= 16b677f07832a612b0836178db7f374e414f94657c138e6993cbfc5dcc58651f
SNAPPY_DOWNLOAD_BASE ?= https://github.com/google/snappy/archive SNAPPY_DOWNLOAD_BASE ?= https://github.com/google/snappy/archive
LZ4_VER ?= 1.9.2 LZ4_VER ?= 1.9.2
LZ4_SHA256 ?= 658ba6191fa44c92280d4aa2c271b0f4fbc0e34d249578dd05e50e76d0e5efcc LZ4_SHA256 ?= 658ba6191fa44c92280d4aa2c271b0f4fbc0e34d249578dd05e50e76d0e5efcc
@ -1834,7 +1839,7 @@ endif
libz.a: libz.a:
-rm -rf zlib-$(ZLIB_VER) -rm -rf zlib-$(ZLIB_VER)
ifeq (,$(wildcard ./zlib-$(ZLIB_VER).tar.gz)) ifeq (,$(wildcard ./zlib-$(ZLIB_VER).tar.gz))
curl --output zlib-$(ZLIB_VER).tar.gz -L ${ZLIB_DOWNLOAD_BASE}/zlib-$(ZLIB_VER).tar.gz curl --fail --output zlib-$(ZLIB_VER).tar.gz --location ${ZLIB_DOWNLOAD_BASE}/zlib-$(ZLIB_VER).tar.gz
endif endif
ZLIB_SHA256_ACTUAL=`$(SHA256_CMD) zlib-$(ZLIB_VER).tar.gz | cut -d ' ' -f 1`; \ ZLIB_SHA256_ACTUAL=`$(SHA256_CMD) zlib-$(ZLIB_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(ZLIB_SHA256)" != "$$ZLIB_SHA256_ACTUAL" ]; then \ if [ "$(ZLIB_SHA256)" != "$$ZLIB_SHA256_ACTUAL" ]; then \
@ -1848,7 +1853,7 @@ endif
libbz2.a: libbz2.a:
-rm -rf bzip2-$(BZIP2_VER) -rm -rf bzip2-$(BZIP2_VER)
ifeq (,$(wildcard ./bzip2-$(BZIP2_VER).tar.gz)) ifeq (,$(wildcard ./bzip2-$(BZIP2_VER).tar.gz))
curl --output bzip2-$(BZIP2_VER).tar.gz -L ${CURL_SSL_OPTS} ${BZIP2_DOWNLOAD_BASE}/bzip2-$(BZIP2_VER).tar.gz curl --fail --output bzip2-$(BZIP2_VER).tar.gz --location ${CURL_SSL_OPTS} ${BZIP2_DOWNLOAD_BASE}/bzip2-$(BZIP2_VER).tar.gz
endif endif
BZIP2_SHA256_ACTUAL=`$(SHA256_CMD) bzip2-$(BZIP2_VER).tar.gz | cut -d ' ' -f 1`; \ BZIP2_SHA256_ACTUAL=`$(SHA256_CMD) bzip2-$(BZIP2_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(BZIP2_SHA256)" != "$$BZIP2_SHA256_ACTUAL" ]; then \ if [ "$(BZIP2_SHA256)" != "$$BZIP2_SHA256_ACTUAL" ]; then \
@ -1862,7 +1867,7 @@ endif
libsnappy.a: libsnappy.a:
-rm -rf snappy-$(SNAPPY_VER) -rm -rf snappy-$(SNAPPY_VER)
ifeq (,$(wildcard ./snappy-$(SNAPPY_VER).tar.gz)) ifeq (,$(wildcard ./snappy-$(SNAPPY_VER).tar.gz))
curl --output snappy-$(SNAPPY_VER).tar.gz -L ${CURL_SSL_OPTS} ${SNAPPY_DOWNLOAD_BASE}/$(SNAPPY_VER).tar.gz curl --fail --output snappy-$(SNAPPY_VER).tar.gz --location ${CURL_SSL_OPTS} ${SNAPPY_DOWNLOAD_BASE}/$(SNAPPY_VER).tar.gz
endif endif
SNAPPY_SHA256_ACTUAL=`$(SHA256_CMD) snappy-$(SNAPPY_VER).tar.gz | cut -d ' ' -f 1`; \ SNAPPY_SHA256_ACTUAL=`$(SHA256_CMD) snappy-$(SNAPPY_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(SNAPPY_SHA256)" != "$$SNAPPY_SHA256_ACTUAL" ]; then \ if [ "$(SNAPPY_SHA256)" != "$$SNAPPY_SHA256_ACTUAL" ]; then \
@ -1877,7 +1882,7 @@ endif
liblz4.a: liblz4.a:
-rm -rf lz4-$(LZ4_VER) -rm -rf lz4-$(LZ4_VER)
ifeq (,$(wildcard ./lz4-$(LZ4_VER).tar.gz)) ifeq (,$(wildcard ./lz4-$(LZ4_VER).tar.gz))
curl --output lz4-$(LZ4_VER).tar.gz -L ${CURL_SSL_OPTS} ${LZ4_DOWNLOAD_BASE}/v$(LZ4_VER).tar.gz curl --fail --output lz4-$(LZ4_VER).tar.gz --location ${CURL_SSL_OPTS} ${LZ4_DOWNLOAD_BASE}/v$(LZ4_VER).tar.gz
endif endif
LZ4_SHA256_ACTUAL=`$(SHA256_CMD) lz4-$(LZ4_VER).tar.gz | cut -d ' ' -f 1`; \ LZ4_SHA256_ACTUAL=`$(SHA256_CMD) lz4-$(LZ4_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(LZ4_SHA256)" != "$$LZ4_SHA256_ACTUAL" ]; then \ if [ "$(LZ4_SHA256)" != "$$LZ4_SHA256_ACTUAL" ]; then \
@ -1891,7 +1896,7 @@ endif
libzstd.a: libzstd.a:
-rm -rf zstd-$(ZSTD_VER) -rm -rf zstd-$(ZSTD_VER)
ifeq (,$(wildcard ./zstd-$(ZSTD_VER).tar.gz)) ifeq (,$(wildcard ./zstd-$(ZSTD_VER).tar.gz))
curl --output zstd-$(ZSTD_VER).tar.gz -L ${CURL_SSL_OPTS} ${ZSTD_DOWNLOAD_BASE}/v$(ZSTD_VER).tar.gz curl --fail --output zstd-$(ZSTD_VER).tar.gz --location ${CURL_SSL_OPTS} ${ZSTD_DOWNLOAD_BASE}/v$(ZSTD_VER).tar.gz
endif endif
ZSTD_SHA256_ACTUAL=`$(SHA256_CMD) zstd-$(ZSTD_VER).tar.gz | cut -d ' ' -f 1`; \ ZSTD_SHA256_ACTUAL=`$(SHA256_CMD) zstd-$(ZSTD_VER).tar.gz | cut -d ' ' -f 1`; \
if [ "$(ZSTD_SHA256)" != "$$ZSTD_SHA256_ACTUAL" ]; then \ if [ "$(ZSTD_SHA256)" != "$$ZSTD_SHA256_ACTUAL" ]; then \
@ -1961,35 +1966,35 @@ rocksdbjavastaticreleasedocker: rocksdbjavastatic rocksdbjavastaticdockerx86 roc
rocksdbjavastaticdockerx86: rocksdbjavastaticdockerx86:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_x86-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_x86-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerx86_64: rocksdbjavastaticdockerx86_64:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_x64-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_x64-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos6_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerppc64le: rocksdbjavastaticdockerppc64le:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_ppc64le-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_ppc64le-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerarm64v8: rocksdbjavastaticdockerarm64v8:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_arm64v8-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_arm64v8-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:centos7_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerx86musl: rocksdbjavastaticdockerx86musl:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_x86-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_x86-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x86-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerx86_64musl: rocksdbjavastaticdockerx86_64musl:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_x64-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_x64-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_x64-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerppc64lemusl: rocksdbjavastaticdockerppc64lemusl:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_ppc64le-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_ppc64le-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_ppc64le-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticdockerarm64v8musl: rocksdbjavastaticdockerarm64v8musl:
mkdir -p java/target mkdir -p java/target
docker run --rm --name rocksdb_linux_arm64v8-musl-be --attach stdin --attach stdout --attach stderr --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh docker run --rm --name rocksdb_linux_arm64v8-musl-be --attach stdin --attach stdout --attach stderr --volume $(HOME)/.m2:/root/.m2:ro --volume `pwd`:/rocksdb-host:ro --volume /rocksdb-local-build --volume `pwd`/java/target:/rocksdb-java-target --env DEBUG_LEVEL=$(DEBUG_LEVEL) evolvedbinary/rocksjava:alpine3_arm64v8-be /rocksdb-host/java/crossbuild/docker-build-linux-centos.sh
rocksdbjavastaticpublish: rocksdbjavastaticrelease rocksdbjavastaticpublishcentral rocksdbjavastaticpublish: rocksdbjavastaticrelease rocksdbjavastaticpublishcentral

View File

@ -49,6 +49,7 @@ ROCKSDB_OS_PREPROCESSOR_FLAGS = [
"-DROCKSDB_SCHED_GETCPU_PRESENT", "-DROCKSDB_SCHED_GETCPU_PRESENT",
"-DROCKSDB_IOURING_PRESENT", "-DROCKSDB_IOURING_PRESENT",
"-DHAVE_SSE42", "-DHAVE_SSE42",
"-DLIBURING",
"-DNUMA", "-DNUMA",
], ],
), ),

View File

@ -29,7 +29,7 @@ install:
- md %THIRDPARTY_HOME% - md %THIRDPARTY_HOME%
- echo "Building Snappy dependency..." - echo "Building Snappy dependency..."
- cd %THIRDPARTY_HOME% - cd %THIRDPARTY_HOME%
- curl -fsSL -o snappy-1.1.7.zip https://github.com/google/snappy/archive/1.1.7.zip - curl --fail --silent --show-error --output snappy-1.1.7.zip --location https://github.com/google/snappy/archive/1.1.7.zip
- unzip snappy-1.1.7.zip - unzip snappy-1.1.7.zip
- cd snappy-1.1.7 - cd snappy-1.1.7
- mkdir build - mkdir build
@ -39,7 +39,7 @@ install:
- msbuild Snappy.sln /p:Configuration=Release /p:Platform=x64 - msbuild Snappy.sln /p:Configuration=Release /p:Platform=x64
- echo "Building LZ4 dependency..." - echo "Building LZ4 dependency..."
- cd %THIRDPARTY_HOME% - cd %THIRDPARTY_HOME%
- curl -fsSL -o lz4-1.8.3.zip https://github.com/lz4/lz4/archive/v1.8.3.zip - curl --fail --silent --show-error --output lz4-1.8.3.zip --location https://github.com/lz4/lz4/archive/v1.8.3.zip
- unzip lz4-1.8.3.zip - unzip lz4-1.8.3.zip
- cd lz4-1.8.3\visual\VS2010 - cd lz4-1.8.3\visual\VS2010
- ps: $CMD="$Env:DEV_ENV"; & $CMD lz4.sln /upgrade - ps: $CMD="$Env:DEV_ENV"; & $CMD lz4.sln /upgrade
@ -47,7 +47,7 @@ install:
- msbuild lz4.sln /p:Configuration=Release /p:Platform=x64 - msbuild lz4.sln /p:Configuration=Release /p:Platform=x64
- echo "Building ZStd dependency..." - echo "Building ZStd dependency..."
- cd %THIRDPARTY_HOME% - cd %THIRDPARTY_HOME%
- curl -fsSL -o zstd-1.4.0.zip https://github.com/facebook/zstd/archive/v1.4.0.zip - curl --fail --silent --show-error --output zstd-1.4.0.zip --location https://github.com/facebook/zstd/archive/v1.4.0.zip
- unzip zstd-1.4.0.zip - unzip zstd-1.4.0.zip
- cd zstd-1.4.0\build\VS2010 - cd zstd-1.4.0\build\VS2010
- ps: $CMD="$Env:DEV_ENV"; & $CMD zstd.sln /upgrade - ps: $CMD="$Env:DEV_ENV"; & $CMD zstd.sln /upgrade

View File

@ -55,6 +55,7 @@ ROCKSDB_OS_PREPROCESSOR_FLAGS = [
"-DROCKSDB_SCHED_GETCPU_PRESENT", "-DROCKSDB_SCHED_GETCPU_PRESENT",
"-DROCKSDB_IOURING_PRESENT", "-DROCKSDB_IOURING_PRESENT",
"-DHAVE_SSE42", "-DHAVE_SSE42",
"-DLIBURING",
"-DNUMA", "-DNUMA",
], ],
), ),

View File

@ -150,6 +150,7 @@ case "$TARGET_OS" in
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic" PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic"
fi fi
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt" PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt"
if test $ROCKSDB_USE_IO_URING; then
# check for liburing # check for liburing
$CXX $CFLAGS -x c++ - -luring -o /dev/null 2>/dev/null <<EOF $CXX $CFLAGS -x c++ - -luring -o /dev/null 2>/dev/null <<EOF
#include <liburing.h> #include <liburing.h>
@ -163,6 +164,7 @@ EOF
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -luring" PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -luring"
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_IOURING_PRESENT" COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_IOURING_PRESENT"
fi fi
fi
if test -z "$USE_FOLLY_DISTRIBUTED_MUTEX"; then if test -z "$USE_FOLLY_DISTRIBUTED_MUTEX"; then
USE_FOLLY_DISTRIBUTED_MUTEX=1 USE_FOLLY_DISTRIBUTED_MUTEX=1
fi fi

View File

@ -378,7 +378,7 @@ function send_to_ods {
echo >&2 "ERROR: Key $key doesn't have a value." echo >&2 "ERROR: Key $key doesn't have a value."
return return
fi fi
curl -s "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build$git_br&key=$key&value=$value" \ curl --silent "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build$git_br&key=$key&value=$value" \
--connect-timeout 60 --connect-timeout 60
} }

View File

@ -880,7 +880,7 @@ run_regression()
# parameters: $1 -- key, $2 -- value # parameters: $1 -- key, $2 -- value
function send_size_to_ods { function send_size_to_ods {
curl -s "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build&key=rocksdb.build_size.$1&value=$2" \ curl --silent "https://www.intern.facebook.com/intern/agent/ods_set.php?entity=rocksdb_build&key=rocksdb.build_size.$1&value=$2" \
--connect-timeout 60 --connect-timeout 60
} }

View File

@ -1405,6 +1405,91 @@ TEST_F(DBBasicTest, MultiGetBatchedMultiLevel) {
} }
} }
TEST_F(DBBasicTest, MultiGetBatchedMultiLevelMerge) {
Options options = CurrentOptions();
options.disable_auto_compactions = true;
options.merge_operator = MergeOperators::CreateStringAppendOperator();
BlockBasedTableOptions bbto;
bbto.filter_policy.reset(NewBloomFilterPolicy(10, false));
options.table_factory.reset(NewBlockBasedTableFactory(bbto));
Reopen(options);
int num_keys = 0;
for (int i = 0; i < 128; ++i) {
ASSERT_OK(Put("key_" + std::to_string(i), "val_l2_" + std::to_string(i)));
num_keys++;
if (num_keys == 8) {
Flush();
num_keys = 0;
}
}
if (num_keys > 0) {
Flush();
num_keys = 0;
}
MoveFilesToLevel(2);
for (int i = 0; i < 128; i += 3) {
ASSERT_OK(Merge("key_" + std::to_string(i), "val_l1_" + std::to_string(i)));
num_keys++;
if (num_keys == 8) {
Flush();
num_keys = 0;
}
}
if (num_keys > 0) {
Flush();
num_keys = 0;
}
MoveFilesToLevel(1);
for (int i = 0; i < 128; i += 5) {
ASSERT_OK(Merge("key_" + std::to_string(i), "val_l0_" + std::to_string(i)));
num_keys++;
if (num_keys == 8) {
Flush();
num_keys = 0;
}
}
if (num_keys > 0) {
Flush();
num_keys = 0;
}
ASSERT_EQ(0, num_keys);
for (int i = 0; i < 128; i += 9) {
ASSERT_OK(Merge("key_" + std::to_string(i), "val_mem_" + std::to_string(i)));
}
std::vector<std::string> keys;
std::vector<std::string> values;
for (int i = 32; i < 80; ++i) {
keys.push_back("key_" + std::to_string(i));
}
values = MultiGet(keys, nullptr);
ASSERT_EQ(values.size(), keys.size());
for (unsigned int j = 0; j < 48; ++j) {
int key = j + 32;
std::string value;
value.append("val_l2_" + std::to_string(key));
if (key % 3 == 0) {
value.append(",");
value.append("val_l1_" + std::to_string(key));
}
if (key % 5 == 0) {
value.append(",");
value.append("val_l0_" + std::to_string(key));
}
if (key % 9 == 0) {
value.append(",");
value.append("val_mem_" + std::to_string(key));
}
ASSERT_EQ(values[j], value);
}
}
// Test class for batched MultiGet with prefix extractor // Test class for batched MultiGet with prefix extractor
// Param bool - If true, use partitioned filters // Param bool - If true, use partitioned filters
// If false, use full filter block // If false, use full filter block
@ -2011,6 +2096,90 @@ TEST_P(DBBasicTestWithParallelIO, MultiGet) {
} }
} }
TEST_P(DBBasicTestWithParallelIO, MultiGetWithChecksumMismatch) {
std::vector<std::string> key_data(10);
std::vector<Slice> keys;
// We cannot resize a PinnableSlice vector, so just set initial size to
// largest we think we will need
std::vector<PinnableSlice> values(10);
std::vector<Status> statuses;
int read_count = 0;
ReadOptions ro;
ro.fill_cache = fill_cache();
SyncPoint::GetInstance()->SetCallBack(
"RetrieveMultipleBlocks:VerifyChecksum", [&](void *status) {
Status* s = static_cast<Status*>(status);
read_count++;
if (read_count == 2) {
*s = Status::Corruption();
}
});
SyncPoint::GetInstance()->EnableProcessing();
// Warm up the cache first
key_data.emplace_back(Key(0));
keys.emplace_back(Slice(key_data.back()));
key_data.emplace_back(Key(50));
keys.emplace_back(Slice(key_data.back()));
statuses.resize(keys.size());
dbfull()->MultiGet(ro, dbfull()->DefaultColumnFamily(), keys.size(),
keys.data(), values.data(), statuses.data(), true);
ASSERT_TRUE(CheckValue(0, values[0].ToString()));
//ASSERT_TRUE(CheckValue(50, values[1].ToString()));
ASSERT_EQ(statuses[0], Status::OK());
ASSERT_EQ(statuses[1], Status::Corruption());
SyncPoint::GetInstance()->DisableProcessing();
}
TEST_P(DBBasicTestWithParallelIO, MultiGetWithMissingFile) {
std::vector<std::string> key_data(10);
std::vector<Slice> keys;
// We cannot resize a PinnableSlice vector, so just set initial size to
// largest we think we will need
std::vector<PinnableSlice> values(10);
std::vector<Status> statuses;
ReadOptions ro;
ro.fill_cache = fill_cache();
SyncPoint::GetInstance()->SetCallBack(
"TableCache::MultiGet:FindTable", [&](void *status) {
Status* s = static_cast<Status*>(status);
*s = Status::IOError();
});
// DB open will create table readers unless we reduce the table cache
// capacity.
// SanitizeOptions will set max_open_files to minimum of 20. Table cache
// is allocated with max_open_files - 10 as capacity. So override
// max_open_files to 11 so table cache capacity will become 1. This will
// prevent file open during DB open and force the file to be opened
// during MultiGet
SyncPoint::GetInstance()->SetCallBack(
"SanitizeOptions::AfterChangeMaxOpenFiles", [&](void *arg) {
int* max_open_files = (int*)arg;
*max_open_files = 11;
});
SyncPoint::GetInstance()->EnableProcessing();
Reopen(CurrentOptions());
// Warm up the cache first
key_data.emplace_back(Key(0));
keys.emplace_back(Slice(key_data.back()));
key_data.emplace_back(Key(50));
keys.emplace_back(Slice(key_data.back()));
statuses.resize(keys.size());
dbfull()->MultiGet(ro, dbfull()->DefaultColumnFamily(), keys.size(),
keys.data(), values.data(), statuses.data(), true);
ASSERT_EQ(statuses[0], Status::IOError());
ASSERT_EQ(statuses[1], Status::IOError());
SyncPoint::GetInstance()->DisableProcessing();
}
INSTANTIATE_TEST_CASE_P( INSTANTIATE_TEST_CASE_P(
ParallelIO, DBBasicTestWithParallelIO, ParallelIO, DBBasicTestWithParallelIO,
// Params are as follows - // Params are as follows -

View File

@ -4495,8 +4495,15 @@ Status DBImpl::GetCreationTimeOfOldestFile(uint64_t* creation_time) {
if (mutable_db_options_.max_open_files == -1) { if (mutable_db_options_.max_open_files == -1) {
uint64_t oldest_time = port::kMaxUint64; uint64_t oldest_time = port::kMaxUint64;
for (auto cfd : *versions_->GetColumnFamilySet()) { for (auto cfd : *versions_->GetColumnFamilySet()) {
if (!cfd->IsDropped()) {
uint64_t ctime; uint64_t ctime;
cfd->current()->GetCreationTimeOfOldestFile(&ctime); {
SuperVersion* sv = GetAndRefSuperVersion(cfd);
Version* version = sv->current;
version->GetCreationTimeOfOldestFile(&ctime);
ReturnAndCleanupSuperVersion(cfd, sv);
}
if (ctime < oldest_time) { if (ctime < oldest_time) {
oldest_time = ctime; oldest_time = ctime;
} }
@ -4504,6 +4511,7 @@ Status DBImpl::GetCreationTimeOfOldestFile(uint64_t* creation_time) {
break; break;
} }
} }
}
*creation_time = oldest_time; *creation_time = oldest_time;
return Status::OK(); return Status::OK();
} else { } else {

View File

@ -4300,110 +4300,6 @@ TEST_F(DBTest2, SameSmallestInSameLevel) {
ASSERT_EQ("2,3,4,5,6,7,8", Get("key")); ASSERT_EQ("2,3,4,5,6,7,8", Get("key"));
} }
TEST_F(DBTest2, BlockBasedTablePrefixIndexSeekForPrev) {
// create a DB with block prefix index
BlockBasedTableOptions table_options;
Options options = CurrentOptions();
table_options.block_size = 300;
table_options.index_type = BlockBasedTableOptions::kHashSearch;
table_options.index_shortening =
BlockBasedTableOptions::IndexShorteningMode::kNoShortening;
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
options.prefix_extractor.reset(NewFixedPrefixTransform(1));
Reopen(options);
Random rnd(301);
std::string large_value = RandomString(&rnd, 500);
ASSERT_OK(Put("a1", large_value));
ASSERT_OK(Put("x1", large_value));
ASSERT_OK(Put("y1", large_value));
Flush();
{
std::unique_ptr<Iterator> iterator(db_->NewIterator(ReadOptions()));
iterator->SeekForPrev("x3");
ASSERT_TRUE(iterator->Valid());
ASSERT_EQ("x1", iterator->key().ToString());
iterator->SeekForPrev("a3");
ASSERT_TRUE(iterator->Valid());
ASSERT_EQ("a1", iterator->key().ToString());
iterator->SeekForPrev("y3");
ASSERT_TRUE(iterator->Valid());
ASSERT_EQ("y1", iterator->key().ToString());
// Query more than one non-existing prefix to cover the case both
// of empty hash bucket and hash bucket conflict.
iterator->SeekForPrev("b1");
// Result should be not valid or "a1".
if (iterator->Valid()) {
ASSERT_EQ("a1", iterator->key().ToString());
}
iterator->SeekForPrev("c1");
// Result should be not valid or "a1".
if (iterator->Valid()) {
ASSERT_EQ("a1", iterator->key().ToString());
}
iterator->SeekForPrev("d1");
// Result should be not valid or "a1".
if (iterator->Valid()) {
ASSERT_EQ("a1", iterator->key().ToString());
}
iterator->SeekForPrev("y3");
ASSERT_TRUE(iterator->Valid());
ASSERT_EQ("y1", iterator->key().ToString());
}
}
TEST_F(DBTest2, BlockBasedTablePrefixGetIndexNotFound) {
// create a DB with block prefix index
BlockBasedTableOptions table_options;
Options options = CurrentOptions();
table_options.block_size = 300;
table_options.index_type = BlockBasedTableOptions::kHashSearch;
table_options.index_shortening =
BlockBasedTableOptions::IndexShorteningMode::kNoShortening;
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
options.prefix_extractor.reset(NewFixedPrefixTransform(1));
options.level0_file_num_compaction_trigger = 8;
Reopen(options);
ASSERT_OK(Put("b1", "ok"));
Flush();
// Flushing several files so that the chance that hash bucket
// is empty fo "b" in at least one of the files is high.
ASSERT_OK(Put("a1", ""));
ASSERT_OK(Put("c1", ""));
Flush();
ASSERT_OK(Put("a2", ""));
ASSERT_OK(Put("c2", ""));
Flush();
ASSERT_OK(Put("a3", ""));
ASSERT_OK(Put("c3", ""));
Flush();
ASSERT_OK(Put("a4", ""));
ASSERT_OK(Put("c4", ""));
Flush();
ASSERT_OK(Put("a5", ""));
ASSERT_OK(Put("c5", ""));
Flush();
ASSERT_EQ("ok", Get("b1"));
}
} // namespace rocksdb } // namespace rocksdb
#ifdef ROCKSDB_UNITTESTS_WITH_CUSTOM_OBJECTS_FROM_STATIC_LIBS #ifdef ROCKSDB_UNITTESTS_WITH_CUSTOM_OBJECTS_FROM_STATIC_LIBS

View File

@ -166,12 +166,6 @@ Status ErrorHandler::SetBGError(const Status& bg_err, BackgroundErrorReason reas
return Status::OK(); return Status::OK();
} }
// Check if recovery is currently in progress. If it is, we will save this
// error so we can check it at the end to see if recovery succeeded or not
if (recovery_in_prog_ && recovery_error_.ok()) {
recovery_error_ = bg_err;
}
bool paranoid = db_options_.paranoid_checks; bool paranoid = db_options_.paranoid_checks;
Status::Severity sev = Status::Severity::kFatalError; Status::Severity sev = Status::Severity::kFatalError;
Status new_bg_err; Status new_bg_err;
@ -204,10 +198,15 @@ Status ErrorHandler::SetBGError(const Status& bg_err, BackgroundErrorReason reas
new_bg_err = Status(bg_err, sev); new_bg_err = Status(bg_err, sev);
// Check if recovery is currently in progress. If it is, we will save this
// error so we can check it at the end to see if recovery succeeded or not
if (recovery_in_prog_ && recovery_error_.ok()) {
recovery_error_ = new_bg_err;
}
bool auto_recovery = auto_recovery_; bool auto_recovery = auto_recovery_;
if (new_bg_err.severity() >= Status::Severity::kFatalError && auto_recovery) { if (new_bg_err.severity() >= Status::Severity::kFatalError && auto_recovery) {
auto_recovery = false; auto_recovery = false;
;
} }
// Allow some error specific overrides // Allow some error specific overrides

View File

@ -22,6 +22,21 @@ namespace rocksdb {
class DBErrorHandlingTest : public DBTestBase { class DBErrorHandlingTest : public DBTestBase {
public: public:
DBErrorHandlingTest() : DBTestBase("/db_error_handling_test") {} DBErrorHandlingTest() : DBTestBase("/db_error_handling_test") {}
std::string GetManifestNameFromLiveFiles() {
std::vector<std::string> live_files;
uint64_t manifest_size;
dbfull()->GetLiveFiles(live_files, &manifest_size, false);
for (auto& file : live_files) {
uint64_t num = 0;
FileType type;
if (ParseFileName(file, &num, &type) && type == kDescriptorFile) {
return file;
}
}
return "";
}
}; };
class DBErrorHandlingEnv : public EnvWrapper { class DBErrorHandlingEnv : public EnvWrapper {
@ -161,6 +176,169 @@ TEST_F(DBErrorHandlingTest, FLushWriteError) {
Destroy(options); Destroy(options);
} }
TEST_F(DBErrorHandlingTest, ManifestWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
Options options = GetDefaultOptions();
options.create_if_missing = true;
options.env = fault_env.get();
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
Put(Key(0), "val");
Flush();
Put(Key(1), "val");
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_env->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_EQ(s, Status::OK());
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingTest, DoubleManifestWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
Options options = GetDefaultOptions();
options.create_if_missing = true;
options.env = fault_env.get();
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
Put(Key(0), "val");
Flush();
Put(Key(1), "val");
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
fault_env->SetFilesystemActive(true);
// This Resume() will attempt to create a new manifest file and fail again
s = dbfull()->Resume();
ASSERT_EQ(s.severity(), rocksdb::Status::Severity::kHardError);
fault_env->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
// A successful Resume() will create a new manifest file
s = dbfull()->Resume();
ASSERT_EQ(s, Status::OK());
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingTest, CompactionManifestWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default()));
std::shared_ptr<ErrorHandlerListener> listener(new ErrorHandlerListener());
Options options = GetDefaultOptions();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.env = fault_env.get();
Status s;
std::string old_manifest;
std::string new_manifest;
std::atomic<bool> fail_manifest(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
Put(Key(0), "val");
Put(Key(2), "val");
s = Flush();
ASSERT_EQ(s, Status::OK());
rocksdb::SyncPoint::GetInstance()->LoadDependency(
// Wait for flush of 2nd L0 file before starting compaction
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"},
// Wait for compaction to detect manifest write error
{"BackgroundCallCompaction:1",
"CompactionManifestWriteError:0"},
// Make compaction thread wait for error to be cleared
{"CompactionManifestWriteError:1",
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
// Wait for DB instance to clear bg_error before calling
// TEST_WaitForCompact
{"SstFileManagerImpl::ClearError",
"CompactionManifestWriteError:2"}});
// trigger manifest write failure in compaction thread
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void *) {
fail_manifest.store(true);
});
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void *) {
if (fail_manifest.load()) {
fault_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
}
});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
Put(Key(1), "val");
// This Flush will trigger a compaction, which will fail when appending to
// the manifest
s = Flush();
ASSERT_EQ(s, Status::OK());
TEST_SYNC_POINT("CompactionManifestWriteError:0");
// Clear all errors so when the compaction is retried, it will succeed
fault_env->SetFilesystemActive(true);
rocksdb::SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("CompactionManifestWriteError:1");
TEST_SYNC_POINT("CompactionManifestWriteError:2");
s = dbfull()->TEST_WaitForCompact();
rocksdb::SyncPoint::GetInstance()->DisableProcessing();
ASSERT_EQ(s, Status::OK());
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
ASSERT_EQ("val", Get(Key(2)));
Close();
}
TEST_F(DBErrorHandlingTest, CompactionWriteError) { TEST_F(DBErrorHandlingTest, CompactionWriteError) {
std::unique_ptr<FaultInjectionTestEnv> fault_env( std::unique_ptr<FaultInjectionTestEnv> fault_env(
new FaultInjectionTestEnv(Env::Default())); new FaultInjectionTestEnv(Env::Default()));

View File

@ -485,6 +485,7 @@ Status TableCache::MultiGet(const ReadOptions& options,
file_options_, internal_comparator, fd, &handle, prefix_extractor, file_options_, internal_comparator, fd, &handle, prefix_extractor,
options.read_tier == kBlockCacheTier /* no_io */, options.read_tier == kBlockCacheTier /* no_io */,
true /* record_read_stats */, file_read_hist, skip_filters, level); true /* record_read_stats */, file_read_hist, skip_filters, level);
TEST_SYNC_POINT_CALLBACK("TableCache::MultiGet:FindTable", &s);
if (s.ok()) { if (s.ok()) {
t = GetTableReaderFromHandle(handle); t = GetTableReaderFromHandle(handle);
assert(t); assert(t);

View File

@ -1918,6 +1918,11 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
&iter->max_covering_tombstone_seq, this->env_, nullptr, &iter->max_covering_tombstone_seq, this->env_, nullptr,
merge_operator_ ? &pinned_iters_mgr : nullptr, callback, is_blob, merge_operator_ ? &pinned_iters_mgr : nullptr, callback, is_blob,
tracing_mget_id); tracing_mget_id);
// MergeInProgress status, if set, has been transferred to the get_context
// state, so we set status to ok here. From now on, the iter status will
// be used for IO errors, and get_context state will be used for any
// key level errors
*(iter->s) = Status::OK();
} }
int get_ctx_index = 0; int get_ctx_index = 0;
for (auto iter = range->begin(); iter != range->end(); for (auto iter = range->begin(); iter != range->end();
@ -1962,6 +1967,15 @@ void Version::MultiGet(const ReadOptions& read_options, MultiGetRange* range,
for (auto iter = file_range.begin(); iter != file_range.end(); ++iter) { for (auto iter = file_range.begin(); iter != file_range.end(); ++iter) {
GetContext& get_context = *iter->get_context; GetContext& get_context = *iter->get_context;
Status* status = iter->s; Status* status = iter->s;
// The Status in the KeyContext takes precedence over GetContext state
// Status may be an error if there were any IO errors in the table
// reader. We never expect Status to be NotFound(), as that is
// determined by get_context
assert(!status->IsNotFound());
if (!status->ok()) {
file_range.MarkKeyDone(iter);
continue;
}
if (get_context.sample()) { if (get_context.sample()) {
sample_file_read_inc(f->file_metadata); sample_file_read_inc(f->file_metadata);
@ -3953,12 +3967,15 @@ Status VersionSet::ProcessManifestWrites(
for (auto v : versions) { for (auto v : versions) {
delete v; delete v;
} }
// If manifest append failed for whatever reason, the file could be
// corrupted. So we need to force the next version update to start a
// new manifest file.
descriptor_log_.reset();
if (new_descriptor_log) { if (new_descriptor_log) {
ROCKS_LOG_INFO(db_options_->info_log, ROCKS_LOG_INFO(db_options_->info_log,
"Deleting manifest %" PRIu64 " current manifest %" PRIu64 "Deleting manifest %" PRIu64 " current manifest %" PRIu64
"\n", "\n",
manifest_file_number_, pending_manifest_file_number_); manifest_file_number_, pending_manifest_file_number_);
descriptor_log_.reset();
env_->DeleteFile( env_->DeleteFile(
DescriptorFileName(dbname_, pending_manifest_file_number_)); DescriptorFileName(dbname_, pending_manifest_file_number_));
} }

View File

@ -62,7 +62,7 @@ void InitializeHotKeyGenerator(double alpha) {
int64_t GetOneHotKeyID(double rand_seed, int64_t max_key) { int64_t GetOneHotKeyID(double rand_seed, int64_t max_key) {
int64_t low = 1, mid, high = zipf_sum_size, zipf = 0; int64_t low = 1, mid, high = zipf_sum_size, zipf = 0;
while (low <= high) { while (low <= high) {
mid = std::floor((low + high) / 2); mid = static_cast<int64_t>(std::floor((low + high) / 2));
if (sum_probs[mid] >= rand_seed && sum_probs[mid - 1] < rand_seed) { if (sum_probs[mid] >= rand_seed && sum_probs[mid - 1] < rand_seed) {
zipf = mid; zipf = mid;
break; break;

View File

@ -771,7 +771,7 @@ std::vector<std::string> StressTest::GetWhiteBoxKeys(ThreadState* thread,
k[i] = static_cast<char>(cur - 1); k[i] = static_cast<char>(cur - 1);
break; break;
} else if (i > 0) { } else if (i > 0) {
k[i] = 0xFF; k[i] = static_cast<char>(0xFF);
} }
} }
} else if (thread->rand.OneIn(2)) { } else if (thread->rand.OneIn(2)) {

20
env/env_test.cc vendored
View File

@ -1130,8 +1130,25 @@ TEST_P(EnvPosixTestWithParam, MultiRead) {
ASSERT_OK(wfile->Close()); ASSERT_OK(wfile->Close());
} }
// More attempts to simulate more partial result sequences.
for (uint32_t attempt = 0; attempt < 20; attempt++) {
// Random Read // Random Read
{ Random rnd(301 + attempt);
rocksdb::SyncPoint::GetInstance()->SetCallBack(
"PosixRandomAccessFile::MultiRead:io_uring_result", [&](void* arg) {
if (attempt > 0) {
// No failure in the first attempt.
size_t& bytes_read = *static_cast<size_t*>(arg);
if (rnd.OneIn(4)) {
bytes_read = 0;
} else if (rnd.OneIn(3)) {
bytes_read = static_cast<size_t>(
rnd.Uniform(static_cast<int>(bytes_read)));
}
}
});
rocksdb::SyncPoint::GetInstance()->EnableProcessing();
std::unique_ptr<RandomAccessFile> file; std::unique_ptr<RandomAccessFile> file;
std::vector<ReadRequest> reqs(3); std::vector<ReadRequest> reqs(3);
std::vector<std::unique_ptr<char, Deleter>> data; std::vector<std::unique_ptr<char, Deleter>> data;
@ -1156,6 +1173,7 @@ TEST_P(EnvPosixTestWithParam, MultiRead) {
ASSERT_OK(reqs[i].status); ASSERT_OK(reqs[i].status);
ASSERT_EQ(memcmp(reqs[i].scratch, buf.get(), kSectorSize), 0); ASSERT_EQ(memcmp(reqs[i].scratch, buf.get(), kSectorSize), 0);
} }
rocksdb::SyncPoint::GetInstance()->DisableProcessing();
} }
} }

84
env/io_posix.cc vendored
View File

@ -482,9 +482,6 @@ IOStatus PosixRandomAccessFile::MultiRead(FSReadRequest* reqs,
const IOOptions& options, const IOOptions& options,
IODebugContext* dbg) { IODebugContext* dbg) {
#if defined(ROCKSDB_IOURING_PRESENT) #if defined(ROCKSDB_IOURING_PRESENT)
size_t reqs_off;
ssize_t ret __attribute__((__unused__));
struct io_uring* iu = nullptr; struct io_uring* iu = nullptr;
if (thread_local_io_urings_) { if (thread_local_io_urings_) {
iu = static_cast<struct io_uring*>(thread_local_io_urings_->Get()); iu = static_cast<struct io_uring*>(thread_local_io_urings_->Get());
@ -505,35 +502,49 @@ IOStatus PosixRandomAccessFile::MultiRead(FSReadRequest* reqs,
struct WrappedReadRequest { struct WrappedReadRequest {
FSReadRequest* req; FSReadRequest* req;
struct iovec iov; struct iovec iov;
explicit WrappedReadRequest(FSReadRequest* r) : req(r) {} size_t finished_len;
explicit WrappedReadRequest(FSReadRequest* r) : req(r), finished_len(0) {}
}; };
autovector<WrappedReadRequest, 32> req_wraps; autovector<WrappedReadRequest, 32> req_wraps;
autovector<WrappedReadRequest*, 4> incomplete_rq_list;
for (size_t i = 0; i < num_reqs; i++) { for (size_t i = 0; i < num_reqs; i++) {
req_wraps.emplace_back(&reqs[i]); req_wraps.emplace_back(&reqs[i]);
} }
reqs_off = 0; size_t reqs_off = 0;
while (num_reqs) { while (num_reqs > reqs_off || !incomplete_rq_list.empty()) {
size_t this_reqs = num_reqs; size_t this_reqs = (num_reqs - reqs_off) + incomplete_rq_list.size();
// If requests exceed depth, split it into batches // If requests exceed depth, split it into batches
if (this_reqs > kIoUringDepth) this_reqs = kIoUringDepth; if (this_reqs > kIoUringDepth) this_reqs = kIoUringDepth;
assert(incomplete_rq_list.size() <= this_reqs);
for (size_t i = 0; i < this_reqs; i++) { for (size_t i = 0; i < this_reqs; i++) {
size_t index = i + reqs_off; WrappedReadRequest* rep_to_submit;
struct io_uring_sqe* sqe; if (i < incomplete_rq_list.size()) {
rep_to_submit = incomplete_rq_list[i];
sqe = io_uring_get_sqe(iu); } else {
req_wraps[index].iov.iov_base = reqs[index].scratch; rep_to_submit = &req_wraps[reqs_off++];
req_wraps[index].iov.iov_len = reqs[index].len;
io_uring_prep_readv(sqe, fd_, &req_wraps[index].iov, 1,
reqs[index].offset);
io_uring_sqe_set_data(sqe, &req_wraps[index]);
} }
assert(rep_to_submit->req->len > rep_to_submit->finished_len);
rep_to_submit->iov.iov_base =
rep_to_submit->req->scratch + rep_to_submit->finished_len;
rep_to_submit->iov.iov_len =
rep_to_submit->req->len - rep_to_submit->finished_len;
ret = io_uring_submit_and_wait(iu, static_cast<unsigned int>(this_reqs)); struct io_uring_sqe* sqe;
sqe = io_uring_get_sqe(iu);
io_uring_prep_readv(
sqe, fd_, &rep_to_submit->iov, 1,
rep_to_submit->req->offset + rep_to_submit->finished_len);
io_uring_sqe_set_data(sqe, rep_to_submit);
}
incomplete_rq_list.clear();
ssize_t ret =
io_uring_submit_and_wait(iu, static_cast<unsigned int>(this_reqs));
if (static_cast<size_t>(ret) != this_reqs) { if (static_cast<size_t>(ret) != this_reqs) {
fprintf(stderr, "ret = %ld this_reqs: %ld\n", (long)ret, (long)this_reqs); fprintf(stderr, "ret = %ld this_reqs: %ld\n", (long)ret, (long)this_reqs);
} }
@ -547,21 +558,44 @@ IOStatus PosixRandomAccessFile::MultiRead(FSReadRequest* reqs,
// of our initial wait not reaping all completions // of our initial wait not reaping all completions
ret = io_uring_wait_cqe(iu, &cqe); ret = io_uring_wait_cqe(iu, &cqe);
assert(!ret); assert(!ret);
req_wrap = static_cast<WrappedReadRequest*>(io_uring_cqe_get_data(cqe)); req_wrap = static_cast<WrappedReadRequest*>(io_uring_cqe_get_data(cqe));
FSReadRequest* req = req_wrap->req; FSReadRequest* req = req_wrap->req;
if (static_cast<size_t>(cqe->res) == req_wrap->iov.iov_len) { if (cqe->res < 0) {
req->result = Slice(req->scratch, cqe->res);
req->status = IOStatus::OK();
} else if (cqe->res >= 0) {
req->result = Slice(req->scratch, req_wrap->iov.iov_len - cqe->res);
} else {
req->result = Slice(req->scratch, 0); req->result = Slice(req->scratch, 0);
req->status = IOError("Req failed", filename_, cqe->res); req->status = IOError("Req failed", filename_, cqe->res);
} else {
size_t bytes_read = static_cast<size_t>(cqe->res);
TEST_SYNC_POINT_CALLBACK(
"PosixRandomAccessFile::MultiRead:io_uring_result", &bytes_read);
if (bytes_read == req_wrap->iov.iov_len) {
req->result = Slice(req->scratch, req->len);
req->status = IOStatus::OK();
} else if (bytes_read == 0) {
// cqe->res == 0 can means EOF, or can mean partial results. See
// comment
// https://github.com/facebook/rocksdb/pull/6441#issuecomment-589843435
// Fall back to pread in this case.
Slice tmp_slice;
req->status =
Read(req->offset + req_wrap->finished_len,
req->len - req_wrap->finished_len, options, &tmp_slice,
req->scratch + req_wrap->finished_len, dbg);
req->result =
Slice(req->scratch, req_wrap->finished_len + tmp_slice.size());
} else if (bytes_read < req_wrap->iov.iov_len) {
assert(bytes_read > 0);
assert(bytes_read + req_wrap->finished_len < req->len);
req_wrap->finished_len += bytes_read;
incomplete_rq_list.push_back(req_wrap);
} else {
req->result = Slice(req->scratch, 0);
req->status = IOError("Req returned more bytes than requested",
filename_, cqe->res);
}
} }
io_uring_cqe_seen(iu, cqe); io_uring_cqe_seen(iu, cqe);
} }
num_reqs -= this_reqs;
reqs_off += this_reqs;
} }
return IOStatus::OK(); return IOStatus::OK();
#else #else

View File

@ -308,6 +308,7 @@ void SstFileManagerImpl::ClearError() {
// since the ErrorHandler::recovery_in_prog_ flag would be true // since the ErrorHandler::recovery_in_prog_ flag would be true
cur_instance_ = error_handler; cur_instance_ = error_handler;
mu_.Unlock(); mu_.Unlock();
TEST_SYNC_POINT("SstFileManagerImpl::ClearError");
s = error_handler->RecoverFromBGError(); s = error_handler->RecoverFromBGError();
mu_.Lock(); mu_.Lock();
// The DB instance might have been deleted while we were // The DB instance might have been deleted while we were

View File

@ -6,7 +6,7 @@
#define ROCKSDB_MAJOR 6 #define ROCKSDB_MAJOR 6
#define ROCKSDB_MINOR 7 #define ROCKSDB_MINOR 7
#define ROCKSDB_PATCH 0 #define ROCKSDB_PATCH 3
// Do not use these. We made the mistake of declaring macros starting with // Do not use these. We made the mistake of declaring macros starting with
// double underscore. Now we have to live with our choice. We'll deprecate these // double underscore. Now we have to live with our choice. We'll deprecate these

View File

@ -313,17 +313,17 @@ if(NOT EXISTS ${JAVA_TEST_LIBDIR})
file(MAKE_DIRECTORY mkdir ${JAVA_TEST_LIBDIR}) file(MAKE_DIRECTORY mkdir ${JAVA_TEST_LIBDIR})
endif() endif()
if (DEFINED CUSTOM_REPO_URL) if (DEFINED CUSTOM_DEPS_URL)
set(SEARCH_REPO_URL ${CUSTOM_REPO_URL}/) set(DEPS_URL ${CUSTOM_DEPS_URL}/)
set(CENTRAL_REPO_URL ${CUSTOM_REPO_URL}/)
else () else ()
set(SEARCH_REPO_URL "http://search.maven.org/remotecontent?filepath=") # Using a Facebook AWS account for S3 storage. (maven.org has a history
set(CENTRAL_REPO_URL "https://repo1.maven.org/maven2/") # of failing in Travis builds.)
set(DEPS_URL "https://rocksdb-deps.s3-us-west-2.amazonaws.com/jars")
endif() endif()
if(NOT EXISTS ${JAVA_JUNIT_JAR}) if(NOT EXISTS ${JAVA_JUNIT_JAR})
message("Downloading ${JAVA_JUNIT_JAR}") message("Downloading ${JAVA_JUNIT_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}junit/junit/4.12/junit-4.12.jar ${JAVA_TMP_JAR} STATUS downloadStatus) file(DOWNLOAD ${DEPS_URL}/junit-4.12.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code) list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0) if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_JUNIT_JAR}") message(FATAL_ERROR "Failed downloading ${JAVA_JUNIT_JAR}")
@ -332,7 +332,7 @@ if(NOT EXISTS ${JAVA_JUNIT_JAR})
endif() endif()
if(NOT EXISTS ${JAVA_HAMCR_JAR}) if(NOT EXISTS ${JAVA_HAMCR_JAR})
message("Downloading ${JAVA_HAMCR_JAR}") message("Downloading ${JAVA_HAMCR_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar ${JAVA_TMP_JAR} STATUS downloadStatus) file(DOWNLOAD ${DEPS_URL}/hamcrest-core-1.3.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code) list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0) if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_HAMCR_JAR}") message(FATAL_ERROR "Failed downloading ${JAVA_HAMCR_JAR}")
@ -341,7 +341,7 @@ if(NOT EXISTS ${JAVA_HAMCR_JAR})
endif() endif()
if(NOT EXISTS ${JAVA_MOCKITO_JAR}) if(NOT EXISTS ${JAVA_MOCKITO_JAR})
message("Downloading ${JAVA_MOCKITO_JAR}") message("Downloading ${JAVA_MOCKITO_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar ${JAVA_TMP_JAR} STATUS downloadStatus) file(DOWNLOAD ${DEPS_URL}/mockito-all-1.10.19.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code) list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0) if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_MOCKITO_JAR}") message(FATAL_ERROR "Failed downloading ${JAVA_MOCKITO_JAR}")
@ -350,7 +350,7 @@ if(NOT EXISTS ${JAVA_MOCKITO_JAR})
endif() endif()
if(NOT EXISTS ${JAVA_CGLIB_JAR}) if(NOT EXISTS ${JAVA_CGLIB_JAR})
message("Downloading ${JAVA_CGLIB_JAR}") message("Downloading ${JAVA_CGLIB_JAR}")
file(DOWNLOAD ${SEARCH_REPO_URL}cglib/cglib/2.2.2/cglib-2.2.2.jar ${JAVA_TMP_JAR} STATUS downloadStatus) file(DOWNLOAD ${DEPS_URL}/cglib-2.2.2.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code) list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0) if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_CGLIB_JAR}") message(FATAL_ERROR "Failed downloading ${JAVA_CGLIB_JAR}")
@ -359,7 +359,7 @@ if(NOT EXISTS ${JAVA_CGLIB_JAR})
endif() endif()
if(NOT EXISTS ${JAVA_ASSERTJ_JAR}) if(NOT EXISTS ${JAVA_ASSERTJ_JAR})
message("Downloading ${JAVA_ASSERTJ_JAR}") message("Downloading ${JAVA_ASSERTJ_JAR}")
file(DOWNLOAD ${CENTRAL_REPO_URL}org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar ${JAVA_TMP_JAR} STATUS downloadStatus) file(DOWNLOAD ${DEPS_URL}/assertj-core-1.7.1.jar ${JAVA_TMP_JAR} STATUS downloadStatus)
list(GET downloadStatus 0 error_code) list(GET downloadStatus 0 error_code)
if(NOT error_code EQUAL 0) if(NOT error_code EQUAL 0)
message(FATAL_ERROR "Failed downloading ${JAVA_ASSERTJ_JAR}") message(FATAL_ERROR "Failed downloading ${JAVA_ASSERTJ_JAR}")

View File

@ -213,16 +213,21 @@ ifneq ($(DEBUG_LEVEL),0)
JAVAC_ARGS = -Xlint:deprecation -Xlint:unchecked JAVAC_ARGS = -Xlint:deprecation -Xlint:unchecked
endif endif
SEARCH_REPO_URL?=http://search.maven.org/remotecontent?filepath= # Using a Facebook AWS account for S3 storage. (maven.org has a history
CENTRAL_REPO_URL?=https://repo1.maven.org/maven2/ # of failing in Travis builds.)
DEPS_URL?=https://rocksdb-deps.s3-us-west-2.amazonaws.com/jars
clean: clean: clean-not-downloaded clean-downloaded
$(AM_V_at)rm -rf include/*
$(AM_V_at)rm -rf test-libs/ clean-not-downloaded:
$(AM_V_at)rm -rf $(NATIVE_INCLUDE)
$(AM_V_at)rm -rf $(OUTPUT) $(AM_V_at)rm -rf $(OUTPUT)
$(AM_V_at)rm -rf $(BENCHMARK_OUTPUT) $(AM_V_at)rm -rf $(BENCHMARK_OUTPUT)
$(AM_V_at)rm -rf $(SAMPLES_OUTPUT) $(AM_V_at)rm -rf $(SAMPLES_OUTPUT)
clean-downloaded:
$(AM_V_at)rm -rf $(JAVA_TEST_LIBDIR)
javadocs: java javadocs: java
$(AM_V_GEN)mkdir -p $(JAVADOC) $(AM_V_GEN)mkdir -p $(JAVADOC)
@ -279,11 +284,11 @@ optimistic_transaction_sample: java
resolve_test_deps: resolve_test_deps:
test -d "$(JAVA_TEST_LIBDIR)" || mkdir -p "$(JAVA_TEST_LIBDIR)" test -d "$(JAVA_TEST_LIBDIR)" || mkdir -p "$(JAVA_TEST_LIBDIR)"
test -s "$(JAVA_JUNIT_JAR)" || cp $(MVN_LOCAL)/junit/junit/4.12/junit-4.12.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o $(JAVA_JUNIT_JAR) $(SEARCH_REPO_URL)junit/junit/4.12/junit-4.12.jar test -s "$(JAVA_JUNIT_JAR)" || cp $(MVN_LOCAL)/junit/junit/4.12/junit-4.12.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output $(JAVA_JUNIT_JAR) --location $(DEPS_URL)/junit-4.12.jar
test -s "$(JAVA_HAMCR_JAR)" || cp $(MVN_LOCAL)/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o $(JAVA_HAMCR_JAR) $(SEARCH_REPO_URL)org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar test -s "$(JAVA_HAMCR_JAR)" || cp $(MVN_LOCAL)/org/hamcrest/hamcrest-core/1.3/hamcrest-core-1.3.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output $(JAVA_HAMCR_JAR) --location $(DEPS_URL)/hamcrest-core-1.3.jar
test -s "$(JAVA_MOCKITO_JAR)" || cp $(MVN_LOCAL)/org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o "$(JAVA_MOCKITO_JAR)" $(SEARCH_REPO_URL)org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar test -s "$(JAVA_MOCKITO_JAR)" || cp $(MVN_LOCAL)/org/mockito/mockito-all/1.10.19/mockito-all-1.10.19.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output "$(JAVA_MOCKITO_JAR)" --location $(DEPS_URL)/mockito-all-1.10.19.jar
test -s "$(JAVA_CGLIB_JAR)" || cp $(MVN_LOCAL)/cglib/cglib/2.2.2/cglib-2.2.2.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o "$(JAVA_CGLIB_JAR)" $(SEARCH_REPO_URL)cglib/cglib/2.2.2/cglib-2.2.2.jar test -s "$(JAVA_CGLIB_JAR)" || cp $(MVN_LOCAL)/cglib/cglib/2.2.2/cglib-2.2.2.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output "$(JAVA_CGLIB_JAR)" --location $(DEPS_URL)/cglib-2.2.2.jar
test -s "$(JAVA_ASSERTJ_JAR)" || cp $(MVN_LOCAL)/org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar $(JAVA_TEST_LIBDIR) || curl -k -L -o "$(JAVA_ASSERTJ_JAR)" $(CENTRAL_REPO_URL)org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar test -s "$(JAVA_ASSERTJ_JAR)" || cp $(MVN_LOCAL)/org/assertj/assertj-core/1.7.1/assertj-core-1.7.1.jar $(JAVA_TEST_LIBDIR) || curl --fail --insecure --output "$(JAVA_ASSERTJ_JAR)" --location $(DEPS_URL)/assertj-core-1.7.1.jar
java_test: java resolve_test_deps java_test: java resolve_test_deps
$(AM_V_GEN)mkdir -p $(TEST_CLASSES) $(AM_V_GEN)mkdir -p $(TEST_CLASSES)

View File

@ -390,19 +390,10 @@ void IndexBlockIter::Seek(const Slice& target) {
if (data_ == nullptr) { // Not init yet if (data_ == nullptr) { // Not init yet
return; return;
} }
status_ = Status::OK();
uint32_t index = 0; uint32_t index = 0;
bool ok = false; bool ok = false;
if (prefix_index_) { if (prefix_index_) {
bool prefix_may_exist = true; ok = PrefixSeek(target, &index);
ok = PrefixSeek(target, &index, &prefix_may_exist);
if (!prefix_may_exist) {
// This is to let the caller to distinguish between non-existing prefix,
// and when key is larger than the last key, which both set Valid() to
// false.
current_ = restarts_;
status_ = Status::NotFound();
}
} else if (value_delta_encoded_) { } else if (value_delta_encoded_) {
ok = BinarySeek<DecodeKeyV4>(seek_key, 0, num_restarts_ - 1, &index, ok = BinarySeek<DecodeKeyV4>(seek_key, 0, num_restarts_ - 1, &index,
comparator_); comparator_);
@ -471,7 +462,6 @@ void IndexBlockIter::SeekToFirst() {
if (data_ == nullptr) { // Not init yet if (data_ == nullptr) { // Not init yet
return; return;
} }
status_ = Status::OK();
SeekToRestartPoint(0); SeekToRestartPoint(0);
ParseNextIndexKey(); ParseNextIndexKey();
} }
@ -490,7 +480,6 @@ void IndexBlockIter::SeekToLast() {
if (data_ == nullptr) { // Not init yet if (data_ == nullptr) { // Not init yet
return; return;
} }
status_ = Status::OK();
SeekToRestartPoint(num_restarts_ - 1); SeekToRestartPoint(num_restarts_ - 1);
while (ParseNextIndexKey() && NextEntryOffset() < restarts_) { while (ParseNextIndexKey() && NextEntryOffset() < restarts_) {
// Keep skipping // Keep skipping
@ -729,12 +718,8 @@ int IndexBlockIter::CompareBlockKey(uint32_t block_index, const Slice& target) {
// with a key >= target // with a key >= target
bool IndexBlockIter::BinaryBlockIndexSeek(const Slice& target, bool IndexBlockIter::BinaryBlockIndexSeek(const Slice& target,
uint32_t* block_ids, uint32_t left, uint32_t* block_ids, uint32_t left,
uint32_t right, uint32_t* index, uint32_t right, uint32_t* index) {
bool* prefix_may_exist) {
assert(left <= right); assert(left <= right);
assert(index);
assert(prefix_may_exist);
*prefix_may_exist = true;
uint32_t left_bound = left; uint32_t left_bound = left;
while (left <= right) { while (left <= right) {
@ -768,7 +753,6 @@ bool IndexBlockIter::BinaryBlockIndexSeek(const Slice& target,
(left == left_bound || block_ids[left - 1] != block_ids[left] - 1) && (left == left_bound || block_ids[left - 1] != block_ids[left] - 1) &&
CompareBlockKey(block_ids[left] - 1, target) > 0) { CompareBlockKey(block_ids[left] - 1, target) > 0) {
current_ = restarts_; current_ = restarts_;
*prefix_may_exist = false;
return false; return false;
} }
@ -776,45 +760,14 @@ bool IndexBlockIter::BinaryBlockIndexSeek(const Slice& target,
return true; return true;
} else { } else {
assert(left > right); assert(left > right);
// If the next block key is larger than seek key, it is possible that
// no key shares the prefix with `target`, or all keys with the same
// prefix as `target` are smaller than prefix. In the latter case,
// we are mandated to set the position the same as the total order.
// In the latter case, either:
// (1) `target` falls into the range of the next block. In this case,
// we can place the iterator to the next block, or
// (2) `target` is larger than all block keys. In this case we can
// keep the iterator invalidate without setting `prefix_may_exist`
// to false.
// We might sometimes end up with setting the total order position
// while there is no key sharing the prefix as `target`, but it
// still follows the contract.
uint32_t right_index = block_ids[right];
assert(right_index + 1 <= num_restarts_);
if (right_index + 1 < num_restarts_) {
if (CompareBlockKey(right_index + 1, target) >= 0) {
*index = right_index + 1;
return true;
} else {
// We have to set the flag here because we are not positioning
// the iterator to the total order position.
*prefix_may_exist = false;
}
}
// Mark iterator invalid // Mark iterator invalid
current_ = restarts_; current_ = restarts_;
return false; return false;
} }
} }
bool IndexBlockIter::PrefixSeek(const Slice& target, uint32_t* index, bool IndexBlockIter::PrefixSeek(const Slice& target, uint32_t* index) {
bool* prefix_may_exist) {
assert(index);
assert(prefix_may_exist);
assert(prefix_index_); assert(prefix_index_);
*prefix_may_exist = true;
Slice seek_key = target; Slice seek_key = target;
if (!key_includes_seq_) { if (!key_includes_seq_) {
seek_key = ExtractUserKey(target); seek_key = ExtractUserKey(target);
@ -824,12 +777,9 @@ bool IndexBlockIter::PrefixSeek(const Slice& target, uint32_t* index,
if (num_blocks == 0) { if (num_blocks == 0) {
current_ = restarts_; current_ = restarts_;
*prefix_may_exist = false;
return false; return false;
} else { } else {
assert(block_ids); return BinaryBlockIndexSeek(seek_key, block_ids, 0, num_blocks - 1, index);
return BinaryBlockIndexSeek(seek_key, block_ids, 0, num_blocks - 1, index,
prefix_may_exist);
} }
} }

View File

@ -539,13 +539,6 @@ class IndexBlockIter final : public BlockIter<IndexValue> {
} }
} }
// IndexBlockIter follows a different contract for prefix iterator
// from data iterators.
// If prefix of the seek key `target` exists in the file, it must
// return the same result as total order seek.
// If the prefix of `target` doesn't exist in the file, it can either
// return the result of total order seek, or set both of Valid() = false
// and status() = NotFound().
virtual void Seek(const Slice& target) override; virtual void Seek(const Slice& target) override;
virtual void SeekForPrev(const Slice&) override { virtual void SeekForPrev(const Slice&) override {
@ -602,16 +595,9 @@ class IndexBlockIter final : public BlockIter<IndexValue> {
std::unique_ptr<GlobalSeqnoState> global_seqno_state_; std::unique_ptr<GlobalSeqnoState> global_seqno_state_;
// Set *prefix_may_exist to false if no key possibly share the same prefix bool PrefixSeek(const Slice& target, uint32_t* index);
// as `target`. If not set, the result position should be the same as total
// order Seek.
bool PrefixSeek(const Slice& target, uint32_t* index, bool* prefix_may_exist);
// Set *prefix_may_exist to false if no key can possibly share the same
// prefix as `target`. If not set, the result position should be the same
// as total order seek.
bool BinaryBlockIndexSeek(const Slice& target, uint32_t* block_ids, bool BinaryBlockIndexSeek(const Slice& target, uint32_t* block_ids,
uint32_t left, uint32_t right, uint32_t* index, uint32_t left, uint32_t right, uint32_t* index);
bool* prefix_may_exist);
inline int CompareBlockKey(uint32_t block_index, const Slice& target); inline int CompareBlockKey(uint32_t block_index, const Slice& target);
inline int Compare(const Slice& a, const Slice& b) const { inline int Compare(const Slice& a, const Slice& b) const {

View File

@ -2455,6 +2455,7 @@ void BlockBasedTable::RetrieveMultipleBlocks(
s = rocksdb::VerifyChecksum(footer.checksum(), s = rocksdb::VerifyChecksum(footer.checksum(),
req.result.data() + req_offset, req.result.data() + req_offset,
handle.size() + 1, expected); handle.size() + 1, expected);
TEST_SYNC_POINT_CALLBACK("RetrieveMultipleBlocks:VerifyChecksum", &s);
} }
} }
@ -2901,22 +2902,12 @@ void BlockBasedTableIterator<TBlockIter, TValue>::SeekForPrev(
index_iter_->Seek(target); index_iter_->Seek(target);
if (!index_iter_->Valid()) { if (!index_iter_->Valid()) {
auto seek_status = index_iter_->status(); if (!index_iter_->status().ok()) {
// Check for IO error
if (!seek_status.IsNotFound() && !seek_status.ok()) {
ResetDataIter(); ResetDataIter();
return; return;
} }
// With prefix index, Seek() returns NotFound if the prefix doesn't exist
if (seek_status.IsNotFound()) {
// Any key less than the target is fine for prefix seek
ResetDataIter();
return;
} else {
index_iter_->SeekToLast(); index_iter_->SeekToLast();
}
// Check for IO error
if (!index_iter_->Valid()) { if (!index_iter_->Valid()) {
ResetDataIter(); ResetDataIter();
return; return;
@ -3467,7 +3458,7 @@ Status BlockBasedTable::Get(const ReadOptions& read_options, const Slice& key,
PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_full_true_positive, 1, PERF_COUNTER_BY_LEVEL_ADD(bloom_filter_full_true_positive, 1,
rep_->level); rep_->level);
} }
if (s.ok() && !iiter->status().IsNotFound()) { if (s.ok()) {
s = iiter->status(); s = iiter->status();
} }
} }

View File

@ -686,8 +686,7 @@ class BlockBasedTableIterator : public InternalIteratorBase<TValue> {
return block_iter_.value(); return block_iter_.value();
} }
Status status() const override { Status status() const override {
// Prefix index set status to NotFound when the prefix does not exist if (!index_iter_->status().ok()) {
if (!index_iter_->status().ok() && !index_iter_->status().IsNotFound()) {
return index_iter_->status(); return index_iter_->status();
} else if (block_iter_points_to_real_block_) { } else if (block_iter_points_to_real_block_) {
return block_iter_.status(); return block_iter_.status();

View File

@ -1857,8 +1857,7 @@ void TableTest::IndexTest(BlockBasedTableOptions table_options) {
auto key = prefixes[i] + "9"; auto key = prefixes[i] + "9";
index_iter->Seek(InternalKey(key, 0, kTypeValue).Encode()); index_iter->Seek(InternalKey(key, 0, kTypeValue).Encode());
ASSERT_TRUE(index_iter->status().ok() || index_iter->status().IsNotFound()); ASSERT_OK(index_iter->status());
ASSERT_TRUE(!index_iter->status().IsNotFound() || !index_iter->Valid());
if (i == prefixes.size() - 1) { if (i == prefixes.size() - 1) {
// last key // last key
ASSERT_TRUE(!index_iter->Valid()); ASSERT_TRUE(!index_iter->Valid());
@ -1885,19 +1884,6 @@ void TableTest::IndexTest(BlockBasedTableOptions table_options) {
ASSERT_TRUE(BytewiseComparator()->Compare(prefix, ukey_prefix) < 0); ASSERT_TRUE(BytewiseComparator()->Compare(prefix, ukey_prefix) < 0);
} }
} }
for (const auto& prefix : non_exist_prefixes) {
index_iter->SeekForPrev(InternalKey(prefix, 0, kTypeValue).Encode());
// regular_iter->Seek(prefix);
ASSERT_OK(index_iter->status());
// Seek to non-existing prefixes should yield either invalid, or a
// key with prefix greater than the target.
if (index_iter->Valid()) {
Slice ukey = ExtractUserKey(index_iter->key());
Slice ukey_prefix = options.prefix_extractor->Transform(ukey);
ASSERT_TRUE(BytewiseComparator()->Compare(prefix, ukey_prefix) > 0);
}
}
c.ResetTableReader(); c.ResetTableReader();
} }

View File

@ -1,8 +1,18 @@
set(TOOLS set(CORE_TOOLS
sst_dump.cc sst_dump.cc
ldb.cc)
foreach(src ${CORE_TOOLS})
get_filename_component(exename ${src} NAME_WE)
add_executable(${exename}${ARTIFACT_SUFFIX}
${src})
target_link_libraries(${exename}${ARTIFACT_SUFFIX} ${ROCKSDB_LIB})
list(APPEND core_tool_deps ${exename})
endforeach()
if(WITH_TOOLS)
set(TOOLS
db_sanity_test.cc db_sanity_test.cc
write_stress.cc write_stress.cc
ldb.cc
db_repl_stress.cc db_repl_stress.cc
dump/rocksdb_dump.cc dump/rocksdb_dump.cc
dump/rocksdb_undump.cc) dump/rocksdb_undump.cc)
@ -14,8 +24,7 @@ foreach(src ${TOOLS})
list(APPEND tool_deps ${exename}) list(APPEND tool_deps ${exename})
endforeach() endforeach()
list(APPEND tool_deps)
add_custom_target(ldb_tests add_custom_target(ldb_tests
COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/ldb_tests.py COMMAND python ${CMAKE_CURRENT_SOURCE_DIR}/ldb_tests.py
DEPENDS ldb) DEPENDS ldb)
endif()