rocksdb

Author	SHA1	Message	Date
Zhichao Cao	c810947184	Separate handling of WAL Sync io error with SST flush io error (#8049 ) Summary: In previous codebase, if WAL is used, all the retryable IO Error will be treated as hard error. So write is stalled. In this PR, the retryable IO error from WAL sync is separated from SST file flush io error. If WAL Sync is ok and retryable IO Error only happens during SST flush, the error is mapped to soft error. So user can continue insert to Memtable and append to WAL. Resolve the bug that if WAL sync fails, the memtable status does not roll back due to calling PickMemtable early than calling and checking SyncClosedLog. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8049 Test Plan: added new unit test, make check Reviewed By: anand1976 Differential Revision: D26965529 Pulled By: zhichao-cao fbshipit-source-id: f5fecb66602212523c92ee49d7edcb6065982410	2021-03-18 14:33:16 -07:00
Zhichao Cao	08ec5e7321	Add the statistics and info log for Error handler (#8050 ) Summary: Add statistics and info log for error handler: counters for bg error, bg io error, bg retryable io error, auto resume, auto resume total retry, and auto resume sucess; Histogram for auto resume retry count in each recovery call. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8050 Test Plan: make check and add test to error_handler_fs_test Reviewed By: anand1976 Differential Revision: D26990565 Pulled By: zhichao-cao fbshipit-source-id: 49f71e8ea4e9db8b189943976404205b56ab883f	2021-03-17 22:38:13 -07:00
Jay Zhuang	9df78a94f1	Disable flaky error_handler_fs_test that could hang (#7964 ) Summary: The test is hang on `95013df278/db/error_handler_fs_test.cc (L947)` Seems db.mutex_ is lock twice in the test: `cf160b98e1/db/db_impl/db_impl_compaction_flush.cc (L3208)` `0a9a05ae12/db/db_impl/db_impl.cc (L469)` As it's just a test issue, disable it for now until the test is fixed. The hang could be reproduced by: `gtest-parallel ./error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.CompactionWriteFileScopeError -r 1000` Pull Request resolved: https://github.com/facebook/rocksdb/pull/7964 Reviewed By: zhichao-cao Differential Revision: D26447325 Pulled By: jay-zhuang fbshipit-source-id: 72f6a346458e059d10e9cc3347bd6bde040cf89e	2021-02-15 09:45:23 -08:00
Zhichao Cao	95013df278	Do not set bg error for compaction in retryable IO Error case (#7899 ) Summary: When retryable IO error occurs during compaction, it is mapped to soft error and set the BG error. However, auto resume is not called to clean the soft error since compaction will reschedule by itself. In this change, When retryable IO error occurs during compaction, BG error is not set. User will be informed the error via EventHelper. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7899 Test Plan: tested with error_handler_fs_test Reviewed By: anand1976 Differential Revision: D26094097 Pulled By: zhichao-cao fbshipit-source-id: c53424f11d237405592cd762f43cbbdf8da8234f	2021-01-27 17:58:12 -08:00
Jay Zhuang	c6ff4c0b70	Fix deadlock in `fs_test.WALWriteRetryableErrorAutoRecover1` (#7897 ) Summary: The recovery thread could hold the db.mutex, which is needed from sync write in main thread. Make sure the write is done before recovery thread starts. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7897 Test Plan: `gtest-parallel ./error_handler_fs_test --gtest_filter=DBErrorHandlingFSTest.WALWriteRetryableErrorAutoRecover1 -r 10000 --workers=200` Reviewed By: zhichao-cao Differential Revision: D26082933 Pulled By: jay-zhuang fbshipit-source-id: 226fc49228c0e5903f86ff45cc3fed3080abdb1f	2021-01-26 17:02:03 -08:00
Jay Zhuang	9425acacce	Fix flaky `error_handler_fs_test.MultiDBCompactionError` (#7896 ) Summary: The error recovery thread may out-live DBImpl object, which causing access released DBImpl.mutex. Close SstFileManager before closing DB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7896 Test Plan: the issue can be reproduced by adding sleep in recovery code. Pass the tests with sleep. Reviewed By: zhichao-cao Differential Revision: D26076655 Pulled By: jay-zhuang fbshipit-source-id: 0d9cc5639c12fcfc001427015e75a9736f33cd96	2021-01-26 11:00:12 -08:00
Zhichao Cao	48c0843e69	Treat File Scope Write IO Error the same as Retryable IO Error (#7840 ) Summary: In RocksDB, when IO error happens, the flags of IOStatus can be set. If the IOStatus is set as "File Scope IO Error", it indicate that the error is constrained in the file level. Since RocksDB does not continues write data to a file when any IO Error happens, File Scope IO Error can be treated the same as Retryable IO Error. Adding the logic to ErrorHandler::SetBGError to include the file scope IO Error in its error handling logic, which is the same as retryable IO Error. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7840 Test Plan: added new unit tests in error_handler_fs_test. make check Reviewed By: anand1976 Differential Revision: D25820481 Pulled By: zhichao-cao fbshipit-source-id: 69cabd3d010073e064d6142ce1cabf341b8a6806	2021-01-07 16:31:33 -08:00
mrambacher	55e99688cc	No elide constructors (#7798 ) Summary: Added "no-elide-constructors to the ASSERT_STATUS_CHECK builds. This flag gives more errors/warnings for some of the Status checks where an inner class checks a Status and later returns it. In this case, without the elide check on, the returned status may not have been checked in the caller, thereby bypassing the checked code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7798 Reviewed By: jay-zhuang Differential Revision: D25680451 Pulled By: pdillinger fbshipit-source-id: c3f14ed9e2a13f0a8c54d839d5fb4d1fc1e93917	2020-12-23 16:55:53 -08:00
Zhichao Cao	29e8f6a698	Add kManifestWriteNoWAL to BackgroundErrorReason to handle Flush IO Error when WAL is disabled (#7693 ) Summary: In the current code base, all the manifest writes with IO error will be set with reason: BackgroundErrorReason::kManifestWrite, which will be mapped to the kHardError if the IO Error is retryable. However, if the system does not use the WAL, all the retryable IO error should be mapped to kSoftError. Create this PR to handle is special case by adding kManifestWriteNoWAL to BackgroundErrorReason. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7693 Test Plan: make check, add new testing cases to error_handler_fs_test Reviewed By: anand1976 Differential Revision: D25066204 Pulled By: zhichao-cao fbshipit-source-id: d59553896c2eac3fb37c05238544d2b265379462	2020-12-02 18:24:01 -08:00
mrambacher	f35f7f2704	Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566 ) Summary: This PR does a few things: 1. The MockFileSystem class was split out from the MockEnv. This change would theoretically allow a MockFileSystem to be used by other Environments as well (if we created a means of constructing one). The MockFileSystem implements a FileSystem in its entirety and does not rely on any Wrapper implementation. 2. Make the RocksDB test suite work when MOCK_ENV=1 and ENCRYPTED_ENV=1 are set. To accomplish this, a few things were needed: - The tests that tried to use the "wrong" environment (Env::Default() instead of env_) were updated - The MockFileSystem was changed to support the features it was missing or mishandled (such as recursively deleting files in a directory or supporting renaming of a directory). 3. Updated the test framework to have a ROCKSDB_GTEST_SKIP macro. This can be used to flag tests that are skipped. Currently, this defaults to doing nothing (marks the test as SUCCESS) but will mark the tests as SKIPPED when RocksDB is upgraded to a version of gtest that supports this (gtest-1.10). I have run a full "make check" with MEM_ENV, ENCRYPTED_ENV, both, and neither under both MacOS and RedHat. A few tests were disabled/skipped for the MEM/ENCRYPTED cases. The error_handler_fs_test fails/hangs for MEM_ENV (presumably a timing problem) and I will introduce another PR/issue to track that problem. (I will also push a change to disable those tests soon). There is one more test in DBTest2 that also fails which I need to investigate or skip before this PR is merged. Theoretically, this PR should also allow the test suite to run against an Env loaded from the registry, though I do not have one to try it with currently. Finally, once this is accepted, it would be nice if there was a CircleCI job to run these tests on a checkin so this effort does not become stale. I do not know how to do that, so if someone could write that job, it would be appreciated :) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7566 Reviewed By: zhichao-cao Differential Revision: D24408980 Pulled By: jay-zhuang fbshipit-source-id: 911b1554a4d0da06fd51feca0c090a4abdcb4a5f	2020-10-27 10:33:09 -07:00
Zhichao Cao	b7062f0b2c	Status check enforcement for error_handler_fs_test (#7342 ) Summary: Added status check enforcement for error_test_fs_test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7342 Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 error_test_fs_test Reviewed By: akankshamahajan15 Differential Revision: D23972231 Pulled By: zhichao-cao fbshipit-source-id: fa41bfe440012e0c55f2c9507c1d0104e5e93f84	2020-10-02 16:41:13 -07:00
Zhichao Cao	485fd9d9db	fix the flaky test failure (#7415 ) Summary: Fix the flaky test failure in error_handler_fs_test. Add the sync point, solve the dependency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7415 Test Plan: make asan_check, ~/gtest-parallel/gtest-parallel -r 100 ./error_handler_fs_test Reviewed By: siying Differential Revision: D23804330 Pulled By: zhichao-cao fbshipit-source-id: 5175108651f7652e47e15978f2a9c1669ef59d80	2020-09-19 17:57:54 -07:00
Zhichao Cao	c268628c25	Map retryable IO error during Flush without WAL to soft error and no switch memtable during resume (#7310 ) Summary: In the current implementation, any retryable IO error happens during Flush is mapped to a hard error. In this case, DB is stopped and write is stalled unless the background error is cleaned. In this PR, if WAL is DISABLED, the retryable IO error during FLush is mapped to a soft error. Such that, the memtable can continue receive the writes. At the same time, if auto resume is triggered, SwtichMemtable will not be called during Flush when resuming the DB to avoid to many small memtables. Testing cases are added. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7310 Test Plan: adding new unit test, pass make check. Reviewed By: anand1976 Differential Revision: D23710892 Pulled By: zhichao-cao fbshipit-source-id: bc4ca50d11c6b23b60d2c0cb171d86d542b038e9	2020-09-17 20:25:45 -07:00
anand76	18a3227b12	Add a new IOStatus subcode to indicate that writes are fenced off (#7374 ) Summary: In a distributed file system, directory ownership is enforced by fencing off the previous owner once they've been preempted by a new owner. This PR adds a IOStatus subcode for ```StatusCode::IOError``` to indicate this. Once this error is returned for a file write, the DB is put in read-only mode and not allowed to resume in read-write mode. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7374 Test Plan: Add new unit tests in ```error_handler_fs_test``` Reviewed By: riversand963 Differential Revision: D23687777 Pulled By: anand1976 fbshipit-source-id: bef948642089dc0af399057864d9a8ca339e8b2f	2020-09-14 16:04:47 -07:00
sdong	b194c21bba	Whole DBTest to skip fsync (#7274 ) Summary: After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative. This commit reinstates https://github.com/facebook/rocksdb/issues/7049, whose un-revert was lost in an automatic infrastructure mis-merge. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7274 Test Plan: Run all existing files. Reviewed By: pdillinger Differential Revision: D23177444 fbshipit-source-id: 1f61690b2ac6333c3b2c87176fef6b2cba086b33	2020-08-17 18:42:25 -07:00
Zhichao Cao	ed4712fe7e	Remove time out testing cases in error_handler_fs_test (#7141 ) Summary: Remove the 3 testing cases that cause the time out in linux build by https://github.com/facebook/rocksdb/issues/6765 . Will fix them later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7141 Test Plan: make asan_check, buck run Reviewed By: ajkr Differential Revision: D22593831 Pulled By: zhichao-cao fbshipit-source-id: 14956c36476ecc3393f613178c22e13df843126e	2020-07-17 23:27:21 -07:00
Zhichao Cao	a10f12eda1	Auto resume the DB from Retryable IO Error (#6765 ) Summary: In current codebase, in write path, if Retryable IO Error happens, SetBGError is called. The retryable IO Error is converted to hard error and DB is in read only mode. User or application needs to resume it. In this PR, if Retryable IO Error happens in one DB, SetBGError will create a new thread to call Resume (auto resume). otpions.max_bgerror_resume_count controls if auto resume is enabled or not (if max_bgerror_resume_count<=0, auto resume will not be enabled). options.bgerror_resume_retry_interval controls the time interval to call Resume again if the previous resume fails due to the Retryable IO Error. If non-retryable error happens during resume, auto resume will terminate. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6765 Test Plan: Added the unit test cases in error_handler_fs_test and pass make asan_check Reviewed By: anand1976 Differential Revision: D21916789 Pulled By: zhichao-cao fbshipit-source-id: acb8b5e5dc3167adfa9425a5b7fc104f6b95cb0b	2020-07-15 11:03:58 -07:00
mrambacher	c7c7b07f06	More Makefile Cleanup (#7097 ) Summary: Cleans up some of the dependencies on test code in the Makefile while building tools: - Moves the test::RandomString, DBBaseTest::RandomString into Random - Moves the test::RandomHumanReadableString into Random - Moves the DestroyDir method into file_utils - Moves the SetupSyncPointsToMockDirectIO into sync_point. - Moves the FaultInjection Env and FS classes under env These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated. Tested both release and debug builds via Make and CMake for both static and shared libraries. More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097 Reviewed By: riversand963 Differential Revision: D22463160 Pulled By: pdillinger fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2	2020-07-09 14:35:17 -07:00
Peter Dillinger	52d59e0c93	Revert "Whole DBTest to skip fsync (#7049 )" (#7070 ) Summary: This reverts commit `4f1534bdb0`. This commit caused failures and deadlocks in MultiThreadedDBTest.MultiThreaded/69 and others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7070 Reviewed By: riversand963 Differential Revision: D22358778 Pulled By: pdillinger fbshipit-source-id: faf8f2cb469a7063a113921c8e9c64a9f7610dac	2020-07-02 10:22:43 -07:00
sdong	4f1534bdb0	Whole DBTest to skip fsync (#7049 ) Summary: After https://github.com/facebook/rocksdb/pull/7036, we still see extra DBTest that can timeout when running 10 or 20 in parallel. Expand skip-fsync mode in whole DBTest. Still preserve other tests from doing this mode to be conservative. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7049 Test Plan: Run all existing files. Reviewed By: pdillinger Differential Revision: D22301700 fbshipit-source-id: f9a9e3b3b26ce640665a47cb8bff33ba0c89b565	2020-07-01 19:37:56 -07:00
Peter Dillinger	079e77ff9e	Revamp cache_bench to resemble a real workload (#6629 ) Summary: I suspect LRUCache could use some optimization, and to support such an effort, a good benchmarking tool is needed. The existing cache_bench was heavily skewed toward insertion and lookup misses, and did not saturate memory with other work. This change should improve those things to better resemble a real workload. (All below using clang compiler, for some consistency, but not necessarily same version and settings.) The real workload is from production MySQL on RocksDB, filtering stacks containing "LRU", "ShardedCache" or "CacheShard." Lookup inclusive: 66% Insert inclusive: 17% Release inclusive: 15% An alternate simulated workload is MySQL running a LinkBench read test: Lookup inclusive: 54% Insert inclusive: 24% Release inclusive: 21% cache_bench default settings, prior to this change: Lookup inclusive: 35.8% Insert inclusive: 63.6% Release inclusive: 0% cache_bench after this change (intended as somewhat "tighter" workload than average production, more like LinkBench): Lookup inclusive: 52% Insert inclusive: 20% Release inclusive: 26% And top exclusive stacks (portion of stack samples as filtered above): Production MySQL: LRUHandleTable::FindPointer: 25.3% rocksdb::operator==: 15.1% <-- Slice == LRUCacheShard::LRU_Remove: 13.8% ShardedCache::Lookup: 8.9% __pthread_mutex_lock: 7.1% LRUCacheShard::LRU_Insert: 6.3% MurmurHash64A: 4.8% <-- Since upgraded to XXH3p ... Old cache_bench: LRUHandleTable::FindPointer: 23.6% __pthread_mutex_lock: 15.0% __pthread_mutex_unlock_usercnt: 11.7% __lll_lock_wait: 8.6% __lll_unlock_wake: 6.8% LRUCacheShard::LRU_Insert: 6.0% ShardedCache::Lookup: 4.4% LRUCacheShard::LRU_Remove: 2.8% ... rocksdb::operator==: 0.2% <-- Slice == ... New cache_bench: LRUHandleTable::FindPointer: 22.8% __pthread_mutex_unlock_usercnt: 14.3% rocksdb::operator==: 10.5% <-- Slice == LRUCacheShard::LRU_Insert: 9.0% __pthread_mutex_lock: 5.9% LRUCacheShard::LRU_Remove: 5.0% ... ShardedCache::Lookup: 2.9% ... So there's a bit more lock contention in the benchmark than in production, but otherwise looks similar enough to me. At least it's a big improvement over the existing code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6629 Test Plan: No production code changes, ran cache_bench with ASAN Reviewed By: ltamasi Differential Revision: D20824318 Pulled By: pdillinger fbshipit-source-id: 6f8dc5891ead0f87edbed3a615ecd5289d9abe12	2020-04-03 10:26:49 -07:00
Zhichao Cao	ef088f0e93	Fix the multi-thread Manifest write dependency in error_handler_fs_test (#6637 ) Summary: In CompactionManifestWriteRetryableError in error_handler_fs_test, the manifest write of flush should pass with no fs error. After flush, fs is set to error status and the manifest write of compaction should fail due to the IO Error. Currently, the manifest write of flush is not synced with the compaction in order, which might cause manifest write fails, which will cause test failure. Fixed by adding the LoadDependency of sync-point after flush and before compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6637 Test Plan: pass error_hanlder_fs_tes. Pass make asan_check Reviewed By: anand1976 Differential Revision: D20826969 Pulled By: zhichao-cao fbshipit-source-id: fb2e702caa19bd63c82570320536b7acda870ff1	2020-04-02 18:08:46 -07:00
Cheng Chang	ee50b8d499	Be able to decrease background thread's CPU priority when creating database backup (#6602 ) Summary: When creating a database backup, the background threads will not only consume IO resources by copying files, but also consuming CPU such as by computing checksums. During peak times, the CPU consumption by the background threads might affect online queries. This PR makes it possible to decrease CPU priority of these threads when creating a new backup. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6602 Test Plan: make check Reviewed By: siying, zhichao-cao Differential Revision: D20683216 Pulled By: cheng-chang fbshipit-source-id: 9978b9ed9488e8ce135e90ca083e5b4b7221fd84	2020-03-28 19:07:25 -07:00
Zhichao Cao	4246888101	Pass IOStatus to write path and set retryable IO Error as hard error in BG jobs (#6487 ) Summary: In the current code base, we use Status to get and store the returned status from the call. Specifically, for IO related functions, the current Status cannot reflect the IO Error details such as error scope, error retryable attribute, and others. With the implementation of https://github.com/facebook/rocksdb/issues/5761, we have the new Wrapper for IO, which returns IOStatus instead of Status. However, the IOStatus is purged at the lower level of write path and transferred to Status. The first job of this PR is to pass the IOStatus to the write path (flush, WAL write, and Compaction). The second job is to identify the Retryable IO Error as HardError, and set the bg_error_ as HardError. In this case, the DB Instance becomes read only. User is informed of the Status and need to take actions to deal with it (e.g., call db->Resume()). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6487 Test Plan: Added the testing case to error_handler_fs_test. Pass make asan_check Reviewed By: anand1976 Differential Revision: D20685017 Pulled By: zhichao-cao fbshipit-source-id: ff85f042896243abcd6ef37877834e26f36b6eb0	2020-03-27 16:04:43 -07:00
anand76	a9d168cfd7	Simplify migration to FileSystem API (#6552 ) Summary: The current Env/FileSystem API separation has a couple of issues - 1. It requires the user to specify 2 options - ```Options::env``` and ```Options::file_system``` - which means they have to make code changes to benefit from the new APIs. Furthermore, there is a risk of accessing the same APIs in two different ways, through Env in the old way and through FileSystem in the new way. The two may not always match, for example, if env is ```PosixEnv``` and FileSystem is a custom implementation. Any stray RocksDB calls to env will use the ```PosixEnv``` implementation rather than the file_system implementation. 2. There needs to be a simple way for the FileSystem developer to instantiate an Env for backward compatibility purposes. This PR solves the above issues and simplifies the migration in the following ways - 1. Embed a shared_ptr to the ```FileSystem``` in the ```Env```, and remove ```Options::file_system``` as a configurable option. This way, no code changes will be required in application code to benefit from the new API. The default Env constructor uses a ```LegacyFileSystemWrapper``` as the embedded ```FileSystem```. 1a. - This also makes it more robust by ensuring that even if RocksDB has some stray calls to Env APIs rather than FileSystem, they will go through the same object and thus there is no risk of getting out of sync. 2. Provide a ```NewCompositeEnv()``` API that can be used to construct a PosixEnv with a custom FileSystem implementation. This eliminates an indirection to call Env APIs, and relieves the FileSystem developer of the burden of having to implement wrappers for the Env APIs. 3. Add a couple of missing FileSystem APIs - ```SanitizeEnvOptions()``` and ```NewLogger()``` Tests: 1. New unit tests 2. make check and make asan_check Pull Request resolved: https://github.com/facebook/rocksdb/pull/6552 Reviewed By: riversand963 Differential Revision: D20592038 Pulled By: anand1976 fbshipit-source-id: c3801ad4153f96d21d5a3ae26c92ba454d1bf1f7	2020-03-23 21:54:21 -07:00
Zhichao Cao	e62fe50634	Introduce FaultInjectionTestFS to test fault File system instead of Env (#6414 ) Summary: In the current code base, we can use FaultInjectionTestEnv to simulate the env issue such as file write/read errors, which are used in most of the test. The PR https://github.com/facebook/rocksdb/issues/5761 introduce the File System as a new Env API. This PR implement the FaultInjectionTestFS, which can be used to simulate when File System has issues such as IO error. user can specify any IOStatus error as input, such that FS corresponding actions will return certain error to the caller. A set of ErrorHandlerFSTests are introduced for testing Pull Request resolved: https://github.com/facebook/rocksdb/pull/6414 Test Plan: pass make asan_check, pass error_handler_fs_test. Differential Revision: D20252421 Pulled By: zhichao-cao fbshipit-source-id: e922038f8ce7e6d1da329fd0bba7283c4b779a21	2020-03-04 12:35:05 -08:00

26 Commits